Commit Graph

14764 Commits

Author SHA1 Message Date
Fangrui Song eb37330ac7 [ELF] Change mipsGotIndex to uint32_t
This does not decrease sizeof(InputSection) (important for memory usage) on
ELF64 by itself, but it allows us to add another uint32_t.
2021-12-21 20:19:51 -08:00
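This sketch shows why narrowing the field alone may not shrink the struct. It uses hypothetical structs, not lld's real InputSection, and assumes an LP64 target with 8-byte pointers. A lone uint32_t after a pointer-aligned member leaves 4 bytes of tail padding, and a second uint32_t can then occupy that space at no cost.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical layouts, not lld's InputSection. On a typical LP64 target,
// pointers are 8 bytes and 8-byte aligned.
struct WithU64 {
  void *data;
  uint64_t gotIndex; // 8 bytes
};

struct WithU32 {
  void *data;
  uint32_t gotIndex; // 4 bytes, followed by 4 bytes of tail padding
};

struct WithTwoU32 {
  void *data;
  uint32_t gotIndex;
  uint32_t extra; // fills the former padding, so sizeof is unchanged
};
```

On such a target all three structs are 16 bytes. The narrower field only pays off once something else fills the padding.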
Fangrui Song 48161b7490 [ELF] --gc-sections: Work around SHT_PROGBITS .init_array
Older Go cmd/link used SHT_PROGBITS for .init_array.
Work around the lack of https://golang.org/cl/373734 for a while.
Go cmd/link does not generate .fini_array or .preinit_array.
2021-12-21 10:44:29 -08:00
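This sketch shows the kind of check the workaround implies. The helper name `isInitArrayRoot` is hypothetical, and lld's actual condition and its location in the source may differ. The section type values are the standard ELF ones.

```cpp
#include <cassert>
#include <cstdint>
#include <string_view>

// ELF section type values from the gABI.
constexpr uint32_t SHT_PROGBITS = 1;
constexpr uint32_t SHT_INIT_ARRAY = 14;

// Hypothetical helper: should --gc-sections keep this section as a root
// because it holds initializers? Older Go cmd/link emitted .init_array as
// SHT_PROGBITS, so that combination is also accepted, matched by name.
bool isInitArrayRoot(uint32_t type, std::string_view name) {
  if (type == SHT_INIT_ARRAY)
    return true;
  return type == SHT_PROGBITS && name == ".init_array";
}
```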
Fangrui Song 6683099a0d [ELF] Optimize RelocationSection<ELFT>::writeTo
When linking a 1.2G output (nearly no debug info, 2846621 dynamic relocations) using `--threads=8`, I measured

```
9.131462 Total ExecuteLinker
1.449913 Total Write output file
1.445784 Total Write sections
0.657152 Write sections {"detail":".rela.dyn"}
```

This change decreases the .rela.dyn time to 0.25s, a 4% speedup in total link time.

* The parallelSort is slow because of expensive r_sym/r_offset computation. Cache the values.
* The iteration is slow. Move r_sym/r_addend computation ahead of time and parallelize it.

With this change, the new encodeDynamicReloc is cheap (0.05s), so there is no need to parallelize it.

Reviewed By: ikudrin

Differential Revision: https://reviews.llvm.org/D115993
2021-12-21 09:43:44 -08:00
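The first bullet describes the standard technique of caching sort keys. The serial sketch below rests on several assumptions. `Reloc`, `computeSym` and `computeOffset` are hypothetical placeholders for the expensive derivations, the (r_sym, r_offset) ordering is chosen for illustration only, and `std::stable_sort` stands in for lld's parallelSort.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical relocation whose keys are costly to derive.
struct Reloc {
  uint32_t symSource;
  uint64_t offsetSource;
};

// Placeholders for the expensive r_sym / r_offset computations.
uint32_t computeSym(const Reloc &r) { return r.symSource; }
uint64_t computeOffset(const Reloc &r) { return r.offsetSource * 8; }

struct Keyed {
  uint32_t rSym;
  uint64_t rOffset;
  const Reloc *rel;
};

// Compute each key once (n evaluations) instead of inside the comparator
// (O(n log n) evaluations), then sort on the cached values.
std::vector<Keyed> sortByCachedKeys(const std::vector<Reloc> &rels) {
  std::vector<Keyed> keyed;
  keyed.reserve(rels.size());
  for (const Reloc &r : rels)
    keyed.push_back({computeSym(r), computeOffset(r), &r});
  std::stable_sort(keyed.begin(), keyed.end(),
                   [](const Keyed &a, const Keyed &b) {
                     return a.rSym != b.rSym ? a.rSym < b.rSym
                                             : a.rOffset < b.rOffset;
                   });
  return keyed;
}
```

The cached-key pass is also trivially parallelizable, which matches the second bullet's idea of computing values ahead of time.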
Fangrui Song c2f2bb066b [ELF] Remove unneeded SectionBase::repl indirection
sec->repl equals sec after rL371216.
2021-12-21 00:39:16 -08:00
Esme-Yi b66328701a [PowerPC][llvm-objdump] enable --symbolize-operands for PowerPC ELF/XCOFF.
Summary: When disassembling, symbolize a branch target operand
to print a label instead of a real address.

Reviewed By: shchenz

Differential Revision: https://reviews.llvm.org/D114492
2021-12-21 04:17:57 +00:00
Xu Mingjie cb63ad8d1d [LTO] Fix incomplete optimization remarks for dead functions when PreOptModuleHook or PostInternalizeModuleHook is defined
In 20a895c4be, we introduced `finalizeOptimizationRemarks()` to make sure the diagnostic remarks file is flushed in case the linker doesn't call the global destructors before exiting.
In https://reviews.llvm.org/D73597, we added optimization remarks for removed functions, for debugging or for detecting dead code.
However, if PreOptModuleHook or PostInternalizeModuleHook is defined (e.g. `--plugin-opt=emit-llvm` is passed to the linker), `finalizeOptimizationRemarks()` is not called, so the optimization remarks file is incomplete.
This patch makes sure the diagnostic remarks file is flushed when PreOptModuleHook or PostInternalizeModuleHook is defined.

Reviewed By: tejohnson, MaskRay

Differential Revision: https://reviews.llvm.org/D115417
2021-12-20 18:16:09 -08:00
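A scope guard is one general way to guarantee a flush on every exit path. This is not necessarily how the patch does it. The sketch uses hypothetical names (`RemarksFile`, `runPipeline`, `ScopeExit`). LLVM also provides `llvm::make_scope_exit` in `llvm/ADT/ScopeExit.h` for this pattern.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical remarks sink; finalize() stands in for
// finalizeOptimizationRemarks().
struct RemarksFile {
  bool flushed = false;
  void finalize() { flushed = true; }
};

// Runs the stored callback when the guard goes out of scope.
class ScopeExit {
  std::function<void()> fn;

public:
  explicit ScopeExit(std::function<void()> f) : fn(std::move(f)) {}
  ~ScopeExit() { fn(); }
  ScopeExit(const ScopeExit &) = delete;
  ScopeExit &operator=(const ScopeExit &) = delete;
};

// A hook returning false means "stop here" (e.g. after emitting LLVM IR).
bool runPipeline(RemarksFile &remarks, const std::function<bool()> &preOptHook) {
  ScopeExit flush([&] { remarks.finalize(); });
  if (preOptHook && !preOptHook())
    return true; // early exit: the guard still flushes the remarks
  // ... optimization passes would run here ...
  return true;
}
```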
Fangrui Song 8825ffdbde [ELF] --time-trace: Trace "Write sections"
writeSections is typically a bottleneck.
This was used to track down the following bottlenecks:

* Output section .rela.dyn (9115d75117)
* Output section .debug_str (3aae04c744)
* posix_fallocate is slow for Linux tmpfs: D115957

Reviewed By: ikudrin

Differential Revision: https://reviews.llvm.org/D115984
2021-12-20 10:51:24 -08:00
Fangrui Song bee5bc9075 [ELF] #undef PPC to support GCC powerpc32 build
GCC's powerpc32 port predefines `PPC` as a macro in GNU C++ mode in some configurations (Linux,
FreeBSD, and some others; see `builtin_define_std ("PPC");` in gcc/config/rs6000).

```
% powerpc-linux-gnu-g++ -E -dM -xc++ /dev/null -o - | grep -w PPC
#define PPC 1
```

Fixes https://bugs.gentoo.org/829599

Reviewed By: thesamesam

Differential Revision: https://reviews.llvm.org/D116017
2021-12-20 10:12:51 -08:00
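This self-contained sketch reproduces the conflict and applies the fix. It simulates GCC's predefinition with an explicit `#define PPC 1` so that it behaves the same on any host. The `Machine` enum is hypothetical and stands in for any identifier spelled `PPC`.

```cpp
#include <cassert>

// Simulate what powerpc-linux-gnu-g++ predefines in GNU C++ mode.
#define PPC 1

// Without this, `PPC` in the enum below would expand to `1` and fail to
// compile.
#ifdef PPC
#undef PPC
#endif

// Hypothetical enum using the now-free identifier.
enum class Machine { X86_64, PPC, PPC64 };
```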
Fangrui Song 3aae04c744 [ELF] Parallelize MergeNoTailSection::writeTo
With this patch, writing .debug_str is significantly faster for a program with
1.5G .debug_str:

* .debug_info 1.22s
* .debug_str 2.57s decreases to 0.66s
2021-12-17 23:30:42 -08:00
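Each deduplicated string piece already has a fixed output offset, so the pieces can be copied independently. The minimal sketch below shows that idea, using `std::thread` in place of LLVM's parallel helpers. `Piece` and `writePiecesParallel` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstring>
#include <string>
#include <thread>
#include <vector>

// Hypothetical piece with a precomputed, non-overlapping output offset.
struct Piece {
  std::string data;
  size_t outOff;
};

// Split pieces into contiguous chunks. Each thread writes a disjoint range
// of buf, so no synchronization is needed beyond the final join.
void writePiecesParallel(const std::vector<Piece> &pieces, char *buf,
                         unsigned numThreads) {
  numThreads = std::max(1u, numThreads);
  size_t chunk = (pieces.size() + numThreads - 1) / numThreads;
  std::vector<std::thread> workers;
  for (unsigned t = 0; t < numThreads; ++t) {
    size_t begin = t * chunk;
    size_t end = std::min(pieces.size(), begin + chunk);
    if (begin >= end)
      break;
    workers.emplace_back([&pieces, buf, begin, end] {
      for (size_t i = begin; i < end; ++i)
        std::memcpy(buf + pieces[i].outOff, pieces[i].data.data(),
                    pieces[i].data.size());
    });
  }
  for (std::thread &w : workers)
    w.join();
}
```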
Fangrui Song 552d84414d [ELF] Use SmallVector for many SyntheticSections. NFC
This decreases struct sizes and usually decreases the lld executable
size (by 39KiB for my x86-64 executable), though in some cases the smaller
SmallVector leads to more inlining (e.g. StringTableBuilder).
For --gdb-index, there may be memory usage savings.
2021-12-17 19:22:16 -08:00
Vy Nguyen 4f90e67e2f [lld-macho] Handle $ld$hide[$os] symbols.
PR/52708

Differential Revision: https://reviews.llvm.org/D115775
2021-12-17 16:40:07 -05:00
Nico Weber c4b45eeb44 [lld/mac] Don't lose "weak ref" bit when doing LTO
Fixes #52778.

Probably fixes Chromium crashing on startup on macOS 10.15 (and older) systems
when building with LTO, but I haven't verified that yet.

Differential Revision: https://reviews.llvm.org/D115949
2021-12-17 15:26:35 -05:00
Nico Weber a3096ca9b4 [lld/test] List one test dep per line
Matches llvm's and clang's /test/CMakeLists.txt, makes it easier to
see in diffs which deps get added, and makes it easier to see if
a given dependency is present or not.

No behavior change.
2021-12-17 09:51:01 -05:00
Fangrui Song aa27bab5a1 [ELF] InputSection::writeTo: reorder type checks and add LLVM_UNLIKELY 2021-12-16 23:42:50 -08:00
Fangrui Song 054cdb34a2 [ELF] Optimize MergeInputSection::splitNonStrings. NFC 2021-12-16 21:23:00 -08:00
Fangrui Song 4c98d08841 [ELF] Speed up MergeInputSection::split*. NFC 2021-12-16 21:17:02 -08:00
Fangrui Song bf4fa3036a [ELF] Use SmallVector for MergeInputSection::pieces. NFC
sizeof(pieces) decreases from 24 to 16 on ELF64.
One BumpPtrAllocator can store more MergeInputSections.
The lld executable becomes smaller.
2021-12-16 21:07:39 -08:00
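This is a rough illustration of where the 8 bytes come from, assuming an LP64 target. std::vector typically stores three pointers. SmallVector with no inline elements stores one pointer plus 32-bit size and capacity fields. `SmallVecHeader` is a hypothetical approximation, not LLVM's actual class.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical approximation of the SmallVector<T, 0> header layout.
template <typename T> struct SmallVecHeader {
  T *begin;          // 8 bytes
  uint32_t size;     // 4 bytes
  uint32_t capacity; // 4 bytes
};
```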
Fangrui Song 93558e575e [ELF] Internalize createMergeSynthetic. NFC
It is only called once, so moving it to OutputSections.cpp allows it to be inlined.
finalizeInputSections can be very hot, especially in -O1 links with a lot of debug info.
2021-12-16 20:50:06 -08:00
Daniel Kiss 2b4e6052b3 [lld] Add cet-report and bti-report flags
Implement cet-report as supported by binutils.
bti-report provides the same behaviour for AArch64 BTI.

Fixes https://github.com/llvm/llvm-project/issues/44828

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D113901
2021-12-16 16:26:26 +01:00
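This sketch shows the reporting logic these options imply. The `reportMissing` helper is hypothetical and the diagnostic wording is made up. Each input file advertises feature bits, as in `.note.gnu.property`, and files that lack a required bit are reported at the chosen level. The bit values in the usage below (x86 IBT = 1, SHSTK = 2) follow the GNU property definitions.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

enum class ReportLevel { None, Warning, Error };

// Hypothetical per-file summary of the GNU_PROPERTY_*_FEATURE_1_AND bits.
struct InputFileInfo {
  std::string name;
  uint32_t featureAnd;
};

// Returns the diagnostics that would be emitted; a linker would route them
// to its warning/error machinery instead.
std::vector<std::string> reportMissing(const std::vector<InputFileInfo> &files,
                                       uint32_t requiredBit, ReportLevel level,
                                       const std::string &option) {
  std::vector<std::string> diags;
  if (level == ReportLevel::None)
    return diags;
  const char *prefix = level == ReportLevel::Error ? "error: " : "warning: ";
  for (const InputFileInfo &f : files)
    if (!(f.featureAnd & requiredBit))
      diags.push_back(prefix + f.name + ": " + option +
                      ": file lacks the required feature property");
  return diags;
}
```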
Fangrui Song 8617996ac1 [ELF] maybeReportUndefined: move sym.isUndefined() check to the caller. NFC
Avoid a function call in the majority of cases.
2021-12-16 00:27:19 -08:00
Fangrui Song 101407bfaa [ELF] parseSymbolVersion: remove unused pos == 0 check 2021-12-15 23:59:55 -08:00
Fangrui Song 60f5614931 [ELF] SharedFile::parse: cache symbols size for a loop. NFC 2021-12-15 22:45:28 -08:00
Fangrui Song 7b265e9791 [ELF] Move -l -L canonical and --library-path --library aliases
Everyone uses -l and -L instead of their long option counterparts.
Make help messages attach to -L and -l, and make --reproduce use them for the
command line options in response.txt.
2021-12-15 21:49:53 -08:00
Fangrui Song 159b948e43 [ELF] ObjFile<ELFT>::initializeSymbols: don't call Allocate when firstGlobal==0
Calling `Allocate` with 0 size (when .symtab is absent, e.g.
`invalid/mips-invalid-options-descriptor.test`) may return a nullptr, which will
crash with -fsanitize=null (the underlying `Allocate` function is
LLVM_ATTRIBUTE_RETURNS_NONNULL).
2021-12-15 18:21:48 -08:00
Fangrui Song b0211de5e3 [ELF] Change Symbol::verdefIndex from uint32_t to uint16_t
The SHT_GNU_versym index is 16-bit, so the 32-bit value is a waste.
Technically, a non-default version with index 0x7fff is encoded as 0xffff,
but that is impossible in practice.

This change decreases sizeof(SymbolUnion) from 80 to 72 on ELF64 platforms.
Memory usage decreases by 1% when linking a large executable.
2021-12-15 17:59:30 -08:00
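Decoding a 16-bit version-symbol entry shows why 16 bits are enough. The low 15 bits hold the version index, and bit 15 marks a hidden (non-default) version. The constant values match `VERSYM_VERSION` and `VERSYM_HIDDEN` in LLVM's `llvm/BinaryFormat/ELF.h`. `decodeVersym` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint16_t kVersymVersion = 0x7fff; // version index mask
constexpr uint16_t kVersymHidden = 0x8000;  // set for non-default versions

struct DecodedVersym {
  uint16_t index;
  bool hidden;
};

// Hypothetical helper: split a raw versym entry into index and hidden bit.
DecodedVersym decodeVersym(uint16_t raw) {
  return {static_cast<uint16_t>(raw & kVersymVersion),
          (raw & kVersymHidden) != 0};
}
```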
Fangrui Song 50187d2dd5 [ELF] Speed up ObjFile<ELFT>::createInputSection
* Group ".note" section name checks
* Move shouldMerge check to the caller
2021-12-15 17:15:32 -08:00
Vincent Lee d17b092fe6 [lld-macho] Make writing map file asynchronous
For large applications, writing the map file can take quite a bit of time.
Among the biggest contributors to link time, writing the map file ranks second,
behind loading input files. Moving map file writing off the critical path (onto
its own thread) saves ~2-3 seconds when linking the Chromium framework on a
16-core Intel Xeon W.

```
           base            diff            difference (95% CI)
sys_time   1.617 ± 0.034   1.657 ± 0.026   [  +1.5% ..   +3.5%]
user_time  28.536 ± 0.245  28.609 ± 0.180  [  -0.1% ..   +0.7%]
wall_time  23.833 ± 0.271  21.684 ± 0.194  [  -9.5% ..   -8.5%]
samples    31              24
```

Reviewed By: #lld-macho, oontvoo, int3

Differential Revision: https://reviews.llvm.org/D115416
2021-12-15 16:37:04 -08:00
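This minimal sketch moves map-file generation onto its own thread with `std::async`. It assumes the data the map file reads is no longer mutated once layout is done. `writeMapFile` and `linkWithAsyncMap` are hypothetical names, and lld's implementation differs.

```cpp
#include <cassert>
#include <functional>
#include <future>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical map-file writer; returns the text instead of writing a file.
std::string writeMapFile(const std::vector<std::string> &symbols) {
  std::ostringstream os;
  for (const std::string &s : symbols)
    os << s << '\n';
  return os.str();
}

std::string linkWithAsyncMap(const std::vector<std::string> &symbols) {
  // Start the map file on another thread...
  std::future<std::string> mapFuture =
      std::async(std::launch::async, writeMapFile, std::cref(symbols));
  // ...while the critical path (writing the output binary) continues here.
  // Join before returning so the map file is complete.
  return mapFuture.get();
}
```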
Fangrui Song 68009b78f2 [ELF] Symbol::replace: remove dead code 2021-12-15 16:08:18 -08:00
Fangrui Song b5805b7847 [ELF] ObjFile<ELFT>::initializeSymbols: avoid StringRefZ from undefined symbols 2021-12-15 15:30:18 -08:00
Fangrui Song 2bdad16303 [ELF] SymbolTable::insert: keep @@ in the name
* Avoid the name truncation quirk in SymbolTable::insert: the truncated name will be replaced by @@ again.
* Allow foo and foo@@v1 in different files to be diagnosed as a duplicate definition error (GNU ld behavior)
* Avoid a potentially redundant strlen on the symbol name due to StringRefZ in ObjFile<ELFT>::initializeSymbols
2021-12-15 15:19:35 -08:00
Fangrui Song a8d6d2614b [ELF] Replace make<Defined> with makeDefined. NFC
This removes SpecificAlloc<Defined> and makes my lld executable 1.5k smaller.
It also eliminates the small memory waste caused by the separate BumpPtrAllocator.
2021-12-15 13:15:03 -08:00
Fangrui Song a596a5fc12 [ELF] ObjFile<ELFT>::initializeSymbols: Simplify this->symbols[i]. NFC 2021-12-15 13:02:38 -08:00
Fangrui Song 509153f1e7 [ELF] ObjFile<ELFT>::initializeSymbols: Batch allocate local symbols
and detangle local/global symbol initialization.

My x86-64 lld executable is 8k smaller due to the removal of SpecificAlloc<Undefined>.
2021-12-15 12:54:39 -08:00
Fangrui Song 3534d26cc1 [ELF] Slightly speed up -z keep-text-section-prefix 2021-12-15 10:20:11 -08:00
Fangrui Song 7c0881a38f [ELF] --gc-sections: Change startswith(".jcr") to exact match
GNU ld's internal linker script keeps `.jcr`, but not other sections
starting with `.jcr`.
2021-12-15 01:27:08 -08:00
Fangrui Song 21dbfd4300 [ELF] --gc-sections: Change startswith(".init") (and ".fini") to exact match
GNU ld's internal linker script keeps `.init`, but not other sections starting
with `.init`. .fini is similar.
2021-12-15 01:16:26 -08:00
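This sketch covers the two commits above (.jcr, and .init/.fini) and contrasts a prefix test with exact matching. Both helpers are hypothetical. They model only the name check, not lld's full set of sections retained by --gc-sections.

```cpp
#include <cassert>
#include <string_view>

// Exact match, mirroring GNU ld's internal linker script.
bool isRetainedByExactName(std::string_view name) {
  return name == ".init" || name == ".fini" || name == ".jcr";
}

// A startswith-style check for comparison. It also retains names such as
// ".init.text" or ".jcrfoo", which GNU ld does not keep.
bool isRetainedByPrefix(std::string_view name) {
  return name.substr(0, 5) == ".init" || name.substr(0, 5) == ".fini" ||
         name.substr(0, 4) == ".jcr";
}
```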
Fangrui Song 7a54ae9c1d [ELF] Change objectFiles to ELFFileBase *
This can sometimes avoid `cast<ObjFile<...>>`.

I intentionally do not touch postScanRelocations to wait for its stabilization.
2021-12-15 00:37:10 -08:00
Fangrui Song 3deb82cd07 [ELF] Adjust getOutputSectionName prefix order
Sorting the prefixes by decreasing frequency can improve performance.
.gcc_except_table is relatively frequent, so move it ahead.
.ctors and .dtors have mostly disappeared, so they should be last.
2021-12-15 00:18:58 -08:00
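This sketch scans a prefix table in order. It uses a hypothetical, abbreviated list rather than lld's exact table. Ordering by frequency is still constrained by correctness: a longer prefix such as `.data.rel.ro` must precede `.data`, or `.data.rel.ro.foo` would map to `.data`.

```cpp
#include <array>
#include <cassert>
#include <string_view>

// Illustrative prefixes, most frequent first where correctness allows.
constexpr std::array<std::string_view, 7> kPrefixes = {
    ".text", ".rodata", ".data.rel.ro", ".data", ".bss", ".gcc_except_table",
    ".ctors"};

// Maps ".text.foo" to ".text". Names matching no prefix are returned as-is.
std::string_view outputSectionName(std::string_view name) {
  for (std::string_view p : kPrefixes) {
    if (name == p)
      return p;
    if (name.size() > p.size() && name.substr(0, p.size()) == p &&
        name[p.size()] == '.')
      return p;
  }
  return name;
}
```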
Fangrui Song 5816f1855c [ELF] Slightly speed up getOutputSectionName. NFC 2021-12-14 23:43:00 -08:00
Fangrui Song 89661a0e89 [ELF] Remove dead code from SymbolTable::find 2021-12-14 22:41:52 -08:00
Fangrui Song c720b16aa5 [ELF] Use SmallVector for SharedFile and simplify parseVerdefs
SHT_GNU_verdef is typically small, so it's unnecessary to reserve the vector.

While here, fix a hypothetical issue when SHT_GNU_verdef has non-increasing
version indexes, which do not occur in the output of GNU ld, gold, or ld.lld.

My x86-64 lld executable is 256 bytes smaller.
2021-12-14 21:11:45 -08:00
Fangrui Song 1ff1d50d9f [ELF] Make InputFile smaller
sizeof(ObjFile<ELF64LE>) is decreased from 344 to 272 on an ELF64 system.
In a large link with 30000 ObjFiles, this may save over 2MiB.

Change std::vector members to SmallVector, and std::string members to
SmallString<0> (these members typically don't benefit from small string optimization).
On Linux x86-64 the lld executable is ~6k smaller.
2021-12-14 20:55:32 -08:00
Fangrui Song cf783be8d7 Reland D114783/D115603 [ELF] Split scanRelocations into scanRelocations/postScanRelocations
(Fixed an issue about GOT on a copy relocated alias.)
(Fixed an issue about not creating r_addend=0 IRELATIVE for unreferenced non-preemptible ifunc.)

The idea is to make scanRelocations mark which actions are needed (GOT/PLT/etc)
and postpone the real work to postScanRelocations. It gives some flexibility:

* Make it feasible to support .plt.got (PR32938): we need to know whether GLOB_DAT and JUMP_SLOT are both needed.
* Make non-preemptible IFUNC handling slightly cleaner: avoid setting/clearing sym.gotInIgot
* -z nocopyrel: report all copy relocation places for one symbol
* Make GOT deduplication feasible
* Make parallel relocation scanning feasible (if we can avoid all stateful operations and make Symbol attributes atomic), but parallelism may not be an appealing choice

Since this patch moves a large chunk of code out of ELFT templates, my x86-64
executable is actually a few hundred bytes smaller.

For ppc32-ifunc-nonpreemptible-pic.s: I removed absolute relocation references to non-preemptible ifunc
because absolute relocation references are incorrect in -fpie mode.

Reviewed By: peter.smith, ikudrin

Differential Revision: https://reviews.llvm.org/D114783
2021-12-14 16:28:41 -08:00
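This is a minimal model of the split. It assumes hypothetical `Symbol` and `Reloc` types with plain flags; as the commit notes, parallel scanning would require atomic attributes. The scan phase only records needs, and the post-scan phase allocates each GOT/PLT entry once per symbol, after every need is known.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct Symbol {
  bool needsGot = false;
  bool needsPlt = false;
  int32_t gotIndex = -1;
  int32_t pltIndex = -1;
};

enum class RelKind { GotRef, Call, Abs };

struct Reloc {
  RelKind kind;
  Symbol *sym;
};

// Phase 1: only set flags, so repeated references cause no duplicate work.
void scanRelocations(const std::vector<Reloc> &rels) {
  for (const Reloc &r : rels) {
    if (r.kind == RelKind::GotRef)
      r.sym->needsGot = true;
    else if (r.kind == RelKind::Call)
      r.sym->needsPlt = true;
  }
}

// Phase 2: all needs are known, so decisions that depend on combinations
// of flags (e.g. for .plt.got) could be made here.
void postScanRelocations(const std::vector<Symbol *> &syms, int32_t &gotCount,
                         int32_t &pltCount) {
  for (Symbol *s : syms) {
    if (s->needsGot)
      s->gotIndex = gotCount++;
    if (s->needsPlt)
      s->pltIndex = pltCount++;
  }
}
```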
Fangrui Song 04cf411c94 [ELF][test] Test unreferenced non-preemptible ifunc
Add missing coverage exposed by D114783.
There should be no associated IRELATIVE; otherwise (a) glibc ld.so may
crash, (b) it wastes space, and (c) the unused IPLT causes confusion.
2021-12-14 16:25:50 -08:00
Fangrui Song ea15b862d7 Revert D114783 [ELF] Split scanRelocations into scanRelocations/postScanRelocations
May cause a failure for non-preemptible `bcmp` in a glibc -static link.
2021-12-14 14:33:50 -08:00
Stephan T. Lavavej 8bd106a891 [NFC] Fix typos in release notes.
Reviewed By: ldionne, Mordante, MaskRay

Differential Revision: https://reviews.llvm.org/D115685
2021-12-14 14:19:42 -08:00
Fangrui Song 6a44013b0e [ELF] -Map: Print symbols which need canonical PLT entry/copy relocation just once
If a copy-relocated symbol (say `copy`) is referenced in two .o
files, this change removes a duplicated line from the -Map output:

```
          202470           202470        1     1 .bss.rel.ro
          202470           202470        1     1         <internal>:(.bss.rel.ro)
          202470           202470        1     1                 copy
removed   202470           202470        1     1                 copy
```

Differential Revision: https://reviews.llvm.org/D115697
2021-12-14 10:31:06 -08:00
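Tracking which names were already printed is one simple way to emit each symbol once. This sketch uses a hypothetical `mapLines` helper and does not necessarily reflect how the patch deduplicates.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>
#include <vector>

// A symbol referenced from several object files may be recorded once per
// reference. Keep only the first occurrence and preserve order.
std::vector<std::string> mapLines(const std::vector<std::string> &referencedSyms) {
  std::unordered_set<std::string> seen;
  std::vector<std::string> lines;
  for (const std::string &s : referencedSyms)
    if (seen.insert(s).second) // true only on first insertion
      lines.push_back(s);
  return lines;
}
```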
Fangrui Song b79686c6dc [ELF] Remove needsPltAddr in favor of needsCopy
needsPltAddr is equivalent to `needsCopy && isFunc`. In many places, it is
equivalent to `needsCopy` because the non-STT_FUNC cases are ruled out.

Reviewed By: ikudrin, peter.smith

Differential Revision: https://reviews.llvm.org/D115603
2021-12-14 09:52:43 -08:00
Fangrui Song e7a95b0674 Reland [ELF] Split scanRelocations into scanRelocations/postScanRelocations
(Fixed an issue about GOT on a copy relocated alias.)

The idea is to make scanRelocations mark which actions are needed (GOT/PLT/etc)
and postpone the real work to postScanRelocations. It gives some flexibility:

* Make it feasible to support .plt.got (PR32938): we need to know whether GLOB_DAT and JUMP_SLOT are both needed.
* Make non-preemptible IFUNC handling slightly cleaner: avoid setting/clearing sym.gotInIgot
* -z nocopyrel: report all copy relocation places for one symbol
* Make GOT deduplication feasible
* Make parallel relocation scanning feasible (if we can avoid all stateful operations and make Symbol attributes atomic), but parallelism may not be an appealing choice

Since this patch moves a large chunk of code out of ELFT templates, my x86-64
executable is actually a few hundred bytes smaller.

For ppc32-ifunc-nonpreemptible-pic.s: I removed absolute relocation references to non-preemptible ifunc
because absolute relocation references are incorrect in -fpie mode.

Reviewed By: peter.smith, ikudrin

Differential Revision: https://reviews.llvm.org/D114783
2021-12-13 20:11:24 -08:00
Fangrui Song d1014d9e6d [ELF] Improve test for copy relocations on aliases 2021-12-13 20:04:24 -08:00