The new `lazy` state is the inverse of the previous `LazyObjFile::extracted`.
There are many advantages:
* previously when a LazyObjFile was extracted, a new ObjFile/BitcodeFile was created; now the file is reused, just with `lazy` cleared
* avoid the confusing transfer of `symbols` from LazyObjFile to the new file
* the `incompatible file:` diagnostic is unified with `is incompatible with`
* simpler code, smaller executable (6200+ bytes smaller on x86-64)
* make eager parsing feasible (for parallel section/symbol table initialization)
Currently the singleton `config` is assigned by `config = make<Configuration>()`
and (if `canExitEarly` is false) destroyed by `lld::freeArena`.
`make<Configuration>` allocates a stab with `malloc(4096)`. This both wastes
memory and bloats the executable (every type instantiates `BumpPtrAllocator`
which costs more than 1KiB code on x86-64).
(No need to worry about `clang::no_destroy`. Regular invocations (`canExitEarly`
is true) call `_Exit` via llvm::sys::Process::ExitNoCleanup.)
Reviewed By: lichray
Differential Revision: https://reviews.llvm.org/D116143
Older Go cmd/link used SHT_PROGBITS for .init_array .
Work around the lack of https://golang.org/cl/373734 for a while.
It does not generate .fini_array or .preinit_array
When linking a 1.2G output (nearly no debug info, 2846621 dynamic relocations) using `--threads=8`, I measured
```
9.131462 Total ExecuteLinker
1.449913 Total Write output file
1.445784 Total Write sections
0.657152 Write sections {"detail":".rela.dyn"}
```
This change decreases the .rela.dyn time to 0.25, leading to 4% speed up in the total time.
* The parallelSort is slow because of expensive r_sym/r_offset computation. Cache the values.
* The iteration is slow. Move r_sym/r_addend computation ahead of time and parallelize it.
With the change, the new encodeDynamicReloc is cheap (0.05s). So no need to parallelize it.
Reviewed By: ikudrin
Differential Revision: https://reviews.llvm.org/D115993
writeSections is typically a bottleneck.
This was used to track down the following bottlenecks:
* Output section .rela.dyn (9115d75117)
* Output section .debug_str (3aae04c744)
* posix_fallocate is slow for Linux tmpfs: D115957
Reviewed By: ikudrin
Differential Revision: https://reviews.llvm.org/D115984
GCC's powerpc32 port predefines `PPC` as a macro in GNU C++ mode in some configurations (Linux,
FreeBSD, and some others. See `builtin_define_std ("PPC"); ` in gcc/config/rs6000).
```
% powerpc-linux-gnu-g++ -E -dM -xc++ /dev/null -o - | grep -w PPC
#define PPC 1
```
Fixes https://bugs.gentoo.org/829599
Reviewed By: thesamesam
Differential Revision: https://reviews.llvm.org/D116017
This decreases struct sizes and usually decreases the lld executable
size (39KiB for my x86-64 executable) (unless in some cases smaller
SmallVector leads to more inlining, e.g. StringTableBuilder).
For --gdb-index, there may be memory usage saving.
Only called once. Moving to OutputSections.cpp can make it inlined.
finalizeInputSections can be very hot, especially in -O1 links with much debug info.
Everyone uses -l -L instead of the long option counterparts.
Make help messages attach to -L -l and (--reproduce) use them for response.txt
command line options.
Calling `Allocate` with 0 size (when .symtab is absent, e.g.
`invalid/mips-invalid-options-descriptor.test`) may return a nullptr, which will
crash with -fsanitize=null (the underlying `Allocate` function is
LLVM_ATTRIBUTE_RETURNS_NONNULL).
The SHT_GNU_version index is 16-bit, so the 32-bit value is a waste.
Technically non-default version index 0x7fff uses version index 0xffff,
but it is impossible in practice.
This change decreases sizeof(SymbolUnion) from 80 to 72 on ELF64 platforms.
Memory usage decreases by 1% when linking a large executable.
* Avoid the name truncation quirk in SymbolTable::insert: the truncated name will be replaced by @@ again.
* Allow foo and foo@@v1 in different files to be diagnosed as duplicate definition error (GNU ld behavior)
* Avoid potential redundant strlen on symbol name due to StringRefZ in ObjFile<ELFT>::initializeSymbols
Sorting the prefixes by decreasing frequency can improve performance.
.gcc_except_table is relatively frequent, so move it ahead.
.ctors and .dtors mostly disappear and should be the last.
SHT_GNU_verdef is typically small, so it's unnecessary to reserve the vector.
While here, fix a hypothetical issue when SHT_GNU_verdef has non-increasing
version indexes, which don't happen with GNU ld, gold, ld.lld's output.
My x86-64 lld executable is 256 bytes smaller.
sizeof(ObjFile<ELF64LE>) is decreased from 344 to 272 on an ELF64 system.
In a large link with 30000 ObjFiles, this may be 2+MiB saving.
Change std::vector members to SmallVector, and std::string members to
SmallString<0> (these members typically don't benefit from small string optimization).
On Linux x86-64 the lld executable is ~6k smaller.
(Fixed an issue about GOT on a copy relocated alias.)
(Fixed an issue about not creating r_addend=0 IRELATIVE for unreferenced non-preemptible ifunc.)
The idea is to make scanRelocations mark some actions are needed (GOT/PLT/etc)
and postpone the real work to postScanRelocations. It gives some flexibility:
* Make it feasible to support .plt.got (PR32938): we need to know whether GLOB_DAT and JUMP_SLOT are both needed.
* Make non-preemptible IFUNC handling slightly cleaner: avoid setting/clearing sym.gotInIgot
* -z nocopyrel: report all copy relocation places for one symbol
* Make GOT deduplication feasible
* Make parallel relocation scanning feasible (if we can avoid all stateful operations and make Symbol attributes atomic), but parallelism may not be the appealing choice
Since this patch moves a large chunk of code out of ELFT templates. My x86-64
executable is actually a few hundred bytes smaller.
For ppc32-ifunc-nonpreemptible-pic.s: I remove absolute relocation references to non-preemptible ifunc
because absolute relocation references are incorrect in -fpie mode.
Reviewed By: peter.smith, ikudrin
Differential Revision: https://reviews.llvm.org/D114783
If a copy related symbol (say `copy`) is referenced in two .o
files, this change removes a duplicated line from the -Map output:
```
202470 202470 1 1 .bss.rel.ro
202470 202470 1 1 <internal>:(.bss.rel.ro)
202470 202470 1 1 copy
removed 202470 202470 1 1 copy
```
Differential Revision: https://reviews.llvm.org/D115697
needsPltAddr is equivalent to `needsCopy && isFunc`. In many places, it is
equivalent to `needsCopy` because the non-STT_FUNC cases are ruled out.
Reviewed By: ikudrin, peter.smith
Differential Revision: https://reviews.llvm.org/D115603
(Fixed an issue about GOT on a copy relocated alias.)
The idea is to make scanRelocations mark some actions are needed (GOT/PLT/etc)
and postpone the real work to postScanRelocations. It gives some flexibility:
* Make it feasible to support .plt.got (PR32938): we need to know whether GLOB_DAT and JUMP_SLOT are both needed.
* Make non-preemptible IFUNC handling slightly cleaner: avoid setting/clearing sym.gotInIgot
* -z nocopyrel: report all copy relocation places for one symbol
* Make GOT deduplication feasible
* Make parallel relocation scanning feasible (if we can avoid all stateful operations and make Symbol attributes atomic), but parallelism may not be the appealing choice
Since this patch moves a large chunk of code out of ELFT templates. My x86-64
executable is actually a few hundred bytes smaller.
For ppc32-ifunc-nonpreemptible-pic.s: I remove absolute relocation references to non-preemptible ifunc
because absolute relocation references are incorrect in -fpie mode.
Reviewed By: peter.smith, ikudrin
Differential Revision: https://reviews.llvm.org/D114783
This reverts commit fc33861d48.
`replaceWithDefined` should copy needsGot, otherwise an alias for a copy
relocated symbol may not have GOT entry if its needsGot was originally true.
The idea is to make scanRelocations mark some actions are needed (GOT/PLT/etc)
and postpone the real work to postScanRelocations. It gives some flexibility:
* Make it feasible to support .plt.got (PR32938): we need to know whether GLOB_DAT and JUMP_SLOT are both needed.
* Make non-preemptible IFUNC handling slightly cleaner: avoid setting/clearing sym.gotInIgot
* -z nocopyrel: report all copy relocation places for one symbol
* Make parallel relocation scanning possible (if we can avoid all stateful operations and make Symbol attributes atomic), but parallelism may not be the appealing choice
* Make GOT deduplication feasible
Since this patch moves a large chunk of code out of ELFT templates. My x86-64
executable is actually a few hundred bytes smaller.
For ppc32-ifunc-nonpreemptible-pic.s: I remove absolute relocation references to non-preemptible ifunc
because absolute relocation references are incorrect in -fpie mode.
Reviewed By: peter.smith, ikudrin
Differential Revision: https://reviews.llvm.org/D114783
An unstable sort suffices. In a large link (11.06s), this decreases .rela.dyn
writeTo time from 1.52s to 0.81s, resulting in 6% total time speedup (the
benefit will greatly dilute if --pack-dyn-relocs=relr becomes prevailing).
Encoding the dynamic relocations then sorting raw Elf_Rel/Elf_Rela doesn't seem
to improve much (doing that would require code duplicate because of
Elf_Rel/Elf_Rela plus unfortunate mips64le), so don't do that.