689 Commits

Author SHA1 Message Date
Fangrui Song 196aedb843 [ELF] Change vector<InputSection *> to SmallVector. NFC
My x86-64 lld executable is 8KiB smaller.
2022-02-01 00:14:21 -08:00
Fangrui Song 73fd7d2304 [ELF] Change splitSections to objectFiles based parallelForEach. NFC
The work is more balanced.
2022-01-30 13:34:27 -08:00
Fangrui Song 988a03c585 [ELF] Add some Mips*Section to InStruct and change make<Mips*Section> to std::make_unique
Similar to D116143. My x86-64 lld executable is 20+KiB smaller.
2022-01-29 23:55:29 -08:00
Fangrui Song 469c4124ab [ELF] --gdb-index: switch to SmallVector. NFC 2022-01-29 15:24:56 -08:00
Fangrui Song da0e5b885b [ELF] Refactor -z combreloc
* `RelocationBaseSection::addReloc` increases `numRelativeRelocs`, which
  duplicates the work done by RelocationSection<ELFT>::writeTo.
* --pack-dyn-relocs=android has an inappropriate DT_RELACOUNT.
  AndroidPackedRelocationSection does not necessarily place relative relocations
  in the front, so DT_RELACOUNT might cause a semantic error (though our
  implementation doesn't, and Android bionic doesn't use DT_RELACOUNT anyway).

Move `llvm::partition` to a new function `partitionRels` and compute
`numRelativeRelocs` there. Now `RelocationBaseSection::addReloc` is trivial and
can be moved to the header to enable inlining.

The rest of DynamicReloc and `-z combreloc` handling is moved to the
non-template `RelocationBaseSection::computeRels` to decrease code size. My
x86-64 lld executable is 44+KiB smaller.

While here, rename `sort` to `combreloc`.
2022-01-29 14:45:58 -08:00
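
The partition-and-count step described in this commit can be sketched roughly as below. This is an illustration only, not lld's code: the `Reloc` struct, its `isRelative` field, and this free-function signature for `partitionRels` are assumptions, and `std::partition` stands in for `llvm::partition`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a dynamic relocation record.
struct Reloc {
  uint64_t offset;
  bool isRelative; // true for R_*_RELATIVE-style relocations
};

// Move relative relocations to the front and return how many there are.
// The count is derived once here instead of being bumped on every add.
size_t partitionRels(std::vector<Reloc> &relocs) {
  auto isRel = [](const Reloc &r) { return r.isRelative; };
  auto mid = std::partition(relocs.begin(), relocs.end(), isRel);
  assert(std::is_partitioned(relocs.begin(), relocs.end(), isRel));
  return static_cast<size_t>(mid - relocs.begin());
}
```

With the count computed in one place, an `addReloc`-style function only needs to append to the vector, which is what makes it small enough to inline.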
Fangrui Song 3704abaa16 [ELF] --gdb-index: replace vector<uint8_t> with unique_ptr<uint8_t[]>. NFC 2022-01-25 23:53:23 -08:00
Fangrui Song 571d6a7120 [ELF] Optimize .relr.dyn to not grow vector<uint64_t>. NFC 2022-01-25 23:33:40 -08:00
Fangrui Song 9fac78d0e1 [ELF] Simplify and optimize .relr.dyn NFC 2022-01-25 22:50:03 -08:00
Alexandre Ganea 83d59e05b2 Re-land [LLD] Remove global state in lldCommon
Move all variables at file-scope or function-static-scope into a hosting structure (lld::CommonLinkerContext) that lives at lldMain()-scope. Drivers will inherit from this structure and add their own global state, in the same way as for the existing COFFLinkerContext.

See discussion in https://lists.llvm.org/pipermail/llvm-dev/2021-June/151184.html

The previous land f860fe3622 caused issues in https://lab.llvm.org/buildbot/#/builders/123/builds/8383, fixed by 22ee510dac.

Differential Revision: https://reviews.llvm.org/D108850
2022-01-20 14:53:26 -05:00
Fangrui Song ac0986f880 [ELF] Change std::vector<InputSectionBase *> to SmallVector
There is no remaining std::vector<InputSectionBase> now. My x86-64 lld
executable is 2KiB smaller.
2022-01-17 10:25:07 -08:00
Fangrui Song f855074ed1 [ELF] GnuHashTableSection: replace stable_sort with 2-key sort. NFC
strTabOffset stabilizes llvm::sort. My x86-64 executable is 5+KiB smaller.
2022-01-17 00:34:42 -08:00
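
A rough sketch of the idea behind the 2-key sort: when a unique secondary key breaks every tie, an unstable sort produces the same order as a stable one. The `Entry` struct and `sortEntries` function are hypothetical, and `std::sort` stands in for `llvm::sort`; only the name `strTabOffset` comes from the commit message.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <tuple>
#include <vector>

// Hypothetical entry: a bucket index plus a unique string table offset.
struct Entry {
  uint32_t bucketIdx;
  uint32_t strTabOffset;
};

// Sort by bucket, breaking ties with the unique offset. Because no two
// entries compare equal, an unstable sort yields one deterministic order.
void sortEntries(std::vector<Entry> &v) {
  auto less = [](const Entry &a, const Entry &b) {
    return std::tie(a.bucketIdx, a.strTabOffset) <
           std::tie(b.bucketIdx, b.strTabOffset);
  };
  std::sort(v.begin(), v.end(), less);
  assert(std::is_sorted(v.begin(), v.end(), less));
}
```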
Fangrui Song 9c4292a59d [ELF] Remove unneeded SyntheticSection memset(*, 0, *)
After the D33630 fallout was properly fixed by a4c5db30be.

Tested by D37462/D44986 tests, the new --no-rosegment test in build-id.s, and a few --rosegment/--no-rosegment programs.
2022-01-16 22:51:57 -08:00
Fangrui Song a4c5db30be [ELF] Remove redundant fillTrap and memset(*, 0, *). NFC
The new tests in build-id.s would catch problems if we made a mistake here.
2022-01-16 22:37:31 -08:00
Fangrui Song aad90763d9 [ELF] RelocationSection<ELFT>::writeTo: use unstable partition 2022-01-16 21:44:19 -08:00
Fangrui Song e205445434 [ELF] StringTableSection: Use DenseMap<CachedHashStringRef> to avoid redundant hash computation
5~6% speedup when linking clang and chrome.
2022-01-16 21:02:05 -08:00
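
The cached-hash idea can be illustrated with the standard library alone. This is a sketch under assumptions, not the lld implementation: `CachedHashString`, `ReuseHash`, and `StringTable` are hypothetical names, and `std::unordered_map` stands in for `DenseMap<CachedHashStringRef>`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <string_view>
#include <unordered_map>

// Hypothetical analogue of a string reference that carries its own hash.
struct CachedHashString {
  std::string_view str;
  size_t hash;
  explicit CachedHashString(std::string_view s)
      : str(s), hash(std::hash<std::string_view>{}(s)) {}
  bool operator==(const CachedHashString &o) const { return str == o.str; }
};

// The map's hasher returns the stored value; it never rehashes the bytes.
struct ReuseHash {
  size_t operator()(const CachedHashString &k) const { return k.hash; }
};

class StringTable {
  std::unordered_map<CachedHashString, uint32_t, ReuseHash> offsets;
  std::string data = std::string(1, '\0'); // offset 0 is the empty string

public:
  // Returns the offset of s, appending it on first use.
  // The caller must keep the bytes behind s alive while the table exists.
  uint32_t add(std::string_view s) {
    assert(s.find('\0') == std::string_view::npos);
    CachedHashString key(s); // the string bytes are hashed once, here
    auto [it, inserted] =
        offsets.try_emplace(key, static_cast<uint32_t>(data.size()));
    if (inserted) {
      data.append(s);
      data.push_back('\0');
    }
    return it->second;
  }
  size_t size() const { return data.size(); }
};
```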
Alexandre Ganea e6b153947d Revert [LLD] Remove global state in lldCommon
It seems to be causing issues on https://lab.llvm.org/buildbot/#/builders/123/builds/8383
2022-01-16 11:03:06 -05:00
Alexandre Ganea f860fe3622 [LLD] Remove global state in lldCommon
Move all variables at file-scope or function-static-scope into a hosting structure (lld::CommonLinkerContext) that lives at lldMain()-scope. Drivers will inherit from this structure and add their own global state, in the same way as for the existing COFFLinkerContext.

See discussion in https://lists.llvm.org/pipermail/llvm-dev/2021-June/151184.html

Differential Revision: https://reviews.llvm.org/D108850
2022-01-16 08:57:57 -05:00
Fangrui Song 3736d0854a [ELF] Optimize -z combreloc
Sorting dynamic relocations is a bottleneck. Simplifying the comparator improves
performance. Linking clang is 4~5% faster with --threads=8.

This change may shuffle R_MIPS_REL32 for Mips and is NFC for non-Mips.
2022-01-15 22:33:51 -08:00
Fangrui Song 43d927984c [ELF] Refactor how .gnu.hash and .hash are discarded
Switch to the D114180 approach which is simpler and allows gnuHashTab/hashTab to
switch to unique_ptr.
2022-01-12 12:47:07 -08:00
Fangrui Song bf9c8636f2 [ELF] Support discarding .relr.dyn
db08df0570 does not work because part.relrDyn is
a unique_ptr and `reset` destroys the object which may still be referenced.

This commit uses the D114180 approach. Also improve the test to check that there
is no R_X86_64_RELATIVE.
2022-01-12 11:55:22 -08:00
Fangrui Song 7f1955dc96 [ELF] Support mixed TLSDESC and TLS GD
We only support both TLSDESC and TLS GD for x86 so this is an x86-specific
problem. If both are used, only one R_X86_64_TLSDESC is produced and TLS GD
accesses will incorrectly reference R_X86_64_TLSDESC. Fix this by introducing
SymbolAux::tlsDescIdx.

Reviewed By: ikudrin

Differential Revision: https://reviews.llvm.org/D116900
2022-01-10 10:03:21 -08:00
Fangrui Song 5d3bd7f360 [ELF] Move gotIndex/pltIndex/globalDynIndex to SymbolAux
to decrease sizeof(SymbolUnion) by 8 on ELF64 platforms.

Symbols needing such information are typically 1% or fewer (5134 out of 560520
when linking clang, 19898 out of 5550705 when linking chrome). Storing them
elsewhere can decrease memory usage and symbol initialization time.
There is a ~0.8% saving on max RSS when linking a large program.

Future direction:

* Move some of dynsymIndex/verdefIndex/versionId to SymbolAux
* Support mixed TLSDESC and TLS GD without increasing sizeof(SymbolUnion)

Reviewed By: peter.smith

Differential Revision: https://reviews.llvm.org/D116281
2022-01-09 13:43:27 -08:00
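
The side-table layout this commit describes might look roughly like the sketch below. The `auxIdx` field, the `SymbolTable` class, and `getAux` are assumptions made for illustration; only `SymbolAux` and the moved field names appear in the commit message.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Rarely needed per-symbol data lives in a side table.
struct SymbolAux {
  uint32_t gotIndex = UINT32_MAX;
  uint32_t pltIndex = UINT32_MAX;
  uint32_t globalDynIndex = UINT32_MAX;
};

// Every symbol pays for one 32-bit index instead of three.
struct Symbol {
  uint32_t auxIdx = UINT32_MAX; // UINT32_MAX: no aux entry allocated yet
};

class SymbolTable {
  std::vector<SymbolAux> aux;

public:
  // Allocate the aux entry lazily, on first access.
  SymbolAux &getAux(Symbol &sym) {
    if (sym.auxIdx == UINT32_MAX) {
      sym.auxIdx = static_cast<uint32_t>(aux.size());
      aux.emplace_back();
    }
    assert(sym.auxIdx < aux.size());
    return aux[sym.auxIdx];
  }
  size_t auxCount() const { return aux.size(); }
};
```

Since only about 1% of symbols ever need an entry, the vector stays small while every `Symbol` shrinks.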
Fangrui Song cb203f3f92 [ELF] Change InStruct/Partition pointers to unique_ptr
and remove associated make<XXX> calls.
gnuHash and sysvHash are unchanged, otherwise LinkerScript::discard would
destroy the objects which may be referenced by input section descriptions.

My x86-64 lld executable is 121+KiB smaller.
2021-12-27 18:15:23 -08:00
Fangrui Song 049cd480a0 [ELF] Use const reference. NFC 2021-12-27 17:05:48 -08:00
Fangrui Song b8a4780032 [ELF] Simplify and optimize SymbolTableSection<ELFT>::writeTo 2021-12-27 15:16:14 -08:00
Fangrui Song aabe901d57 [ELF] Remove one redundant computeBinding
This does resolve the redundancy in includeInDynsym().
2021-12-25 23:59:27 -08:00
Fangrui Song 2c8ebab32e [ELF] sortSymTabSymbols: change vector to SmallVector
This function may take ~1% time. SmallVector<SymbolTableEntry, 0> is smaller (16 bytes
instead of 24) and more efficient.
2021-12-25 23:16:27 -08:00
Fangrui Song e1b6b5be46 [ELF] Avoid referencing SectionBase::repl after ICF
It is fairly easy to forget SectionBase::repl after ICF.
Let ICF rewrite a Defined symbol's `section` field to avoid references to
SectionBase::repl in subsequent passes. This slightly improves the --icf=none
performance due to less indirection (maybe for --icf={safe,all} as well if most
symbols are Defined).

With this change, there is only one reference to `repl` (--gdb-index D89751).
We can undo f4fb5fd752 (`Move Repl to SectionBase.`)
but move `repl` to `InputSection` instead.

Reviewed By: ikudrin

Differential Revision: https://reviews.llvm.org/D116093
2021-12-24 12:09:48 -08:00
Fangrui Song bf45624ba0 [ELF][PPC32] Support .got2 in an output section description
I added `PPC32Got2Section` D62464 to support .got2 but did not implement .got2
in another output section.

PR52799 has a linker script placing .got2 in .rodata, which causes a null
pointer dereference because a MergeSyntheticSection's file is nullptr.
Add the support.
2021-12-23 11:32:44 -08:00
Fangrui Song ad26b0b233 Revert "[ELF] Make Partition/InStruct members unique_ptr and remove associate make<XXX>"
This reverts commit e48b1c8a27.
This reverts commit d019de23a1.

The changes caused memory leaks (non-final classes cannot use unique_ptr).
2021-12-22 23:55:11 -08:00
Fangrui Song ba6973c89b [ELF] Change nonnull pointer parameters to references 2021-12-22 22:02:29 -08:00
Fangrui Song e48b1c8a27 [ELF] Make Partition members unique_ptr and remove associate make<XXX>
See D116143 for benefits. My lld executable (x86-64) is 103+KiB smaller.
2021-12-22 21:34:26 -08:00
Fangrui Song d019de23a1 [ELF] Make InStruct members unique_ptr and remove associate make<XXX>
See D116143 for benefits. My lld executable (x86-64) is 24+KiB smaller.
2021-12-22 21:11:26 -08:00
Fangrui Song 5c75cc51b3 [ELF] Change nonnull pointer parameters to references. NFC 2021-12-22 21:09:57 -08:00
Fangrui Song baa3eb0dd9 [ELF] Change some non-null pointer parameters to references. NFC 2021-12-22 20:51:11 -08:00
Fangrui Song eb37330ac7 [ELF] Change mipsGotIndex to uint32_t
This does not decrease sizeof(InputSection) (important for memory usage) on
ELF64 by itself but allows us to add another uint32_t.
2021-12-21 20:19:51 -08:00
Fangrui Song 6683099a0d [ELF] Optimize RelocationSection<ELFT>::writeTo
When linking a 1.2G output (nearly no debug info, 2846621 dynamic relocations) using `--threads=8`, I measured

```
9.131462 Total ExecuteLinker
1.449913 Total Write output file
1.445784 Total Write sections
0.657152 Write sections {"detail":".rela.dyn"}
```

This change decreases the .rela.dyn time to 0.25, leading to 4% speed up in the total time.

* The parallelSort is slow because of expensive r_sym/r_offset computation. Cache the values.
* The iteration is slow. Move r_sym/r_addend computation ahead of time and parallelize it.

With the change, the new encodeDynamicReloc is cheap (0.05s). So no need to parallelize it.

Reviewed By: ikudrin

Differential Revision: https://reviews.llvm.org/D115993
2021-12-21 09:43:44 -08:00
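
The "cache the values" point can be sketched as follows: derive each sort key once, store it beside the relocation, and let the comparator read plain fields. All names here (`DynReloc`, `CachedReloc`, `computeOffset`, `sortRelocs`) and the `(r_sym, r_offset)` sort order are assumptions for illustration; the sketch is sequential, whereas the commit uses a parallel sort.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <tuple>
#include <vector>

// Hypothetical relocation whose final offset is costly to derive.
struct DynReloc {
  uint64_t sectionBase;
  uint64_t offsetInSection;
  uint32_t symIndex;
};

// Keys computed once per relocation and stored next to it.
struct CachedReloc {
  uint64_t r_offset;
  uint32_t r_sym;
};

static int offsetComputations = 0; // counts calls, for illustration only

uint64_t computeOffset(const DynReloc &r) {
  ++offsetComputations;
  return r.sectionBase + r.offsetInSection;
}

std::vector<CachedReloc> sortRelocs(const std::vector<DynReloc> &in) {
  std::vector<CachedReloc> out;
  out.reserve(in.size());
  for (const DynReloc &r : in) // one key computation per relocation
    out.push_back({computeOffset(r), r.symIndex});
  // The comparator touches only cached fields, so it stays cheap.
  std::sort(out.begin(), out.end(),
            [](const CachedReloc &a, const CachedReloc &b) {
              return std::tie(a.r_sym, a.r_offset) <
                     std::tie(b.r_sym, b.r_offset);
            });
  assert(out.size() == in.size());
  return out;
}
```

Without the cache, a comparison sort would recompute the keys on each of its O(n log n) comparisons.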
Fangrui Song 3aae04c744 [ELF] Parallelize MergeNoTailSection::writeTo
With this patch, writing .debug_str is significantly faster for a program with
1.5G .debug_str:

* .debug_info 1.22s
* .debug_str 2.57s decreases to 0.66
2021-12-17 23:30:42 -08:00
Fangrui Song 552d84414d [ELF] Use SmallVector for many SyntheticSections. NFC
This decreases struct sizes and usually decreases the lld executable
size (39KiB for my x86-64 executable) (unless in some cases smaller
SmallVector leads to more inlining, e.g. StringTableBuilder).
For --gdb-index, there may be memory usage saving.
2021-12-17 19:22:16 -08:00
Fangrui Song 93558e575e [ELF] Internalize createMergeSynthetic. NFC
Only called once. Moving to OutputSections.cpp can make it inlined.
finalizeInputSections can be very hot, especially in -O1 links with much debug info.
2021-12-16 20:50:06 -08:00
Fangrui Song a8d6d2614b [ELF] Replace make<Defined> with makeDefined. NFC
This removes SpecificAlloc<Defined> and makes my lld executable 1.5k smaller.
This drops the small memory waste due to the separate BumpPtrAllocator.
2021-12-15 13:15:03 -08:00
Fangrui Song cf783be8d7 Reland D114783/D115603 [ELF] Split scanRelocations into scanRelocations/postScanRelocations
(Fixed an issue about GOT on a copy relocated alias.)
(Fixed an issue about not creating r_addend=0 IRELATIVE for unreferenced non-preemptible ifunc.)

The idea is to make scanRelocations mark that some actions are needed (GOT/PLT/etc)
and postpone the real work to postScanRelocations. It gives some flexibility:

* Make it feasible to support .plt.got (PR32938): we need to know whether GLOB_DAT and JUMP_SLOT are both needed.
* Make non-preemptible IFUNC handling slightly cleaner: avoid setting/clearing sym.gotInIgot
* -z nocopyrel: report all copy relocation places for one symbol
* Make GOT deduplication feasible
* Make parallel relocation scanning feasible (if we can avoid all stateful operations and make Symbol attributes atomic), but parallelism may not be the appealing choice

Since this patch moves a large chunk of code out of ELFT templates, my x86-64
executable is actually a few hundred bytes smaller.

For ppc32-ifunc-nonpreemptible-pic.s: I remove absolute relocation references to non-preemptible ifunc
because absolute relocation references are incorrect in -fpie mode.

Reviewed By: peter.smith, ikudrin

Differential Revision: https://reviews.llvm.org/D114783
2021-12-14 16:28:41 -08:00
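
The mark-then-act split can be sketched in a few lines. This is a simplified illustration, not the lld code: `Sym`, `Rel`, `RelKind`, `Tables`, and the flag names are hypothetical, and real relocation scanning handles many more cases.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct Sym {
  bool needsGot = false;
  bool needsPlt = false;
  uint32_t gotIdx = UINT32_MAX;
  uint32_t pltIdx = UINT32_MAX;
};

enum class RelKind { GotRef, PltCall, Direct };

struct Rel {
  Sym *sym;
  RelKind kind;
};

// Phase 1: only record what each symbol needs; allocate nothing.
void scanRelocations(const std::vector<Rel> &rels) {
  for (const Rel &r : rels) {
    if (r.kind == RelKind::GotRef)
      r.sym->needsGot = true;
    else if (r.kind == RelKind::PltCall)
      r.sym->needsPlt = true;
  }
}

struct Tables {
  uint32_t gotEntries = 0;
  uint32_t pltEntries = 0;
};

// Phase 2: visit each symbol once, with all of its needs known up front.
Tables postScanRelocations(const std::vector<Sym *> &syms) {
  Tables t;
  for (Sym *s : syms) {
    assert(s != nullptr);
    if (s->needsGot)
      s->gotIdx = t.gotEntries++;
    if (s->needsPlt)
      s->pltIdx = t.pltEntries++;
  }
  return t;
}
```

Because phase 2 sees both flags together, a decision such as "this symbol needs both a GOT and a PLT entry" becomes a local check rather than something inferred from scan order.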
Fangrui Song ea15b862d7 Revert D114783 [ELF] Split scanRelocations into scanRelocations/postScanRelocations
May cause a failure for non-preemptible `bcmp` in a glibc -static link.
2021-12-14 14:33:50 -08:00
Fangrui Song b79686c6dc [ELF] Remove needsPltAddr in favor of needsCopy
needsPltAddr is equivalent to `needsCopy && isFunc`. In many places, it is
equivalent to `needsCopy` because the non-STT_FUNC cases are ruled out.

Reviewed By: ikudrin, peter.smith

Differential Revision: https://reviews.llvm.org/D115603
2021-12-14 09:52:43 -08:00
Fangrui Song 9115d75117 [ELF] Use parallelSort for .rela.dyn
An unstable sort suffices. In a large link (11.06s), this decreases .rela.dyn
writeTo time from 1.52s to 0.81s, resulting in 6% total time speedup (the
benefit will greatly dilute if --pack-dyn-relocs=relr becomes prevailing).

Encoding the dynamic relocations then sorting raw Elf_Rel/Elf_Rela doesn't seem
to improve much (doing that would require code duplication because of
Elf_Rel/Elf_Rela plus unfortunate mips64le), so don't do that.
2021-12-12 20:53:06 -08:00
Fangrui Song 09401dfcf1 [ELF] Rename fetch to extract
The canonical term is "extract" (GNU ld documentation, Solaris's `-z *extract`
options). Avoid inventing a term and match --why-extract. (ld64 prefers "load"
but the word is overloaded too much)

Mostly NFC, except for --help messages and the header row in
--print-archive-stats output.
2021-11-26 10:58:50 -08:00
Fangrui Song 7051aeef7a [ELF] Rename BaseCommand to SectionCommand. NFC
BaseCommand was picked when PHDRS/INSERT/etc were not implemented. Rename it to
SectionCommand to match `sectionCommands` and make it clear that the commands
are used in SECTIONS (except a special case for SymbolAssignment).

Also, improve naming of some BaseCommand variables (base -> cmd).
2021-11-25 20:24:23 -08:00
Fangrui Song 6188fd4957 [ELF] Rename OutputSection::sectionCommands to commands. NFC
This partially reverts r315409: the description applies to LinkerScript, but not
to OutputSection.

The name "sectionCommands" is used in both LinkerScript::sectionCommands and
OutputSection::sectionCommands, which may lead to confusion.
"commands" in OutputSection has no ambiguity because there are no other types
of commands.
2021-11-25 16:47:07 -08:00
Fangrui Song 5ca54c6686 [ELF] Simplify GnuHashSection::write. NFC 2021-11-25 14:23:25 -08:00
Fangrui Song 55c14d6dbf [ELF] Simplify DynamicSection content computation. NFC
The new code computes the content twice, but avoids the tricky
std::function<uint64_t()>. Removed 13KiB code in a Release build.
2021-11-25 14:12:34 -08:00
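
One way to read "computes the content twice" is the two-pass pattern sketched below: a pure function builds the entry list, and it is called once to size the section and again after layout to get final values, so no deferred `std::function` callbacks are needed. The names `Layout`, `Entries`, `computeContents`, and `sectionSize` are assumptions, as is the 16-byte entry size (an ELF64 dynamic entry); the tag constants use the standard ELF values.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Standard ELF dynamic tag values (DT_NULL, DT_STRTAB, DT_STRSZ).
constexpr int64_t kDtNull = 0, kDtStrTab = 5, kDtStrSz = 10;

struct Layout {
  uint64_t strTabAddr = 0; // unknown until addresses are assigned
  uint64_t strTabSize = 0;
};

using Entries = std::vector<std::pair<int64_t, uint64_t>>;

// Pure function of the current layout: cheap enough to call twice.
Entries computeContents(const Layout &l) {
  Entries e;
  e.emplace_back(kDtStrTab, l.strTabAddr);
  e.emplace_back(kDtStrSz, l.strTabSize);
  e.emplace_back(kDtNull, 0);
  return e;
}

// Pass 1: only the entry count matters, so placeholder values are fine.
size_t sectionSize(const Layout &l) {
  Entries e = computeContents(l);
  assert(!e.empty() && e.back().first == kDtNull);
  return e.size() * 16; // assumed ELF64 entry size
}
```

Pass 2 simply calls `computeContents` again once the layout is final and writes the resulting pairs.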