Commit Graph

6662 Commits

Author SHA1 Message Date
Fangrui Song 3704abaa16 [ELF] --gdb-index: replace vector<uint8_t> with unique_ptr<uint8_t[]>. NFC 2022-01-25 23:53:23 -08:00
Fangrui Song 571d6a7120 [ELF] Optimize .relr.dyn to not grow vector<uint64_t>. NFC 2022-01-25 23:33:40 -08:00
Fangrui Song 9fac78d0e1 [ELF] Simplify and optimize .relr.dyn NFC 2022-01-25 22:50:03 -08:00
Fangrui Song 2a80c3dbe1 [ELF] Clarify that Z_BEST_SPEED==1 in a comment. NFC 2022-01-25 22:40:53 -08:00
Fangrui Song 07bd467643 [ELF] --build-id: replace vector<uint8_t> with unique_ptr<uint8_t[]>. NFC
We can't use C++20 make_unique_for_overwrite yet.
2022-01-25 22:39:43 -08:00
Fangrui Song 7438dbe078 [ELF] Cast size to size_t. NFC
To fix

../../chromeclang/bin/../include/c++/v1/__algorithm/min.h:39:1: note: candidate template ignored: deduced conflicting types for parameter '_Tp' ('unsigned long' vs. 'unsigned long long')

on macOS arm64.
2022-01-25 22:38:24 -08:00
Fangrui Song 223f9dea3d [ELF] maybeCompress: replace vector<uint8_t> with unique_ptr<uint8_t[]>. NFC
And mention that it is zero-initialized. I do not notice a speed-up if
changed to be uninitialized by forcing the zero filler in writeTo.
2022-01-25 22:15:44 -08:00
Fangrui Song 4cdc441690 [ELF] Parallelize --compress-debug-sections=zlib
When linking a Debug build clang (265MiB SHF_ALLOC sections, 920MiB uncompressed
debug info), in a --threads=1 link "Compress debug sections" takes 2/3 time and
in a --threads=8 link "Compress debug sections" takes ~70% time.

This patch splits a section into 1MiB shards and calls zlib `deflake` parallelly.

DEFLATE blocks are a bit sequence. We need to ensure every shard starts
at a byte boundary for concatenation. We use Z_SYNC_FLUSH for all shards
but the last to flush the output to a byte boundary. (Z_FULL_FLUSH can
be used as well, but Z_FULL_FLUSH clears the hash table which just
wastes time.)

The last block requires the BFINAL flag. We call deflate with Z_FINISH
to set the flag as well as flush the output to a byte boundary. Under
the hood, all of Z_SYNC_FLUSH, Z_FULL_FLUSH, and Z_FINISH emit a
non-compressed block (called stored block in zlib). RFC1951 says "Any
bits of input up to the next byte boundary are ignored."

In a --threads=8 link, "Compress debug sections" is 5.7x as fast and the total
speed is 2.54x. Because the hash table for one shard is not shared with the next
shard, the output is slightly larger. Better compression ratio can be achieved
by preloading the window size from the previous shard as dictionary
(`deflateSetDictionary`), but that is overkill.

```
# 1MiB shards
% bloaty clang.new -- clang.old
    FILE SIZE        VM SIZE
 --------------  --------------
  +0.3%  +129Ki  [ = ]       0    .debug_str
  +0.1%  +105Ki  [ = ]       0    .debug_info
  +0.3%  +101Ki  [ = ]       0    .debug_line
  +0.2% +2.66Ki  [ = ]       0    .debug_abbrev
  +0.0% +1.19Ki  [ = ]       0    .debug_ranges
  +0.1%  +341Ki  [ = ]       0    TOTAL

# 2MiB shards
% bloaty clang.new -- clang.old
    FILE SIZE        VM SIZE
 --------------  --------------
  +0.2% +74.2Ki  [ = ]       0    .debug_line
  +0.1% +72.3Ki  [ = ]       0    .debug_str
  +0.0% +69.9Ki  [ = ]       0    .debug_info
  +0.1%    +976  [ = ]       0    .debug_abbrev
  +0.0%    +882  [ = ]       0    .debug_ranges
  +0.0%  +218Ki  [ = ]       0    TOTAL
```

Bonus in not using zlib::compress

* we can compress a debug section larger than 4GiB
* peak memory usage is lower because for most shards the output size is less
  than 50% input size (all less than 55% for a large binary I tested, but
  decreasing the initial output size does not decrease memory usage)

Reviewed By: ikudrin

Differential Revision: https://reviews.llvm.org/D117853
2022-01-25 10:29:04 -08:00
Fangrui Song c03fdd3403 [ELF] Fix the branch range computation when reusing a thunk
Notation: dst is `t->getThunkTargetSym()->getVA()`

On AArch64, when `src-0x8000000-r_addend <= dst < src-0x8000000`, the condition
`target->inBranchRange(rel.type, src, rel.sym->getVA(rel.addend))` may
incorrectly consider a thunk reusable.
`rel.addend = -getPCBias(rel.type)` resets the addend to 0 for AArch64/PPC
and the zero addend is used by `rel.sym->getVA(rel.addend)` to check
out-of-range relocations.

See the test for a case this computation is wrong:
`error: a.o:(.text_high+0x4): relocation R_AARCH64_JUMP26 out of range: -134217732 is not in [-134217728, 134217727]`
I have seen a real world case with r_addend=19960.

Reviewed By: peter.smith

Differential Revision: https://reviews.llvm.org/D117734
2022-01-24 09:03:21 -08:00
serge-sans-paille 5f290c090a Move STLFunctionalExtras out of STLExtras
Only using that change in StringRef already decreases the number of
preoprocessed lines from 7837621 to 7776151 for LLVMSupport

Perhaps more interestingly, it shows that many files were relying on the
inclusion of StringRef.h to have the declaration from STLExtras.h. This
patch tries hard to patch relevant part of llvm-project impacted by this
hidden dependency removal.

Potential impact:
- "llvm/ADT/StringRef.h" no longer includes <memory>,
  "llvm/ADT/Optional.h" nor "llvm/ADT/STLExtras.h"

Related Discourse thread:
https://llvm.discourse.group/t/include-what-you-use-include-cleanup/5831
2022-01-24 14:13:21 +01:00
Alexandre Ganea 83d59e05b2 Re-land [LLD] Remove global state in lldCommon
Move all variables at file-scope or function-static-scope into a hosting structure (lld::CommonLinkerContext) that lives at lldMain()-scope. Drivers will inherit from this structure and add their own global state, in the same way as for the existing COFFLinkerContext.

See discussion in https://lists.llvm.org/pipermail/llvm-dev/2021-June/151184.html

The previous land f860fe3622 caused issues in https://lab.llvm.org/buildbot/#/builders/123/builds/8383, fixed by 22ee510dac.

Differential Revision: https://reviews.llvm.org/D108850
2022-01-20 14:53:26 -05:00
Fangrui Song a7a4115bf3 [ELF] Replace .zdebug string comparison with SHF_COMPRESSED check. NFC 2022-01-19 22:33:32 -08:00
Fangrui Song 03909c4400 [ELF] Remove StringRefZ
StringRefZ does not improve performance. Non-local symbols always have eagerly
computed nameSize. Most local symbols's lengths will be updated in either:

* shouldKeepInSymtab
* SymbolTableBaseSection::addSymbol

Its benefit is offsetted by strlen in every call site (sums up to 5KiB code in a
release x86-64 build), so using StringRefZ may be slower.

In a -s link (uncommon) there is minor speedup, like ~0.3% for clang and chrome.

Reviewed By: alexander-shaposhnikov

Differential Revision: https://reviews.llvm.org/D117644
2022-01-19 20:09:41 -08:00
Fangrui Song 5bd38a2826 [ELF] Fix split-stack caller with hidden non-split-stack callee
Fix a regression after aabe901d57 (`[ELF] Remove
one redundant computeBinding`): isLocal() does not indicate that the symbol is
originally local. For simplicity, just drop this optimization.
2022-01-19 12:25:01 -08:00
Fangrui Song d838bf2adc [ELF] Allow non-bitcode archive with an empty index
When an archive with an empty index contains only bitcode files, it is
handled as a group of lazy (--start-lib) object files. If there is a
non-bitcode file, there will be a diagnostic a la GNU ld.

For some programs, the archive member extraction ratio is high (e.g. for chrome,
79% archive members are extracted according to --print-archive-stats=). Because
symbol interning is cached for ObjFile::parseLazy but not for ArchiveFile,
parsing an archive as a group of --start-lib object files may be faster.

If the linker speculatively creates section representations for archive members,
the archive index will not be used.

If we take the above view, the archive index is essentially useless. If a user
wants a fast build without using --start-lib, they may just build thin archives
without index (`ar rcS --thin`).

Therefore, I suggest that we no longer treat the code as a hack, instead as a
supported feature. I believe we will do this anyway if we add parallel symbol
interning (parallel symbol interning for lazy object files is simpler than that
for archives).

Ecosystem issues:

* parseLazy actually has nearly the same behavior as ArchiveFile::parse, but the symbol order may be different.
* users may get addicted to the behavior and build archives not working with GNU ld and gold. I think it is easy to rebuild archives to be compatible.

Reviewed By: ikudrin

Differential Revision: https://reviews.llvm.org/D117284
2022-01-19 10:01:53 -08:00
Fangrui Song 288082d45d [ELF] Move SHT_REL/SHT_RELA handling from createInputSection to initializeSections
This simplifies the code a bit. While here,

* change the `multiple relocation sections` diagnostic from `fatal` to `error` and include the relocated section name.
* drop less useful name from `getRelocTarget`. Without -r/--emit-relocs we don't need to get SHT_REL/SHT_RELA names.
2022-01-18 23:31:51 -08:00
Fangrui Song 84944b63f3 [ELF] Simplify ObjFile<ELFT>::initializeSections. NFC 2022-01-18 22:45:04 -08:00
Fangrui Song 5f404a749a [ELF] De-template InputSectionBase::getLocation. NFC 2022-01-18 17:33:58 -08:00
Fangrui Song eafd34581f [ELF] Simplify/optimize EhInputSection::split
and change some `fatal` to `errorOrWarn`.

EhFrame.cpp is a helper file. We don't place all .eh_frame implementation there,
so the code move is fine.
2022-01-18 17:03:23 -08:00
Alexander Shaposhnikov 2bb7f226af [lld] Fix typo. NFC 2022-01-18 02:33:27 +00:00
Fangrui Song 83c7f5d3fb [ELF] EhInputSection::split: remove unneeded check 2022-01-17 13:59:52 -08:00
Fangrui Song ac0986f880 [ELF] Change std::vector<InputSectionBase *> to SmallVector
There is no remaining std::vector<InputSectionBase> now. My x86-64 lld
executable is 2KiB small.
2022-01-17 10:25:07 -08:00
Fangrui Song f855074ed1 [ELF] GnuHashTableSection: replace stable_sort with 2-key sort. NFC
strTabOffset stabilizes llvm::sort. My x86-64 executable is 5+KiB smaller.
2022-01-17 00:34:42 -08:00
Fangrui Song 54fe70bfba [ELF] RelocationScanner::scanOne: replace rel.r_offset with offset. NFC 2022-01-17 00:05:27 -08:00
Fangrui Song 4c36567179 [ELF] Relocations: remove some cast<Undefined>. NFC 2022-01-17 00:02:47 -08:00
Fangrui Song b8d4eb84d7 [ELF] De-template getAlternativeSpelling. NFC 2022-01-16 23:56:25 -08:00
Fangrui Song 9c4292a59d [ELF] Remove unneeded SyntheticSection memset(*, 0, *)
After the D33630 fallout was properly fixed by a4c5db30be.

Tested by D37462/D44986 tests, the new --no-rosegment test in build-id.s, and a few --rosegment/--no-rosegment programs.
2022-01-16 22:51:57 -08:00
Fangrui Song a4c5db30be [ELF] Remove redundant fillTrap and memset(*, 0, *). NFC
The new tests in build-id.s would catch problems if we made a mistake here.
2022-01-16 22:37:31 -08:00
Fangrui Song aad90763d9 [ELF] RelocationSection<ELFT>::writeTo: use unstable partition 2022-01-16 21:44:19 -08:00
Fangrui Song 769057a5d0 [ELF] Change some DenseMap<StringRef, *> to DenseMap<CachedHashStringRef, *>. NFC 2022-01-16 21:19:01 -08:00
Fangrui Song e205445434 [ELF] StringTableSection: Use DenseMap<CachedHashStringRef> to avoid redundant hash computation
5~6% speedup when linking clang and chrome.
2022-01-16 21:02:05 -08:00
Alexandre Ganea e6b153947d Revert [LLD] Remove global state in lldCommon
It seems to be causing issues on https://lab.llvm.org/buildbot/#/builders/123/builds/8383
2022-01-16 11:03:06 -05:00
Alexandre Ganea f860fe3622 [LLD] Remove global state in lldCommon
Move all variables at file-scope or function-static-scope into a hosting structure (lld::CommonLinkerContext) that lives at lldMain()-scope. Drivers will inherit from this structure and add their own global state, in the same way as for the existing COFFLinkerContext.

See discussion in https://lists.llvm.org/pipermail/llvm-dev/2021-June/151184.html

Differential Revision: https://reviews.llvm.org/D108850
2022-01-16 08:57:57 -05:00
Fangrui Song e7c8cd4a93 [ELF] Remove forEachRelSec. NFC 2022-01-16 00:28:47 -08:00
Fangrui Song 9e885eac54 [ELF] Remove !isLazy() condition from computeBinding
Seems applicable since we demote lazy symbols to Undefined (D111365).
2022-01-15 23:58:15 -08:00
Fangrui Song c0fc09ab91 [ELF] Remove config->relocatable condition from Symbol::computeBinding 2022-01-15 23:49:48 -08:00
Fangrui Song b3cc47006b [ELF] Speed up Symbol::computeBinding. NFC
When computeBinding is inlined into includeInDynsym and computeIsPreemptible,
the optimizer can remove the config->gnuUnique load.
2022-01-15 23:40:44 -08:00
Fangrui Song 01a51629c2 [ELF] Slightly speed up Symbol::includeInDynsym. NFC 2022-01-15 23:32:48 -08:00
Fangrui Song 7330fd236e [ELF] Simplify Symbol::includeInDynsym 2022-01-15 23:27:45 -08:00
Fangrui Song 3736d0854a [ELF] Optimize -z combreloc
Sorting dynamic relocations is a bottleneck. Simplifying the comparator improves
performance. Linking clang is 4~5% faster with --threads=8.

This change may shuffle R_MIPS_REL32 for Mips and is a NFC for non-Mips.
2022-01-15 22:33:51 -08:00
Fangrui Song 102d0a2baf [ELF] Simplify elf::link exit. NFC 2022-01-15 17:59:05 -08:00
Fangrui Song 8b2f33231c [ELF] Make some diagnostics follow the convention 2022-01-15 10:46:25 -08:00
Igor Kudrin e00ac48df3 [ELF] Use tombstone values for discarded symbols in relocatable output
This extends D81784. Sections can be discarded when linking a
relocatable output. Before the patch, LLD did not update the content
of debug sections and only replaced the corresponding relocations with
R_*_NONE, which could break the debug information.

Differential Revision: https://reviews.llvm.org/D116946
2022-01-13 11:38:26 +07:00
Fangrui Song a5249c2dd2 [ELF] Change gnuHashTab/hashTab to unique_ptr. NFC
and remove associated make<XXX> calls.

My x86-64 `lld` is ~5KiB smaller.
2022-01-12 13:04:32 -08:00
Fangrui Song 43d927984c [ELF] Refactor how .gnu.hash and .hash are discarded
Switch to the D114180 approach which is simpler and allows gnuHashTab/hashTab to
switch to unique_ptr.
2022-01-12 12:47:07 -08:00
Fangrui Song bf9c8636f2 [ELF] Support discarding .relr.dyn
db08df0570 does not work because part.relrDyn is
a unique_ptr and `reset` destroys the object which may still be referenced.

This commit uses the D114180 approach. Also improve the test to check that there
is no R_X86_64_RELATIVE.
2022-01-12 11:55:22 -08:00
Fangrui Song 5014d6fc53 [ELF] -Map --why-extract=: print despite errors
Fix https://github.com/llvm/llvm-project/issues/53073

In case of a relocation error, GNU ld's link map includes
the archive member extraction information but not output sections.

Our -Map and --why-extract= are currently no-op in case of an error.
This change makes the two options work.

Reviewed By: ikudrin, peter.smith

Differential Revision: https://reviews.llvm.org/D116838
2022-01-12 10:40:33 -08:00
Fangrui Song db08df0570 [ELF] Support discarding .relr.dyn
to prepare for D116838, otherwise for linkerscript/discard-section-err.s,
there will be a null pointer dereference in `part.relrDyn->getParent()->size`
in `finalizeSynthetic(part.relrDyn.get())`.
2022-01-12 10:38:59 -08:00
Fangrui Song 37a1291885 [ELF] Add RelocationScanner. NFC
Currently the way some relocation-related static functions pass around
states is clumsy. Add a Resolver class to store some states as member
variables.

Advantages:

* Avoid the parameter `InputSectionBase &sec` (this offsets the cost passing around `this` paramemter)
* Avoid the parameter `end` (Mips and PowerPC hacks)
* `config` and `target` can be cached as member variables to reduce global state accesses. (potential speedup because the compiler didn't know `config`/`target` were not changed across function calls)
* If we ever want to reduce if-else costs (e.g. `config->emachine==EM_MIPS` for non-Mips) or introduce parallel relocation scan not handling some tricky arches (PPC/Mips), we can templatize Resolver

`target` isn't used as much as `config`, so I change it to a const reference
during the migration.

There is a minor performance inprovement for elf::scanRelocations.

Reviewed By: ikudrin, peter.smith

Differential Revision: https://reviews.llvm.org/D116881
2022-01-11 09:54:53 -08:00
Fangrui Song 5dbbd4eeb8 [ELF] Move OffsetGetter before some static functions. NFC
to prepare for D116881.
2022-01-10 20:16:02 -08:00