When linking a 1.2G output (nearly no debug info, 2846621 dynamic relocations) using `--threads=8`, I measured
```
9.131462 Total ExecuteLinker
1.449913 Total Write output file
1.445784 Total Write sections
0.657152 Write sections {"detail":".rela.dyn"}
```
This change decreases the .rela.dyn time to 0.25, leading to 4% speed up in the total time.
* The parallelSort is slow because of expensive r_sym/r_offset computation. Cache the values.
* The iteration is slow. Move r_sym/r_addend computation ahead of time and parallelize it.
With the change, the new encodeDynamicReloc is cheap (0.05s). So no need to parallelize it.
Reviewed By: ikudrin
Differential Revision: https://reviews.llvm.org/D115993
This decreases struct sizes and usually decreases the lld executable
size (39KiB for my x86-64 executable) (unless in some cases smaller
SmallVector leads to more inlining, e.g. StringTableBuilder).
For --gdb-index, there may be memory usage saving.
Only called once. Moving to OutputSections.cpp can make it inlined.
finalizeInputSections can be very hot, especially in -O1 links with much debug info.
(Fixed an issue about GOT on a copy relocated alias.)
(Fixed an issue about not creating r_addend=0 IRELATIVE for unreferenced non-preemptible ifunc.)
The idea is to make scanRelocations mark some actions are needed (GOT/PLT/etc)
and postpone the real work to postScanRelocations. It gives some flexibility:
* Make it feasible to support .plt.got (PR32938): we need to know whether GLOB_DAT and JUMP_SLOT are both needed.
* Make non-preemptible IFUNC handling slightly cleaner: avoid setting/clearing sym.gotInIgot
* -z nocopyrel: report all copy relocation places for one symbol
* Make GOT deduplication feasible
* Make parallel relocation scanning feasible (if we can avoid all stateful operations and make Symbol attributes atomic), but parallelism may not be the appealing choice
Since this patch moves a large chunk of code out of ELFT templates. My x86-64
executable is actually a few hundred bytes smaller.
For ppc32-ifunc-nonpreemptible-pic.s: I remove absolute relocation references to non-preemptible ifunc
because absolute relocation references are incorrect in -fpie mode.
Reviewed By: peter.smith, ikudrin
Differential Revision: https://reviews.llvm.org/D114783
needsPltAddr is equivalent to `needsCopy && isFunc`. In many places, it is
equivalent to `needsCopy` because the non-STT_FUNC cases are ruled out.
Reviewed By: ikudrin, peter.smith
Differential Revision: https://reviews.llvm.org/D115603
An unstable sort suffices. In a large link (11.06s), this decreases .rela.dyn
writeTo time from 1.52s to 0.81s, resulting in 6% total time speedup (the
benefit will greatly dilute if --pack-dyn-relocs=relr becomes prevailing).
Encoding the dynamic relocations then sorting raw Elf_Rel/Elf_Rela doesn't seem
to improve much (doing that would require code duplicate because of
Elf_Rel/Elf_Rela plus unfortunate mips64le), so don't do that.
The canonical term is "extract" (GNU ld documentation, Solaris's `-z *extract`
options). Avoid inventing a term and match --why-extract. (ld64 prefers "load"
but the word is overloaded too much)
Mostly MFC, except for --help messages and the header row in
--print-archive-stats output.
BaseCommand was picked when PHDRS/INSERT/etc were not implemented. Rename it to
SectionCommand to match `sectionCommands` and make it clear that the commands
are used in SECTIONS (except a special case for SymbolAssignment).
Also, improve naming of some BaseCommand variables (base -> cmd).
This partially reverts r315409: the description applies to LinkerScript, but not
to OutputSection.
The name "sectionCommands" is used in both LinkerScript::sectionCommands and
OutputSection::sectionCommands, which may lead to confusion.
"commands" in OutputSection has no ambiguity because there are no other types
of commands.
Fix a null pointer dereference when .got.plt is discarded.
This also adds a test for discarding `.plt`.
Reviewed By: ikudrin
Differential Revision: https://reviews.llvm.org/D114180
For `InputSection` `.foo`, its `InputBaseSection::{areRelocsRela,firstRelocation,numRelocation}` basically
encode the information of `.rel[a].foo`. However, one uint32_t (the relocation section index)
suffices. See the implementation of `relsOrRelas`.
This change decreases sizeof(InputSection) from 184 to 176 on 64-bit Linux.
The maximum resident set size linking a large application (1.2G output) decreases by 0.39%.
Differential Revision: https://reviews.llvm.org/D112513
I think D79300 has fixed the D51892 (`__i686.get_pc_thunk.bx`) issue, so
we can bring back rL330869.
D79300 says `would error undefined symbol instead of the more relevant discarded section`
but it doesn't reproduce now.
This avoids a quirk in `isUndefWeak()`.
Reviewed By: peter.smith
Differential Revision: https://reviews.llvm.org/D111365
This option is a subset of -Bsymbolic-functions. It applies to STB_GLOBAL
STT_FUNC definitions.
The address of a vague linkage function (STB_WEAK STT_FUNC, e.g. an inline
function, a template instantiation) seen by a -Bsymbolic-functions linked
shared object may be different from the address seen from outside the shared
object. Such cases are uncommon. (ELF/Mach-O programs may use
`-fvisibility-inlines-hidden` to break such pointer equality. On Windows,
correct dllexport and dllimport are needed to make pointer equality work.
Windows link.exe enables /OPT:ICF by default so different inline functions may
have the same address.)
```
// a.cc -> a.o -> a.so (-Bsymbolic-functions)
inline void f() {}
void *g() { return (void *)&f; }
// b.cc -> b.o -> exe
// The address is different!
inline void f() {}
```
-Bsymbolic-non-weak-functions is a safer (C++ conforming) subset of
-Bsymbolic-functions, which can make such programs work.
Implementations usually emit a vague linkage definition in a COMDAT group. We
could detect the group (with more code) but I feel that we should just check
STB_WEAK for simplicity. A weak definition will thus serve as an escape hatch
for rare cases when users want interposition on definitions.
GNU ld feature request: https://sourceware.org/bugzilla/show_bug.cgi?id=27871
Longer write-up: https://maskray.me/blog/2021-05-16-elf-interposition-and-bsymbolic
If Linux distributions migrate to protected non-vague-linkage external linkage
functions by default, the linker option can still be handy because it allows
rapid experiment without recompilation. Protected function addresses currently
have deep issues in GNU ld.
Reviewed By: peter.smith
Differential Revision: https://reviews.llvm.org/D102570
Since D100490 this case is diagnosed for -z rel. This commit implements
R_AARCH64_TLSDESC cases for AArch64::getImplicitAddend() and
AArch64::relocate(). However, there are probably further relocation types
that need to be handled for full support of -z rel.
Fixes https://bugs.llvm.org/show_bug.cgi?id=47009
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D100544
This patch changes the DynamicReloc class to store an enum instead
of the overloaded useSymVA member to make it easier to understand
and fix incorrect addends being written in some corner cases. The
change is motivated by a follow-up review that checks the value of
implicit Elf_Rel addends written to the output file.
This patch fixes an incorrect output when using `-z rela` for i386 files
with R_386_GOT32 relocations (not that this really matters since it's an
unsupported configuration).
Storing the relocation expression kind also addresses an incorrect addend
FIXME in ppc64-abs64-dyn.s introduced in D63383.
DynamicReloc now also has a special case for the MIPS TLS relocations
(DynamicReloc::AgainstSymbolWithTargetVA) since the
R_MIPS_TLS_TPREL{32/64} the symbol VA to the GOT for preemptible
symbols. I'm not sure if the symbol value actually should be written
for R_MIPS_TLS_TPREL32, but this patch does not attempt to change
that behaviour.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D100490
D62727 removed GotEntrySize and GotPltEntrySize with a comment that they
are always equal to wordsize(), but that is not entirely true: X32 has a
word size of 4, but needs 8-byte GOT entries. This restores gotEntrySize
for both, adjusted for current naming conventions, but defaults it to
config->wordsize to keep things simple for architectures other than
x86_64.
This partially reverts D62727.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D102509
Adopt my suggestion in https://reviews.llvm.org/D91426#2653926 ,
generalizing the ppc64 specific code.
GNU ld and glibc ld.so has a contract about the first few entries of .got .
There are somewhat complex conditions when the header is needed. This patch
switches to a simpler approach: add a header unconditionally if
_GLOBAL_OFFSET_TABLE_ is used or the number of entries is more than just the
header.
The original page no longer works, so use a web.archive.org link instead.
Reviewed By: atanasyan
Differential Revision: https://reviews.llvm.org/D100949
This is the same problem as 127176e59e, but for static TLS rather than
dynamic TLS. Although we know the symbol will be the one in our own TLS
segment, and thus the offset of it within that, we don't know where in
the static TLS block our data will be allocated and thus we must emit a
dynamic relocation for this case.
Reviewed By: MaskRay, atanasyan
Differential Revision: https://reviews.llvm.org/D101381
Whilst not wrong (unless using static PIE where the relocations are
likely not implemented by the runtime), this is inefficient, as the TLS
module indices and offsets are independent of the executable's load
address.
Reviewed By: MaskRay, atanasyan
Differential Revision: https://reviews.llvm.org/D101382
An unfetched lazy symbol (undefined weak) should be considered to have its
original versionId which is VER_NDX_GLOBAL, instead of the lazy symbol's
versionId. (The original versionId cannot be non-VER_NDX_GLOBAL because a
undefined versioned symbol is an error.)
The regression was introduced in D77280 when making version scripts work
with lazy symbols fetched by LTO calls.
Fix PR49915
Differential Revision: https://reviews.llvm.org/D100624
From the PowerPC ELFv2 ABI section 4.2.3. Global Offset Table.
```
The GOT consists of an 8-byte header that contains the TOC base (the first TOC
base when multiple TOCs are present), followed by an array of 8-byte addresses.
```
Due to the introduction of PC Relative code it is now possible to require a GOT
without having a .TOC. symbol in the object that is being linked. Since LLD uses
the .TOC. symbol to determine whether or not a GOT is required the GOT header is
not setup correctly and the 8-byte header is missing.
This patch allows the Power PC GOT setup to happen when an element is added to
the GOT instead of at the very begining. When this header is added a .TOC.
symbol is also added.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D91426
Fixes PR48693: --emit-relocs keeps relocation sections. --gdb-index drops
.debug_gnu_pubnames and .debug_gnu_pubtypes but not their relocation sections.
This can cause a null pointer dereference in `getOutputSectionName`.
Also delete debug-gnu-pubnames.s which is covered by gdb-index.s
Reviewed By: grimar
Differential Revision: https://reviews.llvm.org/D94354
As indicated by AArch64 ELF specification, symbols with st_other
marked with STO_AARCH64_VARIANT_PCS indicates it may follow a variant
procedure call standard with different register usage convention
(for instance SVE calls).
Static linkers must preserve the marking and propagate it to the dynamic
symbol table if any reference or definition of the symbol is marked with
STO_AARCH64_VARIANT_PCS, and add a DT_AARCH64_VARIANT_PCS dynamic tag if
there are R_<CLS>_JUMP_SLOT relocations that reference that symbols.
It implements https://bugs.llvm.org/show_bug.cgi?id=48368.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D93045
In the presence of a gap, the st_value field of a STT_SECTION symbol is the
address of the first input section (incorrect if there is a gap). Set it to the
output section address instead.
In -r mode, this bug can cause an incorrect non-zero st_value of a STT_SECTION
symbol (while output sections have zero addresses, input sections may have
non-zero outSecOff). The non-zero st_value can cause the final link to have
incorrect relocation computation (both GNU ld and LLD add st_value of the
STT_SECTION symbol to the output section address).
Reviewed By: grimar
Differential Revision: https://reviews.llvm.org/D90520
The ELF spec says
> If the sh_flags field for this section header includes the attribute SHF_INFO_LINK, then this member represents a section header table index.
Set SHF_INFO_LINK so that binary manipulation tools know that sh_info is
a section header table index instead of (the number of local symbols in the case of SHT_SYMTAB/SHT_DYNSYM).
We have already added SHF_INFO_LINK for --emit-relocs retained SHT_REL[A].
For example, we can teach llvm-objcopy to preserve the section index of the sh_info referenced section if
SHF_INFO_LINK is set. (GNU objcopy recognizes .rel[a].plt and updates
sh_info even if SHF_INFO_LINK is not set).
Reviewed By: grimar, psmith
Differential Revision: https://reviews.llvm.org/D89828
The combination has not been tested before. In the case of ICF,
`e.section->getVA(0)` equals the start address of the output section.
This can cause incorrect overlapping with the actual function at the
start of the output section and potentially trigger a GDB internal error
in `dw2_find_pc_sect_compunit_symtab` (presumably because:
if a short address range incorrectly starts at the start address of the
output section, GDB may pick it instead of the correct longer address
range. When mapping an address within the long address range but
out of the scope of the short address range, the routine may find
nothing - while the code asserts that it can find something).
Note that in the case of ICF there may be duplicate address range entries,
but GDB appears to be fine with them.
Reviewed By: grimar
Differential Revision: https://reviews.llvm.org/D89751
Thunk alignment is added in thie patch when using pc-rel instructions
to avoid crossing the 64 byte boundary.
Patched by: nemanjai, NeHuang
Reviewed By: sfertile, MaskRay
Differential Revision: https://reviews.llvm.org/D85973
-gdwarf-5 -fdebug-types-section may produce multiple .debug_info sections. All
except one are type units (.debug_types before DWARF v5). When constructing
.gdb_index, we should ignore these type units. We use a simple heuristic: the
compile unit does not have the SHF_GROUP flag. (This needs to be revisited if
people place compile unit .debug_info in COMDAT groups.)
This issue manifests as a data race: because an object file may have multiple
.debug_info sections, we may concurrently construct `LLDDwarfObj` for the same
file in multiple threads. The threads may access `InputSectionBase::data()`
concurrently on the same input section. `InputSectionBase::data()` does a lazy
uncompress() and rewrites the member variable `rawData`. A thread running zlib
`inflate()` (transitively called by uncompress()) on a buffer with `rawData`
tampered by another thread may fail with `uncompress failed: zlib error: Z_DATA_ERROR`.
Even if no data race occurred in an optimistic run, if there are N .debug_info,
one CU entry and its address ranges will be replicated N times. The result
.gdb_index can be much larger than a correct one.
The new test gdb-index-dwarf5-type-unit.s actually has two compile units. This
cannot be produced with regular approaches (it can be produced with -r
--unique). This is used to demonstrate that the .gdb_index construction code
only considers the last non-SHF_GROUP .debug_info
Reviewed By: grimar
Differential Revision: https://reviews.llvm.org/D85579
For an InputSection, the `buf` argument of `InputSectionBase::relocate` points
to the content of the containing OutputSection, instead of the content of the
InputSection itself, so `outSecOff` needs to be added in its callees. This is
counter-intuitive and leads to many `- outSecOff` and `+ outSecOff`.
This patch makes `InputSection::writeTo` call `InputSectionBase::relocate` with
`outSecOff` added. relocAlloc/relocNonAlloc/relocateNonAllocForRelocatable can
thus be simplified now.
Updated test:
* non-abs-reloc.s: A minor offset bug is fixed for a diagnostic in `relocateNonAlloc`
Reviewed By: grimar
Differential Revision: https://reviews.llvm.org/D85618
Fix PR36272 and PR46835
A .eh_frame FDE references a text section and (optionally) a LSDA (in
.gcc_except_table). Even if two text sections have identical content and
relocations (e.g. a() and b()), we cannot fold them if their LSDA are different.
```
void foo();
void a() {
try { foo(); } catch (int) { }
}
void b() {
try { foo(); } catch (float) { }
}
```
Scan .eh_frame pieces with LSDA and disallow referenced text sections to be
folded. If two .gcc_except_table have identical semantics (usually identical
content with PC-relative encoding), we will lose folding opportunity.
For ClickHouse (an exception-heavy application), this can reduce --icf=all efficiency
from 9% to 5%. There may be some percentage we can reclaim without affecting
correctness, if we analyze .eh_frame and .gcc_except_table sections.
gold 2.24 implemented a more complex fix (resolution to
https://sourceware.org/bugzilla/show_bug.cgi?id=21066) which combines the
checksum of .eh_frame CIE/FDE pieces.
Reviewed By: grimar
Differential Revision: https://reviews.llvm.org/D84610
The patch adds checking for various potential issues in parsing name
lookup tables and reporting them as recoverable errors, similarly as we
do for other tables.
Differential Revision: https://reviews.llvm.org/D83050
The parsing method did not check reading errors and might easily fall
into an infinite loop on an invalid input because of that.
Differential Revision: https://reviews.llvm.org/D83049