Rewritting the path of the sample profile file in response.txt to be relative to the repro tar.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D96193
A SHF_LINK_ORDER .gcc_except_table is similar to a .gcc_except_table in
a section group. The associated text section is responsible for retaining it.
LLD still does not support GC of non-group non-SHF_LINK_ORDER .gcc_except_table -
but that is not necessary because we can teach the compiler to set SHF_LINK_ORDER.
The current diagnostic has confused users. The new wording is adapted from one suggested by Ian Lance Taylor.
Differential Revision: https://reviews.llvm.org/D95917
binutils 2.36 introduced the new section flag SHF_GNU_RETAIN (for ELFOSABI_GNU &
ELFOSABI_FREEBSD) to mark a sections as a GC root. Several LLVM side toolchain
folks (including me) were involved in the design process of SHF_GNU_RETAIN and
were happy with this proposal.
Currently GNU ld only respects SHF_GNU_RETAIN semantics for ELFOSABI_GNU &
ELFOSABI_FREEBSD object files
(https://sourceware.org/bugzilla/show_bug.cgi?id=27282). GNU ld sets EI_OSABI
to ELFOSABI_GNU for relocatable output
(https://sourceware.org/bugzilla/show_bug.cgi?id=27091). In practice the single
value EI_OSABI is neither a good indicator for object file compatibility, nor a
useful mechanism marking used ELF extensions.
For input, we respect SHF_GNU_RETAIN semantics even for ELFOSABI_NONE object
files. This is compatible with how LLD and GNU ld handle (mildly useful) STT_GNU_IFUNC
/ (emitted by GCC, considered misfeature by some folks) STB_GNU_UNIQUE input.
(As of LLVM 12.0.0, the integrated assembler does not set ELFOSABI_GNU for
STT_GNU_IFUNC/STB_GNU_UNIQUE).
Arguably STT_GNU_IFUNC/STB_GNU_UNIQUE probably need indicators in object files
but SHF_GNU_RETAIN is more likely accepted by more OSABI platforms.
For output, we take a step further than GNU ld: we don't promote ELFOSABI_NONE
to ELFOSABI_GNU for all output.
Differential Revision: https://reviews.llvm.org/D95749
In GCC emitted .debug_info sections, R_386_GOTOFF may be used to
relocate DW_AT_GNU_call_site_value values
(https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98946).
R_386_GOTOFF (`S + A - GOT`) is one of the `isStaticLinkTimeConstant` relocation
type which is not PC-relative, so it can be used from non-SHF_ALLOC sections. We
current allow new relocation types as needs come. The diagnostic has caught some
bugs in the past.
Differential Revision: https://reviews.llvm.org/D95994
The option catches incompatibility between `R_*_IRELATIVE` and DT_TEXTREL/DF_TEXTREL
before glibc 2.29. Newer glibc versions are more common nowadays and I don't
think this option has ever been used. Diagnosing this problem is also
straightforward by reading the stack trace.
Identify dynamically exported symbols (--export-dynamic[-symbol=],
--dynamic-list=, or definitions needed to preempt shared objects) and
prevent their LTO visibility from being upgraded.
This helps avoid use of whole program devirtualization when there may
be overrides in dynamic libraries.
Differential Revision: https://reviews.llvm.org/D91583
I noticed that this option was not appearing at all in the `--help`
messages for `wasm-ld` or `ld.lld`.
Add help text and make it consistent across all ports.
Differential Revision: https://reviews.llvm.org/D94925
If foo is referenced in any object file, bitcode file or shared object,
`__wrap_foo` should be retained as the redirection target of sym
(f96ff3c0f8).
If the object file defining foo has foo references, we cannot easily distinguish
the case from cases where foo is not referenced (we haven't scanned
relocations). Retain `__wrap_foo` because we choose to wrap sym references
regardless of whether sym is defined to keep non-LTO/LTO/relocatable links' behaviors similar
https://sourceware.org/bugzilla/show_bug.cgi?id=26358 .
If foo is defined in a shared object, `__wrap_foo` can still be omitted
(`wrap-dynamic-undef.s`).
Reviewed By: andrewng
Differential Revision: https://reviews.llvm.org/D95152
Fixes PR48523. When the linker errors with "output file too large",
one question that comes to mind is how the section sizes differ from
what they were previously. Unfortunately, this information is lost
when the linker exits without writing the output file. This change
makes it so that the error message includes the sizes of the largest
sections.
Reviewed By: MaskRay, grimar, jhenderson
Differential Revision: https://reviews.llvm.org/D94560
```
// a.s
jmp fcntl
// b.s
.globl fcntl
fcntl:
ret
```
`ld.lld -shared --wrap=fcntl a.o b.o` has an `R_X86_64_JUMP_SLOT` referencing
the index 0 undefined symbol, which will cause a glibc `symbol lookup error` at
runtime. This is because `__wrap_fcntl` is not in .dynsym
We use an approximation `!wrap->isUndefined()`, which doesn't set
`isUsedInRegularObj` of `__wrap_fcntl` when `fcntl` is referenced and
`__wrap_fcntl` is undefined.
Fix this by using `sym->referenced`.
R_PPC64_ADDR16_HI represents bits 16-31 of a 32-bit value
R_PPC64_ADDR16_HIGH represents bits 16-31 of a 64-bit value.
In the Linux kernel, `LOAD_REG_IMMEDIATE_SYM` defined in `arch/powerpc/include/asm/ppc_asm.h`
uses @l, @high, @higher, @highest to load the 64-bit value of a symbol.
Fixes https://github.com/ClangBuiltLinux/linux/issues/1260
The commit 18aa0be36e changed the default GotBaseSymInGotPlt to true
for AArch64. This is different than binutils, where
_GLOBAL_OFFSET_TABLE_ points at the start or .got.
It seems to not intefere with current relocations used by LLVM. However
as indicated by PR#40357 [1] gcc generates R_AARCH64_LD64_GOTPAGE_LO15
for -pie (in fact it also generated the relocation for -fpic).
This change is requires to correctly handle R_AARCH64_LD64_GOTPAGE_LO15
by lld from objects generated by gcc.
[1] https://bugs.llvm.org/show_bug.cgi?id=40357
OutputSections.h used to close the lld::elf namespace only to
immediately open it again. This change merges both parts into
one.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D94538
Fixes PR48693: --emit-relocs keeps relocation sections. --gdb-index drops
.debug_gnu_pubnames and .debug_gnu_pubtypes but not their relocation sections.
This can cause a null pointer dereference in `getOutputSectionName`.
Also delete debug-gnu-pubnames.s which is covered by gdb-index.s
Reviewed By: grimar
Differential Revision: https://reviews.llvm.org/D94354
Fixes PR48681: after LTO, lto.tmp may reference a libcall symbol not in an IR
symbol table of any bitcode file. If such a symbol is defined in an archive
matched by a --exclude-libs, we don't correctly localize the symbol.
Add another `excludeLibs` after `compileBitcodeFiles` to localize such libcall
symbols. Unfortunately we have keep the existing one for D43126.
Using VER_NDX_LOCAL is an implementation detail of `--exclude-libs`, it does not
necessarily tie to the "localize" behavior. `local:` patterns in a version
script can be omitted.
The `symbol ... has undefined version ...` error should not be exempted.
Ideally we should error as GNU ld does. https://issuetracker.google.com/issues/73020933
Reviewed By: psmith
Differential Revision: https://reviews.llvm.org/D94280
This character indicates that when return pointer authentication is
being used, the function signs the return address using the B key.
Differential Revision: https://reviews.llvm.org/D93954
Add support for linking powerpcle code in LLD.
Rewrite lld/test/ELF/emulation-ppc.s to use a shared check block and add powerpcle tests.
Update tests.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D93917
We can reduce the number of "using" declarations.
`LLVM_ELF_IMPORT_TYPES_ELFT` was extended in D93801.
Differential revision: https://reviews.llvm.org/D93856
For x86-64, D33100 added a diagnostic for local-exec TLS relocations referencing a preemptible symbol.
This patch generalizes it to non-preemptible symbols (see `-Bsymbolic` in `tls.s`)
on all targets.
Local-exec TLS relocations resolve to offsets relative to a fixed point within
the static TLS block, which are only meaningful for the executable.
With this change, `clang -fpic -shared -fuse-ld=bfd a.c` on the following example will be flagged for AArch64/ARM/i386/x86-64/RISC-V
```
static __attribute__((tls_model("local-exec"))) __thread long TlsVar = 42;
long bump() { return ++TlsVar; }
```
Note, in GNU ld, at least arm, riscv and x86's ports have the similar
diagnostics, but aarch64 and ppc64 do not error.
Differential Revision: https://reviews.llvm.org/D93331
Alternative to D91611.
The TLS General Dynamic/Local Dynamic code sequences need to mark
`__tls_get_addr` with R_PPC64_TLSGD or R_PPC64_TLSLD, e.g.
```
addis r3, r2, x@got@tlsgd@ha # R_PPC64_GOT_TLSGD16_HA
addi r3, r3, x@got@tlsgd@l # R_PPC64_GOT_TLSGD16_LO
bl __tls_get_addr(x@tlsgd) # R_PPC64_TLSGD followed by R_PPC64_REL24
nop
```
However, there are two deviations form the above:
1. direct call to `__tls_get_addr`. This is essential to implement ld.so in glibc/musl/FreeBSD.
```
bl __tls_get_addr
nop
```
This is only used in a -shared link, and thus not subject to the GD/LD to IE/LE
relaxation issue below.
2. Missing R_PPC64_TLSGD/R_PPC64_TLSGD for compiler generated TLS references
According to Stefan Pintille, "In the early days of the transition from the
ELFv1 ABI that is used for big endian PowerPC Linux distributions to the ELFv2
ABI that is used for little endian PowerPC Linux distributions, there was some
ambiguity in the specification of the relocations for TLS. The GNU linker has
implemented support for correct handling of calls to __tls_get_addr with a
missing relocation. Unfortunately, we didn't notice that the IBM XL compiler
did not handle TLS according to the updated ABI until we tried linking XL
compiled libraries with LLD."
In short, LLD needs to work around the old IBM XL compiler issue.
Otherwise, if the object file is linked in -no-pie or -pie mode,
the result will be incorrect because the 4 instructions are partially
rewritten (the latter 2 are not changed).
Work around the compiler bug by disable General Dynamic/Local Dynamic to
Initial Exec/Local Exec relaxation. Note, we also disable Initial Exec
to Local Exec relaxation for implementation simplicity, though technically it can be kept.
ppc64-tls-missing-gdld.s demonstrates the updated behavior.
Reviewed By: #powerpc, stefanp, grimar
Differential Revision: https://reviews.llvm.org/D92959
The scope of R_TLS (TP offset relocation types (TPREL/TPOFF) used for the
local-exec TLS model) is actually narrower than its name may imply. R_TLS_NEG
is only used by Solaris R_386_TLS_LE_32.
Rename them so that they will be less confusing.
Reviewed By: grimar, psmith, rprichard
Differential Revision: https://reviews.llvm.org/D93467
Libraries linked to the lld elf library exposes a function named main.
When debugging code linked to such libraries and intending to set a
breakpoint at main, the debugger also sets breakpoint at the main
function at lld elf driver. The possible choice was to rename it to
link but that would again clash with lld::*::link. This patch tries
to consistently rename them to linkerMain.
Differential Revision: https://reviews.llvm.org/D91418
As indicated by AArch64 ELF specification, symbols with st_other
marked with STO_AARCH64_VARIANT_PCS indicates it may follow a variant
procedure call standard with different register usage convention
(for instance SVE calls).
Static linkers must preserve the marking and propagate it to the dynamic
symbol table if any reference or definition of the symbol is marked with
STO_AARCH64_VARIANT_PCS, and add a DT_AARCH64_VARIANT_PCS dynamic tag if
there are R_<CLS>_JUMP_SLOT relocations that reference that symbols.
It implements https://bugs.llvm.org/show_bug.cgi?id=48368.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D93045
Fix PR48357: If .rela.dyn appears as an output section description, its type may
be SHT_RELA (due to the empty synthetic .rela.plt) while there is no input
section. The empty .rela.dyn may be retained due to a reference in a linker
script. Don't crash.
Reviewed By: grimar
Differential Revision: https://reviews.llvm.org/D93367
Follow the naming set by TI's own GCC-based toolchain.
Also, force the `osabi` field to `ELFOSABI_STANDALONE`, this matches GNU LD's output (the patching is done in `elf32_msp430_post_process_headers`).
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D92931
Normally we should not delete options. However, the Clang driver passes
`-plugin-opt={new,legacy}-pass-manager` instead of
`--[no-]lto-legacy-pass-manager` (`-plugin-opt=new-pass-manager` has been used
since 7.0), and it is unlikely anyone will use the `--lto-*` style options directly.
So let's rename them to be consistent with the Clang driver option names.
Reviewed By: aeubanks
Differential Revision: https://reviews.llvm.org/D92988
-DENABLE_EXPERIMENTAL_NEW_PASS_MANAGER=on configured LLD and LLVMgold.so
will use the new pass manager by default. Add an option to
use the legacy pass manager. This will also be used by the Clang driver
when -fno-new-pass-manager (D92915) / -fno-experimental-new-pass-manager is set.
Reviewed By: aeubanks, tejohnson
Differential Revision: https://reviews.llvm.org/D92916
This patch changes the archive handling to enable the semantics needed
for legacy FORTRAN common blocks and block data. When we have a COMMON
definition of a symbol and are including an archive, LLD will now
search the members for global/weak defintions to override the COMMON
symbol. The previous LLD behavior (where a member would only be included
if it satisifed some other needed symbol definition) can be re-enabled with the
option '-no-fortran-common'.
Differential Revision: https://reviews.llvm.org/D86142
This reverts a side effect introduced in the code cleanup patch D43571:
LLD started to emit empty output sections that are explicitly assigned to a segment.
This patch fixes the issue by removing the !sec.phdrs.empty() special case from
isDiscardable. As compensation, we add an early phdrs propagation step (see the inline comment).
This is similar to one that we do in adjustSectionsAfterSorting.
Differential revision: https://reviews.llvm.org/D92301
Also, for .o files, include full path as given on link command line.
Before:
lld: error: undefined symbol [...], referenced from sandbox_logging.o
After:
lld: error: undefined symbol [...], referenced from libseatbelt.a(sandbox_logging.o)
Move archiveName up to InputFile so we can consistently use toString()
to print InputFiles in diags, and pass it to the ObjFile ctor. This
matches the ELF and COFF ports.
Differential Revision: https://reviews.llvm.org/D92437
If an object file has an undefined foo@v1, we emit a dynamic symbol foo.
This is incorrect if at runtime a shared object provides the non-default version foo@v1
(the undefined foo may bind to foo@@v2, for example).
GNU ld issues an error for this case, even if foo@v1 is undefined weak
(https://sourceware.org/bugzilla/show_bug.cgi?id=3351). This behavior makes
sense because to represent an undefined foo@v1, we have to construct a Verneed
entry. However, without knowing the defining filename, we cannot construct a
Verneed entry (Verneed::vn_file is unavailable).
This patch implements the error.
Depends on D92258
Reviewed By: grimar
Differential Revision: https://reviews.llvm.org/D92260
The symbol resolution rules for versioned symbols are:
* foo@@v1 (default version) resolves both undefined foo and foo@v1
* foo@v1 (non-default version) resolves undefined foo@v1
Note, foo@@v1 must be defined (the assembler errors if attempting to
create an undefined foo@@v1).
For defined foo@@v1 in a shared object, we call `SymbolTable::addSymbol` twice,
one for foo and the other for foo@v1. We don't do the same for object files, so
foo@@v1 defined in one object file incorrectly does not resolve a foo@v1
reference in another object file.
This patch fixes the issue by reusing the --wrap code to redirect symbols in
object files. This has to be done after processing input files because
foo and foo@v1 are two separate symbols if we haven't seen foo@@v1.
Add a helper `Symbol::getVersionSuffix` to retrieve the optional trailing
`@...` or `@@...` from the possibly truncated symbol name.
Depends on D92258
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D92259
All three use readFile() for their argument so their argument file is
already copied to the tar, but we weren't rewriting the argument to
point to the path used in the tar file.
No test because the change is trivial (several other flags in
createResponseFile() also aren't tested, likely for the same reason.)
Differential Revision: https://reviews.llvm.org/D92356
This is the #1 of 2 changes that make remarks hotness threshold option
available in more tools. The changes also allow the threshold to sync with
hotness threshold from profile summary with special value 'auto'.
This change modifies the interface of lto::setupLLVMOptimizationRemarks() to
accept remarks hotness threshold. Update all the tools that use it with remarks
hotness threshold options:
* lld: '--opt-remarks-hotness-threshold='
* llvm-lto2: '--pass-remarks-hotness-threshold='
* llvm-lto: '--lto-pass-remarks-hotness-threshold='
* gold plugin: '-plugin-opt=opt-remarks-hotness-threshold='
Differential Revision: https://reviews.llvm.org/D85809
clang may produce `movl x@GOTPCREL+4(%rip), %eax` when loading the high 32 bits
of the address of a global variable in -fpic/-fpie mode.
If assembled by GNU as, the fixup emits an R_X86_64_GOTPCRELX with an
addend != -4. The instruction loads from the GOT entry with an offset
and thus it is incorrect to relax the instruction.
If assembled by the integrated assembler, we emit R_X86_64_GOTPCREL for
relocations that definitely cannot be relaxed (D92114), so this patch is not
needed.
This patch disables the relaxation, which is compatible with the implementation in GNU ld
("Add R_X86_64_[REX_]GOTPCRELX support to gas and ld").
Reviewed By: grimar, jhenderson
Differential Revision: https://reviews.llvm.org/D91993
This adds support for ld.lld's --reproduce / lld-link's /reproduce:
flag to the MachO port. This flag can be added to a link command
to make the link write a tar file containing all inputs to the link
and a response file containing the link command. This can be used
to reproduce the link on another machine, which is useful for sharing
bug report inputs or performance test loads.
Since the linker is usually called through the clang driver and
adding linker flags can be a bit cumbersome, setting the env var
`LLD_REPRODUCE=foo.tar` triggers the feature as well.
The file response.txt in the archive can be used with
`ld64.lld.darwinnew $(cat response.txt)` as long as the contents are
smaller than the command-line limit, or with `ld64.lld.darwinnew
@response.txt` once D92149 is in.
The support in this patch is sufficient to create a tar file for
Chromium's base_unittests that can link after unpacking on a different
machine.
Differential Revision: https://reviews.llvm.org/D92274
For --gc-sections, SmallVector<InputSection *, 256> -> SmallVector<InputSection *, 0> because the code bloat (1296 bytes) is not worthwhile (the saved reallocation is negligible).
For OutputSection::compressedData, N=1 is useless (for a compressed .debug_*, the size is always larger than 1).
Also sync help texts for the option between elf and coff ports.
Decisions:
- Do this even if /lldignoreenv is passed. /reproduce: does not affect
the main output, and this makes the env var more convenient to use.
(On the other hand, it's now possible to set this env var and forget
about it, and all future builds in the same shell will be much slower.
That's true for ld.lld, but posix shells have an easy way to set an
env var for a single command; in cmd.exe this is not possible without
contortions. Then again, lld-link runs in posix shells too.)
Original patch rebased across D68378 and D68381.
Differential Revision: https://reviews.llvm.org/D67707
With this change, `TargetInfo::adjustRelaxExpr` is only related to TLS
relaxations and a subsequent clean-up can delete the `data` parameter.
Differential Revision: https://reviews.llvm.org/D92079
Enables overriding earlier --lto-whole-program-visibility.
Variant of D91583 while discussing alternate ways to identify and
handle the --export-dynamic case.
Differential Revision: https://reviews.llvm.org/D92060
Also use "unknown flag 'flag'" instead of "unknown flag: flag" for
consistency with the other ports.
Differential Revision: https://reviews.llvm.org/D91970
This allows to reuse the RelocationResolver from the code
that doesn't want to deal with `RelocationRef` class.
I am going to use it in llvm-readobj. See the description
of D91530 for more details.
Differential revision: https://reviews.llvm.org/D91533
As mentioned in https://reviews.llvm.org/D67479#1667256 ,
* `--[no-]allow-shlib-undefined` control the diagnostic for an unresolved symbol in a shared object
* `-z defs/-z undefs` control the diagnostic for an unresolved symbol in a regular object file
* `--unresolved-symbols=` controls both bits.
In addition, make --warn-unresolved-symbols affect --no-allow-shlib-undefined.
This patch makes the behavior match GNU ld.
Reviewed By: psmith
Differential Revision: https://reviews.llvm.org/D91510
This adds `--[no-]color-diagnostics[=auto,never,always]` to
the MachO port and harmonizes the flag in the other ports:
- Consistently use MetaVarName
- Consistently document the non-eq version as alias of the eq version
- Use B<> in the ports that have it (no-op, shorter)
- Fix oversight in COFF port that made the --no flag have the wrong
prefix
Differential Revision: https://reviews.llvm.org/D91640
`try ... catch` in an inline function produces `.gcc_except_table.*` in a COMDAT
group with GCC or newer Clang (since D83655). For --gc-sections, currently we
scan `.eh_frame` pieces and mark liveness of such a `.gcc_except_table.*` and
then the associated `.text.*` (if a member in a section group is retained, the
others should be retained as well).
Essentially all `.text.*` and `.gcc_except_table.*` compiled from inline
functions with `try ... catch` cannot be discarded by the imprecise
--gc-sections. Compared with the state before D83655, the output
`.gcc_except_table` is smaller (non-prevailing copies in COMDAT groups can now
be discarded) but `.text` may be larger, i.e. size regression.
This patch teaches the .eh_frame piece scanning code to not mark
`.gcc_except_table` in a section group, thus allow unused `.text.*` and
`.gcc_except_table.*` in a section group to be discarded.
Note, non-group `.gcc_except_table` can still not be discarded. That is the status quo.
Reviewed By: grimar, echristo
Differential Revision: https://reviews.llvm.org/D91579
Fixes PR48071
* The Rust compiler produces SHF_ALLOC `.debug_gdb_scripts` (which normally does not have the flag)
* `.debug_gdb_scripts` sections are removed from `inputSections` due to --strip-debug/--strip-all
* When processing --gc-sections, pieces of a SHF_MERGE section can be marked live separately
`=>` segfault when marking liveness of a `.debug_gdb_scripts` which is not split into pieces (because it is not in `inputSections`)
This patch circumvents the problem by not treating SHF_ALLOC ".debug*" as debug sections (to prevent --strip-debug's stripping)
(which is still useful on its own).
Reviewed By: grimar
Differential Revision: https://reviews.llvm.org/D91291
Input sections `.ctors/.ctors.N` may go to either the output section `.init_array` or the output section `.ctors`:
* output `.ctors`: currently we sort them by name. This patch changes to sort by priority from high to low. If N in `.ctors.N` is in the form of %05u, there is no semantic difference. Actually GCC and Clang do use %05u. (In the test `ctors_dtors_priority.s` and Gold's test `gold/testsuite/script_test_14.s`, we can see %03u, but they are not really produced by compilers.)
* output `.init_array`: users can provide an input section description `SORT_BY_INIT_PRIORITY(.init_array.* .ctors.*)` to mix `.init_array.*` and `.ctors.*`. This can make .init_array.N and .ctors.(65535-N) interchangeable.
With this change, users can mix `.ctors.N` and `.init_array.N` in `.init_array` (PR44698 and PR48096) with linker scripts. As an example:
```
SECTIONS {
.init_array : {
*(SORT_BY_INIT_PRIORITY(.init_array.* .ctors.*))
*(.init_array EXCLUDE_FILE (*crtbegin.o *crtbegin?.o *crtend.o *crtend?.o ) .ctors)
}
} INSERT AFTER .fini_array;
SECTIONS {
.fini_array : {
*(SORT_BY_INIT_PRIORITY(.fini_array.* .dtors.*))
*(.fini_array EXCLUDE_FILE (*crtbegin.o *crtbegin?.o *crtend.o *crtend?.o ) .dtors)
}
} INSERT BEFORE .init_array;
```
Reviewed By: psmith
Differential Revision: https://reviews.llvm.org/D91187
According to
https://sourceware.org/binutils/docs/ld/Input-Section-Basics.html#Input-Section-Basics
for `*(.a .b)`, the order should match the input order:
* for `ld 1.o 2.o`, sections from 1.o precede sections from 2.o
* within a file, `.a` and `.b` appear in the section header table order
This patch implements the behavior. The interaction with `SORT*` and --sort-section is:
Matched sections are ordered by radix sort with the keys being `(SORT*, --sort-section, input order)`,
where `SORT*` (if present) is most significant.
> Note, multiple `SORT*` within an input section description has undocumented and
> confusing behaviors in GNU ld:
> https://sourceware.org/pipermail/binutils/2020-November/114083.html
> Therefore multiple `SORT*` is not the focus for this patch but
> this patch still strives to have an explainable behavior.
As an example, we partition `SORT(a.*) b.* c.* SORT(d.*)`, into
`SORT(a.*) | b.* c.* | SORT(d.*)` and perform sorting within groups. Sections
matched by patterns between two `SORT*` are sorted by input order. If
--sort-alignment is given, they are sorted by --sort-alignment, breaking tie by
input order.
This patch also allows a section to be matched by multiple patterns, previously
duplicated sections could occupy more space in the output and had erroneous zero bytes.
The patch is in preparation for support for
`*(SORT_BY_INIT_PRIORITY(.init_array.* .ctors.*)) *(.init_array .ctors)`,
which will allow LLD to mix .ctors*/.init_array* like GNU ld (gold's --ctors-in-init-array)
PR44698 and PR48096
Reviewed By: grimar, psmith
Differential Revision: https://reviews.llvm.org/D91127
The second `SORT` in `*(SORT(...) SORT(...))` is incorrectly parsed as a file pattern.
Fix the bug by stopping at `SORT*` in `readInputSectionsList`.
Reviewed By: grimar
Differential Revision: https://reviews.llvm.org/D91180
I noticed when running a large link with the --time-trace option that
there were several areas which were missing any specific time trace
categories (aside from the generic link/ExecuteLinker categories). This
patch adds new categories to fill most of the "gaps", or to provide more
detail than was previously provided.
Reviewed by: MaskRay, grimar, russell.gallop
Differential Revision: https://reviews.llvm.org/D90686
On LP64/Windows platforms, this decreases sizeof(InputSection) from 208 (larger
on Windows) to 184.
For a large executable (7.6GiB, inputSections.size()=5105122,
make<InputSection> called 4835760 times), this decreases cgroup
memory.max_usage_in_bytes by 0.6%
Reviewed By: grimar
Differential Revision: https://reviews.llvm.org/D91018
Make it possible for lld users to provide a custom script that would help to
find missing libraries. A possible scenario could be:
% clang /tmp/a.c -fuse-ld=lld -loauth -Wl,--error-handling-script=/tmp/addLibrary.py
unable to find library -loauth
looking for relevant packages to provides that library
liboauth-0.9.7-4.el7.i686
liboauth-devel-0.9.7-4.el7.i686
liboauth-0.9.7-4.el7.x86_64
liboauth-devel-0.9.7-4.el7.x86_64
pix-1.6.1-3.el7.x86_64
Where addLibrary would be called with the missing library name as first argument
(in that case addLibrary.py oauth)
Differential Revision: https://reviews.llvm.org/D87758
In the presence of a gap, the st_value field of a STT_SECTION symbol is the
address of the first input section (incorrect if there is a gap). Set it to the
output section address instead.
In -r mode, this bug can cause an incorrect non-zero st_value of a STT_SECTION
symbol (while output sections have zero addresses, input sections may have
non-zero outSecOff). The non-zero st_value can cause the final link to have
incorrect relocation computation (both GNU ld and LLD add st_value of the
STT_SECTION symbol to the output section address).
Reviewed By: grimar
Differential Revision: https://reviews.llvm.org/D90520
While MC did not produce R_X86_64_GOTPCRELX for test/binop instructions
(movl/adcl/addl/andl/...) before the previous commit, this code path has been
exercised by -fno-integrated-as for GNU as since 2016: -no-pie relaxing
may incorrectly access loc[-3] and produce a corrupted instruction.
Simply handle test/binop R_X86_64_GOTPCRELX like R_X86_64_GOTPCREL.
This partially reverts D85994.
In glibc, elf/dl-sym.c calls the raw `__tls_get_addr` by specifying the
tls_index parameter. Such a call does not have a pairing R_PPC64_TLSGD/R_PPC64_TLSLD.
This is legitimate. Since we cannot distinguish the benign case from cases due
to toolchain issues, we have to be permissive.
Acked by Stefan Pintilie
Add support to LLD for PC Relative Thread Local Storage for Local Dynamic.
This patch adds support for two relocations: R_PPC64_GOT_TLSLD_PCREL34 and
R_PPC64_DTPREL34.
The Local Dynamic code is:
```
pla r3, x@got@tlsld@pcrel R_PPC64_GOT_TLSLD_PCREL34
bl __tls_get_addr@notoc(x@tlsld) R_PPC64_TLSLD
R_PPC64_REL24_NOTOC
...
paddi r9, r3, x@dtprel R_PPC64_DTPREL34
```
After relaxation to Local Exec:
```
paddi r3, r13, 0x1000
nop
...
paddi r9, r3, x@dtprel R_PPC64_DTPREL34
```
Reviewed By: NeHuang, sfertile
Differential Revision: https://reviews.llvm.org/D87504
For a diagnostic `A refers to B` where B refers to a bitcode file, if the
symbol gets optimized out, the user may see `A refers to <internal>`; if the
symbol is retained, the user may see `A refers to lto.tmp`.
Save the reference InputFile * in the DenseMap so that the original filename is
available in reportBackrefs().
The ELF spec says
> If the sh_flags field for this section header includes the attribute SHF_INFO_LINK, then this member represents a section header table index.
Set SHF_INFO_LINK so that binary manipulation tools know that sh_info is
a section header table index instead of (the number of local symbols in the case of SHT_SYMTAB/SHT_DYNSYM).
We have already added SHF_INFO_LINK for --emit-relocs retained SHT_REL[A].
For example, we can teach llvm-objcopy to preserve the section index of the sh_info referenced section if
SHF_INFO_LINK is set. (GNU objcopy recognizes .rel[a].plt and updates
sh_info even if SHF_INFO_LINK is not set).
Reviewed By: grimar, psmith
Differential Revision: https://reviews.llvm.org/D89828
The combination has not been tested before. In the case of ICF,
`e.section->getVA(0)` equals the start address of the output section.
This can cause incorrect overlapping with the actual function at the
start of the output section and potentially trigger a GDB internal error
in `dw2_find_pc_sect_compunit_symtab` (presumably because:
if a short address range incorrectly starts at the start address of the
output section, GDB may pick it instead of the correct longer address
range. When mapping an address within the long address range but
out of the scope of the short address range, the routine may find
nothing - while the code asserts that it can find something).
Note that in the case of ICF there may be duplicate address range entries,
but GDB appears to be fine with them.
Reviewed By: grimar
Differential Revision: https://reviews.llvm.org/D89751
ICF was not able to merge equivalent sections because of relocations to
sections ineligible for ICF that use alternative symbols, e.g. symbol
aliases or section relative relocations.
Merging in this scenario has been enabled by giving the sections that
are ineligible for ICF a unique ID, i.e. an equivalence class of their
own. This approach also provides another benefit as it improves the
hashing that is used to perform the initial equivalance grouping for
ICF. This is because the ICF ineligible sections can now contribute a
unique value towards the hashes instead of the same value of zero. This
has been seen to reduce link time with ICF by ~68% for objects compiled
with -fprofile-instr-generate.
In order to facilitate this use of a unique ID, the existing
inconsistent approach to the setting of the InputSection eqClass in ICF
has been changed so that there is a clear distinction between the
eqClass values of ICF eligible sections and those of the ineligible
sections that have a unique ID. This inconsistency could have caused
incorrect equivalence class equality in the past, although it appears
that no issues were encountered in actual use.
Differential Revision: https://reviews.llvm.org/D88830
In ELF/InputFiles.cpp, getBitcodeMachineKind() is limited to uint8_t return
type. This works as long as EM_xxx is < 256, which is true for common
architectures, but not for some newly assigned or unofficial EM_* values.
The corresponding ELF field (e_machine) can hold uint16_t.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D89185
Similar to D66992.
In GNU ld, a -u specified symbol is a STB_DEFAULT undefined.
It cannot be changed to STB_WEAK by a later STB_WEAK undefined in a regular object file.
The behavior is consistent with our model because -u means "we need to fetch a lazy definition".
It should not be altered just because there is also a STB_WEAK undefined.
Note, our -u semantics are still different from GNU ld (https://github.com/ClangBuiltLinux/linux/issues/515):
we don't force the specified symbol to appear in .symtab This is a deliberate decision.
Reviewed By: grimar
Differential Revision: https://reviews.llvm.org/D88945
Add Thread Local Storage support for the 34 bit relocation R_PPC64_GOT_TLSGD_PCREL34 used in General Dynamic.
The compiler will produce code that looks like:
```
pla r3, x@got@tlsgd@pcrel R_PPC64_GOT_TLSGD_PCREL34
bl __tls_get_addr@notoc(x@tlsgd) R_PPC64_TLSGD
R_PPC64_REL24_NOTOC
```
LLD should be able to correctly compute the relocation for R_PPC64_GOT_TLSGD_PCREL34 as well as do the following two relaxations where possible:
General Dynamic to Local Exec:
```
paddi r3, r13, x@tprel
nop
```
and General Dynamic to Initial Exec:
```
pld r3, x@got@tprel@pcrel
add r3, r3, r13
```
Note:
This patch adds support for the PC Relative (no TOC) version of General Dynamic on top of the existing support for the TOC version of General Dynamic.
The ABI does not provide any way to tell by looking only at the relocation `R_PPC64_TLSGD` when it is being used in a TOC instruction sequence or and when it is being used in a no TOC sequence. The TOC sequence should always be 4 byte aligned. This patch adds one to the offset of the relocation when it is being used in a no TOC sequence. In this way LLD can tell by looking at the alignment of the offset of `R_PPC64_TLSGD` whether or not it is being used as part of a TOC or no TOC sequence.
Reviewed By: NeHuang, sfertile, MaskRay
Differential Revision: https://reviews.llvm.org/D87318
Add Thread Local Storage support for the 34 bit relocation R_PPC64_GOT_TLSGD_PCREL34 used in General Dynamic.
The compiler will produce code that looks like:
```
pla r3, x@got@tlsgd@pcrel R_PPC64_GOT_TLSGD_PCREL34
bl __tls_get_addr@notoc(x@tlsgd) R_PPC64_TLSGD
R_PPC64_REL24_NOTOC
```
LLD should be able to correctly compute the relocation for R_PPC64_GOT_TLSGD_PCREL34 as well as do the following two relaxations where possible:
General Dynamic to Local Exec:
```
paddi r3, r13, x@tprel
nop
```
and General Dynamic to Initial Exec:
```
pld r3, x@got@tprel@pcrel
add r3, r3, r13
```
Note:
This patch adds support for the PC Relative (no TOC) version of General Dynamic on top of the existing support for the TOC version of General Dynamic.
The ABI does not provide any way to tell by looking only at the relocation `R_PPC64_TLSGD` when it is being used in a TOC instruction sequence or and when it is being used in a no TOC sequence. The TOC sequence should always be 4 byte aligned. This patch adds one to the offset of the relocation when it is being used in a no TOC sequence. In this way LLD can tell by looking at the alignment of the offset of `R_PPC64_TLSGD` whether or not it is being used as part of a TOC or no TOC sequence.
Reviewed By: NeHuang, sfertile, MaskRay
Differential Revision: https://reviews.llvm.org/D87318
The routing rules are:
sym -> __wrap_sym
__real_sym -> sym
__wrap_sym and sym are routing targets, so they need to be exposed to the symbol
table. __real_sym is not and can be eliminated if not used by regular object.
The R2 save stub will now support offsets up to 64 bits.
There are three cases that will be used.
1) The offset fits in 26 bits.
```
b <26 bit offset>
```
2) The offset does not fit in 26 bits but fits in 34 bits.
```
paddi r12, 0, <34 bit offset>, 1
mtctr r12
bctr
```
3) The offset does not fit in 34 bits. Since this is an R2 save stub we can use
the TOC in R2. We are not loading the offset but the actual address we want to
branch to.
```
addis r12, r2, <address in TOC lo>
ld r12 <address in TOC hi>(r12)
mtctr r12
bctr
```
In case 1) the stub is only 8 bytes while in cases 2) and 3) the stub will be
20 bytes.
Reviewed By: MaskRay, sfertile, NeHuang
Differential Revision: https://reviews.llvm.org/D87916
Library users should not need to call errorHandler().reset() explicitly.
google/iree calls lld:🧝:link and without the patch some global
variables are not cleaned up in the next invocation.
".text.split." holds symbols which are split out from functions in
other input sections. For example, with -fsplit-machine-functions,
placing the cold parts in .text.split instead of .text.unlikely mitigates
against poor profile inaccuracy. Techniques such as hugepage remapping can
make conservative decisions at the section granularity.
Differential Revision: https://reviews.llvm.org/D87840
In lit tests, we run each LLD invocation twice (LLD_IN_TEST=2), without shutting down the process in-between. This ensures a full cleanup is properly done between runs.
Only active for the COFF driver for now. Other drivers still use LLD_IN_TEST=1 which executes just one iteration with full cleanup, like before.
When the environment variable LLD_IN_TEST is unset, a shortcut is taken, only one iteration is executed, no cleanup for faster exit, like before.
A public API, lld::safeLldMain(), is also available when using LLD as a library.
Differential Revision: https://reviews.llvm.org/D70378
Update the thunk range error report for PPC64PCRelLongBranchThunk and add a range
error test case for PPC64R12SetupStub.
Differential Revision: https://reviews.llvm.org/D87381
Add Thread Local Storage Initial Exec support to LLD.
This patch adds the computation for the relocations as well as the relaxation from Initial Exec to Local Exec.
Initial Exec:
```
pld r9, x@got@tprel@pcrel
add r9, r9, x@tls@pcrel
```
or
```
pld r9, x@got@tprel@pcrel
lbzx r10, r9, x@tls@pcrel
```
Note that @tls@pcrel is actually encoded as R_PPC64_TLS with a one byte displacement.
For the above examples relaxing Intitial Exec to Local Exec:
```
paddi r9, r9, x@tprel
nop
```
or
```
paddi r9, r13, x@tprel
lbz r10, 0(r9)
```
Reviewed By: nemanjai, MaskRay, #powerpc
Differential Revision: https://reviews.llvm.org/D86893
Bitcode writer does not flush buffer until the end by default. This is
fine to small bitcode files. When -flto,--plugin-opt=emit-llvm,-gmlt are
used, the final bitcode file is large, for example, >8G. Keeping all
data in memory consumes a lot of memory.
This change allows bitcode writer flush data to disk early when buffered
data size is above some threshold. This is only enabled when lld emits
LLVM bitcode.
One issue to address is backpatching bitcode: subblock length, function
body indexes, meta data indexes need to backfill. If buffer can be
flushed partially, we introduced raw_fd_stream that supports
read/seek/write, and enables backpatching bitcode flushed in disk.
Reviewed-by: tejohnson, MaskRay
Differential Revision: https://reviews.llvm.org/D86905
I have noticed that a 374MiB powerpc64le 'ld.lld' requires 11 passes to link.
There is a ThunkSection (whose parent OutputSection is ".text" of 169MiB) with 12867 thunks.
Optimize the filename glob pattern matching in
LinkerScript::computeInputSections() and LinkerScript::shouldKeep().
Add InputFile::getNameForScript() which gets and if required caches the
Inputfile's name used for linker script matching. This avoids the
overhead of name creation that was in getFilename() in LinkerScript.cpp.
Add InputSectionDescription::matchesFile() and
SectionPattern::excludesFile() which perform the glob pattern matching
for an InputFile and make use of a cache of the previous result. As both
computeInputSections() and shouldKeep() process sections in order and
the sections of the same InputFile are contiguous, these single entry
caches can significantly speed up performance for more complex glob
patterns.
These changes have been seen to reduce link time with --gc-sections by
up to ~40% with linker scripts that contain KEEP filename glob patterns
such as "*crtbegin*.o".
Differential Revision: https://reviews.llvm.org/D87469
Add Thread Local Storage Local Exec support to LLD. This is to support PC Relative addressing of Local Exec.
The patch teaches LLD to handle:
```
paddi r9, r13, x1@tprel
```
The relocation is:
```
R_PPC_TPREL34
```
Reviewed By: NeHuang, MaskRay
Differential Revision: https://reviews.llvm.org/D86608
`ELFFile<ELFT>` has many methods that take pointers,
though they assume that arguments are never null and
hence could take references instead.
This patch performs such clean-up.
Differential revision: https://reviews.llvm.org/D87385
Prefer `errorOrWarn` to `fatal` for recoverable errors and graceful degradation
when --noinhibit-exec is specified.
Mention the destination symbol, otherwise the diagnostic is not really actionable.
Two errors are not tested but the patch does not intend to add the coverage.
Reviewed By: grimar
Differential Revision: https://reviews.llvm.org/D87486
MarkLive::scanEhFrameSection is used to retain personality/LSDA
functions when --gc-sections is enabled.
Improve its performance by only iterating over the .eh_frame relocations
that need to be resolved for an EhSectionPiece. This optimization makes
the same assumption as elsewhere in LLD that the .eh_frame relocations
are sorted by r_offset.
This appears to be a performance regression introduced in commit
e6c24299d2 (https://reviews.llvm.org/D59800).
This change has been seen to reduce link time by up to ~50%.
Differential Revision: https://reviews.llvm.org/D87245
Currently we treat SHT_RISCV_ATTRIBUTES like a normal section and
concatenate all such input sections, yielding invalid output unless only
a single attributes section is present in the input. Instead, pick the
first as with SHT_ARM_ATTRIBUTES. We do not currently need to condition
our behaviour on the contents, unlike Arm. In future, we should both do
stricter validation of the input and merge all sections together to
ensure we have, for example, the full arch string requirement, but this
rudimentary implementation is good enough for most common cases.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D86309
In this patch, a pc-rel based long branch thunk is added for the local
call protocol that caller and callee does not use TOC.
Reviewed By: sfertile, nemanjai
Differential Revision: https://reviews.llvm.org/D86706
The function `__tls_get_addr` is used to get the address of an object that is Thread Local Storage.
It needs to have two relocations on it.
One relocation is for the function call itself and it is either R_PPC64_REL24 or R_PPC64_REL24_NOTOC.
The other is R_PPC64_TLSGD or R_PPC64_TLSLD for the symbol that is having its address computed.
In the early days of the transition from the ELFv1 ABI that is used for big endian PowerPC Linux distributions to the ELFv2 ABI that is used for little endian PowerPC Linux distributions, there was some ambiguity in the specification of the relocations for TLS. The GNU linker has implemented support for correct handling of calls to __tls_get_addr with a missing relocation. Unfortunately, we didn't notice that the IBM XL compiler did not handle TLS according to the updated ABI until we tried linking XL compiled libraries with LLD. As a result, there is a lot of code out there in various libraries compiled with XL that have this problem.
This patch adds a new error check in LLD that makes sure calls to `__tls_get_addr` are not missing the TLSGD/TLSLD relocation.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D85994
PR46970: for `alias = aliasee`, the alias can be used in relocation processing
and on ARM st_type does affect Thumb interworking. It is thus desirable for the
alias to get the same st_type.
Note that the st_size field should not be inherited because some tools use
st_size=0 as a heuristic to detect aliases. Retaining st_size can thwart such
heuristics and cause aliases to be preferred over the original symbols.
Differential Revision: https://reviews.llvm.org/D86263
* GNU ld places non-SHF_ALLOC sections after SHF_ALLOC sections. This has the
advantage that the file offsets of a non-SHF_ALLOC cannot be contained in
a PT_LOAD. This patch matches the behavior.
* For non-SHF_ALLOC non-orphan sections, GNU ld may assign non-zero sh_addr and
treat them similar to SHT_NOBITS (not advance location counter). This
is an alternative approach to what we have done in D85100.
By placing non-SHF_ALLOC sections at the end, we can drop special
cases in createSection and findOrphanPos added by D85100.
Different from GNU ld, we set sh_addr to 0 for non-SHF_ALLOC sections. 0
arguably is better because non-SHF_ALLOC sections don't appear in the memory
image.
ELF spec says:
> sh_addr - If the section will appear in the memory image of a process, this
> member gives the address at which the section's first byte should
> reside. Otherwise, the member contains 0.
D85100 appeared to take a detour. If we take a combined view on D85100 and this
patch, the overall complexity slightly increases (one more 3-line loop) and
compatibility with GNU ld improves.
The behavior we don't want to match is the special treatment of .symtab
.shstrtab .strtab: they can be matched in LLD but not in GNU ld.
Reviewed By: jhenderson, psmith
Differential Revision: https://reviews.llvm.org/D85867
LLD currently does not allow non-contiguous SHF_LINK_ORDER components in an
output section. This makes it infeasible to add SHF_LINK_ORDER to an existing
metadata section if backward compatibility with older object files are
concerned.
We did not allow mixed components (like GNU ld) and D77007 relaxed to allow
non-contiguous SHF_LINK_ORDER components. This patch allows arbitrary mix, with
sorting performed within an InputSectionDescription. For example,
`.rodata : {*(.rodata.foo) *(.rodata.bar)}`, has two InputSectionDescription's.
If there is at least one SHF_LINK_ORDER and at least one non-SHF_LINK_ORDER in
.rodata.foo, they are ordered within `*(.rodata.foo)`: we arbitrarily place
SHF_LINK_ORDER components before non-SHF_LINK_ORDER components (like Solaris ld).
`*(.rodata.bar)` is ordered similarly, but the two InputSectionDescription's
don't interact. It can be argued that this is more reasonable than the previous
behavior where written order was not respected.
It would be nice if the two different semantics (ordering requirement & garbage
collection) were not overloaded on one section flag, however, it is probably
difficult to obtain a generic flag at this point
(https://groups.google.com/forum/#!topic/generic-abi/hgx_m1aXqUo
"SHF_LINK_ORDER's original semantics make upgrade difficult").
(Actually, without the GC semantics, SHF_LINK_ORDER would still have the
sh_link!=0 & sh_link=0 issue. It is just that people find the GC semantics more
useful and tend to use the feature more often.)
GNU ld feature request: https://sourceware.org/bugzilla/show_bug.cgi?id=16833
Differential Revision: https://reviews.llvm.org/D84001
This patch implements the handling for the R_PPC64_PCREL_OPT relocation as well
as the GOT relocation for the associated R_PPC64_GOT_PCREL34 relocation.
On Power10 targets with PC-Relative addressing, the linker can relax
GOT-relative accesses to PC-Relative under some conditions. Since the sequence
consists of a prefixed load, followed by a non-prefixed access (load or store),
the linker needs to replace the first instruction (as the replacement
instruction will be prefixed). The compiler communicates to the linker that
this optimization is safe by placing the two aforementioned relocations on the
GOT load (of the address).
The linker then does two things:
- Convert the load from the got into a PC-Relative add to compute the address
relative to the PC
- Find the instruction referred to by the second relocation (R_PPC64_PCREL_OPT)
and replace the first with the PC-Relative version of it
It is important to synchronize the mapping from legacy memory instructions to
their PC-Relative form. Hence, this patch adds a file to be included by both
the compiler and the linker so they're always in agreement.
Differential revision: https://reviews.llvm.org/D84360
Thunk alignment is added in thie patch when using pc-rel instructions
to avoid crossing the 64 byte boundary.
Patched by: nemanjai, NeHuang
Reviewed By: sfertile, MaskRay
Differential Revision: https://reviews.llvm.org/D85973
-gdwarf-5 -fdebug-types-section may produce multiple .debug_info sections. All
except one are type units (.debug_types before DWARF v5). When constructing
.gdb_index, we should ignore these type units. We use a simple heuristic: the
compile unit does not have the SHF_GROUP flag. (This needs to be revisited if
people place compile unit .debug_info in COMDAT groups.)
This issue manifests as a data race: because an object file may have multiple
.debug_info sections, we may concurrently construct `LLDDwarfObj` for the same
file in multiple threads. The threads may access `InputSectionBase::data()`
concurrently on the same input section. `InputSectionBase::data()` does a lazy
uncompress() and rewrites the member variable `rawData`. A thread running zlib
`inflate()` (transitively called by uncompress()) on a buffer with `rawData`
tampered by another thread may fail with `uncompress failed: zlib error: Z_DATA_ERROR`.
Even if no data race occurred in an optimistic run, if there are N .debug_info,
one CU entry and its address ranges will be replicated N times. The result
.gdb_index can be much larger than a correct one.
The new test gdb-index-dwarf5-type-unit.s actually has two compile units. This
cannot be produced with regular approaches (it can be produced with -r
--unique). This is used to demonstrate that the .gdb_index construction code
only considers the last non-SHF_GROUP .debug_info
Reviewed By: grimar
Differential Revision: https://reviews.llvm.org/D85579
* For .cfi_*, GCC/GNU as emits SHT_PROGBITS type .eh_frame sections.
* Since rL252300, clang emits SHT_X86_64_UNWIND type .eh_frame sections
(originated from Solaris, documented in the x86-64 psABI).
* Some assembly use `.section .eh_frame,"a",@unwind` to generate
SHT_X86_64_UNWIND .eh_frame sections.
In a non-relocatable link, input .eh_frame are combined and there is
only one SyntheticSection .eh_frame in the output section, so the
"section type mismatch" diagnostic does not fire.
In a relocatable link, there is no SyntheticSection .eh_frame. .eh_frame of
mixed types can trigger the diagnostic. This patch fixes it by adding another
special case 0x70000001 (= SHT_X86_64_UNWIND) to canMergeToProgbits().
ld.lld -r gcc.o clang.o => error: section type mismatch for .eh_frame
There was a discussion "RFC: Usefulness of SHT_X86_64_UNWIND" on the x86-64-abi
mailing list. Folks are not wild about making the psABI value 0x70000001 into
gABI, but a few think defining 0x70000001 for .eh_frame may be a good idea for a
new architecture.
Reviewed By: grimar
Differential Revision: https://reviews.llvm.org/D85785
For an InputSection, the `buf` argument of `InputSectionBase::relocate` points
to the content of the containing OutputSection, instead of the content of the
InputSection itself, so `outSecOff` needs to be added in its callees. This is
counter-intuitive and leads to many `- outSecOff` and `+ outSecOff`.
This patch makes `InputSection::writeTo` call `InputSectionBase::relocate` with
`outSecOff` added. relocAlloc/relocNonAlloc/relocateNonAllocForRelocatable can
thus be simplified now.
Updated test:
* non-abs-reloc.s: A minor offset bug is fixed for a diagnostic in `relocateNonAlloc`
Reviewed By: grimar
Differential Revision: https://reviews.llvm.org/D85618
glibc/sysdeps/unix/sysv/linux/x86_64/sigaction.c libc.a(sigaction.o) has a CIE
with the augmentation string "zRS". Support 'S' to allow --icf={safe,all}.
Compatibility checks for PPC64PltCallStub and PPC64PCRelPLTStub are
added in this patch to prevent the usage of incompatible thunk/stub.
Reviewed By: sfertile, nemanjai, stefanp
Differential Revision: https://reviews.llvm.org/D85459
tl;dr See D81784 for the 'tombstone value' concept. This patch changes our behavior to be almost the same as GNU ld (except that we also use 1 for .debug_loc):
* .debug_ranges & .debug_loc: 1 (LLD<11: 0+addend; GNU ld uses 1 for .debug_ranges)
* .debug_*: 0 (LLD<11: 0+addend; GNU ld uses 0; future LLD: -1)
We make the tweaks because:
1) The new tombstone is novel and needs more time to be adopted by consumers before it's the default.
2) The old (gold) strategy had problems with zero-length functions - so rather than going back that, we're going to the GNU ld strategy which doesn't have that problem.
3) One slight tweak to (2) is to apply the .debug_ranges workaround to .debug_loc for the same reasons it applies to debug_ranges - to avoid terminating lists early.
-----
http://lists.llvm.org/pipermail/llvm-dev/2020-July/143482.html
The tombstone value -1 in .debug_line caused problems to lldb (fixed by D83957;
will be included in 11.0.0) and breakpad (fixed by
https://crrev.com/c/2321300). It may potentially affects other DWARF consumers.
For .debug_ranges & .debug_loc: 1, an argument preferring 1 (GNU ld for .debug_ranges) over -2 is that:
```
{-1, -2} <<< base address selection entry
{0, length} <<< address range
```
may create a situation where low_pc is greater than high_pc. So we use
1, the GNU ld behavior for .debug_ranges
For other .debug_* sections, there haven't been many reports. One issue is that
bloaty (src/dwarf.cc) can incorrectly count address ranges in .debug_ranges . To
reduce similar disruption, this patch changes the tombstone values to be similar to GNU ld.
This does mean another behavior change to the default trunk behavior. Sorry
about it. The default trunk behavior will be similar to release/11.x while we work on a transition plan for LLD users.
Reviewed By: dblaikie, echristo
Differential Revision: https://reviews.llvm.org/D84825
GNU ld allows sections after a non-SHF_ALLOC section to be covered by PT_LOAD
(PR37607) and assigns addresses to non-SHF_ALLOC output sections (similar to
SHF_ALLOC NOBITS sections. The location counter is not advanced).
This patch tries to fix PR37607 (remove a special case in
`Writer<ELFT>::createPhdrs`). To make the created PT_LOAD meaningful, we cannot
reset dot to 0 for a middle non-SHF_ALLOC output section. This results in
removal of two special cases in LinkerScript::assignOffsets. Non-SHF_ALLOC
non-orphan sections can have non-zero addresses like in GNU ld.
The zero address rule for non-SHF_ALLOC sections is weakened to apply to orphan
only. This results in a special case in createSection and findOrphanPos, respectively.
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D85100
Part of https://bugs.llvm.org/show_bug.cgi?id=41734
The semantics of SHF_LINK_ORDER have been extended to represent metadata
sections associated with some other sections (usually text).
The associated text section may be discarded (e.g. LTO) and we want the
metadata section to have sh_link=0 (D72899, D76802).
Normally the metadata section is only referenced by the associated text
section. sh_link=0 means the associated text section is discarded, and
the metadata section will be garbage collected. If there is another
section (.gc_root) referencing the metadata section, the metadata
section will be retained. It's the .gc_root consumer's job to validate
the metadata sections.
# This creates a SHF_LINK_ORDER .meta with sh_link=0
.section .meta,"awo",@progbits,0
1:
.section .meta,"awo",@progbits,foo
2:
.section .gc_root,"a",@progbits
.quad 1b
.quad 2b
Reviewed By: pcc, jhenderson
Differential Revision: https://reviews.llvm.org/D72904
GNU ld allows sections after a non-SHF_ALLOC section to be covered by PT_LOAD
(PR37607) and assigns addresses to non-SHF_ALLOC output sections (similar to
SHF_ALLOC NOBITS sections. The location counter is not advanced).
This patch tries to fix PR37607 (remove a special case in
`Writer<ELFT>::createPhdrs`). To make the created PT_LOAD meaningful, we cannot
reset dot to 0 for a middle non-SHF_ALLOC output section. This results in
removal of two special cases in LinkerScript::assignOffsets. Non-SHF_ALLOC
non-orphan sections can have non-zero addresses like in GNU ld.
The zero address rule for non-SHF_ALLOC sections is weakened to apply to orphan
only. This results in a special case in createSection and findOrphanPos, respectively.
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D85100
Fix PR36272 and PR46835
A .eh_frame FDE references a text section and (optionally) a LSDA (in
.gcc_except_table). Even if two text sections have identical content and
relocations (e.g. a() and b()), we cannot fold them if their LSDA are different.
```
void foo();
void a() {
try { foo(); } catch (int) { }
}
void b() {
try { foo(); } catch (float) { }
}
```
Scan .eh_frame pieces with LSDA and disallow referenced text sections to be
folded. If two .gcc_except_table have identical semantics (usually identical
content with PC-relative encoding), we will lose folding opportunity.
For ClickHouse (an exception-heavy application), this can reduce --icf=all efficiency
from 9% to 5%. There may be some percentage we can reclaim without affecting
correctness, if we analyze .eh_frame and .gcc_except_table sections.
gold 2.24 implemented a more complex fix (resolution to
https://sourceware.org/bugzilla/show_bug.cgi?id=21066) which combines the
checksum of .eh_frame CIE/FDE pieces.
Reviewed By: grimar
Differential Revision: https://reviews.llvm.org/D84610
--oformat=binary is rare (used in a few places in FreeBSD, see `stand/i386/mbr/Makefile` `LDFLAGS_BIN`)
The result should be identical to a normal output transformed by `objcopy -O binary`.
The current implementation ignores addresses and lays out sections by
respecting output section alignments. It can fail when an output section
address is specified, e.g. `.rodata ALIGN(16) :` (PR33651).
Fix PR33651 by respecting LMA. The code is similar to
`tools/llvm-objcop/ELF/Object.cpp` BinaryWriter::finalize after D71035 and D79229.
Unforunately for an output section without PT_LOAD, we assume its LMA is equal
to its VMA. So the result is still incorrect when an output section LMA
(`AT(...)`) is specified
Also drop `alignTo(off, config->wordsize)`. GNU ld does not round up the file size.
Differential Revision: https://reviews.llvm.org/D85086
Clang and GCC have a feature (-MD flag) to create a dependency file
in a format that build systems such as Make or Ninja can read, which
specifies all the additional inputs such .h files.
This change introduces the same functionality to lld bringing it to
feature parity with ld and gold which gained this feature recently.
See https://sourceware.org/bugzilla/show_bug.cgi?id=22843 for more
details and discussion.
The implementation corresponds to -MD -MP compiler flag where the
generated dependency file also includes phony targets which works
around the errors where the dependency is removed. This matches the
format used by ld and gold.
Fixes PR42806
Differential Revision: https://reviews.llvm.org/D82437
D68049 created options for basic block sections: -fbasic-block-sections=,
-funique-basic-block-section-names. Rename options in llc and lld (--lto-)
to be consistent. Specifically,
+ Rename basicblock-sections to basic-block-sections
+ Rename unique-bb-section-names to unique-basic-block-section-names
Differential Revision: https://reviews.llvm.org/D84462
Clang and GCC have a feature (-MD flag) to create a dependency file
in a format that build systems such as Make or Ninja can read, which
specifies all the additional inputs such .h files.
This change introduces the same functionality to lld bringing it to
feature parity with ld and gold which gained this feature recently.
See https://sourceware.org/bugzilla/show_bug.cgi?id=22843 for more
details and discussion.
The implementation corresponds to -MD -MP compiler flag where the
generated dependency file also includes phony targets which works
around the errors where the dependency is removed. This matches the
format used by ld and gold.
Fixes PR42806
Differential Revision: https://reviews.llvm.org/D82437
This patch supports the situation where caller does not have a valid TOC and
calls using the R_PPC64_REL24_NOTOC relocation and the callee is not DSO local.
In this case the call cannot be made directly since the callee may or may not
require a valid TOC pointer. As a result this situation require a PC-relative
plt stub to set up r12.
Reviewed By: sfertile, MaskRay, stefanp
Differential Revision: https://reviews.llvm.org/D83669
This patch adds support for the LOG2CEIL builtin function in linker scripts: https://sourceware.org/binutils/docs/ld/Builtin-Functions.html#index-LOG2CEIL_0028exp_0029
As documented for LD, and to keep compatibility, LOG2CEIL(0) returns 0 (not -inf).
The test vectors are somewhat arbitrary. We check minimum values (0-4); middle values (2^32, and 2^32+1); and the maximum value (2^64-1).
The checks for LOG2CEIL explicitly use full 64-bit values (16 hex digits). This is needed to properly verify that -inf and other interesting results aren't returned. (For some reason, all other tests in operators.test use only 14 digits.)
Differential revision: https://reviews.llvm.org/D84054
-r --gc-sections is usually not useful because it just makes intermediate output
smaller. https://bugs.llvm.org/show_bug.cgi?id=46700#c7 mentions a use case:
validating the absence of undefined symbols ealier than in the final link.
After D84129 (SHT_GROUP support in -r links), we can support -r
--gc-sections without extra code. So let's allow it.
Reviewed By: grimar, jhenderson
Differential Revision: https://reviews.llvm.org/D84131
* If two group members are combined, we should leave just one index in the SHT_GROUP content.
* If a group member is discarded (/DISCARD/ or upcoming -r --gc-sections combination),
we should drop its index in the SHT_GROUP content. LLD currently crashes (`getOutputSection()` is null).
Reviewed By: psmith
Differential Revision: https://reviews.llvm.org/D84129
The PC Relative code now allows for calls that are marked with the relocation
R_PPC64_REL24_NOTOC. This indicates that the caller does not have a valid TOC
pointer in R2 and does not require R2 to be restored after the call.
This patch is added to support local calls to callees that require a TOC
Reviewed By: sfertile, MaskRay, nemanjai, stefanp
Differential Revision: https://reviews.llvm.org/D83504
The `intrinsics_gen` target exists in the CMake exports since r309389
(see LLVMConfig.cmake.in), hence projects can depend on `intrinsics_gen`
even it they are built separately from LLVM.
Reviewed By: MaskRay, JDevlieghere
Differential Revision: https://reviews.llvm.org/D83454
After D69985, symbols for "-init" and "-fini" were unconditionally
marked as used even if they were just lazy symbols seen when scanning
archives. That resulted in exposing them in the symbol table of an
output file, as Undefined, which added unwanted dependencies. The patch
fixes the issue by checking the kind of the symbols before the marking.
Differential Revision: https://reviews.llvm.org/D83549
It allows handling cases when we have SHT_REL[A] sections before target
sections in objects.
This fixes https://bugs.llvm.org/show_bug.cgi?id=46632
which says: "Normally it is not what compilers would emit. We have to support it,
because some custom tools might want to use this feature, which is not restricted by ELF gABI"
Differential revision: https://reviews.llvm.org/D83469
Implements the missing relocation types for AVR target.
The results have been cross-checked with binutils.
Original patch by LemonBoy. Some changes by me.
Differential Revision: https://reviews.llvm.org/D78741
The PC Relative code allows for calls that are marked with the relocation
R_PPC64_REL24_NOTOC. This indicates that the caller does not have a valid TOC
pointer in R2 and does not require R2 to be restored after the call.
This patch is added to support local calls to callees tha also do not have a TOC.
Reviewed By: sfertile, MaskRay, stefanp
Differential Revision: https://reviews.llvm.org/D82816
The R_PPC64_REL24 is used in function calls when the caller requires a
valid TOC pointer. If the callee shares the same TOC or does not clobber
the TOC pointer then a direct call can be made. If the callee does not
share the TOC a thunk must be added to save the TOC pointer for the caller.
Up until PC Relative was introduced all local calls on medium and large code
models were assumed to share a TOC. This is no longer the case because
if the caller requires a TOC and the callee is PC Relative then the callee
can clobber the TOC even if it is in the same DSO.
This patch is to add support for a TOC caller calling a PC Relative callee that
clobbers the TOC.
Reviewed By: sfertile, MaskRay
Differential Revision: https://reviews.llvm.org/D82950
The patch adds checking for various potential issues in parsing name
lookup tables and reporting them as recoverable errors, similarly as we
do for other tables.
Differential Revision: https://reviews.llvm.org/D83050
The parsing method did not check reading errors and might easily fall
into an infinite loop on an invalid input because of that.
Differential Revision: https://reviews.llvm.org/D83049
In the absence of TLS relaxation (rewrite of code sequences),
there is still an applicable optimization:
[gd]: General Dynamic: resolve DTPMOD to 1 and/or resolve DTPOFF statically
All the other relaxations are only performed when transiting to
executable (`!config->shared`).
Since [gd] is handled differently, we can fold `!config->shared` into canRelax
and simplify its use sites. Rename the variable to reflect to new semantics.
Reviewed By: grimar, psmith
Differential Revision: https://reviews.llvm.org/D83243
... to customize the tombstone value we use for an absolute relocation
referencing a discarded symbol. This can be used as a workaround when
some debug processing tool has trouble with current -1 tombstone value
(https://bugs.chromium.org/p/chromium/issues/detail?id=1102223#c11 )
For example, to get the current built-in rules (not considering the .debug_line special case for ICF):
```
-z dead-reloc-in-nonalloc='.debug_*=0xffffffffffffffff'
-z dead-reloc-in-nonalloc=.debug_loc=0xfffffffffffffffe
-z dead-reloc-in-nonalloc=.debug_ranges=0xfffffffffffffffe
```
To get GNU ld (as of binutils 2.35)'s behavior:
```
-z dead-reloc-in-nonalloc='*=0'
-z dead-reloc-in-nonalloc=.debug_ranges=1
```
This option has other use cases. For example, if we want to check
whether a non-SHF_ALLOC section has dead relocations.
With this patch, we can run a regular LLD and run another with a special
-z dead-reloc-in-nonalloc=, then compare their output.
Reviewed By: thakis
Differential Revision: https://reviews.llvm.org/D83264
In GNU ld, --no-relax can disable x86-64 GOTPCRELX relaxation.
It is not useful, so we don't implement it.
For RISC-V, --no-relax disables linker relaxations which have larger
impact.
Linux kernel specifies --no-relax when CONFIG_DYNAMIC_FTRACE is specified
(since http://git.kernel.org/linus/a1d2a6b4cee858a2f27eebce731fbf1dfd72cb4e ).
LLD has not implemented the relaxations, so this option is a no-op.
Reviewed By: grimar
Differential Revision: https://reviews.llvm.org/D81359
The Symbol Table in LLD references the global object to add a symbol rather than adding it to itself.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D83184
Follow-up to D82899. Note, we need to disable R_DTPREL relaxation
because ARM psABI does not define TLS relaxation.
Reviewed By: grimar, psmith
Differential Revision: https://reviews.llvm.org/D83138
The location of a TLS variable is encoded as a DW_OP_const4u/DW_OP_const8u
followed by a DW_OP_push_tls_address (or DW_OP_GNU_push_tls_address https://sourceware.org/bugzilla/show_bug.cgi?id=11616 ).
This change follows up to D81784 and makes relocations types generalized as
R_DTPREL (e.g. R_X86_64_DTPOFF{32,64}, R_PPC64_DTPREL64) use -1 as the
tombstone value as well. This works for both TLS Variant I and Variant II
architectures.
* arm: .long tls(tlsldo) # not working currently (R_ARM_TLS_LDO32 is R_ABS)
* mips64: .dtpreldword tls+32768
* ppc64: .quad tls@DTPREL+0x8000
* riscv: neither GCC nor clang has implemented DW_AT_location. It is likely .long/.quad tls@dtprel+0x800
* x86-32: .long tls@DTPOFF
* x86-64: .long tls@DTPOFF; .quad tls@DTPOFF
tls has a non-negative st_value, so such relocations (st_value+addend)
never resolve to -1 in a normal (not discarded) case.
```
// clang -fuse-ld=lld -g -ffunction-sections a.c -Wl,--gc-sections
// foo and tls will be discarded by --gc-sections.
// DW_AT_location [DW_FORM_exprloc] (DW_OP_const8u 0xffffffffffffffff, DW_OP_GNU_push_tls_address)
thread_local int tls;
int foo() { return ++tls; }
int main() {}
```
Also, drop logic added in D26201 intended to address PR30793. It added a test
(gc-debuginfo-tls.s) using a non-SHF_ALLOC section and a local symbol, which
does not reflect the intended scenario: a relocation in a SHF_ALLOC section
referencing a discarded non-local symbol. For such a non .debug_* section, just
emit an error.
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D82899
After D81784, we resolve a relocation in .debug_* referencing an ICF folded
section symbol to a tombstone value.
Doing this for .debug_line has a problem (https://reviews.llvm.org/D81784#2116925 ):
.debug_line may describe folded lines as having addresses UINT64_MAX or
some wraparound small addresses.
```
int foo(int x) {
return x; // line 2
}
int bar(int x) {
return x; // line 6
}
```
```
Address Line Column File ISA Discriminator Flags
------------------ ------ ------ ------ --- ------------- -------------
0x00000000002016c0 1 0 1 0 0 is_stmt
0x00000000002016c7 2 9 1 0 0 is_stmt
prologue_end
0x00000000002016ca 2 2 1 0 0
0x00000000002016cc 2 2 1 0 0 end_sequence
// UINT64_MAX and wraparound small addresses
0xffffffffffffffff 5 0 1 0 0 is_stmt
0x0000000000000006 6 9 1 0 0 is_stmt
prologue_end
0x0000000000000009 6 2 1 0 0
0x000000000000000b 6 2 1 0 0 end_sequence
0x00000000002016d0 9 0 1 0 0 is_stmt
0x00000000002016df 10 6 1 0 0 is_stmt prologue_end
0x00000000002016e6 11 11 1 0 0 is_stmt
...
```
These entries can confuse debuggers:
gdb before 2020-07-01 (binutils-gdb a8caed5d7faa639a1e6769eba551d15d8ddd9510 "Recognize -1 as a tombstone value in .debug_line")
(can't continue due to a breakpoint in an invalid region of memory):
```
Warning:
Cannot insert breakpoint 1.
Cannot access memory at address 0x6
```
lldb (breakpoint has no effect):
```
(lldb) b 6
Breakpoint 1: no locations (pending).
WARNING: Unable to resolve breakpoint to any actual locations.
```
This patch special cases .debug_line to not use the tombstone value,
restoring the previous behavior: .debug_line will have entries with the
same addresses (ICF) but different line numbers. A breakpoint on line 2
or 6 will trigger on both functions.
Reviewed By: dblaikie, jhenderson
Differential Revision: https://reviews.llvm.org/D82828
D79300 forgot to change `getBuffer().empty()` in LazyObjFile::parse to
`fetched`. This caused incorrect iterating after the current LazyObjFile was
fetched. This issue is benign and can just cause loss of "undefined symbols"
and "backward reference" diagnostics.
Before D79300 `mb = {}` caused --warn-backrefs-exclude to be useless for
a fetched LazyObjFile.
Add two test cases.
Fixes PR46420
Similar to D43307 for non-LTO.
Module-level inline assembly can use .symver to create a symbol with `@` in the name.
For relocatable output, @ should be retained in the symbol name. `@ver` should
not be parsed and dropped.
Reviewed By: grimar, psmith
Differential Revision: https://reviews.llvm.org/D82433
Add support for the 34bit relocation R_PPC64_GOT_PCREL34 for
PC Relative in LLD.
Reviewers: sfertile, MaskRay
Differential Revision: https://reviews.llvm.org/D81948
This is the followup to D77647 which implements handling for the new
R_AARCH64_PLT32 relocation type in lld. This relocation would benefit the
PIC-friendly vtables feature described in D72959.
Differential Revision: https://reviews.llvm.org/D81184
See D59553, https://lists.llvm.org/pipermail/llvm-dev/2020-May/141885.html and
https://sourceware.org/pipermail/binutils/2020-May/111357.html for
extensive discussions on a tombstone value.
See http://www.dwarfstd.org/ShowIssue.php?issue=200609.1
(Reserve an address value for "not present") for a DWARF enhancement proposal.
We resolve such relocations to a tombstone value to indicate that the address is invalid.
This solves several problems (the normal behavior is to resolve the relocation to the addend):
* For an empty function in a collected section, a pair of (0,0) can
terminate .debug_loc and .debug_ranges (as of binutils 2.34, GNU ld
resolves such a relocation to 1 to avoid the .debug_ranges issue)
* If DW_AT_high_pc is sufficiently large, the address range can collide
with a regular code range of low address (https://bugs.llvm.org/show_bug.cgi?id=41124 )
* If a text section is folded into another by ICF, we may leave entries
in multiple CUs claiming ownership of the same range of code, which can
confuse consumers.
* Debug information associated with COMDAT sections can have problems
similar to ICF, but is more complex - thus not addressed by this patch.
For pre-DWARF-v5 .debug_loc and .debug_ranges, a pair of 0 can terminate
entries (invalidating subsequent ranges).
-1 is a reserved value with special meaning (base address selection entry) which can't be used either.
Use -2 instead.
For all other .debug_*, use UINT32_MAX for 32-bit targets and UINT64_MAX
for 64-bit targets. In the code, we intentionally use
`uint64_t tombstone = UINT64_MAX` for 32-bit targets as well: this matches
SignExtend64 as used in `relocateAlloc`. (Actually UINT32_MAX does not work for R_386_32)
Note 0, we only special case `target->symbolicRel` (R_X86_64_64, R_AARCH64_ABS64, R_PPC64_ADDR64), not
short-range absolute relocations (e.g. R_X86_64_32). Only forms like DW_FORM_addr need to be special cased.
They can hold an arbitrary address (must be 64-bit on a 64-bit target). (In theory,
producers can make use of small code model to emit 32-bit relocations. This doesn't seem to be leveraged.)
Note 1, we have to ignore the addend, because we don't want to resolve
DW_AT_low_pc (which may have a non-zero addend) to -1+addend (wrap
around to a low address):
__attribute__((section(".text.x"))) void f1() { }
__attribute__((section(".text.x"))) void f2() { } // DW_AT_low_pc has a non-zero addend
Note 2, if the prevailing copy does not have debugging information while
a non-prevailing copy has (partial debug build), we don't do extra work
to attach debugging information to the prevailing definition. (clang
has a lot of debug info optimizations that are on-by-default that assume
the whole program is built with debug info).
clang -c -ffunction-sections a.cc # prevailing copy has no debug info
clang -c -ffunction-sections -g b.cc
Reviewed By: dblaikie, avl, jhenderson
Differential Revision: https://reviews.llvm.org/D81784
If neither AT(lma) nor AT>lma_region is specified,
D76995 keeps `lmaOffset` (LMA - VMA) if the previous section is in the
default LMA region.
This patch additionally checks that the two sections are in the same
memory region.
Add a test case derived from https://bugs.llvm.org/show_bug.cgi?id=45313
.mdata : AT(0xfb01000) { *(.data); } > TCM
// It is odd to make .bss inherit lmaOffset, because the two sections
// are in different memory regions.
.bss : { *(.bss) } > DDR
With this patch, section VMA/LMA match GNU ld. Note, GNU ld supports
out-of-order (w.r.t sh_offset) sections and places .text and .bss in the
same PT_LOAD. We don't have that behavior.
Reviewed By: grimar
Differential Revision: https://reviews.llvm.org/D81986
Fixes PR46348.
ObjFile<ELFT>::initializeSymbols contains two symbol iteration loops:
```
for each symbol
if non-inheriting && non-local
fill in this->symbols[i]
for each symbol
if local
fill in this->symbols[i]
else
symbol resolution
```
Symbol resolution can trigger a duplicate symbol error which will call
InputSectionBase::getObjMsg to iterate over InputFile::symbols. If a
non-local symbol appears after the non-local symbol being resolved
(violating ELF spec), its `this->symbols[i]` entry has not been filled
in, InputSectionBase::getObjMsg will crash due to
`dyn_cast<Defined>(nullptr)`.
To fix the bug, reorganize the two loops to ensure this->symbols is
complete before symbol resolution. This enforces the invariant:
InputFile::symbols has none null entry when InputFile::getSymbols() is called.
```
for each symbol
if non-inheriting
fill in this->symbols[i]
for each symbol starting from firstGlobal
if non-local
symbol resolution
```
Additionally, move the (non-local symbol in local part of .symtab)
diagnostic from Writer<ELFT>::copyLocalSymbols() to initializeSymbols().
Reviewed By: grimar, jhenderson
Differential Revision: https://reviews.llvm.org/D81988
A hasWildcard pattern iterates over symVector, which can be slow when there
are many --export-dynamic-symbol. In optimistic cases, most patterns don't use
a wildcard character. hasWildcard: false can avoid a symbol table iteration.
While here, add two tests using `[` and `?`, respectively.
This change introduces an LLD switch --thinlto-single-module to allow compiling only a part of the input modules. This is specifically enables:
1. Fast investigating/debugging modules of interest without spending time on compiling unrelated modules.
2. Compiler debug dump with -mllvm -debug-only= for specific modules.
It will be useful for large applications which has 1K+ input modules for thinLTO.
The switch can be combined with `--lto-obj-path=` or `--lto-emit-asm` to obtain intermediate object files or assembly files. So far the module name matching is implemented as a fuzzy name lookup where the modules with name containing the switch value are compiled.
E.g,
Command:
ld.lld main.o thin.a --thinlto-single-module=thin.a --lto-obj-path=single.o
log:
[ThinLTO] Selecting thin.a(thin1.o at 168) to compile
[ThinLTO] Selecting thin.a(thin2.o at 228) to compile
Command:
ld.lld main.o thin.a --thinlto-single-module=thin1.o --lto-obj-path=single.o
log:
[ThinLTO] Selecting thin.a(thin1.o at 168) to compile
Differential Revision: https://reviews.llvm.org/D80406
After D79300, we don't rewrite InputFile::mb to an empty buffer.
In thinLTOCreateEmptyIndexFiles(), we should check LazyObjFile::fetched
as well as checking whether mb is a bitcode, otherwise we would overwrite (path + .thinlto.bc) with an empty index.
Fixes PR45594.
In `ObjFile<ELFT>::initializeSymbols()`, for a defined symbol relative to
a discarded section (due to section group rules), it may have been
inserted as a lazy symbol. We need to demote it to an Undefined to
enable the `discarded section` error happened in a later pass.
Add `LazyObjFile::fetched` (if true) and `ArchiveFile::parsed` (if
false) to represent that there is an ongoing lazy symbol fetch and we
should replace the current lazy symbol with an Undefined, instead of
calling `Symbol::resolve` (`Symbol::resolve` should be called if the lazy
symbol was added by an unrelated archive/lazy object).
As a side result, one small issue in start-lib-comdat.s is now fixed.
The hack motivating D51892 will be unsupported: if
`.gnu.linkonce.t.__i686.get_pc_thunk.bx` in an archive is referenced
by another section, this will likely be errored unless the function is
also defined in a regular object file.
(Bringing back rL330869 would error `undefined symbol` instead of the
more relevant `discarded section`.)
Note, glibc i386's crti.o still works (PR31215), because
`.gnu.linkonce.t.__x86.get_pc_thunk.bx` is in crti.o (one of the first
regular object files in a linker command line).
Reviewed By: psmith
Differential Revision: https://reviews.llvm.org/D79300
If both a.a and b.so define foo
```
ld.bfd -u foo a.a b.so # foo is defined
ld.bfd a.a b.so -u foo # foo is defined
ld.bfd -u foo b.so a.a # foo is undefined (provided at runtime by b.so)
ld.bfd b.so a.a -u foo # foo is undefined (provided at runtime by b.so)
```
In all cases we make foo undefined in the output. I tend to think the
GNU ld behavior makes more sense.
* In their model, they have to treat -u as a fake object file with an
undefined symbol before all input files, otherwise the first archive would not be fetched.
* Following their behavior allows us to drop a --warn-backrefs special case.
Reviewed By: psmith
Differential Revision: https://reviews.llvm.org/D81052
--no-allow-shlib-undefined (enabled by default when linking an
executable) rejects unresolved references in shared objects.
Users may be confused by the common diagnostics of unresolved symbols in
object files (LLD: "undefined symbol: foo"; GNU ld/gold: "undefined reference to")
Learn from GCC/clang " [-Wfoo]": append the option name to the
diagnostics. Users can find relevant information by searching
"--no-allow-shlib-undefined". It should also be obvious to them that
the positive form --allow-shlib-undefined can suppress the error.
Also downgrade the error to a warning if --noinhibit-exec is used (compatible
with GNU ld and gold).
Reviewed By: grimar, psmith
Differential Revision: https://reviews.llvm.org/D81028
This patch adds clang options:
-fbasic-block-sections={all,<filename>,labels,none} and
-funique-basic-block-section-names.
LLVM Support for basic block sections is already enabled.
+ -fbasic-block-sections={all, <file>, labels, none} : Enables/Disables basic
block sections for all or a subset of basic blocks. "labels" only enables
basic block symbols.
+ -funique-basic-block-section-names: Enables unique section names for
basic block sections, disabled by default.
Differential Revision: https://reviews.llvm.org/D68049
GNU ld from binutils 2.35 onwards will likely support
--export-dynamic-symbol but with different semantics.
https://sourceware.org/pipermail/binutils/2020-May/111302.html
Differences:
1. -export-dynamic-symbol is not supported
2. --export-dynamic-symbol takes a glob argument
3. --export-dynamic-symbol can suppress binding the references to the definition within the shared object if (-Bsymbolic or -Bsymbolic-functions)
4. --export-dynamic-symbol does not imply -u
I don't think the first three points can affect any user.
For the fourth point, Not implying -u can lead to some archive members unfetched.
Add -u foo to restore the previous behavior.
Exact semantics:
* -no-pie or -pie: matched non-local defined symbols will be added to the dynamic symbol table.
* -shared: matched non-local STV_DEFAULT symbols will not be bound to definitions within the shared object
even if they would otherwise be due to -Bsymbolic, -Bsymbolic-functions, or --dynamic-list.
Reviewed By: psmith
Differential Revision: https://reviews.llvm.org/D80487
LLD supports both REL and RELA for static relocations, but emits either
of REL and RELA for dynamic relocations. The relocation entry format is
specified by each psABI.
musl ld.so supports both REL and RELA. For such ld.so implementations,
REL (.rel.dyn .rel.plt) has size benefits even if the psABI chooses RELA:
sizeof(Elf64_Rel)=16 < sizeof(Elf64_Rela)=24.
* COPY, GLOB_DAT and J[U]MP_SLOT always have 0 addend. A ld.so
implementation does not need to read the implicit addend.
REL is strictly better.
* A RELATIVE has a non-zero addend. Such relocations can be packed
compactly with the RELR relocation entry format, which is out of scope
of this patch.
* For other dynamic relocation types (e.g. symbolic relocation R_X86_64_64),
a ld.so implementation needs to read the implicit addend. REL may have
minor performance impact, because reading implicit addends forces
random access reads instead of being able to blast out a bunch of
writes while chasing the relocation array.
This patch adds -z rel and -z rela to change the relocation entry format
for dynamic relocations. I have tested that a -z rel produced x86-64
executable works with musl ld.so
-z rela may be useful for debugging purposes on processors whose psABIs
specify REL as the canonical format: addends can be easily read by a tool.
Reviewed By: grimar, mcgrathr
Differential Revision: https://reviews.llvm.org/D80496
In D34993, we discussed and concluded that we should drop `__real_
symbol from the symbol table, but I did the opposite in D50569.
This patch is to drop `__real_` symbol.
MaskRay's note: omitting `__real_` is important if it is undefined:
otherwise a subsequent link may error due to the undefined `__real_` in .dynsym
Differential Revision: https://reviews.llvm.org/D51283
Bazel created interface shared objects (.ifso) may be misaligned. We use
llvm::support::detail::packed_endian_specific_integral under the hood
which allows reading of misaligned values, so there is not a need to
diagnose (in LLD we don't intend to support sophisticated parsing for
SHT_GNU_*).
In the 64-bit ELF V2 API Specification: Power Architecture, 2.3.3.1. GPR
Save and Restore Functions defines some special functions which may be
referenced by GCC produced assembly (LLVM does not reference them).
With GCC -Os, when the number of call-saved registers exceeds a certain
threshold, GCC generates `_savegpr0_* _restgpr0_*` calls and expects the
linker to define them. See
https://sourceware.org/pipermail/binutils/2002-February/017444.html and
https://sourceware.org/pipermail/binutils/2004-August/036765.html . This
is weird because libgcc.a would be the natural place. However, the linker
generation approach has the advantage that the linker can generate
multiple copies to avoid long branch thunks. We don't consider the
advantage significant enough to complicate our trunk implementation, so
we take a simple approach.
* Check whether `_savegpr0_{14..31}` are used
* If yes, define needed symbols and add an InputSection with the code sequence.
`_savegpr1_*` `_restgpr0_*` and `_restgpr1_*` are similar.
Reviewed By: sfertile
Differential Revision: https://reviews.llvm.org/D79977
An undefined symbol in a shared object can be versioned, like `f@v1`.
We currently insert `f` as an Undefined into the symbol table, but we
should insert `f@v1` instead.
The string `v1` is inferred from SHT_GNU_versym and SHT_GNU_verneed.
This patch implements the functionality.
Failing to do this can cause two issues:
* If a versioned symbol referenced by a shared object is defined in the
executable, we will fail to export it.
* If a versioned symbol referenced by a shared object in another object
file, --no-allow-shlib-undefined may spuriously report an
"undefined reference to " error. See https://bugs.llvm.org/show_bug.cgi?id=44842
(Linking -lfftw3 -lm on Arch Linux can cause
`undefined reference to __log_finite`)
Reviewed By: grimar
Differential Revision: https://reviews.llvm.org/D80059
Note, we still name a preempted SharedSymbol "shared definition",
instead of "reference" as printed by GNU ld. This difference should not matter.
```
// GNU ld
ld.bfd: t: definition of f@v1
ld.bfd: t.so: reference to f@v1
```
Reviewed By: psmith
Differential Revision: https://reviews.llvm.org/D80143
This is fixing a thinLTO module collision issue for thin archives. The problem is that we always use a zero offset to name members in a thin archive and that causes the following build error:
ld.lld: error: Expected at most one ThinLTO module per bitcode file
which happens to a thin archive that has two members with the same object file name (whose paths will be ignored by thinLTO driver)
The fix here is to use real member offset instead as is done for non-thin archives.
Differential Revision: https://reviews.llvm.org/D79880
Announced on https://lists.llvm.org/pipermail/llvm-dev/2020-May/141416.html
Similar to D79371, but for `multiclass B` (convenience helper for defining --foo and --no-foo)
Some changed options are also used by gold, but I haven't seen their
one-dash use cases outside of lld's testsuite.
Both the .ARM.exidx and .eh_frame sections have a custom SyntheticSection
that acts as a container for the InputSections. The InputSections are added
to the SyntheticSection prior to /DISCARD/ which limits the affect a
/DISCARD/ can have to the whole SyntheticSection. In the majority of cases
this is sufficient as it is not common to discard subsets of the
InputSections. The Linux kernel has one of these scripts which has something
like:
/DISCARD/ : { *(.ARM.exidx.exit.text) *(.ARM.extab.exit.text) ... }
The .ARM.exidx.exit.text are not discarded because the InputSection has been
transferred to the Synthetic Section. The *(.ARM.extab.exit.text) sections
have not so they are discarded. When we come to write out the .ARM.exidx
sections the dangling references from .ARM.exidx.exit.text to
.ARM.extab.exit.text currently cause relocation out of range errors, but
could as easily cause a fatal error message if we check for dangling
references at relocation time.
This patch attempts to respect the /DISCARD/ command by running it on the
.ARM.exidx InputSections stored in the SyntheticSection.
The .eh_frame is in theory affected by this problem, but I don't think that
there is a dangling reference problem that can happen with these sections.
Fixes remaining part of pr44824
Differential Revision: https://reviews.llvm.org/D79687
For sampleFDO, because the optimized build uses profile generated from previous
release, often we couldn't tell a function without profile was truely cold or
just newly created so we had to treat them conservatively and put them in .text
section instead of .text.unlikely. The result was when we persue the best
performance by locking .text.hot and .text in memory, we wasted a lot of memory
to keep cold functions inside. This problem has been largely solved for regular
sampleFDO using profile-symbol-list (https://reviews.llvm.org/D66374), but for
the case when we use partial profile, we still waste a lot of memory because
of it.
In https://reviews.llvm.org/D62540, we propose to save functions with unknown
hotness information in a special section called ".text.unknown", so that
compiler will treat those functions as luck-warm, but runtime can choose not
to mlock the special section in memory or use other strategy to save memory.
That will solve most of the memory problem even if we use a partial profile.
The patch adds the support in lld for the special section.For sampleFDO,
because the optimized build uses profile generated from previous release,
often we couldn't tell a function without profile was truely cold or just
newly created so we had to treat them conservatively and put them in .text
section instead of .text.unlikely. The result was when we persue the best
performance by locking .text.hot and .text in memory, we wasted a lot of
memory to keep cold functions inside. This problem has been largely solved
for regular sampleFDO using profile-symbol-list
(https://reviews.llvm.org/D66374), but for the case when we use partial
profile, we still waste a lot of memory because of it.
In https://reviews.llvm.org/D62540, we propose to save functions with unknown
hotness information in a special section called ".text.unknown", so that
compiler will treat those functions as luck-warm, but runtime can choose not
to mlock the special section in memory or use other strategy to save memory.
That will solve most of the memory problem even if we use a partial profile.
The patch adds the support in lld for the special section.
Differential Revision: https://reviews.llvm.org/D79590
Essentially takes the lld/Common/Threads.h wrappers and moves them to
the llvm/Support/Paralle.h algorithm header.
The changes are:
- Remove policy parameter, since all clients use `par`.
- Rename the methods to `parallelSort` etc to match LLVM style, since
they are no longer C++17 pstl compatible.
- Move algorithms from llvm::parallel:: to llvm::, since they have
"parallel" in the name and are no longer overloads of the regular
algorithms.
- Add range overloads
- Use the sequential algorithm directly when 1 thread is requested
(skips task grouping)
- Fix the index type of parallelForEachN to size_t. Nobody in LLVM was
using any other parameter, and it made overload resolution hard for
for_each_n(par, 0, foo.size(), ...) because 0 is int, not size_t.
Remove Threads.h and update LLD for that.
This is a prerequisite for parallel public symbol processing in the PDB
library, which is in LLVM.
Reviewed By: MaskRay, aganea
Differential Revision: https://reviews.llvm.org/D79390