This restores 59733525d3 (D71913), along
with bot fix 19c76989bb.
The bot failure should be fixed by D73418, committed as
af954e441a.
I also added a fix for non-x86 bot failures by requiring x86 in new test
lld/test/ELF/lto/devirt_vcall_vis_public.ll.
* Generalize the code added in D70637 and D70937. We should eventually remove the EM_MIPS special case.
* Handle R_PPC_LOCAL24PC the same way as R_PPC_REL24.
Reviewed By: Bdragon28
Differential Revision: https://reviews.llvm.org/D73424
-fno-pie produces a pair of non-GOT-non-PLT relocations R_PPC_ADDR16_{HA,LO} (R_ABS) referencing external
functions.
```
lis 3, func@ha
la 3, func@l(3)
```
In a -no-pie/-pie link, if func is not defined in the executable, a canonical PLT entry (st_value>0, st_shndx=0) will be needed.
References to func in shared objects will be resolved to this address.
-fno-pie -pie should fail with "can't create dynamic relocation ... against ...", so we just need to think about -no-pie.
On x86, the PLT entry passes the JMP_SLOT offset to the rtld PLT resolver.
On x86-64: the PLT entry passes the JUMP_SLOT index to the rtld PLT resolver.
On ARM/AArch64: the PLT entry passes &.got.plt[n]. The PLT header passes &.got.plt[fixed-index]. The rtld PLT resolver can compute the JUMP_SLOT index from the two addresses.
For these targets, the canonical PLT entry can just reuse the regular PLT entry (in PltSection).
On PPC32: PltSection (.glink) consists of `b PLTresolve` instructions and `PLTresolve`. The rtld PLT resolver depends on r11 having been set up to the .plt (GotPltSection) entry.
On PPC64 ELFv2: PltSection (.glink) consists of `__glink_PLTresolve` and `bl __glink_PLTresolve`. The rtld PLT resolver depends on r12 having been set up to the .plt (GotPltSection) entry.
We cannot reuse a `b PLTresolve`/`bl __glink_PLTresolve` in PltSection as a canonical PLT entry. PPC64 ELFv2 avoids the problem by using TOC for any external reference, even in non-pic code, so the canonical PLT entry scenario should not happen in the first place.
For PPC32, we have to create a PLT call stub as the canonical PLT entry. The code sequence sets up r11.
Reviewed By: Bdragon28
Differential Revision: https://reviews.llvm.org/D73399
Symbol information can be used to improve out-of-range/misalignment diagnostics.
It also helps R_ARM_CALL/R_ARM_THM_CALL which has different behaviors with different symbol types.
There are many (67) relocateOne() call sites used in thunks, {Arm,AArch64}errata, PLT, etc.
Rename them to `relocateNoSym()` to be clearer that there is no symbol information.
Reviewed By: grimar, peter.smith
Differential Revision: https://reviews.llvm.org/D73254
Summary:
Third part in series to support Safe Whole Program Devirtualization
Enablement, see RFC here:
http://lists.llvm.org/pipermail/llvm-dev/2019-December/137543.html
This patch adds type test metadata under -fwhole-program-vtables,
even for classes without hidden visibility. It then changes WPD to skip
devirtualization for a virtual function call when any of the compatible
vtables has public vcall visibility.
Additionally, internal LLVM options as well as lld and gold-plugin
options are added which enable upgrading all public vcall visibility
to linkage unit (hidden) visibility during LTO. This enables the more
aggressive WPD to kick in based on LTO time knowledge of the visibility
guarantees.
Support was added to all flavors of LTO WPD (regular, hybrid and
index-only), and to both the new and old LTO APIs.
Unfortunately it was not simple to split the first and second parts of
this part of the change (the unconditional emission of type tests and
the upgrading of the vcall visiblity) as I needed a way to upgrade the
public visibility on legacy WPD llvm assembly tests that don't include
linkage unit vcall visibility specifiers, to avoid a lot of test churn.
I also added a mechanism to LowerTypeTests that allows dropping type
test assume sequences we now aggressively insert when we invoke
distributed ThinLTO backends with null indexes, which is used in testing
mode, and which doesn't invoke the normal ThinLTO backend pipeline.
Depends on D71907 and D71911.
Reviewers: pcc, evgeny777, steven_wu, espindola
Subscribers: emaste, Prazek, inglorion, arichardson, hiraditya, MaskRay, dexonsmith, dang, davidxl, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D71913
I felt really sad to push this commit for my selfish purpose to make
glibc -static-pie build with lld. Some code constructs in glibc require
R_X86_64_GOTPCREL/R_X86_64_REX_GOTPCRELX referencing undefined weak to
be resolved to a GOT entry not relocated by R_X86_64_GLOB_DAT (GNU ld
behavior), e.g.
csu/libc-start.c
if (__pthread_initialize_minimal != NULL)
__pthread_initialize_minimal ();
elf/dl-object.c
void
_dl_add_to_namespace_list (struct link_map *new, Lmid_t nsid)
{
/* We modify the list of loaded objects. */
__rtld_lock_lock_recursive (GL(dl_load_write_lock));
Emitting a GLOB_DAT will make the address equal &__ehdr_start (true
value) and cause elf/ldconfig to segfault. glibc really should move away
from weak references, which do not have defined semantics.
Temporarily special case --no-dynamic-linker.
These functions call relocateOne(). This patch is a prerequisite for
making relocateOne() aware of `Symbol` (D73254).
Reviewed By: grimar, nickdesaulniers
Differential Revision: https://reviews.llvm.org/D73250
Summary:
Linker scripts allow filenames to be put in double quotes to prevent
characters in filenames that are part of the linker script syntax from
having their special meaning. Case in point the * wildcard character.
Availability of double quoting filenames also allows to fix a failure in
ELF/linkerscript/filename-spec.s when the path contain a @ which the
lexer consider as a special characters and thus break up a filename
containing it. This may happens under Jenkins which createspath such as
pipeline@2.
To avoid the need for escaping GlobPattern metacharacters in filename
in double quotes, GlobPattern::create is augmented with a new parameter
to request literal matching instead of relying on the presence of a
wildcard character in the pattern.
Reviewers: jhenderson, MaskRay, evgeny777, espindola, alexshap
Reviewed By: MaskRay
Subscribers: peter.smith, grimar, ruiu, emaste, arichardson, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D72517
The --fix-cortex-a8 is sensitive to alignment and the precise destination
of branch instructions. These are not knowable at relocatable link time. We
follow GNU ld and the --fix-cortex-a53-843419 (D72968) by not patching the
code when there is a relocatable link.
Differential Revision: https://reviews.llvm.org/D73100
Add new method getFirstInputSection and use instead of getInputSections
where appropriate to avoid creation of an unneeded vector of input
sections.
Differential Revision: https://reviews.llvm.org/D73047
The INPUT_SECTION_FLAGS linker script command is used to constrain the
section pattern matching to sections that match certain combinations of
flags.
There are two ways to express the constraint.
withFlags: Section must have these flags.
withoutFlags: Section must not have these flags.
The syntax of the command is:
INPUT_SECTION_FLAGS '(' sect_flag_list ')'
sect_flag_list: NAME
| sect_flag_list '&' NAME
Where NAME matches a section flag name such as SHF_EXECINSTR, or the
integer value of a section flag. If the first character of NAME is ! then
it means must not contain flag.
We do not support the rare case of { INPUT_SECTION_FLAGS(flags) filespec }
where filespec has no input section description like (.text).
As an example from the ld man page:
SECTIONS {
.text : { INPUT_SECTION_FLAGS (SHF_MERGE & SHF_STRINGS) *(.text) }
.text2 : { INPUT_SECTION_FLAGS (!SHF_WRITE) *(.text) }
}
.text will match sections called .text that have both the SHF_MERGE and
SHF_STRINGS flag.
.text2 will match sections called .text that don't have the SHF_WRITE flag.
The flag names accepted are the generic to all targets and SHF_ARM_PURECODE
as it is very useful to filter all the pure code sections into a single
program header that can be marked execute never.
fixes PR44265
Differential Revision: https://reviews.llvm.org/D72756
Summary:
Unlike R_RISCV_RELAX, which is a linker hint, R_RISCV_ALIGN requires the
support of the linker even when ignoring all R_RISCV_RELAX relocations.
This is because the compiler emits as many NOPs as may be required for
the requested alignment, more than may be required pre-relaxation, to
allow for the target becoming more unaligned after relaxing earlier
sequences. This means that the target is often not initially aligned in
the object files, and so the R_RISCV_ALIGN relocations cannot just be
ignored. Since we do not support linker relaxation, we must turn these
into errors.
Reviewers: ruiu, MaskRay, espindola
Reviewed By: MaskRay
Subscribers: grimar, Jim, emaste, arichardson, asb, rbar, johnrusso, simoncook, sabuasal, niosHD, kito-cheng, shiva0217, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, rkruppe, PkmX, jocewei, psnobl, benna, lenary, s.egerton, pzheng, sameer.abuasal, apazos, luismarques, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71820
The code doesn't apply the fix correctly to relocatable links. I could
try to fix the code that applies the fix, but it's pointless: we don't
actually know what the offset will be in the final executable. So just
ignore the flag for relocatable links.
Issue discovered building Android.
Differential Revision: https://reviews.llvm.org/D72968
This essentially reverts b841e119d7.
Such code construct can be used in the following way:
// glibc/stdlib/exit.c
// clang -fuse-ld=lld => succeeded
// clang -fuse-ld=lld -fpie -pie => relocation R_PLT_PC cannot refer to absolute symbol
__attribute__((weak, visibility("hidden"))) extern void __call_tls_dtors();
void __run_exit_handlers() {
if (__call_tls_dtors)
__call_tls_dtors();
}
Since we allow R_PLT_PC in -no-pie mode, it makes sense to allow it in
-pie mode as well.
Reviewed By: pcc
Differential Revision: https://reviews.llvm.org/D72943
In D71281 a fix was put in to round up the size of a ThunkSection to the
nearest 4KiB when performing errata patching. This fixed a problem with a
very large instrumented program that had thunks and patches mutually
trigger each other. Unfortunately it triggers an assertion failure in an
AArch64 allyesconfig build of the kernel. There is a specific assertion
preventing an InputSectionDescription being larger than 4KiB. This will
always trigger if there is at least one Thunk needed in that
InputSectionDescription, which is possible for an allyesconfig build.
Abstractly the problem case is:
.text : {
*(.text) ;
...
. = ALIGN(SZ_4K);
__idmap_text_start = .;
*(.idmap.text)
__idmap_text_end = .;
...
}
The assertion checks that __idmap_text_end - __idmap_start is < 4 KiB.
Note that there is more than one InputSectionDescription in the
OutputSection so we can't just restrict the fix to OutputSections smaller
than 4 KiB.
The fix presented here limits the D71281 to InputSectionDescriptions that
meet the following conditions:
1.) The OutputSection is bigger than the thunkSectionSpacing so adding
thunks will affect the addresses of following code.
2.) The InputSectionDescription is larger than 4 KiB. This will prevent
any assertion failures that an InputSectionDescription is < 4 KiB
in size.
We do this at ThunkSection creation time as at this point we know that
the addresses are stable and up to date prior to adding the thunks as
assignAddresses() will have been called immediately prior to thunk
generation.
The fix reverts the two tests affected by D71281 to their original state
as they no longer need the 4KiB size roundup. I've added simpler tests to
check for D71281 when the OutputSection size is larger than the ThunkSection
spacing.
Fixes https://github.com/ClangBuiltLinux/linux/issues/812
Differential Revision: https://reviews.llvm.org/D72344
`{clang,gcc} -nostdlib -r a.c` passes --dynamic-linker to the linker,
and the expected behavior is to ignore it.
If .interp is kept in the relocatable object file, a final link will get
PT_INTERP even if --dynamic-linker is not specified. glibc ld.so expects
to see PT_DYNAMIC and the executable will likely fail to run.
Ignore --dynamic-linker in -r mode as well as -shared.
ThunkSection contains 4-byte instructions on all targets that use
thunks. Thunks should not be used in any performance sensitive places,
and locality/cache line/instruction fetching arguments should not apply.
We use 16 bytes as preferred function alignments for modern PowerPC cores.
In any case, 8 is not optimal.
Differential Revision: https://reviews.llvm.org/D72819
Moved the section name check ahead of any filename matching or
exclusion. Firstly, this reduces the need to retrieve the filename and
secondly, reduces the amount of potentially expensive filename pattern
matching if such rules are present in the linker script.
The impact of this change is particularly significant when linking
objects built with -ffunction-sections and -fstack-size-section, using a
linker script that includes non-trivial filename patterns. In a number
of such cases, the link time is halved.
Differential Revision: https://reviews.llvm.org/D72775
This assertion was added as part of D70659 but did not account for .bss
input sections. I noticed that this assert was incorrectly triggering
while building FreeBSD for MIPS64. Fixed by relaxing the assert to also
account for SHT_NOBITS input sections and adjust the test
mips-jalr-non-function.s to link a file with a .bss section first.
Reviewed By: MaskRay, grimar
Differential Revision: https://reviews.llvm.org/D72567
R_HINT is ignored like R_NONE. There are no strong reasons to keep
R_HINT. The largest RelExpr member R_RISCV_PC_INDIRECT is 60 now.
Differential Revision: https://reviews.llvm.org/D71822
Suggested by Peter Collingbourne.
Non-VER_NDX_GLOBAL versions should not be assigned to defined symbols. --exclude-libs violates this and can cause a spurious error "cannot refer to absolute symbol" after D71795.
excludeLibs incorrectly assigns VER_NDX_LOCAL to an undefined weak symbol =>
isPreemptible is false =>
R_PLT_PC is optimized to R_PC =>
in isStaticLinkTimeConstant, an error is emitted.
Reviewed By: pcc, grimar
Differential Revision: https://reviews.llvm.org/D72681
This patch is a joint work by Rui Ueyama and me based on D58102 by Xiang Zhang.
It adds Intel CET (Control-flow Enforcement Technology) support to lld.
The implementation follows the draft version of psABI which you can
download from https://github.com/hjl-tools/x86-psABI/wiki/X86-psABI.
CET introduces a new restriction on indirect jump instructions so that
you can limit the places to which you can jump to using indirect jumps.
In order to use the feature, you need to compile source files with
-fcf-protection=full.
* IBT is enabled if all input files are compiled with the flag. To force enabling ibt, pass -z force-ibt.
* SHSTK is enabled if all input files are compiled with the flag, or if -z shstk is specified.
IBT-enabled executables/shared objects have two PLT sections, ".plt" and
".plt.sec". For the details as to why we have two sections, please read
the comments.
Reviewed By: xiangzhangllvm
Differential Revision: https://reviews.llvm.org/D59780
RELA targets don't read initial .got.plt entries.
REL targets (ARM, x86-32) write the address of the IFUNC resolver to the
entry (`write32le(buf, s.getVA())`).
The default writeIgotPlt() is not meaningful. Make it a no-op. AArch64
and x86-64 will have 0 as initial .got.plt entries associated with
IFUNC.
Reviewed By: peter.smith
Differential Revision: https://reviews.llvm.org/D72474
down to pass builder in ltobackend.
Currently CodeGenOpts like UnrollLoops/VectorizeLoop/VectorizeSLP in clang
are not passed down to pass builder in ltobackend when new pass manager is
used. This is inconsistent with the behavior when new pass manager is used
and thinlto is not used. Such inconsistency causes slp vectorization pass
not being enabled in ltobackend for O3 + thinlto right now. This patch
fixes that.
Differential Revision: https://reviews.llvm.org/D72386
An undefined weak does not fetch the lazy definition. A lazy weak symbol
should be considered undefined, and thus preemptible if .dynsym exists.
D71795 is not quite an NFC. It errors on an R_X86_64_PLT32 referencing
an undefined weak symbol. isPreemptible is false (incorrect) => R_PLT_PC
is optimized to R_PC => in isStaticLinkTimeConstant, an error is emitted
when an R_PC is applied on an undefined weak (considered absolute).
Weak undefined symbols are preemptible after D71794.
if (sym.isPreemptible)
return false;
if (!config->isPic)
return true;
// isPic means includeInDynsym is true after D71794.
...
// We can delete this if because it can never be true.
if (sym.isUndefWeak)
return true;
Differential Revision: https://reviews.llvm.org/D71795
D59275 added the following clause to Symbol::includeInDynsym()
if (isUndefWeak() && Config->Pie && SharedFiles.empty())
return false;
D59549 explored the possibility to generalize it for -no-pie.
GNU ld's rules are architecture dependent and partly controlled by -z
{,no-}dynamic-undefined-weak. Our attempts to mimic its rules are
actually half-baked and don't provide perceivable benefits (it can save
a few more weak undefined symbols in .dynsym in a -static-pie
executable). Let's just delete the rule for simplicity. We will expect
cosmetic inconsistencies with ld.bfd in certain -static-pie scenarios.
This permits a simplification in D71795.
Reviewed By: peter.smith
Differential Revision: https://reviews.llvm.org/D71794
In AArch64 a branch to an undefined weak symbol that does not have a PLT
entry should resolve to the next instruction. The thunk generation code
can prevent this from happening as a range extension thunk can be generated
if the branch is sufficiently far away from 0, the value of an undefined
weak symbol.
The fix is taken from the Arm implementation of needsThunk(), we prevent a
thunk from being generated to an undefined weak symbol.
fixes pr44451
Differential Revision: https://reviews.llvm.org/D72267
```
lld/ELF/Relocations.cpp:1622:56: warning: loop variable 'ts' of type 'const std::pair<ThunkSection *, uint32_t>' (aka 'const pair<lld:🧝:ThunkSection *, unsigned int>') creates a copy from type 'const std::pair<ThunkSection *, uint32_t>' [-Wrange-loop-analysis]
for (const std::pair<ThunkSection *, uint32_t> ts : isd->thunkSections)
```
Drop const qualifier to fix -Wrange-loop-analysis.
We can make -Wrange-loop-analysis warnings (DiagnoseForRangeConstVariableCopies) on `const A` more
permissive on more types (e.g. POD -> trivially copyable), unfortunately it will not make std::pair
good, because `constexpr pair& operator=(const pair& p);` is unfortunately user-defined.
Reviewed By: Mordante
Differential Revision: https://reviews.llvm.org/D72211
One instance looks like a false positive:
lld/ELF/Relocations.cpp:1622:14: note: use reference type 'const std::pair<ThunkSection *, uint32_t> &' (aka 'cons
t pair<lld:🧝:ThunkSection *, unsigned int> &') to prevent copying
for (const std::pair<ThunkSection *, uint32_t> ts : isd->thunkSections)
It is not changed in this commit.
GCC before r245813 (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79439)
did not emit nop after b/bl. This can happen with recursive calls.
r245813 was back ported to GCC 5.5 and GCC 6.4.
This is common, for example, libstdc++.a(locale.o) shipped with GCC 4.9
and many objects in netlib lapack can cause lld to error. gold allows
such calls to the same section. Our __plt_foo symbol's `section` field
is used for ThunkSection, so we can't implement a similar loosen rule
easily. But we can make use of its `file` field which is currently NULL.
Differential Revision: https://reviews.llvm.org/D71639
Similar to D71509 (EM_PPC64), on EM_PPC, the IPLT code sequence should
be similar to a PLT call stub. Unlike EM_PPC64, EM_PPC -msecure-plt has
small/large PIC model differences.
* -fpic/-fpie: R_PPC_PLTREL24 r_addend=0. The call stub loads an address relative to `_GLOBAL_OFFSET_TABLE_`.
* -fPIC/-fPIE: R_PPC_PLTREL24 r_addend=0x8000. (A partial linked object
file may have an addend larger than 0x8000.) The call stub loads an address relative to .got2+0x8000.
Just assume large PIC model for now. This patch makes:
// clang -fuse-ld=lld -msecure-plt -fno-pie -no-pie a.c
// clang -fuse-ld=lld -msecure-plt -fPIE -pie a.c
#include <stdio.h>
static void impl(void) { puts("meow"); }
void thefunc(void) __attribute__((ifunc("resolver")));
void *resolver(void) { return &impl; }
int main(void) {
thefunc();
void (*theptr)(void) = &thefunc;
theptr();
}
work on Linux glibc. -fpie will crash because the compiler and the
linker do not agree on the value which r30 stores (_GLOBAL_OFFSET_TABLE_
vs .got2+0x8000).
Differential Revision: https://reviews.llvm.org/D71621
Non-preemptible IFUNC are placed in in.iplt (.glink on EM_PPC64). If
there is a non-GOT non-PLT relocation, for pointer equality, we change
the type of the symbol from STT_IFUNC and STT_FUNC and bind it to the
.glink entry.
On EM_386, EM_X86_64, EM_ARM, and EM_AARCH64, the PLT code sequence
loads the address from its associated .got.plt slot. An IPLT also has an
associated .got.plt slot and can use the same code sequence.
On EM_PPC64, the PLT code sequence is actually a bl instruction in
.glink . It jumps to `__glink_PLTresolve` (the PLT header). and
`__glink_PLTresolve` computes the .plt slot (relocated by
R_PPC64_JUMP_SLOT).
An IPLT does not have an associated R_PPC64_JUMP_SLOT, so we cannot use
`bl` in .iplt . Instead, create a call stub which has a similar code
sequence as PPC64PltCallStub. We don't save the TOC pointer, so such
scenarios will not work: a function pointer to a non-preemptible ifunc,
which resolves to a function defined in another DSO. This is the
restriction described by https://sourceware.org/glibc/wiki/GNU_IFUNC
(though on many architectures it works in practice):
Requirement (a): Resolver must be defined in the same translation unit as the implementations.
If an ifunc is taken address but not called, technically we don't need
an entry for it, but we currently do that.
This patch makes
// clang -fuse-ld=lld -fno-pie -no-pie a.c
// clang -fuse-ld=lld -fPIE -pie a.c
#include <stdio.h>
static void impl(void) { puts("meow"); }
void thefunc(void) __attribute__((ifunc("resolver")));
void *resolver(void) { return &impl; }
int main(void) {
thefunc();
void (*theptr)(void) = &thefunc;
theptr();
}
work on Linux glibc and FreeBSD. Calling a function pointer pointing to
a Non-preemptible IFUNC never worked before.
Differential Revision: https://reviews.llvm.org/D71509
This restores commit 1417558e4a and its follow-up, reverted by commit c3dbd782f1.
After this commit:
clang -fuse-ld=bfd -no-pie -nostdlib a.c => .interp not created
clang -fuse-ld=bfd -pie -fPIE -nostdlib a.c => .interp created
clang -fuse-ld=gold -no-pie -nostdlib a.c => .interp not created
clang -fuse-ld=gold -pie -fPIE -nostdlib a.c => .interp created
clang -fuse-ld=lld -no-pie -nostdlib a.c => .interp created
clang -fuse-ld=lld -pie -fPIE -nostdlib a.c => .interp created
This reverts commit 1417558e4a.
Also reverts commit 019a92bb28.
This causes check-sanitizer to fail. The "-Nolib" variant of the test
crashes on startup in the loader.
Similar to rL362355, but with the `!config->shared` guard.
(1) {gcc,clang} -fuse-ld=bfd -pie -fPIE -nostdlib a.c => .interp created
(2) {gcc,clang} -fuse-ld=lld -pie -fPIE -nostdlib a.c => .interp not created
(3) {gcc,clang} -fuse-ld=lld -pie -fPIE -nostdlib a.c a.so => .interp created
The inconsistency of (2) is due to the condition `!Config->SharedFiles.empty()`.
To make lld behave more like ld.bfd, we could change the condition to:
config->hasDynSymTab && !config->dynamicLinker.empty() && script->needsInterpSection();
However, that would bring another inconsistency as can be observed with:
(4) {gcc,clang} -fuse-ld=bfd -no-pie -nostdlib a.c => .interp not created
Linux powerpc discards `*(.gnu.version*)` (arch/powerpc/kernel/vmlinux.lds.S)
to suppress --orphan-handling=warn warnings in the -pie output `.tmp_vmlinux1`
The support is simple. Just add isLive() to:
1) Fix an assertion in SectionBase::getPartition() called by VersionTableSection::isNeeded().
2) Suppress DT_VERSYM, DT_VERDEF, DT_VERNEED and DT_VERNEEDNUM, if the relevant section is discarded.
Reviewed By: grimar
Differential Revision: https://reviews.llvm.org/D71819
For undef-not-suggest.test, we currently make redundant alternative
spelling suggestions:
```
ld.lld: error: relocation refers to a discarded section: .text.foo
>>> defined in a.o
>>> section group signature: foo
>>> prevailing definition is in a.o
>>> referenced by a.o:(.rodata+0x0)
>>> did you mean:
>>> defined in: a.o
ld.lld: error: relocation refers to a symbol in a discarded section: foo
>>> defined in a.o
>>> section group signature: foo
>>> prevailing definition is in a.o
>>> referenced by a.o:(.rodata+0x8)
>>> did you mean: for
>>> defined in: a.o
```
Reviewed By: grimar, ruiu
Differential Revision: https://reviews.llvm.org/D71735
Summary:
If none of the input files are ELF object files (for example, when
generating an object file from a single binary input file via
"-b binary"), use a fallback value for the ELF header flags instead
of crashing with an assertion failure.
Reviewers: MaskRay, ruiu, espindola
Reviewed By: MaskRay, ruiu
Subscribers: kevans, grimar, emaste, arichardson, asb, rbar, johnrusso, simoncook, sabuasal, niosHD, kito-cheng, shiva0217, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, rkruppe, PkmX, jocewei, psnobl, benna, Jim, lenary, s.egerton, pzheng, sameer.abuasal, apazos, luismarques, llvm-commits, jrtc27
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71101
GNU ld creates the synthetic section .iplt, and has a built-in linker
script that assigns .iplt to the output section .plt . There is no
output section named .iplt .
Making .iplt an output section actually has a benefit that makes the
tricky toolchain feature stand out. Symbolizers don't have to deal with
mixed PLT entries (e.g. llvm-objdump -d incorrectly annotates such jump
targets).
On EM_PPC{,64}, .glink contains a PLT resolver and a series of jump
instructions. The 4-byte entry size makes it unnecessary to have an
alignment of 16.
Mark ppc32-gnu-ifunc.s and ppc32-gnu-ifunc-nonpreemptable.s as `XFAIL: *`.
They test IPLT on EM_PPC, which never works.
Reviewed By: peter.smith
Differential Revision: https://reviews.llvm.org/D71520
PltSection is used by both PLT and IPLT. The PLT section may have a
header while the IPLT section does not. Split off IpltSection from
PltSection to be clearer.
Unlike other targets, PPC64 cannot use the same code sequence for PLT
and IPLT. This helps make a future PPC64 patch (D71509) more isolated.
On EM_386 and EM_X86_64, when PLT is empty while IPLT is not, currently
we are inconsistent whether the PLT header is conceptually attached to
in.plt or in.iplt . Consistently attach the header to in.plt can make
the -z retpolineplt logic simpler. It also makes `jmp` point to an
aesthetically better place for non-retpolineplt cases.
Reviewed By: grimar, ruiu
Differential Revision: https://reviews.llvm.org/D71519
This change only affects EM_386. relOff can be computed from `index`
easily, so it is unnecessarily passed as a parameter.
Both in.plt and in.iplt entries are written by writePLT. For in.iplt,
the instruction `push reloc_offset` will change because `index` is now
different. Fortunately, this does not matter because `push; jmp` is only
used by PLT. IPLT does not need the code sequence.
Reviewed By: grimar, ruiu
Differential Revision: https://reviews.llvm.org/D71518
This reverts commit 2bbd32f5e8, it was
causing UBSan failures like the following:
lld/ELF/Target.cpp:103:41: runtime error: applying non-zero offset 24 to null pointer
When a common symbol is merged with a shared symbol, increase st_size if
the shared symbol has a larger st_size. At runtime, the executable's
symbol overrides the shared symbol. The shared symbol may be created
from common symbols in a previous link. This rule makes sure we pick
the largest size among all common symbols.
This behavior matches GNU ld. See
https://sourceware.org/bugzilla/show_bug.cgi?id=25236 for discussions.
A shared symbol does not hold alignment constraints. Ignore the
alignment update.
Reviewed By: peter.smith
Differential Revision: https://reviews.llvm.org/D71161
Summary:
So far it seems like the only test affected by this change is the one I
recently added for R_MIPS_JALR relocations since the other test cases that
use this function early (unknown-relocation-*) do not have a valid input
section for the relocation offset.
Reviewers: ruiu, grimar, MaskRay, espindola
Reviewed By: ruiu, MaskRay
Subscribers: emaste, sdardis, jrtc27, atanasyan, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70659
On some edge cases such as Chromium compiled with full instrumentation we
have a .text section over twice the size of the maximum branch range and
the instrumented code generation containing many examples of the erratum
sequence. The combination of Thunks and many erratum sequences causes
finalizeAddressDependentContent() to not converge. We end up with:
start
- Thunk Creation (disturbs addresses after thunks, creating more patches)
- Patch Creation (disturbs addresses after patches, creating more thunks)
- goto start
In most images with few thunks and patches the mutual disturbance does not
cause convergence problems. As the .text size and number of patches go up
the risk increases.
A way to prevent the thunk creation from interfering with patch creation is
to round up the size of the thunks to a 4KiB boundary when the
erratum patch is enabled. As the erratum sequence only triggers when an
instruction sequence starts at 0xff8 or 0xffc modulo (4 KiB) by making the
thunks not affect addresses modulo (4 KiB) we prevent thunks from
interfering with the patch.
The patches themselves could be aggregated in the same way that Thunks are
within ThunkSections and we could round up the size in the same way. This
would reduce the number of patches created in a .text section size >
128 MiB but would not likely help convergence problems.
Differential Revision: https://reviews.llvm.org/D71281
fixes (remaining part of) pr44071, other part in D71242
The code to insert patch section merges them with a comparison function that
uses logic of the form:
return (isa<PatchSection>(a) && !isa<PatchSection>(b));
If the PatchSections don't implement classof this check fails if b is also
a SyntheticSection. This can result in the patches being out of range if
the SyntheticSection is big, for example a ThunkSection with lots of thunks.
Differential Revision: https://reviews.llvm.org/D71242
fixes (part of) pr44071
clang/gcc -fdebug-type-sections places .debug_types and
.rela.debug_types in a section group, with a signature symbol which
represents the type signature. The section group is for deduplication
purposes.
After D70146, we will discard such section groups. Refine the rule so
that we will retain the group if no member has the SHF_ALLOC flag.
GNU ld has a similar rule to retain the group if all members have the
SEC_DEBUGGING flag. We try to be more general for future-proof purposes:
if other non-SHF_ALLOC sections have deduplication needs, they may be
placed in a section group. Don't discard them.
Reviewed By: grimar
Differential Revision: https://reviews.llvm.org/D71157
Fixes PPC64 part of PR40438
// clang -target ppc64le -c a.cc
// .text.unlikely may be placed in a separate output section (via -z keep-text-section-prefix)
// The distance between bar in .text.unlikely and foo in .text may be larger than 32MiB.
static void foo() {}
__attribute__((section(".text.unlikely"))) static int bar() { foo(); return 0; }
__attribute__((used)) static int dummy = bar();
This patch makes such thunks with addends work for PPC64.
AArch64: .text -> `__AArch64ADRPThunk_ (adrp x16, ...; add x16, x16, ...; br x16)` -> target
PPC64: .text -> `__long_branch_ (addis 12, 2, ...; ld 12, ...(12); mtctr 12; bctr)` -> target
AArch64 can leverage ADRP to jump to the target directly, but PPC64
needs to load an address from .branch_lt . Before Power ISA v3.0, the
PC-relative ADDPCIS was not available. .branch_lt was invented to work
around the limitation.
Symbol::ppc64BranchltIndex is replaced by
PPC64LongBranchTargetSection::entry_index which take addends into
consideration.
The tests are rewritten: ppc64-long-branch.s tests -no-pie and
ppc64-long-branch-pi.s tests -pie and -shared.
Reviewed By: sfertile
Differential Revision: https://reviews.llvm.org/D70937
replaceWithDefined is used by canonical PLT and copy relocations, which
imply that the symbol is preemptable. ppc64BranchltIndex is only used by
non-preemptable cases, and it can only be the default value in
replaceWithDefined.
The .note.gnu.property SHT_NOTE sections on AArch64 (a 64-bit target)
should have alignment 8 to more closely match the binutils implementation
where alignment is 4-bytes on 32-bit machines and 8-bytes on 64-bit
machines.
Previously LLD was using 4 for both 32-bit and 64-bit machines.
Differential Revision: https://reviews.llvm.org/D70962
The PT_GNU_PROPERTY program header describes the location of the
.note.gnu.property SHT_NOTES section. The linux kernel uses this program
header to find the .note.gnu.property section rather than parsing.
Executables that have properties that the kernel needs to act on that don't
have the PT_GNU_PROPERTY program header will not boot.
Differential Revision: https://reviews.llvm.org/D70961
Fixes AArch64 part of PR40438
The current range extension thunk framework does not handle a relocation
relative to a STT_SECTION symbol with a non-zero addend, which may be
used by jumps/calls to local functions on some RELA targets (AArch64,
powerpc ELFv1, powerpc64 ELFv2, etc). See PR40438 and the following
code for examples:
// clang -target $target a.cc
// .text.cold may be placed in a separate output section.
// The distance between bar in .text.cold and foo in .text may be larger than 128MiB.
static void foo() {}
__attribute__((section(".text.cold"))) static int bar() { foo(); return
0; }
__attribute__((used)) static int dummy = bar();
This patch makes such thunks with addends work for AArch64. The target
independent part can be reused by PPC in the future.
On REL targets (ARM, MIPS), jumps/calls are not represented as
STT_SECTION + non-zero addend (see
MCELFObjectTargetWriter::needsRelocateWithSymbol), so they don't need
this feature, but we need to make sure this patch does not affect them.
Reviewed By: peter.smith
Differential Revision: https://reviews.llvm.org/D70637
ThunkCreator::getThunk and ThunkCreator::normalizeExistingThunk
currently assume that the implicit addends are -8 for ARM and -4 for
Thumb. In D70637, ThunkCreator::getThunk will need to take care of the
relocation addend explicitly.
Add the utility function getPCBias() as a prerequisite so that the getThunk change in D70637
can be more general.
Reviewed By: peter.smith
Differential Revision: https://reviews.llvm.org/D70690
D62381 introduced forEachSymbol(). It seems that many call sites cannot
be parallelized because the body shared some states. Replace
forEachSymbol with iterator_range<filter_iterator<...>> symbols() to
simplify code and improve debuggability (std::function calls take some
frames).
It also allows us to use early return to simplify code added in D69650.
Reviewed By: grimar
Differential Revision: https://reviews.llvm.org/D70505
Currently LLD always use zlib compression level 6.
This patch changes it to use 1 for -O0, -O1 and 6 for -O2.
It fixes https://bugs.llvm.org/show_bug.cgi?id=44089.
There was also a thread in llvm-dev on this topic:
https://lists.llvm.org/pipermail/llvm-dev/2018-August/125020.html
Here is a table with results of building clang mentioned there:
```
Level Time Size
0 0m17.128s 2045081496 Z_NO_COMPRESSION
1 0m31.471s 922618584 Z_BEST_SPEED
2 0m32.659s 903642376
3 0m36.749s 890805856
4 0m41.532s 876697184
5 0m48.383s 862778576
6 1m3.176s 855251640 Z_DEFAULT_COMPRESSION
7 1m15.335s 853755920
8 2m0.561s 852497560
9 2m33.972s 852397408 Z_BEST_COMPRESSION
```
It shows that it is probably not reasonable to use values greater than 6.
Differential revision: https://reviews.llvm.org/D70658
In GNU ld, -Ttext sets the address of the .text section and -Ttext-segment sets the address of the text segment (RX).
gold only supports the -Ttext-segment semantic and treats -Ttext as an alias for -Ttext-segment.
lld only supports the -Ttext semantic and treats -Ttext-segment as an
alias for -Ttext. The text segment will be assigned to an address less
than the specified -Ttext-segment value.
This patch drops the -Ttext-segment alias.
The text segment is traditionally the first segment. Users who specify
-Ttext-segment may actually want to specify --image-base, the lld way to
express this. Unfortunately currently this is supported by GNU ld's
COFF port but not by its ELF port. gold does not support this option.
With -z separate-code, the behavior of GNU ld -Ttext-segment is weird (see https://sourceware.org/bugzilla/show_bug.cgi?id=25207)
rL289827 introduced the alias for linking qemu's non-pie user mode
binaries. As explained previously, this actually assigns the text
segment to an address less than 0x60000000. I feel that a better fix is
on the qemu side:
https://lists.nongnu.org/archive/html/qemu-devel/2019-11/msg02480.html
Reviewed By: grimar, ruiu
Differential Revision: https://reviews.llvm.org/D70468
Remove the lld::enableColors function, as it just obscures which
stream it's affecting, and replace with explicit calls to the stream's
enable_colors.
Also, assign the stderrOS and stdoutOS globals first in link function,
just to ensure nothing might use them.
(Either change individually fixes the issue of using the old
stream, but both together seems best.)
Follow-up to b11386f9be.
Differential Revision: https://reviews.llvm.org/D70492
Summary:
Current versions of clang would erroneously emit this relocation not only
against functions (loaded from the GOT) but also against data symbols
(e.g. a table of function pointers). LLD was then changing this into a
branch-and-link instruction, causing the program to jump to the data
symbol at run time. I discovered this problem when attempting to boot
MIPS64 FreeBSD after updating the to the latest upstream master.
Reviewers: atanasyan, jrtc27, espindola
Reviewed By: atanasyan
Subscribers: emaste, sdardis, krytarowski, MaskRay, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70406
Based on D70020 by serge-sans-paille.
The ELF spec says:
> Furthermore, there may be internal references among these sections that would not make sense if one of the sections were removed or replaced by a duplicate from another object. Therefore, such groups must be included or omitted from the linked object as a unit. A section cannot be a member of more than one group.
GNU ld has 2 behaviors that we don't have:
- Group members (nextInSectionGroup != nullptr) are subject to garbage collection.
This includes non-SHF_ALLOC SHT_NOTE sections.
In particular, discarding non-SHF_ALLOC SHT_NOTE sections is an expected behavior by the Annobin
project. See
https://developers.redhat.com/blog/2018/02/20/annobin-storing-information-binaries/
for more information.
- Groups members are retained or discarded as a unit.
Members may have internal references that are not expressed as
SHF_LINK_ORDER, relocations, etc. It seems that we should be more conservative here:
if a section is marked live, mark all the other member within the
group.
Both behaviors are reasonable. This patch implements them.
A new field InputSectionBase::nextInSectionGroup tracks the next member
within a group. on ELF64, this increases sizeof(InputSectionBase) froms
144 to 152.
InputSectionBase::dependentSections tracks section dependencies, which
is used by both --gc-sections and /DISCARD/. We can't overload it for
the "next member" semantic, because we should allow /DISCARD/ to discard
sections independent of --gc-sections (GNU ld behavior). This behavior
may be reasonably used by `/DISCARD/ : { *(.ARM.exidx*) }` or `/DISCARD/
: { *(.note*) }` (new test `linkerscript/discard-group.s`).
Reviewed By: ruiu
Differential Revision: https://reviews.llvm.org/D70146
This change is for those who use lld as a library. Context:
https://reviews.llvm.org/D70287
This patch adds a new parmeter to lld::*::link() so that we can pass
an raw_ostream object representing stdout. Previously, lld::*::link()
took only an stderr object.
Justification for making stdoutOS and stderrOS mandatory: I wanted to
make link() functions to take stdout and stderr in that order.
However, if we change the function signature from
bool link(ArrayRef<const char *> args, bool canExitEarly,
raw_ostream &stderrOS = llvm::errs());
to
bool link(ArrayRef<const char *> args, bool canExitEarly,
raw_ostream &stdoutOS = llvm::outs(),
raw_ostream &stderrOS = llvm::errs());
, then the meaning of existing code that passes stderrOS silently
changes (stderrOS would be interpreted as stdoutOS). So, I chose to
make existing code not to compile, so that developers can fix their
code.
Differential Revision: https://reviews.llvm.org/D70292
The patch in https://reviews.llvm.org/D64077 causes a build failure
because both the Defined and SharedSymbol classes are bigger than 80
bytes on MinGW 8.
This patch fixes this build failure by changing the type of the
bitfields. It is a similar change to the bitfield changes in
https://reviews.llvm.org/D64238, but instead of changing to bool I
decided to use uint8_t because one of the bitfields takes up two bits
instead of one.
Note: the patch is slightly different from the one reviewed in
Phabricator, but it is a trivial change to align it with LLVM master
instead of LLVM 9. Also, it passes all lld tests.
Differential Revision: https://reviews.llvm.org/D70266
The definition may be mangled while an undefined reference is not.
This may come up when (1) the reference is from a C file or (2) the definition
misses an extern "C".
(2) is more common. Suggest an arbitrary mangled name that matches the
undefined reference, if such a definition exists.
ld.lld: error: undefined symbol: foo
>>> referenced by a.o:(.text+0x1)
>>> did you mean to declare foo(int) as extern "C"?
>>> defined in: a1.o
Reviewed By: dblaikie, ruiu
Differential Revision: https://reviews.llvm.org/D69650
When missing an extern "C" declaration, an undefined reference may be
mangled while the definition is not. Suggest the missing
extern "C" and the base name.
Reviewed By: ruiu
Differential Revision: https://reviews.llvm.org/D69592
The logic added in r372781 caused ARMExidxSyntheticSection::addSection()
to return false for exidx sections without a link order dep that passed
isValidExidxSectionDep(). This included exidx sections for empty functions. As
a result, such exidx sections would end up treated like ordinary sections and
would end up being laid out before the ARMExidxSyntheticSection, most likely in
the wrong order relative to the exidx entries in the ARMExidxSyntheticSection,
breaking the orderedness invariant relied upon by unwinders. Fix this by
simply discarding such sections.
Differential Revision: https://reviews.llvm.org/D69744
Summary:
Add a flag `F_no_mmap` to `FileOutputBuffer` to support
`--[no-]mmap-output-file` in ELF LLD. LLD currently explicitly ignores
this flag for compatibility with GNU ld and gold.
We need this flag to speed up link time for large binaries in certain
scenarios. When we link some of our larger binaries we find that LLD
takes 50+ GB of memory, which causes memory pressure. The memory
pressure causes the VM to flush dirty pages of the output file to disk.
This is normally okay, since we should be flushing cold pages. However,
when using BtrFS with compression we need to write 128KB at a time when
we flush a page. If any page in that 128KB block is written again, then
it must be flushed a second time, and so on. Since LLD doesn't write
sequentially this causes write amplification. The same 128KB block will
end up being flushed multiple times, causing the linker to many times
more IO than necessary. We've observed 3-5x faster builds with
-no-mmap-output-file when we hit this scenario.
The bad scenario only applies to compressed filesystems, which group
together multiple pages into a single compressed block. I've tested
BtrFS, but the problem will be present for any compressed filesystem
on Linux, since it is caused by the VM.
Silently ignoring --no-mmap-output-file caused a silent regression when
we switched from gold to lld. We pass --no-mmap-output-file to fix this
edge case, but since lld silently ignored the flag we didn't realize it
wasn't being respected.
Benchmark building a 9 GB binary that exposes this edge case. I linked 3
times with --mmap-output-file and 3 times with --no-mmap-output-file and
took the average. The machine has 24 cores @ 2.4 GHz, 112 GB of RAM,
BtrFS mounted with -compress-force=zstd, and an 80% full disk.
| Mode | Time |
|---------|-------|
| mmap | 894 s |
| no mmap | 126 s |
When compression is disabled, BtrFS performs just as well with and
without mmap on this benchmark.
I was unable to reproduce the regression with any binaries in
lld-speed-test.
Reviewed By: ruiu, MaskRay
Differential Revision: https://reviews.llvm.org/D69294
Add a new '-z nognustack' option that suppresses emitting PT_GNU_STACK
segment. This segment is not supported at all on NetBSD (stack is
always non-executable), and the option is meant to be used to disable
emitting it.
Differential Revision: https://reviews.llvm.org/D56554