Summary:
This option causes lld to shuffle sections by assigning different
priorities in each run.
The use case for this is to introduce randomization in benchmarks. The
idea is inspired by the paper "Producing Wrong Data Without Doing
Anything Obviously Wrong!"
(https://www.inf.usi.ch/faculty/hauswirth/publications/asplos09.pdf). Unlike
the paper, we shuffle individual sections, not just input files.
Doing this in lld is particularly convenient as the --reproduce option
makes it easy to collect all the necessary bits for relinking the
program being benchmarked. Once that it is done, all that is needed is
to add --shuffle-sections=0 to the response file and relink before each
run of the benchmark.
Differential Revision: https://reviews.llvm.org/D74791
Summary:
Generate PAC protected plt only when "-z pac-plt" is passed to the
linker. GNU toolchain generates when it is explicitly requested[1].
When pac-plt is requested then set the GNU_PROPERTY_AARCH64_FEATURE_1_PAC
note even when not all function compiled with PAC but issue a warning.
Harmonizing the warning style for BTI/PAC/IBT.
Generate BTI protected PLT if case of "-z force-bti".
[1] https://www.sourceware.org/ml/binutils/2019-03/msg00021.html
Reviewers: peter.smith, espindola, MaskRay, grimar
Reviewed By: peter.smith, MaskRay
Subscribers: tatyana-krasnukha, emaste, arichardson, kristof.beyls, MaskRay, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D74537
Fixes https://bugs.llvm.org//show_bug.cgi?id=44878
When --strip-debug is specified, .debug* are removed from inputSections
while .rel[a].debug* (incorrectly) remain.
LinkerScript::addOrphanSections() requires the output section of a relocated
InputSectionBase to be created first.
.debug* are not in inputSections ->
output sections .debug* are not created ->
getOutputSectionName(.rel[a].debug*) dereferences a null pointer.
Fix the null pointer dereference by deleting .rel[a].debug* from inputSections as well.
Reviewed By: grimar, nickdesaulniers
Differential Revision: https://reviews.llvm.org/D74510
D43468+D44380 added INSERT [AFTER|BEFORE] for non-orphan sections. This patch
makes INSERT work for orphan sections as well.
`SECTIONS {...} INSERT [AFTER|BEFORE] .foo` does not set `hasSectionCommands`, so the result
will be similar to a regular link without a linker script. The differences when `hasSectionCommands` is set include:
* image base is different
* -z noseparate-code/-z noseparate-loadable-segments are unavailable
* some special symbols such as `_end _etext _edata` are not defined
The behavior is similar to GNU ld:
INSERT is not considered an external linker script.
This feature makes the section layout more flexible. It can be used to:
* Place .nv_fatbin before other readonly SHT_PROGBITS sections to mitigate relocation overflows.
* Disturb the layout to expose address sensitive application bugs.
Reviewed By: grimar
Differential Revision: https://reviews.llvm.org/D74375
This adds some of LLD specific scopes and picks up optimisation scopes
via LTO/ThinLTO. Makes use of TimeProfiler multi-thread support added in
77e6bb3c.
Differential Revision: https://reviews.llvm.org/D71060
This restores 59733525d3 (D71913), along
with bot fix 19c76989bb.
The bot failure should be fixed by D73418, committed as
af954e441a.
I also added a fix for non-x86 bot failures by requiring x86 in new test
lld/test/ELF/lto/devirt_vcall_vis_public.ll.
Summary:
Third part in series to support Safe Whole Program Devirtualization
Enablement, see RFC here:
http://lists.llvm.org/pipermail/llvm-dev/2019-December/137543.html
This patch adds type test metadata under -fwhole-program-vtables,
even for classes without hidden visibility. It then changes WPD to skip
devirtualization for a virtual function call when any of the compatible
vtables has public vcall visibility.
Additionally, internal LLVM options as well as lld and gold-plugin
options are added which enable upgrading all public vcall visibility
to linkage unit (hidden) visibility during LTO. This enables the more
aggressive WPD to kick in based on LTO time knowledge of the visibility
guarantees.
Support was added to all flavors of LTO WPD (regular, hybrid and
index-only), and to both the new and old LTO APIs.
Unfortunately it was not simple to split the first and second parts of
this part of the change (the unconditional emission of type tests and
the upgrading of the vcall visiblity) as I needed a way to upgrade the
public visibility on legacy WPD llvm assembly tests that don't include
linkage unit vcall visibility specifiers, to avoid a lot of test churn.
I also added a mechanism to LowerTypeTests that allows dropping type
test assume sequences we now aggressively insert when we invoke
distributed ThinLTO backends with null indexes, which is used in testing
mode, and which doesn't invoke the normal ThinLTO backend pipeline.
Depends on D71907 and D71911.
Reviewers: pcc, evgeny777, steven_wu, espindola
Subscribers: emaste, Prazek, inglorion, arichardson, hiraditya, MaskRay, dexonsmith, dang, davidxl, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D71913
I felt really sad to push this commit for my selfish purpose to make
glibc -static-pie build with lld. Some code constructs in glibc require
R_X86_64_GOTPCREL/R_X86_64_REX_GOTPCRELX referencing undefined weak to
be resolved to a GOT entry not relocated by R_X86_64_GLOB_DAT (GNU ld
behavior), e.g.
csu/libc-start.c
if (__pthread_initialize_minimal != NULL)
__pthread_initialize_minimal ();
elf/dl-object.c
void
_dl_add_to_namespace_list (struct link_map *new, Lmid_t nsid)
{
/* We modify the list of loaded objects. */
__rtld_lock_lock_recursive (GL(dl_load_write_lock));
Emitting a GLOB_DAT will make the address equal &__ehdr_start (true
value) and cause elf/ldconfig to segfault. glibc really should move away
from weak references, which do not have defined semantics.
Temporarily special case --no-dynamic-linker.
The --fix-cortex-a8 is sensitive to alignment and the precise destination
of branch instructions. These are not knowable at relocatable link time. We
follow GNU ld and the --fix-cortex-a53-843419 (D72968) by not patching the
code when there is a relocatable link.
Differential Revision: https://reviews.llvm.org/D73100
The code doesn't apply the fix correctly to relocatable links. I could
try to fix the code that applies the fix, but it's pointless: we don't
actually know what the offset will be in the final executable. So just
ignore the flag for relocatable links.
Issue discovered building Android.
Differential Revision: https://reviews.llvm.org/D72968
Suggested by Peter Collingbourne.
Non-VER_NDX_GLOBAL versions should not be assigned to defined symbols. --exclude-libs violates this and can cause a spurious error "cannot refer to absolute symbol" after D71795.
excludeLibs incorrectly assigns VER_NDX_LOCAL to an undefined weak symbol =>
isPreemptible is false =>
R_PLT_PC is optimized to R_PC =>
in isStaticLinkTimeConstant, an error is emitted.
Reviewed By: pcc, grimar
Differential Revision: https://reviews.llvm.org/D72681
This patch is a joint work by Rui Ueyama and me based on D58102 by Xiang Zhang.
It adds Intel CET (Control-flow Enforcement Technology) support to lld.
The implementation follows the draft version of psABI which you can
download from https://github.com/hjl-tools/x86-psABI/wiki/X86-psABI.
CET introduces a new restriction on indirect jump instructions so that
you can limit the places to which you can jump to using indirect jumps.
In order to use the feature, you need to compile source files with
-fcf-protection=full.
* IBT is enabled if all input files are compiled with the flag. To force enabling ibt, pass -z force-ibt.
* SHSTK is enabled if all input files are compiled with the flag, or if -z shstk is specified.
IBT-enabled executables/shared objects have two PLT sections, ".plt" and
".plt.sec". For the details as to why we have two sections, please read
the comments.
Reviewed By: xiangzhangllvm
Differential Revision: https://reviews.llvm.org/D59780
One instance looks like a false positive:
lld/ELF/Relocations.cpp:1622:14: note: use reference type 'const std::pair<ThunkSection *, uint32_t> &' (aka 'cons
t pair<lld:🧝:ThunkSection *, unsigned int> &') to prevent copying
for (const std::pair<ThunkSection *, uint32_t> ts : isd->thunkSections)
It is not changed in this commit.
D62381 introduced forEachSymbol(). It seems that many call sites cannot
be parallelized because the body shared some states. Replace
forEachSymbol with iterator_range<filter_iterator<...>> symbols() to
simplify code and improve debuggability (std::function calls take some
frames).
It also allows us to use early return to simplify code added in D69650.
Reviewed By: grimar
Differential Revision: https://reviews.llvm.org/D70505
In GNU ld, -Ttext sets the address of the .text section and -Ttext-segment sets the address of the text segment (RX).
gold only supports the -Ttext-segment semantic and treats -Ttext as an alias for -Ttext-segment.
lld only supports the -Ttext semantic and treats -Ttext-segment as an
alias for -Ttext. The text segment will be assigned to an address less
than the specified -Ttext-segment value.
This patch drops the -Ttext-segment alias.
The text segment is traditionally the first segment. Users who specify
-Ttext-segment may actually want to specify --image-base, the lld way to
express this. Unfortunately currently this is supported by GNU ld's
COFF port but not by its ELF port. gold does not support this option.
With -z separate-code, the behavior of GNU ld -Ttext-segment is weird (see https://sourceware.org/bugzilla/show_bug.cgi?id=25207)
rL289827 introduced the alias for linking qemu's non-pie user mode
binaries. As explained previously, this actually assigns the text
segment to an address less than 0x60000000. I feel that a better fix is
on the qemu side:
https://lists.nongnu.org/archive/html/qemu-devel/2019-11/msg02480.html
Reviewed By: grimar, ruiu
Differential Revision: https://reviews.llvm.org/D70468
Remove the lld::enableColors function, as it just obscures which
stream it's affecting, and replace with explicit calls to the stream's
enable_colors.
Also, assign the stderrOS and stdoutOS globals first in link function,
just to ensure nothing might use them.
(Either change individually fixes the issue of using the old
stream, but both together seems best.)
Follow-up to b11386f9be.
Differential Revision: https://reviews.llvm.org/D70492
This change is for those who use lld as a library. Context:
https://reviews.llvm.org/D70287
This patch adds a new parmeter to lld::*::link() so that we can pass
an raw_ostream object representing stdout. Previously, lld::*::link()
took only an stderr object.
Justification for making stdoutOS and stderrOS mandatory: I wanted to
make link() functions to take stdout and stderr in that order.
However, if we change the function signature from
bool link(ArrayRef<const char *> args, bool canExitEarly,
raw_ostream &stderrOS = llvm::errs());
to
bool link(ArrayRef<const char *> args, bool canExitEarly,
raw_ostream &stdoutOS = llvm::outs(),
raw_ostream &stderrOS = llvm::errs());
, then the meaning of existing code that passes stderrOS silently
changes (stderrOS would be interpreted as stdoutOS). So, I chose to
make existing code not to compile, so that developers can fix their
code.
Differential Revision: https://reviews.llvm.org/D70292
Summary:
Add a flag `F_no_mmap` to `FileOutputBuffer` to support
`--[no-]mmap-output-file` in ELF LLD. LLD currently explicitly ignores
this flag for compatibility with GNU ld and gold.
We need this flag to speed up link time for large binaries in certain
scenarios. When we link some of our larger binaries we find that LLD
takes 50+ GB of memory, which causes memory pressure. The memory
pressure causes the VM to flush dirty pages of the output file to disk.
This is normally okay, since we should be flushing cold pages. However,
when using BtrFS with compression we need to write 128KB at a time when
we flush a page. If any page in that 128KB block is written again, then
it must be flushed a second time, and so on. Since LLD doesn't write
sequentially this causes write amplification. The same 128KB block will
end up being flushed multiple times, causing the linker to many times
more IO than necessary. We've observed 3-5x faster builds with
-no-mmap-output-file when we hit this scenario.
The bad scenario only applies to compressed filesystems, which group
together multiple pages into a single compressed block. I've tested
BtrFS, but the problem will be present for any compressed filesystem
on Linux, since it is caused by the VM.
Silently ignoring --no-mmap-output-file caused a silent regression when
we switched from gold to lld. We pass --no-mmap-output-file to fix this
edge case, but since lld silently ignored the flag we didn't realize it
wasn't being respected.
Benchmark building a 9 GB binary that exposes this edge case. I linked 3
times with --mmap-output-file and 3 times with --no-mmap-output-file and
took the average. The machine has 24 cores @ 2.4 GHz, 112 GB of RAM,
BtrFS mounted with -compress-force=zstd, and an 80% full disk.
| Mode | Time |
|---------|-------|
| mmap | 894 s |
| no mmap | 126 s |
When compression is disabled, BtrFS performs just as well with and
without mmap on this benchmark.
I was unable to reproduce the regression with any binaries in
lld-speed-test.
Reviewed By: ruiu, MaskRay
Differential Revision: https://reviews.llvm.org/D69294
Add a new '-z nognustack' option that suppresses emitting PT_GNU_STACK
segment. This segment is not supported at all on NetBSD (stack is
always non-executable), and the option is meant to be used to disable
emitting it.
Differential Revision: https://reviews.llvm.org/D56554
The combination of the two flags doesn't make sense. And other linkers
seem to just ignore --export-dynamic if --relocatable is given, but
we probably should report it as an error to let users know that is
an invalid combination.
Fixes https://bugs.llvm.org/show_bug.cgi?id=43552
Differential Revision: https://reviews.llvm.org/D68441
llvm-svn: 374022
This makes it clear `ELF/**/*.cpp` files define things in the `lld::elf`
namespace and simplifies `elf::foo` to `foo`.
Reviewed By: atanasyan, grimar, ruiu
Differential Revision: https://reviews.llvm.org/D68323
llvm-svn: 373885
D64906 allows PT_LOAD to have overlapping p_offset ranges. In the
default R RX RW RW layout + -z noseparate-code case, we do not tail pad
segments when transiting to another segment. This can save at most
3*maxPageSize bytes.
a) Before D64906, we tail pad R, RX and the first RW.
b) With -z separate-code, we tail pad R and RX, but not the first RW (RELRO).
In some cases, b) saves one file page. In some cases, b) wastes one
virtual memory page. The waste is a concern on Fuchsia. Because it uses
compressed binaries, it doesn't benefit from the saved file page.
This patch adds -z separate-loadable-segments to restore the behavior before
D64906. It can affect section addresses and can thus be used as a
debugging mechanism (see PR43214 and ld.so partition bug in
crbug.com/998712).
Reviewed By: jakehehrlich, ruiu
Differential Revision: https://reviews.llvm.org/D67481
llvm-svn: 372807
Summary:
When support for ThinLTO was first added to lld, the options that
control it were prefixed with --plugin-opt= for compatibility with
an existing implementation as a linker plugin. This change enables
shorter versions of the options to be used, as follows:
New Existing
-thinlto-emit-imports-files --plugin-opt=thinlto-emit-imports-files
-thinlto-index-only --plugin-opt=thinlto-index-only
-thinlto-index-only= --plugin-opt=thinlto-index-only=
-thinlto-object-suffix-replace= --plugin-opt=thinlto-object-suffix-replace=
-thinlto-prefix-replace= --plugin-opt=thinlto-prefix-replace=
-lto-obj-path= --plugin-opt=obj-path=
The options with the --plugin-opt= prefix have been retained as aliases
for the shorter variants so that they continue to be accepted.
Reviewers: tejohnson, ruiu, espindola
Reviewed By: ruiu
Subscribers: emaste, arichardson, MaskRay, steven_wu, dexonsmith, arphaman, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D67782
llvm-svn: 372798
Fixes PR38748
mergeSections() calls getOutputSectionName() to get output section
names. Two MergeInputSections may be merged even if they are made
different by SECTIONS commands.
This patch moves mergeSections() after processSectionCommands() and
addOrphanSections() to fix the issue. The new pass is renamed to
OutputSection::finalizeInputSections().
processSectionCommands() and addorphanSections() are changed to add
sections to InputSectionDescription::sectionBases.
finalizeInputSections() merges MergeInputSections and migrates
`sectionBases` to `sections`.
For the -r case, we drop an optimization that tries keeping sh_entsize
non-zero. This is for the simplicity of addOrphanSections(). The
updated merge-entsize2.s reflects the change.
Reviewed By: grimar
Differential Revision: https://reviews.llvm.org/D67504
llvm-svn: 372734
The --fix-cortex-a8 option implements a linker workaround for the
coretex-a8 erratum 657417. A summary of the erratum conditions is:
- A 32-bit Thumb-2 branch instruction B.w, Bcc.w, BL, BLX spans two
4KiB regions.
- The destination of the branch is to the first 4KiB region.
- The instruction before the branch is a 32-bit Thumb-2 non-branch
instruction.
The linker fix is to redirect the branch to a patch not in the first
4KiB region. The patch forwards the branch on to its target.
The cortex-a8, is an old CPU, with the first implementation of this
workaround in ld.bfd appearing in 2009. The cortex-a8 has been used in
early Android Phones and there are some critical applications that still
need to run on a cortex-a8 that have the erratum. The patch is applied
roughly 10 times on LLD and 20 on Clang when they are built with
--fix-cortex-a8 on an Arm system.
The formal erratum description is avaliable in the ARM Core Cortex-A8
(AT400/AT401) Errata Notice document. This is available from Arm on
request but it seems to be findable via a web search.
Differential Revision: https://reviews.llvm.org/D67284
llvm-svn: 371965
-z undefs is the inverse of -z defs. It allows unresolved references
from object files. This can be used to cancel --no-undefined or -z defs.
Reviewed By: ruiu
Differential Revision: https://reviews.llvm.org/D67479
llvm-svn: 371715
Recommit r370635 (reverted by r371202), with one change: move addOrphanSections() before ICF.
Before, orphan sections in two different partitions may be folded and
moved to the main partition.
Now, InputSection->OutputSection assignment for orphans happens before
ICF. ICF does not fold input sections with different output sections.
With the PR43241 reproduce,
`llvm-objcopy --extract-partition libvr.so libchrome__combined.so libvr.so` => no error
Updated description:
Fixes PR39418. Complements D47241 (the non-linker-script case).
processSectionCommands() assigns input sections to output sections.
ICF is called before it, so .text.foo and .text.bar may be folded even if
their output sections are made different by SECTIONS commands.
```
markLive<ELFT>()
doIcf<ELFT>() // During ICF, we don't know the output sections
writeResult()
combineEhSections<ELFT>()
script->processSectionCommands() // InputSection -> OutputSection assignment
```
This patch splits processSectionCommands() into processSectionCommands()
and processSymbolAssignments(), and moves
processSectionCommands()/addOrphanSections() before ICF:
```
markLive<ELFT>()
combineEhSections<ELFT>()
script->processSectionCommands()
script->addOrphanSections();
doIcf<ELFT>() // should remove folded input sections
writeResult()
script->processSymbolAssignments()
```
An alternative approach is to unfold a section `sec` in
processSectionCommands() when we find `sec` and `sec->repl` belong to
different output sections. I feel this patch is superior because this
can fold more sections and the decouple of
SectionCommand/SymbolAssignment gives flexibility:
* An ExprValue can't be evaluated before its section is assigned to an
output section -> we can delete getOutputSectionVA and simplify
another place where we had to check if the output section is null.
Moreover, a case in linkerscript/early-assign-symbol.s can be handled
now.
* processSectionCommands/processSymbolAssignments can be freely moved
around.
llvm-svn: 371216
Fixes PR39418. Complements D47241 (the non-linker-script case).
processSectionCommands() assigns input sections to output sections.
ICF is called before it, so .text.foo and .text.bar may be folded even if
their output sections are made different by SECTIONS commands.
```
markLive<ELFT>()
doIcf<ELFT>() // During ICF, we don't know the output sections
writeResult()
combineEhSections<ELFT>()
script->processSectionCommands() // InputSection -> OutputSection assignment
```
This patch splits processSectionCommands() into processSectionCommands() and
processSymbolAssignments(), and moves processSectionCommands() before ICF:
```
markLive<ELFT>()
combineEhSections<ELFT>()
script->processSectionCommands()
doIcf<ELFT>() // should remove folded input sections
writeResult()
script->processSymbolAssignments()
```
An alternative approach is to unfold a section `sec` in
processSectionCommands() when we find `sec` and `sec->repl` belong to
different output sections. I feel this patch is superior because this
can fold more sections and the decouple of
SectionCommand/SymbolAssignment gives flexibility:
* An ExprValue can't be evaluated before its section is assigned to an
output section -> we can delete getOutputSectionVA and simplify
another place where we had to check if the output section is null.
Moreover, a case in linkerscript/early-assign-symbol.s can be handled
now.
* processSectionCommands/processSymbolAssignments can be freely moved
around.
Reviewed By: ruiu
Differential Revision: https://reviews.llvm.org/D66717
llvm-svn: 370635
--strip-all suppresses the creation of in.symtab
This can cause a null pointer dereference in OutputSection::finalize()
// --emit-relocs => copyRelocs is true
if (!config->copyRelocs || (type != SHT_RELA && type != SHT_REL))
return;
...
link = in.symTab->getParent()->sectionIndex; // in.symTab is null
Let's just disallow the combination. In some cases the combination can
cause GNU linkers to fail:
* ld.bfd: final link failed: invalid operation
* gold: internal error in set_no_output_symtab_entry, at ../../gold/object.h:1814
Reviewed By: ruiu
Differential Revision: https://reviews.llvm.org/D66704
llvm-svn: 369878
We prioritize non-* wildcards overs VER_NDX_LOCAL/VER_NDX_GLOBAL "*".
This patch generalizes the rule to "*" of other versions and thus fixes PR40176.
I don't feel strongly about this GNU linkers' behavior but the
generalization simplifies code.
Delete `config->defaultSymbolVersion` which was used to special case
VER_NDX_LOCAL/VER_NDX_GLOBAL "*".
In `SymbolTable::scanVersionScript`, custom versions are handled the same
way as VER_NDX_LOCAL/VER_NDX_GLOBAL. So merge
`config->versionScript{Locals,Globals}` into `config->versionDefinitions`.
Overall this seems to simplify the code.
In `SymbolTable::assign{Exact,Wildcard}Versions`,
`sym->verdefIndex == config->defaultSymbolVersion` is changed to
`verdefIndex == UINT32_C(-1)`.
This allows us to give duplicate assignment diagnostics for
`{ global: foo; };` `V1 { global: foo; };`
In test/linkerscript/version-script.s:
vs_index of an undefined symbol changes from 0 to 1. This doesn't matter (arguably 1 is better because the binding is STB_GLOBAL) because vs_index of an undefined symbol is ignored.
Reviewed By: ruiu
Differential Revision: https://reviews.llvm.org/D65716
llvm-svn: 367869