D62381 introduced forEachSymbol(). It seems that many call sites cannot
be parallelized because the body shared some states. Replace
forEachSymbol with iterator_range<filter_iterator<...>> symbols() to
simplify code and improve debuggability (std::function calls take some
frames).
It also allows us to use early return to simplify code added in D69650.
Reviewed By: grimar
Differential Revision: https://reviews.llvm.org/D70505
Currently LLD always use zlib compression level 6.
This patch changes it to use 1 for -O0, -O1 and 6 for -O2.
It fixes https://bugs.llvm.org/show_bug.cgi?id=44089.
There was also a thread in llvm-dev on this topic:
https://lists.llvm.org/pipermail/llvm-dev/2018-August/125020.html
Here is a table with results of building clang mentioned there:
```
Level Time Size
0 0m17.128s 2045081496 Z_NO_COMPRESSION
1 0m31.471s 922618584 Z_BEST_SPEED
2 0m32.659s 903642376
3 0m36.749s 890805856
4 0m41.532s 876697184
5 0m48.383s 862778576
6 1m3.176s 855251640 Z_DEFAULT_COMPRESSION
7 1m15.335s 853755920
8 2m0.561s 852497560
9 2m33.972s 852397408 Z_BEST_COMPRESSION
```
It shows that it is probably not reasonable to use values greater than 6.
Differential revision: https://reviews.llvm.org/D70658
In GNU ld, -Ttext sets the address of the .text section and -Ttext-segment sets the address of the text segment (RX).
gold only supports the -Ttext-segment semantic and treats -Ttext as an alias for -Ttext-segment.
lld only supports the -Ttext semantic and treats -Ttext-segment as an
alias for -Ttext. The text segment will be assigned to an address less
than the specified -Ttext-segment value.
This patch drops the -Ttext-segment alias.
The text segment is traditionally the first segment. Users who specify
-Ttext-segment may actually want to specify --image-base, the lld way to
express this. Unfortunately currently this is supported by GNU ld's
COFF port but not by its ELF port. gold does not support this option.
With -z separate-code, the behavior of GNU ld -Ttext-segment is weird (see https://sourceware.org/bugzilla/show_bug.cgi?id=25207)
rL289827 introduced the alias for linking qemu's non-pie user mode
binaries. As explained previously, this actually assigns the text
segment to an address less than 0x60000000. I feel that a better fix is
on the qemu side:
https://lists.nongnu.org/archive/html/qemu-devel/2019-11/msg02480.html
Reviewed By: grimar, ruiu
Differential Revision: https://reviews.llvm.org/D70468
Remove the lld::enableColors function, as it just obscures which
stream it's affecting, and replace with explicit calls to the stream's
enable_colors.
Also, assign the stderrOS and stdoutOS globals first in link function,
just to ensure nothing might use them.
(Either change individually fixes the issue of using the old
stream, but both together seems best.)
Follow-up to b11386f9be.
Differential Revision: https://reviews.llvm.org/D70492
Summary:
Current versions of clang would erroneously emit this relocation not only
against functions (loaded from the GOT) but also against data symbols
(e.g. a table of function pointers). LLD was then changing this into a
branch-and-link instruction, causing the program to jump to the data
symbol at run time. I discovered this problem when attempting to boot
MIPS64 FreeBSD after updating the to the latest upstream master.
Reviewers: atanasyan, jrtc27, espindola
Reviewed By: atanasyan
Subscribers: emaste, sdardis, krytarowski, MaskRay, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70406
Based on D70020 by serge-sans-paille.
The ELF spec says:
> Furthermore, there may be internal references among these sections that would not make sense if one of the sections were removed or replaced by a duplicate from another object. Therefore, such groups must be included or omitted from the linked object as a unit. A section cannot be a member of more than one group.
GNU ld has 2 behaviors that we don't have:
- Group members (nextInSectionGroup != nullptr) are subject to garbage collection.
This includes non-SHF_ALLOC SHT_NOTE sections.
In particular, discarding non-SHF_ALLOC SHT_NOTE sections is an expected behavior by the Annobin
project. See
https://developers.redhat.com/blog/2018/02/20/annobin-storing-information-binaries/
for more information.
- Groups members are retained or discarded as a unit.
Members may have internal references that are not expressed as
SHF_LINK_ORDER, relocations, etc. It seems that we should be more conservative here:
if a section is marked live, mark all the other member within the
group.
Both behaviors are reasonable. This patch implements them.
A new field InputSectionBase::nextInSectionGroup tracks the next member
within a group. on ELF64, this increases sizeof(InputSectionBase) froms
144 to 152.
InputSectionBase::dependentSections tracks section dependencies, which
is used by both --gc-sections and /DISCARD/. We can't overload it for
the "next member" semantic, because we should allow /DISCARD/ to discard
sections independent of --gc-sections (GNU ld behavior). This behavior
may be reasonably used by `/DISCARD/ : { *(.ARM.exidx*) }` or `/DISCARD/
: { *(.note*) }` (new test `linkerscript/discard-group.s`).
Reviewed By: ruiu
Differential Revision: https://reviews.llvm.org/D70146
This change is for those who use lld as a library. Context:
https://reviews.llvm.org/D70287
This patch adds a new parmeter to lld::*::link() so that we can pass
an raw_ostream object representing stdout. Previously, lld::*::link()
took only an stderr object.
Justification for making stdoutOS and stderrOS mandatory: I wanted to
make link() functions to take stdout and stderr in that order.
However, if we change the function signature from
bool link(ArrayRef<const char *> args, bool canExitEarly,
raw_ostream &stderrOS = llvm::errs());
to
bool link(ArrayRef<const char *> args, bool canExitEarly,
raw_ostream &stdoutOS = llvm::outs(),
raw_ostream &stderrOS = llvm::errs());
, then the meaning of existing code that passes stderrOS silently
changes (stderrOS would be interpreted as stdoutOS). So, I chose to
make existing code not to compile, so that developers can fix their
code.
Differential Revision: https://reviews.llvm.org/D70292
The patch in https://reviews.llvm.org/D64077 causes a build failure
because both the Defined and SharedSymbol classes are bigger than 80
bytes on MinGW 8.
This patch fixes this build failure by changing the type of the
bitfields. It is a similar change to the bitfield changes in
https://reviews.llvm.org/D64238, but instead of changing to bool I
decided to use uint8_t because one of the bitfields takes up two bits
instead of one.
Note: the patch is slightly different from the one reviewed in
Phabricator, but it is a trivial change to align it with LLVM master
instead of LLVM 9. Also, it passes all lld tests.
Differential Revision: https://reviews.llvm.org/D70266
The definition may be mangled while an undefined reference is not.
This may come up when (1) the reference is from a C file or (2) the definition
misses an extern "C".
(2) is more common. Suggest an arbitrary mangled name that matches the
undefined reference, if such a definition exists.
ld.lld: error: undefined symbol: foo
>>> referenced by a.o:(.text+0x1)
>>> did you mean to declare foo(int) as extern "C"?
>>> defined in: a1.o
Reviewed By: dblaikie, ruiu
Differential Revision: https://reviews.llvm.org/D69650
When missing an extern "C" declaration, an undefined reference may be
mangled while the definition is not. Suggest the missing
extern "C" and the base name.
Reviewed By: ruiu
Differential Revision: https://reviews.llvm.org/D69592
The logic added in r372781 caused ARMExidxSyntheticSection::addSection()
to return false for exidx sections without a link order dep that passed
isValidExidxSectionDep(). This included exidx sections for empty functions. As
a result, such exidx sections would end up treated like ordinary sections and
would end up being laid out before the ARMExidxSyntheticSection, most likely in
the wrong order relative to the exidx entries in the ARMExidxSyntheticSection,
breaking the orderedness invariant relied upon by unwinders. Fix this by
simply discarding such sections.
Differential Revision: https://reviews.llvm.org/D69744
Summary:
Add a flag `F_no_mmap` to `FileOutputBuffer` to support
`--[no-]mmap-output-file` in ELF LLD. LLD currently explicitly ignores
this flag for compatibility with GNU ld and gold.
We need this flag to speed up link time for large binaries in certain
scenarios. When we link some of our larger binaries we find that LLD
takes 50+ GB of memory, which causes memory pressure. The memory
pressure causes the VM to flush dirty pages of the output file to disk.
This is normally okay, since we should be flushing cold pages. However,
when using BtrFS with compression we need to write 128KB at a time when
we flush a page. If any page in that 128KB block is written again, then
it must be flushed a second time, and so on. Since LLD doesn't write
sequentially this causes write amplification. The same 128KB block will
end up being flushed multiple times, causing the linker to many times
more IO than necessary. We've observed 3-5x faster builds with
-no-mmap-output-file when we hit this scenario.
The bad scenario only applies to compressed filesystems, which group
together multiple pages into a single compressed block. I've tested
BtrFS, but the problem will be present for any compressed filesystem
on Linux, since it is caused by the VM.
Silently ignoring --no-mmap-output-file caused a silent regression when
we switched from gold to lld. We pass --no-mmap-output-file to fix this
edge case, but since lld silently ignored the flag we didn't realize it
wasn't being respected.
Benchmark building a 9 GB binary that exposes this edge case. I linked 3
times with --mmap-output-file and 3 times with --no-mmap-output-file and
took the average. The machine has 24 cores @ 2.4 GHz, 112 GB of RAM,
BtrFS mounted with -compress-force=zstd, and an 80% full disk.
| Mode | Time |
|---------|-------|
| mmap | 894 s |
| no mmap | 126 s |
When compression is disabled, BtrFS performs just as well with and
without mmap on this benchmark.
I was unable to reproduce the regression with any binaries in
lld-speed-test.
Reviewed By: ruiu, MaskRay
Differential Revision: https://reviews.llvm.org/D69294
Add a new '-z nognustack' option that suppresses emitting PT_GNU_STACK
segment. This segment is not supported at all on NetBSD (stack is
always non-executable), and the option is meant to be used to disable
emitting it.
Differential Revision: https://reviews.llvm.org/D56554
sections, but the current code misses certain variants. In particular, those
named when clang takes the code path in
clang/lib/Driver/ToolChain.cpp:416, where crtfiles are named:
clang_rt.<component>-<arch>-<env>.<suffix>
Previously, the code only handled:
clang_rt.<component>.<suffix>
<component>.<suffix>
This revision fixes that.
Fix PR43767
In -r mode, when processing a SHT_REL[A] that relocates a SHF_MERGE, sec->getRelocatedSection() is a
MergeInputSection and its parent is an OutputSection but is asserted to
be a SyntheticSection (MergeSyntheticSection) in LinkerScript.cpp:addInputSec().
##
The code path is not exercised in non -r mode because the relocated
section changed from MergeInputSection to InputSection.
Reorder the code to make the non -r logic apply to -r as well, thus fix
the crash.
Reviewed By: peter.smith
Differential Revision: https://reviews.llvm.org/D69364
The only condition that isecLoc becomes null is
Out::bufferStart == nullptr,
isec->getParent()->offset == 0, and
isec->outSecOff == 0.
We can check the first condition only once.
llvm-svn: 374332
isecLoc there can be null, but at the same time isec->getSize() may
be non-null. It is UB to offset a nullptr.The most straight-forward fix
here appears to perform casts+normal integral arithmetics.
FAIL: lld :: ELF/invalid/invalid-relocation-aarch64.test (1158 of 2217)
******************** TEST 'lld :: ELF/invalid/invalid-relocation-aarch64.test' FAILED ********************
Script:
--
: 'RUN: at line 2'; /b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm_build_ubsan/bin/yaml2obj /b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/lld/test/ELF/invalid/invalid-relocation-aarch64.test -o /b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm_build_ubsan/tools/lld/test/ELF/invalid/Output/invalid-relocation-aarch64.test.tmp.o
: 'RUN: at line 3'; not /b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm_build_ubsan/bin/ld.lld /b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm_build_ubsan/tools/lld/test/ELF/invalid/Output/invalid-relocation-aarch64.test.tmp.o -o /dev/null 2>&1 | /b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm_build_ubsan/bin/FileCheck /b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/lld/test/ELF/invalid/invalid-relocation-aarch64.test
--
Exit Code: 1
Command Output (stderr):
--
/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/lld/test/ELF/invalid/invalid-relocation-aarch64.test:4:10: error: CHECK: expected string not found in input
# CHECK: error: unknown relocation (1024) against symbol foo
^
<stdin>:1:1: note: scanning from here
/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/lld/ELF/Target.cpp💯41: runtime error: applying non-zero offset 24 to null pointer
^
<stdin>:1:118: note: possible intended match here
/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/lld/ELF/Target.cpp💯41: runtime error: applying non-zero offset 24 to null pointer
^
--
********************
Testing: 0.. 10.. 20.. 30.. 40.. 50.
FAIL: lld :: ELF/invalid/invalid-relocation-x64.test (1270 of 2217)
******************** TEST 'lld :: ELF/invalid/invalid-relocation-x64.test' FAILED ********************
Script:
--
: 'RUN: at line 2'; /b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm_build_ubsan/bin/yaml2obj /b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/lld/test/ELF/invalid/invalid-relocation-x64.test -o /b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm_build_ubsan/tools/lld/test/ELF/invalid/Output/invalid-relocation-x64.test.tmp1.o
: 'RUN: at line 3'; echo ".global foo; foo:" > /b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm_build_ubsan/tools/lld/test/ELF/invalid/Output/invalid-relocation-x64.test.tmp2.s
: 'RUN: at line 4'; /b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm_build_ubsan/bin/llvm-mc /b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm_build_ubsan/tools/lld/test/ELF/invalid/Output/invalid-relocation-x64.test.tmp2.s -o /b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm_build_ubsan/tools/lld/test/ELF/invalid/Output/invalid-relocation-x64.test.tmp2.o -filetype=obj -triple x86_64-pc-linux
: 'RUN: at line 5'; not /b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm_build_ubsan/bin/ld.lld /b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm_build_ubsan/tools/lld/test/ELF/invalid/Output/invalid-relocation-x64.test.tmp1.o /b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm_build_ubsan/tools/lld/test/ELF/invalid/Output/invalid-relocation-x64.test.tmp2.o -o /dev/null 2>&1 | /b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm_build_ubsan/bin/FileCheck /b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/lld/test/ELF/invalid/invalid-relocation-x64.test
--
Exit Code: 1
Command Output (stderr):
--
/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/lld/test/ELF/invalid/invalid-relocation-x64.test:6:10: error: CHECK: expected string not found in input
# CHECK: error: unknown relocation (152) against symbol foo
^
<stdin>:1:1: note: scanning from here
/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/lld/ELF/Target.cpp💯41: runtime error: applying non-zero offset 24 to null pointer
^
<stdin>:1:118: note: possible intended match here
/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/lld/ELF/Target.cpp💯41: runtime error: applying non-zero offset 24 to null pointer
^
--
********************
Testing: 0.. 10.. 20.. 30.. 40.. 50.. 60.. 70.. 80.. 90..
Testing Time: 20.73s
********************
Failing Tests (2):
lld :: ELF/invalid/invalid-relocation-aarch64.test
lld :: ELF/invalid/invalid-relocation-x64.test
llvm-svn: 374329
The combination of the two flags doesn't make sense. And other linkers
seem to just ignore --export-dynamic if --relocatable is given, but
we probably should report it as an error to let users know that is
an invalid combination.
Fixes https://bugs.llvm.org/show_bug.cgi?id=43552
Differential Revision: https://reviews.llvm.org/D68441
llvm-svn: 374022
This makes it clear `ELF/**/*.cpp` files define things in the `lld::elf`
namespace and simplifies `elf::foo` to `foo`.
Reviewed By: atanasyan, grimar, ruiu
Differential Revision: https://reviews.llvm.org/D68323
llvm-svn: 373885
This allows us to delete `using namespace llvm::support::endian` and
simplify D68323. This change adds runtime config->endianness check but
the overhead should be negligible.
Reviewed By: ruiu
Differential Revision: https://reviews.llvm.org/D68561
llvm-svn: 373884
Before, SecToClusters[*] was used to track the belonged cluster.
During a merge (From -> Into), every element of From has to be updated.
Use a union-find set to speed up this use case.
Also, replace `std::vector<int> Sections;` with a doubly-linked
pointers: int Next, Prev;
Reviewed By: Bigcheese
Differential Revision: https://reviews.llvm.org/D46228
llvm-svn: 373708
Our .interp section is not a SyntheticSection. As a result, it terminates the
loop in removeUnusedSyntheticSections(). This has at least two consequences:
- The synthetic .bss and .bss.rel.ro sections are always present in
dynamically linked executables, even when they are not needed.
- The synthetic .ARM.exidx (and possibly other) sections are always present
in partitions other than the last one, even when not needed.
.ARM.exidx in particular is problematic because it assumes that its
list of code sections is non-empty in getLinkOrderDep(), which can
lead to a crash if the partition does not have any code sections.
Fix these problems by moving the creation of the .interp sections to the
top of createSyntheticSections(). While here, make the code a little less
error-prone by changing the add() lambdas to take a SyntheticSection instead
of an InputSectionBase.
Differential Revision: https://reviews.llvm.org/D68256
llvm-svn: 373347
Merging SHF_LINK_ORDER sections can affect semantics if the sh_link
fields point to different sections.
Specifically, for SHF_LINK_ORDER sections, the sh_link field acts as a reverse
dependency from the linked section, causing the SHF_LINK_ORDER section to
be included if the linked section is included. Merging sections with different
sh_link fields will cause the entire contents of the SHF_LINK_ORDER section
to be associated with a single (arbitrarily chosen) output section, whereas the
correct semantics are for the individual pieces of the SHF_LINK_ORDER section
to be associated with their linked output sections. As a result we can end up
incorrectly dropping SHF_LINK_ORDER section contents or including the wrong
section contents, depending on which linked sections were chosen.
Differential Revision: https://reviews.llvm.org/D68094
llvm-svn: 373255
Instead of returning an optional, just return the input string if
demangling fails, as that's what all callers use anyway.
Differential Revision: https://reviews.llvm.org/D68015
llvm-svn: 373077
Fixes PR43461 (regression caused by D67504)
The partition field of a SECTIONS-specified section is not set after
D67504. The 0 value affects findSection() which checks if the partition
field is 1.
So `Out::initArray = findSection(".init_array")` is null, and
DT_INIT_ARRAYSZ is not set.
Reviewed By: peter.smith
Differential Revision: https://reviews.llvm.org/D68087
llvm-svn: 372996
The R_MIPS_JALR relocation denotes jalr/jr instructions in position
independent code. Both these instructions take a target's address from
the $25 register. If offset to the target symbol fits into the 18-bits,
it's more efficient to replace jalr/jr by bal/b instructions.
Differential Revision: https://reviews.llvm.org/D68057
llvm-svn: 372951
D64906 allows PT_LOAD to have overlapping p_offset ranges. In the
default R RX RW RW layout + -z noseparate-code case, we do not tail pad
segments when transiting to another segment. This can save at most
3*maxPageSize bytes.
a) Before D64906, we tail pad R, RX and the first RW.
b) With -z separate-code, we tail pad R and RX, but not the first RW (RELRO).
In some cases, b) saves one file page. In some cases, b) wastes one
virtual memory page. The waste is a concern on Fuchsia. Because it uses
compressed binaries, it doesn't benefit from the saved file page.
This patch adds -z separate-loadable-segments to restore the behavior before
D64906. It can affect section addresses and can thus be used as a
debugging mechanism (see PR43214 and ld.so partition bug in
crbug.com/998712).
Reviewed By: jakehehrlich, ruiu
Differential Revision: https://reviews.llvm.org/D67481
llvm-svn: 372807
Summary:
When support for ThinLTO was first added to lld, the options that
control it were prefixed with --plugin-opt= for compatibility with
an existing implementation as a linker plugin. This change enables
shorter versions of the options to be used, as follows:
New Existing
-thinlto-emit-imports-files --plugin-opt=thinlto-emit-imports-files
-thinlto-index-only --plugin-opt=thinlto-index-only
-thinlto-index-only= --plugin-opt=thinlto-index-only=
-thinlto-object-suffix-replace= --plugin-opt=thinlto-object-suffix-replace=
-thinlto-prefix-replace= --plugin-opt=thinlto-prefix-replace=
-lto-obj-path= --plugin-opt=obj-path=
The options with the --plugin-opt= prefix have been retained as aliases
for the shorter variants so that they continue to be accepted.
Reviewers: tejohnson, ruiu, espindola
Reviewed By: ruiu
Subscribers: emaste, arichardson, MaskRay, steven_wu, dexonsmith, arphaman, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D67782
llvm-svn: 372798
When /DISCARD/ is used on an input section, that input section may have
a .ARM.exidx metadata section that depends on it. As the discard handling
comes after the .ARM.exidx synthetic section is created we need to make
sure that we account for the case where the .ARM.exidx output section
should be removed because there are no more live input sections.
Differential Revision: https://reviews.llvm.org/D67848
llvm-svn: 372781
D67504 removed uses of `assigned` from OutputSection::addSection, which
makes `assigned` purely used in processSectionCommands() and its
callees. By replacing its references with `parent`, we can remove
`assigned`.
Reviewed By: grimar
Differential Revision: https://reviews.llvm.org/D67531
llvm-svn: 372735
Fixes PR38748
mergeSections() calls getOutputSectionName() to get output section
names. Two MergeInputSections may be merged even if they are made
different by SECTIONS commands.
This patch moves mergeSections() after processSectionCommands() and
addOrphanSections() to fix the issue. The new pass is renamed to
OutputSection::finalizeInputSections().
processSectionCommands() and addorphanSections() are changed to add
sections to InputSectionDescription::sectionBases.
finalizeInputSections() merges MergeInputSections and migrates
`sectionBases` to `sections`.
For the -r case, we drop an optimization that tries keeping sh_entsize
non-zero. This is for the simplicity of addOrphanSections(). The
updated merge-entsize2.s reflects the change.
Reviewed By: grimar
Differential Revision: https://reviews.llvm.org/D67504
llvm-svn: 372734
In case of linking binary blobs which do not have any ELF headers, we can
deduce MIPS ABI ELF header flags from an `emulation` option.
Patch by Kyle Evans.
llvm-svn: 372513
Summary:
If st_link(A)=B, and A has the SHF_LINK_ORDER flag, we may dereference
a null pointer if B is garbage collected (PR43147):
1. In Wrter.cpp:compareByFilePosition, `aOut->sectionIndex` or `bOut->sectionIndex`
2. In OutputSections::finalize, `d->getParent()->sectionIndex`
Simply error and bail out to avoid null pointer dereferences. ld.bfd has
a similar error:
sh_link of section `.bar' points to discarded section `.foo0' of `a.o'
ld.bfd is more permissive in that it just checks whether the linked-to
section of the first input section is discarded. This is likely because
it sets sh_link of the output section according to the first input
section.
Reviewed By: grimar
Differential Revision: https://reviews.llvm.org/D67761
llvm-svn: 372400
D67284 introduced ARMErrataFix.cpp which was derived from
AArch64ErrataFix.cpp. There were some useful refactoring changes made to
ARMErrataFix.cpp made as part of the review. This change applies the
relevant changes back to AArch64ErrataFix.cpp.
Main changes are:
- Old style variable names in comments like IS, are now new style isec.
- Simplify init() collection of mappingSymbols to always start with a code
mapping symbol.
- Simplify logic in mergeCmp().
- Fix one 80 column overflow caused by IS -> isec transformation.
Differential Revision: https://reviews.llvm.org/D67622
llvm-svn: 372094
Provide a missing initializer to get rid of warning provoking buildbot
failures.
error: missing field 'rel' initializer
[-Werror,-Wmissing-field-initializers]
llvm-svn: 371970
The --fix-cortex-a8 option implements a linker workaround for the
coretex-a8 erratum 657417. A summary of the erratum conditions is:
- A 32-bit Thumb-2 branch instruction B.w, Bcc.w, BL, BLX spans two
4KiB regions.
- The destination of the branch is to the first 4KiB region.
- The instruction before the branch is a 32-bit Thumb-2 non-branch
instruction.
The linker fix is to redirect the branch to a patch not in the first
4KiB region. The patch forwards the branch on to its target.
The cortex-a8, is an old CPU, with the first implementation of this
workaround in ld.bfd appearing in 2009. The cortex-a8 has been used in
early Android Phones and there are some critical applications that still
need to run on a cortex-a8 that have the erratum. The patch is applied
roughly 10 times on LLD and 20 on Clang when they are built with
--fix-cortex-a8 on an Arm system.
The formal erratum description is avaliable in the ARM Core Cortex-A8
(AT400/AT401) Errata Notice document. This is available from Arm on
request but it seems to be findable via a web search.
Differential Revision: https://reviews.llvm.org/D67284
llvm-svn: 371965
If there is no readonly section, we map:
* The ELF header at imageBase+maxPageSize
* Program headers at imageBase+maxPageSize+sizeof(Ehdr)
* The first section .text at imageBase+maxPageSize+sizeof(Ehdr)+sizeof(program headers)
Due to the interaction between Writer<ELFT>::fixSectionAlignments and
LinkerScript::allocateHeaders,
`alignDown(p_vaddr(R PT_LOAD)) = alignDown(p_vaddr(RX PT_LOAD))`.
The RX PT_LOAD will override the R PT_LOAD at runtime, which is not ideal:
```
// PHDR at 0x401034, should be 0x400034
PHDR 0x000034 0x00401034 0x00401034 0x000a0 0x000a0 R 0x4
// R PT_LOAD contains just Ehdr and program headers.
// At 0x401000, should be 0x400000
LOAD 0x000000 0x00401000 0x00401000 0x000d4 0x000d4 R 0x1000
LOAD 0x0000d4 0x004010d4 0x004010d4 0x00001 0x00001 R E 0x1000
```
* createPhdrs allocates the headers to the R PT_LOAD.
* fixSectionAlignments assigns `imageBase+maxPageSize+sizeof(Ehdr)+sizeof(program headers)` (formula: `alignTo(dot, maxPageSize) + dot % config->maxPageSize`) to addrExpr of .text
* allocateHeaders computes the minimum address among SHF_ALLOC sections, i.e. addr(.text)
* allocateHeaders sets address of ELF header to `addr(.text)-sizeof(Ehdr)-sizeof(program headers) = imageBase+maxPageSize`
The main observation is that when the SECTIONS command is not used, we
don't have to call allocateHeaders. This requires an assumption that
the presence of PT_PHDR and addresses of headers can be decided
regardless of address information.
This may seem natural because dot is not manipulated by a linker script.
The other thing is that we have to drop the special rule for -T<section>
in `getInitialDot`. If -Ttext is smaller than the image base, the headers
will not be allocated with the old behavior (allocateHeaders is called)
but always allocated with the new behavior.
The behavior change is not a problem. Whether and where headers are
allocated can vary among linkers, or ld.bfd across different versions
(--enable-separate-code or not). It is thus advised to use a linker
script with the PHDRS command to have a consistent behavior across
linkers. If PT_PHDR is needed, an explicit --image-base can be a simpler
alternative.
Differential Revision: https://reviews.llvm.org/D67325
llvm-svn: 371957
ICF is performed after EhInputSections and MergeInputSections were
eliminated from inputSections. Every element of inputSections is an
InputSection.
llvm-svn: 371744
-z undefs is the inverse of -z defs. It allows unresolved references
from object files. This can be used to cancel --no-undefined or -z defs.
Reviewed By: ruiu
Differential Revision: https://reviews.llvm.org/D67479
llvm-svn: 371715
```
part.phdrs = script->hasPhdrsCommands() ? script->createPhdrs() : createPhdrs(part);
```
createPhdrs() allocates a PT_PHDR and a PF_R PT_LOAD, which will be
deleted later in LinkerScript::allocateHeaders, but leave a gap between
the program headers and the first section. Don't allocate the segments
to avoid the gap. PT_INTERP is likely not needed as well.
Reviewed By: ruiu
Differential Revision: https://reviews.llvm.org/D67324
llvm-svn: 371398
Summary:
ld.bfd produces an output with --noinhibit-exec when an ASSERT fails.
Use errorOrWarn() so that we can produce an output as well.
An interesting case is that symbol assignments may execute multiple
times, so we probably want to suppress errors for non-final runs.
Reviewed By: peter.smith
Differential Revision: https://reviews.llvm.org/D67285
llvm-svn: 371225
Recommit r370635 (reverted by r371202), with one change: move addOrphanSections() before ICF.
Before, orphan sections in two different partitions may be folded and
moved to the main partition.
Now, InputSection->OutputSection assignment for orphans happens before
ICF. ICF does not fold input sections with different output sections.
With the PR43241 reproduce,
`llvm-objcopy --extract-partition libvr.so libchrome__combined.so libvr.so` => no error
Updated description:
Fixes PR39418. Complements D47241 (the non-linker-script case).
processSectionCommands() assigns input sections to output sections.
ICF is called before it, so .text.foo and .text.bar may be folded even if
their output sections are made different by SECTIONS commands.
```
markLive<ELFT>()
doIcf<ELFT>() // During ICF, we don't know the output sections
writeResult()
combineEhSections<ELFT>()
script->processSectionCommands() // InputSection -> OutputSection assignment
```
This patch splits processSectionCommands() into processSectionCommands()
and processSymbolAssignments(), and moves
processSectionCommands()/addOrphanSections() before ICF:
```
markLive<ELFT>()
combineEhSections<ELFT>()
script->processSectionCommands()
script->addOrphanSections();
doIcf<ELFT>() // should remove folded input sections
writeResult()
script->processSymbolAssignments()
```
An alternative approach is to unfold a section `sec` in
processSectionCommands() when we find `sec` and `sec->repl` belong to
different output sections. I feel this patch is superior because this
can fold more sections and the decouple of
SectionCommand/SymbolAssignment gives flexibility:
* An ExprValue can't be evaluated before its section is assigned to an
output section -> we can delete getOutputSectionVA and simplify
another place where we had to check if the output section is null.
Moreover, a case in linkerscript/early-assign-symbol.s can be handled
now.
* processSectionCommands/processSymbolAssignments can be freely moved
around.
llvm-svn: 371216
```
Writer<ELFT>::run
assignFileOffsets
setFileOffset
computeFileOffset
os->ptLoad->p_align may be smaller than config->maxPageSize
setPhdrs
p_align = max(p_align, config->maxPageSize)
```
If we move the config->maxPageSize logic to the constructor of
PhdrEntry, computeFileOffset can be simplified.
Reviewed By: ruiu
Differential Revision: https://reviews.llvm.org/D67211
llvm-svn: 371085
Previously, segments were aligned according to their first section's
alignment requirements. That was not correct, but segments are also
aligned to a page boundary, and a page boundary is usually much larger
than a section alignment requirement, so no one noticed this bug before.
Now, lld has --nmagic option which sets maxPageSize to 1 to effectively
disable page alignment, which reveals the issue.
Fixes https://bugs.llvm.org/show_bug.cgi?id=43212
Differential Revision: https://reviews.llvm.org/D67152
llvm-svn: 371013
Fixes PR43214.
The size of SHT_RELR may oscillate between 2 numbers (see D53003 for a
similar --pack-dyn-relocs=android issue). This can happen if the shrink
of SHT_RELR causes it to take more words to encode relocation offsets
(this can happen with thunks or segments with overlapping p_offset
ranges), and the expansion of SHT_RELR causes it to take fewer words to
encode relocation offsets.
To avoid the issue, add padding 1s to the end of the relocation section
if its size would decrease. Trailing 1s do not decode to more relocations.
Reviewed By: peter.smith
Differential Revision: https://reviews.llvm.org/D67164
llvm-svn: 370923
Non-undefined symbols with Levenshtein distance 1 or a transposition are
suggestion candidates. This is probably good enough and it can suggest
some missing/superfluous qualifiers: const, restrict, volatile, & and &&
ref-qualifier, e.g.
error: undefined symbol: foo(int*)
>>> referenced by b.o:(.text+0x1)
+>>> did you mean: foo(int const*)
+>>> defined in: a.o
error: undefined symbol: foo(int*&)
>>> referenced by b.o:(.text+0x1)
+>>> did you mean: foo(int*)
+>>> defined in: b.o
Reviewed By: ruiu
Differential Revision: https://reviews.llvm.org/D67039
llvm-svn: 370853
Fixes PR39418. Complements D47241 (the non-linker-script case).
processSectionCommands() assigns input sections to output sections.
ICF is called before it, so .text.foo and .text.bar may be folded even if
their output sections are made different by SECTIONS commands.
```
markLive<ELFT>()
doIcf<ELFT>() // During ICF, we don't know the output sections
writeResult()
combineEhSections<ELFT>()
script->processSectionCommands() // InputSection -> OutputSection assignment
```
This patch splits processSectionCommands() into processSectionCommands() and
processSymbolAssignments(), and moves processSectionCommands() before ICF:
```
markLive<ELFT>()
combineEhSections<ELFT>()
script->processSectionCommands()
doIcf<ELFT>() // should remove folded input sections
writeResult()
script->processSymbolAssignments()
```
An alternative approach is to unfold a section `sec` in
processSectionCommands() when we find `sec` and `sec->repl` belong to
different output sections. I feel this patch is superior because this
can fold more sections and the decouple of
SectionCommand/SymbolAssignment gives flexibility:
* An ExprValue can't be evaluated before its section is assigned to an
output section -> we can delete getOutputSectionVA and simplify
another place where we had to check if the output section is null.
Moreover, a case in linkerscript/early-assign-symbol.s can be handled
now.
* processSectionCommands/processSymbolAssignments can be freely moved
around.
Reviewed By: ruiu
Differential Revision: https://reviews.llvm.org/D66717
llvm-svn: 370635
Fixes https://bugs.chromium.org/p/chromium/issues/detail?id=998712
SHT_LLVM_PART_EHDR marks the start of a partition. The partition
sections will be extracted to a separate file. Align to the next maximum
page size boundary so that we can find the ELF header at the start. We
cannot benefit from overlapping p_offset ranges with the previous
segment anyway.
It seems we lack some llvm-objcopy --extract-main-partition and
--extract-partition sanity checks. It may place EHDR at the start
even if p_offset if non zero. Anyway, the lld change is justified for
the reasons above.
Reviewed By: ruiu
Differential Revision: https://reviews.llvm.org/D67032
llvm-svn: 370629
D64136 and D65584, while fixing STB_WEAK issues and improving our
compatibility with ld.bfd, can cause another STB_WEAK problem related to
LTO:
If %tundef.o has an undefined reference on f,
and %tweakundef.o has a weak undefined reference on f,
%tdef.o has a definition of f
```
ld.lld %tundef.o %tweakundef.o --start-lib %tdef.o --end-lib
```
1) `%tundef.o` doesn't set the `referenced` bit.
2) `%weakundef.o` changes the binding from STB_GLOBAL to STB_WEAK
3) `%tdef.o` is not fetched because the binding is weak.
Step (1) is incorrect. This patch sets the `referenced` bit of Undefined
created by bitcode files.
Reviewed By: ruiu
Differential Revision: https://reviews.llvm.org/D66992
llvm-svn: 370437
Port the D64906 technique to RISC-V. It deletes 3 alignments at
PT_LOAD boundaries for the default case: the size of a RISC-V binary
decreases by at most 12kb.
llvm-svn: 370192
This essentially reverts the code change of D63132 and switches to a simpler approach.
In an executable/shared object, st_shndx of a symbol can be:
1) SHN_UNDEF: undefined symbol (or canonical PLT)
2) SHN_ABS: absolute symbol
3) any other value (usually a regular section index) represents a relative symbol.
The actual value does not matter.
Many ld.so (musl, all archs except MIPS of FreeBSD rtld-elf) even treat 2) and 3)
the same. If .sdata does not exist, it does not matter what value/section
__global_pointer$ has, as long as it is relative (otherwise there will be a pedantic
lld error. See D63132). Just set the st_shndx arbitrarily to 1.
Dummy st_shndx=1 may be used by __rela_iplt_start, linker-script-defined symbols outside a section, __dso_handle, etc.
Reviewed By: ruiu
Differential Revision: https://reviews.llvm.org/D66798
llvm-svn: 370172
Port the D64906 technique to ARM. It deletes 3 alignments at
PT_LOAD boundaries for the default case: the size of an arm binary
decreases by at most 12kb.
Reviewed By: grimar
Differential Revision: https://reviews.llvm.org/D66749
llvm-svn: 370049
EhFrameSection::addSection checks liveness of FDE early. This makes it
infeasible to move combineEhSections() before ICF.
Postpone the check to EhFrameSection::finalizeContents(). This is what
ARMExidxSyntheticSection does and it will make a subsequent patch D66717
simpler.
Reviewed By: ruiu
Differential Revision: https://reviews.llvm.org/D66727
llvm-svn: 369890
PR42990. For `SECTIONS { b = a; . = 0xff00 + (a >> 8); a = .; }`,
we currently set st_value(a)=0xff00 while st_value(b)=0xffff.
The following call tree demonstrates the problem:
```
link<ELF64LE>(Args);
Script->declareSymbols(); // insert a and b as absolute Defined
Writer<ELFT>().run();
Script->processSectionCommands();
addSymbol(cmd); // a and b are re-inserted. LinkerScript::getSymbolValue
// is lazily called by subsequent evaluation
finalizeSections();
forEachRelSec(scanRelocations<ELFT>);
processRelocAux // another problem PR42506, not affected by this patch
finalizeAddressDependentContent(); // loop executed once
script->assignAddresses(); // a = 0, b = 0xff00
script->assignAddresses(); // a = 0xff00, _end = 0xffff
```
We need another assignAddresses() to finalize the value of `a`.
This patch
1) modifies assignAddress() to track the original section/value of each
symbol and return a symbol whose section/value has changed.
2) moves the post-finalizeSections assignAddress() inside the loop
of finalizeAddressDependentContent() and makes it iterative.
Symbol assignment may not converge so we make a few attempts before
bailing out.
Note, assignAddresses() must be called at least twice. The penultimate
call finalized section addresses while the last finalized symbol values.
It is somewhat obscure and there was no comment.
linkerscript/addr-zero.test tests this.
Reviewed By: ruiu
Differential Revision: https://reviews.llvm.org/D66279
llvm-svn: 369889
--strip-all suppresses the creation of in.symtab
This can cause a null pointer dereference in OutputSection::finalize()
// --emit-relocs => copyRelocs is true
if (!config->copyRelocs || (type != SHT_RELA && type != SHT_REL))
return;
...
link = in.symTab->getParent()->sectionIndex; // in.symTab is null
Let's just disallow the combination. In some cases the combination can
cause GNU linkers to fail:
* ld.bfd: final link failed: invalid operation
* gold: internal error in set_no_output_symtab_entry, at ../../gold/object.h:1814
Reviewed By: ruiu
Differential Revision: https://reviews.llvm.org/D66704
llvm-svn: 369878
Reported at https://reviews.llvm.org/D64930#1642223
If the only section of a PT_LOAD is a SHT_NOBITS section (e.g. .bss), we
may not align its sh_offset. p_offset of the PT_LOAD will be set to
sh_offset, and we will get p_offset!=p_vaddr (mod p_align). If such
executable is mapped by the Linux kernel, it will segfault.
After D64906, this may happen the non-linker script case.
The linker script case has had this issue for a long time.
This was fixed by rL321657 (but the test linkerscript/nobits-offset.s
failed to test a SHT_NOBITS section), but broken by rL345154.
Reviewed By: peter.smith
Differential Revision: https://reviews.llvm.org/D66658
llvm-svn: 369828
Building on D60557 mention the name of the linker generated contents of
the reproduce archive, response.txt and version.txt.
Also write a shorter description in the ld.lld --help that is closer to
the documentation.
Differential Revision: https://reviews.llvm.org/D66641
llvm-svn: 369762
This fixed a bug in r369488. When config->isRela is false, i->r_addend
is not initialized (see encodeDynamicReloc). So we should check
config->isRela before accessing r_addend:
- if (j - i < 3 || i->r_addend)
+ if (j - i < 3 || (config->isRela && i->r_addend != 0))
Original description:
Currently, with Android dynamic relocation packing, only relative
relocations are grouped together. This patch implements similar
packing for non-relative relocations.
The implementation groups non-relative relocations with the same
r_info and r_addend, if using RELA. By requiring a minimum group
size of 3, this achieves smaller relocation sections. Building Android
for an ARM32 device, I see the total size of /system/lib decrease by
392 KB.
Grouping by r_info also allows the runtime dynamic linker to implement
an 1-entry cache to reduce the number of symbol lookup required. With
such 1-entry cache implemented on Android, I'm seeing 10% to 20%
reduction in total time spent in runtime linker for several executables
that I tested.
As a simple correctness check, I've also built x86_64 Android and booted
successfully.
Differential Revision: https://reviews.llvm.org/D65242
Patch by Vic Yang
llvm-svn: 369507
Currently, with Android dynamic relocation packing, only relative
relocations are grouped together. This patch implements similar
packing for non-relative relocations.
The implementation groups non-relative relocations with the same
r_info and r_addend, if using RELA. By requiring a minimum group
size of 3, this achieves smaller relocation sections. Building Android
for an ARM32 device, I see the total size of /system/lib decrease by
392 KB.
Grouping by r_info also allows the runtime dynamic linker to implement
an 1-entry cache to reduce the number of symbol lookup required. With
such 1-entry cache implemented on Android, I'm seeing 10% to 20%
reduction in total time spent in runtime linker for several executables
that I tested.
As a simple correctness check, I've also built x86_64 Android and booted
successfully.
Differential Revision: https://reviews.llvm.org/D66491
Patch by Vic Yang!
llvm-svn: 369488
Ported the D64906 technique to EM_386.
If `sh_addralign(.tdata) < sh_addralign(.tbss)`,
we can potentially make `p_vaddr(PT_TLS)%p_align(PT_TLS) != 0`.
ld.so that are known to have problems if p_vaddr%p_align!=0:
* FreeBSD 13.0-CURRENT rtld-elf
* glibc https://sourceware.org/bugzilla/show_bug.cgi?id=24606
New test i386-tls-vaddr-align.s checks our workaround makes p_vaddr%p_align = 0.
Reviewed By: ruiu
Differential Revision: https://reviews.llvm.org/D65865
llvm-svn: 369347
Ported the D64906 technique to AArch64. It deletes 3 alignments at
PT_LOAD boundaries for the default case: the size of an aarch64 binary
decreases by at most 192kb.
If `sh_addralign(.tdata) < sh_addralign(.tbss)`,
we can potentially make `p_vaddr(PT_TLS)%p_align(PT_TLS) != 0`.
ld.so that are known to have problems if p_vaddr%p_align!=0:
* musl<=1.1.22
* FreeBSD 13.0-CURRENT (and before) rtld-elf arm64
New test aarch64-tls-vaddr-align.s checks that our workaround makes p_vaddr%p_align = 0.
Reviewed By: ruiu
Differential Revision: https://reviews.llvm.org/D64930
llvm-svn: 369344
This change affects the non-linker script case (precisely, when the
`SECTIONS` command is not used). It deletes 3 alignments at PT_LOAD
boundaries for the default case: the size of a powerpc64 binary can be
decreased by at most 192kb. The technique can be ported to other
targets.
Let me demonstrate the idea with a maxPageSize=65536 example:
When assigning the address to the first output section of a new PT_LOAD,
if the end p_vaddr of the previous PT_LOAD is 0x10020, we advance to
the next multiple of maxPageSize: 0x20000. The new PT_LOAD will thus
have p_vaddr=0x20000. Because p_offset and p_vaddr are congruent modulo
maxPageSize, p_offset will be 0x20000, leaving a p_offset gap [0x10020,
0x20000) in the output.
Alternatively, if we advance to 0x20020, the new PT_LOAD will have
p_vaddr=0x20020. We can pick either 0x10020 or 0x20020 for p_offset!
Obviously 0x10020 is the choice because it leaves no gap. At runtime,
p_vaddr will be rounded down by pagesize (65536 if
pagesize=maxPageSize). This PT_LOAD will load additional initial
contents from p_offset ranges [0x10000,0x10020), which will also be
loaded by the previous PT_LOAD. This is fine if -z noseparate-code is in
effect or if we are not transiting between executable and non-executable
segments.
ld.bfd -z noseparate-code leverages this technique to keep output small.
This patch implements the technique in lld, which is mostly effective on
targets with large defaultMaxPageSize (AArch64/MIPS/PPC: 65536). The 3
removed alignments can save almost 3*65536 bytes.
Two places that rely on p_vaddr%pagesize = 0 have to be updated.
1) We used to round p_memsz(PT_GNU_RELRO) up to commonPageSize (defaults
to 4096 on all targets). Now p_vaddr%commonPageSize may be non-zero.
The updated formula takes account of that factor.
2) Our TP offsets formulae are only correct if p_vaddr%p_align = 0.
Fix them. See the updated comments in InputSection.cpp for details.
On targets that we enable the technique (only PPC64 now),
we can potentially make `p_vaddr(PT_TLS)%p_align(PT_TLS) != 0`
if `sh_addralign(.tdata) < sh_addralign(.tbss)`
This exposes many problems in ld.so implementations, especially the
offsets of dynamic TLS blocks. Known issues:
FreeBSD 13.0-CURRENT rtld-elf (i386/amd64/powerpc/arm64)
glibc (HEAD) i386 and x86_64 https://sourceware.org/bugzilla/show_bug.cgi?id=24606
musl<=1.1.22 on TLS Variant I architectures (aarch64/powerpc64/...)
So, force p_vaddr%p_align = 0 by rounding dot up to p_align(PT_TLS).
The technique will be enabled (with updated tests) for other targets in
subsequent patches.
Reviewed By: ruiu
Differential Revision: https://reviews.llvm.org/D64906
llvm-svn: 369343
After D66007/r369262, if the control flow reaches `if (sym.isUndefined())`, we know:
* The relocation is not a link-time constant => symbol is preemptable => Undefined or SharedSymbol
* Not an undef weak.
* -no-pie.
* The symbol type is neither STT_OBJECT nor STT_FUNC.
ld.lld --export-dynamic --unresolved-symbols=ignore-all %t.o can satisfy
these conditions. Delete the isUndefined() test so that we error
`symbol '...' has no type`, because we don't know the type to make the
decision to create copy relocation/canonical PLT.
llvm-svn: 369271
In processRelocAux(), we handle errors before copy relocation/canonical PLT.
This makes error checking a bit complex because we have to check for
conditions that will be allowed by copy relocation/canonical PLT.
Instead, move copy relocation/canonical PLT before error checking. This
simplifies the previous clumsy error checking code
`config->shared || (config->pie && expr == R_ABS && type != target->symbolicRel)`
to the simple `config->isPic`. Some diagnostics can be reported in
different ways. The code motion changes diagnostics for some contrived
test cases:
* copy-rel-pie-error.s -> copy-rel-pie2.s:
It was rejected before but accepted now. ld.bfd also accepts the case.
* copy-errors.s: "cannot preempt symbol" changes to "symbol 'bar' has no type"
* got32{,x}-i386.s: the suggestion changes from "-fPIC or -Wl,-z,notext" to "-fPIE"
* x86-64-dyn-rel-error5.s: one diagnostic changes for -pie case
Reviewed By: peter.smith
Differential Revision: https://reviews.llvm.org/D66007
llvm-svn: 369262