Currently, LLD does not support the complete set of ARM group relocations.
Given that I intend to start using these in the Linux kernel [0], let's add
support for these.
This implements the group processing as documented in the ELF psABI. Notably,
this means support is dropped for very far symbol references that also carry a
small component, where the immediate is rotated in such a way that only part of
it wraps to the other end of the 32-bit word. To me, it seems unlikely that
this is something anyone could be relying on, but of course I could be wrong.
[0] https://lore.kernel.org/r/20211122092816.2865873-8-ardb@kernel.org/
Reviewed By: peter.smith, MaskRay
Differential Revision: https://reviews.llvm.org/D114172
This allows --power10-stubs= and --[no-]power10-stubs to override each other
(they are position dependent in GNU ld).
Also improve --help messages and the manpage.
Note: GNU ld's default "auto" mode uses heuristics to decide whether Power10
instructions are used. Arguably it is a design mistake of R_PPC64_REL24_NOTOC
(acked by the relevant folks on a libc-alpha discussion). We don't implement
"auto", so the default --power10-stubs is the same as "yes".
The canonical term is "extract" (GNU ld documentation, Solaris's `-z *extract`
options). Avoid inventing a term and match --why-extract. (ld64 prefers "load"
but the word is overloaded too much)
Mostly MFC, except for --help messages and the header row in
--print-archive-stats output.
BaseCommand was picked when PHDRS/INSERT/etc were not implemented. Rename it to
SectionCommand to match `sectionCommands` and make it clear that the commands
are used in SECTIONS (except a special case for SymbolAssignment).
Also, improve naming of some BaseCommand variables (base -> cmd).
This partially reverts r315409: the description applies to LinkerScript, but not
to OutputSection.
The name "sectionCommands" is used in both LinkerScript::sectionCommands and
OutputSection::sectionCommands, which may lead to confusion.
"commands" in OutputSection has no ambiguity because there are no other types
of commands.
The attribute 'r' allows (or disallows for the negative case) read-only
sections, i.e. ones without the SHF_WRITE flag, to be assigned to the
memory region. Before the patch, lld could put a section in the wrong
region or fail with "error: no memory region specified for section".
Differential Revision: https://reviews.llvm.org/D113771
The current TLSDESC optimization code assumes:
```
leaq x@tlsdesc(%rip), %rax
call *x@tlscall(%rax) # adjacent
```
From https://gitlab.freedesktop.org/mesa/mesa/-/issues/5665 , it seems that the
two instructions may not be adjacent in GCC 10's output:
```
leaq x@tlsdesc(%rip), %rax
something else
call *x@tlscall(%rax)
```
This patch supports the case. While here, support non-RAX registers for
R_X86_64_GOTPC32_TLSDESC, in case the compiler generates inefficient:
```
leaq x@tlsdesc(%rip), %rcx # or %rdx, %rbx, %rdi, ...
movq %rcx, %rax
call *x@tlscall(%rax) # GNU ld/gold error for non-RAX
```
Differential Revision: https://reviews.llvm.org/D114416
The section symbols aren't of much practical use when looking at
a linked image. This shrinks one observed mingw style unstripped
binary by 14%.
IMAGE_SYM_CLASS_LABEL is in spirit the same as a temporary assembler
label that isn't emitted on the object file level at all.
Differential Revision: https://reviews.llvm.org/D113866
std::vector can have different sizes depending on the STL's debug level,
so account for its size separately. (You could argue that we should be
accounting for all the other members separately as well, but that would
be very unergonomic, and std::vector is the only one that's caused
problems so far.)
Follup-up to D107533, where we replaced local syms with non-local.
It doesn't make sense to replace local symbol with lazy.
Differential Revision: https://reviews.llvm.org/D110040
Fix a null pointer dereference when .got.plt is discarded.
This also adds a test for discarding `.plt`.
Reviewed By: ikudrin
Differential Revision: https://reviews.llvm.org/D114180
Follow-up to https://reviews.llvm.org/D112643. Even after that change, we were
still asserting if two separate functions that are eligible for ICF (same size,
same data, same number of relocs, same reloc types, ...) referred to
Undefineds. This fixes that oversight.
Differential Revision: https://reviews.llvm.org/D114195
When aligning the start address of an output section introduces a gap between the current dot pointer
and the new aligned address, we were already properly expanding the memory region, if available.
D74286 introduced a new behavior to also align the LMA address if an LMA region is specified.
However, this did not expand the corresponding LMA region.
Now, we also expand the LMA region if it is set.
This fixes PR52510.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D114166
ld64 doesn't warn on builds using `-install_name` if it's a bundle. But, the
current warning is nice to have because `install_name` only works with dylib.
To prevent an overflow of warnings in build logs and have parity with ld64,
create a `--warn-dylib-install-name` and `--warn-no-dylib-install-name` flag
that enables this LLD specific warning.
Reviewed By: #lld-macho, thakis
Differential Revision: https://reviews.llvm.org/D113534
In order to keep signal:noise high for the `__eh_frame` diff, I have teased-out the NFC changes and put them here.
Differential Revision: https://reviews.llvm.org/D114017
As discussed in https://reviews.llvm.org/D113809#3128636. It's a bit
unfortunate to move the asserts away from the structs whose sizes
they're checking, but it's a far better developer experience when one of
the asserts is violated, because you get a single error instead of every
single source file including the header erroring out.
The `r_address` field of `relocation_info` is only 4 bytes, so our
offset field (which is the `r_address` field adjusted for subsection
splitting) also only needs to be 4 bytes. This reduces the structure
size from 32 bytes to 24 bytes.
Combined with https://reviews.llvm.org/D113813, this is a minor perf
improvement for linking an internal app, tested on two machines:
```
smol-relocs baseline difference (95% CI)
sys_time 7.367 ± 0.138 7.543 ± 0.157 [ +0.9% .. +3.8%]
user_time 21.843 ± 0.351 21.861 ± 0.450 [ -1.3% .. +1.4%]
wall_time 20.301 ± 0.307 20.556 ± 0.324 [ +0.1% .. +2.4%]
samples 16 16
smol-relocs baseline difference (95% CI)
sys_time 2.923 ± 0.050 2.992 ± 0.018 [ +1.4% .. +3.4%]
user_time 10.345 ± 0.039 10.448 ± 0.023 [ +0.8% .. +1.2%]
wall_time 12.068 ± 0.071 12.229 ± 0.021 [ +1.0% .. +1.7%]
samples 15 12
```
More importantly though, this change by itself reduces our maximum
resident set size by 220 MB (2.75%, from 7.85 GB to 7.64 GB) on the
first machine. On the second machine, it reduces it by 125 MB (1.94%,
from 6.31 GB to 6.19 GB).
Reviewed By: #lld-macho, int3
Differential Revision: https://reviews.llvm.org/D113818
We can lay out Symbol more optimally to reduce its size from 56 bytes to
48 bytes by eliminating unnecessary padding, and we can lay out Defined
such that its bitfield members are placed in the tail padding of Symbol
(on ABIs which support this), to reduce it from 96 bytes to 80 bytes (8
bytes from the Symbol reduction, and 8 bytes from the tail padding
reuse).
This is perf-neutral for an internal app (results from two different
machines):
```
smol-syms baseline difference (95% CI)
sys_time 7.430 ± 0.202 7.440 ± 0.193 [ -2.6% .. +2.9%]
user_time 21.443 ± 0.513 21.206 ± 0.396 [ -3.3% .. +1.1%]
wall_time 20.453 ± 0.534 20.222 ± 0.488 [ -3.7% .. +1.5%]
samples 9 8
smol-syms baseline difference (95% CI)
sys_time 3.011 ± 0.050 3.040 ± 0.052 [ -0.4% .. +2.3%]
user_time 10.416 ± 0.075 10.496 ± 0.091 [ +0.1% .. +1.4%]
wall_time 12.229 ± 0.144 12.354 ± 0.192 [ -0.1% .. +2.1%]
samples 14 13
```
However, on the first machine, it reduces maximum resident set size by
65.9 MB (0.8%, from 7.92 GB to 7.85 GB). On the second machine, it
reduces it by 92 MB (1.4%, from 6.40 GB to 6.31 GB).
Reviewed By: #lld-macho, int3
Differential Revision: https://reviews.llvm.org/D113813
It was checking for 64-bit builds incorrectly. Unfortunately,
ConcatInputSection has grown a bit in the meantime, and I don't see any
obvious way to shrink it. Perhaps icfEqClass could use 32-bit hashes
instead of 64-bit ones, but xxHash64 is supposed to be much faster than
xxHash32 (https://github.com/Cyan4973/xxHash#benchmarks), so that sounds
like a loss. (Unrelatedly, we should really look at using XXH3 instead
of xxHash64 now.)
Reviewed By: #lld-macho, int3
Differential Revision: https://reviews.llvm.org/D113809
This is an NFC diff that prepares for pruning & relocating `__eh_frame`.
Along the way, I made the following changes to ...
* clarify usage of `section` vs. `subsection`
* remove `map` & `vec` from type names
* disambiguate class `Section` from template parameter `SectionHeader`.
Differential Revision: https://reviews.llvm.org/D113241
[NFC] As part of using inclusive language within the llvm project, this patch
replaces master with merged in these comments.
Reviewed By: #lld-macho, thakis
Differential Revision: https://reviews.llvm.org/D113903
Non-allocatable sections are not part of the memory image of the
program, so there is no need to find memory regions for them either
matching properties or handling explicit assignments. The early test
and return help to simplify LinkerScript::findMemoryRegion() a bit.
Differential Revision: https://reviews.llvm.org/D113768
```
/Users/ksmiley/dev/llvm-project/lld/MachO/Symbols.cpp:43:27: warning: field 'external' will be initialized after field 'weakDefCanBeHidden' [-Wreorder-ctor]
weakDef(isWeakDef), external(isExternal),
^
1 warning generated.
```
Differential Revision: https://reviews.llvm.org/D113823
autohide symbols behaves similarly to private_extern symbols.
However, LD64 allows exporting autohide symbols. LLD currently does not.
This patch allows LLD to export them.
Differential Revision: https://reviews.llvm.org/D113167
(Split from D113167)
Benchmarking on one of our large apps which exports a few thousands symbols,
this showed an improvement of ~17%.
x ./LLD_no_parallel.txt
+ ./LLD_with_parallel.txt
N Min Max Median Avg Stddev
x 10 84.01 89.41 88.64 87.693 1.7424061
+ 10 71.9 74.29 72.63 72.753 0.77734663
Difference at 95.0% confidence
-14.94 +/- 1.26763
-17.0367% +/- 1.44553%
(Student's t, pooled s = 1.34912)
(wallclock)
Differential Revision: https://reviews.llvm.org/D113820
Similar to D113702, but for the LSDAs. Clang seems to emit all LSDA
relocs as section relocs, but ld -r can turn those relocs into symbol
ones.
Reviewed By: #lld-macho, oontvoo
Differential Revision: https://reviews.llvm.org/D113721
Dedup'ing unwind info is tricky because each CUE contains a different
function address, if ICF operated naively and compared the entire
contents of each CUE, entries with identical unwind info but belonging
to different functions would never be considered identical. To work
around this problem, we slice away the function address before
performing ICF. We rely on `relocateCompactUnwind()` to correctly handle
these truncated input sections.
Here are the numbers before and after D109944, D109945, and this diff
were applied, as tested on my 3.2 GHz 16-Core Intel Xeon W:
Without any optimizations:
base diff difference (95% CI)
sys_time 0.849 ± 0.015 0.896 ± 0.012 [ +4.8% .. +6.2%]
user_time 3.357 ± 0.030 3.512 ± 0.023 [ +4.3% .. +5.0%]
wall_time 3.944 ± 0.039 4.032 ± 0.031 [ +1.8% .. +2.6%]
samples 40 38
With `-dead_strip`:
base diff difference (95% CI)
sys_time 0.847 ± 0.010 0.896 ± 0.012 [ +5.2% .. +6.5%]
user_time 3.377 ± 0.014 3.532 ± 0.015 [ +4.4% .. +4.8%]
wall_time 3.962 ± 0.024 4.060 ± 0.030 [ +2.1% .. +2.8%]
samples 47 30
With `-dead_strip` and `--icf=all`:
base diff difference (95% CI)
sys_time 0.935 ± 0.013 0.957 ± 0.018 [ +1.5% .. +3.2%]
user_time 3.472 ± 0.022 6.531 ± 0.046 [ +87.6% .. +88.7%]
wall_time 4.080 ± 0.040 5.329 ± 0.060 [ +30.0% .. +31.2%]
samples 37 30
Unsurprisingly, ICF is now a lot slower, likely due to the much larger
number of input sections it needs to process. But the rest of the
linker only suffers a mild slowdown.
Note that the compact-unwind-bad-reloc.s test was expanded because we
now handle the relocation for CUE's function address in a separate code
path from the rest of the CUE relocations. The extended test covers both
code paths.
Reviewed By: #lld-macho, oontvoo
Differential Revision: https://reviews.llvm.org/D109946
Clang seems to emit all functionAddress relocs as section relocs, but
`ld -r` can turn those relocs into symbol ones. It turns out that we
weren't handling that case correctly when the symbol was a weak def
whose definition did not prevail.
Reviewed By: #lld-macho, oontvoo
Differential Revision: https://reviews.llvm.org/D113702
Previously if you passed `-oso_prefix path/to/foo/` with a trailing
slash at the end, using `real_path` would remove that slash, but that
slash is necessary to make sure OSO prefix paths end up as valid
relative paths instead of starting with `/`.
Differential Revision: https://reviews.llvm.org/D113541
This brings back the original version of D81359.
I have found several use cases now.
* Unlike GNU ld, LLD's relocation processing is one pass. If we decide to
optimize(relax) R_X86_64_{,REX_}GOTPCRELX, we will suppress GOT generation and
cannot undo the decision later. Optimizing R_X86_64_REX_GOTPCRELX can usually
make it easy to hit `relocation R_X86_64_REX_GOTPCRELX out of range` because
the distance to GOT is usually shorter. Without --no-relax, the user has to
recompile with `-Wa,-mrelax-relocations=no`.
* The option would help during my investigationg of the root cause of https://git.kernel.org/linus/09e43968db40c33a73e9ddbfd937f46d5c334924
* There is need for relaxation for AArch64 & RISC-V. Implementing this for
x86-64 improves consistency with little target-specific cost (two-line
X86_64.cpp change).
Reviewed By: alexander-shaposhnikov
Differential Revision: https://reviews.llvm.org/D113615
Clang seems to emit all functionAddress relocs as section relocs, but
`ld -r` can turn those relocs into symbol ones. It turns out that we
weren't handling that case correctly when the symbol was a weak def
whose definition did not prevail.
Reviewed By: #lld-macho, oontvoo
Differential Revision: https://reviews.llvm.org/D113702
This change implements support for R_ARM_THM_JUMP8 relocation in
addition to R_ARM_THM_JUMP11 which is already supported by LLD.
Differential Revision: https://reviews.llvm.org/D21225