This reverts the PPC64PCRelLongBranchThunk part from D86706.
PPC64PCRelLongBranchThunk is the same as PPC64R12SetupStub.
Use `__gep_setup_` instead of `__long_branch_pcrel_` for the stub symbol name
as it more closely indicates the operation.
(Note: GNU ld uses `*.long_branch.*` and `*.plt_branch.*`).
Reviewed By: NeHuang, nemanjai
Differential Revision: https://reviews.llvm.org/D114656
There is a trend of having more optional options (usually security
hardening related) like -z cet-report=, -z bti-report=, -z force-bti.
If ld.lld 14.0.0 uses a warning, in 15/16/17/... timeframe when people
add new options to software, they can worry less about linker errors on ld.lld 14.0.0.
In some cases `-z foo` does essential work where a silent ignore can be
problematic, but the user has received a warning. From my observation, the
doing-essential-work `-z foo` is much fewer than the converse. In addition,
the user who cares can use `--fatal-warnings` (Note: GNU ld doesn't upgrade warnings to errors).
It is unclear whether we need something like `clang -Wunknown-warning-option`.
If we ever run into unfortunate transition like `-z start-stop-gc`, the
affected software (e.g. ldc is a compiler which passes linker options to the underlying ld)
can blindly add the `-z` option, without worrying it may cause a linker error to LLD 14.0.0.
Reviewed By: jrtc27, peter.smith
Differential Revision: https://reviews.llvm.org/D114748
For -z separate-code and -z separate-loadable-segments:
When RW is present, the RX to RW transition is aligned with max-page-size.
When RW is absent, the RX to non-SHF_ALLOC transition should use max-page-size as well.
Currently, LLD does not support the complete set of ARM group relocations.
Given that I intend to start using these in the Linux kernel [0], let's add
support for these.
This implements the group processing as documented in the ELF psABI. Notably,
this means support is dropped for very far symbol references that also carry a
small component, where the immediate is rotated in such a way that only part of
it wraps to the other end of the 32-bit word. To me, it seems unlikely that
this is something anyone could be relying on, but of course I could be wrong.
[0] https://lore.kernel.org/r/20211122092816.2865873-8-ardb@kernel.org/
Reviewed By: peter.smith, MaskRay
Differential Revision: https://reviews.llvm.org/D114172
This allows --power10-stubs= and --[no-]power10-stubs to override each other
(they are position dependent in GNU ld).
Also improve --help messages and the manpage.
Note: GNU ld's default "auto" mode uses heuristics to decide whether Power10
instructions are used. Arguably it is a design mistake of R_PPC64_REL24_NOTOC
(acked by the relevant folks on a libc-alpha discussion). We don't implement
"auto", so the default --power10-stubs is the same as "yes".
The canonical term is "extract" (GNU ld documentation, Solaris's `-z *extract`
options). Avoid inventing a term and match --why-extract. (ld64 prefers "load"
but the word is overloaded too much)
Mostly MFC, except for --help messages and the header row in
--print-archive-stats output.
The attribute 'r' allows (or disallows for the negative case) read-only
sections, i.e. ones without the SHF_WRITE flag, to be assigned to the
memory region. Before the patch, lld could put a section in the wrong
region or fail with "error: no memory region specified for section".
Differential Revision: https://reviews.llvm.org/D113771
The current TLSDESC optimization code assumes:
```
leaq x@tlsdesc(%rip), %rax
call *x@tlscall(%rax) # adjacent
```
From https://gitlab.freedesktop.org/mesa/mesa/-/issues/5665 , it seems that the
two instructions may not be adjacent in GCC 10's output:
```
leaq x@tlsdesc(%rip), %rax
something else
call *x@tlscall(%rax)
```
This patch supports the case. While here, support non-RAX registers for
R_X86_64_GOTPC32_TLSDESC, in case the compiler generates inefficient:
```
leaq x@tlsdesc(%rip), %rcx # or %rdx, %rbx, %rdi, ...
movq %rcx, %rax
call *x@tlscall(%rax) # GNU ld/gold error for non-RAX
```
Differential Revision: https://reviews.llvm.org/D114416
The section symbols aren't of much practical use when looking at
a linked image. This shrinks one observed mingw style unstripped
binary by 14%.
IMAGE_SYM_CLASS_LABEL is in spirit the same as a temporary assembler
label that isn't emitted on the object file level at all.
Differential Revision: https://reviews.llvm.org/D113866
Follup-up to D107533, where we replaced local syms with non-local.
It doesn't make sense to replace local symbol with lazy.
Differential Revision: https://reviews.llvm.org/D110040
Fix a null pointer dereference when .got.plt is discarded.
This also adds a test for discarding `.plt`.
Reviewed By: ikudrin
Differential Revision: https://reviews.llvm.org/D114180
Follow-up to https://reviews.llvm.org/D112643. Even after that change, we were
still asserting if two separate functions that are eligible for ICF (same size,
same data, same number of relocs, same reloc types, ...) referred to
Undefineds. This fixes that oversight.
Differential Revision: https://reviews.llvm.org/D114195
When aligning the start address of an output section introduces a gap between the current dot pointer
and the new aligned address, we were already properly expanding the memory region, if available.
D74286 introduced a new behavior to also align the LMA address if an LMA region is specified.
However, this did not expand the corresponding LMA region.
Now, we also expand the LMA region if it is set.
This fixes PR52510.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D114166
ld64 doesn't warn on builds using `-install_name` if it's a bundle. But, the
current warning is nice to have because `install_name` only works with dylib.
To prevent an overflow of warnings in build logs and have parity with ld64,
create a `--warn-dylib-install-name` and `--warn-no-dylib-install-name` flag
that enables this LLD specific warning.
Reviewed By: #lld-macho, thakis
Differential Revision: https://reviews.llvm.org/D113534
Non-allocatable sections are not part of the memory image of the
program, so there is no need to find memory regions for them either
matching properties or handling explicit assignments. The early test
and return help to simplify LinkerScript::findMemoryRegion() a bit.
Differential Revision: https://reviews.llvm.org/D113768
autohide symbols behaves similarly to private_extern symbols.
However, LD64 allows exporting autohide symbols. LLD currently does not.
This patch allows LLD to export them.
Differential Revision: https://reviews.llvm.org/D113167
Similar to D113702, but for the LSDAs. Clang seems to emit all LSDA
relocs as section relocs, but ld -r can turn those relocs into symbol
ones.
Reviewed By: #lld-macho, oontvoo
Differential Revision: https://reviews.llvm.org/D113721
Dedup'ing unwind info is tricky because each CUE contains a different
function address, if ICF operated naively and compared the entire
contents of each CUE, entries with identical unwind info but belonging
to different functions would never be considered identical. To work
around this problem, we slice away the function address before
performing ICF. We rely on `relocateCompactUnwind()` to correctly handle
these truncated input sections.
Here are the numbers before and after D109944, D109945, and this diff
were applied, as tested on my 3.2 GHz 16-Core Intel Xeon W:
Without any optimizations:
base diff difference (95% CI)
sys_time 0.849 ± 0.015 0.896 ± 0.012 [ +4.8% .. +6.2%]
user_time 3.357 ± 0.030 3.512 ± 0.023 [ +4.3% .. +5.0%]
wall_time 3.944 ± 0.039 4.032 ± 0.031 [ +1.8% .. +2.6%]
samples 40 38
With `-dead_strip`:
base diff difference (95% CI)
sys_time 0.847 ± 0.010 0.896 ± 0.012 [ +5.2% .. +6.5%]
user_time 3.377 ± 0.014 3.532 ± 0.015 [ +4.4% .. +4.8%]
wall_time 3.962 ± 0.024 4.060 ± 0.030 [ +2.1% .. +2.8%]
samples 47 30
With `-dead_strip` and `--icf=all`:
base diff difference (95% CI)
sys_time 0.935 ± 0.013 0.957 ± 0.018 [ +1.5% .. +3.2%]
user_time 3.472 ± 0.022 6.531 ± 0.046 [ +87.6% .. +88.7%]
wall_time 4.080 ± 0.040 5.329 ± 0.060 [ +30.0% .. +31.2%]
samples 37 30
Unsurprisingly, ICF is now a lot slower, likely due to the much larger
number of input sections it needs to process. But the rest of the
linker only suffers a mild slowdown.
Note that the compact-unwind-bad-reloc.s test was expanded because we
now handle the relocation for CUE's function address in a separate code
path from the rest of the CUE relocations. The extended test covers both
code paths.
Reviewed By: #lld-macho, oontvoo
Differential Revision: https://reviews.llvm.org/D109946
Clang seems to emit all functionAddress relocs as section relocs, but
`ld -r` can turn those relocs into symbol ones. It turns out that we
weren't handling that case correctly when the symbol was a weak def
whose definition did not prevail.
Reviewed By: #lld-macho, oontvoo
Differential Revision: https://reviews.llvm.org/D113702
Previously if you passed `-oso_prefix path/to/foo/` with a trailing
slash at the end, using `real_path` would remove that slash, but that
slash is necessary to make sure OSO prefix paths end up as valid
relative paths instead of starting with `/`.
Differential Revision: https://reviews.llvm.org/D113541
This brings back the original version of D81359.
I have found several use cases now.
* Unlike GNU ld, LLD's relocation processing is one pass. If we decide to
optimize(relax) R_X86_64_{,REX_}GOTPCRELX, we will suppress GOT generation and
cannot undo the decision later. Optimizing R_X86_64_REX_GOTPCRELX can usually
make it easy to hit `relocation R_X86_64_REX_GOTPCRELX out of range` because
the distance to GOT is usually shorter. Without --no-relax, the user has to
recompile with `-Wa,-mrelax-relocations=no`.
* The option would help during my investigationg of the root cause of https://git.kernel.org/linus/09e43968db40c33a73e9ddbfd937f46d5c334924
* There is need for relaxation for AArch64 & RISC-V. Implementing this for
x86-64 improves consistency with little target-specific cost (two-line
X86_64.cpp change).
Reviewed By: alexander-shaposhnikov
Differential Revision: https://reviews.llvm.org/D113615
Clang seems to emit all functionAddress relocs as section relocs, but
`ld -r` can turn those relocs into symbol ones. It turns out that we
weren't handling that case correctly when the symbol was a weak def
whose definition did not prevail.
Reviewed By: #lld-macho, oontvoo
Differential Revision: https://reviews.llvm.org/D113702
This change implements support for R_ARM_THM_JUMP8 relocation in
addition to R_ARM_THM_JUMP11 which is already supported by LLD.
Differential Revision: https://reviews.llvm.org/D21225
An orphan section should be placed in the same memory region as its
anchor section if the latter specifies the memory region explicitly.
If there is no explicit assignment for the anchor section in the linker
script, its memory region is selected by matching attributes, and the
same should be done for the orphan section.
Before the patch, some scripts that were handled smoothly in GNU ld
caused an "error: no memory region specified for section" in lld.
Differential Revision: https://reviews.llvm.org/D112925
Previously, our unwind info finalization logic assumed that the LSDA
section referenced by `__compact_unwind` was already finalized before
`__TEXT,__unwind_info` itself. However, that assumption could be broken
by the use of `-rename_section` -- it could be (and is) used to move
`__gcc_except_tab` it into a different segment later in the file.
(__TEXT is always the first non-zerofill segment, so any rename
basically guarantees that the section will be ordered after
`__unwind_info`.)
To handle this case, we compare LSDA relocations instead of their final
values in `UnwindInfoSection::finalize()`, and we actually relocate
those LSDAs in `UnwindInfoSection::writeTo()`. In order to do this, we
need an easy way to track which Symbol a given CUE corresponds to. My
solution was to change our `cuPtrVector` into a vector of indices, with
each index used for both the symbols vector (`symbolsVec`) as well as
the CUE vector (`cuVector`).
This change seems perf neutral. Numbers for linking chromium_framework
on my 16 core Mac Pro:
base diff difference (95% CI)
sys_time 1.248 ± 0.025 1.245 ± 0.026 [ -1.3% .. +0.8%]
user_time 3.588 ± 0.045 3.587 ± 0.037 [ -0.6% .. +0.5%]
wall_time 4.605 ± 0.069 4.595 ± 0.069 [ -1.0% .. +0.5%]
samples 42 26
Reviewed By: #lld-macho, oontvoo
Differential Revision: https://reviews.llvm.org/D113582
PR52408 reported an sh_info=0 instance. I have seen sh_info=0
independently before.
sh_info>=num_sections is probably very rare. Just use one diagnostic for
the two types of errors.
Delete invalid-relocations.test which is covered by invalid/bad-reloc-target.test
Differential Revision: https://reviews.llvm.org/D113466
PR/52372
Differential Revision: https://reviews.llvm.org/D112977
New changes:
- use llvm-otool instead of `otool` which doesn't in exist on non-OSX platforms
- add llvm-otool to the set of tools used by test so that the bot will use the <build_dir>/bin/llvm-otool instead of the unqualified `llvm-otool` (which may not exist)
- update tests since the latest (TOT) llvm-otool prints a space between two bytes and the old one doesn't.
This undocumented ld64 flag, based on the most recent ld64 source dump
from Xcode 12, only applies to i386. It seems like on all newer
architectures this behavior is the default.
Reviewed By: #lld-macho, int3
Differential Revision: https://reviews.llvm.org/D113070
By default with ld64, architecture mismatches are just warnings, then
this flag can be passed to make these fail. This matches that behavior.
Reviewed By: int3, #lld-macho
Differential Revision: https://reviews.llvm.org/D113082
D101513 means that we no longer need to specify `-pie` in most of our
test RUN commands. Let's clean up the unused flags so as not to confuse
future test writers.
Reviewed By: #lld-macho, oontvoo, MaskRay
Differential Revision: https://reviews.llvm.org/D113114
I'm not sure what the history is here but this test passes on macOS
today. It seems like we should unify these tests if they need to run
cross platform.
Reviewed By: #lld-macho, int3
Differential Revision: https://reviews.llvm.org/D113085
This matches ld64, and it's conceivable that projects try to read
this information off stderr for that reason.
--version keeps writing to stdout.
Differential Revision: https://reviews.llvm.org/D113020
One fewer warning.
In practice, lld already "implements" it. (ie., it does not do dtrace-dof processing ever).
Reviewed By: #lld-macho, thakis
Differential Revision: https://reviews.llvm.org/D112934
LLD_IN_TEST determines how many times each port's `main` function is
run in each LLD process, and setting LLD_IN_TEST=2 (or higher) is useful
for checking if we're cleaning up and resetting global state correctly.
Add a test suite parameter to enable this easily. There's work in
progress to remove global state (e.g. D108850), but this seems useful in
the interim.
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D112898
`fatal` should only be used for malformed inputs according to
ErrorHandler.h; `error` is more appropriate for missing arguments,
accompanied by a check to bail out early in case of the error. Some
tests need to be adjusted accordingly.
Makes `lld/test/MachO/arch.s` pass with `LLD_IN_TEST=2`.
Reviewed By: #lld-macho, int3
Differential Revision: https://reviews.llvm.org/D112879