- Don't subtract thunkSize from branchRange. Most places care about
the actual maximal branch range. Subtract thunkSize in the one place
that wants to leave room for a thunk.
- Set it to 0x800_0000 instead of 0xFF_FFFF
- Subtract 4 for the positive branch direction, since the displacement is a
two's complement 26-bit number that is sign-extended and multiplied by 4,
so its range is -0x800_0000..+0x7FF_FFFC
- Make boundary checks include the boundary values
This doesn't make a huge difference in practice. It's preparation
for a "real" fix for PR51578 -- but it also lets the repro in comment 0
in that bug place one more thunk before hitting the TODO.
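A minimal sketch of the inclusive range check these points imply (illustrative, not lld's code):
```
#include <cstdint>

// The arm64 branch displacement is a two's complement 26-bit value
// multiplied by 4, so the boundaries themselves are in range and the
// forward limit is 4 bytes short of the backward one.
constexpr int64_t branchRange = 0x8000000; // 128 MiB

bool inBranchRange(int64_t displacement) {
  return displacement >= -branchRange && displacement <= branchRange - 4;
}
```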
Differential Revision: https://reviews.llvm.org/D108897
Extend the range of calls beyond an architecture's limited branch range by first calling a thunk, which loads the far address into a scratch register (x16 on ARM64) and branches through it.
Other ports (COFF, ELF) use multiple passes with successively-refined guesses regarding the expansion of text-space imposed by thunk-space overhead. This MachO algorithm places thunks during MergedOutputSection::finalize() in a single pass using exact thunk-space overheads. Thunks are kept in a separate vector to avoid the overhead of inserting into the `inputs` vector of `MergedOutputSection`.
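For reference, a sketch of the ARM64 thunk body as three instruction words (adrp/add/br through x16; the page/pageoff immediates are filled in by the usual relocation fixups):
```
#include <cstdint>

// Materialize the callee's address in the scratch register x16, then
// branch through it.
static constexpr uint32_t thunkCode[] = {
    0x90000010, // adrp x16, target@page
    0x91000210, // add  x16, x16, target@pageoff
    0xd61f0200, // br   x16
};
```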
FIXME:
* arm64-stubs.s test is broken
* add thunk tests
* Handle thunks to DylibSymbol in MergedOutputSection::finalize()
Differential Revision: https://reviews.llvm.org/D100818
ARM_RELOC_BR24 is used for BL/BLX instructions from within ARM (i.e. not
Thumb) code. This diff just handles the basic case: branches from ARM to
ARM, or from ARM to Thumb where no shimming is required. (See comments
in ARM.cpp for why shims are required.)
Note: I will likely be deprioritizing ARM work for the near future to
focus on other parts of LLD. Apologies for the half-done state of this;
I'm just trying to wrap up what I've already worked on.
Reviewed By: #lld-macho, alexshap
Differential Revision: https://reviews.llvm.org/D101814
This just parses the `-arch armv7` and emits the right header flags.
The rest will be slowly fleshed out in upcoming diffs.
Reviewed By: #lld-macho, gkm
Differential Revision: https://reviews.llvm.org/D101557
The minuend (but not the subtrahend) can reference a section.
Note that we do not yet properly validate that the subtrahend isn't
referencing a section; I've filed PR50034 to track that.
I've also extended the reloc-subtractor.s test to reorder symbols, to
make sure that the addends are being associated with the minuend (and not
the subtrahend) relocation.
Fixes PR49999.
Reviewed By: #lld-macho, thakis
Differential Revision: https://reviews.llvm.org/D100804
MSVC from Visual Studio 2017 appears unhappy with it (it causes an
internal compiler error).
This also means that we need to avoid doing `sizeof(stubCode)`: array
function parameters decay to pointers, so `sizeof` would yield
`sizeof(int *)` instead of `sizeof(int[N])`.
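A standalone illustration of the decay pitfall (hypothetical function names, not lld's code):
```
#include <cstdio>

// The array parameter is really `const int *`, so sizeof measures a
// pointer, not the array.
void byValue(const int stubCode[4]) {
  printf("%zu\n", sizeof(stubCode)); // 8 on a 64-bit target
}

// Taking the array by reference preserves its type, so sizeof works.
void byReference(const int (&stubCode)[4]) {
  printf("%zu\n", sizeof(stubCode)); // 16
}

int main() {
  const int stubCode[4] = {0, 1, 2, 3};
  byValue(stubCode);
  byReference(stubCode);
}
```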
Reviewed By: #lld-macho, gkm
Differential Revision: https://reviews.llvm.org/D100605
arm64_32 uses 32-bit GOT loads, so we should accept those
instructions in `ARM64Common::relaxGotLoad()` too.
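A hedged sketch of the relaxation (simplified from the real code; register-field details elided):
```
#include <cstdint>

// When the referent ends up defined in the output, the LDR that would
// read a GOT slot is rewritten into an ADD computing the address
// directly. 0xF9 is the top byte of a 64-bit LDR (unsigned immediate);
// 0xB9 is the 32-bit LDR that arm64_32 uses for its GOT loads.
static bool relaxGotLoad(uint32_t &instruction) {
  uint8_t opcode = instruction >> 24;
  if (opcode != 0xF9 && opcode != 0xB9)
    return false; // not an LDR we can relax
  // Keep Rn (bits 9:5) and Rd (bits 4:0); the PAGEOFF12 fixup applied
  // afterwards fills in the immediate field.
  instruction = 0x91000000 | (instruction & 0x3FF); // add Xd, Xn, #0
  return true;
}
```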
Reviewed By: #lld-macho, gkm
Differential Revision: https://reviews.llvm.org/D100229
From what I can tell, it's pretty similar to arm64. The two main differences
are:
1. No 64-bit relocations
2. Stub code writes to 32-bit registers instead of 64-bit
Plus, of course, the various on-disk structures like `segment_command` use
the 32-bit instead of the 64-bit variants.
Reviewed By: #lld-macho, gkm
Differential Revision: https://reviews.llvm.org/D99822
It's likely redundant, per discussion with @gkm. The BYTE8
attribute covers the bit width requirement already.
Reviewed By: #lld-macho, gkm
Differential Revision: https://reviews.llvm.org/D100133
The main challenge was handling the different on-disk structures (e.g.
`mach_header` vs `mach_header_64`). I tried to strike a balance between
sprinkling `target->wordSize == 8` checks everywhere (branchy = slow, and ugly)
and templatizing everything (causes code bloat, also ugly). I think I struck a
decent balance by judicious use of type erasure.
Note that LLD-ELF has a similar architecture, though it seems to use more templating.
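A hedged sketch of the type-erasure pattern (struct and class names are illustrative stand-ins, not the actual implementation):
```
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <memory>

// Simplified stand-ins for mach_header / mach_header_64.
struct Header32 { uint32_t magic, cputype, cpusubtype, filetype,
                           ncmds, sizeofcmds, flags; };
struct Header64 { uint32_t magic, cputype, cpusubtype, filetype,
                           ncmds, sizeofcmds, flags, reserved; };

// One virtual dispatch hides the word-size choice, so call sites neither
// branch on the word size nor carry a template parameter.
struct HeaderWriter {
  virtual ~HeaderWriter() = default;
  virtual std::size_t size() const = 0;
  virtual void writeTo(uint8_t *buf) const = 0;
};

template <class Hdr, uint32_t Magic>
struct HeaderWriterImpl final : HeaderWriter {
  std::size_t size() const override { return sizeof(Hdr); }
  void writeTo(uint8_t *buf) const override {
    Hdr hdr{};
    hdr.magic = Magic;
    std::memcpy(buf, &hdr, sizeof(hdr));
  }
};

std::unique_ptr<HeaderWriter> makeHeaderWriter(bool is64Bit) {
  if (is64Bit)
    return std::make_unique<HeaderWriterImpl<Header64, 0xfeedfacfu>>();
  return std::make_unique<HeaderWriterImpl<Header32, 0xfeedfaceu>>();
}
```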
Linking chromium_framework takes about the same time before and after this
change:
    N          Min          Max       Median          Avg       Stddev
x  20         4.52         4.67        4.595       4.5945  0.044423204
+  20          4.5         4.71        4.575        4.582  0.056344803
No difference proven at 95.0% confidence
Reviewed By: #lld-macho, oontvoo
Differential Revision: https://reviews.llvm.org/D99633
Within `lld/macho/`, only `InputFiles.cpp` and `Symbols.h` require the `macho::` namespace qualifier to disambiguate references to `class Symbol`.
Add braces to outer `for` of a 5-level single-line `if`/`for` nest.
Differential Revision: https://reviews.llvm.org/D99555
This diff required fixing `getEmbeddedAddend` to apply sign
extension to 32-bit values. We were previously passing around wrong
64-bit addend values that became "right" after being truncated back to
32-bit.
I've also made `getEmbeddedAddend` return a signed int, which is similar
to what LLD-ELF does for its `getImplicitAddend`.
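A minimal illustration of the fix (hypothetical helper shape, not lld's exact signature):
```
#include <cstdint>
#include <cstring>

// Read the embedded 32-bit addend as signed so that widening to 64 bits
// sign-extends: -8 stays -8 rather than becoming 0xFFFFFFF8.
int64_t getEmbeddedAddend32(const uint8_t *loc) {
  int32_t addend;
  std::memcpy(&addend, loc, sizeof(addend));
  return addend;
}
```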
`reportRangeError`, `checkUInt`, and `checkInt` are counterparts of similar
functions in LLD-ELF.
Reviewed By: #lld-macho, thakis
Differential Revision: https://reviews.llvm.org/D98387
SUBTRACTOR relocations are always paired with UNSIGNED
relocations to indicate a pair of symbols whose address difference we
want. Functionally they are like a single relocation: only one pointer
gets written / relocated. Previously, we would handle these pairs by
skipping over the SUBTRACTOR relocation and writing the pointer when
handling the UNSIGNED reloc. This diff reverses things, so we write
while handling SUBTRACTORs and skip over the UNSIGNED relocs instead.
Being able to distinguish between SUBTRACTOR and UNSIGNED relocs in the
write phase (i.e. inside `relocateOne`) is useful for the upcoming range
check diff: we want to check that SUBTRACTOR relocs write signed values,
but UNSIGNED relocs (naturally) write unsigned values.
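A hedged sketch of the pair's semantics (helper name assumed): the SUBTRACTOR reloc supplies the subtrahend, its paired UNSIGNED reloc supplies the minuend, and a single signed value is written:
```
#include <cstdint>

// Value written at the relocated location.
int64_t subtractorValue(uint64_t minuendVA, uint64_t subtrahendVA,
                        int64_t addend) {
  return static_cast<int64_t>(minuendVA - subtrahendVA) + addend;
}
```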
Reviewed By: #lld-macho, thakis
Differential Revision: https://reviews.llvm.org/D98386
The previous implementation miscalculated the addend, resulting
in an underflow. This meant that every SIGNED_N section relocation would
be associated with the last subsection (since the addend would now be a
huge number). We were "lucky" that this mistake was typically cancelled
out -- 64-to-32-bit-truncation meant that the final value was correct,
as long as subsections were not rearranged.
Reviewed By: #lld-macho, thakis
Differential Revision: https://reviews.llvm.org/D98385
With this, llvm-tblgen no longer tries and fails to allocate 7953 petabytes
when it runs during the build. Instead, `check-llvm` with lld/mac as host
linker now completes without any failures on an m1 mac.
This vector op handling code matches what happens in:
- ld64's OutputFile::applyFixUps() in OutputFile.cpp for kindStoreARM64PageOff12
- lld.ld64.darwinold's offset12KindFromInstruction() in
lld/lib/ReaderWriter/MachO/ArchHandler_arm64.cpp for offset12scale16
- RuntimeDyld's decodeAddend() in
llvm/lib/ExecutionEngine/RuntimeDyld/Targets/RuntimeDyldMachOAArch64.h for
ARM64_RELOC_PAGEOFF12
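A condensed sketch of the decoding logic those three implementations share (helper name assumed):
```
#include <cstdint>

// For a load/store, bits 31:30 give log2 of the access size, except that
// 128-bit vector loads/stores encode size == 0b00 with bit 23 set (and
// bit 26 marking a SIMD&FP op), in which case the scale is 16.
static int pageOff12Shift(uint32_t instruction) {
  int shift = instruction >> 30; // the `size` field
  if (shift == 0 && (instruction & 0x04800000) == 0x04800000)
    shift = 4; // 128-bit vector load/store
  return shift;
}
```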
Fixes PR49444.
Differential Revision: https://reviews.llvm.org/D98053
On arm64, UNSIGNED relocs are the only ones that use embedded addends
instead of the ADDEND relocation.
Also ensure that the addend works when UNSIGNED is part of a SUBTRACTOR
pair.
Reviewed By: #lld-macho, alexshap
Differential Revision: https://reviews.llvm.org/D97105
Also add a few asserts to verify that we are indeed handling an
UNSIGNED relocation as the minuend. I haven't made it an actual
user-facing error since I don't think llvm-mc is capable of generating
SUBTRACTOR relocations without an associated UNSIGNED.
Reviewed By: #lld-macho, smeenai
Differential Revision: https://reviews.llvm.org/D97103
`llvm-mc` doesn't generate any relocations for subtractions
between local symbols -- they must be global -- so the previous test
wasn't actually testing any relocation logic. I've fixed that and
extended the test to cover r_length=3 relocations as well as both x86_64
and arm64.
Reviewed By: #lld-macho, smeenai
Differential Revision: https://reviews.llvm.org/D97057
I've adjusted the RelocAttrBits to better fit the semantics of
the relocations. In particular:
1. *_UNSIGNED relocations are no longer marked with the `TLV` bit, even
though they can occur within TLV sections. Instead the `TLV` bit is
reserved for relocations that can reference thread-local symbols, and
*_UNSIGNED relocations have their own `UNSIGNED` bit. The previous
implementation caused TLV and regular UNSIGNED semantics to be
conflated, resulting in rebase opcodes being incorrectly emitted for TLV
relocations.
2. I've added a new `POINTER` bit to denote non-relaxable GOT
relocations. This distinction isn't important on x86 -- the GOT
relocations there are either relaxable or non-relaxable loads -- but
arm64 has `GOT_LOAD_PAGE21` which loads the page that the referent
symbol is in (regardless of whether the symbol ends up in the GOT). This
relocation must reference a GOT symbol (so must have the `GOT` bit set)
but isn't itself relaxable (so must not have the `LOAD` bit). The
`POINTER` bit is used for relocations that *must* reference a GOT
slot.
3. A similar situation occurs for TLV relocations.
4. ld64 supports both a pcrel and an absolute version of
ARM64_RELOC_POINTER_TO_GOT. But the semantics of the absolute version
are pretty weird -- it results in the value of the GOT slot being
written, rather than the address. (That means a reference to a
dynamically-bound slot will result in zeroes being written.) The
programs I've tried linking don't use this form of the relocation, so
I've dropped our partial support for it by removing the relevant
RelocAttrBits.
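A hedged sketch of the resulting scheme (bit values illustrative; the real definitions live in lld/MachO/Target.h):
```
#include <cstdint>

enum class RelocAttrBits : uint32_t {
  PCREL = 1u << 0,    // value is PC-relative
  BRANCH = 1u << 1,   // function-call branch (may go via a stub)
  GOT = 1u << 2,      // references a GOT symbol
  TLV = 1u << 3,      // can reference thread-local symbols
  LOAD = 1u << 4,     // relaxable load (e.g. GOT_LOAD)
  POINTER = 1u << 5,  // must reference a GOT slot (non-relaxable)
  UNSIGNED = 1u << 6, // absolute pointer; rebased, not TLV-bound
};
```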
Reviewed By: alexshap
Differential Revision: https://reviews.llvm.org/D97031
This is an initial base commit for ARM64 target arch support. I don't claim that it is complete or bug-free, but wish to put it out for review now that some basic things like branch target & load/store address relocs are working.
I can add more tests to this base commit, or add them in follow-up commits.
It is not entirely clear whether to use the "ARM64" (Apple) or "AArch64" (non-Apple) naming convention; guidance is appreciated.
Differential Revision: https://reviews.llvm.org/D88629
Add per-reloc-type attribute bits and migrate code from the per-target files into target-independent code, driven by reloc attributes.
Many cleanups.
Differential Revision: https://reviews.llvm.org/D95121
This is a refactor to pave the way for supporting paired-ADDEND for ARM64. The only paired reloc type for X86_64 is SUBTRACTOR. In a later diff, I will add SUBTRACTOR for both X86_64 and ARM64.
* s/`getImplicitAddend`/`getAddend`/ because it handles all forms of addend: implicit, explicit, paired.
* add predicate `bool isPairedReloc()`
* the range check on `relInfo.r_symbolnum` is internal and unrelated to user input, so use `assert()`, not `error()`
* minor cleanups & rearrangements in `InputFile::parseRelocations()`
Differential Revision: https://reviews.llvm.org/D90614
Their addresses are already encoded as section-relative offsets, so
there's no need to rebase them at runtime. {D85080} has some context
on the weirdness of TLV sections.
Fixes llvm.org/PR48491.
Reviewed By: #lld-macho, thakis
Differential Revision: https://reviews.llvm.org/D93257
Speeds up linking Chromium's base_unittests almost 10%. According to ministat:
    N          Min          Max       Median          Avg        Stddev
x   5   0.72193289   0.73073196   0.72560811   0.72565799  0.0032265649
+   5   0.64069581   0.67173195   0.65876389   0.65796089   0.011349451
Difference at 95.0% confidence
        -0.0676971 +/- 0.0121682
        -9.32906% +/- 1.67685%
        (Student's t, pooled s = 0.00834328)
Differential Revision: https://reviews.llvm.org/D92734
Apparently this is used in real programs. I've handled this by reusing
the logic we already have for branch (function call) relocations.
Reviewed By: #lld-macho, smeenai
Differential Revision: https://reviews.llvm.org/D87852
* Implement rebase opcodes. Rebase opcodes tell dyld where absolute
addresses have been encoded in the binary. If the binary is not loaded
at its preferred address, dyld has to rebase these addresses by adding
an offset to them.
* Support `-pie` and use it to test rebase opcodes.
This is necessary for absolute address references in dylibs, bundles, etc.
to work.
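A minimal conceptual sketch of what the rebase amounts to at load time (not dyld's real decoder):
```
#include <cstdint>
#include <cstring>
#include <vector>

// The rebase opcodes boil down to a list of image offsets holding
// absolute addresses computed against the preferred base; if the image
// loads elsewhere, each one is adjusted by the slide.
void applyRebases(uint8_t *image, const std::vector<uint64_t> &offsets,
                  uint64_t slide) {
  for (uint64_t off : offsets) {
    uint64_t addr;
    std::memcpy(&addr, image + off, sizeof(addr));
    addr += slide;
    std::memcpy(image + off, &addr, sizeof(addr));
  }
}
```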
Reviewed By: #lld-macho, gkm
Differential Revision: https://reviews.llvm.org/D87199
We can have GOT_LOAD relocations that reference `__dso_handle`.
However, our binding opcode encoder doesn't support binding to the DSOHandle
symbol. Instead of adding support for that, I decided it would be cleaner to
implement GOT_LOAD relaxation since `__dso_handle`'s location is always
statically known.
Reviewed By: #lld-macho, smeenai
Differential Revision: https://reviews.llvm.org/D86641
Since there is no "weak lazy" lookup, function calls to weak symbols are
always non-lazily bound. We emit both regular non-lazy bindings as well
as weak bindings, in order that the weak bindings may overwrite the
non-lazy bindings if an appropriate symbol is found at runtime. However,
the bound addresses will still be written (non-lazily) into the
LazyPointerSection.
Reviewed By: #lld-macho, smeenai
Differential Revision: https://reviews.llvm.org/D86573
Previously, we were only emitting regular bindings to weak
dynamic symbols; this diff adds support for the weak bindings too, which
can overwrite the regular bindings at runtime. We also treat weak
defined global symbols similarly -- since they can also be interposed at
runtime, they need to be treated as potentially dynamic symbols.
Note that weak bindings differ from regular bindings in that they do not
specify the dylib to do the lookup in (i.e. weak symbol lookup happens
in a flat namespace.)
Differential Revision: https://reviews.llvm.org/D86572
Previously, the BindingEntry struct could only store bindings to offsets
within InputSections. Since the GOTSection and TLVPointerSections are
OutputSections, I handled those in a separate code path. However, this
makes it awkward to support weak bindings properly without code
duplication. This diff allows BindingEntries to point directly to
OutputSections, simplifying the upcoming weak binding implementation.
Along the way, I also converted a bunch of functions taking references
to symbols to take pointers instead. Given how much casting we do for
Symbol (especially in the upcoming weak binding diffs), it's cleaner
this way.
Differential Revision: https://reviews.llvm.org/D86571
References to symbols in dylibs work very similarly regardless of
whether the symbol is a TLV. The main difference is that we have a
separate `__thread_ptrs` section that acts as the GOT for these
thread-locals.
We can identify thread-locals in dylibs by a flag in their export trie
entries, and we cross-check it with the relocations that refer to them
to ensure that we are not using a GOT relocation to reference a
thread-local (or vice versa).
Reviewed By: #lld-macho, smeenai
Differential Revision: https://reviews.llvm.org/D85081
Note: What ELF refers to as "TLS", Mach-O seems to refer to as "TLV", i.e.
thread-local variables.
This diff implements support for TLV relocations that reference defined
symbols. On x86_64, TLV relocations are always used with movq opcodes, so for
defined TLVs, we don't need to create a synthetic section to store the
addresses of the symbols -- we can just convert the `movq` to a `leaq`.
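A hedged sketch of that rewrite (helper name assumed): with `loc` pointing at the instruction's 32-bit displacement field, the opcode byte sits two bytes earlier:
```
#include <cassert>
#include <cstdint>

// A defined thread-local needs its address, not a load through a pointer
// slot, so swap mov (0x8B) for lea (0x8D); the ModRM byte is unchanged.
void relaxTlvMovToLea(uint8_t *loc) {
  assert(loc[-2] == 0x8B && "expected movq opcode");
  loc[-2] = 0x8D;
}
```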
One notable quirk of Mach-O's TLVs is that absolute-address relocations
inside TLV-defining sections behave differently -- their addresses are
no longer absolute, but relative to the start of the target section.
(AFAICT, RIP-relative relocations are not allowed in these sections.)
Reviewed By: #lld-macho, compnerd, smeenai
Differential Revision: https://reviews.llvm.org/D85080
This diff adds support for weak definitions, though it doesn't handle weak
symbols in dylibs quite correctly -- we need to emit binding opcodes for them
in the weak binding section rather than the lazy binding section.
What *is* covered in this diff:
1. Reading the weak flag from symbol table / export trie, and writing it to the
export trie
2. Refining the symbol table's rules for choosing one symbol definition over
another. Wrote a few dozen test cases to make sure we were matching ld64's
behavior.
We can now link basic C++ programs.
Reviewed By: #lld-macho, compnerd
Differential Revision: https://reviews.llvm.org/D83532
Previously, we only supported binding dysyms to the GOT. This
diff adds support for binding them to any arbitrary section. C++
programs appear to use this, I believe for vtables and type_info.
This diff also makes our bind opcode encoding a bit smarter -- we now
encode just the differences between bindings, which will make things
more compact.
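A simplified sketch of the delta idea (real lld emits the full Mach-O bind opcode set):
```
#include <cstdint>
#include <vector>

struct Binding { uint8_t dylibOrdinal; /* symbol, segment, offset... */ };

// Field-setting opcodes are emitted only when a field differs from the
// previous binding; each entry then costs a single DO_BIND.
void encodeBindings(const std::vector<Binding> &bindings,
                    std::vector<uint8_t> &out) {
  int lastOrdinal = -1;
  for (const Binding &b : bindings) {
    if (b.dylibOrdinal != lastOrdinal) {
      out.push_back(0x10 | (b.dylibOrdinal & 0xF)); // SET_DYLIB_ORDINAL_IMM
      lastOrdinal = b.dylibOrdinal;
    }
    // ...likewise for symbol name, segment/offset, addend...
    out.push_back(0x90); // BIND_OPCODE_DO_BIND
  }
}
```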
I was initially concerned about the performance overhead of iterating
over these relocations, but it turns out that the number of such
relocations is small. A quick analysis of my llvm-project build
directory showed that < 1.3% out of ~7M relocations are RELOC_UNSIGNED
bindings to symbols (including both dynamic and static symbols).
Reviewed By: #lld-macho, smeenai
Differential Revision: https://reviews.llvm.org/D83103
Summary:
Turns out this case is actually really common -- it happens whenever there's
a reference to an `extern` variable that ends up statically linked.
Depends on D80856.
Reviewers: ruiu, pcc, MaskRay, smeenai, alexshap, gkm, Ktwu, christylee
Reviewed By: smeenai
Subscribers: llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D80857
Summary:
As far as I can tell, it's identical to _GOT_LOAD. llvm-mc has the following
comment explaining why _GOT exists:
```
// x86_64 distinguishes movq foo@GOTPCREL so that the linker can
// rewrite the movq to an leaq at link time if the symbol ends up in
// the same linkage unit.
```
Depends on D80855.
Reviewers: ruiu, pcc, MaskRay, smeenai, alexshap, gkm, Ktwu, christylee
Reviewed By: MaskRay, smeenai
Subscribers: llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D80856
Summary:
We should be reading / writing our addends / relocated addresses based on
r_length, and not just based on the type of the relocation. But since only
some r_length values are valid for a given reloc type, I've also added some
validation.
ld64 has code to allow for r_length = 0 in X86_64_RELOC_BRANCH relocs, but I'm
not sure how to create such a relocation...
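A hedged sketch of r_length-driven access (helper name assumed): the relocated field is `1 << r_length` bytes wide, so reads are dispatched on the length rather than hard-coded per relocation type:
```
#include <cassert>
#include <cstdint>
#include <cstring>

uint64_t readRelocTarget(const uint8_t *loc, uint8_t rLength) {
  assert((rLength == 2 || rLength == 3) && "r_length must be validated");
  if (rLength == 2) { // 4-byte field
    uint32_t v;
    std::memcpy(&v, loc, sizeof(v));
    return v;
  }
  uint64_t v; // r_length == 3: 8-byte field
  std::memcpy(&v, loc, sizeof(v));
  return v;
}
```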
Reviewed By: smeenai
Differential Revision: https://reviews.llvm.org/D80854
I considered making a `Target::validate()` method, but I wasn't sure how
I felt about the overhead of doing yet another switch-dispatch on the
relocation type, so I put the validation in `relocateOne` instead...
might be a bit of a micro-optimization, but `relocateOne` does assume
certain things about the relocations it gets, and this error handling
makes that explicit, so it's not a totally unreasonable code
organization.
Reviewed By: smeenai
Differential Revision: https://reviews.llvm.org/D80049