GNU ld.bfd supports linking directly against DLLs without using an
import library, and some projects have picked up on this habit.
(There's no one single unsurmountable issue with using import
libraries, but this is a regularly surfacing missing feature.)
As long as one is linking by name (instead of by ordinal), the DLL
export table contains most of the information needed. (One can
inspect what section a symbol points at, to see if it's a function
or data symbol. The practical implementation of this loops over all
sections for each symbol, but as long as they're not very many, that
should hopefully be tolerable performance wise.)
One exception where the information in the DLL isn't entirely enough
is on i386 with stdcall functions; depending on how they're done,
the exported function name can be a plain undecorated name, while
the import library would contain the full decorated symbol name. This
issue is addressed separately in a different patch.
This is implemented mimicing the structure of a regular import library,
with one InputFile corresponding to the static archive that just adds
lazy symbols, which then are fetched when they are needed. When such
a symbol is fetched, we synthesize a coff_import_header structure
in memory and create a regular ImportFile out of it.
The implementation could be even smaller by just creating ImportFiles
for every symbol available immediately, but that would have the
drawback of actually ending up importing all symbols unless running
with GC enabled (and mingw mode defaults to having it disabled for
historical reasons).
Differential Revision: https://reviews.llvm.org/D104530
`__cfstring` is a special literal section, so instead of breaking it up
at symbol boundaries, we break it up at fixed-width boundaries (since
each literal is the same size). Symbols can only occur at one of those
boundaries, so this is strictly more powerful than
`.subsections_via_symbols`.
With that in place, we then run the section through ICF.
This change is about perf-neutral when linking chromium_framework.
Reviewed By: #lld-macho, gkm
Differential Revision: https://reviews.llvm.org/D105045
Previously, we only applied the renames to
ConcatOutputSections.
Reviewed By: #lld-macho, thakis
Differential Revision: https://reviews.llvm.org/D105079
For
```
SECTIONS {
text.0 : {}
text.1 : {}
text.2 : {}
} INSERT AFTER .data;
```
the current order is `.data text.2 text.1 text.0`. It makes more sense to
preserve the specified order and thus improve compatibility with GNU ld.
For
```
SECTIONS { text.0 : {} } INSERT AFTER .data;
SECTIONS { text.3 : {} } INSERT AFTER .data;
```
GNU ld somehow collects sections with `INSERT AFTER .data` together (IMO
inconsistent) but I think it makes more sense to execute the commands in order
and get `.data text.3 text.0` instead.
Reviewed By: peter.smith
Differential Revision: https://reviews.llvm.org/D105158
See the comment for my understanding of -no-pie and -shared expectation.
-no-pie has freedom on choices. We choose dynamic relocations to be consistent
with the handling of GOT-generating relocations.
Note: GNU ld has arch-varying behaviors and its x86 -pie has a very
complex rule:
if there is at least one GOT-generating or PLT-generating relocation and
-z dynamic-undefined-weak (enabled by default) is in effect, generate a
dynamic relocation.
We don't emulate its rule.
Reviewed By: peter.smith
Differential Revision: https://reviews.llvm.org/D105164
A couple of filecheck patterns had not been hooked up with
the patterns suffering from some drift. As this test is old
and llvm-objdump has improved a lot, take this opportunity to
hide the instruction encoding. I've also taken out a lot of
the explanatory comments that llvm-objdump improvements make
redundant, as these comments oftern don't get updated when addresses
change.
Differential Revision: https://reviews.llvm.org/D104907
There are a couple of problems with the code to patch
unrelocated BLX instructions:
1. The calculation of the PC needs to take into account
the alignment of the instruction. The Thumb BLX
uses alignDown(PC, 4) for the source address.
2. The calculation of the PC bias is hard-coded to 4
which works for Thumb, but when there is a BLX the
branch will be in Arm state so it needs an 8 byte
PC bias.
No asssembler generates an unrelocated BLX instruction
so these problems do not affect real world programs.
However we should still fix them.
Differential Revision: https://reviews.llvm.org/D104905
SymtabSection::emitStabs() writes the symbol table in the order
of externalSymbols, which has the order of symtab->getSymbols(),
which is just the order symbols are added to the symbol table.
In practice, symbols in the symbol files of input .o files are
sorted, but since that's not guaranteed we sort them in
ObjFile::parseSymbols(). To make sure several symbols with the same
address keep the order they're in the input file, we have to use
stable_sort().
In practice, std::sort() on already-sorted inputs won't change the order
of just adjacent elements, and while in theory std::sort() could use a
random pivot, in practice the code should be deterministic as it was
previously too.
But now lld/test/MachO/stabs.s passes with LLVM_ENABLE_EXPENSIVE_CHECKS=ON
(the last test that was failing with that set).
Fixes a regression from D99972.
While here, remove an empty section in stabs.s and move
.subsections_via_symbols to the end where it usually is (this part no
behavior change).
Differential Revision: https://reviews.llvm.org/D105071
Fixes PR50637.
Downstream bug: https://crbug.com/1218958
Currently, we split __cstring along symbol boundaries with .subsections_via_symbols
when not deduplicating, and along null bytes when deduplicating. This change splits
along null bytes unconditionally, and preserves original alignment in the non-
deduplicated case.
Removing subsections-section-relocs.s because with this change, __cstring
is never reordered based on the order file.
Differential Revision: https://reviews.llvm.org/D104919
The two different thread_local_regular sections (__thread_data and
more_thread_data) had nondeterminstic ordering for two reasons:
1. https://reviews.llvm.org/D102972 changed concatOutputSections
from MapVector to DenseMap, so when we iterate it to make
output segments, we would add the two sections to the __DATA
output segment in nondeterministic order.
2. The same change also moved the two stable_sort()s for segments
and sections to sort(). Since sections with assigned priority
(such as TLV data) have the same priority for all sections,
this is incorrect -- we must use stable_sort() so that the
initial (input-order-based) order remains.
As a side effect, we now (deterministically) put the __common
section in front of __bss (while previously we happened to
put it after it). (__common and __bss are both zerofill so
both have order INT_MAX, but common symbols are added to
inputSections before normal sections are collected.)
Makes lld/test/MachO/tlv.s and lld/test/MachO/tlv-dylib.s pass with
LLVM_ENABLE_EXPENSIVE_CHECKS=ON.
Differential Revision: https://reviews.llvm.org/D105054
Make sure we don't wrongly fold two sections that refer to
symbols with the same value if they are not both absolute /
non-absolute.
Reviewed By: #lld-macho, gkm
Differential Revision: https://reviews.llvm.org/D104876
Literal sections can be deduplicated before running ICF. That makes it
easy to compare them during ICF: we can tell if two literals are
constant-equal by comparing their offsets in their OutputSection.
LLD-ELF takes a similar approach.
Reviewed By: #lld-macho, gkm
Differential Revision: https://reviews.llvm.org/D104671
This test has always failed on 32 bit armv8 bots:
https://lab.llvm.org/buildbot/#/builders/178/builds/42
Due to the output order of some symbols changing.
I don't think this is an Arm specific issue so disabling
on 32 bit while it's investigated.
The patch reuses the common code to print memory operand addresses as
instruction comments. This helps to align the comments and enables using
target-specific comment markers when `evaluateMemoryOperandAddress()` is
implemented for them.
Differential Revision: https://reviews.llvm.org/D104861
libunwind uses unwind info to find the function address belonging
to the current instruction pointer. libunwind/src/CompactUnwinder.hpp's
step functions read functionStart for UNWIND_X86_64_MODE_STACK_IND
(and for nothing else), so these encodings need a dedicated entry
per function, so that the runtime can get the stacksize off the
`subq` instrunction in the function's prologue.
This matches ld64.
(CompactUnwinder.hpp from https://opensource.apple.com/source/libunwind/
also reads functionStart in a few more cases if `SUPPORT_OLD_BINARIES` is set,
but it defaults to 0, and ld64 seems to not worry about these additional
cases.)
Related upstream bug: https://crbug.com/1220175
Differential Revision: https://reviews.llvm.org/D104978
Modify the D13209 logic: for a script inside the sysroot, if an absolute path
does not exist, report an error instead of falling back to the path without the
sysroot prefix.
This matches GNU ld, which makes sense to me: we don't want to find an arbitrary
file in the host.
Reviewed By: ikudrin
Differential Revision: https://reviews.llvm.org/D104894
Commit 728cc0075e made comdat symbols
from LTO objects be treated as any regular comdat symbol. This works
great for symbols that actually are IMAGE_COMDAT_SELECT_ANY, but
if the symbols have a less trivial selection type that require comparing
either the section chunk size or contents, we can't check that before
actually doing the LTO compilation.
Therefore bring back one aspect of handling from before; that comdat
resolution with a leader from an LTO symbol is essentially skipped,
like it was before 728cc0075e.
Differential Revision: https://reviews.llvm.org/D104605
Add tests for pending TODOs, plus some global cleanups:
* No fold: func has personality/LSDA
* Fold: reference to absolute symbol with different name but identical value
* No fold: reloc references to absolute symbols with different values
* No fold: N_ALT_ENTRY symbols
Differential Revision: https://reviews.llvm.org/D104721
This particular linker invocation is only run to check that we accept
options, but we don't inspect the generated command line. As all other
commands in the file have their output piped to FileCheck, the lit test
doesn't print any other output; therefore silence this one for consistency
as well.
This is consistent with how clang prints its internal commands with
-### and -v.
When linking with -verbose, we get log messages from the actual
linking written to stderr. By printing the command to the same stream,
we make sure they appear in a sensible chronological order.
Differential Revision: https://reviews.llvm.org/D104527
Fixes PR50529. With this, lld-linked Chromium base_unittests passes on arm macs.
Surprisingly, no measurable impact on link time.
Differential Revision: https://reviews.llvm.org/D104681
Real zerofill sections go after __thread_bss, since zerofill sections
must all be at the end of their segment and __thread_bss must be right
after __thread_data.
Works fine already, but wasn't tested as far as I can tell.
Also tweak comment about zerofill sections a bit.
No behavior change.
Differential Revision: https://reviews.llvm.org/D104609
Pass the original argv[0] to the coff linker, as the coff linker uses
the basename of argv[0] as the log prefix.
This makes error messages to be printed with a "ld.lld:" prefix
instead of "lld-link:". The current "lld-link:" prefix can be confusing
to users, as they're invoking the MinGW linker (and might not even have
a lld-link executable).
Keep the first argument as lld-link when printing the command line, to
make it an actually reproducible standalone command.
Differential Revision: https://reviews.llvm.org/D104526
The exact location doesn't matter, but it should be in front
of __thread_bss. We put it right in front of __thread_data
which is where ld64 seems to put it as well.
Fixes PR50769.
(As mentioned on the bug, there is probably a more structural
fix too, see comment 5. If we don't address this, it's likely
we'll run into this again with other synthetic sections. But
for now, let's fix the immediate breakage.)
Differential Revision: https://reviews.llvm.org/D104596
...instead of S_NON_LAZY_SYMBOL_POINTERS. This matches ld64.
Part of PR50769.
While here, also remove an old TODO that was done in D87178.
Differential Revision: https://reviews.llvm.org/D104594
Previously, we asserted that such a case was invalid, but in fact
`ld -r` can emit such symbols if the input contained a (true) private
extern, or if it contained a symbol started with "L".
Non-extern symbols marked as private extern are essentially equivalent
to regular TU-scoped symbols, so no new functionality is needed.
Reviewed By: #lld-macho, thakis
Differential Revision: https://reviews.llvm.org/D104502
The `icf` command-line option is not present in ld64, so it should use the LLD option syntax, which begins with double dashes and separates primary option from any suboption with the equal sign.
Differential Revision: https://reviews.llvm.org/D104548
This change revisits https://reviews.llvm.org/D79248 which originally
added support for the --unresolved-symbols flag.
At the time I thought it would make sense to add a third option to this
flag called `import-functions` but it turns out (as was suspects by on
the reviewers IIRC) that this option can be authoganal.
Instead I've added a new option called `--import-undefined` that only
operates on symbols that can be imported (for example, function symbols
can always be imported as opposed to data symbols we can only be
imported when compiling with PIC).
This option gives us the full expresivitiy that emscripten needs to be
able allow reporting of undefined data symbols as well as the option to
disable that.
This change does remove the `--unresolved-symbols=import-functions`
option, which is been in the codebase now for about a year but I would
be extremely surprised if anyone was using it.
Differential Revision: https://reviews.llvm.org/D103290
ICF = Identical C(ode|OMDAT) Folding
This is the LLD ELF/COFF algorithm, adapted for MachO. So far, only `-icf all` is supported. In order to support `-icf safe`, we will need to port address-significance tables (`.addrsig` directives) to MachO, which will come in later diffs.
`check-{llvm,clang,lld}` have 0 regressions for `lld -icf all` vs. baseline ld64.
We only run ICF on `__TEXT,__text` for reasons explained in the block comment in `ConcatOutputSection.cpp`.
Here is the perf impact for linking `chromium_framekwork` on a Mac Pro (16-core Xeon W) for the non-ICF case vs. pre-ICF:
```
N Min Max Median Avg Stddev
x 20 4.27 4.44 4.34 4.349 0.043029977
+ 20 4.37 4.46 4.405 4.4115 0.025188761
Difference at 95.0% confidence
0.0625 +/- 0.0225658
1.43711% +/- 0.518873%
(Student's t, pooled s = 0.0352566)
```
Reviewed By: #lld-macho, int3
Differential Revision: https://reviews.llvm.org/D103292
We need to dedup archive loads (similar to what we do for dylib
loads).
I noticed this issue after building some Swift stuff that used
`-force_load_swift_libs`, as it caused some Swift archives to be loaded
many times.
Reviewed By: #lld-macho, thakis, MaskRay
Differential Revision: https://reviews.llvm.org/D104353
After D77330, the comments are inconsistent with the disassembled code.
As the value of `far` has been changed, a thunk to reach it is now
generated, and target addresses of branch instructions are different
from what was initially expected.
The patch fixes that and makes the test closer to what it was originally.
Differential Revision: https://reviews.llvm.org/D104286
It's a warning in ld64. While having LLD be stricter would be nice, it
makes it harder for it to be a drop-in replacement into existing builds.
Reviewed By: #lld-macho, thakis
Differential Revision: https://reviews.llvm.org/D104333
During PHDR creation, the case where an output section does not require a
PT_LOAD header but still occupies memory in the current VMA region was not handled.
If such an output section interleaves two output sections that have the same
VMA and LMA regions set, we would previously re-use the existing PT_LOAD header
for the second output section.
However, since the memory region is not contiguous, we need to start a new PT_LOAD
segment.
This fixes https://bugs.llvm.org/show_bug.cgi?id=50558
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D103815
This implements https://sourceware.org/bugzilla/show_bug.cgi?id=26404
An `OVERWRITE_SECTIONS` command is a `SECTIONS` variant which contains several
output section descriptions. The output sections do not have specify an order.
Similar to `INSERT [BEFORE|AFTER]`, `LinkerScript::hasSectionsCommand` is not
set, so the built-in rules (see `docs/ELF/linker_script.rst`) still apply.
`OVERWRITE_SECTIONS` can be more convenient than `INSERT` because it does not
need an anchor section.
The initial syntax is intentionally narrow to facilitate backward compatible
extensions in the future. Symbol assignments cannot be used.
This feature is versatile. To list a few usage:
* Use `section : { KEEP(...) }` to retain input sections under GC
* Define encapsulation symbols (start/end) for an output section
* Use `section : ALIGN(...) : { ... }` to overalign an output section (similar to ld64 `-sectalign`)
When an output section is specified by both `OVERWRITE_SECTIONS` and
`INSERT`, `INSERT` is processed after overwrite sections. To make this work,
this patch changes `InsertCommand` to use name based matching instead of pointer
based matching. (This may cause a difference when `INSERT` moves one output
section more than once. Such duplicate commands should not be used in practice
(seems that in GNU ld the output sections may just disappear).)
A linker script can be used without -T/--script. The traditional `SECTIONS`
commands are concatenated, so a wrong rule can be more noticeable from the
section order. This feature if misused can be less noticeable, just like
`INSERT`.
Differential Revision: https://reviews.llvm.org/D103303
Sort the addresses stored in FunctionStarts section.
Previously we were encoding potentially large numbers (due to unsigned overflow).
Test plan: make check-all
Differential revision: https://reviews.llvm.org/D103662
Literal sections are not atomically live or dead. Rather,
liveness is tracked for each individual literal they contain. CStrings
have their liveness tracked via a `live` bit in StringPiece, and
fixed-width literals have theirs tracked via a BitVector.
The live-marking code now needs to track the offset within each section
that is to be marked live, in order to identify the literal at that
particular offset.
Numbers for linking chromium_framework on my 3.2 GHz 16-Core Intel Xeon W
with both `-dead_strip` and `--deduplicate-literals`, with and without this diff
applied:
```
N Min Max Median Avg Stddev
x 20 4.32 4.44 4.375 4.372 0.03105174
+ 20 4.3 4.39 4.36 4.3595 0.023277502
No difference proven at 95.0% confidence
```
This gives us size savings of about 0.4%.
Reviewed By: #lld-macho, thakis
Differential Revision: https://reviews.llvm.org/D103979
Conceptually, the implementation is pretty straightforward: we put each
literal value into a hashtable, and then write out the keys of that
hashtable at the end.
In contrast with ELF, the Mach-O format does not support variable-length
literals that aren't strings. Its literals are either 4, 8, or 16 bytes
in length. LLD-ELF dedups its literals via sorting + uniq'ing, but since
we don't need to worry about overly-long values, we should be able to do
a faster job by just hashing.
That said, the implementation right now is far from optimal, because we
add to those hashtables serially. To parallelize this, we'll need a
basic concurrent hashtable (only needs to support concurrent writes w/o
interleave reads), which shouldn't be to hard to implement, but I'd like
to punt on it for now.
Numbers for linking chromium_framework on my 3.2 GHz 16-Core Intel Xeon W:
N Min Max Median Avg Stddev
x 20 4.27 4.39 4.315 4.3225 0.033225703
+ 20 4.36 4.82 4.44 4.4845 0.13152846
Difference at 95.0% confidence
0.162 +/- 0.0613971
3.74783% +/- 1.42041%
(Student's t, pooled s = 0.0959262)
This corresponds to binary size savings of 2MB out of 335MB, or 0.6%.
It's not a great tradeoff as-is, but as mentioned our implementation can
be signficantly optimized, and literal dedup will unlock more
opportunities for ICF to identify identical structures that reference
the same literals.
Reviewed By: #lld-macho, gkm
Differential Revision: https://reviews.llvm.org/D103113
Be less clever when writing the indirect symbols in LC_DYSYMTAB:
lld used to make point __stubs and __la_symbol_ptr point at the
same bytes in the indirect symbol table in the __LINKEDIT segment.
That confused strip, so write the same bytes twice and make
__stubs and __la_symbol_ptr point at one copy each, so that they
don't share data. This unconfuses strip, and seems to be what ld64
does too, so hopefully tools are generally more used to this.
This makes the output binaries a bit larger, but not much: 4 bytes
for roughly each called function from a dylib and each weak function.
Chromium Framewoork grows by 6536 bytes, clang-format by a few hundred.
With this, `strip -x Chromium\ Framework` works (244 MB before stripping
to 171 MB after stripping, compared to 236 MB=>164 MB with ld64). Running
strip without `-x` produces the same error message now for lld-linked
Chromium Framework as for when using ld64 as a linker.
`strip clang-format` also works now but didn't previously.
Fixes PR50657.
Differential Revision: https://reviews.llvm.org/D104081
In a -no-pie link we optimize R_PLT_PC to R_PC. Currently we resolve a branch
relocation to the link-time zero address. However such a choice tends to cause
relocation overflow possibility for RISC architectures.
* aarch64: GNU ld: rewrite the instruction to a NOP; ld.lld: branch to the next instruction
* mips: GNU ld: branch to the start of the text segment (?); ld.lld: branch to zero
* ppc32: GNU ld: rewrite the instruction to a NOP; ld.lld: branch to the current instruction
* ppc64: GNU ld: rewrite the instruction to a NOP; ld.lld: branch to the current instruction
* riscv: GNU ld: branch to the absolute zero address (with instruction rewriting)
* i386/x86_64: GNU ld/ld.lld: branch to the link-time zero address
I think that resolving to the same location is a good choice. The instruction,
if triggered, is clearly an undefined behavior. Resolving to the same location
can cause an infinite loop (making the user aware of the issue) while ensuring
no overflow.
Reviewed By: jrtc27
Differential Revision: https://reviews.llvm.org/D103001
For debugging dylib loading, it's useful to have some insight into what
the linker is doing.
ld64 has the undocumented RC_TRACE_DYLIB_SEARCHING env var
for this printing dylib search candidates.
This adds a flag --print-dylib-search to make lld print the seame information.
It's useful for users, but also for writing tests. The output is formatted
slightly differently than ld64, but we still support RC_TRACE_DYLIB_SEARCHING
to offer at least a compatible way to trigger this.
ld64 has both `-print_statistics` and `-trace_symbol_output` to enable
diagnostics output. I went with "print" since that seems like a more
straightforward name.
Differential Revision: https://reviews.llvm.org/D103985
In a framework Foo.framework, Foo.framework/Foo is usually a relative
symbolic link to Foo.framework/Versions/Current/Foo,
and Foo.framework/Versions/Current is usually a relative symbolic
link to A.
Our tests used absolute symbolic links. Now they use relative symbolic links.
No behavior change, just makes the tests more representative of the real world.
(implicit-dylib.s omits the "Current" folder too, but I'm not changing that
here.)
Differential Revision: https://reviews.llvm.org/D103998
This is important for Frameworks, which are usually symlinks.
ld64 gets this right for @rpath that's replaced with @loader_path, but not for
bare @loader_path -- ld64's code calls realpath() in that case too, but ignores
the result.
ld64 somehow manages to find libbar1.dylib in the test without the
explicit `-rpath` in Foo1. I don't understand why or how. But this
change is a step forward and fixes an immediate problem I'm having,
so let's start with this :)
Differential Revision: https://reviews.llvm.org/D103990
It causes libraries whose names start with "swift" to be force-loaded.
Note that unlike the more general `-force_load`, this flag only applies
to libraries specified via LC_LINKER_OPTIONS, and not those passed on
the command-line. This is what ld64 does.
Reviewed By: #lld-macho, thakis
Differential Revision: https://reviews.llvm.org/D103709
Our implementation draws heavily from LLD-ELF's, which in turn delegates
its string deduplication to llvm-mc's StringTableBuilder. The messiness of
this diff is largely due to the fact that we've previously assumed that
all InputSections get concatenated together to form the output. This is
no longer true with CStringInputSections, which split their contents into
StringPieces. StringPieces are much more lightweight than InputSections,
which is important as we create a lot of them. They may also overlap in
the output, which makes it possible for strings to be tail-merged. In
fact, the initial version of this diff implemented tail merging, but
I've dropped it for reasons I'll explain later.
**Alignment Issues**
Mergeable cstring literals are found under the `__TEXT,__cstring`
section. In contrast to ELF, which puts strings that need different
alignments into different sections, clang's Mach-O backend puts them all
in one section. Strings that need to be aligned have the `.p2align`
directive emitted before them, which simply translates into zero padding
in the object file.
I *think* ld64 extracts the desired per-string alignment from this data
by preserving each string's offset from the last section-aligned
address. I'm not entirely certain since it doesn't seem consistent about
doing this; but perhaps this can be chalked up to cases where ld64 has
to deduplicate strings with different offset/alignment combos -- it
seems to pick one of their alignments to preserve. This doesn't seem
correct in general; we can in fact can induce ld64 to produce a crashing
binary just by linking in an additional object file that only contains
cstrings and no code. See PR50563 for details.
Moreover, this scheme seems rather inefficient: since unaligned and
aligned strings are all put in the same section, which has a single
alignment value, it doesn't seem possible to tell whether a given string
doesn't have any alignment requirements. Preserving offset+alignments
for strings that don't need it is wasteful.
In practice, the crashes seen so far seem to stem from x86_64 SIMD
operations on cstrings. X86_64 requires SIMD accesses to be
16-byte-aligned. So for now, I'm thinking of just aligning all strings
to 16 bytes on x86_64. This is indeed wasteful, but implementation-wise
it's simpler than preserving per-string alignment+offsets. It also
avoids the aforementioned crash after deduplication of
differently-aligned strings. Finally, the overhead is not huge: using
16-byte alignment (vs no alignment) is only a 0.5% size overhead when
linking chromium_framework.
With these alignment requirements, it doesn't make sense to attempt tail
merging -- most strings will not be eligible since their overlaps aren't
likely to start at a 16-byte boundary. Tail-merging (with alignment) for
chromium_framework only improves size by 0.3%.
It's worth noting that LLD-ELF only does tail merging at `-O2`. By
default (at `-O1`), it just deduplicates w/o tail merging. @thakis has
also mentioned that they saw it regress compressed size in some cases
and therefore turned it off. `ld64` does not seem to do tail merging at
all.
**Performance Numbers**
CString deduplication reduces chromium_framework from 250MB to 242MB, or
about a 3.2% reduction.
Numbers for linking chromium_framework on my 3.2 GHz 16-Core Intel Xeon W:
N Min Max Median Avg Stddev
x 20 3.91 4.03 3.935 3.95 0.034641016
+ 20 3.99 4.14 4.015 4.0365 0.0492336
Difference at 95.0% confidence
0.0865 +/- 0.027245
2.18987% +/- 0.689746%
(Student's t, pooled s = 0.0425673)
As expected, cstring merging incurs some non-trivial overhead.
When passing `--no-literal-merge`, it seems that performance is the
same, i.e. the refactoring in this diff didn't cost us.
N Min Max Median Avg Stddev
x 20 3.91 4.03 3.935 3.95 0.034641016
+ 20 3.89 4.02 3.935 3.9435 0.043197831
No difference proven at 95.0% confidence
Reviewed By: #lld-macho, gkm
Differential Revision: https://reviews.llvm.org/D102964
When a library "host"'s reexports change their installName with
`$ld$os10.11$install_name$host`, we used to write a load command for "host" but
write the version numbers of the reexport instead of "host". This fixes that.
I first thought that the rule is to take the version numbers from the library
that originally had that install name (implemented in D103819), but that's not
what ld64 seems to be doing: It takes the version number from the first dylib
with that install name it loads, and it loads the reexporting library before
the reexports. We already did most of that, we just added reexports before the
reexporter. After this change, we add the reexporter before the reexports.
Addresses https://bugs.llvm.org/show_bug.cgi?id=49800#c11 part 1.
(ld64 seems to add reexports after processing _all_ files on the command line,
while we add them right after the reexporter. For the common case of reexport +
$ld$ symbol changing back to the exporter name, this doesn't make a difference,
but you can construct a case where it does. I expect this to not make a
difference in practice though.)
Differential Revision: https://reviews.llvm.org/D103821
Our behavior here already matched ld64, now we have a test for it.
(ld64 even strips the library here if you also pass -needed_library bar.dylib.
That seems wrong to me, and lld honors needed_library in that case.)
Differential Revision: https://reviews.llvm.org/D103812
Also adjust a few comments, and move the DylibFile comment talking about
umbrella next to the parameter again.
Differential Revision: https://reviews.llvm.org/D103783
This diff adds first bits to support special symbols $ld$previous* in LLD.
$ld$* symbols modify properties/behavior of the library
(e.g. its install name, compatibility version or hide/add symbols)
for specific target versions.
Test plan: make check-lld-macho
Differential revision: https://reviews.llvm.org/D103505
D103423 neglected to call `parseReexports()` for nested TBD
documents, leading to symbol resolution failures when trying to look up
a symbol nested more than one level deep in a TBD file. This fixes the
regression and adds a test.
It also appears that `umbrella` wasn't being set properly when calling
`parseLoadCommands` -- it's supposed to resolve to `this` if `nullptr`
is passed. I didn't write a failing test case for this but I've made
`umbrella` a member so the previous behavior should be preserved.
Reviewed By: #lld-macho, thakis
Differential Revision: https://reviews.llvm.org/D103586
Make sure that comdat symbols also have a non-null dummy
SectionChunk associated.
This requires moving around an existing FIXME regarding comdats in
LTO.
Differential Revision: https://reviews.llvm.org/D103012
Also adds support for live_support sections, no_dead_strip sections,
.no_dead_strip symbols.
Chromium Framework 345MB unstripped -> 250MB stripped
(vs 290MB unstripped -> 236M stripped with ld64).
Doing dead stripping is a bit faster than not, because so much less
data needs to be processed:
% ministat lld_*
x lld_nostrip.txt
+ lld_strip.txt
N Min Max Median Avg Stddev
x 10 3.929414 4.07692 4.0269079 4.0089678 0.044214794
+ 10 3.8129408 3.9025559 3.8670411 3.8642573 0.024779651
Difference at 95.0% confidence
-0.144711 +/- 0.0336749
-3.60967% +/- 0.839989%
(Student's t, pooled s = 0.0358398)
This interacts with many parts of the linker. I tried to add test coverage
for all added `isLive()` checks, so that some test will fail if any of them
is removed. I checked that the test expectations for the most part match
ld64's behavior (except for live-support-iterations.s, see the comment
in the test). Interacts with:
- debug info
- export tries
- import opcodes
- flags like -exported_symbol(s_list)
- -U / dynamic_lookup
- mod_init_funcs, mod_term_funcs
- weak symbol handling
- unwind info
- stubs
- map files
- -sectcreate
- undefined, dylib, common, defined (both absolute and normal) symbols
It's possible it interacts with more features I didn't think of,
of course.
I also did some manual testing:
- check-llvm check-clang check-lld work with lld with this patch
as host linker and -dead_strip enabled
- Chromium still starts
- Chromium's base_unittests still pass, including unwind tests
Implemenation-wise, this is InputSection-based, so it'll work for
object files with .subsections_via_symbols (which includes all
object files generated by clang). I first based this on the COFF
implementation, but later realized that things are more similar to ELF.
I think it'd be good to refactor MarkLive.cpp to look more like the ELF
part at some point, but I'd like to get a working state checked in first.
Mechanical parts:
- Rename canOmitFromOutput to wasCoalesced (no behavior change)
since it really is for weak coalesced symbols
- Add noDeadStrip to Defined, corresponding to N_NO_DEAD_STRIP
(`.no_dead_strip` in asm)
Fixes PR49276.
Differential Revision: https://reviews.llvm.org/D103324
We used to not print dylibs referenced by other dylibs in `-t` mode. This
affected reexports, and with `-flat_namespace` also just dylibs loaded by
dylibs. Now we print them.
Fixes PR49514.
Differential Revision: https://reviews.llvm.org/D103428
In some cases, we end up with several distinct DylibFiles that
have the same install name. Only emit a single LC_LOAD_DYLIB in
those cases.
This happens in 3 cases I know of:
1. Some tbd files are symlinks. libpthread.tbd is a symlink against
libSystem.tbd for example, so `-lSystem -lpthread` loads
libSystem.tbd twice. We could (and maybe should) cache loaded
dylibs by realpath() to catch this.
2. Some tbd files are copies of each other. For example,
CFNetwork.framework/CFNetwork.tbd and
CFNetwork.framework/Versions/A/CFNetwork.tbd are two distinct
copies of the same file. The former is found by
`-framework CFNetwork` and the latter by the reexport in
CoreServices.tbd. We could conceivably catch this by
making `-framework` search look in `Versions/Current` instead
of in the root, and/or by using a content hash to cache
tbd files, but that's starting to sound complicated.
3. Magic $ld$ symbol processing can change the install name of
a dylib based on the target platform_version. Here, two
truly distinct dylibs can have the same install name.
So we need this code to deal with (3) anyways. Might as well use
it for 1 and 2, at least for now :)
With this (and D103430), clang-format links in the same dylibs
when linked with lld and ld64.
Differential Revision: https://reviews.llvm.org/D103488
This omits load commands for unreferenced dylibs if:
- the dylib was loaded implicitly,
- it is marked MH_DEAD_STRIPPABLE_DYLIB
- or -dead_strip_dylibs is passed
This matches ld64.
Currently, the "is dylib referenced" state is computed before dead code
stripping and is not updated after dead code stripping. This too matches ld64.
We should do better here.
With this, clang-format linked with lld (like with ld64) no longer has
libobjc.A.dylib in `otool -L` output. (It was implicitly loaded as a reexport
of CoreFoundation.framework, but it's not needed.)
Differential Revision: https://reviews.llvm.org/D103430
.s files with `-g` generate __debug_aranges on darwin/arm64 for some
reason, and those lead to `nullptr` symbols. Don't crash on that.
Fixes PR50517.
Differential Revision: https://reviews.llvm.org/D103350
Ghashing is probably going to be faster in most cases, even without
precomputed ghashes in object files.
Here is my table of results linking clang.pdb:
-------------------------------
| threads | GHASH | NOGHASH |
-------------------------------
| j1 | 51.031s | 25.141s |
| j2 | 31.079s | 22.109s |
| j4 | 18.609s | 23.156s |
| j8 | 11.938s | 21.984s |
| j28 | 8.375s | 18.391s |
-------------------------------
This shows that ghashing is faster if at least four cores are available.
This may make the linker slower if most cores are busy in the middle of
a build, but in that case, the linker probably isn't on the critical
path of the build. Incremental build performance is arguably more
important than highly contended batch build link performance.
The -time output indicates that ghash computation is the dominant
factor:
Input File Reading: 924 ms ( 1.8%)
GC: 689 ms ( 1.3%)
ICF: 527 ms ( 1.0%)
Code Layout: 414 ms ( 0.8%)
Commit Output File: 24 ms ( 0.0%)
PDB Emission (Cumulative): 49938 ms ( 94.8%)
Add Objects: 46783 ms ( 88.8%)
Global Type Hashing: 38983 ms ( 74.0%)
GHash Type Merging: 5640 ms ( 10.7%)
Symbol Merging: 2154 ms ( 4.1%)
Publics Stream Layout: 188 ms ( 0.4%)
TPI Stream Layout: 18 ms ( 0.0%)
Commit to Disk: 2818 ms ( 5.4%)
--------------------------------------------------
Total Link Time: 52669 ms (100.0%)
We can speed that up with a faster content hash (not SHA1).
Differential Revision: https://reviews.llvm.org/D102888
This diff paves the way for {D102964} which adds a new kind of
InputSection.
We previously maintained section ordering implicitly: we created
InputSections as we parsed each file in command-line order, and passed
on this ordering when we created OutputSections and OutputSegments by
iterating over these InputSections. The implicitness of the ordering
made it difficult to refactor the code to e.g. handle a new type of
InputSection. As such, I've codified the ordering explicitly via
`inputOrder` fields. This also allows us to use `sort` instead of
`stable_sort`.
Benchmarking chromium_framework on my 3.2 GHz 16-Core Intel Xeon W:
N Min Max Median Avg Stddev
x 20 4.23 4.35 4.27 4.274 0.030157481
+ 20 4.24 4.38 4.27 4.2815 0.033759989
No difference proven at 95.0% confidence
Reviewed By: #lld-macho, alexshap
Differential Revision: https://reviews.llvm.org/D102972
Given the following scenario:
```
// Cat.cpp
struct Animal { virtual void makeNoise() const = 0; };
struct Cat : Animal { void makeNoise() const override; };
extern "C" int puts(char const *);
void Cat::makeNoise() const { puts("Meow"); }
void doThingWithCat(Animal *a) { static_cast<Cat *>(a)->makeNoise(); }
// CatUser.cpp
struct Animal { virtual void makeNoise() const = 0; };
struct Cat : Animal { void makeNoise() const override; };
void doThingWithCat(Animal *a);
void useDoThingWithCat() {
Cat *d = new Cat;
doThingWithCat(d);
}
// cat.ver
{
global: _Z17useDoThingWithCatv;
local: *;
};
$ clang++ Cat.cpp CatUser.cpp -fpic -flto=thin -fwhole-program-vtables
-shared -O3 -fuse-ld=lld -Wl,--lto-whole-program-visibility
-Wl,--version-script,cat.ver
```
We cannot devirtualize `Cat::makeNoise`. The issue is complex:
Due to `-fsplit-lto-unit` and usage of type metadata, we place the Cat
vtable declaration into module 0 and the Cat vtable definition with type
metadata into module 1, causing duplicate entries (Undefined followed by
Defined) in the `lto::InputFile::symbols()` output.
In `BitcodeFile::parse`, after processing the `Undefined` then the
`Defined`, the final state is `Defined`.
In `BitcodeCompiler::add`, for the first symbol, `computeBinding`
returns `STB_LOCAL`, then we reset it to `Undefined` because it is
prevailing (`versionId` is `preserved`). For the second symbol, because
the state is now `Undefined`, `computeBinding` returns `STB_GLOBAL`,
causing `ExportDynamic` to be true and suppressing devirtualization.
In D77280, the `computeBinding` change used a stricter `isDefined()`
condition to make weak``Lazy` symbol work.
This patch relaxes the condition to weaker `!isLazy()` to keep it
working while making the devirtualization work as well.
Differential Revision: https://reviews.llvm.org/D98686
If we support local signature symbols (PR43094), these tests would fail.
When the support is added, new tests (local signature symbol specific) should be developed.
Prior to this change build with `-shared/-pie` and using TLS (but
without -shared-memory) would hit this assert:
"Currenly only a single data segment is supported in PIC mode"
This is because we were not including TLS data when merging data
segments. However, when we build without shared-memory (i.e. without
threads) we effectively lower away TLS into a normal active data
segment.. so we were ending up with two active data segments: the merged
data, and the lowered TLS data.
To fix this problem we can instead avoid combining data segments at
all when running in shared memory mode (because in this case all
segment initialization is passive). And then in non-shared memory
mode we know that TLS has been lowered and therefore we can can
and should combine all segments.
So with this new behavior we have two different modes:
1. With shared memory / mutli-threaded: Never combine data segments
since it is not necessary. (All data segments as passive already).
2. Wihout shared memory / single-threaded: Combine *all* data segments
since we treat TLS as normal data. (We end up with a single
active data segment).
Differential Revision: https://reviews.llvm.org/D102937
The COFF driver produces an ABSOLUTE relocation base for an ADDR32
relocation type and the system is 64 bits (machine=AMD64). The
relocation information won't be added in the output and could
produce an incorrect address access during run-time. This change
set checks if the relocation type is IMAGE_REL_AMD64_ADDR32 and
if so, adds the relocated symbol as IMAGE_REL_BASED_HIGHLOW base.
Differential Revision: https://reviews.llvm.org/D96619
Previously we simply didn't check this. Prereq to make the test suite
pass with ghash enabled by default.
Differential Revision: https://reviews.llvm.org/D102885
__table_base is know 64-bit, since in LLVM it represents a function pointer offset
__table_base32 is a copy in wasm32 for use in elem init expr, since no truncation may be used there.
New reloc R_WASM_TABLE_INDEX_REL_SLEB64 added
Differential Revision: https://reviews.llvm.org/D101784
These symbols are long, and they tend to cause the PDB file size to
overflow. They are generally not necessary when debugging problems in
user code.
This change reduces the size of chrome.dll.pdb with coverage from
6,937,108,480 bytes to 4,690,210,816 bytes.
Differential Revision: https://reviews.llvm.org/D102719
lld/MachO/Driver.cpp and lld/MachO/SyntheticSections.cpp include
llvm/Config/config.h which doesn't exist when building standalone lld.
This patch replaces llvm/Config/config.h include with llvm/Config/llvm-config.h
just like it is in lld/ELF/Driver.cpp and HAVE_LIBXAR with LLVM_HAVE_LIXAR and
moves LLVM_HAVE_LIBXAR from config.h to llvm-config.h
Also it adds LLVM_HAVE_LIBXAR to LLVMConfig.cmake and links liblldMachO2.so
with XAR_LIB if LLVM_HAVE_LIBXAR is set.
Differential Revision: https://reviews.llvm.org/D102084
In LC_DYSYMTAB, private externs were still emitted as exported symbols instead
of as locals.
Fixes PR50373. See bug for details.
Differential Revision: https://reviews.llvm.org/D102662
Besides -Bdynamic and -Bstatic, ld documents additional aliases for both of these options. Instead of -Bstatic, one may write -dn, -non_shared or -static. Instead of -Bdynamic one may write -dy or -call_shared. Source: https://sourceware.org/binutils/docs-2.36/ld/Options.html
This patch adds those aliases to the MinGW driver of lld for the sake of ld compatibility.
Encountered this case while compiling a static Qt 6.1 distribution and got build failures as -static was passed directly to the linker, instead of through the compiler driver.
Differential Revision: https://reviews.llvm.org/D102637
Has the effect that `__mh_execute_header` stays in the symbol table of
outputs even after running `strip` on the output. I don't know if that's
important for anything -- my motivation for the patch is just is to make
the output more similar to ld64.
(Corresponds to symbolTableInAndNeverStrip in ld64.)
Differential Revision: https://reviews.llvm.org/D102619
D62727 removed GotEntrySize and GotPltEntrySize with a comment that they
are always equal to wordsize(), but that is not entirely true: X32 has a
word size of 4, but needs 8-byte GOT entries. This restores gotEntrySize
for both, adjusted for current naming conventions, but defaults it to
config->wordsize to keep things simple for architectures other than
x86_64.
This partially reverts D62727.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D102509
This option will be available in GNU ld 2.27 (https://sourceware.org/bugzilla/show_bug.cgi?id=27834).
This option can cancel previously specified -Bsymbolic and
-Bsymbolic-functions. This is useful for excluding some links when the
default uses -Bsymbolic-functions.
Reviewed By: jhenderson, peter.smith
Differential Revision: https://reviews.llvm.org/D102383
Previously there was no test checking that -Bsymbolic-functions only applies to STT_FUNC symbols.
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D102461
This fixes a bug with string merging with string symbols that contain
NULLs, as is the case in the `merge-string.s` test.
The bug only showed when we run with `--relocatable` and then try read
the resulting object back in. In this case we would end up with string
symbols that extend past the end of the segment in which they live.
The problem comes from the fact that sections which are flagged as
string mergable assume that all strings are NULL terminated. The
merging algorithm will drop trailing chars that follow a NULL since they
are essentially unreachable. However, the "size" attribute (in the
symbol table) of such a truncated symbol is not updated resulting a
symbol size that can overlap the end of the segment.
I verified that this can happen in ELF too given the right conditions
and the its harmless enough. In practice Strings that contain embedded
null should not be part of a mergable section.
Differential Revision: https://reviews.llvm.org/D102281
Since c579a5b1d9 we don't traverse
.eh_frame when doing GC. But the exception handling personality
function needs to be included, and is only referenced from within
.eh_frame.
Differential Revision: https://reviews.llvm.org/D102138
Extend the range of calls beyond an architecture's limited branch range by first calling a thunk, which loads the far address into a scratch register (x16 on ARM64) and branches through it.
Other ports (COFF, ELF) use multiple passes with successively-refined guesses regarding the expansion of text-space imposed by thunk-space overhead. This MachO algorithm places thunks during MergedOutputSection::finalize() in a single pass using exact thunk-space overheads. Thunks are kept in a separate vector to avoid the overhead of inserting into the `inputs` vector of `MergedOutputSection`.
FIXME:
* arm64-stubs.s test is broken
* add thunk tests
* Handle thunks to DylibSymbol in MergedOutputSection::finalize()
Differential Revision: https://reviews.llvm.org/D100818
Don't include the relocation addend when calculating the
virtual address of a symbol. Instead just pass the symbol's
offset and add the addend afterwards.
Without this fix we hit the `offset is outside the section`
error in MergeInputSegment::getSegmentPiece.
This fixes a real world error we were are seeing in emscripten.
Differential Revision: https://reviews.llvm.org/D102271
We have this extra step in wasm-ld that doesn't exist in other lld
backend which verifies the existing contents of the relocation targets.
This was originally intended as an extra form of double checking and an
aid to compiler developers. However it has always been somewhat
controversial and there have been suggestions in the past the we simply
remove it.
My motivation for removing it now is that its causing me a headache
when trying to fix an issue with negative addends. In the case of
negative addends that final result can be wrapped/negative but this
checking code would require significant modification to be able to deal
with that case. For example with some test cases I'm looking at I'm
seeing error like this:
```
wasm-ld: warning: /usr/local/google/home/sbc/dev/wasm/llvm-build/tools/lld/test/wasm/Output/merge-string.s.tmp.o:(.rodata_relocs): unexpected existing value for R_WASM_MEMORY_ADDR_I32: existing=FFFFFFFA expected=FFFFFFFFFFFFFFFA
```
Rather than try to refactor `calcExpectedValue` to somehow return two
different types of results (32 and 64-bit) depending on the relocation
type, I think we can just remove this code.
Differential Revision: https://reviews.llvm.org/D102265
Currently, when reporting unresolved symbols in shared libraries, if an
undefined symbol is firstly seen in a regular object file that shadows
the reference for the same symbol in a shared object. As a result, the
error for the unresolved symbol in the shared library is not reported.
If referencing sections in regular object files are discarded because of
'--gc-sections', no reports about such symbols are generated, and the
linker finishes successfully, generating an output image that fails on
the run.
The patch fixes the issue by keeping symbols, which should be checked,
for each shared library separately.
Differential Revision: https://reviews.llvm.org/D101996
This change was originally landed in: 5000a1b4b9
It was reverted in: 061e071d8c
This change adds support for a new WASM_SEG_FLAG_STRINGS flag in
the object format which works in a similar fashion to SHF_STRINGS
in the ELF world.
Unlike the ELF linker this support is currently limited:
- No support for SHF_MERGE (non-string merging)
- Always do full tail merging ("lo" can be merged with "hello")
- Only support single byte strings (p2align 0)
Like the ELF linker merging is only performed at `-O1` and above.
This fixes part of https://bugs.llvm.org/show_bug.cgi?id=48828,
although crucially it doesn't not currently support debug sections
because they are not represented by data segments (they are custom
sections)
Differential Revision: https://reviews.llvm.org/D97657
This change adds support for a new WASM_SEG_FLAG_STRINGS flag in
the object format which works in a similar fashion to SHF_STRINGS
in the ELF world.
Unlike the ELF linker this support is currently limited:
- No support for SHF_MERGE (non-string merging)
- Always do full tail merging ("lo" can be merged with "hello")
- Only support single byte strings (p2align 0)
Like the ELF linker merging is only performed at `-O1` and above.
This fixes part of https://bugs.llvm.org/show_bug.cgi?id=48828,
although crucially it doesn't not currently support debug sections
because they are not represented by data segments (they are custom
sections)
Differential Revision: https://reviews.llvm.org/D97657
We had a hardcoded check and a stale TODO, written back when we only had
support for one architecture.
Reviewed By: #lld-macho, thakis
Differential Revision: https://reviews.llvm.org/D102154
In particular, we should apply the `-undefined` behavior to all
such symbols, include those that are specified via the command line
(i.e. `-e`, `-u`, and `-exported_symbol`). ld64 supports this too.
Reviewed By: #lld-macho, thakis
Differential Revision: https://reviews.llvm.org/D102143
On a section with alignment of 16, subsections aligned to 16-byte
boundaries should keep their 16-byte alignment.
Fixes PR50274. (The same bug could have happened with -order_file
previously.)
Differential Revision: https://reviews.llvm.org/D102139
This would cause us to pull in symbols (and code) that should
be unused.
Reviewed By: #lld-macho, thakis
Differential Revision: https://reviews.llvm.org/D102137
Symbols explicitly exported via command-line options `--exported_symbol SYM` and `--exported_symbols_list FILE` must be defined. Before this fix, lazy symbols defined in archives would be left to languish. We now force them to be included in the linked output.
Differential Revision: https://reviews.llvm.org/D102100
Printing pass manager invocations is fairly verbose and not super
useful.
This allows us to remove DebugLogging from pass managers and PassBuilder
since all logging (aside from analysis managers) goes through
instrumentation now.
This has the downside of never being able to print the top level pass
manager via instrumentation, but that seems like a minor downside.
Reviewed By: ychen
Differential Revision: https://reviews.llvm.org/D101797
Before this, if an inline function was defined in several input files,
lld would write each copy of the inline function the output. With this
patch, it only writes one copy.
Reduces the size of Chromium Framework from 378MB to 345MB (compared
to 290MB linked with ld64, which also does dead-stripping, which we
don't do yet), and makes linking it faster:
N Min Max Median Avg Stddev
x 10 3.9957051 4.3496981 4.1411121 4.156837 0.10092097
+ 10 3.908154 4.169318 3.9712729 3.9846753 0.075773012
Difference at 95.0% confidence
-0.172162 +/- 0.083847
-4.14165% +/- 2.01709%
(Student's t, pooled s = 0.0892373)
Implementation-wise, when merging two weak symbols, this sets a
"canOmitFromOutput" on the InputSection belonging to the weak symbol not put in
the symbol table. We then don't write InputSections that have this set, as long
as they are not referenced from other symbols. (This happens e.g. for object
files that don't set .subsections_via_symbols or that use .alt_entry.)
Some restrictions:
- not yet done for bitcode inputs
- no "comdat" handling (`kindNoneGroupSubordinate*` in ld64) --
Frame Descriptor Entries (FDEs), Language Specific Data Areas (LSDAs)
(that is, catch block unwind information) and Personality Routines
associated with weak functions still not stripped. This is wasteful,
but harmless.
- However, this does strip weaks from __unwind_info (which is needed for
correctness and not just for size)
- This nopes out on InputSections that are referenced form more than
one symbol (eg from .alt_entry) for now
Things that work based on symbols Just Work:
- map files (change in MapFile.cpp is no-op and not needed; I just
found it a bit more explicit)
- exports
Things that work with inputSections need to explicitly check if
an inputSection is written (e.g. unwind info).
This patch is useful in itself, but it's also likely also a useful foundation
for dead_strip.
I used to have a "canoncialRepresentative" pointer on InputSection instead of
just the bool, which would be handy for ICF too. But I ended up not needing it
for this patch, so I removed that again for now.
Differential Revision: https://reviews.llvm.org/D102076
GNU as documentation states that a `.thumb_func` directive implies `.thumb`, teach the asm parser to switch mode whenever it's encountered. On the other hand the labeled form, exclusive to Apple's toolchain, doesn't switch mode at all.
Reviewed By: nickdesaulniers, peter.smith
Differential Revision: https://reviews.llvm.org/D101975
This fixes an issue where mixed TOC / NOTOC calls can call the incorrect
thunks if a previous thunk already exists. The issue appears when a TOC
funciton calls a NOTOC callee and then a different NOTOC function calls the same
NOTOC callee. In this case the linker would sometimes incorrectly call the
same thunk for both cases.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D101837
ld64 can emit dylibs that support more than one platform (typically macOS and
macCatalyst). This diff allows LLD to read in those dylibs. Note that this is a
super bare-bones implementation -- in particular, I haven't added support for
LLD to emit those multi-platform dylibs, nor have I added a variety of
validation checks that ld64 does. Until we have a use-case for emitting zippered
dylibs, I think this is good enough.
Fixes PR49597.
Reviewed By: #lld-macho, oontvoo
Differential Revision: https://reviews.llvm.org/D101954
It doesn't seem like TBDv3 allows for specifying multiple platforms, so I'm
upgrading us to TBDv4. (We need to support multiple platforms in order to test
that we can handle zippered dylibs; that functionality will be added in an
upcoming diff.)
Differential Revision: https://reviews.llvm.org/D101953
The Halide project uses `#pragma comment(linker, "/STACK:...")` to set
the stack size high enough for our embedded compiler to run in end-user
programs on Windows.
Unfortunately, lld-link.exe breaks on this when embedded in a COFF
object, despite supporting the flag on the command line. MSVC's link.exe
supports this fine. This patch extends support for this to lld-link.exe
for better compatibility with MSVC projects.
Differential Revision: https://reviews.llvm.org/D99680
Currently the linker causes unnecessary errors when either the target or the config's platform is a simulator.
Differential Revision: https://reviews.llvm.org/D101855
ARM_RELOC_BR24 is used for BL/BLX instructions from within ARM (i.e. not
Thumb) code. This diff just handles the basic case: branches from ARM to
ARM, or from ARM to Thumb where no shimming is required. (See comments
in ARM.cpp for why shims are required.)
Note: I will likely be deprioritizing ARM work for the near future to
focus on other parts of LLD. Apologies for the half-done state of this;
I'm just trying to wrap up what I've already worked on.
Reviewed By: #lld-macho, alexshap
Differential Revision: https://reviews.llvm.org/D101814
We need to account for path rerooting when generating the response
file. We could either reroot the paths before generating the file, or pass
through the original filenames and change just the syslibroot. I've opted for
the latter, in order that the reproduction run more closely mirrors the
original.
We must also be careful *not* to make an absolute path relative if it is
shadowed by a rerooted path. See repro6.tar in reroot-path.s for
details.
I've moved the call to `createResponseFile()` after the initialization of
`config->systemLibraryRoots`, since it now needs to know what those roots are.
Reviewed By: #lld-macho, oontvoo
Differential Revision: https://reviews.llvm.org/D101224
This is a followup to 2b01a417d7ccb001ccc1185ef5fdc967c9fac8d7;
previously the RVAs of the exported symbols from comdats were left
zero.
Thanks to Kleis Auke Wolthuizen for the fix suggestion and pointing
out the omission.
Differential Revision: https://reviews.llvm.org/D101615
When dumping multiple pieces of information (e.g. --all-headers),
there is sometimes no separator between two pieces.
This patch uses the "\nheader:\n" style, which generally improves
compatibility with GNU objdump.
Note: objdump -t/-T does not add a newline before "SYMBOL TABLE:" and "DYNAMIC SYMBOL TABLE:".
We add a newline to be consistent with other information.
`objdump -d` prints two empty lines before the first 'Disassembly of section'.
We print just one with this patch.
Differential Revision: https://reviews.llvm.org/D101796
ld64 automatically renames many sections depending on output type and assorted flags. Here, we implement the most common configs. We can add more obscure flags and behaviors as needed.
Depends on D101393
Differential Revision: https://reviews.llvm.org/D101395
`shouldImport` was not returning true in PIC mode even though out
assumption elsewhere (in Relocations.cpp:scanRelocations) is that we
don't report undefined symbols in PIC mode today. This was resulting
functions that were undefined and but also not imported which hits an
assert later on that all functions have valid indexes.
Differential Revision: https://reviews.llvm.org/D101716
Two of these are trivial. The third (shared.s) did have some
expectations changes but only due to two data symbols being re-ordered.
Differential Revision: https://reviews.llvm.org/D101711
When running in relocatable mode any input data segments that are part
of a comdat group should not be merged with other segments of the same
name. This is because the final linker needs to keep the separate so
they can be included/excluded individually.
Often this is not a problem since normally only one section with a given
name `foo` ends up in the output object file. However, the problem
occurs when one input contains `foo` which part of a comdat and another
object contains a local symbol `foo` we were attempting to merge them.
This behaviour matches (I believe) that of the ELF linker. See
`LinkerScript.cpp:addInputSec`.
Fixes: https://github.com/emscripten-core/emscripten/issues/9726
Differential Revision: https://reviews.llvm.org/D101703
@thakis pointed out that `mach_header` and `mach_header_64`
actually have the same set of (used) fields, with the 64-bit version
having extra padding. So we can access the fields we need using the
single `mach_header` type instead of using templates to switch between
the two.
I also spotted a potential issue where hasObjCSection tries to parse a
file w/o checking if it does indeed match the target arch... As such,
I've added a quick magic number check to ensure we don't access invalid
memory during `findCommand()`.
Addresses PR50180.
Reviewed By: #lld-macho, thakis
Differential Revision: https://reviews.llvm.org/D101724
Adopt my suggestion in https://reviews.llvm.org/D91426#2653926 ,
generalizing the ppc64 specific code.
GNU ld and glibc ld.so has a contract about the first few entries of .got .
There are somewhat complex conditions when the header is needed. This patch
switches to a simpler approach: add a header unconditionally if
_GLOBAL_OFFSET_TABLE_ is used or the number of entries is more than just the
header.
The right symbol flag mask is ~0x7, not ~0xf.
Also emit string names for the other flags (we were missing some).
Reviewed By: #lld-macho, gkm
Differential Revision: https://reviews.llvm.org/D101548
This just parses the `-arch armv7` and emits the right header flags.
The rest will be slowly fleshed out in upcoming diffs.
Reviewed By: #lld-macho, gkm
Differential Revision: https://reviews.llvm.org/D101557
as a way of not running it on Windows, where the file paths when
extracting repro2.tar can become longer than the maximum file length
limit (depending on the build dir name) and cause the test to fail.
(See https://crbug.com/1204463 for example test failure.)
When looking for the "all" symbols that are supposed to be exported,
we can't look at the live flag - the symbols we mark as to be
exported will become GC roots even if they aren't yet marked as live.
With this in place, building an LLVM library with BUILD_SHARED_LIBS
produces the same set of symbols exported regardless of whether the
--gc-sections flag is specified, both with and without being built
with -ffunction-sections.
Differential Revision: https://reviews.llvm.org/D101522
I don't think it's super worthwhile to test the dylib headers outputs of
all the different archs when x86_64 is the only one that has interesting
behavior.
Motivated by my upcoming addition of arm32...
Modern versions of macOS (>= 10.7) and in general all modern Mach-O
target archs want PIEs by default. ld64 defaults to PIE for iOS >= 4.3,
as well as for all versions of watchOS and simulators. Basically all the
platforms LLD is likely to target want PIE. So instead of cluttering LLD's
code with legacy version checks, I think it's simpler to just default to
PIE for everything.
Note that `-no_pie` still works, so users can still opt out of it.
Reviewed By: #lld-macho, thakis, MaskRay
Differential Revision: https://reviews.llvm.org/D101513
GNU ld -r can create .rela.eh_frame with unordered r_offset values.
(With LLD, we can craft such a case by reordering sections in .eh_frame.)
This is currently unsupported and will trigger
`assert(pieces[i].inputOff <= off ...` in `OffsetGetter::get`
(the content is corrupted in a -DLLVM_ENABLE_ASSERTIONS=off build).
This patch supports this case.
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D101116
Just like the in case for function and data symbols this is needed to
support relocations in debug info sections which are allowed contains
relocations against non-live symbols.
The motivating use case is an object file that contains debug info that
references `__stack_pointer` (a local symbol) but does not actually
contain any uses of `__stack_pointer`.
Fixes: https://github.com/emscripten-core/emscripten/issues/14025
Differential Revision: https://reviews.llvm.org/D101399
This is the same problem as 127176e59e, but for static TLS rather than
dynamic TLS. Although we know the symbol will be the one in our own TLS
segment, and thus the offset of it within that, we don't know where in
the static TLS block our data will be allocated and thus we must emit a
dynamic relocation for this case.
Reviewed By: MaskRay, atanasyan
Differential Revision: https://reviews.llvm.org/D101381
Whilst not wrong (unless using static PIE where the relocations are
likely not implemented by the runtime), this is inefficient, as the TLS
module indices and offsets are independent of the executable's load
address.
Reviewed By: MaskRay, atanasyan
Differential Revision: https://reviews.llvm.org/D101382
Add option to limit (or remove limits) on the number of errors printed before exiting. This option exists in the other lld ports: COFF & ELF.
Differential Revision: https://reviews.llvm.org/D101274
When I added this assert in D93609, it asserted that a symbol that
is privateExtern is also isExternal().
In D98381 the privateExtern check moved into shouldExportSymbol()
but the assert didn't -- now it checked that _every_ non-exported
symbol is isExternal(), which isn't true. Move the assert into the
privateExtern check where it used to be.
Fixes PR50098.
Differential Revision: https://reviews.llvm.org/D101223
- the macro seems needlessly clever -- shorter and imho clearer without it
- give all filenames an extension so they look like filenames
- rename .private_extern symbol from _private to _private_extern
to prepare for follow-up that adds a truly private symbol
No behavior change.
Differential Revision: https://reviews.llvm.org/D101222
I was a bit confused by the comment because I thought that "Tests
that..." was describing the tests contained within the same file.
Reviewed By: #lld-macho, thakis
Differential Revision: https://reviews.llvm.org/D101160
I went through the callers of `readFile()` and `addFile()` in Driver.cpp
and checked that the options that use them all get rewritten in the
--reproduce response file. -(un)exported_symbols_list and -bundle_loader
weren't, so add them.
Also spruce up the test for reproduce a bit and actually try linking
with the exptracted repro archive.
Motivated by the response file in PR50098 complaining abou the
-exported_symbols_list path being wrong :)
Differential Revision: https://reviews.llvm.org/D101182
D101114 enforced proper version checks, which exposed a variety of version
mismatch issues in our tests. We previously changed the test inputs to
target 10.0, which was the simpler thing to do, but we should really
just have our lit.local.cfg default to targeting 10.15, which is what is done
here. We're not likely to ever have proper support for the older versions
anyway, as that would require more work for unclear benefit; for instance,
llvm-mc seems to generate a different compact unwind format for older macOS
versions, which would cause our compact-unwind.s test to fail.
Targeting 10.15 by default causes the following behavioral changes:
* `__mh_execute_header` is now a section symbol instead of an absolute symbol
* LC_BUILD_VERSION gets emitted instead of LC_VERSION_MIN_MACOSX. The former is
32 bytes in size whereas the latter is 16 bytes, so a bunch of hardcoded
address offsets in our tests had to be updated.
* >= 10.6 executables are PIE by default
Note that this diff was stacked atop of a local revert of most of the test
changes in rG8c17a875150f8e736e8f9061ddf084397f45f4c5, to make review easier.
Reviewed By: #lld-macho, oontvoo
Differential Revision: https://reviews.llvm.org/D101119
I first had a more invasive patch for this (D101069), but while trying
to get that polished for review I realized that lld's current symbol
merging semantics mean that only a very small code change is needed.
So this goes with the smaller patch for now.
This has no effect on projects that build with -fvisibility=hidden
(e.g. chromium), since these see .private_extern symbols instead.
It does have an effect on projects that build with -fvisibility-inlines-hidden
(e.g. llvm) in -O2 builds, where LLVM's GlobalOpt pass will promote most inline
functions from .weak_definition to .weak_def_can_be_hidden.
Before this patch:
% ls -l out/gn/bin/clang out/gn/lib/libclang.dylib
-rwxr-xr-x 1 thakis staff 113059936 Apr 22 11:51 out/gn/bin/clang
-rwxr-xr-x 1 thakis staff 86370064 Apr 22 11:51 out/gn/lib/libclang.dylib
% out/gn/bin/llvm-objdump --macho --weak-bind out/gn/bin/clang | wc -l
8291
% out/gn/bin/llvm-objdump --macho --weak-bind out/gn/lib/libclang.dylib | wc -l
5698
With this patch:
% ls -l out/gn/bin/clang out/gn/lib/libclang.dylib
-rwxr-xr-x 1 thakis staff 111721096 Apr 22 11:55 out/gn/bin/clang
-rwxr-xr-x 1 thakis staff 85291208 Apr 22 11:55 out/gn/lib/libclang.dylib
thakis@MBP llvm-project % out/gn/bin/llvm-objdump --macho --weak-bind out/gn/bin/clang | wc -l
725
thakis@MBP llvm-project % out/gn/bin/llvm-objdump --macho --weak-bind out/gn/lib/libclang.dylib | wc -l
542
Linking clang becomes a tiny bit faster with this patch:
x 100 0.67263818 0.77847815 0.69430709 0.69877208 0.017715892
+ 100 0.67209601 0.73323393 0.68600798 0.68917346 0.012824377
Difference at 95.0% confidence
-0.00959861 +/- 0.00428661
-1.37364% +/- 0.613449%
(Student's t, pooled s = 0.0154648)
This only happens if lld with the patch and lld without the patch are both
linked with an lld with the patch or both linked with an lld without the patch
(...or with ld64). I accidentally linked the lld with the patch with an lld
without the patch and the other way round at first. In that setup, no
difference is found. That makese sense, since having fewer weak imports will
make the linked output a bit faster too. So not only does this make linking
binaries such as clang a bit faster (since fewer exports need to be written to
the export trie by lld), the linked output binary binary is also a bit faster
(since dyld needs to process fewer dynamic imports).
This also happens to fix the one `check-clang` failure when using lld as host
linker, but mostly for silly reasons: See crbug.com/1183336, mostly comment 26.
The real bug here is that c-index-test links all of LLVM both statically and
dynamically, which is an ODR violation. Things just happen to work with this
patch.
So after this patch, check-clang, check-lld, check-llvm all pass with lld as
host linker :)
Differential Revision: https://reviews.llvm.org/D101080
- __got is in --bind output, so print that too (makes the test
a bit stronger)
- WEAK_DEFINES, BINDS_TO_WEAK are in the mach-o header, so
--private-header is enough, no need for --all-headers
(makes the test a bit easier to work with when it fails)
Differential Revision: https://reviews.llvm.org/D101065
We had got it backwards... the minimum version of the target
should be higher than the min version of the object files, presumably
since new platforms are backwards-compatible with older formats.
Fixes PR50078.
The original commit (aa05439c9c) broke many tests that had inputs too
new for our target platform (10.0). This commit changes the inputs to
target 10.0, which was the simpler thing to do, but we should really
just have our lit.local.cfg default to targeting 10.15... we're not
likely to ever have proper support for the older versions anyway. I will
follow up with a change to that effect.
Differential Revision: https://reviews.llvm.org/D101114
We had got it backwards... the minimum version of the target
should be higher than the min version of the object files, presumably
since new platforms are backwards-compatible with older formats.
Fixes PR50078.
Reviewed By: #lld-macho, thakis
Differential Revision: https://reviews.llvm.org/D101114
This load command records a range spanning from the end of the load
commands to the end of the `__TEXT` segment. Presumably the kernel will encrypt
all this data.
Reviewed By: #lld-macho, thakis
Differential Revision: https://reviews.llvm.org/D100973
This diff adds initial support for the legacy LC_VERSION_MIN_* load commands.
Test plan: make check-lld-macho
Differential revision: https://reviews.llvm.org/D100523
XCode 12 ships with mismatched platforms for these libraries,
so this hack is necessary...
Fixes PR49799.
Reviewed By: #lld-macho, gkm, smeenai
Differential Revision: https://reviews.llvm.org/D100913
codesign/libstuff checks that the `__LLVM` segment is directly
before `__LINKEDIT` by checking that `fileOff + fileSize == next segment
fileOff`. Previously, there would be gaps between the segments due to
the fact that their fileOffs are page-aligned but their fileSizes
aren't. In order to satisfy codesign, we page-align fileOff *before*
calculating fileSize. (I don't think codesign checks for the relative
ordering of other segments, so in theory we could do this just for
`__LLVM`, but ld64 seems to do it for all segments.)
Note that we *don't* round up the fileSize of the `__LINKEDIT` segment.
Since it's the last segment, it doesn't need to worry about contiguity;
in addition, codesign checks that the last (hidden) section in
`__LINKEDIT` covers the last byte of the segment, so if we rounded up
`__LINKEDIT`'s size we would have to do the same for its last section,
which is a bother.
While at it, I also addressed a FIXME in the linkedit-contiguity.s test
to cover more `__LINKEDIT` sections.
Reviewed By: #lld-macho, thakis, alexshap
Differential Revision: https://reviews.llvm.org/D100848
The minuend (but not the subtrahend) can reference a section.
Note that we do not yet properly validate that the subtrahend isn't
referencing a section; I've filed PR50034 to track that.
I've also extended the reloc-subtractor.s test to reorder symbols, to
make sure that the addends are being associated with the minuend (and not
the subtrahend) relocation.
Fixes PR49999.
Reviewed By: #lld-macho, thakis
Differential Revision: https://reviews.llvm.org/D100804
An unfetched lazy symbol (undefined weak) should be considered to have its
original versionId which is VER_NDX_GLOBAL, instead of the lazy symbol's
versionId. (The original versionId cannot be non-VER_NDX_GLOBAL because a
undefined versioned symbol is an error.)
The regression was introduced in D77280 when making version scripts work
with lazy symbols fetched by LTO calls.
Fix PR49915
Differential Revision: https://reviews.llvm.org/D100624
Fix PR49897: if `__real_foo` has the isUsedInRegularObj bit set, we need to
retain `foo` in .symtab, even if `foo` is undefined. The new behavior will match
GNU ld.
Before the patch, we produced an R_X86_64_JUMP_SLOT relocation referencing the
index 0 undefined symbol, which would be erroed by glibc
(see f96ff3c0f8).
While here, fix another bug: if `__wrap_foo` does not exist, its initial binding
should be `foo`'s.
This diff creates an empty XAR file and copies it into
`__LLVM,__bundle`. Follow-up work will actually populate the contents of
that XAR.
Reviewed By: #lld-macho, gkm
Differential Revision: https://reviews.llvm.org/D100650
Change the default to facilitate GC for metadata section usage, so that they
don't need SHF_LINK_ORDER or SHF_GROUP just to drop the unhelpful rule (if they
want to be unconditionally retained, use SHF_GNU_RETAIN
(`__attribute__((retain))`) or linker script `KEEP`).
The dropped SHF_GROUP special case makes the behavior of -z start-stop-gc and -z
nostart-stop-gc closer to GNU ld>=2.37 (https://sourceware.org/PR27451).
However, we default to -z start-stop-gc (which actually matches more closely to
GNU ld before 2015-10 https://sourceware.org/PR19167), which is different from
modern GNU ld (which has the unhelpful rule to work around glibc). As a
compensation, we special case `__libc_` sections as a workaround for glibc<2.34
(https://sourceware.org/PR27492).
Since -z start-stop-gc as the default actually matches the traditional GNU ld
behavior, there isn't much to be aware of. There was a systemd usage which has
been fixed by https://github.com/systemd/systemd/pull/19144
The `e_flags` for a ELF file targeting the AVR ISA contains two fields at the time of writing:
- A 7-bit integer field specifying the ISA revision being targeted
- A 1-bit flag specifying whether the object files being linked are suited for applying the relaxations at link time
The linked ELF file is blessed with the arch revision shared among all the files.
The behaviour in case of mismatch is purposefully different than the one implemented in libbfd: LLD will raise a fatal error while libbfd silently picks a default value of `avr2`.
The relaxation-ready flag is handled as done by libbfd, in order for it to appear in the linked object every source object must be tagged with it.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D99754
arm64_32 uses 32-bit GOT loads, so we should accept those
instructions in `ARM64Common::relaxGotLoad()` too.
Reviewed By: #lld-macho, gkm
Differential Revision: https://reviews.llvm.org/D100229
This could probably have been part of D99633, but I split it up to make
things a bit more reviewable. I also fixed some bugs in the implementation that
were masked through integer underflows when operating in 64-bit mode.
Reviewed By: #lld-macho, gkm
Differential Revision: https://reviews.llvm.org/D99823
From what I can tell, it's pretty similar to arm64. The two main differences
are:
1. No 64-bit relocations
2. Stub code writes to 32-bit registers instead of 64-bit
Plus of course the various on-disk structures like `segment_command` are using
the 32-bit instead of the 64-bit variants.
Reviewed By: #lld-macho, gkm
Differential Revision: https://reviews.llvm.org/D99822
This allows LLVM's LTO to internalize symbols that are not referenced
directly by regular objects. Naturally, this means we need to track
which symbols are referenced by regular objects. The approach taken here
is similar to LLD-COFF's: like the COFF port, we extend
`SymbolTable::insert()` to set the isVisibleToRegularObj bit. (LLD-ELF
relies on the Symbol constructor and `Symbol::mergeProperties()`, but
the Mach-O port does not have a `mergeProperties()` equivalent.)
From what I can tell, ld64 (which uses libLTO) doesn't do this
optimization at all. I'm not even sure libLTO provides a way to do this.
Not having ld64's behavior as a reference implementation is unfortunate;
instead, I am relying on LLD-ELF/COFF's behavior as references while
erring on the conservative side. In particular, LLD-MachO will only do
this optimization for executables right now.
We also don't attempt it when `-flat_namespace` is used -- otherwise
we'd need scan the symbol table to find matches for every un-namespaced
symbol reference, which is expensive.
internalize.ll is based off the LLD-ELF tests `internalize-basic.ll` and
`internalize-undef.ll`. Looks like @davide added some of LLD-ELF's internalize
tests, so adding him as a reviewer...
Reviewed By: #lld-macho, gkm
Differential Revision: https://reviews.llvm.org/D99105
This is a different approach from D98993 that should achieve most of the
same benefit. The two changes are:
1. Sort the list of associated child sections by section name
2. Do not consider associated sections to have children themselves
This fixes the main issue, which was that we sometimes considered an
.xdata section to have a child .pdata section. That lead to slow links
and larger binaries (less xdata folding).
Otherwise, this should be NFC: we go back to ignoring .debug/.gljmp and
other metadata sections rather than only looking at pdata/xdata. We
discovered that we do care about other associated sections, like ASan
global registration metadata.
chmod tries to be very helpful on some platforms and prevent naive mistakes, by warning the user. This results in the following error during the test:
```chmod: ...resolution-err.ll.tmp.resolution.txt: new permissions are r--rw-rw-, not r--r--r--```
To fix the test, call chmod with u.
Differential Revision: https://reviews.llvm.org/D100417
arm64_32 uses 32-bit GOT loads, so we should accept those
instructions in `ARM64Common::relaxGotLoad()` too.
Reviewed By: #lld-macho, gkm
Differential Revision: https://reviews.llvm.org/D100229
It doesn't make sense to take just the base filename for archives when we emit
the full path for object files. (LLD-ELF emits the full path too.)
This will also make it easier to write a proper test for {D100147}.
Reviewed By: #lld-macho, oontvoo
Differential Revision: https://reviews.llvm.org/D100357
This could probably have been part of D99633, but I split it up to make
things a bit more reviewable. I also fixed some bugs in the implementation that
were masked through integer underflows when operating in 64-bit mode.
Reviewed By: #lld-macho, gkm
Differential Revision: https://reviews.llvm.org/D99823
From what I can tell, it's pretty similar to arm64. The two main differences
are:
1. No 64-bit relocations
2. Stub code writes to 32-bit registers instead of 64-bit
Plus of course the various on-disk structures like `segment_command` are using
the 32-bit instead of the 64-bit variants.
Reviewed By: #lld-macho, gkm
Differential Revision: https://reviews.llvm.org/D99822
Swift builds seem to use it. All it requires is emitting the
corresponding paths as STABS.
Fixes llvm.org/PR49385.
Reviewed By: clayborg
Differential Revision: https://reviews.llvm.org/D100076
The test is loosely based off LLD-ELF's `thinlto.ll`. However, I
found that test questionable because the the -save_temps behavior it
checks for is identical regardless of whether we are running in single-
or multi-threaded mode. I tried writing a test based on `--time-trace`
but couldn't get it to run deterministically... so I've opted to just
skip checking that behavior for now.
Reviewed By: #lld-macho, gkm
Differential Revision: https://reviews.llvm.org/D99356
If either `time-trace-granularity` or `time-trace-file` is specified, then don't make users specify `-time-trace`.
It seems silly that I have to type all three options, eg, `-time-trace -time-trace-file=- -time-trace-granularity=...`.
Differential Revision: https://reviews.llvm.org/D100011
We had been giving them a string index of zero, which actually corresponds to a
string with a single space due to {D89639}.
This was far from obvious in the old test because llvm-nm doesn't quote the
symbol names, making the empty string look identical to a string of a single
space. `dsymutil -s` quotes its strings, so I've changed the test accordingly.
Fixes llvm.org/PR48714. Thanks @clayborg for the tips!
Reviewed By: clayborg
Differential Revision: https://reviews.llvm.org/D100003
I noticed two problems with the previous implementation:
* N_ALT_ENTRY symbols weren't being handled correctly -- they should
determine the size of the previous symbol, even though they don't
cause a new section to be created
* The last symbol in a section had its size calculated wrongly;
the first subsection's size was used instead of the last one
I decided to take the opportunity to refactor things as well, mainly to
realize my observation
[here](https://reviews.llvm.org/D98837#inline-931511) that we could
avoid doing a binary search to match symbols with subsections. I think
the resulting code is a bit simpler too.
N Min Max Median Avg Stddev
x 20 4.31 4.43 4.37 4.3775 0.034162922
+ 20 4.32 4.43 4.38 4.3755 0.02799906
No difference proven at 95.0% confidence
Reviewed By: #lld-macho, alexshap
Differential Revision: https://reviews.llvm.org/D99972
"stub" is a bit too overloaded... we were using it to refer to
TAPI files, but it's also the name for the PLT trampolines in Mach-O.
Going ahead, let's just use "TAPI" or ".tbd" to refer to TAPI stuff.
Reviewed By: #lld-macho, oontvoo
Differential Revision: https://reviews.llvm.org/D99807
From the PowerPC ELFv2 ABI section 4.2.3. Global Offset Table.
```
The GOT consists of an 8-byte header that contains the TOC base (the first TOC
base when multiple TOCs are present), followed by an array of 8-byte addresses.
```
Due to the introduction of PC Relative code it is now possible to require a GOT
without having a .TOC. symbol in the object that is being linked. Since LLD uses
the .TOC. symbol to determine whether or not a GOT is required the GOT header is
not setup correctly and the 8-byte header is missing.
This patch allows the Power PC GOT setup to happen when an element is added to
the GOT instead of at the very begining. When this header is added a .TOC.
symbol is also added.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D91426
The functionality was originally added in {D95265}, but the test in that
diff only checked if `-ObjC` would cause bitcode containing ObjC class
symbols to be loaded. It neglected to test for bitcode containing
categories but no class symbols.
This diff also changes the lto-archive.ll test to use `-why_load`
instead of inspecting the output binary's symbol table. This is
motivated by the stacked diff {D99105}, which will hide irrelevant
bitcode symbols.
Reviewed By: #lld-macho, oontvoo
Differential Revision: https://reviews.llvm.org/D99215
Summary: We needed to use `Joined` instead of `Flag`. This wasn't caught
because the relevant test that was copied from LLD-ELF was still
invoking LLD-ELF instead of LLD-MachO...
Differential Revision: https://reviews.llvm.org/D99313
Pretty simple code-wise. Also threw in some refactoring:
* Put the functionStartSection under Writer instead of InStruct, since
it doesn't need to be accessed outside of Writer
* Adjusted the test to put all files under the temp dir instead of at
the top-level
* Added some CHECK-LABELs to make it clearer where the function starts
data is
Differential Revision: https://reviews.llvm.org/D99112
Don't run the 'tar' tool in a cleared environment with only the
LANG variable set, just set LANG on top of the existing environment.
If the 'tar' tool is an MSYS based tool, running it in a Windows
Container hangs if all environment variables are cleared - in
particular, the USERPROFILE variable needs to be kept intact.
This is the same issue fixed as was fixed in other places in
9de63b2e05, but contrary to running
the actual tests, running with an as-cleared-as-possible environment
here is less important.
Differential Revision: https://reviews.llvm.org/D99304
This patch renames the "Initial" member of WasmLimits to the name used
in the spec, "Minimum".
In the core WebAssembly specification, the Limits data type has one
required "min" member and one optional "max" member, indicating the
minimum required size of the corresponding table or memory, and the
maximum size, if any.
Although the WebAssembly spec does instantiate locally-defined tables
and memories with the initial size being equal to the minimum size, it
can't impose such a requirement for imports. It doesn't make sense to
require an initial size for a memory import, for example. The compiler
can only sensibly express the minimum and maximum sizes.
See
https://github.com/WebAssembly/js-types/blob/master/proposals/js-types/Overview.md#naming-of-size-limits
for a related discussion that agrees that the right name of "initial" is
"minimum" when querying the type of a table or memory from JavaScript.
(Of course it still makes sense for JS to speak in terms of an initial
size when it explicitly instantiates memories and tables.)
Differential Revision: https://reviews.llvm.org/D99186
This reverts commit 4876ba5b2d.
Third-attemp relanding D98559, new change:
- explicitly cast enum to underlying type to avoid ambiguity (workaround to clang's bug).
This reverts commit 3c21166a94.
The build is broken (clang-8 host compiler):
lld/MachO/DriverUtils.cpp:271:8: error: use of overloaded operator '<<' is ambiguous (with operand types 'llvm::raw_fd_ostream' and 'lld::macho::DependencyTracker::DepOpCode')
os << opcode;
~~ ^ ~~~~~~
This reverts commit 9670d2e4af.
Second attemp to reland D98559. New changes:
- inline functions removed from cpp file.
- updated tests to use CHECK-DAG instead of CHECK-NEXT
- fixed ambiguous "<<" operator by switching `char` to uint8_t
The only known reason why ICF should not merge otherwise identical
sections with differing associated sections has to do with exception
handling tables. It's not clear what ICF should do when there are other
kinds of associated sections. In every other case when this has come up,
debug info and CF guard metadata, we have opted to make ICF ignore the
associated sections.
For comparison, ELF doesn't do anything for comdat groups. Instead,
.eh_frame is parsed to figure out if a section has an LSDA, and if so,
ICF is disabled.
Another issue is that the order of associated sections is not defined.
We have had issues in the past (crbug.com/1144476) where changing the
order of the .xdata/.pdata sections in the object file lead to large ICF
slowdowns.
To address these issues, I decided it would be best to explicitly
consider only .pdata and .xdata sections during ICF. This makes it easy
to ignore the object file order, and I think it makes the intention of
the code clearer.
I've also made the children() accessor return an empty list for
associated sections. This mostly only affects ICF and GC. This was the
behavior before I made this a linked list, so the behavior change should
be good. This had positive effects on chrome.dll: more .xdata sections
were merged that previously could not be merged because they were
associated with distinct .pdata sections.
Reviewed By: mstorsjo
Differential Revision: https://reviews.llvm.org/D98993
This reverts commit 2554b95db5.
Relanding [lld-macho] Implement -dependency_info (D98559) with changes:
- inline functions removed from cpp file.
- updated tests to not check libSystem.tbd with other input files (because of possible indeterministic ordering)