`__cfstring` is a special literal section, so instead of breaking it up
at symbol boundaries, we break it up at fixed-width boundaries (since
each literal is the same size). Symbols can only occur at one of those
boundaries, so this is strictly more powerful than
`.subsections_via_symbols`.
With that in place, we then run the section through ICF.
This change is about perf-neutral when linking chromium_framework.
Reviewed By: #lld-macho, gkm
Differential Revision: https://reviews.llvm.org/D105045
This is a pretty big refactoring diff, so here are the motivations:
Previously, ICF ran after scanRelocations(), where we emitting
bind/rebase opcodes etc. So we had a bunch of redundant leftovers after
ICF. Having ICF run before Writer seems like a better design, and is
what LLD-ELF does, so this diff refactors it accordingly.
However, ICF had two dependencies on things occurring in Writer: 1) it
needs literals to be deduplicated beforehand and 2) it needs to know
which functions have unwind info, which was being handled by
`UnwindInfoSection::prepareRelocations()`.
In order to do literal deduplication earlier, we need to add literal
input sections to their corresponding output sections. So instead of
putting all input sections into the big `inputSections` vector, and then
filtering them by type later on, I've changed things so that literal
sections get added directly to their output sections during the 'gather'
phase. Likewise for compact unwind sections -- they get added directly
to the UnwindInfoSection now. This latter change is not strictly
necessary, but makes it easier for ICF to determine which functions have
unwind info.
Adding literal sections directly to their output sections means that we
can no longer determine `inputOrder` from iterating over
`inputSections`. Instead, we store that order explicitly on
InputSection. Bloating the size of InputSection for this purpose would
be unfortunate -- but LLD-ELF has already solved this problem: it reuses
`outSecOff` to store this order value.
One downside of this refactor is that we now make an additional pass
over the unwind info relocations to figure out which functions have
unwind info, since want to know that before `processRelocations()`. I've
made sure to run that extra loop only if ICF is enabled, so there should
be no overhead in non-optimizing runs of the linker.
The upside of all this is that the `inputSections` vector now contains
only ConcatInputSections that are destined for ConcatOutputSections, so
we can clean up a bunch of code that just existed to filter out other
elements from that vector.
I will test for the lack of redundant binds/rebases in the upcoming
cfstring deduplication diff. While binds/rebases can also happen in the
regular `.text` section, they're more common in `.data` sections, so it
seems more natural to test it that way.
This change is perf-neutral when linking chromium_framework.
Reviewed By: oontvoo
Differential Revision: https://reviews.llvm.org/D105044
Everything (including test) modified from ELF/COFF. Using the same syntax
(--lto-O3, etc) as ELF.
Differential Revision: https://reviews.llvm.org/D105223
Previously, we only applied the renames to
ConcatOutputSections.
Reviewed By: #lld-macho, thakis
Differential Revision: https://reviews.llvm.org/D105079
SymtabSection::emitStabs() writes the symbol table in the order
of externalSymbols, which has the order of symtab->getSymbols(),
which is just the order symbols are added to the symbol table.
In practice, symbols in the symbol files of input .o files are
sorted, but since that's not guaranteed we sort them in
ObjFile::parseSymbols(). To make sure several symbols with the same
address keep the order they're in the input file, we have to use
stable_sort().
In practice, std::sort() on already-sorted inputs won't change the order
of just adjacent elements, and while in theory std::sort() could use a
random pivot, in practice the code should be deterministic as it was
previously too.
But now lld/test/MachO/stabs.s passes with LLVM_ENABLE_EXPENSIVE_CHECKS=ON
(the last test that was failing with that set).
Fixes a regression from D99972.
While here, remove an empty section in stabs.s and move
.subsections_via_symbols to the end where it usually is (this part no
behavior change).
Differential Revision: https://reviews.llvm.org/D105071
Fixes PR50637.
Downstream bug: https://crbug.com/1218958
Currently, we split __cstring along symbol boundaries with .subsections_via_symbols
when not deduplicating, and along null bytes when deduplicating. This change splits
along null bytes unconditionally, and preserves original alignment in the non-
deduplicated case.
Removing subsections-section-relocs.s because with this change, __cstring
is never reordered based on the order file.
Differential Revision: https://reviews.llvm.org/D104919
The two different thread_local_regular sections (__thread_data and
more_thread_data) had nondeterminstic ordering for two reasons:
1. https://reviews.llvm.org/D102972 changed concatOutputSections
from MapVector to DenseMap, so when we iterate it to make
output segments, we would add the two sections to the __DATA
output segment in nondeterministic order.
2. The same change also moved the two stable_sort()s for segments
and sections to sort(). Since sections with assigned priority
(such as TLV data) have the same priority for all sections,
this is incorrect -- we must use stable_sort() so that the
initial (input-order-based) order remains.
As a side effect, we now (deterministically) put the __common
section in front of __bss (while previously we happened to
put it after it). (__common and __bss are both zerofill so
both have order INT_MAX, but common symbols are added to
inputSections before normal sections are collected.)
Makes lld/test/MachO/tlv.s and lld/test/MachO/tlv-dylib.s pass with
LLVM_ENABLE_EXPENSIVE_CHECKS=ON.
Differential Revision: https://reviews.llvm.org/D105054
Literal sections can be deduplicated before running ICF. That makes it
easy to compare them during ICF: we can tell if two literals are
constant-equal by comparing their offsets in their OutputSection.
LLD-ELF takes a similar approach.
Reviewed By: #lld-macho, gkm
Differential Revision: https://reviews.llvm.org/D104671
libunwind uses unwind info to find the function address belonging
to the current instruction pointer. libunwind/src/CompactUnwinder.hpp's
step functions read functionStart for UNWIND_X86_64_MODE_STACK_IND
(and for nothing else), so these encodings need a dedicated entry
per function, so that the runtime can get the stacksize off the
`subq` instrunction in the function's prologue.
This matches ld64.
(CompactUnwinder.hpp from https://opensource.apple.com/source/libunwind/
also reads functionStart in a few more cases if `SUPPORT_OLD_BINARIES` is set,
but it defaults to 0, and ld64 seems to not worry about these additional
cases.)
Related upstream bug: https://crbug.com/1220175
Differential Revision: https://reviews.llvm.org/D104978
`icfEqClass` only makes sense on ConcatInputSections since (in contrast
to literal sections) they are deduplicated as an atomic unit.
Similarly, `hasPersonality` and `replacement` don't make sense on
literal sections.
This mirrors LLD-ELF, which stores `icfEqClass` only on non-mergeable
sections.
Reviewed By: #lld-macho, gkm
Differential Revision: https://reviews.llvm.org/D104670
We previously did this only for x86_64, but it turns out that
arm64 needs this too -- see PR50791.
Ultimately this is a hack, and we should avoid over-aligning strings
that don't need it. I'm just having a hard time figuring out how ld64 is
determining the right alignment.
No new test for this since we were already testing this behavior for
x86_64, and extending it to arm64 seems too trivial.
Reviewed By: #lld-macho, thakis
Differential Revision: https://reviews.llvm.org/D104835
Add tests for pending TODOs, plus some global cleanups:
* No fold: func has personality/LSDA
* Fold: reference to absolute symbol with different name but identical value
* No fold: reloc references to absolute symbols with different values
* No fold: N_ALT_ENTRY symbols
Differential Revision: https://reviews.llvm.org/D104721
"""Bitcode symbols only exist before LTO runs, and only serve the purpose of
resolving visibility so LTO can better optimize. Running LTO creates ObjFiles
from BitcodeFiles, and those ObjFiles contain regular Defined symbols (with
isec set and all) that will replace the bitcode symbols. So things should
(hopefully) work as-is :)"""
-- https://reviews.llvm.org/rGdbbc8d8333f29cf4ad6f4793da1adf71bbfdac69#inline-6081
Fixes PR50529. With this, lld-linked Chromium base_unittests passes on arm macs.
Surprisingly, no measurable impact on link time.
Differential Revision: https://reviews.llvm.org/D104681
The variable used to need the wider scope, but doesn't after the
reland. See LC_LINKER_OPTIONS-related discussion on
https://reviews.llvm.org/D104353 for background.
Real zerofill sections go after __thread_bss, since zerofill sections
must all be at the end of their segment and __thread_bss must be right
after __thread_data.
Works fine already, but wasn't tested as far as I can tell.
Also tweak comment about zerofill sections a bit.
No behavior change.
Differential Revision: https://reviews.llvm.org/D104609
We make it less than INT_MAX in order not to conflict with the ordering
of zerofill sections, which must always be placed at the end of their
segment.
This is the more structural fix for the issue addressed in {D104596}.
Reviewed By: #lld-macho, thakis
Differential Revision: https://reviews.llvm.org/D104607
The exact location doesn't matter, but it should be in front
of __thread_bss. We put it right in front of __thread_data
which is where ld64 seems to put it as well.
Fixes PR50769.
(As mentioned on the bug, there is probably a more structural
fix too, see comment 5. If we don't address this, it's likely
we'll run into this again with other synthetic sections. But
for now, let's fix the immediate breakage.)
Differential Revision: https://reviews.llvm.org/D104596
...instead of S_NON_LAZY_SYMBOL_POINTERS. This matches ld64.
Part of PR50769.
While here, also remove an old TODO that was done in D87178.
Differential Revision: https://reviews.llvm.org/D104594
findLibrary() returned a StringRef while findFramework & other helper
functions returned std::strings. Standardize on std::string.
(I initially tried making the helper functions all return StringRefs,
but I realized we shouldn't return input StringRefs since their
lifetimes would not be obvious from the calling code.)
Previously, we asserted that such a case was invalid, but in fact
`ld -r` can emit such symbols if the input contained a (true) private
extern, or if it contained a symbol started with "L".
Non-extern symbols marked as private extern are essentially equivalent
to regular TU-scoped symbols, so no new functionality is needed.
Reviewed By: #lld-macho, thakis
Differential Revision: https://reviews.llvm.org/D104502
The `icf` command-line option is not present in ld64, so it should use the LLD option syntax, which begins with double dashes and separates primary option from any suboption with the equal sign.
Differential Revision: https://reviews.llvm.org/D104548
ICF = Identical C(ode|OMDAT) Folding
This is the LLD ELF/COFF algorithm, adapted for MachO. So far, only `-icf all` is supported. In order to support `-icf safe`, we will need to port address-significance tables (`.addrsig` directives) to MachO, which will come in later diffs.
`check-{llvm,clang,lld}` have 0 regressions for `lld -icf all` vs. baseline ld64.
We only run ICF on `__TEXT,__text` for reasons explained in the block comment in `ConcatOutputSection.cpp`.
Here is the perf impact for linking `chromium_framekwork` on a Mac Pro (16-core Xeon W) for the non-ICF case vs. pre-ICF:
```
N Min Max Median Avg Stddev
x 20 4.27 4.44 4.34 4.349 0.043029977
+ 20 4.37 4.46 4.405 4.4115 0.025188761
Difference at 95.0% confidence
0.0625 +/- 0.0225658
1.43711% +/- 0.518873%
(Student's t, pooled s = 0.0352566)
```
Reviewed By: #lld-macho, int3
Differential Revision: https://reviews.llvm.org/D103292
We need to dedup archive loads (similar to what we do for dylib
loads).
I noticed this issue after building some Swift stuff that used
`-force_load_swift_libs`, as it caused some Swift archives to be loaded
many times.
Reviewed By: #lld-macho, thakis, MaskRay
Differential Revision: https://reviews.llvm.org/D104353
I removed them in rG5de7467e982 but @thakis pointed out that
they were useful to keep, so here they are again. I've also converted
the `!isCoalescedWeak()` asserts into `!shouldOmitFromOutput()` asserts,
since the latter check subsumes the former.
Reviewed By: #lld-macho, thakis
Differential Revision: https://reviews.llvm.org/D104169
It's a warning in ld64. While having LLD be stricter would be nice, it
makes it harder for it to be a drop-in replacement into existing builds.
Reviewed By: #lld-macho, thakis
Differential Revision: https://reviews.llvm.org/D104333
I *think* this is the fix, with the regression being introduced by
D104199. Not 100% sure since MSAN isn't supported on my Mac machine, and
it'll take some time to spin up a Linux box... will look at the
buildbots for answers
I wanted to see if we would get any perf wins out of this, but
it doesn't seem to be the case. But it still seems worth committing.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D104200
We don't need to define any special behavior for this section,
so creating a subclass for it is redundant.
Reviewed By: #lld-macho, thakis
Differential Revision: https://reviews.llvm.org/D104199
`outSecFileOff` and the associated `getFileOffset()` accessors were
unnecessary.
For all the cases we care about, `outSecFileOff` is the same as
`outSecOff`. The only time they deviate is if there are zerofill
sections within a given segment. But since zerofill sections are always
at the end of a segment, the only sections where the two values deviate
are zerofill sections themselves. And we never actually query the
outSecFileOff of zerofill sections.
As for `getFileOffset()`, the only place it was being used was to
calculate the offset of the entry symbol. However, we can compute that
value by just taking the difference between the address of the entry
symbol and the address of the Mach-O header. In fact, this appears to be
what ld64 itself does. This difference is the same as the file offset as
long as there are no intervening zerofill sections, but since `__text`
is the first section in `__TEXT`, this never happens, so our previous
use of `getFileOffset()` was not wrong -- just inefficient.
Reviewed By: #lld-macho, thakis
Differential Revision: https://reviews.llvm.org/D104177
Sort the addresses stored in FunctionStarts section.
Previously we were encoding potentially large numbers (due to unsigned overflow).
Test plan: make check-all
Differential revision: https://reviews.llvm.org/D103662
D103977 broke a bunch of stuff as I had only tested the release build
which eliminated asserts.
I've retained the asserts where possible, but I also removed a bunch
instead of adding a whole lot of verbose ConcatInputSection casts.
Literal sections are not atomically live or dead. Rather,
liveness is tracked for each individual literal they contain. CStrings
have their liveness tracked via a `live` bit in StringPiece, and
fixed-width literals have theirs tracked via a BitVector.
The live-marking code now needs to track the offset within each section
that is to be marked live, in order to identify the literal at that
particular offset.
Numbers for linking chromium_framework on my 3.2 GHz 16-Core Intel Xeon W
with both `-dead_strip` and `--deduplicate-literals`, with and without this diff
applied:
```
N Min Max Median Avg Stddev
x 20 4.32 4.44 4.375 4.372 0.03105174
+ 20 4.3 4.39 4.36 4.3595 0.023277502
No difference proven at 95.0% confidence
```
This gives us size savings of about 0.4%.
Reviewed By: #lld-macho, thakis
Differential Revision: https://reviews.llvm.org/D103979
This is motivated by an upcoming diff in which the
WordLiteralInputSection ctor sets itself up based on the value of its
section flags. As such, it needs to be passed the `flags` value as part
of its ctor parameters, instead of having them assigned after the fact
in `parseSection()`. While refactoring code to make that possible, I
figured it would make sense for the other InputSections to also take
their initial values as ctor parameters.
Reviewed By: #lld-macho, thakis
Differential Revision: https://reviews.llvm.org/D103978
These fields currently live in the parent InputSection class,
but they should be specific to ConcatInputSection, since the other
InputSection classes (that contain literals) aren't atomically live or
dead -- rather their component string/int literals should have
individual liveness states. (An upcoming diff will add liveness bits for
StringPieces and fixed-sized literals.)
I also factored out some asserts for isCoalescedWeak() in MarkLive.cpp.
We now avoid putting coalesced sections in the `inputSections` vector,
so we don't have to check/assert against it everywhere.
Reviewed By: #lld-macho, thakis
Differential Revision: https://reviews.llvm.org/D103977
Conceptually, the implementation is pretty straightforward: we put each
literal value into a hashtable, and then write out the keys of that
hashtable at the end.
In contrast with ELF, the Mach-O format does not support variable-length
literals that aren't strings. Its literals are either 4, 8, or 16 bytes
in length. LLD-ELF dedups its literals via sorting + uniq'ing, but since
we don't need to worry about overly-long values, we should be able to do
a faster job by just hashing.
That said, the implementation right now is far from optimal, because we
add to those hashtables serially. To parallelize this, we'll need a
basic concurrent hashtable (only needs to support concurrent writes w/o
interleave reads), which shouldn't be to hard to implement, but I'd like
to punt on it for now.
Numbers for linking chromium_framework on my 3.2 GHz 16-Core Intel Xeon W:
N Min Max Median Avg Stddev
x 20 4.27 4.39 4.315 4.3225 0.033225703
+ 20 4.36 4.82 4.44 4.4845 0.13152846
Difference at 95.0% confidence
0.162 +/- 0.0613971
3.74783% +/- 1.42041%
(Student's t, pooled s = 0.0959262)
This corresponds to binary size savings of 2MB out of 335MB, or 0.6%.
It's not a great tradeoff as-is, but as mentioned our implementation can
be signficantly optimized, and literal dedup will unlock more
opportunities for ICF to identify identical structures that reference
the same literals.
Reviewed By: #lld-macho, gkm
Differential Revision: https://reviews.llvm.org/D103113
Not sure sectionType() carries its weight, but while we have it
we should use it consistently.
No behavior change.
Differential Revision: https://reviews.llvm.org/D104027
Be less clever when writing the indirect symbols in LC_DYSYMTAB:
lld used to make point __stubs and __la_symbol_ptr point at the
same bytes in the indirect symbol table in the __LINKEDIT segment.
That confused strip, so write the same bytes twice and make
__stubs and __la_symbol_ptr point at one copy each, so that they
don't share data. This unconfuses strip, and seems to be what ld64
does too, so hopefully tools are generally more used to this.
This makes the output binaries a bit larger, but not much: 4 bytes
for roughly each called function from a dylib and each weak function.
Chromium Framewoork grows by 6536 bytes, clang-format by a few hundred.
With this, `strip -x Chromium\ Framework` works (244 MB before stripping
to 171 MB after stripping, compared to 236 MB=>164 MB with ld64). Running
strip without `-x` produces the same error message now for lld-linked
Chromium Framework as for when using ld64 as a linker.
`strip clang-format` also works now but didn't previously.
Fixes PR50657.
Differential Revision: https://reviews.llvm.org/D104081
For debugging dylib loading, it's useful to have some insight into what
the linker is doing.
ld64 has the undocumented RC_TRACE_DYLIB_SEARCHING env var
for this printing dylib search candidates.
This adds a flag --print-dylib-search to make lld print the seame information.
It's useful for users, but also for writing tests. The output is formatted
slightly differently than ld64, but we still support RC_TRACE_DYLIB_SEARCHING
to offer at least a compatible way to trigger this.
ld64 has both `-print_statistics` and `-trace_symbol_output` to enable
diagnostics output. I went with "print" since that seems like a more
straightforward name.
Differential Revision: https://reviews.llvm.org/D103985
This is important for Frameworks, which are usually symlinks.
ld64 gets this right for @rpath that's replaced with @loader_path, but not for
bare @loader_path -- ld64's code calls realpath() in that case too, but ignores
the result.
ld64 somehow manages to find libbar1.dylib in the test without the
explicit `-rpath` in Foo1. I don't understand why or how. But this
change is a step forward and fixes an immediate problem I'm having,
so let's start with this :)
Differential Revision: https://reviews.llvm.org/D103990
It causes libraries whose names start with "swift" to be force-loaded.
Note that unlike the more general `-force_load`, this flag only applies
to libraries specified via LC_LINKER_OPTIONS, and not those passed on
the command-line. This is what ld64 does.
Reviewed By: #lld-macho, thakis
Differential Revision: https://reviews.llvm.org/D103709
Our implementation draws heavily from LLD-ELF's, which in turn delegates
its string deduplication to llvm-mc's StringTableBuilder. The messiness of
this diff is largely due to the fact that we've previously assumed that
all InputSections get concatenated together to form the output. This is
no longer true with CStringInputSections, which split their contents into
StringPieces. StringPieces are much more lightweight than InputSections,
which is important as we create a lot of them. They may also overlap in
the output, which makes it possible for strings to be tail-merged. In
fact, the initial version of this diff implemented tail merging, but
I've dropped it for reasons I'll explain later.
**Alignment Issues**
Mergeable cstring literals are found under the `__TEXT,__cstring`
section. In contrast to ELF, which puts strings that need different
alignments into different sections, clang's Mach-O backend puts them all
in one section. Strings that need to be aligned have the `.p2align`
directive emitted before them, which simply translates into zero padding
in the object file.
I *think* ld64 extracts the desired per-string alignment from this data
by preserving each string's offset from the last section-aligned
address. I'm not entirely certain since it doesn't seem consistent about
doing this; but perhaps this can be chalked up to cases where ld64 has
to deduplicate strings with different offset/alignment combos -- it
seems to pick one of their alignments to preserve. This doesn't seem
correct in general; we can in fact can induce ld64 to produce a crashing
binary just by linking in an additional object file that only contains
cstrings and no code. See PR50563 for details.
Moreover, this scheme seems rather inefficient: since unaligned and
aligned strings are all put in the same section, which has a single
alignment value, it doesn't seem possible to tell whether a given string
doesn't have any alignment requirements. Preserving offset+alignments
for strings that don't need it is wasteful.
In practice, the crashes seen so far seem to stem from x86_64 SIMD
operations on cstrings. X86_64 requires SIMD accesses to be
16-byte-aligned. So for now, I'm thinking of just aligning all strings
to 16 bytes on x86_64. This is indeed wasteful, but implementation-wise
it's simpler than preserving per-string alignment+offsets. It also
avoids the aforementioned crash after deduplication of
differently-aligned strings. Finally, the overhead is not huge: using
16-byte alignment (vs no alignment) is only a 0.5% size overhead when
linking chromium_framework.
With these alignment requirements, it doesn't make sense to attempt tail
merging -- most strings will not be eligible since their overlaps aren't
likely to start at a 16-byte boundary. Tail-merging (with alignment) for
chromium_framework only improves size by 0.3%.
It's worth noting that LLD-ELF only does tail merging at `-O2`. By
default (at `-O1`), it just deduplicates w/o tail merging. @thakis has
also mentioned that they saw it regress compressed size in some cases
and therefore turned it off. `ld64` does not seem to do tail merging at
all.
**Performance Numbers**
CString deduplication reduces chromium_framework from 250MB to 242MB, or
about a 3.2% reduction.
Numbers for linking chromium_framework on my 3.2 GHz 16-Core Intel Xeon W:
N Min Max Median Avg Stddev
x 20 3.91 4.03 3.935 3.95 0.034641016
+ 20 3.99 4.14 4.015 4.0365 0.0492336
Difference at 95.0% confidence
0.0865 +/- 0.027245
2.18987% +/- 0.689746%
(Student's t, pooled s = 0.0425673)
As expected, cstring merging incurs some non-trivial overhead.
When passing `--no-literal-merge`, it seems that performance is the
same, i.e. the refactoring in this diff didn't cost us.
N Min Max Median Avg Stddev
x 20 3.91 4.03 3.935 3.95 0.034641016
+ 20 3.89 4.02 3.935 3.9435 0.043197831
No difference proven at 95.0% confidence
Reviewed By: #lld-macho, gkm
Differential Revision: https://reviews.llvm.org/D102964
When a library "host"'s reexports change their installName with
`$ld$os10.11$install_name$host`, we used to write a load command for "host" but
write the version numbers of the reexport instead of "host". This fixes that.
I first thought that the rule is to take the version numbers from the library
that originally had that install name (implemented in D103819), but that's not
what ld64 seems to be doing: It takes the version number from the first dylib
with that install name it loads, and it loads the reexporting library before
the reexports. We already did most of that, we just added reexports before the
reexporter. After this change, we add the reexporter before the reexports.
Addresses https://bugs.llvm.org/show_bug.cgi?id=49800#c11 part 1.
(ld64 seems to add reexports after processing _all_ files on the command line,
while we add them right after the reexporter. For the common case of reexport +
$ld$ symbol changing back to the exporter name, this doesn't make a difference,
but you can construct a case where it does. I expect this to not make a
difference in practice though.)
Differential Revision: https://reviews.llvm.org/D103821
Also adjust a few comments, and move the DylibFile comment talking about
umbrella next to the parameter again.
Differential Revision: https://reviews.llvm.org/D103783
The flag to set it is called `-install_name`, and it's called `installName` in tbd files.
No behavior change.
Differential Revision: https://reviews.llvm.org/D103776
This diff adds first bits to support special symbols $ld$previous* in LLD.
$ld$* symbols modify properties/behavior of the library
(e.g. its install name, compatibility version or hide/add symbols)
for specific target versions.
Test plan: make check-lld-macho
Differential revision: https://reviews.llvm.org/D103505
ca6751043d added a dependency on XAR (at
least for the shared libs build), so without this change we get the
following linker error:
Undefined symbols for architecture x86_64:
"_xar_close", referenced from:
lld::macho::BitcodeBundleSection::finalize() in SyntheticSections.cpp.o
Reviewed By: #lld-macho, int3, thakis
Differential Revision: https://reviews.llvm.org/D100999
D103423 neglected to call `parseReexports()` for nested TBD
documents, leading to symbol resolution failures when trying to look up
a symbol nested more than one level deep in a TBD file. This fixes the
regression and adds a test.
It also appears that `umbrella` wasn't being set properly when calling
`parseLoadCommands` -- it's supposed to resolve to `this` if `nullptr`
is passed. I didn't write a failing test case for this but I've made
`umbrella` a member so the previous behavior should be preserved.
Reviewed By: #lld-macho, thakis
Differential Revision: https://reviews.llvm.org/D103586
Also adds support for live_support sections, no_dead_strip sections,
.no_dead_strip symbols.
Chromium Framework 345MB unstripped -> 250MB stripped
(vs 290MB unstripped -> 236M stripped with ld64).
Doing dead stripping is a bit faster than not, because so much less
data needs to be processed:
% ministat lld_*
x lld_nostrip.txt
+ lld_strip.txt
N Min Max Median Avg Stddev
x 10 3.929414 4.07692 4.0269079 4.0089678 0.044214794
+ 10 3.8129408 3.9025559 3.8670411 3.8642573 0.024779651
Difference at 95.0% confidence
-0.144711 +/- 0.0336749
-3.60967% +/- 0.839989%
(Student's t, pooled s = 0.0358398)
This interacts with many parts of the linker. I tried to add test coverage
for all added `isLive()` checks, so that some test will fail if any of them
is removed. I checked that the test expectations for the most part match
ld64's behavior (except for live-support-iterations.s, see the comment
in the test). Interacts with:
- debug info
- export tries
- import opcodes
- flags like -exported_symbol(s_list)
- -U / dynamic_lookup
- mod_init_funcs, mod_term_funcs
- weak symbol handling
- unwind info
- stubs
- map files
- -sectcreate
- undefined, dylib, common, defined (both absolute and normal) symbols
It's possible it interacts with more features I didn't think of,
of course.
I also did some manual testing:
- check-llvm check-clang check-lld work with lld with this patch
as host linker and -dead_strip enabled
- Chromium still starts
- Chromium's base_unittests still pass, including unwind tests
Implemenation-wise, this is InputSection-based, so it'll work for
object files with .subsections_via_symbols (which includes all
object files generated by clang). I first based this on the COFF
implementation, but later realized that things are more similar to ELF.
I think it'd be good to refactor MarkLive.cpp to look more like the ELF
part at some point, but I'd like to get a working state checked in first.
Mechanical parts:
- Rename canOmitFromOutput to wasCoalesced (no behavior change)
since it really is for weak coalesced symbols
- Add noDeadStrip to Defined, corresponding to N_NO_DEAD_STRIP
(`.no_dead_strip` in asm)
Fixes PR49276.
Differential Revision: https://reviews.llvm.org/D103324
I forgot to move the message() call around as requested in D103428
before committing that change. Move it now.
Also, improve the ordinal uniq'ing comment. I hadn't realized that the
distinct-but-identical files happen with --reproduce and not in general.
No behavior change.
Differential Revision: https://reviews.llvm.org/D103522
We used to not print dylibs referenced by other dylibs in `-t` mode. This
affected reexports, and with `-flat_namespace` also just dylibs loaded by
dylibs. Now we print them.
Fixes PR49514.
Differential Revision: https://reviews.llvm.org/D103428
In all of these cases, the functions could simply return a nullptr instead of {}.
There is no case where Optional<nullptr> has a special meaning.
Differential Revision: https://reviews.llvm.org/D103489
In some cases, we end up with several distinct DylibFiles that
have the same install name. Only emit a single LC_LOAD_DYLIB in
those cases.
This happens in 3 cases I know of:
1. Some tbd files are symlinks. libpthread.tbd is a symlink against
libSystem.tbd for example, so `-lSystem -lpthread` loads
libSystem.tbd twice. We could (and maybe should) cache loaded
dylibs by realpath() to catch this.
2. Some tbd files are copies of each other. For example,
CFNetwork.framework/CFNetwork.tbd and
CFNetwork.framework/Versions/A/CFNetwork.tbd are two distinct
copies of the same file. The former is found by
`-framework CFNetwork` and the latter by the reexport in
CoreServices.tbd. We could conceivably catch this by
making `-framework` search look in `Versions/Current` instead
of in the root, and/or by using a content hash to cache
tbd files, but that's starting to sound complicated.
3. Magic $ld$ symbol processing can change the install name of
a dylib based on the target platform_version. Here, two
truly distinct dylibs can have the same install name.
So we need this code to deal with (3) anyways. Might as well use
it for 1 and 2, at least for now :)
With this (and D103430), clang-format links in the same dylibs
when linked with lld and ld64.
Differential Revision: https://reviews.llvm.org/D103488
This omits load commands for unreferenced dylibs if:
- the dylib was loaded implicitly,
- it is marked MH_DEAD_STRIPPABLE_DYLIB
- or -dead_strip_dylibs is passed
This matches ld64.
Currently, the "is dylib referenced" state is computed before dead code
stripping and is not updated after dead code stripping. This too matches ld64.
We should do better here.
With this, clang-format linked with lld (like with ld64) no longer has
libobjc.A.dylib in `otool -L` output. (It was implicitly loaded as a reexport
of CoreFoundation.framework, but it's not needed.)
Differential Revision: https://reviews.llvm.org/D103430
loadDylib() keeps a name->DylibFile cache, but it only writes
to the cache once the DylibFile constructor has completed.
So dylib loads done recursively from the DylibFile constructor
wouldn't use the cache.
Now, we load additional dylibs after writing to the cache,
which means the cache now gets used for dylibs loaded because
they're referenced from other dylibs.
Related to PR49514 and PR50101, but no dramatic behavior change in itself.
(Technically we no longer crash when a tbd file reexports itself,
but that doesn't happen in practice. We now accept it silently instead
of crashing; ld64 has a diag for the reexport cycle.)
Differential Revision: https://reviews.llvm.org/D103423
.s files with `-g` generate __debug_aranges on darwin/arm64 for some
reason, and those lead to `nullptr` symbols. Don't crash on that.
Fixes PR50517.
Differential Revision: https://reviews.llvm.org/D103350
This diff paves the way for {D102964} which adds a new kind of
InputSection.
We previously maintained section ordering implicitly: we created
InputSections as we parsed each file in command-line order, and passed
on this ordering when we created OutputSections and OutputSegments by
iterating over these InputSections. The implicitness of the ordering
made it difficult to refactor the code to e.g. handle a new type of
InputSection. As such, I've codified the ordering explicitly via
`inputOrder` fields. This also allows us to use `sort` instead of
`stable_sort`.
Benchmarking chromium_framework on my 3.2 GHz 16-Core Intel Xeon W:
N Min Max Median Avg Stddev
x 20 4.23 4.35 4.27 4.274 0.030157481
+ 20 4.24 4.38 4.27 4.2815 0.033759989
No difference proven at 95.0% confidence
Reviewed By: #lld-macho, alexshap
Differential Revision: https://reviews.llvm.org/D102972
The ELF format has the concept of merge sections (marked by SHF_MERGE),
which contain data that can be safely deduplicated. The Mach-O
equivalents are called literal sections (marked by S_CSTRING_LITERALS or
S_{4,8,16}BYTE_LITERALS). While the Mach-O format doesn't use the word
'merge', to avoid confusion, I've renamed our MergedOutputSection to
ConcatOutputSection. I believe it's a more descriptive name too.
This renaming sets the stage for {D102964}.
Reviewed By: #lld-macho, alexshap
Differential Revision: https://reviews.llvm.org/D102971
* Move `static_asserts` into cpp instead of header file. I noticed they
had been separated from the main class definition in the header, so I
set about to clean that up, then figured it made more sense as part of
the cpp file so as not to incur unnecessary compile-time overhead.
* Remove unnecessary `virtual`s
* Remove unnecessary comment / reword another comment
lld/MachO/Driver.cpp and lld/MachO/SyntheticSections.cpp include
llvm/Config/config.h which doesn't exist when building standalone lld.
This patch replaces llvm/Config/config.h include with llvm/Config/llvm-config.h
just like it is in lld/ELF/Driver.cpp and HAVE_LIBXAR with LLVM_HAVE_LIXAR and
moves LLVM_HAVE_LIBXAR from config.h to llvm-config.h
Also it adds LLVM_HAVE_LIBXAR to LLVMConfig.cmake and links liblldMachO2.so
with XAR_LIB if LLVM_HAVE_LIBXAR is set.
Differential Revision: https://reviews.llvm.org/D102084
In LC_DYSYMTAB, private externs were still emitted as exported symbols instead
of as locals.
Fixes PR50373. See bug for details.
Differential Revision: https://reviews.llvm.org/D102662
That way, it's done only once instead of every time shouldExportSymbol() is
called.
Possibly a bit faster:
% ministat at_main at_symtodo
x at_main
+ at_symtodo
N Min Max Median Avg Stddev
x 30 3.9732189 4.114846 4.024621 4.0304692 0.037022865
+ 30 3.93766 4.0510042 3.9973931 3.991469 0.028472565
Difference at 95.0% confidence
-0.0390002 +/- 0.0170714
-0.967635% +/- 0.423559%
(Student's t, pooled s = 0.0330256)
In other runs with n=30 it makes no perf difference, so maybe it's just noise.
But being able to quickly and conveniently answer "is this symbol exported?"
is useful for fixing PR50373 and for implementing -dead_strip, so this seems
like a good change regardless.
No behavior change.
Differential Revision: https://reviews.llvm.org/D102661
This diff changes the type of the argument of isCodeSection to const InputSection *.
NFC.
Test plan: make check-lld-macho
Differential revision: https://reviews.llvm.org/D102664