It causes libraries whose names start with "swift" to be force-loaded.
Note that unlike the more general `-force_load`, this flag only applies
to libraries specified via LC_LINKER_OPTIONS, and not those passed on
the command-line. This is what ld64 does.
Reviewed By: #lld-macho, thakis
Differential Revision: https://reviews.llvm.org/D103709
Our implementation draws heavily from LLD-ELF's, which in turn delegates
its string deduplication to llvm-mc's StringTableBuilder. The messiness of
this diff is largely due to the fact that we've previously assumed that
all InputSections get concatenated together to form the output. This is
no longer true with CStringInputSections, which split their contents into
StringPieces. StringPieces are much more lightweight than InputSections,
which is important as we create a lot of them. They may also overlap in
the output, which makes it possible for strings to be tail-merged. In
fact, the initial version of this diff implemented tail merging, but
I've dropped it for reasons I'll explain later.
**Alignment Issues**
Mergeable cstring literals are found under the `__TEXT,__cstring`
section. In contrast to ELF, which puts strings that need different
alignments into different sections, clang's Mach-O backend puts them all
in one section. Strings that need to be aligned have the `.p2align`
directive emitted before them, which simply translates into zero padding
in the object file.
I *think* ld64 extracts the desired per-string alignment from this data
by preserving each string's offset from the last section-aligned
address. I'm not entirely certain since it doesn't seem consistent about
doing this; but perhaps this can be chalked up to cases where ld64 has
to deduplicate strings with different offset/alignment combos -- it
seems to pick one of their alignments to preserve. This doesn't seem
correct in general; we can in fact can induce ld64 to produce a crashing
binary just by linking in an additional object file that only contains
cstrings and no code. See PR50563 for details.
Moreover, this scheme seems rather inefficient: since unaligned and
aligned strings are all put in the same section, which has a single
alignment value, it doesn't seem possible to tell whether a given string
doesn't have any alignment requirements. Preserving offset+alignments
for strings that don't need it is wasteful.
In practice, the crashes seen so far seem to stem from x86_64 SIMD
operations on cstrings. X86_64 requires SIMD accesses to be
16-byte-aligned. So for now, I'm thinking of just aligning all strings
to 16 bytes on x86_64. This is indeed wasteful, but implementation-wise
it's simpler than preserving per-string alignment+offsets. It also
avoids the aforementioned crash after deduplication of
differently-aligned strings. Finally, the overhead is not huge: using
16-byte alignment (vs no alignment) is only a 0.5% size overhead when
linking chromium_framework.
With these alignment requirements, it doesn't make sense to attempt tail
merging -- most strings will not be eligible since their overlaps aren't
likely to start at a 16-byte boundary. Tail-merging (with alignment) for
chromium_framework only improves size by 0.3%.
It's worth noting that LLD-ELF only does tail merging at `-O2`. By
default (at `-O1`), it just deduplicates w/o tail merging. @thakis has
also mentioned that they saw it regress compressed size in some cases
and therefore turned it off. `ld64` does not seem to do tail merging at
all.
**Performance Numbers**
CString deduplication reduces chromium_framework from 250MB to 242MB, or
about a 3.2% reduction.
Numbers for linking chromium_framework on my 3.2 GHz 16-Core Intel Xeon W:
N Min Max Median Avg Stddev
x 20 3.91 4.03 3.935 3.95 0.034641016
+ 20 3.99 4.14 4.015 4.0365 0.0492336
Difference at 95.0% confidence
0.0865 +/- 0.027245
2.18987% +/- 0.689746%
(Student's t, pooled s = 0.0425673)
As expected, cstring merging incurs some non-trivial overhead.
When passing `--no-literal-merge`, it seems that performance is the
same, i.e. the refactoring in this diff didn't cost us.
N Min Max Median Avg Stddev
x 20 3.91 4.03 3.935 3.95 0.034641016
+ 20 3.89 4.02 3.935 3.9435 0.043197831
No difference proven at 95.0% confidence
Reviewed By: #lld-macho, gkm
Differential Revision: https://reviews.llvm.org/D102964
When a library "host"'s reexports change their installName with
`$ld$os10.11$install_name$host`, we used to write a load command for "host" but
write the version numbers of the reexport instead of "host". This fixes that.
I first thought that the rule is to take the version numbers from the library
that originally had that install name (implemented in D103819), but that's not
what ld64 seems to be doing: It takes the version number from the first dylib
with that install name it loads, and it loads the reexporting library before
the reexports. We already did most of that, we just added reexports before the
reexporter. After this change, we add the reexporter before the reexports.
Addresses https://bugs.llvm.org/show_bug.cgi?id=49800#c11 part 1.
(ld64 seems to add reexports after processing _all_ files on the command line,
while we add them right after the reexporter. For the common case of reexport +
$ld$ symbol changing back to the exporter name, this doesn't make a difference,
but you can construct a case where it does. I expect this to not make a
difference in practice though.)
Differential Revision: https://reviews.llvm.org/D103821
Also adjust a few comments, and move the DylibFile comment talking about
umbrella next to the parameter again.
Differential Revision: https://reviews.llvm.org/D103783
The flag to set it is called `-install_name`, and it's called `installName` in tbd files.
No behavior change.
Differential Revision: https://reviews.llvm.org/D103776
This diff adds first bits to support special symbols $ld$previous* in LLD.
$ld$* symbols modify properties/behavior of the library
(e.g. its install name, compatibility version or hide/add symbols)
for specific target versions.
Test plan: make check-lld-macho
Differential revision: https://reviews.llvm.org/D103505
ca6751043d added a dependency on XAR (at
least for the shared libs build), so without this change we get the
following linker error:
Undefined symbols for architecture x86_64:
"_xar_close", referenced from:
lld::macho::BitcodeBundleSection::finalize() in SyntheticSections.cpp.o
Reviewed By: #lld-macho, int3, thakis
Differential Revision: https://reviews.llvm.org/D100999
D103423 neglected to call `parseReexports()` for nested TBD
documents, leading to symbol resolution failures when trying to look up
a symbol nested more than one level deep in a TBD file. This fixes the
regression and adds a test.
It also appears that `umbrella` wasn't being set properly when calling
`parseLoadCommands` -- it's supposed to resolve to `this` if `nullptr`
is passed. I didn't write a failing test case for this but I've made
`umbrella` a member so the previous behavior should be preserved.
Reviewed By: #lld-macho, thakis
Differential Revision: https://reviews.llvm.org/D103586
Also adds support for live_support sections, no_dead_strip sections,
.no_dead_strip symbols.
Chromium Framework 345MB unstripped -> 250MB stripped
(vs 290MB unstripped -> 236M stripped with ld64).
Doing dead stripping is a bit faster than not, because so much less
data needs to be processed:
% ministat lld_*
x lld_nostrip.txt
+ lld_strip.txt
N Min Max Median Avg Stddev
x 10 3.929414 4.07692 4.0269079 4.0089678 0.044214794
+ 10 3.8129408 3.9025559 3.8670411 3.8642573 0.024779651
Difference at 95.0% confidence
-0.144711 +/- 0.0336749
-3.60967% +/- 0.839989%
(Student's t, pooled s = 0.0358398)
This interacts with many parts of the linker. I tried to add test coverage
for all added `isLive()` checks, so that some test will fail if any of them
is removed. I checked that the test expectations for the most part match
ld64's behavior (except for live-support-iterations.s, see the comment
in the test). Interacts with:
- debug info
- export tries
- import opcodes
- flags like -exported_symbol(s_list)
- -U / dynamic_lookup
- mod_init_funcs, mod_term_funcs
- weak symbol handling
- unwind info
- stubs
- map files
- -sectcreate
- undefined, dylib, common, defined (both absolute and normal) symbols
It's possible it interacts with more features I didn't think of,
of course.
I also did some manual testing:
- check-llvm check-clang check-lld work with lld with this patch
as host linker and -dead_strip enabled
- Chromium still starts
- Chromium's base_unittests still pass, including unwind tests
Implemenation-wise, this is InputSection-based, so it'll work for
object files with .subsections_via_symbols (which includes all
object files generated by clang). I first based this on the COFF
implementation, but later realized that things are more similar to ELF.
I think it'd be good to refactor MarkLive.cpp to look more like the ELF
part at some point, but I'd like to get a working state checked in first.
Mechanical parts:
- Rename canOmitFromOutput to wasCoalesced (no behavior change)
since it really is for weak coalesced symbols
- Add noDeadStrip to Defined, corresponding to N_NO_DEAD_STRIP
(`.no_dead_strip` in asm)
Fixes PR49276.
Differential Revision: https://reviews.llvm.org/D103324
I forgot to move the message() call around as requested in D103428
before committing that change. Move it now.
Also, improve the ordinal uniq'ing comment. I hadn't realized that the
distinct-but-identical files happen with --reproduce and not in general.
No behavior change.
Differential Revision: https://reviews.llvm.org/D103522
We used to not print dylibs referenced by other dylibs in `-t` mode. This
affected reexports, and with `-flat_namespace` also just dylibs loaded by
dylibs. Now we print them.
Fixes PR49514.
Differential Revision: https://reviews.llvm.org/D103428
In all of these cases, the functions could simply return a nullptr instead of {}.
There is no case where Optional<nullptr> has a special meaning.
Differential Revision: https://reviews.llvm.org/D103489
In some cases, we end up with several distinct DylibFiles that
have the same install name. Only emit a single LC_LOAD_DYLIB in
those cases.
This happens in 3 cases I know of:
1. Some tbd files are symlinks. libpthread.tbd is a symlink against
libSystem.tbd for example, so `-lSystem -lpthread` loads
libSystem.tbd twice. We could (and maybe should) cache loaded
dylibs by realpath() to catch this.
2. Some tbd files are copies of each other. For example,
CFNetwork.framework/CFNetwork.tbd and
CFNetwork.framework/Versions/A/CFNetwork.tbd are two distinct
copies of the same file. The former is found by
`-framework CFNetwork` and the latter by the reexport in
CoreServices.tbd. We could conceivably catch this by
making `-framework` search look in `Versions/Current` instead
of in the root, and/or by using a content hash to cache
tbd files, but that's starting to sound complicated.
3. Magic $ld$ symbol processing can change the install name of
a dylib based on the target platform_version. Here, two
truly distinct dylibs can have the same install name.
So we need this code to deal with (3) anyways. Might as well use
it for 1 and 2, at least for now :)
With this (and D103430), clang-format links in the same dylibs
when linked with lld and ld64.
Differential Revision: https://reviews.llvm.org/D103488
This omits load commands for unreferenced dylibs if:
- the dylib was loaded implicitly,
- it is marked MH_DEAD_STRIPPABLE_DYLIB
- or -dead_strip_dylibs is passed
This matches ld64.
Currently, the "is dylib referenced" state is computed before dead code
stripping and is not updated after dead code stripping. This too matches ld64.
We should do better here.
With this, clang-format linked with lld (like with ld64) no longer has
libobjc.A.dylib in `otool -L` output. (It was implicitly loaded as a reexport
of CoreFoundation.framework, but it's not needed.)
Differential Revision: https://reviews.llvm.org/D103430
loadDylib() keeps a name->DylibFile cache, but it only writes
to the cache once the DylibFile constructor has completed.
So dylib loads done recursively from the DylibFile constructor
wouldn't use the cache.
Now, we load additional dylibs after writing to the cache,
which means the cache now gets used for dylibs loaded because
they're referenced from other dylibs.
Related to PR49514 and PR50101, but no dramatic behavior change in itself.
(Technically we no longer crash when a tbd file reexports itself,
but that doesn't happen in practice. We now accept it silently instead
of crashing; ld64 has a diag for the reexport cycle.)
Differential Revision: https://reviews.llvm.org/D103423
.s files with `-g` generate __debug_aranges on darwin/arm64 for some
reason, and those lead to `nullptr` symbols. Don't crash on that.
Fixes PR50517.
Differential Revision: https://reviews.llvm.org/D103350
This diff paves the way for {D102964} which adds a new kind of
InputSection.
We previously maintained section ordering implicitly: we created
InputSections as we parsed each file in command-line order, and passed
on this ordering when we created OutputSections and OutputSegments by
iterating over these InputSections. The implicitness of the ordering
made it difficult to refactor the code to e.g. handle a new type of
InputSection. As such, I've codified the ordering explicitly via
`inputOrder` fields. This also allows us to use `sort` instead of
`stable_sort`.
Benchmarking chromium_framework on my 3.2 GHz 16-Core Intel Xeon W:
N Min Max Median Avg Stddev
x 20 4.23 4.35 4.27 4.274 0.030157481
+ 20 4.24 4.38 4.27 4.2815 0.033759989
No difference proven at 95.0% confidence
Reviewed By: #lld-macho, alexshap
Differential Revision: https://reviews.llvm.org/D102972
The ELF format has the concept of merge sections (marked by SHF_MERGE),
which contain data that can be safely deduplicated. The Mach-O
equivalents are called literal sections (marked by S_CSTRING_LITERALS or
S_{4,8,16}BYTE_LITERALS). While the Mach-O format doesn't use the word
'merge', to avoid confusion, I've renamed our MergedOutputSection to
ConcatOutputSection. I believe it's a more descriptive name too.
This renaming sets the stage for {D102964}.
Reviewed By: #lld-macho, alexshap
Differential Revision: https://reviews.llvm.org/D102971
* Move `static_asserts` into cpp instead of header file. I noticed they
had been separated from the main class definition in the header, so I
set about to clean that up, then figured it made more sense as part of
the cpp file so as not to incur unnecessary compile-time overhead.
* Remove unnecessary `virtual`s
* Remove unnecessary comment / reword another comment
lld/MachO/Driver.cpp and lld/MachO/SyntheticSections.cpp include
llvm/Config/config.h which doesn't exist when building standalone lld.
This patch replaces llvm/Config/config.h include with llvm/Config/llvm-config.h
just like it is in lld/ELF/Driver.cpp and HAVE_LIBXAR with LLVM_HAVE_LIXAR and
moves LLVM_HAVE_LIBXAR from config.h to llvm-config.h
Also it adds LLVM_HAVE_LIBXAR to LLVMConfig.cmake and links liblldMachO2.so
with XAR_LIB if LLVM_HAVE_LIBXAR is set.
Differential Revision: https://reviews.llvm.org/D102084
In LC_DYSYMTAB, private externs were still emitted as exported symbols instead
of as locals.
Fixes PR50373. See bug for details.
Differential Revision: https://reviews.llvm.org/D102662
That way, it's done only once instead of every time shouldExportSymbol() is
called.
Possibly a bit faster:
% ministat at_main at_symtodo
x at_main
+ at_symtodo
N Min Max Median Avg Stddev
x 30 3.9732189 4.114846 4.024621 4.0304692 0.037022865
+ 30 3.93766 4.0510042 3.9973931 3.991469 0.028472565
Difference at 95.0% confidence
-0.0390002 +/- 0.0170714
-0.967635% +/- 0.423559%
(Student's t, pooled s = 0.0330256)
In other runs with n=30 it makes no perf difference, so maybe it's just noise.
But being able to quickly and conveniently answer "is this symbol exported?"
is useful for fixing PR50373 and for implementing -dead_strip, so this seems
like a good change regardless.
No behavior change.
Differential Revision: https://reviews.llvm.org/D102661
This diff changes the type of the argument of isCodeSection to const InputSection *.
NFC.
Test plan: make check-lld-macho
Differential revision: https://reviews.llvm.org/D102664
`match()` can only return for non-empty vectors, but at least in
non-LTO builds that isn't clear to the compiler. Help it out.
This is a minor but measurable speedup on my machine (but less
than what we might've lost in https://reviews.llvm.org/D100818#2764272 --
bot note higher N on this measurement here, so higher confidence here):
% ministat at_main at_branch
x at_main
+ at_branch
N Min Max Median Avg Stddev
x 30 3.9243979 4.0395119 3.987375 3.9826236 0.027567796
+ 30 3.8495831 4.0009291 3.931325 3.9347135 0.037832878
Difference at 95.0% confidence
-0.0479101 +/- 0.0171102
-1.20298% +/- 0.429622%
(Student's t, pooled s = 0.0331007)
No behavior change.
Eventually we should apply these lists at symbol parse time instead of
every time shouldExportSymbol() though :)
Differential Revision: https://reviews.llvm.org/D102655
Has the effect that `__mh_execute_header` stays in the symbol table of
outputs even after running `strip` on the output. I don't know if that's
important for anything -- my motivation for the patch is just is to make
the output more similar to ld64.
(Corresponds to symbolTableInAndNeverStrip in ld64.)
Differential Revision: https://reviews.llvm.org/D102619
Extend the range of calls beyond an architecture's limited branch range by first calling a thunk, which loads the far address into a scratch register (x16 on ARM64) and branches through it.
Other ports (COFF, ELF) use multiple passes with successively-refined guesses regarding the expansion of text-space imposed by thunk-space overhead. This MachO algorithm places thunks during MergedOutputSection::finalize() in a single pass using exact thunk-space overheads. Thunks are kept in a separate vector to avoid the overhead of inserting into the `inputs` vector of `MergedOutputSection`.
FIXME:
* arm64-stubs.s test is broken
* add thunk tests
* Handle thunks to DylibSymbol in MergedOutputSection::finalize()
Differential Revision: https://reviews.llvm.org/D100818
We had a hardcoded check and a stale TODO, written back when we only had
support for one architecture.
Reviewed By: #lld-macho, thakis
Differential Revision: https://reviews.llvm.org/D102154
In particular, we should apply the `-undefined` behavior to all
such symbols, include those that are specified via the command line
(i.e. `-e`, `-u`, and `-exported_symbol`). ld64 supports this too.
Reviewed By: #lld-macho, thakis
Differential Revision: https://reviews.llvm.org/D102143
On a section with alignment of 16, subsections aligned to 16-byte
boundaries should keep their 16-byte alignment.
Fixes PR50274. (The same bug could have happened with -order_file
previously.)
Differential Revision: https://reviews.llvm.org/D102139
This would cause us to pull in symbols (and code) that should
be unused.
Reviewed By: #lld-macho, thakis
Differential Revision: https://reviews.llvm.org/D102137
Symbols explicitly exported via command-line options `--exported_symbol SYM` and `--exported_symbols_list FILE` must be defined. Before this fix, lazy symbols defined in archives would be left to languish. We now force them to be included in the linked output.
Differential Revision: https://reviews.llvm.org/D102100
Before this, if an inline function was defined in several input files,
lld would write each copy of the inline function the output. With this
patch, it only writes one copy.
Reduces the size of Chromium Framework from 378MB to 345MB (compared
to 290MB linked with ld64, which also does dead-stripping, which we
don't do yet), and makes linking it faster:
N Min Max Median Avg Stddev
x 10 3.9957051 4.3496981 4.1411121 4.156837 0.10092097
+ 10 3.908154 4.169318 3.9712729 3.9846753 0.075773012
Difference at 95.0% confidence
-0.172162 +/- 0.083847
-4.14165% +/- 2.01709%
(Student's t, pooled s = 0.0892373)
Implementation-wise, when merging two weak symbols, this sets a
"canOmitFromOutput" on the InputSection belonging to the weak symbol not put in
the symbol table. We then don't write InputSections that have this set, as long
as they are not referenced from other symbols. (This happens e.g. for object
files that don't set .subsections_via_symbols or that use .alt_entry.)
Some restrictions:
- not yet done for bitcode inputs
- no "comdat" handling (`kindNoneGroupSubordinate*` in ld64) --
Frame Descriptor Entries (FDEs), Language Specific Data Areas (LSDAs)
(that is, catch block unwind information) and Personality Routines
associated with weak functions still not stripped. This is wasteful,
but harmless.
- However, this does strip weaks from __unwind_info (which is needed for
correctness and not just for size)
- This nopes out on InputSections that are referenced form more than
one symbol (eg from .alt_entry) for now
Things that work based on symbols Just Work:
- map files (change in MapFile.cpp is no-op and not needed; I just
found it a bit more explicit)
- exports
Things that work with inputSections need to explicitly check if
an inputSection is written (e.g. unwind info).
This patch is useful in itself, but it's also likely also a useful foundation
for dead_strip.
I used to have a "canoncialRepresentative" pointer on InputSection instead of
just the bool, which would be handy for ICF too. But I ended up not needing it
for this patch, so I removed that again for now.
Differential Revision: https://reviews.llvm.org/D102076
ld64 can emit dylibs that support more than one platform (typically macOS and
macCatalyst). This diff allows LLD to read in those dylibs. Note that this is a
super bare-bones implementation -- in particular, I haven't added support for
LLD to emit those multi-platform dylibs, nor have I added a variety of
validation checks that ld64 does. Until we have a use-case for emitting zippered
dylibs, I think this is good enough.
Fixes PR49597.
Reviewed By: #lld-macho, oontvoo
Differential Revision: https://reviews.llvm.org/D101954
Currently the linker causes unnecessary errors when either the target or the config's platform is a simulator.
Differential Revision: https://reviews.llvm.org/D101855