It's in libSystem, so it doesn't bring in any new deps, and it's
currently much faster than LLVM's current SHA256 implementation.
Makes linking (arm64) Chromium Framework with ld64.lld 17% faster.
See also PR56121.
No behavior change.
Differential Revision: https://reviews.llvm.org/D128290
The error used to look like this:
ld64.lld: error: undefined symbol: _foo
>>> referenced by /path/to/bar.o:(symbol _baz+0x4)
If DWARF line information is available, we now show where in the source
the references are coming from:
ld64.lld: error: unreferenced symbol: _foo
>>> referenced by: bar.cpp:42 (/path/to/bar.cpp:42)
>>> /path/to/bar.o:(symbol _baz+0x4)
Differential Revision: https://reviews.llvm.org/D128184
Patch created by running:
rg -l parallelForEachN | xargs sed -i '' -c 's/parallelForEachN/parallelFor/'
No behavior change.
Differential Revision: https://reviews.llvm.org/D128140
As an optimization for ld64 sometimes it can be useful to not export any
symbols for top level binaries that don't need any exports, to do this
you can pass `-exported_symbols_list /dev/null`, or new with Xcode 14
(ld64 816) there is a `-no_exported_symbols` flag for the same behavior.
This reproduces this behavior where previously an empty exported symbols
list file would have been ignored.
Differential Revision: https://reviews.llvm.org/D127562
ld64.lld used to print the "undefined symbol" line for each reference to
an undefined symbol previously:
ld64.lld: error: undefined symbol: _foo
>>> referenced by /path/to/bar.o:(symbol _baz+0x0)
ld64.lld: error: undefined symbol: _foo
>>> referenced by /path/to/bar.o:(symbol _quux+0x1)
Now they are deduplicated:
ld64.lld: error: undefined symbol: _foo
>>> referenced by /path/to/bar.o:(symbol _baz+0x0)
>>> referenced by /path/to/bar.o:(symbol _quux+0x1)
As with the other lld ports, only the first 3 references are printed.
Differential Revision: https://reviews.llvm.org/D127753
The error used to look like this:
ld64.lld: error: undefined symbol: _foo
>>> referenced by /path/to/bar.o
Now it displays the name of the function that contains the undefined
reference as well:
ld64.lld: error: undefined symbol: _foo
>>> referenced by /path/to/bar.o:(symbol _baz+0x4)
Differential Revision: https://reviews.llvm.org/D127696
This commit fixes the issue that getLocation always printed the name of
the first symbol in the section.
For clarity, upper_bound is used instead of a linear search for finding
the closest symbol name. Note that this change does not affect
performance: this function is only called when printing errors and
`symbols` typically contains a single symbol because of
.subsections_via_symbols.
Differential Revision: https://reviews.llvm.org/D127670
This reverts commit 942f4e3a7c.
The additional change required to avoid the assertion errors seen
previously is:
--- a/lld/MachO/ICF.cpp
+++ b/lld/MachO/ICF.cpp
@@ -443,7 +443,9 @@ void macho::foldIdenticalSections() {
/*relocVA=*/0);
isec->data = copy;
}
- } else {
+ } else if (!isEhFrameSection(isec)) {
+ // EH frames are gathered as hashables from unwindEntry above; give a
+ // unique ID to everything else.
isec->icfEqClass[0] = ++icfUniqueID;
}
}
Differential Revision: https://reviews.llvm.org/D123435
Just matter of enabling the config option.
(Also changed the platform of the input test file to macOS, since that's
the default that we specify in the `%lld` substitution. The conflict was
causing errors when linking with LTO.)
Reviewed By: #lld-macho, thakis
Differential Revision: https://reviews.llvm.org/D127600
This flag suppresses warnings produced by the linker. In ld64 this has
an interesting interaction with -fatal_warnings, it silences the
warnings but the link still fails. Instead of doing that here we still
print the warning and eagerly fail the link in case both are passed,
this seems more reasonable so users can understand why the link fails.
Differential Revision: https://reviews.llvm.org/D127564
For arm64, llvm-mc emits relocations for the target function
address like so:
ltmp:
<CIE start>
...
<CIE end>
... multiple FDEs ...
<FDE start>
<target function address - (ltmp + pcrel offset)>
...
If any of the FDEs in `multiple FDEs` get dead-stripped, then `FDE start`
will move to an earlier address, and `ltmp + pcrel offset` will no longer
reflect an accurate pcrel value. To avoid this problem, we "canonicalize"
our relocation by adding an `EH_Frame` symbol at `FDE start`, and updating
the reloc to be `target function address - (EH_Frame + new pcrel offset)`.
Reviewed By: #lld-macho, Roger
Differential Revision: https://reviews.llvm.org/D124561
== Background ==
`llvm-mc` generates unwind info in both compact unwind and DWARF
formats. LLD already handles the compact unwind format; this diff gets
us close to handling the DWARF format properly.
== Caveats ==
It's not quite done yet, but I figure it's worth getting this reviewed
and landed first as it's shaping up to be a fairly large code change.
**Known limitations of the current code:**
* Only works for x86_64, for which `llvm-mc` emits "abs-ified"
relocations as described in 618def651b.
`llvm-mc` emits regular relocations for ARM EH frames, which we do not
yet handle correctly.
Since the feature is not ready for real use yet, I've gated it behind a
flag that only gets toggled on during test suite runs. With most of the
new code disabled, we see just a hint of perf regression, so I don't
think it'd be remiss to land this as-is:
base diff difference (95% CI)
sys_time 1.926 ± 0.168 1.979 ± 0.117 [ -1.2% .. +6.6%]
user_time 3.590 ± 0.033 3.606 ± 0.028 [ +0.0% .. +0.9%]
wall_time 7.104 ± 0.184 7.179 ± 0.151 [ -0.2% .. +2.3%]
samples 30 31
== Design ==
Like compact unwind entries, EH frames are also represented as regular
ConcatInputSections that get pointed to via `Defined::unwindEntry`. This
allows them to be handled generically by e.g. the MarkLive and ICF
code. (But note that unlike compact unwind subsections, EH frame
subsections do end up in the final binary.)
In order to make EH frames "look like" a regular ConcatInputSection,
some processing is required. First, we need to split the `__eh_frame`
section along EH frame boundaries rather than along symbol boundaries.
We do this by decoding the length field of each EH frame. Second, the
abs-ified relocations need to be turned into regular Relocs.
== Next Steps ==
In order to support EH frames on ARM targets, we will either have to
teach LLD how to handle EH frames with explicit relocs, or we can try to
make `llvm-mc` emit abs-ified relocs for ARM as well. I'm hoping to do
the latter as I think it will make the LLD implementation both simpler
and faster to execute.
== Misc ==
The `obj-file-with-stabs.s` test had to be updated as the previous
version would trip assertion errors in the code. It appears that in our
attempt to produce a minimal YAML test input, we created a file with
invalid EH frame data. I've fixed this by re-generating the YAML and not
doing any hand-pruning of it.
Reviewed By: #lld-macho, Roger
Differential Revision: https://reviews.llvm.org/D123435
This reduces linking time by ~8% for my project (1.19s -> 0.53s for
writeSections()). writeTo is const, which bodes well for it being
parallelizable, and I've looked through the different overridden versions and
can't see any race conditions. It produces the same byte-for-byte output for my
project.
Differential Revision: https://reviews.llvm.org/D126800
This reduces the time emitStabs() takes by about 275ms, or 3% of overall
linking time for the project I'm on. Although the parent function is run in
parallel, it's one of the slowest tasks in that concurrent batch (I have
another optimization for another slow task as well).
Differential Revision: https://reviews.llvm.org/D126785
The <internal> symbol was tripping an assertion in getVA() because it
was not marked as used. Per the comment above that symbols creation,
dead stripping has already occurred so marking this symbol as used is
accurate.
Fixes https://github.com/llvm/llvm-project/issues/55565
Differential revision: https://reviews.llvm.org/D126072
With -platform_version flags for two distinct platforms,
this writes a LC_BUILD_VERSION header for each.
The motivation is that this is needed for self-hosting with lld as linker
after D124059.
To create a zippered output at the clang driver level, pass
-target arm64-apple-macos -darwin-target-variant arm64-apple-ios-macabi
to create a zippered dylib.
(In Xcode's clang, `-darwin-target-variant` is spelled just `-target-variant`.)
(If you pass `-target arm64-apple-ios-macabi -target-variant arm64-apple-macos`
instead, ld64 crashes!)
This results in two -platform_version flags being passed to the linker.
ld64 also verifies that the iOS SDK version is at least 13.1. We don't do that
yet. But ld64 also does that for other platforms and we don't. So we need to
do that at some point, but not in this patch.
Only dylib and bundle outputs can be zippered.
I verified that a Catalyst app linked against a dylib created with
clang -shared foo.cc -o libfoo.dylib \
-target arm64-apple-macos \
-target-variant arm64-apple-ios-macabi \
-Wl,-install_name,@rpath/libfoo.dylib \
-fuse-ld=$PWD/out/gn/bin/ld64.lld
runs successfully. (The app calls a function `f()` in libfoo.dylib
that returns a const char* "foo", and NSLog(@"%s")s it.)
ld64 is a bit more permissive when writing zippered outputs,
see references to "unzippered twins". That's not implemented yet.
(If anybody wants to implement that, D124275 is a good start.)
Differential Revision: https://reviews.llvm.org/D124887
This change implements --icf=safe for MachO based on addrsig section that is implemented in D123751.
Reviewed By: int3, #lld-macho
Differential Revision: https://reviews.llvm.org/D123752
Before this,
clang empty.cc -target x86_64-apple-ios13.1-macabi \
-framework CoreServices -fuse-ld=lld
would error out with
ld64.lld: error: path/to/MacOSX.sdk/System/Library/Frameworks/
CoreServices.framework/Versions/A/Frameworks/CarbonCore.framework/
Versions/A/CarbonCore.tbd(
/System/Library/Frameworks/
CoreServices.framework/Versions/A/Frameworks/CarbonCore.framework/
Versions/A/CarbonCore) is incompatible with x86_64 (macCatalyst)
Now it works, like with ld64.
Differential Revision: https://reviews.llvm.org/D124336
It seems like we are overly asserting when running `-dead_strip` with
exported symbols. ld64 treats exported private extern symbols as a liveness
root. Loosen the assert to match ld64's behavior.
Reviewed By: #lld-macho, int3
Differential Revision: https://reviews.llvm.org/D124143
Previously, when encountering a symbol reloc located in a literal section, we
would look up the contents of the literal at the `symbol value + addend` offset
within the literal section. However, it seems that this offset is not guaranteed
to be valid. Instead, we should use just the symbol value to retrieve the
literal's contents, and compare the addend values separately. ld64 seems to do
this.
Reviewed By: #lld-macho, thevinster
Differential Revision: https://reviews.llvm.org/D124223
Previously, we stored a pointer from the ObjFile to its compact unwind
section in order to avoid iterating over the file's sections a second
time. However, given the small number of sections (not subsections) per
file, this caching was really quite unnecessary. We will soon do lookups
for more sections (such as the `__eh_frame` section), so let's simplify
the code first.
Reviewed By: #lld-macho, Roger
Differential Revision: https://reviews.llvm.org/D123434
A "zippered" dylib contains several LC_BUILD_VERSION load commands, usually
one each for "normal" macOS and one for macCatalyst.
These are usually created by passing something like
-shared -target arm64-apple-macos -darwin-target-variant arm64-apple-ios13.1-macabi
to clang, which turns it into
-platform_version macos 12.0.0 12.3 -platform_version "mac catalyst" 14.0.0 15.4
for the linker.
ld64.lld can read these files fine, but it can't write them. Before this
change, it would just silently use the last -platform_version flag and ignore
the rest.
This change adds a warning that writing zippered dylibs isn't implemented yet
instead.
Sadly, parts of ld64.lld's test suite relied on the previous
"silently use last flag" semantics for its test suite: `%lld` always expanded
to `ld64.lld -platform_version macos 10.15 11.0` and tests that wanted a
different value passed a 2nd `-platform_version` flag later on. But this now
produces a warning if the platform passed to `-platform_version` is not `macos`.
There weren't very many cases of this, so move these to use `%no-arg-lld` and
manually pass `-arch`.
Differential Revision: https://reviews.llvm.org/D124106
Follow-on to {D123276}. Now that we work with an internal
representation of compact unwind entries, we no longer need to template
our UnwindInfoSectionImpl code based on the pointer size of the target
architecture.
I've still kept the split between `UnwindInfoSectionImpl` and
`UnwindInfoSection`. I'd introduced that split in order to do type
erasure, but I think it's still useful to have in order to keep
`UnwindInfoSection`'s definition in the header file clean.
Reviewed By: #lld-macho, oontvoo
Differential Revision: https://reviews.llvm.org/D123277
{D123302} got me looking deeper at `includeInSymtab`. I thought it was a
little odd that there were excluded (live) symbols for which
`includeInSymtab` was false; we shouldn't have so many different ways to
exclude a symbol. As such, this diff makes the `L`-prefixed-symbol
exclusion code use `includeInSymtab` too. (Note that as part of our
support for `__eh_frame`, we will also be excluding all `__eh_frame`
symbols from the symtab in a future diff.)
Another thing I noticed is that the `emitStabs` code never has to deal
with excluded symbols because `SymtabSection::finalize()` already
filters them out. As such, I've updated the comments and asserts from
{D123302} to reflect this.
Reviewed By: #lld-macho, thakis
Differential Revision: https://reviews.llvm.org/D123433
The previous implementation of UnwindInfoSection materialized
all the compact unwind entries & applied their relocations, then parsed
the resulting data to generate the final unwind info. This design had
some unfortunate conseqeuences: since relocations can only be applied
after their referents have had addresses assigned, operations that need
to happen before address assignment must contort themselves. (See
{D113582} and observe how this diff greatly simplifies it.)
Moreover, it made synthesizing new compact unwind entries awkward.
Handling PR50956 will require us to do this synthesis, and is the main
motivation behind this diff.
Previously, instead of generating a new CompactUnwindEntry directly, we
would have had to generate a ConcatInputSection with a number of
`Reloc`s that would then get "flattened" into a CompactUnwindEntry.
This diff introduces an internal representation of `CompactUnwindEntry`
(the former `CompactUnwindEntry` has been renamed to
`CompactUnwindLayout`). The new CompactUnwindEntry stores references to
its personality symbol and LSDA section directly, without the use of
`Reloc` structs.
In addition to being easier to work with, this diff also allows us to
handle unwind info whose personality symbols are located in sections
placed after the `__unwind_info`.
Reviewed By: #lld-macho, oontvoo
Differential Revision: https://reviews.llvm.org/D123276
Using a portable format specifier avoids a "format specifies type
'unsigned long long' but the argument has type 'uint64_t' (aka 'unsigned
long') [-Werror,-Wformat]" error depending on the exact definition of
`uint64_t`.
This diff is motivated by my work to add proper DWARF unwind support. As
detailed in PR50956 functions that need DWARF unwind need to have
compact unwind entries synthesized for them. These CU entries encode an
offset within `__eh_frame` that points to the corresponding DWARF FDE.
In order to encode this offset during
`UnwindInfoSectionImpl::finalize()`, we need to first assign values to
`InputSection::outSecOff` for each `__eh_frame` subsection. But
`__eh_frame` is ordered after `__unwind_info` (according to ld64 at
least), which puts us in a bit of a bind: `outSecOff` gets assigned
during finalization, but `__eh_frame` is being finalized after
`__unwind_info`.
But it occurred to me that there's no real need for most
ConcatOutputSections to be finalized sequentially. It's only necessary
for text-containing ConcatOutputSections that may contain branch relocs
which may need thunks. ConcatOutputSections containing other types of
data can be finalized in any order.
This diff moves the finalization logic for non-text sections into a
separate `finalizeContents()` method. This method is called before
section address assignment & unwind info finalization takes place. In
theory we could call these `finalizeContents()` methods in parallel, but
in practice it seems to be faster to do it all on the main thread.
Reviewed By: #lld-macho, oontvoo
Differential Revision: https://reviews.llvm.org/D123279
I was wondering if SymtabSection::emitStabs() should check
defined->includeInSymtab. Add asserts and comments explaining why that's not
necessary.
No behavior change.
Differential Revision: https://reviews.llvm.org/D123302
{D118797} means that we can now check the name/segname of a given
section directly, instead of having to look those properties up on one
of its subsections. This allows us to simplify our code.
Reviewed By: #lld-macho, oontvoo
Differential Revision: https://reviews.llvm.org/D123275
Our compact unwind handling code currently has some logic to locate a
symbol at a given offset in an InputSection. The EH frame code will need
to do something similar, so let's factor out the code.
Reviewed By: #lld-macho, thakis
Differential Revision: https://reviews.llvm.org/D123301
This matches ld64, and makes dsymutil work better with lld's output.
Fixes PR54783, see there for details.
Reduces time needed to run dsymutil on Chromium Framework from 8m30s
(which is already down from 26 min with D123218) to 6m30s and removes
many lines of "could not find object file symbol for symbol" from dsymutil output
(previously: several MB of those messages, now dsymutil is completely silent).
Differential Revision: https://reviews.llvm.org/D123252
This removes options for performing LTO with the legacy pass
manager in LLD. Options that explicitly enable the new pass manager
are retained as no-ops.
Differential Revision: https://reviews.llvm.org/D123219
Or rather, error out if it is set to something other than ON. This
removes the ability to enable the legacy pass manager by default,
but does not remove the ability to explicitly enable it through
various flags like -flegacy-pass-manager or -enable-new-pm=0.
I checked, and our test suite definitely doesn't pass with
LLVM_ENABLE_NEW_PASS_MANAGER=OFF anymore.
Differential Revision: https://reviews.llvm.org/D123126
Returning `std::array<uint8_t, N>` is better ergonomics for the hashing functions usage, instead of a `StringRef`:
* When returning `StringRef`, client code is "jumping through hoops" to do string manipulations instead of dealing with fixed array of bytes directly, which is more natural
* Returning `std::array<uint8_t, N>` avoids the need for the hasher classes to keep a field just for the purpose of wrapping it and returning it as a `StringRef`
As part of this patch also:
* Introduce `TruncatedBLAKE3` which is useful for using BLAKE3 as the hasher type for `HashBuilder` with non-default hash sizes.
* Make `MD5Result` inherit from `std::array<uint8_t, 16>` which improves & simplifies its API.
Differential Revision: https://reviews.llvm.org/D123100
When two local symbols (think: file-scope static functions, or functions in
unnamed namespaces) with the same name in two different translation units
both needed thunks, ld64.lld previously created external thunks for both
of them. These thunks ended up with the same name, leading to a duplicate
symbol error for the thunk symbols.
Instead, give thunks for local symbols local visibility.
(Hitting this requires a jump to a local symbol from over 128 MiB away.
It's unlikely that a single .o file is 128 MiB large, but with ICF
you can end up with a situation where the local symbol is ICF'd with
a symbol in a separate translation unit. And that can introduce a
large enough jump to require a thunk.)
Fixes PR54599.
Differential Revision: https://reviews.llvm.org/D122624
`config->priorities` has been used to hold the intermediate state during the construction of the order in which sections should be laid out. This is not a good place to hold this state since the intermediate state is not a "configuration" for LLD. It should be encapsulated in a class for building a mapping from section to priority (which I created in this diff as the `PriorityBuilder` class).
The same thing is being done for `config->callGraphProfile`.
Reviewed By: #lld-macho, int3
Differential Revision: https://reviews.llvm.org/D122156
Update DataInCode's calculation of `endAddr` to use `getSize()` instead
of `getFileSize()` -- while in practice they're the same for
non-zerofill sections (which code sections are), we still should treat
address sizes / offsets as distinct from file sizes / offsets.
Since Mach-O has a two-level namespace (unlike ELF), we can usually set
this property to true.
(I believe this setting is only available in the new LTO backend, so I
can't really use ld64 / libLTO's behavior as a reference here... I'm
just doing what I think is correct.)
See {D119294} for the work done to calculate the `interposable` used in
this diff.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D119506
All references to interposable symbols can be redirected at runtime to
point to a different symbol definition (with the same name). For
example, if both dylib A and B define symbol _foo, and we load A before
B at runtime, then all references to _foo within dylib B will point to
the definition in dylib A.
ld64 makes all extern symbols interposable when linking with
`-flat_namespace`.
TODO 1: Support `-interposable` and `-interposable_list`, which should
just be a matter of parsing those CLI flags and setting the
`Defined::interposable` bit.
TODO 2: Set Reloc::FinalDefinitionInLinkageUnit correctly with this info
(we are currently not setting it at all, so we're erring on the
conservative side, but we should help the LTO backend generate more
optimal code.)
Reviewed By: modimo, MaskRay
Differential Revision: https://reviews.llvm.org/D119294
Previously, we only allowed this for DylibSymbols. However, in order to
properly support `-flat_namespace` as well as `-interposable`, we need
to allow this for Defined symbols too. Therefore we hoist the
`lazyBindOffset` and the `stubsHelperIndex` into the parent Symbol
class.
The actual change to support interposition under `-flat_namespace` is in
{D119294}; the NFC changes here have been split out for easier review.
Perf regression isn't stat sig on my 3.2 GHz 16-Core Intel Xeon W linking
chromium_framework:
base diff difference (95% CI)
sys_time 1.227 ± 0.021 1.234 ± 0.031 [ -0.3% .. +1.5%]
user_time 3.665 ± 0.036 3.674 ± 0.035 [ -0.2% .. +0.7%]
wall_time 4.596 ± 0.055 4.609 ± 0.064 [ -0.3% .. +0.9%]
samples 34 47
Max RSS regression is barely stat sig:
base diff difference (95% CI)
time 1003664356.324 ± 15404053.912 1010380403.613 ± 10578309.455 [ +0.0% .. +1.3%]
samples 37 31
Reviewed By: modimo
Differential Revision: https://reviews.llvm.org/D121351
This reverts commit e049a87f04.
That commit breaks the build with errors of the form:
/usr/local/google/home/saugustine/llvm/llvm-project/lld/MachO/ExportTrie.cpp:148:11: error: definition of implicitly declared destructor
TrieNode::~TrieNode() {
The code can be used in multi-threads and the allocator is not thread safe.
fixes PR/54378
Reviewed By: int3, #lld-macho
Differential Revision: https://reviews.llvm.org/D121638
Previously, we aligned every cstring to 16 bytes as a temporary hack to
deal with https://github.com/llvm/llvm-project/issues/50135. However, it
was highly wasteful in terms of binary size.
To recap, in contrast to ELF, which puts strings that need different
alignments into different sections, `clang`'s Mach-O backend puts them
all in one section. Strings that need to be aligned have the .p2align
directive emitted before them, which simply translates into zero padding
in the object file. In other words, we have to infer the alignment of
the cstrings from their addresses.
We differ slightly from ld64 in how we've chosen to align these
cstrings. Both LLD and ld64 preserve the number of trailing zeros in
each cstring's address in the input object files. When deduplicating
identical cstrings, both linkers pick the cstring whose address has more
trailing zeros, and preserve the alignment of that address in the final
binary. However, ld64 goes a step further and also preserves the offset
of the cstring from the last section-aligned address. I.e. if a cstring
is at offset 18 in the input, with a section alignment of 16, then both
LLD and ld64 will ensure the final address is 2-byte aligned (since
`18 == 16 + 2`). But ld64 will also ensure that the final address is of
the form 16 * k + 2 for some k (which implies 2-byte alignment).
Note that ld64's heuristic means that a dedup'ed cstring's final address is
dependent on the order of the input object files. E.g. if in addition to the
cstring at offset 18 above, we have a duplicate one in another file with a
`.cstring` section alignment of 2 and an offset of zero, then ld64 will pick
the cstring from the object file earlier on the command line (since both have
the same number of trailing zeros in their address). So the final cstring may
either be at some address `16 * k + 2` or at some address `2 * k`.
I've opted not to follow this behavior primarily for implementation
simplicity, and secondarily to save a few more bytes. It's not clear to me
that preserving the section alignment + offset is ever necessary, and there
are many cases that are clearly redundant. In particular, if an x86_64 object
file contains some strings that are accessed via SIMD instructions, then the
.cstring section in the object file will be 16-byte-aligned (since SIMD
requires its operand addresses to be 16-byte aligned). However, there will
typically also be other cstrings in the same file that aren't used via SIMD
and don't need this alignment. They will be emitted at some arbitrary address
`A`, but ld64 will treat them as being 16-byte aligned with an offset of
`16 % A`.
I have verified that the two repros in https://github.com/llvm/llvm-project/issues/50135
work well with the new alignment behavior.
Fixes https://github.com/llvm/llvm-project/issues/54036.
Reviewed By: #lld-macho, oontvoo
Differential Revision: https://reviews.llvm.org/D121342
ld64 breaks down `__objc_classrefs` on a per-word level and deduplicates
them. This greatly reduces the number of bind entries emitted (and
therefore the amount of work `dyld` has to do at runtime). For
chromium_framework, this change to LLD cuts the number of (non-lazy)
binds from 912 to 190, getting us to parity with ld64 in this aspect.
Reviewed By: #lld-macho, thakis
Differential Revision: https://reviews.llvm.org/D121053
`__cfstring` has embedded addends that foil ICF's hashing / equality
checks. (We can ignore embedded addends when doing ICF because the same
information gets recorded in our Reloc structs.) Therefore, in order to
properly dedup CFStrings, we create a mutable copy of the CFString and
zero out the embedded addends before performing any hashing / equality
checks.
(We did in fact have a partial implementation of CFString deduplication
already. However, it only worked when the cstrings they point to are at
identical offsets in their object files.)
I anticipate this approach can be extended to other similar
statically-allocated struct sections in the future.
In addition, we previously treated all references with differing addends
as unequal. This is not true when the references are to literals:
different addends may point to the same literal in the output binary. In
particular, `__cfstring` has such references to `__cstring`. I've
adjusted ICF's `equalsConstant` logic accordingly, and I've added a few
more tests to make sure the addend-comparison code path is adequately
covered.
Fixes https://github.com/llvm/llvm-project/issues/51281.
Reviewed By: #lld-macho, Roger
Differential Revision: https://reviews.llvm.org/D120137
... from a `uint64_t` to a `uint32_t`. (LLD-ELF uses a `uint32_t` too.)
About a 1.7% reduction in peak RSS when linking chromium_framework on my
3.2 GHz 16-Core Intel Xeon W Mac Pro, and no stat sig change in wall
time.
</Users/jezng/test2.sh ["before"]> </Users/jezng/test2.sh ["after"]> difference (95% CI)
RSS 1003036672.000 ± 9891065.259 985539505.231 ± 10272748.749 [ -2.3% .. -1.2%]
samples 27 26
base diff difference (95% CI)
sys_time 1.277 ± 0.023 1.277 ± 0.024 [ -0.9% .. +0.9%]
user_time 6.682 ± 0.046 6.598 ± 0.043 [ -1.6% .. -0.9%]
wall_time 5.904 ± 0.062 5.895 ± 0.063 [ -0.7% .. +0.4%]
samples 46 28
No appreciable change (~0.01%) in number of `equals` comparisons either:
Before:
ld64.lld: ICF needed 8 iterations
ld64.lld: equalsConstant() called 701643 times
ld64.lld: equalsVariable() called 3438526 times
After:
ld64.lld: ICF needed 8 iterations
ld64.lld: equalsConstant() called 701729 times
ld64.lld: equalsVariable() called 3438526 times
Reviewed By: #lld-macho, MaskRay, thakis
Differential Revision: https://reviews.llvm.org/D121052
The existing hashing of stubsHelperIndex has mostly been a no-op* for
some time now (ever since we made ICF run before dylib symbols get their
stubs indices assigned). I guess we could consider hashing the name +
filename of the DylibSymbol instead, but I'm not sure the overhead's
worth it... moreover, LLD/ELF only hashes their Defined symbols as well.
*: Technically it does change the hash value since stubsHelperIndex is
initialized to `UINT32_MAX` by default. But since all stubsHelperIndex
values are the same at when ICF runs, they don't add any useful
information to the hash.
This is debug code that is disabled by default. It'll provide a easy way
to figure out the impact (if any) of tweaking ICF's hashing algorithm
(since a poor quality hash will result in many more `equals*` calls).
Reviewed By: #lld-macho, oontvoo
Differential Revision: https://reviews.llvm.org/D121051
This was based off @thakis' draft in {D103517}. I employed templates to ensure
the support for `-why_live` wouldn't slow down the regular non-why-live code
path.
No stat sig perf difference on my 3.2 GHz 16-Core Intel Xeon W:
base diff difference (95% CI)
sys_time 1.195 ± 0.015 1.199 ± 0.022 [ -0.4% .. +1.0%]
user_time 3.716 ± 0.022 3.701 ± 0.025 [ -0.7% .. -0.1%]
wall_time 4.606 ± 0.034 4.597 ± 0.046 [ -0.6% .. +0.2%]
samples 44 37
Reviewed By: #lld-macho, thakis
Differential Revision: https://reviews.llvm.org/D120377
This mirrors the code structure in `lld/ELF`. It also paves the way for
an upcoming diff where I templatize things.
Reviewed By: #lld-macho, thakis
Differential Revision: https://reviews.llvm.org/D120376
Symbols for which `canBeOmittedFromSymbolTable()` is true should be
treated as private externs. This diff tries to do that by unsetting the
ExportDynamic bit. It seems to mostly work with the FullLTO backend, but
with the ThinLTO backend, the `local_unnamed_addr` symbols still fail to
be properly hidden. Nonetheless, this is a step in the right direction.
I've documented all the remaining differences between our behavior and
LD64's in the lto-internalized-unnamed-addr.ll test.
See also https://discourse.llvm.org/t/mach-o-lto-handling-of-linkonce-odr-unnamed-addr/60015
Reviewed By: #lld-macho, thevinster
Differential Revision: https://reviews.llvm.org/D119767
If both an order file and a call graph profile are present, the edges of the
call graph which use symbols present in the order file are not used. All of
the symbols in the order file will appear at the beginning of the section just
as they do currently. In other words, the highest priority derived from the
call graph will be below the lowest priority derived from the order file.
Practically, this change renames CallGraphSort.{h,cpp} to SectionPriorities.{h,cpp},
and most order file and call graph profile related code is moved into the new
file to reduce duplication.
Differential Revision: https://reviews.llvm.org/D117354
Main motivation: including `llvm/CodeGen/CommandFlags.h` in
`CommonLinkerContext.h` means that the declaration of `llvm::Reloc` is
visible in any file that includes `CommonLinkerContext.h`. Since our
cpp files have both `using namespace llvm` and `using namespace
lld::macho`, this results in conflicts with `lld::macho::Reloc`.
I suppose we could put `llvm::Reloc` into a nested namespace, but in general,
I think we should avoid transitively including too many header files in
a very widely used header like `CommonLinkerContext.h`.
RegisterCodeGenFlags' ctor initializes a bunch of function-`static`
structures and does nothing else, so it should be fine to "initialize"
it as a temporary stack variable rather than as a file static.
Reviewed By: aganea
Differential Revision: https://reviews.llvm.org/D119913
`parseSections()` is a getting a bit large unwieldy, let's factor out
logic where we can.
Other minor changes in this diff:
* `"__cg_profile"` is now a global constexpr
* We now use `checkError()` instead of `fatal()`-ing without handling
the Error
* Check for `callGraphProfileSort` before checking the section name,
since the boolean comparison is likely cheaper
Reviewed By: #lld-macho, lgrey, oontvoo
Differential Revision: https://reviews.llvm.org/D119892
By unsetting this property, we are now able to internalize more symbols
during LTO. I compared the output of `-save-temps` for both LLD and
ld64, and we now match ld64's behavior as far as `lto-internalize.ll` is
concerned.
(Thanks @smeenai for working on an initial version of this diff!)
Fixes https://github.com/llvm/llvm-project/issues/50574.
Reviewed By: #lld-macho, thakis
Differential Revision: https://reviews.llvm.org/D119372
This makes it easier to pinpoint the source of the problem.
TODO: Have more relocation error messages make use of this
functionality.
Reviewed By: #lld-macho, oontvoo
Differential Revision: https://reviews.llvm.org/D118798
Adds `-pagezero_size`. `-pagezero_size` commonly used for kernel development.
`-pagezero_size` changes the `__PAGEZERO` size, removing that segment if it is set to zero.
One of the four flags from {D118570}
Now with error messages and tests.
Differential Revision: https://reviews.llvm.org/D118724
Xcode 13 comes with a mismatched platform in libcompiler_rt.dylib,
so this creates a linker error on mac catalyst.
Fix it by adding it to the skip list.
Reviewed By: MaskRay, #lld-macho, int3
Differential Revision: https://reviews.llvm.org/D117925
Earlier in LLD's evolution, I tried to create the illusion that
subsections were indistinguishable from "top-level" sections. Thus, even
though the subsections shared many common field values, I hid those
common values away in a private Shared struct (see D105305). More
recently, however, @gkm added a public `Section` struct in D113241 that
served as an explicit way to store values that are common to an entire
set of subsections (aka InputSections). Now that we have another "common
value" struct, `Shared` has been rendered redundant. All its fields can
be moved into `Section` instead, and the pointer to `Shared` can be replaced
with a pointer to `Section`.
This `Section` pointer also has the advantage of letting us inspect other
subsections easily, simplifying the implementation of {D118798}.
P.S. I do think that having both `Section` and `InputSection` makes for
a slightly confusing naming scheme. I considered renaming `InputSection`
to `Subsection`, but that would break the symmetry with `OutputSection`.
It would also make us deviate from LLD-ELF's naming scheme.
This change is perf-neutral on my 3.2 GHz 16-Core Intel Xeon W machine:
base diff difference (95% CI)
sys_time 1.258 ± 0.031 1.248 ± 0.023 [ -1.6% .. +0.1%]
user_time 3.659 ± 0.047 3.658 ± 0.041 [ -0.5% .. +0.4%]
wall_time 4.640 ± 0.085 4.625 ± 0.063 [ -1.0% .. +0.3%]
samples 49 61
There's also no stat sig change in RSS (as measured by `time -l`):
base diff difference (95% CI)
time 998038627.097 ± 13567305.958 1003327715.556 ± 15210451.236 [ -0.2% .. +1.2%]
samples 31 36
Reviewed By: #lld-macho, oontvoo
Differential Revision: https://reviews.llvm.org/D118797
In the case your framework bundles contain relocatable objects, and your
objects include LC_LINKER_OPTIONs for the framework, previously they
would not be deduplicated like they would have if they were static
archives. This was also the case if you passed `-framework` for the
framework as well.
Reviewed By: #lld-macho, thakis, oontvoo
Differential Revision: https://reviews.llvm.org/D114841
Added some comments (particularly around finalize() and
finalizeContents()) as well as doing some rephrasing / grammar fixes for
existing comments.
Also did some minor style fixups, such as by putting methods together in
a class definition and having fields of similar types next to each
other.
Reviewed By: #lld-macho, oontvoo
Differential Revision: https://reviews.llvm.org/D118714
This is a ld64 option equivalent to `-sectcreate seg sect /dev/null`
that's useful for creating sections like the RESTRICT section.
Differential Revision: https://reviews.llvm.org/D117749
This is in preparation for moving the code that parses and processes
order files into this file.
See https://reviews.llvm.org/D117354 for context and discussion.
If you're building this on macOS 12.x+ this produces a deprecation
warning. I'm not sure what this means for the bitcode format going
forward, but it seems safe to silence for now.
Do we need to worry about GCC for this?
Differential Revision: https://reviews.llvm.org/D117718
This flag is the default, so in ld64 it is not implemented, but it can
be useful to negate previous -all_load arguments. Specifically if your
build system has some global linker flags, that you may want to negate
for specific links. We use something like this today to make sure some
C++ symbols are automatically discovered for all links, which passing
-all_load hides.
Differential Revision: https://reviews.llvm.org/D117629
In ld.lld, when an ObjFile/BitcodeFile is read in --start-lib state, the file is
given archive semantics. --end-lib closes the previous --start-lib. A build
system can use this feature as an alternative to archives. This patch ports
the feature to lld-macho.
--start-lib and --end-lib are positional, unlike usual ld64 options.
I think the slight drawback does not matter as (a) reusing option names
make build systems convenient (b) `--start-lib a.o b.o --end-lib` conveys more
information than an alternative design: `-objlib a.o -objlib b.o` because
--start-lib makes it clear which objects are in the same conceptual archive.
This provides flexibility (c) `-objlib`/`-filelist` interaction may be weird.
Close https://github.com/llvm/llvm-project/issues/52931
Reviewed By: #lld-macho, Jez Ng, oontvoo
Differential Revision: https://reviews.llvm.org/D116913
It's still uncertain but whether we want to have `deduplicate-literals` be the
default flag for LLD out of the box or not. If `deduplicate-literals` is the default
behavior, then we will need a way override it and not deduplicate. Luckily, we
have `no_deduplicate` to fill this gap. For now, I've set the default to be false
which aligns with the existing behavior. That can only always be changed after
discussions on D117250.
Reviewed By: #lld-macho, int3
Differential Revision: https://reviews.llvm.org/D117387
Move all variables at file-scope or function-static-scope into a hosting structure (lld::CommonLinkerContext) that lives at lldMain()-scope. Drivers will inherit from this structure and add their own global state, in the same way as for the existing COFFLinkerContext.
See discussion in https://lists.llvm.org/pipermail/llvm-dev/2021-June/151184.html
Differential Revision: https://reviews.llvm.org/D108850
Tail merge is slow and of low value. With regular string deduplication, we can
just use the return value of StringTableBuilder::add.
There is no noticeable performance increase because without deduplication
`__cstring` is quite small (7.6MiB for chromium_framework).
Reviewed By: #lld-macho, Jez Ng
Differential Revision: https://reviews.llvm.org/D117273
The PlatformKind/PlatformType enums contain the same information, which requires
them to be kept in-sync. This commit changes over to PlatformType as the sole
source of truth, which allows the removal of the redundant PlatformKind.
The majority of the changes were in LLD and TextAPI.
Reviewed By: cishida
Differential Revision: https://reviews.llvm.org/D117163
Depends on D112160
This adds the new options `--call-graph-profile-sort` (default),
`--no-call-graph-profile-sort` and `--print-symbol-order=`. If call graph
profile sorting is enabled, reads `__LLVM,__cg_profile` sections from object
files and uses the resulting graph to put callees and callers close to each
other in the final binary via the C3 clustering heuristic.
Differential Revision: https://reviews.llvm.org/D112164
After {D115416}, the "Write map file" event no longer shows up
in the time trace. Each time trace profiler instance is thread-local,
but we had neglected to initialize a separate instance for the mapfile
worker thread.
Reviewed By: keith
Differential Revision: https://reviews.llvm.org/D117069
D116913 will add LazyObject. Rename LazySymbol to LazyArchive to avoid confusion
and mirror ELF.
Reviewed By: #lld-macho, Jez Ng
Differential Revision: https://reviews.llvm.org/D116914
One of our internal arm64 apps hit a thunk out of range error when building
with LLD. Per the comment, I'm arbitrarily increasing slop size to 256.
Reviewed By: #lld-macho, thakis
Differential Revision: https://reviews.llvm.org/D116705
LLVM core library supports demangling other mangled symbols other than itanium,
such as D and Rust. LLD should use those demanglers in order to output pretty
demangled symbols on error messages.
Reviewed By: MaskRay, #lld-macho
Differential Revision: https://reviews.llvm.org/D116279
This reverts commit e60d6dfd5a.
clang-ppc64le-rhel buildbot failed (https://lab.llvm.org/buildbot#builders/57/builds/13424):
tools/lld/MachO/CMakeFiles/lldMachO.dir/Symbols.cpp.o: In function `lld::demangle(llvm::StringRef, bool)':
Symbols.cpp:(.text._ZN3lld8demangleEN4llvm9StringRefEb[_ZN3lld8demangleEN4llvm9StringRefEb]+0x90): undefined reference to `llvm::demangle(std::string const&)'
LLVM core library supports demangling other mangled symbols other than itanium,
such as D and Rust. LLD should use those demanglers in order to output pretty
demangled symbols on error messages.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D116279
References from thread-local variable sections are treated as offsets
relative to the start of the thread-local data memory area, which is
initialized via copying all the TLV data sections (which are all
contiguous). If later data sections require a greater alignment than
earlier ones, the offsets of data within those sections won't be
guaranteed to aligned unless we normalize alignments. We therefore use
the largest alignment for all TLV data sections.
Reviewed By: #lld-macho, int3
Differential Revision: https://reviews.llvm.org/D116263
Fixes#52778.
Probably fixes Chromium crashing on startup on macOS 10.15 (and older) systems
when building with LTO, but I haven't verified that yet.
Differential Revision: https://reviews.llvm.org/D115949
For large applications that write to map files, writing map files can take quite
a bit of time. Sorting the biggest contributors to link times, writing map files
ranks in at 2nd place, with load input files being the biggest contributor of
link times. Avoiding writing map files on the critical path (and having its own
thread) saves ~2-3 seconds when linking chromium framework on a 16-Core
Intel Xeon W.
```
base diff difference (95% CI)
sys_time 1.617 ± 0.034 1.657 ± 0.026 [ +1.5% .. +3.5%]
user_time 28.536 ± 0.245 28.609 ± 0.180 [ -0.1% .. +0.7%]
wall_time 23.833 ± 0.271 21.684 ± 0.194 [ -9.5% .. -8.5%]
samples 31 24
```
Reviewed By: #lld-macho, oontvoo, int3
Differential Revision: https://reviews.llvm.org/D115416
1. After D113241, we have the section address easily accessible and no
longer need to iterate across the LC_SEGMENT commands to emit
LC_DATA_IN_CODE.
2. There's no need to store a pointer to the data in code entries during
the parse step; we can just look it up as part of the output step.
Reviewed By: #lld-macho, thakis
Differential Revision: https://reviews.llvm.org/D115556
... only whether they have more than zero. This simplifies the code slightly.
I've also moved the field into the ConcatInputSection subclass since it doesn't
actually get used by the other InputSections.
Reviewed By: #lld-macho, oontvoo
Differential Revision: https://reviews.llvm.org/D115539
We were fetching archive symbols too eagerly, bloating binary size as well as
just screwing up binaries that expected to look up certain symbols only at
runtime.
Reviewed By: #lld-macho, oontvoo
Differential Revision: https://reviews.llvm.org/D115092
std::vector can have different sizes depending on the STL's debug level,
so account for its size separately. (You could argue that we should be
accounting for all the other members separately as well, but that would
be very unergonomic, and std::vector is the only one that's caused
problems so far.)
Follup-up to D107533, where we replaced local syms with non-local.
It doesn't make sense to replace local symbol with lazy.
Differential Revision: https://reviews.llvm.org/D110040
Follow-up to https://reviews.llvm.org/D112643. Even after that change, we were
still asserting if two separate functions that are eligible for ICF (same size,
same data, same number of relocs, same reloc types, ...) referred to
Undefineds. This fixes that oversight.
Differential Revision: https://reviews.llvm.org/D114195
ld64 doesn't warn on builds using `-install_name` if it's a bundle. But, the
current warning is nice to have because `install_name` only works with dylib.
To prevent an overflow of warnings in build logs and have parity with ld64,
create a `--warn-dylib-install-name` and `--warn-no-dylib-install-name` flag
that enables this LLD specific warning.
Reviewed By: #lld-macho, thakis
Differential Revision: https://reviews.llvm.org/D113534
In order to keep signal:noise high for the `__eh_frame` diff, I have teased-out the NFC changes and put them here.
Differential Revision: https://reviews.llvm.org/D114017
As discussed in https://reviews.llvm.org/D113809#3128636. It's a bit
unfortunate to move the asserts away from the structs whose sizes
they're checking, but it's a far better developer experience when one of
the asserts is violated, because you get a single error instead of every
single source file including the header erroring out.
The `r_address` field of `relocation_info` is only 4 bytes, so our
offset field (which is the `r_address` field adjusted for subsection
splitting) also only needs to be 4 bytes. This reduces the structure
size from 32 bytes to 24 bytes.
Combined with https://reviews.llvm.org/D113813, this is a minor perf
improvement for linking an internal app, tested on two machines:
```
smol-relocs baseline difference (95% CI)
sys_time 7.367 ± 0.138 7.543 ± 0.157 [ +0.9% .. +3.8%]
user_time 21.843 ± 0.351 21.861 ± 0.450 [ -1.3% .. +1.4%]
wall_time 20.301 ± 0.307 20.556 ± 0.324 [ +0.1% .. +2.4%]
samples 16 16
smol-relocs baseline difference (95% CI)
sys_time 2.923 ± 0.050 2.992 ± 0.018 [ +1.4% .. +3.4%]
user_time 10.345 ± 0.039 10.448 ± 0.023 [ +0.8% .. +1.2%]
wall_time 12.068 ± 0.071 12.229 ± 0.021 [ +1.0% .. +1.7%]
samples 15 12
```
More importantly though, this change by itself reduces our maximum
resident set size by 220 MB (2.75%, from 7.85 GB to 7.64 GB) on the
first machine. On the second machine, it reduces it by 125 MB (1.94%,
from 6.31 GB to 6.19 GB).
Reviewed By: #lld-macho, int3
Differential Revision: https://reviews.llvm.org/D113818
We can lay out Symbol more optimally to reduce its size from 56 bytes to
48 bytes by eliminating unnecessary padding, and we can lay out Defined
such that its bitfield members are placed in the tail padding of Symbol
(on ABIs which support this), to reduce it from 96 bytes to 80 bytes (8
bytes from the Symbol reduction, and 8 bytes from the tail padding
reuse).
This is perf-neutral for an internal app (results from two different
machines):
```
smol-syms baseline difference (95% CI)
sys_time 7.430 ± 0.202 7.440 ± 0.193 [ -2.6% .. +2.9%]
user_time 21.443 ± 0.513 21.206 ± 0.396 [ -3.3% .. +1.1%]
wall_time 20.453 ± 0.534 20.222 ± 0.488 [ -3.7% .. +1.5%]
samples 9 8
smol-syms baseline difference (95% CI)
sys_time 3.011 ± 0.050 3.040 ± 0.052 [ -0.4% .. +2.3%]
user_time 10.416 ± 0.075 10.496 ± 0.091 [ +0.1% .. +1.4%]
wall_time 12.229 ± 0.144 12.354 ± 0.192 [ -0.1% .. +2.1%]
samples 14 13
```
However, on the first machine, it reduces maximum resident set size by
65.9 MB (0.8%, from 7.92 GB to 7.85 GB). On the second machine, it
reduces it by 92 MB (1.4%, from 6.40 GB to 6.31 GB).
Reviewed By: #lld-macho, int3
Differential Revision: https://reviews.llvm.org/D113813
It was checking for 64-bit builds incorrectly. Unfortunately,
ConcatInputSection has grown a bit in the meantime, and I don't see any
obvious way to shrink it. Perhaps icfEqClass could use 32-bit hashes
instead of 64-bit ones, but xxHash64 is supposed to be much faster than
xxHash32 (https://github.com/Cyan4973/xxHash#benchmarks), so that sounds
like a loss. (Unrelatedly, we should really look at using XXH3 instead
of xxHash64 now.)
Reviewed By: #lld-macho, int3
Differential Revision: https://reviews.llvm.org/D113809
This is an NFC diff that prepares for pruning & relocating `__eh_frame`.
Along the way, I made the following changes to ...
* clarify usage of `section` vs. `subsection`
* remove `map` & `vec` from type names
* disambiguate class `Section` from template parameter `SectionHeader`.
Differential Revision: https://reviews.llvm.org/D113241
```
/Users/ksmiley/dev/llvm-project/lld/MachO/Symbols.cpp:43:27: warning: field 'external' will be initialized after field 'weakDefCanBeHidden' [-Wreorder-ctor]
weakDef(isWeakDef), external(isExternal),
^
1 warning generated.
```
Differential Revision: https://reviews.llvm.org/D113823