Combined with the previous change, lld executable is ~2K smaller and some code
paths using InputSection::getParent are more efficient.
The fragmented headers lead to a design limitation that OutputSection has to be
incomplete, so we cannot use static_cast.
Add an OutputDesc class inheriting from SectionCommand. An OutputDesc wraps an
OutputSection. This change allows InputSection::getParent to be inlined.
Differential Revision: https://reviews.llvm.org/D120650
ld64 breaks down `__objc_classrefs` on a per-word level and deduplicates
them. This greatly reduces the number of bind entries emitted (and
therefore the amount of work `dyld` has to do at runtime). For
chromium_framework, this change to LLD cuts the number of (non-lazy)
binds from 912 to 190, getting us to parity with ld64 in this aspect.
Reviewed By: #lld-macho, thakis
Differential Revision: https://reviews.llvm.org/D121053
`__cfstring` has embedded addends that foil ICF's hashing / equality
checks. (We can ignore embedded addends when doing ICF because the same
information gets recorded in our Reloc structs.) Therefore, in order to
properly dedup CFStrings, we create a mutable copy of the CFString and
zero out the embedded addends before performing any hashing / equality
checks.
(We did in fact have a partial implementation of CFString deduplication
already. However, it only worked when the cstrings they point to are at
identical offsets in their object files.)
I anticipate this approach can be extended to other similar
statically-allocated struct sections in the future.
In addition, we previously treated all references with differing addends
as unequal. This is not true when the references are to literals:
different addends may point to the same literal in the output binary. In
particular, `__cfstring` has such references to `__cstring`. I've
adjusted ICF's `equalsConstant` logic accordingly, and I've added a few
more tests to make sure the addend-comparison code path is adequately
covered.
Fixes https://github.com/llvm/llvm-project/issues/51281.
Reviewed By: #lld-macho, Roger
Differential Revision: https://reviews.llvm.org/D120137
... from a `uint64_t` to a `uint32_t`. (LLD-ELF uses a `uint32_t` too.)
About a 1.7% reduction in peak RSS when linking chromium_framework on my
3.2 GHz 16-Core Intel Xeon W Mac Pro, and no stat sig change in wall
time.
</Users/jezng/test2.sh ["before"]> </Users/jezng/test2.sh ["after"]> difference (95% CI)
RSS 1003036672.000 ± 9891065.259 985539505.231 ± 10272748.749 [ -2.3% .. -1.2%]
samples 27 26
base diff difference (95% CI)
sys_time 1.277 ± 0.023 1.277 ± 0.024 [ -0.9% .. +0.9%]
user_time 6.682 ± 0.046 6.598 ± 0.043 [ -1.6% .. -0.9%]
wall_time 5.904 ± 0.062 5.895 ± 0.063 [ -0.7% .. +0.4%]
samples 46 28
No appreciable change (~0.01%) in number of `equals` comparisons either:
Before:
ld64.lld: ICF needed 8 iterations
ld64.lld: equalsConstant() called 701643 times
ld64.lld: equalsVariable() called 3438526 times
After:
ld64.lld: ICF needed 8 iterations
ld64.lld: equalsConstant() called 701729 times
ld64.lld: equalsVariable() called 3438526 times
Reviewed By: #lld-macho, MaskRay, thakis
Differential Revision: https://reviews.llvm.org/D121052
The existing hashing of stubsHelperIndex has mostly been a no-op* for
some time now (ever since we made ICF run before dylib symbols get their
stubs indices assigned). I guess we could consider hashing the name +
filename of the DylibSymbol instead, but I'm not sure the overhead's
worth it... moreover, LLD/ELF only hashes their Defined symbols as well.
*: Technically it does change the hash value since stubsHelperIndex is
initialized to `UINT32_MAX` by default. But since all stubsHelperIndex
values are the same at when ICF runs, they don't add any useful
information to the hash.
This is debug code that is disabled by default. It'll provide a easy way
to figure out the impact (if any) of tweaking ICF's hashing algorithm
(since a poor quality hash will result in many more `equals*` calls).
Reviewed By: #lld-macho, oontvoo
Differential Revision: https://reviews.llvm.org/D121051
ObjFile::parse combines symbol initialization and resolution. Many tasks
unrelated to symbol resolution can be postponed and parallelized. This patch
extracts local symbol initialization and parallelizes it.
Technically the new function initializeLocalSymbols can be merged into
ObjFile::postParse, but functions like getSrcMsg may access the
uninitialized (all nullptr) local part of InputFile::symbols.
Linking chrome: 1.02x as fast with glibc malloc, 1.04x as fast with mimalloc
Depends on f456c3ae3f and D119908
Reviewed By: ikudrin
Differential Revision: https://reviews.llvm.org/D119909
addWrappedSymbols may trigger archive extraction: split stack implementation
uses --wrap=pthread_create, which extracts libgcc.a(generic-morestack-thread.o).
This fixes the regression caused by 09602d3b47 by
making the invariant satisfied: no more non-compileBitcodeFiles object file is
produced at postParseObjectFile.
LLD (and ld64) emits uppercase hex addresses in the mapfile. The
map-file.s test passes right now because the addresses we emit happen
not to include any alphabets, but that can easily change.
I noticed this while dealing with
https://github.com/llvm/llvm-project/issues/54184.
Reviewed By: #lld-macho, thakis
Differential Revision: https://reviews.llvm.org/D120941
Now all the tests that cover symbol resolution / precedence have
"resolution" in their filename.
I also added a couple of extra comments.
Reviewed By: #lld-macho, thakis
Differential Revision: https://reviews.llvm.org/D120938
If we fix https://github.com/llvm/llvm-project/issues/54184, we will end
up including libSystem in every %lld invocation, which would break
tapi-link.s as it assumes that libSystem isn't directly linked (instead
it goes through libReexportSystem).
Let's remove this unnecessary coupling, as well as use `split-file`
instead of having a separate file under `Inputs`.
Reviewed By: #lld-macho, oontvoo
Differential Revision: https://reviews.llvm.org/D120939
Per discussion on
https://reviews.llvm.org/D59709#inline-1148734, this seems like the
right course of action. `canBeOmittedFromSymbolTable()` subsumes and
generalizes the previous logic. In addition to handling `linkonce_odr`
`unnamed_addr` globals, we now also internalize `linkonce_odr` +
`local_unnamed_addr` constants.
Reviewed By: tejohnson
Differential Revision: https://reviews.llvm.org/D120173
Previously, we were using a syslibroot that pointed to macos while
linking against arch arm64_32, which didn't really make sense. It isn't
currently an issue, but will be if we add the `-lSystem` as part of
dealing with https://github.com/llvm/llvm-project/issues/54184.
If we fix https://github.com/llvm/llvm-project/issues/54184, the
`dyld_stub_binder` symbol will get included in every output dylib. This
would cause the addresses of the other symbols to shift, breaking the
test as it currently stands. Let's make the test more flexible.
Reviewed By: lgrey
Differential Revision: https://reviews.llvm.org/D120940
So far, we sort all discardable sections at the end, with only some
extra logic to make sure that the .reloc section is at the start
of that group of sections. But if there are other discardable
sections, other than .reloc, they must also be ordered before
.debug_* sections, to avoid leaving gaps if the executable is
stripped.
(Stripping executables doesn't remove all discardable sections,
only the ones named .debug_*).
Rust binaries seem to include a .rmeta section, which is marked
discardable. This fixes stripping such binaries if built with
dwarf debug info included.
This fixes issues observed in MSYS2 in
https://github.com/msys2/MINGW-packages/pull/10555.
Differential Revision: https://reviews.llvm.org/D120805
Show the name of of the archive in the error message as well as the name
of the object within it.
Differential Revision: https://reviews.llvm.org/D120689
This does tail merging (and deduplication) of the strings.
On a statically linked clang.exe, this shrinks the ~17 MB string
table by around 0.5 MB. This adds ~160 ms to the linking time
which originally was around 950 ms.
For cases where `-debug:symtab` or `-debug:dwarf` isn't set, the
string table is only used for long section names, where this
shouldn't make any difference at all.
Differential Revision: https://reviews.llvm.org/D120677
The previous code used an unbounded sprintf, which in theory can
overflow, writing either the null terminator or the last digits
into the next struct member.
In practice, in LLD, all long section names are written sequentially
first at the start of the string table, followed by all the long
symbol names. Due to this, even if the total string table would
end up large, the long section names have fairly short offsets,
which is why this hasn't been an issue in practice.
I don't think it's worth trying to write a test that produces an
executable with enough long section names to make the section names
themselves exceed 10^6 bytes, which is currently necessary to trigger
faults with the previous form.
Differential Revision: https://reviews.llvm.org/D120676
https://github.com/ClangBuiltLinux/linux/issues/1606
When GNU_PROPERTY_X86_FEATURE_1_IBT is enabled, ld.lld will create .plt output
section even if there is no PLT entry. Fix this by implementing
IBTPltSection::isNeeded instead of using the default code path (which always
returns true).
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D120600
ObjFile::parse combines symbol initialization and resolution. Many tasks
unrelated to symbol resolution can be postponed and parallelized. This patch
extracts local symbol initialization and parallelizes it.
Technically the new function initializeLocalSymbols can be merged into
ObjFile::postParse, but functions like getSrcMsg may access the
uninitialized (all nullptr) local part of InputFile::symbols.
Linking chrome: 1.02x as fast with glibc malloc, 1.04x as fast with mimalloc
Reviewed By: ikudrin
Differential Revision: https://reviews.llvm.org/D119909
This was based off @thakis' draft in {D103517}. I employed templates to ensure
the support for `-why_live` wouldn't slow down the regular non-why-live code
path.
No stat sig perf difference on my 3.2 GHz 16-Core Intel Xeon W:
base diff difference (95% CI)
sys_time 1.195 ± 0.015 1.199 ± 0.022 [ -0.4% .. +1.0%]
user_time 3.716 ± 0.022 3.701 ± 0.025 [ -0.7% .. -0.1%]
wall_time 4.606 ± 0.034 4.597 ± 0.046 [ -0.6% .. +0.2%]
samples 44 37
Reviewed By: #lld-macho, thakis
Differential Revision: https://reviews.llvm.org/D120377