Linking fails when targeting `x86_64-apple-darwin` for runtimes. The issue
is that LLD strictly assumes the target architecture be present in the tbd
files (which isn't always true). For example, when targeting `x86_64h`, it should
work with `x86_64` because they are ABI compatible. This is also inline with what
ld64 does.
An environment variable (which ld64 also supports) is also added to preserve the
existing behavior of strict architecture matching.
Reviewed By: #lld-macho, int3
Differential Revision: https://reviews.llvm.org/D130683
We were previously doing it after LTO, which did have the desired effect
of having the un-exported symbols marked as private extern in the final
output binary, but doing it before LTO creates more optimization
opportunities.
One observable difference is that LTO can now elide un-exported symbols
entirely, so they may not even be present as private externs in the
output.
This is also what ld64 implements.
Reviewed By: #lld-macho, thevinster
Differential Revision: https://reviews.llvm.org/D130429
Similarly to -load_hidden, this flag instructs the linker to not export
symbols from the specified archive. While that flag takes a path,
-hidden-l looks for the specified library name in the search path.
The test changes are needed because -hidden-lfoo resolves to libfoo.a,
not foo.a.
Differential Revision: https://reviews.llvm.org/D130529
This flag was introduced in ld64-609. It instructs the linker to link to
a static library while treating its symbols as if they had hidden
visibility. This is useful when building a dylib that links to static
libraries but we don't want the symbols from those to be exported.
Closes#51505
This reland adds bitcode file handling, so we won't get any compile
errors due to BitcodeFile::forceHidden being unused.
Differential Revision: https://reviews.llvm.org/D130473
This flag was introduced in ld64-609. It instructs the linker to link to
a static library while treating its symbols as if they had hidden
visibility. This is useful when building a dylib that links to static
libraries but we don't want the symbols from those to be exported.
Closes#51505
Differential Revision: https://reviews.llvm.org/D130473
Previously, we treated it as a regular ConcatInputSection. However, ld64
actually parses its contents and uses that to synthesize a single image
info struct, generating one 8-byte section instead of `8 * number of
object files with ObjC code`.
I'm not entirely sure what impact this section has on the runtime, so I
just tried to follow ld64's semantics as closely as possible in this
diff. My main motivation though was to reduce binary size.
No significant perf change on chromium_framework on my 16-core Mac Pro:
base diff difference (95% CI)
sys_time 1.764 ± 0.062 1.748 ± 0.032 [ -2.4% .. +0.5%]
user_time 5.112 ± 0.104 5.106 ± 0.046 [ -0.9% .. +0.7%]
wall_time 6.111 ± 0.184 6.085 ± 0.076 [ -1.6% .. +0.8%]
samples 30 32
Reviewed By: #lld-macho, thakis
Differential Revision: https://reviews.llvm.org/D130125
Similar to cstrings ld64 always deduplicates cfstrings. This was already
being done when enabling ICF, but for debug builds you may want to flip
this on if you cannot eliminate your instances of this, so this change
makes --deduplicate-literals also apply to cfstrings.
Differential Revision: https://reviews.llvm.org/D130134
This is a follow-on to {D129556}. I've refactored the code such that
`addFile()` no longer needs to take an extra parameter. Additionally,
the "do we force-load or not" policy logic is now fully contained within
addFile, instead of being split between `addFile` and
`parseLCLinkerOptions`. This also allows us to move the `ForceLoad` (now
`LoadType`) enum out of the header file.
Additionally, we can now correctly report loads induced by
`LC_LINKER_OPTION` in our `-why_load` output.
I've also added another test to check that CLI library non-force-loads
take precedence over `LC_LINKER_OPTION` + `-force_load_swift_libs`. (The
existing logic is correct, just untested.)
Reviewed By: #lld-macho, thakis
Differential Revision: https://reviews.llvm.org/D130137
This fixes https://github.com/llvm/llvm-project/issues/56059 and
https://github.com/llvm/llvm-project/issues/56440. This is inspired by
tapthaker's patch (https://reviews.llvm.org/D127941), and has reused his
test cases. This patch adds an bool "isCommandLineLoad" to indicate
where archives are from. If lld tries to load the same library loaded
previously by LC_LINKER_OPTION from CLI, it will use this
isCommandLineLoad to determine if it should be affected by -all_load &
-ObjC flags. This also prevents -force_load from affecting archives
loaded previously from CLI without such flag, whereas tapthaker's patch
will fail such test case (introduced by
https://reviews.llvm.org/D128025).
Reviewed By: int3, #lld-macho
Differential Revision: https://reviews.llvm.org/D129556
This creates a symbol alias similar to --defsym in the elf linker. This
is used by swiftpm for all executables, so it's useful to support. This
doesn't implement -alias_list but that could be done pretty easily as
needed.
Differential Revision: https://reviews.llvm.org/D129938
This just removes the code that gates the logic. The main issue here is
perf impact: without {D122258}, LLD takes a significant perf hit because
it now has to do a lot more work in the input parsing phase. But with
that change to eliminate unnecessary EH frames from input object files,
the perf overhead here is minimal. Concretely, here are the numbers for
some builds as measured on my 16-core Mac Pro:
**chromium_framework**
This is without the use of `-femit-dwarf-unwind=no-compact-unwind`:
base diff difference (95% CI)
sys_time 1.826 ± 0.019 1.962 ± 0.034 [ +6.5% .. +8.4%]
user_time 9.306 ± 0.054 9.926 ± 0.082 [ +6.2% .. +7.1%]
wall_time 8.225 ± 0.068 8.947 ± 0.128 [ +8.0% .. +9.6%]
samples 15 22
With that flag enabled, the regression mostly disappears, as hoped:
base diff difference (95% CI)
sys_time 1.839 ± 0.062 1.866 ± 0.068 [ -0.9% .. +3.8%]
user_time 9.452 ± 0.068 9.490 ± 0.067 [ -0.1% .. +0.9%]
wall_time 8.383 ± 0.127 8.452 ± 0.114 [ -0.1% .. +1.8%]
samples 17 21
**Unnamed internal app**
Without `-femit-dwarf-unwind`, this is the perf hit:
base diff difference (95% CI)
sys_time 1.372 ± 0.029 1.317 ± 0.024 [ -4.6% .. -3.5%]
user_time 2.835 ± 0.028 2.980 ± 0.027 [ +4.8% .. +5.4%]
wall_time 3.205 ± 0.079 3.383 ± 0.066 [ +4.9% .. +6.2%]
samples 102 83
With `-femit-dwarf-unwind`, the perf hit almost disappears:
base diff difference (95% CI)
sys_time 1.274 ± 0.026 1.270 ± 0.025 [ -0.9% .. +0.3%]
user_time 2.812 ± 0.023 2.822 ± 0.035 [ +0.1% .. +0.7%]
wall_time 3.166 ± 0.047 3.174 ± 0.059 [ -0.2% .. +0.7%]
samples 95 97
Just for fun, I measured the impact of `-femit-dwarf-unwind` on ld64
(`base` has the extra DWARF unwind info in the input object files,
`diff` doesn't):
base diff difference (95% CI)
sys_time 1.128 ± 0.010 1.124 ± 0.023 [ -1.3% .. +0.6%]
user_time 7.176 ± 0.030 7.106 ± 0.094 [ -1.5% .. -0.4%]
wall_time 7.874 ± 0.041 7.795 ± 0.121 [ -1.7% .. -0.3%]
samples 16 25
And for LLD:
base diff difference (95% CI)
sys_time 1.315 ± 0.019 1.280 ± 0.019 [ -3.2% .. -2.0%]
user_time 2.980 ± 0.022 2.822 ± 0.016 [ -5.5% .. -5.0%]
wall_time 3.369 ± 0.038 3.175 ± 0.033 [ -6.2% .. -5.3%]
samples 47 47
So parsing the extra EH frames is a lot more expensive for us than for
ld64. But given that we are quite a lot faster than ld64 to begin with,
I guess this isn't entirely unexpected...
Reviewed By: #lld-macho, oontvoo
Differential Revision: https://reviews.llvm.org/D129540
Linker optimization hints mark a sequence of instructions used for
synthesizing an address, like ADRP+ADD. If the referenced symbol ends up
close enough, it can be replaced by a faster sequence of instructions
like ADR+NOP.
This commit adds support for 2 of the 7 defined ARM64 optimization
hints:
- LOH_ARM64_ADRP_ADD, which transforms a pair of ADRP+ADD into ADR+NOP
if the referenced address is within +/- 1 MiB
- LOH_ARM64_ADRP_ADRP, which transforms two ADRP instructions into
ADR+NOP if they reference the same page
These two kinds already cover more than 50% of all LOHs in
chromium_framework.
Differential Review: https://reviews.llvm.org/D128093
`--time-trace=foo` has the same behavior as `--time-trace --time-trace-file=<file>`
had previously.
Also, for mac, make --time-trace-granularity *not* imply --time-trace, to match
behavior of the ELF port.
Differential Revision: https://reviews.llvm.org/D128451
Identical literal folding takes ~1.4% of the time, and was missing
from the trace.
Signature computation still needs ~2.2% of the time, so probably worth
explicitly marking its contribution to "Write output file" (9.1%)
Differential Revision: https://reviews.llvm.org/D128343
As an optimization for ld64 sometimes it can be useful to not export any
symbols for top level binaries that don't need any exports, to do this
you can pass `-exported_symbols_list /dev/null`, or new with Xcode 14
(ld64 816) there is a `-no_exported_symbols` flag for the same behavior.
This reproduces this behavior where previously an empty exported symbols
list file would have been ignored.
Differential Revision: https://reviews.llvm.org/D127562
This reverts commit 942f4e3a7c.
The additional change required to avoid the assertion errors seen
previously is:
--- a/lld/MachO/ICF.cpp
+++ b/lld/MachO/ICF.cpp
@@ -443,7 +443,9 @@ void macho::foldIdenticalSections() {
/*relocVA=*/0);
isec->data = copy;
}
- } else {
+ } else if (!isEhFrameSection(isec)) {
+ // EH frames are gathered as hashables from unwindEntry above; give a
+ // unique ID to everything else.
isec->icfEqClass[0] = ++icfUniqueID;
}
}
Differential Revision: https://reviews.llvm.org/D123435
== Background ==
`llvm-mc` generates unwind info in both compact unwind and DWARF
formats. LLD already handles the compact unwind format; this diff gets
us close to handling the DWARF format properly.
== Caveats ==
It's not quite done yet, but I figure it's worth getting this reviewed
and landed first as it's shaping up to be a fairly large code change.
**Known limitations of the current code:**
* Only works for x86_64, for which `llvm-mc` emits "abs-ified"
relocations as described in 618def651b.
`llvm-mc` emits regular relocations for ARM EH frames, which we do not
yet handle correctly.
Since the feature is not ready for real use yet, I've gated it behind a
flag that only gets toggled on during test suite runs. With most of the
new code disabled, we see just a hint of perf regression, so I don't
think it'd be remiss to land this as-is:
base diff difference (95% CI)
sys_time 1.926 ± 0.168 1.979 ± 0.117 [ -1.2% .. +6.6%]
user_time 3.590 ± 0.033 3.606 ± 0.028 [ +0.0% .. +0.9%]
wall_time 7.104 ± 0.184 7.179 ± 0.151 [ -0.2% .. +2.3%]
samples 30 31
== Design ==
Like compact unwind entries, EH frames are also represented as regular
ConcatInputSections that get pointed to via `Defined::unwindEntry`. This
allows them to be handled generically by e.g. the MarkLive and ICF
code. (But note that unlike compact unwind subsections, EH frame
subsections do end up in the final binary.)
In order to make EH frames "look like" a regular ConcatInputSection,
some processing is required. First, we need to split the `__eh_frame`
section along EH frame boundaries rather than along symbol boundaries.
We do this by decoding the length field of each EH frame. Second, the
abs-ified relocations need to be turned into regular Relocs.
== Next Steps ==
In order to support EH frames on ARM targets, we will either have to
teach LLD how to handle EH frames with explicit relocs, or we can try to
make `llvm-mc` emit abs-ified relocs for ARM as well. I'm hoping to do
the latter as I think it will make the LLD implementation both simpler
and faster to execute.
== Misc ==
The `obj-file-with-stabs.s` test had to be updated as the previous
version would trip assertion errors in the code. It appears that in our
attempt to produce a minimal YAML test input, we created a file with
invalid EH frame data. I've fixed this by re-generating the YAML and not
doing any hand-pruning of it.
Reviewed By: #lld-macho, Roger
Differential Revision: https://reviews.llvm.org/D123435
With -platform_version flags for two distinct platforms,
this writes a LC_BUILD_VERSION header for each.
The motivation is that this is needed for self-hosting with lld as linker
after D124059.
To create a zippered output at the clang driver level, pass
-target arm64-apple-macos -darwin-target-variant arm64-apple-ios-macabi
to create a zippered dylib.
(In Xcode's clang, `-darwin-target-variant` is spelled just `-target-variant`.)
(If you pass `-target arm64-apple-ios-macabi -target-variant arm64-apple-macos`
instead, ld64 crashes!)
This results in two -platform_version flags being passed to the linker.
ld64 also verifies that the iOS SDK version is at least 13.1. We don't do that
yet. But ld64 also does that for other platforms and we don't. So we need to
do that at some point, but not in this patch.
Only dylib and bundle outputs can be zippered.
I verified that a Catalyst app linked against a dylib created with
clang -shared foo.cc -o libfoo.dylib \
-target arm64-apple-macos \
-target-variant arm64-apple-ios-macabi \
-Wl,-install_name,@rpath/libfoo.dylib \
-fuse-ld=$PWD/out/gn/bin/ld64.lld
runs successfully. (The app calls a function `f()` in libfoo.dylib
that returns a const char* "foo", and NSLog(@"%s")s it.)
ld64 is a bit more permissive when writing zippered outputs,
see references to "unzippered twins". That's not implemented yet.
(If anybody wants to implement that, D124275 is a good start.)
Differential Revision: https://reviews.llvm.org/D124887
This change implements --icf=safe for MachO based on addrsig section that is implemented in D123751.
Reviewed By: int3, #lld-macho
Differential Revision: https://reviews.llvm.org/D123752
Before this,
clang empty.cc -target x86_64-apple-ios13.1-macabi \
-framework CoreServices -fuse-ld=lld
would error out with
ld64.lld: error: path/to/MacOSX.sdk/System/Library/Frameworks/
CoreServices.framework/Versions/A/Frameworks/CarbonCore.framework/
Versions/A/CarbonCore.tbd(
/System/Library/Frameworks/
CoreServices.framework/Versions/A/Frameworks/CarbonCore.framework/
Versions/A/CarbonCore) is incompatible with x86_64 (macCatalyst)
Now it works, like with ld64.
Differential Revision: https://reviews.llvm.org/D124336
A "zippered" dylib contains several LC_BUILD_VERSION load commands, usually
one each for "normal" macOS and one for macCatalyst.
These are usually created by passing something like
-shared -target arm64-apple-macos -darwin-target-variant arm64-apple-ios13.1-macabi
to clang, which turns it into
-platform_version macos 12.0.0 12.3 -platform_version "mac catalyst" 14.0.0 15.4
for the linker.
ld64.lld can read these files fine, but it can't write them. Before this
change, it would just silently use the last -platform_version flag and ignore
the rest.
This change adds a warning that writing zippered dylibs isn't implemented yet
instead.
Sadly, parts of ld64.lld's test suite relied on the previous
"silently use last flag" semantics for its test suite: `%lld` always expanded
to `ld64.lld -platform_version macos 10.15 11.0` and tests that wanted a
different value passed a 2nd `-platform_version` flag later on. But this now
produces a warning if the platform passed to `-platform_version` is not `macos`.
There weren't very many cases of this, so move these to use `%no-arg-lld` and
manually pass `-arch`.
Differential Revision: https://reviews.llvm.org/D124106
{D123302} got me looking deeper at `includeInSymtab`. I thought it was a
little odd that there were excluded (live) symbols for which
`includeInSymtab` was false; we shouldn't have so many different ways to
exclude a symbol. As such, this diff makes the `L`-prefixed-symbol
exclusion code use `includeInSymtab` too. (Note that as part of our
support for `__eh_frame`, we will also be excluding all `__eh_frame`
symbols from the symtab in a future diff.)
Another thing I noticed is that the `emitStabs` code never has to deal
with excluded symbols because `SymtabSection::finalize()` already
filters them out. As such, I've updated the comments and asserts from
{D123302} to reflect this.
Reviewed By: #lld-macho, thakis
Differential Revision: https://reviews.llvm.org/D123433
{D118797} means that we can now check the name/segname of a given
section directly, instead of having to look those properties up on one
of its subsections. This allows us to simplify our code.
Reviewed By: #lld-macho, oontvoo
Differential Revision: https://reviews.llvm.org/D123275
This removes options for performing LTO with the legacy pass
manager in LLD. Options that explicitly enable the new pass manager
are retained as no-ops.
Differential Revision: https://reviews.llvm.org/D123219
Or rather, error out if it is set to something other than ON. This
removes the ability to enable the legacy pass manager by default,
but does not remove the ability to explicitly enable it through
various flags like -flegacy-pass-manager or -enable-new-pm=0.
I checked, and our test suite definitely doesn't pass with
LLVM_ENABLE_NEW_PASS_MANAGER=OFF anymore.
Differential Revision: https://reviews.llvm.org/D123126
`config->priorities` has been used to hold the intermediate state during the construction of the order in which sections should be laid out. This is not a good place to hold this state since the intermediate state is not a "configuration" for LLD. It should be encapsulated in a class for building a mapping from section to priority (which I created in this diff as the `PriorityBuilder` class).
The same thing is being done for `config->callGraphProfile`.
Reviewed By: #lld-macho, int3
Differential Revision: https://reviews.llvm.org/D122156
This was based off @thakis' draft in {D103517}. I employed templates to ensure
the support for `-why_live` wouldn't slow down the regular non-why-live code
path.
No stat sig perf difference on my 3.2 GHz 16-Core Intel Xeon W:
base diff difference (95% CI)
sys_time 1.195 ± 0.015 1.199 ± 0.022 [ -0.4% .. +1.0%]
user_time 3.716 ± 0.022 3.701 ± 0.025 [ -0.7% .. -0.1%]
wall_time 4.606 ± 0.034 4.597 ± 0.046 [ -0.6% .. +0.2%]
samples 44 37
Reviewed By: #lld-macho, thakis
Differential Revision: https://reviews.llvm.org/D120377
If both an order file and a call graph profile are present, the edges of the
call graph which use symbols present in the order file are not used. All of
the symbols in the order file will appear at the beginning of the section just
as they do currently. In other words, the highest priority derived from the
call graph will be below the lowest priority derived from the order file.
Practically, this change renames CallGraphSort.{h,cpp} to SectionPriorities.{h,cpp},
and most order file and call graph profile related code is moved into the new
file to reduce duplication.
Differential Revision: https://reviews.llvm.org/D117354
By unsetting this property, we are now able to internalize more symbols
during LTO. I compared the output of `-save-temps` for both LLD and
ld64, and we now match ld64's behavior as far as `lto-internalize.ll` is
concerned.
(Thanks @smeenai for working on an initial version of this diff!)
Fixes https://github.com/llvm/llvm-project/issues/50574.
Reviewed By: #lld-macho, thakis
Differential Revision: https://reviews.llvm.org/D119372
Adds `-pagezero_size`. `-pagezero_size` commonly used for kernel development.
`-pagezero_size` changes the `__PAGEZERO` size, removing that segment if it is set to zero.
One of the four flags from {D118570}
Now with error messages and tests.
Differential Revision: https://reviews.llvm.org/D118724
Earlier in LLD's evolution, I tried to create the illusion that
subsections were indistinguishable from "top-level" sections. Thus, even
though the subsections shared many common field values, I hid those
common values away in a private Shared struct (see D105305). More
recently, however, @gkm added a public `Section` struct in D113241 that
served as an explicit way to store values that are common to an entire
set of subsections (aka InputSections). Now that we have another "common
value" struct, `Shared` has been rendered redundant. All its fields can
be moved into `Section` instead, and the pointer to `Shared` can be replaced
with a pointer to `Section`.
This `Section` pointer also has the advantage of letting us inspect other
subsections easily, simplifying the implementation of {D118798}.
P.S. I do think that having both `Section` and `InputSection` makes for
a slightly confusing naming scheme. I considered renaming `InputSection`
to `Subsection`, but that would break the symmetry with `OutputSection`.
It would also make us deviate from LLD-ELF's naming scheme.
This change is perf-neutral on my 3.2 GHz 16-Core Intel Xeon W machine:
base diff difference (95% CI)
sys_time 1.258 ± 0.031 1.248 ± 0.023 [ -1.6% .. +0.1%]
user_time 3.659 ± 0.047 3.658 ± 0.041 [ -0.5% .. +0.4%]
wall_time 4.640 ± 0.085 4.625 ± 0.063 [ -1.0% .. +0.3%]
samples 49 61
There's also no stat sig change in RSS (as measured by `time -l`):
base diff difference (95% CI)
time 998038627.097 ± 13567305.958 1003327715.556 ± 15210451.236 [ -0.2% .. +1.2%]
samples 31 36
Reviewed By: #lld-macho, oontvoo
Differential Revision: https://reviews.llvm.org/D118797
In the case your framework bundles contain relocatable objects, and your
objects include LC_LINKER_OPTIONs for the framework, previously they
would not be deduplicated like they would have if they were static
archives. This was also the case if you passed `-framework` for the
framework as well.
Reviewed By: #lld-macho, thakis, oontvoo
Differential Revision: https://reviews.llvm.org/D114841
This is a ld64 option equivalent to `-sectcreate seg sect /dev/null`
that's useful for creating sections like the RESTRICT section.
Differential Revision: https://reviews.llvm.org/D117749
This flag is the default, so in ld64 it is not implemented, but it can
be useful to negate previous -all_load arguments. Specifically if your
build system has some global linker flags, that you may want to negate
for specific links. We use something like this today to make sure some
C++ symbols are automatically discovered for all links, which passing
-all_load hides.
Differential Revision: https://reviews.llvm.org/D117629
In ld.lld, when an ObjFile/BitcodeFile is read in --start-lib state, the file is
given archive semantics. --end-lib closes the previous --start-lib. A build
system can use this feature as an alternative to archives. This patch ports
the feature to lld-macho.
--start-lib and --end-lib are positional, unlike usual ld64 options.
I think the slight drawback does not matter as (a) reusing option names
make build systems convenient (b) `--start-lib a.o b.o --end-lib` conveys more
information than an alternative design: `-objlib a.o -objlib b.o` because
--start-lib makes it clear which objects are in the same conceptual archive.
This provides flexibility (c) `-objlib`/`-filelist` interaction may be weird.
Close https://github.com/llvm/llvm-project/issues/52931
Reviewed By: #lld-macho, Jez Ng, oontvoo
Differential Revision: https://reviews.llvm.org/D116913
It's still uncertain but whether we want to have `deduplicate-literals` be the
default flag for LLD out of the box or not. If `deduplicate-literals` is the default
behavior, then we will need a way override it and not deduplicate. Luckily, we
have `no_deduplicate` to fill this gap. For now, I've set the default to be false
which aligns with the existing behavior. That can only always be changed after
discussions on D117250.
Reviewed By: #lld-macho, int3
Differential Revision: https://reviews.llvm.org/D117387
Move all variables at file-scope or function-static-scope into a hosting structure (lld::CommonLinkerContext) that lives at lldMain()-scope. Drivers will inherit from this structure and add their own global state, in the same way as for the existing COFFLinkerContext.
See discussion in https://lists.llvm.org/pipermail/llvm-dev/2021-June/151184.html
Differential Revision: https://reviews.llvm.org/D108850
The PlatformKind/PlatformType enums contain the same information, which requires
them to be kept in-sync. This commit changes over to PlatformType as the sole
source of truth, which allows the removal of the redundant PlatformKind.
The majority of the changes were in LLD and TextAPI.
Reviewed By: cishida
Differential Revision: https://reviews.llvm.org/D117163
Depends on D112160
This adds the new options `--call-graph-profile-sort` (default),
`--no-call-graph-profile-sort` and `--print-symbol-order=`. If call graph
profile sorting is enabled, reads `__LLVM,__cg_profile` sections from object
files and uses the resulting graph to put callees and callers close to each
other in the final binary via the C3 clustering heuristic.
Differential Revision: https://reviews.llvm.org/D112164