We may be requested to emit an unaligned nop sequence (e.g. 7-bytes or
3-bytes). These should be 0-filled even though that is not a valid
instruction. This matches the behaviour on other architectures like
ARM, X86, and MIPS. When a custom section is emitted, it may be
classified as text even though it may be a data section or we may be
emitting data into a text segment (e.g. a literal pool). In such cases,
we should be resilient to the emission request.
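A minimal sketch of one way to read the padding rule above. The helper name `fillNopPadding` is hypothetical, and placing the zero bytes before the whole nops is an assumption, not something the patch description states. The nop value 0x00000013 is the RISC-V `addi x0, x0, 0` encoding and serves only as an example of a fixed 4-byte nop on a little-endian target.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: produce `count` bytes of padding.
// Assumption: little-endian target with a fixed 4-byte nop encoding.
std::vector<uint8_t> fillNopPadding(size_t count) {
  constexpr uint32_t kNop = 0x00000013; // example nop encoding only
  std::vector<uint8_t> out;
  out.reserve(count);
  // Zero-fill the bytes that cannot hold a whole instruction.
  for (size_t i = 0; i < count % 4; ++i)
    out.push_back(0);
  // Emit whole nops for the remaining span, which is a multiple of 4.
  for (size_t i = 0; i < count / 4; ++i)
    for (int shift = 0; shift < 32; shift += 8)
      out.push_back(static_cast<uint8_t>(kNop >> shift));
  return out;
}
```

A 3-byte request yields three zero bytes and a 7-byte request yields three zero bytes followed by one nop, so the request never fails on an unaligned size.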
This was originally identified by the Linux kernel build and reported on
D131270 by Nathan Chancellor.
Differential Revision: https://reviews.llvm.org/D132482
Reviewed By: luismarques
Tested By: Nathan Chancellor
We currently process one OutputSection at a time and for each OutputSection
write contained input sections in parallel. This strategy does not leverage
multi-threading well. Instead, parallelize writes of different OutputSections.
The default TaskSize for parallelFor often leads to inferior sharding. We
prepare the task in the caller instead.
* Move llvm::parallel::detail::TaskGroup to llvm::parallel::TaskGroup
* Add llvm::parallel::TaskGroup::execute.
* Change writeSections to declare TaskGroup and pass it to writeTo.
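To illustrate what "prepare the task in the caller" could look like, here is a rough sketch that groups consecutive sections into tasks by accumulated size. `shardBySize` and `minBytes` are hypothetical names, and the grouping rule is an assumption for illustration; it is not the logic of the patch.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical: group consecutive sections into tasks of at least `minBytes`
// so that tiny sections do not each become their own task.
// Returns half-open [begin, end) index ranges into `sizes`.
std::vector<std::pair<size_t, size_t>>
shardBySize(const std::vector<size_t> &sizes, size_t minBytes) {
  std::vector<std::pair<size_t, size_t>> tasks;
  size_t begin = 0, acc = 0;
  for (size_t i = 0; i < sizes.size(); ++i) {
    acc += sizes[i];
    if (acc >= minBytes) {
      tasks.emplace_back(begin, i + 1);
      begin = i + 1;
      acc = 0;
    }
  }
  // Whatever is left over forms the final, possibly smaller, task.
  if (begin < sizes.size())
    tasks.emplace_back(begin, sizes.size());
  return tasks;
}
```

Each returned range could then be handed to a task group as one unit of work.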
Speed-up with --threads=8:
* clang -DCMAKE_BUILD_TYPE=Release: 1.11x as fast
* clang -DCMAKE_BUILD_TYPE=Debug: 1.10x as fast
* chrome -DCMAKE_BUILD_TYPE=Release: 1.04x as fast
* scylladb build/release: 1.09x as fast
On M1, many benchmarks are a small fraction of a percentage faster. Mozilla showed the largest difference with the patch being about 1.03x as fast.
Differential Revision: https://reviews.llvm.org/D131247
A simple sed doing these substitutions:
- `${LLVM_BINARY_DIR}/(\$\{CMAKE_CFG_INTDIR}/)?lib(${LLVM_LIBDIR_SUFFIX})?\>` -> `${LLVM_LIBRARY_DIR}`
- `${LLVM_BINARY_DIR}/(\$\{CMAKE_CFG_INTDIR}/)?bin\>` -> `${LLVM_TOOLS_BINARY_DIR}`
where `\>` means "word boundary".
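For readers who want to try the second substitution outside of sed, the following sketch expresses it with `std::regex`. The function name `rewriteBinDir` is hypothetical, and `\b` (any word boundary) is used as an approximation of sed's `\>` (end of word).

```cpp
#include <regex>
#include <string>

// Hypothetical helper mirroring the `bin` substitution above.
std::string rewriteBinDir(const std::string &line) {
  static const std::regex pattern(
      R"(\$\{LLVM_BINARY_DIR\}/(\$\{CMAKE_CFG_INTDIR\}/)?bin\b)");
  // "$$" produces a literal '$' in the replacement format string.
  return std::regex_replace(line, pattern, "$${LLVM_TOOLS_BINARY_DIR}");
}
```

The word boundary keeps paths such as `${LLVM_BINARY_DIR}/binutils` untouched.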
The only manual modifications were reverting changes in
- `compiler-rt/cmake/Modules/CompilerRTUtils.cmake`
- `runtimes/CMakeLists.txt`
because these were "entry points" where we wanted to tread carefully so as not to introduce a "loop" which would end with an undefined variable being expanded to nothing.
This hopefully increases readability overall, and also decreases the usages of `LLVM_LIBDIR_SUFFIX`, preparing us for D130586.
Reviewed By: sebastian-ne
Differential Revision: https://reviews.llvm.org/D132316
This change renames this method to match its original name and the name
used in the wasm linker.
Back in d8f8abbd4a the ELF SymbolTable
method `getSymbols()` was replaced with `forEachSymbol`.
Then in a2fc964417 `forEachSymbol` was
replaced with a `llvm::iterator_range`.
Then in e9262edf0d we came full circle
and the `llvm::iterator_range` was replaced with a `symbols()` accessor
that was identical to the original `getSymbols()`.
`getSymbols` also matches the name used elsewhere in the ELF linker as
well as in both the COFF and wasm backends (e.g. `InputFiles.h` and
`SyntheticSections.h`).
Differential Revision: https://reviews.llvm.org/D130787
We held off on this before as `LLVM_LIBDIR_SUFFIX` conflicted with it.
Now we return to this.
`LLVM_LIBDIR_SUFFIX` is kept as a deprecated way to set
`CMAKE_INSTALL_LIBDIR`. The other `*_LIBDIR_SUFFIX` are just removed
entirely.
I imagine this is too potentially-breaking to make LLVM 15. That's fine.
I have a more minimal version of this in the distro (NixOS) patches for
LLVM 15 (like previous versions). This more expansive version I will
test harder after the release is cut.
Reviewed By: sebastian-ne, ldionne, #libc, #libc_abi
Differential Revision: https://reviews.llvm.org/D130586
The flag accidentally used Joined<> instead of Flag<>.
Previously, `--warn-dylib-install-namefoobarbaz` would be accepted and
had the same effect as `-warn-dylib-install-name`. Now the flag only
works if no suffix is attached to it, as originally intended.
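The difference between the two option kinds can be sketched with two tiny matchers. These are hypothetical stand-ins for illustration, not the option-parsing library's implementation: a joined option accepts any argument that starts with its name, while a plain flag must match exactly.

```cpp
#include <string_view>

// Hypothetical: "joined" matching accepts the name followed by any suffix.
bool matchesJoined(std::string_view arg, std::string_view name) {
  return arg.substr(0, name.size()) == name;
}

// Hypothetical: "flag" matching requires the argument to equal the name.
bool matchesFlag(std::string_view arg, std::string_view name) {
  return arg == name;
}
```

With joined matching, an argument with trailing junk is still accepted, which is the accidental behaviour described above.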
Also fix a typo in the flag's help text.
Differential Revision: https://reviews.llvm.org/D131781
This refactors LTO compile to look more like COFF, where cache hits and misses are all funneled through the same code path.
Previously, cache hits were *not* being saved to -object_path_lto, which led to them sometimes falling out of the cache before dsymutil could process them. As a side effect of the refactor, cached objects are now saved with -save-temps as well, which seems desirable.
(Deleted lld/test/MachO/lto-cache-dsymutil.ll and rolled it into lld/test/MachO/lto-object-path.ll, since the cache-only, non object path approach is unreliable anyway).
Differential Revision: https://reviews.llvm.org/D131624
This is an entirely new embedded directive - extending the GNU ld
command line option --exclude-symbols to be usable in embedded
directives too.
(GNU ld.bfd also got support for the same new directive, currently in
the latest git version, after the 2.39 branch.)
This works as an inverse to the regular embedded dllexport directives,
for cases when autoexport of all eligible symbols is performed.
Differential Revision: https://reviews.llvm.org/D130120
This adds support for the existing GNU ld command line option, which
allows excluding individual symbols from autoexport (when linking a
DLL and no symbols are marked explicitly as dllexported).
Differential Revision: https://reviews.llvm.org/D130118
Apple Clang in Xcode 14 introduced a new feature for reducing the
overhead of objc_msgSend calls by deduplicating the setup calls for each
individual selector. This works by clang adding undefined symbols for
each selector called in a translation unit, such as `_objc_msgSend$foo`
for calling the `foo` method on any `NSObject`. There are 2
different modes for this behavior, the default directly does the setup
for `_objc_msgSend` and calls it, and the smaller option does the
selector setup, and then calls the standard `_objc_msgSend` stub
function.
The general overview of how this works is:
- Undefined symbols with the given prefix are collected
- The suffix of each matching undefined symbol is added as a string to
`__objc_methname`
- A pointer is added for every method name in the `__objc_selrefs`
section
- A `got` entry is emitted for `_objc_msgSend`
- Stubs are emitted pointing to the synthesized locations
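The first two steps of the overview can be sketched as follows. `collectSelectorNames` is a hypothetical helper that only shows the prefix matching and suffix extraction; how the linker actually stores the resulting strings is not shown here.

```cpp
#include <string>
#include <string_view>
#include <vector>

// Hypothetical: collect the selector names encoded in undefined symbols
// of the form `_objc_msgSend$<selector>`.
std::vector<std::string>
collectSelectorNames(const std::vector<std::string> &undefinedSymbols) {
  constexpr std::string_view prefix = "_objc_msgSend$";
  std::vector<std::string> methnames;
  for (const std::string &sym : undefinedSymbols) {
    std::string_view name = sym;
    // Only symbols with the prefix and a non-empty suffix qualify.
    if (name.size() > prefix.size() && name.substr(0, prefix.size()) == prefix)
      methnames.emplace_back(name.substr(prefix.size()));
  }
  return methnames;
}
```

Each returned string corresponds to one entry that would be added to `__objc_methname`, with a matching pointer in `__objc_selrefs`.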
Notes:
- Both `__objc_methname` and `__objc_selrefs` can also exist from object
files, so their contents are merged with our synthesized contents
- The compiler emits method names for defined methods, but not for
undefined symbols you call, but stubs are used for both
- This only implements the default "fast" mode currently just to reduce
the diff, I also doubt many folks will care to swap modes
- This only implements this for arm64 and x86_64, we don't need to
implement this for 32 bit iOS archs, but we should implement it for
watchOS archs in a later diff
Differential Revision: https://reviews.llvm.org/D128108
With D26647, we can already identify input object files compiled by cl.exe with
/GL. It seems helpful to do the same, and print an error message, for object
files that were compiled with /GL but reside inside libraries/archives.
Reviewed By: rnk, thieta
Differential Revision: https://reviews.llvm.org/D131458
This was recently introduced in GNU linkers and it makes sense for
ld.lld to have the same support. This implementation omits checking if
the input string is valid json to reduce size bloat.
Differential Revision: https://reviews.llvm.org/D131439
Normally we'd use LLVM_FALLTHROUGH, or now, [[fallthrough]].
But for case labels followed directly by other case labels, we
use neither.
No behavior change.
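A small example of the convention described above; the function `classify` and its values are made up for illustration.

```cpp
// Illustrative only: shows where [[fallthrough]] is and is not written.
int classify(int kind) {
  int score = 0;
  switch (kind) {
  case 0: // case label followed directly by another label: no annotation
  case 1:
    score += 1;
    [[fallthrough]]; // a statement precedes the next label: annotate
  case 2:
    score += 10;
    break;
  default:
    break;
  }
  return score;
}
```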
These are new debug types that ship with the latest
Windows SDK and would warn and finally fail lld-link.
The symbols seem to be related to Microsoft's XFG
which is their version of CFG. We can't handle any of
this yet, so for now we can just ignore these types
so that lld doesn't fail with a new version of Windows
SDK.
Fixes: #56285
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D129378
Some header files used
namespace lld {
namespace macho {
// ...
} // namespace macho
std::string toString(const Type &t);
} // namespace lld
In those files, I didn't use a nested namespace since it's not a big win there.
No behavior change.
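For comparison, a sketch of both layouts using made-up namespaces (`demo`, `demo::macho`, `demo::wasm`) and a placeholder `Type`. The first keeps the split form because a declaration lives in the outer namespace; the second uses the C++17 nested form.

```cpp
#include <string>

// Split form: something is declared in the outer namespace too, so a
// nested namespace definition would not save much.
namespace demo {
namespace macho {
struct Type {
  std::string name;
};
} // namespace macho
std::string toString(const macho::Type &t) { return t.name; }
} // namespace demo

// C++17 nested namespace definition: everything lives in the inner one.
namespace demo::wasm {
struct Type {
  std::string name;
};
std::string toString(const Type &t) { return t.name; }
} // namespace demo::wasm
```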
Differential Revision: https://reviews.llvm.org/D131354
This has been deprecated since D116492 earlier in 2022.
That seems recent, but with the recent cut of LLVM 15 that is still two releases (14 and 15). Meanwhile Clang has deprecated `llvm-config` for a lot longer, and since it is likely that LLD users are also Clang users, this serves as an extra "heads up" that `llvm-config` is on its way out.
Remove it in favor of using CMake's find_package() function.
Reviewed By: MaskRay, mgorny
Differential Revision: https://reviews.llvm.org/D131144
Also make the soft toolchain requirements hard. This allows
us to use C++17 features in LLVM now.
If we find patterns with C++17 that improve readability
it should be recommended in the coding standards.
Reviewed By: jhenderson, cor3ntin, MaskRay
Differential Revision: https://reviews.llvm.org/D130689
D74537 introduced a bug: if `(config->andFeatures & GNU_PROPERTY_AARCH64_FEATURE_1_PAC) != 0`
with -z pac-plt unspecified, we incorrectly use AArch64BtiPac, whose writePlt will make
out-of-bounds write after the .plt section. This is often benign because the
output section after .plt will usually overwrite the content.
This is very difficult to test without D131247 (Parallelize writes of different OutputSections).
Some tests (e.g. aarch64-feature-pac.s) segfault in libstdc++ _GLIBCXX_DEBUG
builds (enabled by LLVM_ENABLE_EXPENSIVE_CHECKS).
dyn_cast<ThunkSection> is incorrectly true for any SyntheticSection. std::merge
transitively calls mergeCmp(x, x) (due to __glibcxx_requires_irreflexive_pred)
and will segfault in `ta->getTargetInputSection()`. The dyn_cast<ThunkSection>
issue should be eventually fixed properly, but `a != b` is robust enough for now.
D91426 makes .got possibly empty while needed. If .got and .data have the same
address, and .got's content is written after .data, the first word of .data will
be corrupted.
The bug is not testable without D131247.
This implements the last step of
https://discourse.llvm.org/t/parallel-input-file-parsing/60164 for the ELF port.
For an ELF object file, we previously did: parse, (parallel) initializeLocalSymbols, (parallel) postParseObjectFile.
Now we do: parse, (parallel) initSectionsAndLocalSyms, (parallel) postParseObjectFile.
initSectionsAndLocalSyms does most of input section initialization.
The sequential `parse` does SHT_ARM_ATTRIBUTES/SHT_RISCV_ATTRIBUTES/SHT_GROUP initialization for now.
Performance linking some programs with --threads=8 (glibc 2.33 malloc and mimalloc):
* clang: 1.05x as fast with glibc malloc, 1.03x as fast with mimalloc
* chrome: 1.04x as fast with glibc malloc, 1.03x as fast with mimalloc
* internal search program: 1.08x as fast with glibc malloc, 1.05x as fast with mimalloc
Reviewed By: peter.smith
Differential Revision: https://reviews.llvm.org/D130810
makeThreadLocal/makeThreadLocalN are moved from D130810 ([ELF] Parallelize input
section initialization) here to make D130810 more focused on the refactor:
* COFF has some needs for multiple linker contexts. D108850 partially removed
global states from lldCommon but left the global variable `lctx`.
* To the best of my knowledge, all multiple-linker-context feature requests to
ELF are more from user convenience, with no very strong argument.
* In practice, ELF port is very difficult to remove global states without
introducing significant performance regression/hurting code readability.
* Per-thread allocators from D122922/D123879 are too expensive and will not
really benefit ELF.
This patch adds a simple thread_local based makeThreadLocal to
lld/Common/Memory.h. It will enable further optimization in ELF.
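As a rough idea of what a thread_local based allocation helper can look like, here is a sketch built on `std::deque`. The name `makeThreadLocalSketch` and the use of a deque as the arena are assumptions for illustration; this is not the code added to lld/Common/Memory.h.

```cpp
#include <deque>
#include <utility>

// Hypothetical sketch: each thread gets its own arena per instantiation,
// so allocation needs no locking. Objects are destroyed when the owning
// thread exits, so pointers must not outlive that thread.
template <typename T, typename... Args>
T *makeThreadLocalSketch(Args &&...args) {
  thread_local std::deque<T> arena;
  // std::deque keeps existing elements at stable addresses on emplace_back.
  return &arena.emplace_back(std::forward<Args>(args)...);
}
```

The trade-off is the lifetime restriction noted in the comment, in exchange for contention-free allocation from parallel code.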
This fixes the following warnings produced by GCC 9:
../tools/lld/MachO/Arch/ARM64.cpp: In member function ‘void {anonymous}::OptimizationHintContext::applyAdrpLdr(const lld::macho::OptimizationHint&)’:
../tools/lld/MachO/Arch/ARM64.cpp:448:18: warning: comparison of integer expressions of different signedness: ‘int64_t’ {aka ‘long int’} and ‘uint64_t’ {aka ‘long unsigned int’} [-Wsign-compare]
448 | if (ldr.offset != (rel1->referentVA & 0xfff))
| ~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../tools/lld/MachO/UnwindInfoSection.cpp: In function ‘bool canFoldEncoding(compact_unwind_encoding_t)’:
../tools/lld/MachO/UnwindInfoSection.cpp:404:44: warning: comparison between ‘enum<unnamed>’ and ‘enum<unnamed>’ [-Wenum-compare]
404 | static_assert(UNWIND_X86_64_MODE_MASK == UNWIND_X86_MODE_MASK, "");
| ^~~~~~~~~~~~~~~~~~~~
../tools/lld/MachO/UnwindInfoSection.cpp:405:49: warning: comparison between ‘enum<unnamed>’ and ‘enum<unnamed>’ [-Wenum-compare]
405 | static_assert(UNWIND_X86_64_MODE_STACK_IND == UNWIND_X86_MODE_STACK_IND, "");
| ^~~~~~~~~~~~~~~~~~~~~~~~~
Differential Revision: https://reviews.llvm.org/D130970
I went over the output of the following mess of a command:
`(ulimit -m 2000000; ulimit -v 2000000; git ls-files -z | parallel --xargs -0 cat | aspell list --mode=none --ignore-case | grep -E '^[A-Za-z][a-z]*$' | sort | uniq -c | sort -n | grep -vE '.{25}' | aspell pipe -W3 | grep : | cut -d' ' -f2 | less)`
and proceeded to spend a few days looking at it to find probable typos
and fixed a few hundred of them in all of the llvm project (note, the
ones I found are not anywhere near all of them, but it seems like a
good start).
Differential Revision: https://reviews.llvm.org/D130982
Linux Standard Base Core Specification says that CIE/FDE is padded to an
addressing unit size boundary, but in practice GNU assembler/LLVM integrated
assembler pad FDE/CIE to 4 and the last FDE to 8 on 64-bit systems.
In addition, GNU ld doesn't pad to 8, so let's drop excess padding, too.
If the assembler provides aligned pieces, the output will be aligned.
Noticed .eh_frame size reduction for 3 executables: 0.3% (chrome), 4.7% (clang),
7.6% (an internal program).
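The size effect is plain rounding arithmetic. The helper `alignUp` below is a hypothetical stand-in, and the 20-byte record is a made-up example.

```cpp
#include <cstdint>

// Hypothetical helper: round `size` up to a multiple of `align`.
constexpr uint64_t alignUp(uint64_t size, uint64_t align) {
  return (size + align - 1) / align * align;
}
```

A 20-byte piece occupies 24 bytes when padded to 8 but stays at 20 bytes when the 4-byte alignment provided by the assembler is kept, which is where savings of this kind come from.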
Previously we only supported using the system pointer size (aka the
`absptr` encoding) because `llvm-mc`'s CFI directives always generate EH
frames with that encoding. But libffi uses 4-byte-encoded, hand-rolled
EH frames, so this patch adds support for it.
Fixes #56576.
Reviewed By: #lld-macho, oontvoo
Differential Revision: https://reviews.llvm.org/D130804
* Inline getReloc
* Fold the UINT32_MAX length check into the section size check.
This transformation is valid because we don't support .eh_frame input sections
larger than 32-bit (unrealistic even for large code models).
This simplifies code, removes a read32 (for id==0 check), and makes it feasible
to combine some operations in EhInputSection::split and EhFrameSection::addRecords.
Mostly NFC, but fixes "Relocation not in any piece" assertion failure in an
erroneous case when a relocation offset precedes all CIE/FDE pieces.
If we change
CieRecord *&rec = cieMap[{cie.data(), personality}];
to
CieRecord *&rec = cieMap[{cie.data(), nullptr}];
The new test can catch the failure.
inputSections temporarily contains EhInputSection objects mainly for
combineEhSections. Place EhInputSection objects into a new vector
ehInputSections instead of inputSections.
A LazySymbol is one that lives in a `.a` archive and gets pulled in by a
strong reference. However, weak references to such symbols do not
result in them being loaded from the archive. In this case we want to
treat such symbols as undefined rather than lazy, once symbol
resolution is complete.
This fixes a crash bug in the linker when a weakly referenced symbol that
lives in an archive file is live at the end of the link. In the case of
dynamic linking this is expected to turn into an import with (in the
case of a function symbol) a function index.
Differential Revision: https://reviews.llvm.org/D130736
Similarly to -o, output directories will not be created, so -Map being
copied verbatim will likely cause ld.lld @response.txt to fail.
Differential Revision: https://reviews.llvm.org/D130681
A symbol `$ld$previous$/Another$1.2.3$1$3.0$14.0$_xxx$` means
"pretend symbol `_xxx` is in dylib `/Another` with version `1.2.3`
if the deployment target is between `3.0` and `14.0` and we're
targeting platform `1` (ie macOS)".
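The field layout of the example symbol can be pulled apart with a simple split on `$`. The struct and function names below are hypothetical, and the sketch only handles the six-field shape shown above.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical container for the fields of a `$ld$previous$` symbol.
struct PreviousDirective {
  std::string installName, compatVersion, platform;
  std::string startVersion, endVersion, symbolName;
};

// Hypothetical parser: expects six '$'-terminated fields after the prefix.
std::optional<PreviousDirective> parseLdPrevious(std::string_view sym) {
  constexpr std::string_view prefix = "$ld$previous$";
  if (sym.substr(0, prefix.size()) != prefix)
    return std::nullopt;
  sym.remove_prefix(prefix.size());
  std::vector<std::string> fields;
  std::size_t pos;
  while ((pos = sym.find('$')) != std::string_view::npos) {
    fields.emplace_back(sym.substr(0, pos));
    sym.remove_prefix(pos + 1);
  }
  if (fields.size() != 6)
    return std::nullopt;
  return PreviousDirective{fields[0], fields[1], fields[2],
                           fields[3], fields[4], fields[5]};
}
```

For the symbol above this yields install name `/Another`, version `1.2.3`, platform `1`, the range `3.0` to `14.0`, and symbol `_xxx`.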
This means dylibs can now inject synthetic dylibs into the link, so
DylibFile needs to grow a 3rd constructor.
The only other interesting thing is that such an injected dylib
counts as a use of the original dylib. This patch gets this mostly
right (if _only_ `$ld$previous` symbols are used from a dylib,
we don't add a dep on the dylib itself, matching ld64), but one case
where we don't match ld64 yet is that ld64 even omits the original
dylib when linking it with `-needed-l`. Lld currently still adds a load
command for the original dylib in that case. (That's for a future
patch.)
Fixes #56074.
Differential Revision: https://reviews.llvm.org/D130725
Linking fails when targeting `x86_64-apple-darwin` for runtimes. The issue
is that LLD strictly assumes the target architecture be present in the tbd
files (which isn't always true). For example, when targeting `x86_64h`, it should
work with `x86_64` because they are ABI compatible. This is also in line with what
ld64 does.
An environment variable (which ld64 also supports) is also added to preserve the
existing behavior of strict architecture matching.
Reviewed By: #lld-macho, int3
Differential Revision: https://reviews.llvm.org/D130683
We don't need to recompute the list LLVMConfig.cmake provides us.
When LLVM is being built, the list is two elements long: generated headers and headers from source.
When LLVM is already built, the list is one element long: the installed header directory containing both generated and hand-written sources.
Reviewed By: sebastian-ne
Differential Revision: https://reviews.llvm.org/D130553
We were previously doing it after LTO, which did have the desired effect
of having the un-exported symbols marked as private extern in the final
output binary, but doing it before LTO creates more optimization
opportunities.
One observable difference is that LTO can now elide un-exported symbols
entirely, so they may not even be present as private externs in the
output.
This is also what ld64 implements.
Reviewed By: #lld-macho, thevinster
Differential Revision: https://reviews.llvm.org/D130429
This hint instructs the linker to optimize an adrp+add+ldr sequence used
for loading from a local symbol's address by loading directly if it's
close enough, or with an adrp(p)+ldr sequence if it's not.
This transformation is the same as what's done for ADRP_LDR_GOT_LDR when
the symbol is local. The logic for acting on this hint is therefore
moved to a new function which will be called from the existing
applyAdrpLdrGotLdr() function.
Differential Revision: https://reviews.llvm.org/D130505
In DWARF5, the `DW_AT_name` and `DW_AT_comp_dir` attributes are encoded
using the `strx*` forms, which specify an index into `__debug_str_offs`.
This commit adds that section to DwarfObject, so the debug info parser
can resolve these references.
The test case was manually adapted from stabs-icf.s.
Fixes #51668
Differential Revision: https://reviews.llvm.org/D130559
This is still undocumented and unsupported, but if someone passed it
before you would end up with a missing file error since this takes an
argument that wouldn't be handled.
Differential Revision: https://reviews.llvm.org/D130606
Turning on opaque pointers has uncovered an issue with WPD where we currently pattern match away `assume(type.test)` in WPD so that a later LTT doesn't resolve the type test to undef and introduce an `assume(false)`. The pattern matching can fail in cases where we transform two `assume(type.test)`s into `assume(phi(type.test.1, type.test.2))`.
Currently we create `assume(type.test)` for all virtual calls that might be devirtualized. This is to support `-Wl,--lto-whole-program-visibility`.
To prevent this, all virtual calls that may not be in the same LTO module instead use a new `llvm.public.type.test` intrinsic in place of the `llvm.type.test`. Then when we know if `-Wl,--lto-whole-program-visibility` is passed or not, we can either replace all `llvm.public.type.test` with `llvm.type.test`, or replace all `llvm.public.type.test` with `true`. This prevents WPD from trying to pattern match away `assume(type.test)` for public virtual calls when failing the pattern matching will result in miscompiles.
Reviewed By: tejohnson
Differential Revision: https://reviews.llvm.org/D128955
Fixes a regression from D117973, that used CMAKE_BINARY_DIR instead of
LLVM_BINARY_DIR in some places.
Differential Revision: https://reviews.llvm.org/D130555
Most Arm disassemblers, including GNU objdump and Arm's own `fromelf`,
emit an instruction's raw encoding as a 32-bit word or (for Thumb)
one or two 16-bit halfwords, in logical order rather than according to
their storage endianness. This is generally easier to read: it matches
the encoding diagrams in the architecture spec, it matches the value
you'd write in a `.inst` directive, and it means that fields within
the instruction encoding that span more than one byte (such as branch
offsets or `SVC` immediates) can be read directly in the encoding
without having to mentally reverse the bytes.
llvm-objdump already has a system of PrettyPrinter subclasses which
makes it easy for a target to drop in its own preferred formatting.
This patch adds pretty-printers for all the Arm targets, so that
llvm-objdump will display Arm instruction encodings in their preferred
layout instead of little-endian and bytewise.
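The difference between bytewise and logical-order display can be sketched with two formatting helpers. Both names are hypothetical, little-endian storage is assumed, and the exact spacing llvm-objdump prints is not claimed here.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical: format 4 stored bytes as one 32-bit word in logical order.
// Assumes little-endian storage and at least 4 bytes of input.
std::string formatArmWord(const std::vector<uint8_t> &b) {
  uint32_t w = uint32_t(b[0]) | (uint32_t(b[1]) << 8) |
               (uint32_t(b[2]) << 16) | (uint32_t(b[3]) << 24);
  char buf[9];
  std::snprintf(buf, sizeof buf, "%08x", static_cast<unsigned>(w));
  return buf;
}

// Hypothetical: format stored bytes as space-separated 16-bit halfwords.
std::string formatThumbHalfwords(const std::vector<uint8_t> &b) {
  std::string out;
  for (std::size_t i = 0; i + 1 < b.size(); i += 2) {
    char buf[5];
    std::snprintf(buf, sizeof buf, "%04x",
                  static_cast<unsigned>(b[i] | (b[i + 1] << 8)));
    if (!out.empty())
      out += ' ';
    out += buf;
  }
  return out;
}
```

For example, the stored bytes `1e ff 2f e1` read as the word `e12fff1e`, which is the Arm encoding of `bx lr` as it appears in the architecture manual.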
Reviewed By: DavidSpickett
Differential Revision: https://reviews.llvm.org/D130358
Similarly to -load_hidden, this flag instructs the linker to not export
symbols from the specified archive. While that flag takes a path,
-hidden-l looks for the specified library name in the search path.
The test changes are needed because -hidden-lfoo resolves to libfoo.a,
not foo.a.
Differential Revision: https://reviews.llvm.org/D130529
Firstly, we make an additional GNUInstallDirs-style variable. With
NixOS, for example, this is crucial as we want those to go in
`${dev}/lib/cmake` not `${out}/lib/cmake` as that would be a cmake subdir
of the "regular" libdir, which is installed even when no one needs to do
any development.
Secondly, we make *Config.cmake robust to absolute package install
paths. We for NixOS will in fact be passing them absolute paths to make
the `${dev}` vs `${out}` distinction mentioned above, and the
GNUInstallDirs-style variables are supposed to support absolute paths in
general so it's good practice besides the NixOS use-case.
Thirdly, we make `${project}_INSTALL_PACKAGE_DIR` CACHE PATHs like other
install dirs are.
Reviewed By: sebastian-ne
Differential Revision: https://reviews.llvm.org/D117973
This flag was introduced in ld64-609. It instructs the linker to link to
a static library while treating its symbols as if they had hidden
visibility. This is useful when building a dylib that links to static
libraries but we don't want the symbols from those to be exported.
Closes #51505
This reland adds bitcode file handling, so we won't get any compile
errors due to BitcodeFile::forceHidden being unused.
Differential Revision: https://reviews.llvm.org/D130473
This flag was introduced in ld64-609. It instructs the linker to link to
a static library while treating its symbols as if they had hidden
visibility. This is useful when building a dylib that links to static
libraries but we don't want the symbols from those to be exported.
Closes #51505
Differential Revision: https://reviews.llvm.org/D130473
If the `-demangle` flag is passed to lld, symbol names will now be
demangled in the "referenced by:" message in addition to the referenced
symbol's name, which was already demangled before this change.
Differential Revision: https://reviews.llvm.org/D130490
Previously, we treated it as a regular ConcatInputSection. However, ld64
actually parses its contents and uses that to synthesize a single image
info struct, generating one 8-byte section instead of `8 * number of
object files with ObjC code`.
I'm not entirely sure what impact this section has on the runtime, so I
just tried to follow ld64's semantics as closely as possible in this
diff. My main motivation though was to reduce binary size.
No significant perf change on chromium_framework on my 16-core Mac Pro:
                base           diff           difference (95% CI)
    sys_time    1.764 ± 0.062  1.748 ± 0.032  [ -2.4% .. +0.5%]
    user_time   5.112 ± 0.104  5.106 ± 0.046  [ -0.9% .. +0.7%]
    wall_time   6.111 ± 0.184  6.085 ± 0.076  [ -1.6% .. +0.8%]
    samples     30             32
Reviewed By: #lld-macho, thakis
Differential Revision: https://reviews.llvm.org/D130125
which occurs when there are EH frames present in the object file's weak
def.
Reviewed By: abrachet
Differential Revision: https://reviews.llvm.org/D130409
llvm::sort is beneficial even when we use the iterator-based overload,
since it can optionally shuffle the elements (to detect
non-determinism). However llvm::sort is not usable everywhere, for
example, in compiler-rt.
Reviewed By: nhaehnle
Differential Revision: https://reviews.llvm.org/D130406
First of all, `LLVM_TOOLS_INSTALL_DIR` put there breaks our NixOS
builds, because `LLVM_TOOLS_INSTALL_DIR` defined the same as
`CMAKE_INSTALL_BINDIR` becomes an *absolute* path, and then when
downstream projects try to install there too this breaks because our
builds always install to fresh directories for isolation's sake.
Second of all, note that `LLVM_TOOLS_INSTALL_DIR` stands out against the
other specially crafted `LLVM_CONFIG_*` variables substituted in
`llvm/cmake/modules/LLVMConfig.cmake.in`.
@beanz added it in d0e1c2a550 to fix a
dangling reference in `AddLLVM`, but I am suspicious of how this
variable doesn't follow the pattern.
Those other ones are carefully made to be build-time vs install-time
variables depending on which `LLVMConfig.cmake` is being generated, are
carefully made relative as appropriate, etc. etc. For my NixOS use-case
they are also fine because they are never used as downstream install
variables, only for reading not writing.
To avoid the problems I face, and restore symmetry, I deleted the
exported variable and arranged to have many `${project}_TOOLS_INSTALL_DIR`s.
`AddLLVM` now instead expects each project to define its own, and they
do so based on `CMAKE_INSTALL_BINDIR`. `LLVMConfig` still exports
`LLVM_TOOLS_BINARY_DIR` which is the location for the tools defined in
the usual way, matching the other remaining exported variables.
For the `AddLLVM` changes, I tried to copy the existing pattern of
internal vs non-internal or for LLVM vs for downstream function/macro
names, but it would be good to confirm I did that correctly.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D117977
`advanceSubsection()` didn't account for the possibility that a section
could have no subsections.
Reviewed By: #lld-macho, thakis, BertalanD
Differential Revision: https://reviews.llvm.org/D130288
If there are multiple symbols at the same address, our unwind info
implementation assumes that we always register unwind entries to a
single canonical symbol.
This assumption was violated by the `registerEhFrame` code.
Fixes #56570.
Reviewed By: #lld-macho, thakis
Differential Revision: https://reviews.llvm.org/D130208
It's only used in one branch, so we were unnecessarily calculating the
length of many symbol names.
Tiny speedup when linking chromium_framework on my M1 Mac mini:
    x before.txt
    + after.txt
        N           Min           Max        Median           Avg        Stddev
    x  10     3.9917109        4.0418     4.0318099     4.0203902   0.021459873
    +  10      3.944725      4.053988     3.9708955     3.9825602   0.037257609
    Difference at 95.0% confidence
            -0.03783 +/- 0.0285663
            -0.940953% +/- 0.710536%
            (Student's t, pooled s = 0.0304028)
Differential Revision: https://reviews.llvm.org/D130234
This commit reduces the size of the emitted rebase sections by
generating the REBASE_OPCODE_DO_REBASE_ADD_ADDR_ULEB and
REBASE_OPCODE_DO_REBASE_ULEB_TIMES_SKIPPING_ULEB opcodes.
With this change, chromium_framework's rebase section shrinks by about
40%, from 320 kB to 197 kB. That is 6 kB smaller than what ld64
produces for the same input.
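To show why such opcodes save space, the sketch below groups sorted rebase offsets into runs with a constant stride, each of which could in principle be described by a single "repeat N times, skipping K bytes" style opcode instead of N separate ones. `RebaseRun` and `groupByStride` are hypothetical, and the greedy grouping is an assumption for illustration; it is not the encoder added by this commit.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical description of one run of equally spaced rebase locations.
struct RebaseRun {
  uint64_t start;  // offset of the first location
  uint64_t count;  // number of locations in the run
  uint64_t stride; // distance between consecutive locations (0 if count == 1)
};

// Hypothetical greedy grouping; `offsets` must be sorted ascending.
std::vector<RebaseRun> groupByStride(const std::vector<uint64_t> &offsets) {
  std::vector<RebaseRun> runs;
  std::size_t i = 0;
  while (i < offsets.size()) {
    std::size_t j = i;
    uint64_t stride = 0;
    if (i + 1 < offsets.size()) {
      stride = offsets[i + 1] - offsets[i];
      j = i + 1;
      // Extend the run while the spacing stays the same.
      while (j + 1 < offsets.size() && offsets[j + 1] - offsets[j] == stride)
        ++j;
    }
    runs.push_back({offsets[i], static_cast<uint64_t>(j - i + 1), stride});
    i = j + 1;
  }
  return runs;
}
```

Offsets `0, 16, 32, 48, 100` collapse into one run of four locations plus a single leftover location.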
Performance figures from my M1 Mac mini:
    x before
    + after
        N           Min           Max        Median           Avg        Stddev
    x  10     4.2269349     4.3300061     4.2689675     4.2690016   0.031151669
    +  10      4.219331     4.2914009     4.2398136     4.2448277   0.023817308
    No difference proven at 95.0% confidence
Differential Revision: https://reviews.llvm.org/D130180
Similar to cstrings, ld64 always deduplicates cfstrings. This was already
being done when enabling ICF, but for debug builds you may want to flip
this on if you cannot eliminate your instances of this, so this change
makes --deduplicate-literals also apply to cfstrings.
Differential Revision: https://reviews.llvm.org/D130134
Print the actual number of symbols that would have been exported
too, which helps assessing the situation.
Differential Revision: https://reviews.llvm.org/D130117
This is a follow-on to {D129556}. I've refactored the code such that
`addFile()` no longer needs to take an extra parameter. Additionally,
the "do we force-load or not" policy logic is now fully contained within
addFile, instead of being split between `addFile` and
`parseLCLinkerOptions`. This also allows us to move the `ForceLoad` (now
`LoadType`) enum out of the header file.
Additionally, we can now correctly report loads induced by
`LC_LINKER_OPTION` in our `-why_load` output.
I've also added another test to check that CLI library non-force-loads
take precedence over `LC_LINKER_OPTION` + `-force_load_swift_libs`. (The
existing logic is correct, just untested.)
Reviewed By: #lld-macho, thakis
Differential Revision: https://reviews.llvm.org/D130137
The new format uses symbol relocations, as described in {D127637}.
Reviewed By: #lld-macho, alx32
Differential Revision: https://reviews.llvm.org/D128938
This fixes https://github.com/llvm/llvm-project/issues/56059 and
https://github.com/llvm/llvm-project/issues/56440. This is inspired by
tapthaker's patch (https://reviews.llvm.org/D127941), and has reused his
test cases. This patch adds a bool "isCommandLineLoad" to indicate
where archives are from. If lld tries to load the same library loaded
previously by LC_LINKER_OPTION from CLI, it will use this
isCommandLineLoad to determine if it should be affected by -all_load &
-ObjC flags. This also prevents -force_load from affecting archives
loaded previously from CLI without such flag, whereas tapthaker's patch
will fail such test case (introduced by
https://reviews.llvm.org/D128025).
Reviewed By: int3, #lld-macho
Differential Revision: https://reviews.llvm.org/D129556
This creates a symbol alias similar to --defsym in the elf linker. This
is used by swiftpm for all executables, so it's useful to support. This
doesn't implement -alias_list but that could be done pretty easily as
needed.
Differential Revision: https://reviews.llvm.org/D129938
To do this, we need to slice away the LSDA pointer, just like we are
slicing away the functionAddress pointer.
No observable difference in perf on chromium_framework:
                base           diff           difference (95% CI)
    sys_time    1.769 ± 0.068  1.761 ± 0.065  [ -2.7% .. +1.8%]
    user_time   9.517 ± 0.110  9.528 ± 0.116  [ -0.6% .. +0.8%]
    wall_time   8.291 ± 0.174  8.307 ± 0.183  [ -1.1% .. +1.5%]
    samples     21             25
Reviewed By: #lld-macho, thakis
Differential Revision: https://reviews.llvm.org/D129830
This method is called on each relocation when parsing input files, so
the overhead of using virtual functions ends up being quite large. We
now have a single non-virtual method, which reads from the appropriate
array of relocation attributes set in the TargetInfo constructor.
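The shape of this change can be sketched as a class that stores a pointer to a per-architecture table in its constructor and answers lookups without virtual dispatch. `TargetInfoSketch`, `RelocAttrs` as laid out here, and the demo table are hypothetical and simplified; they are not the actual lld definitions.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical, simplified attribute record.
struct RelocAttrs {
  const char *name;
  uint32_t bits;
};

class TargetInfoSketch {
public:
  // The architecture-specific table is fixed at construction time.
  template <std::size_t N>
  explicit TargetInfoSketch(const RelocAttrs (&table)[N])
      : attrs(table), count(N) {}

  // Non-virtual lookup: a bounds check and an array index.
  const RelocAttrs &getRelocAttrs(uint8_t type) const {
    static const RelocAttrs invalid{"INVALID", 0};
    return type < count ? attrs[type] : invalid;
  }

private:
  const RelocAttrs *attrs;
  std::size_t count;
};

// Made-up table standing in for one architecture's relocation types.
inline const RelocAttrs kDemoArchAttrs[] = {{"UNSIGNED", 1}, {"BRANCH", 2}};
```

Because the lookup is a plain member function, the compiler can inline it at each relocation-parsing call site, which is the kind of overhead a virtual call would prevent it from removing.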
This change results in a modest 2.3% reduction in link time for
chromium_framework measured on an x86-64 VPS, and 0.7% on an arm64 Mac.
      N           Min           Max        Median           Avg        Stddev
  x  10     11.869417     12.032609     11.935041     11.938268   0.045802324
  +  10     11.581526     11.785265     11.649885     11.659507   0.054634834
  Difference at 95.0% confidence
          -0.278761 +/- 0.0473673
          -2.33502% +/- 0.396768%
          (Student's t, pooled s = 0.0504124)
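For reference, the difference figures above follow from the two summary
rows via a pooled-variance Student's t interval. The sketch below
reproduces that arithmetic; `Sample`, `Comparison` and `compare` are
hypothetical names, and the caller supplies the t critical value (about
2.100922 for 18 degrees of freedom at 95%).

```cpp
#include <cmath>

struct Sample {
  int n;
  double mean;
  double stddev;
};

struct Comparison {
  double diff;         // patched mean minus base mean
  double percent;      // diff relative to the base mean, in percent
  double pooledStddev; // pooled standard deviation
  double halfWidth;    // half-width of the confidence interval
};

Comparison compare(const Sample &base, const Sample &patched,
                   double tCritical) {
  double dof = base.n + patched.n - 2;
  double pooledVar = ((base.n - 1) * base.stddev * base.stddev +
                      (patched.n - 1) * patched.stddev * patched.stddev) /
                     dof;
  double pooled = std::sqrt(pooledVar);
  double diff = patched.mean - base.mean;
  double halfWidth =
      tCritical * pooled * std::sqrt(1.0 / base.n + 1.0 / patched.n);
  return {diff, 100.0 * diff / base.mean, pooled, halfWidth};
}
```

Feeding in the `x` and `+` rows (N = 10 each) gives a difference of
about -0.2788 s, i.e. about -2.335% of the base average.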
Differential Revision: https://reviews.llvm.org/D130000
Clang passes a filename rather than a directory in -lto_object_path when
using FullLTO. Previously, the path was always treated as a directory, so
lld would crash when it attempted to create temporary files inside it.
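A rough sketch of the distinction, assuming the path is classified by
checking whether it names an existing directory. `ltoObjectFileFor` and
the "<index>.lto.o" naming are hypothetical and only illustrate the two
cases.

```cpp
#include <filesystem>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical helper: pick the file that LTO output number `index`
// should be written to.
fs::path ltoObjectFileFor(const fs::path &ltoObjectPath, unsigned index) {
  std::error_code ec;
  // Directory: create a per-object file inside it.
  if (fs::is_directory(ltoObjectPath, ec))
    return ltoObjectPath / (std::to_string(index) + ".lto.o");
  // Anything else (including a not-yet-existing file): use the path as-is.
  return ltoObjectPath;
}
```

The error_code overload of is_directory is used so that a nonexistent
path is reported as "not a directory" instead of throwing.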
Fixes #54805
Differential Revision: https://reviews.llvm.org/D129705
While working on {D129830}, I realized that our handling of ICF +
eh_frame combined was untested. Additionally I realized that the comment
explaining why we were safely slicing away the functionAddress reloc
from our compact unwind entries was... insufficient and slightly
misleading. I've tried to clarify it.
Reviewed By: #lld-macho, thevinster
Differential Revision: https://reviews.llvm.org/D129894
We were re-defining the various numeric variables when we actually
intended to check already-defined variables against the value on the
current CHECK line.
Reviewed By: #lld-macho, thakis
Differential Revision: https://reviews.llvm.org/D129831
Use a format more similar to unresolved references from regular object
files. It's probably easier to read for people who are less familiar
with the linker diagnostics.
Reviewed By: ikudrin
Differential Revision: https://reviews.llvm.org/D129790
On my system the date formatting is a bit different from what the test used to
support. I'm using:
* Windows 11 version 21H2, build 22000.795, using the English (Canada) region
* ls from BusyBox 1.36
* VS 2022 17.2.5
* WinSDK 10.0.22000
This just removes the code that gates the logic. The main issue here is
perf impact: without {D122258}, LLD takes a significant perf hit because
it now has to do a lot more work in the input parsing phase. But with
that change to eliminate unnecessary EH frames from input object files,
the perf overhead here is minimal. Concretely, here are the numbers for
some builds as measured on my 16-core Mac Pro:
**chromium_framework**
This is without the use of `-femit-dwarf-unwind=no-compact-unwind`:
             base           diff           difference (95% CI)
  sys_time   1.826 ± 0.019  1.962 ± 0.034  [  +6.5% ..   +8.4%]
  user_time  9.306 ± 0.054  9.926 ± 0.082  [  +6.2% ..   +7.1%]
  wall_time  8.225 ± 0.068  8.947 ± 0.128  [  +8.0% ..   +9.6%]
  samples    15             22
With that flag enabled, the regression mostly disappears, as hoped:
             base           diff           difference (95% CI)
  sys_time   1.839 ± 0.062  1.866 ± 0.068  [  -0.9% ..   +3.8%]
  user_time  9.452 ± 0.068  9.490 ± 0.067  [  -0.1% ..   +0.9%]
  wall_time  8.383 ± 0.127  8.452 ± 0.114  [  -0.1% ..   +1.8%]
  samples    17             21
**Unnamed internal app**
Without `-femit-dwarf-unwind`, this is the perf hit:
             base           diff           difference (95% CI)
  sys_time   1.372 ± 0.029  1.317 ± 0.024  [  -4.6% ..   -3.5%]
  user_time  2.835 ± 0.028  2.980 ± 0.027  [  +4.8% ..   +5.4%]
  wall_time  3.205 ± 0.079  3.383 ± 0.066  [  +4.9% ..   +6.2%]
  samples    102            83
With `-femit-dwarf-unwind`, the perf hit almost disappears:
             base           diff           difference (95% CI)
  sys_time   1.274 ± 0.026  1.270 ± 0.025  [  -0.9% ..   +0.3%]
  user_time  2.812 ± 0.023  2.822 ± 0.035  [  +0.1% ..   +0.7%]
  wall_time  3.166 ± 0.047  3.174 ± 0.059  [  -0.2% ..   +0.7%]
  samples    95             97
Just for fun, I measured the impact of `-femit-dwarf-unwind` on ld64
(`base` has the extra DWARF unwind info in the input object files,
`diff` doesn't):
             base           diff           difference (95% CI)
  sys_time   1.128 ± 0.010  1.124 ± 0.023  [  -1.3% ..   +0.6%]
  user_time  7.176 ± 0.030  7.106 ± 0.094  [  -1.5% ..   -0.4%]
  wall_time  7.874 ± 0.041  7.795 ± 0.121  [  -1.7% ..   -0.3%]
  samples    16             25
And for LLD:
             base           diff           difference (95% CI)
  sys_time   1.315 ± 0.019  1.280 ± 0.019  [  -3.2% ..   -2.0%]
  user_time  2.980 ± 0.022  2.822 ± 0.016  [  -5.5% ..   -5.0%]
  wall_time  3.369 ± 0.038  3.175 ± 0.033  [  -6.2% ..   -5.3%]
  samples    47             47
So parsing the extra EH frames is a lot more expensive for us than for
ld64. But given that we are quite a lot faster than ld64 to begin with,
I guess this isn't entirely unexpected...
Reviewed By: #lld-macho, oontvoo
Differential Revision: https://reviews.llvm.org/D129540
Currently the PPC64R2SaveStub thunk will produce Power 10 code by default.
This produced an issue when linking older code that made use of the st_other=1
bit but was never meant to be linked or run on Power 10.
This patch makes it so that only the R_PPC64_REL24_NOTOC relocation can produce
Power 10 code.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D129580
It's more natural to use uint8_t * (std::byte needs C++17 and llvm already
uses uint8_t * in too many places), and most callers use uint8_t * instead
of char *. The functions were recently moved into
`llvm::compression::zlib::`, so downstream projects need to adapt anyway.
This load command specifies the offset and size of the exports trie.
This information used to be a field in LC_DYLD_INFO, but in newer
libraries, it has a dedicated load command: LC_DYLD_EXPORTS_TRIE.
The format of the trie is the same for both load commands, so the code
for parsing it can be shared.
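A minimal sketch of how one lookup can serve both load commands. The
struct layouts and command values mirror <mach-o/loader.h>, but the type
and function names (`TrieLocation`, `findExportsTrie`) are hypothetical,
and the sketch assumes the load commands are already in host byte order.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

constexpr uint32_t LC_REQ_DYLD = 0x80000000u;
constexpr uint32_t LC_DYLD_INFO = 0x22u;
constexpr uint32_t LC_DYLD_INFO_ONLY = 0x22u | LC_REQ_DYLD;
constexpr uint32_t LC_DYLD_EXPORTS_TRIE = 0x33u | LC_REQ_DYLD;

struct LoadCommand { uint32_t cmd, cmdsize; };
struct LinkeditDataCommand { uint32_t cmd, cmdsize, dataoff, datasize; };
struct DyldInfoCommand {
  uint32_t cmd, cmdsize;
  uint32_t rebase_off, rebase_size, bind_off, bind_size;
  uint32_t weak_bind_off, weak_bind_size, lazy_bind_off, lazy_bind_size;
  uint32_t export_off, export_size;
};

struct TrieLocation { bool found; uint32_t offset; uint32_t size; };

// Walk the load commands and report where the exports trie lives,
// whichever command describes it. The trie bytes themselves can then be
// handed to a single parser.
TrieLocation findExportsTrie(const std::vector<uint8_t> &cmds) {
  std::size_t pos = 0;
  while (pos + sizeof(LoadCommand) <= cmds.size()) {
    LoadCommand lc;
    std::memcpy(&lc, cmds.data() + pos, sizeof(lc));
    if (lc.cmdsize < sizeof(LoadCommand) || lc.cmdsize > cmds.size() - pos)
      break; // malformed command; stop scanning
    if (lc.cmd == LC_DYLD_EXPORTS_TRIE &&
        lc.cmdsize >= sizeof(LinkeditDataCommand)) {
      LinkeditDataCommand c;
      std::memcpy(&c, cmds.data() + pos, sizeof(c));
      return {true, c.dataoff, c.datasize};
    }
    if ((lc.cmd == LC_DYLD_INFO || lc.cmd == LC_DYLD_INFO_ONLY) &&
        lc.cmdsize >= sizeof(DyldInfoCommand)) {
      DyldInfoCommand c;
      std::memcpy(&c, cmds.data() + pos, sizeof(c));
      if (c.export_size != 0)
        return {true, c.export_off, c.export_size};
    }
    pos += lc.cmdsize;
  }
  return {false, 0, 0};
}
```

Either command yields the same (offset, size) pair, which is all the
trie parser needs.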
LLD does not generate this yet; it is mainly useful when chained fixups
are in use, as the other members of LC_DYLD_INFO are unused then, so the
smaller LC_DYLD_EXPORTS_TRIE can be output instead.
LLDB gained support for this in D107673.
Fixes #54550
Differential Revision: https://reviews.llvm.org/D129430
This hint instructs the linker to relax a GOT-indirect load.
If the referenced symbol is external and its GOT entry is within +/- 1
MiB, the GOT entry can be loaded with a single literal ldr instruction.
If the referenced symbol is local, its address may be loaded directly if
it's close enough, or with an adr(p) + ldr pair if it's not.
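The range conditions above can be sketched as follows. This is an
illustration only: the function and enum names are hypothetical, and
using the literal-ldr range as the "close enough" test for local symbols
is an assumption. The ranges come from the instruction encodings: a
literal ldr has a 19-bit signed word offset and adr a 21-bit signed byte
offset, both covering +/- 1 MiB.

```cpp
#include <cstdint>

constexpr int64_t kOneMiB = int64_t(1) << 20;

// A literal ldr reaches 4-byte-aligned targets in [-1 MiB, +1 MiB).
bool fitsLdrLiteral(uint64_t pc, uint64_t target) {
  int64_t delta = static_cast<int64_t>(target - pc);
  return delta % 4 == 0 && delta >= -kOneMiB && delta < kOneMiB;
}

// adr reaches any byte in [-1 MiB, +1 MiB).
bool fitsAdr(uint64_t pc, uint64_t target) {
  int64_t delta = static_cast<int64_t>(target - pc);
  return delta >= -kOneMiB && delta < kOneMiB;
}

enum class Relaxation {
  None,              // leave the original sequence alone
  LiteralLdrOfGot,   // external symbol, GOT entry in range
  DirectLoad,        // local symbol, close enough
  AdrOrAdrpPlusLdr,  // local symbol, out of direct range
};

// `pc` is the address of the instruction being rewritten; `target` is the
// GOT entry for external symbols and the symbol itself for local ones.
Relaxation chooseRelaxation(bool isExternal, uint64_t pc, uint64_t target) {
  if (isExternal)
    return fitsLdrLiteral(pc, target) ? Relaxation::LiteralLdrOfGot
                                      : Relaxation::None;
  return fitsLdrLiteral(pc, target) ? Relaxation::DirectLoad
                                    : Relaxation::AdrOrAdrpPlusLdr;
}
```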
This type accounts for more than half of all LOHs in chromium_framework.
This commit moves the eligibility checks into helper functions to
improve the readability of the LOH processing code. No functional
changes are intended for the previously implemented LOH types.
Differential Revision: https://reviews.llvm.org/D129427