Currently LLD does not generate a map file if no /out flag (e.g., /out:a.exe) is
present. This is because LLD derives the map file's name from the default output
file name is no output file name is specified explicitly on the command
line. However, in this case, the default output file name has not been set by
LLD yet when LLD tries to set the name of the map file. This patch fixes this
corner case by moving the logic handling map file flags to a place after the
default output file name is set.
Reviewed By: thakis
Differential Revision: https://reviews.llvm.org/D134559
Absolute symbol should contain its absolute value, but LLD had been
writing its RVA instead. Write its VA instead.
DefinedSynthetic were being skipped before with the reasoning "Relative
symbols are unrepresentable in a COFF symbol table", which is only true
if the RVA points to outside of a section. LLD does create synthetic
symbols which points to actual data chunks (typical for symbols embedded
into the load config directory). Write these symbols to the COFF symbol
table too.
Reviewed By: mstorsjo
Differential Revision: https://reviews.llvm.org/D134462
`__thread_vars` contains pointers to `__tlv_bootstrap`, which are fixed
up by dyld; however the section's alignment is not specified. This means
that the relocations might end up on odd addresses, which is not
representable by the soon to be added chained fixups.
This is arguably a bug in MC, but this behavior has been there since TLV
support was originally added.
This patch forces the `__thread_vars` sections to be aligned to the
target's pointer size. This is done by ld64 as well.
Differential Revision: https://reviews.llvm.org/D134594
This fixes a regression since fa74144c64dff6b145b0b3fa9397f913ddaa87bf;
even if we're linking to the dylib (which handles all the dependencies
in LLVMSupport), we're now also directly referencing zstd from
lld/ELF, and thus need to explicitly express our dependency on it.
https://reviews.llvm.org/D133003#3806508 can reproduce a non-determinism with
--threads=4. Making the config serial fixes non-determinism (by running the link
many times and compare output).
See D117853: compressing debug sections is a bottleneck and therefore it
has a large value parallizing the step.
zstd provides multi-threading API and the output is deterministic even with
different numbers of threads (see https://github.com/facebook/zstd/issues/2238).
Therefore we can leverage it instead of using the pigz-style sharding approach.
Also, switch to the default compression level 3. The current level 5
is significantly slower without providing justifying size benefit.
```
'dash b.sh 1' ran
1.05 ± 0.01 times faster than 'dash b.sh 3'
1.18 ± 0.01 times faster than 'dash b.sh 4'
1.29 ± 0.02 times faster than 'dash b.sh 5'
level=1 size: 358946945
level=3 size: 309002145
level=4 size: 307693204
level=5 size: 297828315
```
Reviewed By: andrewng, peter.smith
Differential Revision: https://reviews.llvm.org/D133679
In GNU ld,
* --version skips linker input processing.
* -v and -V keep processing if there is any input file. -V has more
information we don't support.
We currently make -V an alias for --version which skips input processing.
On many `*-freebsd` and `powerpc-*` targets, `gcc -v` passes `-V` to ld
and expects to process input. Make -V an alias for -v to provide
compatibility.
Fix https://github.com/llvm/llvm-project/issues/57859
This arg is undocumented but from looking at the code + experiment, it's used to add additional DYLD_ENVIRONMENT load commands to the output.
Differential Revision: https://reviews.llvm.org/D134058
Control Flow Guard requires specific flags and VA's be included in the
load config directory to be functional. In case CFGuard is enabled via
linker flags, we can check to make sure this is the case and give the
user a warning if otherwise.
MSVC provides a proper `_load_config_used` by default, so this is more
relevant for the MinGW target in which current versions of mingw-w64
does not provide this symbol.
The checks (only if CFGuard is enabled) include:
- The `_load_config_used` struct shall exist.
- Alignment of the `_load_config_used` struct (shall be aligned to
pointer size.)
- The `_load_config_used` struct shall be large enough to contain the
required fields.
- The values of the following fields are checked against the expected
values:
- GuardCFFunctionTable
- GuardCFFunctionCount
- GuardFlags
- GuardAddressTakenIatEntryTable
- GuardAddressTakenIatEntryCount
- GuardLongJumpTargetTable
- GuardLongJumpTargetCount
- GuardEHContinuationTable
- GuardEHContinuationCount
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D133099
Import thunks themselves contain a jump or branch, which is code by
nature. Therefore the import thunk symbol should be marked as function
type in the symbol table to help with debugging.
The `__imp_` import symbol associated to the import thunk is also useful
for debugging. However, when the import symbol isn't directly referenced
outside of the import thunk, it doesn't normally get added to the symbol
table. This change teaches LLD to add the import symbol explicitly.
Reviewed By: mstorsjo
Differential Revision: https://reviews.llvm.org/D134169
They may modify thinlto optimization.
This patch only extends support for `-mllvm`. There is another way to
pass llvm flags, `-plugin-opt`, but its processing is different and will
be provided in a subsequent patch.
Differential Revision: https://reviews.llvm.org/D134013
This improves consistency with other places (e.g. llvm::compression::decompress,
llvm::object::Decompressor::decompress, llvm-objcopy).
Note: when zstd::uncompress was added, we noticed that the API `ZSTD_decompress`
is fine while the zlib API `uncompress` is a misnomer.
This commit moves the parsing of linker optimization hints into
`ARM64::applyOptimizationHints`. This lets us avoid allocating memory
for holding the parsed information, and moves work out of
`ObjFile::parse`, which is not parallelized at the moment.
This change reduces the overhead of processing LOHs to 25-30 ms when
linking Chromium Framework on my M1 machine; previously it took close to
100 ms.
There's no statistically significant change in runtime for a --threads=1
link.
Performance figures with all 8 cores utilized:
N Min Max Median Avg Stddev
x 20 3.8027232 3.8760762 3.8505335 3.8454145 0.026352574
+ 20 3.7019017 3.8660538 3.7546209 3.7620371 0.032680043
Difference at 95.0% confidence
-0.0833775 +/- 0.019
-2.16823% +/- 0.494094%
(Student's t, pooled s = 0.0296854)
Differential Revision: https://reviews.llvm.org/D133439
This is similar to the `-alias` CLI option, but it gives finer-grained
control in that it allows the aliased symbols to be treated as private
externs.
While working on this, I realized that our `-alias` handling did not
cover the cases where the aliased symbol is a common or dylib symbol,
nor the case where we have an undefined that gets treated specially and
converted to a defined later on. My N_INDR handling neglects this too
for now; I've added checks and TODO messages for these.
`N_INDR` symbols cropped up as part of our attempt to link swift-stdlib.
Reviewed By: #lld-macho, thakis, thevinster
Differential Revision: https://reviews.llvm.org/D133825
This is similar to the `-alias` CLI option, but it gives finer-grained
control in that it allows the aliased symbols to be treated as private
externs.
While working on this, I realized that our `-alias` handling did not
cover the cases where the aliased symbol is a common or dylib symbol,
nor the case where we have an undefined that gets treated specially and
converted to a defined later on. My N_INDR handling neglects this too
for now; I've added checks and TODO messages for these.
`N_INDR` symbols cropped up as part of our attempt to link swift-stdlib.
Reviewed By: #lld-macho, thakis, thevinster
Differential Revision: https://reviews.llvm.org/D133825
This is what ld64 does (though it doesn't use ICF to do this; instead it
always dedups selrefs by default).
We'll want to dedup implicitly-defined selrefs as well, but I will leave
that for future work.
Additionally, I'm not *super* happy with the current LLD implementation
because I think it is rather janky and inefficient. But at least it
moves us toward the goal of closing the size gap with ld64. I've
described ideas for cleaning up our implementation here:
https://github.com/llvm/llvm-project/issues/57714
Differential Revision: https://reviews.llvm.org/D133780
A simple sed doing these substitutions:
- `${LLVM_BINARY_DIR}/lib${LLVM_LIBDIR_SUFFIX}\>` -> `${LLVM_LIBRARY_DIR}`
- `${LLVM_BINARY_DIR}/bin\>` -> `${LLVM_TOOLS_BINARY_DIR}`
where `\>` means "word boundary".
The only manual modifications were reverting changes in
- `runtimes/CMakeLists.txt`
because these were "entry points" where we wanted to tread carefully not not introduce a "loop" which would end with an undefined variable being expanded to nothing.
There are some `${LLVM_BINARY_DIR}/lib` without the `${LLVM_LIBDIR_SUFFIX}`, but these refer to the lib subdirectory of the source (`llvm/lib`). That `lib` is automatically appended to make the local `CMAKE_CURRENT_BINARY_DIR` value by `add_subdirectory`; since the directory name in the source tree is fixed without any suffix, the corresponding `CMAKE_CURRENT_BINARY_DIR` will also be. We therefore do not replace it but leave it as-is.
This picks up where D133828 left off, getting the occurrences with*out* `CMAKE_CFG_INTDIR`. But this is difficult to do correctly and so not done in the (retroactively) previous diff.
This hopefully increases readability overall, and also decreases the usages of `LLVM_LIBDIR_SUFFIX`, preparing us for D130586.
Reviewed By: sebastian-ne
Differential Revision: https://reviews.llvm.org/D132316
On Unix platforms, this wrapper function is inline, so it should
expand to the same direct access to the thread local variable. On
Windows, it's a non-inline function within Parallel.cpp, allowing
making the thread_local variable static.
Windows Native TLS doesn't support direct access to thread local
variables in a different DLL, and GCC/binutils on Windows occasionally
has problems with non-static thread local variables too.
This fixes mingw dylib builds with native TLS after
e6aebff674.
At the same time, move the whole thread local variable within
#if LLVM_ENABLE_THREADS
to fix builds without threading support.
Differential Revision: https://reviews.llvm.org/D133759
R_RISCV_CALL/R_RISCV_CALL_PLT distinction isn't necessary. R_RISCV_CALL has been
deprecated as a resolution to
https://github.com/riscv-non-isa/riscv-elf-psabi-doc/issues/98 .
ld.lld and mold treat the two relocation types the same. GNU ld has a custom
handling for undefined weak functions which is unnecessary: calling an
unresolved undefined weak function is UB and GNU ld can handle the case without
a relocation error (such a function call is usually guarded by a zero value
check and should be allowed).
This patch assembles `call foo` to use R_RISCV_CALL_PLT instead of the
deprecated R_RISCV_CALL.
Note: the code generator still differentiates `call foo` and (maybe preemptible)
`call foo@plt`, but the difference is purely aesthetic.
Note: D105429 does not support R_RISCV_CALL_PLT correctly. Changed the test to
force R_RISCV_CALL for now.
Reviewed By: kito-cheng
Differential Revision: https://reviews.llvm.org/D132530
So @keith observed
[here](https://reviews.llvm.org/D128108#inline-1263900) that the
StringRefs we were returning from `CStringInputSection::getStringRef()`
included the null terminator in their total length, but regular
StringRefs do not. Let's fix that so these StringRefs are less confusing
to use.
Reviewed By: #lld-macho, keith, Roger
Differential Revision: https://reviews.llvm.org/D133728
When the same bitcode object file is given multiple times from the Command-line
as lazy object file, empty index is generated which overwrites the one from thinlink.
This could cause undefined symbols during final link.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D133740
Previously, we would add entries to DataInCodeSection in the order they
appeared in input files. Because of this, entries would not be sorted if
sections were reordered due to e.g. `-order_file` or call graph profile
sorting. ld64 always keeps data-in-code information sorted.
This commit also fixes an incorrect assertion. The original assertion
from D103006 used to check that data-in-code entries are sorted in the
input objects -- likely because we use binary search on that data. In
D115556, the assertion was moved into `collectDataInCodeEntries`, but
the checked variable's name was not changed, so it ended up checking the
final contents of the DataInCodeSection.
We no longer crash when building LLVM with PGO using an asserts build of
LLD as the linker.
Fixes https://bugs.chromium.org/p/chromium/issues/detail?id=1265937
Numbers for linking the Chromium Framework reproducer from #48001, which
has 6829 data-in-code entries:
x before
+ after
N Min Max Median Avg Stddev
x 20 2.1076453 2.3059683 2.1132485 2.1350302 0.049905767
+ 20 2.1069031 2.3915262 2.14465 2.1728429 0.084065898
No difference proven at 95.0% confidence
Differential Revision: https://reviews.llvm.org/D133581
* Change `Symbol::flags` to a `std::atomic<uint16_t>`
* Add `llvm::parallel::threadIndex` as a thread-local non-negative integer
* Add `relocsVec` to part.relaDyn and part.relrDyn so that relative relocations can be added without a mutex
* Arbitrarily change -z nocombreloc to move relative relocations to the end. Disable parallelism for deterministic output.
MIPS and PPC64 use global states for relocation scanning. Keep serial scanning.
Speed-up with mimalloc and --threads=8 on an Intel Skylake machine:
* clang (Release): 1.27x as fast
* clang (Debug): 1.06x as fast
* chrome (default): 1.05x as fast
* scylladb (default): 1.04x as fast
Speed-up with glibc malloc and --threads=16 on a ThunderX2 (AArch64):
* clang (Release): 1.31x as fast
* scylladb (default): 1.06x as fast
Reviewed By: andrewng
Differential Revision: https://reviews.llvm.org/D133003
1. Fixed rST hyperlink syntax
2. Renamed LD64 -> ld64
3. Moved up the `-no_deduplicate` section so it is right under the
section talking about how our default dedup behavior differs; IMO
it makes more sense to read them in that order
4. De-bullet-listed some other sections so we have less whitespace in
the rendered page
5. Since the Mach-O LLD Port page has only one sub-page, don't render an
entire toctree with just one item. Use a "See also" box instead.
6. Wrap lines at 80 chars.
Reviewed By: #lld-macho, thevinster
Differential Revision: https://reviews.llvm.org/D133717
pdb-natvis test fails on Arm Windows as generated byte sizes and crc
differ for Windows on Arm. This patch fixes the test to make it pass on
Arm/Windows.
LLVM contains a helpful function for getting the size of a C-style
array: `llvm::array_lengthof`. This is useful prior to C++17, but not as
helpful for C++17 or later: `std::size` already has support for C-style
arrays.
Differential Revision: https://reviews.llvm.org/D133598
`clang -gz=zstd a.o` passes this option to the linker. This option compresses output
debug sections with zstd and sets ch_type to ELFCOMPRESS_ZSTD. As of today, very
few DWARF consumers recognize ELFCOMPRESS_ZSTD.
Use the llvm::zstd::compress API with level llvm::zstd::DefaultCompression (5),
which we may tune after we have more experience with zstd output.
zstd has built-in parallel compression support (so we don't need to do D117853
for zlib), which is not leveraged yet.
Reviewed By: peter.smith
Differential Revision: https://reviews.llvm.org/D133548
so that lld accepts relocatable object files produced by `clang -c -g -gz=zstd`.
We don't want to increase the size of InputSection, so do redundant but cheap
ch_type checks instead.
Differential Revision: https://reviews.llvm.org/D129406
These will be LLD-specific options to support Control Flow Guard for the
MinGW target. They are disabled by default, but enabling `--guard-cf`
will also enable `--guard-longjmp` unless `--no-guard-longjmp` is also
specified. These options maps to `-guard:cf,[no]longjmp`.
Note that these features require the `_load_config_used` symbol to
contain the load config directory and be filled with the required
symbols. While current versions of mingw-w64 do not supply this symbol,
the user can provide their own version of it.
Reviewed By: MaskRay, rnk
Differential Revision: https://reviews.llvm.org/D132808
This is MSVC's behaviour. LLD was matching it before D99078. Let's go back this way.
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D132901
LLVM bitcode contains support for weak symbols, so we can add support
for overriding weak symbols in the output COFF even though COFF doesn't
have inherent support for weak symbols.
The motivation for this patch is that Chromium is trying to use libc++'s
assertion handler mechanism, which relies on weak symbols [0], but we're
unable to perform a ThinLTO build on Windows due to this problem [1].
[0]: https://reviews.llvm.org/D121478
[1]: https://crrev.com/c/3863576
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D133165
Bug: An assertion fails:
Assertion failed: isa<To>(Val) && "cast<Ty>() argument of incompatible type!",
file C:\Users\<user>\prog\llvm\llvm-git-lld-bug\llvm\include\llvm/Support/Casting.h, line 578
Bug is triggered, if
- a map file is requested with /MAP, and
- Architecture is ARMv7, Thumb, and
- a relative jump (branch instruction) is greater than 16 MiB (2^24)
The reason for the Bug is:
- a Thunk is created for the jump
- a Symbol for the Thunk is created
- of type `DefinedSynthetic`
- in file `Writer.cpp`
- in function `getThunk`
- the Symbol has no name
- when creating the map file, the name of the Symbol is queried
- the function `Symbol::computeName` of the base class `Symbol`
casts the `this` pointer to type `DefinedCOFF` (a derived type),
but the acutal type is `DefinedSynthetic`
- The in the llvm::cast an assertion fails
Changes:
- Modify regression test to trigger this bug
- Give the symbol pointing to the thunk a name, to fix the bug
- Add assertion, that only DefinedCOFF symbols are allowed to have an
empty name, when the constructor of the base class Symbol is executed
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D133201
I wasn't able to find any docs for Mach-O in `lld/docs`, so here's an attempt at adding basic docs. One of my goals here is to make it easy for users who are unfamiliar with linkers to successfully use lld.
Reviewed By: #lld-macho, int3
Differential Revision: https://reviews.llvm.org/D132893
This commit removes the `relocTargets` vector, and instead makes the
code reconstruct the referent addresses from the relocated instructions.
This will allow us to move `applyOptimizationHints` from
`ConcatInputSection::writeTo` to a separate pass that parses and applies
LOHs in one step, on a per-file basis. This will improve performance, as
parsing is currently done serially in `ObjFile::parse`.
I opted to remove the sanity check that ensures that all relocations
within a LOH point to the same symbol. This completely eliminates the
need to search through relocations. It is my understanding that
mismatched relocation targets should not be present in valid object
files, so it's unlikely that the removal will lead to mislinks.
Differential Revision: https://reviews.llvm.org/D133274