Remove a stray -lib argument in guardcf-lto.ll; llvm-lib doesn't
support generating import libs from a def file unlike lib.exe.
Previously this worked because the -lib argument was ignored
(printing only a warning).
Differential Revision: https://reviews.llvm.org/D96699
/reproduce: now works correctly with:
- /call-graph-ordering-file:
- /def:
- /natvis:
- /order:
- /pdbstream:
I went through all instances of MemoryBuffer::getFile() and made sure
everything that didn't already do so called takeBuffer().
For natvis, that wasn't possible since DebugInfo/PDB wants to take
owernship of the natvis buffer. For that case, I'm manually adding the
tar file entry.
/natvis: and /pdbstream: is slightly awkward, since createResponseFile()
always adds these flags to the response file but createPDB() (which
ultimately adds the files referenced by the flags) is only called if
/debug is also passed. So when using /natvis: without /debug with
/reproduce:, lld won't warn, but when linking using the response
file from the archive, it won't find the natvis file since it's not
in the tar. This isn't a new issue though, and after this patch things
at least work with using /natvis: _with_ debug with /reproduce:.
(Same for /pdbstream:)
Differential Revison: https://reviews.llvm.org/D97212
On z/OS, other error messages are not matched correctly in lit tests.
```
EDC5121I Invalid argument.
EDC5111I Permission denied.
```
This patch adds a lit substitution to fix it.
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D95808
On z/OS, the following error message is not matched correctly in lit tests.
```
EDC5129I No such file or directory.
```
This patch uses a lit config substitution to check for platform specific error messages.
Reviewed By: muiez, jhenderson
Differential Revision: https://reviews.llvm.org/D95246
The relocation offsets were incorrect. I fixed them with llvm-readobj
-codeview -codeview-subsection-bytes, which has a helpful printout of
the relocations that apply to a given symbol record with their offsets.
With this, I was able to update the relocation offsets in the yaml to
fix the line table and the S_DEFRANGE_REGISTER records.
There is still some remaining inconsistency in yaml2obj and obj2yaml
when round tripping MSVC objects, but that isn't a blocker for relanding
D94267.
Such files (Thin-%%%%%%.tmp.o) are supposed to be deleted immediately
after they're used (either by renaming or deletion). However, we've seen
instances on Windows where this doesn't happen, probably due to the
filesystem being flaky. This is effectively a resource leak which has
prevented us from using the ThinLTO cache on Windows.
Since those temporary files are in the thinlto cache directory which we
prune periodically anyway, allowing them to be pruned too seems like a
tidy way to solve the problem.
Differential revision: https://reviews.llvm.org/D94962
On z/OS, the following error message is not matched correctly in lit tests. This patch updates the CHECK expression to match successfully.
```
EDC5129I No such file or directory.
```
Reviewed By: muiez
Differential Revision: https://reviews.llvm.org/D94239
Before this patch, when using LLD with /DEBUG:GHASH and MSVC precomp.OBJ files, we had a bunch of:
lld-link: warning: S_[GL]PROC32ID record in blabla.obj refers to PDB item index 0x206ED1 which is not a LF[M]FUNC_ID record
This was caused by LF_FUNC_ID and LF_MFUNC_ID which didn't have correct mapping to the corresponding TPI records. The root issue was that the indexMapStorage was improperly re-assembled in UsePrecompSource::remapTpiWithGHashes.
After this patch, /DEBUG and /DEBUG:GHASH produce exactly the same debug infos in the PDB.
Differential Revision: https://reviews.llvm.org/D93732
Fixes issue where if a line section doesn't start with a line number
then the addresses at the beginning of the section don't have line numbers.
For example, for a line section like this
```
0001:00000010-00000014, line/column/addr entries = 1
7 00000013 !
```
a line number wouldn't be found for addresses from 10 to 12.
This matches behavior when using the DIA SDK.
Differential Revision: https://reviews.llvm.org/D93306
Similar to D77853. Change ADRP to print the target address in hex, instead of the raw immediate.
The behavior is similar to GNU objdump but we also include `0x`.
Note: GNU objdump is not consistent whether or not to emit `0x` for different architectures. We try emitting 0x consistently for all targets.
```
GNU objdump: adrp x16, 10000000
Old llvm-objdump: adrp x16, #0
New llvm-objdump: adrp x16, 0x10000000
```
`adrp Xd, 0x...` assembles to a relocation referencing `*ABS*+0x10000` which is not intended. We need to use a linker or use yaml2obj.
The main test is `test/tools/llvm-objdump/ELF/AArch64/pcrel-address.yaml`
Differential Revision: https://reviews.llvm.org/D93241
The existing code handles this correctly and I checked that the code
in NativeInlineSiteSymbol also handles this correctly, but it was
wrong in the NativeFunctionSymbol code.
Differential Revision: https://reviews.llvm.org/D92134
Also sync help texts for the option between elf and coff ports.
Decisions:
- Do this even if /lldignoreenv is passed. /reproduce: does not affect
the main output, and this makes the env var more convenient to use.
(On the other hand, it's now possible to set this env var and forget
about it, and all future builds in the same shell will be much slower.
That's true for ld.lld, but posix shells have an easy way to set an
env var for a single command; in cmd.exe this is not possible without
contortions. Then again, lld-link runs in posix shells too.)
Original patch rebased across D68378 and D68381.
Differential Revision: https://reviews.llvm.org/D67707
In https://reviews.llvm.org/D89072 I added static const data members
to the debug subsection for globals. It skipped emitting an S_CONSTANT if it
didn't have a value, which meant the subsection could be empty.
This patch fixes the empty subsection issue.
Differential Revision: https://reviews.llvm.org/D92049
llvm-symbolizer used to use the DIA SDK for symbolization on
Windows; this patch switches to using native symbolization, which was
implemented recently.
Users can still make the symbolizer use DIA by adding the `-dia` flag
in the LLVM_SYMBOLIZER_OPTS environment variable.
Differential Revision: https://reviews.llvm.org/D91814
This patch adds support for creating Guard Address-Taken IAT Entry Tables (.giats$y sections) in object files, matching the behavior of MSVC. These contain lists of address-taken imported functions, which are used by the linker to create the final GIATS table.
Additionally, if any DLLs are delay-loaded, the linker must look through the .giats tables and add the respective load thunks of address-taken imports to the GFIDS table, as these are also valid call targets.
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D87544
This is a follow-up for D70378 (Cover usage of LLD as a library).
While debugging an intermittent failure on a bot, I recalled this scenario which
causes the issue:
1.When executing lld/test/ELF/invalid/symtab-sh-info.s L45, we reach
lld:🧝:Obj-File::ObjFile() which goes straight into its base ELFFileBase(),
then ELFFileBase::init().
2.At that point fatal() is thrown in lld/ELF/InputFiles.cpp L381, leaving a
half-initialized ObjFile instance.
3.We then end up in lld::exitLld() and since we are running with LLD_IN_TEST, we
hapily restore the control flow to CrashRecoveryContext::RunSafely() then back
in lld::safeLldMain().
4.Before this patch, we called errorHandler().reset() just after, and this
attempted to reset the associated SpecificAlloc<ObjFile<ELF64LE>>. That tried
to free the half-initialized ObjFile instance, and more precisely its
ObjFile::dwarf member.
Sometimes that worked, sometimes it failed and was catched by the
CrashRecoveryContext. This scenario was the reason we called
errorHandler().reset() through a CrashRecoveryContext.
But in some rare cases, the above repro somehow corrupted the heap, creating a
stack overflow. When the CrashRecoveryContext's filter (that is,
__except (ExceptionFilter(GetExceptionInformation()))) tried to handle the
exception, it crashed again since the stack was exhausted -- and that took the
whole application down. That is the issue seen on the bot. Locally it happens
about 1 times out of 15.
Now this situation can happen anywhere in LLD. Since catching stack overflows is
not a reliable scenario ATM when using CrashRecoveryContext, we're now
preventing further re-entrance when such failures occur, by signaling
lld::SafeReturn::canRunAgain=false. When running with LLD_IN_TEST=2 (or above),
only one iteration will be executed, instead of two.
Differential Revision: https://reviews.llvm.org/D88348
This broke both Firefox and Chromium (PR47905) due to what seems like dllimport
function not being handled correctly.
> This patch adds support for creating Guard Address-Taken IAT Entry Tables (.giats$y sections) in object files, matching the behavior of MSVC. These contain lists of address-taken imported functions, which are used by the linker to create the final GIATS table.
> Additionally, if any DLLs are delay-loaded, the linker must look through the .giats tables and add the respective load thunks of address-taken imports to the GFIDS table, as these are also valid call targets.
>
> Reviewed By: rnk
>
> Differential Revision: https://reviews.llvm.org/D87544
This reverts commit cfd8481da1.
This is more or less a port of rL329598 (D45275) to the COFF linker.
Since there were already LTO-related settings under -opt:, I added
them there instead of new flags.
Differential Revision: https://reviews.llvm.org/D90624
Add a simple forwarding option in the MinGW frontend, and implement
the private -wrap option in the COFF linker.
The feature in lld-link isn't gated by the -lldmingw option, but
the option is left as a private, undocumented option primarily
used by the MinGW driver.
The implementation is significantly based on the support for --wrap
in the ELF linker, but many small nuance details are different
between the ELF and COFF linkers, ending up with more than a few
implementation differences.
This fixes https://bugs.llvm.org/show_bug.cgi?id=47384.
Differential Revision: https://reviews.llvm.org/D89004
Reapplied with the bitfield member canInline fixed so it doesn't break
builds targeting windows.
This reverts commit a012c704b5.
Breaks Windows builds.
C:\src\llvm-mint\lld\COFF\Symbols.cpp(26,1): error: static_assert failed due to requirement 'sizeof(lld::coff::SymbolUnion) <= 48' "symbols should be optimized for memory usage"
static_assert(sizeof(SymbolUnion) <= 48,
Add a simple forwarding option in the MinGW frontend, and implement
the private -wrap option in the COFF linker.
The feature in lld-link isn't gated by the -lldmingw option, but
the option is left as a private, undocumented option primarily
used by the MinGW driver.
The implementation is significantly based on the support for --wrap
in the ELF linker, but many small nuance details are different
between the ELF and COFF linkers, ending up with more than a few
implementation differences.
This fixes https://bugs.llvm.org/show_bug.cgi?id=47384.
Differential Revision: https://reviews.llvm.org/D89004
Fixes https://bugs.llvm.org/show_bug.cgi?id=46473
LLD wasn't previously specifying any specific alignment in the TLS table's Characteristics field so the loader would just assume the default value (16 bytes). This works most of the time except if you have thread locals that want specific higher alignments (e.g. 32 as in the bug) *even* if they specify an alignment on the thread local. This change updates LLD to take the max alignment from tls section.
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D88637
Revert individual wip commits and will instead follow up with a
single commit with all the changes. Makes cherry-picking easier
and will contain all the right tags.
This reverts commit 32a4ad3b6c.
This reverts commit 7fe13af676.
This reverts commit 51fbc1bef6.
This reverts commit f80950a8bb.
This reverts commit 0778cad9f3.
This reverts commit 8b70d527d7.
This reverts 9b5b305023 and fixes the unwanted re-ordering when generating ThinLTO indexes.
The goal of this patch is to better balance thread utilization during ThinLTO in-process linking (in llvm-lto2 or in LLD). Before this patch, large modules would often be scheduled late during execution, taking a long time to complete, thus starving the thread pool.
We now sort modules in descending order, based on each module's bitcode size, so that larger modules are processed first. By doing so, smaller modules have a better chance to keep the thread pool active, and thus avoid starvation when the bitcode compilation is almost complete.
In our case (on dual Intel Xeon Gold 6140, Windows 10 version 2004, two-stage build), this saves 15 sec when linking `clang.exe` with LLD & -flto=thin, /opt:lldltojobs=all, no ThinLTO cache, -DLLVM_INTEGRATED_CRT_ALLOC=d:\git\rpmalloc.
Before patch: 100 sec
After patch: 85 sec
Inspired by the work done by David Callahan in D60495.
Differential Revision: https://reviews.llvm.org/D87966
This patch adds support for creating Guard Address-Taken IAT Entry Tables (.giats$y sections) in object files, matching the behavior of MSVC. These contain lists of address-taken imported functions, which are used by the linker to create the final GIATS table.
Additionally, if any DLLs are delay-loaded, the linker must look through the .giats tables and add the respective load thunks of address-taken imports to the GFIDS table, as these are also valid call targets.
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D87544
The MinGW driver has separate options for OS and subsystem version.
Having this available in lld-link allows the MinGW driver to both match
GNU ld better and simplifies the code for merging two (potentially
mismatching) arguments into one.
Differential Revision: https://reviews.llvm.org/D88802
Parse the components as decimal, instead of decuding the base from
the string. This avoids ambiguity if the second number contains leading
zeros, which previously were parsed as indicating an octal number.
MS link.exe doesn't support hexadecimal numbers in the version numbers,
neither in /version nor in /subsystem.
Differential Revision: https://reviews.llvm.org/D88801
This adds the following two new lines to /summary:
21351 Input OBJ files (expanded from all cmd-line inputs)
61 PDB type server dependencies
38 Precomp OBJ dependencies
1420669231 Input type records <<<<
78665073382 Input type records bytes <<<<
8801393 Merged TPI records
3177158 Merged IPI records
59194 Output PDB strings
71576766 Global symbol records
25416935 Module symbol records
2103431 Public symbol records
Differential Revision: https://reviews.llvm.org/D88703
Before this patch /summary was crashing with some .PCH.OBJ files, because tpiMap[srcIdx++] was reading at the wrong location. When the TpiSource depends on a .PCH.OBJ file, the types should be offset by the previously merged PCH.OBJ set of indices.
Differential Revision: https://reviews.llvm.org/D88678
This patch adds support for creating Guard Address-Taken IAT Entry Tables (.giats$y sections) in object files, matching the behavior of MSVC. These contain lists of address-taken imported functions, which are used by the linker to create the final GIATS table.
Additionally, if any DLLs are delay-loaded, the linker must look through the .giats tables and add the respective load thunks of address-taken imports to the GFIDS table, as these are also valid call targets.
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D87544
Stored Error objects have to be checked, even if they are success
values.
This reverts commit 8d250ac3cd.
Relands commit 49b3459930655d879b2dc190ff8fe11c38a8be5f..
Original commit message:
-----------------------------------------
This makes type merging much faster (-24% on chrome.dll) when multiple
threads are available, but it slightly increases the time to link (+10%)
when /threads:1 is passed. With only one more thread, the new type
merging is faster (-11%). The output PDB should be identical to what it
was before this change.
To give an idea, here is the /time output placed side by side:
BEFORE | AFTER
Input File Reading: 956 ms | 968 ms
Code Layout: 258 ms | 190 ms
Commit Output File: 6 ms | 7 ms
PDB Emission (Cumulative): 6691 ms | 4253 ms
Add Objects: 4341 ms | 2927 ms
Type Merging: 2814 ms | 1269 ms -55%!
Symbol Merging: 1509 ms | 1645 ms
Publics Stream Layout: 111 ms | 112 ms
TPI Stream Layout: 764 ms | 26 ms trivial
Commit to Disk: 1322 ms | 1036 ms -300ms
----------------------------------------- --------
Total Link Time: 8416 ms 5882 ms -30% overall
The main source of the additional overhead in the single-threaded case
is the need to iterate all .debug$T sections up front to check which
type records should go in the IPI stream. See fillIsItemIndexFromDebugT.
With changes to the .debug$H section, we could pre-calculate this info
and eliminate the need to do this walk up front. That should restore
single-threaded performance back to what it was before this change.
This change will cause LLD to be much more parallel than it used to, and
for users who do multiple links in parallel, it could regress
performance. However, when the user is only doing one link, it's a huge
improvement. In the future, we can use NT worker threads to avoid
oversaturating the machine with work, but for now, this is such an
improvement for the single-link use case that I think we should land
this as is.
Algorithm
----------
Before this change, we essentially used a
DenseMap<GloballyHashedType, TypeIndex> to check if a type has already
been seen, and if it hasn't been seen, insert it now and use the next
available type index for it in the destination type stream. DenseMap
does not support concurrent insertion, and even if it did, the linker
must be deterministic: it cannot produce different PDBs by using
different numbers of threads. The output type stream must be in the same
order regardless of the order of hash table insertions.
In order to create a hash table that supports concurrent insertion, the
table cells must be small enough that they can be updated atomically.
The algorithm I used for updating the table using linear probing is
described in this paper, "Concurrent Hash Tables: Fast and General(?)!":
https://dl.acm.org/doi/10.1145/3309206
The GHashCell in this change is essentially a pair of 32-bit integer
indices: <sourceIndex, typeIndex>. The sourceIndex is the index of the
TpiSource object, and it represents an input type stream. The typeIndex
is the index of the type in the stream. Together, we have something like
a ragged 2D array of ghashes, which can be looked up as:
tpiSources[tpiSrcIndex]->ghashes[typeIndex]
By using these side tables, we can omit the key data from the hash
table, and keep the table cell small. There is a cost to this: resolving
hash table collisions requires many more loads than simply looking at
the key in the same cache line as the insertion position. However, most
supported platforms should have a 64-bit CAS operation to update the
cell atomically.
To make the result of concurrent insertion deterministic, the cell
payloads must have a priority function. Defining one is pretty
straightforward: compare the two 32-bit numbers as a combined 64-bit
number. This means that types coming from inputs earlier on the command
line have a higher priority and are more likely to appear earlier in the
final PDB type stream than types from an input appearing later on the
link line.
After table insertion, the non-empty cells in the table can be copied
out of the main table and sorted by priority to determine the ordering
of the final type index stream. At this point, item and type records
must be separated, either by sorting or by splitting into two arrays,
and I chose sorting. This is why the GHashCell must contain the isItem
bit.
Once the final PDB TPI stream ordering is known, we need to compute a
mapping from source type index to PDB type index. To avoid starting over
from scratch and looking up every type again by its ghash, we save the
insertion position of every hash table insertion during the first
insertion phase. Because the table does not support rehashing, the
insertion position is stable. Using the array of insertion positions
indexed by source type index, we can replace the source type indices in
the ghash table cells with the PDB type indices.
Once the table cells have been updated to contain PDB type indices, the
mapping for each type source can be computed in parallel. Simply iterate
the list of cell positions and replace them with the PDB type index,
since the insertion positions are no longer needed.
Once we have a source to destination type index mapping for every type
source, there are no more data dependencies. We know which type records
are "unique" (not duplicates), and what their final type indices will
be. We can do the remapping in parallel, and accumulate type sizes and
type hashes in parallel by type source.
Lastly, TPI stream layout must be done serially. Accumulate all the type
records, sizes, and hashes, and add them to the PDB.
Differential Revision: https://reviews.llvm.org/D87805
This makes type merging much faster (-24% on chrome.dll) when multiple
threads are available, but it slightly increases the time to link (+10%)
when /threads:1 is passed. With only one more thread, the new type
merging is faster (-11%). The output PDB should be identical to what it
was before this change.
To give an idea, here is the /time output placed side by side:
BEFORE | AFTER
Input File Reading: 956 ms | 968 ms
Code Layout: 258 ms | 190 ms
Commit Output File: 6 ms | 7 ms
PDB Emission (Cumulative): 6691 ms | 4253 ms
Add Objects: 4341 ms | 2927 ms
Type Merging: 2814 ms | 1269 ms -55%!
Symbol Merging: 1509 ms | 1645 ms
Publics Stream Layout: 111 ms | 112 ms
TPI Stream Layout: 764 ms | 26 ms trivial
Commit to Disk: 1322 ms | 1036 ms -300ms
----------------------------------------- --------
Total Link Time: 8416 ms 5882 ms -30% overall
The main source of the additional overhead in the single-threaded case
is the need to iterate all .debug$T sections up front to check which
type records should go in the IPI stream. See fillIsItemIndexFromDebugT.
With changes to the .debug$H section, we could pre-calculate this info
and eliminate the need to do this walk up front. That should restore
single-threaded performance back to what it was before this change.
This change will cause LLD to be much more parallel than it used to, and
for users who do multiple links in parallel, it could regress
performance. However, when the user is only doing one link, it's a huge
improvement. In the future, we can use NT worker threads to avoid
oversaturating the machine with work, but for now, this is such an
improvement for the single-link use case that I think we should land
this as is.
Algorithm
----------
Before this change, we essentially used a
DenseMap<GloballyHashedType, TypeIndex> to check if a type has already
been seen, and if it hasn't been seen, insert it now and use the next
available type index for it in the destination type stream. DenseMap
does not support concurrent insertion, and even if it did, the linker
must be deterministic: it cannot produce different PDBs by using
different numbers of threads. The output type stream must be in the same
order regardless of the order of hash table insertions.
In order to create a hash table that supports concurrent insertion, the
table cells must be small enough that they can be updated atomically.
The algorithm I used for updating the table using linear probing is
described in this paper, "Concurrent Hash Tables: Fast and General(?)!":
https://dl.acm.org/doi/10.1145/3309206
The GHashCell in this change is essentially a pair of 32-bit integer
indices: <sourceIndex, typeIndex>. The sourceIndex is the index of the
TpiSource object, and it represents an input type stream. The typeIndex
is the index of the type in the stream. Together, we have something like
a ragged 2D array of ghashes, which can be looked up as:
tpiSources[tpiSrcIndex]->ghashes[typeIndex]
By using these side tables, we can omit the key data from the hash
table, and keep the table cell small. There is a cost to this: resolving
hash table collisions requires many more loads than simply looking at
the key in the same cache line as the insertion position. However, most
supported platforms should have a 64-bit CAS operation to update the
cell atomically.
To make the result of concurrent insertion deterministic, the cell
payloads must have a priority function. Defining one is pretty
straightforward: compare the two 32-bit numbers as a combined 64-bit
number. This means that types coming from inputs earlier on the command
line have a higher priority and are more likely to appear earlier in the
final PDB type stream than types from an input appearing later on the
link line.
After table insertion, the non-empty cells in the table can be copied
out of the main table and sorted by priority to determine the ordering
of the final type index stream. At this point, item and type records
must be separated, either by sorting or by splitting into two arrays,
and I chose sorting. This is why the GHashCell must contain the isItem
bit.
Once the final PDB TPI stream ordering is known, we need to compute a
mapping from source type index to PDB type index. To avoid starting over
from scratch and looking up every type again by its ghash, we save the
insertion position of every hash table insertion during the first
insertion phase. Because the table does not support rehashing, the
insertion position is stable. Using the array of insertion positions
indexed by source type index, we can replace the source type indices in
the ghash table cells with the PDB type indices.
Once the table cells have been updated to contain PDB type indices, the
mapping for each type source can be computed in parallel. Simply iterate
the list of cell positions and replace them with the PDB type index,
since the insertion positions are no longer needed.
Once we have a source to destination type index mapping for every type
source, there are no more data dependencies. We know which type records
are "unique" (not duplicates), and what their final type indices will
be. We can do the remapping in parallel, and accumulate type sizes and
type hashes in parallel by type source.
Lastly, TPI stream layout must be done serially. Accumulate all the type
records, sizes, and hashes, and add them to the PDB.
Differential Revision: https://reviews.llvm.org/D87805
In lit tests, we run each LLD invocation twice (LLD_IN_TEST=2), without shutting down the process in-between. This ensures a full cleanup is properly done between runs.
Only active for the COFF driver for now. Other drivers still use LLD_IN_TEST=1 which executes just one iteration with full cleanup, like before.
When the environment variable LLD_IN_TEST is unset, a shortcut is taken, only one iteration is executed, no cleanup for faster exit, like before.
A public API, lld::safeLldMain(), is also available when using LLD as a library.
Differential Revision: https://reviews.llvm.org/D70378
Before this patch, these two tests were emitting both a .DLL and .LIB. The output .LIB file name also happens to be an input .LIB file name. This prevented the test from executing a second time when LLD is re-entrant (LLD_IN_TEST=2).
This is a support patch for https://reviews.llvm.org/D70378.