Parse the components as decimal, instead of decuding the base from
the string. This avoids ambiguity if the second number contains leading
zeros, which previously were parsed as indicating an octal number.
MS link.exe doesn't support hexadecimal numbers in the version numbers,
neither in /version nor in /subsystem.
Differential Revision: https://reviews.llvm.org/D88801
This adds the following two new lines to /summary:
21351 Input OBJ files (expanded from all cmd-line inputs)
61 PDB type server dependencies
38 Precomp OBJ dependencies
1420669231 Input type records <<<<
78665073382 Input type records bytes <<<<
8801393 Merged TPI records
3177158 Merged IPI records
59194 Output PDB strings
71576766 Global symbol records
25416935 Module symbol records
2103431 Public symbol records
Differential Revision: https://reviews.llvm.org/D88703
Before this patch /summary was crashing with some .PCH.OBJ files, because tpiMap[srcIdx++] was reading at the wrong location. When the TpiSource depends on a .PCH.OBJ file, the types should be offset by the previously merged PCH.OBJ set of indices.
Differential Revision: https://reviews.llvm.org/D88678
This patch adds support for creating Guard Address-Taken IAT Entry Tables (.giats$y sections) in object files, matching the behavior of MSVC. These contain lists of address-taken imported functions, which are used by the linker to create the final GIATS table.
Additionally, if any DLLs are delay-loaded, the linker must look through the .giats tables and add the respective load thunks of address-taken imports to the GFIDS table, as these are also valid call targets.
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D87544
Stored Error objects have to be checked, even if they are success
values.
This reverts commit 8d250ac3cd.
Relands commit 49b3459930655d879b2dc190ff8fe11c38a8be5f..
Original commit message:
-----------------------------------------
This makes type merging much faster (-24% on chrome.dll) when multiple
threads are available, but it slightly increases the time to link (+10%)
when /threads:1 is passed. With only one more thread, the new type
merging is faster (-11%). The output PDB should be identical to what it
was before this change.
To give an idea, here is the /time output placed side by side:
BEFORE | AFTER
Input File Reading: 956 ms | 968 ms
Code Layout: 258 ms | 190 ms
Commit Output File: 6 ms | 7 ms
PDB Emission (Cumulative): 6691 ms | 4253 ms
Add Objects: 4341 ms | 2927 ms
Type Merging: 2814 ms | 1269 ms -55%!
Symbol Merging: 1509 ms | 1645 ms
Publics Stream Layout: 111 ms | 112 ms
TPI Stream Layout: 764 ms | 26 ms trivial
Commit to Disk: 1322 ms | 1036 ms -300ms
----------------------------------------- --------
Total Link Time: 8416 ms 5882 ms -30% overall
The main source of the additional overhead in the single-threaded case
is the need to iterate all .debug$T sections up front to check which
type records should go in the IPI stream. See fillIsItemIndexFromDebugT.
With changes to the .debug$H section, we could pre-calculate this info
and eliminate the need to do this walk up front. That should restore
single-threaded performance back to what it was before this change.
This change will cause LLD to be much more parallel than it used to, and
for users who do multiple links in parallel, it could regress
performance. However, when the user is only doing one link, it's a huge
improvement. In the future, we can use NT worker threads to avoid
oversaturating the machine with work, but for now, this is such an
improvement for the single-link use case that I think we should land
this as is.
Algorithm
----------
Before this change, we essentially used a
DenseMap<GloballyHashedType, TypeIndex> to check if a type has already
been seen, and if it hasn't been seen, insert it now and use the next
available type index for it in the destination type stream. DenseMap
does not support concurrent insertion, and even if it did, the linker
must be deterministic: it cannot produce different PDBs by using
different numbers of threads. The output type stream must be in the same
order regardless of the order of hash table insertions.
In order to create a hash table that supports concurrent insertion, the
table cells must be small enough that they can be updated atomically.
The algorithm I used for updating the table using linear probing is
described in this paper, "Concurrent Hash Tables: Fast and General(?)!":
https://dl.acm.org/doi/10.1145/3309206
The GHashCell in this change is essentially a pair of 32-bit integer
indices: <sourceIndex, typeIndex>. The sourceIndex is the index of the
TpiSource object, and it represents an input type stream. The typeIndex
is the index of the type in the stream. Together, we have something like
a ragged 2D array of ghashes, which can be looked up as:
tpiSources[tpiSrcIndex]->ghashes[typeIndex]
By using these side tables, we can omit the key data from the hash
table, and keep the table cell small. There is a cost to this: resolving
hash table collisions requires many more loads than simply looking at
the key in the same cache line as the insertion position. However, most
supported platforms should have a 64-bit CAS operation to update the
cell atomically.
To make the result of concurrent insertion deterministic, the cell
payloads must have a priority function. Defining one is pretty
straightforward: compare the two 32-bit numbers as a combined 64-bit
number. This means that types coming from inputs earlier on the command
line have a higher priority and are more likely to appear earlier in the
final PDB type stream than types from an input appearing later on the
link line.
After table insertion, the non-empty cells in the table can be copied
out of the main table and sorted by priority to determine the ordering
of the final type index stream. At this point, item and type records
must be separated, either by sorting or by splitting into two arrays,
and I chose sorting. This is why the GHashCell must contain the isItem
bit.
Once the final PDB TPI stream ordering is known, we need to compute a
mapping from source type index to PDB type index. To avoid starting over
from scratch and looking up every type again by its ghash, we save the
insertion position of every hash table insertion during the first
insertion phase. Because the table does not support rehashing, the
insertion position is stable. Using the array of insertion positions
indexed by source type index, we can replace the source type indices in
the ghash table cells with the PDB type indices.
Once the table cells have been updated to contain PDB type indices, the
mapping for each type source can be computed in parallel. Simply iterate
the list of cell positions and replace them with the PDB type index,
since the insertion positions are no longer needed.
Once we have a source to destination type index mapping for every type
source, there are no more data dependencies. We know which type records
are "unique" (not duplicates), and what their final type indices will
be. We can do the remapping in parallel, and accumulate type sizes and
type hashes in parallel by type source.
Lastly, TPI stream layout must be done serially. Accumulate all the type
records, sizes, and hashes, and add them to the PDB.
Differential Revision: https://reviews.llvm.org/D87805
This makes type merging much faster (-24% on chrome.dll) when multiple
threads are available, but it slightly increases the time to link (+10%)
when /threads:1 is passed. With only one more thread, the new type
merging is faster (-11%). The output PDB should be identical to what it
was before this change.
To give an idea, here is the /time output placed side by side:
BEFORE | AFTER
Input File Reading: 956 ms | 968 ms
Code Layout: 258 ms | 190 ms
Commit Output File: 6 ms | 7 ms
PDB Emission (Cumulative): 6691 ms | 4253 ms
Add Objects: 4341 ms | 2927 ms
Type Merging: 2814 ms | 1269 ms -55%!
Symbol Merging: 1509 ms | 1645 ms
Publics Stream Layout: 111 ms | 112 ms
TPI Stream Layout: 764 ms | 26 ms trivial
Commit to Disk: 1322 ms | 1036 ms -300ms
----------------------------------------- --------
Total Link Time: 8416 ms 5882 ms -30% overall
The main source of the additional overhead in the single-threaded case
is the need to iterate all .debug$T sections up front to check which
type records should go in the IPI stream. See fillIsItemIndexFromDebugT.
With changes to the .debug$H section, we could pre-calculate this info
and eliminate the need to do this walk up front. That should restore
single-threaded performance back to what it was before this change.
This change will cause LLD to be much more parallel than it used to, and
for users who do multiple links in parallel, it could regress
performance. However, when the user is only doing one link, it's a huge
improvement. In the future, we can use NT worker threads to avoid
oversaturating the machine with work, but for now, this is such an
improvement for the single-link use case that I think we should land
this as is.
Algorithm
----------
Before this change, we essentially used a
DenseMap<GloballyHashedType, TypeIndex> to check if a type has already
been seen, and if it hasn't been seen, insert it now and use the next
available type index for it in the destination type stream. DenseMap
does not support concurrent insertion, and even if it did, the linker
must be deterministic: it cannot produce different PDBs by using
different numbers of threads. The output type stream must be in the same
order regardless of the order of hash table insertions.
In order to create a hash table that supports concurrent insertion, the
table cells must be small enough that they can be updated atomically.
The algorithm I used for updating the table using linear probing is
described in this paper, "Concurrent Hash Tables: Fast and General(?)!":
https://dl.acm.org/doi/10.1145/3309206
The GHashCell in this change is essentially a pair of 32-bit integer
indices: <sourceIndex, typeIndex>. The sourceIndex is the index of the
TpiSource object, and it represents an input type stream. The typeIndex
is the index of the type in the stream. Together, we have something like
a ragged 2D array of ghashes, which can be looked up as:
tpiSources[tpiSrcIndex]->ghashes[typeIndex]
By using these side tables, we can omit the key data from the hash
table, and keep the table cell small. There is a cost to this: resolving
hash table collisions requires many more loads than simply looking at
the key in the same cache line as the insertion position. However, most
supported platforms should have a 64-bit CAS operation to update the
cell atomically.
To make the result of concurrent insertion deterministic, the cell
payloads must have a priority function. Defining one is pretty
straightforward: compare the two 32-bit numbers as a combined 64-bit
number. This means that types coming from inputs earlier on the command
line have a higher priority and are more likely to appear earlier in the
final PDB type stream than types from an input appearing later on the
link line.
After table insertion, the non-empty cells in the table can be copied
out of the main table and sorted by priority to determine the ordering
of the final type index stream. At this point, item and type records
must be separated, either by sorting or by splitting into two arrays,
and I chose sorting. This is why the GHashCell must contain the isItem
bit.
Once the final PDB TPI stream ordering is known, we need to compute a
mapping from source type index to PDB type index. To avoid starting over
from scratch and looking up every type again by its ghash, we save the
insertion position of every hash table insertion during the first
insertion phase. Because the table does not support rehashing, the
insertion position is stable. Using the array of insertion positions
indexed by source type index, we can replace the source type indices in
the ghash table cells with the PDB type indices.
Once the table cells have been updated to contain PDB type indices, the
mapping for each type source can be computed in parallel. Simply iterate
the list of cell positions and replace them with the PDB type index,
since the insertion positions are no longer needed.
Once we have a source to destination type index mapping for every type
source, there are no more data dependencies. We know which type records
are "unique" (not duplicates), and what their final type indices will
be. We can do the remapping in parallel, and accumulate type sizes and
type hashes in parallel by type source.
Lastly, TPI stream layout must be done serially. Accumulate all the type
records, sizes, and hashes, and add them to the PDB.
Differential Revision: https://reviews.llvm.org/D87805
Library users should not need to call errorHandler().reset() explicitly.
google/iree calls lld:🧝:link and without the patch some global
variables are not cleaned up in the next invocation.
In lit tests, we run each LLD invocation twice (LLD_IN_TEST=2), without shutting down the process in-between. This ensures a full cleanup is properly done between runs.
Only active for the COFF driver for now. Other drivers still use LLD_IN_TEST=1 which executes just one iteration with full cleanup, like before.
When the environment variable LLD_IN_TEST is unset, a shortcut is taken, only one iteration is executed, no cleanup for faster exit, like before.
A public API, lld::safeLldMain(), is also available when using LLD as a library.
Differential Revision: https://reviews.llvm.org/D70378
Extending the lifetime of these type index mappings does increase memory
usage (+2% in my case), but it decouples type merging from symbol
merging. This is a pre-requisite for two changes that I have in mind:
- parallel type merging: speeds up slow type merging
- defered symbol merging: avoid heap allocating (relocating) all symbols
This eliminates CVIndexMap and moves its data into TpiSource. The maps
are also split into a SmallVector and ArrayRef component, so that the
ipiMap can alias the tpiMap for /Z7 object files, and so that both maps
can simply alias the PDB type server maps for /Zi files.
Splitting TypeServerSource establishes that all input types to be merged
can be identified with two 32-bit indices:
- The index of the TpiSource object
- The type index of the record
This is useful, because this information can be stored in a single
64-bit atomic word to enable concurrent hashtable insertion.
One last change is that now all object files with debugChunks get a
TpiSource, even if they have no type info. This avoids some null checks
and special cases.
Differential Revision: https://reviews.llvm.org/D87736
Binutils generated sections seem to be padded to a multiple of 16 bytes,
but the aux section definition contains the original, unpadded section
length.
The size check used for IMAGE_COMDAT_SELECT_SAME_SIZE previously
only checked the size of the section itself. When checking the
currently processed object file against the previously chosen
comdat section, we easily have access to the aux section definition
of the currently processed section, but we have to iterate over the
symbols of the previously selected object file to find the section
definition of the previously picked section. (We don't want to
inflate SectionChunk to carry more data, for something that is only
needed in corner cases.) Only do this when the mingw flag is set.
This fixes statically linking clang-built C++ object files against
libstdc++ built with GCC, if the object files contain e.g. typeinfo.
Differential Revision: https://reviews.llvm.org/D86659
The global variable outputSections in the COFF writer was not
cleared between runs which caused successive calls to lld::coff::link
to generate invalid binaries. These binaries when loaded would result
in "invalid win32 applications" and/or "bad image" errors.
Differential Revision: https://reviews.llvm.org/D86401
This patch adds the missing information to the LF_BUILDINFO record, which allows for rebuilding a .CPP without any external dependency but the .OBJ itself (other than the compiler).
Some external tools that we are using (Recode, Live++) are extracting the information to reproduce a build without any knowledge of the build system. The LF_BUILDINFO stores a full path to the compiler, the PWD (CWD at program startup), a relative or absolute path to the TU, and the full CC1 command line. The command line needs to be freestanding (not depend on any environment variables). In the same way, MSVC doesn't store the provided command-line, but an expanded version (somehow their equivalent of CC1) which is also freestanding.
For more information see PR36198 and D43002.
Differential Revision: https://reviews.llvm.org/D80833
Previously this flag was just ignored. If set, set the
IMAGE_DLL_CHARACTERISTICS_NO_SEH bit, regardless of the normal safeSEH
machinery.
In mingw configurations, the safeSEH bit might not be set in e.g. object
files built from handwritten assembly, making it impossible to use the
normal safeseh flag. As mingw setups don't generally use SEH on 32 bit
x86 at all, it should be fine to set that flag bit though - hook up
the existing GNU ld flag for controlling that.
Differential Revision: https://reviews.llvm.org/D84701
For a weak symbol func in a comdat, the actual leader symbol ends up
named like .weak.func.default*. Likewise, for stdcall on i386, the symbol
may be named _func@4, while the section suffix only is "func", which the
previous implementation didn't handle.
This fixes unwinding through weak functions when using
-ffunction-sections in mingw environments.
Differential Revision: https://reviews.llvm.org/D84607
The "undefined symbol" error message from lld-link displays up to 3 references to that symbol, and the number of extra references not shown.
This patch removes the computation of the strings for those extra references.
It fixes a freeze of lld-link we accidentally encountered when activating asan on a large project, without linking with the asan library.
In that case, __asan_report_load8 was referenced more than 2 million times, causing the computation of that many display strings, of which only 3 were used.
Differential Revision: https://reviews.llvm.org/D83510
The `intrinsics_gen` target exists in the CMake exports since r309389
(see LLVMConfig.cmake.in), hence projects can depend on `intrinsics_gen`
even it they are built separately from LLVM.
Reviewed By: MaskRay, JDevlieghere
Differential Revision: https://reviews.llvm.org/D83454
Previously, lld would crash if the .pdata size was not an even multiple
of the expected .pdata entry size. This makes it error gracefully instead.
(We hit this in Chromium due to an assembler problem: https://crbug.com/1101577)
Differential revision: https://reviews.llvm.org/D83479
This patch adds some missing information to the LF_BUILDINFO which allows for rebuilding an .OBJ without any external dependency but the .OBJ itself (other than the compiler executable).
Some tools need this information to reproduce a build without any knowledge of the build system. The LF_BUILDINFO therefore stores a full path to the compiler, the PWD (which is the CWD at program startup), a relative or absolute path to the TU, and the full CC1 command line. The command line needs to be freestanding (not depend on any environment variable). In the same way, MSVC doesn't store the provided command-line, but an expanded version (somehow their equivalent of CC1) which is also freestanding.
For more information see PR36198 and D43002.
Differential Revision: https://reviews.llvm.org/D80833
This reduces peak memory on my test case from 1960.14MB to 1700.63MB
(-260MB, -13.2%) with no measurable impact on CPU time. I'm currently
working with a publics stream that is about 277MB. Before this change,
we would allocate 277MB of heap memory, serialize publics into them,
hold onto that heap memory, open the PDB, and commit into it. After
this change, we defer the serialization until commit time.
In the last change I made to public writing, I re-sorted the list of
publics multiple times in place to avoid allocating new temporary data
structures. Deferring serialization until later requires that we don't
reorder the publics. Instead of sorting the publics, I partially
construct the hash table data structures, store a publics index in them,
and then sort the hash table data structures. Later, I replace the index
with the symbol record offset.
This change also addresses a FIXME and moves the list of global and
public records from GSIHashStreamBuilder to GSIStreamBuilder. Now that
publics aren't being serialized, it makes even less sense to store them
as a list of CVSymbol records. The hash table used to deduplicate
globals is moved as well, since that is specific to globals, and not
publics.
Reviewed By: aganea, hans
Differential Revision: https://reviews.llvm.org/D81296
This patch adds some missing information to the LF_BUILDINFO which allows for rebuilding an .OBJ without any external dependency but the .OBJ itself (other than the compiler executable).
Some tools need this information to reproduce a build without any knowledge of the build system. The LF_BUILDINFO therefore stores a full path to the compiler, the PWD (which is the CWD at program startup), a relative or absolute path to the TU, and the full CC1 command line. The command line needs to be freestanding (not depend on any environment variable). In the same way, MSVC doesn't store the provided command-line, but an expanded version (somehow their equivalent of CC1) which is also freestanding.
For more information see PR36198 and D43002.
Differential Revision: https://reviews.llvm.org/D80833
First, do not reserve numSections in the Chunks array. In cases where
there are many non-prevailing sections, this will overallocate memory
which will not be used.
Second, free the memory for sparseChunks after initializeSymbols. After
that, it is never used.
This saves 50MB of 627MB for my use case without affecting performance.
The inlinees section contains references to the file checksum table. The
file checksum table in the PDB must have the same layout as the file
checksum table in the object file, so all the existing file id
references should stay valid.
Previously, we would do this:
for all inlined functions:
- lookup filename from checksum and string table
- make that filename absolute
- look up the new file id for that filename up in the new checksum
table
This lead to pdbMakeAbsolute and remove_dots ending up in the hot path.
We should only need to absolutify the source path once, not once every
time we process an inline function from that source file.
This speeds up linking chrome PGO stage 1 net_unittests.exe from 9.203s
to 8.500s (-7.6%). Looking just at time to process symbol records, it
goes from ~2000ms to ~1300ms, which is consistent with the overall
speedup of about 700ms. This will be less noticeable in debug builds,
which have fewer inlined functions records.
Summary:
This is a pre-requisite to parallelizing PDB symbol and type merging.
Currently this timer usage would not be thread safe.
Reviewers: aganea, MaskRay
Subscribers: jfb, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D80298
This paves the way to doing more things in parallel, and allows us to
order type sources in dependency order. PDBs and PCH objects have to be
loaded before object files which use them.
This is a rebase of the unapplied remaining changes in
https://reviews.llvm.org/D59226. I found it very challenging to rebase
this across the LLD variable name style change. I recall there was a
tool for that, but I didn't take the time to use it.
Reviewers: aganea, akhuang
Subscribers: llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D79672
Allow disabling either the full auto import feature, or just
forbidding the cases that require runtime fixups.
As long as all auto imported variables are referenced from separate
.refptr$<name> sections, we can alias them on top of the IAT entries
and don't actually need any runtime fixups via pseudo relocations.
LLVM generates references to variables in .refptr stubs, if it
isn't known that the variable for sure is defined in the same object
module. Runtime pseudo relocs are needed if the addresses of auto
imported variables are used in constant initializers though.
Fixing up runtime pseudo relocations requires the use of
VirtualProtect (which is disallowed in WinStore/UWP apps) or
VirtualProtectFromApp. To allow any risk of ambiguity, allow
rejecting cases that would require this at the linker stage.
This adds support for the --disable-runtime-pseudo-reloc and
--disable-auto-import options in the MinGW driver (matching GNU ld.bfd)
with corresponding lld private options in the COFF driver.
Differential Revision: https://reviews.llvm.org/D78923
This fixes an accidental breakage of exporting symbols using def
files, when the symbol name contains a period, since commit
0ca06f7950, mixing up a symbol name containing a period with
the case of exporting a symbol as a forward to another dll.
Differential Revision: https://reviews.llvm.org/D79619
I noticed that std::error_code() does one-time initialization. Avoid
that overhead with Expected<T> and llvm::Error. Also, it is consistent
with the virtual interface and ELF, and generally cleaner.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D79643
Reduces time to link PGO instrumented net_unittets.exe by 11% (9.766s ->
8.672s, best of three). Reduces peak memory by 65.7MB (2142.71MB ->
2076.95MB).
Use a more compact struct, BulkPublic, for faster sorting. Sort in
parallel. Construct the hash buckets in parallel. Try to use one vector
to hold all the publics instead of copying them from one to another.
Allocate all the memory needed to serialize publics up front, and then
serialize them in place in parallel.
Reviewed By: aganea, hans
Differential Revision: https://reviews.llvm.org/D79467
Before this patch, the debug record S_GTHREAD32 which represents global thread_local symbols, was emitted by LLD into the respective module stream. This makes Visual Studio unable to display thread_local symbols in the debugger.
After this patch, S_GTHREAD32 is moved into the globals stream. This matches MSVC behavior.
Differential Revision: https://reviews.llvm.org/D79005
Essentially takes the lld/Common/Threads.h wrappers and moves them to
the llvm/Support/Paralle.h algorithm header.
The changes are:
- Remove policy parameter, since all clients use `par`.
- Rename the methods to `parallelSort` etc to match LLVM style, since
they are no longer C++17 pstl compatible.
- Move algorithms from llvm::parallel:: to llvm::, since they have
"parallel" in the name and are no longer overloads of the regular
algorithms.
- Add range overloads
- Use the sequential algorithm directly when 1 thread is requested
(skips task grouping)
- Fix the index type of parallelForEachN to size_t. Nobody in LLVM was
using any other parameter, and it made overload resolution hard for
for_each_n(par, 0, foo.size(), ...) because 0 is int, not size_t.
Remove Threads.h and update LLD for that.
This is a prerequisite for parallel public symbol processing in the PDB
library, which is in LLVM.
Reviewed By: MaskRay, aganea
Differential Revision: https://reviews.llvm.org/D79390
Summary:
That unless the user requested an output object (--lto-obj-path), the an
unused empty combined module is not emitted.
This changed is helpful for some target (ex. RISCV-V) which encoded the
ABI info in IR module flags (target-abi). Empty unused module has no ABI
info so the linker would get the linking error during merging
incompatible ABIs.
Reviewers: tejohnson, espindola, MaskRay
Subscribers: emaste, inglorion, arichardson, hiraditya, simoncook, MaskRay, steven_wu, dexonsmith, PkmX, dang, lenary, s.egerton, luismarques, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D78988
Heap profiling with ETW shows that LLD performs 4,053,721 heap
allocations over its lifetime, and ~800,000 of them come from
assocEquals. These vectors are created just to do a comparison, so fuse
the comparison into the loop and avoid the allocation.
ICF is overall a small portion of the time spent linking, and I did not
measure overall throughput improvements from this change above the noise
threshold. However, these show up in the heap profiler, and the work is
done, so we might as well land it if the code is clear enough.
Reviewed By: hans
Differential Revision: https://reviews.llvm.org/D79297
This particular overload allocates memory, and we do this for every
S_[GL]PROC32_ID record. Instead, hardcode the offset of the typeindex
that we are looking for in the LF_[MEM]FUNC_ID record. We already
assumed that looking up the item index already found a record of this
kind.
Otherwise an ArgumentParser is constructed for every directive section,
and that involves copying the entire table of options into a vector.
There is no need for this, just have one option table.
This generalizes the main Windows command line tokenizer to be able to
produce StringRef substrings as well as freshly copied C strings. The
implementation is still shared with the normal tokenizer, which is
important, because we have unit tests for that.
.drective sections can be very long. They can potentially list up to
every symbol in the object file by name. It is worth avoiding these
string copies.
This saves a lot of memory when linking chrome.dll with PGO
instrumentation:
BEFORE AFTER % IMP
peak memory: 6657.76MB 4983.54MB -25%
real: 4m30.875s 2m26.250s -46%
The time improvement may not be real, my machine was noisy while running
this, but that the peak memory usage improvement should be real.
This change may also help apps that heavily use dllexport annotations,
because those also use linker directives in object files. Apps that do
not use many directives are unlikely to be affected.
Reviewed By: thakis
Differential Revision: https://reviews.llvm.org/D79262
This speeds up linking chrome.dll with PGO instrumentation by 13%
(154271ms -> 134033ms).
LLVM's Option library is very slow. In particular, it allocates at least
one large-ish heap object (Arg) for every argument. When PGO
instrumentation is enabled, all the __profd_* symbols are added to the
@llvm.used list, which compiles down to these /INCLUDE: directives. This
means we have O(#symbols) directives to parse in the section, so we end
up allocating an Arg for every function symbol in the object file. This
is unnecessary.
To address the issue and speed up the link, extend the fast path that we
already have for /EXPORT:, which has similar scaling issues.
I promise that I took a hard look at optimizing the Option library, but
its data structures are very general and would need a lot of cleanup. We
have accumulated lots of optional features (option groups, aliases,
multiple values) over the years, and these are now properties of every
parsed argument, when the vast majority of arguments do not use these
features.
Reviewed By: thakis
Differential Revision: https://reviews.llvm.org/D78845
Use the unique filenames that are used when /lldsavetemps is passed.
After this change, module names for LTO blobs in PDBs will be unique.
Visual Studio and probably other debuggers expect module names to be
unique.
Revert some changes from 1e0b158db (2017) that are no longer necessary
after removing MSVC LTO support.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D78221
The alignment of ARM64 range extension thunks was fixed in
7c81649219, but ARM range extension thunks, and import
and delay import thunks also need aligning (like all code on ARM
platforms).
I'm adding a test for alignment of ARM64 import thunks - not
specifically adding tests for misalignment of all of them though.
Differential Revision: https://reviews.llvm.org/D77796
Summary:
/PDBSTREAM:<name>=<file> adds the contents of <file> to stream <name> in the resulting PDB.
This allows native uses with workflows that (for example) add srcsrv streams to PDB files to provide a location for the build's source files.
Results should be equivalent to linking with lld-link, then running Microsoft's pdbstr tool with the command line:
pdbstr.exe -w -p:<PDB LOCATION> -s:<name> -i:<file>
except in cases where the named stream overlaps with a default named stream, such as "/names". In those cases, the added stream will be overridden, making the /pdbstream option a no-op.
Reviewers: thakis, rnk
Reviewed By: thakis
Differential Revision: https://reviews.llvm.org/D77310
--no-threads is a name copied from gold.
gold has --no-thread, --thread-count and several other --thread-count-*.
There are needs to customize the number of threads (running several lld
processes concurrently or customizing the number of LTO threads).
Having a single --threads=N is a straightforward replacement of gold's
--no-threads + --thread-count.
--no-threads is used rarely. So just delete --no-threads instead of
keeping it for compatibility for a while.
If --threads= is specified (ELF,wasm; COFF /threads: is similar),
--thinlto-jobs= defaults to --threads=,
otherwise all available hardware threads are used.
There is currently no way to override a --threads={1,2,...}. It is still
a debate whether we should use --threads=all.
Reviewed By: rnk, aganea
Differential Revision: https://reviews.llvm.org/D76885
As of a while ago, lld groups all undefined references to a single
symbol in a single diagnostic. Back then, I made it so that we
print up to 10 references to each undefined symbol.
Having used this for a while, I never wished there were more
references, but I sometimes found that this can print a lot of
output. lld prints up to 10 diagnostics by default, and if
each has 10 references (which I've seen in practice), and each
undefined symbol produces 2 (possibly very long) lines of output,
that's over 200 lines of error output.
Let's try it with just 3 references for a while and see how
that feels in practice.
Differential Revision: https://reviews.llvm.org/D77017
DWARF sections are typically live and not COMDAT, so they would be
treated as GC roots. Enabling DWARF would essentially keep all code with
debug info alive, preventing any section GC.
Fixes PR45273
Reviewed By: mstorsjo, MaskRay
Differential Revision: https://reviews.llvm.org/D76935
Before this patch, it wasn't possible to extend the ThinLTO threads to all SMT/CMT threads in the system. Only one thread per core was allowed, instructed by usage of llvm::heavyweight_hardware_concurrency() in the ThinLTO code. Any number passed to the LLD flag /opt:lldltojobs=..., or any other ThinLTO-specific flag, was previously interpreted in the context of llvm::heavyweight_hardware_concurrency(), which means SMT disabled.
One can now say in LLD:
/opt:lldltojobs=0 -- Use one std::thread / hardware core in the system (no SMT). Default value if flag not specified.
/opt:lldltojobs=N -- Limit usage to N threads, regardless of usage of heavyweight_hardware_concurrency().
/opt:lldltojobs=all -- Use all hardware threads in the system. Equivalent to /opt:lldltojobs=$(nproc) on Linux and /opt:lldltojobs=%NUMBER_OF_PROCESSORS% on Windows. When an affinity mask is set for the process, threads will be created only for the cores selected by the mask.
When N > number-of-hardware-threads-in-the-system, the threads in the thread pool will be dispatched equally on all CPU sockets (tested only on Windows).
When N <= number-of-hardware-threads-on-a-CPU-socket, the threads will remain on the CPU socket where the process started (only on Windows).
Differential Revision: https://reviews.llvm.org/D75153
Added support for /map and /map:[filepath].
The output was derived from Microsoft's Link.exe output when using that same option.
Note that /MAPINFO support was not added.
The previous implementation of MapFile.cpp/.h was meant for /lldmap, and was renamed to LLDMapFile.cpp/.h
MapFile.cpp/.h is now for /MAP
However, a small fix was added to lldmap, replacing a std::sort with std::stable_sort to enforce reproducibility.
Differential Revision: https://reviews.llvm.org/D70557
Instead, use `using namespace lld(::coff)`, and fully qualify the names
of free functions where they are defined in cpp files.
This effectively reverts d79c3be618 to follow the new style guide added
in 236fcbc21a.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D74882
This is how it should've been and brings it more in line with
std::string_view. There should be no functional change here.
This is mostly mechanical from a custom clang-tidy check, with a lot of
manual fixups. It uncovers a lot of minor inefficiencies.
This doesn't actually modify StringRef yet, I'll do that in a follow-up.
When annotating a symbol with __declspec(selectany), Clang assigns it
comdat 2 while GCC assigns it comdat 3. This patch enables two object
files that contain a __declspec(selectany) symbol, one created by gcc
and the other by clang, to be linked together instead of issuing a
duplicate symbol error.
Differential Revision: https://reviews.llvm.org/D73139
RangeExtensionThunkARM64 is created for out-of-range branches on Windows ARM64
because branch instructions has limited bits to encode target address.
Currently, RangeExtensionThunkARM64 is appended to its referencing COFF section
from object file at link time without any alignment requirement, so if size of
the preceding COFF section is not aligned to instruction boundary (4 bytes),
RangeExtensionThunkARM64 will emit thunk instructions at unaligned address
which is never a valid branch target on ARM64, and usually triggers invalid
instruction exception when branching to it.
This PR fixes it by requiring such thunks to align at 4 bytes.
Differential revision: https://reviews.llvm.org/D72473
down to pass builder in ltobackend.
Currently CodeGenOpts like UnrollLoops/VectorizeLoop/VectorizeSLP in clang
are not passed down to pass builder in ltobackend when new pass manager is
used. This is inconsistent with the behavior when new pass manager is used
and thinlto is not used. Such inconsistency causes slp vectorization pass
not being enabled in ltobackend for O3 + thinlto right now. This patch
fixes that.
Differential Revision: https://reviews.llvm.org/D72386
Both MS link.exe and GNU ld.bfd handle it this way; one can have
multiple object files defining the same absolute symbols, as long
as it defines it to the same value. But if there are multiple absolute
symbols with differing values, it is treated as an error.
Differential Revision: https://reviews.llvm.org/D71981
Summary:
I used this information to motivate splitting up the Intrinsic::ID enum
(5d986953c8) and adding a key method to
clang::Sema (586f65d31f) which saved a
fair amount of object file size.
Example output for clang.pdb:
Top 10 types responsible for the most TPI input bytes:
index total bytes count size
0x3890: 8,671,220 = 1,805 * 4,804
0xE13BE: 5,634,720 = 252 * 22,360
0x6874C: 5,181,600 = 408 * 12,700
0x2A1F: 4,520,528 = 1,574 * 2,872
0x64BFF: 4,024,020 = 469 * 8,580
0x1123: 4,012,020 = 2,157 * 1,860
0x6952: 3,753,792 = 912 * 4,116
0xC16F: 3,630,888 = 633 * 5,736
0x69DD: 3,601,160 = 985 * 3,656
0x678D: 3,577,904 = 319 * 11,216
In this case, we can see that record 0x3890 is responsible for ~8MB of
total object file size for objects in clang.
The user can then use llvm-pdbutil to find out what the record is:
$ llvm-pdbutil dump -types -type-index 0x3890
Types (TPI Stream)
============================================================
Showing 1 records.
0x3890 | LF_FIELDLIST [size = 4804]
- LF_STMEMBER [name = `WORDTYPE_MAX`, type = 0x1001, attrs = public]
- LF_MEMBER [name = `U`, Type = 0x37F0, offset = 0, attrs = private]
- LF_MEMBER [name = `BitWidth`, Type = 0x0075 (unsigned), offset = 8, attrs = private]
- LF_METHOD [name = `APInt`, # overloads = 8, overload list = 0x3805]
...
In this case, we can see that these are members of the APInt class,
which is emitted in 1805 object files.
The next largest type is ASTContext:
$ llvm-pdbutil dump -types -type-index 0xE13BE bin/clang.pdb
0xE13BE | LF_FIELDLIST [size = 22360]
- LF_BCLASS
type = 0x653EA, offset = 0, attrs = public
- LF_MEMBER [name = `Types`, Type = 0x653EB, offset = 8, attrs = private]
- LF_MEMBER [name = `ExtQualNodes`, Type = 0x653EC, offset = 24, attrs = private]
- LF_MEMBER [name = `ComplexTypes`, Type = 0x653ED, offset = 48, attrs = private]
- LF_MEMBER [name = `PointerTypes`, Type = 0x653EE, offset = 72, attrs = private]
...
ASTContext only appears 252 times, but the list of members is long, and
must be repeated everywhere it is used.
This was the output before I split Intrinsic::ID:
Top 10 types responsible for the most TPI input:
0x686C: 69,823,920 = 1,070 * 65,256
0x686D: 69,819,640 = 1,070 * 65,252
0x686E: 69,819,640 = 1,070 * 65,252
0x686B: 16,371,000 = 1,070 * 15,300
...
These records were all lists of intrinsic enums.
Reviewers: MaskRay, ruiu
Subscribers: mgrang, zturner, thakis, hans, akhuang, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71437
One instance looks like a false positive:
lld/ELF/Relocations.cpp:1622:14: note: use reference type 'const std::pair<ThunkSection *, uint32_t> &' (aka 'cons
t pair<lld:🧝:ThunkSection *, unsigned int> &') to prevent copying
for (const std::pair<ThunkSection *, uint32_t> ts : isd->thunkSections)
It is not changed in this commit.
Previously this caused crashes in the reportDuplicate method.
A DefinedAbsolute doesn't have any InputFile attached to it, so we
can't report the file for the original symbol.
We could add an InputFile argument to SymbolTable::addAbsolute
only for the sake of error reporting, but even then it'd be assymetrical,
only pointing out the file containing the new conflicting definition,
not the original one.
Differential Revision: https://reviews.llvm.org/D71679
Remove the lld::enableColors function, as it just obscures which
stream it's affecting, and replace with explicit calls to the stream's
enable_colors.
Also, assign the stderrOS and stdoutOS globals first in link function,
just to ensure nothing might use them.
(Either change individually fixes the issue of using the old
stream, but both together seems best.)
Follow-up to b11386f9be.
Differential Revision: https://reviews.llvm.org/D70492
In lld we rarely use std::unique_ptr but instead allocate new instances
using lld::make<T>() so that they are deallocated at the end of linking.
This patch changes existing code so that that follows the convention.
Differential Revision: https://reviews.llvm.org/D70420
This change is for those who use lld as a library. Context:
https://reviews.llvm.org/D70287
This patch adds a new parmeter to lld::*::link() so that we can pass
an raw_ostream object representing stdout. Previously, lld::*::link()
took only an stderr object.
Justification for making stdoutOS and stderrOS mandatory: I wanted to
make link() functions to take stdout and stderr in that order.
However, if we change the function signature from
bool link(ArrayRef<const char *> args, bool canExitEarly,
raw_ostream &stderrOS = llvm::errs());
to
bool link(ArrayRef<const char *> args, bool canExitEarly,
raw_ostream &stdoutOS = llvm::outs(),
raw_ostream &stderrOS = llvm::errs());
, then the meaning of existing code that passes stderrOS silently
changes (stderrOS would be interpreted as stdoutOS). So, I chose to
make existing code not to compile, so that developers can fix their
code.
Differential Revision: https://reviews.llvm.org/D70292
/align is not supposed to be used without /driver, so it makes sense
to warn if only /align is passed. MSVC link.exe warns on this too.
Differential Revision: https://reviews.llvm.org/D70163
This broke in 51dcb292cc, "[lld-link] diagnose undefined symbols
before LTO when possible" (very soon after the 9.0 branch, so
luckily the 9.0 release is unaffected).
The code for loading objects we believe might be needed for autoimport
(loadMinGWAutomaticImports()) does run before the new
reportUnresolvable() function, but it had a condition to only operate
on symbols from regular object files. This condition came from
resolveRemainingUndefines(), but as loadMinGWAutomaticImports() now
has to operate before the LTO, it has to operate on undefineds from
LTO objects as well.
Differential Revision: https://reviews.llvm.org/D70166
Recent versions of Microsoft's dumpbin tool cannot handle such PE files.
LLVM tools and GNU tools can, and use this to encode long section names
like ".debug_info", which is commonly used for DWARF. Don't do this in
mingw mode or when -debug:dwarf is passed, since the user probably wants
long section names for DWARF sections.
PR43754
Reviewers: ruiu, mstorsjo
Differential Revision: https://reviews.llvm.org/D69594
Also mention "basename" and "dirname" in Path.h since I tried
to find these functions by looking for these strings. It might
help others find them faster if the comments contain these strings.
No behavior change.
Differential Revision: https://reviews.llvm.org/D69458
As we now have code that parses the dwarf info for variable locations,
we can use that instead of relying on the higher level Symbolizer library,
reducing the previous two different dwarf codepaths into one.
Differential Revision: https://reviews.llvm.org/D69198
llvm-svn: 375391
This fixes the second part of PR42407.
For files with dwarf debug info, it manually loads and iterates
.debug_info to find the declared location of variables, to allow
reporting them. (This matches the corresponding code in the ELF
linker.)
For functions, it uses the existing getFileLineDwarf which uses
LLVMSymbolizer for translating addresses to file lines.
In object files with codeview debug info, only the source location
of duplicate functions is printed. (And even there, only for the
first input file. The getFileLineCodeView function requires the
object file to be fully loaded and initialized to properly resolve
source locations, but duplicate symbols are reported at a stage when
the second object file isn't fully loaded yet.)
Differential Revision: https://reviews.llvm.org/D68975
llvm-svn: 375218
This makes use of it slightly clearer, and makes it match the
same construct in the lld ELF linker.
Differential Revision: https://reviews.llvm.org/D68935
llvm-svn: 374869
A common pattern in Windows is to have all your precompiled headers
use an object named stdafx.obj. If you've got a project with many
different static libs, you might use a separate PCH for each one of
these.
During the final link step, a file from A might reference the PCH
object from A, but it will have the same name (stdafx.obj) as any
other PCH from another project. The only difference will be the
path. For example, A might be A/stdafx.obj while B is B/stdafx.obj.
The existing algorithm checks only the filename that was passed on
the command line (or stored in archive), but this is insufficient in
the case where relative paths are used, because depending on the
command line object file / library order, it might find the wrong
PCH object first resulting in a signature mismatch.
The fix here is to simply check whether the absolute path of the
PCH object (which is stored in the input obj file for the file that
references the PCH) *ends with* the full relative path of whatever
is specified on the command line (or is in the archive).
Differential Revision: https://reviews.llvm.org/D66431
llvm-svn: 374442
Similar to D67323, but for COFF. Many lld/COFF/ files already use
`namespace lld { namespace coff {`. Only a few need changing.
Reviewed By: ruiu
Differential Revision: https://reviews.llvm.org/D68772
llvm-svn: 374314
David added the JamCRC implementation in r246590. More recently, Eugene
added a CRC-32 implementation in r357901, which falls back to zlib's
crc32 function if present.
These checksums are essentially the same, so having multiple
implementations seems unnecessary. This replaces the CRC-32
implementation with the simpler one from JamCRC, and implements the
JamCRC interface in terms of CRC-32 since this means it can use zlib's
implementation when available, saving a few bytes and potentially making
it faster.
JamCRC took an ArrayRef<char> argument, and CRC-32 took a StringRef.
This patch changes it to ArrayRef<uint8_t> which I think is the best
choice, and simplifies a few of the callers nicely.
Differential revision: https://reviews.llvm.org/D68570
llvm-svn: 374148
Fixes assert in addLinkerModuleCoffGroup() when using by-ordinal imports
only.
Patch by Stefan Schmidt.
Differential revision: https://reviews.llvm.org/D68352
llvm-svn: 374140
This patch adds /reproduce:<path> option to lld/COFF. This is an
lld-specific option, so we can name it freely. I chose /reproduce
over other names (e.g. /lldlinkrepro) for consistency with other lld
ports.
Differential Revision: https://reviews.llvm.org/D68381
llvm-svn: 373704
This reverts commit r371729 because /linkrepro option also exists
in Microsoft link.exe and their linker takes not a filename but a
directory name as an argument for /linkrepro.
Differential Revision: https://reviews.llvm.org/D68378
llvm-svn: 373703
This is useful for enforcing that builds are independent of the
environment; it can be used when all system library paths are added
via /libpath: already. It's similar ot cl.exe's /X flag.
Since it should also affect %LINK% (the other caller of
`Process::GetEnv` in lld/COFF), the early-option-parsing needs
to move around a bit. The options are:
- Add a manual loop over the argv ArrayRef and look for "/lldignoreenv".
This repeats the name of the flag in both Options.td and in
DriverUtils.cpp.
- Add yet another table.ParseArgs() call just for /lldignoreenv before
adding %LINK%.
- Use the existing early ParseArgs() that's there for --rsp-quoting and use
it for /lldignoreenv for %LINK% as well. This means --rsp-quoting
and /lldignoreenv can't be passed via %LINK%.
I went with the third approach.
Differential Revision: https://reviews.llvm.org/D67456
llvm-svn: 371852
from thin archives.
Currently lld adds the archive name to MemoryBufferRef identifiers in order to
ensure they are unique. For thin archives, since the file name is already unique and we
want to keep the original path to the file, don't add the archive name.
Differential Revision: https://reviews.llvm.org/D67295
llvm-svn: 371778
This makes lld-link behave like ld.lld. I don't see a reason for
the two drivers to have different behavior here.
While here, also make lld-link add a version.txt to the tar, like
ld.lld does.
Differential Revision: https://reviews.llvm.org/D67461
llvm-svn: 371729
Patch by Markus Böck.
PR42951: When linking an archive with members that have the same name linking
fails when using the -wholearchive option. This patch passes the index
of the member in the archive to the offset parameter to disambiguate the
member.
Differential Revision: https://reviews.llvm.org/D66239
llvm-svn: 371509
In mingw environments, resources are normally compiled to resource
object files directly, instead of letting the linker convert them to
COFF format.
Since some time, GCC supports the notion of a default manifest object.
When invoking the linker, GCC looks for the default manifest object
file, and if found in the expected path, it is added to linker commands.
The default manifest is one that indicates support for the latest known
versions of windows, to implicitly unlock the modern behaviours of certain
APIs.
Not all mingw/gcc distributions include this file, but e.g. in msys2,
the default manifest object is distributed in a separate package (which
can be but might not always be installed).
This means that even if user projects only use one single resource
object file, the linker can end up with two resource object files,
and thus needs to support merging them.
The default manifest has a language id of zero, and GNU ld has got
logic for dropping a manifest with a zero language id, if there's
another manifest present with a nonzero language id. If there are
multiple manifests with a nonzero language id, the merging process
errors out.
Differential Revision: https://reviews.llvm.org/D66825
llvm-svn: 370974
Summary:
This is a re-land of r370487 with a fix for the use-after-free bug
that rev contained.
This implements -start-lib and -end-lib flags for lld-link, analogous
to the similarly named options in ld.lld. Object files after
-start-lib are included in the link only when needed to resolve
undefined symbols. The -end-lib flag goes back to the normal behavior
of always including object files in the link. This mimics the
semantics of static libraries, but without needing to actually create
the archive file.
Reviewers: ruiu, smeenai, MaskRay
Reviewed By: ruiu, MaskRay
Subscribers: akhuang, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D66848
llvm-svn: 370816
Summary:
This implements -start-lib and -end-lib flags for lld-link, analogous
to the similarly named options in ld.lld. Object files after
-start-lib are included in the link only when needed to resolve
undefined symbols. The -end-lib flag goes back to the normal behavior
of always including object files in the link. This mimics the
semantics of static libraries, but without needing to actually create
the archive file.
Reviewers: ruiu, smeenai, MaskRay
Reviewed By: ruiu, MaskRay
Subscribers: akhuang, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D66848
llvm-svn: 370487
Extend WindowsResourceParser to support using a ResourceSectionRef for
loading resources from an object file.
Only allow merging resource object files in mingw mode; keep the
existing error on multiple resource objects in link mode.
If there only is one resource object file and no .res resources,
don't parse and recreate the .rsrc section, but just link it in without
inspecting it. This allows users to produce any .rsrc section (outside
of what the parser supports), just like before. (I don't have a specific
need for this, but it reduces the risk of this new feature.)
Separate out the .rsrc section chunks in InputFiles.cpp, and only include
them in the list of section chunks to link if we've determined that there
only was one single resource object. (We need to keep other chunks from
those object files, as they can legitimately contain other sections as
well, in addition to .rsrc section chunks.)
Differential Revision: https://reviews.llvm.org/D66824
llvm-svn: 370436
Summary:
This adds the -lto-obj-path option to lld-link. This can be
used to specify a path at which to write a native object file for
the full LTO part when using LTO unit splitting.
Reviewers: ruiu, tejohnson, pcc, rnk
Reviewed By: ruiu, rnk
Subscribers: mehdi_amini, steven_wu, dexonsmith, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65964
llvm-svn: 369559
This avoids producing an output file if errors appeared late in the
linking process (e.g. while fixing relocations, or as in the test,
while checking for multiple resources). If an output file is produced,
build tools might not retry building it on rebuilds, even if a previous
build failed due to the error return code.
Differential Revision: https://reviews.llvm.org/D66491
llvm-svn: 369445
This avoids confusing contextless error messages such as "No such file
or directory" if e.g. the pdb output file should be written to a
nonexistent directory. (This can happen with linkrepro scripts, at least
old ones.)
Differential Revision: https://reviews.llvm.org/D66466
llvm-svn: 369425
This is used by Wine for manually crafting export tables.
If the input object contains .edata sections, GNU ld references them
in the export directory instead of synthesizing an export table using
either export directives or the normal auto export mechanism. (AFAIK,
historically, way way back, GNU ld didn't support synthesizing the
export table - one was supposed to generate it using dlltool and link
it in instead.)
If faced with --out-implib and --output-def, GNU ld still populates
those output files with the same export info as it would have generated
otherwise, disregarding the input .edata. As this isn't an intended
usage combination, I'm not adding checks for that in tests.
Differential Revision: https://reviews.llvm.org/D65903
llvm-svn: 369358
Now that we've moved to C++14, we no longer need the llvm::make_unique
implementation from STLExtras.h. This patch is a mechanical replacement
of (hopefully) all the llvm::make_unique instances across the monorepo.
Differential revision: https://reviews.llvm.org/D66259
llvm-svn: 368936
These symbols actually point to the symbol's IAT entry, which
obviously is different from the symbol itself (which is imported
from a different module and doesn't exist in the current one).
Omitting this symbol helps gdb inspect automatically imported
symbols, see https://sourceware.org/bugzilla/show_bug.cgi?id=24574
for discussion on the matter.
Surprisingly, those extra symbols don't seem to be an issue for
gdb when the sources have been built with clang, only with gcc.
The actual logic in gdb that this depends on still is unknown, but
omitting these symbols from the symbol table is the right thing to
do in any case.
Differential Revision: https://reviews.llvm.org/D65727
llvm-svn: 367836
This avoids a spurious and confusing log message in cases where
both e.g. "alias" and "__imp_alias" exist.
Differential Revision: https://reviews.llvm.org/D65598
llvm-svn: 367673
1. raw_ostream supports ANSI colors so that you can write messages to
the termina with colors. Previously, in order to change and reset
color, you had to call `changeColor` and `resetColor` functions,
respectively.
So, if you print out "error: " in red, for example, you had to do
something like this:
OS.changeColor(raw_ostream::RED);
OS << "error: ";
OS.resetColor();
With this patch, you can write the same code as follows:
OS << raw_ostream::RED << "error: " << raw_ostream::RESET;
2. Add a boolean flag to raw_ostream so that you can disable colored
output. If you disable colors, changeColor, operator<<(Color),
resetColor and other color-related functions have no effect.
Most LLVM tools automatically prints out messages using colors, and
you can disable it by passing a flag such as `--disable-colors`.
This new flag makes it easy to write code that works that way.
Differential Revision: https://reviews.llvm.org/D65564
llvm-svn: 367649
The Archive object created when loading an archive specified with
wholearchive got cleaned up immediately, when the owning std::unique_ptr
went out of scope, even if persisted StringRefs pointed to memory that
belonged to the archive, which no longer was mapped in memory.
This hasn't been an issue with regular (as opposed to thin) archives,
as references to the member objects has kept the mapping for the whole
archive file alive - but with thin archives, all such references point
to other files.
Add the std::unique_ptr to the arena allocator, to retain it as long
as necessary.
This fixes (the last issue raised in) PR42388.
Differential Revision: https://reviews.llvm.org/D65565
llvm-svn: 367599
Summary:
This allows reporting undefined symbols before LTO codegen is
run. Since LTO codegen can take a long time, this improves user
experience by avoiding that time spend if the link is going to
fail with undefined symbols anyway.
Fixes PR32400.
Reviewers: ruiu
Reviewed By: ruiu
Subscribers: mehdi_amini, steven_wu, dexonsmith, mstorsjo, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62434
llvm-svn: 367136
This ports r366573 from COFF to ELF.
There are now to toString(Archive::Symbol), one doing MSVC demangling
in COFF and one doing Itanium demangling in ELF, so rename these two
to toCOFFString() and to toELFString() to not get a duplicate symbol.
Nothing ever passes a raw Archive::Symbol to CHECK(), so these not
being part of the normal toString() machinery seems ok.
There are two code paths in the ELF linker that emits this type of
diagnostic:
1. The "normal" one in InputFiles.cpp. This is covered by the tweaked test.
2. An additional one that's only used for libcalls if there's at least
one bitcode in the link, and if the libcall symbol is lazy, and
lazily loaded from an archive (i.e. not from a lazy .o file).
(This code path was added in r339301.) Since all libcall names so far
are C symbols and never mangled, the change there is not observable
and hence not covered by tests.
Differential Revision: https://reviews.llvm.org/D65095
llvm-svn: 366836
Code built for mingw with -fdata-sections will store each TLS variable
in a comdat section, named .tls$$<varname>. Normal TLS variables are
stored in sections named .tls$ with a trailing dollar, which are
sorted after a starter marker (in a later linked object file) in a
section named ".tls" (with no dollar suffix), before an ending marker
in a section named ".tls$ZZZ".
The mingw comdat section suffix stripping introduced in SVN r363457
broke sorting of such tls sections, ending up sorting the stripped
.tls$$<varname> sections (stripped to ".tls") before the start marker
in the section named ".tls".
We could add exceptions to the section name suffix stripping for
.tls (and .CRT, where suffixes always should be honored), but the
more conservative option is probably the reverse; to only apply the
stripping for the normal sections where sorting shouldn't have any
effect.
Differential Revision: https://reviews.llvm.org/D65018
llvm-svn: 366780