Commit Graph

35 Commits

Author SHA1 Message Date
Martin Storsjö 3c6f8ca7c9 [lld] Rename StringRef _lower() method calls to _insensitive() 2021-06-25 00:22:01 +03:00
Alexandre Ganea 2b9b9652ce [LLD][COFF] Reduce the maximum size of the GHASH table
Before this patch, the maximum size of the GHASH table was 2^31 buckets. However, we were storing the bucket index into a TypeIndex, which has an upper limit of (2^31)-4095 indices. Any value above that limit improperly sets the TypeIndex's high bit, which is interpreted as DecoratedItemIdMask. This used to cause bad indices on extraction when calling TypeIndex::toArrayIndex().
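
A minimal, self-contained illustration of the overflow described above. The constants mirror CodeView's TypeIndex layout; the fromArrayIndex/toArrayIndex helpers are simplified stand-ins for the real TypeIndex methods, not LLD code:

  #include <cstdint>
  #include <cstdio>

  // CodeView reserves indices below 0x1000 for "simple" types and uses the
  // high bit to mark decorated item ids.
  constexpr uint32_t FirstNonSimpleIndex = 0x1000;
  constexpr uint32_t DecoratedItemIdMask = 0x80000000U;

  // Simplified stand-ins for TypeIndex::fromArrayIndex()/toArrayIndex().
  uint32_t fromArrayIndex(uint32_t arrayIndex) { return arrayIndex + FirstNonSimpleIndex; }
  uint32_t toArrayIndex(uint32_t index) { return index - FirstNonSimpleIndex; }

  int main() {
    // A bucket index at or above 2^31 - 4096 spills into the high bit once the
    // simple-type offset is added...
    uint32_t bucket = 0x7FFFF000U; // 2^31 - 4096
    uint32_t ti = fromArrayIndex(bucket);
    std::printf("decorated bit set: %d\n", (ti & DecoratedItemIdMask) != 0); // prints 1
    // ...so the stored value is later interpreted as a decorated item id
    // rather than a plain bucket index.
  }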

Differential Revision: https://reviews.llvm.org/D103297
2021-05-28 09:46:17 -04:00
Reid Kleckner 99f023656b [PDB] Fix ubsan complaint about memcpy from null pointer 2021-05-27 19:49:09 -07:00
Reid Kleckner e73203a561 [PDB] Check the type server guid when ghashing
Previously, we simply didn't check this. This is a prerequisite for making the
test suite pass with ghash enabled by default.

Differential Revision: https://reviews.llvm.org/D102885
2021-05-20 16:36:12 -07:00
Alexandre Ganea e7a371f9fd [LLD][COFF] Avoid std::vector resizes during type merging
Consistently saves approx. 0.6 sec (out of 18 sec) on a large output (400 MB EXE, 2 GB PDB).
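
The commit message doesn't say how the resizes are avoided; below is a generic sketch of the usual remedy, reserving the destination vector's capacity once instead of letting it grow during merging. The function and types are hypothetical, not the actual D94555 change:

  #include <cstdint>
  #include <vector>

  struct MergedRecord { uint32_t hash; uint32_t size; };

  // Growing a std::vector one push_back at a time triggers repeated
  // reallocations and copies; reserving the final size up front avoids them.
  std::vector<MergedRecord> mergeAll(const std::vector<std::vector<MergedRecord>> &inputs) {
    size_t total = 0;
    for (const auto &in : inputs)
      total += in.size();
    std::vector<MergedRecord> out;
    out.reserve(total); // one allocation instead of repeated resizes
    for (const auto &in : inputs)
      out.insert(out.end(), in.begin(), in.end());
    return out;
  }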

Differential Revision: https://reviews.llvm.org/D94555
2021-01-13 14:35:03 -05:00
Alexandre Ganea b14ad90b13 [LLD][COFF] Simplify function. NFC. 2021-01-07 22:37:11 -05:00
Alexandre Ganea eaadb41db6 [LLD][COFF] When using PCH.OBJ, ensure func_id records indices are remapped under /DEBUG:GHASH
Before this patch, when using LLD with /DEBUG:GHASH and MSVC precomp.OBJ files, we had a bunch of:

lld-link: warning: S_[GL]PROC32ID record in blabla.obj refers to PDB item index 0x206ED1 which is not a LF[M]FUNC_ID record

This was caused by LF_FUNC_ID and LF_MFUNC_ID records which didn't have a correct mapping to the corresponding TPI records. The root issue was that the indexMapStorage was improperly re-assembled in UsePrecompSource::remapTpiWithGHashes.

After this patch, /DEBUG and /DEBUG:GHASH produce exactly the same debug info in the PDB.

Differential Revision: https://reviews.llvm.org/D93732
2021-01-07 17:27:13 -05:00
Nico Weber 9c6a884f67 fix typo to cycle bots 2020-12-12 20:16:14 -05:00
Nico Weber c8974af164 fix typos to cycle bots 2020-12-04 10:18:44 -05:00
Martin Storsjö d77d727339 [LLD] [COFF] Fix a ubsan error in pdb-type-server-missing.yaml
This error has been present since 5519e4da83.

Differential Revision: https://reviews.llvm.org/D89027
2020-10-12 23:28:23 +03:00
Alexandre Ganea 55b97a6d2a [LLD][COFF] Add more type record information to /summary
This adds the following two new lines to /summary:

      21351 Input OBJ files (expanded from all cmd-line inputs)
         61 PDB type server dependencies
         38 Precomp OBJ dependencies
 1420669231 Input type records         <<<<
78665073382 Input type records bytes   <<<<
    8801393 Merged TPI records
    3177158 Merged IPI records
      59194 Output PDB strings
   71576766 Global symbol records
   25416935 Module symbol records
    2103431 Public symbol records

Differential Revision: https://reviews.llvm.org/D88703
2020-10-02 09:36:11 -04:00
Alexandre Ganea 4140f0744f [LLD][COFF] Fix crash with /summary and PCH input files
Before this patch, /summary crashed with some .PCH.OBJ files because tpiMap[srcIdx++] read from the wrong location. When a TpiSource depends on a .PCH.OBJ file, its types should be offset by the previously merged PCH.OBJ set of indices.
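
A hedged sketch of the indexing rule the fix relies on; the names (pchMap, objMap, pchTypeCount) are illustrative stand-ins, not LLD's actual data structures:

  #include <cstdint>
  #include <vector>

  using TypeIndexValue = uint32_t;

  // Illustrative only: map a source type index to its merged (output) index
  // when the first `pchTypeCount` indices belong to a precompiled-header OBJ
  // that was merged earlier.
  TypeIndexValue remap(const std::vector<TypeIndexValue> &pchMap,
                       const std::vector<TypeIndexValue> &objMap,
                       uint32_t srcIdx, uint32_t pchTypeCount) {
    if (srcIdx < pchTypeCount)
      return pchMap[srcIdx];              // type comes from the PCH.OBJ
    return objMap[srcIdx - pchTypeCount]; // offset past the PCH's indices
  }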

Differential Revision: https://reviews.llvm.org/D88678
2020-10-01 17:08:35 -04:00
Reid Kleckner 5d46d7e8b2 [PDB] Use one func id DenseMap instead of per-source maps, NFC
This avoids some DenseMap copies when /Zi is in use, and results in
fewer data structures.

Differential Revision: https://reviews.llvm.org/D88617
2020-10-01 12:22:27 -07:00
Reid Kleckner 5519e4da83 Re-land "[PDB] Merge types in parallel when using ghashing"
Stored Error objects have to be checked, even if they are success
values.

This reverts commit 8d250ac3cd.
Relands commit 49b3459930655d879b2dc190ff8fe11c38a8be5f.

Original commit message:
-----------------------------------------

This makes type merging much faster (-24% on chrome.dll) when multiple
threads are available, but it slightly increases the time to link (+10%)
when /threads:1 is passed. With only one more thread, the new type
merging is faster (-11%). The output PDB should be identical to what it
was before this change.

To give an idea, here is the /time output placed side by side:
                              BEFORE    | AFTER
  Input File Reading:           956 ms  |  968 ms
  Code Layout:                  258 ms  |  190 ms
  Commit Output File:             6 ms  |    7 ms
  PDB Emission (Cumulative):   6691 ms  | 4253 ms
    Add Objects:               4341 ms  | 2927 ms
      Type Merging:            2814 ms  | 1269 ms  -55%!
      Symbol Merging:          1509 ms  | 1645 ms
    Publics Stream Layout:      111 ms  |  112 ms
    TPI Stream Layout:          764 ms  |   26 ms  trivial
    Commit to Disk:            1322 ms  | 1036 ms  -300ms
----------------------------------------- --------
Total Link Time:               8416 ms    5882 ms  -30% overall

The main source of the additional overhead in the single-threaded case
is the need to iterate all .debug$T sections up front to check which
type records should go in the IPI stream. See fillIsItemIndexFromDebugT.
With changes to the .debug$H section, we could pre-calculate this info
and eliminate the need to do this walk up front. That should restore
single-threaded performance back to what it was before this change.

This change will cause LLD to be much more parallel than it used to be, and
for users who do multiple links in parallel, it could regress
performance. However, when the user is only doing one link, it's a huge
improvement. In the future, we can use NT worker threads to avoid
oversaturating the machine with work, but for now, this is such an
improvement for the single-link use case that I think we should land
this as is.

Algorithm
----------

Before this change, we essentially used a
DenseMap<GloballyHashedType, TypeIndex> to check if a type has already
been seen, and if it hasn't been seen, insert it now and use the next
available type index for it in the destination type stream. DenseMap
does not support concurrent insertion, and even if it did, the linker
must be deterministic: it cannot produce different PDBs by using
different numbers of threads. The output type stream must be in the same
order regardless of the order of hash table insertions.
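
A compact sketch of that serial scheme, using standard-library stand-ins (unordered_map for DenseMap, plain integers for GloballyHashedType/TypeIndex):

  #include <cstdint>
  #include <unordered_map>
  #include <vector>

  using GHash = uint64_t;      // stand-in for GloballyHashedType
  using TypeIndex = uint32_t;  // stand-in for codeview::TypeIndex

  // Serial merging: the first time a ghash is seen, the type is appended to
  // the destination stream and gets the next available index; duplicates map
  // to the index assigned on first insertion.
  TypeIndex mergeOne(std::unordered_map<GHash, TypeIndex> &seen,
                     std::vector<GHash> &destStream, GHash g) {
    auto [it, inserted] = seen.try_emplace(g, (TypeIndex)destStream.size());
    if (inserted)
      destStream.push_back(g); // new unique type
    return it->second;         // index in the merged stream
  }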

In order to create a hash table that supports concurrent insertion, the
table cells must be small enough that they can be updated atomically.
The algorithm I used for updating the table using linear probing is
described in this paper, "Concurrent Hash Tables: Fast and General(?)!":
https://dl.acm.org/doi/10.1145/3309206

The GHashCell in this change is essentially a pair of 32-bit integer
indices: <sourceIndex, typeIndex>. The sourceIndex is the index of the
TpiSource object, and it represents an input type stream. The typeIndex
is the index of the type in the stream. Together, we have something like
a ragged 2D array of ghashes, which can be looked up as:
  tpiSources[tpiSrcIndex]->ghashes[typeIndex]

By using these side tables, we can omit the key data from the hash
table, and keep the table cell small. There is a cost to this: resolving
hash table collisions requires many more loads than simply looking at
the key in the same cache line as the insertion position. However, most
supported platforms should have a 64-bit CAS operation to update the
cell atomically.

To make the result of concurrent insertion deterministic, the cell
payloads must have a priority function. Defining one is pretty
straightforward: compare the two 32-bit numbers as a combined 64-bit
number. This means that types coming from inputs earlier on the command
line have a higher priority and are more likely to appear earlier in the
final PDB type stream than types from an input appearing later on the
link line.
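
The following standalone sketch illustrates the scheme described above: a 64-bit cell, linear probing, an atomic compare-and-swap, and the "smaller cell wins" priority rule. It is an illustration only, not LLD's actual GHashCell/insertion code; the side-table ghash lookup is abstracted behind a function pointer, and the isItem bit mentioned below is omitted for brevity:

  #include <atomic>
  #include <cstdint>
  #include <vector>

  // Illustrative cell: <sourceIndex, typeIndex> packed into one 64-bit word so
  // it can be updated with a single compare-and-swap. 0 means "empty" here.
  using GHashCell = uint64_t;
  constexpr GHashCell makeCell(uint32_t src, uint32_t ti) {
    return (GHashCell(src) << 32) | ti;
  }

  struct GHashTable {
    std::vector<std::atomic<GHashCell>> cells;
    // Side-table lookup: the ghash of the type a cell refers to. Stand-in for
    // tpiSources[tpiSrcIndex]->ghashes[typeIndex].
    uint64_t (*ghashOf)(GHashCell);

    // Linear probing with atomic insertion. A lower cell value means an
    // earlier input, i.e. higher priority, so the smallest cell wins
    // deterministically regardless of thread timing.
    void insert(uint64_t ghash, GHashCell newCell) {
      size_t idx = ghash % cells.size();
      while (true) {
        GHashCell cur = cells[idx].load();
        if (cur != 0 && ghashOf(cur) != ghash) {
          idx = (idx + 1) % cells.size(); // slot taken by a different type
          continue;
        }
        if (cur != 0 && cur <= newCell)
          return;                          // existing entry already wins
        if (cells[idx].compare_exchange_weak(cur, newCell))
          return;                          // we installed the winning cell
        // CAS failed: another thread raced us; reload and retry.
      }
    }
  };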

After table insertion, the non-empty cells in the table can be copied
out of the main table and sorted by priority to determine the ordering
of the final type index stream. At this point, item and type records
must be separated, either by sorting or by splitting into two arrays,
and I chose sorting. This is why the GHashCell must contain the isItem
bit.

Once the final PDB TPI stream ordering is known, we need to compute a
mapping from source type index to PDB type index. To avoid starting over
from scratch and looking up every type again by its ghash, we save the
insertion position of every hash table insertion during the first
insertion phase. Because the table does not support rehashing, the
insertion position is stable. Using the array of insertion positions
indexed by source type index, we can replace the source type indices in
the ghash table cells with the PDB type indices.

Once the table cells have been updated to contain PDB type indices, the
mapping for each type source can be computed in parallel. Simply iterate
the list of cell positions and replace them with the PDB type index,
since the insertion positions are no longer needed.
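
A small sketch of that per-source remapping step, assuming (as described above) that the insertion position of each source type was recorded during the first phase and that the cells have already been rewritten to hold final PDB type indices. Names are illustrative:

  #include <cstdint>
  #include <vector>

  // `insertedCells[i]` is the table position recorded when source type i of a
  // given TpiSource was inserted; the low 32 bits of a cell now hold the final
  // PDB type index.
  void buildIndexMap(const std::vector<uint64_t> &cells,
                     const std::vector<uint32_t> &insertedCells,
                     std::vector<uint32_t> &indexMap) {
    indexMap.resize(insertedCells.size());
    for (size_t srcIdx = 0; srcIdx < insertedCells.size(); ++srcIdx)
      indexMap[srcIdx] = uint32_t(cells[insertedCells[srcIdx]]); // source -> PDB index
  }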

Once we have a source to destination type index mapping for every type
source, there are no more data dependencies. We know which type records
are "unique" (not duplicates), and what their final type indices will
be. We can do the remapping in parallel, and accumulate type sizes and
type hashes in parallel by type source.

Lastly, TPI stream layout must be done serially. Accumulate all the type
records, sizes, and hashes, and add them to the PDB.

Differential Revision: https://reviews.llvm.org/D87805
2020-09-30 15:44:38 -07:00
Reid Kleckner 8d250ac3cd Revert "[PDB] Merge types in parallel when using ghashing"
This reverts commit 49b3459930.
2020-09-30 14:55:32 -07:00
Reid Kleckner 49b3459930 [PDB] Merge types in parallel when using ghashing
This makes type merging much faster (-24% on chrome.dll) when multiple
threads are available, but it slightly increases the time to link (+10%)
when /threads:1 is passed. With only one more thread, the new type
merging is faster (-11%). The output PDB should be identical to what it
was before this change.

To give an idea, here is the /time output placed side by side:
                              BEFORE    | AFTER
  Input File Reading:           956 ms  |  968 ms
  Code Layout:                  258 ms  |  190 ms
  Commit Output File:             6 ms  |    7 ms
  PDB Emission (Cumulative):   6691 ms  | 4253 ms
    Add Objects:               4341 ms  | 2927 ms
      Type Merging:            2814 ms  | 1269 ms  -55%!
      Symbol Merging:          1509 ms  | 1645 ms
    Publics Stream Layout:      111 ms  |  112 ms
    TPI Stream Layout:          764 ms  |   26 ms  trivial
    Commit to Disk:            1322 ms  | 1036 ms  -300ms
----------------------------------------- --------
Total Link Time:               8416 ms    5882 ms  -30% overall

The main source of the additional overhead in the single-threaded case
is the need to iterate all .debug$T sections up front to check which
type records should go in the IPI stream. See fillIsItemIndexFromDebugT.
With changes to the .debug$H section, we could pre-calculate this info
and eliminate the need to do this walk up front. That should restore
single-threaded performance back to what it was before this change.

This change will cause LLD to be much more parallel than it used to be, and
for users who do multiple links in parallel, it could regress
performance. However, when the user is only doing one link, it's a huge
improvement. In the future, we can use NT worker threads to avoid
oversaturating the machine with work, but for now, this is such an
improvement for the single-link use case that I think we should land
this as is.

Algorithm
----------

Before this change, we essentially used a
DenseMap<GloballyHashedType, TypeIndex> to check if a type has already
been seen, and if it hasn't been seen, insert it now and use the next
available type index for it in the destination type stream. DenseMap
does not support concurrent insertion, and even if it did, the linker
must be deterministic: it cannot produce different PDBs by using
different numbers of threads. The output type stream must be in the same
order regardless of the order of hash table insertions.

In order to create a hash table that supports concurrent insertion, the
table cells must be small enough that they can be updated atomically.
The algorithm I used for updating the table using linear probing is
described in this paper, "Concurrent Hash Tables: Fast and General(?)!":
https://dl.acm.org/doi/10.1145/3309206

The GHashCell in this change is essentially a pair of 32-bit integer
indices: <sourceIndex, typeIndex>. The sourceIndex is the index of the
TpiSource object, and it represents an input type stream. The typeIndex
is the index of the type in the stream. Together, we have something like
a ragged 2D array of ghashes, which can be looked up as:
  tpiSources[tpiSrcIndex]->ghashes[typeIndex]

By using these side tables, we can omit the key data from the hash
table, and keep the table cell small. There is a cost to this: resolving
hash table collisions requires many more loads than simply looking at
the key in the same cache line as the insertion position. However, most
supported platforms should have a 64-bit CAS operation to update the
cell atomically.

To make the result of concurrent insertion deterministic, the cell
payloads must have a priority function. Defining one is pretty
straightforward: compare the two 32-bit numbers as a combined 64-bit
number. This means that types coming from inputs earlier on the command
line have a higher priority and are more likely to appear earlier in the
final PDB type stream than types from an input appearing later on the
link line.

After table insertion, the non-empty cells in the table can be copied
out of the main table and sorted by priority to determine the ordering
of the final type index stream. At this point, item and type records
must be separated, either by sorting or by splitting into two arrays,
and I chose sorting. This is why the GHashCell must contain the isItem
bit.

Once the final PDB TPI stream ordering is known, we need to compute a
mapping from source type index to PDB type index. To avoid starting over
from scratch and looking up every type again by its ghash, we save the
insertion position of every hash table insertion during the first
insertion phase. Because the table does not support rehashing, the
insertion position is stable. Using the array of insertion positions
indexed by source type index, we can replace the source type indices in
the ghash table cells with the PDB type indices.

Once the table cells have been updated to contain PDB type indices, the
mapping for each type source can be computed in parallel. Simply iterate
the list of cell positions and replace them with the PDB type index,
since the insertion positions are no longer needed.

Once we have a source to destination type index mapping for every type
source, there are no more data dependencies. We know which type records
are "unique" (not duplicates), and what their final type indices will
be. We can do the remapping in parallel, and accumulate type sizes and
type hashes in parallel by type source.

Lastly, TPI stream layout must be done serially. Accumulate all the type
records, sizes, and hashes, and add them to the PDB.

Differential Revision: https://reviews.llvm.org/D87805
2020-09-30 14:22:48 -07:00
Reid Kleckner 1e5b7e91aa [PDB] Split TypeServerSource and extend type index map lifetime
Extending the lifetime of these type index mappings does increase memory
usage (+2% in my case), but it decouples type merging from symbol
merging. This is a prerequisite for two changes that I have in mind:
- parallel type merging: speeds up slow type merging
- deferred symbol merging: avoid heap allocating (relocating) all symbols

This eliminates CVIndexMap and moves its data into TpiSource. The maps
are also split into a SmallVector and ArrayRef component, so that the
ipiMap can alias the tpiMap for /Z7 object files, and so that both maps
can simply alias the PDB type server maps for /Zi files.
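
A rough sketch of that owned-storage-plus-view split, using std::vector/std::span as stand-ins for SmallVector/ArrayRef; the structure and member names are assumptions for illustration, not the actual TpiSource layout:

  #include <cstdint>
  #include <span>
  #include <vector>

  struct TypeIndexMaps {
    std::vector<uint32_t> ownedTpiMap;  // owned storage, filled during merging
    std::span<const uint32_t> tpiMap;   // non-owning views used for remapping
    std::span<const uint32_t> ipiMap;

    void finalizeForZ7() {
      // /Z7 objects have a single type stream, so both views alias one map.
      tpiMap = ownedTpiMap;
      ipiMap = ownedTpiMap;
    }
    void finalizeForZi(std::span<const uint32_t> serverTpi,
                       std::span<const uint32_t> serverIpi) {
      // /Zi objects alias the maps owned by their PDB type server source.
      tpiMap = serverTpi;
      ipiMap = serverIpi;
    }
  };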

Splitting TypeServerSource establishes that all input types to be merged
can be identified with two 32-bit indices:
- The index of the TpiSource object
- The type index of the record
This is useful, because this information can be stored in a single
64-bit atomic word to enable concurrent hashtable insertion.

One last change is that now all object files with debugChunks get a
TpiSource, even if they have no type info. This avoids some null checks
and special cases.

Differential Revision: https://reviews.llvm.org/D87736
2020-09-17 11:53:10 -07:00
Reid Kleckner 1b88845ce1 [PDB] Drop LF_PRECOMP from debugTypes earlier
This is a minor simplification to avoid firing up a BinaryStreamReader
and CVType parser.
2020-09-15 18:50:37 -07:00
Benjamin Kramer 9a0689e072 Make helpers static. NFC. 2020-07-17 13:49:11 +02:00
Reid Kleckner 54a335a2f6 [COFF] Move type merging to TpiSource::mergeDebugT virtual method
This paves the way to doing more things in parallel, and allows us to
order type sources in dependency order. PDBs and PCH objects have to be
loaded before object files which use them.

This is a rebase of the unapplied remaining changes in
https://reviews.llvm.org/D59226. I found it very challenging to rebase
this across the LLD variable name style change. I recall there was a
tool for that, but I didn't take the time to use it.

Reviewers: aganea, akhuang

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D79672
2020-05-14 09:47:00 -07:00
Reid Kleckner 8a310f40d0 Remove namespace lld { namespace coff { from COFF LLD cpp files
Instead, use `using namespace lld(::coff)`, and fully qualify the names
of free functions where they are defined in cpp files.

This effectively reverts d79c3be618 to follow the new style guide added
in 236fcbc21a.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D74882
2020-02-25 17:30:53 -08:00
Benjamin Kramer adcd026838 Make llvm::StringRef to std::string conversions explicit.
This is how it should've been and brings it more in line with
std::string_view. There should be no functional change here.

This is mostly mechanical from a custom clang-tidy check, with a lot of
manual fixups. It uncovers a lot of minor inefficiencies.

This doesn't actually modify StringRef yet, I'll do that in a follow-up.
2020-01-28 23:25:25 +01:00
Rui Ueyama 47feae5dd6 Use lld::make<T> to make TpiSource objects
In lld we rarely use std::unique_ptr but instead allocate new instances
using lld::make<T>() so that they are deallocated at the end of linking.
This patch changes existing code so that it follows the convention.

Differential Revision: https://reviews.llvm.org/D70420
2019-11-20 13:14:44 +09:00
Fangrui Song d79c3be618 [COFF] Wrap definitions in namespace lld { namespace coff {. NFC
Similar to D67323, but for COFF. Many lld/COFF/ files already use
`namespace lld { namespace coff {`. Only a few need changing.

Reviewed By: ruiu

Differential Revision: https://reviews.llvm.org/D68772

llvm-svn: 374314
2019-10-10 11:27:58 +00:00
Bob Haarman 7dc5e7a0a4 reland "[lld-link] implement -start-lib and -end-lib"
Summary:
This is a re-land of r370487 with a fix for the use-after-free bug
that rev contained.

This implements -start-lib and -end-lib flags for lld-link, analogous
to the similarly named options in ld.lld. Object files after
-start-lib are included in the link only when needed to resolve
undefined symbols. The -end-lib flag goes back to the normal behavior
of always including object files in the link. This mimics the
semantics of static libraries, but without needing to actually create
the archive file.

Reviewers: ruiu, smeenai, MaskRay

Reviewed By: ruiu, MaskRay

Subscribers: akhuang, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66848

llvm-svn: 370816
2019-09-03 20:32:16 +00:00
Vlad Tsyrklevich 802aab5de8 Revert "[lld-link] implement -start-lib and -end-lib"
This reverts commit r370487 as it is causing ASan/MSan failures on
sanitizer-x86_64-linux-fast

llvm-svn: 370550
2019-08-30 23:24:41 +00:00
Bob Haarman fd7569c8e3 [lld-link] implement -start-lib and -end-lib
Summary:
This implements -start-lib and -end-lib flags for lld-link, analogous
to the similarly named options in ld.lld. Object files after
-start-lib are included in the link only when needed to resolve
undefined symbols. The -end-lib flag goes back to the normal behavior
of always including object files in the link. This mimics the
semantics of static libraries, but without needing to actually create
the archive file.

Reviewers: ruiu, smeenai, MaskRay

Reviewed By: ruiu, MaskRay

Subscribers: akhuang, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66848

llvm-svn: 370487
2019-08-30 16:50:10 +00:00
Rui Ueyama 136d27ab4d [Coding style change][lld] Rename variables for non-ELF ports
This patch does the same thing as r365595 to the other subdirectories.

With this, the naming style conversion is complete for the entire lld directory.

Differential Revision: https://reviews.llvm.org/D64473

llvm-svn: 365730
2019-07-11 05:40:30 +00:00
Alexandre Ganea 4b7bdcd318 [LLD][COFF] Don't take into account the 'age' when looking for PDB type server
The age field is only there to say how many times an OBJ or a PDB was incrementally linked. It shouldn't be used to validate the link between the OBJ and the PDB.

Differential Revision: https://reviews.llvm.org/D62837

llvm-svn: 362572
2019-06-05 02:01:43 +00:00
Alexandre Ganea 9c78db6005 Re-land [LLD][COFF] Early load PDB type server files
We need to have all input files ready before doing debug info type merging.
This patch moves the late PDB type server discovery much earlier in the process, to when the explicit inputs (OBJs, LIBs) are loaded.
The short-term goal is to parallelize type merging.

Differential Revision: https://reviews.llvm.org/D60095

llvm-svn: 362393
2019-06-03 12:39:47 +00:00
Alexandre Ganea ccc1fa5e1d Revert r361842 as it breaks LLDB :: tools/lldb-mi/exec/exec-finish.test
llvm-svn: 361876
2019-05-28 20:57:56 +00:00
Alexandre Ganea ebe22a1774 [LLD][COFF] Early load PDB type server files
We need to have all input files ready before doing debug info type merging.
This patch moves the late PDB type server discovery much earlier in the process, to when the explicit inputs (OBJs, LIBs) are loaded.
The short-term goal is to parallelize type merging.

Differential Revision: https://reviews.llvm.org/D60095

llvm-svn: 361842
2019-05-28 15:35:23 +00:00
Matthew Voss 3c023420d1 [NFC][LLD] Specify namespaces explicity to fix build failure on GCC 5 after r357383
llvm-svn: 357421
2019-04-01 19:23:56 +00:00
Alexandre Ganea 30c2f20e55 Fix builder.
http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fuzzer/builds/24702/steps/check-fuzzer/logs/stdio

llvm-svn: 357391
2019-04-01 14:37:36 +00:00
Alexandre Ganea bf55c4e3e3 [LLD][COFF] Early dependency detection
We introduce a new class hierarchy for debug types merging (in DebugTypes.h). The end-goal is to parallelize the type merging - please see the plan in D59226.

Previously, dependency discovery was done on the fly, much later, during the type merging loop. Unfortunately, parallelizing the type merging requires the dependencies to be merged first, before any dependent ObjFile, hence this early discovery.

The overall intention for this path is to discover debug information dependencies at a much earlier stage, when processing input files. Currently, two types of dependency are supported: PDB type servers (when compiling with MSVC /Zi) and precompiled headers OBJs (when compiling with MSVC /Yc and /Yu). Once discovered, an explicit link is added into the dependent ObjFile, through the new debug types class hierarchy introduced in DebugTypes.h.

Differential Revision: https://reviews.llvm.org/D59053

llvm-svn: 357383
2019-04-01 13:36:59 +00:00