Commit Graph

480 Commits

Author SHA1 Message Date
Martin Storsjö 3c6f8ca7c9 [lld] Rename StringRef _lower() method calls to _insensitive() 2021-06-25 00:22:01 +03:00
Reid Kleckner 8d84751ac4 Revert "[LLD] [COFF] Avoid doing repeated fuzzy symbol lookup for each iteration. NFC."
This reverts commit e1adf90826.

This appears to affect the way that C++ mangled symbols appear in the
import library when using a .def file that names a C++ free function
with no name decoration. I will follow up with a reduced test case
shortly.
2021-06-22 11:35:14 -07:00
Martin Storsjö e1adf90826 [LLD] [COFF] Avoid doing repeated fuzzy symbol lookup for each iteration. NFC.
This is run every time around in the main linker loop. Once a match
has been found, stop trying to rematch such a symbol.

Not sure if this has any actual measurable performance impact though
(SymbolTable::findMangle() iterates over the whole symbol table for
each call and does fuzzy matching on top of that) but this makes the
code more reassuring to read at least. (This is in practice run for def
files listing undecorated stdcall functions to be exported.)

Differential Revision: https://reviews.llvm.org/D104529
2021-06-19 22:32:37 +03:00
Reid Kleckner 109aac9212 [PDB] Enable parallel ghash type merging by default
Ghashing is probably going to be faster in most cases, even without
precomputed ghashes in object files.

Here is my table of results linking clang.pdb:

-------------------------------
| threads | GHASH   | NOGHASH |
-------------------------------
|  j1     | 51.031s | 25.141s |
|  j2     | 31.079s | 22.109s |
|  j4     | 18.609s | 23.156s |
|  j8     | 11.938s | 21.984s |
| j28     |  8.375s | 18.391s |
-------------------------------

This shows that ghashing is faster if at least four cores are available.
This may make the linker slower if most cores are busy in the middle of
a build, but in that case, the linker probably isn't on the critical
path of the build. Incremental build performance is arguably more
important than highly contended batch build link performance.

The -time output indicates that ghash computation is the dominant
factor:

    Input File Reading:             924 ms (  1.8%)
    GC:                             689 ms (  1.3%)
    ICF:                            527 ms (  1.0%)
    Code Layout:                    414 ms (  0.8%)
    Commit Output File:              24 ms (  0.0%)
    PDB Emission (Cumulative):    49938 ms ( 94.8%)
      Add Objects:                46783 ms ( 88.8%)
        Global Type Hashing:      38983 ms ( 74.0%)
        GHash Type Merging:        5640 ms ( 10.7%)
        Symbol Merging:            2154 ms (  4.1%)
      Publics Stream Layout:        188 ms (  0.4%)
      TPI Stream Layout:             18 ms (  0.0%)
      Commit to Disk:              2818 ms (  5.4%)
  --------------------------------------------------
  Total Link Time:                52669 ms (100.0%)

We can speed that up with a faster content hash (not SHA1).

Differential Revision: https://reviews.llvm.org/D102888
2021-05-27 14:19:36 -07:00
Martin Storsjö 33b71ec9c6 [LLD] [COFF] Fix automatic export of symbols from LTO objects
Differential Revision: https://reviews.llvm.org/D101569
2021-05-21 00:36:58 +03:00
Martin Storsjö 7e0768329c [LLD] [COFF] Fix including the personality function for DWARF EH when linking with --gc-sections
Since c579a5b1d9 we don't traverse
.eh_frame when doing GC. But the exception handling personality
function needs to be included, and is only referenced from within
.eh_frame.

Differential Revision: https://reviews.llvm.org/D102138
2021-05-12 22:23:01 +03:00
Alex Reinking 7ac3fcc526 Allow /STACK in #pragma comment(linker, ...)
The Halide project uses `#pragma comment(linker, "/STACK:...")` to set
the stack size high enough for our embedded compiler to run in end-user
programs on Windows.

Unfortunately, lld-link.exe breaks on this when embedded in a COFF
object, despite supporting the flag on the command line. MSVC's link.exe
supports this fine. This patch extends support for this to lld-link.exe
for better compatibility with MSVC projects.

Differential Revision: https://reviews.llvm.org/D99680
2021-05-05 16:00:33 -07:00
Martin Storsjö 82de4e0753 [LLD] [COFF] Actually include the exported comdat symbols
This is a followup to 2b01a417d7ccb001ccc1185ef5fdc967c9fac8d7;
previously the RVAs of the exported symbols from comdats were left
zero.

Thanks to Kleis Auke Wolthuizen for the fix suggestion and pointing
out the omission.

Differential Revision: https://reviews.llvm.org/D101615
2021-05-04 22:13:08 +03:00
Pengfei Wang 184377da5c [LLD] Implement /guard:[no]ehcont
Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D99078
2021-04-14 15:06:49 +08:00
Abhina Sreeskantharajan c83cd8feef [NFC] Reordering parameters in getFile and getFileOrSTDIN
In future patches I will be setting the IsText parameter frequently so I will refactor the args to be in the following order. I have removed the FileSize parameter because it is never used.

```
  static ErrorOr<std::unique_ptr<MemoryBuffer>>
  getFile(const Twine &Filename, bool IsText = false,
          bool RequiresNullTerminator = true, bool IsVolatile = false);

  static ErrorOr<std::unique_ptr<MemoryBuffer>>
  getFileOrSTDIN(const Twine &Filename, bool IsText = false,
                 bool RequiresNullTerminator = true);

 static ErrorOr<std::unique_ptr<MB>>
 getFileAux(const Twine &Filename, uint64_t MapSize, uint64_t Offset,
            bool IsText, bool RequiresNullTerminator, bool IsVolatile);

  static ErrorOr<std::unique_ptr<WritableMemoryBuffer>>
  getFile(const Twine &Filename, bool IsVolatile = false);
```

Reviewed By: jhenderson

Differential Revision: https://reviews.llvm.org/D99182
2021-03-25 09:47:49 -04:00
Yolanda Chen 4f9c61ef72 [lld] add context-sensitive PGO options for COFF.
Add lld CSPGO (Contex-Sensitive PGO) options for COFF target.

Reference the ELF options from https://reviews.llvm.org/D56675

Reviewed By: tejohnson

Differential Revision: https://reviews.llvm.org/D98763
2021-03-24 23:40:09 -07:00
Zequan Wu 5bdc5e7efd [lld-link] Add safe icf mode to lld-link, which does safe icf for all sections.
Differential Revision: https://reviews.llvm.org/D97436
2021-03-03 14:52:33 -08:00
Martin Storsjö 075539ddf6 [LLD] [COFF] Allow invoking lib.exe mode via -lib in addition to /lib
Remove a stray -lib argument in guardcf-lto.ll; llvm-lib doesn't
support generating import libs from a def file unlike lib.exe.
Previously this worked because the -lib argument was ignored
(printing only a warning).

Differential Revision: https://reviews.llvm.org/D96699
2021-02-24 11:16:12 +02:00
Nico Weber e6d1f261a5 [lld-link] Add /reproduce: support for several flags
/reproduce: now works correctly with:
- /call-graph-ordering-file:
- /def:
- /natvis:
- /order:
- /pdbstream:

I went through all instances of MemoryBuffer::getFile() and made sure
everything that didn't already do so called takeBuffer().

For natvis, that wasn't possible since DebugInfo/PDB wants to take
owernship of the natvis buffer. For that case, I'm manually adding the
tar file entry.

/natvis: and /pdbstream: is slightly awkward, since createResponseFile()
always adds these flags to the response file but createPDB() (which
ultimately adds the files referenced by the flags) is only called if
/debug is also passed. So when using /natvis: without /debug with
/reproduce:, lld won't warn, but when linking using the response
file from the archive, it won't find the natvis file since it's not
in the tar. This isn't a new issue though, and after this patch things
at least work with using /natvis: _with_ debug with /reproduce:.
(Same for /pdbstream:)

Differential Revison: https://reviews.llvm.org/D97212
2021-02-22 16:52:49 -05:00
Reshabh Sharma fdd6ed8e93 [LLD] Rename lld port driver entry function to a consistent name
Libraries linked to the lld elf library exposes a function named main.
When debugging code linked to such libraries and intending to set a
breakpoint at main, the debugger also sets breakpoint at the main
function at lld elf driver. The possible choice was to rename it to
link but that would again clash with lld::*::link. This patch tries
to consistently rename them to linkerMain.

Differential Revision: https://reviews.llvm.org/D91418
2020-12-18 12:18:37 +05:30
Nico Weber f710bb7063 lld: Replace some lld::outs()s with message()
No behavior change.
2020-12-17 16:19:09 -05:00
Arthur Eubanks fed7565ee2 [COFF][LTO][NPM] Use NPM for LTO with ENABLE_EXPERIMENTAL_NEW_PASS_MANAGER
Reviewed By: hans

Differential Revision: https://reviews.llvm.org/D92866
2020-12-09 08:53:50 -08:00
Nico Weber a0994cbe27 lld-link: Let LLD_REPRODUCE control /reproduce:, like in ld.lld
Also sync help texts for the option between elf and coff ports.

Decisions:
- Do this even if /lldignoreenv is passed. /reproduce: does not affect
  the main output, and this makes the env var more convenient to use.
  (On the other hand, it's now possible to set this env var and forget
  about it, and all future builds in the same shell will be much slower.
  That's true for ld.lld, but posix shells have an easy way to set an
  env var for a single command; in cmd.exe this is not possible without
  contortions. Then again, lld-link runs in posix shells too.)

Original patch rebased across D68378 and D68381.

Differential Revision: https://reviews.llvm.org/D67707
2020-11-27 13:33:55 -05:00
rojamd b79e990f40 [lld][COFF] Add command line options for LTO with new pass manager
This is more or less a port of rL329598 (D45275) to the COFF linker.
Since there were already LTO-related settings under -opt:, I added
them there instead of new flags.

Differential Revision: https://reviews.llvm.org/D90624
2020-11-05 14:41:35 -05:00
Martin Storsjö 3785a413fe Reapply [LLD] [COFF] Implement a GNU/ELF like -wrap option
Add a simple forwarding option in the MinGW frontend, and implement
the private -wrap option in the COFF linker.

The feature in lld-link isn't gated by the -lldmingw option, but
the option is left as a private, undocumented option primarily
used by the MinGW driver.

The implementation is significantly based on the support for --wrap
in the ELF linker, but many small nuance details are different
between the ELF and COFF linkers, ending up with more than a few
implementation differences.

This fixes https://bugs.llvm.org/show_bug.cgi?id=47384.

Differential Revision: https://reviews.llvm.org/D89004

Reapplied with the bitfield member canInline fixed so it doesn't break
builds targeting windows.
2020-10-15 22:14:02 +03:00
Arthur Eubanks 3d338f6813 Revert "[LLD] [COFF] Implement a GNU/ELF like -wrap option"
This reverts commit a012c704b5.

Breaks Windows builds.

C:\src\llvm-mint\lld\COFF\Symbols.cpp(26,1): error: static_assert failed due to requirement 'sizeof(lld::coff::SymbolUnion) <= 48' "symbols should be optimized for memory usage"
static_assert(sizeof(SymbolUnion) <= 48,
2020-10-15 10:27:25 -07:00
Martin Storsjö a012c704b5 [LLD] [COFF] Implement a GNU/ELF like -wrap option
Add a simple forwarding option in the MinGW frontend, and implement
the private -wrap option in the COFF linker.

The feature in lld-link isn't gated by the -lldmingw option, but
the option is left as a private, undocumented option primarily
used by the MinGW driver.

The implementation is significantly based on the support for --wrap
in the ELF linker, but many small nuance details are different
between the ELF and COFF linkers, ending up with more than a few
implementation differences.

This fixes https://bugs.llvm.org/show_bug.cgi?id=47384.

Differential Revision: https://reviews.llvm.org/D89004
2020-10-15 18:34:02 +03:00
Martin Storsjö 45c4c54003 [LLD] [COFF] Add a private option for setting the os version separately from subsystem version
The MinGW driver has separate options for OS and subsystem version.
Having this available in lld-link allows the MinGW driver to both match
GNU ld better and simplifies the code for merging two (potentially
mismatching) arguments into one.

Differential Revision: https://reviews.llvm.org/D88802
2020-10-05 23:08:01 +03:00
Reid Kleckner 5519e4da83 Re-land "[PDB] Merge types in parallel when using ghashing"
Stored Error objects have to be checked, even if they are success
values.

This reverts commit 8d250ac3cd.
Relands commit 49b3459930655d879b2dc190ff8fe11c38a8be5f..

Original commit message:
-----------------------------------------

This makes type merging much faster (-24% on chrome.dll) when multiple
threads are available, but it slightly increases the time to link (+10%)
when /threads:1 is passed. With only one more thread, the new type
merging is faster (-11%). The output PDB should be identical to what it
was before this change.

To give an idea, here is the /time output placed side by side:
                              BEFORE    | AFTER
  Input File Reading:           956 ms  |  968 ms
  Code Layout:                  258 ms  |  190 ms
  Commit Output File:             6 ms  |    7 ms
  PDB Emission (Cumulative):   6691 ms  | 4253 ms
    Add Objects:               4341 ms  | 2927 ms
      Type Merging:            2814 ms  | 1269 ms  -55%!
      Symbol Merging:          1509 ms  | 1645 ms
    Publics Stream Layout:      111 ms  |  112 ms
    TPI Stream Layout:          764 ms  |   26 ms  trivial
    Commit to Disk:            1322 ms  | 1036 ms  -300ms
----------------------------------------- --------
Total Link Time:               8416 ms    5882 ms  -30% overall

The main source of the additional overhead in the single-threaded case
is the need to iterate all .debug$T sections up front to check which
type records should go in the IPI stream. See fillIsItemIndexFromDebugT.
With changes to the .debug$H section, we could pre-calculate this info
and eliminate the need to do this walk up front. That should restore
single-threaded performance back to what it was before this change.

This change will cause LLD to be much more parallel than it used to, and
for users who do multiple links in parallel, it could regress
performance. However, when the user is only doing one link, it's a huge
improvement. In the future, we can use NT worker threads to avoid
oversaturating the machine with work, but for now, this is such an
improvement for the single-link use case that I think we should land
this as is.

Algorithm
----------

Before this change, we essentially used a
DenseMap<GloballyHashedType, TypeIndex> to check if a type has already
been seen, and if it hasn't been seen, insert it now and use the next
available type index for it in the destination type stream. DenseMap
does not support concurrent insertion, and even if it did, the linker
must be deterministic: it cannot produce different PDBs by using
different numbers of threads. The output type stream must be in the same
order regardless of the order of hash table insertions.

In order to create a hash table that supports concurrent insertion, the
table cells must be small enough that they can be updated atomically.
The algorithm I used for updating the table using linear probing is
described in this paper, "Concurrent Hash Tables: Fast and General(?)!":
https://dl.acm.org/doi/10.1145/3309206

The GHashCell in this change is essentially a pair of 32-bit integer
indices: <sourceIndex, typeIndex>. The sourceIndex is the index of the
TpiSource object, and it represents an input type stream. The typeIndex
is the index of the type in the stream. Together, we have something like
a ragged 2D array of ghashes, which can be looked up as:
  tpiSources[tpiSrcIndex]->ghashes[typeIndex]

By using these side tables, we can omit the key data from the hash
table, and keep the table cell small. There is a cost to this: resolving
hash table collisions requires many more loads than simply looking at
the key in the same cache line as the insertion position. However, most
supported platforms should have a 64-bit CAS operation to update the
cell atomically.

To make the result of concurrent insertion deterministic, the cell
payloads must have a priority function. Defining one is pretty
straightforward: compare the two 32-bit numbers as a combined 64-bit
number. This means that types coming from inputs earlier on the command
line have a higher priority and are more likely to appear earlier in the
final PDB type stream than types from an input appearing later on the
link line.

After table insertion, the non-empty cells in the table can be copied
out of the main table and sorted by priority to determine the ordering
of the final type index stream. At this point, item and type records
must be separated, either by sorting or by splitting into two arrays,
and I chose sorting. This is why the GHashCell must contain the isItem
bit.

Once the final PDB TPI stream ordering is known, we need to compute a
mapping from source type index to PDB type index. To avoid starting over
from scratch and looking up every type again by its ghash, we save the
insertion position of every hash table insertion during the first
insertion phase. Because the table does not support rehashing, the
insertion position is stable. Using the array of insertion positions
indexed by source type index, we can replace the source type indices in
the ghash table cells with the PDB type indices.

Once the table cells have been updated to contain PDB type indices, the
mapping for each type source can be computed in parallel. Simply iterate
the list of cell positions and replace them with the PDB type index,
since the insertion positions are no longer needed.

Once we have a source to destination type index mapping for every type
source, there are no more data dependencies. We know which type records
are "unique" (not duplicates), and what their final type indices will
be. We can do the remapping in parallel, and accumulate type sizes and
type hashes in parallel by type source.

Lastly, TPI stream layout must be done serially. Accumulate all the type
records, sizes, and hashes, and add them to the PDB.

Differential Revision: https://reviews.llvm.org/D87805
2020-09-30 15:44:38 -07:00
Reid Kleckner 8d250ac3cd Revert "[PDB] Merge types in parallel when using ghashing"
This reverts commit 49b3459930.
2020-09-30 14:55:32 -07:00
Reid Kleckner 49b3459930 [PDB] Merge types in parallel when using ghashing
This makes type merging much faster (-24% on chrome.dll) when multiple
threads are available, but it slightly increases the time to link (+10%)
when /threads:1 is passed. With only one more thread, the new type
merging is faster (-11%). The output PDB should be identical to what it
was before this change.

To give an idea, here is the /time output placed side by side:
                              BEFORE    | AFTER
  Input File Reading:           956 ms  |  968 ms
  Code Layout:                  258 ms  |  190 ms
  Commit Output File:             6 ms  |    7 ms
  PDB Emission (Cumulative):   6691 ms  | 4253 ms
    Add Objects:               4341 ms  | 2927 ms
      Type Merging:            2814 ms  | 1269 ms  -55%!
      Symbol Merging:          1509 ms  | 1645 ms
    Publics Stream Layout:      111 ms  |  112 ms
    TPI Stream Layout:          764 ms  |   26 ms  trivial
    Commit to Disk:            1322 ms  | 1036 ms  -300ms
----------------------------------------- --------
Total Link Time:               8416 ms    5882 ms  -30% overall

The main source of the additional overhead in the single-threaded case
is the need to iterate all .debug$T sections up front to check which
type records should go in the IPI stream. See fillIsItemIndexFromDebugT.
With changes to the .debug$H section, we could pre-calculate this info
and eliminate the need to do this walk up front. That should restore
single-threaded performance back to what it was before this change.

This change will cause LLD to be much more parallel than it used to, and
for users who do multiple links in parallel, it could regress
performance. However, when the user is only doing one link, it's a huge
improvement. In the future, we can use NT worker threads to avoid
oversaturating the machine with work, but for now, this is such an
improvement for the single-link use case that I think we should land
this as is.

Algorithm
----------

Before this change, we essentially used a
DenseMap<GloballyHashedType, TypeIndex> to check if a type has already
been seen, and if it hasn't been seen, insert it now and use the next
available type index for it in the destination type stream. DenseMap
does not support concurrent insertion, and even if it did, the linker
must be deterministic: it cannot produce different PDBs by using
different numbers of threads. The output type stream must be in the same
order regardless of the order of hash table insertions.

In order to create a hash table that supports concurrent insertion, the
table cells must be small enough that they can be updated atomically.
The algorithm I used for updating the table using linear probing is
described in this paper, "Concurrent Hash Tables: Fast and General(?)!":
https://dl.acm.org/doi/10.1145/3309206

The GHashCell in this change is essentially a pair of 32-bit integer
indices: <sourceIndex, typeIndex>. The sourceIndex is the index of the
TpiSource object, and it represents an input type stream. The typeIndex
is the index of the type in the stream. Together, we have something like
a ragged 2D array of ghashes, which can be looked up as:
  tpiSources[tpiSrcIndex]->ghashes[typeIndex]

By using these side tables, we can omit the key data from the hash
table, and keep the table cell small. There is a cost to this: resolving
hash table collisions requires many more loads than simply looking at
the key in the same cache line as the insertion position. However, most
supported platforms should have a 64-bit CAS operation to update the
cell atomically.

To make the result of concurrent insertion deterministic, the cell
payloads must have a priority function. Defining one is pretty
straightforward: compare the two 32-bit numbers as a combined 64-bit
number. This means that types coming from inputs earlier on the command
line have a higher priority and are more likely to appear earlier in the
final PDB type stream than types from an input appearing later on the
link line.

After table insertion, the non-empty cells in the table can be copied
out of the main table and sorted by priority to determine the ordering
of the final type index stream. At this point, item and type records
must be separated, either by sorting or by splitting into two arrays,
and I chose sorting. This is why the GHashCell must contain the isItem
bit.

Once the final PDB TPI stream ordering is known, we need to compute a
mapping from source type index to PDB type index. To avoid starting over
from scratch and looking up every type again by its ghash, we save the
insertion position of every hash table insertion during the first
insertion phase. Because the table does not support rehashing, the
insertion position is stable. Using the array of insertion positions
indexed by source type index, we can replace the source type indices in
the ghash table cells with the PDB type indices.

Once the table cells have been updated to contain PDB type indices, the
mapping for each type source can be computed in parallel. Simply iterate
the list of cell positions and replace them with the PDB type index,
since the insertion positions are no longer needed.

Once we have a source to destination type index mapping for every type
source, there are no more data dependencies. We know which type records
are "unique" (not duplicates), and what their final type indices will
be. We can do the remapping in parallel, and accumulate type sizes and
type hashes in parallel by type source.

Lastly, TPI stream layout must be done serially. Accumulate all the type
records, sizes, and hashes, and add them to the PDB.

Differential Revision: https://reviews.llvm.org/D87805
2020-09-30 14:22:48 -07:00
Fangrui Song 1ca6bd261e [lld] Clean up in lld::{coff,elf}::link after D70378
Library users should not need to call errorHandler().reset() explicitly.

google/iree calls lld:🧝:link and without the patch some global
variables are not cleaned up in the next invocation.
2020-09-24 18:02:45 -07:00
Alexandre Ganea f2efb5742c [LLD][COFF] Cover usage of LLD-as-a-library in tests
In lit tests, we run each LLD invocation twice (LLD_IN_TEST=2), without shutting down the process in-between. This ensures a full cleanup is properly done between runs.
Only active for the COFF driver for now. Other drivers still use LLD_IN_TEST=1 which executes just one iteration with full cleanup, like before.
When the environment variable LLD_IN_TEST is unset, a shortcut is taken, only one iteration is executed, no cleanup for faster exit, like before.
A public API, lld::safeLldMain(), is also available when using LLD as a library.

Differential Revision: https://reviews.llvm.org/D70378
2020-09-24 15:07:50 -04:00
Zequan Wu 763671f387 [COFF] Port CallGraphSort to COFF from ELF 2020-07-30 15:21:44 -07:00
Martin Storsjö 745eb02496 [LLD] [MinGW] Implement the --no-seh flag
Previously this flag was just ignored. If set, set the
IMAGE_DLL_CHARACTERISTICS_NO_SEH bit, regardless of the normal safeSEH
machinery.

In mingw configurations, the safeSEH bit might not be set in e.g. object
files built from handwritten assembly, making it impossible to use the
normal safeseh flag. As mingw setups don't generally use SEH on 32 bit
x86 at all, it should be fine to set that flag bit though - hook up
the existing GNU ld flag for controlling that.

Differential Revision: https://reviews.llvm.org/D84701
2020-07-28 21:08:37 +03:00
Reid Kleckner 3508c1d8fb [LLD] Make scoped timers thread safe
Summary:
This is a pre-requisite to parallelizing PDB symbol and type merging.
Currently this timer usage would not be thread safe.

Reviewers: aganea, MaskRay

Subscribers: jfb, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D80298
2020-05-20 16:16:08 -07:00
Reid Kleckner 54a335a2f6 [COFF] Move type merging to TpiSource::mergeDebugT virtual method
This paves the way to doing more things in parallel, and allows us to
order type sources in dependency order. PDBs and PCH objects have to be
loaded before object files which use them.

This is a rebase of the unapplied remaining changes in
https://reviews.llvm.org/D59226. I found it very challenging to rebase
this across the LLD variable name style change. I recall there was a
tool for that, but I didn't take the time to use it.

Reviewers: aganea, akhuang

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D79672
2020-05-14 09:47:00 -07:00
Martin Storsjö 7f0e6c31c2 [LLD] [COFF] Add options for disabling auto import and runtime pseudo relocs
Allow disabling either the full auto import feature, or just
forbidding the cases that require runtime fixups.

As long as all auto imported variables are referenced from separate
.refptr$<name> sections, we can alias them on top of the IAT entries
and don't actually need any runtime fixups via pseudo relocations.
LLVM generates references to variables in .refptr stubs, if it
isn't known that the variable for sure is defined in the same object
module. Runtime pseudo relocs are needed if the addresses of auto
imported variables are used in constant initializers though.

Fixing up runtime pseudo relocations requires the use of
VirtualProtect (which is disallowed in WinStore/UWP apps) or
VirtualProtectFromApp. To allow any risk of ambiguity, allow
rejecting cases that would require this at the linker stage.

This adds support for the --disable-runtime-pseudo-reloc and
--disable-auto-import options in the MinGW driver (matching GNU ld.bfd)
with corresponding lld private options in the COFF driver.

Differential Revision: https://reviews.llvm.org/D78923
2020-05-14 13:05:14 +03:00
Martin Storsjö ed0a57f753 [LLD] [COFF] Fix def file exporting of symbols containing periods
This fixes an accidental breakage of exporting symbols using def
files, when the symbol name contains a period, since commit
0ca06f7950, mixing up a symbol name containing a period with
the case of exporting a symbol as a forward to another dll.

Differential Revision: https://reviews.llvm.org/D79619
2020-05-10 23:30:14 +03:00
Reid Kleckner 932f0276ea [Support] Move LLD's parallel algorithm wrappers to support
Essentially takes the lld/Common/Threads.h wrappers and moves them to
the llvm/Support/Paralle.h algorithm header.

The changes are:
- Remove policy parameter, since all clients use `par`.
- Rename the methods to `parallelSort` etc to match LLVM style, since
  they are no longer C++17 pstl compatible.
- Move algorithms from llvm::parallel:: to llvm::, since they have
  "parallel" in the name and are no longer overloads of the regular
  algorithms.
- Add range overloads
- Use the sequential algorithm directly when 1 thread is requested
  (skips task grouping)
- Fix the index type of parallelForEachN to size_t. Nobody in LLVM was
  using any other parameter, and it made overload resolution hard for
  for_each_n(par, 0, foo.size(), ...) because 0 is int, not size_t.

Remove Threads.h and update LLD for that.

This is a prerequisite for parallel public symbol processing in the PDB
library, which is in LLVM.

Reviewed By: MaskRay, aganea

Differential Revision: https://reviews.llvm.org/D79390
2020-05-05 15:21:05 -07:00
Reid Kleckner 3542384ae9 [COFF] Use a global option table to avoid reconstructing it
Otherwise an ArgumentParser is constructed for every directive section,
and that involves copying the entire table of options into a vector.
There is no need for this, just have one option table.
2020-05-02 15:04:19 -07:00
Reid Kleckner 01b5f52140 [COFF] Add a fastpath for /INCLUDE: in .drective sections
This speeds up linking chrome.dll with PGO instrumentation by 13%
(154271ms -> 134033ms).

LLVM's Option library is very slow. In particular, it allocates at least
one large-ish heap object (Arg) for every argument. When PGO
instrumentation is enabled, all the __profd_* symbols are added to the
@llvm.used list, which compiles down to these /INCLUDE: directives. This
means we have O(#symbols) directives to parse in the section, so we end
up allocating an Arg for every function symbol in the object file. This
is unnecessary.

To address the issue and speed up the link, extend the fast path that we
already have for /EXPORT:, which has similar scaling issues.

I promise that I took a hard look at optimizing the Option library, but
its data structures are very general and would need a lot of cleanup. We
have accumulated lots of optional features (option groups, aliases,
multiple values) over the years, and these are now properties of every
parsed argument, when the vast majority of arguments do not use these
features.

Reviewed By: thakis

Differential Revision: https://reviews.llvm.org/D78845
2020-04-28 10:35:57 -07:00
Eric Astor a39b14f0b4 [ms] Add new /PDBSTREAM option to lld-link allowing injection of streams into PDB files.
Summary:
/PDBSTREAM:<name>=<file> adds the contents of <file> to stream <name> in the resulting PDB.

This allows native uses with workflows that (for example) add srcsrv streams to PDB files to provide a location for the build's source files.

Results should be equivalent to linking with lld-link, then running Microsoft's pdbstr tool with the command line:
pdbstr.exe -w -p:<PDB LOCATION> -s:<name> -i:<file>
except in cases where the named stream overlaps with a default named stream, such as "/names". In those cases, the added stream will be overridden, making the /pdbstream option a no-op.

Reviewers: thakis, rnk

Reviewed By: thakis

Differential Revision: https://reviews.llvm.org/D77310
2020-04-07 16:19:38 -04:00
Fangrui Song eb4663d8c6 [lld][COFF][ELF][WebAssembly] Replace --[no-]threads /threads[:no] with --threads={1,2,...} /threads:{1,2,...}
--no-threads is a name copied from gold.
gold has --no-thread, --thread-count and several other --thread-count-*.

There are needs to customize the number of threads (running several lld
processes concurrently or customizing the number of LTO threads).
Having a single --threads=N is a straightforward replacement of gold's
--no-threads + --thread-count.

--no-threads is used rarely. So just delete --no-threads instead of
keeping it for compatibility for a while.

If --threads= is specified (ELF,wasm; COFF /threads: is similar),
--thinlto-jobs= defaults to --threads=,
otherwise all available hardware threads are used.

There is currently no way to override a --threads={1,2,...}. It is still
a debate whether we should use --threads=all.

Reviewed By: rnk, aganea

Differential Revision: https://reviews.llvm.org/D76885
2020-03-31 08:46:12 -07:00
Alexandre Ganea 09158252f7 [ThinLTO] Allow usage of all hardware threads in the system
Before this patch, it wasn't possible to extend the ThinLTO threads to all SMT/CMT threads in the system. Only one thread per core was allowed, instructed by usage of llvm::heavyweight_hardware_concurrency() in the ThinLTO code. Any number passed to the LLD flag /opt:lldltojobs=..., or any other ThinLTO-specific flag, was previously interpreted in the context of llvm::heavyweight_hardware_concurrency(), which means SMT disabled.

One can now say in LLD:
/opt:lldltojobs=0 -- Use one std::thread / hardware core in the system (no SMT). Default value if flag not specified.
/opt:lldltojobs=N -- Limit usage to N threads, regardless of usage of heavyweight_hardware_concurrency().
/opt:lldltojobs=all -- Use all hardware threads in the system. Equivalent to /opt:lldltojobs=$(nproc) on Linux and /opt:lldltojobs=%NUMBER_OF_PROCESSORS% on Windows. When an affinity mask is set for the process, threads will be created only for the cores selected by the mask.

When N > number-of-hardware-threads-in-the-system, the threads in the thread pool will be dispatched equally on all CPU sockets (tested only on Windows).
When N <= number-of-hardware-threads-on-a-CPU-socket, the threads will remain on the CPU socket where the process started (only on Windows).

Differential Revision: https://reviews.llvm.org/D75153
2020-03-27 10:20:58 -04:00
Sylvain Audi b91905a263 [lld-link] Support /map option, matching link.exe 's /map output format
Added support for /map and /map:[filepath].
The output was derived from Microsoft's Link.exe output when using that same option.
Note that /MAPINFO support was not added.

The previous implementation of MapFile.cpp/.h was meant for /lldmap, and was renamed to LLDMapFile.cpp/.h
MapFile.cpp/.h is now for /MAP
However, a small fix was added to lldmap, replacing a std::sort with std::stable_sort to enforce reproducibility.

Differential Revision: https://reviews.llvm.org/D70557
2020-03-24 09:48:00 -04:00
Rui Ueyama a2923b2a1e Implement CET Shadow Stack (Intel Controlflow Enforcement Technology) support on Windows
Patch by Petr Penzin.

Windows support for CET is limited to shadow stack, which is enabled
by setting a PE bit in the linker.

Docs:

MSVC linker flag:
https://docs.microsoft.com/en-us/cpp/build/reference/cetcompat?view=vs-2019

IMAGE_DLLCHARACTERISTICS_EX_CET_COMPAT PE bit:
https://docs.microsoft.com/en-us/windows/win32/debug/pe-format#extended-dll-characteristics

Differential Revision: https://reviews.llvm.org/D70606
2020-03-16 17:51:32 +09:00
Jonas Devlieghere 3e24242a7d [lld] Replace SmallStr.str().str() with std::string conversion operator.
Use the std::string conversion operator introduced in
d7049213d0.
2020-01-29 21:30:21 -08:00
Benjamin Kramer adcd026838 Make llvm::StringRef to std::string conversions explicit.
This is how it should've been and brings it more in line with
std::string_view. There should be no functional change here.

This is mostly mechanical from a custom clang-tidy check, with a lot of
manual fixups. It uncovers a lot of minor inefficiencies.

This doesn't actually modify StringRef yet, I'll do that in a follow-up.
2020-01-28 23:25:25 +01:00
Martin Storsjö e6b0ce70bd [LLD] [COFF] Silence a GCC warning about an unused variable. NFC. 2020-01-23 13:23:56 +02:00
Reid Kleckner 8045a8a7f1 [COFF] Warn that LLD does not support /PDBSTRIPPED:
Doesn't really fix PR44491, but it avoids treating it as an input.
2020-01-15 15:11:19 -08:00
James Y Knight d3fec7fb45 LLD: Don't use the stderrOS stream in link before it's reassigned.
Remove the lld::enableColors function, as it just obscures which
stream it's affecting, and replace with explicit calls to the stream's
enable_colors.

Also, assign the stderrOS and stdoutOS globals first in link function,
just to ensure nothing might use them.

(Either change individually fixes the issue of using the old
stream, but both together seems best.)

Follow-up to b11386f9be.

Differential Revision: https://reviews.llvm.org/D70492
2019-11-21 10:55:03 -05:00
Rui Ueyama b11386f9be Make it possible to redirect not only errs() but also outs()
This change is for those who use lld as a library. Context:
https://reviews.llvm.org/D70287

This patch adds a new parmeter to lld::*::link() so that we can pass
an raw_ostream object representing stdout. Previously, lld::*::link()
took only an stderr object.

Justification for making stdoutOS and stderrOS mandatory: I wanted to
make link() functions to take stdout and stderr in that order.
However, if we change the function signature from

  bool link(ArrayRef<const char *> args, bool canExitEarly,
            raw_ostream &stderrOS = llvm::errs());

to

  bool link(ArrayRef<const char *> args, bool canExitEarly,
            raw_ostream &stdoutOS = llvm::outs(),
            raw_ostream &stderrOS = llvm::errs());

, then the meaning of existing code that passes stderrOS silently
changes (stderrOS would be interpreted as stdoutOS). So, I chose to
make existing code not to compile, so that developers can fix their
code.

Differential Revision: https://reviews.llvm.org/D70292
2019-11-18 11:18:06 +09:00
Reid Kleckner ce0f3ee5e4 [COFF] Don't error if the only inputs are from /wholearchive:
Fixes PR43744

Differential Revision: https://reviews.llvm.org/D69968
2019-11-15 16:09:07 -08:00
Reid Kleckner 4c1a1d3cf9 Add missing includes needed to prune LLVMContext.h include, NFC
These are a pre-requisite to removing #include "llvm/Support/Options.h"
from LLVMContext.h: https://reviews.llvm.org/D70280
2019-11-14 15:23:15 -08:00