Commit Graph

1041 Commits

Author SHA1 Message Date
Daniel Bertalan 0836fc395f [NFC][lld] Fix typos to test commit access 2022-06-24 00:19:18 +02:00
Nico Weber 7cb49996f7 [lld] Remove lld/include/lld/Core
This is all dead code that we forgot to delete in
https://reviews.llvm.org/D114842

Differential Revision: https://reviews.llvm.org/D128147
2022-06-19 21:37:13 -04:00
Keith Smiley 7d57c69826
[lld-macho] Add support for -w
This flag suppresses warnings produced by the linker. In ld64 this has
an interesting interaction with -fatal_warnings, it silences the
warnings but the link still fails. Instead of doing that here we still
print the warning and eagerly fail the link in case both are passed,
this seems more reasonable so users can understand why the link fails.

Differential Revision: https://reviews.llvm.org/D127564
2022-06-11 17:38:50 -07:00
Fangrui Song 941f06282a [lld] Make error handling functions opaque
The inline `lld::error` expands to two function calls `errorHandler` and `error`
where the latter is opaque. Move the functions to .cpp files to decrease code
size.

My x86-64 lld executable is 9KiB smaller.

Reviewed By: #lld-macho, thakis

Differential Revision: https://reviews.llvm.org/D120002
2022-02-17 11:54:57 -08:00
Jez Ng 69297cf639 [lld-macho] Don't include CommandFlags.h in CommonLinkerContext.h
Main motivation: including `llvm/CodeGen/CommandFlags.h` in
`CommonLinkerContext.h` means that the declaration of `llvm::Reloc` is
visible in any file that includes `CommonLinkerContext.h`. Since our
cpp files have both `using namespace llvm` and `using namespace
lld::macho`, this results in conflicts with `lld::macho::Reloc`.

I suppose we could put `llvm::Reloc` into a nested namespace, but in general,
I think we should avoid transitively including too many header files in
a very widely used header like `CommonLinkerContext.h`.

RegisterCodeGenFlags' ctor initializes a bunch of function-`static`
structures and does nothing else, so it should be fine to "initialize"
it as a temporary stack variable rather than as a file static.

Reviewed By: aganea

Differential Revision: https://reviews.llvm.org/D119913
2022-02-16 20:05:07 -05:00
Krzysztof Drewniak 1ce314ce6b [MLIR][GPU][lld] Use LLD bundled in ROCm, removing workaround
Having clarified that executing the SerializeToHsaco pass can
depend on a ROCm installation, switch from calling lld as a library to
using the copy of lld guaranteed to be included in a ROCm install.

This removes the workaround introduced in D119277

Reviewed By: whchung

Differential Revision: https://reviews.llvm.org/D119463
2022-02-10 19:37:30 +00:00
Alexandre Ganea 1e661e583d [MLIR] Temporary workaround for calling the LLD ELF driver as-a-lib
This fixes the situation described in https://github.com/llvm/llvm-project/issues/53475 with a repro exposed by https://github.com/ROCmSoftwarePlatform/D108850-lld-bug-reproduction

This is purposely just a workaround to unblock users. This could be transplanted to the release/14.x branch if need be. A proper fix will later be provided in https://reviews.llvm.org/D119049.

Differential Revision: https://reviews.llvm.org/D119277
2022-02-08 19:12:15 -05:00
Alexandre Ganea 83d59e05b2 Re-land [LLD] Remove global state in lldCommon
Move all variables at file-scope or function-static-scope into a hosting structure (lld::CommonLinkerContext) that lives at lldMain()-scope. Drivers will inherit from this structure and add their own global state, in the same way as for the existing COFFLinkerContext.

See discussion in https://lists.llvm.org/pipermail/llvm-dev/2021-June/151184.html

The previous land f860fe3622 caused issues in https://lab.llvm.org/buildbot/#/builders/123/builds/8383, fixed by 22ee510dac.

Differential Revision: https://reviews.llvm.org/D108850
2022-01-20 14:53:26 -05:00
Alexandre Ganea e6b153947d Revert [LLD] Remove global state in lldCommon
It seems to be causing issues on https://lab.llvm.org/buildbot/#/builders/123/builds/8383
2022-01-16 11:03:06 -05:00
Alexandre Ganea 30a4020a7d [LLD] Supplement with more comments. Clarify the intention in f860fe3622. 2022-01-16 09:17:39 -05:00
Alexandre Ganea f860fe3622 [LLD] Remove global state in lldCommon
Move all variables at file-scope or function-static-scope into a hosting structure (lld::CommonLinkerContext) that lives at lldMain()-scope. Drivers will inherit from this structure and add their own global state, in the same way as for the existing COFFLinkerContext.

See discussion in https://lists.llvm.org/pipermail/llvm-dev/2021-June/151184.html

Differential Revision: https://reviews.llvm.org/D108850
2022-01-16 08:57:57 -05:00
Nico Weber 085f078307 Revert "Revert D109159 "[amdgpu] Enable selection of `s_cselect_b64`.""
This reverts commit 859ebca744.
The change contained many unrelated changes and e.g. restored
unit test failes for the old lld port.
2022-01-05 13:10:25 -05:00
David Salinas 859ebca744 Revert D109159 "[amdgpu] Enable selection of `s_cselect_b64`."
This reverts commit 640beb38e7.

That commit caused performance degradtion in Quicksilver test QS:sGPU and a functional test failure in (rocPRIM rocprim.device_segmented_radix_sort).
Reverting until we have a better solution to s_cselect_b64 codegen cleanup

Change-Id: Ibf8e397df94001f248fba609f072088a46abae08

Reviewed By: kzhuravl

Differential Revision: https://reviews.llvm.org/D115960

Change-Id: Id169459ce4dfffa857d5645a0af50b0063ce1105
2022-01-05 17:57:32 +00:00
Luís Ferreira 10e40a4ea3 [lld] Add support for other demanglers other than Itanium
LLVM core library supports demangling other mangled symbols other than itanium,
such as D and Rust. LLD should use those demanglers in order to output pretty
demangled symbols on error messages.

Reviewed By: MaskRay, #lld-macho

Differential Revision: https://reviews.llvm.org/D116279
2022-01-05 03:25:41 +00:00
Luís Ferreira 8792cd75d0 Revert "[lld] Add support for other demanglers other than Itanium"
This reverts commit e60d6dfd5a.

clang-ppc64le-rhel buildbot failed (https://lab.llvm.org/buildbot#builders/57/builds/13424):

    tools/lld/MachO/CMakeFiles/lldMachO.dir/Symbols.cpp.o: In function `lld::demangle(llvm::StringRef, bool)':
    Symbols.cpp:(.text._ZN3lld8demangleEN4llvm9StringRefEb[_ZN3lld8demangleEN4llvm9StringRefEb]+0x90): undefined reference to `llvm::demangle(std::string const&)'
2021-12-30 18:04:21 +00:00
Luís Ferreira e60d6dfd5a [lld] Add support for other demanglers other than Itanium
LLVM core library supports demangling other mangled symbols other than itanium,
such as D and Rust. LLD should use those demanglers in order to output pretty
demangled symbols on error messages.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D116279
2021-12-30 17:52:38 +00:00
Keith Smiley 9e3552523e [lld-macho] Remove old macho darwin lld
During the llvm round table it was generally agreed that the newer macho
lld implementation is feature complete enough to replace the old
implementation entirely. This will reduce confusion for new users who
aren't aware of the history.

Differential Revision: https://reviews.llvm.org/D114842
2021-12-02 11:04:49 -08:00
Nico Weber 64c1734438 [lld/mac] Write -v output to stderr
This matches ld64, and it's conceivable that projects try to read
this information off stderr for that reason.

--version keeps writing to stdout.

Differential Revision: https://reviews.llvm.org/D113020
2021-11-02 13:59:14 -04:00
Heejin Ahn 3ec1760d91 [WebAssembly] Remove WasmTagType
This removes `WasmTagType`. `WasmTagType` contained an attribute and a
signature index:
```
struct WasmTagType {
  uint8_t Attribute;
  uint32_t SigIndex;
};
```

Currently the attribute field is not used and reserved for future use,
and always 0. And that this class contains `SigIndex` as its property is
a little weird in the place, because the tag type's signature index is
not an inherent property of a tag but rather a reference to another
section that changes after linking. This makes tag handling in the
linker also weird that tag-related methods are taking both `WasmTagType`
and `WasmSignature` even though `WasmTagType` contains a signature
index. This is because the signature index changes in linking so it
doesn't have any info at this point. This instead moves `SigIndex` to
`struct WasmTag` itself, as we did for `struct WasmFunction` in D111104.

In this CL, in lib/MC and lib/Object, this now treats tag types in the
same way as function types. Also in YAML, this removes `struct Tag`,
because now it only contains the tag index. Also tags set `SigIndex` in
`WasmImport` union, as functions do.

I think this makes things simpler and makes tag handling more in line
with function handling. These two shares similar properties in that both
of them have signatures, but they are kind of nominal so having the same
signature doesn't mean they are the same element.

Also a drive-by fix: the reserved 'attirubute' part's encoding changed
from uleb32 to uint8 a while ago. This was fixed in lib/MC and
lib/Object but not in YAML. This doesn't change object files because the
field's value is always 0 and its encoding is the same for the both
encoding.

This is effectively NFC; I didn't mark it as such just because it
changed YAML test results.

Reviewed By: sbc100, tlively

Differential Revision: https://reviews.llvm.org/D111086
2021-10-05 17:11:22 -07:00
Amy Huang 6f7483b1ec Reland "[LLD] Remove global state in lld/COFF" after fixing asan and msan test failures
Original commit description:

  [LLD] Remove global state in lld/COFF

  This patch removes globals from the lldCOFF library, by moving globals
  into a context class (COFFLinkingContext) and passing it around wherever
  it's needed.

  See https://lists.llvm.org/pipermail/llvm-dev/2021-June/151184.html for
  context about removing globals from LLD.

  I also haven't moved the `driver` or `config` variables yet.

  Differential Revision: https://reviews.llvm.org/D109634

This reverts commit a2fd05ada9.

Original commits were b4fa71eed3
and e03c7e367a.
2021-09-17 17:18:42 -07:00
Amy Huang a2fd05ada9 Temporarily revert "[LLD] Remove global state in lld/COFF" and "[lld] Add test to
check for timer output"

Seems to be causing a number of asan test failures.

This reverts commit b4fa71eed3
and e03c7e367a.
2021-09-16 11:58:11 -07:00
Amy Huang b4fa71eed3 [LLD] Remove global state in lld/COFF
This patch removes globals from the lldCOFF library, by moving globals
into a context class (COFFLinkingContext) and passing it around wherever
it's needed.

See https://lists.llvm.org/pipermail/llvm-dev/2021-June/151184.html for
context about removing globals from LLD.

I also haven't moved the `driver` or `config` variables yet.

Differential Revision: https://reviews.llvm.org/D109634
2021-09-16 11:00:23 -07:00
Fangrui Song 0db402c5b4 [lld] Buffer writes when composing a single diagnostic
llvm::errs() is unbuffered. On a POSIX platform, composing a diagnostic
string may invoke the ::write syscall multiple times, which can be slow.
Buffer writes to a temporary SmallString when composing a single diagnostic to
reduce the number of ::write syscalls to one (also easier to read under
strace/truss).

For an invocation of ld.lld with 62000+ lines of
`ld.lld: warning: symbol ordering file: no such symbol: ` warnings (D87121),
the buffering decreases the write time from 1s to 0.4s (for /dev/tty) and
from 0.4s to 0.1s (for a tmpfs file). This can speed up
`relocation R_X86_64_PC32 out of range` diagnostic printing as well
with `--noinhibit-exec --no-fatal-warnings`.

Reviewed By: jhenderson

Differential Revision: https://reviews.llvm.org/D87272
2021-09-09 09:27:14 -07:00
Fangrui Song 323b9bf862 [lld] Replace LLVM_ATTRIBUTE_NORETURN with [[noreturn]]
[[noreturn]] can be used since 2016 when the minimum compiler requirement was bumped to GCC 4.8/MSVC 2015.
2021-07-27 18:51:17 -07:00
Heejin Ahn 1d891d44f3 [WebAssembly] Rename event to tag
We recently decided to change 'event' to 'tag', and 'event section' to
'tag section', out of the rationale that the section contains a
generalized tag that references a type, which may be used for something
other than exceptions, and the name 'event' can be confusing in the web
context.

See
- https://github.com/WebAssembly/exception-handling/issues/159#issuecomment-857910130
- https://github.com/WebAssembly/exception-handling/pull/161

Reviewed By: tlively

Differential Revision: https://reviews.llvm.org/D104423
2021-06-17 20:34:19 -07:00
Nikita Popov d93b678abb [lld] Add missing includes (NFC)
Fix lld build after 983565a6fe.
2021-06-03 18:55:18 +02:00
Yang Fan 062d4ddd22
[lld] Add missing header guard (NFC) 2021-04-02 11:12:23 +08:00
Jez Ng 9b6dde8af8 [lld-macho] Parallelize UUID hash computation
This reuses the approach (and some code) from LLD-ELF.

It's a decent win when linking chromium_framework on a Mac Pro (3.2 GHz 16-Core Intel Xeon W):

      N           Min           Max        Median           Avg        Stddev
  x  20          4.58          4.83          4.66        4.6685   0.066591844
  +  20          4.42          4.61           4.5         4.505    0.04751731
  Difference at 95.0% confidence
          -0.1635 +/- 0.0370242
          -3.5022% +/- 0.793064%
          (Student's t, pooled s = 0.0578462)

The output binary is 381MB.

Reviewed By: #lld-macho, oontvoo

Differential Revision: https://reviews.llvm.org/D99279
2021-03-31 15:48:36 -04:00
Nico Weber cf59ffbfe3 fix comment typo to cycle bots 2021-02-17 11:49:23 -05:00
Andy Wingo a56e57493b [lld][WebAssembly] Common superclass for input globals/events/tables
This commit regroups commonalities among InputGlobal, InputEvent, and
InputTable into the new InputElement.  The subclasses are defined
inline in the new InputElement.h.  NFC.

Reviewed By: sbc100

Differential Revision: https://reviews.llvm.org/D94677
2021-02-11 14:54:45 +01:00
Andy Wingo 53e3b81faa [lld][WebAssembly] Add support for handling table symbols
This commit adds table symbol support in a partial way, while still
including some special cases for the __indirect_function_table symbol.
No change in tests.

Differential Revision: https://reviews.llvm.org/D94075
2021-01-14 11:13:13 +01:00
Alexandre Ganea 45b8a741fb [LLD][COFF] When using LLD-as-a-library, always prevent re-entrance on failures
This is a follow-up for D70378 (Cover usage of LLD as a library).

While debugging an intermittent failure on a bot, I recalled this scenario which
causes the issue:

1.When executing lld/test/ELF/invalid/symtab-sh-info.s L45, we reach
  lld:🧝:Obj-File::ObjFile() which goes straight into its base ELFFileBase(),
  then ELFFileBase::init().
2.At that point fatal() is thrown in lld/ELF/InputFiles.cpp L381, leaving a
  half-initialized ObjFile instance.
3.We then end up in lld::exitLld() and since we are running with LLD_IN_TEST, we
  hapily restore the control flow to CrashRecoveryContext::RunSafely() then back
  in lld::safeLldMain().
4.Before this patch, we called errorHandler().reset() just after, and this
  attempted to reset the associated SpecificAlloc<ObjFile<ELF64LE>>. That tried
  to free the half-initialized ObjFile instance, and more precisely its
  ObjFile::dwarf member.

Sometimes that worked, sometimes it failed and was catched by the
CrashRecoveryContext. This scenario was the reason we called
errorHandler().reset() through a CrashRecoveryContext.

But in some rare cases, the above repro somehow corrupted the heap, creating a
stack overflow. When the CrashRecoveryContext's filter (that is,
__except (ExceptionFilter(GetExceptionInformation()))) tried to handle the
exception, it crashed again since the stack was exhausted -- and that took the
whole application down. That is the issue seen on the bot. Locally it happens
about 1 times out of 15.

Now this situation can happen anywhere in LLD. Since catching stack overflows is
not a reliable scenario ATM when using CrashRecoveryContext, we're now
preventing further re-entrance when such failures occur, by signaling
lld::SafeReturn::canRunAgain=false. When running with LLD_IN_TEST=2 (or above),
only one iteration will be executed, instead of two.

Differential Revision: https://reviews.llvm.org/D88348
2020-11-12 08:14:43 -05:00
serge-sans-paille 1e70ec10eb [lld] Provide a hook to customize undefined symbols error handling
This is a follow up to https://reviews.llvm.org/D87758, implementing the missing
symbol part, as done by binutils.

Differential Revision: https://reviews.llvm.org/D89687
2020-11-09 13:28:48 +01:00
serge-sans-paille cfc32267e2 Provide a hook to customize missing library error handling
Make it possible for lld users to provide a custom script that would help to
find missing libraries. A possible scenario could be:

    % clang /tmp/a.c -fuse-ld=lld -loauth -Wl,--error-handling-script=/tmp/addLibrary.py
    unable to find library -loauth
    looking for relevant packages to provides that library

        liboauth-0.9.7-4.el7.i686
        liboauth-devel-0.9.7-4.el7.i686
        liboauth-0.9.7-4.el7.x86_64
        liboauth-devel-0.9.7-4.el7.x86_64
        pix-1.6.1-3.el7.x86_64

Where addLibrary would be called with the missing library name as first argument
(in that case addLibrary.py oauth)

Differential Revision: https://reviews.llvm.org/D87758
2020-11-03 11:01:29 +01:00
Reid Kleckner 5519e4da83 Re-land "[PDB] Merge types in parallel when using ghashing"
Stored Error objects have to be checked, even if they are success
values.

This reverts commit 8d250ac3cd.
Relands commit 49b3459930655d879b2dc190ff8fe11c38a8be5f..

Original commit message:
-----------------------------------------

This makes type merging much faster (-24% on chrome.dll) when multiple
threads are available, but it slightly increases the time to link (+10%)
when /threads:1 is passed. With only one more thread, the new type
merging is faster (-11%). The output PDB should be identical to what it
was before this change.

To give an idea, here is the /time output placed side by side:
                              BEFORE    | AFTER
  Input File Reading:           956 ms  |  968 ms
  Code Layout:                  258 ms  |  190 ms
  Commit Output File:             6 ms  |    7 ms
  PDB Emission (Cumulative):   6691 ms  | 4253 ms
    Add Objects:               4341 ms  | 2927 ms
      Type Merging:            2814 ms  | 1269 ms  -55%!
      Symbol Merging:          1509 ms  | 1645 ms
    Publics Stream Layout:      111 ms  |  112 ms
    TPI Stream Layout:          764 ms  |   26 ms  trivial
    Commit to Disk:            1322 ms  | 1036 ms  -300ms
----------------------------------------- --------
Total Link Time:               8416 ms    5882 ms  -30% overall

The main source of the additional overhead in the single-threaded case
is the need to iterate all .debug$T sections up front to check which
type records should go in the IPI stream. See fillIsItemIndexFromDebugT.
With changes to the .debug$H section, we could pre-calculate this info
and eliminate the need to do this walk up front. That should restore
single-threaded performance back to what it was before this change.

This change will cause LLD to be much more parallel than it used to, and
for users who do multiple links in parallel, it could regress
performance. However, when the user is only doing one link, it's a huge
improvement. In the future, we can use NT worker threads to avoid
oversaturating the machine with work, but for now, this is such an
improvement for the single-link use case that I think we should land
this as is.

Algorithm
----------

Before this change, we essentially used a
DenseMap<GloballyHashedType, TypeIndex> to check if a type has already
been seen, and if it hasn't been seen, insert it now and use the next
available type index for it in the destination type stream. DenseMap
does not support concurrent insertion, and even if it did, the linker
must be deterministic: it cannot produce different PDBs by using
different numbers of threads. The output type stream must be in the same
order regardless of the order of hash table insertions.

In order to create a hash table that supports concurrent insertion, the
table cells must be small enough that they can be updated atomically.
The algorithm I used for updating the table using linear probing is
described in this paper, "Concurrent Hash Tables: Fast and General(?)!":
https://dl.acm.org/doi/10.1145/3309206

The GHashCell in this change is essentially a pair of 32-bit integer
indices: <sourceIndex, typeIndex>. The sourceIndex is the index of the
TpiSource object, and it represents an input type stream. The typeIndex
is the index of the type in the stream. Together, we have something like
a ragged 2D array of ghashes, which can be looked up as:
  tpiSources[tpiSrcIndex]->ghashes[typeIndex]

By using these side tables, we can omit the key data from the hash
table, and keep the table cell small. There is a cost to this: resolving
hash table collisions requires many more loads than simply looking at
the key in the same cache line as the insertion position. However, most
supported platforms should have a 64-bit CAS operation to update the
cell atomically.

To make the result of concurrent insertion deterministic, the cell
payloads must have a priority function. Defining one is pretty
straightforward: compare the two 32-bit numbers as a combined 64-bit
number. This means that types coming from inputs earlier on the command
line have a higher priority and are more likely to appear earlier in the
final PDB type stream than types from an input appearing later on the
link line.

After table insertion, the non-empty cells in the table can be copied
out of the main table and sorted by priority to determine the ordering
of the final type index stream. At this point, item and type records
must be separated, either by sorting or by splitting into two arrays,
and I chose sorting. This is why the GHashCell must contain the isItem
bit.

Once the final PDB TPI stream ordering is known, we need to compute a
mapping from source type index to PDB type index. To avoid starting over
from scratch and looking up every type again by its ghash, we save the
insertion position of every hash table insertion during the first
insertion phase. Because the table does not support rehashing, the
insertion position is stable. Using the array of insertion positions
indexed by source type index, we can replace the source type indices in
the ghash table cells with the PDB type indices.

Once the table cells have been updated to contain PDB type indices, the
mapping for each type source can be computed in parallel. Simply iterate
the list of cell positions and replace them with the PDB type index,
since the insertion positions are no longer needed.

Once we have a source to destination type index mapping for every type
source, there are no more data dependencies. We know which type records
are "unique" (not duplicates), and what their final type indices will
be. We can do the remapping in parallel, and accumulate type sizes and
type hashes in parallel by type source.

Lastly, TPI stream layout must be done serially. Accumulate all the type
records, sizes, and hashes, and add them to the PDB.

Differential Revision: https://reviews.llvm.org/D87805
2020-09-30 15:44:38 -07:00
Reid Kleckner 8d250ac3cd Revert "[PDB] Merge types in parallel when using ghashing"
This reverts commit 49b3459930.
2020-09-30 14:55:32 -07:00
Reid Kleckner 49b3459930 [PDB] Merge types in parallel when using ghashing
This makes type merging much faster (-24% on chrome.dll) when multiple
threads are available, but it slightly increases the time to link (+10%)
when /threads:1 is passed. With only one more thread, the new type
merging is faster (-11%). The output PDB should be identical to what it
was before this change.

To give an idea, here is the /time output placed side by side:
                              BEFORE    | AFTER
  Input File Reading:           956 ms  |  968 ms
  Code Layout:                  258 ms  |  190 ms
  Commit Output File:             6 ms  |    7 ms
  PDB Emission (Cumulative):   6691 ms  | 4253 ms
    Add Objects:               4341 ms  | 2927 ms
      Type Merging:            2814 ms  | 1269 ms  -55%!
      Symbol Merging:          1509 ms  | 1645 ms
    Publics Stream Layout:      111 ms  |  112 ms
    TPI Stream Layout:          764 ms  |   26 ms  trivial
    Commit to Disk:            1322 ms  | 1036 ms  -300ms
----------------------------------------- --------
Total Link Time:               8416 ms    5882 ms  -30% overall

The main source of the additional overhead in the single-threaded case
is the need to iterate all .debug$T sections up front to check which
type records should go in the IPI stream. See fillIsItemIndexFromDebugT.
With changes to the .debug$H section, we could pre-calculate this info
and eliminate the need to do this walk up front. That should restore
single-threaded performance back to what it was before this change.

This change will cause LLD to be much more parallel than it used to, and
for users who do multiple links in parallel, it could regress
performance. However, when the user is only doing one link, it's a huge
improvement. In the future, we can use NT worker threads to avoid
oversaturating the machine with work, but for now, this is such an
improvement for the single-link use case that I think we should land
this as is.

Algorithm
----------

Before this change, we essentially used a
DenseMap<GloballyHashedType, TypeIndex> to check if a type has already
been seen, and if it hasn't been seen, insert it now and use the next
available type index for it in the destination type stream. DenseMap
does not support concurrent insertion, and even if it did, the linker
must be deterministic: it cannot produce different PDBs by using
different numbers of threads. The output type stream must be in the same
order regardless of the order of hash table insertions.

In order to create a hash table that supports concurrent insertion, the
table cells must be small enough that they can be updated atomically.
The algorithm I used for updating the table using linear probing is
described in this paper, "Concurrent Hash Tables: Fast and General(?)!":
https://dl.acm.org/doi/10.1145/3309206

The GHashCell in this change is essentially a pair of 32-bit integer
indices: <sourceIndex, typeIndex>. The sourceIndex is the index of the
TpiSource object, and it represents an input type stream. The typeIndex
is the index of the type in the stream. Together, we have something like
a ragged 2D array of ghashes, which can be looked up as:
  tpiSources[tpiSrcIndex]->ghashes[typeIndex]

By using these side tables, we can omit the key data from the hash
table, and keep the table cell small. There is a cost to this: resolving
hash table collisions requires many more loads than simply looking at
the key in the same cache line as the insertion position. However, most
supported platforms should have a 64-bit CAS operation to update the
cell atomically.

To make the result of concurrent insertion deterministic, the cell
payloads must have a priority function. Defining one is pretty
straightforward: compare the two 32-bit numbers as a combined 64-bit
number. This means that types coming from inputs earlier on the command
line have a higher priority and are more likely to appear earlier in the
final PDB type stream than types from an input appearing later on the
link line.

After table insertion, the non-empty cells in the table can be copied
out of the main table and sorted by priority to determine the ordering
of the final type index stream. At this point, item and type records
must be separated, either by sorting or by splitting into two arrays,
and I chose sorting. This is why the GHashCell must contain the isItem
bit.

Once the final PDB TPI stream ordering is known, we need to compute a
mapping from source type index to PDB type index. To avoid starting over
from scratch and looking up every type again by its ghash, we save the
insertion position of every hash table insertion during the first
insertion phase. Because the table does not support rehashing, the
insertion position is stable. Using the array of insertion positions
indexed by source type index, we can replace the source type indices in
the ghash table cells with the PDB type indices.

Once the table cells have been updated to contain PDB type indices, the
mapping for each type source can be computed in parallel. Simply iterate
the list of cell positions and replace them with the PDB type index,
since the insertion positions are no longer needed.

Once we have a source to destination type index mapping for every type
source, there are no more data dependencies. We know which type records
are "unique" (not duplicates), and what their final type indices will
be. We can do the remapping in parallel, and accumulate type sizes and
type hashes in parallel by type source.

Lastly, TPI stream layout must be done serially. Accumulate all the type
records, sizes, and hashes, and add them to the PDB.

Differential Revision: https://reviews.llvm.org/D87805
2020-09-30 14:22:48 -07:00
Alexandre Ganea f2efb5742c [LLD][COFF] Cover usage of LLD-as-a-library in tests
In lit tests, we run each LLD invocation twice (LLD_IN_TEST=2), without shutting down the process in-between. This ensures a full cleanup is properly done between runs.
Only active for the COFF driver for now. Other drivers still use LLD_IN_TEST=1 which executes just one iteration with full cleanup, like before.
When the environment variable LLD_IN_TEST is unset, a shortcut is taken, only one iteration is executed, no cleanup for faster exit, like before.
A public API, lld::safeLldMain(), is also available when using LLD as a library.

Differential Revision: https://reviews.llvm.org/D70378
2020-09-24 15:07:50 -04:00
Andrew Ng 77152a6b7a [LLD][ELF] Optimize linker script filename glob pattern matching NFC
Optimize the filename glob pattern matching in
LinkerScript::computeInputSections() and LinkerScript::shouldKeep().

Add InputFile::getNameForScript() which gets and if required caches the
Inputfile's name used for linker script matching. This avoids the
overhead of name creation that was in getFilename() in LinkerScript.cpp.

Add InputSectionDescription::matchesFile() and
SectionPattern::excludesFile() which perform the glob pattern matching
for an InputFile and make use of a cache of the previous result. As both
computeInputSections() and shouldKeep() process sections in order and
the sections of the same InputFile are contiguous, these single entry
caches can significantly speed up performance for more complex glob
patterns.

These changes have been seen to reduce link time with --gc-sections by
up to ~40% with linker scripts that contain KEEP filename glob patterns
such as "*crtbegin*.o".

Differential Revision: https://reviews.llvm.org/D87469
2020-09-16 10:26:11 +01:00
Jez Ng 22e6648a18 [lld-macho] Implement -headerpad
Tools like `install_name_tool` and `codesign` may modify the Mach-O
header and increase its size. The linker has to provide padding to make this
possible. This diff does that, plus sets its default value to 32 bytes (which
is what ld64 does).

Unlike ld64, however, we lay out our sections *exactly* `-headerpad` bytes from
the header, whereas ld64 just treats the padding requirement as a lower bound.
ld64 actually starts laying out the non-header sections in the __TEXT segment
from the end of the (page-aligned) segment rather than the front, so its
binaries typically have more than `-headerpad` bytes of actual padding.
We should consider implementing the same alignment behavior.

Reviewed By: #lld-macho, compnerd

Differential Revision: https://reviews.llvm.org/D84714
2020-07-30 14:29:31 -07:00
Eric Christopher 058ec20677 [lld] As part of using inclusive language within the llvm
project, migrate away from the use of blacklist and whitelist.
2020-06-19 21:50:14 -07:00
Reid Kleckner 3eb16fe4e9 [LLD] Have only one SpecificAllocator per type
Previously, the SpecificAllocator was a static local in the `make<T>`
function template. Using static locals is nice because they are only
constructed and registered if they are accessed. However, if there are
multiple calls to make<> with different constructor parameters, we would
get multiple static local variable instances. This is undesirable and
leads to extra memory allocations. I noticed there were two sources of
DefinedRegular allocations while checking heap profiles.
2020-06-02 14:09:09 -07:00
Reid Kleckner 3508c1d8fb [LLD] Make scoped timers thread safe
Summary:
This is a pre-requisite to parallelizing PDB symbol and type merging.
Currently this timer usage would not be thread safe.

Reviewers: aganea, MaskRay

Subscribers: jfb, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D80298
2020-05-20 16:16:08 -07:00
Reid Kleckner 932f0276ea [Support] Move LLD's parallel algorithm wrappers to support
Essentially takes the lld/Common/Threads.h wrappers and moves them to
the llvm/Support/Paralle.h algorithm header.

The changes are:
- Remove policy parameter, since all clients use `par`.
- Rename the methods to `parallelSort` etc to match LLVM style, since
  they are no longer C++17 pstl compatible.
- Move algorithms from llvm::parallel:: to llvm::, since they have
  "parallel" in the name and are no longer overloads of the regular
  algorithms.
- Add range overloads
- Use the sequential algorithm directly when 1 thread is requested
  (skips task grouping)
- Fix the index type of parallelForEachN to size_t. Nobody in LLVM was
  using any other parameter, and it made overload resolution hard for
  for_each_n(par, 0, foo.size(), ...) because 0 is int, not size_t.

Remove Threads.h and update LLD for that.

This is a prerequisite for parallel public symbol processing in the PDB
library, which is in LLVM.

Reviewed By: MaskRay, aganea

Differential Revision: https://reviews.llvm.org/D79390
2020-05-05 15:21:05 -07:00
Fangrui Song 6acd300375 Reland D75382 "[lld] Initial commit for new Mach-O backend"
With a fix for http://lab.llvm.org:8011/builders/clang-cmake-armv8-lld/builds/3636

Also trims some unneeded dependencies.
2020-04-02 12:03:43 -07:00
Oliver Stannard af39151f3c Revert "[lld] Initial commit for new Mach-O backend"
This is causing buildbot failures on 32-bit hosts, for example:
http://lab.llvm.org:8011/builders/clang-cmake-armv8-lld/builds/3636

This reverts commit 03f43b3aca.
2020-04-02 13:23:30 +01:00
Kazuaki Ishizaki 7c5fcb3591 [lld] NFC: fix trivial typos in comments
Differential Revision: https://reviews.llvm.org/D72339
2020-04-02 01:21:36 +09:00
Jez Ng 03f43b3aca [lld] Initial commit for new Mach-O backend
Summary:
This is the first commit for the new Mach-O backend, designed to roughly
follow the architecture of the existing ELF and COFF backends, and
building off work that @ruiu and @pcc did in a branch a while back. Note
that this is a very stripped-down commit with the bare minimum of
functionality for ease of review. We'll be following up with more diffs
soon.

Currently, we're able to generate a simple "Hello World!" executable
that runs on OS X Catalina (and possibly on earlier OS X versions; I
haven't tested them). (This executable can be obtained by compiling
`test/MachO/relocations.s`.) We're mocking out a few load commands to
achieve this -- for example, we can't load dynamic libraries, but
Catalina requires binaries to be linked against `dyld`, so we hardcode
the emission of a `LC_LOAD_DYLIB` command. Other mocked out load
commands include LC_SYMTAB and LC_DYSYMTAB.

Differential Revision: https://reviews.llvm.org/D75382
2020-03-31 11:58:47 -07:00
Fangrui Song eb4663d8c6 [lld][COFF][ELF][WebAssembly] Replace --[no-]threads /threads[:no] with --threads={1,2,...} /threads:{1,2,...}
--no-threads is a name copied from gold.
gold has --no-thread, --thread-count and several other --thread-count-*.

There are needs to customize the number of threads (running several lld
processes concurrently or customizing the number of LTO threads).
Having a single --threads=N is a straightforward replacement of gold's
--no-threads + --thread-count.

--no-threads is used rarely. So just delete --no-threads instead of
keeping it for compatibility for a while.

If --threads= is specified (ELF,wasm; COFF /threads: is similar),
--thinlto-jobs= defaults to --threads=,
otherwise all available hardware threads are used.

There is currently no way to override a --threads={1,2,...}. It is still
a debate whether we should use --threads=all.

Reviewed By: rnk, aganea

Differential Revision: https://reviews.llvm.org/D76885
2020-03-31 08:46:12 -07:00
Alexey Lapshin dcf6494abe LLD already has a mechanism for caching creation of DWARCContext:
llvm::call_once(initDwarfLine, [this]() { initializeDwarf(); });

Though it is not used in all places.

I need that patch for implementing "Remove obsolete debug info" feature
(D74169). But this caching mechanism is useful by itself, and I think it
would be good to use it without connection to "Remove obsolete debug info"
feature. So this patch changes inplace creation of DWARFContext with
its cached version.

Depends on D74308

Reviewed By: ruiu

Differential Revision: https://reviews.llvm.org/D74773
2020-03-06 21:17:07 +03:00