llvm-project

Commit Graph

Author	SHA1	Message	Date
Martin Storsjö	075539ddf6	[LLD] [COFF] Allow invoking lib.exe mode via -lib in addition to /lib Remove a stray -lib argument in guardcf-lto.ll; llvm-lib doesn't support generating import libs from a def file unlike lib.exe. Previously this worked because the -lib argument was ignored (printing only a warning). Differential Revision: https://reviews.llvm.org/D96699	2021-02-24 11:16:12 +02:00
Nico Weber	e6d1f261a5	[lld-link] Add /reproduce: support for several flags /reproduce: now works correctly with: - /call-graph-ordering-file: - /def: - /natvis: - /order: - /pdbstream: I went through all instances of MemoryBuffer::getFile() and made sure everything that didn't already do so called takeBuffer(). For natvis, that wasn't possible since DebugInfo/PDB wants to take owernship of the natvis buffer. For that case, I'm manually adding the tar file entry. /natvis: and /pdbstream: is slightly awkward, since createResponseFile() always adds these flags to the response file but createPDB() (which ultimately adds the files referenced by the flags) is only called if /debug is also passed. So when using /natvis: without /debug with /reproduce:, lld won't warn, but when linking using the response file from the archive, it won't find the natvis file since it's not in the tar. This isn't a new issue though, and after this patch things at least work with using /natvis: _with_ debug with /reproduce:. (Same for /pdbstream:) Differential Revison: https://reviews.llvm.org/D97212	2021-02-22 16:52:49 -05:00
Abhina Sreeskantharajan	e59d336e75	[test] Use host platform specific error message substitution in lit tests - continued On z/OS, other error messages are not matched correctly in lit tests. ``` EDC5121I Invalid argument. EDC5111I Permission denied. ``` This patch adds a lit substitution to fix it. Reviewed By: jhenderson Differential Revision: https://reviews.llvm.org/D95808	2021-02-03 09:53:22 -05:00
Fangrui Song	3f8dda50cb	[test] Fix unuses FileCheck prefixes in lld	2021-02-01 20:52:33 -08:00
Abhina Sreeskantharajan	42a21778f6	[test] Use host platform specific error message substitution in lit tests On z/OS, the following error message is not matched correctly in lit tests. ``` EDC5129I No such file or directory. ``` This patch uses a lit config substitution to check for platform specific error messages. Reviewed By: muiez, jhenderson Differential Revision: https://reviews.llvm.org/D95246	2021-01-29 07:16:30 -05:00
Abhina Sreeskantharajan	978444d531	Revert "[SystemZ][z/OS] Fix No such file or directory expression error" This reverts commit `06f8a49693`.	2021-01-25 08:29:38 -05:00
Reid Kleckner	9e708ac6b9	[COFF] Fix relocation offsets in pdb-file-statics test input The relocation offsets were incorrect. I fixed them with llvm-readobj -codeview -codeview-subsection-bytes, which has a helpful printout of the relocations that apply to a given symbol record with their offsets. With this, I was able to update the relocation offsets in the yaml to fix the line table and the S_DEFRANGE_REGISTER records. There is still some remaining inconsistency in yaml2obj and obj2yaml when round tripping MSVC objects, but that isn't a blocker for relanding D94267.	2021-01-20 11:45:30 -08:00
Hans Wennborg	ec877106a3	[ThinLTO] Also prune Thin-* files from the ThinLTO cache Such files (Thin-%%%%%%.tmp.o) are supposed to be deleted immediately after they're used (either by renaming or deletion). However, we've seen instances on Windows where this doesn't happen, probably due to the filesystem being flaky. This is effectively a resource leak which has prevented us from using the ThinLTO cache on Windows. Since those temporary files are in the thinlto cache directory which we prune periodically anyway, allowing them to be pruned too seems like a tidy way to solve the problem. Differential revision: https://reviews.llvm.org/D94962	2021-01-19 14:43:49 +01:00
Abhina Sreeskantharajan	689aaba7ac	[SystemZ][z/OS] Fix No such file or directory expression error matching in lit tests On z/OS, the following error message is not matched correctly in lit tests. This patch updates the CHECK expression to match successfully. ``` EDC5129I No such file or directory. ``` Reviewed By: muiez Differential Revision: https://reviews.llvm.org/D94239	2021-01-18 07:14:37 -05:00
Alexandre Ganea	6acfc3a782	Fix build after `eaadb41db6` when the MSVC libs are not in PATH.	2021-01-07 18:52:00 -05:00
Alexandre Ganea	eaadb41db6	[LLD][COFF] When using PCH.OBJ, ensure func_id records indices are remapped under /DEBUG:GHASH Before this patch, when using LLD with /DEBUG:GHASH and MSVC precomp.OBJ files, we had a bunch of: lld-link: warning: S_[GL]PROC32ID record in blabla.obj refers to PDB item index 0x206ED1 which is not a LF[M]FUNC_ID record This was caused by LF_FUNC_ID and LF_MFUNC_ID which didn't have correct mapping to the corresponding TPI records. The root issue was that the indexMapStorage was improperly re-assembled in UsePrecompSource::remapTpiWithGHashes. After this patch, /DEBUG and /DEBUG:GHASH produce exactly the same debug infos in the PDB. Differential Revision: https://reviews.llvm.org/D93732	2021-01-07 17:27:13 -05:00
Amy Huang	7e13694ac7	[llvm-symbolizer][Windows] Add start line when searching in line table sections. Fixes issue where if a line section doesn't start with a line number then the addresses at the beginning of the section don't have line numbers. For example, for a line section like this ``` 0001:00000010-00000014, line/column/addr entries = 1 7 00000013 ! ``` a line number wouldn't be found for addresses from 10 to 12. This matches behavior when using the DIA SDK. Differential Revision: https://reviews.llvm.org/D93306	2020-12-17 07:57:36 -08:00
Fangrui Song	66bcbdbc9c	[AArch64InstPrinter] Change printADRPLabel to print the target address in hexadecimal form Similar to D77853. Change ADRP to print the target address in hex, instead of the raw immediate. The behavior is similar to GNU objdump but we also include `0x`. Note: GNU objdump is not consistent whether or not to emit `0x` for different architectures. We try emitting 0x consistently for all targets. ``` GNU objdump: adrp x16, 10000000 Old llvm-objdump: adrp x16, #0 New llvm-objdump: adrp x16, 0x10000000 ``` `adrp Xd, 0x...` assembles to a relocation referencing `ABS+0x10000` which is not intended. We need to use a linker or use yaml2obj. The main test is `test/tools/llvm-objdump/ELF/AArch64/pcrel-address.yaml` Differential Revision: https://reviews.llvm.org/D93241	2020-12-16 09:20:55 -08:00
Nico Weber	5dad062d7e	fix typo to cycle bots	2020-12-10 19:20:09 -05:00
Amy Huang	efd1ec0dec	Recommit "[llvm-symbolizer] Switch to using native symbolizer by default on Windows" This reverts commit `1b63177a56`.	2020-11-30 17:36:12 -08:00
Amy Huang	8cdf4920c4	[llvm-symbolizer] Fix typo in llvm-symbolizer test from a previous commit. (Commit was `00bbef2bb2`)	2020-11-30 15:08:11 -08:00
Amy Huang	00bbef2bb2	[llvm-symbolizer] Fix native symbolization on windows for inline sites. The existing code handles this correctly and I checked that the code in NativeInlineSiteSymbol also handles this correctly, but it was wrong in the NativeFunctionSymbol code. Differential Revision: https://reviews.llvm.org/D92134	2020-11-30 14:27:35 -08:00
Nico Weber	a0994cbe27	lld-link: Let LLD_REPRODUCE control /reproduce:, like in ld.lld Also sync help texts for the option between elf and coff ports. Decisions: - Do this even if /lldignoreenv is passed. /reproduce: does not affect the main output, and this makes the env var more convenient to use. (On the other hand, it's now possible to set this env var and forget about it, and all future builds in the same shell will be much slower. That's true for ld.lld, but posix shells have an easy way to set an env var for a single command; in cmd.exe this is not possible without contortions. Then again, lld-link runs in posix shells too.) Original patch rebased across D68378 and D68381. Differential Revision: https://reviews.llvm.org/D67707	2020-11-27 13:33:55 -05:00
Amy Huang	1363dfaf31	[CodeView] Avoid emitting empty debug globals subsection. In https://reviews.llvm.org/D89072 I added static const data members to the debug subsection for globals. It skipped emitting an S_CONSTANT if it didn't have a value, which meant the subsection could be empty. This patch fixes the empty subsection issue. Differential Revision: https://reviews.llvm.org/D92049	2020-11-25 16:13:32 -08:00
Martin Storsjö	0b2d84fba8	[LLD] [COFF] Allow wrapping dllimported functions GNU ld doesn't seem to do this though, but it looks like a reasonable use case, is easy to implement, and was requested in https://bugs.llvm.org/show_bug.cgi?id=47384. Differential Revision: https://reviews.llvm.org/D91689	2020-11-24 10:15:20 +02:00
Amy Huang	1b63177a56	Revert "[llvm-symbolizer] Switch to using native symbolizer by default on Windows" Breaks some asan tests on the buildbot. This reverts commit `c74b427cb2`.	2020-11-23 16:29:45 -08:00
Amy Huang	c74b427cb2	[llvm-symbolizer] Switch to using native symbolizer by default on Windows llvm-symbolizer used to use the DIA SDK for symbolization on Windows; this patch switches to using native symbolization, which was implemented recently. Users can still make the symbolizer use DIA by adding the `-dia` flag in the LLVM_SYMBOLIZER_OPTS environment variable. Differential Revision: https://reviews.llvm.org/D91814	2020-11-23 15:57:08 -08:00
Andrew Paverd	0139c8af8d	[CFGuard] Add address-taken IAT tables and delay-load support This patch adds support for creating Guard Address-Taken IAT Entry Tables (.giats$y sections) in object files, matching the behavior of MSVC. These contain lists of address-taken imported functions, which are used by the linker to create the final GIATS table. Additionally, if any DLLs are delay-loaded, the linker must look through the .giats tables and add the respective load thunks of address-taken imports to the GFIDS table, as these are also valid call targets. Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D87544	2020-11-17 18:24:45 -08:00
Amy Huang	bc98034040	[llvm-symbolizer] Add inline stack traces for Windows. This adds inline stack frames for symbolizing on Windows. Differential Revision: https://reviews.llvm.org/D88988	2020-11-17 13:19:13 -08:00
Alexandre Ganea	45b8a741fb	[LLD][COFF] When using LLD-as-a-library, always prevent re-entrance on failures This is a follow-up for D70378 (Cover usage of LLD as a library). While debugging an intermittent failure on a bot, I recalled this scenario which causes the issue: 1.When executing lld/test/ELF/invalid/symtab-sh-info.s L45, we reach lld:🧝:Obj-File::ObjFile() which goes straight into its base ELFFileBase(), then ELFFileBase::init(). 2.At that point fatal() is thrown in lld/ELF/InputFiles.cpp L381, leaving a half-initialized ObjFile instance. 3.We then end up in lld::exitLld() and since we are running with LLD_IN_TEST, we hapily restore the control flow to CrashRecoveryContext::RunSafely() then back in lld::safeLldMain(). 4.Before this patch, we called errorHandler().reset() just after, and this attempted to reset the associated SpecificAlloc<ObjFile<ELF64LE>>. That tried to free the half-initialized ObjFile instance, and more precisely its ObjFile::dwarf member. Sometimes that worked, sometimes it failed and was catched by the CrashRecoveryContext. This scenario was the reason we called errorHandler().reset() through a CrashRecoveryContext. But in some rare cases, the above repro somehow corrupted the heap, creating a stack overflow. When the CrashRecoveryContext's filter (that is, __except (ExceptionFilter(GetExceptionInformation()))) tried to handle the exception, it crashed again since the stack was exhausted -- and that took the whole application down. That is the issue seen on the bot. Locally it happens about 1 times out of 15. Now this situation can happen anywhere in LLD. Since catching stack overflows is not a reliable scenario ATM when using CrashRecoveryContext, we're now preventing further re-entrance when such failures occur, by signaling lld::SafeReturn::canRunAgain=false. When running with LLD_IN_TEST=2 (or above), only one iteration will be executed, instead of two. Differential Revision: https://reviews.llvm.org/D88348	2020-11-12 08:14:43 -05:00
Hans Wennborg	418f18c6cd	Revert "Reland [CFGuard] Add address-taken IAT tables and delay-load support" This broke both Firefox and Chromium (PR47905) due to what seems like dllimport function not being handled correctly. > This patch adds support for creating Guard Address-Taken IAT Entry Tables (.giats$y sections) in object files, matching the behavior of MSVC. These contain lists of address-taken imported functions, which are used by the linker to create the final GIATS table. > Additionally, if any DLLs are delay-loaded, the linker must look through the .giats tables and add the respective load thunks of address-taken imports to the GFIDS table, as these are also valid call targets. > > Reviewed By: rnk > > Differential Revision: https://reviews.llvm.org/D87544 This reverts commit `cfd8481da1`.	2020-11-11 16:03:33 +01:00
David Zarzycki	179d91b376	[lld testing] Unbreak read-only source builds Tests must not modify the source tree.	2020-11-06 07:13:55 -05:00
rojamd	b79e990f40	[lld][COFF] Add command line options for LTO with new pass manager This is more or less a port of rL329598 (D45275) to the COFF linker. Since there were already LTO-related settings under -opt:, I added them there instead of new flags. Differential Revision: https://reviews.llvm.org/D90624	2020-11-05 14:41:35 -05:00
Martin Storsjö	3785a413fe	Reapply [LLD] [COFF] Implement a GNU/ELF like -wrap option Add a simple forwarding option in the MinGW frontend, and implement the private -wrap option in the COFF linker. The feature in lld-link isn't gated by the -lldmingw option, but the option is left as a private, undocumented option primarily used by the MinGW driver. The implementation is significantly based on the support for --wrap in the ELF linker, but many small nuance details are different between the ELF and COFF linkers, ending up with more than a few implementation differences. This fixes https://bugs.llvm.org/show_bug.cgi?id=47384. Differential Revision: https://reviews.llvm.org/D89004 Reapplied with the bitfield member canInline fixed so it doesn't break builds targeting windows.	2020-10-15 22:14:02 +03:00
Arthur Eubanks	3d338f6813	Revert "[LLD] [COFF] Implement a GNU/ELF like -wrap option" This reverts commit `a012c704b5`. Breaks Windows builds. C:\src\llvm-mint\lld\COFF\Symbols.cpp(26,1): error: static_assert failed due to requirement 'sizeof(lld::coff::SymbolUnion) <= 48' "symbols should be optimized for memory usage" static_assert(sizeof(SymbolUnion) <= 48,	2020-10-15 10:27:25 -07:00
Martin Storsjö	a012c704b5	[LLD] [COFF] Implement a GNU/ELF like -wrap option Add a simple forwarding option in the MinGW frontend, and implement the private -wrap option in the COFF linker. The feature in lld-link isn't gated by the -lldmingw option, but the option is left as a private, undocumented option primarily used by the MinGW driver. The implementation is significantly based on the support for --wrap in the ELF linker, but many small nuance details are different between the ELF and COFF linkers, ending up with more than a few implementation differences. This fixes https://bugs.llvm.org/show_bug.cgi?id=47384. Differential Revision: https://reviews.llvm.org/D89004	2020-10-15 18:34:02 +03:00
Luqman Aden	6a73d6564a	[LLD] Set alignment as part of Characteristics in TLS table. Fixes https://bugs.llvm.org/show_bug.cgi?id=46473 LLD wasn't previously specifying any specific alignment in the TLS table's Characteristics field so the loader would just assume the default value (16 bytes). This works most of the time except if you have thread locals that want specific higher alignments (e.g. 32 as in the bug) even if they specify an alignment on the thread local. This change updates LLD to take the max alignment from tls section. Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D88637	2020-10-15 00:22:40 -07:00
Luqman Aden	f87c98def8	Revert "[LLD] Set alignment as part of Characteristics in TLS table." Revert individual wip commits and will instead follow up with a single commit with all the changes. Makes cherry-picking easier and will contain all the right tags. This reverts commit `32a4ad3b6c`. This reverts commit `7fe13af676`. This reverts commit `51fbc1bef6`. This reverts commit `f80950a8bb`. This reverts commit `0778cad9f3`. This reverts commit `8b70d527d7`.	2020-10-15 00:21:36 -07:00
Luqman Aden	f80950a8bb	Update tests.	2020-10-14 19:34:32 -07:00
Luqman Aden	dc128e5968	[test][lld] Mark TLS tests as REQUIRES: x86. Fixes http://lab.llvm.org:8011/#/builders/119/builds/92	2020-10-14 00:29:06 -07:00
Luqman Aden	6b7738e204	[LLD] Add baseline test for TLS alignment. NFC. Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D88646	2020-10-13 20:53:32 -07:00
Alexandre Ganea	617d64f6c5	Re-land [ThinLTO] Re-order modules for optimal multi-threaded processing This reverts `9b5b305023` and fixes the unwanted re-ordering when generating ThinLTO indexes. The goal of this patch is to better balance thread utilization during ThinLTO in-process linking (in llvm-lto2 or in LLD). Before this patch, large modules would often be scheduled late during execution, taking a long time to complete, thus starving the thread pool. We now sort modules in descending order, based on each module's bitcode size, so that larger modules are processed first. By doing so, smaller modules have a better chance to keep the thread pool active, and thus avoid starvation when the bitcode compilation is almost complete. In our case (on dual Intel Xeon Gold 6140, Windows 10 version 2004, two-stage build), this saves 15 sec when linking `clang.exe` with LLD & -flto=thin, /opt:lldltojobs=all, no ThinLTO cache, -DLLVM_INTEGRATED_CRT_ALLOC=d:\git\rpmalloc. Before patch: 100 sec After patch: 85 sec Inspired by the work done by David Callahan in D60495. Differential Revision: https://reviews.llvm.org/D87966	2020-10-13 21:54:15 -04:00
Andrew Paverd	cfd8481da1	Reland [CFGuard] Add address-taken IAT tables and delay-load support This patch adds support for creating Guard Address-Taken IAT Entry Tables (.giats$y sections) in object files, matching the behavior of MSVC. These contain lists of address-taken imported functions, which are used by the linker to create the final GIATS table. Additionally, if any DLLs are delay-loaded, the linker must look through the .giats tables and add the respective load thunks of address-taken imports to the GFIDS table, as these are also valid call targets. Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D87544	2020-10-13 13:20:52 -07:00
Martin Storsjö	45c4c54003	[LLD] [COFF] Add a private option for setting the os version separately from subsystem version The MinGW driver has separate options for OS and subsystem version. Having this available in lld-link allows the MinGW driver to both match GNU ld better and simplifies the code for merging two (potentially mismatching) arguments into one. Differential Revision: https://reviews.llvm.org/D88802	2020-10-05 23:08:01 +03:00
Martin Storsjö	19e86336ef	[LLD] [COFF] Fix parsing version numbers with leading zeros Parse the components as decimal, instead of decuding the base from the string. This avoids ambiguity if the second number contains leading zeros, which previously were parsed as indicating an octal number. MS link.exe doesn't support hexadecimal numbers in the version numbers, neither in /version nor in /subsystem. Differential Revision: https://reviews.llvm.org/D88801	2020-10-05 23:08:00 +03:00
Alexandre Ganea	55b97a6d2a	[LLD][COFF] Add more type record information to /summary This adds the following two new lines to /summary: 21351 Input OBJ files (expanded from all cmd-line inputs) 61 PDB type server dependencies 38 Precomp OBJ dependencies 1420669231 Input type records <<<< 78665073382 Input type records bytes <<<< 8801393 Merged TPI records 3177158 Merged IPI records 59194 Output PDB strings 71576766 Global symbol records 25416935 Module symbol records 2103431 Public symbol records Differential Revision: https://reviews.llvm.org/D88703	2020-10-02 09:36:11 -04:00
Alexandre Ganea	4140f0744f	[LLD][COFF] Fix crash with /summary and PCH input files Before this patch /summary was crashing with some .PCH.OBJ files, because tpiMap[srcIdx++] was reading at the wrong location. When the TpiSource depends on a .PCH.OBJ file, the types should be offset by the previously merged PCH.OBJ set of indices. Differential Revision: https://reviews.llvm.org/D88678	2020-10-01 17:08:35 -04:00
Arthur Eubanks	499260c03b	Revert "[CFGuard] Add address-taken IAT tables and delay-load support" This reverts commit `ef4e971e5e`.	2020-10-01 11:29:54 -07:00
Andrew Paverd	ef4e971e5e	[CFGuard] Add address-taken IAT tables and delay-load support This patch adds support for creating Guard Address-Taken IAT Entry Tables (.giats$y sections) in object files, matching the behavior of MSVC. These contain lists of address-taken imported functions, which are used by the linker to create the final GIATS table. Additionally, if any DLLs are delay-loaded, the linker must look through the .giats tables and add the respective load thunks of address-taken imports to the GFIDS table, as these are also valid call targets. Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D87544	2020-10-01 12:45:07 +01:00
Reid Kleckner	5519e4da83	Re-land "[PDB] Merge types in parallel when using ghashing" Stored Error objects have to be checked, even if they are success values. This reverts commit `8d250ac3cd`. Relands commit 49b3459930655d879b2dc190ff8fe11c38a8be5f.. Original commit message: ----------------------------------------- This makes type merging much faster (-24% on chrome.dll) when multiple threads are available, but it slightly increases the time to link (+10%) when /threads:1 is passed. With only one more thread, the new type merging is faster (-11%). The output PDB should be identical to what it was before this change. To give an idea, here is the /time output placed side by side: BEFORE \| AFTER Input File Reading: 956 ms \| 968 ms Code Layout: 258 ms \| 190 ms Commit Output File: 6 ms \| 7 ms PDB Emission (Cumulative): 6691 ms \| 4253 ms Add Objects: 4341 ms \| 2927 ms Type Merging: 2814 ms \| 1269 ms -55%! Symbol Merging: 1509 ms \| 1645 ms Publics Stream Layout: 111 ms \| 112 ms TPI Stream Layout: 764 ms \| 26 ms trivial Commit to Disk: 1322 ms \| 1036 ms -300ms ----------------------------------------- -------- Total Link Time: 8416 ms 5882 ms -30% overall The main source of the additional overhead in the single-threaded case is the need to iterate all .debug$T sections up front to check which type records should go in the IPI stream. See fillIsItemIndexFromDebugT. With changes to the .debug$H section, we could pre-calculate this info and eliminate the need to do this walk up front. That should restore single-threaded performance back to what it was before this change. This change will cause LLD to be much more parallel than it used to, and for users who do multiple links in parallel, it could regress performance. However, when the user is only doing one link, it's a huge improvement. In the future, we can use NT worker threads to avoid oversaturating the machine with work, but for now, this is such an improvement for the single-link use case that I think we should land this as is. Algorithm ---------- Before this change, we essentially used a DenseMap<GloballyHashedType, TypeIndex> to check if a type has already been seen, and if it hasn't been seen, insert it now and use the next available type index for it in the destination type stream. DenseMap does not support concurrent insertion, and even if it did, the linker must be deterministic: it cannot produce different PDBs by using different numbers of threads. The output type stream must be in the same order regardless of the order of hash table insertions. In order to create a hash table that supports concurrent insertion, the table cells must be small enough that they can be updated atomically. The algorithm I used for updating the table using linear probing is described in this paper, "Concurrent Hash Tables: Fast and General(?)!": https://dl.acm.org/doi/10.1145/3309206 The GHashCell in this change is essentially a pair of 32-bit integer indices: <sourceIndex, typeIndex>. The sourceIndex is the index of the TpiSource object, and it represents an input type stream. The typeIndex is the index of the type in the stream. Together, we have something like a ragged 2D array of ghashes, which can be looked up as: tpiSources[tpiSrcIndex]->ghashes[typeIndex] By using these side tables, we can omit the key data from the hash table, and keep the table cell small. There is a cost to this: resolving hash table collisions requires many more loads than simply looking at the key in the same cache line as the insertion position. However, most supported platforms should have a 64-bit CAS operation to update the cell atomically. To make the result of concurrent insertion deterministic, the cell payloads must have a priority function. Defining one is pretty straightforward: compare the two 32-bit numbers as a combined 64-bit number. This means that types coming from inputs earlier on the command line have a higher priority and are more likely to appear earlier in the final PDB type stream than types from an input appearing later on the link line. After table insertion, the non-empty cells in the table can be copied out of the main table and sorted by priority to determine the ordering of the final type index stream. At this point, item and type records must be separated, either by sorting or by splitting into two arrays, and I chose sorting. This is why the GHashCell must contain the isItem bit. Once the final PDB TPI stream ordering is known, we need to compute a mapping from source type index to PDB type index. To avoid starting over from scratch and looking up every type again by its ghash, we save the insertion position of every hash table insertion during the first insertion phase. Because the table does not support rehashing, the insertion position is stable. Using the array of insertion positions indexed by source type index, we can replace the source type indices in the ghash table cells with the PDB type indices. Once the table cells have been updated to contain PDB type indices, the mapping for each type source can be computed in parallel. Simply iterate the list of cell positions and replace them with the PDB type index, since the insertion positions are no longer needed. Once we have a source to destination type index mapping for every type source, there are no more data dependencies. We know which type records are "unique" (not duplicates), and what their final type indices will be. We can do the remapping in parallel, and accumulate type sizes and type hashes in parallel by type source. Lastly, TPI stream layout must be done serially. Accumulate all the type records, sizes, and hashes, and add them to the PDB. Differential Revision: https://reviews.llvm.org/D87805	2020-09-30 15:44:38 -07:00
Reid Kleckner	8d250ac3cd	Revert "[PDB] Merge types in parallel when using ghashing" This reverts commit `49b3459930`.	2020-09-30 14:55:32 -07:00
Reid Kleckner	49b3459930	[PDB] Merge types in parallel when using ghashing This makes type merging much faster (-24% on chrome.dll) when multiple threads are available, but it slightly increases the time to link (+10%) when /threads:1 is passed. With only one more thread, the new type merging is faster (-11%). The output PDB should be identical to what it was before this change. To give an idea, here is the /time output placed side by side: BEFORE \| AFTER Input File Reading: 956 ms \| 968 ms Code Layout: 258 ms \| 190 ms Commit Output File: 6 ms \| 7 ms PDB Emission (Cumulative): 6691 ms \| 4253 ms Add Objects: 4341 ms \| 2927 ms Type Merging: 2814 ms \| 1269 ms -55%! Symbol Merging: 1509 ms \| 1645 ms Publics Stream Layout: 111 ms \| 112 ms TPI Stream Layout: 764 ms \| 26 ms trivial Commit to Disk: 1322 ms \| 1036 ms -300ms ----------------------------------------- -------- Total Link Time: 8416 ms 5882 ms -30% overall The main source of the additional overhead in the single-threaded case is the need to iterate all .debug$T sections up front to check which type records should go in the IPI stream. See fillIsItemIndexFromDebugT. With changes to the .debug$H section, we could pre-calculate this info and eliminate the need to do this walk up front. That should restore single-threaded performance back to what it was before this change. This change will cause LLD to be much more parallel than it used to, and for users who do multiple links in parallel, it could regress performance. However, when the user is only doing one link, it's a huge improvement. In the future, we can use NT worker threads to avoid oversaturating the machine with work, but for now, this is such an improvement for the single-link use case that I think we should land this as is. Algorithm ---------- Before this change, we essentially used a DenseMap<GloballyHashedType, TypeIndex> to check if a type has already been seen, and if it hasn't been seen, insert it now and use the next available type index for it in the destination type stream. DenseMap does not support concurrent insertion, and even if it did, the linker must be deterministic: it cannot produce different PDBs by using different numbers of threads. The output type stream must be in the same order regardless of the order of hash table insertions. In order to create a hash table that supports concurrent insertion, the table cells must be small enough that they can be updated atomically. The algorithm I used for updating the table using linear probing is described in this paper, "Concurrent Hash Tables: Fast and General(?)!": https://dl.acm.org/doi/10.1145/3309206 The GHashCell in this change is essentially a pair of 32-bit integer indices: <sourceIndex, typeIndex>. The sourceIndex is the index of the TpiSource object, and it represents an input type stream. The typeIndex is the index of the type in the stream. Together, we have something like a ragged 2D array of ghashes, which can be looked up as: tpiSources[tpiSrcIndex]->ghashes[typeIndex] By using these side tables, we can omit the key data from the hash table, and keep the table cell small. There is a cost to this: resolving hash table collisions requires many more loads than simply looking at the key in the same cache line as the insertion position. However, most supported platforms should have a 64-bit CAS operation to update the cell atomically. To make the result of concurrent insertion deterministic, the cell payloads must have a priority function. Defining one is pretty straightforward: compare the two 32-bit numbers as a combined 64-bit number. This means that types coming from inputs earlier on the command line have a higher priority and are more likely to appear earlier in the final PDB type stream than types from an input appearing later on the link line. After table insertion, the non-empty cells in the table can be copied out of the main table and sorted by priority to determine the ordering of the final type index stream. At this point, item and type records must be separated, either by sorting or by splitting into two arrays, and I chose sorting. This is why the GHashCell must contain the isItem bit. Once the final PDB TPI stream ordering is known, we need to compute a mapping from source type index to PDB type index. To avoid starting over from scratch and looking up every type again by its ghash, we save the insertion position of every hash table insertion during the first insertion phase. Because the table does not support rehashing, the insertion position is stable. Using the array of insertion positions indexed by source type index, we can replace the source type indices in the ghash table cells with the PDB type indices. Once the table cells have been updated to contain PDB type indices, the mapping for each type source can be computed in parallel. Simply iterate the list of cell positions and replace them with the PDB type index, since the insertion positions are no longer needed. Once we have a source to destination type index mapping for every type source, there are no more data dependencies. We know which type records are "unique" (not duplicates), and what their final type indices will be. We can do the remapping in parallel, and accumulate type sizes and type hashes in parallel by type source. Lastly, TPI stream layout must be done serially. Accumulate all the type records, sizes, and hashes, and add them to the PDB. Differential Revision: https://reviews.llvm.org/D87805	2020-09-30 14:22:48 -07:00
Alexandre Ganea	f2efb5742c	[LLD][COFF] Cover usage of LLD-as-a-library in tests In lit tests, we run each LLD invocation twice (LLD_IN_TEST=2), without shutting down the process in-between. This ensures a full cleanup is properly done between runs. Only active for the COFF driver for now. Other drivers still use LLD_IN_TEST=1 which executes just one iteration with full cleanup, like before. When the environment variable LLD_IN_TEST is unset, a shortcut is taken, only one iteration is executed, no cleanup for faster exit, like before. A public API, lld::safeLldMain(), is also available when using LLD as a library. Differential Revision: https://reviews.llvm.org/D70378	2020-09-24 15:07:50 -04:00
Alexandre Ganea	55624237be	[LLD][COFF] Avoid overwriting inputs in tests Before this patch, these two tests were emitting both a .DLL and .LIB. The output .LIB file name also happens to be an input .LIB file name. This prevented the test from executing a second time when LLD is re-entrant (LLD_IN_TEST=2). This is a support patch for https://reviews.llvm.org/D70378.	2020-09-24 15:01:25 -04:00
Petr Hosek	9c73e55510	Revert "[DebugInfo] Remove dots from getFilenameByIndex return value" This is failing on Windows bots due to path separator normalization. This reverts commit `042c235068`.	2020-09-15 10:06:47 -07:00

1 2 3 4 5 ...

909 Commits