llvm-project

Commit Graph

Author	SHA1	Message	Date
Alexander Shaposhnikov	4450a2a23d	[lld][ELF] Add support for ADRP+ADD optimization for AArch64 This diff adds support for ADRP+ADD optimization for AArch64 described in `d2ca58c54b` i.e. under appropriate constraints ADRP x0, symbol ADD x0, x0, :lo12: symbol can be turned into NOP ADR x0, symbol Test plan: make check-all Differential revision: https://reviews.llvm.org/D117614	2022-02-02 06:09:55 +00:00
Fangrui Song	fbf2f66400	[ELF] Update flag propagation rule to ignore discarded output sections See the updated insert-before.test for the effects: many synthetic sections are SHF_ALLOC\|SHF_WRITE. If they are discarded, we don't want to propagate their flags to subsequent output section descriptions. `getFirstInputSection(sec) == nullptr` can technically be merged into `isDiscardable` but I'd like to postpone that as not sharing code may give more refactoring opportunity. Depends on D118529. Reviewed By: peter.smith, bluca Differential Revision: https://reviews.llvm.org/D118530	2022-02-01 10:19:30 -08:00
Fangrui Song	a0318711c8	[ELF] Rename adjustSectionsBeforeSorting to adjustOutputSections and make it affect INSERT commands adjustSectionsBeforeSorting updates some output section attributes (alignment/flags) and removes discardable empty sections. When it is called, INSERT commands have not been processed. Therefore the flags propagation rule may not affect output sections defined in an INSERT command properly. Fix this by moving processInsertCommands before adjustSectionsBeforeSorting. adjustSectionsBeforeSorting is somewhat misnamed. The order between it and sortInputSections does not matter. With the pass shuffle, the name of adjustSectionsBeforeSorting becomes wrong. Therefore rename it. The new name is not set into stone. The function mixes several tasks and the code may be refactored in a way that we may give them more meaningful names. With this patch, I think the behavior of attribute propagation becomes more reasonable. In particular, in the absence of non-INSERT SECTIONS, inserting a section after a SHF_ALLOC one will give us a SHF_ALLOC section, not a non-SHF_ALLOC one (see linkerscript/insert-after.test). Reviewed By: peter.smith, bluca Differential Revision: https://reviews.llvm.org/D118529	2022-02-01 10:16:12 -08:00
Fangrui Song	0c3704fdbd	[ELF] Deduplicate names of local symbols only with -O2 The deduplication requires a DenseMap of the same size of the local part of .strtab . I optimized it in `e205445434` but it is still quite slow. For Release build of clang, deduplication makes .strtab 1.1% smaller and makes the link 3% slower. For chrome, deduplication makes .strtab 0.1% smaller and makes the link 6% slower. I suggest that we only perform the optimization with -O2 (default is -O1). Not deduplicating local symbol names will simplify parallel symbol table write. Reviewed By: peter.smith Differential Revision: https://reviews.llvm.org/D118577	2022-02-01 10:10:22 -08:00
Fangrui Song	17a39aecd1	[ELF] Simplify code with invokeELFT. NFC	2022-02-01 09:53:29 -08:00
Fangrui Song	7518d38f0a	[ELF] De-template LinkerDriver::link. NFC Replace `f<ELFT>(x)` with `InvokeELFT(f, x)`. The size reduction comes from turning `link` from 4 specializations into 1. My x86-64 lld executable is 26KiB smaller. Reviewed By: ikudrin Differential Revision: https://reviews.llvm.org/D118551	2022-02-01 09:47:56 -08:00
Fangrui Song	4d38d7684c	[ELF] Change vector<Symbol *> to SmallVector. NFC	2022-02-01 00:16:42 -08:00
Fangrui Song	196aedb843	[ELF] Change vector<InputSection *> to SmallVector. NFC My x86-64 lld executable is 8KiB smaller.	2022-02-01 00:14:21 -08:00
Fangrui Song	d97749fabc	[ELF] Switch split-stack to use SmallVector. NFC My x86-64 lld executable is 1.1KiB smaller.	2022-02-01 00:09:30 -08:00
Fangrui Song	7aaf024dac	[BitcodeWriter] Fix cases of some functions `WriteIndexToFile` is used by external projects so I do not touch it.	2022-01-31 16:46:11 -08:00
Fangrui Song	457273fda5	[ELF] splitStrings: replace entSize==1 special case with manual loop unswitch. NFC My x86-64 lld executable is actually smaller.	2022-01-30 17:15:45 -08:00
Fangrui Song	7cd0c45364	[ELF] Simplify SectionBase::partition handling and make it live by default. NFC Previously an InputSectionBase is dead (`partition==0`) by default. SyntheticSection calls markLive and BssSection overrides that with markDead. It is more natural to make InputSectionBase live by default and let --gc-sections mark InputSectionBase dead. When linking a Release build of clang: * --no-gc-sections:, the removed `inputSections` loop decreases markLive time from 4ms to 1ms. * --gc-sections: the extra `inputSections` loop increases markLive time from 0.181296s to 0.188526s. This is as of we lose the removing one `inputSections` loop optimization (`4374824ccf`). I believe the loss can be mitigated if we refactor markLive.	2022-01-30 15:12:09 -08:00
Fangrui Song	73fd7d2304	[ELF] Change splitSections to objectFiles based parallelForEach. NFC The work is more balanced.	2022-01-30 13:34:27 -08:00
Fangrui Song	5a2020d069	[ELF] copyShtGroup: replace unordered_set<uint32_t> with DenseSet<uint32_t>. NFC We don't need to support the empty/tombstone key section index.	2022-01-30 01:18:41 -08:00
Fangrui Song	f318fd9bf8	[ELF] crtbegin/crtend test: replace std::regex with hand-written matcher. NFC My x86-64 lld executable is 18KiB smaller.	2022-01-30 01:11:19 -08:00
Fangrui Song	fcd8817da5	[ELF] Simplify maybeCompress with lld::split. NFC	2022-01-30 00:44:19 -08:00
Fangrui Song	bc1369fae3	[ELF] Optimize MergeInputSection::splitNonStrings with resize_for_overwrite. NFC	2022-01-30 00:10:52 -08:00
Fangrui Song	988a03c585	[ELF] Add some MipsSection to InStruct and change make<MipsSection> to std::make_unique Similar to D116143. My x86-64 lld executable is 20+KiB smaller.	2022-01-29 23:55:29 -08:00
Fangrui Song	c0b986aa0c	[ELF] Remove make<std::unique_ptr<MemoryBuffer>>. NFC	2022-01-29 23:35:15 -08:00
Fangrui Song	8d8fce87bb	[ELF] De-template getErrorPlace. NFC	2022-01-29 23:05:54 -08:00
Fangrui Song	72a005bf19	[ELF] De-template getAndFeatures. NFC	2022-01-29 20:11:59 -08:00
Fangrui Song	d754c0b64f	[ELF] Make errorOrWarn opaque to decrease code size. NFC In my x86-64 lld, .text is -3.08Ki smaller.	2022-01-29 19:31:09 -08:00
Fangrui Song	ee647d4c96	[ELF] Optimize obj.getSectionIndex. NFC	2022-01-29 18:01:58 -08:00
Fangrui Song	5d00d37617	[ELF] Simplify eSyms. NFC	2022-01-29 17:00:38 -08:00
Fangrui Song	d86435c230	[ELF] createInputSection: remove unneeded argument. NFC	2022-01-29 16:52:32 -08:00
Fangrui Song	ee7720acd6	[ELF] Avoid repeated getObj construction in getSectionIndex. NFC	2022-01-29 16:51:00 -08:00
Fangrui Song	94e97e668c	[ELF] Reorder InputSectionBase::parent. NFC Move it before others.	2022-01-29 16:20:40 -08:00
Fangrui Song	b204d7c459	[ELF] Reorder InputFile members. NFC `symbols` is used frequently. Moving it before others can decrease offsets.	2022-01-29 16:10:52 -08:00
Fangrui Song	469c4124ab	[ELF] --gdb-index: switch to SmallVector. NFC	2022-01-29 15:24:56 -08:00
Fangrui Song	da0e5b885b	[ELF] Refactor -z combreloc * `RelocationBaseSection::addReloc` increases `numRelativeRelocs`, which duplicates the work done by RelocationSection<ELFT>::writeTo. * --pack-dyn-relocs=android has inappropropriate DT_RELACOUNT. AndroidPackedRelocationSection does not necessarily place relative relocations in the front and DT_RELACOUNT might cause semantics error (though our implementation doesn't and Android bionic doesn't use DT_RELACOUNT anyway.) Move `llvm::partition` to a new function `partitionRels` and compute `numRelativeRelocs` there. Now `RelocationBaseSection::addReloc` is trivial and can be moved to the header to enable inlining. The rest of DynamicReloc and `-z combreloc` handling is moved to the non-template `RelocationBaseSection::computeRels` to decrease code size. My x86-64 lld executable is 44+KiB smaller. While here, rename `sort` to `combreloc`.	2022-01-29 14:45:58 -08:00
Petr Hosek	71dcd9bd04	[ELF] Change the search order for dependent libraries When processing dependent libraries, if there's a directory of the same name as the library being searched for, either in the current directory or earlier in the search order, LLD will try to open it and report an error. This is because LLD uses file existence check. To address this issue we reverse the order, searching the library by basename first and only considering search paths later, and current directory last. Differential Revision: https://reviews.llvm.org/D118498	2022-01-28 20:46:01 -08:00
Fangrui Song	3bc152769d	[ELF] Parallelize computeIsPreemptible	2022-01-26 23:45:04 -08:00
Fangrui Song	1372d53639	[ELF] Optimize two vector. NFC	2022-01-26 23:10:40 -08:00
Fangrui Song	afeb4a6628	[ELF] Optimize -Map. NFC getVA is slow. Avoid calling it in the llvm::sort comparator.	2022-01-26 22:51:31 -08:00
Fangrui Song	14b7785c09	[ELF] Simplify InputSection::writeTo. NFC	2022-01-26 22:03:26 -08:00
Fangrui Song	913914f0f8	[ELF] Simplify writing the Elf_Chdr header. NFC And avoiding changing `size` in `writeTo`.	2022-01-26 10:23:56 -08:00
Fangrui Song	3704abaa16	[ELF] --gdb-index: replace vector<uint8_t> with unique_ptr<uint8_t[]>. NFC	2022-01-25 23:53:23 -08:00
Fangrui Song	571d6a7120	[ELF] Optimize .relr.dyn to not grow vector<uint64_t>. NFC	2022-01-25 23:33:40 -08:00
Fangrui Song	9fac78d0e1	[ELF] Simplify and optimize .relr.dyn NFC	2022-01-25 22:50:03 -08:00
Fangrui Song	2a80c3dbe1	[ELF] Clarify that Z_BEST_SPEED==1 in a comment. NFC	2022-01-25 22:40:53 -08:00
Fangrui Song	07bd467643	[ELF] --build-id: replace vector<uint8_t> with unique_ptr<uint8_t[]>. NFC We can't use C++20 make_unique_for_overwrite yet.	2022-01-25 22:39:43 -08:00
Fangrui Song	7438dbe078	[ELF] Cast size to size_t. NFC To fix ../../chromeclang/bin/../include/c++/v1/__algorithm/min.h:39:1: note: candidate template ignored: deduced conflicting types for parameter '_Tp' ('unsigned long' vs. 'unsigned long long') on macOS arm64.	2022-01-25 22:38:24 -08:00
Fangrui Song	223f9dea3d	[ELF] maybeCompress: replace vector<uint8_t> with unique_ptr<uint8_t[]>. NFC And mention that it is zero-initialized. I do not notice a speed-up if changed to be uninitialized by forcing the zero filler in writeTo.	2022-01-25 22:15:44 -08:00
Fangrui Song	4cdc441690	[ELF] Parallelize --compress-debug-sections=zlib When linking a Debug build clang (265MiB SHF_ALLOC sections, 920MiB uncompressed debug info), in a --threads=1 link "Compress debug sections" takes 2/3 time and in a --threads=8 link "Compress debug sections" takes ~70% time. This patch splits a section into 1MiB shards and calls zlib `deflake` parallelly. DEFLATE blocks are a bit sequence. We need to ensure every shard starts at a byte boundary for concatenation. We use Z_SYNC_FLUSH for all shards but the last to flush the output to a byte boundary. (Z_FULL_FLUSH can be used as well, but Z_FULL_FLUSH clears the hash table which just wastes time.) The last block requires the BFINAL flag. We call deflate with Z_FINISH to set the flag as well as flush the output to a byte boundary. Under the hood, all of Z_SYNC_FLUSH, Z_FULL_FLUSH, and Z_FINISH emit a non-compressed block (called stored block in zlib). RFC1951 says "Any bits of input up to the next byte boundary are ignored." In a --threads=8 link, "Compress debug sections" is 5.7x as fast and the total speed is 2.54x. Because the hash table for one shard is not shared with the next shard, the output is slightly larger. Better compression ratio can be achieved by preloading the window size from the previous shard as dictionary (`deflateSetDictionary`), but that is overkill. ``` # 1MiB shards % bloaty clang.new -- clang.old FILE SIZE VM SIZE -------------- -------------- +0.3% +129Ki [ = ] 0 .debug_str +0.1% +105Ki [ = ] 0 .debug_info +0.3% +101Ki [ = ] 0 .debug_line +0.2% +2.66Ki [ = ] 0 .debug_abbrev +0.0% +1.19Ki [ = ] 0 .debug_ranges +0.1% +341Ki [ = ] 0 TOTAL # 2MiB shards % bloaty clang.new -- clang.old FILE SIZE VM SIZE -------------- -------------- +0.2% +74.2Ki [ = ] 0 .debug_line +0.1% +72.3Ki [ = ] 0 .debug_str +0.0% +69.9Ki [ = ] 0 .debug_info +0.1% +976 [ = ] 0 .debug_abbrev +0.0% +882 [ = ] 0 .debug_ranges +0.0% +218Ki [ = ] 0 TOTAL ``` Bonus in not using zlib::compress * we can compress a debug section larger than 4GiB * peak memory usage is lower because for most shards the output size is less than 50% input size (all less than 55% for a large binary I tested, but decreasing the initial output size does not decrease memory usage) Reviewed By: ikudrin Differential Revision: https://reviews.llvm.org/D117853	2022-01-25 10:29:04 -08:00
Fangrui Song	c03fdd3403	[ELF] Fix the branch range computation when reusing a thunk Notation: dst is `t->getThunkTargetSym()->getVA()` On AArch64, when `src-0x8000000-r_addend <= dst < src-0x8000000`, the condition `target->inBranchRange(rel.type, src, rel.sym->getVA(rel.addend))` may incorrectly consider a thunk reusable. `rel.addend = -getPCBias(rel.type)` resets the addend to 0 for AArch64/PPC and the zero addend is used by `rel.sym->getVA(rel.addend)` to check out-of-range relocations. See the test for a case this computation is wrong: `error: a.o:(.text_high+0x4): relocation R_AARCH64_JUMP26 out of range: -134217732 is not in [-134217728, 134217727]` I have seen a real world case with r_addend=19960. Reviewed By: peter.smith Differential Revision: https://reviews.llvm.org/D117734	2022-01-24 09:03:21 -08:00
serge-sans-paille	5f290c090a	Move STLFunctionalExtras out of STLExtras Only using that change in StringRef already decreases the number of preoprocessed lines from 7837621 to 7776151 for LLVMSupport Perhaps more interestingly, it shows that many files were relying on the inclusion of StringRef.h to have the declaration from STLExtras.h. This patch tries hard to patch relevant part of llvm-project impacted by this hidden dependency removal. Potential impact: - "llvm/ADT/StringRef.h" no longer includes <memory>, "llvm/ADT/Optional.h" nor "llvm/ADT/STLExtras.h" Related Discourse thread: https://llvm.discourse.group/t/include-what-you-use-include-cleanup/5831	2022-01-24 14:13:21 +01:00
Alexandre Ganea	83d59e05b2	Re-land [LLD] Remove global state in lldCommon Move all variables at file-scope or function-static-scope into a hosting structure (lld::CommonLinkerContext) that lives at lldMain()-scope. Drivers will inherit from this structure and add their own global state, in the same way as for the existing COFFLinkerContext. See discussion in https://lists.llvm.org/pipermail/llvm-dev/2021-June/151184.html The previous land `f860fe3622` caused issues in https://lab.llvm.org/buildbot/#/builders/123/builds/8383, fixed by `22ee510dac`. Differential Revision: https://reviews.llvm.org/D108850	2022-01-20 14:53:26 -05:00
Fangrui Song	a7a4115bf3	[ELF] Replace .zdebug string comparison with SHF_COMPRESSED check. NFC	2022-01-19 22:33:32 -08:00
Fangrui Song	03909c4400	[ELF] Remove StringRefZ StringRefZ does not improve performance. Non-local symbols always have eagerly computed nameSize. Most local symbols's lengths will be updated in either: * shouldKeepInSymtab * SymbolTableBaseSection::addSymbol Its benefit is offsetted by strlen in every call site (sums up to 5KiB code in a release x86-64 build), so using StringRefZ may be slower. In a -s link (uncommon) there is minor speedup, like ~0.3% for clang and chrome. Reviewed By: alexander-shaposhnikov Differential Revision: https://reviews.llvm.org/D117644	2022-01-19 20:09:41 -08:00
Fangrui Song	5bd38a2826	[ELF] Fix split-stack caller with hidden non-split-stack callee Fix a regression after `aabe901d57` (`[ELF] Remove one redundant computeBinding`): isLocal() does not indicate that the symbol is originally local. For simplicity, just drop this optimization.	2022-01-19 12:25:01 -08:00

1 2 3 4 5 ...

6698 Commits