llvm-project

Commit Graph

Author	SHA1	Message	Date
Greg McGary	f27e4548fc	[lld-macho] Implement ICF ICF = Identical C(ode\|OMDAT) Folding This is the LLD ELF/COFF algorithm, adapted for MachO. So far, only `-icf all` is supported. In order to support `-icf safe`, we will need to port address-significance tables (`.addrsig` directives) to MachO, which will come in later diffs. `check-{llvm,clang,lld}` have 0 regressions for `lld -icf all` vs. baseline ld64. We only run ICF on `__TEXT,__text` for reasons explained in the block comment in `ConcatOutputSection.cpp`. Here is the perf impact for linking `chromium_framekwork` on a Mac Pro (16-core Xeon W) for the non-ICF case vs. pre-ICF: ``` N Min Max Median Avg Stddev x 20 4.27 4.44 4.34 4.349 0.043029977 + 20 4.37 4.46 4.405 4.4115 0.025188761 Difference at 95.0% confidence 0.0625 +/- 0.0225658 1.43711% +/- 0.518873% (Student's t, pooled s = 0.0352566) ``` Reviewed By: #lld-macho, int3 Differential Revision: https://reviews.llvm.org/D103292	2021-06-17 10:07:44 -07:00
Jez Ng	b2a0739012	[lld-macho][nfc] Remove InputSection::outSecFileOff `outSecFileOff` and the associated `getFileOffset()` accessors were unnecessary. For all the cases we care about, `outSecFileOff` is the same as `outSecOff`. The only time they deviate is if there are zerofill sections within a given segment. But since zerofill sections are always at the end of a segment, the only sections where the two values deviate are zerofill sections themselves. And we never actually query the outSecFileOff of zerofill sections. As for `getFileOffset()`, the only place it was being used was to calculate the offset of the entry symbol. However, we can compute that value by just taking the difference between the address of the entry symbol and the address of the Mach-O header. In fact, this appears to be what ld64 itself does. This difference is the same as the file offset as long as there are no intervening zerofill sections, but since `__text` is the first section in `__TEXT`, this never happens, so our previous use of `getFileOffset()` was not wrong -- just inefficient. Reviewed By: #lld-macho, thakis Differential Revision: https://reviews.llvm.org/D104177	2021-06-13 19:51:30 -04:00
Jez Ng	681cfeb591	[lld-macho][nfc] Have InputSection ctors take some parameters This is motivated by an upcoming diff in which the WordLiteralInputSection ctor sets itself up based on the value of its section flags. As such, it needs to be passed the `flags` value as part of its ctor parameters, instead of having them assigned after the fact in `parseSection()`. While refactoring code to make that possible, I figured it would make sense for the other InputSections to also take their initial values as ctor parameters. Reviewed By: #lld-macho, thakis Differential Revision: https://reviews.llvm.org/D103978	2021-06-11 19:50:09 -04:00
Nico Weber	f2b1a1e10c	[lld/mac] Use sectionType() more Not sure sectionType() carries its weight, but while we have it we should use it consistently. No behavior change. Differential Revision: https://reviews.llvm.org/D104027	2021-06-11 11:15:47 -04:00
Jez Ng	04259cde15	[lld-macho] Implement cstring deduplication Our implementation draws heavily from LLD-ELF's, which in turn delegates its string deduplication to llvm-mc's StringTableBuilder. The messiness of this diff is largely due to the fact that we've previously assumed that all InputSections get concatenated together to form the output. This is no longer true with CStringInputSections, which split their contents into StringPieces. StringPieces are much more lightweight than InputSections, which is important as we create a lot of them. They may also overlap in the output, which makes it possible for strings to be tail-merged. In fact, the initial version of this diff implemented tail merging, but I've dropped it for reasons I'll explain later. Alignment Issues Mergeable cstring literals are found under the `__TEXT,__cstring` section. In contrast to ELF, which puts strings that need different alignments into different sections, clang's Mach-O backend puts them all in one section. Strings that need to be aligned have the `.p2align` directive emitted before them, which simply translates into zero padding in the object file. I think ld64 extracts the desired per-string alignment from this data by preserving each string's offset from the last section-aligned address. I'm not entirely certain since it doesn't seem consistent about doing this; but perhaps this can be chalked up to cases where ld64 has to deduplicate strings with different offset/alignment combos -- it seems to pick one of their alignments to preserve. This doesn't seem correct in general; we can in fact can induce ld64 to produce a crashing binary just by linking in an additional object file that only contains cstrings and no code. See PR50563 for details. Moreover, this scheme seems rather inefficient: since unaligned and aligned strings are all put in the same section, which has a single alignment value, it doesn't seem possible to tell whether a given string doesn't have any alignment requirements. Preserving offset+alignments for strings that don't need it is wasteful. In practice, the crashes seen so far seem to stem from x86_64 SIMD operations on cstrings. X86_64 requires SIMD accesses to be 16-byte-aligned. So for now, I'm thinking of just aligning all strings to 16 bytes on x86_64. This is indeed wasteful, but implementation-wise it's simpler than preserving per-string alignment+offsets. It also avoids the aforementioned crash after deduplication of differently-aligned strings. Finally, the overhead is not huge: using 16-byte alignment (vs no alignment) is only a 0.5% size overhead when linking chromium_framework. With these alignment requirements, it doesn't make sense to attempt tail merging -- most strings will not be eligible since their overlaps aren't likely to start at a 16-byte boundary. Tail-merging (with alignment) for chromium_framework only improves size by 0.3%. It's worth noting that LLD-ELF only does tail merging at `-O2`. By default (at `-O1`), it just deduplicates w/o tail merging. @thakis has also mentioned that they saw it regress compressed size in some cases and therefore turned it off. `ld64` does not seem to do tail merging at all. Performance Numbers CString deduplication reduces chromium_framework from 250MB to 242MB, or about a 3.2% reduction. Numbers for linking chromium_framework on my 3.2 GHz 16-Core Intel Xeon W: N Min Max Median Avg Stddev x 20 3.91 4.03 3.935 3.95 0.034641016 + 20 3.99 4.14 4.015 4.0365 0.0492336 Difference at 95.0% confidence 0.0865 +/- 0.027245 2.18987% +/- 0.689746% (Student's t, pooled s = 0.0425673) As expected, cstring merging incurs some non-trivial overhead. When passing `--no-literal-merge`, it seems that performance is the same, i.e. the refactoring in this diff didn't cost us. N Min Max Median Avg Stddev x 20 3.91 4.03 3.935 3.95 0.034641016 + 20 3.89 4.02 3.935 3.9435 0.043197831 No difference proven at 95.0% confidence Reviewed By: #lld-macho, gkm Differential Revision: https://reviews.llvm.org/D102964	2021-06-07 23:48:35 -04:00
Nico Weber	a5645513db	[lld/mac] Implement -dead_strip Also adds support for live_support sections, no_dead_strip sections, .no_dead_strip symbols. Chromium Framework 345MB unstripped -> 250MB stripped (vs 290MB unstripped -> 236M stripped with ld64). Doing dead stripping is a bit faster than not, because so much less data needs to be processed: % ministat lld_* x lld_nostrip.txt + lld_strip.txt N Min Max Median Avg Stddev x 10 3.929414 4.07692 4.0269079 4.0089678 0.044214794 + 10 3.8129408 3.9025559 3.8670411 3.8642573 0.024779651 Difference at 95.0% confidence -0.144711 +/- 0.0336749 -3.60967% +/- 0.839989% (Student's t, pooled s = 0.0358398) This interacts with many parts of the linker. I tried to add test coverage for all added `isLive()` checks, so that some test will fail if any of them is removed. I checked that the test expectations for the most part match ld64's behavior (except for live-support-iterations.s, see the comment in the test). Interacts with: - debug info - export tries - import opcodes - flags like -exported_symbol(s_list) - -U / dynamic_lookup - mod_init_funcs, mod_term_funcs - weak symbol handling - unwind info - stubs - map files - -sectcreate - undefined, dylib, common, defined (both absolute and normal) symbols It's possible it interacts with more features I didn't think of, of course. I also did some manual testing: - check-llvm check-clang check-lld work with lld with this patch as host linker and -dead_strip enabled - Chromium still starts - Chromium's base_unittests still pass, including unwind tests Implemenation-wise, this is InputSection-based, so it'll work for object files with .subsections_via_symbols (which includes all object files generated by clang). I first based this on the COFF implementation, but later realized that things are more similar to ELF. I think it'd be good to refactor MarkLive.cpp to look more like the ELF part at some point, but I'd like to get a working state checked in first. Mechanical parts: - Rename canOmitFromOutput to wasCoalesced (no behavior change) since it really is for weak coalesced symbols - Add noDeadStrip to Defined, corresponding to N_NO_DEAD_STRIP (`.no_dead_strip` in asm) Fixes PR49276. Differential Revision: https://reviews.llvm.org/D103324	2021-06-02 11:09:26 -04:00
Jez Ng	33706191d8	[lld-macho][nfc] Rename MergedOutputSection to ConcatOutputSection The ELF format has the concept of merge sections (marked by SHF_MERGE), which contain data that can be safely deduplicated. The Mach-O equivalents are called literal sections (marked by S_CSTRING_LITERALS or S_{4,8,16}BYTE_LITERALS). While the Mach-O format doesn't use the word 'merge', to avoid confusion, I've renamed our MergedOutputSection to ConcatOutputSection. I believe it's a more descriptive name too. This renaming sets the stage for {D102964}. Reviewed By: #lld-macho, alexshap Differential Revision: https://reviews.llvm.org/D102971	2021-05-25 14:58:29 -04:00

7 Commits