llvm-project

Author	SHA1	Message	Date
Jez Ng	7f2ba39b16	[lld-macho][nfc] Move liveness-tracking fields into ConcatInputSection These fields currently live in the parent InputSection class, but they should be specific to ConcatInputSection, since the other InputSection classes (that contain literals) aren't atomically live or dead -- rather their component string/int literals should have individual liveness states. (An upcoming diff will add liveness bits for StringPieces and fixed-sized literals.) I also factored out some asserts for isCoalescedWeak() in MarkLive.cpp. We now avoid putting coalesced sections in the `inputSections` vector, so we don't have to check/assert against it everywhere. Reviewed By: #lld-macho, thakis Differential Revision: https://reviews.llvm.org/D103977	2021-06-11 19:50:08 -04:00
Jez Ng	04259cde15	[lld-macho] Implement cstring deduplication Our implementation draws heavily from LLD-ELF's, which in turn delegates its string deduplication to llvm-mc's StringTableBuilder. The messiness of this diff is largely due to the fact that we've previously assumed that all InputSections get concatenated together to form the output. This is no longer true with CStringInputSections, which split their contents into StringPieces. StringPieces are much more lightweight than InputSections, which is important as we create a lot of them. They may also overlap in the output, which makes it possible for strings to be tail-merged. In fact, the initial version of this diff implemented tail merging, but I've dropped it for reasons I'll explain later. Alignment Issues Mergeable cstring literals are found under the `__TEXT,__cstring` section. In contrast to ELF, which puts strings that need different alignments into different sections, clang's Mach-O backend puts them all in one section. Strings that need to be aligned have the `.p2align` directive emitted before them, which simply translates into zero padding in the object file. I think ld64 extracts the desired per-string alignment from this data by preserving each string's offset from the last section-aligned address. I'm not entirely certain since it doesn't seem consistent about doing this; but perhaps this can be chalked up to cases where ld64 has to deduplicate strings with different offset/alignment combos -- it seems to pick one of their alignments to preserve. This doesn't seem correct in general; we can in fact induce ld64 to produce a crashing binary just by linking in an additional object file that only contains cstrings and no code. See PR50563 for details.
Moreover, this scheme seems rather inefficient: since unaligned and aligned strings are all put in the same section, which has a single alignment value, it doesn't seem possible to tell whether a given string doesn't have any alignment requirements. Preserving offset+alignments for strings that don't need it is wasteful. In practice, the crashes seen so far seem to stem from x86_64 SIMD operations on cstrings. X86_64 requires SIMD accesses to be 16-byte-aligned. So for now, I'm thinking of just aligning all strings to 16 bytes on x86_64. This is indeed wasteful, but implementation-wise it's simpler than preserving per-string alignment+offsets. It also avoids the aforementioned crash after deduplication of differently-aligned strings. Finally, the overhead is not huge: using 16-byte alignment (vs no alignment) is only a 0.5% size overhead when linking chromium_framework. With these alignment requirements, it doesn't make sense to attempt tail merging -- most strings will not be eligible since their overlaps aren't likely to start at a 16-byte boundary. Tail-merging (with alignment) for chromium_framework only improves size by 0.3%. It's worth noting that LLD-ELF only does tail merging at `-O2`. By default (at `-O1`), it just deduplicates w/o tail merging. @thakis has also mentioned that they saw it regress compressed size in some cases and therefore turned it off. `ld64` does not seem to do tail merging at all. Performance Numbers CString deduplication reduces chromium_framework from 250MB to 242MB, or about a 3.2% reduction. Numbers for linking chromium_framework on my 3.2 GHz 16-Core Intel Xeon W: N Min Max Median Avg Stddev x 20 3.91 4.03 3.935 3.95 0.034641016 + 20 3.99 4.14 4.015 4.0365 0.0492336 Difference at 95.0% confidence 0.0865 +/- 0.027245 2.18987% +/- 0.689746% (Student's t, pooled s = 0.0425673) As expected, cstring merging incurs some non-trivial overhead. When passing `--no-literal-merge`, it seems that performance is the same, i.e. 
the refactoring in this diff didn't cost us. N Min Max Median Avg Stddev x 20 3.91 4.03 3.935 3.95 0.034641016 + 20 3.89 4.02 3.935 3.9435 0.043197831 No difference proven at 95.0% confidence Reviewed By: #lld-macho, gkm Differential Revision: https://reviews.llvm.org/D102964	2021-06-07 23:48:35 -04:00
Nico Weber	a5645513db	[lld/mac] Implement -dead_strip Also adds support for live_support sections, no_dead_strip sections, .no_dead_strip symbols. Chromium Framework 345MB unstripped -> 250MB stripped (vs 290MB unstripped -> 236MB stripped with ld64). Doing dead stripping is a bit faster than not, because so much less data needs to be processed: % ministat lld_* x lld_nostrip.txt + lld_strip.txt N Min Max Median Avg Stddev x 10 3.929414 4.07692 4.0269079 4.0089678 0.044214794 + 10 3.8129408 3.9025559 3.8670411 3.8642573 0.024779651 Difference at 95.0% confidence -0.144711 +/- 0.0336749 -3.60967% +/- 0.839989% (Student's t, pooled s = 0.0358398) This interacts with many parts of the linker. I tried to add test coverage for all added `isLive()` checks, so that some test will fail if any of them is removed. I checked that the test expectations for the most part match ld64's behavior (except for live-support-iterations.s, see the comment in the test). Interacts with: - debug info - export tries - import opcodes - flags like -exported_symbol(s_list) - -U / dynamic_lookup - mod_init_funcs, mod_term_funcs - weak symbol handling - unwind info - stubs - map files - -sectcreate - undefined, dylib, common, defined (both absolute and normal) symbols It's possible it interacts with more features I didn't think of, of course. I also did some manual testing: - check-llvm check-clang check-lld work with lld with this patch as host linker and -dead_strip enabled - Chromium still starts - Chromium's base_unittests still pass, including unwind tests Implementation-wise, this is InputSection-based, so it'll work for object files with .subsections_via_symbols (which includes all object files generated by clang). I first based this on the COFF implementation, but later realized that things are more similar to ELF. I think it'd be good to refactor MarkLive.cpp to look more like the ELF part at some point, but I'd like to get a working state checked in first.
Mechanical parts: - Rename canOmitFromOutput to wasCoalesced (no behavior change) since it really is for weak coalesced symbols - Add noDeadStrip to Defined, corresponding to N_NO_DEAD_STRIP (`.no_dead_strip` in asm) Fixes PR49276. Differential Revision: https://reviews.llvm.org/D103324	2021-06-02 11:09:26 -04:00
Jez Ng	33706191d8	[lld-macho][nfc] Rename MergedOutputSection to ConcatOutputSection The ELF format has the concept of merge sections (marked by SHF_MERGE), which contain data that can be safely deduplicated. The Mach-O equivalents are called literal sections (marked by S_CSTRING_LITERALS or S_{4,8,16}BYTE_LITERALS). While the Mach-O format doesn't use the word 'merge', to avoid confusion, I've renamed our MergedOutputSection to ConcatOutputSection. I believe it's a more descriptive name too. This renaming sets the stage for {D102964}. Reviewed By: #lld-macho, alexshap Differential Revision: https://reviews.llvm.org/D102971	2021-05-25 14:58:29 -04:00
Jez Ng	9cc0d893f7	[lld-macho][nfc] clang-format everything	2021-05-25 14:58:29 -04:00
Nico Weber	4a12248ee2	[lld/mac] Honor REFERENCED_DYNAMICALLY, set it on __mh_execute_header Has the effect that `__mh_execute_header` stays in the symbol table of outputs even after running `strip` on the output. I don't know if that's important for anything -- my motivation for the patch is just to make the output more similar to ld64. (Corresponds to symbolTableInAndNeverStrip in ld64.) Differential Revision: https://reviews.llvm.org/D102619	2021-05-17 14:22:12 -04:00
Nico Weber	7b6dd265ce	[lld/mac] Copy some of the commit message of `d5a70db193` into a comment	2021-05-08 13:03:17 -04:00
Nico Weber	d5a70db193	[lld/mac] Write every weak symbol only once in the output Before this, if an inline function was defined in several input files, lld would write each copy of the inline function to the output. With this patch, it only writes one copy. Reduces the size of Chromium Framework from 378MB to 345MB (compared to 290MB linked with ld64, which also does dead-stripping, which we don't do yet), and makes linking it faster: N Min Max Median Avg Stddev x 10 3.9957051 4.3496981 4.1411121 4.156837 0.10092097 + 10 3.908154 4.169318 3.9712729 3.9846753 0.075773012 Difference at 95.0% confidence -0.172162 +/- 0.083847 -4.14165% +/- 2.01709% (Student's t, pooled s = 0.0892373) Implementation-wise, when merging two weak symbols, this sets a "canOmitFromOutput" on the InputSection belonging to the weak symbol not put in the symbol table. We then don't write InputSections that have this set, as long as they are not referenced from other symbols. (This happens e.g. for object files that don't set .subsections_via_symbols or that use .alt_entry.) Some restrictions: - not yet done for bitcode inputs - no "comdat" handling (`kindNoneGroupSubordinate*` in ld64) -- Frame Descriptor Entries (FDEs), Language Specific Data Areas (LSDAs) (that is, catch block unwind information) and Personality Routines associated with weak functions still not stripped. This is wasteful, but harmless. - However, this does strip weaks from __unwind_info (which is needed for correctness and not just for size) - This nopes out on InputSections that are referenced from more than one symbol (eg from .alt_entry) for now Things that work based on symbols Just Work: - map files (change in MapFile.cpp is no-op and not needed; I just found it a bit more explicit) - exports Things that work with inputSections need to explicitly check if an inputSection is written (e.g. unwind info). This patch is useful in itself, but it's also likely a useful foundation for dead_strip.
I used to have a "canonicalRepresentative" pointer on InputSection instead of just the bool, which would be handy for ICF too. But I ended up not needing it for this patch, so I removed that again for now. Differential Revision: https://reviews.llvm.org/D102076	2021-05-07 17:11:40 -04:00
Jez Ng	05c5363b39	[lld-macho] Parse & emit the N_ARM_THUMB_DEF symbol flag Eventually we'll use this flag to properly handle bl/blx opcodes. Reviewed By: #lld-macho, gkm Differential Revision: https://reviews.llvm.org/D101558	2021-04-30 16:17:26 -04:00
Jez Ng	7ca133c360	[lld-macho] std::sort -> llvm::sort	2021-04-27 18:02:59 -04:00
Nico Weber	c1b2a7bfbf	[lld/mac] make a few "named parameter comments" more consistent Most of LLVM and almost all of lld/MachO uses `/*foo=*/bar` style. No behavior change.	2021-04-22 10:48:03 -04:00
Jez Ng	1460942c15	[lld-macho] Add 32-bit compact unwind support This could probably have been part of D99633, but I split it up to make things a bit more reviewable. I also fixed some bugs in the implementation that were masked through integer underflows when operating in 64-bit mode. Reviewed By: #lld-macho, gkm Differential Revision: https://reviews.llvm.org/D99823	2021-04-15 21:16:33 -04:00
Jez Ng	8ca366935b	Revert "[lld-macho] Add support for arm64_32" and other stacked diffs This reverts commits: * `8914902b01` * `35a745d814` * `682d1dfe09`	2021-04-13 12:40:58 -04:00
Jez Ng	35a745d814	[lld-macho] Add 32-bit compact unwind support This could probably have been part of D99633, but I split it up to make things a bit more reviewable. I also fixed some bugs in the implementation that were masked through integer underflows when operating in 64-bit mode. Reviewed By: #lld-macho, gkm Differential Revision: https://reviews.llvm.org/D99823	2021-04-13 10:43:28 -04:00
Jez Ng	817d98d841	[lld-macho][nfc] Refactor in preparation for 32-bit support The main challenge was handling the different on-disk structures (e.g. `mach_header` vs `mach_header_64`). I tried to strike a balance between sprinkling `target->wordSize == 8` checks everywhere (branchy = slow, and ugly) and templatizing everything (causes code bloat, also ugly). I think I struck a decent balance by judicious use of type erasure. Note that LLD-ELF has a similar architecture, though it seems to use more templating. Linking chromium_framework takes about the same time before and after this change: N Min Max Median Avg Stddev x 20 4.52 4.67 4.595 4.5945 0.044423204 + 20 4.5 4.71 4.575 4.582 0.056344803 No difference proven at 95.0% confidence Reviewed By: #lld-macho, oontvoo Differential Revision: https://reviews.llvm.org/D99633	2021-04-02 18:46:39 -04:00
Alexander Shaposhnikov	f6ad045366	[lld][MachO] Make emitEndFunStab independent from .subsections_via_symbols This diff addresses FIXME in SyntheticSections.cpp and removes the dependency of emitEndFunStab on .subsections_via_symbols. Test plan: make check-lld-macho Differential revision: https://reviews.llvm.org/D99054	2021-04-01 17:48:09 -07:00
Greg McGary	427d359721	[lld-macho][NFC] Drop unnecessary macho:: namespace prefix on unambiguous references to Symbol Within `lld/macho/`, only `InputFiles.cpp` and `Symbols.h` require the `macho::` namespace qualifier to disambiguate references to `class Symbol`. Add braces to outer `for` of a 5-level single-line `if`/`for` nest. Differential Revision: https://reviews.llvm.org/D99555	2021-03-30 14:58:35 -07:00
Greg McGary	98fe9e41f7	[lld-macho][NFC] add const to pointer/reference induction variables of range-based for loops Pointer and reference induction variables of range-based for loops are often const, and code authors are often lax about qualifying them. Differential Revision: https://reviews.llvm.org/D98317	2021-03-10 12:07:31 -08:00
Greg McGary	fdc0c21973	[lld-macho][NFC] when reasonable, replace auto keyword with type names lld policy discourages `auto`. Replace it with a type name whenever reasonable. Retain `auto` to avoid ... * redundancy, as for decls such as `auto t = mumble_cast<TYPE >` or similar that specifies the result type on the RHS * verbosity, as for iterators * gratuitous suffering, as for lambdas Along the way, add `const` when appropriate. Note: a future diff will ... * add more `const` qualifiers * remove `opt::` when we are already `using llvm::opt` Differential Revision: https://reviews.llvm.org/D98313	2021-03-09 22:08:32 -08:00
Nico Weber	0658fc654c	[lld/mac] Implement the missing bits of -undefined This adds support for `-undefined dynamic_lookup`, and for `-undefined warning` and `-undefined suppress` with `-flat_namespace`. We just replace undefined symbols with a DynamicLookup when we hit them. With this, `check-llvm` passes when using ld64.lld.darwinnew as host linker. Differential Revision: https://reviews.llvm.org/D97642	2021-03-01 15:30:53 -05:00
Jez Ng	4a5e111aea	[lld-macho] Better deduplication of personality pointers {D95809} introduced a mechanism for synthetic symbol creation of personality pointers. When multiple section relocations referred to the same personality pointer, it would deduplicate them. However, it neglected to consider that we could have symbol relocations that also refer to the same personality pointer. This diff fixes it. In practice, this mix of relocations arises when there is a statically-linked personality routine that is referenced from multiple object files. Within the same object file, it will be referred to via section relocations, but (obviously) other object files will refer to it via symbol relocations. Failing to deduplicate these references resulted in us going over the 3-personality-pointer limit when linking some larger applications. Fixes llvm.org/PR48389. Reviewed By: #lld-macho, thakis, alexshap Differential Revision: https://reviews.llvm.org/D97245	2021-02-23 22:02:38 -05:00
Jez Ng	ac9dd247da	[lld-macho] Try to make ubsan happy Summary: We should avoid passing a null pointer to memcpy.	2021-02-08 14:51:36 -05:00
Jez Ng	5112035751	[lld-macho] Emit LSDA info in compact unwind The LSDA pointers are encoded as offsets from the image base, and arranged in one big contiguous array. Each second-level page records the offset within that LSDA array which corresponds to the LSDA for its first CU entry. Reviewed By: clayborg Differential Revision: https://reviews.llvm.org/D95810	2021-02-08 13:48:00 -05:00
Jez Ng	525bfa10ec	[lld-macho] Emit personalities in compact unwind Note that there is a triple indirection involved with personalities and compact unwind: 1. Two bits of each CU encoding are used as an offset into the personality array. 2. Each entry of the personality array is an offset from the image base. The resulting address (after adding the image base) should point within the GOT. 3. The corresponding GOT entry contains the actual pointer to the personality function. To further complicate things, when the personality function is in the object file (as opposed to a dylib), its references in `__compact_unwind` may refer to it via a section + offset relocation instead of a symbol relocation. Since our GOT implementation can only create entries for symbols, we have to create a synthetic symbol at the given section offset. Reviewed By: clayborg Differential Revision: https://reviews.llvm.org/D95809	2021-02-08 13:47:59 -05:00
Greg McGary	c3e4f3b231	[lld-macho] Fix alignment & layout to match ld64 and satisfy kernel & codesign The Mach kernel & codesign on arm64 macOS have strict requirements for alignment and sequence of segments and sections. Dyld probably is just as picky, though kernel & codesign reject malformed Mach-O files before dyld ever has a chance. I developed this diff by incrementally changing alignments & sequences to match the output of ld64. I stopped when my hello-world test program started working: `codesign --verify` succeeded, and `execve(2)` didn't immediately fail with `errno == EBADMACHO` = `"Malformed Mach-O file"`. Differential Revision: https://reviews.llvm.org/D94935	2021-02-05 17:22:03 -07:00
Nico Weber	568824798f	fix typo to cycle bots	2021-01-01 22:28:11 -05:00
Fangrui Song	791fe7ac57	[lld-macho] Fix memcpy ub after D93267	2020-12-20 20:01:20 -08:00
Greg McGary	99930719c6	Handle overflow beyond the 127 common encodings limit The common encodings table holds only 127 entries. The encodings index for compact entries is 8 bits wide, and indexes 127..255 are stored locally to each second-level page. Prior to this diff, lld would `fatal()` if encodings overflowed the 127 limit. This diff populates a per-second-level-page encodings table as needed. When the per-page encodings table hits its limit, we must terminate the page. If such early termination would consume fewer entries than a regular (non-compact) encoding page, then we prefer the regular format. Caveat: one reason the common-encoding table might overflow is because of DWARF debug-info references, which are not yet implemented and will come with a later diff. Differential Revision: https://reviews.llvm.org/D93267	2020-12-19 14:54:37 -08:00
Nico Weber	126f58e838	fix typos to cycle bots	2020-12-01 20:27:33 -05:00
Greg McGary	cba45514fb	align __TEXT,__unwind_info to 8 byte boundary	2020-09-19 12:43:30 -07:00
Greg McGary	2124ca1d5c	[lld-macho] create __TEXT,__unwind_info from __LD,__compact_unwind Digest the input `__LD,__compact_unwind` and produce the output `__TEXT,__unwind_info`. This is the initial commit with the major functionality. Successor commits will add handling for ... * `__TEXT,__eh_frame` * personalities & LSDA * `-r` pass-through Differential Revision: https://reviews.llvm.org/D86805	2020-09-18 22:01:03 -07:00
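
Illustration for 04259cde15 (cstring deduplication): the message describes splitting a cstring section into pieces, deduplicating identical strings, aligning every surviving string to 16 bytes on x86_64, and skipping tail merging. A minimal sketch of that layout scheme follows. It is not lld's code: `CStringLayout`, `alignTo`, `splitPieces`, and `layoutStrings` are hypothetical names invented for this example, and the real implementation builds on StringTableBuilder rather than a hash map.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <string_view>
#include <unordered_map>
#include <vector>

// Hypothetical result type (not lld's): where each unique string lands.
struct CStringLayout {
  std::unordered_map<std::string, uint64_t> offsets; // string -> output offset
  uint64_t size = 0;                                  // bytes used so far
};

// Rounds `value` up to the next multiple of `align` (a power of two).
inline uint64_t alignTo(uint64_t value, uint64_t align) {
  assert(align != 0 && (align & (align - 1)) == 0);
  return (value + align - 1) & ~(align - 1);
}

// Splits raw section bytes at NUL terminators. Zero padding between strings
// shows up as empty pieces, which later deduplicate to a single entry.
// Trailing bytes without a terminator are ignored in this sketch.
inline std::vector<std::string_view> splitPieces(std::string_view data) {
  std::vector<std::string_view> pieces;
  size_t pos = 0;
  while (pos < data.size()) {
    size_t end = data.find('\0', pos);
    if (end == std::string_view::npos)
      break;
    pieces.push_back(data.substr(pos, end - pos));
    pos = end + 1;
  }
  return pieces;
}

// Assigns each distinct string one aligned offset; duplicates reuse it.
// No tail merging: a string never shares bytes with a longer string.
inline CStringLayout layoutStrings(const std::vector<std::string_view> &pieces,
                                   uint64_t align) {
  CStringLayout layout;
  for (std::string_view piece : pieces) {
    auto [it, inserted] = layout.offsets.try_emplace(std::string(piece), 0);
    if (!inserted)
      continue; // already placed; every reference points at the first copy
    uint64_t offset = alignTo(layout.size, align);
    it->second = offset;
    layout.size = offset + piece.size() + 1; // +1 for the NUL terminator
  }
  return layout;
}
```

With `align` set to 16, every unique string starts on a 16-byte boundary regardless of what alignment it had in the input, which is the trade-off the commit message accepts in exchange for simplicity.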

81 Commits
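
Illustration for 525bfa10ec and 4a5e111aea (personalities in compact unwind): the first message says two bits of each CU encoding select an entry in the personality array, and the second mentions a 3-personality-pointer limit that deduplication helps to stay under. A minimal sketch of why the limit is three, assuming the standard compact unwind layout where the two bits sit under the mask 0x30000000 and the index is 1-based, with 0 meaning "no personality". `PersonalityTable`, `withPersonality`, and `personalityIndex` are hypothetical names for this example, not lld's API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Two bits of the 32-bit compact unwind encoding hold the personality index.
constexpr uint32_t kPersonalityMask = 0x30000000;
constexpr unsigned kPersonalityShift = 28;
// Index 0 means "none", so two bits leave room for only three entries.
constexpr size_t kMaxPersonalities = 3;

// Hypothetical table: deduplicates personality entries (stored here as
// offsets from the image base) and hands out 1-based indexes.
class PersonalityTable {
public:
  // Returns the 1-based index for `imageOffset`, reusing an existing entry
  // when possible, or std::nullopt once the table is full.
  std::optional<uint32_t> indexFor(uint32_t imageOffset) {
    for (size_t i = 0; i < entries.size(); ++i)
      if (entries[i] == imageOffset)
        return static_cast<uint32_t>(i + 1);
    if (entries.size() == kMaxPersonalities)
      return std::nullopt;
    entries.push_back(imageOffset);
    return static_cast<uint32_t>(entries.size());
  }

private:
  std::vector<uint32_t> entries;
};

// Stores a 1-based personality index into an encoding.
inline uint32_t withPersonality(uint32_t encoding, uint32_t index) {
  assert(index <= kMaxPersonalities);
  return (encoding & ~kPersonalityMask) | (index << kPersonalityShift);
}

// Extracts the 1-based personality index (0 if the entry has none).
inline uint32_t personalityIndex(uint32_t encoding) {
  return (encoding & kPersonalityMask) >> kPersonalityShift;
}
```

Failing to recognize two references to the same personality as identical consumes an extra slot each time, which is how a link with few real personalities can still run out of indexes.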