llvm-project

Commit Graph

Author	SHA1	Message	Date
Shoaib Meenai	2f5d6a0ea5	[MachO] Fix struct size assertion std::vector can have different sizes depending on the STL's debug level, so account for its size separately. (You could argue that we should be accounting for all the other members separately as well, but that would be very unergonomic, and std::vector is the only one that's caused problems so far.)	2021-11-22 15:02:30 -08:00
Greg McGary	9cc489a4b2	[lld-macho][nfc] Factor-out NFC changes from main __eh_frame diff In order to keep signal:noise high for the `__eh_frame` diff, I have teased-out the NFC changes and put them here. Differential Revision: https://reviews.llvm.org/D114017	2021-11-17 15:16:44 -07:00
Shoaib Meenai	01510ac084	[MachO] Move type size asserts to source files. NFC As discussed in https://reviews.llvm.org/D113809#3128636. It's a bit unfortunate to move the asserts away from the structs whose sizes they're checking, but it's a far better developer experience when one of the asserts is violated, because you get a single error instead of every single source file including the header erroring out.	2021-11-16 17:14:16 -08:00
Vy Nguyen	3f35dd06a5	[lld-macho][nfc][cleanup] Fix a few code style lints and clang-tidy findings - Use .empty() instead of `size() == 0` when possible. - Use const-ref to avoid copying Differential Revision: https://reviews.llvm.org/D112978	2021-11-02 11:26:15 -04:00
Nico Weber	6503a68565	[lld/mac] Don't assert when ICFing arm64 code WordLiteralSection dedupes literals by content. WordLiteralInputSection::getOffset() used to read a literal at the passed-in offset and look up this value in the deduping map to find the offset of the deduped value. But it's possible that (e.g.) a 16-byte literal's value is accessed 4 bytes in. To get the offset at that address, we have to get the deduped value at offset 0 and then apply the offset 4 to the result. (See also WordLiteralSection::finalizeContents() which fills in those maps.) Only a problem on arm64 because in x86_64 the offset is part of the instruction instead of a separate ARM64_RELOC_ADDEND relocation. (See bug for more details.) Fixes PR51999. Differential Revision: https://reviews.llvm.org/D112584	2021-10-27 14:02:07 -04:00
Jez Ng	002eda7056	[lld-macho] Associate compact unwind entries with function symbols Compact unwind entries (CUEs) contain pointers to their respective function symbols. However, during the link process, it's far more useful to have pointers from the function symbol to the CUE than vice versa. This diff adds that pointer in the form of `Defined::compactUnwind`. In particular, when doing dead-stripping, we want to mark CUEs live when their function symbol is live; and when doing ICF, we want to dedup sections iff the symbols in that section have identical CUEs. In both cases, we want to be able to locate the symbols within a given section, as well as locate the CUEs belonging to those symbols. So this diff also adds `InputSection::symbols`. The ultimate goal of this refactor is to have ICF support dedup'ing functions with unwind info, but that will be handled in subsequent diffs. This diff focuses on simplifying `-dead_strip` -- `findFunctionsWithUnwindInfo` is no longer necessary, and `Defined::isLive()` is now a lot simpler. Moreover, UnwindInfoSection no longer has to check for dead CUEs -- we simply avoid adding them in the first place. Additionally, we now support stripping of dead LSDAs, which follows quite naturally since `markLive()` can now reach them via the CUEs. Reviewed By: #lld-macho, gkm Differential Revision: https://reviews.llvm.org/D109944	2021-10-26 16:04:15 -04:00
Nico Weber	393116faad	[lld/mac] Remove "else" after return No behavior change	2021-07-22 21:31:52 -04:00
Jez Ng	428a7c1b38	[lld-macho] Have ICF operate on all sections at once ICF previously operated only within a given OutputSection. We would merge all CFStrings first, then merge all regular code sections in a second phase. This worked fine since CFStrings would never reference regular `__text` sections. However, I would like to expand ICF to merge functions that reference unwind info. Unwind info references the LSDA section, which can in turn reference the `__text` section, so we cannot perform ICF in phases. In order to have ICF operate on InputSections spanning multiple OutputSections, we need a way to distinguish InputSections that are destined for different OutputSections, so that we don't fold across section boundaries. We achieve this by creating OutputSections early, and setting `InputSection::parent` to point to them. This is what LLD-ELF does. (This change should also make it easier to implement the `section$start$` symbols.) This diff also folds InputSections w/o checking their flags, which I think is the right behavior -- if they are destined for the same OutputSection, they will have the same flags in the output (even if their input flags differ). I.e. the `parent` pointer check subsumes the `flags` check. In practice this has nearly no effect (ICF did not become any more effective on chromium_framework). I've also updated ICF.cpp's block comment to better reflect its current status. Reviewed By: #lld-macho, smeenai Differential Revision: https://reviews.llvm.org/D105641	2021-07-17 13:42:51 -04:00
Jez Ng	f6b6e72143	[lld-macho] Factor out common InputSection members We have been creating many ConcatInputSections with identical values due to .subsections_via_symbols. This diff factors out the identical values into a Shared struct, to reduce memory consumption and make copying cheaper. I also changed `callSiteCount` from a uint32_t to a 31-bit field to save an extra word. All in all, this takes InputSection from 120 to 72 bytes (and ConcatInputSection from 160 to 112 bytes), i.e. 30% size reduction in ConcatInputSection. Numbers for linking chromium_framework on my 3.2 GHz 16-Core Intel Xeon W: N Min Max Median Avg Stddev x 20 4.14 4.24 4.18 4.183 0.027548999 + 20 4.04 4.11 4.075 4.0775 0.018027756 Difference at 95.0% confidence -0.1055 +/- 0.0149005 -2.52211% +/- 0.356215% (Student's t, pooled s = 0.0232803) Reviewed By: #lld-macho, thakis Differential Revision: https://reviews.llvm.org/D105305	2021-07-01 21:22:39 -04:00
Jez Ng	ac2dd06b91	[lld-macho] Deduplicate CFStrings `__cfstring` is a special literal section, so instead of breaking it up at symbol boundaries, we break it up at fixed-width boundaries (since each literal is the same size). Symbols can only occur at one of those boundaries, so this is strictly more powerful than `.subsections_via_symbols`. With that in place, we then run the section through ICF. This change is about perf-neutral when linking chromium_framework. Reviewed By: #lld-macho, gkm Differential Revision: https://reviews.llvm.org/D105045	2021-07-01 21:22:38 -04:00
Jez Ng	3a11528d97	[lld-macho] Move ICF earlier to avoid emitting redundant binds This is a pretty big refactoring diff, so here are the motivations: Previously, ICF ran after scanRelocations(), where we emit bind/rebase opcodes etc. So we had a bunch of redundant leftovers after ICF. Having ICF run before Writer seems like a better design, and is what LLD-ELF does, so this diff refactors it accordingly. However, ICF had two dependencies on things occurring in Writer: 1) it needs literals to be deduplicated beforehand and 2) it needs to know which functions have unwind info, which was being handled by `UnwindInfoSection::prepareRelocations()`. In order to do literal deduplication earlier, we need to add literal input sections to their corresponding output sections. So instead of putting all input sections into the big `inputSections` vector, and then filtering them by type later on, I've changed things so that literal sections get added directly to their output sections during the 'gather' phase. Likewise for compact unwind sections -- they get added directly to the UnwindInfoSection now. This latter change is not strictly necessary, but makes it easier for ICF to determine which functions have unwind info. Adding literal sections directly to their output sections means that we can no longer determine `inputOrder` from iterating over `inputSections`. Instead, we store that order explicitly on InputSection. Bloating the size of InputSection for this purpose would be unfortunate -- but LLD-ELF has already solved this problem: it reuses `outSecOff` to store this order value. One downside of this refactor is that we now make an additional pass over the unwind info relocations to figure out which functions have unwind info, since we want to know that before `processRelocations()`. I've made sure to run that extra loop only if ICF is enabled, so there should be no overhead in non-optimizing runs of the linker. 
The upside of all this is that the `inputSections` vector now contains only ConcatInputSections that are destined for ConcatOutputSections, so we can clean up a bunch of code that just existed to filter out other elements from that vector. I will test for the lack of redundant binds/rebases in the upcoming cfstring deduplication diff. While binds/rebases can also happen in the regular `.text` section, they're more common in `.data` sections, so it seems more natural to test it that way. This change is perf-neutral when linking chromium_framework. Reviewed By: oontvoo Differential Revision: https://reviews.llvm.org/D105044	2021-07-01 21:22:38 -04:00
Leonard Grey	a8a6e5b094	[lld-macho] Preserve alignment for non-deduplicated cstrings Fixes PR50637. Downstream bug: https://crbug.com/1218958 Currently, we split __cstring along symbol boundaries with .subsections_via_symbols when not deduplicating, and along null bytes when deduplicating. This change splits along null bytes unconditionally, and preserves original alignment in the non-deduplicated case. Removing subsections-section-relocs.s because with this change, __cstring is never reordered based on the order file. Differential Revision: https://reviews.llvm.org/D104919	2021-06-28 22:26:43 -04:00
Jez Ng	557e1fa02f	[lld-macho] Extend ICF to literal sections Literal sections can be deduplicated before running ICF. That makes it easy to compare them during ICF: we can tell if two literals are constant-equal by comparing their offsets in their OutputSection. LLD-ELF takes a similar approach. Reviewed By: #lld-macho, gkm Differential Revision: https://reviews.llvm.org/D104671	2021-06-28 14:49:39 -04:00
Jez Ng	8aa17d1eae	[lld-macho] Move ICF members from InputSection to ConcatInputSection `icfEqClass` only makes sense on ConcatInputSections since (in contrast to literal sections) they are deduplicated as an atomic unit. Similarly, `hasPersonality` and `replacement` don't make sense on literal sections. This mirrors LLD-ELF, which stores `icfEqClass` only on non-mergeable sections. Reviewed By: #lld-macho, gkm Differential Revision: https://reviews.llvm.org/D104670	2021-06-24 22:23:12 -04:00
Greg McGary	f27e4548fc	[lld-macho] Implement ICF ICF = Identical C(ode\|OMDAT) Folding This is the LLD ELF/COFF algorithm, adapted for MachO. So far, only `-icf all` is supported. In order to support `-icf safe`, we will need to port address-significance tables (`.addrsig` directives) to MachO, which will come in later diffs. `check-{llvm,clang,lld}` have 0 regressions for `lld -icf all` vs. baseline ld64. We only run ICF on `__TEXT,__text` for reasons explained in the block comment in `ConcatOutputSection.cpp`. Here is the perf impact for linking `chromium_framework` on a Mac Pro (16-core Xeon W) for the non-ICF case vs. pre-ICF: ``` N Min Max Median Avg Stddev x 20 4.27 4.44 4.34 4.349 0.043029977 + 20 4.37 4.46 4.405 4.4115 0.025188761 Difference at 95.0% confidence 0.0625 +/- 0.0225658 1.43711% +/- 0.518873% (Student's t, pooled s = 0.0352566) ``` Reviewed By: #lld-macho, int3 Differential Revision: https://reviews.llvm.org/D103292	2021-06-17 10:07:44 -07:00
Jez Ng	b8bbb9723a	[lld-macho][nfc] Put back shouldOmitFromOutput() asserts I removed them in rG5de7467e982 but @thakis pointed out that they were useful to keep, so here they are again. I've also converted the `!isCoalescedWeak()` asserts into `!shouldOmitFromOutput()` asserts, since the latter check subsumes the former. Reviewed By: #lld-macho, thakis Differential Revision: https://reviews.llvm.org/D104169	2021-06-16 15:23:04 -04:00
Jez Ng	b2a0739012	[lld-macho][nfc] Remove InputSection::outSecFileOff `outSecFileOff` and the associated `getFileOffset()` accessors were unnecessary. For all the cases we care about, `outSecFileOff` is the same as `outSecOff`. The only time they deviate is if there are zerofill sections within a given segment. But since zerofill sections are always at the end of a segment, the only sections where the two values deviate are zerofill sections themselves. And we never actually query the outSecFileOff of zerofill sections. As for `getFileOffset()`, the only place it was being used was to calculate the offset of the entry symbol. However, we can compute that value by just taking the difference between the address of the entry symbol and the address of the Mach-O header. In fact, this appears to be what ld64 itself does. This difference is the same as the file offset as long as there are no intervening zerofill sections, but since `__text` is the first section in `__TEXT`, this never happens, so our previous use of `getFileOffset()` was not wrong -- just inefficient. Reviewed By: #lld-macho, thakis Differential Revision: https://reviews.llvm.org/D104177	2021-06-13 19:51:30 -04:00
Jez Ng	5de7467e98	[lld-macho] Fix debug build D103977 broke a bunch of stuff as I had only tested the release build which eliminated asserts. I've retained the asserts where possible, but I also removed a bunch instead of adding a whole lot of verbose ConcatInputSection casts.	2021-06-11 20:21:27 -04:00
Jez Ng	464d3dc3d1	[lld-macho] Have dead-stripping work with literal sections Literal sections are not atomically live or dead. Rather, liveness is tracked for each individual literal they contain. CStrings have their liveness tracked via a `live` bit in StringPiece, and fixed-width literals have theirs tracked via a BitVector. The live-marking code now needs to track the offset within each section that is to be marked live, in order to identify the literal at that particular offset. Numbers for linking chromium_framework on my 3.2 GHz 16-Core Intel Xeon W with both `-dead_strip` and `--deduplicate-literals`, with and without this diff applied: ``` N Min Max Median Avg Stddev x 20 4.32 4.44 4.375 4.372 0.03105174 + 20 4.3 4.39 4.36 4.3595 0.023277502 No difference proven at 95.0% confidence ``` This gives us size savings of about 0.4%. Reviewed By: #lld-macho, thakis Differential Revision: https://reviews.llvm.org/D103979	2021-06-11 19:50:09 -04:00
Jez Ng	681cfeb591	[lld-macho][nfc] Have InputSection ctors take some parameters This is motivated by an upcoming diff in which the WordLiteralInputSection ctor sets itself up based on the value of its section flags. As such, it needs to be passed the `flags` value as part of its ctor parameters, instead of having them assigned after the fact in `parseSection()`. While refactoring code to make that possible, I figured it would make sense for the other InputSections to also take their initial values as ctor parameters. Reviewed By: #lld-macho, thakis Differential Revision: https://reviews.llvm.org/D103978	2021-06-11 19:50:09 -04:00
Jez Ng	5d88f2dd94	[lld-macho] Deduplicate fixed-width literals Conceptually, the implementation is pretty straightforward: we put each literal value into a hashtable, and then write out the keys of that hashtable at the end. In contrast with ELF, the Mach-O format does not support variable-length literals that aren't strings. Its literals are either 4, 8, or 16 bytes in length. LLD-ELF dedups its literals via sorting + uniq'ing, but since we don't need to worry about overly-long values, we should be able to do a faster job by just hashing. That said, the implementation right now is far from optimal, because we add to those hashtables serially. To parallelize this, we'll need a basic concurrent hashtable (only needs to support concurrent writes w/o interleaved reads), which shouldn't be too hard to implement, but I'd like to punt on it for now. Numbers for linking chromium_framework on my 3.2 GHz 16-Core Intel Xeon W: N Min Max Median Avg Stddev x 20 4.27 4.39 4.315 4.3225 0.033225703 + 20 4.36 4.82 4.44 4.4845 0.13152846 Difference at 95.0% confidence 0.162 +/- 0.0613971 3.74783% +/- 1.42041% (Student's t, pooled s = 0.0959262) This corresponds to binary size savings of 2MB out of 335MB, or 0.6%. It's not a great tradeoff as-is, but as mentioned our implementation can be significantly optimized, and literal dedup will unlock more opportunities for ICF to identify identical structures that reference the same literals. Reviewed By: #lld-macho, gkm Differential Revision: https://reviews.llvm.org/D103113	2021-06-11 19:50:08 -04:00
Nico Weber	f2b1a1e10c	[lld/mac] Use sectionType() more Not sure sectionType() carries its weight, but while we have it we should use it consistently. No behavior change. Differential Revision: https://reviews.llvm.org/D104027	2021-06-11 11:15:47 -04:00
Jez Ng	04259cde15	[lld-macho] Implement cstring deduplication Our implementation draws heavily from LLD-ELF's, which in turn delegates its string deduplication to llvm-mc's StringTableBuilder. The messiness of this diff is largely due to the fact that we've previously assumed that all InputSections get concatenated together to form the output. This is no longer true with CStringInputSections, which split their contents into StringPieces. StringPieces are much more lightweight than InputSections, which is important as we create a lot of them. They may also overlap in the output, which makes it possible for strings to be tail-merged. In fact, the initial version of this diff implemented tail merging, but I've dropped it for reasons I'll explain later. Alignment Issues Mergeable cstring literals are found under the `__TEXT,__cstring` section. In contrast to ELF, which puts strings that need different alignments into different sections, clang's Mach-O backend puts them all in one section. Strings that need to be aligned have the `.p2align` directive emitted before them, which simply translates into zero padding in the object file. I think ld64 extracts the desired per-string alignment from this data by preserving each string's offset from the last section-aligned address. I'm not entirely certain since it doesn't seem consistent about doing this; but perhaps this can be chalked up to cases where ld64 has to deduplicate strings with different offset/alignment combos -- it seems to pick one of their alignments to preserve. This doesn't seem correct in general; we can in fact induce ld64 to produce a crashing binary just by linking in an additional object file that only contains cstrings and no code. See PR50563 for details. 
Moreover, this scheme seems rather inefficient: since unaligned and aligned strings are all put in the same section, which has a single alignment value, it doesn't seem possible to tell whether a given string doesn't have any alignment requirements. Preserving offset+alignments for strings that don't need it is wasteful. In practice, the crashes seen so far seem to stem from x86_64 SIMD operations on cstrings. X86_64 requires SIMD accesses to be 16-byte-aligned. So for now, I'm thinking of just aligning all strings to 16 bytes on x86_64. This is indeed wasteful, but implementation-wise it's simpler than preserving per-string alignment+offsets. It also avoids the aforementioned crash after deduplication of differently-aligned strings. Finally, the overhead is not huge: using 16-byte alignment (vs no alignment) is only a 0.5% size overhead when linking chromium_framework. With these alignment requirements, it doesn't make sense to attempt tail merging -- most strings will not be eligible since their overlaps aren't likely to start at a 16-byte boundary. Tail-merging (with alignment) for chromium_framework only improves size by 0.3%. It's worth noting that LLD-ELF only does tail merging at `-O2`. By default (at `-O1`), it just deduplicates w/o tail merging. @thakis has also mentioned that they saw it regress compressed size in some cases and therefore turned it off. `ld64` does not seem to do tail merging at all. Performance Numbers CString deduplication reduces chromium_framework from 250MB to 242MB, or about a 3.2% reduction. Numbers for linking chromium_framework on my 3.2 GHz 16-Core Intel Xeon W: N Min Max Median Avg Stddev x 20 3.91 4.03 3.935 3.95 0.034641016 + 20 3.99 4.14 4.015 4.0365 0.0492336 Difference at 95.0% confidence 0.0865 +/- 0.027245 2.18987% +/- 0.689746% (Student's t, pooled s = 0.0425673) As expected, cstring merging incurs some non-trivial overhead. When passing `--no-literal-merge`, it seems that performance is the same, i.e. 
the refactoring in this diff didn't cost us. N Min Max Median Avg Stddev x 20 3.91 4.03 3.935 3.95 0.034641016 + 20 3.89 4.02 3.935 3.9435 0.043197831 No difference proven at 95.0% confidence Reviewed By: #lld-macho, gkm Differential Revision: https://reviews.llvm.org/D102964	2021-06-07 23:48:35 -04:00
Alexander Shaposhnikov	dc2c6cf274	[lld][MachO] Adjust isCodeSection signature This diff changes the type of the argument of isCodeSection to const InputSection *. NFC. Test plan: make check-lld-macho Differential revision: https://reviews.llvm.org/D102664	2021-05-17 22:09:47 -07:00
Greg McGary	93c8559baf	[lld-macho] Implement branch-range-extension thunks Extend the range of calls beyond an architecture's limited branch range by first calling a thunk, which loads the far address into a scratch register (x16 on ARM64) and branches through it. Other ports (COFF, ELF) use multiple passes with successively-refined guesses regarding the expansion of text-space imposed by thunk-space overhead. This MachO algorithm places thunks during MergedOutputSection::finalize() in a single pass using exact thunk-space overheads. Thunks are kept in a separate vector to avoid the overhead of inserting into the `inputs` vector of `MergedOutputSection`. FIXME: * arm64-stubs.s test is broken * add thunk tests * Handle thunks to DylibSymbol in MergedOutputSection::finalize() Differential Revision: https://reviews.llvm.org/D100818	2021-05-12 09:44:58 -07:00
Nico Weber	d5a70db193	[lld/mac] Write every weak symbol only once in the output Before this, if an inline function was defined in several input files, lld would write each copy of the inline function to the output. With this patch, it only writes one copy. Reduces the size of Chromium Framework from 378MB to 345MB (compared to 290MB linked with ld64, which also does dead-stripping, which we don't do yet), and makes linking it faster: N Min Max Median Avg Stddev x 10 3.9957051 4.3496981 4.1411121 4.156837 0.10092097 + 10 3.908154 4.169318 3.9712729 3.9846753 0.075773012 Difference at 95.0% confidence -0.172162 +/- 0.083847 -4.14165% +/- 2.01709% (Student's t, pooled s = 0.0892373) Implementation-wise, when merging two weak symbols, this sets a "canOmitFromOutput" on the InputSection belonging to the weak symbol not put in the symbol table. We then don't write InputSections that have this set, as long as they are not referenced from other symbols. (This happens e.g. for object files that don't set .subsections_via_symbols or that use .alt_entry.) Some restrictions: - not yet done for bitcode inputs - no "comdat" handling (`kindNoneGroupSubordinate*` in ld64) -- Frame Descriptor Entries (FDEs), Language Specific Data Areas (LSDAs) (that is, catch block unwind information) and Personality Routines associated with weak functions still not stripped. This is wasteful, but harmless. - However, this does strip weaks from __unwind_info (which is needed for correctness and not just for size) - This nopes out on InputSections that are referenced from more than one symbol (eg from .alt_entry) for now Things that work based on symbols Just Work: - map files (change in MapFile.cpp is no-op and not needed; I just found it a bit more explicit) - exports Things that work with inputSections need to explicitly check if an inputSection is written (e.g. unwind info). This patch is useful in itself, but it's also likely a useful foundation for dead_strip. 
I used to have a "canoncialRepresentative" pointer on InputSection instead of just the bool, which would be handy for ICF too. But I ended up not needing it for this patch, so I removed that again for now. Differential Revision: https://reviews.llvm.org/D102076	2021-05-07 17:11:40 -04:00
Greg McGary	465204d63a	[lld-macho][NFC] define more strings in section_names:: and segment_names:: As preparation for a subsequent diff that implements builtin section renaming, define more `constexpr` strings in namespaces `lld::macho::segment_names` and `lld::macho::section_names`, and use them to replace string literals. Differential Revision: https://reviews.llvm.org/D101393	2021-04-27 17:48:45 -07:00
Jez Ng	1aa29dffce	[lld-macho] Support subtractor relocations that reference sections The minuend (but not the subtrahend) can reference a section. Note that we do not yet properly validate that the subtrahend isn't referencing a section; I've filed PR50034 to track that. I've also extended the reloc-subtractor.s test to reorder symbols, to make sure that the addends are being associated with the minuend (and not the subtrahend) relocation. Fixes PR49999. Reviewed By: #lld-macho, thakis Differential Revision: https://reviews.llvm.org/D100804	2021-04-20 16:58:57 -04:00
Jez Ng	817d98d841	[lld-macho][nfc] Refactor in preparation for 32-bit support The main challenge was handling the different on-disk structures (e.g. `mach_header` vs `mach_header_64`). I tried to strike a balance between sprinkling `target->wordSize == 8` checks everywhere (branchy = slow, and ugly) and templatizing everything (causes code bloat, also ugly). I think I struck a decent balance by judicious use of type erasure. Note that LLD-ELF has a similar architecture, though it seems to use more templating. Linking chromium_framework takes about the same time before and after this change: N Min Max Median Avg Stddev x 20 4.52 4.67 4.595 4.5945 0.044423204 + 20 4.5 4.71 4.575 4.582 0.056344803 No difference proven at 95.0% confidence Reviewed By: #lld-macho, oontvoo Differential Revision: https://reviews.llvm.org/D99633	2021-04-02 18:46:39 -04:00
Greg McGary	427d359721	[lld-macho][NFC] Drop unnecessary macho:: namespace prefix on unambiguous references to Symbol Within `lld/macho/`, only `InputFiles.cpp` and `Symbols.h` require the `macho::` namespace qualifier to disambiguate references to `class Symbol`. Add braces to outer `for` of a 5-level single-line `if`/`for` nest. Differential Revision: https://reviews.llvm.org/D99555	2021-03-30 14:58:35 -07:00
Jez Ng	a723db92d8	[lld-macho][nfc] Refactor subtractor reloc handling SUBTRACTOR relocations are always paired with UNSIGNED relocations to indicate a pair of symbols whose address difference we want. Functionally they are like a single relocation: only one pointer gets written / relocated. Previously, we would handle these pairs by skipping over the SUBTRACTOR relocation and writing the pointer when handling the UNSIGNED reloc. This diff reverses things, so we write while handling SUBTRACTORs and skip over the UNSIGNED relocs instead. Being able to distinguish between SUBTRACTOR and UNSIGNED relocs in the write phase (i.e. inside `relocateOne`) is useful for the upcoming range check diff: we want to check that SUBTRACTOR relocs write signed values, but UNSIGNED relocs (naturally) write unsigned values. Reviewed By: #lld-macho, thakis Differential Revision: https://reviews.llvm.org/D98386	2021-03-11 13:28:13 -05:00
Jez Ng	5433a79176	[lld-macho][nfc] Create Relocations.{h,cpp} for relocation-specific code This more closely mirrors the structure of lld-ELF. Reviewed By: #lld-macho, thakis Differential Revision: https://reviews.llvm.org/D98384	2021-03-11 13:28:09 -05:00
Jez Ng	1752f28506	[lld-macho][nfc] Remove `MachO::` prefix where possible Previously, SyntheticSections.cpp did not have a top-level `using namespace llvm::MachO` because it caused a naming conflict: `llvm::MachO::Symbol` would collide with `lld::macho::Symbol`. `MachO::Symbol` represents the symbols defined in InterfaceFiles (TBDs). By moving the inclusion of InterfaceFile.h into our .cpp files, we can avoid this name collision in other files where we are only dealing with LLD's own symbols. Along the way, I removed all unnecessary "MachO::" prefixes in our code. Cons of this approach: If TextAPI/MachO/Symbol.h gets included via some other header file in the future, we could run into this collision again. Alternative 1: Have either TextAPI/MachO or BinaryFormat/MachO.h use a different namespace. Most of the benefit of `using namespace llvm::MachO` comes from being able to use things in BinaryFormat/MachO.h conveniently; if TextAPI was under a different (and fully-qualified) namespace like `llvm::tapi` that would solve our problems. Cons: lots of files across llvm-project will need to be updated, and folks who own the TextAPI code need to agree to the name change. Alternative 2: Rename our Symbol to something like `LldSymbol`. I think this is ugly. Personally I think alternative #1 is ideal, but I'm not sure the effort to do it is worthwhile, this diff's halfway solution seems good enough to me. Thoughts? Reviewed By: #lld-macho, oontvoo, MaskRay Differential Revision: https://reviews.llvm.org/D98149	2021-03-11 13:28:08 -05:00
Greg McGary	fdc0c21973	[lld-macho][NFC] when reasonable, replace auto keyword with type names lld policy discourages `auto`. Replace it with a type name whenever reasonable. Retain `auto` to avoid ... * redundancy, as for decls such as `auto t = mumble_cast<TYPE >` or similar that specifies the result type on the RHS * verbosity, as for iterators * gratuitous suffering, as for lambdas Along the way, add `const` when appropriate. Note: a future diff will ... * add more `const` qualifiers * remove `opt::` when we are already `using llvm::opt` Differential Revision: https://reviews.llvm.org/D98313	2021-03-09 22:08:32 -08:00
Jez Ng	541390131e	[lld-macho] Don't emit rebase opcodes for subtractor minuend relocs Also add a few asserts to verify that we are indeed handling an UNSIGNED relocation as the minuend. I haven't made it an actual user-facing error since I don't think llvm-mc is capable of generating SUBTRACTOR relocations without an associated UNSIGNED. Reviewed By: #lld-macho, smeenai Differential Revision: https://reviews.llvm.org/D97103	2021-02-27 12:31:34 -05:00
Jez Ng	5e851733c5	[lld-macho] Fix semantics & add tests for ARM64 GOT/TLV relocs I've adjusted the RelocAttrBits to better fit the semantics of the relocations. In particular: 1. _UNSIGNED relocations are no longer marked with the `TLV` bit, even though they can occur within TLV sections. Instead the `TLV` bit is reserved for relocations that can reference thread-local symbols, and _UNSIGNED relocations have their own `UNSIGNED` bit. The previous implementation caused TLV and regular UNSIGNED semantics to be conflated, resulting in rebase opcodes being incorrectly emitted for TLV relocations. 2. I've added a new `POINTER` bit to denote non-relaxable GOT relocations. This distinction isn't important on x86 -- the GOT relocations there are either relaxable or non-relaxable loads -- but arm64 has `GOT_LOAD_PAGE21` which loads the page that the referent symbol is in (regardless of whether the symbol ends up in the GOT). This relocation must reference a GOT symbol (so must have the `GOT` bit set) but isn't itself relaxable (so must not have the `LOAD` bit). The `POINTER` bit is used for relocations that must reference a GOT slot. 3. A similar situation occurs for TLV relocations. 4. ld64 supports both a pcrel and an absolute version of ARM64_RELOC_POINTER_TO_GOT. But the semantics of the absolute version are pretty weird -- it results in the value of the GOT slot being written, rather than the address. (That means a reference to a dynamically-bound slot will result in zeroes being written.) The programs I've tried linking don't use this form of the relocation, so I've dropped our partial support for it by removing the relevant RelocAttrBits. Reviewed By: alexshap Differential Revision: https://reviews.llvm.org/D97031	2021-02-23 22:02:38 -05:00
Greg McGary	87104faac4	[lld-macho] Add ARM64 target arch This is an initial base commit for ARM64 target arch support. I don't represent that it is complete or bug-free, but wish to put it out for review now that some basic things like branch target & load/store address relocs are working. I can add more tests to this base commit, or add them in follow-up commits. It is not entirely clear whether I use the "ARM64" (Apple) or "AArch64" (non-Apple) naming convention. Guidance is appreciated. Differential Revision: https://reviews.llvm.org/D88629	2021-02-08 18:14:07 -07:00
Greg McGary	3a9d2f1488	[lld-macho][NFC] refactor relocation handling Add per-reloc-type attribute bits and migrate code from per-target file into target independent code, driven by reloc attributes. Many cleanups Differential Revision: https://reviews.llvm.org/D95121	2021-02-02 10:54:53 -07:00
Kazu Hirata	fb98a1be43	Fix the warnings on unused variables (NFC)	2021-01-13 13:32:40 -08:00
Jez Ng	daaaed6bb8	[lld-macho] Fix TLV data initialization We were mishandling the case where both `__tbss` and `__thread_data` sections were present. TLVP relocations should be encoded as offsets from the start of `__thread_data`, even if the symbol is actually located in `__thread_bss`. Previously, we were writing the offset from the start of the containing section, which doesn't really make sense since there's no way `tlv_get_addr()` can know which section a given `tlv$init` symbol is in at runtime. In addition, this patch ensures that we place `__thread_data` immediately before `__thread_bss`. This is what ld64 does, likely for performance reasons. Zerofill sections must also be at the end of their segments; we were already doing this, but now we ensure that `__thread_bss` occurs before `__bss`, so that it's always possible to have it contiguous with `__thread_data`. Fixes llvm.org/PR48657. Reviewed By: #lld-macho, thakis Differential Revision: https://reviews.llvm.org/D94329	2021-01-08 18:48:12 -05:00
Jez Ng	7b007ac080	[lld-macho][nfc] Move some methods from InputFile to ObjFile Additionally: 1. Move the helper functions in InputSection.h below the definition of `InputSection`, so the important stuff is on top 2. Remove unnecessary `explicit` Reviewed By: #lld-macho, compnerd Differential Revision: https://reviews.llvm.org/D92453	2020-12-08 10:34:32 -08:00
Jez Ng	c7dbaec396	[lld-macho] Add isCodeSection() This is the same logic that ld64 uses to determine which sections contain functions. This was added so that we could determine which STABS entries should be N_FUN. Reviewed By: clayborg Differential Revision: https://reviews.llvm.org/D92430	2020-12-01 15:05:21 -08:00
Greg McGary	1a3ef0417c	[lld-macho] In the context of relocs, s/target/referent/ for sections & symbols The word "target" is overloaded, so lighten its load by using another word to denote the symbol or section to which a reloc points. While more stilted than "target", "referent" is rather less pompous than "designatum" or "denotatum". :P Along the way, make a few neighboring variable names more descriptive. Reviewed By: #lld-macho, int3 Differential Revision: https://reviews.llvm.org/D87584	2020-09-22 20:31:01 -07:00
Jez Ng	3c9100fb78	[lld-macho] Support dynamic linking of thread-locals References to symbols in dylibs work very similarly regardless of whether the symbol is a TLV. The main difference is that we have a separate `__thread_ptrs` section that acts as the GOT for these thread-locals. We can identify thread-locals in dylibs by a flag in their export trie entries, and we cross-check it with the relocations that refer to them to ensure that we are not using a GOT relocation to reference a thread-local (or vice versa). Reviewed By: #lld-macho, smeenai Differential Revision: https://reviews.llvm.org/D85081	2020-08-12 19:50:09 -07:00
Jez Ng	ca85e37338	[lld-macho] Support static linking of thread-locals Note: What ELF refers to as "TLS", Mach-O seems to refer to as "TLV", i.e. thread-local variables. This diff implements support for TLV relocations that reference defined symbols. On x86_64, TLV relocations are always used with movq opcodes, so for defined TLVs, we don't need to create a synthetic section to store the addresses of the symbols -- we can just convert the `movq` to a `leaq`. One notable quirk of Mach-O's TLVs is that absolute-address relocations inside TLV-defining sections behave differently -- their addresses are no longer absolute, but relative to the start of the target section. (AFAICT, RIP-relative relocations are not allowed in these sections.) Reviewed By: #lld-macho, compnerd, smeenai Differential Revision: https://reviews.llvm.org/D85080	2020-08-07 11:04:52 -07:00
Jez Ng	74871cdad7	[lld-macho] Ensure __bss sections we output have file offset of zero Summary: llvm-mc emits `__bss` sections with an offset of zero, but we weren't expecting that in our input, so we were copying non-zero data from the start of the file and putting it in `__bss`, with obviously undesirable runtime results. (It appears that the kernel will copy those nonzero bytes as long as the offset is nonzero, regardless of whether S_ZERO_FILL is set.) I debated on whether to make a special ZeroFillSection -- separate from a regular InputSection -- but it seemed like too much work for now. But I'm happy to refactor if anyone feels strongly about having it as a separate class. Depends on D80857. Reviewers: ruiu, pcc, MaskRay, smeenai, alexshap, gkm, Ktwu, christylee Reviewed By: smeenai Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D80859	2020-06-17 20:41:28 -07:00
Jez Ng	a12e7d406d	[lld-macho] Handle GOT relocations of non-dylib symbols Summary: Turns out this case is actually really common -- it happens whenever there's a reference to an `extern` variable that ends up statically linked. Depends on D80856. Reviewers: ruiu, pcc, MaskRay, smeenai, alexshap, gkm, Ktwu, christylee Reviewed By: smeenai Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D80857	2020-06-17 20:41:28 -07:00
Jez Ng	53c796b948	[lld-macho] Properly handle & validate relocation r_length Summary: We should be reading / writing our addends / relocated addresses based on r_length, and not just based on the type of the relocation. But since only some r_length values are valid for a given reloc type, I've also added some validation. ld64 has code to allow for r_length = 0 in X86_64_RELOC_BRANCH relocs, but I'm not sure how to create such a relocation... Reviewed By: smeenai Differential Revision: https://reviews.llvm.org/D80854	2020-06-14 16:35:23 -07:00
Jez Ng	df2a5778c3	[lld-macho] Error on encountering undefined symbols ... instead of silently emitting a reference to the zero address. Reviewed By: smeenai Differential Revision: https://reviews.llvm.org/D80169	2020-06-02 13:19:38 -07:00
Jez Ng	ce0d8beebc	[lld-macho][re-land] Support X86_64_RELOC_UNSIGNED This reverts commit `db8559eee4`.	2020-05-19 12:31:55 -07:00
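
Illustration for commit 53c796b948 ("Properly handle & validate relocation r_length"): a minimal sketch of what reading an addend "based on r_length" can look like. In Mach-O's `relocation_info`, `r_length` is a 2-bit field encoding the width as a power of two (0 = 1 byte, 1 = 2 bytes, 2 = 4 bytes, 3 = 8 bytes). The helper names `relocWidth`, `isSupportedLength`, and `readAddend` are hypothetical and are not lld's actual functions. The choice to accept only 4- and 8-byte widths is an assumption made for this sketch, as is a little-endian host.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: width in bytes implied by r_length (1 << r_length).
inline unsigned relocWidth(unsigned rLength) { return 1u << rLength; }

// Hypothetical validation: this sketch only accepts 4- and 8-byte fields.
// Which lengths are valid for a given reloc type is target-specific; the
// commit message only says that "some r_length values are valid".
inline bool isSupportedLength(unsigned rLength) {
  return rLength == 2 || rLength == 3;
}

// Hypothetical helper: read the implicit addend stored at `loc`, choosing
// the width from r_length instead of from the relocation type.
// Assumes a little-endian host. Returns false for unsupported lengths.
inline bool readAddend(const uint8_t *loc, unsigned rLength, int64_t &out) {
  if (!isSupportedLength(rLength))
    return false;
  if (rLength == 2) {
    int32_t v;
    std::memcpy(&v, loc, sizeof(v)); // 4-byte field, sign-extended below
    out = v;
  } else {
    int64_t v;
    std::memcpy(&v, loc, sizeof(v)); // 8-byte field
    out = v;
  }
  return true;
}
```

A caller would check the return value and report an error for an unsupported length rather than silently reading the wrong number of bytes.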

65 Commits