llvm-project

Commit Graph

Author	SHA1	Message	Date
Nico Weber	7effcbda49	Rename parallelForEachN to just parallelFor Patch created by running: rg -l parallelForEachN \| xargs sed -i '' -c 's/parallelForEachN/parallelFor/' No behavior change. Differential Revision: https://reviews.llvm.org/D128140	2022-06-19 17:49:00 -04:00
Daniel Bertalan	f2e92cf60e	[lld-macho] Print the name of functions containing undefined references The error used to look like this: ld64.lld: error: undefined symbol: _foo >>> referenced by /path/to/bar.o Now it displays the name of the function that contains the undefined reference as well: ld64.lld: error: undefined symbol: _foo >>> referenced by /path/to/bar.o:(symbol _baz+0x4) Differential Revision: https://reviews.llvm.org/D127696	2022-06-14 09:41:28 -04:00
Jez Ng	e183bf8e15	[lld-macho][reland] Initial support for EH Frames This reverts commit `942f4e3a7c`. The additional change required to avoid the assertion errors seen previously is: --- a/lld/MachO/ICF.cpp +++ b/lld/MachO/ICF.cpp @@ -443,7 +443,9 @@ void macho::foldIdenticalSections() { /relocVA=/0); isec->data = copy; } - } else { + } else if (!isEhFrameSection(isec)) { + // EH frames are gathered as hashables from unwindEntry above; give a + // unique ID to everything else. isec->icfEqClass[0] = ++icfUniqueID; } } Differential Revision: https://reviews.llvm.org/D123435	2022-06-13 07:45:16 -04:00
Douglas Yung	942f4e3a7c	Revert "[lld-macho] Initial support for EH Frames" This reverts commit `826be330af`. This was causing a test failure on build bots: - https://lab.llvm.org/buildbot/#/builders/36/builds/21770 - https://lab.llvm.org/buildbot/#/builders/58/builds/23913	2022-06-09 05:25:43 -07:00
Jez Ng	826be330af	[lld-macho] Initial support for EH Frames == Background == `llvm-mc` generates unwind info in both compact unwind and DWARF formats. LLD already handles the compact unwind format; this diff gets us close to handling the DWARF format properly. == Caveats == It's not quite done yet, but I figure it's worth getting this reviewed and landed first as it's shaping up to be a fairly large code change. Known limitations of the current code: * Only works for x86_64, for which `llvm-mc` emits "abs-ified" relocations as described in `618def651b`. `llvm-mc` emits regular relocations for ARM EH frames, which we do not yet handle correctly. Since the feature is not ready for real use yet, I've gated it behind a flag that only gets toggled on during test suite runs. With most of the new code disabled, we see just a hint of perf regression, so I don't think it'd be remiss to land this as-is: base diff difference (95% CI) sys_time 1.926 ± 0.168 1.979 ± 0.117 [ -1.2% .. +6.6%] user_time 3.590 ± 0.033 3.606 ± 0.028 [ +0.0% .. +0.9%] wall_time 7.104 ± 0.184 7.179 ± 0.151 [ -0.2% .. +2.3%] samples 30 31 == Design == Like compact unwind entries, EH frames are also represented as regular ConcatInputSections that get pointed to via `Defined::unwindEntry`. This allows them to be handled generically by e.g. the MarkLive and ICF code. (But note that unlike compact unwind subsections, EH frame subsections do end up in the final binary.) In order to make EH frames "look like" a regular ConcatInputSection, some processing is required. First, we need to split the `__eh_frame` section along EH frame boundaries rather than along symbol boundaries. We do this by decoding the length field of each EH frame. Second, the abs-ified relocations need to be turned into regular Relocs. 
== Next Steps == In order to support EH frames on ARM targets, we will either have to teach LLD how to handle EH frames with explicit relocs, or we can try to make `llvm-mc` emit abs-ified relocs for ARM as well. I'm hoping to do the latter as I think it will make the LLD implementation both simpler and faster to execute. == Misc == The `obj-file-with-stabs.s` test had to be updated as the previous version would trip assertion errors in the code. It appears that in our attempt to produce a minimal YAML test input, we created a file with invalid EH frame data. I've fixed this by re-generating the YAML and not doing any hand-pruning of it. Reviewed By: #lld-macho, Roger Differential Revision: https://reviews.llvm.org/D123435	2022-06-08 23:40:52 -04:00
Alex Brachet	190b0f42cf	[lld-macho] Stop crash when emitting personalities with -dead_strip The <internal> symbol was tripping an assertion in getVA() because it was not marked as used. Per the comment above that symbol's creation, dead stripping has already occurred, so marking this symbol as used is accurate. Fixes https://github.com/llvm/llvm-project/issues/55565 Differential revision: https://reviews.llvm.org/D126072	2022-05-20 21:40:47 +00:00
Jez Ng	2a6669060f	[lld-macho][nfc] De-templatize UnwindInfoSection Follow-on to {D123276}. Now that we work with an internal representation of compact unwind entries, we no longer need to template our UnwindInfoSectionImpl code based on the pointer size of the target architecture. I've still kept the split between `UnwindInfoSectionImpl` and `UnwindInfoSection`. I'd introduced that split in order to do type erasure, but I think it's still useful to have in order to keep `UnwindInfoSection`'s definition in the header file clean. Reviewed By: #lld-macho, oontvoo Differential Revision: https://reviews.llvm.org/D123277	2022-04-13 16:19:22 -04:00
Jez Ng	1cff723ff5	[lld-macho][nfc] Use includeInSymtab for all symtab-skipping logic {D123302} got me looking deeper at `includeInSymtab`. I thought it was a little odd that there were excluded (live) symbols for which `includeInSymtab` was false; we shouldn't have so many different ways to exclude a symbol. As such, this diff makes the `L`-prefixed-symbol exclusion code use `includeInSymtab` too. (Note that as part of our support for `__eh_frame`, we will also be excluding all `__eh_frame` symbols from the symtab in a future diff.) Another thing I noticed is that the `emitStabs` code never has to deal with excluded symbols because `SymtabSection::finalize()` already filters them out. As such, I've updated the comments and asserts from {D123302} to reflect this. Reviewed By: #lld-macho, thakis Differential Revision: https://reviews.llvm.org/D123433	2022-04-11 15:45:46 -04:00
Jez Ng	82dcf30636	[lld-macho] Use fewer indirections in UnwindInfo implementation The previous implementation of UnwindInfoSection materialized all the compact unwind entries & applied their relocations, then parsed the resulting data to generate the final unwind info. This design had some unfortunate consequences: since relocations can only be applied after their referents have had addresses assigned, operations that need to happen before address assignment must contort themselves. (See {D113582} and observe how this diff greatly simplifies it.) Moreover, it made synthesizing new compact unwind entries awkward. Handling PR50956 will require us to do this synthesis, and is the main motivation behind this diff. Previously, instead of generating a new CompactUnwindEntry directly, we would have had to generate a ConcatInputSection with a number of `Reloc`s that would then get "flattened" into a CompactUnwindEntry. This diff introduces an internal representation of `CompactUnwindEntry` (the former `CompactUnwindEntry` has been renamed to `CompactUnwindLayout`). The new CompactUnwindEntry stores references to its personality symbol and LSDA section directly, without the use of `Reloc` structs. In addition to being easier to work with, this diff also allows us to handle unwind info whose personality symbols are located in sections placed after the `__unwind_info`. Reviewed By: #lld-macho, oontvoo Differential Revision: https://reviews.llvm.org/D123276	2022-04-08 23:49:07 -04:00
Jez Ng	06f863ac5e	[lld-macho] Include address offsets in error messages This makes it easier to pinpoint the source of the problem. TODO: Have more relocation error messages make use of this functionality. Reviewed By: #lld-macho, oontvoo Differential Revision: https://reviews.llvm.org/D118798	2022-02-07 21:06:18 -05:00
Jez Ng	3e951808d5	[lld-macho][nfc] Comments and style fixes Added some comments (particularly around finalize() and finalizeContents()) as well as doing some rephrasing / grammar fixes for existing comments. Also did some minor style fixups, such as by putting methods together in a class definition and having fields of similar types next to each other. Reviewed By: #lld-macho, oontvoo Differential Revision: https://reviews.llvm.org/D118714	2022-02-01 13:45:59 -05:00
Fangrui Song	0aae2bf373	[lld-macho] Add --start-lib --end-lib In ld.lld, when an ObjFile/BitcodeFile is read in --start-lib state, the file is given archive semantics. --end-lib closes the previous --start-lib. A build system can use this feature as an alternative to archives. This patch ports the feature to lld-macho. --start-lib and --end-lib are positional, unlike usual ld64 options. I think the slight drawback does not matter as (a) reusing option names is convenient for build systems (b) `--start-lib a.o b.o --end-lib` conveys more information than an alternative design: `-objlib a.o -objlib b.o` because --start-lib makes it clear which objects are in the same conceptual archive. This provides flexibility (c) `-objlib`/`-filelist` interaction may be weird. Close https://github.com/llvm/llvm-project/issues/52931 Reviewed By: #lld-macho, Jez Ng, oontvoo Differential Revision: https://reviews.llvm.org/D116913	2022-01-19 10:14:49 -08:00
Fangrui Song	97a5dccb7d	[lld-macho] Rename LazySymbol to LazyArchive. NFC D116913 will add LazyObject. Rename LazySymbol to LazyArchive to avoid confusion and mirror ELF. Reviewed By: #lld-macho, Jez Ng Differential Revision: https://reviews.llvm.org/D116914	2022-01-11 16:49:06 -08:00
Vy Nguyen	944071eca2	[lld-macho] Don't replace local personality symbol with LazySymbol Follow-up to D107533, where we replaced local syms with non-local. It doesn't make sense to replace a local symbol with a lazy one. Differential Revision: https://reviews.llvm.org/D110040	2021-11-22 14:09:54 -05:00
Greg McGary	9cc489a4b2	[lld-macho][nfc] Factor-out NFC changes from main __eh_frame diff In order to keep signal:noise high for the `__eh_frame` diff, I have teased-out the NFC changes and put them here. Differential Revision: https://reviews.llvm.org/D114017	2021-11-17 15:16:44 -07:00
Jez Ng	9d0b237c51	[lld-macho] Fix symbol relocs handling for LSDAs Similar to D113702, but for the LSDAs. Clang seems to emit all LSDA relocs as section relocs, but ld -r can turn those relocs into symbol ones. Reviewed By: #lld-macho, oontvoo Differential Revision: https://reviews.llvm.org/D113721	2021-11-12 16:02:49 -05:00
Jez Ng	d9b6f7e312	[lld-macho] Teach ICF to dedup functions with identical unwind info Dedup'ing unwind info is tricky because each CUE contains a different function address: if ICF operated naively and compared the entire contents of each CUE, entries with identical unwind info but belonging to different functions would never be considered identical. To work around this problem, we slice away the function address before performing ICF. We rely on `relocateCompactUnwind()` to correctly handle these truncated input sections. Here are the numbers before and after D109944, D109945, and this diff were applied, as tested on my 3.2 GHz 16-Core Intel Xeon W: Without any optimizations: base diff difference (95% CI) sys_time 0.849 ± 0.015 0.896 ± 0.012 [ +4.8% .. +6.2%] user_time 3.357 ± 0.030 3.512 ± 0.023 [ +4.3% .. +5.0%] wall_time 3.944 ± 0.039 4.032 ± 0.031 [ +1.8% .. +2.6%] samples 40 38 With `-dead_strip`: base diff difference (95% CI) sys_time 0.847 ± 0.010 0.896 ± 0.012 [ +5.2% .. +6.5%] user_time 3.377 ± 0.014 3.532 ± 0.015 [ +4.4% .. +4.8%] wall_time 3.962 ± 0.024 4.060 ± 0.030 [ +2.1% .. +2.8%] samples 47 30 With `-dead_strip` and `--icf=all`: base diff difference (95% CI) sys_time 0.935 ± 0.013 0.957 ± 0.018 [ +1.5% .. +3.2%] user_time 3.472 ± 0.022 6.531 ± 0.046 [ +87.6% .. +88.7%] wall_time 4.080 ± 0.040 5.329 ± 0.060 [ +30.0% .. +31.2%] samples 37 30 Unsurprisingly, ICF is now a lot slower, likely due to the much larger number of input sections it needs to process. But the rest of the linker only suffers a mild slowdown. Note that the compact-unwind-bad-reloc.s test was expanded because we now handle the relocation for CUE's function address in a separate code path from the rest of the CUE relocations. The extended test covers both code paths. Reviewed By: #lld-macho, oontvoo Differential Revision: https://reviews.llvm.org/D109946	2021-11-12 16:02:49 -05:00
Jez Ng	a2404f11c7	[lld-macho] Support renaming of LSDA section Previously, our unwind info finalization logic assumed that the LSDA section referenced by `__compact_unwind` was already finalized before `__TEXT,__unwind_info` itself. However, that assumption could be broken by the use of `-rename_section` -- it could be (and is) used to move `__gcc_except_tab` into a different segment later in the file. (__TEXT is always the first non-zerofill segment, so any rename basically guarantees that the section will be ordered after `__unwind_info`.) To handle this case, we compare LSDA relocations instead of their final values in `UnwindInfoSection::finalize()`, and we actually relocate those LSDAs in `UnwindInfoSection::writeTo()`. In order to do this, we need an easy way to track which Symbol a given CUE corresponds to. My solution was to change our `cuPtrVector` into a vector of indices, with each index used for both the symbols vector (`symbolsVec`) as well as the CUE vector (`cuVector`). This change seems perf neutral. Numbers for linking chromium_framework on my 16 core Mac Pro: base diff difference (95% CI) sys_time 1.248 ± 0.025 1.245 ± 0.026 [ -1.3% .. +0.8%] user_time 3.588 ± 0.045 3.587 ± 0.037 [ -0.6% .. +0.5%] wall_time 4.605 ± 0.069 4.595 ± 0.069 [ -1.0% .. +0.5%] samples 42 26 Reviewed By: #lld-macho, oontvoo Differential Revision: https://reviews.llvm.org/D113582	2021-11-10 19:31:54 -05:00
Vy Nguyen	3f35dd06a5	[lld-macho][nfc][cleanup] Fix a few code style lints and clang-tidy findings - Use .empty() instead of `size() == 0` when possible. - Use const-ref to avoid copying Differential Revision: https://reviews.llvm.org/D112978	2021-11-02 11:26:15 -04:00
Jez Ng	a9353dbe51	[lld-macho] Simplify the handling of "no unwind info" functions This diff does away with `addEntriesForFunctionsWithoutUnwindInfo()`, because `addSymbol()` can now determine which functions need those entries. While overhauling UnwindInfoSection, I also parallelized the relocation of the contents of the CUEs. This somewhat offsets the time regression from creating one InputSection per CUE (which was done in D109944). Reviewed By: #lld-macho, oontvoo Differential Revision: https://reviews.llvm.org/D109945	2021-10-26 16:04:16 -04:00
Jez Ng	002eda7056	[lld-macho] Associate compact unwind entries with function symbols Compact unwind entries (CUEs) contain pointers to their respective function symbols. However, during the link process, it's far more useful to have pointers from the function symbol to the CUE than vice versa. This diff adds that pointer in the form of `Defined::compactUnwind`. In particular, when doing dead-stripping, we want to mark CUEs live when their function symbol is live; and when doing ICF, we want to dedup sections iff the symbols in that section have identical CUEs. In both cases, we want to be able to locate the symbols within a given section, as well as locate the CUEs belonging to those symbols. So this diff also adds `InputSection::symbols`. The ultimate goal of this refactor is to have ICF support dedup'ing functions with unwind info, but that will be handled in subsequent diffs. This diff focuses on simplifying `-dead_strip` -- `findFunctionsWithUnwindInfo` is no longer necessary, and `Defined::isLive()` is now a lot simpler. Moreover, UnwindInfoSection no longer has to check for dead CUEs -- we simply avoid adding them in the first place. Additionally, we now support stripping of dead LSDAs, which follows quite naturally since `markLive()` can now reach them via the CUEs. Reviewed By: #lld-macho, gkm Differential Revision: https://reviews.llvm.org/D109944	2021-10-26 16:04:15 -04:00
Vy Nguyen	b428c3e8c1	[lld-macho] Ignore local personality symbols if non-local with the same name exists, to avoid "too many personalities" error. Sometimes people intentionally re-define a dylib personality symbol as a local defined symbol as a workaround for an ld -r bug. As a result, we could see "too many personalities" to encode. This patch tries to handle this case by ignoring the local symbols entirely. Differential Revision: https://reviews.llvm.org/D107533	2021-09-17 12:59:42 -04:00
Jez Ng	2179930868	[lld-macho] Fix unwind info personality size This was missed by {D107035}. This fix addresses the following warning: loop variable 'personality' has type 'const uint32_t &' (aka 'const unsigned int &') but is initialized with type 'const unsigned long long' resulting in a copy [-Wrange-loop-analysis] In addition to fixing the size, I also removed the const reference, since there's no performance benefit to avoiding copies of integer-sized values.	2021-08-26 18:52:06 -04:00
Vy Nguyen	0bd14711ac	[lld-macho] Change personalities entry type to Ptr to avoid overflowing uint32 PR51262 Differential Revision: https://reviews.llvm.org/D107035	2021-07-29 14:26:07 -04:00
Jez Ng	428a7c1b38	[lld-macho] Have ICF operate on all sections at once ICF previously operated only within a given OutputSection. We would merge all CFStrings first, then merge all regular code sections in a second phase. This worked fine since CFStrings would never reference regular `__text` sections. However, I would like to expand ICF to merge functions that reference unwind info. Unwind info references the LSDA section, which can in turn reference the `__text` section, so we cannot perform ICF in phases. In order to have ICF operate on InputSections spanning multiple OutputSections, we need a way to distinguish InputSections that are destined for different OutputSections, so that we don't fold across section boundaries. We achieve this by creating OutputSections early, and setting `InputSection::parent` to point to them. This is what LLD-ELF does. (This change should also make it easier to implement the `section$start$` symbols.) This diff also folds InputSections w/o checking their flags, which I think is the right behavior -- if they are destined for the same OutputSection, they will have the same flags in the output (even if their input flags differ). I.e. the `parent` pointer check subsumes the `flags` check. In practice this has nearly no effect (ICF did not become any more effective on chromium_framework). I've also updated ICF.cpp's block comment to better reflect its current status. Reviewed By: #lld-macho, smeenai Differential Revision: https://reviews.llvm.org/D105641	2021-07-17 13:42:51 -04:00
Jez Ng	28a2102ee3	[lld-macho][nfc] Remove unnecessary llvm:: namespace prefixes	2021-07-11 18:36:53 -04:00
Vy Nguyen	3822e3d5b0	[lld-macho] Fix bug in handling unwind info from ld -r Two changes: - Drop assertions that all symbols are in GOT - Set allEntriesAreOmitted correctly Related bug: 50812 Differential Revision: https://reviews.llvm.org/D105364	2021-07-09 22:44:51 -04:00
Nico Weber	8a7b5ebf4d	[lld/mac] Don't crash when dead-stripping removes all unwind info If the input has compact unwind info but all of it is removed after dead stripping, we would crash. Now we don't write any __unwind_info section at all, like ld64. This is a bit awkward to implement because we only know the final state of unwind info after UnwindInfoSectionImpl<Ptr>::finalize(), which is called after sections are added. So add a small amount of bookkeeping to relocateCompactUnwind() instead (which runs earlier) so that we can predict what finalize() will do before it runs. Fixes PR51010. Differential Revision: https://reviews.llvm.org/D105557	2021-07-07 13:05:40 -04:00
Nico Weber	d7e65757ed	[lld/mac] Tweak reserve() argument in unwind code addEntriesForFunctionsWithoutUnwindInfo() can add entries to cuVector, so cuCount can be stale. Use cuVector.size() instead. No behavior change.	2021-07-07 11:44:22 -04:00
Nico Weber	9e24979d73	[lld/mac] Fix function offset on 1st-level unwind table sentinel Two bugs: 1. This tries to take the address of the last symbol plus the length of the last symbol. However, the sorted vector is cuPtrVector, not cuVector. Also, cuPtrVector has tombstone values removed and cuVector doesn't. If there was a stripped value at the end, the "last" element's value was UINT64_MAX, which meant the sentinel value was one less than the length of that "last" dead symbol. 2. We have to subtract in.header->addr. For 64-bit binaries that's (1 << 32) and functionAddress is 32-bit so this is a no-op, but for 32-bit binaries the sentinel's value was too large. I believe this has no effect in practice since the first-level binary search code in libunwind (in UnwindCursor.hpp) does: uint32_t low = 0; uint32_t high = sectionHeader.indexCount(); uint32_t last = high - 1; while (low < high) { uint32_t mid = (low + high) / 2; if ((mid == last) \|\| (topIndex.functionOffset(mid + 1) > targetFunctionOffset)) { low = mid; break; } else { low = mid + 1; } So the address of the last entry in the first-level table isn't really checked -- except for the very end, but the check against `last` means we just run the loop once more than necessary. But it makes `unwinddump` output look less confusing, and it's what it looks like was the intention here. (No test since I can't think of a way to make FileCheck check that one number is larger than another.) Differential Revision: https://reviews.llvm.org/D105404	2021-07-04 18:06:20 -04:00
Nico Weber	d2d6da3011	[lld/mac] Don't crash on 32-bit output binaries when dead-stripping Fixes PR50974. Differential Revision: https://reviews.llvm.org/D105399	2021-07-04 18:03:31 -04:00
Jez Ng	f6b6e72143	[lld-macho] Factor out common InputSection members We have been creating many ConcatInputSections with identical values due to .subsections_via_symbols. This diff factors out the identical values into a Shared struct, to reduce memory consumption and make copying cheaper. I also changed `callSiteCount` from a uint32_t to a 31-bit field to save an extra word. All in all, this takes InputSection from 120 to 72 bytes (and ConcatInputSection from 160 to 112 bytes), i.e. 30% size reduction in ConcatInputSection. Numbers for linking chromium_framework on my 3.2 GHz 16-Core Intel Xeon W: N Min Max Median Avg Stddev x 20 4.14 4.24 4.18 4.183 0.027548999 + 20 4.04 4.11 4.075 4.0775 0.018027756 Difference at 95.0% confidence -0.1055 +/- 0.0149005 -2.52211% +/- 0.356215% (Student's t, pooled s = 0.0232803) Reviewed By: #lld-macho, thakis Differential Revision: https://reviews.llvm.org/D105305	2021-07-01 21:22:39 -04:00
Jez Ng	3a11528d97	[lld-macho] Move ICF earlier to avoid emitting redundant binds This is a pretty big refactoring diff, so here are the motivations: Previously, ICF ran after scanRelocations(), where we emit bind/rebase opcodes etc. So we had a bunch of redundant leftovers after ICF. Having ICF run before Writer seems like a better design, and is what LLD-ELF does, so this diff refactors it accordingly. However, ICF had two dependencies on things occurring in Writer: 1) it needs literals to be deduplicated beforehand and 2) it needs to know which functions have unwind info, which was being handled by `UnwindInfoSection::prepareRelocations()`. In order to do literal deduplication earlier, we need to add literal input sections to their corresponding output sections. So instead of putting all input sections into the big `inputSections` vector, and then filtering them by type later on, I've changed things so that literal sections get added directly to their output sections during the 'gather' phase. Likewise for compact unwind sections -- they get added directly to the UnwindInfoSection now. This latter change is not strictly necessary, but makes it easier for ICF to determine which functions have unwind info. Adding literal sections directly to their output sections means that we can no longer determine `inputOrder` from iterating over `inputSections`. Instead, we store that order explicitly on InputSection. Bloating the size of InputSection for this purpose would be unfortunate -- but LLD-ELF has already solved this problem: it reuses `outSecOff` to store this order value. One downside of this refactor is that we now make an additional pass over the unwind info relocations to figure out which functions have unwind info, since we want to know that before `processRelocations()`. I've made sure to run that extra loop only if ICF is enabled, so there should be no overhead in non-optimizing runs of the linker. 
The upside of all this is that the `inputSections` vector now contains only ConcatInputSections that are destined for ConcatOutputSections, so we can clean up a bunch of code that just existed to filter out other elements from that vector. I will test for the lack of redundant binds/rebases in the upcoming cfstring deduplication diff. While binds/rebases can also happen in the regular `.text` section, they're more common in `.data` sections, so it seems more natural to test it that way. This change is perf-neutral when linking chromium_framework. Reviewed By: oontvoo Differential Revision: https://reviews.llvm.org/D105044	2021-07-01 21:22:38 -04:00
Jez Ng	bf457919f2	[lld-macho][nfc] Remove unnecessary dyn_cast and simplify code	2021-06-28 14:50:44 -04:00
Nico Weber	0f24ffcdfa	[lld/mac] Don't fold UNWIND_X86_64_MODE_STACK_IND unwind entries libunwind uses unwind info to find the function address belonging to the current instruction pointer. libunwind/src/CompactUnwinder.hpp's step functions read functionStart for UNWIND_X86_64_MODE_STACK_IND (and for nothing else), so these encodings need a dedicated entry per function, so that the runtime can get the stacksize off the `subq` instruction in the function's prologue. This matches ld64. (CompactUnwinder.hpp from https://opensource.apple.com/source/libunwind/ also reads functionStart in a few more cases if `SUPPORT_OLD_BINARIES` is set, but it defaults to 0, and ld64 seems to not worry about these additional cases.) Related upstream bug: https://crbug.com/1220175 Differential Revision: https://reviews.llvm.org/D104978	2021-06-27 06:49:32 -04:00
Jez Ng	8aa17d1eae	[lld-macho] Move ICF members from InputSection to ConcatInputSection `icfEqClass` only makes sense on ConcatInputSections since (in contrast to literal sections) they are deduplicated as an atomic unit. Similarly, `hasPersonality` and `replacement` don't make sense on literal sections. This mirrors LLD-ELF, which stores `icfEqClass` only on non-mergeable sections. Reviewed By: #lld-macho, gkm Differential Revision: https://reviews.llvm.org/D104670	2021-06-24 22:23:12 -04:00
Nico Weber	ef75358080	[lld/mac] Delete incorrect FIXME """Bitcode symbols only exist before LTO runs, and only serve the purpose of resolving visibility so LTO can better optimize. Running LTO creates ObjFiles from BitcodeFiles, and those ObjFiles contain regular Defined symbols (with isec set and all) that will replace the bitcode symbols. So things should (hopefully) work as-is :)""" -- https://reviews.llvm.org/rGdbbc8d8333f29cf4ad6f4793da1adf71bbfdac69#inline-6081	2021-06-23 16:25:34 -04:00
Nico Weber	dbbc8d8333	[lld/mac] Don't crash on absolute symbols in unwind info generation Fixes a regression from `d6565a2dbc` and PR50820.	2021-06-23 14:25:34 -04:00
Nico Weber	d6565a2dbc	[lld/mac] Add explicit "no unwind info" entries for functions without unwind info Fixes PR50529. With this, lld-linked Chromium base_unittests passes on arm macs. Surprisingly, no measurable impact on link time. Differential Revision: https://reviews.llvm.org/D104681	2021-06-22 06:12:42 -04:00
Greg McGary	f27e4548fc	[lld-macho] Implement ICF ICF = Identical C(ode\|OMDAT) Folding This is the LLD ELF/COFF algorithm, adapted for MachO. So far, only `-icf all` is supported. In order to support `-icf safe`, we will need to port address-significance tables (`.addrsig` directives) to MachO, which will come in later diffs. `check-{llvm,clang,lld}` have 0 regressions for `lld -icf all` vs. baseline ld64. We only run ICF on `__TEXT,__text` for reasons explained in the block comment in `ConcatOutputSection.cpp`. Here is the perf impact for linking `chromium_framework` on a Mac Pro (16-core Xeon W) for the non-ICF case vs. pre-ICF: ``` N Min Max Median Avg Stddev x 20 4.27 4.44 4.34 4.349 0.043029977 + 20 4.37 4.46 4.405 4.4115 0.025188761 Difference at 95.0% confidence 0.0625 +/- 0.0225658 1.43711% +/- 0.518873% (Student's t, pooled s = 0.0352566) ``` Reviewed By: #lld-macho, int3 Differential Revision: https://reviews.llvm.org/D103292	2021-06-17 10:07:44 -07:00
Jez Ng	b8bbb9723a	[lld-macho][nfc] Put back shouldOmitFromOutput() asserts I removed them in rG5de7467e982 but @thakis pointed out that they were useful to keep, so here they are again. I've also converted the `!isCoalescedWeak()` asserts into `!shouldOmitFromOutput()` asserts, since the latter check subsumes the former. Reviewed By: #lld-macho, thakis Differential Revision: https://reviews.llvm.org/D104169	2021-06-16 15:23:04 -04:00
Jez Ng	b2a0739012	[lld-macho][nfc] Remove InputSection::outSecFileOff `outSecFileOff` and the associated `getFileOffset()` accessors were unnecessary. For all the cases we care about, `outSecFileOff` is the same as `outSecOff`. The only time they deviate is if there are zerofill sections within a given segment. But since zerofill sections are always at the end of a segment, the only sections where the two values deviate are zerofill sections themselves. And we never actually query the outSecFileOff of zerofill sections. As for `getFileOffset()`, the only place it was being used was to calculate the offset of the entry symbol. However, we can compute that value by just taking the difference between the address of the entry symbol and the address of the Mach-O header. In fact, this appears to be what ld64 itself does. This difference is the same as the file offset as long as there are no intervening zerofill sections, but since `__text` is the first section in `__TEXT`, this never happens, so our previous use of `getFileOffset()` was not wrong -- just inefficient. Reviewed By: #lld-macho, thakis Differential Revision: https://reviews.llvm.org/D104177	2021-06-13 19:51:30 -04:00
Nico Weber	7d4c8a2b8f	[lld/mac] clarify comment This is a "we should do X in the future" fixme, not an "X might go wrong" fixme.	2021-06-13 13:30:07 -04:00
Jez Ng	5de7467e98	[lld-macho] Fix debug build D103977 broke a bunch of stuff as I had only tested the release build which eliminated asserts. I've retained the asserts where possible, but I also removed a bunch instead of adding a whole lot of verbose ConcatInputSection casts.	2021-06-11 20:21:27 -04:00
Jez Ng	7f2ba39b16	[lld-macho][nfc] Move liveness-tracking fields into ConcatInputSection These fields currently live in the parent InputSection class, but they should be specific to ConcatInputSection, since the other InputSection classes (that contain literals) aren't atomically live or dead -- rather their component string/int literals should have individual liveness states. (An upcoming diff will add liveness bits for StringPieces and fixed-sized literals.) I also factored out some asserts for isCoalescedWeak() in MarkLive.cpp. We now avoid putting coalesced sections in the `inputSections` vector, so we don't have to check/assert against it everywhere. Reviewed By: #lld-macho, thakis Differential Revision: https://reviews.llvm.org/D103977	2021-06-11 19:50:08 -04:00
Jez Ng	04259cde15	[lld-macho] Implement cstring deduplication Our implementation draws heavily from LLD-ELF's, which in turn delegates its string deduplication to llvm-mc's StringTableBuilder. The messiness of this diff is largely due to the fact that we've previously assumed that all InputSections get concatenated together to form the output. This is no longer true with CStringInputSections, which split their contents into StringPieces. StringPieces are much more lightweight than InputSections, which is important as we create a lot of them. They may also overlap in the output, which makes it possible for strings to be tail-merged. In fact, the initial version of this diff implemented tail merging, but I've dropped it for reasons I'll explain later. Alignment Issues Mergeable cstring literals are found under the `__TEXT,__cstring` section. In contrast to ELF, which puts strings that need different alignments into different sections, clang's Mach-O backend puts them all in one section. Strings that need to be aligned have the `.p2align` directive emitted before them, which simply translates into zero padding in the object file. I think ld64 extracts the desired per-string alignment from this data by preserving each string's offset from the last section-aligned address. I'm not entirely certain since it doesn't seem consistent about doing this; but perhaps this can be chalked up to cases where ld64 has to deduplicate strings with different offset/alignment combos -- it seems to pick one of their alignments to preserve. This doesn't seem correct in general; we can in fact induce ld64 to produce a crashing binary just by linking in an additional object file that only contains cstrings and no code. See PR50563 for details. 
Moreover, this scheme seems rather inefficient: since unaligned and aligned strings are all put in the same section, which has a single alignment value, it doesn't seem possible to tell whether a given string doesn't have any alignment requirements. Preserving offset+alignments for strings that don't need it is wasteful. In practice, the crashes seen so far seem to stem from x86_64 SIMD operations on cstrings. X86_64 requires SIMD accesses to be 16-byte-aligned. So for now, I'm thinking of just aligning all strings to 16 bytes on x86_64. This is indeed wasteful, but implementation-wise it's simpler than preserving per-string alignment+offsets. It also avoids the aforementioned crash after deduplication of differently-aligned strings. Finally, the overhead is not huge: using 16-byte alignment (vs no alignment) is only a 0.5% size overhead when linking chromium_framework. With these alignment requirements, it doesn't make sense to attempt tail merging -- most strings will not be eligible since their overlaps aren't likely to start at a 16-byte boundary. Tail-merging (with alignment) for chromium_framework only improves size by 0.3%. It's worth noting that LLD-ELF only does tail merging at `-O2`. By default (at `-O1`), it just deduplicates w/o tail merging. @thakis has also mentioned that they saw it regress compressed size in some cases and therefore turned it off. `ld64` does not seem to do tail merging at all. Performance Numbers CString deduplication reduces chromium_framework from 250MB to 242MB, or about a 3.2% reduction. Numbers for linking chromium_framework on my 3.2 GHz 16-Core Intel Xeon W:
    N           Min           Max        Median           Avg        Stddev
x  20          3.91          4.03         3.935          3.95   0.034641016
+  20          3.99          4.14         4.015        4.0365     0.0492336
Difference at 95.0% confidence
        0.0865 +/- 0.027245
        2.18987% +/- 0.689746%
        (Student's t, pooled s = 0.0425673)
As expected, cstring merging incurs some non-trivial overhead. When passing `--no-literal-merge`, it seems that performance is the same, i.e. the refactoring in this diff didn't cost us.
    N           Min           Max        Median           Avg        Stddev
x  20          3.91          4.03         3.935          3.95   0.034641016
+  20          3.89          4.02         3.935        3.9435   0.043197831
No difference proven at 95.0% confidence
Reviewed By: #lld-macho, gkm Differential Revision: https://reviews.llvm.org/D102964	2021-06-07 23:48:35 -04:00
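The deduplication scheme described in the commit message above (split the section into NUL-terminated pieces, keep one copy of each distinct string, and place every kept string on a 16-byte boundary) can be sketched roughly as follows. This is a minimal illustration, not LLD's implementation: `MergedCStrings` and `mergeCStrings` are hypothetical names, the hash-map approach is an assumption chosen for brevity, and no tail merging is attempted.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <string_view>
#include <unordered_map>
#include <vector>

// Hypothetical result type (not an LLD class): the merged section contents
// plus, for each input piece, the output offset where its string lives.
struct MergedCStrings {
  std::string data;
  std::vector<uint64_t> pieceOffsets;
};

// Hypothetical helper: split `section` at NUL terminators, keep one copy of
// each distinct string, and start every kept string at a multiple of `align`
// (16 bytes, as the commit message proposes for x86_64).
MergedCStrings mergeCStrings(std::string_view section, uint64_t align = 16) {
  MergedCStrings out;
  std::unordered_map<std::string, uint64_t> seen; // string -> output offset
  std::size_t pos = 0;
  while (pos < section.size()) {
    std::size_t end = section.find('\0', pos);
    if (end == std::string_view::npos)
      end = section.size(); // tolerate a missing final terminator
    std::string piece(section.substr(pos, end - pos));
    pos = end + 1;

    auto it = seen.find(piece);
    if (it != seen.end()) {
      // Duplicate: reuse the offset of the copy already emitted.
      out.pieceOffsets.push_back(it->second);
      continue;
    }
    // New string: zero-pad up to the next aligned offset, then append it
    // together with its NUL terminator.
    uint64_t off = (out.data.size() + align - 1) / align * align;
    out.data.resize(off, '\0');
    out.data += piece;
    out.data.push_back('\0');
    seen.emplace(std::move(piece), off);
    out.pieceOffsets.push_back(off);
  }
  return out;
}
```

For the 12-byte input `"foo\0bar\0foo\0"`, this sketch places `foo` at offset 0 and `bar` at offset 16, maps the second `foo` back to offset 0, and produces 20 bytes of output. The padding between the two strings is the size overhead the commit message weighs against the simplicity of uniform alignment.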
Nico Weber	a5645513db	[lld/mac] Implement -dead_strip Also adds support for live_support sections, no_dead_strip sections, .no_dead_strip symbols. Chromium Framework 345MB unstripped -> 250MB stripped (vs 290MB unstripped -> 236MB stripped with ld64). Doing dead stripping is a bit faster than not, because so much less data needs to be processed:
% ministat lld_*
x lld_nostrip.txt
+ lld_strip.txt
    N           Min           Max        Median           Avg        Stddev
x  10      3.929414       4.07692     4.0269079     4.0089678   0.044214794
+  10     3.8129408     3.9025559     3.8670411     3.8642573   0.024779651
Difference at 95.0% confidence
        -0.144711 +/- 0.0336749
        -3.60967% +/- 0.839989%
        (Student's t, pooled s = 0.0358398)
This interacts with many parts of the linker. I tried to add test coverage for all added `isLive()` checks, so that some test will fail if any of them is removed. I checked that the test expectations for the most part match ld64's behavior (except for live-support-iterations.s, see the comment in the test). Interacts with:
- debug info
- export tries
- import opcodes
- flags like -exported_symbol(s_list)
- -U / dynamic_lookup
- mod_init_funcs, mod_term_funcs
- weak symbol handling
- unwind info
- stubs
- map files
- -sectcreate
- undefined, dylib, common, defined (both absolute and normal) symbols
It's possible it interacts with more features I didn't think of, of course. I also did some manual testing:
- check-llvm check-clang check-lld work with lld with this patch as host linker and -dead_strip enabled
- Chromium still starts
- Chromium's base_unittests still pass, including unwind tests
Implementation-wise, this is InputSection-based, so it'll work for object files with .subsections_via_symbols (which includes all object files generated by clang). I first based this on the COFF implementation, but later realized that things are more similar to ELF. I think it'd be good to refactor MarkLive.cpp to look more like the ELF part at some point, but I'd like to get a working state checked in first.
Mechanical parts:
- Rename canOmitFromOutput to wasCoalesced (no behavior change) since it really is for weak coalesced symbols
- Add noDeadStrip to Defined, corresponding to N_NO_DEAD_STRIP (`.no_dead_strip` in asm)
Fixes PR49276. Differential Revision: https://reviews.llvm.org/D103324	2021-06-02 11:09:26 -04:00
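The section-based dead stripping described in the commit message above amounts to a reachability pass over input sections. The sketch below is only an illustration under stated assumptions: `Section` and `markLive` are hypothetical names rather than the contents of MarkLive.cpp, references are modelled as plain section indices, and the handling of live_support sections (treated here as live once they reference a live section) is an assumed reading of that attribute.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical model of an input section (not LLD's InputSection).
struct Section {
  std::vector<std::size_t> refs; // indices of sections this one references
  bool noDeadStrip = false;      // acts as a root (no_dead_strip)
  bool liveSupport = false;      // assumed: live if it references a live section
  bool live = false;
};

// Mark every section reachable from `roots` (e.g. the entry point or
// exported symbols) and from no_dead_strip sections, then repeatedly pull in
// live_support sections until nothing changes.
void markLive(std::vector<Section> &sections,
              const std::vector<std::size_t> &roots) {
  std::vector<std::size_t> worklist;
  auto enqueue = [&](std::size_t i) {
    if (!sections[i].live) {
      sections[i].live = true;
      worklist.push_back(i);
    }
  };
  auto drain = [&] {
    while (!worklist.empty()) {
      std::size_t i = worklist.back();
      worklist.pop_back();
      for (std::size_t target : sections[i].refs)
        enqueue(target);
    }
  };

  for (std::size_t r : roots)
    enqueue(r);
  for (std::size_t i = 0; i < sections.size(); ++i)
    if (sections[i].noDeadStrip)
      enqueue(i);
  drain();

  // live_support pass: iterate to a fixed point, since making one such
  // section live can make further sections reachable.
  bool changed = true;
  while (changed) {
    changed = false;
    for (std::size_t i = 0; i < sections.size(); ++i) {
      if (sections[i].live || !sections[i].liveSupport)
        continue;
      for (std::size_t target : sections[i].refs) {
        if (sections[target].live) {
          enqueue(i);
          changed = true;
          break;
        }
      }
    }
    drain();
  }
}
```

After the pass, any section whose `live` flag is still false would be omitted from the output, which is why every consumer listed in the commit message (export tries, stubs, map files, and so on) needs its own liveness check.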
Jez Ng	33706191d8	[lld-macho][nfc] Rename MergedOutputSection to ConcatOutputSection The ELF format has the concept of merge sections (marked by SHF_MERGE), which contain data that can be safely deduplicated. The Mach-O equivalents are called literal sections (marked by S_CSTRING_LITERALS or S_{4,8,16}BYTE_LITERALS). While the Mach-O format doesn't use the word 'merge', to avoid confusion, I've renamed our MergedOutputSection to ConcatOutputSection. I believe it's a more descriptive name too. This renaming sets the stage for {D102964}. Reviewed By: #lld-macho, alexshap Differential Revision: https://reviews.llvm.org/D102971	2021-05-25 14:58:29 -04:00
Jez Ng	9cc0d893f7	[lld-macho][nfc] clang-format everything	2021-05-25 14:58:29 -04:00
Nico Weber	4a12248ee2	[lld/mac] Honor REFERENCED_DYNAMICALLY, set it on __mh_execute_header Has the effect that `__mh_execute_header` stays in the symbol table of outputs even after running `strip` on the output. I don't know if that's important for anything -- my motivation for the patch is just to make the output more similar to ld64. (Corresponds to symbolTableInAndNeverStrip in ld64.) Differential Revision: https://reviews.llvm.org/D102619	2021-05-17 14:22:12 -04:00


75 Commits