llvm-project

Commit Graph

Author	SHA1	Message	Date
Jez Ng	098430cd25	[lld-macho][nfc] Simplify LC_DATA_IN_CODE generation 1. After D113241, we have the section address easily accessible and no longer need to iterate across the LC_SEGMENT commands to emit LC_DATA_IN_CODE. 2. There's no need to store a pointer to the data in code entries during the parse step; we can just look it up as part of the output step. Reviewed By: #lld-macho, thakis Differential Revision: https://reviews.llvm.org/D115556	2021-12-11 01:01:57 -05:00
Vy Nguyen	944071eca2	[lld-macho] Don't replace local personality symbol with LazySymbol Follup-up to D107533, where we replaced local syms with non-local. It doesn't make sense to replace local symbol with lazy. Differential Revision: https://reviews.llvm.org/D110040	2021-11-22 14:09:54 -05:00
Greg McGary	3a1b3c9afe	[lld-macho][nfc] rename parsed-section types & variables This is an NFC diff that prepares for pruning & relocating `__eh_frame`. Along the way, I made the following changes to ... * clarify usage of `section` vs. `subsection` * remove `map` & `vec` from type names * disambiguate class `Section` from template parameter `SectionHeader`. Differential Revision: https://reviews.llvm.org/D113241	2021-11-16 07:06:41 -07:00
Jez Ng	a271f2410f	[lld-macho][nfc] Canonicalize all pointers to InputSections early on Having to remember to call `canonical()` all over the place is error-prone; let's do it in a centralized location instead. It also appears to improve performance slightly. base diff difference (95% CI) sys_time 0.984 ± 0.009 0.983 ± 0.014 [ -0.8% .. +0.6%] user_time 6.508 ± 0.035 6.475 ± 0.036 [ -0.8% .. -0.2%] wall_time 5.321 ± 0.034 5.300 ± 0.033 [ -0.7% .. -0.1%] samples 36 23 Reviewed By: #lld-macho, thakis Differential Revision: https://reviews.llvm.org/D112687	2021-10-29 11:00:28 -04:00
Vincent Lee	d54360cd32	[lld-macho] Implement -S There are a couple internal builds that require the use of this flag. Reviewed By: #lld-macho, int3 Differential Revision: https://reviews.llvm.org/D112594	2021-10-27 17:09:57 -07:00
Nuri Amari	a299b24712	Regenerate LC_CODE_SIGNATURE during llvm-objcopy operations Context: This is a second attempt at introducing signature regeneration to llvm-objcopy. In this diff: https://reviews.llvm.org/D109840, a script was introduced to test the validity of a code signature. In this diff: https://reviews.llvm.org/D109803 (now reverted), an effort was made to extract the signature generation behavior out of LLD into a common location for use in llvm-objcopy. In this diff: https://reviews.llvm.org/D109972 it was decided that there was no appropriate common location and that a small amount of duplication to bring signature generation to llvm-objcopy would be better. This diff introduces this duplication. Summary Prior to this change, if a LC_CODE_SIGNATURE load command was included in the binary passed to llvm-objcopy, the command and associated section were simply copied and included verbatim in the new binary. If rest of the binary was modified at all, this results in an invalid Mach-O file. This change regenerates the signature rather than copying it. The code_signature_lc.test test was modified to include the yaml representation of a small signed MachO executable in order to effectively test the signature generation. Reviewed By: alexander-shaposhnikov, #lld-macho Differential Revision: https://reviews.llvm.org/D111164	2021-10-26 14:51:13 -07:00
Jez Ng	002eda7056	[lld-macho] Associate compact unwind entries with function symbols Compact unwind entries (CUEs) contain pointers to their respective function symbols. However, during the link process, it's far more useful to have pointers from the function symbol to the CUE than vice versa. This diff adds that pointer in the form of `Defined::compactUnwind`. In particular, when doing dead-stripping, we want to mark CUEs live when their function symbol is live; and when doing ICF, we want to dedup sections iff the symbols in that section have identical CUEs. In both cases, we want to be able to locate the symbols within a given section, as well as locate the CUEs belonging to those symbols. So this diff also adds `InputSection::symbols`. The ultimate goal of this refactor is to have ICF support dedup'ing functions with unwind info, but that will be handled in subsequent diffs. This diff focuses on simplifying `-dead_strip` -- `findFunctionsWithUnwindInfo` is no longer necessary, and `Defined::isLive()` is now a lot simpler. Moreover, UnwindInfoSection no longer has to check for dead CUEs -- we simply avoid adding them in the first place. Additionally, we now support stripping of dead LSDAs, which follows quite naturally since `markLive()` can now reach them via the CUEs. Reviewed By: #lld-macho, gkm Differential Revision: https://reviews.llvm.org/D109944	2021-10-26 16:04:15 -04:00
Jez Ng	622150ad5f	[lld-macho] Put GOT into `__DATA` segment where appropriate We were previously always emitting the GOT into `__DATA_CONST`, even for target platforms where it should end up in `__DATA`. I stumbled onto this while trying to use the `class-dump` tool -- with the wrong segment names, it fails to locate the ObjC runtime info and therefore fails to dump any classes. Reviewed By: #lld-macho, oontvoo Differential Revision: https://reviews.llvm.org/D112500	2021-10-26 11:38:01 -04:00
Vy Nguyen	236197e2d0	[lld-macho] Implement -oso_prefix https://bugs.llvm.org/show_bug.cgi?id=50229 Differential Revision: https://reviews.llvm.org/D112291	2021-10-22 16:32:42 -04:00
Nico Weber	4e572db0c2	[lld/mac] Mark private externs with GOT relocs as LOCAL in indirect symbtab prepareSymbolRelocation() in Writer.cpp adds both symbols that need binding and symbols relocated with a pointer relocation to the got. Pointer relocations are emitted for non-movq GOTPCREL(%rip) loads. (movqs become GOT_LOADs so that the linker knows they can be relaxed to leaqs, while others, such as addq, become just GOT -- a pointer relocation -- since they can't be relaxed in that way). For example, this C file produces a private_extern GOT relocation when compiled with -O2 with clang: extern const char kString[]; const char* g(int a) { return kString + a; } Linkers need to put pointer-relocated symbols into the GOT, but ld64 marks them as LOCAL in the indirect symbol table. This matters, since `strip -x` looks at the indirect symbol table when deciding what to strip. The indirect symtab emitting code was assuming that only symbols that need binding are in the GOT, but pointer relocations where there too. Hence, the code needs to explicitly check if a symbol is a private extern. Fixes https://crbug.com/1242638, which has some more information in comments 14 and 15. With this patch, the output of `nm -U` on Chromium Framework after stripping now contains just two symbols when using lld, just like with ld64. Differential Revision: https://reviews.llvm.org/D111852	2021-10-15 13:24:47 -04:00
Daniel Rodríguez Troitiño	657f02d458	Revert "Extract LC_CODE_SIGNATURE related implementation out of LLD" This reverts commit `cc8229603b`. As discussed in the review of https://reviews.llvm.org/D109972, this was not right approach, so we are reverting to start with a different approach. Differential Revision: https://reviews.llvm.org/D110974	2021-10-01 17:19:50 -07:00
Mike Hommey	08ef24f6ab	Wrap xar/xar.h include in extern "C" block Without such wrapping, linking lld fails with missing symbols because of C++ symbol mangling with older versions of the MacOSX SDK, in which xar.h doesn't have an extern "C" block itself. Reviewed By: #lld-macho, thakis Differential Revision: https://reviews.llvm.org/D110224	2021-09-23 09:37:30 +02:00
Nuri Amari	cc8229603b	Extract LC_CODE_SIGNATURE related implementation out of LLD Move the functionality in lld that handles writing of the LC_CODE_SIGNATURE load command and associated data section to a central reusable location. This change is in preparation for another change that modifies llvm-objcopy to reproduce the LC_CODE_SIGNATURE load command and corresponding data section to maintain the validity of signed macho object files passed through llvm-objcopy. Reviewed By: #lld-macho, int3, oontvoo Differential Revision: https://reviews.llvm.org/D109803	2021-09-16 17:43:39 -07:00
Nico Weber	9482aa98e5	[lld/mac] Let OutputSegment store its start address segment$start$/segment$end$ symbols allow creating segments without sections, so getting the segment address off the first section won't work there. Storing the address on the segment is arguably a bit simpler too. No behavior change, part of PR50760. Differential Revision: https://reviews.llvm.org/D106665	2021-07-23 11:43:25 -04:00
Vincent Lee	33ab995617	Recommit "[lld-macho] Use DO_BIND_ADD_ADDR_IMM_SCALED for bind opcodes" Implement pass 3 of bind opcodes from ld64 (which supports both 32-bit and 64-bit). Pass 3 implementation condenses BIND_OPCODE_DO_BIND_ADD_ADDR_ULEB opcode to BIND_OPCODE_DO_BIND_ADD_ADDR_IMM_SCALED. This change is already behind an O2 flag so it shouldn't impact current performance. I verified ld64's output with x86_64 LLD and they were both emitting the same optimized bind opcodes (although in a slightly different order). Tested with arm64_32 LLD and compared that with x86 LLD that the order of the bind opcodes are the same (offset values are different which should be expected). Reviewed By: int3, #lld-macho, MaskRay Differential Revision: https://reviews.llvm.org/D106128	2021-07-20 13:45:24 -07:00
Fangrui Song	88e2268a34	Revert D106128 "[lld-macho] Use DO_BIND_ADD_ADDR_IMM_SCALED for bind opcodes" This reverts commit `321b2bef09`. `for (BindIR *p = &opcodes[0]; p->opcode != BIND_OPCODE_DONE; ++p) {` has a heap-buffer-overflow with test/MachO/bind-opcodes.	2021-07-19 18:13:52 -07:00
Vincent Lee	321b2bef09	[lld-macho] Use DO_BIND_ADD_ADDR_IMM_SCALED for bind opcodes Implement pass 3 of bind opcodes from ld64 (which supports both 32-bit and 64-bit). Pass 3 implementation condenses BIND_OPCODE_DO_BIND_ADD_ADDR_ULEB opcode to BIND_OPCODE_DO_BIND_ADD_ADDR_IMM_SCALED. This change is already behind an O2 flag so it shouldn't impact current performance. I verified ld64's output with x86_64 LLD and they were both emitting the same optimized bind opcodes (although in a slightly different order). Tested with arm64_32 LLD and compared that with x86 LLD that the order of the bind opcodes are the same (offset values are different which should be expected). Reviewed By: int3, #lld-macho Differential Revision: https://reviews.llvm.org/D106128	2021-07-19 16:18:33 -07:00
Jez Ng	428a7c1b38	[lld-macho] Have ICF operate on all sections at once ICF previously operated only within a given OutputSection. We would merge all CFStrings first, then merge all regular code sections in a second phase. This worked fine since CFStrings would never reference regular `__text` sections. However, I would like to expand ICF to merge functions that reference unwind info. Unwind info references the LSDA section, which can in turn reference the `__text` section, so we cannot perform ICF in phases. In order to have ICF operate on InputSections spanning multiple OutputSections, we need a way to distinguish InputSections that are destined for different OutputSections, so that we don't fold across section boundaries. We achieve this by creating OutputSections early, and setting `InputSection::parent` to point to them. This is what LLD-ELF does. (This change should also make it easier to implement the `section$start$` symbols.) This diff also folds InputSections w/o checking their flags, which I think is the right behavior -- if they are destined for the same OutputSection, they will have the same flags in the output (even if their input flags differ). I.e. the `parent` pointer check subsumes the `flags` check. In practice this has nearly no effect (ICF did not become any more effective on chromium_framework). I've also updated ICF.cpp's block comment to better reflect its current status. Reviewed By: #lld-macho, smeenai Differential Revision: https://reviews.llvm.org/D105641	2021-07-17 13:42:51 -04:00
Vincent Lee	d695d0d6f6	[lld-macho] Optimize bind opcodes with multiple passes In D105866, we used an intermediate container to store a list of opcodes. Here, we use that data structure to help us perform optimization passes that would allow a more efficient encoding of bind opcodes. Currently, the functionality mirrors the optimization pass {1,2} done in ld64 for bind opcodes under optimization gate to prevent slight regressions. Reviewed By: int3, #lld-macho Differential Revision: https://reviews.llvm.org/D105867	2021-07-15 20:52:46 -07:00
Vincent Lee	f2b1264141	[lld-macho] Use intermediate arrays to store opcodes We want to incorporate some of the optimization passes in bind opcodes from ld64. This revision makes no functional changes but to start storing opcodes in intermediate containers in preparation for implementing the optimization passes in a follow-up revision. Differential Revision: https://reviews.llvm.org/D105866	2021-07-15 16:57:45 -07:00
Nico Weber	f21801dab2	[lld/mac] Implement -application_extension Differential Revision: https://reviews.llvm.org/D105818	2021-07-12 13:42:16 -04:00
Nico Weber	10e28a7484	[lld/mac] Use normal Undefined machinery for dyld_stub_binder lookup This is for aesthetic reasons, I'm not aware of anything that needs this in practice. It does have a few effects: - `-undefined dynamic_lookup` now has an effect for dyld_stub_binder. This matches ld64. - `-U dyld_stub_binder` now works like you'd expect (it doesn't work in ld64). - The error message for a missing dyld_stub_binder symbol now looks like other undefined reference symbols, it changes from symbol dyld_stub_binder not found (normally in libSystem.dylib). Needed to perform lazy binding. to error: undefined symbol: dyld_stub_binder >>> referenced by lazy binding (normally in libSystem.dylib) Also add test coverage for that error message. But in practice, this should have no interesting effects since everything links in dyld_stub_binder via libSystem anyways. Differential Revision: https://reviews.llvm.org/D105781	2021-07-11 12:48:59 -04:00
Nico Weber	2c25f39fcc	[lld/mac] Implement -final_output This is one of two flags clang passes to the linker when giving calling clang with multiple -arch flags. I think it'd make sense to also use finalOutput instead of outputFile in CodeSignatureSection() and when replacing @executable_path, but ld64 doesn't do that, so I'll at least put those in separate commits. Differential Revision: https://reviews.llvm.org/D105449	2021-07-05 20:06:26 -04:00
Jez Ng	718c32175b	[lld-macho] Only emit one BIND_OPCODE_SET_SYMBOL per symbol Size-wise, BIND_OPCODE_SET_SYMBOL_TRAILING_FLAGS_IMM is the most expensive opcode, since it comes with an associated symbol string. We were previously emitting it once per binding, instead of once per symbol. This diff groups all bindings for a given symbol together and ensures we only emit one such opcode per symbol. This matches ld64's behavior. While this is a relatively small win on chromium_framework (-72KiB), for programs that have more dynamic bindings, the difference can be quite large. This change is perf-neutral when linking chromium_framework. Reviewed By: #lld-macho, thakis Differential Revision: https://reviews.llvm.org/D105075	2021-07-05 20:00:19 -04:00
Jez Ng	f6b6e72143	[lld-macho] Factor out common InputSection members We have been creating many ConcatInputSections with identical values due to .subsections_via_symbols. This diff factors out the identical values into a Shared struct, to reduce memory consumption and make copying cheaper. I also changed `callSiteCount` from a uint32_t to a 31-bit field to save an extra word. All in all, this takes InputSection from 120 to 72 bytes (and ConcatInputSection from 160 to 112 bytes), i.e. 30% size reduction in ConcatInputSection. Numbers for linking chromium_framework on my 3.2 GHz 16-Core Intel Xeon W: N Min Max Median Avg Stddev x 20 4.14 4.24 4.18 4.183 0.027548999 + 20 4.04 4.11 4.075 4.0775 0.018027756 Difference at 95.0% confidence -0.1055 +/- 0.0149005 -2.52211% +/- 0.356215% (Student's t, pooled s = 0.0232803) Reviewed By: #lld-macho, thakis Differential Revision: https://reviews.llvm.org/D105305	2021-07-01 21:22:39 -04:00
Jez Ng	3a11528d97	[lld-macho] Move ICF earlier to avoid emitting redundant binds This is a pretty big refactoring diff, so here are the motivations: Previously, ICF ran after scanRelocations(), where we emitting bind/rebase opcodes etc. So we had a bunch of redundant leftovers after ICF. Having ICF run before Writer seems like a better design, and is what LLD-ELF does, so this diff refactors it accordingly. However, ICF had two dependencies on things occurring in Writer: 1) it needs literals to be deduplicated beforehand and 2) it needs to know which functions have unwind info, which was being handled by `UnwindInfoSection::prepareRelocations()`. In order to do literal deduplication earlier, we need to add literal input sections to their corresponding output sections. So instead of putting all input sections into the big `inputSections` vector, and then filtering them by type later on, I've changed things so that literal sections get added directly to their output sections during the 'gather' phase. Likewise for compact unwind sections -- they get added directly to the UnwindInfoSection now. This latter change is not strictly necessary, but makes it easier for ICF to determine which functions have unwind info. Adding literal sections directly to their output sections means that we can no longer determine `inputOrder` from iterating over `inputSections`. Instead, we store that order explicitly on InputSection. Bloating the size of InputSection for this purpose would be unfortunate -- but LLD-ELF has already solved this problem: it reuses `outSecOff` to store this order value. One downside of this refactor is that we now make an additional pass over the unwind info relocations to figure out which functions have unwind info, since want to know that before `processRelocations()`. I've made sure to run that extra loop only if ICF is enabled, so there should be no overhead in non-optimizing runs of the linker. The upside of all this is that the `inputSections` vector now contains only ConcatInputSections that are destined for ConcatOutputSections, so we can clean up a bunch of code that just existed to filter out other elements from that vector. I will test for the lack of redundant binds/rebases in the upcoming cfstring deduplication diff. While binds/rebases can also happen in the regular `.text` section, they're more common in `.data` sections, so it seems more natural to test it that way. This change is perf-neutral when linking chromium_framework. Reviewed By: oontvoo Differential Revision: https://reviews.llvm.org/D105044	2021-07-01 21:22:38 -04:00
Jez Ng	0d6d35e63b	[lld-macho] -section_rename should work on synthetic sections too Previously, we only applied the renames to ConcatOutputSections. Reviewed By: #lld-macho, thakis Differential Revision: https://reviews.llvm.org/D105079	2021-06-30 18:55:48 -04:00
Leonard Grey	a8a6e5b094	[lld-macho] Preserve alignment for non-deduplicated cstrings Fixes PR50637. Downstream bug: https://crbug.com/1218958 Currently, we split __cstring along symbol boundaries with .subsections_via_symbols when not deduplicating, and along null bytes when deduplicating. This change splits along null bytes unconditionally, and preserves original alignment in the non- deduplicated case. Removing subsections-section-relocs.s because with this change, __cstring is never reordered based on the order file. Differential Revision: https://reviews.llvm.org/D104919	2021-06-28 22:26:43 -04:00
Jez Ng	557e1fa02f	[lld-macho] Extend ICF to literal sections Literal sections can be deduplicated before running ICF. That makes it easy to compare them during ICF: we can tell if two literals are constant-equal by comparing their offsets in their OutputSection. LLD-ELF takes a similar approach. Reviewed By: #lld-macho, gkm Differential Revision: https://reviews.llvm.org/D104671	2021-06-28 14:49:39 -04:00
Jez Ng	4a8503c8e0	[lld-macho] Align all cstrings to 16 bytes when deduplicating We previously did this only for x86_64, but it turns out that arm64 needs this too -- see PR50791. Ultimately this is a hack, and we should avoid over-aligning strings that don't need it. I'm just having a hard time figuring out how ld64 is determining the right alignment. No new test for this since we were already testing this behavior for x86_64, and extending it to arm64 seems too trivial. Reviewed By: #lld-macho, thakis Differential Revision: https://reviews.llvm.org/D104835	2021-06-24 16:53:29 -04:00
Nico Weber	17271ece0d	[lld/mac] Give __DATA,__thread_ptrs type S_THREAD_LOCAL_VARIABLE_POINTERS ...instead of S_NON_LAZY_SYMBOL_POINTERS. This matches ld64. Part of PR50769. While here, also remove an old TODO that was done in D87178. Differential Revision: https://reviews.llvm.org/D104594	2021-06-19 12:56:42 -04:00
Greg McGary	f27e4548fc	[lld-macho] Implement ICF ICF = Identical C(ode\|OMDAT) Folding This is the LLD ELF/COFF algorithm, adapted for MachO. So far, only `-icf all` is supported. In order to support `-icf safe`, we will need to port address-significance tables (`.addrsig` directives) to MachO, which will come in later diffs. `check-{llvm,clang,lld}` have 0 regressions for `lld -icf all` vs. baseline ld64. We only run ICF on `__TEXT,__text` for reasons explained in the block comment in `ConcatOutputSection.cpp`. Here is the perf impact for linking `chromium_framekwork` on a Mac Pro (16-core Xeon W) for the non-ICF case vs. pre-ICF: ``` N Min Max Median Avg Stddev x 20 4.27 4.44 4.34 4.349 0.043029977 + 20 4.37 4.46 4.405 4.4115 0.025188761 Difference at 95.0% confidence 0.0625 +/- 0.0225658 1.43711% +/- 0.518873% (Student's t, pooled s = 0.0352566) ``` Reviewed By: #lld-macho, int3 Differential Revision: https://reviews.llvm.org/D103292	2021-06-17 10:07:44 -07:00
Vitaly Buka	b01bfdfda6	[lld][MachO] Fix UB after D103006 ubsan detected: lld/MachO/SyntheticSections.cpp:636:15: runtime error: null pointer passed as argument 2, which is declared to never be null	2021-06-14 21:15:54 -07:00
Alexander Shaposhnikov	928394d109	[lld][MachO] Add support for LC_DATA_IN_CODE Add first bits for emitting LC_DATA_IN_CODE. Test plan: make check-lld-macho Differential revision: https://reviews.llvm.org/D103006	2021-06-14 19:21:59 -07:00
Jez Ng	da24e6d43e	[lld-macho][nfc] Add `final` to classes where possible I wanted to see if we would get any perf wins out of this, but it doesn't seem to be the case. But it still seems worth committing. Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D104200	2021-06-13 19:52:03 -04:00
Jez Ng	c5c05ffa45	[lld-macho][nfc] Represent the image loader cache with a ConcatInputSection We don't need to define any special behavior for this section, so creating a subclass for it is redundant. Reviewed By: #lld-macho, thakis Differential Revision: https://reviews.llvm.org/D104199	2021-06-13 19:51:31 -04:00
Nico Weber	5f9bc580d8	fix comment typos to cycle bots	2021-06-13 10:18:51 -04:00
Alexander Shaposhnikov	b9095f5e1a	[lld][MachO] Fix function starts section Sort the addresses stored in FunctionStarts section. Previously we were encoding potentially large numbers (due to unsigned overflow). Test plan: make check-all Differential revision: https://reviews.llvm.org/D103662	2021-06-11 17:47:28 -07:00
Jez Ng	464d3dc3d1	[lld-macho] Have dead-stripping work with literal sections Literal sections are not atomically live or dead. Rather, liveness is tracked for each individual literal they contain. CStrings have their liveness tracked via a `live` bit in StringPiece, and fixed-width literals have theirs tracked via a BitVector. The live-marking code now needs to track the offset within each section that is to be marked live, in order to identify the literal at that particular offset. Numbers for linking chromium_framework on my 3.2 GHz 16-Core Intel Xeon W with both `-dead_strip` and `--deduplicate-literals`, with and without this diff applied: ``` N Min Max Median Avg Stddev x 20 4.32 4.44 4.375 4.372 0.03105174 + 20 4.3 4.39 4.36 4.3595 0.023277502 No difference proven at 95.0% confidence ``` This gives us size savings of about 0.4%. Reviewed By: #lld-macho, thakis Differential Revision: https://reviews.llvm.org/D103979	2021-06-11 19:50:09 -04:00
Jez Ng	681cfeb591	[lld-macho][nfc] Have InputSection ctors take some parameters This is motivated by an upcoming diff in which the WordLiteralInputSection ctor sets itself up based on the value of its section flags. As such, it needs to be passed the `flags` value as part of its ctor parameters, instead of having them assigned after the fact in `parseSection()`. While refactoring code to make that possible, I figured it would make sense for the other InputSections to also take their initial values as ctor parameters. Reviewed By: #lld-macho, thakis Differential Revision: https://reviews.llvm.org/D103978	2021-06-11 19:50:09 -04:00
Jez Ng	5d88f2dd94	[lld-macho] Deduplicate fixed-width literals Conceptually, the implementation is pretty straightforward: we put each literal value into a hashtable, and then write out the keys of that hashtable at the end. In contrast with ELF, the Mach-O format does not support variable-length literals that aren't strings. Its literals are either 4, 8, or 16 bytes in length. LLD-ELF dedups its literals via sorting + uniq'ing, but since we don't need to worry about overly-long values, we should be able to do a faster job by just hashing. That said, the implementation right now is far from optimal, because we add to those hashtables serially. To parallelize this, we'll need a basic concurrent hashtable (only needs to support concurrent writes w/o interleave reads), which shouldn't be to hard to implement, but I'd like to punt on it for now. Numbers for linking chromium_framework on my 3.2 GHz 16-Core Intel Xeon W: N Min Max Median Avg Stddev x 20 4.27 4.39 4.315 4.3225 0.033225703 + 20 4.36 4.82 4.44 4.4845 0.13152846 Difference at 95.0% confidence 0.162 +/- 0.0613971 3.74783% +/- 1.42041% (Student's t, pooled s = 0.0959262) This corresponds to binary size savings of 2MB out of 335MB, or 0.6%. It's not a great tradeoff as-is, but as mentioned our implementation can be signficantly optimized, and literal dedup will unlock more opportunities for ICF to identify identical structures that reference the same literals. Reviewed By: #lld-macho, gkm Differential Revision: https://reviews.llvm.org/D103113	2021-06-11 19:50:08 -04:00
Nico Weber	54418c5a35	[lld/mac] Make binaries written by lld strippable Be less clever when writing the indirect symbols in LC_DYSYMTAB: lld used to make point __stubs and __la_symbol_ptr point at the same bytes in the indirect symbol table in the __LINKEDIT segment. That confused strip, so write the same bytes twice and make __stubs and __la_symbol_ptr point at one copy each, so that they don't share data. This unconfuses strip, and seems to be what ld64 does too, so hopefully tools are generally more used to this. This makes the output binaries a bit larger, but not much: 4 bytes for roughly each called function from a dylib and each weak function. Chromium Framewoork grows by 6536 bytes, clang-format by a few hundred. With this, `strip -x Chromium\ Framework` works (244 MB before stripping to 171 MB after stripping, compared to 236 MB=>164 MB with ld64). Running strip without `-x` produces the same error message now for lld-linked Chromium Framework as for when using ld64 as a linker. `strip clang-format` also works now but didn't previously. Fixes PR50657. Differential Revision: https://reviews.llvm.org/D104081	2021-06-11 00:18:03 -04:00
Jez Ng	04259cde15	[lld-macho] Implement cstring deduplication Our implementation draws heavily from LLD-ELF's, which in turn delegates its string deduplication to llvm-mc's StringTableBuilder. The messiness of this diff is largely due to the fact that we've previously assumed that all InputSections get concatenated together to form the output. This is no longer true with CStringInputSections, which split their contents into StringPieces. StringPieces are much more lightweight than InputSections, which is important as we create a lot of them. They may also overlap in the output, which makes it possible for strings to be tail-merged. In fact, the initial version of this diff implemented tail merging, but I've dropped it for reasons I'll explain later. Alignment Issues Mergeable cstring literals are found under the `__TEXT,__cstring` section. In contrast to ELF, which puts strings that need different alignments into different sections, clang's Mach-O backend puts them all in one section. Strings that need to be aligned have the `.p2align` directive emitted before them, which simply translates into zero padding in the object file. I think ld64 extracts the desired per-string alignment from this data by preserving each string's offset from the last section-aligned address. I'm not entirely certain since it doesn't seem consistent about doing this; but perhaps this can be chalked up to cases where ld64 has to deduplicate strings with different offset/alignment combos -- it seems to pick one of their alignments to preserve. This doesn't seem correct in general; we can in fact can induce ld64 to produce a crashing binary just by linking in an additional object file that only contains cstrings and no code. See PR50563 for details. Moreover, this scheme seems rather inefficient: since unaligned and aligned strings are all put in the same section, which has a single alignment value, it doesn't seem possible to tell whether a given string doesn't have any alignment requirements. Preserving offset+alignments for strings that don't need it is wasteful. In practice, the crashes seen so far seem to stem from x86_64 SIMD operations on cstrings. X86_64 requires SIMD accesses to be 16-byte-aligned. So for now, I'm thinking of just aligning all strings to 16 bytes on x86_64. This is indeed wasteful, but implementation-wise it's simpler than preserving per-string alignment+offsets. It also avoids the aforementioned crash after deduplication of differently-aligned strings. Finally, the overhead is not huge: using 16-byte alignment (vs no alignment) is only a 0.5% size overhead when linking chromium_framework. With these alignment requirements, it doesn't make sense to attempt tail merging -- most strings will not be eligible since their overlaps aren't likely to start at a 16-byte boundary. Tail-merging (with alignment) for chromium_framework only improves size by 0.3%. It's worth noting that LLD-ELF only does tail merging at `-O2`. By default (at `-O1`), it just deduplicates w/o tail merging. @thakis has also mentioned that they saw it regress compressed size in some cases and therefore turned it off. `ld64` does not seem to do tail merging at all. Performance Numbers CString deduplication reduces chromium_framework from 250MB to 242MB, or about a 3.2% reduction. Numbers for linking chromium_framework on my 3.2 GHz 16-Core Intel Xeon W: N Min Max Median Avg Stddev x 20 3.91 4.03 3.935 3.95 0.034641016 + 20 3.99 4.14 4.015 4.0365 0.0492336 Difference at 95.0% confidence 0.0865 +/- 0.027245 2.18987% +/- 0.689746% (Student's t, pooled s = 0.0425673) As expected, cstring merging incurs some non-trivial overhead. When passing `--no-literal-merge`, it seems that performance is the same, i.e. the refactoring in this diff didn't cost us. N Min Max Median Avg Stddev x 20 3.91 4.03 3.935 3.95 0.034641016 + 20 3.89 4.02 3.935 3.9435 0.043197831 No difference proven at 95.0% confidence Reviewed By: #lld-macho, gkm Differential Revision: https://reviews.llvm.org/D102964	2021-06-07 23:48:35 -04:00
Nico Weber	a5645513db	[lld/mac] Implement -dead_strip Also adds support for live_support sections, no_dead_strip sections, .no_dead_strip symbols. Chromium Framework 345MB unstripped -> 250MB stripped (vs 290MB unstripped -> 236M stripped with ld64). Doing dead stripping is a bit faster than not, because so much less data needs to be processed: % ministat lld_* x lld_nostrip.txt + lld_strip.txt N Min Max Median Avg Stddev x 10 3.929414 4.07692 4.0269079 4.0089678 0.044214794 + 10 3.8129408 3.9025559 3.8670411 3.8642573 0.024779651 Difference at 95.0% confidence -0.144711 +/- 0.0336749 -3.60967% +/- 0.839989% (Student's t, pooled s = 0.0358398) This interacts with many parts of the linker. I tried to add test coverage for all added `isLive()` checks, so that some test will fail if any of them is removed. I checked that the test expectations for the most part match ld64's behavior (except for live-support-iterations.s, see the comment in the test). Interacts with: - debug info - export tries - import opcodes - flags like -exported_symbol(s_list) - -U / dynamic_lookup - mod_init_funcs, mod_term_funcs - weak symbol handling - unwind info - stubs - map files - -sectcreate - undefined, dylib, common, defined (both absolute and normal) symbols It's possible it interacts with more features I didn't think of, of course. I also did some manual testing: - check-llvm check-clang check-lld work with lld with this patch as host linker and -dead_strip enabled - Chromium still starts - Chromium's base_unittests still pass, including unwind tests Implemenation-wise, this is InputSection-based, so it'll work for object files with .subsections_via_symbols (which includes all object files generated by clang). I first based this on the COFF implementation, but later realized that things are more similar to ELF. I think it'd be good to refactor MarkLive.cpp to look more like the ELF part at some point, but I'd like to get a working state checked in first. Mechanical parts: - Rename canOmitFromOutput to wasCoalesced (no behavior change) since it really is for weak coalesced symbols - Add noDeadStrip to Defined, corresponding to N_NO_DEAD_STRIP (`.no_dead_strip` in asm) Fixes PR49276. Differential Revision: https://reviews.llvm.org/D103324	2021-06-02 11:09:26 -04:00
Nico Weber	2c1903412b	[lld/mac] Implement removal of unused dylibs This omits load commands for unreferenced dylibs if: - the dylib was loaded implicitly, - it is marked MH_DEAD_STRIPPABLE_DYLIB - or -dead_strip_dylibs is passed This matches ld64. Currently, the "is dylib referenced" state is computed before dead code stripping and is not updated after dead code stripping. This too matches ld64. We should do better here. With this, clang-format linked with lld (like with ld64) no longer has libobjc.A.dylib in `otool -L` output. (It was implicitly loaded as a reexport of CoreFoundation.framework, but it's not needed.) Differential Revision: https://reviews.llvm.org/D103430	2021-06-01 16:06:30 -04:00
Jez Ng	33706191d8	[lld-macho][nfc] Rename MergedOutputSection to ConcatOutputSection The ELF format has the concept of merge sections (marked by SHF_MERGE), which contain data that can be safely deduplicated. The Mach-O equivalents are called literal sections (marked by S_CSTRING_LITERALS or S_{4,8,16}BYTE_LITERALS). While the Mach-O format doesn't use the word 'merge', to avoid confusion, I've renamed our MergedOutputSection to ConcatOutputSection. I believe it's a more descriptive name too. This renaming sets the stage for {D102964}. Reviewed By: #lld-macho, alexshap Differential Revision: https://reviews.llvm.org/D102971	2021-05-25 14:58:29 -04:00
Jez Ng	8535834ef7	[lld-macho][nfc] Misc code cleanup * Move `static_asserts` into cpp instead of header file. I noticed they had been separated from the main class definition in the header, so I set about to clean that up, then figured it made more sense as part of the cpp file so as not to incur unnecessary compile-time overhead. * Remove unnecessary `virtual`s * Remove unnecessary comment / reword another comment	2021-05-25 14:58:29 -04:00
Mariusz Ceier	9383e9c1e6	Fix lld macho standalone build by including llvm/Config/llvm-config.h instead of llvm/Config/config.h lld/MachO/Driver.cpp and lld/MachO/SyntheticSections.cpp include llvm/Config/config.h which doesn't exist when building standalone lld. This patch replaces llvm/Config/config.h include with llvm/Config/llvm-config.h just like it is in lld/ELF/Driver.cpp and HAVE_LIBXAR with LLVM_HAVE_LIXAR and moves LLVM_HAVE_LIBXAR from config.h to llvm-config.h Also it adds LLVM_HAVE_LIBXAR to LLVMConfig.cmake and links liblldMachO2.so with XAR_LIB if LLVM_HAVE_LIBXAR is set. Differential Revision: https://reviews.llvm.org/D102084	2021-05-19 11:15:07 -04:00
Nico Weber	b4ead2c37b	[lld/mac] Correctly set nextdefsym In LC_DYSYMTAB, private externs were still emitted as exported symbols instead of as locals. Fixes PR50373. See bug for details. Differential Revision: https://reviews.llvm.org/D102662	2021-05-18 13:53:55 -04:00
Nico Weber	095c520fb4	[lld/mac] Propagate -(un)exported_symbol(s_list) to privateExtern in Driver That way, it's done only once instead of every time shouldExportSymbol() is called. Possibly a bit faster: % ministat at_main at_symtodo x at_main + at_symtodo N Min Max Median Avg Stddev x 30 3.9732189 4.114846 4.024621 4.0304692 0.037022865 + 30 3.93766 4.0510042 3.9973931 3.991469 0.028472565 Difference at 95.0% confidence -0.0390002 +/- 0.0170714 -0.967635% +/- 0.423559% (Student's t, pooled s = 0.0330256) In other runs with n=30 it makes no perf difference, so maybe it's just noise. But being able to quickly and conveniently answer "is this symbol exported?" is useful for fixing PR50373 and for implementing -dead_strip, so this seems like a good change regardless. No behavior change. Differential Revision: https://reviews.llvm.org/D102661	2021-05-18 07:42:58 -04:00

1 2 3 4

151 Commits