llvm-project

Commit Graph

Author	SHA1	Message	Date
Jez Ng	33706191d8	[lld-macho][nfc] Rename MergedOutputSection to ConcatOutputSection The ELF format has the concept of merge sections (marked by SHF_MERGE), which contain data that can be safely deduplicated. The Mach-O equivalents are called literal sections (marked by S_CSTRING_LITERALS or S_{4,8,16}BYTE_LITERALS). While the Mach-O format doesn't use the word 'merge', to avoid confusion, I've renamed our MergedOutputSection to ConcatOutputSection. I believe it's a more descriptive name too. This renaming sets the stage for {D102964}. Reviewed By: #lld-macho, alexshap Differential Revision: https://reviews.llvm.org/D102971	2021-05-25 14:58:29 -04:00
Jez Ng	9cc0d893f7	[lld-macho][nfc] clang-format everything	2021-05-25 14:58:29 -04:00
Nico Weber	4a12248ee2	[lld/mac] Honor REFERENCED_DYAMICALLY, set it on __mh_execute_header Has the effect that `__mh_execute_header` stays in the symbol table of outputs even after running `strip` on the output. I don't know if that's important for anything -- my motivation for the patch is just to make the output more similar to ld64. (Corresponds to symbolTableInAndNeverStrip in ld64.) Differential Revision: https://reviews.llvm.org/D102619	2021-05-17 14:22:12 -04:00
Nico Weber	7b6dd265ce	[lld/mac] Copy some of the commit message of `d5a70db193` into a comment	2021-05-08 13:03:17 -04:00
Nico Weber	d5a70db193	[lld/mac] Write every weak symbol only once in the output Before this, if an inline function was defined in several input files, lld would write each copy of the inline function to the output. With this patch, it only writes one copy. Reduces the size of Chromium Framework from 378MB to 345MB (compared to 290MB linked with ld64, which also does dead-stripping, which we don't do yet), and makes linking it faster: N Min Max Median Avg Stddev x 10 3.9957051 4.3496981 4.1411121 4.156837 0.10092097 + 10 3.908154 4.169318 3.9712729 3.9846753 0.075773012 Difference at 95.0% confidence -0.172162 +/- 0.083847 -4.14165% +/- 2.01709% (Student's t, pooled s = 0.0892373) Implementation-wise, when merging two weak symbols, this sets a "canOmitFromOutput" on the InputSection belonging to the weak symbol not put in the symbol table. We then don't write InputSections that have this set, as long as they are not referenced from other symbols. (This happens e.g. for object files that don't set .subsections_via_symbols or that use .alt_entry.) Some restrictions: - not yet done for bitcode inputs - no "comdat" handling (`kindNoneGroupSubordinate*` in ld64) -- Frame Descriptor Entries (FDEs), Language Specific Data Areas (LSDAs) (that is, catch block unwind information) and Personality Routines associated with weak functions still not stripped. This is wasteful, but harmless. - However, this does strip weaks from __unwind_info (which is needed for correctness and not just for size) - This nopes out on InputSections that are referenced from more than one symbol (eg from .alt_entry) for now Things that work based on symbols Just Work: - map files (change in MapFile.cpp is no-op and not needed; I just found it a bit more explicit) - exports Things that work with inputSections need to explicitly check if an inputSection is written (e.g. unwind info). This patch is useful in itself, but it's also likely a useful foundation for dead_strip. I used to have a "canoncialRepresentative" pointer on InputSection instead of just the bool, which would be handy for ICF too. But I ended up not needing it for this patch, so I removed that again for now. Differential Revision: https://reviews.llvm.org/D102076	2021-05-07 17:11:40 -04:00
Jez Ng	05c5363b39	[lld-macho] Parse & emit the N_ARM_THUMB_DEF symbol flag Eventually we'll use this flag to properly handle bl/blx opcodes. Reviewed By: #lld-macho, gkm Differential Revision: https://reviews.llvm.org/D101558	2021-04-30 16:17:26 -04:00
Jez Ng	7ca133c360	[lld-macho] std::sort -> llvm::sort	2021-04-27 18:02:59 -04:00
Nico Weber	c1b2a7bfbf	[lld/mac] make a few "named parameter comments" more consistent Most of LLVM and almost all of lld/MachO use `/*foo=*/bar` style. No behavior change.	2021-04-22 10:48:03 -04:00
Jez Ng	1460942c15	[lld-macho] Add 32-bit compact unwind support This could probably have been part of D99633, but I split it up to make things a bit more reviewable. I also fixed some bugs in the implementation that were masked through integer underflows when operating in 64-bit mode. Reviewed By: #lld-macho, gkm Differential Revision: https://reviews.llvm.org/D99823	2021-04-15 21:16:33 -04:00
Jez Ng	8ca366935b	Revert "[lld-macho] Add support for arm64_32" and other stacked diffs This reverts commits: * `8914902b01` * `35a745d814` * `682d1dfe09`	2021-04-13 12:40:58 -04:00
Jez Ng	35a745d814	[lld-macho] Add 32-bit compact unwind support This could probably have been part of D99633, but I split it up to make things a bit more reviewable. I also fixed some bugs in the implementation that were masked through integer underflows when operating in 64-bit mode. Reviewed By: #lld-macho, gkm Differential Revision: https://reviews.llvm.org/D99823	2021-04-13 10:43:28 -04:00
Jez Ng	817d98d841	[lld-macho][nfc] Refactor in preparation for 32-bit support The main challenge was handling the different on-disk structures (e.g. `mach_header` vs `mach_header_64`). I tried to strike a balance between sprinkling `target->wordSize == 8` checks everywhere (branchy = slow, and ugly) and templatizing everything (causes code bloat, also ugly). I think I struck a decent balance by judicious use of type erasure. Note that LLD-ELF has a similar architecture, though it seems to use more templating. Linking chromium_framework takes about the same time before and after this change: N Min Max Median Avg Stddev x 20 4.52 4.67 4.595 4.5945 0.044423204 + 20 4.5 4.71 4.575 4.582 0.056344803 No difference proven at 95.0% confidence Reviewed By: #lld-macho, oontvoo Differential Revision: https://reviews.llvm.org/D99633	2021-04-02 18:46:39 -04:00
Alexander Shaposhnikov	f6ad045366	[lld][MachO] Make emitEndFunStab independent from .subsections_via_symbols This diff addresses FIXME in SyntheticSections.cpp and removes the dependency of emitEndFunStab on .subsections_via_symbols. Test plan: make check-lld-macho Differential revision: https://reviews.llvm.org/D99054	2021-04-01 17:48:09 -07:00
Greg McGary	427d359721	[lld-macho][NFC] Drop unnecessary macho:: namespace prefix on unambiguous references to Symbol Within `lld/macho/`, only `InputFiles.cpp` and `Symbols.h` require the `macho::` namespace qualifier to disambiguate references to `class Symbol`. Add braces to outer `for` of a 5-level single-line `if`/`for` nest. Differential Revision: https://reviews.llvm.org/D99555	2021-03-30 14:58:35 -07:00
Greg McGary	98fe9e41f7	[lld-macho][NFC] add const to pointer/reference induction variables of range-based for loops Pointer and reference induction variables of range-based for loops are often const, and code authors are often lax about qualifying them. Differential Revision: https://reviews.llvm.org/D98317	2021-03-10 12:07:31 -08:00
Greg McGary	fdc0c21973	[lld-macho][NFC] when reasonable, replace auto keyword with type names lld policy discourages `auto`. Replace it with a type name whenever reasonable. Retain `auto` to avoid ... * redundancy, as for decls such as `auto t = mumble_cast<TYPE >` or similar that specifies the result type on the RHS * verbosity, as for iterators * gratuitous suffering, as for lambdas Along the way, add `const` when appropriate. Note: a future diff will ... * add more `const` qualifiers * remove `opt::` when we are already `using llvm::opt` Differential Revision: https://reviews.llvm.org/D98313	2021-03-09 22:08:32 -08:00
Nico Weber	0658fc654c	[lld/mac] Implement the missing bits of -undefined This adds support for `-undefined dynamic_lookup`, and for `-undefined warning` and `-undefined suppress` with `-flat_namespace`. We just replace undefined symbols with a DynamicLookup when we hit them. With this, `check-llvm` passes when using ld64.lld.darwinnew as host linker. Differential Revision: https://reviews.llvm.org/D97642	2021-03-01 15:30:53 -05:00
Jez Ng	4a5e111aea	[lld-macho] Better deduplication of personality pointers {D95809} introduced a mechanism for synthetic symbol creation of personality pointers. When multiple section relocations referred to the same personality pointer, it would deduplicate them. However, it neglected to consider that we could have symbol relocations that also refer to the same personality pointer. This diff fixes it. In practice, this mix of relocations arises when there is a statically-linked personality routine that is referenced from multiple object files. Within the same object file, it will be referred to via section relocations, but (obviously) other object files will refer to it via symbol relocations. Failing to deduplicate these references resulted in us going over the 3-personality-pointer limit when linking some larger applications. Fixes llvm.org/PR48389. Reviewed By: #lld-macho, thakis, alexshap Differential Revision: https://reviews.llvm.org/D97245	2021-02-23 22:02:38 -05:00
Jez Ng	ac9dd247da	[lld-macho] Try to make ubsan happy Summary: We should avoid passing a null pointer to memcpy.	2021-02-08 14:51:36 -05:00
Jez Ng	5112035751	[lld-macho] Emit LSDA info in compact unwind The LSDA pointers are encoded as offsets from the image base, and arranged in one big contiguous array. Each second-level page records the offset within that LSDA array which corresponds to the LSDA for its first CU entry. Reviewed By: clayborg Differential Revision: https://reviews.llvm.org/D95810	2021-02-08 13:48:00 -05:00
Jez Ng	525bfa10ec	[lld-macho] Emit personalities in compact unwind Note that there is a triple indirection involved with personalities and compact unwind: 1. Two bits of each CU encoding are used as an offset into the personality array. 2. Each entry of the personality array is an offset from the image base. The resulting address (after adding the image base) should point within the GOT. 3. The corresponding GOT entry contains the actual pointer to the personality function. To further complicate things, when the personality function is in the object file (as opposed to a dylib), its references in `__compact_unwind` may refer to it via a section + offset relocation instead of a symbol relocation. Since our GOT implementation can only create entries for symbols, we have to create a synthetic symbol at the given section offset. Reviewed By: clayborg Differential Revision: https://reviews.llvm.org/D95809	2021-02-08 13:47:59 -05:00
Greg McGary	c3e4f3b231	[lld-macho] Fix alignment & layout to match ld64 and satisfy kernel & codesign The Mach kernel & codesign on arm64 macOS have strict requirements for alignment and sequence of segments and sections. Dyld probably is just as picky, though kernel & codesign reject malformed Mach-O files before dyld ever has a chance. I developed this diff by incrementally changing alignments & sequences to match the output of ld64. I stopped when my hello-world test program started working: `codesign --verify` succeeded, and `execve(2)` didn't immediately fail with `errno == EBADMACHO` = `"Malformed Mach-O file"`. Differential Revision: https://reviews.llvm.org/D94935	2021-02-05 17:22:03 -07:00
Nico Weber	568824798f	fix typo to cycle bots	2021-01-01 22:28:11 -05:00
Fangrui Song	791fe7ac57	[lld-macho] Fix memcpy ub after D93267	2020-12-20 20:01:20 -08:00
Greg McGary	99930719c6	Handle overflow beyond the 127 common encodings limit The common encodings table holds only 127 entries. The encodings index for compact entries is 8 bits wide, and indexes 127..255 are stored locally to each second-level page. Prior to this diff, lld would `fatal()` if encodings overflowed the 127 limit. This diff populates a per-second-level-page encodings table as needed. When the per-page encodings table hits its limit, we must terminate the page. If such early termination would consume fewer entries than a regular (non-compact) encoding page, then we prefer the regular format. Caveat: one reason the common-encoding table might overflow is because of DWARF debug-info references, which are not yet implemented and will come with a later diff. Differential Revision: https://reviews.llvm.org/D93267	2020-12-19 14:54:37 -08:00
Nico Weber	126f58e838	fix typos to cycle bots	2020-12-01 20:27:33 -05:00
Greg McGary	cba45514fb	align __TEXT,__unwind_info to 8 byte boundary	2020-09-19 12:43:30 -07:00
Greg McGary	2124ca1d5c	[lld-macho] create __TEXT,__unwind_info from __LD,__compact_unwind Digest the input `__LD,__compact_unwind` and produce the output `__TEXT,__unwind_info`. This is the initial commit with the major functionality. Successor commits will add handling for ... * `__TEXT,__eh_frame` * personalities & LSDA * `-r` pass-through Differential Revision: https://reviews.llvm.org/D86805	2020-09-18 22:01:03 -07:00

28 Commits
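
Note on commit 99930719c6: its message describes an 8-bit encoding index whose low values refer to a shared "common encodings" table and whose values from 127 upward refer to a table local to each second-level page. The following is a minimal sketch of that lookup, not lld's actual code. The names `kCommonEncodingsMax`, `CompressedEntry`, `decodeEntry`, and `lookupEncoding` are hypothetical, invented for illustration. The fixed boundary at 127 follows the commit message's wording. The 24-bit offset / 8-bit index split is an assumption taken from the compressed second-level entry layout in the Mach-O compact unwind format, and is not stated in the log above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical constant for this sketch: per the commit message, the common
// encodings table holds at most 127 entries (indexes 0..126).
constexpr uint32_t kCommonEncodingsMax = 127;

// Fields of a 32-bit compressed second-level entry (assumed layout):
// low 24 bits = function offset, high 8 bits = encoding index.
struct CompressedEntry {
  uint32_t funcOffset;
  uint8_t encodingIndex;
};

inline CompressedEntry decodeEntry(uint32_t raw) {
  return {raw & 0x00FFFFFFu, static_cast<uint8_t>(raw >> 24)};
}

// Resolve an 8-bit encoding index to an encoding value.
// Indexes below 127 select from the common table; indexes 127..255 select
// from the page-local table, rebased so that 127 maps to local slot 0.
// Returns std::nullopt when the index points past the end of either table.
inline std::optional<uint32_t>
lookupEncoding(uint8_t index, const std::vector<uint32_t> &common,
               const std::vector<uint32_t> &pageLocal) {
  assert(common.size() <= kCommonEncodingsMax && "common table too large");
  if (index < kCommonEncodingsMax) {
    if (index < common.size())
      return common[index];
    return std::nullopt;
  }
  std::size_t local = index - kCommonEncodingsMax;
  if (local < pageLocal.size())
    return pageLocal[local];
  return std::nullopt;
}
```

With a common table `{0x10, 0x20}` and a page-local table `{0x99}`, index 1 resolves through the common table and index 127 resolves to the first page-local entry. Because the index is 8 bits wide, a page can hold at most 129 local encodings under this scheme, which is why the commit terminates a page when its local table fills up.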