llvm-project

Commit Graph

Author	SHA1	Message	Date
Greg McGary	93c8559baf	[lld-macho] Implement branch-range-extension thunks Extend the range of calls beyond an architecture's limited branch range by first calling a thunk, which loads the far address into a scratch register (x16 on ARM64) and branches through it. Other ports (COFF, ELF) use multiple passes with successively-refined guesses regarding the expansion of text-space imposed by thunk-space overhead. This MachO algorithm places thunks during MergedOutputSection::finalize() in a single pass using exact thunk-space overheads. Thunks are kept in a separate vector to avoid the overhead of inserting into the `inputs` vector of `MergedOutputSection`. FIXME: * arm64-stubs.s test is broken * add thunk tests * Handle thunks to DylibSymbol in MergedOutputSection::finalize() Differential Revision: https://reviews.llvm.org/D100818	2021-05-12 09:44:58 -07:00
Nico Weber	d5a70db193	[lld/mac] Write every weak symbol only once in the output Before this, if an inline function was defined in several input files, lld would write each copy of the inline function to the output. With this patch, it only writes one copy. Reduces the size of Chromium Framework from 378MB to 345MB (compared to 290MB linked with ld64, which also does dead-stripping, which we don't do yet), and makes linking it faster: N Min Max Median Avg Stddev x 10 3.9957051 4.3496981 4.1411121 4.156837 0.10092097 + 10 3.908154 4.169318 3.9712729 3.9846753 0.075773012 Difference at 95.0% confidence -0.172162 +/- 0.083847 -4.14165% +/- 2.01709% (Student's t, pooled s = 0.0892373) Implementation-wise, when merging two weak symbols, this sets a "canOmitFromOutput" on the InputSection belonging to the weak symbol not put in the symbol table. We then don't write InputSections that have this set, as long as they are not referenced from other symbols. (This happens e.g. for object files that don't set .subsections_via_symbols or that use .alt_entry.) Some restrictions: - not yet done for bitcode inputs - no "comdat" handling (`kindNoneGroupSubordinate*` in ld64) -- Frame Descriptor Entries (FDEs), Language Specific Data Areas (LSDAs) (that is, catch block unwind information) and Personality Routines associated with weak functions still not stripped. This is wasteful, but harmless. - However, this does strip weaks from __unwind_info (which is needed for correctness and not just for size) - This nopes out on InputSections that are referenced from more than one symbol (eg from .alt_entry) for now Things that work based on symbols Just Work: - map files (change in MapFile.cpp is no-op and not needed; I just found it a bit more explicit) - exports Things that work with inputSections need to explicitly check if an inputSection is written (e.g. unwind info). This patch is useful in itself, but it's also likely a useful foundation for dead_strip. I used to have a "canoncialRepresentative" pointer on InputSection instead of just the bool, which would be handy for ICF too. But I ended up not needing it for this patch, so I removed that again for now. Differential Revision: https://reviews.llvm.org/D102076	2021-05-07 17:11:40 -04:00
Jez Ng	05c5363b39	[lld-macho] Parse & emit the N_ARM_THUMB_DEF symbol flag Eventually we'll use this flag to properly handle bl/blx opcodes. Reviewed By: #lld-macho, gkm Differential Revision: https://reviews.llvm.org/D101558	2021-04-30 16:17:26 -04:00
Jez Ng	eb5b7d4497	[lld-macho] LTO: Unset VisibleToRegularObj where possible This allows LLVM's LTO to internalize symbols that are not referenced directly by regular objects. Naturally, this means we need to track which symbols are referenced by regular objects. The approach taken here is similar to LLD-COFF's: like the COFF port, we extend `SymbolTable::insert()` to set the isVisibleToRegularObj bit. (LLD-ELF relies on the Symbol constructor and `Symbol::mergeProperties()`, but the Mach-O port does not have a `mergeProperties()` equivalent.) From what I can tell, ld64 (which uses libLTO) doesn't do this optimization at all. I'm not even sure libLTO provides a way to do this. Not having ld64's behavior as a reference implementation is unfortunate; instead, I am relying on LLD-ELF/COFF's behavior as references while erring on the conservative side. In particular, LLD-MachO will only do this optimization for executables right now. We also don't attempt it when `-flat_namespace` is used -- otherwise we'd need to scan the symbol table to find matches for every un-namespaced symbol reference, which is expensive. internalize.ll is based off the LLD-ELF tests `internalize-basic.ll` and `internalize-undef.ll`. Looks like @davide added some of LLD-ELF's internalize tests, so adding him as a reviewer... Reviewed By: #lld-macho, gkm Differential Revision: https://reviews.llvm.org/D99105	2021-04-15 21:16:33 -04:00
Jez Ng	ceec610754	[lld-macho] Fix & refactor symbol size calculations I noticed two problems with the previous implementation: * N_ALT_ENTRY symbols weren't being handled correctly -- they should determine the size of the previous symbol, even though they don't cause a new section to be created * The last symbol in a section had its size calculated wrongly; the first subsection's size was used instead of the last one I decided to take the opportunity to refactor things as well, mainly to realize my observation [here](https://reviews.llvm.org/D98837#inline-931511) that we could avoid doing a binary search to match symbols with subsections. I think the resulting code is a bit simpler too. N Min Max Median Avg Stddev x 20 4.31 4.43 4.37 4.3775 0.034162922 + 20 4.32 4.43 4.38 4.3755 0.02799906 No difference proven at 95.0% confidence Reviewed By: #lld-macho, alexshap Differential Revision: https://reviews.llvm.org/D99972	2021-04-06 15:10:01 -04:00
Jez Ng	174deb0539	[lld-macho] clang-format cleanup find . -type f -name "*.cpp" -o -name "*.h" | xargs clang-format -i	2021-04-06 14:26:13 -04:00
Greg McGary	3f8c6f493b	[lld-macho][NFC] Remove redundant member from class Defined `class Symbol` defines a data member `InputFile file;` `class Defined` inherits from `Symbol` and also defines a data member `InputFile file;` for no apparent purpose. Differential Revision: https://reviews.llvm.org/D99783	2021-04-02 09:19:15 -07:00
Alexander Shaposhnikov	f6ad045366	[lld][MachO] Make emitEndFunStab independent from .subsections_via_symbols This diff addresses FIXME in SyntheticSections.cpp and removes the dependency of emitEndFunStab on .subsections_via_symbols. Test plan: make check-lld-macho Differential revision: https://reviews.llvm.org/D99054	2021-04-01 17:48:09 -07:00
Vy Nguyen	e2f34cc330	[lld-macho][nfc] Removed unnecessary static_cast Differential Revision: https://reviews.llvm.org/D99365	2021-03-25 15:07:46 -04:00
Vy Nguyen	66f340051a	[lld-macho] Define __mh_*_header synthetic symbols. Bug: https://bugs.llvm.org/show_bug.cgi?id=49290 Differential Revision: https://reviews.llvm.org/D97007	2021-03-19 14:14:40 -04:00
Jez Ng	d8283d9ddc	[lld-macho][nfc] Give every SyntheticSection a fake InputSection Previously, it was difficult to write code that handled both synthetic and regular sections generically. We solve this problem by creating a fake InputSection at the start of every SyntheticSection. This refactor allows us to handle DSOHandle like a regular Defined symbol (since Defined symbols must be attached to an InputSection), and paves the way for supporting `__mh_*header` symbols. Additionally, it simplifies our binding/rebase code. I did have to extend Defined a little -- it now has a `linkerInternal` flag, to indicate that `___dso_handle` should not be in the final symbol table. I've also added some additional testing for `___dso_handle`. Reviewed By: #lld-macho, oontvoo Differential Revision: https://reviews.llvm.org/D98545	2021-03-12 17:26:27 -05:00
Nico Weber	cafb6cd10c	[lld/mac] Add some support for dynamic lookup symbols, and implement -U Dynamic lookup symbols are symbols that work like dynamic symbols in ELF: They're not bound to a dylib like normal Mach-O twolevel lookup symbols, but they live in a global pool and dyld resolves them against exported symbols from all loaded dylibs. This adds support for dynamic lookup symbols to lld/mac. They are represented as DylibSymbols with file set to nullptr. This also uses this support to implement the -U flag, which makes a specific symbol that's undefined at the end of the link a dynamic lookup symbol. For -U, it'd be sufficient to just do a pass over remaining undefined symbols at the end of the link and to replace them with dynamic lookup symbols then. But I'd like to use this code to implement flat_namespace too, and that will require real support for resolving dynamic lookup symbols in SymbolTable. So this patch adds this now already. While writing tests for this, I noticed that we didn't set N_WEAK_DEF in the symbol table for DylibSymbols, so this fixes that too. Differential Revision: https://reviews.llvm.org/D97521	2021-02-26 16:50:53 -05:00
Jez Ng	163dcd8513	[lld-macho] Associate each Symbol with an InputFile This makes our error messages more informative. But the bigger motivation is for LTO symbol resolution, which will be in an upcoming diff. The changes in this one are largely mechanical. Reviewed By: #lld-macho, smeenai Differential Revision: https://reviews.llvm.org/D94316	2021-02-03 13:43:47 -05:00
Nico Weber	13f439a187	[lld/mac] Implement support for private extern symbols Private extern symbols are used for things scoped to the linkage unit. They cause duplicate symbol errors (so they're in the symbol table, unlike TU-scoped truly local symbols), but they don't make it into the export trie. They are created e.g. by compiling with -fvisibility=hidden. If two weak symbols have differing privateness, the combined symbol is non-private external. (Example: an inline function, where some TUs that include the header defining it were built with -fvisibility-inlines-hidden and some weren't). A weak private external symbol implicitly has its "weak" dropped and behaves like a regular strong private external symbol: Weak is an export trie concept, and private symbols are not in the export trie. If a weak and a strong symbol have different privateness, the strong symbol wins. If two common symbols have differing privateness, the larger symbol wins. If they have the same size, the privateness of the symbol seen later during the link wins (!) -- this is a bit lame, but it matches ld64 and this behavior takes 2 fewer lines to implement than the less surprising "result is non-private external", so match ld64. (Example: `int a` in two .c files, both built with -fcommon, one built with -fvisibility=hidden and one without.) This also makes `__dyld_private` a true TU-local symbol, matching ld64. To make this work, make the `const char*` StringRefZ ctor correctly set `size` (without this, writing the string table crashed when calling getName() on the __dyld_private symbol). Mention in CommonSymbol's comment that common symbols are now disabled by default in clang. Mention in -keep_private_externs's HelpText that the flag only has an effect with `-r` (which we don't implement yet -- so this patch here doesn't regress any behavior around -r + -keep_private_externs). ld64 doesn't explicitly document it, but the commit text of http://reviews.llvm.org/rL216146 does, and ld64's OutputFile::buildSymbolTable() checks `_options.outputKind() == Options::kObjectFile` before calling `_options.keepPrivateExterns()` (the only reference to that function). Fixes PR48536. Differential Revision: https://reviews.llvm.org/D93609	2020-12-21 21:23:33 -05:00
Jez Ng	4c8276cdc1	[lld-macho] Use LC_LOAD_WEAK_DYLIB for dylibs with only weakrefs Note that dylibs without any refs will still be loaded in the usual (strong) fashion. Reviewed By: #lld-macho, thakis Differential Revision: https://reviews.llvm.org/D93435	2020-12-17 08:49:17 -05:00
Jez Ng	811444d7a1	[lld-macho] Add support for weak references Weak references need not necessarily be satisfied at runtime (but they must still be satisfied at link time). So symbol resolution still works as per usual, but we now pass around a flag -- ultimately emitting it in the bind table -- to indicate if a given dylib symbol is a weak reference. ld64's behavior for symbols that have both weak and strong references is a bit bizarre. For non-function symbols, it will emit a weak import. For function symbols (those referenced by BRANCH relocs), it will emit a regular import. I'm not sure what value there is in that behavior, and since emulating it will make our implementation more complex, I've decided to treat regular weakrefs like function symbol ones for now. Fixes PR48511. Reviewed By: #lld-macho, thakis Differential Revision: https://reviews.llvm.org/D93369	2020-12-17 08:49:16 -05:00
Nico Weber	07ab597bb0	[lld/mac] Fix issues around thin archives - most importantly, fix a use-after-free when using thin archives, by putting the archive unique_ptr to the arena allocator. This ports D65565 to MachO - correctly demangle symbol names from archives in diagnostics - add a test for thin archives -- it finds this UaF, but only when running it under asan (it also finds the demangling fix) - make forceLoadArchive() use addFile() with a bool to have the archive loading code in fewer places. no behavior change; matches COFF port a bit better Differential Revision: https://reviews.llvm.org/D92360	2020-12-01 18:48:29 -05:00
Jez Ng	62a3f0c984	[lld-macho] Support absolute symbols They operate like Defined symbols but with no associated InputSection. Note that `ld64` seems to treat the weak definition flag like a no-op for absolute symbols, so I have replicated that behavior. Reviewed By: #lld-macho, smeenai Differential Revision: https://reviews.llvm.org/D87909	2020-09-25 11:28:35 -07:00
Jez Ng	c32e69b2ce	[lld-macho][re-land] Initial support for common symbols Fix earlier build break via a static_cast. This reverts commit `8112d494d3`. Differential Revision: https://reviews.llvm.org/D86909	2020-09-24 15:00:20 -07:00
Muhammad Omair Javaid	8112d494d3	Revert "[lld-macho] Initial support for common symbols" This reverts commit `63ace77962`. Breaks LLDB Arm build: http://lab.llvm.org:8011/builders/lldb-arm-ubuntu/builds/4409	2020-09-24 12:26:40 +05:00
Jez Ng	5d26bd3b75	[lld-macho] Emit indirect symbol table Makes it a little easier to read objdump's disassembly. Reviewed By: #lld-macho, gkm Differential Revision: https://reviews.llvm.org/D87178	2020-09-23 19:26:40 -07:00
Jez Ng	63ace77962	[lld-macho] Initial support for common symbols On Unix, it is traditionally allowed to write variable definitions without initialization expressions (such as "int foo;") to header files. These are called tentative definitions. The compiler creates common symbols when it sees tentative definitions. When linking the final binary, if there are remaining common symbols after name resolution is complete, the linker converts them to regular defined symbols in a `__common` section. This diff implements most of that functionality, though we do not yet handle the case where there are both common and non-common definitions of the same symbol. Reviewed By: #lld-macho, gkm Differential Revision: https://reviews.llvm.org/D86909	2020-09-23 19:26:40 -07:00
Jez Ng	0407197711	[lld-macho] Support GOT relocations to __dso_handle Found such a relocation while testing some real world programs. Reviewed By: #lld-macho, smeenai Differential Revision: https://reviews.llvm.org/D86642	2020-08-27 17:44:17 -07:00
Jez Ng	2a38dba7dd	[lld-macho] Emit binding opcodes for defined symbols that override weak dysyms These opcodes tell dyld to coalesce the overridden weak dysyms to this particular symbol definition. Reviewed By: #lld-macho, smeenai Differential Revision: https://reviews.llvm.org/D86575	2020-08-27 17:44:16 -07:00
Jez Ng	e263287c79	[lld-macho] Implement weak binding for branch relocations Since there is no "weak lazy" lookup, function calls to weak symbols are always non-lazily bound. We emit both regular non-lazy bindings as well as weak bindings, in order that the weak bindings may overwrite the non-lazy bindings if an appropriate symbol is found at runtime. However, the bound addresses will still be written (non-lazily) into the LazyPointerSection. Reviewed By: #lld-macho, smeenai Differential Revision: https://reviews.llvm.org/D86573	2020-08-27 17:44:15 -07:00
Jez Ng	cbe27316ef	[lld-macho] Implement weak bindings for GOT/TLV Previously, we were only emitting regular bindings to weak dynamic symbols; this diff adds support for the weak bindings too, which can overwrite the regular bindings at runtime. We also treat weak defined global symbols similarly -- since they can also be interposed at runtime, they need to be treated as potentially dynamic symbols. Note that weak bindings differ from regular bindings in that they do not specify the dylib to do the lookup in (i.e. weak symbol lookup happens in a flat namespace.) Differential Revision: https://reviews.llvm.org/D86572	2020-08-26 19:21:09 -07:00
Jez Ng	3c9100fb78	[lld-macho] Support dynamic linking of thread-locals References to symbols in dylibs work very similarly regardless of whether the symbol is a TLV. The main difference is that we have a separate `__thread_ptrs` section that acts as the GOT for these thread-locals. We can identify thread-locals in dylibs by a flag in their export trie entries, and we cross-check it with the relocations that refer to them to ensure that we are not using a GOT relocation to reference a thread-local (or vice versa). Reviewed By: #lld-macho, smeenai Differential Revision: https://reviews.llvm.org/D85081	2020-08-12 19:50:09 -07:00
Jez Ng	3587de2281	[lld-macho] Support __dso_handle for C++ The C++ ABI requires dylibs to pass a pointer to __cxa_atexit which does e.g. cleanup of static global variables. The C++ spec says that the pointer can point to any address in one of the dylib's segments, but in practice ld64 seems to set it to point to the header, so that's what's implemented here. Reviewed By: #lld-macho, smeenai Differential Revision: https://reviews.llvm.org/D83603	2020-07-30 14:28:41 -07:00
Jez Ng	31d5885842	[lld-macho] Partial support for weak definitions This diff adds support for weak definitions, though it doesn't handle weak symbols in dylibs quite correctly -- we need to emit binding opcodes for them in the weak binding section rather than the lazy binding section. What is covered in this diff: 1. Reading the weak flag from symbol table / export trie, and writing it to the export trie 2. Refining the symbol table's rules for choosing one symbol definition over another. Wrote a few dozen test cases to make sure we were matching ld64's behavior. We can now link basic C++ programs. Reviewed By: #lld-macho, compnerd Differential Revision: https://reviews.llvm.org/D83532	2020-07-24 15:55:25 -07:00
Jez Ng	a12e7d406d	[lld-macho] Handle GOT relocations of non-dylib symbols Summary: Turns out this case is actually really common -- it happens whenever there's a reference to an `extern` variable that ends up statically linked. Depends on D80856. Reviewers: ruiu, pcc, MaskRay, smeenai, alexshap, gkm, Ktwu, christylee Reviewed By: smeenai Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D80857	2020-06-17 20:41:28 -07:00
Jez Ng	a04c133564	[lld-macho] Set __PAGEZERO size to 4GB That's what ld64 uses for 64-bit targets. I figured it's best to make this change sooner rather than later since a bunch of our tests are relying on hardcoded addresses that depend on this value. Reviewed By: smeenai Differential Revision: https://reviews.llvm.org/D80177	2020-06-02 13:19:38 -07:00
Kellie Medlin	2b920ae78c	[lld] Add archive file support to Mach-O backend With this change, basic archive files can be linked together. Input section discovery has been refactored into a function since archive files lazily resolve their symbols / the object files containing those symbols. Reviewed By: int3, smeenai Differential Revision: https://reviews.llvm.org/D78342	2020-05-14 12:58:35 -07:00
Jez Ng	b3e2fc931d	[lld-macho] Support calls to functions in dylibs Summary: This diff implements lazy symbol binding -- very similar to the PLT mechanism in ELF. ELF's .plt section is broken up into two sections in Mach-O: StubsSection and StubHelperSection. Calls to functions in dylibs will end up calling into StubsSection, which contains indirect jumps to addresses stored in the LazyPointerSection (the counterpart to ELF's .plt.got). Initially, the LazyPointerSection contains addresses that point into one of the entry points in the middle of the StubHelperSection. The code in StubHelperSection will push on the stack an offset into the LazyBindingSection. The push is followed by a jump to the beginning of the StubHelperSection (similar to PLT0), which then calls into dyld_stub_binder. dyld_stub_binder is a non-lazily bound symbol, so this call looks it up in the GOT. The stub binder will look up the bind opcodes in the LazyBindingSection at the given offset. The bind opcodes will tell the binder to update the address in the LazyPointerSection to point to the symbol, so that subsequent calls don't have to redo the symbol resolution. The binder will then jump to the resolved symbol. Depends on D78269. Reviewers: ruiu, pcc, MaskRay, smeenai, alexshap, gkm, Ktwu, christylee Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D78270	2020-05-09 20:56:22 -07:00
Kellie Medlin	6cb073133c	[lld] Merge Mach-O input sections Summary: Similar to other formats, input sections in the MachO implementation are now grouped under output sections. This is primarily a refactor, although there's some new logic (like resolving the output section's flags based on its inputs). Differential Revision: https://reviews.llvm.org/D77893	2020-05-01 16:57:18 -07:00
Jez Ng	df92377823	[lld-macho] Have Symbol::getVA() return a non-relative virtual address Currently, getVA() returns a virtual address with the assumption that the ImageBase is zero. As I understand, this is what lld-ELF is doing. However, under our current design, it seems like an awkward setup -- I'm finding that I have to add and subtract ImageBase in several places to make things work out. As such, I think it's simpler to have getVA() return a non-relative VA, but I'm not sure if I'm missing something. Would love to hear more from folks familiar with lld-ELF. Differential Revision: https://reviews.llvm.org/D78168	2020-04-29 15:44:50 -07:00
Jez Ng	060efd24c7	[lld-macho] Add basic support for linking against dylibs This diff implements: * dylib loading (much of which is being restored from @pcc and @ruiu's original work) * The GOT_LOAD relocation, which allows us to load non-lazy dylib symbols * Basic bind opcode emission, which tells `dyld` how to populate the GOT Differential Revision: https://reviews.llvm.org/D76252	2020-04-21 13:43:19 -07:00
Fangrui Song	6acd300375	Reland D75382 "[lld] Initial commit for new Mach-O backend" With a fix for http://lab.llvm.org:8011/builders/clang-cmake-armv8-lld/builds/3636 Also trims some unneeded dependencies.	2020-04-02 12:03:43 -07:00
Oliver Stannard	af39151f3c	Revert "[lld] Initial commit for new Mach-O backend" This is causing buildbot failures on 32-bit hosts, for example: http://lab.llvm.org:8011/builders/clang-cmake-armv8-lld/builds/3636 This reverts commit `03f43b3aca`.	2020-04-02 13:23:30 +01:00
Jez Ng	03f43b3aca	[lld] Initial commit for new Mach-O backend Summary: This is the first commit for the new Mach-O backend, designed to roughly follow the architecture of the existing ELF and COFF backends, and building off work that @ruiu and @pcc did in a branch a while back. Note that this is a very stripped-down commit with the bare minimum of functionality for ease of review. We'll be following up with more diffs soon. Currently, we're able to generate a simple "Hello World!" executable that runs on OS X Catalina (and possibly on earlier OS X versions; I haven't tested them). (This executable can be obtained by compiling `test/MachO/relocations.s`.) We're mocking out a few load commands to achieve this -- for example, we can't load dynamic libraries, but Catalina requires binaries to be linked against `dyld`, so we hardcode the emission of a `LC_LOAD_DYLIB` command. Other mocked out load commands include LC_SYMTAB and LC_DYSYMTAB. Differential Revision: https://reviews.llvm.org/D75382	2020-03-31 11:58:47 -07:00
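
Note on 93c8559baf (branch-range-extension thunks): that commit routes a call through a thunk when the callee lies beyond the architecture's branch range. The sketch below illustrates only the range test, not lld's placement algorithm. It relies on the ARM64 `B`/`BL` encoding, which holds a signed 26-bit word offset and so reaches displacements from -2^27 to 2^27 - 4 bytes. The names `isInBranchRange` and `needsThunk` are hypothetical and do not come from the lld sources.

```cpp
#include <cassert>
#include <cstdint>

// ARM64 B/BL encode a signed 26-bit offset counted in 4-byte words,
// so the reachable byte displacement is [-2^27, 2^27 - 4].
constexpr int64_t kBackwardLimit = -(int64_t{1} << 27);
constexpr int64_t kForwardLimit = (int64_t{1} << 27) - 4;

// Hypothetical helper: true if a direct branch located at `callSite`
// can reach `target` without going through a thunk.
bool isInBranchRange(uint64_t callSite, uint64_t target) {
  // Instruction addresses on ARM64 are 4-byte aligned.
  assert(callSite % 4 == 0 && target % 4 == 0);
  int64_t displacement = static_cast<int64_t>(target - callSite);
  return displacement >= kBackwardLimit && displacement <= kForwardLimit;
}

// Hypothetical helper: a call needs a thunk exactly when the direct
// branch cannot reach its target.
bool needsThunk(uint64_t callSite, uint64_t target) {
  return !isInBranchRange(callSite, target);
}
```

In a real linker the decision is harder than this check suggests, because inserting a thunk moves later code and can push other calls out of range. The commit message says the Mach-O port handles this in a single pass with exact thunk-space overheads.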

39 Commits
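
Note on 13f439a187 (private extern symbols): the precedence rules in that commit message for weak and strong definitions can be summarized in a small sketch. It is a simplified model written for illustration, not lld's SymbolTable code. The `Def` struct and `mergeDefinitions` are hypothetical names. Common symbols and the dropping of "weak" on private externs at output time are deliberately left out.

```cpp
#include <cassert>
#include <optional>

// Hypothetical model of a defined symbol; not lld's Defined class.
struct Def {
  bool isWeak;
  bool isPrivateExtern;
};

// Combine two definitions of the same name.
// Returns std::nullopt to signal a duplicate-symbol error.
std::optional<Def> mergeDefinitions(const Def &existing, const Def &incoming) {
  // Two strong definitions clash, whether or not they are private extern.
  if (!existing.isWeak && !incoming.isWeak)
    return std::nullopt;
  // Two weak definitions: the result stays private only if both were
  // private; differing privateness yields a non-private external symbol.
  if (existing.isWeak && incoming.isWeak)
    return Def{true, existing.isPrivateExtern && incoming.isPrivateExtern};
  // One weak, one strong: the strong definition wins, privateness included.
  return existing.isWeak ? incoming : existing;
}
```

For example, merging a weak private definition with a weak non-private one yields a weak non-private symbol, while merging it with a strong private one yields the strong private symbol.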