Commit Graph

67 Commits

Author SHA1 Message Date
Jez Ng 55a32812fa [lld-macho] Filter TAPI re-exports by target
Previously, we were loading re-exports without checking whether
they were compatible with our target. Prior to {D97209}, it meant that
we were defining dylib symbols that were invalid -- usually a silent
failure unless our binary actually used them. D97209 exposed this as an
explicit error.

Along the way, I've extended our TAPI compatibility check to cover the
platform as well, instead of just checking the arch. To this end, I've
replaced MachO::Architecture with MachO::Target in our Config struct.

Reviewed By: #lld-macho, oontvoo

Differential Revision: https://reviews.llvm.org/D97867
2021-03-04 14:36:47 -05:00
Mikael Holmen 85b67d5fa9 [lld][MachO] Silence "enumeral and non-enumeral type" warning from gcc
gcc complained with

[1110/1140] Building CXX object tools/lld/MachO/CMakeFiles/lldMachO2.dir/SyntheticSections.cpp.o
../../lld/MachO/SyntheticSections.cpp: In function 'int16_t ordinalForDylibSymbol(const lld::macho::DylibSymbol&)':
../../lld/MachO/SyntheticSections.cpp:287:14: warning: enumeral and non-enumeral type in conditional expression [-Wextra]
  286 |   return config->namespaceKind == NamespaceKind::flat || dysym.isDynamicLookup()
      |          ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  287 |              ? MachO::BIND_SPECIAL_DYLIB_FLAT_LOOKUP
      |              ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  288 |              : dysym.getFile()->ordinal;
      |              ~~~~~~~~~~~~~~~~~~~~~~~~~~
2021-03-03 10:39:35 +01:00
Nico Weber 8174f33dc9 [lld/mac] Add support for -flat_namespace
-flat_namespace makes lld emit binaries that use name lookup that's more in
line with other POSIX systems: Instead of looking up symbols as (dylib,name)
pairs by dyld, they're instead looked up just by name.

-flat_namespace has three effects:

1. MH_TWOLEVEL and MH_NNOUNDEFS are no longer set in the Mach-O header
2. All symbols use BIND_SPECIAL_DYLIB_FLAT_LOOKUP as ordinal
3. When a dylib is added to the link, its dependent dylibs are also added,
   so that lld can verify that no undefined symbols remain at the end of
   a link with -flat_namespace. These transitive dylibs are added for symbol
   resolution, but they are not emitted in LC_LOAD_COMMANDs.

-undefined with -flat_namespace still isn't implemented. Before this change,
it was impossible to hit that combination because -flat_namespace caused a
diagnostic. Now that it no longer does, emit a dedicated temporary diagnostic
when both flags are used.

Differential Revision: https://reviews.llvm.org/D97641
2021-03-01 15:25:10 -05:00
Nico Weber 860e862f34 [lld/mac] Simplify encodeDylibOrdinal() a bit
Only one of the two callers used the lastBinding parameter, so
do that work at that one call site. Extract a ordinalForDylibSymbol()
helper to make this tidy.

No behavior change.

Differential Revision: https://reviews.llvm.org/D97597
2021-02-28 09:16:45 -05:00
Nico Weber cafb6cd10c [lld/mac] Add some support for dynamic lookup symbols, and implement -U
Dynamic lookup symbols are symbols that work like dynamic symbols
in ELF: They're not bound to a dylib like normal Mach-O twolevel lookup
symbols, but they live in a global pool and dyld resolves them against
exported symbols from all loaded dylibs.

This adds support for dynamical lookup symbols to lld/mac. They are
represented as DylibSymbols with file set to nullptr.

This also uses this support to implement the -U flag, which makes
a specific symbol that's undefined at the end of the link a
dynamic lookup symbol.

For -U, it'd be sufficient to just to a pass over remaining undefined symbols
at the end of the link and to replace them with dynamic lookup symbols then.
But I'd like to use this code to implement flat_namespace too, and that will
require real support for resolving dynamic lookup symbols in SymbolTable. So
this patch adds this now already.

While writing tests for this, I noticed that we didn't set N_WEAK_DEF in the
symbol table for DylibSymbols, so this fixes that too.

Differential Revision: https://reviews.llvm.org/D97521
2021-02-26 16:50:53 -05:00
Greg McGary 151990dd94 [lld-macho] add code signature for native arm64 macOS
Differential Revision: https://reviews.llvm.org/D96164
2021-02-24 17:05:23 -08:00
Vy Nguyen 5a856f5b44 Reland [lld-macho]Implement bundle_loader
Reland 1a0afcf518
  https://reviews.llvm.org/D95913

New change: fix UB bug caused by copying empty path/name. (since the executable does not have a name)
2021-02-22 14:05:12 -05:00
Jez Ng 8c4638c367 [lld-macho] Clean up comments 2021-02-22 12:13:17 -05:00
Jez Ng 5bfdbdeb40 [lld-macho] Fix cpuSubtype for non-x86_64 archs
dyld on iOS will complain if the LIB64 bit is set.

Reviewed By: #lld-macho, thakis

Differential Revision: https://reviews.llvm.org/D96565
2021-02-22 12:13:17 -05:00
Vitaly Buka c17547df44 Revert "Implement -bundle_loader"
D95913 passes null pointer into memcpy

This reverts commit 1a0afcf518.
2021-02-19 17:40:07 -08:00
Vy Nguyen 1a0afcf518 Implement -bundle_loader
Differential Revision: https://reviews.llvm.org/D95913

Usage: -bundle_loader <executable>
This option specifies the executable that will load the build output file being linked.
When building a bundle, users can use the --bundle_loader  to specify an executable
that contains symbols referenced, but not implemented in the bundle.
2021-02-18 16:11:37 -05:00
Greg McGary 87104faac4 [lld-macho] Add ARM64 target arch
This is an initial base commit for ARM64 target arch support. I don't represent that it complete or bug-free, but wish to put it out for review now that some basic things like branch target & load/store address relocs are working.

I can add more tests to this base commit, or add them in follow-up commits.

It is not entirely clear whether I use the "ARM64" (Apple) or "AArch64" (non-Apple) naming convention. Guidance is appreciated.

Differential Revision: https://reviews.llvm.org/D88629
2021-02-08 18:14:07 -07:00
Jez Ng 525bfa10ec [lld-macho] Emit personalities in compact unwind
Note that there is a triple indirection involved with
personalities and compact unwind:

1. Two bits of each CU encoding are used as an offset into the
   personality array.
2. Each entry of the personality array is an offset from the image base.
   The resulting address (after adding the image base) should point within the
   GOT.
3. The corresponding GOT entry contains the actual pointer to the
   personality function.

To further complicate things, when the personality function is in the
object file (as opposed to a dylib), its references in
`__compact_unwind` may refer to it via a section + offset relocation
instead of a symbol relocation. Since our GOT implementation can only
create entries for symbols, we have to create a synthetic symbol at the
given section offset.

Reviewed By: clayborg

Differential Revision: https://reviews.llvm.org/D95809
2021-02-08 13:47:59 -05:00
Greg McGary c3e4f3b231 [lld-macho] Fix alignment & layout to match ld64 and satisfy kernel & codesign
The Mach kernel & codesign on arm64 macOS has strict requirements for alignment and sequence of segments and sections. Dyld probably is just as picky, though kernel & codesign reject malformed Mach-O files before dyld ever has a chance.

I developed this diff by incrementally changing alignments & sequences to match the output of ld64. I stopped when my hello-world test program started working: `codesign --verify` succeded, and `execve(2)` didn't immediately fail with `errno == EBADMACHO` = `"Malformed Mach-O file"`.

Differential Revision: https://reviews.llvm.org/D94935
2021-02-05 17:22:03 -07:00
Jez Ng 163dcd8513 [lld-macho] Associate each Symbol with an InputFile
This makes our error messages more informative. But the bigger motivation is for
LTO symbol resolution, which will be in an upcoming diff. The changes in this
one are largely mechanical.

Reviewed By: #lld-macho, smeenai

Differential Revision: https://reviews.llvm.org/D94316
2021-02-03 13:43:47 -05:00
Greg McGary 3a9d2f1488 [lld-macho][NFC] refactor relocation handling
Add per-reloc-type attribute bits and migrate code from per-target file into target independent code, driven by reloc attributes.

Many cleanups

Differential Revision: https://reviews.llvm.org/D95121
2021-02-02 10:54:53 -07:00
Fangrui Song 7916fd71e9 [lld-macho] Fix GCC -Wunused-variable in -DLLVM_ENABLE_ASSERTIONS=off build 2021-01-06 10:58:46 -08:00
Nico Weber 57ffbe020a glld/mac] Don't add names of unreferenced symbols to string table
Before this, a hello world program would contain many many unnecessary
entries in its string table.

No behavior change, just makes the string table in the output smaller
and more like ld64's.

Differential Revision: https://reviews.llvm.org/D93711
2020-12-22 15:52:33 -05:00
Nico Weber 13f439a187 [lld/mac] Implement support for private extern symbols
Private extern symbols are used for things scoped to the linkage unit.
They cause duplicate symbol errors (so they're in the symbol table,
unlike TU-scoped truly local symbols), but they don't make it into the
export trie. They are created e.g. by compiling with
-fvisibility=hidden.

If two weak symbols have differing privateness, the combined symbol is
non-private external. (Example: inline functions and some TUs that
include the header defining it were built with
-fvisibility-inlines-hidden and some weren't).

A weak private external symbol implicitly has its "weak" dropped and
behaves like a regular strong private external symbol: Weak is an export
trie concept, and private symbols are not in the export trie.

If a weak and a strong symbol have different privateness, the strong
symbol wins.

If two common symbols have differing privateness, the larger symbol
wins. If they have the same size, the privateness of the symbol seen
later during the link wins (!) -- this is a bit lame, but it matches
ld64 and this behavior takes 2 lines less to implement than the less
surprising "result is non-private external), so match ld64.
(Example: `int a` in two .c files, both built with -fcommon,
one built with -fvisibility=hidden and one without.)

This also makes `__dyld_private` a true TU-local symbol, matching ld64.
To make this work, make the `const char*` StringRefZ ctor to correctly
set `size` (without this, writing the string table crashed when calling
getName() on the __dyld_private symbol).

Mention in CommonSymbol's comment that common symbols are now disabled
by default in clang.

Mention in -keep_private_externs's HelpText that the flag only has an
effect with `-r` (which we don't implement yet -- so this patch here
doesn't regress any behavior around -r + -keep_private_externs)). ld64
doesn't explicitly document it, but the commit text of
http://reviews.llvm.org/rL216146 does, and ld64's
OutputFile::buildSymbolTable() checks `_options.outputKind() ==
Options::kObjectFile` before calling `_options.keepPrivateExterns()`
(the only reference to that function).

Fixes PR48536.

Differential Revision: https://reviews.llvm.org/D93609
2020-12-21 21:23:33 -05:00
Jez Ng 811444d7a1 [lld-macho] Add support for weak references
Weak references need not necessarily be satisfied at runtime (but they must
still be satisfied at link time). So symbol resolution still works as per usual,
but we now pass around a flag -- ultimately emitting it in the bind table -- to
indicate if a given dylib symbol is a weak reference.

ld64's behavior for symbols that have both weak and strong references is
a bit bizarre. For non-function symbols, it will emit a weak import. For
function symbols (those referenced by BRANCH relocs), it will emit a
regular import. I'm not sure what value there is in that behavior, and
since emulating it will make our implementation more complex, I've
decided to treat regular weakrefs like function symbol ones for now.

Fixes PR48511.

Reviewed By: #lld-macho, thakis

Differential Revision: https://reviews.llvm.org/D93369
2020-12-17 08:49:16 -05:00
Nico Weber 601f0fb846 [lld/mac] Set ordinal on dynamic undefined symbols in symbol table
This lets `nm -m` print "(from libfoo)" in its output, which is more
accessible than dumping the bind table.

See https://reviews.llvm.org/D57190#2455761 for the somewhat
surprising `AltEntry` that appears in symtab.s.

Differential Revision: https://reviews.llvm.org/D93318
2020-12-15 14:39:36 -05:00
Jez Ng c7dbaec396 [lld-macho] Add isCodeSection()
This is the same logic that ld64 uses to determine which sections
contain functions. This was added so that we could determine which
STABS entries should be N_FUN.

Reviewed By: clayborg

Differential Revision: https://reviews.llvm.org/D92430
2020-12-01 15:05:21 -08:00
Jez Ng 78f6498cdc [lld-macho] Flesh out STABS implementation
This addresses a lot of the comments in {D89257}. Ideally it'd have been
done in the same diff, but the commits in between make that difficult.

This diff implements:
* N_GSYM and N_STSYM, the STABS for global and static symbols
* Has the STABS reflect the section IDs of their referent symbols
* Ensures we don't fail when encountering absolute symbols or files with
  no debug info
* Sorts STABS symbols by file to minimize the number of N_OSO entries

Reviewed By: clayborg

Differential Revision: https://reviews.llvm.org/D92366
2020-12-01 15:05:21 -08:00
Jez Ng b768d57b36 [lld-macho] Add archive name and file modtime to STABS output
We should also set the modtime when running LTO. That will be done in a
future diff, together with support for the `-object_path_lto` flag.

Reviewed By: clayborg

Differential Revision: https://reviews.llvm.org/D91318
2020-12-01 15:05:21 -08:00
Jez Ng 51629abce0 [lld-macho] Emit local symbols in symtab; record metadata in LC_DYSYMTAB
Symbols of the same type must be laid out contiguously: following ld64's
lead, we choose to emit all local symbols first, then external symbols,
and finally undefined symbols. For each symbol type, the LC_DYSYMTAB
load command will record the range (start index and total number) of
those symbols in the symbol table.

This work was motivated by the fact that LLDB won't search for debug
info if LC_DYSYMTAB says there are no local symbols (since STABS symbols
are all local symbols). With this change, LLDB is now able to display
the source lines at a given breakpoint when debugging our binaries.

Some tests had to be updated due to local symbol names now appearing in
`llvm-objdump`'s output.

Reviewed By: #lld-macho, smeenai, clayborg

Differential Revision: https://reviews.llvm.org/D89285
2020-12-01 15:05:20 -08:00
Jez Ng 3fcb0eeb15 [lld-macho] Emit STABS symbols for debugging, and drop debug sections
Debug sections contain a large amount of data. In order not to bloat the size
of the final binary, we remove them and instead emit STABS symbols for
`dsymutil` and the debugger to locate their contents in the object files.

With this diff, `dsymutil` is able to locate the debug info. However, we need
a few more features before `lldb` is able to work well with our binaries --
e.g. having `LC_DYSYMTAB` accurately reflect the number of local symbols,
emitting `LC_UUID`, and more. Those will be handled in follow-up diffs.

Note also that the STABS we emit differ slightly from what ld64 does. First, we
emit the path to the source file as one `N_SO` symbol instead of two. (`ld64`
emits one `N_SO` for the dirname and one of the basename.) Second, we do not
emit `N_BNSYM` and `N_ENSYM` STABS to mark the start and end of functions,
because the `N_FUN` STABS already serve that purpose. @clayborg recommended
these changes based on his knowledge of what the debugging tools look for.

Additionally, this current implementation doesn't accurately reflect the size
of function symbols. It uses the size of their containing sectioins as a proxy,
but that is only accurate if `.subsections_with_symbols` is set, and if there
isn't an `N_ALT_ENTRY` in that particular subsection. I think we have two
options to solve this:

1. We can split up subsections by symbol even if `.subsections_with_symbols`
   is not set, but include constraints to ensure those subsections retain
   their order in the final output. This is `ld64`'s approach.
2. We could just add a `size` field to our `Symbol` class. This seems simpler,
   and I'm more inclined toward it, but I'm not sure if there are use cases
   that it doesn't handle well. As such I'm punting on the decision for now.

Reviewed By: clayborg

Differential Revision: https://reviews.llvm.org/D89257
2020-12-01 15:05:20 -08:00
Jez Ng 62a3f0c984 [lld-macho] Support absolute symbols
They operate like Defined symbols but with no associated InputSection.

Note that `ld64` seems to treat the weak definition flag like a no-op for
absolute symbols, so I have replicated that behavior.

Reviewed By: #lld-macho, smeenai

Differential Revision: https://reviews.llvm.org/D87909
2020-09-25 11:28:35 -07:00
Jez Ng c7c9776f77 [lld-macho] Allow the entry symbol to be dynamically bound
Apparently this is used in real programs. I've handled this by reusing
the logic we already have for branch (function call) relocations.

Reviewed By: #lld-macho, smeenai

Differential Revision: https://reviews.llvm.org/D87852
2020-09-25 11:28:33 -07:00
Jez Ng e4e673e75a [lld-macho] Implement support for PIC
* Implement rebase opcodes. Rebase opcodes tell dyld where absolute
  addresses have been encoded in the binary. If the binary is not loaded
  at its preferred address, dyld has to rebase these addresses by adding
  an offset to them.
* Support `-pie` and use it to test rebase opcodes.

This is necessary for absolute address references in dylibs, bundles etc
to work.

Reviewed By: #lld-macho, gkm

Differential Revision: https://reviews.llvm.org/D87199
2020-09-25 11:28:31 -07:00
Jez Ng 5d26bd3b75 [lld-macho] Emit indirect symbol table
Makes it a little easier to read objdump's disassembly.

Reviewed By: #lld-macho, gkm

Differential Revision: https://reviews.llvm.org/D87178
2020-09-23 19:26:40 -07:00
Jez Ng abd70fb398 [lld-macho] Export trie addresses should be relative to the image base
We didn't notice this earlier this we were only testing the export trie
encoded in a dylib, whose image base starts at zero. But a regular
executable contains `__PAGEZERO`, which means it has a non-zero image
base. This bug was discovered after attempting to run some programs that
performed `dlopen` on an executable.

Reviewed By: #lld-macho, smeenai

Differential Revision: https://reviews.llvm.org/D87780
2020-09-20 20:43:15 -07:00
Jez Ng 0a7e56f74c [lld-macho] Mark weak symbols in symbol table
Reviewed By: #lld-macho, smeenai

Differential Revision: https://reviews.llvm.org/D86908
2020-09-20 20:43:14 -07:00
Jez Ng ae8fa1d8a6 [lld-macho][NFC] Define isHidden() in LinkEditSection
Since it's always true

Reviewed By: #lld-macho, smeenai

Differential Revision: https://reviews.llvm.org/D86749
2020-08-27 17:44:18 -07:00
Jez Ng ccbacdded4 [lld-macho] Weak locals should be relaxed too
Reviewed By: #lld-macho, smeenai

Differential Revision: https://reviews.llvm.org/D86746
2020-08-27 17:44:17 -07:00
Jez Ng 2a38dba7dd [lld-macho] Emit binding opcodes for defined symbols that override weak dysyms
These opcodes tell dyld to coalesce the overridden weak dysyms to this
particular symbol definition.

Reviewed By: #lld-macho, smeenai

Differential Revision: https://reviews.llvm.org/D86575
2020-08-27 17:44:16 -07:00
Jez Ng 3da2130e45 [lld-macho] Emit the right header flags for weak bindings/symbols
Reviewed By: #lld-macho, smeenai

Differential Revision: https://reviews.llvm.org/D86574
2020-08-27 17:44:16 -07:00
Jez Ng e263287c79 [lld-macho] Implement weak binding for branch relocations
Since there is no "weak lazy" lookup, function calls to weak symbols are
always non-lazily bound. We emit both regular non-lazy bindings as well
as weak bindings, in order that the weak bindings may overwrite the
non-lazy bindings if an appropriate symbol is found at runtime. However,
the bound addresses will still be written (non-lazily) into the
LazyPointerSection.

Reviewed By: #lld-macho, smeenai

Differential Revision: https://reviews.llvm.org/D86573
2020-08-27 17:44:15 -07:00
Jez Ng cbe27316ef [lld-macho] Implement weak bindings for GOT/TLV
Previously, we were only emitting regular bindings to weak
dynamic symbols; this diff adds support for the weak bindings too, which
can overwrite the regular bindings at runtime. We also treat weak
defined global symbols similarly -- since they can also be interposed at
runtime, they need to be treated as potentially dynamic symbols.

Note that weak bindings differ from regular bindings in that they do not
specify the dylib to do the lookup in (i.e. weak symbol lookup happens
in a flat namespace.)

Differential Revision: https://reviews.llvm.org/D86572
2020-08-26 19:21:09 -07:00
Jez Ng b84d72d893 [lld-macho][NFC] Handle GOT bindings and regular bindings more uniformly
Previously, the BindingEntry struct could only store bindings to offsets
within InputSections. Since the GOTSection and TLVPointerSections are
OutputSections, I handled those in a separate code path. However, this
makes it awkward to support weak bindings properly without code
duplication. This diff allows BindingEntries to point directly to
OutputSections, simplifying the upcoming weak binding implementation.

Along the way, I also converted a bunch of functions taking references
to symbols to take pointers instead. Given how much casting we do for
Symbol (especially in the upcoming weak binding diffs), it's cleaner
this way.

Differential Revision: https://reviews.llvm.org/D86571
2020-08-26 19:21:04 -07:00
Jez Ng 180ad756ec [lld-macho] Support larger dylib symbol ordinals in bindings
Do folks care if we don't have a test for this? Creating 16
dylibs to trigger this straightforward code path seems a little tedious

Reviewed By: #lld-macho, smeenai

Differential Revision: https://reviews.llvm.org/D85467
2020-08-12 19:50:25 -07:00
Jez Ng 3c9100fb78 [lld-macho] Support dynamic linking of thread-locals
References to symbols in dylibs work very similarly regardless of
whether the symbol is a TLV. The main difference is that we have a
separate `__thread_ptrs` section that acts as the GOT for these
thread-locals.

We can identify thread-locals in dylibs by a flag in their export trie
entries, and we cross-check it with the relocations that refer to them
to ensure that we are not using a GOT relocation to reference a
thread-local (or vice versa).

Reviewed By: #lld-macho, smeenai

Differential Revision: https://reviews.llvm.org/D85081
2020-08-12 19:50:09 -07:00
Jez Ng ca85e37338 [lld-macho] Support static linking of thread-locals
Note: What ELF refers to as "TLS", Mach-O seems to refer to as "TLV", i.e.
thread-local variables.

This diff implements support for TLV relocations that reference defined
symbols. On x86_64, TLV relocations are always used with movq opcodes, so for
defined TLVs, we don't need to create a synthetic section to store the
addresses of the symbols -- we can just convert the `movq` to a `leaq`.

One notable quirk of Mach-O's TLVs is that absolute-address relocations
inside TLV-defining sections behave differently -- their addresses are
no longer absolute, but relative to the start of the target section.
(AFAICT, RIP-relative relocations are not allowed in these sections.)

Reviewed By: #lld-macho, compnerd, smeenai

Differential Revision: https://reviews.llvm.org/D85080
2020-08-07 11:04:52 -07:00
Jez Ng 98210796e1 [lld-macho] Make __LINKEDIT sections contiguous
codesign (or more specifically libstuff) checks that each section in
__LINKEDIT ends where the next one starts -- no gaps are permitted. This
diff achieves it by aligning every section's start and end points to
WordSize.

Remarks: ld64 appears to satisfy the constraint by adding padding bytes
when generating the __LINKEDIT data, e.g. by emitting BIND_OPCODE_DONE
(which is a 0x0 byte) repeatedly. I think the approach this diff takes
is a bit more elegant, but I'm not sure if it's too restrictive. In
particular, it assumes padding always uses the zero byte. But we can
revisit this later.

Reviewed By: #lld-macho, compnerd

Differential Revision: https://reviews.llvm.org/D84718
2020-07-30 14:30:07 -07:00
Jez Ng 22e6648a18 [lld-macho] Implement -headerpad
Tools like `install_name_tool` and `codesign` may modify the Mach-O
header and increase its size. The linker has to provide padding to make this
possible. This diff does that, plus sets its default value to 32 bytes (which
is what ld64 does).

Unlike ld64, however, we lay out our sections *exactly* `-headerpad` bytes from
the header, whereas ld64 just treats the padding requirement as a lower bound.
ld64 actually starts laying out the non-header sections in the __TEXT segment
from the end of the (page-aligned) segment rather than the front, so its
binaries typically have more than `-headerpad` bytes of actual padding.
We should consider implementing the same alignment behavior.

Reviewed By: #lld-macho, compnerd

Differential Revision: https://reviews.llvm.org/D84714
2020-07-30 14:29:31 -07:00
Jez Ng 31d5885842 [lld-macho] Partial support for weak definitions
This diff adds support for weak definitions, though it doesn't handle weak
symbols in dylibs quite correctly -- we need to emit binding opcodes for them
in the weak binding section rather than the lazy binding section.

What *is* covered in this diff:

1. Reading the weak flag from symbol table / export trie, and writing it to the
   export trie
2. Refining the symbol table's rules for choosing one symbol definition over
   another. Wrote a few dozen test cases to make sure we were matching ld64's
   behavior.

We can now link basic C++ programs.

Reviewed By: #lld-macho, compnerd

Differential Revision: https://reviews.llvm.org/D83532
2020-07-24 15:55:25 -07:00
Jez Ng 53eb7fda51 [lld-macho] Support binding dysyms to any section
Previously, we only supported binding dysyms to the GOT. This
diff adds support for binding them to any arbitrary section. C++
programs appear to use this, I believe for vtables and type_info.

This diff also makes our bind opcode encoding a bit smarter -- we now
encode just the differences between bindings, which will make things
more compact.

I was initially concerned about the performance overhead of iterating
over these relocations, but it turns out that the number of such
relocations is small. A quick analysis of my llvm-project build
directory showed that < 1.3% out of ~7M relocations are RELOC_UNSIGNED
bindings to symbols (including both dynamic and static symbols).

Reviewed By: #lld-macho, smeenai

Differential Revision: https://reviews.llvm.org/D83103
2020-07-02 21:21:01 -07:00
Fangrui Song 40bc99538c [lld-macho] Remove using namespace llvm::MachO
llvm/include/llvm/TextAPI/MachO/ inappropriately uses the llvm::MachO namespace (this is for BinaryFormat and Object) and causes conflicts in some MSVC builds::

http://lab.llvm.org:8011/builders/sanitizer-windows/builds/65324/steps/stage%201%20build/logs/stdio

Removing `using namespace llvm::MachO` should decrease name collisions.
2020-06-24 13:46:13 -07:00
Fangrui Song 18db086dca [lld-macho] Use namespace qualifiers (macho::) instead of `namespace lld { namespace macho {`
See https://llvm.org/docs/CodingStandards.html#use-namespace-qualifiers-to-implement-previously-declared-functions
Similar to D79982.

Reviewed By: #lld-macho, int3

Differential Revision: https://reviews.llvm.org/D82432
2020-06-24 12:34:06 -07:00
Jez Ng 3646ee503d [lld-macho] Refactor segment/section creation, sorting, and merging
Summary:
There were a few issues with the previous setup:

1. The section sorting comparator used a declarative map of section names to
  determine the correct order, but it turns out we need to match on more than
  just names -- in particular, an upcoming diff will sort based on whether the
  S_ZERO_FILL flag is set. This diff changes the sorter to a more imperative but
  flexible form.

2. We were sorting OutputSections stored in a MapVector, which left the
  MapVector in an inconsistent state -- the wrong keys map to the wrong values!
  In practice, we weren't doing key lookups (only container iteration) after the
  sort, so this was fine, but it was still a dubious state of affairs. This diff
  copies the OutputSections to a vector before sorting them.

3. We were adding unneeded OutputSections to OutputSegments and then filtering
  them out later, which meant that we had to remember whether an OutputSegment
  was in a pre- or post-filtered state. This diff only adds the sections to the
  segments if they are needed.

In addition to those major changes, two minor ones worth noting:

1. I renamed all OutputSection variable names to `osec`, to parallel `isec`.
  Previously we were using some inconsistent combination of `osec`, `os`, and
  `section`.

2. I added a check (and a test) for InputSections with names that clashed with
  those of our synthetic OutputSections.

Reviewers: #lld-macho

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D81887
2020-06-21 17:13:59 -07:00
Jez Ng 74871cdad7 [lld-macho] Ensure __bss sections we output have file offset of zero
Summary:
llvm-mc emits `__bss` sections with an offset of zero, but we weren't expecting
that in our input, so we were copying non-zero data from the start of the file and
putting it in `__bss`, with obviously undesirable runtime results. (It appears that
the kernel will copy those nonzero bytes as long as the offset is nonzero, regardless
of whether S_ZERO_FILL is set.)

I debated on whether to make a special ZeroFillSection -- separate from a
regular InputSection -- but it seemed like too much work for now. But I'm happy
to refactor if anyone feels strongly about having it as a separate class.

Depends on D80857.

Reviewers: ruiu, pcc, MaskRay, smeenai, alexshap, gkm, Ktwu, christylee

Reviewed By: smeenai

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D80859
2020-06-17 20:41:28 -07:00