Commit Graph

92 Commits

Author SHA1 Message Date
Jez Ng 3c19b4f34d [lld-macho] Skip over symbols in un-parsed debug info sections
clang appears to emit symbols in `__debug_aranges`, at least
for arm64... in the examples I've seen, it doesn't seem like those
symbols are referenced outside of `__DWARF`, so I think they're safe to
ignore. But hopefully @clayborg can confirm.

Reviewed By: clayborg

Differential Revision: https://reviews.llvm.org/D98073
2021-03-05 17:24:32 -05:00
Jez Ng fc011b5eb1 [lld-macho] Replace debug-info-related assert with FIXME
We'll need to properly handle object files with multiple source inputs
eventually, but remove the assert for now so we can successfully emit binaries
for testing.

Reviewed By: #lld-macho, smeenai

Differential Revision: https://reviews.llvm.org/D98067
2021-03-05 17:24:31 -05:00
Jez Ng 0d4dadc64c [lld-macho] Include install name in error messages for dylibs from TBDs
Since multiple dylibs can be defined in one TBD, this is
necessary to avoid confusion.

Reviewed By: #lld-macho, oontvoo

Differential Revision: https://reviews.llvm.org/D97905
2021-03-04 14:36:49 -05:00
Jez Ng 55a32812fa [lld-macho] Filter TAPI re-exports by target
Previously, we were loading re-exports without checking whether
they were compatible with our target. Prior to {D97209}, it meant that
we were defining dylib symbols that were invalid -- usually a silent
failure unless our binary actually used them. D97209 exposed this as an
explicit error.

Along the way, I've extended our TAPI compatibility check to cover the
platform as well, instead of just checking the arch. To this end, I've
replaced MachO::Architecture with MachO::Target in our Config struct.

Reviewed By: #lld-macho, oontvoo

Differential Revision: https://reviews.llvm.org/D97867
2021-03-04 14:36:47 -05:00
Jez Ng 8601be809e [lld-macho] Fix & fold reexport-nested-libs test into stub-link.s
The reexport-nested-libs test added in D97438 was a bit wonky.

First, it was linking against libReexportSystem.tbd which targets the
iOS simulator, and which in turn attempted to re-export the iOS
simulator's libSystem. However, due to the way `-syslibroot` works, it
was actually re-exporting the macOS libSystem.

As a result, the test was not actually able to resolve the symbols in
the desired libSystem. I'm guessing that @oontvoo was confused by this
and therefore included those symbols in libReexportSystem.tbd itself.
But this means that the test wasn't actually testing the resolution of
re-exported symbols (though it did at least verify that the re-exported
libraries could be located).

After some consideration, I figured that stub-link.s could be extended
to cover what reexport-nested-libs.s was attempting to do. The test
targets macOS, so we only have one `-syslibroot` and no chance of
confusion.

Reviewed By: #lld-macho, oontvoo

Differential Revision: https://reviews.llvm.org/D97866
2021-03-04 14:36:46 -05:00
Jez Ng 5d9aafc09a [lld-macho] Bind re-exported symbols directly to implicitly-linked umbrellas
Suppose we are linking against libFoo, which re-exports the
implicitly-bound libSystem, which in turn re-exports some
non-explicitly-bound library like `/usr/lib/system/libsystem_c.dylib`.
Then any bindings we have to a symbol in libsystem_c should use
libSystem (and not libFoo) as the umbrella library.

Reviewed By: #lld-macho, smeenai

Differential Revision: https://reviews.llvm.org/D97865
2021-03-04 14:36:44 -05:00
Jez Ng b63919e180 [lld-macho] Require -arch and -platform_version to always be specified
We previously defaulted to x86_64 and an unknown platform, which was fine when
we only supported one arch and did no platform checks, but that will no longer
be true going ahead. Therefore, we should require those flags to be specified
whenever the linker is invoked.

Note that LLD-ELF and ld64 both infer the arch from their input object files,
but the usefulness of that is questionable since clang will always specify these
flags, and most of the time `lld` will be invoked via clang.

Reviewed By: #lld-macho, thakis

Differential Revision: https://reviews.llvm.org/D97799
2021-03-03 15:52:10 -05:00
Greg McGary 4af1522a85 [lld-macho] Rework length check when opening input files
This reverts diff D97610 (commit 0223ab035c) and adds a one-line fix to verify that a `MemoryBufferRef` has sufficient length before reading a 4-byte magic number.

Differential Revision: https://reviews.llvm.org/D97757
2021-03-02 13:00:57 -08:00
Vy Nguyen 9a2e2de15f [lld-macho] Change loadReexport to handle the case where a TAPI re-exports to reference documents nested within other TBD.
Currently, it was delibrately impleneted to not handle this case, but as it has turnt out, we need this feature.
The concrete use case is
       `System/Library/Frameworks/Cocoa.framework/Versions/A/Cocoa` reexports
               /System/Library/Frameworks/AppKit.framework/Versions/C/AppKit , which then rexports
                    /System/Library/PrivateFrameworks/UIFoundation.framework/Versions/A/UIFoundation

The current implemention uses a global currentTopLevelTapi, which is not reset until it finishes loading the whole tree.
This is a problem because if the top-level is set to Cocoa, then when we get to UIFoundation, it will try to find UIFoundation in the current top level, which is Cocoa and will not find it.

The right thing should be:
 - When loading a library from a TBD file, re-exports need to be looked up in the auxiliary documents within the same TBD.
 - When loading from an actual dylib, no additional TBD documents need to be examined.
 - In no case does a re-export mentioned in one TBD file need to be looked up in a document in an auxiliary document from a different TBD file

Differential Revision: https://reviews.llvm.org/D97438
2021-03-02 12:14:31 -05:00
Nico Weber 8174f33dc9 [lld/mac] Add support for -flat_namespace
-flat_namespace makes lld emit binaries that use name lookup that's more in
line with other POSIX systems: Instead of looking up symbols as (dylib,name)
pairs by dyld, they're instead looked up just by name.

-flat_namespace has three effects:

1. MH_TWOLEVEL and MH_NNOUNDEFS are no longer set in the Mach-O header
2. All symbols use BIND_SPECIAL_DYLIB_FLAT_LOOKUP as ordinal
3. When a dylib is added to the link, its dependent dylibs are also added,
   so that lld can verify that no undefined symbols remain at the end of
   a link with -flat_namespace. These transitive dylibs are added for symbol
   resolution, but they are not emitted in LC_LOAD_COMMANDs.

-undefined with -flat_namespace still isn't implemented. Before this change,
it was impossible to hit that combination because -flat_namespace caused a
diagnostic. Now that it no longer does, emit a dedicated temporary diagnostic
when both flags are used.

Differential Revision: https://reviews.llvm.org/D97641
2021-03-01 15:25:10 -05:00
Jez Ng f083f652c3 [lld-macho][nfc] Remove TODO regarding addends
There was initially some concern around the correct handling of pcrel
section relocations with r_length != 2. But it looks like there are no such
relocations in practice -- x86_64's pcrel section relocs all have r_length == 2,
and ARM64 doesn't even have pcrel section relocs. So we can replace the TODO
with an assert.

Reviewed By: #lld-macho, thakis

Differential Revision: https://reviews.llvm.org/D97576
2021-03-01 12:30:08 -05:00
Greg McGary 0223ab035c [lld-macho] check minimum header length when opening linkable input files
Bifurcate the `readFile()` API into ...
* `readRawFile()` which performs no checks, and
* `readLinkableFile()` which enforces minimum length of 20 bytes, same as ld64

There are no new tests because tweaks to existing tests are sufficient.

Differential Revision: https://reviews.llvm.org/D97610
2021-02-27 14:41:40 -08:00
Jez Ng 82b3da6f6f [lld-macho] Extract embedded addends for arm64 UNSIGNED relocations
On arm64, UNSIGNED relocs are the only ones that use embedded addends
instead of the ADDEND relocation.

Also ensure that the addend works when UNSIGNED is part of a SUBTRACTOR
pair.

Reviewed By: #lld-macho, alexshap

Differential Revision: https://reviews.llvm.org/D97105
2021-02-27 12:31:34 -05:00
Jez Ng 541390131e [lld-macho] Don't emit rebase opcodes for subtractor minuend relocs
Also add a few asserts to verify that we are indeed handling an
UNSIGNED relocation as the minued. I haven't made it an actual
user-facing error since I don't think llvm-mc is capable of generating
SUBTRACTOR relocations without an associated UNSIGNED.

Reviewed By: #lld-macho, smeenai

Differential Revision: https://reviews.llvm.org/D97103
2021-02-27 12:31:34 -05:00
Jez Ng 84579fc24f [lld-macho] Basic support for linkage and visibility attributes in LTO
When parsing bitcode, convert LTO Symbols to LLD Symbols in order to perform
resolution. The "winning" symbol will then be marked as Prevailing at LTO
compilation time. This is similar to what the other LLD ports do.

This change allows us to handle `linkonce` symbols correctly, and to deal with
duplicate bitcode symbols gracefully. Previously, both scenarios would result in
an assertion failure inside the LTO code, complaining that multiple Prevailing
definitions are not allowed.

While at it, I also added basic logic around visibility. We don't do anything
useful with it yet, but we do check that its value is valid. LLD-ELF appears to
use it only to set FinalDefinitionInLinkageUnit for LTO, which I think is just a
performance optimization.

From my local experimentation, the linker itself doesn't seem to do anything
differently when encountering linkonce / linkonce_odr / weak / weak_odr. So I've
only written a test for one of them. LLD-ELF has more, but they seem to mostly
be testing the intermediate bitcode output of their LTO backend...? I'm far from
an expert here though, so I might very well be missing things.

Reviewed By: #lld-macho, MaskRay, smeenai

Differential Revision: https://reviews.llvm.org/D94342
2021-02-25 13:27:40 -05:00
Jez Ng 4752cdc9a2 [lld-macho] Check for arch compatibility when loading ObjFiles and TBDs
The silent failures had confused me a few times.

I haven't added a similar check for platform yet as we don't yet have logic to
infer the platform automatically, and so adding that check would require
updating dozens of test files.

Reviewed By: #lld-macho, thakis, alexshap

Differential Revision: https://reviews.llvm.org/D97209
2021-02-23 22:02:38 -05:00
Jez Ng 5e851733c5 [lld-macho] Fix semantics & add tests for ARM64 GOT/TLV relocs
I've adjusted the RelocAttrBits to better fit the semantics of
the relocations. In particular:

1. *_UNSIGNED relocations are no longer marked with the `TLV` bit, even
   though they can occur within TLV sections. Instead the `TLV` bit is
   reserved for relocations that can reference thread-local symbols, and
   *_UNSIGNED relocations have their own `UNSIGNED` bit. The previous
   implementation caused TLV and regular UNSIGNED semantics to be
   conflated, resulting in rebase opcodes being incorrectly emitted for TLV
   relocations.

2. I've added a new `POINTER` bit to denote non-relaxable GOT
   relocations. This distinction isn't important on x86 -- the GOT
   relocations there are either relaxable or non-relaxable loads -- but
   arm64 has `GOT_LOAD_PAGE21` which loads the page that the referent
   symbol is in (regardless of whether the symbol ends up in the GOT). This
   relocation must reference a GOT symbol (so must have the `GOT` bit set)
   but isn't itself relaxable (so must not have the `LOAD` bit). The
   `POINTER` bit is used for relocations that *must* reference a GOT
   slot.

3. A similar situation occurs for TLV relocations.

4. ld64 supports both a pcrel and an absolute version of
   ARM64_RELOC_POINTER_TO_GOT. But the semantics of the absolute version
   are pretty weird -- it results in the value of the GOT slot being
   written, rather than the address. (That means a reference to a
   dynamically-bound slot will result in zeroes being written.) The
   programs I've tried linking don't use this form of the relocation, so
   I've dropped our partial support for it by removing the relevant
   RelocAttrBits.

Reviewed By: alexshap

Differential Revision: https://reviews.llvm.org/D97031
2021-02-23 22:02:38 -05:00
Jez Ng e5d780e049 [lld-macho] Use full input file name in invalid relocation error message
Just something I noticed while debugging arm relocations...

Reviewed By: #lld-macho, thakis

Differential Revision: https://reviews.llvm.org/D97078
2021-02-23 22:02:38 -05:00
Vy Nguyen 5a856f5b44 Reland [lld-macho]Implement bundle_loader
Reland 1a0afcf518
  https://reviews.llvm.org/D95913

New change: fix UB bug caused by copying empty path/name. (since the executable does not have a name)
2021-02-22 14:05:12 -05:00
Vitaly Buka c17547df44 Revert "Implement -bundle_loader"
D95913 passes null pointer into memcpy

This reverts commit 1a0afcf518.
2021-02-19 17:40:07 -08:00
Vy Nguyen 1a0afcf518 Implement -bundle_loader
Differential Revision: https://reviews.llvm.org/D95913

Usage: -bundle_loader <executable>
This option specifies the executable that will load the build output file being linked.
When building a bundle, users can use the --bundle_loader  to specify an executable
that contains symbols referenced, but not implemented in the bundle.
2021-02-18 16:11:37 -05:00
Greg McGary 87104faac4 [lld-macho] Add ARM64 target arch
This is an initial base commit for ARM64 target arch support. I don't represent that it complete or bug-free, but wish to put it out for review now that some basic things like branch target & load/store address relocs are working.

I can add more tests to this base commit, or add them in follow-up commits.

It is not entirely clear whether I use the "ARM64" (Apple) or "AArch64" (non-Apple) naming convention. Guidance is appreciated.

Differential Revision: https://reviews.llvm.org/D88629
2021-02-08 18:14:07 -07:00
Jez Ng f843bb82c0 [lld-macho] Force-loading should share code path with regular archive loads
This extends {D92539} to work even when we are loading archive
members via `-force_load`. I uncovered this issue while trying to
force-load archives containing bitcode -- we were segfaulting.

In addition to fixing the `-force_load` case, this diff also addresses
the behavior of `-ObjC` when LTO bitcode is involved -- we need to
force-load those archive members if they contain ObjC categories.

Reviewed By: #lld-macho, smeenai

Differential Revision: https://reviews.llvm.org/D95265
2021-02-03 13:43:47 -05:00
Jez Ng 163dcd8513 [lld-macho] Associate each Symbol with an InputFile
This makes our error messages more informative. But the bigger motivation is for
LTO symbol resolution, which will be in an upcoming diff. The changes in this
one are largely mechanical.

Reviewed By: #lld-macho, smeenai

Differential Revision: https://reviews.llvm.org/D94316
2021-02-03 13:43:47 -05:00
Greg McGary 3a9d2f1488 [lld-macho][NFC] refactor relocation handling
Add per-reloc-type attribute bits and migrate code from per-target file into target independent code, driven by reloc attributes.

Many cleanups

Differential Revision: https://reviews.llvm.org/D95121
2021-02-02 10:54:53 -07:00
Jez Ng e98b441a09 [lld-macho] Remove unnecessary llvm:: namespace prefixes 2021-01-09 12:44:35 -05:00
Nico Weber 13f439a187 [lld/mac] Implement support for private extern symbols
Private extern symbols are used for things scoped to the linkage unit.
They cause duplicate symbol errors (so they're in the symbol table,
unlike TU-scoped truly local symbols), but they don't make it into the
export trie. They are created e.g. by compiling with
-fvisibility=hidden.

If two weak symbols have differing privateness, the combined symbol is
non-private external. (Example: inline functions and some TUs that
include the header defining it were built with
-fvisibility-inlines-hidden and some weren't).

A weak private external symbol implicitly has its "weak" dropped and
behaves like a regular strong private external symbol: Weak is an export
trie concept, and private symbols are not in the export trie.

If a weak and a strong symbol have different privateness, the strong
symbol wins.

If two common symbols have differing privateness, the larger symbol
wins. If they have the same size, the privateness of the symbol seen
later during the link wins (!) -- this is a bit lame, but it matches
ld64 and this behavior takes 2 lines less to implement than the less
surprising "result is non-private external), so match ld64.
(Example: `int a` in two .c files, both built with -fcommon,
one built with -fvisibility=hidden and one without.)

This also makes `__dyld_private` a true TU-local symbol, matching ld64.
To make this work, make the `const char*` StringRefZ ctor to correctly
set `size` (without this, writing the string table crashed when calling
getName() on the __dyld_private symbol).

Mention in CommonSymbol's comment that common symbols are now disabled
by default in clang.

Mention in -keep_private_externs's HelpText that the flag only has an
effect with `-r` (which we don't implement yet -- so this patch here
doesn't regress any behavior around -r + -keep_private_externs)). ld64
doesn't explicitly document it, but the commit text of
http://reviews.llvm.org/rL216146 does, and ld64's
OutputFile::buildSymbolTable() checks `_options.outputKind() ==
Options::kObjectFile` before calling `_options.keepPrivateExterns()`
(the only reference to that function).

Fixes PR48536.

Differential Revision: https://reviews.llvm.org/D93609
2020-12-21 21:23:33 -05:00
Greg McGary d4ec3346b1 [lld-macho][nfc] Refactor to accommodate paired relocs
This is a refactor to pave the way for supporting paired-ADDEND for ARM64. The only paired reloc type for X86_64 is SUBTRACTOR. In a later diff, I will add SUBTRACTOR for both X86_64 and ARM64.

* s/`getImplicitAddend`/`getAddend`/ because it handles all forms of addend: implicit, explicit, paired.
* add predicate `bool isPairedReloc()`
* check range of `relInfo.r_symbolnum` is internal, unrelated to user-input, so use `assert()`, not `error()`
* minor cleanups & rearrangements in `InputFile::parseRelocations()`

Differential Revision: https://reviews.llvm.org/D90614
2020-12-17 20:21:41 -08:00
Jez Ng 4c8276cdc1 [lld-macho] Use LC_LOAD_WEAK_DYLIB for dylibs with only weakrefs
Note that dylibs without *any* refs will still be loaded in the usual
(strong) fashion.

Reviewed By: #lld-macho, thakis

Differential Revision: https://reviews.llvm.org/D93435
2020-12-17 08:49:17 -05:00
Jez Ng 811444d7a1 [lld-macho] Add support for weak references
Weak references need not necessarily be satisfied at runtime (but they must
still be satisfied at link time). So symbol resolution still works as per usual,
but we now pass around a flag -- ultimately emitting it in the bind table -- to
indicate if a given dylib symbol is a weak reference.

ld64's behavior for symbols that have both weak and strong references is
a bit bizarre. For non-function symbols, it will emit a weak import. For
function symbols (those referenced by BRANCH relocs), it will emit a
regular import. I'm not sure what value there is in that behavior, and
since emulating it will make our implementation more complex, I've
decided to treat regular weakrefs like function symbol ones for now.

Fixes PR48511.

Reviewed By: #lld-macho, thakis

Differential Revision: https://reviews.llvm.org/D93369
2020-12-17 08:49:16 -05:00
Nico Weber ec88746a05 [lld/mac] fill in current and compatibility version for LC_LOAD_(WEAK_)DYLIB
Not sure if anything actually depends on this, but it makes `otool -L`
output look nicer.

Differential Revision: https://reviews.llvm.org/D93332
2020-12-15 19:34:59 -05:00
Jez Ng 3aa8e071dd [lld-macho] Add implicit dylib support for frameworks
{D93000} applied to frameworks. Partial fix for PR48511.

Reviewed By: #lld-macho, thakis

Differential Revision: https://reviews.llvm.org/D93277
2020-12-15 15:58:26 -05:00
Jez Ng 544148ae70 [lld-macho] -weak_{library,framework} should always take priority
We were not setting forceWeakImport for file paths given by
`-weak_library` if we had already loaded the file. This diff fixes that
by having `loadDylib` return a cached DylibFile instance even if we have
already loaded that file.

We still avoid emitting multiple LC_LOAD_DYLIBs, but we achieve this by
making inputFiles a SetVector instead of relying on the `loadedDylibs`
cache.

Reviewed By: #lld-macho, smeenai

Differential Revision: https://reviews.llvm.org/D93255
2020-12-15 15:58:26 -05:00
Jez Ng 76c36c11a9 [lld-macho] Don't load dylibs more than once
Also remove `DylibFile::reexported` since it's unused.

Fixes llvm.org/PR48393.

Reviewed By: thakis

Differential Revision: https://reviews.llvm.org/D93001
2020-12-10 15:57:52 -08:00
Jez Ng 6a348f6158 [lld-macho] Implement `-no_implicit_dylibs`
Dylibs that are "public" -- i.e. top-level system libraries -- are considered
implicitly linked when another library re-exports them. That is, we should load
them & bind directly to their symbols instead of via their re-exporting
umbrella library. This diff implements that behavior by default, as well as an
opt-out flag.

In theory, this is just a performance optimization, but in practice it seems
that it's needed for correctness.

Fixes llvm.org/PR48395.

Reviewed By: thakis

Differential Revision: https://reviews.llvm.org/D93000
2020-12-10 15:57:52 -08:00
Jez Ng 863f7a745e [lld-macho] Don't attempt to emit rebase opcodes for debug sections
This was causing a crash as we were attempting to look up the
nonexistent parent OutputSection of the debug sections. We didn't detect
it earlier because there was no test for PIEs with debug info (PIEs
require us to emit rebases for X86_64_RELOC_UNSIGNED).

This diff filters out the debug sections while loading the ObjFiles. In
addition to fixing the above problem, it also lets us avoid doing
redundant work -- we no longer parse / apply relocations / attempt to
emit dyld opcodes for these sections that we don't emit.

Fixes llvm.org/PR48392.

Reviewed By: thakis

Differential Revision: https://reviews.llvm.org/D92904
2020-12-10 15:57:51 -08:00
Jez Ng 78976bf3da [lld-macho] Support parsing of bitcode within archives
Also error out if we find anything other than an object or bitcode file
in the archive.

Note that we were previously inserting the symbols and sections of the
unpacked ObjFile into the containing ArchiveFile. This was actually
unnecessary -- we can just insert the ObjectFile (or BitcodeFile) into
the `inputFiles` vector. This is the approach taken by LLD-ELF.

Reviewed By: thakis

Differential Revision: https://reviews.llvm.org/D92539
2020-12-08 10:34:32 -08:00
Jez Ng 7b007ac080 [lld-macho][nfc] Move some methods from InputFile to ObjFile
Additionally:
1. Move the helper functions in InputSection.h below the definition of
   `InputSection`, so the important stuff is on top
2. Remove unnecessary `explicit`

Reviewed By: #lld-macho, compnerd

Differential Revision: https://reviews.llvm.org/D92453
2020-12-08 10:34:32 -08:00
Nico Weber 16b1f6e385 [mac/lld] Add support for the LC_LINKER_OPTION load command in o files
clang puts `-framework CoreFoundation` in this load command for files
that use @available / __builtin_available. Without support for this,
binaries that don't explicitly link to CoreFoundation fail to link.

Differential Revision: https://reviews.llvm.org/D92624
2020-12-04 08:46:53 -05:00
Nico Weber 7cb0a373d1 [mac/lld] Implement -t
Goes well with `-why_load` to get an idea of load order.

Differential Revision: https://reviews.llvm.org/D92583
2020-12-03 16:02:38 -05:00
Nico Weber 3422f3cc6e Reland "[mac/lld] Implement -why_load".
The problem was that `sym` became replaced in the call
to make<ObjFile> and referring to it afer that read memory that now
stored a different kind of symbol (a Defined instead of a LazySymbol).
Since this happens only once per archive, just copy the symbol to the
stack before make<ObjFile> and read the copy instead.

Originally reviewed at https://reviews.llvm.org/D92496
2020-12-03 08:35:12 -05:00
Nico Weber ea0029f55d Revert "[mac/lld] Implement -why_load"
This reverts commit 542d3b609d.
Seems to break check-lld. Reverting while I take a look.
2020-12-02 18:57:46 -05:00
Nico Weber 542d3b609d [mac/lld] Implement -why_load
This is useful for debugging why lld loads .o files it shouldn't load.
It's also useful for users of lld -- I've used ld64's version of this a
few times.

Differential Revision: https://reviews.llvm.org/D92496
2020-12-02 18:33:12 -05:00
Nico Weber ca634393fc [mac/lld] Make --reproduce work with thin archives
See http://reviews.llvm.org/rL268229 and
http://reviews.llvm.org/rL313832 which did the same for the ELF port.

Differential Revision: https://reviews.llvm.org/D92456
2020-12-02 09:48:31 -05:00
Nico Weber b2f00f24a3 [mac/lld] Include archive name in diagnostics
Also, for .o files, include full path as given on link command line.

Before:
    lld: error: undefined symbol [...], referenced from sandbox_logging.o

After:
    lld: error: undefined symbol [...], referenced from libseatbelt.a(sandbox_logging.o)

Move archiveName up to InputFile so we can consistently use toString()
to print InputFiles in diags, and pass it to the ObjFile ctor. This
matches the ELF and COFF ports.

Differential Revision: https://reviews.llvm.org/D92437
2020-12-01 23:00:25 -05:00
Nico Weber 07ab597bb0 [lld/mac] Fix issues around thin archives
- most importantly, fix a use-after-free when using thin archives,
  by putting the archive unique_ptr to the arena allocator. This
  ports D65565 to MachO

- correctly demangle symbol namess from archives in diagnostics

- add a test for thin archives -- it finds this UaF, but only when
  running it under asan (it also finds the demangling fix)

- make forceLoadArchive() use addFile() with a bool to have the archive
  loading code in fewer places. no behavior change; matches COFF port a
  bit better

Differential Revision: https://reviews.llvm.org/D92360
2020-12-01 18:48:29 -05:00
Jez Ng 78f6498cdc [lld-macho] Flesh out STABS implementation
This addresses a lot of the comments in {D89257}. Ideally it'd have been
done in the same diff, but the commits in between make that difficult.

This diff implements:
* N_GSYM and N_STSYM, the STABS for global and static symbols
* Has the STABS reflect the section IDs of their referent symbols
* Ensures we don't fail when encountering absolute symbols or files with
  no debug info
* Sorts STABS symbols by file to minimize the number of N_OSO entries

Reviewed By: clayborg

Differential Revision: https://reviews.llvm.org/D92366
2020-12-01 15:05:21 -08:00
Jez Ng b768d57b36 [lld-macho] Add archive name and file modtime to STABS output
We should also set the modtime when running LTO. That will be done in a
future diff, together with support for the `-object_path_lto` flag.

Reviewed By: clayborg

Differential Revision: https://reviews.llvm.org/D91318
2020-12-01 15:05:21 -08:00
Jez Ng 3fcb0eeb15 [lld-macho] Emit STABS symbols for debugging, and drop debug sections
Debug sections contain a large amount of data. In order not to bloat the size
of the final binary, we remove them and instead emit STABS symbols for
`dsymutil` and the debugger to locate their contents in the object files.

With this diff, `dsymutil` is able to locate the debug info. However, we need
a few more features before `lldb` is able to work well with our binaries --
e.g. having `LC_DYSYMTAB` accurately reflect the number of local symbols,
emitting `LC_UUID`, and more. Those will be handled in follow-up diffs.

Note also that the STABS we emit differ slightly from what ld64 does. First, we
emit the path to the source file as one `N_SO` symbol instead of two. (`ld64`
emits one `N_SO` for the dirname and one of the basename.) Second, we do not
emit `N_BNSYM` and `N_ENSYM` STABS to mark the start and end of functions,
because the `N_FUN` STABS already serve that purpose. @clayborg recommended
these changes based on his knowledge of what the debugging tools look for.

Additionally, this current implementation doesn't accurately reflect the size
of function symbols. It uses the size of their containing sectioins as a proxy,
but that is only accurate if `.subsections_with_symbols` is set, and if there
isn't an `N_ALT_ENTRY` in that particular subsection. I think we have two
options to solve this:

1. We can split up subsections by symbol even if `.subsections_with_symbols`
   is not set, but include constraints to ensure those subsections retain
   their order in the final output. This is `ld64`'s approach.
2. We could just add a `size` field to our `Symbol` class. This seems simpler,
   and I'm more inclined toward it, but I'm not sure if there are use cases
   that it doesn't handle well. As such I'm punting on the decision for now.

Reviewed By: clayborg

Differential Revision: https://reviews.llvm.org/D89257
2020-12-01 15:05:20 -08:00
Nico Weber 83e60f5a55 [lld/mac] Add --reproduce option
This adds support for ld.lld's --reproduce / lld-link's /reproduce:
flag to the MachO port. This flag can be added to a link command
to make the link write a tar file containing all inputs to the link
and a response file containing the link command. This can be used
to reproduce the link on another machine, which is useful for sharing
bug report inputs or performance test loads.

Since the linker is usually called through the clang driver and
adding linker flags can be a bit cumbersome, setting the env var
`LLD_REPRODUCE=foo.tar` triggers the feature as well.

The file response.txt in the archive can be used with
`ld64.lld.darwinnew $(cat response.txt)` as long as the contents are
smaller than the command-line limit, or with `ld64.lld.darwinnew
@response.txt` once D92149 is in.

The support in this patch is sufficient to create a tar file for
Chromium's base_unittests that can link after unpacking on a different
machine.

Differential Revision: https://reviews.llvm.org/D92274
2020-11-30 08:40:21 -05:00