Commit Graph

72 Commits

Author SHA1 Message Date
Keith Smiley 0bc100986c
[lld-macho] Add support for -alias
This creates a symbol alias similar to --defsym in the elf linker. This
is used by swiftpm for all executables, so it's useful to support. This
doesn't implement -alias_list but that could be done pretty easily as
needed.

Differential Revision: https://reviews.llvm.org/D129938
2022-07-19 13:55:56 -07:00
Kaining Zhong 6c641d0de6 [lld-macho] Handle user-provided dtrace symbols to avoid linking failure
This fixes https://github.com/llvm/llvm-project/issues/56238. ld64.lld currently does not generate __dof section in Mach-O, and -no_dtrace_dof option is on by default. However when there are user-defined dtrace symbols, ld64.lld will treat them as undefined symbols, which causes the linking to fail because lld cannot find their definitions. This patch allows ld64.lld to rewrite the instructions calling dtrace symbols to instructions like nop as what ld64 does; therefore, when encountered with user-provided dtrace probes, the linking can still succeed.

I'm not sure whether support for dtrace is expected in lld, so for now I didn't add codes to make lld emit __dof section like ld64, and only made it possible to link with dtrace symbols provided. If this feature is needed, I can add that part in Dtrace.cpp & Dtrace.h.

Reviewed By: int3, #lld-macho

Differential Revision: https://reviews.llvm.org/D129062
2022-07-11 15:32:26 -04:00
Daniel Bertalan ed39fd515a [lld-macho] Use source information in duplicate symbol errors
Similarly to how undefined symbol diagnostics were changed in D128184,
we now show where in the source file duplicate symbols are defined at:

  ld64.lld: error: duplicate symbol: _foo
  >> defined in bar.c:42
  >>            /path/to/bar.o
  >> defined in baz.c:1
  >>            /path/to/libbaz.a(baz.o)

For objects that don't contain DWARF data, the format is unchanged.

A slight difference to undefined symbol diagnostics is that we don't
print the name of the symbol on the third line, as it's already
contained on the first line.

Differential Revision: https://reviews.llvm.org/D128425
2022-06-23 11:07:15 -04:00
Daniel Bertalan 5792797c5b Reland "[lld-macho] Show source information for undefined references"
The error used to look like this:

  ld64.lld: error: undefined symbol: _foo
  >>> referenced by /path/to/bar.o:(symbol _baz+0x4)

If DWARF line information is available, we now show where in the source
the references are coming from:

  ld64.lld: error: unreferenced symbol: _foo
  >>> referenced by: bar.cpp:42 (/path/to/bar.cpp:42)
  >>>                /path/to/bar.o:(symbol _baz+0x4)

The reland is identical to the first time this landed. The fix was in D128294.
This reverts commit 0cc7ad4175.

Differential Revision: https://reviews.llvm.org/D128184
2022-06-21 18:50:06 -04:00
Nico Weber 0cc7ad4175 Revert "[lld-macho] Show source information for undefined references"
This reverts commit cd7624f153.
See https://reviews.llvm.org/D128184#3597534
2022-06-20 19:15:57 -04:00
Daniel Bertalan cd7624f153 [lld-macho] Show source information for undefined references
The error used to look like this:

  ld64.lld: error: undefined symbol: _foo
  >>> referenced by /path/to/bar.o:(symbol _baz+0x4)

If DWARF line information is available, we now show where in the source
the references are coming from:

  ld64.lld: error: unreferenced symbol: _foo
  >>> referenced by: bar.cpp:42 (/path/to/bar.cpp:42)
  >>>                /path/to/bar.o:(symbol _baz+0x4)

Differential Revision: https://reviews.llvm.org/D128184
2022-06-20 18:49:42 -04:00
Daniel Bertalan 0eec7e2a89 Reland "[lld-macho] Group undefined symbol diagnostics by symbol".
This reverts commit 36e7c9a450.

This relands d61341768c with the fix described in
https://reviews.llvm.org/D127753#3587390
2022-06-15 19:22:39 -04:00
Stella Stamenova 36e7c9a450 Revert "[lld-macho] Group undefined symbol diagnostics by symbol"
This reverts commit d61341768c.

This change broke multiple lld tests, including some sanitizer builds: https://lab.llvm.org/buildbot/#/builders/5/builds/24787/steps/19/logs/stdio
2022-06-15 15:42:26 -07:00
Daniel Bertalan d61341768c [lld-macho] Group undefined symbol diagnostics by symbol
ld64.lld used to print the "undefined symbol" line for each reference to
an undefined symbol previously:

  ld64.lld: error: undefined symbol: _foo
  >>> referenced by /path/to/bar.o:(symbol _baz+0x0)

  ld64.lld: error: undefined symbol: _foo
  >>> referenced by /path/to/bar.o:(symbol _quux+0x1)

Now they are deduplicated:

  ld64.lld: error: undefined symbol: _foo
  >>> referenced by /path/to/bar.o:(symbol _baz+0x0)
  >>> referenced by /path/to/bar.o:(symbol _quux+0x1)

As with the other lld ports, only the first 3 references are printed.

Differential Revision: https://reviews.llvm.org/D127753
2022-06-14 16:38:11 -04:00
Daniel Bertalan f2e92cf60e [lld-macho] Print the name of functions containing undefined references
The error used to look like this:

  ld64.lld: error: undefined symbol: _foo
  >>> referenced by /path/to/bar.o

Now it displays the name of the function that contains the undefined
reference as well:

  ld64.lld: error: undefined symbol: _foo
  >>> referenced by /path/to/bar.o:(symbol _baz+0x4)

Differential Revision: https://reviews.llvm.org/D127696
2022-06-14 09:41:28 -04:00
Vy Nguyen 66bd14697b [lld-macho] Demangle symbol names in duplicate-symbol error when -demangle is specified
Differential Revision: https://reviews.llvm.org/D127110
2022-06-06 15:12:26 -04:00
Jez Ng 1cff723ff5 [lld-macho][nfc] Use includeInSymtab for all symtab-skipping logic
{D123302} got me looking deeper at `includeInSymtab`. I thought it was a
little odd that there were excluded (live) symbols for which
`includeInSymtab` was false; we shouldn't have so many different ways to
exclude a symbol. As such, this diff makes the `L`-prefixed-symbol
exclusion code use `includeInSymtab` too. (Note that as part of our
support for `__eh_frame`, we will also be excluding all `__eh_frame`
symbols from the symtab in a future diff.)

Another thing I noticed is that the `emitStabs` code never has to deal
with excluded symbols because `SymtabSection::finalize()` already
filters them out. As such, I've updated the comments and asserts from
{D123302} to reflect this.

Reviewed By: #lld-macho, thakis

Differential Revision: https://reviews.llvm.org/D123433
2022-04-11 15:45:46 -04:00
Nico Weber 2cb3d28b17 [lld/mac] Add some comments and asserts
I was wondering if SymtabSection::emitStabs() should check
defined->includeInSymtab. Add asserts and comments explaining why that's not
necessary.

No behavior change.

Differential Revision: https://reviews.llvm.org/D123302
2022-04-07 15:43:28 -04:00
Jez Ng ceff23c6e3 [lld-macho] -flat_namespace for dylibs should make all externs interposable
All references to interposable symbols can be redirected at runtime to
point to a different symbol definition (with the same name). For
example, if both dylib A and B define symbol _foo, and we load A before
B at runtime, then all references to _foo within dylib B will point to
the definition in dylib A.

ld64 makes all extern symbols interposable when linking with
`-flat_namespace`.

TODO 1: Support `-interposable` and `-interposable_list`, which should
just be a matter of parsing those CLI flags and setting the
`Defined::interposable` bit.

TODO 2: Set Reloc::FinalDefinitionInLinkageUnit correctly with this info
(we are currently not setting it at all, so we're erring on the
conservative side, but we should help the LTO backend generate more
optimal code.)

Reviewed By: modimo, MaskRay

Differential Revision: https://reviews.llvm.org/D119294
2022-03-14 22:18:32 -04:00
Jez Ng 2b78ef06c2 [lld-macho][nfc] Eliminate InputSection::Shared
Earlier in LLD's evolution, I tried to create the illusion that
subsections were indistinguishable from "top-level" sections. Thus, even
though the subsections shared many common field values, I hid those
common values away in a private Shared struct (see D105305). More
recently, however, @gkm added a public `Section` struct in D113241 that
served as an explicit way to store values that are common to an entire
set of subsections (aka InputSections). Now that we have another "common
value" struct, `Shared` has been rendered redundant. All its fields can
be moved into `Section` instead, and the pointer to `Shared` can be replaced
with a pointer to `Section`.

This `Section` pointer also has the advantage of letting us inspect other
subsections easily, simplifying the implementation of {D118798}.

P.S. I do think that having both `Section` and `InputSection` makes for
a slightly confusing naming scheme. I considered renaming `InputSection`
to `Subsection`, but that would break the symmetry with `OutputSection`.
It would also make us deviate from LLD-ELF's naming scheme.

This change is perf-neutral on my 3.2 GHz 16-Core Intel Xeon W machine:

             base           diff           difference (95% CI)
  sys_time   1.258 ± 0.031  1.248 ± 0.023  [  -1.6% ..   +0.1%]
  user_time  3.659 ± 0.047  3.658 ± 0.041  [  -0.5% ..   +0.4%]
  wall_time  4.640 ± 0.085  4.625 ± 0.063  [  -1.0% ..   +0.3%]
  samples    49             61

There's also no stat sig change in RSS (as measured by `time -l`):

           base                         diff                           difference (95% CI)
  time     998038627.097 ± 13567305.958 1003327715.556 ± 15210451.236  [  -0.2% ..   +1.2%]
  samples  31                           36

Reviewed By: #lld-macho, oontvoo

Differential Revision: https://reviews.llvm.org/D118797
2022-02-03 19:55:42 -05:00
Fangrui Song 0aae2bf373 [lld-macho] Add --start-lib --end-lib
In ld.lld, when an ObjFile/BitcodeFile is read in --start-lib state, the file is
given archive semantics. --end-lib closes the previous --start-lib. A build
system can use this feature as an alternative to archives. This patch ports
the feature to lld-macho.

--start-lib and --end-lib are positional, unlike usual ld64 options.
I think the slight drawback does not matter as (a) reusing option names
make build systems convenient (b) `--start-lib a.o b.o --end-lib` conveys more
information than an alternative design: `-objlib a.o -objlib b.o` because
--start-lib makes it clear which objects are in the same conceptual archive.
This provides flexibility (c) `-objlib`/`-filelist` interaction may be weird.

Close https://github.com/llvm/llvm-project/issues/52931

Reviewed By: #lld-macho, Jez Ng, oontvoo

Differential Revision: https://reviews.llvm.org/D116913
2022-01-19 10:14:49 -08:00
Fangrui Song 97a5dccb7d [lld-macho] Rename LazySymbol to LazyArchive. NFC
D116913 will add LazyObject. Rename LazySymbol to LazyArchive to avoid confusion
and mirror ELF.

Reviewed By: #lld-macho, Jez Ng

Differential Revision: https://reviews.llvm.org/D116914
2022-01-11 16:49:06 -08:00
Fangrui Song 477bc36d3b [lld-macho] Change some global pointers to unique_ptr
Similar to D116143. My x86-64 `lld` is ~8KiB smaller.

Reviewed By: keith

Differential Revision: https://reviews.llvm.org/D116902
2022-01-10 19:39:14 -08:00
Jez Ng 1b44364714 [lld-macho] Unreferenced weak dylib symbols shouldn't fetch archive symbols
We were fetching archive symbols too eagerly, bloating binary size as well as
just screwing up binaries that expected to look up certain symbols only at
runtime.

Reviewed By: #lld-macho, oontvoo

Differential Revision: https://reviews.llvm.org/D115092
2021-12-05 15:11:44 -05:00
Vy Nguyen 9b29dae3ca [lld-macho] Allow exporting weak_def_can_be_hidden(AKA "autohide") symbols
autohide symbols behaves similarly to private_extern symbols.
However, LD64 allows exporting autohide symbols. LLD currently does not.
This patch allows LLD to export them.

Differential Revision: https://reviews.llvm.org/D113167
2021-11-12 21:57:30 -05:00
Vy Nguyen 2e1be96df6 Reland "[lld-macho] Fix assertion failure in registerCompactUnwind""
PR/52372

  Differential Revision: https://reviews.llvm.org/D112977

New changes:
- use llvm-otool instead of `otool` which doesn't in exist on non-OSX platforms
- add llvm-otool to the set of tools used by test so that the bot will use the <build_dir>/bin/llvm-otool instead of the unqualified `llvm-otool` (which may not exist)
- update tests since the latest (TOT) llvm-otool prints a space between two bytes and the old one doesn't.
2021-11-09 11:52:46 -05:00
Vy Nguyen eb4a517816 Revert "[lld-macho] Fix assertion failure in registerCompactUnwind"
broke windows build - reverting to investigate
This reverts commit b2d9258474.
2021-11-09 10:31:47 -05:00
Vy Nguyen b2d9258474 [lld-macho] Fix assertion failure in registerCompactUnwind
PR/52372

  Differential Revision: https://reviews.llvm.org/D112977
2021-11-09 10:08:17 -05:00
Jez Ng 002eda7056 [lld-macho] Associate compact unwind entries with function symbols
Compact unwind entries (CUEs) contain pointers to their respective
function symbols. However, during the link process, it's far more useful
to have pointers from the function symbol to the CUE than vice versa.
This diff adds that pointer in the form of `Defined::compactUnwind`.

In particular, when doing dead-stripping, we want to mark CUEs live when
their function symbol is live; and when doing ICF, we want to dedup
sections iff the symbols in that section have identical CUEs. In both
cases, we want to be able to locate the symbols within a given section,
as well as locate the CUEs belonging to those symbols. So this diff also
adds `InputSection::symbols`.

The ultimate goal of this refactor is to have ICF support dedup'ing
functions with unwind info, but that will be handled in subsequent
diffs. This diff focuses on simplifying `-dead_strip` --
`findFunctionsWithUnwindInfo` is no longer necessary, and
`Defined::isLive()` is now a lot simpler. Moreover, UnwindInfoSection no
longer has to check for dead CUEs -- we simply avoid adding them in the
first place.

Additionally, we now support stripping of dead LSDAs, which follows
quite naturally since `markLive()` can now reach them via the CUEs.

Reviewed By: #lld-macho, gkm

Differential Revision: https://reviews.llvm.org/D109944
2021-10-26 16:04:15 -04:00
Nico Weber 80caa1eb4a [lld/mac] Add support for segment$start$ and segment$end$ symbols
These symbols are somewhat interesting in that they create non-existing
segments, which as far as I know is the only way to create segments
that don't contain any sections.

Final part of part of PR50760. Like D106629, but for segments instead
of sections. I'm not aware of anything that needs this in practice.

Differential Revision: https://reviews.llvm.org/D106767
2021-07-25 18:25:13 -04:00
Nico Weber 04e8d0b62d [lld/mac] Implement support for section$start and section$ end symbols
With this, libclang_rt.profile_osx.a can be linked, that is coverage
and PGO-instrumented builds should now work with lld.

section$start and section$end symbols can create non-existing sections.
They're also undefined symbols that are only magic if there isn't a
regular symbol with their name, which means the need to be handled
in treatUndefined() instead of just looping over all existing
sections and adding start and end symbols like the ELF port does.

To represent the actual symbols, this uses absolute symbols that
get their value updated once an output section is layed out.

segment$start and segment$end are still missing for now, but they produce a
nicer error message after this patch.

Main part of PR50760.

Differential Revision: https://reviews.llvm.org/D106629
2021-07-23 16:01:09 -04:00
Nico Weber 2d6fb62ef2 [lld/mac] Handle symbols from -U in treatUndefinedSymbol()
In ld64, `-U section$start$FOO$bar` handles `section$start$FOO$bar`
as a regular `section$start` symbol, that is section$start processing
happens before -U processing.

Likely, nobody uses that in practice so it doesn't seem very important
to be compatible with this, but it also moves the -U handling code next
to the `-undefined dynamic_lookup` handling code, which is nice because
they do the same thing. And, in fact, this did identify a bug in a corner
case in the intersection of `-undefined dynamic_lookup` and dead-stripping
(fix for that in D106565).

Vaguely related to PR50760.

No interesting behavior change.

Differential Revision: https://reviews.llvm.org/D106566
2021-07-22 19:43:57 -04:00
Nico Weber 64be5b7d87 [lld/mac] Implement -arch_multiple
This is the other flag clang passes when calling clang with two -arch
flags (which means with this, `clang -arch x86_64 -arch arm64 -fuse-ld=lld ...`
now no longer prints any warnings \o/). Since clang calls the linker several
times in that setup, it's not clear to the user from which invocation the
errors are. The flag's help text is

    Specifies that the linker should augment error and warning messages
    with the architecture name.

In ld64, the only effect of the flag is that undefined symbols are prefaced
with

    Undefined symbols for architecture x86_64:

instead of the usual "Undefined symbols:". So for now, let's add this
only to undefined symbol errors too. That's probably the most common
linker diagnostic.

Another idea would be to prefix errors and warnings with "ld64.lld(x86_64):"
instead of the usual "ld64.lld:", but I'm not sure if people would
misunderstand that as a comment about the arch of ld itself.
But open to suggestions on what effect this flag should have :) And we
don't have to get it perfect now, we can iterate on it.

Differential Revision: https://reviews.llvm.org/D105450
2021-07-06 00:25:18 -04:00
Jez Ng f6b6e72143 [lld-macho] Factor out common InputSection members
We have been creating many ConcatInputSections with identical values due
to .subsections_via_symbols. This diff factors out the identical values
into a Shared struct, to reduce memory consumption and make copying
cheaper.

I also changed `callSiteCount` from a uint32_t to a 31-bit field to save an
extra word.

All in all, this takes InputSection from 120 to 72 bytes (and
ConcatInputSection from 160 to 112 bytes), i.e. 30% size reduction in
ConcatInputSection.

Numbers for linking chromium_framework on my 3.2 GHz 16-Core Intel Xeon W:

      N           Min           Max        Median           Avg        Stddev
  x  20          4.14          4.24          4.18         4.183   0.027548999
  +  20          4.04          4.11         4.075        4.0775   0.018027756
  Difference at 95.0% confidence
          -0.1055 +/- 0.0149005
          -2.52211% +/- 0.356215%
          (Student's t, pooled s = 0.0232803)

Reviewed By: #lld-macho, thakis

Differential Revision: https://reviews.llvm.org/D105305
2021-07-01 21:22:39 -04:00
Jez Ng 7f2ba39b16 [lld-macho][nfc] Move liveness-tracking fields into ConcatInputSection
These fields currently live in the parent InputSection class,
but they should be specific to ConcatInputSection, since the other
InputSection classes (that contain literals) aren't atomically live or
dead -- rather their component string/int literals should have
individual liveness states. (An upcoming diff will add liveness bits for
StringPieces and fixed-sized literals.)

I also factored out some asserts for isCoalescedWeak() in MarkLive.cpp.
We now avoid putting coalesced sections in the `inputSections` vector,
so we don't have to check/assert against it everywhere.

Reviewed By: #lld-macho, thakis

Differential Revision: https://reviews.llvm.org/D103977
2021-06-11 19:50:08 -04:00
Nico Weber a5645513db [lld/mac] Implement -dead_strip
Also adds support for live_support sections, no_dead_strip sections,
.no_dead_strip symbols.

Chromium Framework 345MB unstripped -> 250MB stripped
(vs 290MB unstripped -> 236M stripped with ld64).

Doing dead stripping is a bit faster than not, because so much less
data needs to be processed:

    % ministat lld_*
    x lld_nostrip.txt
    + lld_strip.txt
        N           Min           Max        Median           Avg        Stddev
    x  10      3.929414       4.07692     4.0269079     4.0089678   0.044214794
    +  10     3.8129408     3.9025559     3.8670411     3.8642573   0.024779651
    Difference at 95.0% confidence
            -0.144711 +/- 0.0336749
            -3.60967% +/- 0.839989%
            (Student's t, pooled s = 0.0358398)

This interacts with many parts of the linker. I tried to add test coverage
for all added `isLive()` checks, so that some test will fail if any of them
is removed. I checked that the test expectations for the most part match
ld64's behavior (except for live-support-iterations.s, see the comment
in the test). Interacts with:
- debug info
- export tries
- import opcodes
- flags like -exported_symbol(s_list)
- -U / dynamic_lookup
- mod_init_funcs, mod_term_funcs
- weak symbol handling
- unwind info
- stubs
- map files
- -sectcreate
- undefined, dylib, common, defined (both absolute and normal) symbols

It's possible it interacts with more features I didn't think of,
of course.

I also did some manual testing:
- check-llvm check-clang check-lld work with lld with this patch
  as host linker and -dead_strip enabled
- Chromium still starts
- Chromium's base_unittests still pass, including unwind tests

Implemenation-wise, this is InputSection-based, so it'll work for
object files with .subsections_via_symbols (which includes all
object files generated by clang). I first based this on the COFF
implementation, but later realized that things are more similar to ELF.
I think it'd be good to refactor MarkLive.cpp to look more like the ELF
part at some point, but I'd like to get a working state checked in first.

Mechanical parts:
- Rename canOmitFromOutput to wasCoalesced (no behavior change)
  since it really is for weak coalesced symbols
- Add noDeadStrip to Defined, corresponding to N_NO_DEAD_STRIP
  (`.no_dead_strip` in asm)

Fixes PR49276.

Differential Revision: https://reviews.llvm.org/D103324
2021-06-02 11:09:26 -04:00
Nico Weber 2c1903412b [lld/mac] Implement removal of unused dylibs
This omits load commands for unreferenced dylibs if:
- the dylib was loaded implicitly,
- it is marked MH_DEAD_STRIPPABLE_DYLIB
- or -dead_strip_dylibs is passed

This matches ld64.

Currently, the "is dylib referenced" state is computed before dead code
stripping and is not updated after dead code stripping. This too matches ld64.
We should do better here.

With this, clang-format linked with lld (like with ld64) no longer has
libobjc.A.dylib in `otool -L` output. (It was implicitly loaded as a reexport
of CoreFoundation.framework, but it's not needed.)

Differential Revision: https://reviews.llvm.org/D103430
2021-06-01 16:06:30 -04:00
Nico Weber 4a12248ee2 [lld/mac] Honor REFERENCED_DYAMICALLY, set it on __mh_execute_header
Has the effect that `__mh_execute_header` stays in the symbol table of
outputs even after running `strip` on the output. I don't know if that's
important for anything -- my motivation for the patch is just is to make
the output more similar to ld64.

(Corresponds to symbolTableInAndNeverStrip in ld64.)

Differential Revision: https://reviews.llvm.org/D102619
2021-05-17 14:22:12 -04:00
Jez Ng 2516b0b526 [lld-macho] Treat undefined symbols uniformly
In particular, we should apply the `-undefined` behavior to all
such symbols, include those that are specified via the command line
(i.e.  `-e`, `-u`, and `-exported_symbol`). ld64 supports this too.

Reviewed By: #lld-macho, thakis

Differential Revision: https://reviews.llvm.org/D102143
2021-05-10 15:45:54 -04:00
Nico Weber d5a70db193 [lld/mac] Write every weak symbol only once in the output
Before this, if an inline function was defined in several input files,
lld would write each copy of the inline function the output. With this
patch, it only writes one copy.

Reduces the size of Chromium Framework from 378MB to 345MB (compared
to 290MB linked with ld64, which also does dead-stripping, which we
don't do yet), and makes linking it faster:

        N           Min           Max        Median           Avg        Stddev
    x  10     3.9957051     4.3496981     4.1411121      4.156837    0.10092097
    +  10      3.908154      4.169318     3.9712729     3.9846753   0.075773012
    Difference at 95.0% confidence
            -0.172162 +/- 0.083847
            -4.14165% +/- 2.01709%
            (Student's t, pooled s = 0.0892373)

Implementation-wise, when merging two weak symbols, this sets a
"canOmitFromOutput" on the InputSection belonging to the weak symbol not put in
the symbol table. We then don't write InputSections that have this set, as long
as they are not referenced from other symbols. (This happens e.g. for object
files that don't set .subsections_via_symbols or that use .alt_entry.)

Some restrictions:
- not yet done for bitcode inputs
- no "comdat" handling (`kindNoneGroupSubordinate*` in ld64) --
  Frame Descriptor Entries (FDEs), Language Specific Data Areas (LSDAs)
  (that is, catch block unwind information) and Personality Routines
  associated with weak functions still not stripped. This is wasteful,
  but harmless.
- However, this does strip weaks from __unwind_info (which is needed for
  correctness and not just for size)
- This nopes out on InputSections that are referenced form more than
  one symbol (eg from .alt_entry) for now

Things that work based on symbols Just Work:
- map files (change in MapFile.cpp is no-op and not needed; I just
  found it a bit more explicit)
- exports

Things that work with inputSections need to explicitly check if
an inputSection is written (e.g. unwind info).

This patch is useful in itself, but it's also likely also a useful foundation
for dead_strip.

I used to have a "canoncialRepresentative" pointer on InputSection instead of
just the bool, which would be handy for ICF too. But I ended up not needing it
for this patch, so I removed that again for now.

Differential Revision: https://reviews.llvm.org/D102076
2021-05-07 17:11:40 -04:00
Jez Ng 05c5363b39 [lld-macho] Parse & emit the N_ARM_THUMB_DEF symbol flag
Eventually we'll use this flag to properly handle bl/blx
opcodes.

Reviewed By: #lld-macho, gkm

Differential Revision: https://reviews.llvm.org/D101558
2021-04-30 16:17:26 -04:00
Jez Ng eb5b7d4497 [lld-macho] LTO: Unset VisibleToRegularObj where possible
This allows LLVM's LTO to internalize symbols that are not referenced
directly by regular objects. Naturally, this means we need to track
which symbols are referenced by regular objects. The approach taken here
is similar to LLD-COFF's: like the COFF port, we extend
`SymbolTable::insert()` to set the isVisibleToRegularObj bit. (LLD-ELF
relies on the Symbol constructor and `Symbol::mergeProperties()`, but
the Mach-O port does not have a `mergeProperties()` equivalent.)

From what I can tell, ld64 (which uses libLTO) doesn't do this
optimization at all. I'm not even sure libLTO provides a way to do this.
Not having ld64's behavior as a reference implementation is unfortunate;
instead, I am relying on LLD-ELF/COFF's behavior as references while
erring on the conservative side. In particular, LLD-MachO will only do
this optimization for executables right now.

We also don't attempt it when `-flat_namespace` is used -- otherwise
we'd need scan the symbol table to find matches for every un-namespaced
symbol reference, which is expensive.

internalize.ll is based off the LLD-ELF tests `internalize-basic.ll` and
`internalize-undef.ll`. Looks like @davide added some of LLD-ELF's internalize
tests, so adding him as a reviewer...

Reviewed By: #lld-macho, gkm

Differential Revision: https://reviews.llvm.org/D99105
2021-04-15 21:16:33 -04:00
Jez Ng 2461804b48 [lld-macho] Symbol::value should always be uint64_t
D98837 migrated a bunch of `value`s to uint64_t, but missed these.
2021-04-06 17:54:11 -04:00
Jez Ng ceec610754 [lld-macho] Fix & refactor symbol size calculations
I noticed two problems with the previous implementation:

* N_ALT_ENTRY symbols weren't being handled correctly -- they should
  determine the size of the previous symbol, even though they don't
  cause a new section to be created
* The last symbol in a section had its size calculated wrongly;
  the first subsection's size was used instead of the last one

I decided to take the opportunity to refactor things as well, mainly to
realize my observation
[here](https://reviews.llvm.org/D98837#inline-931511) that we could
avoid doing a binary search to match symbols with subsections. I think
the resulting code is a bit simpler too.

      N           Min           Max        Median           Avg        Stddev
  x  20          4.31          4.43          4.37        4.3775   0.034162922
  +  20          4.32          4.43          4.38        4.3755    0.02799906
  No difference proven at 95.0% confidence

Reviewed By: #lld-macho, alexshap

Differential Revision: https://reviews.llvm.org/D99972
2021-04-06 15:10:01 -04:00
Alexander Shaposhnikov f6ad045366 [lld][MachO] Make emitEndFunStab independent from .subsections_via_symbols
This diff addresses FIXME in SyntheticSections.cpp and removes
the dependency of emitEndFunStab on .subsections_via_symbols.

Test plan: make check-lld-macho

Differential revision: https://reviews.llvm.org/D99054
2021-04-01 17:48:09 -07:00
Vy Nguyen 66f340051a [lld-macho] Define __mh_*_header synthetic symbols.
Bug: https://bugs.llvm.org/show_bug.cgi?id=49290

    Differential Revision: https://reviews.llvm.org/D97007
2021-03-19 14:14:40 -04:00
Greg McGary a170533632 [lld-macho][NFC] Drop unnecessary braces around simple if/for bodies
Minor cleanup

Differential Revision: https://reviews.llvm.org/D98758
2021-03-16 22:39:39 -07:00
Greg McGary db1e845a96 [lld-macho] Handle error cases properly for -exported_symbol(s_list)
This fixes defects in D98223 [lld-macho] implement options -(un)exported_symbol(s_list):
* disallow export of hidden symbols
* verify that whitelisted literal names are defined in the symbol table
* reflect export-status overrides in `nlist` attribute of `N_EXT` or `N_PEXT`

Thanks to @thakis for raising these issues

Differential Revision: https://reviews.llvm.org/D98381
2021-03-16 21:20:39 -07:00
Jez Ng d8283d9ddc [lld-macho][nfc] Give every SyntheticSection a fake InputSection
Previously, it was difficult to write code that handled both synthetic
and regular sections generically. We solve this problem by creating a
fake InputSection at the start of every SyntheticSection.

This refactor allows us to handle DSOHandle like a regular Defined
symbol (since Defined symbols must be attached to an InputSection), and
paves the way for supporting `__mh_*header` symbols. Additionally, it
simplifies our binding/rebase code.

I did have to extend Defined a little -- it now has a `linkerInternal`
flag, to indicate that `___dso_handle` should not be in the final symbol
table.

I've also added some additional testing for `___dso_handle`.

Reviewed By: #lld-macho, oontvoo

Differential Revision: https://reviews.llvm.org/D98545
2021-03-12 17:26:27 -05:00
Greg McGary fdc0c21973 [lld-macho][NFC] when reasonable, replace auto keyword with type names
lld policy discourages `auto`. Replace it with a type name whenever reasonable. Retain `auto` to avoid ...
* redundancy, as for decls such as `auto *t = mumble_cast<TYPE *>` or similar that specifies the result type on the RHS
* verbosity, as for iterators
* gratuitous suffering, as for lambdas

Along the way, add `const` when appropriate.

Note: a future diff will ...
* add more `const` qualifiers
* remove `opt::` when we are already `using llvm::opt`

Differential Revision: https://reviews.llvm.org/D98313
2021-03-09 22:08:32 -08:00
Nico Weber 0658fc654c [lld/mac] Implement the missing bits of -undefined
This adds support for `-undefined dynamic_lookup`, and for
`-undefined warning` and `-undefined suppress` with `-flat_namespace`.

We just replace undefined symbols with a DynamicLookup when we hit them.

With this, `check-llvm` passes when using ld64.lld.darwinnew as host linker.

Differential Revision: https://reviews.llvm.org/D97642
2021-03-01 15:30:53 -05:00
Nico Weber 8174f33dc9 [lld/mac] Add support for -flat_namespace
-flat_namespace makes lld emit binaries that use name lookup that's more in
line with other POSIX systems: Instead of looking up symbols as (dylib,name)
pairs by dyld, they're instead looked up just by name.

-flat_namespace has three effects:

1. MH_TWOLEVEL and MH_NNOUNDEFS are no longer set in the Mach-O header
2. All symbols use BIND_SPECIAL_DYLIB_FLAT_LOOKUP as ordinal
3. When a dylib is added to the link, its dependent dylibs are also added,
   so that lld can verify that no undefined symbols remain at the end of
   a link with -flat_namespace. These transitive dylibs are added for symbol
   resolution, but they are not emitted in LC_LOAD_COMMANDs.

-undefined with -flat_namespace still isn't implemented. Before this change,
it was impossible to hit that combination because -flat_namespace caused a
diagnostic. Now that it no longer does, emit a dedicated temporary diagnostic
when both flags are used.

Differential Revision: https://reviews.llvm.org/D97641
2021-03-01 15:25:10 -05:00
Nico Weber cafb6cd10c [lld/mac] Add some support for dynamic lookup symbols, and implement -U
Dynamic lookup symbols are symbols that work like dynamic symbols
in ELF: They're not bound to a dylib like normal Mach-O twolevel lookup
symbols, but they live in a global pool and dyld resolves them against
exported symbols from all loaded dylibs.

This adds support for dynamical lookup symbols to lld/mac. They are
represented as DylibSymbols with file set to nullptr.

This also uses this support to implement the -U flag, which makes
a specific symbol that's undefined at the end of the link a
dynamic lookup symbol.

For -U, it'd be sufficient to just to a pass over remaining undefined symbols
at the end of the link and to replace them with dynamic lookup symbols then.
But I'd like to use this code to implement flat_namespace too, and that will
require real support for resolving dynamic lookup symbols in SymbolTable. So
this patch adds this now already.

While writing tests for this, I noticed that we didn't set N_WEAK_DEF in the
symbol table for DylibSymbols, so this fixes that too.

Differential Revision: https://reviews.llvm.org/D97521
2021-02-26 16:50:53 -05:00
Jez Ng 163dcd8513 [lld-macho] Associate each Symbol with an InputFile
This makes our error messages more informative. But the bigger motivation is for
LTO symbol resolution, which will be in an upcoming diff. The changes in this
one are largely mechanical.

Reviewed By: #lld-macho, smeenai

Differential Revision: https://reviews.llvm.org/D94316
2021-02-03 13:43:47 -05:00
Jez Ng e98b441a09 [lld-macho] Remove unnecessary llvm:: namespace prefixes 2021-01-09 12:44:35 -05:00