Commit Graph

41 Commits

Author SHA1 Message Date
Jez Ng 118bfde90a [lld-macho] Have ICF dedup explicitly-defined selrefs
This is what ld64 does (though it doesn't use ICF to do this; instead it
always dedups selrefs by default).

We'll want to dedup implicitly-defined selrefs as well, but I will leave
that for future work.

Additionally, I'm not *super* happy with the current LLD implementation
because I think it is rather janky and inefficient. But at least it
moves us toward the goal of closing the size gap with ld64. I've
described ideas for cleaning up our implementation here:
https://github.com/llvm/llvm-project/issues/57714

Differential Revision: https://reviews.llvm.org/D133780
2022-09-14 17:59:22 -04:00
Jez Ng 8d4ca09d06 [lld-macho][nfc] Clean up ICF code
Split these changes out from https://reviews.llvm.org/D133780.
2022-09-14 17:59:22 -04:00
Keith Smiley 15f685eaa8
[lld-macho] Fold cfstrings with --deduplicate-literals
Similar to cstrings ld64 always deduplicates cfstrings. This was already
being done when enabling ICF, but for debug builds you may want to flip
this on if you cannot eliminate your instances of this, so this change
makes --deduplicate-literals also apply to cfstrings.

Differential Revision: https://reviews.llvm.org/D130134
2022-07-20 11:11:09 -07:00
Jez Ng 2d889a87fb [lld-macho] Read in new addrsig format
The new format uses symbol relocations, as described in {D127637}.

Reviewed By: #lld-macho, alx32

Differential Revision: https://reviews.llvm.org/D128938
2022-07-19 21:22:27 -04:00
Jez Ng f6017abb60 [lld-macho] Support folding of functions with identical LSDAs
To do this, we need to slice away the LSDA pointer, just like we are
slicing away the functionAddress pointer.

No observable difference in perf on chromium_framework:

             base           diff           difference (95% CI)
  sys_time   1.769 ± 0.068  1.761 ± 0.065  [  -2.7% ..   +1.8%]
  user_time  9.517 ± 0.110  9.528 ± 0.116  [  -0.6% ..   +0.8%]
  wall_time  8.291 ± 0.174  8.307 ± 0.183  [  -1.1% ..   +1.5%]
  samples    21             25

Reviewed By: #lld-macho, thakis

Differential Revision: https://reviews.llvm.org/D129830
2022-07-19 13:29:52 -04:00
Nico Weber 0ec87addb7 [lld/mac] Add a few TimeTraceScopes
Identical literal folding takes ~1.4% of the time, and was missing
from the trace.

Signature computation still needs ~2.2% of the time, so probably worth
explicitly marking its contribution to "Write output file" (9.1%)

Differential Revision: https://reviews.llvm.org/D128343
2022-06-23 11:46:57 -04:00
Nico Weber 7effcbda49 Rename parallelForEachN to just parallelFor
Patch created by running:

  rg -l parallelForEachN | xargs sed -i '' -c 's/parallelForEachN/parallelFor/'

No behavior change.

Differential Revision: https://reviews.llvm.org/D128140
2022-06-19 17:49:00 -04:00
Jez Ng e183bf8e15 [lld-macho][reland] Initial support for EH Frames
This reverts commit 942f4e3a7c.

The additional change required to avoid the assertion errors seen
previously is:

  --- a/lld/MachO/ICF.cpp
  +++ b/lld/MachO/ICF.cpp
  @@ -443,7 +443,9 @@ void macho::foldIdenticalSections() {
                                 /*relocVA=*/0);
           isec->data = copy;
         }
  -    } else {
  +    } else if (!isEhFrameSection(isec)) {
  +      // EH frames are gathered as hashables from unwindEntry above; give a
  +      // unique ID to everything else.
         isec->icfEqClass[0] = ++icfUniqueID;
       }
     }

Differential Revision: https://reviews.llvm.org/D123435
2022-06-13 07:45:16 -04:00
Douglas Yung 942f4e3a7c Revert "[lld-macho] Initial support for EH Frames"
This reverts commit 826be330af.

This was causing a test failure on build bots:
  - https://lab.llvm.org/buildbot/#/builders/36/builds/21770
  - https://lab.llvm.org/buildbot/#/builders/58/builds/23913
2022-06-09 05:25:43 -07:00
Jez Ng 826be330af [lld-macho] Initial support for EH Frames
== Background ==

`llvm-mc` generates unwind info in both compact unwind and DWARF
formats. LLD already handles the compact unwind format; this diff gets
us close to handling the DWARF format properly.

== Caveats ==

It's not quite done yet, but I figure it's worth getting this reviewed
and landed first as it's shaping up to be a fairly large code change.

**Known limitations of the current code:**

* Only works for x86_64, for which `llvm-mc` emits "abs-ified"
  relocations as described in 618def651b.
  `llvm-mc` emits regular relocations for ARM EH frames, which we do not
  yet handle correctly.

Since the feature is not ready for real use yet, I've gated it behind a
flag that only gets toggled on during test suite runs. With most of the
new code disabled, we see just a hint of perf regression, so I don't
think it'd be remiss to land this as-is:

             base           diff           difference (95% CI)
  sys_time   1.926 ± 0.168  1.979 ± 0.117  [  -1.2% ..   +6.6%]
  user_time  3.590 ± 0.033  3.606 ± 0.028  [  +0.0% ..   +0.9%]
  wall_time  7.104 ± 0.184  7.179 ± 0.151  [  -0.2% ..   +2.3%]
  samples    30             31

== Design ==

Like compact unwind entries, EH frames are also represented as regular
ConcatInputSections that get pointed to via `Defined::unwindEntry`. This
allows them to be handled generically by e.g. the MarkLive and ICF
code. (But note that unlike compact unwind subsections, EH frame
subsections do end up in the final binary.)

In order to make EH frames "look like" a regular ConcatInputSection,
some processing is required. First, we need to split the `__eh_frame`
section along EH frame boundaries rather than along symbol boundaries.
We do this by decoding the length field of each EH frame. Second, the
abs-ified relocations need to be turned into regular Relocs.

== Next Steps ==

In order to support EH frames on ARM targets, we will either have to
teach LLD how to handle EH frames with explicit relocs, or we can try to
make `llvm-mc` emit abs-ified relocs for ARM as well. I'm hoping to do
the latter as I think it will make the LLD implementation both simpler
and faster to execute.

== Misc ==

The `obj-file-with-stabs.s` test had to be updated as the previous
version would trip assertion errors in the code. It appears that in our
attempt to produce a minimal YAML test input, we created a file with
invalid EH frame data. I've fixed this by re-generating the YAML and not
doing any hand-pruning of it.

Reviewed By: #lld-macho, Roger

Differential Revision: https://reviews.llvm.org/D123435
2022-06-08 23:40:52 -04:00
Alex Borcan e29dc0c6fd [lld] Implement safe icf for MachO
This change implements --icf=safe for MachO based on addrsig section that is implemented in D123751.

Reviewed By: int3, #lld-macho

Differential Revision: https://reviews.llvm.org/D123752
2022-05-03 21:01:03 -04:00
Jez Ng c242e10c74 [lld-macho] Fix ICF crash when comparing symbol relocs
Previously, when encountering a symbol reloc located in a literal section, we
would look up the contents of the literal at the `symbol value + addend` offset
within the literal section. However, it seems that this offset is not guaranteed
to be valid. Instead, we should use just the symbol value to retrieve the
literal's contents, and compare the addend values separately. ld64 seems to do
this.

Reviewed By: #lld-macho, thevinster

Differential Revision: https://reviews.llvm.org/D124223
2022-04-22 15:36:53 -04:00
Jez Ng 9b7b21d2f7 [lld-macho] Don't allocate memory in parallelForEach
... since BumpPtrAllocator isn't thread-safe.

Reviewed By: #lld-macho, Roger

Differential Revision: https://reviews.llvm.org/D121458
2022-03-11 13:32:24 -05:00
Jez Ng ce2ae38124 [lld-macho] Deduplicate the `__objc_classrefs` section contents
ld64 breaks down `__objc_classrefs` on a per-word level and deduplicates
them. This greatly reduces the number of bind entries emitted (and
therefore the amount of work `dyld` has to do at runtime). For
chromium_framework, this change to LLD cuts the number of (non-lazy)
binds from 912 to 190, getting us to parity with ld64 in this aspect.

Reviewed By: #lld-macho, thakis

Differential Revision: https://reviews.llvm.org/D121053
2022-03-08 08:34:04 -05:00
Jez Ng 8ec1033933 [lld-macho] Deduplicate CFStrings during ICF
`__cfstring` has embedded addends that foil ICF's hashing / equality
checks. (We can ignore embedded addends when doing ICF because the same
information gets recorded in our Reloc structs.) Therefore, in order to
properly dedup CFStrings, we create a mutable copy of the CFString and
zero out the embedded addends before performing any hashing / equality
checks.

(We did in fact have a partial implementation of CFString deduplication
already. However, it only worked when the cstrings they point to are at
identical offsets in their object files.)

I anticipate this approach can be extended to other similar
statically-allocated struct sections in the future.

In addition, we previously treated all references with differing addends
as unequal. This is not true when the references are to literals:
different addends may point to the same literal in the output binary. In
particular, `__cfstring` has such references to `__cstring`. I've
adjusted ICF's `equalsConstant` logic accordingly, and I've added a few
more tests to make sure the addend-comparison code path is adequately
covered.

Fixes https://github.com/llvm/llvm-project/issues/51281.

Reviewed By: #lld-macho, Roger

Differential Revision: https://reviews.llvm.org/D120137
2022-03-08 08:34:03 -05:00
Jez Ng 0405920c5f Re-land [lld-macho][nfc] Don't use `stubsHelperIndex` in ICF hash
Previous attempt was commit 112135e774 and
reverted in d86d431814.
2022-03-07 16:58:00 -05:00
Nico Weber d86d431814 Revert "[lld-macho][nfc] Don't use `stubsHelperIndex` in ICF hash"
This reverts commit 112135e774.
Breaks lld/test/MachO/{icf.s,cfstring-dedup.s,invalid/cfstring.s}
2022-03-07 13:50:38 -05:00
Jez Ng ad1c32e9b3 [lld-macho][nfc] Reduce size of icfEqClass hash
... from a `uint64_t` to a `uint32_t`. (LLD-ELF uses a `uint32_t` too.)

About a 1.7% reduction in peak RSS when linking chromium_framework on my
3.2 GHz 16-Core Intel Xeon W Mac Pro, and no stat sig change in wall
time.

           </Users/jezng/test2.sh ["before"]>  </Users/jezng/test2.sh ["after"]>  difference (95% CI)
  RSS      1003036672.000 ± 9891065.259        985539505.231 ± 10272748.749       [  -2.3% ..   -1.2%]
  samples  27                                  26

             base           diff           difference (95% CI)
  sys_time   1.277 ± 0.023  1.277 ± 0.024  [  -0.9% ..   +0.9%]
  user_time  6.682 ± 0.046  6.598 ± 0.043  [  -1.6% ..   -0.9%]
  wall_time  5.904 ± 0.062  5.895 ± 0.063  [  -0.7% ..   +0.4%]
  samples    46             28

No appreciable change (~0.01%) in number of `equals` comparisons either:

Before:

  ld64.lld: ICF needed 8 iterations
  ld64.lld: equalsConstant() called 701643 times
  ld64.lld: equalsVariable() called 3438526 times

After:

  ld64.lld: ICF needed 8 iterations
  ld64.lld: equalsConstant() called 701729 times
  ld64.lld: equalsVariable() called 3438526 times

Reviewed By: #lld-macho, MaskRay, thakis

Differential Revision: https://reviews.llvm.org/D121052
2022-03-07 12:36:28 -05:00
Jez Ng 112135e774 [lld-macho][nfc] Don't use `stubsHelperIndex` in ICF hash
The existing hashing of stubsHelperIndex has mostly been a no-op* for
some time now (ever since we made ICF run before dylib symbols get their
stubs indices assigned). I guess we could consider hashing the name +
filename of the DylibSymbol instead, but I'm not sure the overhead's
worth it... moreover, LLD/ELF only hashes their Defined symbols as well.

*: Technically it does change the hash value since stubsHelperIndex is
initialized to `UINT32_MAX` by default. But since all stubsHelperIndex
values are the same at when ICF runs, they don't add any useful
information to the hash.
2022-03-07 12:36:28 -05:00
Jez Ng 7028799ca3 [lld-macho][nfc] Rename isec -> referentIsec to avoid shadowing
I found the shadowing a bit confusing
2022-03-07 12:36:28 -05:00
Jez Ng 64cc719766 [lld-macho][nfc] Track # of ICF calls to `equals*` methods
This is debug code that is disabled by default. It'll provide a easy way
to figure out the impact (if any) of tweaking ICF's hashing algorithm
(since a poor quality hash will result in many more `equals*` calls).

Reviewed By: #lld-macho, oontvoo

Differential Revision: https://reviews.llvm.org/D121051
2022-03-07 12:36:27 -05:00
Jez Ng 53e7eef43f [lld-macho][nfc] Use llvm::function_ref instead of std::function 2022-03-07 12:36:27 -05:00
Jez Ng c416f3fafd [lld-macho][nfc] Remove file statics from ICF.cpp
This gets us closer to the [LLD-as-a-library goal][1].

[1]: https://lists.llvm.org/pipermail/llvm-dev/2021-June/151184.html

Reviewed By: #lld-macho, thakis

Differential Revision: https://reviews.llvm.org/D121050
2022-03-07 12:36:26 -05:00
Jez Ng 8386eb23bf [lld-macho][nfc] Move ICF-specific logic into ICF.cpp
This mirrors the code organization in `lld/ELF`.

Reviewed By: #lld-macho, thakis

Differential Revision: https://reviews.llvm.org/D120378
2022-02-23 08:58:25 -05:00
Jez Ng 69297cf639 [lld-macho] Don't include CommandFlags.h in CommonLinkerContext.h
Main motivation: including `llvm/CodeGen/CommandFlags.h` in
`CommonLinkerContext.h` means that the declaration of `llvm::Reloc` is
visible in any file that includes `CommonLinkerContext.h`. Since our
cpp files have both `using namespace llvm` and `using namespace
lld::macho`, this results in conflicts with `lld::macho::Reloc`.

I suppose we could put `llvm::Reloc` into a nested namespace, but in general,
I think we should avoid transitively including too many header files in
a very widely used header like `CommonLinkerContext.h`.

RegisterCodeGenFlags' ctor initializes a bunch of function-`static`
structures and does nothing else, so it should be fine to "initialize"
it as a temporary stack variable rather than as a file static.

Reviewed By: aganea

Differential Revision: https://reviews.llvm.org/D119913
2022-02-16 20:05:07 -05:00
Nico Weber bc20bcb39e [lld/mac] Crash even less on undefined symbols with --icf=all
Follow-up to https://reviews.llvm.org/D112643. Even after that change, we were
still asserting if two separate functions that are eligible for ICF (same size,
same data, same number of relocs, same reloc types, ...) referred to
Undefineds. This fixes that oversight.

Differential Revision: https://reviews.llvm.org/D114195
2021-11-19 09:23:19 -05:00
Greg McGary 9cc489a4b2 [lld-macho][nfc] Factor-out NFC changes from main __eh_frame diff
In order to keep signal:noise high for the `__eh_frame` diff, I have teased-out the NFC changes and put them here.

Differential Revision: https://reviews.llvm.org/D114017
2021-11-17 15:16:44 -07:00
Jez Ng d9b6f7e312 [lld-macho] Teach ICF to dedup functions with identical unwind info
Dedup'ing unwind info is tricky because each CUE contains a different
function address, if ICF operated naively and compared the entire
contents of each CUE, entries with identical unwind info but belonging
to different functions would never be considered identical. To work
around this problem, we slice away the function address before
performing ICF. We rely on `relocateCompactUnwind()` to correctly handle
these truncated input sections.

Here are the numbers before and after D109944, D109945, and this diff
were applied, as tested on my 3.2 GHz 16-Core Intel Xeon W:

Without any optimizations:

             base           diff           difference (95% CI)
  sys_time   0.849 ± 0.015  0.896 ± 0.012  [  +4.8% ..   +6.2%]
  user_time  3.357 ± 0.030  3.512 ± 0.023  [  +4.3% ..   +5.0%]
  wall_time  3.944 ± 0.039  4.032 ± 0.031  [  +1.8% ..   +2.6%]
  samples    40             38

With `-dead_strip`:

             base           diff           difference (95% CI)
  sys_time   0.847 ± 0.010  0.896 ± 0.012  [  +5.2% ..   +6.5%]
  user_time  3.377 ± 0.014  3.532 ± 0.015  [  +4.4% ..   +4.8%]
  wall_time  3.962 ± 0.024  4.060 ± 0.030  [  +2.1% ..   +2.8%]
  samples    47             30

With `-dead_strip` and `--icf=all`:

             base           diff           difference (95% CI)
  sys_time   0.935 ± 0.013  0.957 ± 0.018  [  +1.5% ..   +3.2%]
  user_time  3.472 ± 0.022  6.531 ± 0.046  [ +87.6% ..  +88.7%]
  wall_time  4.080 ± 0.040  5.329 ± 0.060  [ +30.0% ..  +31.2%]
  samples    37             30

Unsurprisingly, ICF is now a lot slower, likely due to the much larger
number of input sections it needs to process. But the rest of the
linker only suffers a mild slowdown.

Note that the compact-unwind-bad-reloc.s test was expanded because we
now handle the relocation for CUE's function address in a separate code
path from the rest of the CUE relocations. The extended test covers both
code paths.

Reviewed By: #lld-macho, oontvoo

Differential Revision: https://reviews.llvm.org/D109946
2021-11-12 16:02:49 -05:00
Nico Weber 2d48b19136 [lld/mac] Fix mislink with ICF
When comparing relocations against two symbols, ICF's equalsConstant() did not
look at the value of the two symbols. With subsections_via_symbols, the value
is usually 0 but not always: In particular, it isn't 0 for constants in string
and literal sections. Since we ignored the value, comparing two constant string
symbols or two literal symbols always compared the 0th's element, so functions
in the same TU always compared as equal.

This can cause mislinks, and, with -dead_strip, crashes.

Fixes PR52349, see that bug for lots of details and examples of mislinks.

While here, make the existing assembly in icf-literals.s a bit more realistic
(use leaq instead of movq with strings, and use foo(%rip) instead of
foo@gotpcrel(%rip)). This has no interesting effect, it just maybe makes the
test look a bit less surprising.

Differential Revision: https://reviews.llvm.org/D112862
2021-10-30 18:58:59 -04:00
Nico Weber 7f369304df [lld/mac] Don't crash on undefined symbols with --icf=all
ICF runs before relocation processing, but undefined symbol errors
are only emitted during relocation processing.

So just ignore Undefineds during ICF (instead of crashing) -- lld
will emit an error once ICF is done.

Fixes PR52330.

Differential Revision: https://reviews.llvm.org/D112643
2021-10-27 16:20:10 -04:00
Jez Ng 002eda7056 [lld-macho] Associate compact unwind entries with function symbols
Compact unwind entries (CUEs) contain pointers to their respective
function symbols. However, during the link process, it's far more useful
to have pointers from the function symbol to the CUE than vice versa.
This diff adds that pointer in the form of `Defined::compactUnwind`.

In particular, when doing dead-stripping, we want to mark CUEs live when
their function symbol is live; and when doing ICF, we want to dedup
sections iff the symbols in that section have identical CUEs. In both
cases, we want to be able to locate the symbols within a given section,
as well as locate the CUEs belonging to those symbols. So this diff also
adds `InputSection::symbols`.

The ultimate goal of this refactor is to have ICF support dedup'ing
functions with unwind info, but that will be handled in subsequent
diffs. This diff focuses on simplifying `-dead_strip` --
`findFunctionsWithUnwindInfo` is no longer necessary, and
`Defined::isLive()` is now a lot simpler. Moreover, UnwindInfoSection no
longer has to check for dead CUEs -- we simply avoid adding them in the
first place.

Additionally, we now support stripping of dead LSDAs, which follows
quite naturally since `markLive()` can now reach them via the CUEs.

Reviewed By: #lld-macho, gkm

Differential Revision: https://reviews.llvm.org/D109944
2021-10-26 16:04:15 -04:00
Nico Weber 1412719066 [lld/mac] Remove else-after-return in ICF code
No behavior change.
2021-10-20 14:24:13 -04:00
Jez Ng 3313b84481 [lld-macho] ICF: Do more work in equalsConstant, less in equalsVariable
In particular, relocations to absolute symbols or literal sections can
be handled in equalsConstant(), since their output addresses will not
change across each iteration of ICF. Offsets and addends can also be
dealt with entirely in equalsConstant(), making the code somewhat easier
to reason about. Only ConcatInputSections need to be handled in
equalsVariable().

LLD-ELF's implementation takes a similar approach.

Although this should make ICF do less work, in practice it seems like
there is no stat sig difference in time taken when linking
chromium_framework.

This refactor is motivated by an upcoming diff which improves ICF's handling of
addends.

Reviewed By: #lld-macho, gkm

Differential Revision: https://reviews.llvm.org/D106212
2021-07-23 11:49:00 -04:00
Jez Ng 428a7c1b38 [lld-macho] Have ICF operate on all sections at once
ICF previously operated only within a given OutputSection. We would
merge all CFStrings first, then merge all regular code sections in a
second phase. This worked fine since CFStrings would never reference
regular `__text` sections. However, I would like to expand ICF to merge
functions that reference unwind info. Unwind info references the LSDA
section, which can in turn reference the `__text` section, so we cannot
perform ICF in phases.

In order to have ICF operate on InputSections spanning multiple
OutputSections, we need a way to distinguish InputSections that are
destined for different OutputSections, so that we don't fold across
section boundaries. We achieve this by creating OutputSections early,
and setting `InputSection::parent` to point to them. This is what
LLD-ELF does. (This change should also make it easier to implement the
`section$start$` symbols.)

This diff also folds InputSections w/o checking their flags, which I
think is the right behavior -- if they are destined for the same
OutputSection, they will have the same flags in the output (even if
their input flags differ). I.e. the `parent` pointer check subsumes the
`flags` check. In practice this has nearly no effect (ICF did not become
any more effective on chromium_framework).

I've also updated ICF.cpp's block comment to better reflect its current
status.

Reviewed By: #lld-macho, smeenai

Differential Revision: https://reviews.llvm.org/D105641
2021-07-17 13:42:51 -04:00
Jez Ng f6b6e72143 [lld-macho] Factor out common InputSection members
We have been creating many ConcatInputSections with identical values due
to .subsections_via_symbols. This diff factors out the identical values
into a Shared struct, to reduce memory consumption and make copying
cheaper.

I also changed `callSiteCount` from a uint32_t to a 31-bit field to save an
extra word.

All in all, this takes InputSection from 120 to 72 bytes (and
ConcatInputSection from 160 to 112 bytes), i.e. 30% size reduction in
ConcatInputSection.

Numbers for linking chromium_framework on my 3.2 GHz 16-Core Intel Xeon W:

      N           Min           Max        Median           Avg        Stddev
  x  20          4.14          4.24          4.18         4.183   0.027548999
  +  20          4.04          4.11         4.075        4.0775   0.018027756
  Difference at 95.0% confidence
          -0.1055 +/- 0.0149005
          -2.52211% +/- 0.356215%
          (Student's t, pooled s = 0.0232803)

Reviewed By: #lld-macho, thakis

Differential Revision: https://reviews.llvm.org/D105305
2021-07-01 21:22:39 -04:00
Jez Ng ac2dd06b91 [lld-macho] Deduplicate CFStrings
`__cfstring` is a special literal section, so instead of breaking it up
at symbol boundaries, we break it up at fixed-width boundaries (since
each literal is the same size). Symbols can only occur at one of those
boundaries, so this is strictly more powerful than
`.subsections_via_symbols`.

With that in place, we then run the section through ICF.

This change is about perf-neutral when linking chromium_framework.

Reviewed By: #lld-macho, gkm

Differential Revision: https://reviews.llvm.org/D105045
2021-07-01 21:22:38 -04:00
Jez Ng 3a11528d97 [lld-macho] Move ICF earlier to avoid emitting redundant binds
This is a pretty big refactoring diff, so here are the motivations:

Previously, ICF ran after scanRelocations(), where we emitting
bind/rebase opcodes etc. So we had a bunch of redundant leftovers after
ICF. Having ICF run before Writer seems like a better design, and is
what LLD-ELF does, so this diff refactors it accordingly.

However, ICF had two dependencies on things occurring in Writer: 1) it
needs literals to be deduplicated beforehand and 2) it needs to know
which functions have unwind info, which was being handled by
`UnwindInfoSection::prepareRelocations()`.

In order to do literal deduplication earlier, we need to add literal
input sections to their corresponding output sections. So instead of
putting all input sections into the big `inputSections` vector, and then
filtering them by type later on, I've changed things so that literal
sections get added directly to their output sections during the 'gather'
phase. Likewise for compact unwind sections -- they get added directly
to the UnwindInfoSection now. This latter change is not strictly
necessary, but makes it easier for ICF to determine which functions have
unwind info.

Adding literal sections directly to their output sections means that we
can no longer determine `inputOrder` from iterating over
`inputSections`. Instead, we store that order explicitly on
InputSection. Bloating the size of InputSection for this purpose would
be unfortunate -- but LLD-ELF has already solved this problem: it reuses
`outSecOff` to store this order value.

One downside of this refactor is that we now make an additional pass
over the unwind info relocations to figure out which functions have
unwind info, since want to know that before `processRelocations()`. I've
made sure to run that extra loop only if ICF is enabled, so there should
be no overhead in non-optimizing runs of the linker.

The upside of all this is that the `inputSections` vector now contains
only ConcatInputSections that are destined for ConcatOutputSections, so
we can clean up a bunch of code that just existed to filter out other
elements from that vector.

I will test for the lack of redundant binds/rebases in the upcoming
cfstring deduplication diff. While binds/rebases can also happen in the
regular `.text` section, they're more common in `.data` sections, so it
seems more natural to test it that way.

This change is perf-neutral when linking chromium_framework.

Reviewed By: oontvoo

Differential Revision: https://reviews.llvm.org/D105044
2021-07-01 21:22:38 -04:00
Jez Ng 557e1fa02f [lld-macho] Extend ICF to literal sections
Literal sections can be deduplicated before running ICF. That makes it
easy to compare them during ICF: we can tell if two literals are
constant-equal by comparing their offsets in their OutputSection.

LLD-ELF takes a similar approach.

Reviewed By: #lld-macho, gkm

Differential Revision: https://reviews.llvm.org/D104671
2021-06-28 14:49:39 -04:00
Jez Ng 8aa17d1eae [lld-macho] Move ICF members from InputSection to ConcatInputSection
`icfEqClass` only makes sense on ConcatInputSections since (in contrast
to literal sections) they are deduplicated as an atomic unit.

Similarly, `hasPersonality` and `replacement` don't make sense on
literal sections.

This mirrors LLD-ELF, which stores `icfEqClass` only on non-mergeable
sections.

Reviewed By: #lld-macho, gkm

Differential Revision: https://reviews.llvm.org/D104670
2021-06-24 22:23:12 -04:00
Greg McGary 8a8558ae27 [lld-macho] add tests for ICF, plus cleanups
Add tests for pending TODOs, plus some global cleanups:
* No fold: func has personality/LSDA
* Fold: reference to absolute symbol with different name but identical value
* No fold: reloc references to absolute symbols with different values
* No fold: N_ALT_ENTRY symbols

Differential Revision: https://reviews.llvm.org/D104721
2021-06-23 20:44:25 -07:00
Greg McGary f27e4548fc [lld-macho] Implement ICF
ICF = Identical C(ode|OMDAT) Folding

This is the LLD ELF/COFF algorithm, adapted for MachO. So far, only `-icf all` is supported. In order to support `-icf safe`, we will need to port address-significance tables (`.addrsig` directives) to MachO, which will come in later diffs.

`check-{llvm,clang,lld}` have 0 regressions for `lld -icf all` vs. baseline ld64.

We only run ICF on `__TEXT,__text` for reasons explained in the block comment in `ConcatOutputSection.cpp`.

Here is the perf impact for linking `chromium_framekwork` on a Mac Pro (16-core Xeon W) for the non-ICF case vs. pre-ICF:
```
    N           Min           Max        Median           Avg        Stddev
x  20          4.27          4.44          4.34         4.349   0.043029977
+  20          4.37          4.46         4.405        4.4115   0.025188761
Difference at 95.0% confidence
        0.0625 +/- 0.0225658
        1.43711% +/- 0.518873%
        (Student's t, pooled s = 0.0352566)
```

Reviewed By: #lld-macho, int3

Differential Revision: https://reviews.llvm.org/D103292
2021-06-17 10:07:44 -07:00