Commit Graph

5 Commits

Author SHA1 Message Date
Vy Nguyen 366df11a35 [lld-macho] Rework mergeFlag to behave closer to what ld64 does.
Details:
I've been getting a few weird errors similar to the following from our internal tests:

```
ld64.lld.darwinnew: error: Cannot merge section __eh_frame (type=0x0) into __eh_frame (type=0xB): inconsistent types
ld64.lld.darwinnew: error: Cannot merge section __eh_frame (flags=0x0) into __eh_frame (flags=0x6800000B): strict flags differ
ld64.lld.darwinnew: error: Cannot merge section __eh_frame (type=0x0) into __eh_frame (type=0xB): inconsistent types
ld64.lld.darwinnew: error: Cannot merge section __eh_frame (flags=0x0) into __eh_frame (flags=0x6800000B): strict flags differ
```

Differential Revision: https://reviews.llvm.org/D103971
2021-06-17 14:22:58 -04:00
Greg McGary f27e4548fc [lld-macho] Implement ICF
ICF = Identical C(ode|OMDAT) Folding

This is the LLD ELF/COFF algorithm, adapted for MachO. So far, only `-icf all` is supported. In order to support `-icf safe`, we will need to port address-significance tables (`.addrsig` directives) to MachO, which will come in later diffs.

`check-{llvm,clang,lld}` have 0 regressions for `lld -icf all` vs. baseline ld64.

We only run ICF on `__TEXT,__text` for reasons explained in the block comment in `ConcatOutputSection.cpp`.

Here is the perf impact for linking `chromium_framekwork` on a Mac Pro (16-core Xeon W) for the non-ICF case vs. pre-ICF:
```
    N           Min           Max        Median           Avg        Stddev
x  20          4.27          4.44          4.34         4.349   0.043029977
+  20          4.37          4.46         4.405        4.4115   0.025188761
Difference at 95.0% confidence
        0.0625 +/- 0.0225658
        1.43711% +/- 0.518873%
        (Student's t, pooled s = 0.0352566)
```

Reviewed By: #lld-macho, int3

Differential Revision: https://reviews.llvm.org/D103292
2021-06-17 10:07:44 -07:00
Jez Ng da24e6d43e [lld-macho][nfc] Add `final` to classes where possible
I wanted to see if we would get any perf wins out of this, but
it doesn't seem to be the case. But it still seems worth committing.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D104200
2021-06-13 19:52:03 -04:00
Jez Ng 04259cde15 [lld-macho] Implement cstring deduplication
Our implementation draws heavily from LLD-ELF's, which in turn delegates
its string deduplication to llvm-mc's StringTableBuilder. The messiness of
this diff is largely due to the fact that we've previously assumed that
all InputSections get concatenated together to form the output. This is
no longer true with CStringInputSections, which split their contents into
StringPieces. StringPieces are much more lightweight than InputSections,
which is important as we create a lot of them. They may also overlap in
the output, which makes it possible for strings to be tail-merged. In
fact, the initial version of this diff implemented tail merging, but
I've dropped it for reasons I'll explain later.

**Alignment Issues**

Mergeable cstring literals are found under the `__TEXT,__cstring`
section. In contrast to ELF, which puts strings that need different
alignments into different sections, clang's Mach-O backend puts them all
in one section. Strings that need to be aligned have the `.p2align`
directive emitted before them, which simply translates into zero padding
in the object file.

I *think* ld64 extracts the desired per-string alignment from this data
by preserving each string's offset from the last section-aligned
address. I'm not entirely certain since it doesn't seem consistent about
doing this; but perhaps this can be chalked up to cases where ld64 has
to deduplicate strings with different offset/alignment combos -- it
seems to pick one of their alignments to preserve. This doesn't seem
correct in general; we can in fact can induce ld64 to produce a crashing
binary just by linking in an additional object file that only contains
cstrings and no code. See PR50563 for details.

Moreover, this scheme seems rather inefficient: since unaligned and
aligned strings are all put in the same section, which has a single
alignment value, it doesn't seem possible to tell whether a given string
doesn't have any alignment requirements. Preserving offset+alignments
for strings that don't need it is wasteful.

In practice, the crashes seen so far seem to stem from x86_64 SIMD
operations on cstrings. X86_64 requires SIMD accesses to be
16-byte-aligned. So for now, I'm thinking of just aligning all strings
to 16 bytes on x86_64. This is indeed wasteful, but implementation-wise
it's simpler than preserving per-string alignment+offsets. It also
avoids the aforementioned crash after deduplication of
differently-aligned strings. Finally, the overhead is not huge: using
16-byte alignment (vs no alignment) is only a 0.5% size overhead when
linking chromium_framework.

With these alignment requirements, it doesn't make sense to attempt tail
merging -- most strings will not be eligible since their overlaps aren't
likely to start at a 16-byte boundary. Tail-merging (with alignment) for
chromium_framework only improves size by 0.3%.

It's worth noting that LLD-ELF only does tail merging at `-O2`. By
default (at `-O1`), it just deduplicates w/o tail merging. @thakis has
also mentioned that they saw it regress compressed size in some cases
and therefore turned it off. `ld64` does not seem to do tail merging at
all.

**Performance Numbers**

CString deduplication reduces chromium_framework from 250MB to 242MB, or
about a 3.2% reduction.

Numbers for linking chromium_framework on my 3.2 GHz 16-Core Intel Xeon W:

      N           Min           Max        Median           Avg        Stddev
  x  20          3.91          4.03         3.935          3.95   0.034641016
  +  20          3.99          4.14         4.015        4.0365     0.0492336
  Difference at 95.0% confidence
          0.0865 +/- 0.027245
          2.18987% +/- 0.689746%
          (Student's t, pooled s = 0.0425673)

As expected, cstring merging incurs some non-trivial overhead.

When passing `--no-literal-merge`, it seems that performance is the
same, i.e. the refactoring in this diff didn't cost us.

      N           Min           Max        Median           Avg        Stddev
  x  20          3.91          4.03         3.935          3.95   0.034641016
  +  20          3.89          4.02         3.935        3.9435   0.043197831
  No difference proven at 95.0% confidence

Reviewed By: #lld-macho, gkm

Differential Revision: https://reviews.llvm.org/D102964
2021-06-07 23:48:35 -04:00
Jez Ng 33706191d8 [lld-macho][nfc] Rename MergedOutputSection to ConcatOutputSection
The ELF format has the concept of merge sections (marked by SHF_MERGE),
which contain data that can be safely deduplicated. The Mach-O
equivalents are called literal sections (marked by S_CSTRING_LITERALS or
S_{4,8,16}BYTE_LITERALS). While the Mach-O format doesn't use the word
'merge', to avoid confusion, I've renamed our MergedOutputSection to
ConcatOutputSection. I believe it's a more descriptive name too.

This renaming sets the stage for {D102964}.

Reviewed By: #lld-macho, alexshap

Differential Revision: https://reviews.llvm.org/D102971
2021-05-25 14:58:29 -04:00