llvm-project/lld/test/MachO/invalid/reserved-section-name.s

Ignoring revisions in .git-blame-ignore-revs. Click here to bypass and see the normal blame view.

20 lines
391 B
ArmAsm
Raw Normal View History

[lld-macho] Refactor segment/section creation, sorting, and merging Summary: There were a few issues with the previous setup: 1. The section sorting comparator used a declarative map of section names to determine the correct order, but it turns out we need to match on more than just names -- in particular, an upcoming diff will sort based on whether the S_ZERO_FILL flag is set. This diff changes the sorter to a more imperative but flexible form. 2. We were sorting OutputSections stored in a MapVector, which left the MapVector in an inconsistent state -- the wrong keys map to the wrong values! In practice, we weren't doing key lookups (only container iteration) after the sort, so this was fine, but it was still a dubious state of affairs. This diff copies the OutputSections to a vector before sorting them. 3. We were adding unneeded OutputSections to OutputSegments and then filtering them out later, which meant that we had to remember whether an OutputSegment was in a pre- or post-filtered state. This diff only adds the sections to the segments if they are needed. In addition to those major changes, two minor ones worth noting: 1. I renamed all OutputSection variable names to `osec`, to parallel `isec`. Previously we were using some inconsistent combination of `osec`, `os`, and `section`. 2. I added a check (and a test) for InputSections with names that clashed with those of our synthetic OutputSections. Reviewers: #lld-macho Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D81887
2020-06-15 15:03:24 +08:00
# REQUIRES: x86
# RUN: llvm-mc -filetype=obj -triple=x86_64-apple-darwin %s -o %t.o
# RUN: not %lld -o %t %t.o 2>&1 | FileCheck %s -DFILE=%t.o
[lld-macho] Refactor segment/section creation, sorting, and merging Summary: There were a few issues with the previous setup: 1. The section sorting comparator used a declarative map of section names to determine the correct order, but it turns out we need to match on more than just names -- in particular, an upcoming diff will sort based on whether the S_ZERO_FILL flag is set. This diff changes the sorter to a more imperative but flexible form. 2. We were sorting OutputSections stored in a MapVector, which left the MapVector in an inconsistent state -- the wrong keys map to the wrong values! In practice, we weren't doing key lookups (only container iteration) after the sort, so this was fine, but it was still a dubious state of affairs. This diff copies the OutputSections to a vector before sorting them. 3. We were adding unneeded OutputSections to OutputSegments and then filtering them out later, which meant that we had to remember whether an OutputSegment was in a pre- or post-filtered state. This diff only adds the sections to the segments if they are needed. In addition to those major changes, two minor ones worth noting: 1. I renamed all OutputSection variable names to `osec`, to parallel `isec`. Previously we were using some inconsistent combination of `osec`, `os`, and `section`. 2. I added a check (and a test) for InputSections with names that clashed with those of our synthetic OutputSections. Reviewers: #lld-macho Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D81887
2020-06-15 15:03:24 +08:00
# CHECK: error: section from [[FILE]] conflicts with synthetic section __DATA_CONST,__got
.globl _main
.section __DATA_CONST,__got
.space 1
[lld-macho] Implement cstring deduplication Our implementation draws heavily from LLD-ELF's, which in turn delegates its string deduplication to llvm-mc's StringTableBuilder. The messiness of this diff is largely due to the fact that we've previously assumed that all InputSections get concatenated together to form the output. This is no longer true with CStringInputSections, which split their contents into StringPieces. StringPieces are much more lightweight than InputSections, which is important as we create a lot of them. They may also overlap in the output, which makes it possible for strings to be tail-merged. In fact, the initial version of this diff implemented tail merging, but I've dropped it for reasons I'll explain later. **Alignment Issues** Mergeable cstring literals are found under the `__TEXT,__cstring` section. In contrast to ELF, which puts strings that need different alignments into different sections, clang's Mach-O backend puts them all in one section. Strings that need to be aligned have the `.p2align` directive emitted before them, which simply translates into zero padding in the object file. I *think* ld64 extracts the desired per-string alignment from this data by preserving each string's offset from the last section-aligned address. I'm not entirely certain since it doesn't seem consistent about doing this; but perhaps this can be chalked up to cases where ld64 has to deduplicate strings with different offset/alignment combos -- it seems to pick one of their alignments to preserve. This doesn't seem correct in general; we can in fact can induce ld64 to produce a crashing binary just by linking in an additional object file that only contains cstrings and no code. See PR50563 for details. Moreover, this scheme seems rather inefficient: since unaligned and aligned strings are all put in the same section, which has a single alignment value, it doesn't seem possible to tell whether a given string doesn't have any alignment requirements. Preserving offset+alignments for strings that don't need it is wasteful. In practice, the crashes seen so far seem to stem from x86_64 SIMD operations on cstrings. X86_64 requires SIMD accesses to be 16-byte-aligned. So for now, I'm thinking of just aligning all strings to 16 bytes on x86_64. This is indeed wasteful, but implementation-wise it's simpler than preserving per-string alignment+offsets. It also avoids the aforementioned crash after deduplication of differently-aligned strings. Finally, the overhead is not huge: using 16-byte alignment (vs no alignment) is only a 0.5% size overhead when linking chromium_framework. With these alignment requirements, it doesn't make sense to attempt tail merging -- most strings will not be eligible since their overlaps aren't likely to start at a 16-byte boundary. Tail-merging (with alignment) for chromium_framework only improves size by 0.3%. It's worth noting that LLD-ELF only does tail merging at `-O2`. By default (at `-O1`), it just deduplicates w/o tail merging. @thakis has also mentioned that they saw it regress compressed size in some cases and therefore turned it off. `ld64` does not seem to do tail merging at all. **Performance Numbers** CString deduplication reduces chromium_framework from 250MB to 242MB, or about a 3.2% reduction. Numbers for linking chromium_framework on my 3.2 GHz 16-Core Intel Xeon W: N Min Max Median Avg Stddev x 20 3.91 4.03 3.935 3.95 0.034641016 + 20 3.99 4.14 4.015 4.0365 0.0492336 Difference at 95.0% confidence 0.0865 +/- 0.027245 2.18987% +/- 0.689746% (Student's t, pooled s = 0.0425673) As expected, cstring merging incurs some non-trivial overhead. When passing `--no-literal-merge`, it seems that performance is the same, i.e. the refactoring in this diff didn't cost us. N Min Max Median Avg Stddev x 20 3.91 4.03 3.935 3.95 0.034641016 + 20 3.89 4.02 3.935 3.9435 0.043197831 No difference proven at 95.0% confidence Reviewed By: #lld-macho, gkm Differential Revision: https://reviews.llvm.org/D102964
2021-06-08 11:47:12 +08:00
.data
_foo:
.space 1
[lld-macho] Refactor segment/section creation, sorting, and merging Summary: There were a few issues with the previous setup: 1. The section sorting comparator used a declarative map of section names to determine the correct order, but it turns out we need to match on more than just names -- in particular, an upcoming diff will sort based on whether the S_ZERO_FILL flag is set. This diff changes the sorter to a more imperative but flexible form. 2. We were sorting OutputSections stored in a MapVector, which left the MapVector in an inconsistent state -- the wrong keys map to the wrong values! In practice, we weren't doing key lookups (only container iteration) after the sort, so this was fine, but it was still a dubious state of affairs. This diff copies the OutputSections to a vector before sorting them. 3. We were adding unneeded OutputSections to OutputSegments and then filtering them out later, which meant that we had to remember whether an OutputSegment was in a pre- or post-filtered state. This diff only adds the sections to the segments if they are needed. In addition to those major changes, two minor ones worth noting: 1. I renamed all OutputSection variable names to `osec`, to parallel `isec`. Previously we were using some inconsistent combination of `osec`, `os`, and `section`. 2. I added a check (and a test) for InputSections with names that clashed with those of our synthetic OutputSections. Reviewers: #lld-macho Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D81887
2020-06-15 15:03:24 +08:00
.text
_main:
[lld-macho] Implement cstring deduplication Our implementation draws heavily from LLD-ELF's, which in turn delegates its string deduplication to llvm-mc's StringTableBuilder. The messiness of this diff is largely due to the fact that we've previously assumed that all InputSections get concatenated together to form the output. This is no longer true with CStringInputSections, which split their contents into StringPieces. StringPieces are much more lightweight than InputSections, which is important as we create a lot of them. They may also overlap in the output, which makes it possible for strings to be tail-merged. In fact, the initial version of this diff implemented tail merging, but I've dropped it for reasons I'll explain later. **Alignment Issues** Mergeable cstring literals are found under the `__TEXT,__cstring` section. In contrast to ELF, which puts strings that need different alignments into different sections, clang's Mach-O backend puts them all in one section. Strings that need to be aligned have the `.p2align` directive emitted before them, which simply translates into zero padding in the object file. I *think* ld64 extracts the desired per-string alignment from this data by preserving each string's offset from the last section-aligned address. I'm not entirely certain since it doesn't seem consistent about doing this; but perhaps this can be chalked up to cases where ld64 has to deduplicate strings with different offset/alignment combos -- it seems to pick one of their alignments to preserve. This doesn't seem correct in general; we can in fact can induce ld64 to produce a crashing binary just by linking in an additional object file that only contains cstrings and no code. See PR50563 for details. Moreover, this scheme seems rather inefficient: since unaligned and aligned strings are all put in the same section, which has a single alignment value, it doesn't seem possible to tell whether a given string doesn't have any alignment requirements. Preserving offset+alignments for strings that don't need it is wasteful. In practice, the crashes seen so far seem to stem from x86_64 SIMD operations on cstrings. X86_64 requires SIMD accesses to be 16-byte-aligned. So for now, I'm thinking of just aligning all strings to 16 bytes on x86_64. This is indeed wasteful, but implementation-wise it's simpler than preserving per-string alignment+offsets. It also avoids the aforementioned crash after deduplication of differently-aligned strings. Finally, the overhead is not huge: using 16-byte alignment (vs no alignment) is only a 0.5% size overhead when linking chromium_framework. With these alignment requirements, it doesn't make sense to attempt tail merging -- most strings will not be eligible since their overlaps aren't likely to start at a 16-byte boundary. Tail-merging (with alignment) for chromium_framework only improves size by 0.3%. It's worth noting that LLD-ELF only does tail merging at `-O2`. By default (at `-O1`), it just deduplicates w/o tail merging. @thakis has also mentioned that they saw it regress compressed size in some cases and therefore turned it off. `ld64` does not seem to do tail merging at all. **Performance Numbers** CString deduplication reduces chromium_framework from 250MB to 242MB, or about a 3.2% reduction. Numbers for linking chromium_framework on my 3.2 GHz 16-Core Intel Xeon W: N Min Max Median Avg Stddev x 20 3.91 4.03 3.935 3.95 0.034641016 + 20 3.99 4.14 4.015 4.0365 0.0492336 Difference at 95.0% confidence 0.0865 +/- 0.027245 2.18987% +/- 0.689746% (Student's t, pooled s = 0.0425673) As expected, cstring merging incurs some non-trivial overhead. When passing `--no-literal-merge`, it seems that performance is the same, i.e. the refactoring in this diff didn't cost us. N Min Max Median Avg Stddev x 20 3.91 4.03 3.935 3.95 0.034641016 + 20 3.89 4.02 3.935 3.9435 0.043197831 No difference proven at 95.0% confidence Reviewed By: #lld-macho, gkm Differential Revision: https://reviews.llvm.org/D102964
2021-06-08 11:47:12 +08:00
## make sure the GOT will be needed
pushq _foo@GOTPCREL(%rip)
[lld-macho] Refactor segment/section creation, sorting, and merging Summary: There were a few issues with the previous setup: 1. The section sorting comparator used a declarative map of section names to determine the correct order, but it turns out we need to match on more than just names -- in particular, an upcoming diff will sort based on whether the S_ZERO_FILL flag is set. This diff changes the sorter to a more imperative but flexible form. 2. We were sorting OutputSections stored in a MapVector, which left the MapVector in an inconsistent state -- the wrong keys map to the wrong values! In practice, we weren't doing key lookups (only container iteration) after the sort, so this was fine, but it was still a dubious state of affairs. This diff copies the OutputSections to a vector before sorting them. 3. We were adding unneeded OutputSections to OutputSegments and then filtering them out later, which meant that we had to remember whether an OutputSegment was in a pre- or post-filtered state. This diff only adds the sections to the segments if they are needed. In addition to those major changes, two minor ones worth noting: 1. I renamed all OutputSection variable names to `osec`, to parallel `isec`. Previously we were using some inconsistent combination of `osec`, `os`, and `section`. 2. I added a check (and a test) for InputSections with names that clashed with those of our synthetic OutputSections. Reviewers: #lld-macho Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D81887
2020-06-15 15:03:24 +08:00
ret