For a default visibility external linkage definition, dso_local is set for ELF
-fno-pic/-fpie and COFF and Mach-O. Since default clang -cc1 for ELF is similar
to -fpic ("PIC Level" is not set), this nuance causes unneeded binary format differences.
To make emitted IR similar, ELF -cc1 -fpic will default to -fno-semantic-interposition,
which sets dso_local for default visibility external linkage definitions.
To make this flip smooth and enable future (dso_local as definition default),
this patch replaces (function) `define ` with `define{{.*}} `,
(variable/constant/alias) `= ` with `={{.*}} `, or inserts appropriate `{{.*}} `.
This patch resumes the work of D16586.
According to the AAPCS, volatile bit-fields should
be accessed using containers of the widht of their
declarative type. In such case:
```
struct S1 {
short a : 1;
}
```
should be accessed using load and stores of the width
(sizeof(short)), where now the compiler does only load
the minimum required width (char in this case).
However, as discussed in D16586,
that could overwrite non-volatile bit-fields, which
conflicted with C and C++ object models by creating
data race conditions that are not part of the bit-field,
e.g.
```
struct S2 {
short a;
int b : 16;
}
```
Accessing `S2.b` would also access `S2.a`.
The AAPCS Release 2020Q2
(https://documentation-service.arm.com/static/5efb7fbedbdee951c1ccf186?token=)
section 8.1 Data Types, page 36, "Volatile bit-fields -
preserving number and width of container accesses" has been
updated to avoid conflict with the C++ Memory Model.
Now it reads in the note:
```
This ABI does not place any restrictions on the access widths of bit-fields where the container
overlaps with a non-bit-field member or where the container overlaps with any zero length bit-field
placed between two other bit-fields. This is because the C/C++ memory model defines these as being
separate memory locations, which can be accessed by two threads simultaneously. For this reason,
compilers must be permitted to use a narrower memory access width (including splitting the access into
multiple instructions) to avoid writing to a different memory location. For example, in
struct S { int a:24; char b; }; a write to a must not also write to the location occupied by b, this requires at least two
memory accesses in all current Arm architectures. In the same way, in struct S { int a:24; int:0; int b:8; };,
writes to a or b must not overwrite each other.
```
I've updated the patch D16586 to follow such behavior by verifying that we
only change volatile bit-field access when:
- it won't overlap with any other non-bit-field member
- we only access memory inside the bounds of the record
- avoid overlapping zero-length bit-fields.
Regarding the number of memory accesses, that should be preserved, that will
be implemented by D67399.
Reviewed By: ostannard
Differential Revision: https://reviews.llvm.org/D72932
This patch resumes the work of D16586.
According to the AAPCS, volatile bit-fields should
be accessed using containers of the widht of their
declarative type. In such case:
```
struct S1 {
short a : 1;
}
```
should be accessed using load and stores of the width
(sizeof(short)), where now the compiler does only load
the minimum required width (char in this case).
However, as discussed in D16586,
that could overwrite non-volatile bit-fields, which
conflicted with C and C++ object models by creating
data race conditions that are not part of the bit-field,
e.g.
```
struct S2 {
short a;
int b : 16;
}
```
Accessing `S2.b` would also access `S2.a`.
The AAPCS Release 2020Q2
(https://documentation-service.arm.com/static/5efb7fbedbdee951c1ccf186?token=)
section 8.1 Data Types, page 36, "Volatile bit-fields -
preserving number and width of container accesses" has been
updated to avoid conflict with the C++ Memory Model.
Now it reads in the note:
```
This ABI does not place any restrictions on the access widths of bit-fields where the container
overlaps with a non-bit-field member or where the container overlaps with any zero length bit-field
placed between two other bit-fields. This is because the C/C++ memory model defines these as being
separate memory locations, which can be accessed by two threads simultaneously. For this reason,
compilers must be permitted to use a narrower memory access width (including splitting the access into
multiple instructions) to avoid writing to a different memory location. For example, in
struct S { int a:24; char b; }; a write to a must not also write to the location occupied by b, this requires at least two
memory accesses in all current Arm architectures. In the same way, in struct S { int a:24; int:0; int b:8; };,
writes to a or b must not overwrite each other.
```
Patch D16586 was updated to follow such behavior by verifying that we
only change volatile bit-field access when:
- it won't overlap with any other non-bit-field member
- we only access memory inside the bounds of the record
- avoid overlapping zero-length bit-fields.
Regarding the number of memory accesses, that should be preserved, that will
be implemented by D67399.
Differential Revision: https://reviews.llvm.org/D72932
The following people contributed to this patch:
- Diogo Sampaio
- Ties Stuij
Summary:
Clang -fpic defaults to -fno-semantic-interposition (GCC -fpic defaults
to -fsemantic-interposition).
Users need to specify -fsemantic-interposition to get semantic
interposition behavior.
Semantic interposition is currently a best-effort feature. There may
still be some cases where it is not handled well.
Reviewers: peter.smith, rnk, serge-sans-paille, sfertile, jfb, jdoerfert
Subscribers: dschuff, jyknight, dylanmckay, nemanjai, jvesely, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, sabuasal, niosHD, jrtc27, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, arphaman, PkmX, jocewei, jsji, Jim, lenary, s.egerton, pzheng, sameer.abuasal, apazos, luismarques, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D73865
tools/clang/test/CodeGen/packed-nest-unpacked.c contains this test:
struct XBitfield {
unsigned b1 : 10;
unsigned b2 : 12;
unsigned b3 : 10;
};
struct YBitfield {
char x;
struct XBitfield y;
} __attribute((packed));
struct YBitfield gbitfield;
unsigned test7() {
// CHECK: @test7
// CHECK: load i32, i32* getelementptr inbounds (%struct.YBitfield, %struct.YBitfield* @gbitfield, i32 0, i32 1, i32 0), align 4
return gbitfield.y.b2;
}
The "align 4" is actually wrong. Accessing all of "gbitfield.y" as a single
i32 is of course possible, but that still doesn't make it 4-byte aligned as
it remains packed at offset 1 in the surrounding gbitfield object.
This alignment was changed by commit r169489, which also introduced changes
to bitfield access code in CGExpr.cpp. Code before that change used to take
into account *both* the alignment of the field to be accessed within the
current struct, *and* the alignment of that outer struct itself; this logic
was removed by the above commit.
Neglecting to consider both values can cause incorrect code to be generated
(I've seen an unaligned access crash on SystemZ due to this bug).
In order to always use the best known alignment value, this patch removes
the CGBitFieldInfo::StorageAlignment member and replaces it with a
StorageOffset member specifying the offset from the start of the surrounding
struct to the bitfield's underlying storage. This offset can then be combined
with the best-known alignment for a bitfield access lvalue to determine the
alignment to use when accessing the bitfield's storage.
Differential Revision: http://reviews.llvm.org/D11034
llvm-svn: 241916
CGRecordLayoutBuilder was aging, complex, multi-pass, and shows signs of
existing before ASTRecordLayoutBuilder. It redundantly performed many
layout operations that are now performed by ASTRecordLayoutBuilder and
asserted that the results were the same. With the addition of support
for the MS-ABI, such as placement of vbptrs, vtordisps, different
bitfield layout and a variety of other features, CGRecordLayoutBuilder
was growing unwieldy in its redundancy.
This patch re-architects CGRecordLayoutBuilder to not perform any
redundant layout but rather, as directly as possible, lower an
ASTRecordLayout to an llvm::type. The new architecture is significantly
smaller and simpler than the CGRecordLayoutBuilder and contains fewer
ABI-specific code paths. It's also one pass.
The architecture of the new system is described in the comments. For the
most part, the new system simply takes all of the fields and bases from
an ASTRecordLayout, sorts them, inserts padding and dumps a record.
Bitfields, unions and primary virtual bases make this process a bit more
complicated. See the inline comments.
In addition, this patch updates a few lit tests due to the fact that the
new system computes more accurate llvm types than CGRecordLayoutBuilder.
Each change is commented individually in the review.
Differential Revision: http://llvm-reviews.chandlerc.com/D2795
llvm-svn: 201907
Indents were given the color blue when outputting with color.
AST dumping now looks like this:
Node
|-Node
| `-Node
`-Node
`-Node
Compared to the previous:
(Node
(Node
(Node))
(Node
(Node)))
llvm-svn: 174022
generally support the C++11 memory model requirements for bitfield
accesses by relying more heavily on LLVM's memory model.
The primary change this introduces is to move from a manually aligned
and strided access pattern across the bits of the bitfield to a much
simpler lump access of all bits in the bitfield followed by math to
extract the bits relevant for the particular field.
This simplifies the code significantly, but relies on LLVM to
intelligently lowering these integers.
I have tested LLVM's lowering both synthetically and in benchmarks. The
lowering appears to be functional, and there are no really significant
performance regressions. Different code patterns accessing bitfields
will vary in how this impacts them. The only real regressions I'm seeing
are a few patterns where the LLVM code generation for loads that feed
directly into a mask operation don't take advantage of the x86 ability
to do a smaller load and a cheap zero-extension. This doesn't regress
any benchmark in the nightly test suite on my box past the noise
threshold, but my box is quite noisy. I'll be watching the LNT numbers,
and will look into further improvements to the LLVM lowering as needed.
llvm-svn: 169489
Make CGT defer to the ABI on all member pointer types.
This requires giving CGT a handle to the ABI.
It's way easier to make that work if we avoid lazily creating the ABI.
Make it so.
llvm-svn: 111786
- This fixes some pedantic bugs with packed structures, as well as major problems with -fno-bitfield-type-align.
- Fixes PR5591, PR5567, and all known -fno-bitfield-type-align issues.
- Review appreciated.
llvm-svn: 102045
- Sadly, this doesn't seem to give any .ll size win so far. It is possible to make this routine significantly smarter & avoid various shifting, masking, and zext/sext, but I'm not really convinced it is worth it. It is tricky, and this is really instcombine's job.
- No intended functionality change; the test case is just to increase coverage & serves as a demo file, it worked before this commit.
The new fixes from r101222 are:
1. The shift to the target position needs to occur after the value is extended to the correct size. This broke Clang bootstrap, among other things no doubt.
2. Swap the order of arguments to OR, to get a tad more constant folding.
llvm-svn: 101339
- Sadly, this doesn't seem to give any .ll size win so far. It is possible to make this routine significantly smarter & avoid various shifting, masking, and zext/sext, but I'm not really convinced it is worth it. It is tricky, and this is really instcombine's job.
- No intended functionality change; the test case is just to increase coverage & serves as a demo file, it worked before this commit.
llvm-svn: 101222