llvm-project/clang/lib/CodeGen
Ties Stuij 208987844f [ARM] Follow AACPS standard for volatile bit-fields access width
This patch resumes the work of D16586.
According to the AAPCS, volatile bit-fields should
be accessed using containers of the width of their
declared type. For example, the bit-field in:
```
struct S1 {
  short a : 1;
};
```
should be accessed using loads and stores of the declared width
(sizeof(short)), whereas the compiler currently loads only
the minimum required width (char in this case).
However, as discussed in D16586,
that could overwrite non-volatile bit-fields, which
conflicts with the C and C++ object models by creating
data races on memory that is not part of the bit-field,
e.g.
```
struct S2 {
  short a;
  int  b : 16;
};
```
Accessing `S2.b` would also access `S2.a`.
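
The overlap can be checked with plain byte ranges. The sketch below is illustrative only: `ranges_overlap` and `int_container_touches_a` are hypothetical helper names, and it assumes the int-wide container of `b` starts at offset 0 of the record, as it would when the container is aligned to `int`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct S2 {
  short a;
  int b : 16;
};

/* True when byte ranges [s1, s1 + n1) and [s2, s2 + n2) share a byte. */
bool ranges_overlap(size_t s1, size_t n1, size_t s2, size_t n2) {
  return s1 < s2 + n2 && s2 < s1 + n1;
}

/* Assumption: the int-wide container of b starts at offset 0 of S2.
   Such an access covers the bytes of a as well. */
bool int_container_touches_a(void) {
  return ranges_overlap(0, sizeof(int),
                        offsetof(struct S2, a), sizeof(short));
}
```

Because `a` sits at offset 0, any int-wide access starting there necessarily covers it, which is the conflict described above.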

The AAPCS Release 2020Q2
(https://documentation-service.arm.com/static/5efb7fbedbdee951c1ccf186?token=)
section 8.1 Data Types, page 36, "Volatile bit-fields -
preserving number and width of container accesses" has been
updated to avoid conflict with the C++ Memory Model.
Now it reads in the note:
```
This ABI does not place any restrictions on the access widths of bit-fields where the container
overlaps with a non-bit-field member or where the container overlaps with any zero length bit-field
placed between two other bit-fields. This is because the C/C++ memory model defines these as being
separate memory locations, which can be accessed by two threads simultaneously. For this reason,
compilers must be permitted to use a narrower memory access width (including splitting the access into
multiple instructions) to avoid writing to a different memory location. For example, in
struct S { int a:24; char b; }; a write to a must not also write to the location occupied by b, this requires at least two
memory accesses in all current Arm architectures. In the same way, in struct S { int a:24; int:0; int b:8; };,
writes to a or b must not overwrite each other.
```

I've updated the patch D16586 to follow this behavior by verifying that we
only change the volatile bit-field access width when the new access:
 - does not overlap any non-bit-field member
 - stays inside the bounds of the record
 - does not overlap a zero-length bit-field.
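
The three conditions can be sketched as a single predicate over byte ranges. This is not the code from the patch: `member_span` and `can_use_declared_width` are hypothetical names, and treating a zero-length bit-field as a barrier at a byte offset is an assumed simplification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of one record member, in bytes. */
struct member_span {
  size_t offset;     /* first byte the member occupies */
  size_t size;       /* bytes occupied; 0 marks a zero-length bit-field */
  bool is_bit_field;
};

/* Returns true when a container access of `width` bytes at `start`
   satisfies the three conditions listed above. */
bool can_use_declared_width(size_t start, size_t width, size_t record_size,
                            const struct member_span *members, size_t count) {
  size_t end = start + width;
  if (end > record_size)
    return false; /* would access memory outside the record */
  for (size_t i = 0; i < count; ++i) {
    const struct member_span *m = &members[i];
    if (m->is_bit_field && m->size == 0) {
      /* Assumed model: a zero-length bit-field is a barrier at m->offset
         that the container must not straddle. */
      if (start < m->offset && m->offset < end)
        return false;
    } else if (!m->is_bit_field) {
      if (start < m->offset + m->size && m->offset < end)
        return false; /* overlaps a non-bit-field member */
    }
  }
  return true;
}
```

Under these assumptions, a 2-byte access for `S1.a` in a 2-byte record is accepted, while a 4-byte access for `S2.b` is rejected because it overlaps `S2.a`.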

The number of memory accesses should also be preserved; that will
be implemented by D67399.

Reviewed By: ostannard

Differential Revision: https://reviews.llvm.org/D72932
2020-10-13 10:31:48 +01:00
ABIInfo.h [ABI][NFC] Fix the confusion of ByVal and ByRef argument names 2020-08-06 15:20:18 +03:00
Address.h
BackendUtil.cpp [HWAsan][NewPM] Handle hwasan like other sanitizers 2020-10-08 14:43:21 -07:00
CGAtomic.cpp [SVE] Remove calls to VectorType::getNumElements from clang 2020-08-26 11:12:26 -07:00
CGBlocks.cpp CGBlocks.cpp - assert non-null CGF pointer. NFCI. 2020-09-16 12:30:24 +01:00
CGBlocks.h [CodeGen] Simplify the way lifetime of block captures is extended 2020-06-11 16:06:22 -07:00
CGBuilder.h Reapply "[IRBuilder] Virtualize IRBuilder" 2020-02-17 19:04:11 +01:00
CGBuiltin.cpp [X86] Convert integer _mm_reduce_* intrinsics to emit llvm.reduction intrinsics (PR47506) 2020-10-13 09:28:39 +01:00
CGCUDANV.cpp [HIP] Align device binary 2020-10-02 18:10:44 -04:00
CGCUDARuntime.cpp
CGCUDARuntime.h Fix GCC warning on enum class bitfield. NFC. 2020-03-28 10:20:34 -04:00
CGCXX.cpp [Alignment][NFC] Use Align with CreateAlignedLoad 2020-01-27 10:58:36 +01:00
CGCXXABI.cpp Fix build error 2020-07-10 17:40:37 -07:00
CGCXXABI.h [CodeGen] Add public function to emit C++ destructor call. 2020-07-01 11:01:23 -07:00
CGCall.cpp [clang][opencl][codegen] Remove the insertion of `correctly-rounded-divide-sqrt-fp-math` fn-attr. 2020-10-01 11:07:39 -04:00
CGCall.h [CodeGen] Emit destructor calls to destruct non-trivial C struct objects 2020-03-20 18:34:22 -07:00
CGClass.cpp [clang/llvm] As part of using inclusive language within 2020-06-20 16:03:58 -07:00
CGCleanup.cpp [CodeGen] Simplify the way lifetime of block captures is extended 2020-06-11 16:06:22 -07:00
CGCleanup.h Remove clang::Codegen::EHPadEndScope as unused 2020-06-23 15:18:49 -07:00
CGCoroutine.cpp [Coroutines] Do not evaluate InitListExpr of a co_return 2020-03-16 12:42:44 +08:00
CGDebugInfo.cpp Re-land [DebugInfo] Add debug location to stubs generated by CGDeclCXX and mark them as artificial 2020-10-08 20:49:17 -04:00
CGDebugInfo.h [Clang] implement -fno-eliminate-unused-debug-types 2020-08-10 15:08:48 -07:00
CGDecl.cpp [CodeGen] Make sure the EH cleanup for block captures is conditional when the block literal is in a conditional context 2020-08-31 10:12:17 -04:00
CGDeclCXX.cpp Re-land [DebugInfo] Add debug location to stubs generated by CGDeclCXX and mark them as artificial 2020-10-08 20:49:17 -04:00
CGException.cpp [Windows SEH] Fix the frame-ptr of a nested-filter within a _finally 2020-07-12 01:37:56 -07:00
CGExpr.cpp [ARM] Follow AACPS standard for volatile bit-fields access width 2020-10-13 10:31:48 +01:00
CGExprAgg.cpp attempt to fix failing buildbots after 3bab88b7ba 2020-06-15 12:58:37 +02:00
CGExprCXX.cpp [FE] Use preferred alignment instead of ABI alignment for complete object when applicable 2020-09-30 10:48:28 -04:00
CGExprComplex.cpp [clang][NFC] Store a pointer to the ASTContext in ASTDumper and TextNodeDumper 2020-07-03 13:59:22 +01:00
CGExprConstant.cpp Canonicalize declaration pointers when forming APValues. 2020-10-12 19:32:57 -07:00
CGExprScalar.cpp [PowerPC] Implement the 128-bit vec_[all|any]_[eq | ne | lt | gt | le | ge] builtins in Clang/LLVM 2020-09-23 16:49:40 -04:00
CGGPUBuiltin.cpp [Alignment][NFC] Use Align with CreateAlignedStore 2020-01-23 17:34:32 +01:00
CGLoopInfo.cpp [Clang] Add llvm.loop.unroll.disable to loops with -fno-unroll-loops. 2020-04-07 14:01:55 +01:00
CGLoopInfo.h [Clang] Add llvm.loop.unroll.disable to loops with -fno-unroll-loops. 2020-04-07 14:01:55 +01:00
CGNonTrivialStruct.cpp [NFC] Silence compiler warning [-Wmissing-braces]. 2020-06-17 13:01:53 -07:00
CGObjC.cpp [clang] Implement objc_non_runtime_protocol to remove protocol metadata 2020-10-02 17:35:50 -04:00
CGObjCGNU.cpp [clang] Implement objc_non_runtime_protocol to remove protocol metadata 2020-10-02 17:35:50 -04:00
CGObjCMac.cpp [clang] Implement objc_non_runtime_protocol to remove protocol metadata 2020-10-02 17:35:50 -04:00
CGObjCRuntime.cpp Fix a variety of minor issues with ObjC method mangling: 2020-09-29 19:51:53 -04:00
CGObjCRuntime.h [clang] Implement objc_non_runtime_protocol to remove protocol metadata 2020-10-02 17:35:50 -04:00
CGOpenCLRuntime.cpp Fix "pointer is null" static analyzer warning. NFCI. 2020-01-08 17:19:08 +00:00
CGOpenCLRuntime.h
CGOpenMPRuntime.cpp [Clang][OpenMP] Added support for nowait target in CodeGen via regular task 2020-09-25 22:10:36 -04:00
CGOpenMPRuntime.h [OpenMP] Replace OpenMP RTL Functions With OMPIRBuilder and OMPKinds.def 2020-10-08 14:00:22 -04:00
CGOpenMPRuntimeAMDGCN.cpp [OpenMP][AMDGCN] Support OpenMP offloading for AMDGCN architecture - Part 3 2020-08-03 05:38:39 +00:00
CGOpenMPRuntimeAMDGCN.h [OpenMP][AMDGCN] Support OpenMP offloading for AMDGCN architecture - Part 3 2020-08-03 05:38:39 +00:00
CGOpenMPRuntimeGPU.cpp [AMDGPU] Add gfx602, gfx705, gfx805 targets 2020-10-10 17:22:22 +01:00
CGOpenMPRuntimeGPU.h [OpenMP][AMDGCN] Support OpenMP offloading for AMDGCN architecture - Part 3 2020-08-03 05:38:39 +00:00
CGOpenMPRuntimeNVPTX.cpp [OpenMP][AMDGCN] Support OpenMP offloading for AMDGCN architecture - Part 3 2020-08-03 05:38:39 +00:00
CGOpenMPRuntimeNVPTX.h [OpenMP][AMDGCN] Support OpenMP offloading for AMDGCN architecture - Part 3 2020-08-03 05:38:39 +00:00
CGRecordLayout.h [ARM] Follow AACPS standard for volatile bit-fields access width 2020-10-13 10:31:48 +01:00
CGRecordLayoutBuilder.cpp [ARM] Follow AACPS standard for volatile bit-fields access width 2020-10-13 10:31:48 +01:00
CGStmt.cpp [CodeGen] Improve likelihood branch weights 2020-10-04 14:24:27 +02:00
CGStmtOpenMP.cpp [OPENMP]Add support for allocate vars in untied tasks. 2020-09-15 13:39:14 -04:00
CGVTT.cpp
CGVTables.cpp [CodeGen] Store the return value of the target function call to the 2020-07-10 17:24:13 -07:00
CGVTables.h [clang] Frontend components for the relative vtables ABI (round 2) 2020-06-11 11:17:08 -07:00
CGValue.h [Matrix] Implement matrix index expressions ([][]). 2020-06-01 20:08:49 +01:00
CMakeLists.txt Remove dependency on clangASTMatchers. 2020-09-10 22:17:48 -04:00
CodeGenABITypes.cpp [CodeGen] Add public function to emit C++ destructor call. 2020-07-01 11:01:23 -07:00
CodeGenAction.cpp [ThinLTO] Option to bypass function importing. 2020-09-22 13:12:11 -07:00
CodeGenFunction.cpp [CodeGen] Improve likelihood branch weights 2020-10-04 14:24:27 +02:00
CodeGenFunction.h [CodeGen] Improve likelihood branch weights 2020-10-04 14:24:27 +02:00
CodeGenModule.cpp [CUDA] Don't call __cudaRegisterVariable on C++17 inline variables 2020-10-05 12:53:59 -07:00
CodeGenModule.h [OpenMP] Replace OpenMP RTL Functions With OMPIRBuilder and OMPKinds.def 2020-10-08 14:00:22 -04:00
CodeGenPGO.cpp [PGO][CUDA][HIP] Skip generating profile on the device stub and wrong-side functions. 2020-08-10 11:01:46 -04:00
CodeGenPGO.h [CodeGenPGO] Fix shadow variable warning. NFC. 2020-03-02 15:06:34 +00:00
CodeGenTBAA.cpp Reland Implement _ExtInt as an extended int type specifier. 2020-04-17 10:45:48 -07:00
CodeGenTBAA.h
CodeGenTypeCache.h [ARM] Add __bf16 as new Bfloat16 C Type 2020-06-05 10:32:43 +01:00
CodeGenTypes.cpp [SVE] Make ElementCount members private 2020-08-28 14:43:53 +01:00
CodeGenTypes.h CodeGenTypes::CGRecordLayouts: Use unique_ptr to simplify memory management 2020-04-28 22:31:16 -07:00
ConstantEmitter.h attempt to fix failing buildbots after 3bab88b7ba 2020-06-15 12:58:37 +02:00
ConstantInitBuilder.cpp Fix ConstantAggregateBuilderBase::getRelativeOffset 2020-06-15 12:23:20 -07:00
CoverageMappingGen.cpp [Coverage] Add empty line regions to SkippedRegions 2020-09-21 12:42:53 -07:00
CoverageMappingGen.h [Coverage] Add empty line regions to SkippedRegions 2020-09-21 12:42:53 -07:00
EHScopeStack.h [CodeGen] Simplify the way lifetime of block captures is extended 2020-06-11 16:06:22 -07:00
ItaniumCXXABI.cpp [FE] Use preferred alignment instead of ABI alignment for complete object when applicable 2020-09-30 10:48:28 -04:00
MacroPPCallbacks.cpp
MacroPPCallbacks.h
MicrosoftCXXABI.cpp [MS] For unknown ISAs, pass non-trivially copyable arguments indirectly 2020-09-24 16:29:48 -07:00
ModuleBuilder.cpp reland "[DebugInfo] Support to emit debugInfo for extern variables" 2019-12-22 18:28:50 -08:00
ObjectFilePCHContainerOperations.cpp Reland "Correctly emit dwoIDs after ASTFileSignature refactoring (D81347)" 2020-08-24 14:52:53 +02:00
PatternInit.cpp Clean up usages of asserting vector getters in Type 2020-04-13 13:01:40 -07:00
PatternInit.h
README.txt
SanitizerMetadata.cpp [Analysis/Transforms/Sanitizers] As part of using inclusive language 2020-06-20 00:42:26 -07:00
SanitizerMetadata.h [Analysis/Transforms/Sanitizers] As part of using inclusive language 2020-06-20 00:42:26 -07:00
SwiftCallingConv.cpp Teach the swift calling convention about _Atomic types 2020-08-31 07:07:25 -07:00
TargetInfo.cpp [X86] Passing union type through register 2020-10-09 11:24:29 +08:00
TargetInfo.h [CodeGen][ObjC] Mark calls to objc_unsafeClaimAutoreleasedReturnValue as 2020-08-03 13:25:25 -07:00
VarBypassDetector.cpp
VarBypassDetector.h

README.txt

IRgen optimization opportunities.

//===---------------------------------------------------------------------===//

The common pattern of
--
short x; // or char, etc
(x == 10)
--
generates a zext/sext of x which can easily be avoided.
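
As an illustration, the sketch below shows the pattern next to a comparison that only looks at the low 16 bits. The function names are hypothetical, the IR in the comments is approximate rather than captured compiler output, and a 16-bit short is assumed.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* The pattern from the text. C promotes x to int before comparing, so
   unoptimized IR is expected to contain roughly:
     %conv = sext i16 %x to i32
     %cmp  = icmp eq i32 %conv, 10 */
bool is_ten(short x) { return x == 10; }

/* Compares only the low 16 bits, mimicking `icmp eq i16 %x, 10`.
   Assumes short is 16 bits wide. */
bool is_ten_low16(short x) { return ((unsigned)x & 0xFFFFu) == 10u; }

/* Checks that both forms agree for every value of short. */
bool forms_agree(void) {
  for (int v = SHRT_MIN; v <= SHRT_MAX; ++v)
    if (is_ten((short)v) != is_ten_low16((short)v))
      return false;
  return true;
}
```

Since the constant fits in 16 bits, the extension adds nothing to the result, which is why the narrower comparison would be sufficient.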

//===---------------------------------------------------------------------===//

Bitfield accesses can be shifted to simplify masking and sign
extension. For example, if the bitfield width is 8 and it is
appropriately aligned then it is a lot shorter to just load the char
directly.
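
A minimal sketch of the two access strategies, assuming an 8-bit field at bit offset 8 of a 32-bit container stored least significant byte first. Real bit-field layout is implementation-defined, so the container is modelled here as an explicit byte array, and both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Load the whole 32-bit container, then shift and mask out the field. */
uint8_t field_via_shift_mask(const uint8_t bytes[4]) {
  uint32_t word = (uint32_t)bytes[0] | (uint32_t)bytes[1] << 8 |
                  (uint32_t)bytes[2] << 16 | (uint32_t)bytes[3] << 24;
  return (uint8_t)((word >> 8) & 0xFFu);
}

/* The field is byte-aligned and 8 bits wide, so one byte load suffices. */
uint8_t field_via_byte_load(const uint8_t bytes[4]) { return bytes[1]; }
```

Both functions return the same value for any input; the second simply skips the shift and the mask.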

//===---------------------------------------------------------------------===//

It may be worth avoiding creation of alloca's for formal arguments
for the common situation where the argument is never written to or has
its address taken. The idea would be to begin generating code by using
the argument directly and if its address is taken or it is stored to
then generate the alloca and patch up the existing code.
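
The three situations can be illustrated with small C functions. The function names are hypothetical and the IR in the first comment is an approximation of typical -O0 output, not captured from a compiler run.

```c
#include <assert.h>

/* x is never written and its address is never taken, so the argument
   value could be used directly. At -O0 the output is still expected to
   contain roughly:
     %x.addr = alloca i32
     store i32 %x, ptr %x.addr
     %0 = load i32, ptr %x.addr */
int twice(int x) { return x + x; }

/* x is stored to, so a stack slot is needed before optimization. */
int clamp_to_zero(int x) {
  if (x < 0)
    x = 0;
  return x;
}

/* The address of x is passed to a callee, so x must live in memory. */
void increment(int *p) { ++*p; }

int next(int x) {
  increment(&x);
  return x;
}
```

Only `twice` matches the common situation described above; the other two would trigger the fallback of generating the alloca and patching up the code already emitted.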

In theory, the same optimization could be a win for block local
variables as long as the declaration dominates all statements in the
block.

NOTE: The main case we care about this for is for -O0 -g compile time
performance, and in that scenario we will need to emit the alloca
anyway currently to emit proper debug info. So this is blocked by
being able to emit debug information which refers to an LLVM
temporary, not an alloca.

//===---------------------------------------------------------------------===//

We should try and avoid generating basic blocks which only contain
jumps. At -O0, this penalizes us all the way from IRgen (malloc &
instruction overhead), all the way down through code generation and
assembly time.

On 176.gcc:expr.ll, it looks like over 12% of basic blocks are just
direct branches!

//===---------------------------------------------------------------------===//