forked from OSchip/llvm-project
![]() To set up a tail-predicated loop, we need to to calculate the number of elements processed by the loop. We can now use intrinsic @llvm.get.active.lane.mask() to do this, which is emitted by the vectoriser in D79100. This intrinsic generates a predicate for the masked loads/stores, and consumes the Backedge Taken Count (BTC) as its second argument. We can now use that to reconstruct the loop tripcount, instead of the IR pattern match approach we were using before. Many thanks to Eli Friedman and Sam Parker for all their help with this work. This also adds overflow checks for the different, new expressions that we create: the loop tripcount, and the sub expression that calculates the remaining elements to be processed. For the latter, SCEV is not able to calculate precise enough bounds, so we work around that at the moment, but is not entirely correct yet, it's conservative. The overflow checks can be overruled with a force flag, which is thus potentially unsafe (but not really because the vectoriser is the only place where this intrinsic is emitted at the moment). It's also good to mention that the tail-predication pass is not yet enabled by default. We will follow up to see if we can implement these overflow checks better, either by a change in SCEV or we may want revise the definition of llvm.get.active.lane.mask. Differential Revision: https://reviews.llvm.org/D79175 |
||
---|---|---|
.. | ||
Analysis | ||
AsmParser | ||
BinaryFormat | ||
Bitcode | ||
Bitstream | ||
CodeGen | ||
DWARFLinker | ||
DebugInfo | ||
Demangle | ||
ExecutionEngine | ||
Extensions | ||
Frontend | ||
FuzzMutate | ||
Fuzzer | ||
IR | ||
IRReader | ||
LTO | ||
LineEditor | ||
Linker | ||
MC | ||
MCA | ||
Object | ||
ObjectYAML | ||
Option | ||
Passes | ||
ProfileData | ||
Remarks | ||
Support | ||
TableGen | ||
Target | ||
Testing | ||
TextAPI | ||
ToolDrivers | ||
Transforms | ||
WindowsManifest | ||
XRay | ||
CMakeLists.txt | ||
LLVMBuild.txt |