[ARM] Improve WLS lowering
Recently we improved the lowering of low overhead loops and tail
predicated loops, but concentrated first on the DLS do style loops. This
extends those improvements over to the WLS while loops, improving the
chance of lowering them successfully. To do this the lowering has to
change a little as the instructions are terminators that produce a value
- something that needs to be treated carefully.

Lowering starts at the Hardware Loop pass, inserting a new
llvm.test.start.loop.iterations that produces both an i1 to control the
loop entry and an i32 similar to the llvm.start.loop.iterations
intrinsic added for do loops. This feeds into the loop phi, properly
gluing the values together:

    %wls = call { i32, i1 } @llvm.test.start.loop.iterations.i32(i32 %div)
    %wls0 = extractvalue { i32, i1 } %wls, 0
    %wls1 = extractvalue { i32, i1 } %wls, 1
    br i1 %wls1, label %loop.ph, label %loop.exit
  ...
  loop:
    %lsr.iv = phi i32 [ %wls0, %loop.ph ], [ %iv.next, %loop ]
    ..
    %iv.next = call i32 @llvm.loop.decrement.reg.i32(i32 %lsr.iv, i32 1)
    %cmp = icmp ne i32 %iv.next, 0
    br i1 %cmp, label %loop, label %loop.exit

The llvm.test.start.loop.iterations needs to be lowered through ISel
lowering as a pair of WLSSETUP and WLS nodes, which get converted to
t2WhileLoopSetup and t2WhileLoopStart pseudos respectively. This helps
prevent t2WhileLoopStart from being a terminator that produces a value,
something difficult to control at that stage in the pipeline. Instead
the t2WhileLoopSetup produces the value of LR (essentially acting as a
lr = subs rn, 0), and t2WhileLoopStart consumes that lr value (the Bcc).

These are then converted into a single t2WhileLoopStartLR at the same
point as t2DoLoopStartTP and t2LoopEndDec. Otherwise we revert the loop
to prevent them from progressing further in the pipeline. The
t2WhileLoopStartLR is a single instruction that takes a GPR and produces
LR, similar to the WLS instruction.

    %1:gprlr = t2WhileLoopStartLR %0:rgpr, %bb.3
    t2B %bb.1
  ...
  bb.2.loop:
    %2:gprlr = PHI %1:gprlr, %bb.1, %3:gprlr, %bb.2
    ...
    %3:gprlr = t2LoopEndDec %2:gprlr, %bb.2
    t2B %bb.3

The t2WhileLoopStartLR can then be treated similarly to the other low
overhead loop pseudos, eventually being lowered to a WLS providing the
branches are within range.

Differential Revision: https://reviews.llvm.org/D97729
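The control flow of the IR in the commit message can be modelled in plain C++. This is only a minimal sketch and not LLVM code: `test_start_loop_iterations`, `loop_decrement_reg` and `run_while_loop` are hypothetical helpers that mirror the behaviour described for the intrinsics (the first returns its operand together with an "operand is not zero" flag).

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Hypothetical stand-in for llvm.test.start.loop.iterations.i32:
// returns the unchanged trip count and a flag that is true when it is non-zero.
std::pair<uint32_t, bool> test_start_loop_iterations(uint32_t count) {
  return std::make_pair(count, count != 0);
}

// Hypothetical stand-in for llvm.loop.decrement.reg.i32.
uint32_t loop_decrement_reg(uint32_t counter, uint32_t step) {
  assert(counter >= step && "counter must not wrap in this model");
  return counter - step;
}

// Walks the while-loop shape from the commit message and reports how many
// times the loop body ran.
uint32_t run_while_loop(uint32_t trip_count) {
  uint32_t body_executions = 0;
  std::pair<uint32_t, bool> wls = test_start_loop_iterations(trip_count);
  if (!wls.second)
    return body_executions;         // branch to %loop.exit, loop is skipped
  uint32_t iv = wls.first;          // phi value coming from %loop.ph
  do {
    ++body_executions;
    iv = loop_decrement_reg(iv, 1); // %iv.next
  } while (iv != 0);                // icmp ne, then branch back to %loop
  return body_executions;
}
```

In this model a trip count of zero never enters the body, which is the case the i1 result guards against; any other trip count runs the body exactly that many times.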
parent b68bae6a94
commit fad70c3068
@@ -15965,6 +15965,46 @@ set up the hardware-loop count with a target specific instruction, usually a
 move of this value to a special register or a hardware-loop instruction.
 The result is the conditional value of whether the given count is not zero.

+
+'``llvm.test.start.loop.iterations.*``' Intrinsic
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Syntax:
+"""""""
+
+This is an overloaded intrinsic.
+
+::
+
+      declare {i32, i1} @llvm.test.start.loop.iterations.i32(i32)
+      declare {i64, i1} @llvm.test.start.loop.iterations.i64(i64)
+
+Overview:
+"""""""""
+
+The '``llvm.test.start.loop.iterations.*``' intrinsics are similar to the
+'``llvm.test.set.loop.iterations.*``' and '``llvm.start.loop.iterations.*``'
+intrinsics, used to specify the hardware-loop trip count, but also produce a
+value identical to the input that can be used as the input to the loop. The
+second i1 output controls entry to a while-loop.
+
+Arguments:
+""""""""""
+
+The integer operand is the loop trip count of the hardware-loop, and thus
+not e.g. the loop back-edge taken count.
+
+Semantics:
+""""""""""
+
+The '``llvm.test.start.loop.iterations.*``' intrinsics do not perform any
+arithmetic on their operand. It's a hint to the backend that can use this to
+set up the hardware-loop count with a target specific instruction, usually a
+move of this value to a special register or a hardware-loop instruction.
+The result is a pair of the input and a conditional value of whether the
+given count is not zero.
+
+
 '``llvm.loop.decrement.reg.*``' Intrinsic
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

@@ -1601,6 +1601,12 @@ def int_start_loop_iterations :
 def int_test_set_loop_iterations :
   DefaultAttrsIntrinsic<[llvm_i1_ty], [llvm_anyint_ty], [IntrNoDuplicate]>;

+// Same as the above, but produces an extra value (the same as the input
+// operand) to be fed into the loop.
+def int_test_start_loop_iterations :
+  DefaultAttrsIntrinsic<[llvm_anyint_ty, llvm_i1_ty], [LLVMMatchType<0>],
+                        [IntrNoDuplicate]>;
+
 // Decrement loop counter by the given argument. Return false if the loop
 // should exit.
 def int_loop_decrement :
@@ -434,25 +434,32 @@ Value* HardwareLoop::InsertIterationSetup(Value *LoopCountInit) {
   IRBuilder<> Builder(BeginBB->getTerminator());
   Type *Ty = LoopCountInit->getType();
   bool UsePhi = UsePHICounter || ForceHardwareLoopPHI;
-  Intrinsic::ID ID = UseLoopGuard ? Intrinsic::test_set_loop_iterations
-                                  : (UsePhi ? Intrinsic::start_loop_iterations
-                                            : Intrinsic::set_loop_iterations);
+  Intrinsic::ID ID = UseLoopGuard
+                         ? (UsePhi ? Intrinsic::test_start_loop_iterations
+                                   : Intrinsic::test_set_loop_iterations)
+                         : (UsePhi ? Intrinsic::start_loop_iterations
+                                   : Intrinsic::set_loop_iterations);
   Function *LoopIter = Intrinsic::getDeclaration(M, ID, Ty);
-  Value *SetCount = Builder.CreateCall(LoopIter, LoopCountInit);
+  Value *LoopSetup = Builder.CreateCall(LoopIter, LoopCountInit);

   // Use the return value of the intrinsic to control the entry of the loop.
   if (UseLoopGuard) {
     assert((isa<BranchInst>(BeginBB->getTerminator()) &&
             cast<BranchInst>(BeginBB->getTerminator())->isConditional()) &&
            "Expected conditional branch");
+
+    Value *SetCount =
+        UsePhi ? Builder.CreateExtractValue(LoopSetup, 1) : LoopSetup;
     auto *LoopGuard = cast<BranchInst>(BeginBB->getTerminator());
     LoopGuard->setCondition(SetCount);
     if (LoopGuard->getSuccessor(0) != L->getLoopPreheader())
       LoopGuard->swapSuccessors();
   }
-  LLVM_DEBUG(dbgs() << "HWLoops: Inserted loop counter: "
-             << *SetCount << "\n");
-  return UseLoopGuard ? LoopCountInit : SetCount;
+  LLVM_DEBUG(dbgs() << "HWLoops: Inserted loop counter: " << *LoopSetup
+                    << "\n");
+  if (UsePhi && UseLoopGuard)
+    LoopSetup = Builder.CreateExtractValue(LoopSetup, 0);
+  return !UsePhi ? LoopCountInit : LoopSetup;
 }

 void HardwareLoop::InsertLoopDec() {
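The new selection logic in `InsertIterationSetup` amounts to a two-by-two table over `UseLoopGuard` and `UsePhi`. The sketch below restates it as standalone C++ for readability. The enums and the two function names are hypothetical and exist only for this illustration; the branch structure is copied from the hunk above.

```cpp
#include <cassert>

// Hypothetical mirror of the four setup intrinsics named in the diff.
enum class SetupIntrinsic {
  set_loop_iterations,
  start_loop_iterations,
  test_set_loop_iterations,
  test_start_loop_iterations
};

// Which intrinsic is emitted for a given combination of flags.
SetupIntrinsic selectSetupIntrinsic(bool UseLoopGuard, bool UsePhi) {
  if (UseLoopGuard)
    return UsePhi ? SetupIntrinsic::test_start_loop_iterations
                  : SetupIntrinsic::test_set_loop_iterations;
  return UsePhi ? SetupIntrinsic::start_loop_iterations
                : SetupIntrinsic::set_loop_iterations;
}

// Hypothetical labels for the value the function returns as the loop counter.
enum class CounterSource {
  LoopCountInit,   // the original count, untouched
  CallResult,      // the intrinsic's single result
  CallResultField0 // element 0 of the {count, i1} pair
};

// Which value feeds the loop, following the final return in the hunk.
CounterSource selectCounterSource(bool UseLoopGuard, bool UsePhi) {
  if (!UsePhi)
    return CounterSource::LoopCountInit;
  return UseLoopGuard ? CounterSource::CallResultField0
                      : CounterSource::CallResult;
}
```

Only the guarded, phi-based combination yields the pair-returning intrinsic, which is why that is the one case where an extra `extractvalue` of element 0 is needed.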
@@ -6126,8 +6126,8 @@ ARMBaseInstrInfo::getOutliningType(MachineBasicBlock::iterator &MIT,
   // Be conservative with ARMv8.1 MVE instructions.
   if (Opc == ARM::t2BF_LabelPseudo || Opc == ARM::t2DoLoopStart ||
       Opc == ARM::t2DoLoopStartTP || Opc == ARM::t2WhileLoopStart ||
-      Opc == ARM::t2LoopDec || Opc == ARM::t2LoopEnd ||
-      Opc == ARM::t2LoopEndDec)
+      Opc == ARM::t2WhileLoopStartLR || Opc == ARM::t2LoopDec ||
+      Opc == ARM::t2LoopEnd || Opc == ARM::t2LoopEndDec)
     return outliner::InstrType::Illegal;

   const MCInstrDesc &MCID = MI.getDesc();
@@ -646,7 +646,8 @@ static inline bool isJumpTableBranchOpcode(int Opc) {

 static inline bool isLowOverheadTerminatorOpcode(int Opc) {
   return Opc == ARM::t2DoLoopStartTP || Opc == ARM::t2WhileLoopStart ||
-         Opc == ARM::t2LoopEnd || Opc == ARM::t2LoopEndDec;
+         Opc == ARM::t2WhileLoopStartLR || Opc == ARM::t2LoopEnd ||
+         Opc == ARM::t2LoopEndDec;
 }

 static inline
@@ -83,9 +83,9 @@ bool ARMBlockPlacement::runOnMachineFunction(MachineFunction &MF) {
       continue;

     for (auto &Terminator : Preheader->terminators()) {
-      if (Terminator.getOpcode() != ARM::t2WhileLoopStart)
+      if (Terminator.getOpcode() != ARM::t2WhileLoopStartLR)
         continue;
-      MachineBasicBlock *LoopExit = Terminator.getOperand(1).getMBB();
+      MachineBasicBlock *LoopExit = Terminator.getOperand(2).getMBB();
       // We don't want to move the function's entry block.
       if (!LoopExit->getPrevNode())
         continue;
@@ -99,7 +99,7 @@ bool ARMBlockPlacement::runOnMachineFunction(MachineFunction &MF) {
       // that were previously not backwards to become backwards
       bool CanMove = true;
       for (auto &LoopExitTerminator : LoopExit->terminators()) {
-        if (LoopExitTerminator.getOpcode() != ARM::t2WhileLoopStart)
+        if (LoopExitTerminator.getOpcode() != ARM::t2WhileLoopStartLR)
           continue;
         // An example loop structure where the LoopExit can't be moved, since
         // bb1's WLS will become backwards once it's moved after bb3 bb1: -
@@ -111,7 +111,7 @@ bool ARMBlockPlacement::runOnMachineFunction(MachineFunction &MF) {
         // WLS bb1
         // bb4: - Header
         MachineBasicBlock *LoopExit2 =
-            LoopExitTerminator.getOperand(1).getMBB();
+            LoopExitTerminator.getOperand(2).getMBB();
         // If the WLS from LoopExit to LoopExit2 is already backwards then
         // moving LoopExit won't affect it, so it can be moved. If LoopExit2 is
         // after the Preheader then moving will keep it as a forward branch, so
@@ -3778,13 +3778,26 @@ void ARMDAGToDAGISel::Select(SDNode *N) {
       return;
     // Other cases are autogenerated.
     break;
-  case ARMISD::WLS:
+  case ARMISD::WLSSETUP: {
+    SDNode *New = CurDAG->getMachineNode(ARM::t2WhileLoopSetup, dl, MVT::i32,
+                                         N->getOperand(0));
+    ReplaceUses(N, New);
+    CurDAG->RemoveDeadNode(N);
+    return;
+  }
+  case ARMISD::WLS: {
+    SDNode *New = CurDAG->getMachineNode(ARM::t2WhileLoopStart, dl, MVT::Other,
+                                         N->getOperand(1), N->getOperand(2),
+                                         N->getOperand(0));
+    ReplaceUses(N, New);
+    CurDAG->RemoveDeadNode(N);
+    return;
+  }
   case ARMISD::LE: {
     SDValue Ops[] = { N->getOperand(1),
                       N->getOperand(2),
                       N->getOperand(0) };
-    unsigned Opc = N->getOpcode() == ARMISD::WLS ?
-      ARM::t2WhileLoopStart : ARM::t2LoopEnd;
+    unsigned Opc = ARM::t2LoopEnd;
     SDNode *New = CurDAG->getMachineNode(Opc, dl, MVT::Other, Ops);
     ReplaceUses(N, New);
     CurDAG->RemoveDeadNode(N);
@@ -1806,6 +1806,7 @@ const char *ARMTargetLowering::getTargetNodeName(unsigned Opcode) const {
   case ARMISD::VST3LN_UPD: return "ARMISD::VST3LN_UPD";
   case ARMISD::VST4LN_UPD: return "ARMISD::VST4LN_UPD";
   case ARMISD::WLS: return "ARMISD::WLS";
+  case ARMISD::WLSSETUP: return "ARMISD::WLSSETUP";
   case ARMISD::LE: return "ARMISD::LE";
   case ARMISD::LOOP_DEC: return "ARMISD::LOOP_DEC";
   case ARMISD::CSINV: return "ARMISD::CSINV";
@@ -16193,7 +16194,7 @@ static SDValue SearchLoopIntrinsic(SDValue N, ISD::CondCode &CC, int &Imm,
   }
   case ISD::INTRINSIC_W_CHAIN: {
     unsigned IntOp = cast<ConstantSDNode>(N.getOperand(1))->getZExtValue();
-    if (IntOp != Intrinsic::test_set_loop_iterations &&
+    if (IntOp != Intrinsic::test_start_loop_iterations &&
         IntOp != Intrinsic::loop_decrement_reg)
       return SDValue();
     return N;
@@ -16208,7 +16209,7 @@ static SDValue PerformHWLoopCombine(SDNode *N,

   // The hwloop intrinsics that we're interested are used for control-flow,
   // either for entering or exiting the loop:
-  // - test.set.loop.iterations will test whether its operand is zero. If it
+  // - test.start.loop.iterations will test whether its operand is zero. If it
   //   is zero, the proceeding branch should not enter the loop.
   // - loop.decrement.reg also tests whether its operand is zero. If it is
   //   zero, the proceeding branch should not branch back to the beginning of
@@ -16283,21 +16284,25 @@ static SDValue PerformHWLoopCombine(SDNode *N,
     DAG.ReplaceAllUsesOfValueWith(SDValue(Br, 0), NewBr);
   };

-  if (IntOp == Intrinsic::test_set_loop_iterations) {
+  if (IntOp == Intrinsic::test_start_loop_iterations) {
     SDValue Res;
+    SDValue Setup = DAG.getNode(ARMISD::WLSSETUP, dl, MVT::i32, Elements);
     // We expect this 'instruction' to branch when the counter is zero.
     if (IsTrueIfZero(CC, Imm)) {
-      SDValue Ops[] = { Chain, Elements, Dest };
+      SDValue Ops[] = {Chain, Setup, Dest};
       Res = DAG.getNode(ARMISD::WLS, dl, MVT::Other, Ops);
     } else {
       // The logic is the reverse of what we need for WLS, so find the other
       // basic block target: the target of the proceeding br.
       UpdateUncondBr(Br, Dest, DAG);

-      SDValue Ops[] = { Chain, Elements, OtherTarget };
+      SDValue Ops[] = {Chain, Setup, OtherTarget};
       Res = DAG.getNode(ARMISD::WLS, dl, MVT::Other, Ops);
     }
-    DAG.ReplaceAllUsesOfValueWith(Int.getValue(1), Int.getOperand(0));
+    // Update LR count to the new value
+    DAG.ReplaceAllUsesOfValueWith(Int.getValue(0), Setup);
+    // Update chain
+    DAG.ReplaceAllUsesOfValueWith(Int.getValue(2), Int.getOperand(0));
     return Res;
   } else {
     SDValue Size = DAG.getTargetConstant(
@@ -130,7 +130,8 @@ class VectorType;
     WIN__CHKSTK, // Windows' __chkstk call to do stack probing.
     WIN__DBZCHK, // Windows' divide by zero check

-    WLS, // Low-overhead loops, While Loop Start
+    WLS, // Low-overhead loops, While Loop Start branch. See t2WhileLoopStart
+    WLSSETUP, // Setup for the iteration count of a WLS. See t2WhileLoopSetup.
     LOOP_DEC, // Really a part of LE, performs the sub
     LE, // Low-overhead loops, Loop End

@@ -5457,33 +5457,65 @@ def t2LE : t2LOL<(outs ), (ins lelabel_u11:$label), "le", "$label"> {

 let Predicates = [IsThumb2, HasV8_1MMainline, HasLOB] in {

+// t2DoLoopStart a pseudo for DLS hardware loops. Lowered into a DLS in
+// ARMLowOverheadLoops if possible, or reverted to a Mov if not.
 let usesCustomInserter = 1 in
 def t2DoLoopStart :
   t2PseudoInst<(outs GPRlr:$X), (ins rGPR:$elts), 4, IIC_Br,
   [(set GPRlr:$X, (int_start_loop_iterations rGPR:$elts))]>;

+// A pseudo for a DLSTP, created in the MVETPAndVPTOptimizationPass from a
+// t2DoLoopStart if the loops is tail predicated. Holds both the element
+// count and trip count of the loop, picking the correct one during
+// ARMLowOverheadLoops when it is converted to a DLSTP or DLS as required.
 let isTerminator = 1, hasSideEffects = 1 in
 def t2DoLoopStartTP :
   t2PseudoInst<(outs GPRlr:$X), (ins rGPR:$elts, rGPR:$count), 4, IIC_Br, []>;

+// Setup for a t2WhileLoopStart. A pair of t2WhileLoopSetup and t2WhileLoopStart
+// will be created post-ISel from a llvm.test.start.loop.iterations. This
+// t2WhileLoopSetup to setup LR and t2WhileLoopStart to perform the branch. Not
+// valid after reg alloc, as it should be lowered during MVETPAndVPTOptimisations
+// into a t2WhileLoopStartLR (or expanded).
+def t2WhileLoopSetup :
+  t2PseudoInst<(outs GPRlr:$lr), (ins rGPR:$elts), 4, IIC_Br, []>;
+
+// A pseudo to represent the decrement in a low overhead loop. A t2LoopDec and
+// t2LoopEnd together represent a LE instruction. Ideally these are converted
+// to a t2LoopEndDec which is lowered as a single instruction.
 let hasSideEffects = 0 in
 def t2LoopDec :
   t2PseudoInst<(outs GPRlr:$Rm), (ins GPRlr:$Rn, imm0_7:$size),
                4, IIC_Br, []>, Sched<[WriteBr]>;

 let isBranch = 1, isTerminator = 1, hasSideEffects = 1, Defs = [CPSR] in {
-// Set WhileLoopStart and LoopEnd to occupy 8 bytes because they may
-// get converted into t2CMP and t2Bcc.
+// The branch in a t2WhileLoopSetup/t2WhileLoopStart pair, eventually turned
+// into a t2WhileLoopStartLR that does both the LR setup and branch.
 def t2WhileLoopStart :
     t2PseudoInst<(outs),
+                 (ins GPRlr:$elts, brtarget:$target),
+                 4, IIC_Br, []>,
+                 Sched<[WriteBr]>;
+
+// WhileLoopStartLR that sets up LR and branches on zero, equivalent to WLS. It
+// is lowered in the ARMLowOverheadLoops pass providing the branches are within
+// range. WhileLoopStartLR and LoopEnd to occupy 8 bytes because they may get
+// converted into t2CMP and t2Bcc.
+def t2WhileLoopStartLR :
+    t2PseudoInst<(outs GPRlr:$lr),
                  (ins rGPR:$elts, brtarget:$target),
                  8, IIC_Br, []>,
                  Sched<[WriteBr]>;

+// t2LoopEnd - the branch half of a t2LoopDec/t2LoopEnd pair.
 def t2LoopEnd :
   t2PseudoInst<(outs), (ins GPRlr:$elts, brtarget:$target),
   8, IIC_Br, []>, Sched<[WriteBr]>;

+// The combination of a t2LoopDec and t2LoopEnd, performing both the LR
+// decrement and branch as a single instruction. Is lowered to a LE or
+// LETP in ARMLowOverheadLoops as appropriate, or converted to t2CMP/t2Bcc
+// if the branches are out of range.
 def t2LoopEndDec :
   t2PseudoInst<(outs GPRlr:$Rm), (ins GPRlr:$elts, brtarget:$target),
   8, IIC_Br, []>, Sched<[WriteBr]>;
@@ -31,7 +31,7 @@
 /// during the transform and pseudo instructions are replaced by real ones. In
 /// some cases, when we have to revert to a 'normal' loop, we have to introduce
 /// multiple instructions for a single pseudo (see RevertWhile and
-/// RevertLoopEnd). To handle this situation, t2WhileLoopStart and t2LoopEnd
+/// RevertLoopEnd). To handle this situation, t2WhileLoopStartLR and t2LoopEnd
 /// are defined to be as large as this maximum sequence of replacement
 /// instructions.
 ///
@@ -102,7 +102,7 @@ static bool shouldInspect(MachineInstr &MI) {
 }

 static bool isDo(MachineInstr *MI) {
-  return MI->getOpcode() != ARM::t2WhileLoopStart;
+  return MI->getOpcode() != ARM::t2WhileLoopStartLR;
 }

 namespace {
@@ -442,7 +442,7 @@ namespace {
     MachineOperand &getLoopStartOperand() {
       if (IsTailPredicationLegal())
         return TPNumElements;
-      return isDo(Start) ? Start->getOperand(1) : Start->getOperand(0);
+      return Start->getOperand(1);
     }

     unsigned getStartOpcode() const {
@@ -1064,53 +1064,20 @@ void LowOverheadLoop::Validate(ARMBasicBlockUtils *BBUtils) {
       return false;
     }

-    if (Start->getOpcode() == ARM::t2WhileLoopStart &&
+    if (Start->getOpcode() == ARM::t2WhileLoopStartLR &&
         (BBUtils->getOffsetOf(Start) >
-         BBUtils->getOffsetOf(Start->getOperand(1).getMBB()) ||
-         !BBUtils->isBBInRange(Start, Start->getOperand(1).getMBB(), 4094))) {
+             BBUtils->getOffsetOf(Start->getOperand(2).getMBB()) ||
+         !BBUtils->isBBInRange(Start, Start->getOperand(2).getMBB(), 4094))) {
       LLVM_DEBUG(dbgs() << "ARM Loops: WLS offset is out-of-range!\n");
       return false;
     }
     return true;
   };

-  // Find a suitable position to insert the loop start instruction. It needs to
-  // be able to safely define LR.
-  auto FindStartInsertionPoint = [](MachineInstr *Start, MachineInstr *Dec,
-                                    MachineBasicBlock::iterator &InsertPt,
-                                    MachineBasicBlock *&InsertBB,
-                                    ReachingDefAnalysis &RDA,
-                                    InstSet &ToRemove) {
-    // For a t2DoLoopStart it is always valid to use the start insertion point.
-    // For WLS we can define LR if LR already contains the same value.
-    if (isDo(Start) || Start->getOperand(0).getReg() == ARM::LR) {
-      InsertPt = MachineBasicBlock::iterator(Start);
-      InsertBB = Start->getParent();
-      return true;
-    }
-
-    // We've found no suitable LR def and Start doesn't use LR directly. Can we
-    // just define LR anyway?
-    if (!RDA.isSafeToDefRegAt(Start, MCRegister::from(ARM::LR)))
-      return false;
-
-    InsertPt = MachineBasicBlock::iterator(Start);
-    InsertBB = Start->getParent();
-    return true;
-  };
-
-  if (!FindStartInsertionPoint(Start, Dec, StartInsertPt, StartInsertBB, RDA,
-                               ToRemove)) {
-    LLVM_DEBUG(dbgs() << "ARM Loops: Unable to find safe insertion point.\n");
-    Revert = true;
-    return;
-  }
-  LLVM_DEBUG(if (StartInsertPt == StartInsertBB->end())
-               dbgs() << "ARM Loops: Will insert LoopStart at end of block\n";
-             else
-               dbgs() << "ARM Loops: Will insert LoopStart at "
-                      << *StartInsertPt
-            );
+  StartInsertPt = MachineBasicBlock::iterator(Start);
+  StartInsertBB = Start->getParent();
+  LLVM_DEBUG(dbgs() << "ARM Loops: Will insert LoopStart at "
+                    << *StartInsertPt);

   Revert = !ValidateRanges(Start, End, BBUtils, ML);
   CannotTailPredicate = !ValidateTailPredicate();
@@ -1317,6 +1284,9 @@ bool ARMLowOverheadLoops::ProcessLoop(MachineLoop *ML) {
     return false;
   }

+  assert(LoLoop.Start->getOpcode() != ARM::t2WhileLoopStart &&
+         "Expected t2WhileLoopStart to be removed before regalloc!");
+
   // Check that the only instruction using LoopDec is LoopEnd. This can only
   // happen when the Dec and End are separate, not a single t2LoopEndDec.
   // TODO: Check for copy chains that really have no effect.
@@ -1339,11 +1309,11 @@ bool ARMLowOverheadLoops::ProcessLoop(MachineLoop *ML) {
 // another low register.
 void ARMLowOverheadLoops::RevertWhile(MachineInstr *MI) const {
   LLVM_DEBUG(dbgs() << "ARM Loops: Reverting to cmp: " << *MI);
-  MachineBasicBlock *DestBB = MI->getOperand(1).getMBB();
+  MachineBasicBlock *DestBB = MI->getOperand(2).getMBB();
   unsigned BrOpc = BBUtils->isBBInRange(MI, DestBB, 254) ?
     ARM::tBcc : ARM::t2Bcc;

-  RevertWhileLoopStart(MI, TII, BrOpc);
+  RevertWhileLoopStartLR(MI, TII, BrOpc);
 }

 void ARMLowOverheadLoops::RevertDo(MachineInstr *MI) const {
@@ -1478,7 +1448,7 @@ MachineInstr* ARMLowOverheadLoops::ExpandLoopStart(LowOverheadLoop &LoLoop) {
   MIB.addDef(ARM::LR);
   MIB.add(Count);
   if (!isDo(Start))
-    MIB.add(Start->getOperand(1));
+    MIB.add(Start->getOperand(2));

   LLVM_DEBUG(dbgs() << "ARM Loops: Inserted start: " << *MIB);
   NewStart = &*MIB;
@@ -1657,7 +1627,7 @@ void ARMLowOverheadLoops::Expand(LowOverheadLoop &LoLoop) {
   };

   if (LoLoop.Revert) {
-    if (LoLoop.Start->getOpcode() == ARM::t2WhileLoopStart)
+    if (LoLoop.Start->getOpcode() == ARM::t2WhileLoopStartLR)
       RevertWhile(LoLoop.Start);
     else
       RevertDo(LoLoop.Start);
@@ -1728,7 +1698,7 @@ bool ARMLowOverheadLoops::RevertNonLoops() {
       Changed = true;

     for (auto *Start : Starts) {
-      if (Start->getOpcode() == ARM::t2WhileLoopStart)
+      if (Start->getOpcode() == ARM::t2WhileLoopStartLR)
         RevertWhile(Start);
       else
         RevertDo(Start);
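The range check in `Validate` encodes two conditions for keeping a t2WhileLoopStartLR as a real WLS: the exit block must come after the instruction, and it must be within the 4094 limit passed to `isBBInRange`. The sketch below is a simplified assumption of that rule, not the LLVM implementation: `wlsBranchIsEncodable` and `kMaxWlsOffset` are hypothetical names, and offsets are treated as plain byte positions, whereas `isBBInRange` may apply further adjustments.

```cpp
#include <cassert>
#include <cstdint>

// Limit taken from the isBBInRange call in the diff.
constexpr uint32_t kMaxWlsOffset = 4094;

// Simplified model: offsets are byte positions within the function.
bool wlsBranchIsEncodable(uint32_t startOffset, uint32_t exitOffset) {
  if (startOffset > exitOffset)
    return false; // the exit block would be a backwards branch
  return exitOffset - startOffset <= kMaxWlsOffset;
}
```

When the check fails the loop is reverted, which is also why ARMBlockPlacement takes care not to turn a forward WLS into a backwards one.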
@@ -1870,7 +1870,7 @@ bool ARMTTIImpl::isHardwareLoopProfitable(Loop *L, ScalarEvolution &SE,
       default:
         break;
       case Intrinsic::start_loop_iterations:
-      case Intrinsic::test_set_loop_iterations:
+      case Intrinsic::test_start_loop_iterations:
       case Intrinsic::loop_decrement:
       case Intrinsic::loop_decrement_reg:
         return true;
@@ -64,6 +64,7 @@ public:
   }

 private:
+  bool LowerWhileLoopStart(MachineLoop *ML);
   bool MergeLoopEnd(MachineLoop *ML);
   bool ConvertTailPredLoop(MachineLoop *ML, MachineDominatorTree *DT);
   MachineInstr &ReplaceRegisterUseWithVPNOT(MachineBasicBlock &MBB,
@@ -164,7 +165,9 @@ static bool findLoopComponents(MachineLoop *ML, MachineRegisterInfo *MRI,
                          ? LoopPhi->getOperand(3).getReg()
                          : LoopPhi->getOperand(1).getReg();
   LoopStart = LookThroughCOPY(MRI->getVRegDef(StartReg), MRI);
-  if (!LoopStart || LoopStart->getOpcode() != ARM::t2DoLoopStart) {
+  if (!LoopStart || (LoopStart->getOpcode() != ARM::t2DoLoopStart &&
+                     LoopStart->getOpcode() != ARM::t2WhileLoopSetup &&
+                     LoopStart->getOpcode() != ARM::t2WhileLoopStartLR)) {
     LLVM_DEBUG(dbgs() << " didn't find Start where we expected!\n");
     return false;
   }
@@ -173,6 +176,82 @@ static bool findLoopComponents(MachineLoop *ML, MachineRegisterInfo *MRI,
   return true;
 }

+static void RevertWhileLoopSetup(MachineInstr *MI, const TargetInstrInfo *TII) {
+  MachineBasicBlock *MBB = MI->getParent();
+  assert(MI->getOpcode() == ARM::t2WhileLoopSetup &&
+         "Only expected a t2WhileLoopSetup in RevertWhileLoopStart!");
+
+  // Subs
+  MachineInstrBuilder MIB =
+      BuildMI(*MBB, MI, MI->getDebugLoc(), TII->get(ARM::t2SUBri));
+  MIB.add(MI->getOperand(0));
+  MIB.add(MI->getOperand(1));
+  MIB.addImm(0);
+  MIB.addImm(ARMCC::AL);
+  MIB.addReg(ARM::NoRegister);
+  MIB.addReg(ARM::CPSR, RegState::Define);
+
+  // Attempt to find a t2WhileLoopStart and revert to a t2Bcc.
+  for (MachineInstr &I : MBB->terminators()) {
+    if (I.getOpcode() == ARM::t2WhileLoopStart) {
+      MachineInstrBuilder MIB =
+          BuildMI(*MBB, &I, I.getDebugLoc(), TII->get(ARM::t2Bcc));
+      MIB.add(MI->getOperand(1)); // branch target
+      MIB.addImm(ARMCC::EQ);
+      MIB.addReg(ARM::CPSR);
+      I.eraseFromParent();
+      break;
+    }
+  }
+
+  MI->eraseFromParent();
+}
+
+// The Hardware Loop insertion and ISel Lowering produce the pseudos for the
+// start of a while loop:
+//   %a:gprlr = t2WhileLoopSetup %Cnt
+//   t2WhileLoopStart %a, %BB
+// We want to convert those to a single instruction which, like t2LoopEndDec and
+// t2DoLoopStartTP is both a terminator and produces a value:
+//   %a:grplr: t2WhileLoopStartLR %Cnt, %BB
+//
+// Otherwise if we can't, we revert the loop. t2WhileLoopSetup and
+// t2WhileLoopStart are not valid past regalloc.
+bool MVETPAndVPTOptimisations::LowerWhileLoopStart(MachineLoop *ML) {
+  LLVM_DEBUG(dbgs() << "LowerWhileLoopStart on loop "
+                    << ML->getHeader()->getName() << "\n");
+
+  MachineInstr *LoopEnd, *LoopPhi, *LoopStart, *LoopDec;
+  if (!findLoopComponents(ML, MRI, LoopStart, LoopPhi, LoopDec, LoopEnd))
+    return false;
+
+  if (LoopStart->getOpcode() != ARM::t2WhileLoopSetup)
+    return false;
+
+  Register LR = LoopStart->getOperand(0).getReg();
+  auto WLSIt = find_if(MRI->use_nodbg_instructions(LR), [](auto &MI) {
+    return MI.getOpcode() == ARM::t2WhileLoopStart;
+  });
+  if (!MergeEndDec || WLSIt == MRI->use_instr_nodbg_end()) {
+    RevertWhileLoopSetup(LoopStart, TII);
+    RevertLoopDec(LoopStart, TII);
+    RevertLoopEnd(LoopStart, TII);
+    return true;
+  }
+
+  MachineInstrBuilder MI =
+      BuildMI(*WLSIt->getParent(), *WLSIt, WLSIt->getDebugLoc(),
+              TII->get(ARM::t2WhileLoopStartLR), LR)
+          .add(LoopStart->getOperand(1))
+          .add(WLSIt->getOperand(1));
+  (void)MI;
+  LLVM_DEBUG(dbgs() << "Lowered WhileLoopStart into: " << *MI.getInstr());
+
+  WLSIt->eraseFromParent();
+  LoopStart->eraseFromParent();
+  return true;
+}
+
 // This function converts loops with t2LoopEnd and t2LoopEnd instructions into
 // a single t2LoopEndDec instruction. To do that it needs to make sure that LR
 // will be valid to be used for the low overhead loop, which means nothing else
@@ -192,12 +271,19 @@ bool MVETPAndVPTOptimisations::MergeLoopEnd(MachineLoop *ML) {
     return false;

   // Check if there is an illegal instruction (a call) in the low overhead loop
-  // and if so revert it now before we get any further.
-  for (MachineBasicBlock *MBB : ML->blocks()) {
+  // and if so revert it now before we get any further. While loops also need to
+  // check the preheaders.
+  SmallPtrSet<MachineBasicBlock *, 4> MBBs(ML->block_begin(), ML->block_end());
+  if (LoopStart->getOpcode() == ARM::t2WhileLoopStartLR)
+    MBBs.insert(ML->getHeader()->pred_begin(), ML->getHeader()->pred_end());
+  for (MachineBasicBlock *MBB : MBBs) {
     for (MachineInstr &MI : *MBB) {
       if (MI.isCall()) {
         LLVM_DEBUG(dbgs() << "Found call in loop, reverting: " << MI);
-        RevertDoLoopStart(LoopStart, TII);
+        if (LoopStart->getOpcode() == ARM::t2DoLoopStart)
+          RevertDoLoopStart(LoopStart, TII);
+        else
+          RevertWhileLoopStartLR(LoopStart, TII);
         RevertLoopDec(LoopDec, TII);
         RevertLoopEnd(LoopEnd, TII);
         return true;
@@ -236,8 +322,16 @@ bool MVETPAndVPTOptimisations::MergeLoopEnd(MachineLoop *ML) {
   };
   if (!CheckUsers(PhiReg, {LoopDec}, MRI) ||
       !CheckUsers(DecReg, {LoopPhi, LoopEnd}, MRI) ||
-      !CheckUsers(StartReg, {LoopPhi}, MRI))
+      !CheckUsers(StartReg, {LoopPhi}, MRI)) {
+    // Don't leave a t2WhileLoopStartLR without the LoopDecEnd.
+    if (LoopStart->getOpcode() == ARM::t2WhileLoopStartLR) {
+      RevertWhileLoopStartLR(LoopStart, TII);
+      RevertLoopDec(LoopDec, TII);
+      RevertLoopEnd(LoopEnd, TII);
+      return true;
+    }
     return false;
+  }

   MRI->constrainRegClass(StartReg, &ARM::GPRlrRegClass);
   MRI->constrainRegClass(PhiReg, &ARM::GPRlrRegClass);
@@ -281,7 +375,7 @@ bool MVETPAndVPTOptimisations::ConvertTailPredLoop(MachineLoop *ML,
   MachineInstr *LoopEnd, *LoopPhi, *LoopStart, *LoopDec;
   if (!findLoopComponents(ML, MRI, LoopStart, LoopPhi, LoopDec, LoopEnd))
     return false;
-  if (LoopDec != LoopEnd)
+  if (LoopDec != LoopEnd || LoopStart->getOpcode() != ARM::t2DoLoopStart)
     return false;

   SmallVector<MachineInstr *, 4> VCTPs;
@@ -869,6 +963,7 @@ bool MVETPAndVPTOptimisations::runOnMachineFunction(MachineFunction &Fn) {

   bool Modified = false;
   for (MachineLoop *ML : MLI->getBase().getLoopsInPreorder()) {
+    Modified |= LowerWhileLoopStart(ML);
     Modified |= MergeLoopEnd(ML);
     Modified |= ConvertTailPredLoop(ML, DT);
   }
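The merge that `LowerWhileLoopStart` performs can be pictured on a toy instruction list. The sketch below is a hedged illustration only: `Op`, `Inst` and `mergeSetupAndStart` are hypothetical types and names, registers and blocks are plain integers, and none of this uses the LLVM MachineInstr API. It shows the shape of the transformation: find the t2WhileLoopStart that reads the register defined by the t2WhileLoopSetup, then fold both into one instruction that defines that register, reads the setup's count and keeps the branch target.

```cpp
#include <cassert>
#include <vector>

// Toy opcodes standing in for the pseudos named in the diff.
enum class Op { WhileLoopSetup, WhileLoopStart, WhileLoopStartLR, Other };

// Toy instruction: -1 means "no such operand".
struct Inst {
  Op op;
  int def;    // register defined
  int use;    // register read
  int target; // branch target block
};

// Tries to fold `setup` and the WhileLoopStart that reads its result into a
// single WhileLoopStartLR. Returns false when no such user exists, which is
// the case where the real pass reverts the loop instead.
bool mergeSetupAndStart(const Inst &setup, const std::vector<Inst> &users,
                        Inst &merged) {
  assert(setup.op == Op::WhileLoopSetup && "expected a setup instruction");
  for (const Inst &user : users) {
    if (user.op == Op::WhileLoopStart && user.use == setup.def) {
      merged = Inst{Op::WhileLoopStartLR, setup.def, setup.use, user.target};
      return true;
    }
  }
  return false;
}
```

The merged instruction is both a terminator and a definition of the loop counter, which is the property the commit message says has to be handled carefully and is therefore introduced only at this later stage.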
@@ -71,26 +71,31 @@ static inline bool isVCTP(const MachineInstr *MI) {
 static inline bool isLoopStart(MachineInstr &MI) {
   return MI.getOpcode() == ARM::t2DoLoopStart ||
          MI.getOpcode() == ARM::t2DoLoopStartTP ||
-         MI.getOpcode() == ARM::t2WhileLoopStart;
+         MI.getOpcode() == ARM::t2WhileLoopStart ||
+         MI.getOpcode() == ARM::t2WhileLoopStartLR;
 }

-// WhileLoopStart holds the exit block, so produce a cmp lr, 0 and then a
+// WhileLoopStart holds the exit block, so produce a subs Op0, Op1, 0 and then a
 // beq that branches to the exit branch.
-inline void RevertWhileLoopStart(MachineInstr *MI, const TargetInstrInfo *TII,
-                                 unsigned BrOpc = ARM::t2Bcc) {
+inline void RevertWhileLoopStartLR(MachineInstr *MI, const TargetInstrInfo *TII,
+                                   unsigned BrOpc = ARM::t2Bcc) {
   MachineBasicBlock *MBB = MI->getParent();
+  assert(MI->getOpcode() == ARM::t2WhileLoopStartLR &&
+         "Only expected a t2WhileLoopStartLR in RevertWhileLoopStartLR!");

-  // Cmp
+  // Subs
   MachineInstrBuilder MIB =
-      BuildMI(*MBB, MI, MI->getDebugLoc(), TII->get(ARM::t2CMPri));
+      BuildMI(*MBB, MI, MI->getDebugLoc(), TII->get(ARM::t2SUBri));
   MIB.add(MI->getOperand(0));
+  MIB.add(MI->getOperand(1));
   MIB.addImm(0);
   MIB.addImm(ARMCC::AL);
   MIB.addReg(ARM::NoRegister);
+  MIB.addReg(ARM::CPSR, RegState::Define);

   // Branch
   MIB = BuildMI(*MBB, MI, MI->getDebugLoc(), TII->get(BrOpc));
-  MIB.add(MI->getOperand(1)); // branch target
+  MIB.add(MI->getOperand(2)); // branch target
   MIB.addImm(ARMCC::EQ); // condition code
   MIB.addReg(ARM::CPSR);

@@ -156,7 +156,7 @@ bool MVETailPredication::runOnLoop(Loop *L, LPPassManager&) {

     Intrinsic::ID ID = Call->getIntrinsicID();
     if (ID == Intrinsic::start_loop_iterations ||
-        ID == Intrinsic::test_set_loop_iterations)
+        ID == Intrinsic::test_start_loop_iterations)
       return cast<IntrinsicInst>(&I);
   }
   return nullptr;

@@ -209,7 +209,7 @@ body: |
     renamable $r12 = t2LDRi12 $sp, 48, 14, $noreg :: (load 4 from %fixed-stack.0, align 8)
     renamable $r5 = t2ADDri renamable $r12, 3, 14, $noreg, $noreg
     renamable $r7, dead $cpsr = tLSRri killed renamable $r5, 2, 14, $noreg
-    t2WhileLoopStart renamable $r7, %bb.3, implicit-def dead $cpsr
+    $lr = t2WhileLoopStartLR renamable $r7, %bb.3, implicit-def dead $cpsr
     tB %bb.1, 14, $noreg

   bb.1.for.body.lr.ph:

@@ -238,7 +238,7 @@ body: |
   ; CHECK: liveins: $r1, $r2, $r3, $r5, $r7, $r8, $r12
   ; CHECK: $r9, $r4 = t2LDRDi8 $r3, 0, 14 /* CC::al */, $noreg :: (load 4 from %ir.i14), (load 4 from %ir.i20)
   ; CHECK: $r6, $r0 = t2LDRDi8 $r3, 8, 14 /* CC::al */, $noreg :: (load 4 from %ir.i22), (load 4 from %ir.i24)
-  ; CHECK: t2CMPri renamable $r8, 0, 14 /* CC::al */, $noreg, implicit-def $cpsr
+  ; CHECK: dead $lr = t2SUBri renamable $r8, 0, 14 /* CC::al */, $noreg, def $cpsr
   ; CHECK: tBcc %bb.1, 0 /* CC::eq */, killed $cpsr
   ; CHECK: tB %bb.3, 14 /* CC::al */, $noreg
   ; CHECK: bb.3.bb27:
@@ -334,7 +334,7 @@ body: |

     $r9, $r4 = t2LDRDi8 $r3, 0, 14 /* CC::al */, $noreg :: (load 4 from %ir.i14), (load 4 from %ir.i20)
     $r6, $r0 = t2LDRDi8 $r3, 8, 14 /* CC::al */, $noreg :: (load 4 from %ir.i22), (load 4 from %ir.i24)
-    t2WhileLoopStart renamable $r8, %bb.1, implicit-def dead $cpsr
+    $lr = t2WhileLoopStartLR renamable $r8, %bb.1, implicit-def dead $cpsr
     tB %bb.3, 14 /* CC::al */, $noreg

   bb.3.bb27:

@@ -30,8 +30,10 @@
     %i23 = load i32, i32* %i22, align 4
     %i24 = getelementptr inbounds i32, i32* %i14, i32 3
     %i25 = load i32, i32* %i24, align 4
-    %i26 = call i1 @llvm.test.set.loop.iterations.i32(i32 %arg3)
-    br i1 %i26, label %bb27, label %bb74
+    %i26 = call { i32, i1 } @llvm.test.start.loop.iterations.i32(i32 %arg3)
+    %i26.0 = extractvalue { i32, i1 } %i26, 0
+    %i26.1 = extractvalue { i32, i1 } %i26, 1
+    br i1 %i26.1, label %bb27, label %bb74

   bb27: ; preds = %bb12
     %i28 = getelementptr inbounds i32, i32* %i13, i32 4
@@ -46,7 +48,7 @@
     br label %bb37

   bb37: ; preds = %bb37, %bb27
-    %lsr.iv = phi i32 [ %lsr.iv.next, %bb37 ], [ %arg3, %bb27 ]
+    %lsr.iv = phi i32 [ %i70, %bb37 ], [ %i26.0, %bb27 ]
     %i38 = phi i32* [ %i15, %bb27 ], [ %i51, %bb37 ]
     %i39 = phi i32* [ %arg2, %bb27 ], [ %i69, %bb37 ]
     %i40 = phi i32 [ %i25, %bb27 ], [ %i41, %bb37 ]
@@ -81,7 +83,6 @@
     store i32 %i68, i32* %i39, align 4
     %i70 = call i32 @llvm.loop.decrement.reg.i32(i32 %lsr.iv, i32 1)
     %i71 = icmp ne i32 %i70, 0
-    %lsr.iv.next = add i32 %lsr.iv, -1
     br i1 %i71, label %bb37, label %bb72

   bb72: ; preds = %bb37
@@ -115,7 +116,7 @@
     ret void
   }

-  declare i1 @llvm.test.set.loop.iterations.i32(i32) #1
+  declare { i32, i1 } @llvm.test.start.loop.iterations.i32(i32) #1
   declare i32 @llvm.loop.decrement.reg.i32(i32, i32) #1

   attributes #0 = { optsize "target-cpu"="cortex-m55" }
@@ -133,7 +134,7 @@ liveins:
   - { reg: '$r2', virtual-reg: '' }
   - { reg: '$r3', virtual-reg: '' }
 frameInfo:
-  stackSize: 76
+  stackSize: 68
   offsetAdjustment: 0
   maxAlignment: 4
   savePoint: ''
@@ -164,37 +165,31 @@ stack:
   - { id: 7, name: '', type: spill-slot, offset: -68, size: 4, alignment: 4,
       stack-id: default, callee-saved-register: '', callee-saved-restored: true,
       debug-info-variable: '', debug-info-expression: '', debug-info-location: '' }
-  - { id: 8, name: '', type: spill-slot, offset: -72, size: 4, alignment: 4,
-      stack-id: default, callee-saved-register: '', callee-saved-restored: true,
-      debug-info-variable: '', debug-info-expression: '', debug-info-location: '' }
-  - { id: 9, name: '', type: spill-slot, offset: -76, size: 4, alignment: 4,
-      stack-id: default, callee-saved-register: '', callee-saved-restored: true,
-      debug-info-variable: '', debug-info-expression: '', debug-info-location: '' }
-  - { id: 10, name: '', type: spill-slot, offset: -4, size: 4, alignment: 4,
+  - { id: 8, name: '', type: spill-slot, offset: -4, size: 4, alignment: 4,
       stack-id: default, callee-saved-register: '$lr', callee-saved-restored: false,
       debug-info-variable: '', debug-info-expression: '', debug-info-location: '' }
-  - { id: 11, name: '', type: spill-slot, offset: -8, size: 4, alignment: 4,
+  - { id: 9, name: '', type: spill-slot, offset: -8, size: 4, alignment: 4,
       stack-id: default, callee-saved-register: '$r11', callee-saved-restored: true,
       debug-info-variable: '', debug-info-expression: '', debug-info-location: '' }
-  - { id: 12, name: '', type: spill-slot, offset: -12, size: 4, alignment: 4,
+  - { id: 10, name: '', type: spill-slot, offset: -12, size: 4, alignment: 4,
       stack-id: default, callee-saved-register: '$r10', callee-saved-restored: true,
       debug-info-variable: '', debug-info-expression: '', debug-info-location: '' }
-  - { id: 13, name: '', type: spill-slot, offset: -16, size: 4, alignment: 4,
+  - { id: 11, name: '', type: spill-slot, offset: -16, size: 4, alignment: 4,
       stack-id: default, callee-saved-register: '$r9', callee-saved-restored: true,
       debug-info-variable: '', debug-info-expression: '', debug-info-location: '' }
-  - { id: 14, name: '', type: spill-slot, offset: -20, size: 4, alignment: 4,
+  - { id: 12, name: '', type: spill-slot, offset: -20, size: 4, alignment: 4,
       stack-id: default, callee-saved-register: '$r8', callee-saved-restored: true,
       debug-info-variable: '', debug-info-expression: '', debug-info-location: '' }
-  - { id: 15, name: '', type: spill-slot, offset: -24, size: 4, alignment: 4,
+  - { id: 13, name: '', type: spill-slot, offset: -24, size: 4, alignment: 4,
       stack-id: default, callee-saved-register: '$r7', callee-saved-restored: true,
       debug-info-variable: '', debug-info-expression: '', debug-info-location: '' }
-  - { id: 16, name: '', type: spill-slot, offset: -28, size: 4, alignment: 4,
+  - { id: 14, name: '', type: spill-slot, offset: -28, size: 4, alignment: 4,
       stack-id: default, callee-saved-register: '$r6', callee-saved-restored: true,
       debug-info-variable: '', debug-info-expression: '', debug-info-location: '' }
-  - { id: 17, name: '', type: spill-slot, offset: -32, size: 4, alignment: 4,
+  - { id: 15, name: '', type: spill-slot, offset: -32, size: 4, alignment: 4,
       stack-id: default, callee-saved-register: '$r5', callee-saved-restored: true,
       debug-info-variable: '', debug-info-expression: '', debug-info-location: '' }
-  - { id: 18, name: '', type: spill-slot, offset: -36, size: 4, alignment: 4,
+  - { id: 16, name: '', type: spill-slot, offset: -36, size: 4, alignment: 4,
       stack-id: default, callee-saved-register: '$r4', callee-saved-restored: true,
       debug-info-variable: '', debug-info-expression: '', debug-info-location: '' }
 callSites: []
@@ -216,82 +211,70 @@ body: |
   ; CHECK: frame-setup CFI_INSTRUCTION offset $r6, -28
   ; CHECK: frame-setup CFI_INSTRUCTION offset $r5, -32
   ; CHECK: frame-setup CFI_INSTRUCTION offset $r4, -36
-  ; CHECK: $sp = frame-setup tSUBspi $sp, 10, 14 /* CC::al */, $noreg
-  ; CHECK: frame-setup CFI_INSTRUCTION def_cfa_offset 76
-  ; CHECK: $r7, $r5 = t2LDRDi8 $r0, 0, 14 /* CC::al */, $noreg :: (load 4 from %ir.i), (load 4 from %ir.i5)
-  ; CHECK: $r6, $r4 = t2LDRDi8 killed $r0, 8, 14 /* CC::al */, $noreg :: (load 4 from %ir.i7), (load 4 from %ir.i10)
+  ; CHECK: $sp = frame-setup tSUBspi $sp, 8, 14 /* CC::al */, $noreg
+  ; CHECK: frame-setup CFI_INSTRUCTION def_cfa_offset 68
+  ; CHECK: $r6, $r4 = t2LDRDi8 $r0, 8, 14 /* CC::al */, $noreg :: (load 4 from %ir.i7), (load 4 from %ir.i10)
+  ; CHECK: $r7, $r5 = t2LDRDi8 killed $r0, 0, 14 /* CC::al */, $noreg :: (load 4 from %ir.i), (load 4 from %ir.i5)
   ; CHECK: renamable $r0 = t2RSBri killed renamable $r6, 31, 14 /* CC::al */, $noreg, $noreg
-  ; CHECK: t2STMIA $sp, 14 /* CC::al */, $noreg, killed $r0, $r2, $r3 :: (store 4 into %stack.9), (store 4 into %stack.8), (store 4 into %stack.7)
+  ; CHECK: t2STMIA $sp, 14 /* CC::al */, $noreg, killed $r0, $r2, $r3 :: (store 4 into %stack.7), (store 4 into %stack.6), (store 4 into %stack.5)
+  ; CHECK: $r12 = tMOVr killed $r2, 14 /* CC::al */, $noreg
+  ; CHECK: renamable $r2 = tLDRspi $sp, 0, 14 /* CC::al */, $noreg :: (load 4 from %stack.7)
   ; CHECK: bb.1.bb12 (align 4):
   ; CHECK: successors: %bb.2(0x40000000), %bb.5(0x40000000)
-  ; CHECK: liveins: $r1, $r2, $r3, $r4, $r5, $r7
-  ; CHECK: $r9, $r8 = t2LDRDi8 $r7, 0, 14 /* CC::al */, $noreg :: (load 4 from %ir.i14), (load 4 from %ir.i20)
-  ; CHECK: renamable $lr = nuw t2ADDri renamable $r5, 20, 14 /* CC::al */, $noreg, $noreg
-  ; CHECK: $r6, $r12 = t2LDRDi8 $r7, 8, 14 /* CC::al */, $noreg :: (load 4 from %ir.i22), (load 4 from %ir.i24)
-  ; CHECK: t2CMPri renamable $r3, 0, 14 /* CC::al */, $noreg, implicit-def $cpsr
-  ; CHECK: tBcc %bb.5, 0 /* CC::eq */, killed $cpsr
-  ; CHECK: tB %bb.2, 14 /* CC::al */, $noreg
+  ; CHECK: liveins: $r1, $r2, $r3, $r4, $r5, $r7, $r12
+  ; CHECK: $r10, $r0 = t2LDRDi8 $r7, 0, 14 /* CC::al */, $noreg :: (load 4 from %ir.i14), (load 4 from %ir.i20)
+  ; CHECK: $r6, $r8 = t2LDRDi8 $r7, 8, 14 /* CC::al */, $noreg :: (load 4 from %ir.i22), (load 4 from %ir.i24)
+  ; CHECK: $lr = t2WLS renamable $r3, %bb.5
   ; CHECK: bb.2.bb27:
   ; CHECK: successors: %bb.3(0x80000000)
-  ; CHECK: liveins: $lr, $r1, $r2, $r3, $r4, $r5, $r6, $r7, $r8, $r9, $r12
-  ; CHECK: t2STRDi8 killed $lr, killed $r7, $sp, 12, 14 /* CC::al */, $noreg :: (store 4 into %stack.6), (store 4 into %stack.5)
-  ; CHECK: renamable $r0 = tLDRi renamable $r5, 0, 14 /* CC::al */, $noreg :: (load 4 from %ir.i13)
-  ; CHECK: renamable $r10 = t2LDRi12 renamable $r5, 16, 14 /* CC::al */, $noreg :: (load 4 from %ir.i28)
-  ; CHECK: tSTRspi killed renamable $r0, $sp, 9, 14 /* CC::al */, $noreg :: (store 4 into %stack.0)
-  ; CHECK: renamable $r0 = tLDRi renamable $r5, 1, 14 /* CC::al */, $noreg :: (load 4 from %ir.i34)
-  ; CHECK: tSTRspi killed renamable $r4, $sp, 5, 14 /* CC::al */, $noreg :: (store 4 into %stack.4)
-  ; CHECK: tSTRspi killed renamable $r0, $sp, 8, 14 /* CC::al */, $noreg :: (store 4 into %stack.1)
-  ; CHECK: renamable $r0 = tLDRi renamable $r5, 2, 14 /* CC::al */, $noreg :: (load 4 from %ir.i32)
-  ; CHECK: tSTRspi killed renamable $r0, $sp, 7, 14 /* CC::al */, $noreg :: (store 4 into %stack.2)
-  ; CHECK: renamable $r0 = tLDRi killed renamable $r5, 3, 14 /* CC::al */, $noreg :: (load 4 from %ir.i30)
-  ; CHECK: tSTRspi killed renamable $r0, $sp, 6, 14 /* CC::al */, $noreg :: (store 4 into %stack.3)
-  ; CHECK: $r0 = tMOVr killed $r3, 14 /* CC::al */, $noreg
-  ; CHECK: renamable $r3 = tLDRspi $sp, 0, 14 /* CC::al */, $noreg :: (load 4 from %stack.9)
+  ; CHECK: liveins: $lr, $r0, $r1, $r2, $r4, $r5, $r6, $r7, $r8, $r10, $r12
+  ; CHECK: renamable $r3 = tLDRi renamable $r5, 0, 14 /* CC::al */, $noreg :: (load 4 from %ir.i13)
+  ; CHECK: t2STRDi8 killed $r7, killed $r4, $sp, 12, 14 /* CC::al */, $noreg :: (store 4 into %stack.4), (store 4 into %stack.3)
+  ; CHECK: tSTRspi killed renamable $r3, $sp, 7, 14 /* CC::al */, $noreg :: (store 4 into %stack.0)
+  ; CHECK: renamable $r3 = tLDRi renamable $r5, 1, 14 /* CC::al */, $noreg :: (load 4 from %ir.i34)
+  ; CHECK: renamable $r4 = tLDRi renamable $r5, 4, 14 /* CC::al */, $noreg :: (load 4 from %ir.i28)
+  ; CHECK: tSTRspi killed renamable $r3, $sp, 6, 14 /* CC::al */, $noreg :: (store 4 into %stack.1)
+  ; CHECK: $r9, $r3 = t2LDRDi8 $r5, 8, 14 /* CC::al */, $noreg :: (load 4 from %ir.i32), (load 4 from %ir.i30)
+  ; CHECK: tSTRspi killed renamable $r5, $sp, 5, 14 /* CC::al */, $noreg :: (store 4 into %stack.2)
   ; CHECK: bb.3.bb37 (align 4):
   ; CHECK: successors: %bb.3(0x7c000000), %bb.4(0x04000000)
-  ; CHECK: liveins: $r0, $r1, $r2, $r3, $r6, $r8, $r9, $r10, $r12
-  ; CHECK: renamable $r4 = tLDRspi $sp, 8, 14 /* CC::al */, $noreg :: (load 4 from %stack.1)
+  ; CHECK: liveins: $lr, $r0, $r1, $r2, $r3, $r4, $r6, $r8, $r9, $r10, $r12
   ; CHECK: $r7 = tMOVr killed $r6, 14 /* CC::al */, $noreg
-  ; CHECK: $r5 = tMOVr $r9, 14 /* CC::al */, $noreg
-  ; CHECK: $lr = tMOVr $r0, 14 /* CC::al */, $noreg
-  ; CHECK: renamable $r6, renamable $r11 = t2SMULL killed $r9, killed renamable $r4, 14 /* CC::al */, $noreg
-  ; CHECK: renamable $r4 = tLDRspi $sp, 7, 14 /* CC::al */, $noreg :: (load 4 from %stack.2)
-  ; CHECK: renamable $r0, dead $cpsr = tSUBi8 killed renamable $r0, 1, 14 /* CC::al */, $noreg
-  ; CHECK: renamable $r9, renamable $r1 = t2LDR_POST killed renamable $r1, 4, 14 /* CC::al */, $noreg :: (load 4 from %ir.i38)
-  ; CHECK: dead renamable $lr = t2SUBri killed renamable $lr, 1, 14 /* CC::al */, $noreg, def $cpsr
-  ; CHECK: renamable $r6, renamable $r11 = t2SMLAL killed renamable $r8, killed renamable $r4, killed renamable $r6, killed renamable $r11, 14 /* CC::al */, $noreg
-  ; CHECK: renamable $r4 = tLDRspi $sp, 6, 14 /* CC::al */, $noreg :: (load 4 from %stack.3)
-  ; CHECK: $r8 = tMOVr $r5, 14 /* CC::al */, $noreg
-  ; CHECK: renamable $r6, renamable $r11 = t2SMLAL renamable $r7, killed renamable $r4, killed renamable $r6, killed renamable $r11, 14 /* CC::al */, $noreg
-  ; CHECK: renamable $r4 = tLDRspi $sp, 9, 14 /* CC::al */, $noreg :: (load 4 from %stack.0)
-  ; CHECK: renamable $r6, renamable $r11 = t2SMLAL killed renamable $r12, renamable $r10, killed renamable $r6, killed renamable $r11, 14 /* CC::al */, $noreg
-  ; CHECK: $r12 = tMOVr $r7, 14 /* CC::al */, $noreg
-  ; CHECK: renamable $r6, renamable $r11 = t2SMLAL renamable $r9, killed renamable $r4, killed renamable $r6, killed renamable $r11, 14 /* CC::al */, $noreg
-  ; CHECK: early-clobber renamable $r6, dead early-clobber renamable $r11 = MVE_ASRLr killed renamable $r6, killed renamable $r11, renamable $r3, 14 /* CC::al */, $noreg
-  ; CHECK: early-clobber renamable $r2 = t2STR_POST renamable $r6, killed renamable $r2, 4, 14 /* CC::al */, $noreg :: (store 4 into %ir.i39)
-  ; CHECK: tBcc %bb.3, 1 /* CC::ne */, killed $cpsr
-  ; CHECK: tB %bb.4, 14 /* CC::al */, $noreg
+  ; CHECK: renamable $r6 = tLDRspi $sp, 6, 14 /* CC::al */, $noreg :: (load 4 from %stack.1)
+  ; CHECK: $r5 = tMOVr $r10, 14 /* CC::al */, $noreg
+  ; CHECK: renamable $r6, renamable $r11 = t2SMULL killed $r10, killed renamable $r6, 14 /* CC::al */, $noreg
+  ; CHECK: renamable $r6, renamable $r11 = t2SMLAL killed renamable $r0, renamable $r9, killed renamable $r6, killed renamable $r11, 14 /* CC::al */, $noreg
+  ; CHECK: renamable $r10, renamable $r1 = t2LDR_POST killed renamable $r1, 4, 14 /* CC::al */, $noreg :: (load 4 from %ir.i38)
+  ; CHECK: renamable $r6, renamable $r11 = t2SMLAL renamable $r7, renamable $r3, killed renamable $r6, killed renamable $r11, 14 /* CC::al */, $noreg
+  ; CHECK: renamable $r0 = tLDRspi $sp, 7, 14 /* CC::al */, $noreg :: (load 4 from %stack.0)
+  ; CHECK: renamable $r6, renamable $r11 = t2SMLAL killed renamable $r8, renamable $r4, killed renamable $r6, killed renamable $r11, 14 /* CC::al */, $noreg
+  ; CHECK: renamable $r6, renamable $r11 = t2SMLAL renamable $r10, killed renamable $r0, killed renamable $r6, killed renamable $r11, 14 /* CC::al */, $noreg
+  ; CHECK: early-clobber renamable $r6, dead early-clobber renamable $r11 = MVE_ASRLr killed renamable $r6, killed renamable $r11, renamable $r2, 14 /* CC::al */, $noreg
+  ; CHECK: early-clobber renamable $r12 = t2STR_POST renamable $r6, killed renamable $r12, 4, 14 /* CC::al */, $noreg :: (store 4 into %ir.i39)
+  ; CHECK: $r8 = tMOVr $r7, 14 /* CC::al */, $noreg
+  ; CHECK: $r0 = tMOVr $r5, 14 /* CC::al */, $noreg
+  ; CHECK: $lr = t2LEUpdate killed renamable $lr, %bb.3
   ; CHECK: bb.4.bb72:
   ; CHECK: successors: %bb.5(0x80000000)
-  ; CHECK: liveins: $r5, $r6, $r7, $r9
-  ; CHECK: $r12 = tMOVr killed $r7, 14 /* CC::al */, $noreg
-  ; CHECK: $r7, $r4 = t2LDRDi8 $sp, 16, 14 /* CC::al */, $noreg :: (load 4 from %stack.5), (load 4 from %stack.4)
-  ; CHECK: $lr = t2ADDri $sp, 4, 14 /* CC::al */, $noreg, $noreg
-  ; CHECK: $r8 = tMOVr killed $r5, 14 /* CC::al */, $noreg
-  ; CHECK: t2LDMIA killed $lr, 14 /* CC::al */, $noreg, def $r2, def $r3, def $lr :: (load 4 from %stack.8), (load 4 from %stack.7), (load 4 from %stack.6)
+  ; CHECK: liveins: $r2, $r5, $r6, $r7, $r10
+  ; CHECK: $r0 = tMOVr killed $r5, 14 /* CC::al */, $noreg
+  ; CHECK: $r8 = tMOVr killed $r7, 14 /* CC::al */, $noreg
+  ; CHECK: $r12, $r3 = t2LDRDi8 $sp, 4, 14 /* CC::al */, $noreg :: (load 4 from %stack.6), (load 4 from %stack.5)
+  ; CHECK: renamable $r5 = tLDRspi $sp, 5, 14 /* CC::al */, $noreg :: (load 4 from %stack.2)
+  ; CHECK: $r7, $r4 = t2LDRDi8 $sp, 12, 14 /* CC::al */, $noreg :: (load 4 from %stack.4), (load 4 from %stack.3)
   ; CHECK: bb.5.bb74:
-  ; CHECK: successors: %bb.1(0x80000000)
-  ; CHECK: liveins: $lr, $r2, $r3, $r4, $r6, $r7, $r8, $r9, $r12
-  ; CHECK: t2STRDi8 killed $r9, killed $r8, $r7, 0, 14 /* CC::al */, $noreg :: (store 4 into %ir.i14), (store 4 into %ir.i81)
-  ; CHECK: t2STRDi8 killed $r6, killed $r12, $r7, 8, 14 /* CC::al */, $noreg :: (store 4 into %ir.i84), (store 4 into %ir.i88)
+  ; CHECK: successors: %bb.6(0x04000000), %bb.1(0x7c000000)
+  ; CHECK: liveins: $r0, $r3, $r4, $r5, $r6, $r7, $r8, $r10, $r12, $r2
+  ; CHECK: renamable $r5, dead $cpsr = nuw tADDi8 killed renamable $r5, 20, 14 /* CC::al */, $noreg
+  ; CHECK: t2STRDi8 killed $r10, killed $r0, $r7, 0, 14 /* CC::al */, $noreg :: (store 4 into %ir.i14), (store 4 into %ir.i81)
+  ; CHECK: t2STRDi8 killed $r6, killed $r8, $r7, 8, 14 /* CC::al */, $noreg :: (store 4 into %ir.i84), (store 4 into %ir.i88)
   ; CHECK: renamable $r7, dead $cpsr = nuw tADDi8 killed renamable $r7, 16, 14 /* CC::al */, $noreg
   ; CHECK: renamable $r4, $cpsr = tSUBi8 killed renamable $r4, 1, 14 /* CC::al */, $noreg
-  ; CHECK: $r5 = tMOVr killed $lr, 14 /* CC::al */, $noreg
-  ; CHECK: $r1 = tMOVr $r2, 14 /* CC::al */, $noreg
-  ; CHECK: t2IT 0, 4, implicit-def $itstate
-  ; CHECK: $sp = frame-destroy tADDspi $sp, 10, 0 /* CC::eq */, $cpsr, implicit $itstate
-  ; CHECK: $sp = frame-destroy t2LDMIA_RET $sp, 0 /* CC::eq */, killed $cpsr, def $r4, def $r5, def $r6, def $r7, def $r8, def $r9, def $r10, def $r11, def $pc, implicit $sp, implicit killed $r4, implicit killed $r5, implicit killed $r7, implicit killed $itstate
-  ; CHECK: tB %bb.1, 14 /* CC::al */, $noreg
+  ; CHECK: $r1 = tMOVr $r12, 14 /* CC::al */, $noreg
+  ; CHECK: tBcc %bb.1, 1 /* CC::ne */, killed $cpsr
+  ; CHECK: bb.6.bb91:
+  ; CHECK: $sp = frame-destroy tADDspi $sp, 8, 14 /* CC::al */, $noreg
+  ; CHECK: $sp = frame-destroy t2LDMIA_RET $sp, 14 /* CC::al */, $noreg, def $r4, def $r5, def $r6, def $r7, def $r8, def $r9, def $r10, def $r11, def $pc
   bb.0.bb:
     successors: %bb.1(0x80000000)
     liveins: $r0, $r1, $r2, $r3, $r4, $r5, $r6, $r7, $r8, $r9, $r10, $r11, $lr
@@ -307,90 +290,82 @@ body: |
     frame-setup CFI_INSTRUCTION offset $r6, -28
     frame-setup CFI_INSTRUCTION offset $r5, -32
     frame-setup CFI_INSTRUCTION offset $r4, -36
-    $sp = frame-setup tSUBspi $sp, 10, 14 /* CC::al */, $noreg
-    frame-setup CFI_INSTRUCTION def_cfa_offset 76
-    $r7, $r5 = t2LDRDi8 $r0, 0, 14 /* CC::al */, $noreg :: (load 4 from %ir.i), (load 4 from %ir.i5)
-    $r6, $r4 = t2LDRDi8 killed $r0, 8, 14 /* CC::al */, $noreg :: (load 4 from %ir.i7), (load 4 from %ir.i10)
+    $sp = frame-setup tSUBspi $sp, 8, 14 /* CC::al */, $noreg
+    frame-setup CFI_INSTRUCTION def_cfa_offset 68
+    $r6, $r4 = t2LDRDi8 $r0, 8, 14 /* CC::al */, $noreg :: (load 4 from %ir.i7), (load 4 from %ir.i10)
+    $r7, $r5 = t2LDRDi8 killed $r0, 0, 14 /* CC::al */, $noreg :: (load 4 from %ir.i), (load 4 from %ir.i5)
     renamable $r0 = t2RSBri killed renamable $r6, 31, 14 /* CC::al */, $noreg, $noreg
-    t2STMIA $sp, 14 /* CC::al */, $noreg, killed $r0, $r2, $r3 :: (store 4 into %stack.9), (store 4 into %stack.8), (store 4 into %stack.7)
+    t2STMIA $sp, 14 /* CC::al */, $noreg, killed $r0, $r2, $r3 :: (store 4 into %stack.7), (store 4 into %stack.6), (store 4 into %stack.5)
+    $r12 = tMOVr killed $r2, 14 /* CC::al */, $noreg
+    renamable $r2 = tLDRspi $sp, 0, 14 /* CC::al */, $noreg :: (load 4 from %stack.7)

   bb.1.bb12 (align 4):
     successors: %bb.2(0x40000000), %bb.5(0x40000000)
-    liveins: $r1, $r2, $r3, $r4, $r5, $r7
+    liveins: $r1, $r3, $r4, $r5, $r7, $r12, $r2

-    $r9, $r8 = t2LDRDi8 $r7, 0, 14 /* CC::al */, $noreg :: (load 4 from %ir.i14), (load 4 from %ir.i20)
-    renamable $lr = nuw t2ADDri renamable $r5, 20, 14 /* CC::al */, $noreg, $noreg
-    $r6, $r12 = t2LDRDi8 $r7, 8, 14 /* CC::al */, $noreg :: (load 4 from %ir.i22), (load 4 from %ir.i24)
-    t2WhileLoopStart renamable $r3, %bb.5, implicit-def dead $cpsr
+    $r10, $r0 = t2LDRDi8 $r7, 0, 14 /* CC::al */, $noreg :: (load 4 from %ir.i14), (load 4 from %ir.i20)
+    $r6, $r8 = t2LDRDi8 $r7, 8, 14 /* CC::al */, $noreg :: (load 4 from %ir.i22), (load 4 from %ir.i24)
+    renamable $lr = t2WhileLoopStartLR renamable $r3, %bb.5, implicit-def dead $cpsr
     tB %bb.2, 14 /* CC::al */, $noreg

   bb.2.bb27:
     successors: %bb.3(0x80000000)
-    liveins: $lr, $r1, $r2, $r3, $r4, $r5, $r6, $r7, $r8, $r9, $r12
+    liveins: $lr, $r0, $r1, $r4, $r5, $r6, $r7, $r8, $r10, $r12, $r2

-    t2STRDi8 killed $lr, killed $r7, $sp, 12, 14 /* CC::al */, $noreg :: (store 4 into %stack.6), (store 4 into %stack.5)
-    renamable $r0 = tLDRi renamable $r5, 0, 14 /* CC::al */, $noreg :: (load 4 from %ir.i13)
-    renamable $r10 = t2LDRi12 renamable $r5, 16, 14 /* CC::al */, $noreg :: (load 4 from %ir.i28)
-    tSTRspi killed renamable $r0, $sp, 9, 14 /* CC::al */, $noreg :: (store 4 into %stack.0)
-    renamable $r0 = tLDRi renamable $r5, 1, 14 /* CC::al */, $noreg :: (load 4 from %ir.i34)
-    tSTRspi killed renamable $r4, $sp, 5, 14 /* CC::al */, $noreg :: (store 4 into %stack.4)
-    tSTRspi killed renamable $r0, $sp, 8, 14 /* CC::al */, $noreg :: (store 4 into %stack.1)
-    renamable $r0 = tLDRi renamable $r5, 2, 14 /* CC::al */, $noreg :: (load 4 from %ir.i32)
-    tSTRspi killed renamable $r0, $sp, 7, 14 /* CC::al */, $noreg :: (store 4 into %stack.2)
-    renamable $r0 = tLDRi killed renamable $r5, 3, 14 /* CC::al */, $noreg :: (load 4 from %ir.i30)
-    tSTRspi killed renamable $r0, $sp, 6, 14 /* CC::al */, $noreg :: (store 4 into %stack.3)
-    $r0 = tMOVr killed $r3, 14 /* CC::al */, $noreg
-    renamable $r3 = tLDRspi $sp, 0, 14 /* CC::al */, $noreg :: (load 4 from %stack.9)
+    renamable $r3 = tLDRi renamable $r5, 0, 14 /* CC::al */, $noreg :: (load 4 from %ir.i13)
+    t2STRDi8 killed $r7, killed $r4, $sp, 12, 14 /* CC::al */, $noreg :: (store 4 into %stack.4), (store 4 into %stack.3)
+    tSTRspi killed renamable $r3, $sp, 7, 14 /* CC::al */, $noreg :: (store 4 into %stack.0)
+    renamable $r3 = tLDRi renamable $r5, 1, 14 /* CC::al */, $noreg :: (load 4 from %ir.i34)
+    renamable $r4 = tLDRi renamable $r5, 4, 14 /* CC::al */, $noreg :: (load 4 from %ir.i28)
+    tSTRspi killed renamable $r3, $sp, 6, 14 /* CC::al */, $noreg :: (store 4 into %stack.1)
+    $r9, $r3 = t2LDRDi8 $r5, 8, 14 /* CC::al */, $noreg :: (load 4 from %ir.i32), (load 4 from %ir.i30)
+    tSTRspi killed renamable $r5, $sp, 5, 14 /* CC::al */, $noreg :: (store 4 into %stack.2)

   bb.3.bb37 (align 4):
     successors: %bb.3(0x7c000000), %bb.4(0x04000000)
-    liveins: $r0, $r1, $r2, $r3, $r6, $r8, $r9, $r10, $r12
+    liveins: $lr, $r0, $r1, $r2, $r3, $r4, $r6, $r8, $r9, $r10, $r12

-    renamable $r4 = tLDRspi $sp, 8, 14 /* CC::al */, $noreg :: (load 4 from %stack.1)
     $r7 = tMOVr killed $r6, 14 /* CC::al */, $noreg
-    $r5 = tMOVr $r9, 14 /* CC::al */, $noreg
-    $lr = tMOVr $r0, 14 /* CC::al */, $noreg
-    renamable $r6, renamable $r11 = t2SMULL killed $r9, killed renamable $r4, 14 /* CC::al */, $noreg
-    renamable $r4 = tLDRspi $sp, 7, 14 /* CC::al */, $noreg :: (load 4 from %stack.2)
-    renamable $r0, dead $cpsr = tSUBi8 killed renamable $r0, 1, 14 /* CC::al */, $noreg
-    renamable $r9, renamable $r1 = t2LDR_POST killed renamable $r1, 4, 14 /* CC::al */, $noreg :: (load 4 from %ir.i38)
-    renamable $lr = t2LoopDec killed renamable $lr, 1
-    renamable $r6, renamable $r11 = t2SMLAL killed renamable $r8, killed renamable $r4, killed renamable $r6, killed renamable $r11, 14 /* CC::al */, $noreg
-    renamable $r4 = tLDRspi $sp, 6, 14 /* CC::al */, $noreg :: (load 4 from %stack.3)
-    $r8 = tMOVr $r5, 14 /* CC::al */, $noreg
-    renamable $r6, renamable $r11 = t2SMLAL renamable $r7, killed renamable $r4, killed renamable $r6, killed renamable $r11, 14 /* CC::al */, $noreg
-    renamable $r4 = tLDRspi $sp, 9, 14 /* CC::al */, $noreg :: (load 4 from %stack.0)
-    renamable $r6, renamable $r11 = t2SMLAL killed renamable $r12, renamable $r10, killed renamable $r6, killed renamable $r11, 14 /* CC::al */, $noreg
-    $r12 = tMOVr $r7, 14 /* CC::al */, $noreg
-    renamable $r6, renamable $r11 = t2SMLAL renamable $r9, killed renamable $r4, killed renamable $r6, killed renamable $r11, 14 /* CC::al */, $noreg
-    early-clobber renamable $r6, dead early-clobber renamable $r11 = MVE_ASRLr killed renamable $r6, killed renamable $r11, renamable $r3, 14 /* CC::al */, $noreg
-    early-clobber renamable $r2 = t2STR_POST renamable $r6, killed renamable $r2, 4, 14 /* CC::al */, $noreg :: (store 4 into %ir.i39)
-    t2LoopEnd killed renamable $lr, %bb.3, implicit-def dead $cpsr
+    renamable $r6 = tLDRspi $sp, 6, 14 /* CC::al */, $noreg :: (load 4 from %stack.1)
+    $r5 = tMOVr $r10, 14 /* CC::al */, $noreg
+    renamable $r6, renamable $r11 = t2SMULL killed $r10, killed renamable $r6, 14 /* CC::al */, $noreg
+    renamable $r6, renamable $r11 = t2SMLAL killed renamable $r0, renamable $r9, killed renamable $r6, killed renamable $r11, 14 /* CC::al */, $noreg
+    renamable $r10, renamable $r1 = t2LDR_POST killed renamable $r1, 4, 14 /* CC::al */, $noreg :: (load 4 from %ir.i38)
+    renamable $r6, renamable $r11 = t2SMLAL renamable $r7, renamable $r3, killed renamable $r6, killed renamable $r11, 14 /* CC::al */, $noreg
+    renamable $r0 = tLDRspi $sp, 7, 14 /* CC::al */, $noreg :: (load 4 from %stack.0)
+    renamable $r6, renamable $r11 = t2SMLAL killed renamable $r8, renamable $r4, killed renamable $r6, killed renamable $r11, 14 /* CC::al */, $noreg
+    renamable $r6, renamable $r11 = t2SMLAL renamable $r10, killed renamable $r0, killed renamable $r6, killed renamable $r11, 14 /* CC::al */, $noreg
+    early-clobber renamable $r6, dead early-clobber renamable $r11 = MVE_ASRLr killed renamable $r6, killed renamable $r11, renamable $r2, 14 /* CC::al */, $noreg
+    early-clobber renamable $r12 = t2STR_POST renamable $r6, killed renamable $r12, 4, 14 /* CC::al */, $noreg :: (store 4 into %ir.i39)
+    $r8 = tMOVr $r7, 14 /* CC::al */, $noreg
+    $r0 = tMOVr $r5, 14 /* CC::al */, $noreg
+    renamable $lr = t2LoopEndDec killed renamable $lr, %bb.3, implicit-def dead $cpsr
     tB %bb.4, 14 /* CC::al */, $noreg

   bb.4.bb72:
     successors: %bb.5(0x80000000)
-    liveins: $r5, $r6, $r7, $r9
+    liveins: $r5, $r6, $r7, $r10, $r2

-    $r12 = tMOVr killed $r7, 14 /* CC::al */, $noreg
-    $r7, $r4 = t2LDRDi8 $sp, 16, 14 /* CC::al */, $noreg :: (load 4 from %stack.5), (load 4 from %stack.4)
-    $lr = t2ADDri $sp, 4, 14 /* CC::al */, $noreg, $noreg
-    $r8 = tMOVr killed $r5, 14 /* CC::al */, $noreg
-    t2LDMIA killed $lr, 14 /* CC::al */, $noreg, def $r2, def $r3, def $lr :: (load 4 from %stack.8), (load 4 from %stack.7), (load 4 from %stack.6)
+    $r0 = tMOVr killed $r5, 14 /* CC::al */, $noreg
+    $r8 = tMOVr killed $r7, 14 /* CC::al */, $noreg
+    $r12, $r3 = t2LDRDi8 $sp, 4, 14 /* CC::al */, $noreg :: (load 4 from %stack.6), (load 4 from %stack.5)
+    renamable $r5 = tLDRspi $sp, 5, 14 /* CC::al */, $noreg :: (load 4 from %stack.2)
+    $r7, $r4 = t2LDRDi8 $sp, 12, 14 /* CC::al */, $noreg :: (load 4 from %stack.4), (load 4 from %stack.3)

   bb.5.bb74:
-    successors: %bb.1(0x7c000000)
-    liveins: $lr, $r2, $r3, $r4, $r6, $r7, $r8, $r9, $r12
+    successors: %bb.6(0x04000000), %bb.1(0x7c000000)
+    liveins: $r0, $r3, $r4, $r5, $r6, $r7, $r8, $r10, $r12, $r2

-    t2STRDi8 killed $r9, killed $r8, $r7, 0, 14 /* CC::al */, $noreg :: (store 4 into %ir.i14), (store 4 into %ir.i81)
-    t2STRDi8 killed $r6, killed $r12, $r7, 8, 14 /* CC::al */, $noreg :: (store 4 into %ir.i84), (store 4 into %ir.i88)
+    renamable $r5, dead $cpsr = nuw tADDi8 killed renamable $r5, 20, 14 /* CC::al */, $noreg
+    t2STRDi8 killed $r10, killed $r0, $r7, 0, 14 /* CC::al */, $noreg :: (store 4 into %ir.i14), (store 4 into %ir.i81)
+    t2STRDi8 killed $r6, killed $r8, $r7, 8, 14 /* CC::al */, $noreg :: (store 4 into %ir.i84), (store 4 into %ir.i88)
     renamable $r7, dead $cpsr = nuw tADDi8 killed renamable $r7, 16, 14 /* CC::al */, $noreg
     renamable $r4, $cpsr = tSUBi8 killed renamable $r4, 1, 14 /* CC::al */, $noreg
-    $r5 = tMOVr killed $lr, 14 /* CC::al */, $noreg
-    $r1 = tMOVr $r2, 14 /* CC::al */, $noreg
-    t2IT 0, 4, implicit-def $itstate
-    $sp = frame-destroy tADDspi $sp, 10, 0 /* CC::eq */, $cpsr, implicit $itstate
-    $sp = frame-destroy t2LDMIA_RET $sp, 0 /* CC::eq */, killed $cpsr, def $r4, def $r5, def $r6, def $r7, def $r8, def $r9, def $r10, def $r11, def $pc, implicit $sp, implicit killed $r4, implicit killed $r5, implicit killed $r7, implicit killed $itstate
-    tB %bb.1, 14 /* CC::al */, $noreg
+    $r1 = tMOVr $r12, 14 /* CC::al */, $noreg
+    tBcc %bb.1, 1 /* CC::ne */, killed $cpsr
+
+  bb.6.bb91:
+    $sp = frame-destroy tADDspi $sp, 8, 14 /* CC::al */, $noreg
+    $sp = frame-destroy t2LDMIA_RET $sp, 14 /* CC::al */, $noreg, def $r4, def $r5, def $r6, def $r7, def $r8, def $r9, def $r10, def $r11, def $pc

 ...

@@ -323,7 +323,7 @@ body: |

     $r9, $r4 = t2LDRDi8 $r3, 0, 14 /* CC::al */, $noreg :: (load 4 from %ir.i14), (load 4 from %ir.i20)
     $r6, $r0 = t2LDRDi8 $r3, 8, 14 /* CC::al */, $noreg :: (load 4 from %ir.i22), (load 4 from %ir.i24)
-    t2WhileLoopStart renamable $r8, %bb.5, implicit-def dead $cpsr
+    $lr = t2WhileLoopStartLR renamable $r8, %bb.5, implicit-def dead $cpsr
     tB %bb.2, 14 /* CC::al */, $noreg

   bb.2.bb27:

@@ -402,17 +402,18 @@ for.cond.cleanup:
}

; CHECK-MID: check_negated_xor_wls
; CHECK-MID: t2WhileLoopStart $r2, %bb.3
; CHECK-MID: $lr = t2WhileLoopStartLR killed renamable $r2
; CHECK-MID: tB %bb.1
; CHECK-MID: bb.1.while.body.preheader:
; CHECK-MID: $lr = t2LoopDec killed renamable $lr, 1
; CHECK-MID: t2LoopEnd renamable $lr, %bb.2, implicit-def dead $cpsr
; CHECk-MID: tB %bb.3
; CHECK-MID: bb.3.while.end:
; CHECK-MID: bb.1.while.body:
; CHECK-MID: renamable $lr = t2LoopEndDec killed renamable $lr, %bb.1
; CHECk-MID: tB %bb.2
; CHECK-MID: bb.2.while.end:
define void @check_negated_xor_wls(i16* nocapture %a, i16* nocapture readonly %b, i32 %N) {
entry:
%wls = call i1 @llvm.test.set.loop.iterations.i32(i32 %N)
%xor = xor i1 %wls, 1
%wls = call {i32, i1} @llvm.test.start.loop.iterations.i32(i32 %N)
%wls0 = extractvalue {i32, i1} %wls, 0
%wls1 = extractvalue {i32, i1} %wls, 1
%xor = xor i1 %wls1, 1
br i1 %xor, label %while.end, label %while.body.preheader

while.body.preheader:
@@ -421,7 +422,7 @@ while.body.preheader:
while.body:
%a.addr.06 = phi i16* [ %incdec.ptr1, %while.body ], [ %a, %while.body.preheader ]
%b.addr.05 = phi i16* [ %incdec.ptr, %while.body ], [ %b, %while.body.preheader ]
%count = phi i32 [ %N, %while.body.preheader ], [ %count.next, %while.body ]
%count = phi i32 [ %wls0, %while.body.preheader ], [ %count.next, %while.body ]
%incdec.ptr = getelementptr inbounds i16, i16* %b.addr.05, i32 1
%ld.b = load i16, i16* %b.addr.05, align 2
%incdec.ptr1 = getelementptr inbounds i16, i16* %a.addr.06, i32 1
@@ -435,17 +436,18 @@ while.end:
}

; CHECK-MID: check_negated_cmp_wls
; CHECK-MID: t2WhileLoopStart $r2, %bb.3
; CHECK-MID: $lr = t2WhileLoopStartLR killed renamable $r2
; CHECK-MID: tB %bb.1
; CHECK-MID: bb.1.while.body.preheader:
; CHECK-MID: $lr = t2LoopDec killed renamable $lr, 1
; CHECK-MID: t2LoopEnd renamable $lr, %bb.2
; CHECk-MID: tB %bb.3
; CHECK-MID: bb.3.while.end:
; CHECK-MID: bb.1.while.body:
; CHECK-MID: renamable $lr = t2LoopEndDec killed renamable $lr, %bb.1
; CHECk-MID: tB %bb.2
; CHECK-MID: bb.2.while.end:
define void @check_negated_cmp_wls(i16* nocapture %a, i16* nocapture readonly %b, i32 %N) {
entry:
%wls = call i1 @llvm.test.set.loop.iterations.i32(i32 %N)
%cmp = icmp ne i1 %wls, 1
%wls = call {i32, i1} @llvm.test.start.loop.iterations.i32(i32 %N)
%wls0 = extractvalue {i32, i1} %wls, 0
%wls1 = extractvalue {i32, i1} %wls, 1
%cmp = icmp ne i1 %wls1, 1
br i1 %cmp, label %while.end, label %while.body.preheader

while.body.preheader:
@@ -454,7 +456,7 @@ while.body.preheader:
while.body:
%a.addr.06 = phi i16* [ %incdec.ptr1, %while.body ], [ %a, %while.body.preheader ]
%b.addr.05 = phi i16* [ %incdec.ptr, %while.body ], [ %b, %while.body.preheader ]
%count = phi i32 [ %N, %while.body.preheader ], [ %count.next, %while.body ]
%count = phi i32 [ %wls0, %while.body.preheader ], [ %count.next, %while.body ]
%incdec.ptr = getelementptr inbounds i16, i16* %b.addr.05, i32 1
%ld.b = load i16, i16* %b.addr.05, align 2
%incdec.ptr1 = getelementptr inbounds i16, i16* %a.addr.06, i32 1
@@ -468,11 +470,10 @@ while.end:
}

; CHECK-MID: check_negated_reordered_wls
; CHECK-MID: t2WhileLoopStart killed $r2, %bb.2
; CHECK-MID: $lr = t2WhileLoopStartLR killed renamable $r2
; CHECK-MID: tB %bb.1
; CHECK-MID: bb.1.while.body:
; CHECK-MID: $lr = t2LoopDec killed renamable $lr, 1
; CHECK-MID: t2LoopEnd renamable $lr, %bb.1
; CHECK-MID: renamable $lr = t2LoopEndDec killed renamable $lr, %bb.1
; CHECk-MID: tB %bb.2
; CHECK-MID: bb.2.while.end:
define void @check_negated_reordered_wls(i16* nocapture %a, i16* nocapture readonly %b, i32 %N) {
@@ -485,7 +486,7 @@ while.body.preheader:
while.body:
%a.addr.06 = phi i16* [ %incdec.ptr1, %while.body ], [ %a, %while.body.preheader ]
%b.addr.05 = phi i16* [ %incdec.ptr, %while.body ], [ %b, %while.body.preheader ]
%count = phi i32 [ %N, %while.body.preheader ], [ %count.next, %while.body ]
%count = phi i32 [ %wls0, %while.body.preheader ], [ %count.next, %while.body ]
%incdec.ptr = getelementptr inbounds i16, i16* %b.addr.05, i32 1
%ld.b = load i16, i16* %b.addr.05, align 2
%incdec.ptr1 = getelementptr inbounds i16, i16* %a.addr.06, i32 1
@@ -495,8 +496,10 @@ while.body:
br i1 %cmp, label %while.body, label %while.end

while:
%wls = call i1 @llvm.test.set.loop.iterations.i32(i32 %N)
%xor = xor i1 %wls, 1
%wls = call {i32, i1} @llvm.test.start.loop.iterations.i32(i32 %N)
%wls0 = extractvalue {i32, i1} %wls, 0
%wls1 = extractvalue {i32, i1} %wls, 1
%xor = xor i1 %wls1, 1
br i1 %xor, label %while.end, label %while.body.preheader

while.end:
@@ -504,5 +507,5 @@ while.end:
}

declare i32 @llvm.start.loop.iterations.i32(i32)
declare i1 @llvm.test.set.loop.iterations.i32(i32)
declare {i32, i1} @llvm.test.start.loop.iterations.i32(i32)
declare i32 @llvm.loop.decrement.reg.i32(i32, i32)
@@ -25,7 +25,7 @@ define arm_aapcs_vfpcc void @fast_float_mul(float* nocapture %a, float* nocaptur
; CHECK-NEXT: beq .LBB0_4
; CHECK-NEXT: @ %bb.2: @ %for.body.preheader
; CHECK-NEXT: subs r5, r3, #1
; CHECK-NEXT: and lr, r3, #3
; CHECK-NEXT: and r12, r3, #3
; CHECK-NEXT: cmp r5, #3
; CHECK-NEXT: bhs .LBB0_6
; CHECK-NEXT: @ %bb.3:
@@ -44,7 +44,7 @@ define arm_aapcs_vfpcc void @fast_float_mul(float* nocapture %a, float* nocaptur
; CHECK-NEXT: letp lr, .LBB0_5
; CHECK-NEXT: b .LBB0_11
; CHECK-NEXT: .LBB0_6: @ %for.body.preheader.new
; CHECK-NEXT: sub.w r12, r3, lr
; CHECK-NEXT: sub.w lr, r3, r12
; CHECK-NEXT: movs r4, #0
; CHECK-NEXT: movs r3, #0
; CHECK-NEXT: .LBB0_7: @ %for.body
@@ -56,7 +56,7 @@ define arm_aapcs_vfpcc void @fast_float_mul(float* nocapture %a, float* nocaptur
; CHECK-NEXT: vldr s0, [r5]
; CHECK-NEXT: adds r4, #16
; CHECK-NEXT: vldr s2, [r6]
; CHECK-NEXT: cmp r12, r3
; CHECK-NEXT: cmp lr, r3
; CHECK-NEXT: vmul.f32 s0, s2, s0
; CHECK-NEXT: vstr s0, [r7]
; CHECK-NEXT: vldr s0, [r5, #4]
@@ -73,7 +73,7 @@ define arm_aapcs_vfpcc void @fast_float_mul(float* nocapture %a, float* nocaptur
; CHECK-NEXT: vstr s0, [r7, #12]
; CHECK-NEXT: bne .LBB0_7
; CHECK-NEXT: .LBB0_8: @ %for.cond.cleanup.loopexit.unr-lcssa
; CHECK-NEXT: wls lr, lr, .LBB0_11
; CHECK-NEXT: wls lr, r12, .LBB0_11
; CHECK-NEXT: @ %bb.9: @ %for.body.epil.preheader
; CHECK-NEXT: add.w r1, r1, r3, lsl #2
; CHECK-NEXT: add.w r2, r2, r3, lsl #2
@@ -179,13 +179,11 @@ if.end: ; preds = %do.body, %entry
ret void
}

; TODO: Remove the tMOVr in the preheader!
; CHECK: ne_trip_count
; CHECK: body:
; CHECK: bb.0.entry:
; CHECK: $lr = t2WLS $r3, %bb.3
; CHECK: $lr = t2WLS killed renamable $r3, %bb.3
; CHECK: bb.1.do.body.preheader:
; CHECK: $lr = tMOVr
; CHECK: bb.2.do.body:
; CHECK: $lr = t2LEUpdate killed renamable $lr, %bb.2
define void @ne_trip_count(i1 zeroext %t1, i32* nocapture %a, i32* nocapture readonly %b, i32 %N) {
@@ -33,8 +33,8 @@ define arm_aapcs_vfpcc void @float_float_mul(float* nocapture readonly %a, float
; CHECK-NEXT: .LBB0_4: @ %for.body.preheader22
; CHECK-NEXT: mvn.w r7, r12
; CHECK-NEXT: adds r4, r7, r3
; CHECK-NEXT: and lr, r3, #3
; CHECK-NEXT: wls lr, lr, .LBB0_7
; CHECK-NEXT: and r7, r3, #3
; CHECK-NEXT: wls lr, r7, .LBB0_7
; CHECK-NEXT: @ %bb.5: @ %for.body.prol.preheader
; CHECK-NEXT: add.w r5, r0, r12, lsl #2
; CHECK-NEXT: add.w r6, r1, r12, lsl #2
@@ -246,8 +246,8 @@ define arm_aapcs_vfpcc void @float_float_add(float* nocapture readonly %a, float
; CHECK-NEXT: .LBB1_4: @ %for.body.preheader22
; CHECK-NEXT: mvn.w r7, r12
; CHECK-NEXT: adds r4, r7, r3
; CHECK-NEXT: and lr, r3, #3
; CHECK-NEXT: wls lr, lr, .LBB1_7
; CHECK-NEXT: and r7, r3, #3
; CHECK-NEXT: wls lr, r7, .LBB1_7
; CHECK-NEXT: @ %bb.5: @ %for.body.prol.preheader
; CHECK-NEXT: add.w r5, r0, r12, lsl #2
; CHECK-NEXT: add.w r6, r1, r12, lsl #2
@@ -459,8 +459,8 @@ define arm_aapcs_vfpcc void @float_float_sub(float* nocapture readonly %a, float
; CHECK-NEXT: .LBB2_4: @ %for.body.preheader22
; CHECK-NEXT: mvn.w r7, r12
; CHECK-NEXT: adds r4, r7, r3
; CHECK-NEXT: and lr, r3, #3
; CHECK-NEXT: wls lr, lr, .LBB2_7
; CHECK-NEXT: and r7, r3, #3
; CHECK-NEXT: wls lr, r7, .LBB2_7
; CHECK-NEXT: @ %bb.5: @ %for.body.prol.preheader
; CHECK-NEXT: add.w r5, r0, r12, lsl #2
; CHECK-NEXT: add.w r6, r1, r12, lsl #2
@@ -681,8 +681,8 @@ define arm_aapcs_vfpcc void @float_int_mul(float* nocapture readonly %a, i32* no
; CHECK-NEXT: .LBB3_7: @ %for.body.preheader16
; CHECK-NEXT: mvn.w r7, r12
; CHECK-NEXT: add.w r8, r7, r3
; CHECK-NEXT: and lr, r3, #3
; CHECK-NEXT: wls lr, lr, .LBB3_10
; CHECK-NEXT: and r7, r3, #3
; CHECK-NEXT: wls lr, r7, .LBB3_10
; CHECK-NEXT: @ %bb.8: @ %for.body.prol.preheader
; CHECK-NEXT: add.w r5, r0, r12, lsl #2
; CHECK-NEXT: add.w r6, r1, r12, lsl #2
@@ -1424,7 +1424,7 @@ define arm_aapcs_vfpcc float @half_half_mac(half* nocapture readonly %a, half* n
; CHECK-NEXT: cbz r2, .LBB9_3
; CHECK-NEXT: @ %bb.1: @ %for.body.preheader
; CHECK-NEXT: subs r3, r2, #1
; CHECK-NEXT: and lr, r2, #3
; CHECK-NEXT: and r12, r2, #3
; CHECK-NEXT: vldr s0, .LCPI9_0
; CHECK-NEXT: cmp r3, #3
; CHECK-NEXT: bhs .LBB9_4
@@ -1435,7 +1435,7 @@ define arm_aapcs_vfpcc float @half_half_mac(half* nocapture readonly %a, half* n
; CHECK-NEXT: vldr s0, .LCPI9_0
; CHECK-NEXT: b .LBB9_9
; CHECK-NEXT: .LBB9_4: @ %for.body.preheader.new
; CHECK-NEXT: sub.w r12, r2, lr
; CHECK-NEXT: sub.w lr, r2, r12
; CHECK-NEXT: movs r3, #0
; CHECK-NEXT: movs r2, #0
; CHECK-NEXT: .LBB9_5: @ %for.body
@@ -1459,7 +1459,7 @@ define arm_aapcs_vfpcc float @half_half_mac(half* nocapture readonly %a, half* n
; CHECK-NEXT: vcvtb.f32.f16 s6, s6
; CHECK-NEXT: adds r3, #8
; CHECK-NEXT: vmul.f16 s8, s10, s8
; CHECK-NEXT: cmp r12, r2
; CHECK-NEXT: cmp lr, r2
; CHECK-NEXT: vcvtb.f32.f16 s8, s8
; CHECK-NEXT: vadd.f32 s0, s0, s8
; CHECK-NEXT: vadd.f32 s0, s0, s6
@@ -1467,7 +1467,7 @@ define arm_aapcs_vfpcc float @half_half_mac(half* nocapture readonly %a, half* n
; CHECK-NEXT: vadd.f32 s0, s0, s2
; CHECK-NEXT: bne .LBB9_5
; CHECK-NEXT: .LBB9_6: @ %for.cond.cleanup.loopexit.unr-lcssa
; CHECK-NEXT: wls lr, lr, .LBB9_9
; CHECK-NEXT: wls lr, r12, .LBB9_9
; CHECK-NEXT: @ %bb.7: @ %for.body.epil.preheader
; CHECK-NEXT: add.w r0, r0, r2, lsl #1
; CHECK-NEXT: add.w r1, r1, r2, lsl #1
@@ -1576,7 +1576,7 @@ define arm_aapcs_vfpcc float @half_half_acc(half* nocapture readonly %a, half* n
; CHECK-NEXT: cbz r2, .LBB10_3
; CHECK-NEXT: @ %bb.1: @ %for.body.preheader
; CHECK-NEXT: subs r3, r2, #1
; CHECK-NEXT: and lr, r2, #3
; CHECK-NEXT: and r12, r2, #3
; CHECK-NEXT: vldr s0, .LCPI10_0
; CHECK-NEXT: cmp r3, #3
; CHECK-NEXT: bhs .LBB10_4
@@ -1587,7 +1587,7 @@ define arm_aapcs_vfpcc float @half_half_acc(half* nocapture readonly %a, half* n
; CHECK-NEXT: vldr s0, .LCPI10_0
; CHECK-NEXT: b .LBB10_9
; CHECK-NEXT: .LBB10_4: @ %for.body.preheader.new
; CHECK-NEXT: sub.w r12, r2, lr
; CHECK-NEXT: sub.w lr, r2, r12
; CHECK-NEXT: movs r3, #0
; CHECK-NEXT: movs r2, #0
; CHECK-NEXT: .LBB10_5: @ %for.body
@@ -1611,7 +1611,7 @@ define arm_aapcs_vfpcc float @half_half_acc(half* nocapture readonly %a, half* n
; CHECK-NEXT: vcvtb.f32.f16 s6, s6
; CHECK-NEXT: adds r3, #8
; CHECK-NEXT: vadd.f16 s8, s10, s8
; CHECK-NEXT: cmp r12, r2
; CHECK-NEXT: cmp lr, r2
; CHECK-NEXT: vcvtb.f32.f16 s8, s8
; CHECK-NEXT: vadd.f32 s0, s0, s8
; CHECK-NEXT: vadd.f32 s0, s0, s6
@@ -1619,7 +1619,7 @@ define arm_aapcs_vfpcc float @half_half_acc(half* nocapture readonly %a, half* n
; CHECK-NEXT: vadd.f32 s0, s0, s2
; CHECK-NEXT: bne .LBB10_5
; CHECK-NEXT: .LBB10_6: @ %for.cond.cleanup.loopexit.unr-lcssa
; CHECK-NEXT: wls lr, lr, .LBB10_9
; CHECK-NEXT: wls lr, r12, .LBB10_9
; CHECK-NEXT: @ %bb.7: @ %for.body.epil.preheader
; CHECK-NEXT: add.w r0, r0, r2, lsl #1
; CHECK-NEXT: add.w r1, r1, r2, lsl #1
@@ -1728,7 +1728,7 @@ define arm_aapcs_vfpcc float @half_short_mac(half* nocapture readonly %a, i16* n
; CHECK-NEXT: cbz r2, .LBB11_3
; CHECK-NEXT: @ %bb.1: @ %for.body.preheader
; CHECK-NEXT: subs r3, r2, #1
; CHECK-NEXT: and lr, r2, #3
; CHECK-NEXT: and r12, r2, #3
; CHECK-NEXT: vldr s0, .LCPI11_0
; CHECK-NEXT: cmp r3, #3
; CHECK-NEXT: bhs .LBB11_4
@@ -1739,7 +1739,7 @@ define arm_aapcs_vfpcc float @half_short_mac(half* nocapture readonly %a, i16* n
; CHECK-NEXT: vldr s0, .LCPI11_0
; CHECK-NEXT: b .LBB11_9
; CHECK-NEXT: .LBB11_4: @ %for.body.preheader.new
; CHECK-NEXT: sub.w r12, r2, lr
; CHECK-NEXT: sub.w lr, r2, r12
; CHECK-NEXT: adds r3, r1, #4
; CHECK-NEXT: adds r4, r0, #4
; CHECK-NEXT: movs r2, #0
@@ -1748,7 +1748,7 @@ define arm_aapcs_vfpcc float @half_short_mac(half* nocapture readonly %a, i16* n
; CHECK-NEXT: ldrsh.w r5, [r3, #2]
; CHECK-NEXT: vldr.16 s2, [r4, #2]
; CHECK-NEXT: adds r2, #4
; CHECK-NEXT: cmp r12, r2
; CHECK-NEXT: cmp lr, r2
; CHECK-NEXT: vmov s4, r5
; CHECK-NEXT: ldrsh r5, [r3], #8
; CHECK-NEXT: vcvt.f16.s32 s4, s4
@@ -1778,7 +1778,7 @@ define arm_aapcs_vfpcc float @half_short_mac(half* nocapture readonly %a, i16* n
; CHECK-NEXT: vadd.f32 s0, s0, s2
; CHECK-NEXT: bne .LBB11_5
; CHECK-NEXT: .LBB11_6: @ %for.cond.cleanup.loopexit.unr-lcssa
; CHECK-NEXT: wls lr, lr, .LBB11_9
; CHECK-NEXT: wls lr, r12, .LBB11_9
; CHECK-NEXT: @ %bb.7: @ %for.body.epil.preheader
; CHECK-NEXT: add.w r0, r0, r2, lsl #1
; CHECK-NEXT: add.w r1, r1, r2, lsl #1
@@ -117,7 +117,7 @@ body: |
frame-setup CFI_INSTRUCTION offset $r7, -8
renamable $r3, dead $cpsr = tADDi3 renamable $r2, 7, 14, $noreg
renamable $lr = t2LSRri killed renamable $r3, 3, 14, $noreg, $noreg
t2WhileLoopStart renamable $lr, %bb.4, implicit-def dead $cpsr
$lr = t2WhileLoopStartLR renamable $lr, %bb.4, implicit-def dead $cpsr
tB %bb.1, 14, $noreg

bb.1.for.body.preheader:
@@ -4,7 +4,7 @@
# CHECK: bb.0.entry:
# CHECK: tBcc %bb.2, 3
# CHECK: bb.1.not.preheader:
# CHECK: t2CMPri renamable $lr, 0, 14
# CHECK: $lr = t2SUBri killed renamable $lr, 0, 14
# CHECK: tBcc %bb.4, 0
# CHECK: tB %bb.2
# CHECK: bb.3.while.body:
@@ -119,7 +119,7 @@ body: |
successors: %bb.2(0x40000000), %bb.4(0x40000000)
liveins: $lr, $r0, $r1

t2WhileLoopStart renamable $lr, %bb.4, implicit-def dead $cpsr
$lr = t2WhileLoopStartLR killed renamable $lr, %bb.4, implicit-def dead $cpsr
tB %bb.2, 14, $noreg

bb.2.while.body.preheader:
@@ -102,7 +102,7 @@ body: |
; CHECK: frame-setup CFI_INSTRUCTION def_cfa_offset 8
; CHECK: frame-setup CFI_INSTRUCTION offset $lr, -4
; CHECK: frame-setup CFI_INSTRUCTION offset $r7, -8
; CHECK: t2CMPri $r3, 0, 14 /* CC::al */, $noreg, implicit-def $cpsr
; CHECK: dead $lr = t2SUBri $r3, 0, 14 /* CC::al */, $noreg, def $cpsr
; CHECK: t2Bcc %bb.3, 0 /* CC::eq */, killed $cpsr
; CHECK: tB %bb.1, 14 /* CC::al */, $noreg
; CHECK: bb.1.do.body.preheader:
@@ -130,7 +130,7 @@ body: |
frame-setup CFI_INSTRUCTION def_cfa_offset 8
frame-setup CFI_INSTRUCTION offset $lr, -4
frame-setup CFI_INSTRUCTION offset $r7, -8
t2WhileLoopStart $r3, %bb.3, implicit-def dead $cpsr
$lr = t2WhileLoopStartLR $r3, %bb.3, implicit-def dead $cpsr
tB %bb.1, 14, $noreg

bb.1.do.body.preheader:
@@ -188,7 +188,7 @@ body: |
renamable $r12 = t2LDRi12 $sp, 44, 14, $noreg :: (load 4 from %fixed-stack.0, align 8)
renamable $r5 = t2ADDri renamable $r12, 3, 14, $noreg, $noreg
renamable $lr = t2LSRri killed renamable $r5, 2, 14, $noreg, $noreg
t2WhileLoopStart renamable $lr, %bb.3, implicit-def dead $cpsr
$lr = t2WhileLoopStartLR renamable $lr, %bb.3, implicit-def dead $cpsr
tB %bb.1, 14, $noreg

bb.1.for.body.lr.ph:
@@ -6,8 +6,10 @@
entry:
%add = add i32 %block_size, 3
%div = lshr i32 %add, 2
%0 = call i1 @llvm.test.set.loop.iterations.i32(i32 %div)
br i1 %0, label %for.body.lr.ph, label %for.cond.cleanup
%0 = call { i32, i1 } @llvm.test.start.loop.iterations.i32(i32 %div)
%wls0 = extractvalue { i32, i1 } %0, 0
%wls1 = extractvalue { i32, i1 } %0, 1
br i1 %wls1, label %for.body.lr.ph, label %for.cond.cleanup

for.body.lr.ph: ; preds = %entry
%.splatinsert.i41 = insertelement <4 x i32> undef, i32 %out_activation_min, i32 0
@@ -21,7 +23,7 @@
ret i32 %res

for.body: ; preds = %for.body, %for.body.lr.ph
%lsr.iv = phi i32 [ %lsr.iv.next, %for.body ], [ %div, %for.body.lr.ph ]
%lsr.iv = phi i32 [ %iv.next, %for.body ], [ %wls0, %for.body.lr.ph ]
%input_1_vect.addr.052 = phi i8* [ %input_1_vect, %for.body.lr.ph ], [ %add.ptr, %for.body ]
%input_2_vect.addr.051 = phi i8* [ %input_2_vect, %for.body.lr.ph ], [ %add.ptr14, %for.body ]
%num_elements.049 = phi i32 [ %block_size, %for.body.lr.ph ], [ %sub, %for.body ]
@@ -47,9 +49,8 @@
%add.ptr = getelementptr inbounds i8, i8* %input_1_vect.addr.052, i32 4
%add.ptr14 = getelementptr inbounds i8, i8* %input_2_vect.addr.051, i32 4
%sub = add i32 %num_elements.049, -4
%iv.next = call i32 @llvm.loop.decrement.reg.i32.i32.i32(i32 %lsr.iv, i32 1)
%iv.next = call i32 @llvm.loop.decrement.reg.i32(i32 %lsr.iv, i32 1)
%cmp = icmp ne i32 %iv.next, 0
%lsr.iv.next = add i32 %lsr.iv, -1
br i1 %cmp, label %for.body, label %for.cond.cleanup
}
declare <4 x i1> @llvm.arm.mve.vctp32(i32) #1
@@ -58,8 +59,8 @@
declare <4 x i32> @llvm.arm.mve.min.predicated.v4i32.v4i1(<4 x i32>, <4 x i32>, i32, <4 x i1>, <4 x i32>) #1
declare <4 x i32> @llvm.arm.mve.max.predicated.v4i32.v4i1(<4 x i32>, <4 x i32>, i32, <4 x i1>, <4 x i32>) #1
declare i32 @llvm.arm.mve.vmldava.predicated.v4i32.v4i1(i32, i32, i32, i32, <4 x i32>, <4 x i32>, <4 x i1>) #1
declare i1 @llvm.test.set.loop.iterations.i32(i32) #4
declare i32 @llvm.loop.decrement.reg.i32.i32.i32(i32, i32) #4
declare { i32, i1 } @llvm.test.start.loop.iterations.i32(i32) #4
declare i32 @llvm.loop.decrement.reg.i32(i32, i32) #4
...
---
name: vmldava_in_vpt
@@ -82,7 +83,7 @@ frameInfo:
isReturnAddressTaken: false
hasStackMap: false
hasPatchPoint: false
stackSize: 20
stackSize: 16
offsetAdjustment: 0
maxAlignment: 4
adjustsStack: false
@@ -120,117 +121,109 @@ stack:
stack-id: default, callee-saved-register: '$lr', callee-saved-restored: false,
debug-info-variable: '', debug-info-expression: '', debug-info-location: '' }
- { id: 1, name: '', type: spill-slot, offset: -8, size: 4, alignment: 4,
stack-id: default, callee-saved-register: '$r7', callee-saved-restored: true,
debug-info-variable: '', debug-info-expression: '', debug-info-location: '' }
- { id: 2, name: '', type: spill-slot, offset: -12, size: 4, alignment: 4,
stack-id: default, callee-saved-register: '$r6', callee-saved-restored: true,
debug-info-variable: '', debug-info-expression: '', debug-info-location: '' }
- { id: 3, name: '', type: spill-slot, offset: -16, size: 4, alignment: 4,
- { id: 2, name: '', type: spill-slot, offset: -12, size: 4, alignment: 4,
stack-id: default, callee-saved-register: '$r5', callee-saved-restored: true,
debug-info-variable: '', debug-info-expression: '', debug-info-location: '' }
- { id: 4, name: '', type: spill-slot, offset: -20, size: 4, alignment: 4,
- { id: 3, name: '', type: spill-slot, offset: -16, size: 4, alignment: 4,
stack-id: default, callee-saved-register: '$r4', callee-saved-restored: true,
debug-info-variable: '', debug-info-expression: '', debug-info-location: '' }
callSites: []
debugValueSubstitutions: []
constants: []
machineFunctionInfo: {}
body: |
; CHECK-LABEL: name: vmldava_in_vpt
; CHECK: bb.0.entry:
; CHECK: successors: %bb.1(0x40000000), %bb.3(0x40000000)
; CHECK: liveins: $lr, $r0, $r1, $r2, $r3, $r4, $r5, $r6, $r7
; CHECK: frame-setup tPUSH 14 /* CC::al */, $noreg, killed $r4, killed $r5, killed $r6, killed $r7, killed $lr, implicit-def $sp, implicit $sp
; CHECK: frame-setup CFI_INSTRUCTION def_cfa_offset 20
; CHECK: liveins: $lr, $r0, $r1, $r2, $r3, $r4, $r5, $r6
; CHECK: frame-setup tPUSH 14 /* CC::al */, $noreg, killed $r4, killed $r5, killed $r6, killed $lr, implicit-def $sp, implicit $sp
; CHECK: frame-setup CFI_INSTRUCTION def_cfa_offset 16
; CHECK: frame-setup CFI_INSTRUCTION offset $lr, -4
; CHECK: frame-setup CFI_INSTRUCTION offset $r7, -8
; CHECK: frame-setup CFI_INSTRUCTION offset $r6, -12
; CHECK: frame-setup CFI_INSTRUCTION offset $r5, -16
; CHECK: frame-setup CFI_INSTRUCTION offset $r4, -20
; CHECK: renamable $r7 = tLDRspi $sp, 10, 14 /* CC::al */, $noreg :: (load 4 from %fixed-stack.5)
; CHECK: frame-setup CFI_INSTRUCTION offset $r6, -8
; CHECK: frame-setup CFI_INSTRUCTION offset $r5, -12
; CHECK: frame-setup CFI_INSTRUCTION offset $r4, -16
; CHECK: renamable $r4 = tLDRspi $sp, 9, 14 /* CC::al */, $noreg :: (load 4 from %fixed-stack.5)
; CHECK: renamable $r12 = t2MOVi 0, 14 /* CC::al */, $noreg, $noreg
; CHECK: $lr = MVE_WLSTP_32 killed renamable $r7, %bb.3
; CHECK: renamable $r5, dead $cpsr = tADDi3 renamable $r4, 3, 14 /* CC::al */, $noreg
; CHECK: dead renamable $r5, dead $cpsr = tLSRri killed renamable $r5, 2, 14 /* CC::al */, $noreg
; CHECK: $lr = MVE_WLSTP_32 killed renamable $r4, %bb.3
; CHECK: bb.1.for.body.lr.ph:
; CHECK: successors: %bb.2(0x80000000)
; CHECK: liveins: $lr, $r0, $r1, $r2, $r3
; CHECK: $r5, $r12 = t2LDRDi8 $sp, 32, 14 /* CC::al */, $noreg :: (load 4 from %fixed-stack.3), (load 4 from %fixed-stack.4, align 8)
; CHECK: renamable $r4 = tLDRspi $sp, 5, 14 /* CC::al */, $noreg :: (load 4 from %fixed-stack.0, align 8)
; CHECK: renamable $r5 = tLDRspi $sp, 4, 14 /* CC::al */, $noreg :: (load 4 from %fixed-stack.0, align 8)
; CHECK: $r6, $r12 = t2LDRDi8 $sp, 28, 14 /* CC::al */, $noreg :: (load 4 from %fixed-stack.3), (load 4 from %fixed-stack.4, align 8)
; CHECK: renamable $q0 = MVE_VDUP32 killed renamable $r12, 0, $noreg, undef renamable $q0
; CHECK: renamable $q1 = MVE_VDUP32 killed renamable $r5, 0, $noreg, undef renamable $q1
; CHECK: renamable $q1 = MVE_VDUP32 killed renamable $r6, 0, $noreg, undef renamable $q1
; CHECK: renamable $r12 = t2MOVi 0, 14 /* CC::al */, $noreg, $noreg
; CHECK: bb.2.for.body:
; CHECK: successors: %bb.2(0x7c000000), %bb.3(0x04000000)
; CHECK: liveins: $lr, $q0, $q1, $r0, $r1, $r2, $r3, $r4, $r12
; CHECK: liveins: $lr, $q0, $q1, $r0, $r1, $r2, $r3, $r5, $r12
; CHECK: renamable $r1, renamable $q2 = MVE_VLDRWU32_post killed renamable $r1, 4, 0, $noreg :: (load 16 from %ir.input_2_cast, align 4)
; CHECK: renamable $r0, renamable $q3 = MVE_VLDRWU32_post killed renamable $r0, 4, 0, $noreg :: (load 16 from %ir.input_1_cast, align 4)
; CHECK: renamable $q2 = MVE_VADD_qr_i32 killed renamable $q2, renamable $r3, 0, $noreg, undef renamable $q2
; CHECK: renamable $q3 = MVE_VADD_qr_i32 killed renamable $q3, renamable $r2, 0, $noreg, undef renamable $q3
; CHECK: renamable $q2 = MVE_VMULi32 killed renamable $q3, killed renamable $q2, 0, $noreg, undef renamable $q2
; CHECK: renamable $q2 = MVE_VADD_qr_i32 killed renamable $q2, renamable $r4, 0, $noreg, undef renamable $q2
; CHECK: renamable $q2 = MVE_VMAXu32 killed renamable $q2, renamable $q1, 0, $noreg, undef renamable $q2
; CHECK: renamable $q3 = MVE_VMLAS_qr_u32 killed renamable $q3, killed renamable $q2, renamable $r5, 0, $noreg
; CHECK: renamable $q2 = MVE_VMAXu32 killed renamable $q3, renamable $q1, 0, $noreg, undef renamable $q2
; CHECK: renamable $q3 = MVE_VMINu32 renamable $q2, renamable $q0, 0, $noreg, undef renamable $q3
; CHECK: renamable $r12 = MVE_VMLADAVas32 killed renamable $r12, killed renamable $q3, killed renamable $q2, 0, killed $noreg
; CHECK: $lr = MVE_LETP killed renamable $lr, %bb.2
; CHECK: bb.3.for.cond.cleanup:
; CHECK: liveins: $r12
; CHECK: $r0 = tMOVr killed $r12, 14 /* CC::al */, $noreg
; CHECK: tPOP_RET 14 /* CC::al */, $noreg, def $r4, def $r5, def $r6, def $r7, def $pc, implicit killed $r0
; CHECK: frame-destroy tPOP_RET 14 /* CC::al */, $noreg, def $r4, def $r5, def $r6, def $pc, implicit killed $r0
bb.0.entry:
successors: %bb.1(0x40000000), %bb.3(0x40000000)
liveins: $r0, $r1, $r2, $r3, $r4, $r5, $r6, $r7, $lr
liveins: $r0, $r1, $r2, $r3, $r4, $r5, $r6, $lr

frame-setup tPUSH 14, $noreg, killed $r4, killed $r5, killed $r6, killed $r7, killed $lr, implicit-def $sp, implicit $sp
frame-setup CFI_INSTRUCTION def_cfa_offset 20
frame-setup tPUSH 14 /* CC::al */, $noreg, killed $r4, killed $r5, killed $r6, killed $lr, implicit-def $sp, implicit $sp
frame-setup CFI_INSTRUCTION def_cfa_offset 16
frame-setup CFI_INSTRUCTION offset $lr, -4
frame-setup CFI_INSTRUCTION offset $r7, -8
frame-setup CFI_INSTRUCTION offset $r6, -12
frame-setup CFI_INSTRUCTION offset $r5, -16
frame-setup CFI_INSTRUCTION offset $r4, -20
renamable $r7 = tLDRspi $sp, 10, 14, $noreg :: (load 4 from %fixed-stack.0)
renamable $r12 = t2MOVi 0, 14, $noreg, $noreg
renamable $r4, dead $cpsr = tADDi3 renamable $r7, 3, 14, $noreg
renamable $r5, dead $cpsr = tLSRri killed renamable $r4, 2, 14, $noreg
t2WhileLoopStart renamable $r5, %bb.3, implicit-def dead $cpsr
tB %bb.1, 14, $noreg
frame-setup CFI_INSTRUCTION offset $r6, -8
frame-setup CFI_INSTRUCTION offset $r5, -12
frame-setup CFI_INSTRUCTION offset $r4, -16
renamable $r4 = tLDRspi $sp, 9, 14 /* CC::al */, $noreg :: (load 4 from %fixed-stack.0)
renamable $r12 = t2MOVi 0, 14 /* CC::al */, $noreg, $noreg
renamable $r5, dead $cpsr = tADDi3 renamable $r4, 3, 14 /* CC::al */, $noreg
renamable $r5, dead $cpsr = tLSRri killed renamable $r5, 2, 14 /* CC::al */, $noreg
renamable $lr = t2WhileLoopStartLR killed renamable $r5, %bb.3, implicit-def dead $cpsr
tB %bb.1, 14 /* CC::al */, $noreg

bb.1.for.body.lr.ph:
successors: %bb.2(0x80000000)
liveins: $r0, $r1, $r2, $r3, $r5, $r7
liveins: $lr, $r0, $r1, $r2, $r3, $r4

$r6 = tMOVr killed $r5, 14, $noreg
$r5, $r12 = t2LDRDi8 $sp, 32, 14, $noreg :: (load 4 from %fixed-stack.2), (load 4 from %fixed-stack.1, align 8)
renamable $r4 = tLDRspi $sp, 5, 14, $noreg :: (load 4 from %fixed-stack.5, align 8)
renamable $r5 = tLDRspi $sp, 4, 14 /* CC::al */, $noreg :: (load 4 from %fixed-stack.5, align 8)
$r6, $r12 = t2LDRDi8 $sp, 28, 14 /* CC::al */, $noreg :: (load 4 from %fixed-stack.2), (load 4 from %fixed-stack.1, align 8)
renamable $q0 = MVE_VDUP32 killed renamable $r12, 0, $noreg, undef renamable $q0
renamable $q1 = MVE_VDUP32 killed renamable $r5, 0, $noreg, undef renamable $q1
renamable $r12 = t2MOVi 0, 14, $noreg, $noreg
renamable $q1 = MVE_VDUP32 killed renamable $r6, 0, $noreg, undef renamable $q1
renamable $r12 = t2MOVi 0, 14 /* CC::al */, $noreg, $noreg

bb.2.for.body:
successors: %bb.2(0x7c000000), %bb.3(0x04000000)
liveins: $q0, $q1, $r0, $r1, $r2, $r3, $r4, $r6, $r7, $r12
liveins: $lr, $q0, $q1, $r0, $r1, $r2, $r3, $r4, $r5, $r12

renamable $vpr = MVE_VCTP32 renamable $r7, 0, $noreg
renamable $vpr = MVE_VCTP32 renamable $r4, 0, $noreg
MVE_VPST 8, implicit $vpr
renamable $r1, renamable $q2 = MVE_VLDRWU32_post killed renamable $r1, 4, 1, renamable $vpr :: (load 16 from %ir.input_2_cast, align 4)
MVE_VPST 8, implicit $vpr
renamable $r0, renamable $q3 = MVE_VLDRWU32_post killed renamable $r0, 4, 1, renamable $vpr :: (load 16 from %ir.input_1_cast, align 4)
renamable $q2 = MVE_VADD_qr_i32 killed renamable $q2, renamable $r3, 0, $noreg, undef renamable $q2
renamable $q3 = MVE_VADD_qr_i32 killed renamable $q3, renamable $r2, 0, $noreg, undef renamable $q3
$lr = tMOVr $r6, 14, $noreg
renamable $q2 = MVE_VMULi32 killed renamable $q3, killed renamable $q2, 0, $noreg, undef renamable $q2
renamable $r6, dead $cpsr = tSUBi8 killed $r6, 1, 14, $noreg
renamable $q2 = MVE_VADD_qr_i32 killed renamable $q2, renamable $r4, 0, $noreg, undef renamable $q2
renamable $r7, dead $cpsr = tSUBi8 killed renamable $r7, 4, 14, $noreg
renamable $r4, dead $cpsr = tSUBi8 killed renamable $r4, 4, 14 /* CC::al */, $noreg
renamable $q3 = MVE_VMLAS_qr_u32 killed renamable $q3, killed renamable $q2, renamable $r5, 0, $noreg
MVE_VPST 2, implicit $vpr
renamable $q2 = MVE_VMAXu32 killed renamable $q2, renamable $q1, 1, renamable $vpr, undef renamable $q2
renamable $q2 = MVE_VMAXu32 killed renamable $q3, renamable $q1, 1, renamable $vpr, undef renamable $q2
renamable $q3 = MVE_VMINu32 renamable $q2, renamable $q0, 1, renamable $vpr, undef renamable $q3
renamable $r12 = MVE_VMLADAVas32 killed renamable $r12, killed renamable $q3, killed renamable $q2, 1, killed renamable $vpr
renamable $lr = t2LoopDec killed renamable $lr, 1
t2LoopEnd killed renamable $lr, %bb.2, implicit-def dead $cpsr
tB %bb.3, 14, $noreg
renamable $lr = t2LoopEndDec killed renamable $lr, %bb.2, implicit-def dead $cpsr
tB %bb.3, 14 /* CC::al */, $noreg

bb.3.for.cond.cleanup:
liveins: $r12

$r0 = tMOVr killed $r12, 14, $noreg
tPOP_RET 14, $noreg, def $r4, def $r5, def $r6, def $r7, def $pc, implicit killed $r0
$r0 = tMOVr killed $r12, 14 /* CC::al */, $noreg
frame-destroy tPOP_RET 14 /* CC::al */, $noreg, def $r4, def $r5, def $r6, def $pc, implicit killed $r0

...
@@ -164,81 +164,75 @@ define dso_local i32 @b(i32* %c, i32 %d, i32 %e) "frame-pointer"="all" {
; CHECK-NEXT: push.w {r8, r9, r10, r11}
; CHECK-NEXT: .pad #8
; CHECK-NEXT: sub sp, #8
; CHECK-NEXT: str r1, [sp, #4] @ 4-byte Spill
; CHECK-NEXT: cmp.w r1, #0
; CHECK-NEXT: beq .LBB2_3
; CHECK-NEXT: b .LBB2_1
; CHECK-NEXT: .LBB2_1: @ %while.body.preheader
; CHECK-NEXT: wls lr, r1, .LBB2_3
; CHECK-NEXT: @ %bb.1: @ %while.body.preheader
; CHECK-NEXT: adds r1, r0, #4
; CHECK-NEXT: mov r3, r2
; CHECK-NEXT: mvn r2, #1
; CHECK-NEXT: mvn r3, #1
; CHECK-NEXT: @ implicit-def: $r9
; CHECK-NEXT: @ implicit-def: $r10
; CHECK-NEXT: @ implicit-def: $r6
; CHECK-NEXT: @ implicit-def: $r8
; CHECK-NEXT: str r3, [sp] @ 4-byte Spill
; CHECK-NEXT: @ implicit-def: $r4
; CHECK-NEXT: str r2, [sp] @ 4-byte Spill
; CHECK-NEXT: .LBB2_2: @ %while.body
; CHECK-NEXT: @ =>This Inner Loop Header: Depth=1
; CHECK-NEXT: mov lr, r1
; CHECK-NEXT: str r1, [sp, #4] @ 4-byte Spill
; CHECK-NEXT: ldr r1, [sp, #4] @ 4-byte Reload
; CHECK-NEXT: ldr.w r8, [r10]
; CHECK-NEXT: ldr r1, [r1, #-4]
; CHECK-NEXT: mul r11, r8, r0
; CHECK-NEXT: adds r0, #4
; CHECK-NEXT: mul r1, r1, r9
; CHECK-NEXT: adds.w r12, r1, #-2147483648
; CHECK-NEXT: asr.w r5, r1, #31
; CHECK-NEXT: ldr.w r1, [r10]
; CHECK-NEXT: add.w r1, r11, #-2147483648
; CHECK-NEXT: adc r5, r5, #0
; CHECK-NEXT: mul r11, r1, r0
; CHECK-NEXT: adds r0, #4
; CHECK-NEXT: add.w r3, r11, #-2147483648
; CHECK-NEXT: asrl r12, r5, r3
; CHECK-NEXT: smull r4, r3, r1, r12
; CHECK-NEXT: lsll r4, r3, #30
; CHECK-NEXT: asrs r5, r3, #31
; CHECK-NEXT: mov r4, r3
; CHECK-NEXT: lsll r4, r5, r1
; CHECK-NEXT: lsll r4, r5, #30
; CHECK-NEXT: ldrd r4, r11, [r2]
; CHECK-NEXT: asrs r3, r5, #31
; CHECK-NEXT: asrl r12, r5, r1
; CHECK-NEXT: smull r2, r1, r8, r12
; CHECK-NEXT: lsll r2, r1, #30
; CHECK-NEXT: asrs r5, r1, #31
; CHECK-NEXT: mov r2, r1
; CHECK-NEXT: lsll r2, r5, r8
; CHECK-NEXT: lsll r2, r5, #30
; CHECK-NEXT: ldrd r2, r11, [r3]
; CHECK-NEXT: asrs r1, r5, #31
; CHECK-NEXT: mov r12, r5
; CHECK-NEXT: ldr.w r5, [lr]
; CHECK-NEXT: muls r4, r6, r4
; CHECK-NEXT: mul r5, r5, r9
; CHECK-NEXT: asrs r5, r4, #31
; CHECK-NEXT: muls r2, r6, r2
; CHECK-NEXT: adds r2, #2
; CHECK-NEXT: lsll r12, r1, r2
; CHECK-NEXT: ldr r2, [sp, #4] @ 4-byte Reload
; CHECK-NEXT: add.w r1, r12, #-2147483648
; CHECK-NEXT: ldr r2, [r2]
; CHECK-NEXT: mul r2, r2, r9
; CHECK-NEXT: add.w r9, r9, #4
; CHECK-NEXT: adds r4, #2
; CHECK-NEXT: lsll r12, r3, r4
; CHECK-NEXT: asr.w r4, r8, #31
; CHECK-NEXT: adds.w r3, r8, r5
; CHECK-NEXT: add.w r12, r12, #-2147483648
; CHECK-NEXT: adc.w r4, r4, r5, asr #31
; CHECK-NEXT: smull r5, r6, r11, r6
; CHECK-NEXT: adds.w r3, r3, #-2147483648
; CHECK-NEXT: adc r3, r4, #0
; CHECK-NEXT: asrs r4, r3, #31
; CHECK-NEXT: subs r5, r3, r5
; CHECK-NEXT: sbcs r4, r6
; CHECK-NEXT: adds.w r6, r5, #-2147483648
; CHECK-NEXT: adc r5, r4, #0
; CHECK-NEXT: asrl r6, r5, r12
; CHECK-NEXT: adds r4, r4, r2
; CHECK-NEXT: adc.w r2, r5, r2, asr #31
; CHECK-NEXT: adds.w r5, r4, #-2147483648
; CHECK-NEXT: smull r6, r4, r11, r6
; CHECK-NEXT: adc r2, r2, #0
; CHECK-NEXT: asrs r5, r2, #31
; CHECK-NEXT: subs r6, r2, r6
; CHECK-NEXT: sbcs r5, r4
; CHECK-NEXT: adds.w r6, r6, #-2147483648
; CHECK-NEXT: adc r5, r5, #0
; CHECK-NEXT: asrl r6, r5, r1
; CHECK-NEXT: movs r1, #2
; CHECK-NEXT: lsrl r6, r5, #2
; CHECK-NEXT: movs r5, #2
; CHECK-NEXT: str r6, [r5]
; CHECK-NEXT: ldr r5, [r2], #-4
; CHECK-NEXT: mls r1, r5, r1, r3
; CHECK-NEXT: adds.w r8, r1, #-2147483648
; CHECK-NEXT: asr.w r3, r1, #31
; CHECK-NEXT: adc r1, r3, #0
; CHECK-NEXT: ldr r3, [sp] @ 4-byte Reload
; CHECK-NEXT: lsrl r8, r1, #2
; CHECK-NEXT: rsb.w r1, r8, #0
; CHECK-NEXT: str r6, [r1]
; CHECK-NEXT: ldr r1, [r3], #-4
; CHECK-NEXT: mls r1, r1, r8, r2
; CHECK-NEXT: adds.w r4, r1, #-2147483648
; CHECK-NEXT: asr.w r2, r1, #31
; CHECK-NEXT: adc r1, r2, #0
; CHECK-NEXT: ldr r2, [sp] @ 4-byte Reload
; CHECK-NEXT: lsrl r4, r1, #2
; CHECK-NEXT: rsbs r1, r4, #0
; CHECK-NEXT: str r1, [r10, #-4]
; CHECK-NEXT: add.w r10, r10, #4
; CHECK-NEXT: str r1, [r3]
; CHECK-NEXT: mov r1, lr
; CHECK-NEXT: add.w r1, lr, #4
; CHECK-NEXT: ldr.w lr, [sp, #4] @ 4-byte Reload
; CHECK-NEXT: subs.w lr, lr, #1
; CHECK-NEXT: str.w lr, [sp, #4] @ 4-byte Spill
; CHECK-NEXT: bne .LBB2_2
; CHECK-NEXT: b .LBB2_3
; CHECK-NEXT: str r1, [r2]
; CHECK-NEXT: ldr r1, [sp, #4] @ 4-byte Reload
; CHECK-NEXT: adds r1, #4
; CHECK-NEXT: le lr, .LBB2_2
; CHECK-NEXT: .LBB2_3: @ %while.end
; CHECK-NEXT: add sp, #8
; CHECK-NEXT: pop.w {r8, r9, r10, r11}
@@ -328,20 +322,21 @@ define void @callinpreheader(i32* noalias nocapture readonly %pAngle, i32* nocap
; CHECK: @ %bb.0: @ %entry
; CHECK-NEXT: .save {r4, r5, r6, lr}
; CHECK-NEXT: push {r4, r5, r6, lr}
; CHECK-NEXT: subs r6, r2, #0
; CHECK-NEXT: mov r5, r0
; CHECK-NEXT: mov r4, r1
; CHECK-NEXT: movs r0, #0
; CHECK-NEXT: wls lr, r2, .LBB3_3
; CHECK-NEXT: mov.w r0, #0
; CHECK-NEXT: beq .LBB3_3
; CHECK-NEXT: @ %bb.1: @ %for.body.ph
; CHECK-NEXT: mov r6, r2
; CHECK-NEXT: bl callee
; CHECK-NEXT: mov lr, r6
; CHECK-NEXT: movs r0, #0
; CHECK-NEXT: .LBB3_2: @ %for.body
; CHECK-NEXT: @ =>This Inner Loop Header: Depth=1
; CHECK-NEXT: ldr r1, [r5], #4
; CHECK-NEXT: subs r6, #1
; CHECK-NEXT: add r0, r1
; CHECK-NEXT: le lr, .LBB3_2
; CHECK-NEXT: cbz r6, .LBB3_3
; CHECK-NEXT: le .LBB3_2
; CHECK-NEXT: .LBB3_3: @ %for.cond.cleanup
; CHECK-NEXT: str r0, [r4]
; CHECK-NEXT: pop {r4, r5, r6, pc}
@@ -189,7 +189,7 @@ body: |
successors: %bb.2(0x40000000), %bb.1(0x40000000)

$r0 = tLDRspi $sp, 7, 14, $noreg :: (load 4 from %stack.0)
t2WhileLoopStart killed renamable $r0, %bb.1, implicit-def dead $cpsr
$lr = t2WhileLoopStartLR killed renamable $r0, %bb.1, implicit-def dead $cpsr
tB %bb.2, 14, $noreg

...
@@ -119,7 +119,7 @@ body: |
frame-setup CFI_INSTRUCTION def_cfa_offset 8
frame-setup CFI_INSTRUCTION offset $lr, -4
frame-setup CFI_INSTRUCTION offset $r7, -8
t2WhileLoopStart $r2, %bb.3, implicit-def dead $cpsr
$lr = t2WhileLoopStartLR $r2, %bb.3, implicit-def dead $cpsr
tB %bb.1, 14, $noreg

bb.1.while.body.preheader:
@@ -228,7 +228,7 @@ body: |
renamable $r12 = t2BICri killed renamable $r12, 15, 14, $noreg, $noreg
renamable $r12 = t2SUBri killed renamable $r12, 16, 14, $noreg, $noreg
renamable $lr = nuw nsw t2ADDrs killed renamable $lr, killed renamable $r12, 35, 14, $noreg, $noreg
t2WhileLoopStart renamable $lr, %bb.1, implicit-def dead $cpsr
$lr = t2WhileLoopStartLR renamable $lr, %bb.1, implicit-def dead $cpsr
tB %bb.3, 14, $noreg

bb.1.vector.ph:
@@ -345,7 +345,7 @@ body: |
renamable $r12 = t2BICri killed renamable $r12, 7, 14, $noreg, $noreg
renamable $r12 = t2SUBri killed renamable $r12, 8, 14, $noreg, $noreg
renamable $lr = nuw nsw t2ADDrs killed renamable $lr, killed renamable $r12, 27, 14, $noreg, $noreg
t2WhileLoopStart renamable $lr, %bb.1, implicit-def dead $cpsr
$lr = t2WhileLoopStartLR renamable $lr, %bb.1, implicit-def dead $cpsr
tB %bb.2, 14, $noreg

bb.1.vector.body:
@@ -477,7 +477,7 @@ body: |
renamable $r3, dead $cpsr = tMOVi8 1, 14, $noreg
renamable $lr = nuw nsw t2ADDrs killed renamable $r3, killed renamable $r12, 19, 14, $noreg, $noreg
renamable $r12 = t2MOVi 0, 14, $noreg, $noreg
t2WhileLoopStart renamable $lr, %bb.1, implicit-def dead $cpsr
$lr = t2WhileLoopStartLR renamable $lr, %bb.1, implicit-def dead $cpsr
tB %bb.4, 14, $noreg

bb.1.vector.ph:
@@ -47,7 +47,7 @@ body: |
; CHECK: frame-destroy tPOP_RET 11 /* CC::lt */, killed $cpsr, def $r7, def $pc, implicit killed $itstate
; CHECK: bb.2:
; CHECK: successors: %bb.3(0x80000000)
; CHECK: t2WhileLoopStart killed renamable $r0, %bb.1, implicit-def dead $cpsr
; CHECK: $lr = t2WhileLoopStartLR killed renamable $r0, %bb.1, implicit-def dead $cpsr
; CHECK: tB %bb.3, 14 /* CC::al */, $noreg
; CHECK: bb.1:
; CHECK: frame-destroy tPOP_RET 14 /* CC::al */, $noreg, def $r7, def $pc
@@ -72,7 +72,7 @@ body: |
successors: %bb.3(0x80000000)
liveins: $r0, $r1, $r2

t2WhileLoopStart killed renamable $r0, %bb.1, implicit-def dead $cpsr
$lr = t2WhileLoopStartLR killed renamable $r0, %bb.1, implicit-def dead $cpsr

bb.3:
successors: %bb.3(0x7c000000), %bb.1(0x04000000)
@@ -97,7 +97,7 @@ body: |
; CHECK: frame-destroy tPOP_RET 14 /* CC::al */, $noreg, def $r7, def $pc
; CHECK: bb.2:
; CHECK: successors: %bb.3(0x80000000)
; CHECK: t2WhileLoopStart killed renamable $r0, %bb.0, implicit-def dead $cpsr
; CHECK: $lr = t2WhileLoopStartLR killed renamable $r0, %bb.0, implicit-def dead $cpsr
; CHECK: bb.3:
; CHECK: successors: %bb.3(0x7c000000), %bb.1(0x04000000)
; CHECK: renamable $r0 = tLDRi renamable $r2, 0, 14 /* CC::al */, $noreg
@@ -119,7 +119,7 @@ body: |
successors: %bb.3(0x80000000)
liveins: $r0, $r1, $r2

t2WhileLoopStart killed renamable $r0, %bb.0, implicit-def dead $cpsr
$lr = t2WhileLoopStartLR killed renamable $r0, %bb.0, implicit-def dead $cpsr

bb.3:
successors: %bb.3(0x7c000000), %bb.1(0x04000000)
@@ -144,14 +144,14 @@ body: |
; CHECK: successors: %bb.3(0x80000000)
; CHECK: $lr = tMOVr $r0, 14 /* CC::al */, $noreg
; CHECK: renamable $r0 = t2ADDrs killed renamable $r2, killed $r0, 18, 14 /* CC::al */, $noreg, $noreg
; CHECK: t2WhileLoopStart killed renamable $lr, %bb.1, implicit-def dead $cpsr
; CHECK: $lr = t2WhileLoopStartLR killed renamable $lr, %bb.1, implicit-def dead $cpsr
; CHECK: tB %bb.3, 14 /* CC::al */, $noreg
; CHECK: bb.1:
; CHECK: successors: %bb.4(0x80000000)
; CHECK: tCMPi8 renamable $r1, 1, 14 /* CC::al */, $noreg, implicit-def $cpsr
; CHECK: t2IT 11, 8, implicit-def $itstate
; CHECK: frame-destroy tPOP_RET 11 /* CC::lt */, killed $cpsr, def $r7, def $pc, implicit killed $itstate
; CHECK: t2WhileLoopStart killed renamable $r1, %bb.0, implicit-def dead $cpsr
; CHECK: $lr = t2WhileLoopStartLR killed renamable $r1, %bb.0, implicit-def dead $cpsr
; CHECK: t2B %bb.4, 14 /* CC::al */, $noreg
; CHECK: bb.3:
; CHECK: successors: %bb.3(0x7c000000), %bb.1(0x04000000)
@@ -160,7 +160,7 @@ body: |
; CHECK: bb.4:
; CHECK: successors: %bb.5(0x80000000)
; CHECK: renamable $r0 = t2ADDrs killed renamable $r3, renamable $r1, 18, 14 /* CC::al */, $noreg, $noreg
; CHECK: t2WhileLoopStart killed renamable $r1, %bb.6, implicit-def dead $cpsr
; CHECK: $lr = t2WhileLoopStartLR killed renamable $r1, %bb.6, implicit-def dead $cpsr
; CHECK: bb.5:
; CHECK: successors: %bb.5(0x7c000000), %bb.6(0x04000000)
; CHECK: renamable $lr = t2LoopEndDec killed renamable $lr, %bb.5, implicit-def dead $cpsr
@@ -182,7 +182,7 @@ body: |
tCMPi8 renamable $r1, 1, 14 /* CC::al */, $noreg, implicit-def $cpsr
t2IT 11, 8, implicit-def $itstate
frame-destroy tPOP_RET 11 /* CC::lt */, killed $cpsr, def $r7, def $pc, implicit killed $itstate
t2WhileLoopStart killed renamable $r1, %bb.0, implicit-def dead $cpsr
$lr = t2WhileLoopStartLR killed renamable $r1, %bb.0, implicit-def dead $cpsr
t2B %bb.4, 14 /* CC::al */, $noreg

bb.1:
@@ -191,7 +191,7 @@ body: |

$lr = tMOVr $r0, 14 /* CC::al */, $noreg
renamable $r0 = t2ADDrs killed renamable $r2, killed $r0, 18, 14 /* CC::al */, $noreg, $noreg
t2WhileLoopStart killed renamable $lr, %bb.3, implicit-def dead $cpsr
$lr = t2WhileLoopStartLR killed renamable $lr, %bb.3, implicit-def dead $cpsr

bb.2:
successors: %bb.2(0x7c000000), %bb.3(0x04000000)
@@ -205,7 +205,7 @@ body: |
liveins: $r1, $r3

renamable $r0 = t2ADDrs killed renamable $r3, renamable $r1, 18, 14 /* CC::al */, $noreg, $noreg
t2WhileLoopStart killed renamable $r1, %bb.6, implicit-def dead $cpsr
$lr = t2WhileLoopStartLR killed renamable $r1, %bb.6, implicit-def dead $cpsr

bb.5:
successors: %bb.5(0x7c000000), %bb.6(0x04000000)
@@ -232,13 +232,13 @@ body: |
; CHECK: tCMPi8 renamable $r1, 1, 14 /* CC::al */, $noreg, implicit-def $cpsr
; CHECK: t2IT 11, 8, implicit-def $itstate
; CHECK: frame-destroy tPOP_RET 11 /* CC::lt */, killed $cpsr, def $r7, def $pc, implicit killed $itstate
; CHECK: t2WhileLoopStart killed renamable $r1, %bb.2, implicit-def dead $cpsr
; CHECK: $lr = t2WhileLoopStartLR killed renamable $r1, %bb.2, implicit-def dead $cpsr
; CHECK: t2B %bb.4, 14 /* CC::al */, $noreg
; CHECK: bb.2:
; CHECK: successors: %bb.3(0x80000000)
; CHECK: $lr = tMOVr $r0, 14 /* CC::al */, $noreg
; CHECK: renamable $r0 = t2ADDrs killed renamable $r2, killed $r0, 18, 14 /* CC::al */, $noreg, $noreg
; CHECK: t2WhileLoopStart killed renamable $lr, %bb.1, implicit-def dead $cpsr
; CHECK: $lr = t2WhileLoopStartLR killed renamable $lr, %bb.1, implicit-def dead $cpsr
; CHECK: bb.3:
; CHECK: successors: %bb.3(0x7c000000), %bb.1(0x04000000)
; CHECK: renamable $lr = t2LoopEndDec killed renamable $lr, %bb.3, implicit-def dead $cpsr
@@ -246,7 +246,7 @@ body: |
; CHECK: bb.4:
; CHECK: successors: %bb.5(0x80000000)
; CHECK: renamable $r0 = t2ADDrs killed renamable $r3, renamable $r1, 18, 14 /* CC::al */, $noreg, $noreg
; CHECK: t2WhileLoopStart killed renamable $r1, %bb.6, implicit-def dead $cpsr
; CHECK: $lr = t2WhileLoopStartLR killed renamable $r1, %bb.6, implicit-def dead $cpsr
; CHECK: bb.5:
; CHECK: successors: %bb.5(0x7c000000), %bb.6(0x04000000)
; CHECK: renamable $lr = t2LoopEndDec killed renamable $lr, %bb.5, implicit-def dead $cpsr
@@ -268,7 +268,7 @@ body: |
tCMPi8 renamable $r1, 1, 14 /* CC::al */, $noreg, implicit-def $cpsr
t2IT 11, 8, implicit-def $itstate
frame-destroy tPOP_RET 11 /* CC::lt */, killed $cpsr, def $r7, def $pc, implicit killed $itstate
t2WhileLoopStart killed renamable $r1, %bb.1, implicit-def dead $cpsr
$lr = t2WhileLoopStartLR killed renamable $r1, %bb.1, implicit-def dead $cpsr
t2B %bb.4, 14 /* CC::al */, $noreg

bb.1:
@@ -277,7 +277,7 @@ body: |

$lr = tMOVr $r0, 14 /* CC::al */, $noreg
renamable $r0 = t2ADDrs killed renamable $r2, killed $r0, 18, 14 /* CC::al */, $noreg, $noreg
t2WhileLoopStart killed renamable $lr, %bb.3, implicit-def dead $cpsr
$lr = t2WhileLoopStartLR killed renamable $lr, %bb.3, implicit-def dead $cpsr

bb.2:
successors: %bb.2(0x7c000000), %bb.3(0x04000000)
@@ -291,7 +291,7 @@ body: |
liveins: $r1, $r3

renamable $r0 = t2ADDrs killed renamable $r3, renamable $r1, 18, 14 /* CC::al */, $noreg, $noreg
t2WhileLoopStart killed renamable $r1, %bb.6, implicit-def dead $cpsr
$lr = t2WhileLoopStartLR killed renamable $r1, %bb.6, implicit-def dead $cpsr

bb.5:
successors: %bb.5(0x7c000000), %bb.6(0x04000000)
@@ -318,7 +318,7 @@ body: |
; CHECK: frame-destroy tPOP_RET 14 /* CC::al */, $noreg, def $r7, def $pc
; CHECK: bb.2:
; CHECK: successors: %bb.3(0x80000000)
; CHECK: t2WhileLoopStart killed renamable $r0, %bb.1, implicit-def dead $cpsr
; CHECK: $lr = t2WhileLoopStartLR killed renamable $r0, %bb.1, implicit-def dead $cpsr
; CHECK: bb.3:
; CHECK: successors: %bb.3(0x7c000000), %bb.1(0x04000000)
; CHECK: renamable $r0 = tLDRi renamable $r2, 0, 14 /* CC::al */, $noreg
@@ -341,7 +341,7 @@ body: |
successors: %bb.3(0x80000000)
liveins: $r0, $r1, $r2

t2WhileLoopStart killed renamable $r0, %bb.1, implicit-def dead $cpsr
$lr = t2WhileLoopStartLR killed renamable $r0, %bb.1, implicit-def dead $cpsr

bb.3:
successors: %bb.3(0x7c000000), %bb.1(0x04000000)
@@ -785,24 +785,24 @@ define void @arm_fir_f32_1_4_mve(%struct.arm_fir_instance_f32* nocapture readonl
; CHECK-NEXT: push.w {r4, r5, r6, r7, r8, r9, r10, r11, lr}
; CHECK-NEXT: .pad #16
; CHECK-NEXT: sub sp, #16
; CHECK-NEXT: ldrh r5, [r0]
; CHECK-NEXT: ldr.w r9, [r0, #4]
; CHECK-NEXT: subs r6, r5, #1
; CHECK-NEXT: ldrh.w r9, [r0]
; CHECK-NEXT: ldr.w r10, [r0, #4]
; CHECK-NEXT: sub.w r6, r9, #1
; CHECK-NEXT: cmp r6, #3
; CHECK-NEXT: bhi .LBB15_6
; CHECK-NEXT: @ %bb.1: @ %if.then
; CHECK-NEXT: ldr r7, [r0, #8]
; CHECK-NEXT: add.w r4, r9, r6, lsl #1
; CHECK-NEXT: lsr.w lr, r3, #2
; CHECK-NEXT: add.w r4, r10, r6, lsl #1
; CHECK-NEXT: lsrs r5, r3, #2
; CHECK-NEXT: ldrh.w r8, [r7, #6]
; CHECK-NEXT: ldrh.w r12, [r7, #4]
; CHECK-NEXT: ldrh r6, [r7, #2]
; CHECK-NEXT: ldrh r7, [r7]
; CHECK-NEXT: wls lr, lr, .LBB15_5
; CHECK-NEXT: wls lr, r5, .LBB15_5
; CHECK-NEXT: @ %bb.2: @ %while.body.lr.ph
; CHECK-NEXT: str r5, [sp, #12] @ 4-byte Spill
; CHECK-NEXT: str.w r9, [sp, #12] @ 4-byte Spill
; CHECK-NEXT: bic r5, r3, #3
; CHECK-NEXT: add.w r10, r9, #2
; CHECK-NEXT: add.w r9, r10, #2
; CHECK-NEXT: str r5, [sp] @ 4-byte Spill
; CHECK-NEXT: add.w r5, r2, r5, lsl #1
; CHECK-NEXT: str r5, [sp, #4] @ 4-byte Spill
@@ -810,71 +810,71 @@ define void @arm_fir_f32_1_4_mve(%struct.arm_fir_instance_f32* nocapture readonl
; CHECK-NEXT: .LBB15_3: @ %while.body
; CHECK-NEXT: @ =>This Inner Loop Header: Depth=1
; CHECK-NEXT: vldrw.u32 q0, [r1], #8
; CHECK-NEXT: sub.w r11, r10, #2
; CHECK-NEXT: add.w r5, r10, #2
; CHECK-NEXT: sub.w r11, r9, #2
; CHECK-NEXT: add.w r5, r9, #2
; CHECK-NEXT: vstrb.8 q0, [r4], #8
; CHECK-NEXT: vldrw.u32 q0, [r11]
; CHECK-NEXT: vldrw.u32 q1, [r10]
; CHECK-NEXT: vldrw.u32 q1, [r9]
; CHECK-NEXT: vmul.f16 q0, q0, r7
; CHECK-NEXT: vfma.f16 q0, q1, r6
; CHECK-NEXT: vldrw.u32 q1, [r5]
; CHECK-NEXT: vfma.f16 q0, q1, r12
; CHECK-NEXT: vldrw.u32 q1, [r10, #4]
; CHECK-NEXT: add.w r10, r10, #8
; CHECK-NEXT: vldrw.u32 q1, [r9, #4]
; CHECK-NEXT: add.w r9, r9, #8
; CHECK-NEXT: vfma.f16 q0, q1, r8
; CHECK-NEXT: vstrb.8 q0, [r2], #8
; CHECK-NEXT: le lr, .LBB15_3
; CHECK-NEXT: @ %bb.4: @ %while.end.loopexit
; CHECK-NEXT: ldr r2, [sp] @ 4-byte Reload
; CHECK-NEXT: ldr r1, [sp, #8] @ 4-byte Reload
; CHECK-NEXT: ldr r5, [sp, #12] @ 4-byte Reload
; CHECK-NEXT: add.w r9, r9, r2, lsl #1
; CHECK-NEXT: ldr.w r9, [sp, #12] @ 4-byte Reload
; CHECK-NEXT: add.w r10, r10, r2, lsl #1
; CHECK-NEXT: add.w r1, r1, r2, lsl #1
; CHECK-NEXT: ldr r2, [sp, #4] @ 4-byte Reload
; CHECK-NEXT: .LBB15_5: @ %while.end
; CHECK-NEXT: and lr, r3, #3
; CHECK-NEXT: and r5, r3, #3
; CHECK-NEXT: vldrw.u32 q0, [r1]
; CHECK-NEXT: vctp.16 lr
; CHECK-NEXT: vctp.16 r5
; CHECK-NEXT: vpst
; CHECK-NEXT: vstrht.16 q0, [r4]
; CHECK-NEXT: vldrw.u32 q0, [r9]
; CHECK-NEXT: add.w r1, r9, #2
; CHECK-NEXT: vldrw.u32 q0, [r10]
; CHECK-NEXT: add.w r1, r10, #2
; CHECK-NEXT: vldrw.u32 q1, [r1]
; CHECK-NEXT: add.w r1, r9, #6
; CHECK-NEXT: add.w r1, r10, #6
; CHECK-NEXT: vmul.f16 q0, q0, r7
; CHECK-NEXT: vfma.f16 q0, q1, r6
; CHECK-NEXT: vldrw.u32 q1, [r9, #4]
; CHECK-NEXT: vldrw.u32 q1, [r10, #4]
; CHECK-NEXT: vfma.f16 q0, q1, r12
; CHECK-NEXT: vldrw.u32 q1, [r1]
; CHECK-NEXT: vfma.f16 q0, q1, r8
; CHECK-NEXT: vpst
; CHECK-NEXT: vstrht.16 q0, [r2]
; CHECK-NEXT: ldr.w r9, [r0, #4]
; CHECK-NEXT: ldr.w r10, [r0, #4]
; CHECK-NEXT: .LBB15_6: @ %if.end
; CHECK-NEXT: add.w r0, r9, r3, lsl #1
; CHECK-NEXT: lsr.w lr, r5, #2
; CHECK-NEXT: wls lr, lr, .LBB15_10
; CHECK-NEXT: add.w r0, r10, r3, lsl #1
; CHECK-NEXT: lsr.w r1, r9, #2
; CHECK-NEXT: wls lr, r1, .LBB15_10
; CHECK-NEXT: @ %bb.7: @ %while.body51.preheader
; CHECK-NEXT: bic r2, r5, #3
; CHECK-NEXT: bic r2, r9, #3
; CHECK-NEXT: adds r1, r2, r3
; CHECK-NEXT: mov r3, r9
; CHECK-NEXT: add.w r1, r9, r1, lsl #1
; CHECK-NEXT: mov r3, r10
; CHECK-NEXT: add.w r1, r10, r1, lsl #1
; CHECK-NEXT: .LBB15_8: @ %while.body51
; CHECK-NEXT: @ =>This Inner Loop Header: Depth=1
; CHECK-NEXT: vldrw.u32 q0, [r0], #8
; CHECK-NEXT: vstrb.8 q0, [r3], #8
; CHECK-NEXT: le lr, .LBB15_8
; CHECK-NEXT: @ %bb.9: @ %while.end55.loopexit
; CHECK-NEXT: add.w r9, r9, r2, lsl #1
; CHECK-NEXT: add.w r10, r10, r2, lsl #1
; CHECK-NEXT: mov r0, r1
; CHECK-NEXT: .LBB15_10: @ %while.end55
; CHECK-NEXT: ands r1, r5, #3
; CHECK-NEXT: ands r1, r9, #3
; CHECK-NEXT: beq .LBB15_12
; CHECK-NEXT: @ %bb.11: @ %if.then59
; CHECK-NEXT: vldrw.u32 q0, [r0]
; CHECK-NEXT: vctp.16 r1
; CHECK-NEXT: vpst
; CHECK-NEXT: vstrht.16 q0, [r9]
; CHECK-NEXT: vstrht.16 q0, [r10]
; CHECK-NEXT: .LBB15_12: @ %if.end61
; CHECK-NEXT: add sp, #16
; CHECK-NEXT: pop.w {r4, r5, r6, r7, r8, r9, r10, r11, pc}
@@ -1052,36 +1052,36 @@ define void @fir(%struct.arm_fir_instance_f32* nocapture readonly %S, half* noca
; CHECK-NEXT: .pad #24
; CHECK-NEXT: sub sp, #24
; CHECK-NEXT: cmp r3, #8
; CHECK-NEXT: str r1, [sp, #20] @ 4-byte Spill
; CHECK-NEXT: blo.w .LBB16_12
; CHECK-NEXT: @ %bb.1: @ %entry
; CHECK-NEXT: lsrs.w r12, r3, #2
; CHECK-NEXT: beq.w .LBB16_12
; CHECK-NEXT: @ %bb.2: @ %while.body.lr.ph
; CHECK-NEXT: ldrh r4, [r0]
; CHECK-NEXT: movs r6, #1
; CHECK-NEXT: movs r1, #1
; CHECK-NEXT: ldrd r5, r3, [r0, #4]
; CHECK-NEXT: sub.w r0, r4, #8
; CHECK-NEXT: and r8, r0, #7
; CHECK-NEXT: add.w r7, r0, r0, lsr #29
; CHECK-NEXT: asr.w lr, r7, #3
; CHECK-NEXT: cmp.w lr, #1
; CHECK-NEXT: and r0, r0, #7
; CHECK-NEXT: asrs r6, r7, #3
; CHECK-NEXT: cmp r6, #1
; CHECK-NEXT: it gt
; CHECK-NEXT: asrgt r6, r7, #3
; CHECK-NEXT: asrgt r1, r7, #3
; CHECK-NEXT: add.w r7, r5, r4, lsl #1
; CHECK-NEXT: subs r7, #2
; CHECK-NEXT: str r7, [sp, #20] @ 4-byte Spill
; CHECK-NEXT: str r1, [sp] @ 4-byte Spill
; CHECK-NEXT: subs r1, r7, #2
; CHECK-NEXT: rsbs r7, r4, #0
; CHECK-NEXT: str r7, [sp, #8] @ 4-byte Spill
; CHECK-NEXT: add.w r7, r3, #16
; CHECK-NEXT: str r6, [sp] @ 4-byte Spill
; CHECK-NEXT: str r4, [sp, #12] @ 4-byte Spill
; CHECK-NEXT: str r7, [sp, #4] @ 4-byte Spill
; CHECK-NEXT: str r0, [sp, #16] @ 4-byte Spill
; CHECK-NEXT: b .LBB16_4
; CHECK-NEXT: .LBB16_3: @ %while.end
; CHECK-NEXT: @ in Loop: Header=BB16_4 Depth=1
; CHECK-NEXT: ldr r0, [sp, #8] @ 4-byte Reload
; CHECK-NEXT: subs.w r12, r12, #1
; CHECK-NEXT: ldr r1, [sp, #16] @ 4-byte Reload
; CHECK-NEXT: vstrb.8 q0, [r2], #8
; CHECK-NEXT: add.w r0, r5, r0, lsl #1
; CHECK-NEXT: add.w r5, r0, #8
@@ -1090,40 +1090,39 @@ define void @fir(%struct.arm_fir_instance_f32* nocapture readonly %S, half* noca
; CHECK-NEXT: @ =>This Loop Header: Depth=1
; CHECK-NEXT: @ Child Loop BB16_6 Depth 2
; CHECK-NEXT: @ Child Loop BB16_10 Depth 2
; CHECK-NEXT: vldrw.u32 q0, [r1], #8
; CHECK-NEXT: ldr r0, [sp, #20] @ 4-byte Reload
; CHECK-NEXT: ldrh.w lr, [r3, #14]
; CHECK-NEXT: ldrh r0, [r3, #12]
; CHECK-NEXT: str r1, [sp, #16] @ 4-byte Spill
; CHECK-NEXT: ldr r1, [sp, #20] @ 4-byte Reload
; CHECK-NEXT: ldrh r4, [r3, #10]
; CHECK-NEXT: ldrh r7, [r3, #8]
; CHECK-NEXT: vldrw.u32 q0, [r0], #8
; CHECK-NEXT: ldrh.w r8, [r3, #12]
; CHECK-NEXT: ldrh r7, [r3, #10]
; CHECK-NEXT: ldrh r4, [r3, #8]
; CHECK-NEXT: ldrh r6, [r3, #6]
; CHECK-NEXT: ldrh.w r9, [r3, #4]
; CHECK-NEXT: ldrh.w r11, [r3, #2]
; CHECK-NEXT: ldrh.w r10, [r3]
; CHECK-NEXT: vstrb.8 q0, [r1], #8
; CHECK-NEXT: vldrw.u32 q0, [r5]
; CHECK-NEXT: str r1, [sp, #20] @ 4-byte Spill
; CHECK-NEXT: adds r1, r5, #2
; CHECK-NEXT: vldrw.u32 q1, [r1]
; CHECK-NEXT: str r0, [sp, #20] @ 4-byte Spill
; CHECK-NEXT: adds r0, r5, #2
; CHECK-NEXT: vldrw.u32 q1, [r0]
; CHECK-NEXT: vmul.f16 q0, q0, r10
; CHECK-NEXT: adds r1, r5, #6
; CHECK-NEXT: adds r0, r5, #6
; CHECK-NEXT: vfma.f16 q0, q1, r11
; CHECK-NEXT: vldrw.u32 q1, [r5, #4]
; CHECK-NEXT: vfma.f16 q0, q1, r9
; CHECK-NEXT: vldrw.u32 q1, [r1]
; CHECK-NEXT: add.w r1, r5, #10
; CHECK-NEXT: vldrw.u32 q1, [r0]
; CHECK-NEXT: add.w r0, r5, #10
; CHECK-NEXT: vfma.f16 q0, q1, r6
; CHECK-NEXT: vldrw.u32 q1, [r5, #8]
; CHECK-NEXT: vfma.f16 q0, q1, r7
; CHECK-NEXT: vldrw.u32 q1, [r1]
; CHECK-NEXT: vfma.f16 q0, q1, r4
; CHECK-NEXT: vldrw.u32 q1, [r5, #12]
; CHECK-NEXT: vfma.f16 q0, q1, r0
; CHECK-NEXT: vldrw.u32 q1, [r0]
; CHECK-NEXT: add.w r0, r5, #14
; CHECK-NEXT: vfma.f16 q0, q1, r7
; CHECK-NEXT: vldrw.u32 q1, [r5, #12]
; CHECK-NEXT: adds r5, #16
; CHECK-NEXT: vfma.f16 q0, q1, r8
; CHECK-NEXT: vldrw.u32 q1, [r0]
; CHECK-NEXT: ldr r0, [sp, #12] @ 4-byte Reload
; CHECK-NEXT: adds r5, #16
; CHECK-NEXT: vfma.f16 q0, q1, lr
; CHECK-NEXT: cmp r0, #16
; CHECK-NEXT: blo .LBB16_7
@@ -1137,25 +1136,25 @@ define void @fir(%struct.arm_fir_instance_f32* nocapture readonly %S, half* noca
; CHECK-NEXT: @ => This Inner Loop Header: Depth=2
; CHECK-NEXT: ldrh r0, [r6], #16
; CHECK-NEXT: vldrw.u32 q1, [r5]
; CHECK-NEXT: adds r1, r5, #2
; CHECK-NEXT: adds r4, r5, #2
; CHECK-NEXT: vfma.f16 q0, q1, r0
; CHECK-NEXT: vldrw.u32 q1, [r1]
; CHECK-NEXT: vldrw.u32 q1, [r4]
; CHECK-NEXT: ldrh r0, [r6, #-14]
; CHECK-NEXT: adds r1, r5, #6
; CHECK-NEXT: adds r4, r5, #6
; CHECK-NEXT: vfma.f16 q0, q1, r0
; CHECK-NEXT: ldrh r0, [r6, #-12]
; CHECK-NEXT: vldrw.u32 q1, [r5, #4]
; CHECK-NEXT: vfma.f16 q0, q1, r0
; CHECK-NEXT: vldrw.u32 q1, [r1]
; CHECK-NEXT: vldrw.u32 q1, [r4]
; CHECK-NEXT: ldrh r0, [r6, #-10]
; CHECK-NEXT: add.w r1, r5, #10
; CHECK-NEXT: add.w r4, r5, #10
; CHECK-NEXT: vfma.f16 q0, q1, r0
; CHECK-NEXT: ldrh r0, [r6, #-8]
; CHECK-NEXT: vldrw.u32 q1, [r5, #8]
; CHECK-NEXT: vfma.f16 q0, q1, r0
; CHECK-NEXT: vldrw.u32 q1, [r1]
; CHECK-NEXT: vldrw.u32 q1, [r4]
; CHECK-NEXT: ldrh r0, [r6, #-6]
; CHECK-NEXT: ldrh r1, [r6, #-2]
; CHECK-NEXT: ldrh r4, [r6, #-2]
; CHECK-NEXT: vfma.f16 q0, q1, r0
; CHECK-NEXT: ldrh r0, [r6, #-4]
; CHECK-NEXT: vldrw.u32 q1, [r5, #12]
@@ -1163,32 +1162,33 @@ define void @fir(%struct.arm_fir_instance_f32* nocapture readonly %S, half* noca
; CHECK-NEXT: add.w r0, r5, #14
; CHECK-NEXT: vldrw.u32 q1, [r0]
; CHECK-NEXT: adds r5, #16
; CHECK-NEXT: vfma.f16 q0, q1, r1
; CHECK-NEXT: vfma.f16 q0, q1, r4
; CHECK-NEXT: le lr, .LBB16_6
; CHECK-NEXT: b .LBB16_8
; CHECK-NEXT: .LBB16_7: @ in Loop: Header=BB16_4 Depth=1
; CHECK-NEXT: ldr r6, [sp, #4] @ 4-byte Reload
; CHECK-NEXT: .LBB16_8: @ %for.end
; CHECK-NEXT: @ in Loop: Header=BB16_4 Depth=1
; CHECK-NEXT: cmp.w r8, #0
; CHECK-NEXT: ldr r0, [sp, #16] @ 4-byte Reload
; CHECK-NEXT: subs.w lr, r0, #0
; CHECK-NEXT: beq.w .LBB16_3
; CHECK-NEXT: b .LBB16_9
; CHECK-NEXT: .LBB16_9: @ %while.body76.preheader
; CHECK-NEXT: @ in Loop: Header=BB16_4 Depth=1
; CHECK-NEXT: mov r0, r5
; CHECK-NEXT: mov lr, r8
; CHECK-NEXT: .LBB16_10: @ %while.body76
; CHECK-NEXT: @ Parent Loop BB16_4 Depth=1
; CHECK-NEXT: @ => This Inner Loop Header: Depth=2
; CHECK-NEXT: ldrh r1, [r6], #2
; CHECK-NEXT: ldrh r4, [r6], #2
; CHECK-NEXT: vldrh.u16 q1, [r0], #2
; CHECK-NEXT: vfma.f16 q0, q1, r4
; CHECK-NEXT: subs.w lr, lr, #1
; CHECK-NEXT: vfma.f16 q0, q1, r1
; CHECK-NEXT: bne .LBB16_10
; CHECK-NEXT: b .LBB16_11
; CHECK-NEXT: .LBB16_11: @ %while.end.loopexit
; CHECK-NEXT: @ in Loop: Header=BB16_4 Depth=1
; CHECK-NEXT: add.w r5, r5, r8, lsl #1
; CHECK-NEXT: ldr r0, [sp, #16] @ 4-byte Reload
; CHECK-NEXT: add.w r5, r5, r0, lsl #1
; CHECK-NEXT: b .LBB16_3
; CHECK-NEXT: .LBB16_12: @ %if.end
; CHECK-NEXT: add sp, #24
@@ -1450,12 +1450,12 @@ define void @arm_biquad_cascade_df2T_f16(%struct.arm_biquad_cascade_df2T_instanc
; CHECK-NEXT: .LBB17_3: @ %do.body
; CHECK-NEXT: @ =>This Loop Header: Depth=1
; CHECK-NEXT: @ Child Loop BB17_5 Depth 2
; CHECK-NEXT: vldrh.u16 q4, [r6]
; CHECK-NEXT: vldrh.u16 q3, [r6, #4]
; CHECK-NEXT: vldrh.u16 q3, [r6]
; CHECK-NEXT: movs r5, #0
; CHECK-NEXT: vmov q5, q4
; CHECK-NEXT: vmov q6, q3
; CHECK-NEXT: vmov q5, q3
; CHECK-NEXT: vshlc q5, r5, #16
; CHECK-NEXT: vldrh.u16 q4, [r6, #4]
; CHECK-NEXT: vmov q6, q4
; CHECK-NEXT: vshlc q6, r5, #16
; CHECK-NEXT: vldrh.u16 q2, [r12]
; CHECK-NEXT: vmov.f32 s9, s1
@@ -1464,16 +1464,15 @@ define void @arm_biquad_cascade_df2T_f16(%struct.arm_biquad_cascade_df2T_instanc
; CHECK-NEXT: @ %bb.4: @ %while.body.preheader
; CHECK-NEXT: @ in Loop: Header=BB17_3 Depth=1
; CHECK-NEXT: mov r5, r2
; CHECK-NEXT: mov lr, r9
; CHECK-NEXT: .LBB17_5: @ %while.body
; CHECK-NEXT: @ Parent Loop BB17_3 Depth=1
; CHECK-NEXT: @ => This Inner Loop Header: Depth=2
; CHECK-NEXT: ldrh r7, [r1], #4
; CHECK-NEXT: vmov r4, s4
; CHECK-NEXT: vfma.f16 q2, q4, r7
; CHECK-NEXT: vfma.f16 q2, q3, r7
; CHECK-NEXT: ldrh r3, [r1, #-2]
; CHECK-NEXT: vmov.u16 r7, q2[0]
; CHECK-NEXT: vfma.f16 q2, q3, r7
; CHECK-NEXT: vfma.f16 q2, q4, r7
; CHECK-NEXT: vmov.16 q2[3], r4
; CHECK-NEXT: vfma.f16 q2, q5, r3
; CHECK-NEXT: vmov.u16 r3, q2[1]
@@ -1490,9 +1489,9 @@ define void @arm_biquad_cascade_df2T_f16(%struct.arm_biquad_cascade_df2T_instanc
; CHECK-NEXT: @ %bb.7: @ %if.then
; CHECK-NEXT: @ in Loop: Header=BB17_3 Depth=1
; CHECK-NEXT: ldrh r1, [r1]
; CHECK-NEXT: vfma.f16 q2, q4, r1
; CHECK-NEXT: vmov.u16 r1, q2[0]
; CHECK-NEXT: vfma.f16 q2, q3, r1
; CHECK-NEXT: vmov.u16 r1, q2[0]
; CHECK-NEXT: vfma.f16 q2, q4, r1
; CHECK-NEXT: strh r1, [r5]
; CHECK-NEXT: vmovx.f16 s6, s8
; CHECK-NEXT: vstr.16 s6, [r12]
|
|
@@ -785,23 +785,23 @@ define void @arm_fir_f32_1_4_mve(%struct.arm_fir_instance_f32* nocapture readonl
; CHECK-NEXT: push.w {r4, r5, r6, r7, r8, r9, r10, r11, lr}
; CHECK-NEXT: .pad #8
; CHECK-NEXT: sub sp, #8
; CHECK-NEXT: ldrh.w r9, [r0]
; CHECK-NEXT: ldrh.w r10, [r0]
; CHECK-NEXT: mov r11, r1
; CHECK-NEXT: ldr.w r12, [r0, #4]
; CHECK-NEXT: sub.w r1, r9, #1
; CHECK-NEXT: sub.w r1, r10, #1
; CHECK-NEXT: cmp r1, #3
; CHECK-NEXT: bhi .LBB15_6
; CHECK-NEXT: @ %bb.1: @ %if.then
; CHECK-NEXT: ldr r4, [r0, #8]
; CHECK-NEXT: lsr.w lr, r3, #2
; CHECK-NEXT: ldrd r7, r6, [r4]
; CHECK-NEXT: ldrd r5, r8, [r4, #8]
; CHECK-NEXT: add.w r4, r12, r1, lsl #2
; CHECK-NEXT: wls lr, lr, .LBB15_5
; CHECK-NEXT: lsrs r1, r3, #2
; CHECK-NEXT: wls lr, r1, .LBB15_5
; CHECK-NEXT: @ %bb.2: @ %while.body.lr.ph
; CHECK-NEXT: bic r1, r3, #3
; CHECK-NEXT: str r1, [sp] @ 4-byte Spill
; CHECK-NEXT: add.w r10, r12, #4
; CHECK-NEXT: add.w r9, r12, #4
; CHECK-NEXT: add.w r1, r2, r1, lsl #2
; CHECK-NEXT: str r1, [sp, #4] @ 4-byte Spill
; CHECK-NEXT: mov r1, r11
@@ -809,12 +809,12 @@ define void @arm_fir_f32_1_4_mve(%struct.arm_fir_instance_f32* nocapture readonl
; CHECK-NEXT: @ =>This Inner Loop Header: Depth=1
; CHECK-NEXT: vldrw.u32 q0, [r1], #16
; CHECK-NEXT: vstrb.8 q0, [r4], #16
; CHECK-NEXT: vldrw.u32 q0, [r10, #-4]
; CHECK-NEXT: vldrw.u32 q1, [r10], #16
; CHECK-NEXT: vldrw.u32 q0, [r9, #-4]
; CHECK-NEXT: vldrw.u32 q1, [r9], #16
; CHECK-NEXT: vmul.f32 q0, q0, r7
; CHECK-NEXT: vldrw.u32 q2, [r10, #-8]
; CHECK-NEXT: vldrw.u32 q2, [r9, #-8]
; CHECK-NEXT: vfma.f32 q0, q1, r6
; CHECK-NEXT: vldrw.u32 q1, [r10, #-12]
; CHECK-NEXT: vldrw.u32 q1, [r9, #-12]
; CHECK-NEXT: vfma.f32 q0, q1, r5
; CHECK-NEXT: vfma.f32 q0, q2, r8
; CHECK-NEXT: vstrb.8 q0, [r2], #16
@@ -843,10 +843,10 @@ define void @arm_fir_f32_1_4_mve(%struct.arm_fir_instance_f32* nocapture readonl
; CHECK-NEXT: ldr.w r12, [r0, #4]
; CHECK-NEXT: .LBB15_6: @ %if.end
; CHECK-NEXT: add.w r0, r12, r3, lsl #2
; CHECK-NEXT: lsr.w lr, r9, #2
; CHECK-NEXT: wls lr, lr, .LBB15_10
; CHECK-NEXT: lsr.w r1, r10, #2
; CHECK-NEXT: wls lr, r1, .LBB15_10
; CHECK-NEXT: @ %bb.7: @ %while.body51.preheader
; CHECK-NEXT: bic r2, r9, #3
; CHECK-NEXT: bic r2, r10, #3
; CHECK-NEXT: adds r1, r2, r3
; CHECK-NEXT: mov r3, r12
; CHECK-NEXT: add.w r1, r12, r1, lsl #2
@@ -859,7 +859,7 @@ define void @arm_fir_f32_1_4_mve(%struct.arm_fir_instance_f32* nocapture readonl
; CHECK-NEXT: add.w r12, r12, r2, lsl #2
; CHECK-NEXT: mov r0, r1
; CHECK-NEXT: .LBB15_10: @ %while.end55
; CHECK-NEXT: ands r1, r9, #3
; CHECK-NEXT: ands r1, r10, #3
; CHECK-NEXT: beq .LBB15_12
; CHECK-NEXT: @ %bb.11: @ %if.then59
; CHECK-NEXT: vldrw.u32 q0, [r0]
@@ -1053,32 +1053,32 @@ define void @fir(%struct.arm_fir_instance_f32* nocapture readonly %S, float* noc
; CHECK-NEXT: beq.w .LBB16_12
; CHECK-NEXT: @ %bb.2: @ %while.body.lr.ph
; CHECK-NEXT: ldrh r6, [r0]
; CHECK-NEXT: movs r4, #1
; CHECK-NEXT: ldrd r5, r10, [r0, #4]
; CHECK-NEXT: sub.w r3, r6, #8
; CHECK-NEXT: add.w r0, r3, r3, lsr #29
; CHECK-NEXT: asrs r7, r0, #3
; CHECK-NEXT: movs r5, #1
; CHECK-NEXT: ldrd r4, r10, [r0, #4]
; CHECK-NEXT: sub.w r0, r6, #8
; CHECK-NEXT: add.w r3, r0, r0, lsr #29
; CHECK-NEXT: and r0, r0, #7
; CHECK-NEXT: asrs r7, r3, #3
; CHECK-NEXT: cmp r7, #1
; CHECK-NEXT: it gt
; CHECK-NEXT: asrgt r4, r0, #3
; CHECK-NEXT: add.w r0, r5, r6, lsl #2
; CHECK-NEXT: sub.w r9, r0, #4
; CHECK-NEXT: rsbs r0, r6, #0
; CHECK-NEXT: str r4, [sp, #4] @ 4-byte Spill
; CHECK-NEXT: and r4, r3, #7
; CHECK-NEXT: str r0, [sp, #16] @ 4-byte Spill
; CHECK-NEXT: add.w r0, r10, #32
; CHECK-NEXT: str r6, [sp, #20] @ 4-byte Spill
; CHECK-NEXT: str r0, [sp, #8] @ 4-byte Spill
; CHECK-NEXT: str r4, [sp, #12] @ 4-byte Spill
; CHECK-NEXT: asrgt r5, r3, #3
; CHECK-NEXT: add.w r3, r4, r6, lsl #2
; CHECK-NEXT: sub.w r9, r3, #4
; CHECK-NEXT: rsbs r3, r6, #0
; CHECK-NEXT: str r3, [sp, #12] @ 4-byte Spill
; CHECK-NEXT: add.w r3, r10, #32
; CHECK-NEXT: str r5, [sp, #4] @ 4-byte Spill
; CHECK-NEXT: str r6, [sp, #16] @ 4-byte Spill
; CHECK-NEXT: str r3, [sp, #8] @ 4-byte Spill
; CHECK-NEXT: str r0, [sp, #20] @ 4-byte Spill
; CHECK-NEXT: b .LBB16_4
; CHECK-NEXT: .LBB16_3: @ %while.end
; CHECK-NEXT: @ in Loop: Header=BB16_4 Depth=1
; CHECK-NEXT: ldr r0, [sp, #16] @ 4-byte Reload
; CHECK-NEXT: ldr r0, [sp, #12] @ 4-byte Reload
; CHECK-NEXT: subs.w r12, r12, #1
; CHECK-NEXT: vstrb.8 q0, [r2], #16
; CHECK-NEXT: add.w r0, r5, r0, lsl #2
; CHECK-NEXT: add.w r5, r0, #16
; CHECK-NEXT: add.w r0, r4, r0, lsl #2
; CHECK-NEXT: add.w r4, r0, #16
; CHECK-NEXT: beq .LBB16_12
; CHECK-NEXT: .LBB16_4: @ %while.body
; CHECK-NEXT: @ =>This Loop Header: Depth=1
@@ -1087,25 +1087,25 @@ define void @fir(%struct.arm_fir_instance_f32* nocapture readonly %S, float* noc
; CHECK-NEXT: add.w lr, r10, #8
; CHECK-NEXT: vldrw.u32 q0, [r1], #16
; CHECK-NEXT: ldrd r3, r7, [r10]
; CHECK-NEXT: ldm.w lr, {r0, r4, r6, lr}
; CHECK-NEXT: ldm.w lr, {r0, r5, r6, lr}
; CHECK-NEXT: ldrd r11, r8, [r10, #24]
; CHECK-NEXT: vstrb.8 q0, [r9], #16
; CHECK-NEXT: vldrw.u32 q0, [r5], #32
; CHECK-NEXT: vldrw.u32 q0, [r4], #32
; CHECK-NEXT: strd r9, r1, [sp, #24] @ 8-byte Folded Spill
; CHECK-NEXT: vldrw.u32 q1, [r5, #-28]
; CHECK-NEXT: vldrw.u32 q1, [r4, #-28]
; CHECK-NEXT: vmul.f32 q0, q0, r3
; CHECK-NEXT: vldrw.u32 q6, [r5, #-24]
; CHECK-NEXT: vldrw.u32 q4, [r5, #-20]
; CHECK-NEXT: vldrw.u32 q6, [r4, #-24]
; CHECK-NEXT: vldrw.u32 q4, [r4, #-20]
; CHECK-NEXT: vfma.f32 q0, q1, r7
; CHECK-NEXT: vldrw.u32 q5, [r5, #-16]
; CHECK-NEXT: vldrw.u32 q5, [r4, #-16]
; CHECK-NEXT: vfma.f32 q0, q6, r0
; CHECK-NEXT: vldrw.u32 q2, [r5, #-12]
; CHECK-NEXT: vfma.f32 q0, q4, r4
; CHECK-NEXT: vldrw.u32 q3, [r5, #-8]
; CHECK-NEXT: vldrw.u32 q2, [r4, #-12]
; CHECK-NEXT: vfma.f32 q0, q4, r5
; CHECK-NEXT: vldrw.u32 q3, [r4, #-8]
; CHECK-NEXT: vfma.f32 q0, q5, r6
; CHECK-NEXT: ldr r0, [sp, #20] @ 4-byte Reload
; CHECK-NEXT: ldr r0, [sp, #16] @ 4-byte Reload
; CHECK-NEXT: vfma.f32 q0, q2, lr
; CHECK-NEXT: vldrw.u32 q1, [r5, #-4]
; CHECK-NEXT: vldrw.u32 q1, [r4, #-4]
; CHECK-NEXT: vfma.f32 q0, q3, r11
; CHECK-NEXT: cmp r0, #16
; CHECK-NEXT: vfma.f32 q0, q1, r8
@@ -1118,54 +1118,52 @@ define void @fir(%struct.arm_fir_instance_f32* nocapture readonly %S, float* noc
; CHECK-NEXT: .LBB16_6: @ %for.body
; CHECK-NEXT: @ Parent Loop BB16_4 Depth=1
; CHECK-NEXT: @ => This Inner Loop Header: Depth=2
; CHECK-NEXT: ldm.w r7, {r0, r3, r4, r6}
; CHECK-NEXT: vldrw.u32 q1, [r5], #32
; CHECK-NEXT: add.w r11, r7, #16
; CHECK-NEXT: vldrw.u32 q6, [r5, #-24]
; CHECK-NEXT: vldrw.u32 q4, [r5, #-20]
; CHECK-NEXT: ldm.w r7, {r0, r3, r5, r6, r8, r11}
; CHECK-NEXT: vldrw.u32 q1, [r4], #32
; CHECK-NEXT: vldrw.u32 q6, [r4, #-24]
; CHECK-NEXT: vldrw.u32 q4, [r4, #-20]
; CHECK-NEXT: vfma.f32 q0, q1, r0
; CHECK-NEXT: vldrw.u32 q1, [r5, #-28]
; CHECK-NEXT: ldm.w r11, {r1, r8, r11}
; CHECK-NEXT: vldrw.u32 q5, [r5, #-16]
; CHECK-NEXT: vldrw.u32 q1, [r4, #-28]
; CHECK-NEXT: vldrw.u32 q5, [r4, #-16]
; CHECK-NEXT: vldrw.u32 q2, [r4, #-12]
; CHECK-NEXT: vfma.f32 q0, q1, r3
; CHECK-NEXT: vldrw.u32 q2, [r5, #-12]
; CHECK-NEXT: vfma.f32 q0, q6, r4
; CHECK-NEXT: vldrw.u32 q3, [r5, #-8]
; CHECK-NEXT: ldrd r9, r1, [r7, #24]
; CHECK-NEXT: vfma.f32 q0, q6, r5
; CHECK-NEXT: vldrw.u32 q3, [r4, #-8]
; CHECK-NEXT: vfma.f32 q0, q4, r6
; CHECK-NEXT: ldr.w r9, [r7, #28]
; CHECK-NEXT: vfma.f32 q0, q5, r1
; CHECK-NEXT: vldrw.u32 q1, [r5, #-4]
; CHECK-NEXT: vfma.f32 q0, q2, r8
; CHECK-NEXT: vldrw.u32 q1, [r4, #-4]
; CHECK-NEXT: vfma.f32 q0, q5, r8
; CHECK-NEXT: adds r7, #32
; CHECK-NEXT: vfma.f32 q0, q3, r11
; CHECK-NEXT: vfma.f32 q0, q1, r9
; CHECK-NEXT: vfma.f32 q0, q2, r11
; CHECK-NEXT: vfma.f32 q0, q3, r9
; CHECK-NEXT: vfma.f32 q0, q1, r1
; CHECK-NEXT: le lr, .LBB16_6
; CHECK-NEXT: b .LBB16_8
; CHECK-NEXT: .LBB16_7: @ in Loop: Header=BB16_4 Depth=1
; CHECK-NEXT: ldr r7, [sp, #8] @ 4-byte Reload
; CHECK-NEXT: .LBB16_8: @ %for.end
; CHECK-NEXT: @ in Loop: Header=BB16_4 Depth=1
; CHECK-NEXT: ldrd r9, r1, [sp, #24] @ 8-byte Folded Reload
; CHECK-NEXT: ldr r4, [sp, #12] @ 4-byte Reload
; CHECK-NEXT: cmp.w r4, #0
; CHECK-NEXT: ldr r1, [sp, #28] @ 4-byte Reload
; CHECK-NEXT: ldrd r0, r9, [sp, #20] @ 8-byte Folded Reload
; CHECK-NEXT: subs.w lr, r0, #0
; CHECK-NEXT: beq .LBB16_3
; CHECK-NEXT: b .LBB16_9
; CHECK-NEXT: .LBB16_9: @ %while.body76.preheader
; CHECK-NEXT: @ in Loop: Header=BB16_4 Depth=1
; CHECK-NEXT: mov r3, r5
; CHECK-NEXT: mov lr, r4
; CHECK-NEXT: mov r3, r4
; CHECK-NEXT: .LBB16_10: @ %while.body76
; CHECK-NEXT: @ Parent Loop BB16_4 Depth=1
; CHECK-NEXT: @ => This Inner Loop Header: Depth=2
; CHECK-NEXT: ldr r0, [r7], #4
; CHECK-NEXT: vldrw.u32 q1, [r3], #4
; CHECK-NEXT: subs.w lr, lr, #1
; CHECK-NEXT: vfma.f32 q0, q1, r0
; CHECK-NEXT: subs.w lr, lr, #1
; CHECK-NEXT: bne .LBB16_10
; CHECK-NEXT: b .LBB16_11
; CHECK-NEXT: .LBB16_11: @ %while.end.loopexit
; CHECK-NEXT: @ in Loop: Header=BB16_4 Depth=1
; CHECK-NEXT: add.w r5, r5, r4, lsl #2
; CHECK-NEXT: ldr r0, [sp, #20] @ 4-byte Reload
; CHECK-NEXT: add.w r4, r4, r0, lsl #2
; CHECK-NEXT: b .LBB16_3
; CHECK-NEXT: .LBB16_12: @ %if.end
; CHECK-NEXT: add sp, #32
@@ -1660,15 +1658,15 @@ define arm_aapcs_vfpcc void @arm_biquad_cascade_df1_f32(%struct.arm_biquad_casd_
; CHECK-NEXT: ldrd r12, r10, [r0]
; CHECK-NEXT: @ implicit-def: $s2
; CHECK-NEXT: and r7, r3, #3
; CHECK-NEXT: ldr.w r11, [r0, #8]
; CHECK-NEXT: ldr.w r9, [r0, #8]
; CHECK-NEXT: lsrs r0, r3, #2
; CHECK-NEXT: str r0, [sp, #60] @ 4-byte Spill
; CHECK-NEXT: str r0, [sp, #8] @ 4-byte Spill
; CHECK-NEXT: str r7, [sp, #12] @ 4-byte Spill
; CHECK-NEXT: str r2, [sp, #56] @ 4-byte Spill
; CHECK-NEXT: str r2, [sp, #60] @ 4-byte Spill
; CHECK-NEXT: b .LBB19_3
; CHECK-NEXT: .LBB19_1: @ in Loop: Header=BB19_3 Depth=1
; CHECK-NEXT: vmov.f32 s14, s7
; CHECK-NEXT: ldr r2, [sp, #56] @ 4-byte Reload
; CHECK-NEXT: ldr r2, [sp, #60] @ 4-byte Reload
; CHECK-NEXT: vmov.f32 s0, s10
; CHECK-NEXT: vmov.f32 s7, s6
; CHECK-NEXT: .LBB19_2: @ %if.end69
@@ -1676,7 +1674,7 @@ define arm_aapcs_vfpcc void @arm_biquad_cascade_df1_f32(%struct.arm_biquad_casd_
; CHECK-NEXT: vstr s8, [r10]
; CHECK-NEXT: subs.w r12, r12, #1
; CHECK-NEXT: vstr s0, [r10, #4]
; CHECK-NEXT: add.w r11, r11, #128
; CHECK-NEXT: add.w r9, r9, #128
; CHECK-NEXT: vstr s14, [r10, #8]
; CHECK-NEXT: mov r1, r2
; CHECK-NEXT: vstr s7, [r10, #12]
@@ -1687,45 +1685,45 @@ define arm_aapcs_vfpcc void @arm_biquad_cascade_df1_f32(%struct.arm_biquad_casd_
; CHECK-NEXT: @ Child Loop BB19_5 Depth 2
; CHECK-NEXT: vldr s7, [r10, #8]
; CHECK-NEXT: mov r5, r2
; CHECK-NEXT: ldr r0, [sp, #60] @ 4-byte Reload
; CHECK-NEXT: ldr r0, [sp, #8] @ 4-byte Reload
; CHECK-NEXT: vldr s8, [r10]
; CHECK-NEXT: vldr s10, [r10, #4]
; CHECK-NEXT: vldr s6, [r10, #12]
; CHECK-NEXT: wls lr, r0, .LBB19_6
; CHECK-NEXT: @ %bb.4: @ %while.body.lr.ph
; CHECK-NEXT: @ in Loop: Header=BB19_3 Depth=1
; CHECK-NEXT: ldrd r5, lr, [sp, #56] @ 8-byte Folded Reload
; CHECK-NEXT: ldr r5, [sp, #60] @ 4-byte Reload
; CHECK-NEXT: .LBB19_5: @ %while.body
; CHECK-NEXT: @ Parent Loop BB19_3 Depth=1
; CHECK-NEXT: @ => This Inner Loop Header: Depth=2
; CHECK-NEXT: vmov r4, s8
; CHECK-NEXT: vldr s8, [r1, #12]
; CHECK-NEXT: vldrw.u32 q0, [r11, #112]
; CHECK-NEXT: vmov r0, s10
; CHECK-NEXT: vldrw.u32 q0, [r9, #112]
; CHECK-NEXT: vmov r3, s10
; CHECK-NEXT: vldr s10, [r1, #8]
; CHECK-NEXT: vmov r7, s7
; CHECK-NEXT: vmov r9, s6
; CHECK-NEXT: vldrw.u32 q1, [r11]
; CHECK-NEXT: vmov r11, s6
; CHECK-NEXT: vldrw.u32 q1, [r9]
; CHECK-NEXT: vstrw.32 q0, [sp, #64] @ 16-byte Spill
; CHECK-NEXT: vmov r8, s8
; CHECK-NEXT: vldrw.u32 q0, [r11, #16]
; CHECK-NEXT: vldrw.u32 q0, [r9, #16]
; CHECK-NEXT: ldr r6, [r1, #4]
; CHECK-NEXT: vldrw.u32 q7, [r11, #32]
; CHECK-NEXT: vldrw.u32 q7, [r9, #32]
; CHECK-NEXT: vmul.f32 q1, q1, r8
; CHECK-NEXT: vmov r3, s10
; CHECK-NEXT: vldrw.u32 q3, [r11, #48]
; CHECK-NEXT: vfma.f32 q1, q0, r3
; CHECK-NEXT: ldr r3, [r1], #16
; CHECK-NEXT: vmov r0, s10
; CHECK-NEXT: vldrw.u32 q3, [r9, #48]
; CHECK-NEXT: vfma.f32 q1, q0, r0
; CHECK-NEXT: ldr r0, [r1], #16
; CHECK-NEXT: vfma.f32 q1, q7, r6
; CHECK-NEXT: vldrw.u32 q6, [r11, #64]
; CHECK-NEXT: vfma.f32 q1, q3, r3
; CHECK-NEXT: vldrw.u32 q5, [r11, #80]
; CHECK-NEXT: vldrw.u32 q6, [r9, #64]
; CHECK-NEXT: vfma.f32 q1, q3, r0
; CHECK-NEXT: vldrw.u32 q5, [r9, #80]
; CHECK-NEXT: vfma.f32 q1, q6, r4
; CHECK-NEXT: vldrw.u32 q4, [r11, #96]
; CHECK-NEXT: vfma.f32 q1, q5, r0
; CHECK-NEXT: vldrw.u32 q4, [r9, #96]
; CHECK-NEXT: vfma.f32 q1, q5, r3
; CHECK-NEXT: vldrw.u32 q0, [sp, #64] @ 16-byte Reload
; CHECK-NEXT: vfma.f32 q1, q4, r7
; CHECK-NEXT: vfma.f32 q1, q0, r9
; CHECK-NEXT: vfma.f32 q1, q0, r11
; CHECK-NEXT: vmov.f32 s2, s8
; CHECK-NEXT: vstrb.8 q1, [r5], #16
; CHECK-NEXT: le lr, .LBB19_5
@@ -1739,25 +1737,25 @@ define arm_aapcs_vfpcc void @arm_biquad_cascade_df1_f32(%struct.arm_biquad_casd_
; CHECK-NEXT: vldr s24, [r1]
; CHECK-NEXT: vmov r0, s8
; CHECK-NEXT: vldr s0, [r1, #4]
; CHECK-NEXT: vldrw.u32 q3, [r11]
; CHECK-NEXT: vldrw.u32 q3, [r9]
; CHECK-NEXT: vldr s3, [r1, #12]
; CHECK-NEXT: vldrw.u32 q4, [r11, #32]
; CHECK-NEXT: vldrw.u32 q4, [r9, #32]
; CHECK-NEXT: vldr s1, [r1, #8]
; CHECK-NEXT: vmov r1, s10
; CHECK-NEXT: vldrw.u32 q2, [r11, #96]
; CHECK-NEXT: vldrw.u32 q2, [r9, #96]
; CHECK-NEXT: vmov r6, s3
; CHECK-NEXT: vmul.f32 q3, q3, r6
; CHECK-NEXT: vmov r6, s1
; CHECK-NEXT: vstrw.32 q2, [sp, #32] @ 16-byte Spill
; CHECK-NEXT: vldrw.u32 q2, [r11, #112]
; CHECK-NEXT: vldrw.u32 q5, [r11, #48]
; CHECK-NEXT: vldrw.u32 q2, [r9, #112]
; CHECK-NEXT: vldrw.u32 q5, [r9, #48]
; CHECK-NEXT: vmov r4, s0
; CHECK-NEXT: vstrw.32 q2, [sp, #64] @ 16-byte Spill
; CHECK-NEXT: vldrw.u32 q2, [r11, #80]
; CHECK-NEXT: vldrw.u32 q7, [r11, #64]
; CHECK-NEXT: vldrw.u32 q2, [r9, #80]
; CHECK-NEXT: vldrw.u32 q7, [r9, #64]
; CHECK-NEXT: vmov r3, s24
; CHECK-NEXT: vstrw.32 q2, [sp, #16] @ 16-byte Spill
; CHECK-NEXT: vldrw.u32 q2, [r11, #16]
; CHECK-NEXT: vldrw.u32 q2, [r9, #16]
; CHECK-NEXT: vmov r2, s7
; CHECK-NEXT: cmp r7, #1
; CHECK-NEXT: vfma.f32 q3, q2, r6
@@ -1792,12 +1790,12 @@ define arm_aapcs_vfpcc void @arm_biquad_cascade_df1_f32(%struct.arm_biquad_casd_
; CHECK-NEXT: .LBB19_11: @ %if.end69
; CHECK-NEXT: @ in Loop: Header=BB19_3 Depth=1
; CHECK-NEXT: vmov.f32 s2, s3
; CHECK-NEXT: ldr r2, [sp, #56] @ 4-byte Reload
; CHECK-NEXT: ldr r2, [sp, #60] @ 4-byte Reload
; CHECK-NEXT: b .LBB19_2
; CHECK-NEXT: .LBB19_12: @ %if.else64
; CHECK-NEXT: @ in Loop: Header=BB19_3 Depth=1
; CHECK-NEXT: vmov.f32 s7, s13
; CHECK-NEXT: ldr r2, [sp, #56] @ 4-byte Reload
; CHECK-NEXT: ldr r2, [sp, #60] @ 4-byte Reload
; CHECK-NEXT: vmov.f32 s2, s3
; CHECK-NEXT: vstr s14, [r5, #8]
; CHECK-NEXT: vmov.f32 s8, s1
@@ -2063,7 +2061,6 @@ define void @arm_biquad_cascade_df2T_f32(%struct.arm_biquad_cascade_df2T_instanc
; CHECK-NEXT: @ in Loop: Header=BB20_3 Depth=1
; CHECK-NEXT: vmov q6, q1
; CHECK-NEXT: mov r5, r2
; CHECK-NEXT: mov lr, r3
; CHECK-NEXT: .LBB20_5: @ %while.body
; CHECK-NEXT: @ Parent Loop BB20_3 Depth=1
; CHECK-NEXT: @ => This Inner Loop Header: Depth=2
|
|
@@ -64,16 +64,16 @@ define void @arm_cmplx_dot_prod_q15(i16* nocapture readonly %pSrcA, i16* nocaptu
; CHECK-NEXT: vldrh.u16 q0, [r0]
; CHECK-NEXT: vldrh.u16 q1, [r1]
; CHECK-NEXT: movs r4, #0
; CHECK-NEXT: lsr.w lr, r7, #3
; CHECK-NEXT: lsr.w r9, r7, #3
; CHECK-NEXT: mov r7, r12
; CHECK-NEXT: mov r11, r12
; CHECK-NEXT: wls lr, lr, .LBB1_4
; CHECK-NEXT: wls lr, r9, .LBB1_4
; CHECK-NEXT: @ %bb.1: @ %while.body.preheader
; CHECK-NEXT: add.w r8, r0, r9, lsl #5
; CHECK-NEXT: mov.w r11, #0
; CHECK-NEXT: add.w r8, r0, lr, lsl #5
; CHECK-NEXT: adds r0, #32
; CHECK-NEXT: add.w r6, r1, #32
; CHECK-NEXT: lsl.w r9, lr, #4
; CHECK-NEXT: lsl.w r9, r9, #4
; CHECK-NEXT: mov r4, r11
; CHECK-NEXT: movs r7, #0
; CHECK-NEXT: mov r12, r11
@@ -100,9 +100,9 @@ define void @arm_cmplx_dot_prod_q15(i16* nocapture readonly %pSrcA, i16* nocaptu
; CHECK-NEXT: ldr.w r8, [sp, #36]
; CHECK-NEXT: mov r6, r12
; CHECK-NEXT: mov r5, r7
; CHECK-NEXT: and lr, r2, #3
; CHECK-NEXT: and r2, r2, #3
; CHECK-NEXT: lsrl r6, r5, #6
; CHECK-NEXT: wls lr, lr, .LBB1_7
; CHECK-NEXT: wls lr, r2, .LBB1_7
; CHECK-NEXT: .LBB1_5: @ %while.body11
; CHECK-NEXT: @ =>This Inner Loop Header: Depth=1
; CHECK-NEXT: ldrsh r9, [r0], #4
|
|
@@ -1163,14 +1163,14 @@ define arm_aapcs_vfpcc void @_Z37_arm_radix4_butterfly_inverse_f32_mvePK21arm_cf
; CHECK-NEXT: bne .LBB7_6
; CHECK-NEXT: b .LBB7_2
; CHECK-NEXT: .LBB7_9:
; CHECK-NEXT: adr r0, .LCPI7_0
; CHECK-NEXT: vldrw.u32 q1, [r0]
; CHECK-NEXT: ldr r0, [sp, #20] @ 4-byte Reload
; CHECK-NEXT: vadd.i32 q1, q1, r0
; CHECK-NEXT: vldrw.u32 q2, [q1, #64]!
; CHECK-NEXT: adr r1, .LCPI7_0
; CHECK-NEXT: ldr r0, [sp, #8] @ 4-byte Reload
; CHECK-NEXT: lsr.w lr, r0, #3
; CHECK-NEXT: wls lr, lr, .LBB7_12
; CHECK-NEXT: vldrw.u32 q1, [r1]
; CHECK-NEXT: ldr r1, [sp, #20] @ 4-byte Reload
; CHECK-NEXT: vadd.i32 q1, q1, r1
; CHECK-NEXT: lsrs r0, r0, #3
; CHECK-NEXT: vldrw.u32 q2, [q1, #64]!
; CHECK-NEXT: wls lr, r0, .LBB7_12
; CHECK-NEXT: @ %bb.10:
; CHECK-NEXT: vldr s0, [sp, #4] @ 4-byte Reload
; CHECK-NEXT: vmov r0, s0
|
|
@@ -197,8 +197,8 @@ define void @loop_absmax32(float* nocapture readonly %0, i32 %1, float* nocaptur
; CHECK-NEXT: .save {r7, lr}
; CHECK-NEXT: push {r7, lr}
; CHECK-NEXT: vmov.i32 q0, #0x0
; CHECK-NEXT: lsr.w lr, r1, #3
; CHECK-NEXT: wls lr, lr, .LBB16_3
; CHECK-NEXT: lsrs r1, r1, #3
; CHECK-NEXT: wls lr, r1, .LBB16_3
; CHECK-NEXT: @ %bb.1: @ %.preheader
; CHECK-NEXT: vmov.i32 q0, #0x0
; CHECK-NEXT: .LBB16_2: @ =>This Inner Loop Header: Depth=1
@@ -247,8 +247,8 @@ define void @loop_absmax32_c(float* nocapture readonly %0, i32 %1, float* nocapt
; CHECK-NEXT: .save {r7, lr}
; CHECK-NEXT: push {r7, lr}
; CHECK-NEXT: vmov.i32 q0, #0x0
; CHECK-NEXT: lsr.w lr, r1, #3
; CHECK-NEXT: wls lr, lr, .LBB17_3
; CHECK-NEXT: lsrs r1, r1, #3
; CHECK-NEXT: wls lr, r1, .LBB17_3
; CHECK-NEXT: @ %bb.1: @ %.preheader
; CHECK-NEXT: vmov.i32 q0, #0x0
; CHECK-NEXT: .LBB17_2: @ =>This Inner Loop Header: Depth=1
@@ -389,8 +389,8 @@ define void @loop_absmax16(half* nocapture readonly %0, i32 %1, half* nocapture
; CHECK-NEXT: .save {r7, lr}
; CHECK-NEXT: push {r7, lr}
; CHECK-NEXT: vmov.i32 q0, #0x0
; CHECK-NEXT: lsr.w lr, r1, #3
; CHECK-NEXT: wls lr, lr, .LBB20_3
; CHECK-NEXT: lsrs r1, r1, #3
; CHECK-NEXT: wls lr, r1, .LBB20_3
; CHECK-NEXT: @ %bb.1: @ %.preheader
; CHECK-NEXT: vmov.i32 q0, #0x0
; CHECK-NEXT: .LBB20_2: @ =>This Inner Loop Header: Depth=1
@@ -439,8 +439,8 @@ define void @loop_absmax16_c(half* nocapture readonly %0, i32 %1, half* nocaptur
; CHECK-NEXT: .save {r7, lr}
; CHECK-NEXT: push {r7, lr}
; CHECK-NEXT: vmov.i32 q0, #0x0
; CHECK-NEXT: lsr.w lr, r1, #3
; CHECK-NEXT: wls lr, lr, .LBB21_3
; CHECK-NEXT: lsrs r1, r1, #3
; CHECK-NEXT: wls lr, r1, .LBB21_3
; CHECK-NEXT: @ %bb.1: @ %.preheader
; CHECK-NEXT: vmov.i32 q0, #0x0
; CHECK-NEXT: .LBB21_2: @ =>This Inner Loop Header: Depth=1
|
|
@@ -4,14 +4,16 @@

; CHECK-LABEL: do_with_i32_urem
; CHECK: entry:
; CHECK: [[TEST:%[^ ]+]] = call i1 @llvm.test.set.loop.iterations.i32(i32 %n)
; CHECK: br i1 [[TEST]], label %while.body.preheader, label %while.end
; CHECK: [[TEST:%[^ ]+]] = call { i32, i1 } @llvm.test.start.loop.iterations.i32(i32 %n)
; CHECK: [[TEST1:%[^ ]+]] = extractvalue { i32, i1 } [[TEST]], 1
; CHECK: [[TEST0:%[^ ]+]] = extractvalue { i32, i1 } [[TEST]], 0
; CHECK: br i1 [[TEST1]], label %while.body.preheader, label %while.end

; CHECK: while.body.preheader:
; CHECK-NEXT: br label %while.body

; CHECK: while.body:
; CHECK: [[REM:%[^ ]+]] = phi i32 [ %n, %while.body.preheader ], [ [[LOOP_DEC:%[^ ]+]], %while.body ]
; CHECK: [[REM:%[^ ]+]] = phi i32 [ [[TEST0]], %while.body.preheader ], [ [[LOOP_DEC:%[^ ]+]], %while.body ]
; CHECK: [[LOOP_DEC]] = call i32 @llvm.loop.decrement.reg.i32(i32 [[REM]], i32 1)
; CHECK: [[CMP:%[^ ]+]] = icmp ne i32 [[LOOP_DEC]], 0
; CHECK: br i1 [[CMP]], label %while.body, label %while.end.loopexit
@@ -43,14 +45,16 @@ while.end:

; CHECK-LABEL: do_with_i32_srem
; CHECK: entry:
; CHECK: [[TEST:%[^ ]+]] = call i1 @llvm.test.set.loop.iterations.i32(i32 %n)
; CHECK: br i1 [[TEST]], label %while.body.preheader, label %while.end
; CHECK: [[TEST:%[^ ]+]] = call { i32, i1 } @llvm.test.start.loop.iterations.i32(i32 %n)
; CHECK: [[TEST1:%[^ ]+]] = extractvalue { i32, i1 } [[TEST]], 1
; CHECK: [[TEST0:%[^ ]+]] = extractvalue { i32, i1 } [[TEST]], 0
; CHECK: br i1 [[TEST1]], label %while.body.preheader, label %while.end

; CHECK: while.body.preheader:
; CHECK-NEXT: br label %while.body

; CHECK: while.body:
; CHECK: [[REM:%[^ ]+]] = phi i32 [ %n, %while.body.preheader ], [ [[LOOP_DEC:%[^ ]+]], %while.body ]
; CHECK: [[REM:%[^ ]+]] = phi i32 [ [[TEST0]], %while.body.preheader ], [ [[LOOP_DEC:%[^ ]+]], %while.body ]
; CHECK: [[LOOP_DEC]] = call i32 @llvm.loop.decrement.reg.i32(i32 [[REM]], i32 1)
; CHECK: [[CMP:%[^ ]+]] = icmp ne i32 [[LOOP_DEC]], 0
; CHECK: br i1 [[CMP]], label %while.body, label %while.end.loopexit
@@ -82,14 +86,16 @@ while.end:

; CHECK-LABEL: do_with_i32_udiv
; CHECK: entry:
; CHECK: [[TEST:%[^ ]+]] = call i1 @llvm.test.set.loop.iterations.i32(i32 %n)
; CHECK: br i1 [[TEST]], label %while.body.preheader, label %while.end
; CHECK: [[TEST:%[^ ]+]] = call { i32, i1 } @llvm.test.start.loop.iterations.i32(i32 %n)
; CHECK: [[TEST1:%[^ ]+]] = extractvalue { i32, i1 } [[TEST]], 1
; CHECK: [[TEST0:%[^ ]+]] = extractvalue { i32, i1 } [[TEST]], 0
; CHECK: br i1 [[TEST1]], label %while.body.preheader, label %while.end

; CHECK: while.body.preheader:
; CHECK-NEXT: br label %while.body

; CHECK: while.body:
; CHECK: [[REM:%[^ ]+]] = phi i32 [ %n, %while.body.preheader ], [ [[LOOP_DEC:%[^ ]+]], %while.body ]
; CHECK: [[REM:%[^ ]+]] = phi i32 [ [[TEST0]], %while.body.preheader ], [ [[LOOP_DEC:%[^ ]+]], %while.body ]
; CHECK: [[LOOP_DEC]] = call i32 @llvm.loop.decrement.reg.i32(i32 [[REM]], i32 1)
; CHECK: [[CMP:%[^ ]+]] = icmp ne i32 [[LOOP_DEC]], 0
; CHECK: br i1 [[CMP]], label %while.body, label %while.end.loopexit
@@ -121,14 +127,16 @@ while.end:

; CHECK-LABEL: do_with_i32_sdiv
; CHECK: entry:
; CHECK: [[TEST:%[^ ]+]] = call i1 @llvm.test.set.loop.iterations.i32(i32 %n)
; CHECK: br i1 [[TEST]], label %while.body.preheader, label %while.end
; CHECK: [[TEST:%[^ ]+]] = call { i32, i1 } @llvm.test.start.loop.iterations.i32(i32 %n)
; CHECK: [[TEST1:%[^ ]+]] = extractvalue { i32, i1 } [[TEST]], 1
; CHECK: [[TEST0:%[^ ]+]] = extractvalue { i32, i1 } [[TEST]], 0
; CHECK: br i1 [[TEST1]], label %while.body.preheader, label %while.end

; CHECK: while.body.preheader:
; CHECK-NEXT: br label %while.body

; CHECK: while.body:
; CHECK: [[REM:%[^ ]+]] = phi i32 [ %n, %while.body.preheader ], [ [[LOOP_DEC:%[^ ]+]], %while.body ]
; CHECK: [[REM:%[^ ]+]] = phi i32 [ [[TEST0]], %while.body.preheader ], [ [[LOOP_DEC:%[^ ]+]], %while.body ]
; CHECK: [[LOOP_DEC]] = call i32 @llvm.loop.decrement.reg.i32(i32 [[REM]], i32 1)
; CHECK: [[CMP:%[^ ]+]] = icmp ne i32 [[LOOP_DEC]], 0
; CHECK: br i1 [[CMP]], label %while.body, label %while.end.loopexit
|
|
@@ -46,13 +46,15 @@ while.end:

; CHECK-LABEL: do_inc1
; CHECK: entry:
; CHECK: [[TEST:%[^ ]+]] = call i1 @llvm.test.set.loop.iterations.i32(i32 %n)
; CHECK: br i1 [[TEST]], label %while.body.lr.ph, label %while.end
; CHECK: [[TEST:%[^ ]+]] = call { i32, i1 } @llvm.test.start.loop.iterations.i32(i32 %n)
; CHECK: [[TEST1:%[^ ]+]] = extractvalue { i32, i1 } [[TEST]], 1
; CHECK: [[TEST0:%[^ ]+]] = extractvalue { i32, i1 } [[TEST]], 0
; CHECK: br i1 [[TEST1]], label %while.body.lr.ph, label %while.end

; CHECK: while.body.lr.ph:
; CHECK: br label %while.body

; CHECK: [[REM:%[^ ]+]] = phi i32 [ %n, %while.body.lr.ph ], [ [[LOOP_DEC:%[^ ]+]], %while.body ]
; CHECK: [[REM:%[^ ]+]] = phi i32 [ [[TEST0]], %while.body.lr.ph ], [ [[LOOP_DEC:%[^ ]+]], %while.body ]
; CHECK: [[LOOP_DEC]] = call i32 @llvm.loop.decrement.reg.i32(i32 [[REM]], i32 1)
; CHECK: [[CMP:%[^ ]+]] = icmp ne i32 [[LOOP_DEC]], 0
; CHECK: br i1 [[CMP]], label %while.body, label %while.end.loopexit
|
|
@@ -118,14 +118,16 @@ while.end: ; preds = %while.body
}

; CHECK-LABEL: pre_existing_test_set
; CHECK: call i1 @llvm.test.set.loop.iterations
; CHECK: call { i32, i1 } @llvm.test.start.loop.iterations
; CHECK-NOT: llvm.set{{.*}}.loop.iterations
; CHECK: call i32 @llvm.loop.decrement.reg.i32(i32 %0, i32 1)
; CHECK-NOT: call i32 @llvm.loop.decrement.reg
define i32 @pre_existing_test_set(i32 %n, i32* nocapture %p, i32* nocapture readonly %q) {
entry:
%guard = call i1 @llvm.test.set.loop.iterations.i32(i32 %n)
br i1 %guard, label %while.preheader, label %while.end
%guard = call { i32, i1 } @llvm.test.start.loop.iterations.i32(i32 %n)
%g0 = extractvalue { i32, i1 } %guard, 0
%g1 = extractvalue { i32, i1 } %guard, 1
br i1 %g1, label %while.preheader, label %while.end

while.preheader:
br label %while.body
@@ -133,7 +135,7 @@ while.preheader:
while.body: ; preds = %while.body, %entry
%q.addr.05 = phi i32* [ %incdec.ptr, %while.body ], [ %q, %while.preheader ]
%p.addr.04 = phi i32* [ %incdec.ptr1, %while.body ], [ %p, %while.preheader ]
%0 = phi i32 [ %n, %while.preheader ], [ %2, %while.body ]
%0 = phi i32 [ %g0, %while.preheader ], [ %2, %while.body ]
%incdec.ptr = getelementptr inbounds i32, i32* %q.addr.05, i32 1
%1 = load i32, i32* %q.addr.05, align 4
%incdec.ptr1 = getelementptr inbounds i32, i32* %p.addr.04, i32 1
@@ -261,7 +263,8 @@ exit:

; CHECK-LABEL: search
; CHECK: entry:
; CHECK: [[TEST:%[^ ]+]] = call i1 @llvm.test.set.loop.iterations.i32(i32 %N)
; CHECK: [[TEST1:%[^ ]+]] = call { i32, i1 } @llvm.test.start.loop.iterations.i32(i32 %N)
; CHECK: [[TEST:%[^ ]+]] = extractvalue { i32, i1 } [[TEST1]], 1
; CHECK: br i1 [[TEST]], label %for.body.preheader, label %for.cond.cleanup
; CHECK: for.body.preheader:
; CHECK: br label %for.body
@@ -321,7 +324,7 @@ for.inc: ; preds = %sw.bb, %sw.bb1, %fo
; CHECK-UNROLL: [[LOOP:.LBB[0-9_]+]]: @ %for.body
; CHECK-UNROLL-NOT: le lr, [[LOOP]]
; CHECK-UNROLL: bne [[LOOP]]
; CHECK-UNROLL: wls lr, lr, [[EXIT:.LBB[0-9_]+]]
; CHECK-UNROLL: wls lr, r12, [[EXIT:.LBB[0-9_]+]]
; CHECK-UNROLL: [[EPIL:.LBB[0-9_]+]]:
; CHECK-UNROLL: le lr, [[EPIL]]
; CHECK-UNROLL-NEXT: [[EXIT]]
@@ -349,7 +352,7 @@ for.body:
}

; CHECK-LABEL: unroll_inc_unsigned
; CHECK: call i1 @llvm.test.set.loop.iterations.i32(i32 %N)
; CHECK: call { i32, i1 } @llvm.test.start.loop.iterations.i32(i32 %N)
; CHECK: call i32 @llvm.loop.decrement.reg.i32(

; TODO: We should be able to support the unrolled loop body.
@@ -359,7 +362,7 @@ for.body:
; CHECK-UNROLL: [[LOOP:.LBB[0-9_]+]]: @ %for.body
; CHECK-UNROLL-NOT: le lr, [[LOOP]]
; CHECK-UNROLL: bne [[LOOP]]
; CHECK-UNROLL: wls lr, lr, [[EPIL_EXIT:.LBB[0-9_]+]]
; CHECK-UNROLL: wls lr, r12, [[EPIL_EXIT:.LBB[0-9_]+]]
; CHECK-UNROLL: [[EPIL:.LBB[0-9_]+]]:
; CHECK-UNROLL: le lr, [[EPIL]]
; CHECK-UNROLL: [[EPIL_EXIT]]:
@@ -422,6 +425,6 @@ for.body:
 }
 
 declare i32 @llvm.start.loop.iterations.i32(i32) #0
-declare i1 @llvm.test.set.loop.iterations.i32(i32) #0
+declare { i32, i1 } @llvm.test.start.loop.iterations.i32(i32) #0
 declare i32 @llvm.loop.decrement.reg.i32(i32, i32) #0
 
@@ -153,7 +153,9 @@ if.end: ; preds = %while.body, %entry
 ; CHECK: entry:
 ; CHECK: br i1 %brmerge.demorgan, label %while.preheader
 ; CHECK: while.preheader:
-; CHECK: [[TEST:%[^ ]+]] = call i1 @llvm.test.set.loop.iterations.i32(i32 %N)
+; CHECK-EXIT: [[TEST:%[^ ]+]] = call i1 @llvm.test.set.loop.iterations.i32(i32 %N)
+; CHECK-LATCH: [[TEST1:%[^ ]+]] = call { i32, i1 } @llvm.test.start.loop.iterations.i32(i32 %N)
+; CHECK-LATCH: [[TEST:%[^ ]+]] = extractvalue { i32, i1 } [[TEST1]], 1
 ; CHECK: br i1 [[TEST]], label %while.body.preheader, label %if.end
 ; CHECK: while.body.preheader:
 ; CHECK: br label %while.body
@@ -186,7 +188,9 @@ if.end: ; preds = %while.body, %while.
 ; CHECK: entry:
 ; CHECK: br i1 %brmerge.demorgan, label %while.preheader
 ; CHECK: while.preheader:
-; CHECK: [[TEST:%[^ ]+]] = call i1 @llvm.test.set.loop.iterations.i32(i32 %N)
+; CHECK-EXIT: [[TEST:%[^ ]+]] = call i1 @llvm.test.set.loop.iterations.i32(i32 %N)
+; CHECK-LATCH: [[TEST1:%[^ ]+]] = call { i32, i1 } @llvm.test.start.loop.iterations.i32(i32 %N)
+; CHECK-LATCH: [[TEST:%[^ ]+]] = extractvalue { i32, i1 } [[TEST1]], 1
 ; CHECK: br i1 [[TEST]], label %while.body.preheader, label %if.end
 ; CHECK: while.body.preheader:
 ; CHECK: br label %while.body
@@ -315,7 +319,9 @@ if.end: ; preds = %do.body, %entry
 ; CHECK: entry:
 ; CHECK: br label %do.body.preheader
 ; CHECK: do.body.preheader:
-; CHECK: [[TEST:%[^ ]+]] = call i1 @llvm.test.set.loop.iterations.i32(i32 %N)
+; CHECK-EXIT: [[TEST:%[^ ]+]] = call i1 @llvm.test.set.loop.iterations.i32(i32 %N)
+; CHECK-LATCH: [[TEST1:%[^ ]+]] = call { i32, i1 } @llvm.test.start.loop.iterations.i32(i32 %N)
+; CHECK-LATCH: [[TEST:%[^ ]+]] = extractvalue { i32, i1 } [[TEST1]], 1
 ; CHECK: br i1 [[TEST]], label %do.body.preheader1, label %if.end
 ; CHECK: do.body.preheader1:
 ; CHECK: br label %do.body
@@ -417,19 +417,21 @@ define void @while_ne(i32 %N, i32* nocapture %A) {
 ; CHECK-PHIGUARD-LABEL: @while_ne(
 ; CHECK-PHIGUARD-NEXT: entry:
 ; CHECK-PHIGUARD-NEXT: [[CMP:%.*]] = icmp ne i32 [[N:%.*]], 0
-; CHECK-PHIGUARD-NEXT: [[TMP0:%.*]] = call i1 @llvm.test.set.loop.iterations.i32(i32 [[N]])
-; CHECK-PHIGUARD-NEXT: br i1 [[TMP0]], label [[WHILE_BODY_PREHEADER:%.*]], label [[WHILE_END:%.*]]
+; CHECK-PHIGUARD-NEXT: [[TMP0:%.*]] = call { i32, i1 } @llvm.test.start.loop.iterations.i32(i32 [[N]])
+; CHECK-PHIGUARD-NEXT: [[TMP1:%.*]] = extractvalue { i32, i1 } [[TMP0]], 1
+; CHECK-PHIGUARD-NEXT: [[TMP2:%.*]] = extractvalue { i32, i1 } [[TMP0]], 0
+; CHECK-PHIGUARD-NEXT: br i1 [[TMP1]], label [[WHILE_BODY_PREHEADER:%.*]], label [[WHILE_END:%.*]]
 ; CHECK-PHIGUARD: while.body.preheader:
 ; CHECK-PHIGUARD-NEXT: br label [[WHILE_BODY:%.*]]
 ; CHECK-PHIGUARD: while.body:
 ; CHECK-PHIGUARD-NEXT: [[I_ADDR_05:%.*]] = phi i32 [ [[INC:%.*]], [[WHILE_BODY]] ], [ 0, [[WHILE_BODY_PREHEADER]] ]
-; CHECK-PHIGUARD-NEXT: [[TMP1:%.*]] = phi i32 [ [[N]], [[WHILE_BODY_PREHEADER]] ], [ [[TMP2:%.*]], [[WHILE_BODY]] ]
+; CHECK-PHIGUARD-NEXT: [[TMP3:%.*]] = phi i32 [ [[TMP2]], [[WHILE_BODY_PREHEADER]] ], [ [[TMP4:%.*]], [[WHILE_BODY]] ]
 ; CHECK-PHIGUARD-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i32, i32* [[A:%.*]], i32 [[I_ADDR_05]]
 ; CHECK-PHIGUARD-NEXT: store i32 [[I_ADDR_05]], i32* [[ARRAYIDX]], align 4
 ; CHECK-PHIGUARD-NEXT: [[INC]] = add nuw i32 [[I_ADDR_05]], 1
-; CHECK-PHIGUARD-NEXT: [[TMP2]] = call i32 @llvm.loop.decrement.reg.i32(i32 [[TMP1]], i32 1)
-; CHECK-PHIGUARD-NEXT: [[TMP3:%.*]] = icmp ne i32 [[TMP2]], 0
-; CHECK-PHIGUARD-NEXT: br i1 [[TMP3]], label [[WHILE_BODY]], label [[WHILE_END]]
+; CHECK-PHIGUARD-NEXT: [[TMP4]] = call i32 @llvm.loop.decrement.reg.i32(i32 [[TMP3]], i32 1)
+; CHECK-PHIGUARD-NEXT: [[TMP5:%.*]] = icmp ne i32 [[TMP4]], 0
+; CHECK-PHIGUARD-NEXT: br i1 [[TMP5]], label [[WHILE_BODY]], label [[WHILE_END]]
 ; CHECK-PHIGUARD: while.end:
 ; CHECK-PHIGUARD-NEXT: ret void
 ;
@@ -523,19 +525,21 @@ define void @while_eq(i32 %N, i32* nocapture %A) {
 ; CHECK-PHIGUARD-LABEL: @while_eq(
 ; CHECK-PHIGUARD-NEXT: entry:
 ; CHECK-PHIGUARD-NEXT: [[CMP:%.*]] = icmp eq i32 [[N:%.*]], 0
-; CHECK-PHIGUARD-NEXT: [[TMP0:%.*]] = call i1 @llvm.test.set.loop.iterations.i32(i32 [[N]])
-; CHECK-PHIGUARD-NEXT: br i1 [[TMP0]], label [[WHILE_BODY_PREHEADER:%.*]], label [[WHILE_END:%.*]]
+; CHECK-PHIGUARD-NEXT: [[TMP0:%.*]] = call { i32, i1 } @llvm.test.start.loop.iterations.i32(i32 [[N]])
+; CHECK-PHIGUARD-NEXT: [[TMP1:%.*]] = extractvalue { i32, i1 } [[TMP0]], 1
+; CHECK-PHIGUARD-NEXT: [[TMP2:%.*]] = extractvalue { i32, i1 } [[TMP0]], 0
+; CHECK-PHIGUARD-NEXT: br i1 [[TMP1]], label [[WHILE_BODY_PREHEADER:%.*]], label [[WHILE_END:%.*]]
 ; CHECK-PHIGUARD: while.body.preheader:
 ; CHECK-PHIGUARD-NEXT: br label [[WHILE_BODY:%.*]]
 ; CHECK-PHIGUARD: while.body:
 ; CHECK-PHIGUARD-NEXT: [[I_ADDR_05:%.*]] = phi i32 [ [[INC:%.*]], [[WHILE_BODY]] ], [ 0, [[WHILE_BODY_PREHEADER]] ]
-; CHECK-PHIGUARD-NEXT: [[TMP1:%.*]] = phi i32 [ [[N]], [[WHILE_BODY_PREHEADER]] ], [ [[TMP2:%.*]], [[WHILE_BODY]] ]
+; CHECK-PHIGUARD-NEXT: [[TMP3:%.*]] = phi i32 [ [[TMP2]], [[WHILE_BODY_PREHEADER]] ], [ [[TMP4:%.*]], [[WHILE_BODY]] ]
 ; CHECK-PHIGUARD-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i32, i32* [[A:%.*]], i32 [[I_ADDR_05]]
 ; CHECK-PHIGUARD-NEXT: store i32 [[I_ADDR_05]], i32* [[ARRAYIDX]], align 4
 ; CHECK-PHIGUARD-NEXT: [[INC]] = add nuw i32 [[I_ADDR_05]], 1
-; CHECK-PHIGUARD-NEXT: [[TMP2]] = call i32 @llvm.loop.decrement.reg.i32(i32 [[TMP1]], i32 1)
-; CHECK-PHIGUARD-NEXT: [[TMP3:%.*]] = icmp ne i32 [[TMP2]], 0
-; CHECK-PHIGUARD-NEXT: br i1 [[TMP3]], label [[WHILE_BODY]], label [[WHILE_END]]
+; CHECK-PHIGUARD-NEXT: [[TMP4]] = call i32 @llvm.loop.decrement.reg.i32(i32 [[TMP3]], i32 1)
+; CHECK-PHIGUARD-NEXT: [[TMP5:%.*]] = icmp ne i32 [[TMP4]], 0
+; CHECK-PHIGUARD-NEXT: br i1 [[TMP5]], label [[WHILE_BODY]], label [[WHILE_END]]
 ; CHECK-PHIGUARD: while.end:
 ; CHECK-PHIGUARD-NEXT: ret void
 ;
@@ -639,19 +643,21 @@ define void @while_preheader_eq(i32 %N, i32* nocapture %A) {
 ; CHECK-PHIGUARD-NEXT: br label [[PREHEADER:%.*]]
 ; CHECK-PHIGUARD: preheader:
 ; CHECK-PHIGUARD-NEXT: [[CMP:%.*]] = icmp eq i32 [[N:%.*]], 0
-; CHECK-PHIGUARD-NEXT: [[TMP0:%.*]] = call i1 @llvm.test.set.loop.iterations.i32(i32 [[N]])
-; CHECK-PHIGUARD-NEXT: br i1 [[TMP0]], label [[WHILE_BODY_PREHEADER:%.*]], label [[WHILE_END:%.*]]
+; CHECK-PHIGUARD-NEXT: [[TMP0:%.*]] = call { i32, i1 } @llvm.test.start.loop.iterations.i32(i32 [[N]])
+; CHECK-PHIGUARD-NEXT: [[TMP1:%.*]] = extractvalue { i32, i1 } [[TMP0]], 1
+; CHECK-PHIGUARD-NEXT: [[TMP2:%.*]] = extractvalue { i32, i1 } [[TMP0]], 0
+; CHECK-PHIGUARD-NEXT: br i1 [[TMP1]], label [[WHILE_BODY_PREHEADER:%.*]], label [[WHILE_END:%.*]]
 ; CHECK-PHIGUARD: while.body.preheader:
 ; CHECK-PHIGUARD-NEXT: br label [[WHILE_BODY:%.*]]
 ; CHECK-PHIGUARD: while.body:
 ; CHECK-PHIGUARD-NEXT: [[I_ADDR_05:%.*]] = phi i32 [ [[INC:%.*]], [[WHILE_BODY]] ], [ 0, [[WHILE_BODY_PREHEADER]] ]
-; CHECK-PHIGUARD-NEXT: [[TMP1:%.*]] = phi i32 [ [[N]], [[WHILE_BODY_PREHEADER]] ], [ [[TMP2:%.*]], [[WHILE_BODY]] ]
+; CHECK-PHIGUARD-NEXT: [[TMP3:%.*]] = phi i32 [ [[TMP2]], [[WHILE_BODY_PREHEADER]] ], [ [[TMP4:%.*]], [[WHILE_BODY]] ]
 ; CHECK-PHIGUARD-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i32, i32* [[A:%.*]], i32 [[I_ADDR_05]]
 ; CHECK-PHIGUARD-NEXT: store i32 [[I_ADDR_05]], i32* [[ARRAYIDX]], align 4
 ; CHECK-PHIGUARD-NEXT: [[INC]] = add nuw i32 [[I_ADDR_05]], 1
-; CHECK-PHIGUARD-NEXT: [[TMP2]] = call i32 @llvm.loop.decrement.reg.i32(i32 [[TMP1]], i32 1)
-; CHECK-PHIGUARD-NEXT: [[TMP3:%.*]] = icmp ne i32 [[TMP2]], 0
-; CHECK-PHIGUARD-NEXT: br i1 [[TMP3]], label [[WHILE_BODY]], label [[WHILE_END]]
+; CHECK-PHIGUARD-NEXT: [[TMP4]] = call i32 @llvm.loop.decrement.reg.i32(i32 [[TMP3]], i32 1)
+; CHECK-PHIGUARD-NEXT: [[TMP5:%.*]] = icmp ne i32 [[TMP4]], 0
+; CHECK-PHIGUARD-NEXT: br i1 [[TMP5]], label [[WHILE_BODY]], label [[WHILE_END]]
 ; CHECK-PHIGUARD: while.end:
 ; CHECK-PHIGUARD-NEXT: ret void
 ;