forked from OSchip/llvm-project
[PassManager] Run additional LICM before LoopRotate
Loop rotation often has to perform code duplication from header into preheader, which introduces PHI nodes. >>! In D99204, @thopre wrote: > > With loop peeling, it is important that unnecessary PHIs be avoided or > it will leads to spurious peeling. One source of such PHIs is loop > rotation which creates PHIs for invariant loads. Those PHIs are > particularly problematic since loop peeling is now run as part of simple > loop unrolling before GVN is run, and are thus a source of spurious > peeling. > > Note that while some of the load can be hoisted and eventually > eliminated by instruction combine, this is not always possible due to > alignment issue. In particular, the motivating example [1] was a load > inside a class instance which cannot be hoisted because the `this' > pointer has an alignment of 1. > > [1] http://lists.llvm.org/pipermail/llvm-dev/attachments/20210312/4ce73c47/attachment.cpp Now, we could enhance LoopRotate to avoid duplicating code when not needed, but instead hoist loop-invariant code, but isn't that a code duplication? (*sic*) We have LICM, and in fact we already run it right after LoopRotation. We could try to move it to before LoopRotation, that is basically free from compile-time perspective: https://llvm-compile-time-tracker.com/compare.php?from=6c93eb4477d88af046b915bc955c03693b2cbb58&to=a4bee6d07732b1184c436da489040b912f0dc271&stat=instructions But, looking at stats, i think it isn't great that we would no longer do LICM after LoopRotation, in particular: | statistic name | LoopRotate-LICM | LICM-LoopRotate | Δ | % | abs(%) | | asm-printer.EmittedInsts | 9015930 | 9015799 | -131 | 0.00% | 0.00% | | indvars.NumElimCmp | 3536 | 3544 | 8 | 0.23% | 0.23% | | indvars.NumElimExt | 36725 | 36580 | -145 | -0.39% | 0.39% | | indvars.NumElimIV | 1197 | 1187 | -10 | -0.84% | 0.84% | | indvars.NumElimIdentity | 143 | 136 | -7 | -4.90% | 4.90% | | indvars.NumElimRem | 4 | 5 | 1 | 25.00% | 25.00% | | indvars.NumLFTR | 29842 | 29890 | 48 | 0.16% | 0.16% | | indvars.NumReplaced | 2293 | 2227 | -66 | -2.88% | 2.88% | | indvars.NumSimplifiedSDiv | 6 | 8 | 2 | 33.33% | 33.33% | | indvars.NumWidened | 26438 | 26329 | -109 | -0.41% | 0.41% | | instcount.TotalBlocks | 1178338 | 1173840 | -4498 | -0.38% | 0.38% | | instcount.TotalFuncs | 111825 | 111829 | 4 | 0.00% | 0.00% | | instcount.TotalInsts | 9905442 | 9896139 | -9303 | -0.09% | 0.09% | | lcssa.NumLCSSA | 425871 | 423961 | -1910 | -0.45% | 0.45% | | licm.NumHoisted | 378357 | 378753 | 396 | 0.10% | 0.10% | | licm.NumMovedCalls | 2193 | 2208 | 15 | 0.68% | 0.68% | | licm.NumMovedLoads | 35899 | 31821 | -4078 | -11.36% | 11.36% | | licm.NumPromoted | 11178 | 11154 | -24 | -0.21% | 0.21% | | licm.NumSunk | 13359 | 13587 | 228 | 1.71% | 1.71% | | loop-delete.NumDeleted | 8547 | 8402 | -145 | -1.70% | 1.70% | | loop-instsimplify.NumSimplified | 12876 | 11890 | -986 | -7.66% | 7.66% | | loop-peel.NumPeeled | 1008 | 925 | -83 | -8.23% | 8.23% | | loop-rotate.NumNotRotatedDueToHeaderSize | 368 | 365 | -3 | -0.82% | 0.82% | | loop-rotate.NumRotated | 42015 | 42003 | -12 | -0.03% | 0.03% | | loop-simplifycfg.NumLoopBlocksDeleted | 240 | 242 | 2 | 0.83% | 0.83% | | loop-simplifycfg.NumLoopExitsDeleted | 497 | 20 | -477 | -95.98% | 95.98% | | loop-simplifycfg.NumTerminatorsFolded | 618 | 336 | -282 | -45.63% | 45.63% | | loop-unroll.NumCompletelyUnrolled | 11028 | 11032 | 4 | 0.04% | 0.04% | | loop-unroll.NumUnrolled | 12608 | 12529 | -79 | -0.63% | 0.63% | | mem2reg.NumDeadAlloca | 10222 | 10221 | -1 | -0.01% | 0.01% | | mem2reg.NumPHIInsert | 192110 | 192106 | -4 | 0.00% | 0.00% | | mem2reg.NumSingleStore | 637650 | 637643 | -7 | 0.00% | 0.00% | | scalar-evolution.NumBruteForceTripCountsComputed | 814 | 812 | -2 | -0.25% | 0.25% | | scalar-evolution.NumTripCountsComputed | 283108 | 282934 | -174 | -0.06% | 0.06% | | scalar-evolution.NumTripCountsNotComputed | 106712 | 106718 | 6 | 0.01% | 0.01% | | simple-loop-unswitch.NumBranches | 5178 | 4752 | -426 | -8.23% | 8.23% | | simple-loop-unswitch.NumCostMultiplierSkipped | 914 | 503 | -411 | -44.97% | 44.97% | | simple-loop-unswitch.NumSwitches | 20 | 18 | -2 | -10.00% | 10.00% | | simple-loop-unswitch.NumTrivial | 183 | 95 | -88 | -48.09% | 48.09% | ... but that actually regresses LICM (-12% `licm.NumMovedLoads`), loop-simplifycfg (`NumLoopExitsDeleted`, `NumTerminatorsFolded`), simple-loop-unswitch (`NumTrivial`). What if we instead have LICM both before and after LoopRotate? | statistic name | LoopRotate-LICM | LICM-LoopRotate-LICM | Δ | % | abs(%) | | asm-printer.EmittedInsts | 9015930 | 9014474 | -1456 | -0.02% | 0.02% | | indvars.NumElimCmp | 3536 | 3546 | 10 | 0.28% | 0.28% | | indvars.NumElimExt | 36725 | 36681 | -44 | -0.12% | 0.12% | | indvars.NumElimIV | 1197 | 1185 | -12 | -1.00% | 1.00% | | indvars.NumElimIdentity | 143 | 146 | 3 | 2.10% | 2.10% | | indvars.NumElimRem | 4 | 5 | 1 | 25.00% | 25.00% | | indvars.NumLFTR | 29842 | 29899 | 57 | 0.19% | 0.19% | | indvars.NumReplaced | 2293 | 2299 | 6 | 0.26% | 0.26% | | indvars.NumSimplifiedSDiv | 6 | 8 | 2 | 33.33% | 33.33% | | indvars.NumWidened | 26438 | 26404 | -34 | -0.13% | 0.13% | | instcount.TotalBlocks | 1178338 | 1173652 | -4686 | -0.40% | 0.40% | | instcount.TotalFuncs | 111825 | 111829 | 4 | 0.00% | 0.00% | | instcount.TotalInsts | 9905442 | 9895452 | -9990 | -0.10% | 0.10% | | lcssa.NumLCSSA | 425871 | 425373 | -498 | -0.12% | 0.12% | | licm.NumHoisted | 378357 | 383352 | 4995 | 1.32% | 1.32% | | licm.NumMovedCalls | 2193 | 2204 | 11 | 0.50% | 0.50% | | licm.NumMovedLoads | 35899 | 35755 | -144 | -0.40% | 0.40% | | licm.NumPromoted | 11178 | 11163 | -15 | -0.13% | 0.13% | | licm.NumSunk | 13359 | 14321 | 962 | 7.20% | 7.20% | | loop-delete.NumDeleted | 8547 | 8538 | -9 | -0.11% | 0.11% | | loop-instsimplify.NumSimplified | 12876 | 12041 | -835 | -6.48% | 6.48% | | loop-peel.NumPeeled | 1008 | 924 | -84 | -8.33% | 8.33% | | loop-rotate.NumNotRotatedDueToHeaderSize | 368 | 365 | -3 | -0.82% | 0.82% | | loop-rotate.NumRotated | 42015 | 42005 | -10 | -0.02% | 0.02% | | loop-simplifycfg.NumLoopBlocksDeleted | 240 | 241 | 1 | 0.42% | 0.42% | | loop-simplifycfg.NumTerminatorsFolded | 618 | 619 | 1 | 0.16% | 0.16% | | loop-unroll.NumCompletelyUnrolled | 11028 | 11029 | 1 | 0.01% | 0.01% | | loop-unroll.NumUnrolled | 12608 | 12525 | -83 | -0.66% | 0.66% | | mem2reg.NumPHIInsert | 192110 | 192073 | -37 | -0.02% | 0.02% | | mem2reg.NumSingleStore | 637650 | 637652 | 2 | 0.00% | 0.00% | | scalar-evolution.NumTripCountsComputed | 283108 | 282998 | -110 | -0.04% | 0.04% | | scalar-evolution.NumTripCountsNotComputed | 106712 | 106691 | -21 | -0.02% | 0.02% | | simple-loop-unswitch.NumBranches | 5178 | 5185 | 7 | 0.14% | 0.14% | | simple-loop-unswitch.NumCostMultiplierSkipped | 914 | 925 | 11 | 1.20% | 1.20% | | simple-loop-unswitch.NumTrivial | 183 | 179 | -4 | -2.19% | 2.19% | | simple-loop-unswitch.NumBranches | 5178 | 4752 | -426 | -8.23% | 8.23% | | simple-loop-unswitch.NumCostMultiplierSkipped | 914 | 503 | -411 | -44.97% | 44.97% | | simple-loop-unswitch.NumSwitches | 20 | 18 | -2 | -10.00% | 10.00% | | simple-loop-unswitch.NumTrivial | 183 | 95 | -88 | -48.09% | 48.09% | I.e. we end up with less instructions, less peeling, more LICM activity, also note how none of those 4 regressions are here. Namely: | statistic name | LICM-LoopRotate | LICM-LoopRotate-LICM | Δ | % | abs(%) | | asm-printer.EmittedInsts | 9015799 | 9014474 | -1325 | -0.01% | 0.01% | | indvars.NumElimCmp | 3544 | 3546 | 2 | 0.06% | 0.06% | | indvars.NumElimExt | 36580 | 36681 | 101 | 0.28% | 0.28% | | indvars.NumElimIV | 1187 | 1185 | -2 | -0.17% | 0.17% | | indvars.NumElimIdentity | 136 | 146 | 10 | 7.35% | 7.35% | | indvars.NumLFTR | 29890 | 29899 | 9 | 0.03% | 0.03% | | indvars.NumReplaced | 2227 | 2299 | 72 | 3.23% | 3.23% | | indvars.NumWidened | 26329 | 26404 | 75 | 0.28% | 0.28% | | instcount.TotalBlocks | 1173840 | 1173652 | -188 | -0.02% | 0.02% | | instcount.TotalInsts | 9896139 | 9895452 | -687 | -0.01% | 0.01% | | lcssa.NumLCSSA | 423961 | 425373 | 1412 | 0.33% | 0.33% | | licm.NumHoisted | 378753 | 383352 | 4599 | 1.21% | 1.21% | | licm.NumMovedCalls | 2208 | 2204 | -4 | -0.18% | 0.18% | | licm.NumMovedLoads | 31821 | 35755 | 3934 | 12.36% | 12.36% | | licm.NumPromoted | 11154 | 11163 | 9 | 0.08% | 0.08% | | licm.NumSunk | 13587 | 14321 | 734 | 5.40% | 5.40% | | loop-delete.NumDeleted | 8402 | 8538 | 136 | 1.62% | 1.62% | | loop-instsimplify.NumSimplified | 11890 | 12041 | 151 | 1.27% | 1.27% | | loop-peel.NumPeeled | 925 | 924 | -1 | -0.11% | 0.11% | | loop-rotate.NumRotated | 42003 | 42005 | 2 | 0.00% | 0.00% | | loop-simplifycfg.NumLoopBlocksDeleted | 242 | 241 | -1 | -0.41% | 0.41% | | loop-simplifycfg.NumLoopExitsDeleted | 20 | 497 | 477 | 2385.00% | 2385.00% | | loop-simplifycfg.NumTerminatorsFolded | 336 | 619 | 283 | 84.23% | 84.23% | | loop-unroll.NumCompletelyUnrolled | 11032 | 11029 | -3 | -0.03% | 0.03% | | loop-unroll.NumUnrolled | 12529 | 12525 | -4 | -0.03% | 0.03% | | mem2reg.NumDeadAlloca | 10221 | 10222 | 1 | 0.01% | 0.01% | | mem2reg.NumPHIInsert | 192106 | 192073 | -33 | -0.02% | 0.02% | | mem2reg.NumSingleStore | 637643 | 637652 | 9 | 0.00% | 0.00% | | scalar-evolution.NumBruteForceTripCountsComputed | 812 | 814 | 2 | 0.25% | 0.25% | | scalar-evolution.NumTripCountsComputed | 282934 | 282998 | 64 | 0.02% | 0.02% | | scalar-evolution.NumTripCountsNotComputed | 106718 | 106691 | -27 | -0.03% | 0.03% | | simple-loop-unswitch.NumBranches | 4752 | 5185 | 433 | 9.11% | 9.11% | | simple-loop-unswitch.NumCostMultiplierSkipped | 503 | 925 | 422 | 83.90% | 83.90% | | simple-loop-unswitch.NumSwitches | 18 | 20 | 2 | 11.11% | 11.11% | | simple-loop-unswitch.NumTrivial | 95 | 179 | 84 | 88.42% | 88.42% | {F15983613} {F15983615} {F15983616} (this is vanilla llvm testsuite + rawspeed + darktable) As an example of the code where early LICM only is bad, see: https://godbolt.org/z/GzEbacs4K This does have an observable compile-time regression of +~0.5% geomean https://llvm-compile-time-tracker.com/compare.php?from=7c5222e4d1a3a14f029e5f614c9aefd0fa505f1e&to=5d81826c3411982ca26e46b9d0aff34c80577664&stat=instructions but i think that's basically nothing, and there's potential that it might be avoidable in the future by fixing clang to produce alignment information on function arguments, thus making the second run unneeded. Differential Revision: https://reviews.llvm.org/D99249
This commit is contained in:
parent
bb1e5399e4
commit
a26f1bf67e
|
@ -568,6 +568,11 @@ PassBuilder::buildO1FunctionSimplificationPipeline(OptimizationLevel Level,
|
|||
LPM1.addPass(LoopInstSimplifyPass());
|
||||
LPM1.addPass(LoopSimplifyCFGPass());
|
||||
|
||||
// Try to remove as much code from the loop header as possible,
|
||||
// to reduce amount of IR that will have to be duplicated.
|
||||
// TODO: Investigate promotion cap for O1.
|
||||
LPM1.addPass(LICMPass(PTO.LicmMssaOptCap, PTO.LicmMssaNoAccForPromotionCap));
|
||||
|
||||
LPM1.addPass(LoopRotatePass(/* Disable header duplication */ true,
|
||||
isLTOPreLink(Phase)));
|
||||
// TODO: Investigate promotion cap for O1.
|
||||
|
@ -736,6 +741,11 @@ PassBuilder::buildFunctionSimplificationPipeline(OptimizationLevel Level,
|
|||
LPM1.addPass(LoopInstSimplifyPass());
|
||||
LPM1.addPass(LoopSimplifyCFGPass());
|
||||
|
||||
// Try to remove as much code from the loop header as possible,
|
||||
// to reduce amount of IR that will have to be duplicated.
|
||||
// TODO: Investigate promotion cap for O1.
|
||||
LPM1.addPass(LICMPass(PTO.LicmMssaOptCap, PTO.LicmMssaNoAccForPromotionCap));
|
||||
|
||||
// Disable header duplication in loop rotation at -Oz.
|
||||
LPM1.addPass(
|
||||
LoopRotatePass(Level != OptimizationLevel::Oz, isLTOPreLink(Phase)));
|
||||
|
|
|
@ -431,6 +431,10 @@ void PassManagerBuilder::addFunctionSimplificationPasses(
|
|||
MPM.add(createLoopInstSimplifyPass());
|
||||
MPM.add(createLoopSimplifyCFGPass());
|
||||
}
|
||||
// Try to remove as much code from the loop header as possible,
|
||||
// to reduce amount of IR that will have to be duplicated.
|
||||
// TODO: Investigate promotion cap for O1.
|
||||
MPM.add(createLICMPass(LicmMssaOptCap, LicmMssaNoAccForPromotionCap));
|
||||
// Rotate Loop - disable header duplication at -Oz
|
||||
MPM.add(createLoopRotatePass(SizeLevel == 2 ? 0 : -1, PrepareForLTO));
|
||||
// TODO: Investigate promotion cap for O1.
|
||||
|
|
|
@ -129,16 +129,20 @@
|
|||
; GCN-O1-NEXT: Simplify the CFG
|
||||
; GCN-O1-NEXT: Reassociate expressions
|
||||
; GCN-O1-NEXT: Dominator Tree Construction
|
||||
; GCN-O1-NEXT: Basic Alias Analysis (stateless AA impl)
|
||||
; GCN-O1-NEXT: Function Alias Analysis Results
|
||||
; GCN-O1-NEXT: Memory SSA
|
||||
; GCN-O1-NEXT: Natural Loop Information
|
||||
; GCN-O1-NEXT: Canonicalize natural loops
|
||||
; GCN-O1-NEXT: LCSSA Verifier
|
||||
; GCN-O1-NEXT: Loop-Closed SSA Form Pass
|
||||
; GCN-O1-NEXT: Basic Alias Analysis (stateless AA impl)
|
||||
; GCN-O1-NEXT: Function Alias Analysis Results
|
||||
; GCN-O1-NEXT: Scalar Evolution Analysis
|
||||
; GCN-O1-NEXT: Lazy Branch Probability Analysis
|
||||
; GCN-O1-NEXT: Lazy Block Frequency Analysis
|
||||
; GCN-O1-NEXT: Loop Pass Manager
|
||||
; GCN-O1-NEXT: Loop Invariant Code Motion
|
||||
; GCN-O1-NEXT: Loop Pass Manager
|
||||
; GCN-O1-NEXT: Rotate Loops
|
||||
; GCN-O1-NEXT: Memory SSA
|
||||
; GCN-O1-NEXT: Lazy Branch Probability Analysis
|
||||
; GCN-O1-NEXT: Lazy Block Frequency Analysis
|
||||
; GCN-O1-NEXT: Loop Pass Manager
|
||||
|
@ -451,16 +455,20 @@
|
|||
; GCN-O2-NEXT: Simplify the CFG
|
||||
; GCN-O2-NEXT: Reassociate expressions
|
||||
; GCN-O2-NEXT: Dominator Tree Construction
|
||||
; GCN-O2-NEXT: Basic Alias Analysis (stateless AA impl)
|
||||
; GCN-O2-NEXT: Function Alias Analysis Results
|
||||
; GCN-O2-NEXT: Memory SSA
|
||||
; GCN-O2-NEXT: Natural Loop Information
|
||||
; GCN-O2-NEXT: Canonicalize natural loops
|
||||
; GCN-O2-NEXT: LCSSA Verifier
|
||||
; GCN-O2-NEXT: Loop-Closed SSA Form Pass
|
||||
; GCN-O2-NEXT: Basic Alias Analysis (stateless AA impl)
|
||||
; GCN-O2-NEXT: Function Alias Analysis Results
|
||||
; GCN-O2-NEXT: Scalar Evolution Analysis
|
||||
; GCN-O2-NEXT: Lazy Branch Probability Analysis
|
||||
; GCN-O2-NEXT: Lazy Block Frequency Analysis
|
||||
; GCN-O2-NEXT: Loop Pass Manager
|
||||
; GCN-O2-NEXT: Loop Invariant Code Motion
|
||||
; GCN-O2-NEXT: Loop Pass Manager
|
||||
; GCN-O2-NEXT: Rotate Loops
|
||||
; GCN-O2-NEXT: Memory SSA
|
||||
; GCN-O2-NEXT: Lazy Branch Probability Analysis
|
||||
; GCN-O2-NEXT: Lazy Block Frequency Analysis
|
||||
; GCN-O2-NEXT: Loop Pass Manager
|
||||
|
@ -810,16 +818,20 @@
|
|||
; GCN-O3-NEXT: Simplify the CFG
|
||||
; GCN-O3-NEXT: Reassociate expressions
|
||||
; GCN-O3-NEXT: Dominator Tree Construction
|
||||
; GCN-O3-NEXT: Basic Alias Analysis (stateless AA impl)
|
||||
; GCN-O3-NEXT: Function Alias Analysis Results
|
||||
; GCN-O3-NEXT: Memory SSA
|
||||
; GCN-O3-NEXT: Natural Loop Information
|
||||
; GCN-O3-NEXT: Canonicalize natural loops
|
||||
; GCN-O3-NEXT: LCSSA Verifier
|
||||
; GCN-O3-NEXT: Loop-Closed SSA Form Pass
|
||||
; GCN-O3-NEXT: Basic Alias Analysis (stateless AA impl)
|
||||
; GCN-O3-NEXT: Function Alias Analysis Results
|
||||
; GCN-O3-NEXT: Scalar Evolution Analysis
|
||||
; GCN-O3-NEXT: Lazy Branch Probability Analysis
|
||||
; GCN-O3-NEXT: Lazy Block Frequency Analysis
|
||||
; GCN-O3-NEXT: Loop Pass Manager
|
||||
; GCN-O3-NEXT: Loop Invariant Code Motion
|
||||
; GCN-O3-NEXT: Loop Pass Manager
|
||||
; GCN-O3-NEXT: Rotate Loops
|
||||
; GCN-O3-NEXT: Memory SSA
|
||||
; GCN-O3-NEXT: Lazy Branch Probability Analysis
|
||||
; GCN-O3-NEXT: Lazy Block Frequency Analysis
|
||||
; GCN-O3-NEXT: Loop Pass Manager
|
||||
|
|
|
@ -113,9 +113,9 @@
|
|||
; CHECK-O-NEXT: Running analysis: CallGraphAnalysis
|
||||
; CHECK-O-NEXT: Running pass: RequireAnalysisPass<{{.*}}ProfileSummaryAnalysis
|
||||
; CHECK-O-NEXT: Running analysis: ProfileSummaryAnalysis
|
||||
; CHECK-O-NEXT: Running analysis: InnerAnalysisManagerProxy
|
||||
; CHECK-O-NEXT: Running analysis: LazyCallGraphAnalysis
|
||||
; CHECK-O-NEXT: Running analysis: FunctionAnalysisManagerCGSCCProxy
|
||||
; CHECK-O-NEXT: Running analysis: InnerAnalysisManagerProxy
|
||||
; CHECK-O-NEXT: Running analysis: LazyCallGraphAnalysis
|
||||
; CHECK-O-NEXT: Running analysis: FunctionAnalysisManagerCGSCCProxy
|
||||
; CHECK-O-NEXT: Running analysis: OuterAnalysisManagerProxy<{{.*}}LazyCallGraph::SCC{{.*}}>
|
||||
; CHECK-O-NEXT: Running pass: DevirtSCCRepeatedPass
|
||||
; CHECK-O-NEXT: Starting CGSCC pass manager run.
|
||||
|
@ -156,6 +156,7 @@
|
|||
; CHECK-O-NEXT: Starting Loop pass manager run.
|
||||
; CHECK-O-NEXT: Running pass: LoopInstSimplifyPass
|
||||
; CHECK-O-NEXT: Running pass: LoopSimplifyCFGPass
|
||||
; CHECK-O-NEXT: Running pass: LICM
|
||||
; CHECK-O-NEXT: Running pass: LoopRotatePass
|
||||
; CHECK-O-NEXT: Running pass: LICM
|
||||
; CHECK-O-NEXT: Running pass: SimpleLoopUnswitchPass
|
||||
|
|
|
@ -98,9 +98,9 @@
|
|||
; CHECK-O-NEXT: Running analysis: CallGraphAnalysis
|
||||
; CHECK-O-NEXT: Running pass: RequireAnalysisPass<{{.*}}ProfileSummaryAnalysis
|
||||
; CHECK-PRELINK-O-NEXT: Running analysis: ProfileSummaryAnalysis
|
||||
; CHECK-O-NEXT: Running analysis: InnerAnalysisManagerProxy
|
||||
; CHECK-O-NEXT: Running analysis: LazyCallGraphAnalysis
|
||||
; CHECK-O-NEXT: Running analysis: FunctionAnalysisManagerCGSCCProxy
|
||||
; CHECK-O-NEXT: Running analysis: InnerAnalysisManagerProxy
|
||||
; CHECK-O-NEXT: Running analysis: LazyCallGraphAnalysis
|
||||
; CHECK-O-NEXT: Running analysis: FunctionAnalysisManagerCGSCCProxy
|
||||
; CHECK-O-NEXT: Running analysis: OuterAnalysisManagerProxy
|
||||
; CHECK-O-NEXT: Running pass: DevirtSCCRepeatedPass
|
||||
; CHECK-O-NEXT: Starting CGSCC pass manager run.
|
||||
|
@ -139,6 +139,7 @@
|
|||
; CHECK-O-NEXT: Starting Loop pass manager run.
|
||||
; CHECK-O-NEXT: Running pass: LoopInstSimplifyPass
|
||||
; CHECK-O-NEXT: Running pass: LoopSimplifyCFGPass
|
||||
; CHECK-O-NEXT: Running pass: LICM
|
||||
; CHECK-O-NEXT: Running pass: LoopRotatePass
|
||||
; CHECK-O-NEXT: Running pass: LICM
|
||||
; CHECK-O-NEXT: Running pass: SimpleLoopUnswitchPass
|
||||
|
|
|
@ -68,10 +68,10 @@
|
|||
; CHECK-O-NEXT: Running pass: ModuleInlinerWrapperPass
|
||||
; CHECK-O-NEXT: Running analysis: InlineAdvisorAnalysis
|
||||
; CHECK-O-NEXT: Starting {{.*}}Module pass manager run.
|
||||
; CHECK-O-NEXT: Running pass: RequireAnalysisPass<{{.*}}GlobalsAA
|
||||
; CHECK-O-NEXT: Running analysis: GlobalsAA
|
||||
; CHECK-O-NEXT: Running analysis: CallGraphAnalysis
|
||||
; CHECK-O-NEXT: Running pass: RequireAnalysisPass<{{.*}}ProfileSummaryAnalysis
|
||||
; CHECK-O-NEXT: Running pass: RequireAnalysisPass<{{.*}}GlobalsAA
|
||||
; CHECK-O-NEXT: Running analysis: GlobalsAA
|
||||
; CHECK-O-NEXT: Running analysis: CallGraphAnalysis
|
||||
; CHECK-O-NEXT: Running pass: RequireAnalysisPass<{{.*}}ProfileSummaryAnalysis
|
||||
; CHECK-O-NEXT: Running analysis: InnerAnalysisManagerProxy
|
||||
; CHECK-O-NEXT: Running analysis: LazyCallGraphAnalysis
|
||||
; CHECK-O-NEXT: Running analysis: FunctionAnalysisManagerCGSCCProxy
|
||||
|
@ -112,6 +112,7 @@
|
|||
; CHECK-O-NEXT: Starting Loop pass manager run.
|
||||
; CHECK-O-NEXT: Running pass: LoopInstSimplifyPass
|
||||
; CHECK-O-NEXT: Running pass: LoopSimplifyCFGPass
|
||||
; CHECK-O-NEXT: Running pass: LICM
|
||||
; CHECK-O-NEXT: Running pass: LoopRotatePass
|
||||
; CHECK-O-NEXT: Running pass: LICM
|
||||
; CHECK-O-NEXT: Running pass: SimpleLoopUnswitchPass
|
||||
|
|
|
@ -78,9 +78,9 @@
|
|||
; CHECK-O-NEXT: Running pass: ModuleInlinerWrapperPass
|
||||
; CHECK-O-NEXT: Running analysis: InlineAdvisorAnalysis
|
||||
; CHECK-O-NEXT: Starting {{.*}}Module pass manager run.
|
||||
; CHECK-O-NEXT: Running pass: RequireAnalysisPass<{{.*}}GlobalsAA
|
||||
; CHECK-O-NEXT: Running analysis: GlobalsAA
|
||||
; CHECK-O-NEXT: Running pass: RequireAnalysisPass<{{.*}}ProfileSummaryAnalysis
|
||||
; CHECK-O-NEXT: Running pass: RequireAnalysisPass<{{.*}}GlobalsAA
|
||||
; CHECK-O-NEXT: Running analysis: GlobalsAA
|
||||
; CHECK-O-NEXT: Running pass: RequireAnalysisPass<{{.*}}ProfileSummaryAnalysis
|
||||
; CHECK-O-NEXT: Running analysis: InnerAnalysisManagerProxy
|
||||
; CHECK-O-NEXT: Running analysis: LazyCallGraphAnalysis
|
||||
; CHECK-O-NEXT: Running analysis: FunctionAnalysisManagerCGSCCProxy
|
||||
|
@ -121,6 +121,7 @@
|
|||
; CHECK-O-NEXT: Starting Loop pass manager run.
|
||||
; CHECK-O-NEXT: Running pass: LoopInstSimplifyPass
|
||||
; CHECK-O-NEXT: Running pass: LoopSimplifyCFGPass
|
||||
; CHECK-O-NEXT: Running pass: LICM
|
||||
; CHECK-O-NEXT: Running pass: LoopRotatePass
|
||||
; CHECK-O-NEXT: Running pass: LICM
|
||||
; CHECK-O-NEXT: Running pass: SimpleLoopUnswitchPass
|
||||
|
|
|
@ -93,10 +93,10 @@
|
|||
; CHECK-O-NEXT: Running analysis: OptimizationRemarkEmitterAnalysis on foo
|
||||
; CHECK-O-NEXT: Running pass: ModuleInlinerWrapperPass
|
||||
; CHECK-O-NEXT: Starting {{.*}}Module pass manager run.
|
||||
; CHECK-O-NEXT: Running pass: RequireAnalysisPass<{{.*}}GlobalsAA
|
||||
; CHECK-O-NEXT: Running analysis: GlobalsAA
|
||||
; CHECK-O-NEXT: Running analysis: CallGraphAnalysis
|
||||
; CHECK-O-NEXT: Running pass: RequireAnalysisPass<{{.*}}ProfileSummaryAnalysis
|
||||
; CHECK-O-NEXT: Running pass: RequireAnalysisPass<{{.*}}GlobalsAA
|
||||
; CHECK-O-NEXT: Running analysis: GlobalsAA
|
||||
; CHECK-O-NEXT: Running analysis: CallGraphAnalysis
|
||||
; CHECK-O-NEXT: Running pass: RequireAnalysisPass<{{.*}}ProfileSummaryAnalysis
|
||||
; CHECK-O-NEXT: Running analysis: InnerAnalysisManagerProxy
|
||||
; CHECK-O-NEXT: Running analysis: LazyCallGraphAnalysis
|
||||
; CHECK-O-NEXT: Running analysis: TargetLibraryAnalysis on foo
|
||||
|
@ -158,6 +158,7 @@
|
|||
; CHECK-O-NEXT: Starting Loop pass manager run.
|
||||
; CHECK-O-NEXT: Running pass: LoopInstSimplifyPass
|
||||
; CHECK-O-NEXT: Running pass: LoopSimplifyCFGPass
|
||||
; CHECK-O-NEXT: Running pass: LICM
|
||||
; CHECK-O-NEXT: Running pass: LoopRotatePass
|
||||
; CHECK-O-NEXT: Running pass: LICM
|
||||
; CHECK-O-NEXT: Running pass: SimpleLoopUnswitchPass
|
||||
|
|
|
@ -73,8 +73,8 @@
|
|||
; CHECK-O-NEXT: Running pass: ModuleInlinerWrapperPass
|
||||
; CHECK-O-NEXT: Running analysis: InlineAdvisorAnalysis
|
||||
; CHECK-O-NEXT: Starting {{.*}}Module pass manager run.
|
||||
; CHECK-O-NEXT: Running pass: RequireAnalysisPass<{{.*}}GlobalsAA
|
||||
; CHECK-O-NEXT: Running analysis: GlobalsAA
|
||||
; CHECK-O-NEXT: Running pass: RequireAnalysisPass<{{.*}}GlobalsAA
|
||||
; CHECK-O-NEXT: Running analysis: GlobalsAA
|
||||
; CHECK-O-NEXT: Running pass: RequireAnalysisPass<{{.*}}ProfileSummaryAnalysis
|
||||
; CHECK-O-NEXT: Running analysis: InnerAnalysisManagerProxy
|
||||
; CHECK-O-NEXT: Running analysis: LazyCallGraphAnalysis
|
||||
|
@ -116,6 +116,7 @@
|
|||
; CHECK-O-NEXT: Starting Loop pass manager run.
|
||||
; CHECK-O-NEXT: Running pass: LoopInstSimplifyPass
|
||||
; CHECK-O-NEXT: Running pass: LoopSimplifyCFGPass
|
||||
; CHECK-O-NEXT: Running pass: LICM
|
||||
; CHECK-O-NEXT: Running pass: LoopRotatePass
|
||||
; CHECK-O-NEXT: Running pass: LICM
|
||||
; CHECK-O-NEXT: Running pass: SimpleLoopUnswitchPass
|
||||
|
|
|
@ -101,16 +101,20 @@
|
|||
; CHECK-NEXT: Simplify the CFG
|
||||
; CHECK-NEXT: Reassociate expressions
|
||||
; CHECK-NEXT: Dominator Tree Construction
|
||||
; CHECK-NEXT: Basic Alias Analysis (stateless AA impl)
|
||||
; CHECK-NEXT: Function Alias Analysis Results
|
||||
; CHECK-NEXT: Memory SSA
|
||||
; CHECK-NEXT: Natural Loop Information
|
||||
; CHECK-NEXT: Canonicalize natural loops
|
||||
; CHECK-NEXT: LCSSA Verifier
|
||||
; CHECK-NEXT: Loop-Closed SSA Form Pass
|
||||
; CHECK-NEXT: Basic Alias Analysis (stateless AA impl)
|
||||
; CHECK-NEXT: Function Alias Analysis Results
|
||||
; CHECK-NEXT: Scalar Evolution Analysis
|
||||
; CHECK-NEXT: Lazy Branch Probability Analysis
|
||||
; CHECK-NEXT: Lazy Block Frequency Analysis
|
||||
; CHECK-NEXT: Loop Pass Manager
|
||||
; CHECK-NEXT: Loop Invariant Code Motion
|
||||
; CHECK-NEXT: Loop Pass Manager
|
||||
; CHECK-NEXT: Rotate Loops
|
||||
; CHECK-NEXT: Memory SSA
|
||||
; CHECK-NEXT: Lazy Branch Probability Analysis
|
||||
; CHECK-NEXT: Lazy Block Frequency Analysis
|
||||
; CHECK-NEXT: Loop Pass Manager
|
||||
|
|
|
@ -106,16 +106,20 @@
|
|||
; CHECK-NEXT: Simplify the CFG
|
||||
; CHECK-NEXT: Reassociate expressions
|
||||
; CHECK-NEXT: Dominator Tree Construction
|
||||
; CHECK-NEXT: Basic Alias Analysis (stateless AA impl)
|
||||
; CHECK-NEXT: Function Alias Analysis Results
|
||||
; CHECK-NEXT: Memory SSA
|
||||
; CHECK-NEXT: Natural Loop Information
|
||||
; CHECK-NEXT: Canonicalize natural loops
|
||||
; CHECK-NEXT: LCSSA Verifier
|
||||
; CHECK-NEXT: Loop-Closed SSA Form Pass
|
||||
; CHECK-NEXT: Basic Alias Analysis (stateless AA impl)
|
||||
; CHECK-NEXT: Function Alias Analysis Results
|
||||
; CHECK-NEXT: Scalar Evolution Analysis
|
||||
; CHECK-NEXT: Lazy Branch Probability Analysis
|
||||
; CHECK-NEXT: Lazy Block Frequency Analysis
|
||||
; CHECK-NEXT: Loop Pass Manager
|
||||
; CHECK-NEXT: Loop Invariant Code Motion
|
||||
; CHECK-NEXT: Loop Pass Manager
|
||||
; CHECK-NEXT: Rotate Loops
|
||||
; CHECK-NEXT: Memory SSA
|
||||
; CHECK-NEXT: Lazy Branch Probability Analysis
|
||||
; CHECK-NEXT: Lazy Block Frequency Analysis
|
||||
; CHECK-NEXT: Loop Pass Manager
|
||||
|
|
|
@ -106,16 +106,20 @@
|
|||
; CHECK-NEXT: Simplify the CFG
|
||||
; CHECK-NEXT: Reassociate expressions
|
||||
; CHECK-NEXT: Dominator Tree Construction
|
||||
; CHECK-NEXT: Basic Alias Analysis (stateless AA impl)
|
||||
; CHECK-NEXT: Function Alias Analysis Results
|
||||
; CHECK-NEXT: Memory SSA
|
||||
; CHECK-NEXT: Natural Loop Information
|
||||
; CHECK-NEXT: Canonicalize natural loops
|
||||
; CHECK-NEXT: LCSSA Verifier
|
||||
; CHECK-NEXT: Loop-Closed SSA Form Pass
|
||||
; CHECK-NEXT: Basic Alias Analysis (stateless AA impl)
|
||||
; CHECK-NEXT: Function Alias Analysis Results
|
||||
; CHECK-NEXT: Scalar Evolution Analysis
|
||||
; CHECK-NEXT: Lazy Branch Probability Analysis
|
||||
; CHECK-NEXT: Lazy Block Frequency Analysis
|
||||
; CHECK-NEXT: Loop Pass Manager
|
||||
; CHECK-NEXT: Loop Invariant Code Motion
|
||||
; CHECK-NEXT: Loop Pass Manager
|
||||
; CHECK-NEXT: Rotate Loops
|
||||
; CHECK-NEXT: Memory SSA
|
||||
; CHECK-NEXT: Lazy Branch Probability Analysis
|
||||
; CHECK-NEXT: Lazy Block Frequency Analysis
|
||||
; CHECK-NEXT: Loop Pass Manager
|
||||
|
|
|
@ -87,16 +87,20 @@
|
|||
; CHECK-NEXT: Simplify the CFG
|
||||
; CHECK-NEXT: Reassociate expressions
|
||||
; CHECK-NEXT: Dominator Tree Construction
|
||||
; CHECK-NEXT: Basic Alias Analysis (stateless AA impl)
|
||||
; CHECK-NEXT: Function Alias Analysis Results
|
||||
; CHECK-NEXT: Memory SSA
|
||||
; CHECK-NEXT: Natural Loop Information
|
||||
; CHECK-NEXT: Canonicalize natural loops
|
||||
; CHECK-NEXT: LCSSA Verifier
|
||||
; CHECK-NEXT: Loop-Closed SSA Form Pass
|
||||
; CHECK-NEXT: Basic Alias Analysis (stateless AA impl)
|
||||
; CHECK-NEXT: Function Alias Analysis Results
|
||||
; CHECK-NEXT: Scalar Evolution Analysis
|
||||
; CHECK-NEXT: Lazy Branch Probability Analysis
|
||||
; CHECK-NEXT: Lazy Block Frequency Analysis
|
||||
; CHECK-NEXT: Loop Pass Manager
|
||||
; CHECK-NEXT: Loop Invariant Code Motion
|
||||
; CHECK-NEXT: Loop Pass Manager
|
||||
; CHECK-NEXT: Rotate Loops
|
||||
; CHECK-NEXT: Memory SSA
|
||||
; CHECK-NEXT: Lazy Branch Probability Analysis
|
||||
; CHECK-NEXT: Lazy Block Frequency Analysis
|
||||
; CHECK-NEXT: Loop Pass Manager
|
||||
|
|
|
@ -53,6 +53,9 @@
|
|||
; CHECK-O2-NEXT: FunctionPass Manager
|
||||
; CHECK-O2-NOT: Manager
|
||||
; CHECK-O2: Loop Pass Manager
|
||||
; CHECK-O2-NOT: Manager
|
||||
; CHECK-O2: Loop Pass Manager
|
||||
; CHECK-O2-NOT: Manager
|
||||
; CHECK-O2: Loop Pass Manager
|
||||
; CHECK-O2-NOT: Manager
|
||||
; FIXME: We shouldn't be pulling out to simplify-cfg and instcombine and
|
||||
|
|
|
@ -22,30 +22,33 @@ define dso_local i32 @main() {
|
|||
; CHECK-NEXT: bb:
|
||||
; CHECK-NEXT: [[I6:%.*]] = load i32, i32* @a, align 4
|
||||
; CHECK-NEXT: [[I24:%.*]] = load i32, i32* @b, align 4
|
||||
; CHECK-NEXT: [[D_PROMOTED9:%.*]] = load i32, i32* @d, align 4
|
||||
; CHECK-NEXT: [[TMP0:%.*]] = and i32 [[D_PROMOTED9]], [[I6]]
|
||||
; CHECK-NEXT: [[D_PROMOTED7:%.*]] = load i32, i32* @d, align 4
|
||||
; CHECK-NEXT: [[TMP0:%.*]] = and i32 [[D_PROMOTED7]], [[I6]]
|
||||
; CHECK-NEXT: [[I21:%.*]] = icmp eq i32 [[TMP0]], 0
|
||||
; CHECK-NEXT: br label [[BB1:%.*]]
|
||||
; CHECK: bb1:
|
||||
; CHECK-NEXT: br i1 [[I21]], label [[BB13_PREHEADER_BB27_THREAD_SPLIT_CRIT_EDGE:%.*]], label [[BB19_PREHEADER:%.*]]
|
||||
; CHECK: bb19.preheader:
|
||||
; CHECK-NEXT: br i1 [[I21]], label [[BB27_THREAD:%.*]], label [[BB27_PREHEADER:%.*]]
|
||||
; CHECK: bb27.preheader:
|
||||
; CHECK-NEXT: [[I26:%.*]] = urem i32 [[I24]], [[TMP0]]
|
||||
; CHECK-NEXT: store i32 [[I26]], i32* @e, align 4
|
||||
; CHECK-NEXT: [[I30_NOT:%.*]] = icmp eq i32 [[I26]], 0
|
||||
; CHECK-NEXT: br i1 [[I30_NOT]], label [[BB32_LOOPEXIT:%.*]], label [[BB1]]
|
||||
; CHECK: bb13.preheader.bb27.thread.split_crit_edge:
|
||||
; CHECK-NEXT: store i32 -1, i32* @f, align 4
|
||||
; CHECK-NEXT: br label [[BB27:%.*]]
|
||||
; CHECK: bb27.thread:
|
||||
; CHECK-NEXT: store i32 0, i32* @d, align 4
|
||||
; CHECK-NEXT: store i32 -1, i32* @f, align 4
|
||||
; CHECK-NEXT: store i32 0, i32* @c, align 4
|
||||
; CHECK-NEXT: br label [[BB32:%.*]]
|
||||
; CHECK: bb27:
|
||||
; CHECK-NEXT: br i1 [[I30_NOT]], label [[BB32_LOOPEXIT:%.*]], label [[BB36:%.*]]
|
||||
; CHECK: bb32.loopexit:
|
||||
; CHECK-NEXT: store i32 -1, i32* @f, align 4
|
||||
; CHECK-NEXT: store i32 [[TMP0]], i32* @d, align 4
|
||||
; CHECK-NEXT: store i32 -1, i32* @f, align 4
|
||||
; CHECK-NEXT: br label [[BB32]]
|
||||
; CHECK: bb32:
|
||||
; CHECK-NEXT: [[C_SINK:%.*]] = phi i32* [ @c, [[BB32_LOOPEXIT]] ], [ @e, [[BB13_PREHEADER_BB27_THREAD_SPLIT_CRIT_EDGE]] ]
|
||||
; CHECK-NEXT: [[C_SINK:%.*]] = phi i32* [ @c, [[BB32_LOOPEXIT]] ], [ @e, [[BB27_THREAD]] ]
|
||||
; CHECK-NEXT: store i32 0, i32* [[C_SINK]], align 4
|
||||
; CHECK-NEXT: ret i32 0
|
||||
; CHECK: bb36:
|
||||
; CHECK-NEXT: store i32 1, i32* @c, align 4
|
||||
; CHECK-NEXT: br i1 [[I21]], label [[BB27_THREAD]], label [[BB27]]
|
||||
;
|
||||
bb:
|
||||
%i = alloca i32, align 4
|
||||
|
|
|
@ -16,32 +16,28 @@ define dso_local void @_Z13vecIncFromPtrP12FloatVecPair(%class.FloatVecPair* %FV
|
|||
; OLDPM-NEXT: entry:
|
||||
; OLDPM-NEXT: [[BASE_I_I:%.*]] = getelementptr inbounds [[CLASS_FLOATVECPAIR:%.*]], %class.FloatVecPair* [[FVP:%.*]], i64 0, i32 1, i32 0
|
||||
; OLDPM-NEXT: [[TMP0:%.*]] = load %class.HomemadeVector.0*, %class.HomemadeVector.0** [[BASE_I_I]], align 8, !tbaa [[TBAA0:![0-9]+]]
|
||||
; OLDPM-NEXT: [[SIZE410_I:%.*]] = getelementptr inbounds [[CLASS_HOMEMADEVECTOR_0:%.*]], %class.HomemadeVector.0* [[TMP0]], i64 undef, i32 1
|
||||
; OLDPM-NEXT: [[TMP1:%.*]] = load i32, i32* [[SIZE410_I]], align 8, !tbaa [[TBAA6:![0-9]+]]
|
||||
; OLDPM-NEXT: [[CMP511_NOT_I:%.*]] = icmp eq i32 [[TMP1]], 0
|
||||
; OLDPM-NEXT: br i1 [[CMP511_NOT_I]], label [[_ZN12FLOATVECPAIR6VECINCEV_EXIT:%.*]], label [[FOR_BODY7_LR_PH_I:%.*]]
|
||||
; OLDPM-NEXT: [[SIZE4_I:%.*]] = getelementptr inbounds [[CLASS_HOMEMADEVECTOR_0:%.*]], %class.HomemadeVector.0* [[TMP0]], i64 undef, i32 1
|
||||
; OLDPM-NEXT: [[TMP1:%.*]] = load i32, i32* [[SIZE4_I]], align 8, !tbaa [[TBAA6:![0-9]+]]
|
||||
; OLDPM-NEXT: [[CMP510_NOT_I:%.*]] = icmp eq i32 [[TMP1]], 0
|
||||
; OLDPM-NEXT: br i1 [[CMP510_NOT_I]], label [[_ZN12FLOATVECPAIR6VECINCEV_EXIT:%.*]], label [[FOR_BODY7_LR_PH_I:%.*]]
|
||||
; OLDPM: for.body7.lr.ph.i:
|
||||
; OLDPM-NEXT: [[BASE_I4_I:%.*]] = getelementptr inbounds [[CLASS_FLOATVECPAIR]], %class.FloatVecPair* [[FVP]], i64 0, i32 0, i32 0
|
||||
; OLDPM-NEXT: [[TMP2:%.*]] = load %class.HomemadeVector.0*, %class.HomemadeVector.0** [[BASE_I4_I]], align 8, !tbaa [[TBAA0]]
|
||||
; OLDPM-NEXT: [[BASE_I2_I:%.*]] = getelementptr inbounds [[CLASS_HOMEMADEVECTOR_0]], %class.HomemadeVector.0* [[TMP2]], i64 undef, i32 0
|
||||
; OLDPM-NEXT: [[TMP3:%.*]] = load float*, float** [[BASE_I2_I]], align 8, !tbaa [[TBAA8:![0-9]+]]
|
||||
; OLDPM-NEXT: [[ARRAYIDX_I3_I:%.*]] = getelementptr inbounds float, float* [[TMP3]], i64 undef
|
||||
; OLDPM-NEXT: [[BASE_I6_PEEL_I:%.*]] = getelementptr inbounds [[CLASS_HOMEMADEVECTOR_0]], %class.HomemadeVector.0* [[TMP0]], i64 undef, i32 0
|
||||
; OLDPM-NEXT: [[TMP4:%.*]] = load float*, float** [[BASE_I6_PEEL_I]], align 8, !tbaa [[TBAA8]]
|
||||
; OLDPM-NEXT: [[ARRAYIDX_I7_PEEL_I:%.*]] = getelementptr inbounds float, float* [[TMP4]], i64 undef
|
||||
; OLDPM-NEXT: [[TMP5:%.*]] = load float, float* [[ARRAYIDX_I7_PEEL_I]], align 4, !tbaa [[TBAA9:![0-9]+]]
|
||||
; OLDPM-NEXT: [[TMP6:%.*]] = load float, float* [[ARRAYIDX_I3_I]], align 4, !tbaa [[TBAA9]]
|
||||
; OLDPM-NEXT: [[ADD_PEEL_I:%.*]] = fadd float [[TMP5]], [[TMP6]]
|
||||
; OLDPM-NEXT: store float [[ADD_PEEL_I]], float* [[ARRAYIDX_I3_I]], align 4, !tbaa [[TBAA9]]
|
||||
; OLDPM-NEXT: [[EXITCOND_PEEL_NOT_I:%.*]] = icmp eq i32 [[TMP1]], 1
|
||||
; OLDPM-NEXT: br i1 [[EXITCOND_PEEL_NOT_I]], label [[_ZN12FLOATVECPAIR6VECINCEV_EXIT]], label [[FOR_BODY7_I:%.*]]
|
||||
; OLDPM-NEXT: [[BASE_I6_I:%.*]] = getelementptr inbounds [[CLASS_HOMEMADEVECTOR_0]], %class.HomemadeVector.0* [[TMP0]], i64 undef, i32 0
|
||||
; OLDPM-NEXT: [[TMP2:%.*]] = load float*, float** [[BASE_I6_I]], align 8, !tbaa [[TBAA8:![0-9]+]]
|
||||
; OLDPM-NEXT: [[ARRAYIDX_I7_I:%.*]] = getelementptr inbounds float, float* [[TMP2]], i64 undef
|
||||
; OLDPM-NEXT: [[TMP3:%.*]] = load %class.HomemadeVector.0*, %class.HomemadeVector.0** [[BASE_I4_I]], align 8, !tbaa [[TBAA0]]
|
||||
; OLDPM-NEXT: [[BASE_I2_I:%.*]] = getelementptr inbounds [[CLASS_HOMEMADEVECTOR_0]], %class.HomemadeVector.0* [[TMP3]], i64 undef, i32 0
|
||||
; OLDPM-NEXT: [[TMP4:%.*]] = load float*, float** [[BASE_I2_I]], align 8, !tbaa [[TBAA8]]
|
||||
; OLDPM-NEXT: [[ARRAYIDX_I3_I:%.*]] = getelementptr inbounds float, float* [[TMP4]], i64 undef
|
||||
; OLDPM-NEXT: [[DOTPRE_I:%.*]] = load float, float* [[ARRAYIDX_I3_I]], align 4, !tbaa [[TBAA9:![0-9]+]]
|
||||
; OLDPM-NEXT: br label [[FOR_BODY7_I:%.*]]
|
||||
; OLDPM: for.body7.i:
|
||||
; OLDPM-NEXT: [[TMP7:%.*]] = phi float [ [[ADD_I:%.*]], [[FOR_BODY7_I]] ], [ [[ADD_PEEL_I]], [[FOR_BODY7_LR_PH_I]] ]
|
||||
; OLDPM-NEXT: [[J_012_I:%.*]] = phi i32 [ [[INC_I:%.*]], [[FOR_BODY7_I]] ], [ 1, [[FOR_BODY7_LR_PH_I]] ]
|
||||
; OLDPM-NEXT: [[TMP8:%.*]] = load float, float* [[ARRAYIDX_I7_PEEL_I]], align 4, !tbaa [[TBAA9]]
|
||||
; OLDPM-NEXT: [[ADD_I]] = fadd float [[TMP7]], [[TMP8]]
|
||||
; OLDPM-NEXT: [[TMP5:%.*]] = phi float [ [[DOTPRE_I]], [[FOR_BODY7_LR_PH_I]] ], [ [[ADD_I:%.*]], [[FOR_BODY7_I]] ]
|
||||
; OLDPM-NEXT: [[J_011_I:%.*]] = phi i32 [ 0, [[FOR_BODY7_LR_PH_I]] ], [ [[INC_I:%.*]], [[FOR_BODY7_I]] ]
|
||||
; OLDPM-NEXT: [[TMP6:%.*]] = load float, float* [[ARRAYIDX_I7_I]], align 4, !tbaa [[TBAA9]]
|
||||
; OLDPM-NEXT: [[ADD_I]] = fadd float [[TMP5]], [[TMP6]]
|
||||
; OLDPM-NEXT: store float [[ADD_I]], float* [[ARRAYIDX_I3_I]], align 4, !tbaa [[TBAA9]]
|
||||
; OLDPM-NEXT: [[INC_I]] = add nuw i32 [[J_012_I]], 1
|
||||
; OLDPM-NEXT: [[INC_I]] = add nuw i32 [[J_011_I]], 1
|
||||
; OLDPM-NEXT: [[EXITCOND_NOT_I:%.*]] = icmp eq i32 [[INC_I]], [[TMP1]]
|
||||
; OLDPM-NEXT: br i1 [[EXITCOND_NOT_I]], label [[_ZN12FLOATVECPAIR6VECINCEV_EXIT]], label [[FOR_BODY7_I]], !llvm.loop [[LOOP11:![0-9]+]]
|
||||
; OLDPM: _ZN12FloatVecPair6vecIncEv.exit:
|
||||
|
@ -51,39 +47,30 @@ define dso_local void @_Z13vecIncFromPtrP12FloatVecPair(%class.FloatVecPair* %FV
|
|||
; NEWPM-NEXT: entry:
|
||||
; NEWPM-NEXT: [[BASE_I_I:%.*]] = getelementptr inbounds [[CLASS_FLOATVECPAIR:%.*]], %class.FloatVecPair* [[FVP:%.*]], i64 0, i32 1, i32 0
|
||||
; NEWPM-NEXT: [[TMP0:%.*]] = load %class.HomemadeVector.0*, %class.HomemadeVector.0** [[BASE_I_I]], align 8, !tbaa [[TBAA0:![0-9]+]]
|
||||
; NEWPM-NEXT: [[SIZE410_I:%.*]] = getelementptr inbounds [[CLASS_HOMEMADEVECTOR_0:%.*]], %class.HomemadeVector.0* [[TMP0]], i64 undef, i32 1
|
||||
; NEWPM-NEXT: [[TMP1:%.*]] = load i32, i32* [[SIZE410_I]], align 8, !tbaa [[TBAA6:![0-9]+]]
|
||||
; NEWPM-NEXT: [[CMP511_NOT_I:%.*]] = icmp eq i32 [[TMP1]], 0
|
||||
; NEWPM-NEXT: br i1 [[CMP511_NOT_I]], label [[_ZN12FLOATVECPAIR6VECINCEV_EXIT:%.*]], label [[FOR_BODY7_LR_PH_I:%.*]]
|
||||
; NEWPM-NEXT: [[SIZE4_I:%.*]] = getelementptr inbounds [[CLASS_HOMEMADEVECTOR_0:%.*]], %class.HomemadeVector.0* [[TMP0]], i64 undef, i32 1
|
||||
; NEWPM-NEXT: [[TMP1:%.*]] = load i32, i32* [[SIZE4_I]], align 8, !tbaa [[TBAA6:![0-9]+]]
|
||||
; NEWPM-NEXT: [[CMP510_NOT_I:%.*]] = icmp eq i32 [[TMP1]], 0
|
||||
; NEWPM-NEXT: br i1 [[CMP510_NOT_I]], label [[_ZN12FLOATVECPAIR6VECINCEV_EXIT:%.*]], label [[FOR_BODY7_LR_PH_I:%.*]]
|
||||
; NEWPM: for.body7.lr.ph.i:
|
||||
; NEWPM-NEXT: [[BASE_I6_I:%.*]] = getelementptr inbounds [[CLASS_FLOATVECPAIR]], %class.FloatVecPair* [[FVP]], i64 0, i32 0, i32 0
|
||||
; NEWPM-NEXT: [[TMP2:%.*]] = load %class.HomemadeVector.0*, %class.HomemadeVector.0** [[BASE_I6_I]], align 8, !tbaa [[TBAA0]]
|
||||
; NEWPM-NEXT: [[BASE_I8_I:%.*]] = getelementptr inbounds [[CLASS_HOMEMADEVECTOR_0]], %class.HomemadeVector.0* [[TMP2]], i64 undef, i32 0
|
||||
; NEWPM-NEXT: [[TMP3:%.*]] = load float*, float** [[BASE_I8_I]], align 8, !tbaa [[TBAA8:![0-9]+]]
|
||||
; NEWPM-NEXT: [[ARRAYIDX_I9_I:%.*]] = getelementptr inbounds float, float* [[TMP3]], i64 undef
|
||||
; NEWPM-NEXT: [[BASE_I4_PEEL_I:%.*]] = getelementptr inbounds [[CLASS_HOMEMADEVECTOR_0]], %class.HomemadeVector.0* [[TMP0]], i64 undef, i32 0
|
||||
; NEWPM-NEXT: [[TMP4:%.*]] = load float*, float** [[BASE_I4_PEEL_I]], align 8, !tbaa [[TBAA8]]
|
||||
; NEWPM-NEXT: [[ARRAYIDX_I5_PEEL_I:%.*]] = getelementptr inbounds float, float* [[TMP4]], i64 undef
|
||||
; NEWPM-NEXT: [[TMP5:%.*]] = load float, float* [[ARRAYIDX_I5_PEEL_I]], align 4, !tbaa [[TBAA9:![0-9]+]]
|
||||
; NEWPM-NEXT: [[TMP6:%.*]] = load float, float* [[ARRAYIDX_I9_I]], align 4, !tbaa [[TBAA9]]
|
||||
; NEWPM-NEXT: [[ADD_PEEL_I:%.*]] = fadd float [[TMP5]], [[TMP6]]
|
||||
; NEWPM-NEXT: store float [[ADD_PEEL_I]], float* [[ARRAYIDX_I9_I]], align 4, !tbaa [[TBAA9]]
|
||||
; NEWPM-NEXT: [[EXITCOND_PEEL_NOT_I:%.*]] = icmp eq i32 [[TMP1]], 1
|
||||
; NEWPM-NEXT: br i1 [[EXITCOND_PEEL_NOT_I]], label [[_ZN12FLOATVECPAIR6VECINCEV_EXIT]], label [[FOR_BODY7_LR_PH_I_FOR_BODY7_I_CRIT_EDGE:%.*]]
|
||||
; NEWPM: for.body7.lr.ph.i.for.body7.i_crit_edge:
|
||||
; NEWPM-NEXT: [[INC_I_1:%.*]] = add nuw i32 1, 1
|
||||
; NEWPM-NEXT: [[BASE_I4_I:%.*]] = getelementptr inbounds [[CLASS_HOMEMADEVECTOR_0]], %class.HomemadeVector.0* [[TMP0]], i64 undef, i32 0
|
||||
; NEWPM-NEXT: [[TMP2:%.*]] = load float*, float** [[BASE_I4_I]], align 8, !tbaa [[TBAA8:![0-9]+]]
|
||||
; NEWPM-NEXT: [[ARRAYIDX_I5_I:%.*]] = getelementptr inbounds float, float* [[TMP2]], i64 undef
|
||||
; NEWPM-NEXT: [[TMP3:%.*]] = load %class.HomemadeVector.0*, %class.HomemadeVector.0** [[BASE_I6_I]], align 8, !tbaa [[TBAA0]]
|
||||
; NEWPM-NEXT: [[BASE_I8_I:%.*]] = getelementptr inbounds [[CLASS_HOMEMADEVECTOR_0]], %class.HomemadeVector.0* [[TMP3]], i64 undef, i32 0
|
||||
; NEWPM-NEXT: [[TMP4:%.*]] = load float*, float** [[BASE_I8_I]], align 8, !tbaa [[TBAA8]]
|
||||
; NEWPM-NEXT: [[ARRAYIDX_I9_I:%.*]] = getelementptr inbounds float, float* [[TMP4]], i64 undef
|
||||
; NEWPM-NEXT: [[DOTPRE_I:%.*]] = load float, float* [[ARRAYIDX_I9_I]], align 4, !tbaa [[TBAA9:![0-9]+]]
|
||||
; NEWPM-NEXT: br label [[FOR_BODY7_I:%.*]]
|
||||
; NEWPM: for.body7.i:
|
||||
; NEWPM-NEXT: [[TMP7:%.*]] = phi float [ [[ADD_I:%.*]], [[FOR_BODY7_I_FOR_BODY7_I_CRIT_EDGE:%.*]] ], [ [[ADD_PEEL_I]], [[FOR_BODY7_LR_PH_I_FOR_BODY7_I_CRIT_EDGE]] ]
|
||||
; NEWPM-NEXT: [[INC_I_PHI:%.*]] = phi i32 [ [[INC_I_0:%.*]], [[FOR_BODY7_I_FOR_BODY7_I_CRIT_EDGE]] ], [ [[INC_I_1]], [[FOR_BODY7_LR_PH_I_FOR_BODY7_I_CRIT_EDGE]] ]
|
||||
; NEWPM-NEXT: [[TMP8:%.*]] = load float, float* [[ARRAYIDX_I5_PEEL_I]], align 4, !tbaa [[TBAA9]]
|
||||
; NEWPM-NEXT: [[ADD_I]] = fadd float [[TMP7]], [[TMP8]]
|
||||
; NEWPM-NEXT: [[TMP5:%.*]] = phi float [ [[DOTPRE_I]], [[FOR_BODY7_LR_PH_I]] ], [ [[ADD_I:%.*]], [[FOR_BODY7_I]] ]
|
||||
; NEWPM-NEXT: [[J_011_I:%.*]] = phi i32 [ 0, [[FOR_BODY7_LR_PH_I]] ], [ [[INC_I:%.*]], [[FOR_BODY7_I]] ]
|
||||
; NEWPM-NEXT: [[TMP6:%.*]] = load float, float* [[ARRAYIDX_I5_I]], align 4, !tbaa [[TBAA9]]
|
||||
; NEWPM-NEXT: [[ADD_I]] = fadd float [[TMP5]], [[TMP6]]
|
||||
; NEWPM-NEXT: store float [[ADD_I]], float* [[ARRAYIDX_I9_I]], align 4, !tbaa [[TBAA9]]
|
||||
; NEWPM-NEXT: [[EXITCOND_NOT_I:%.*]] = icmp eq i32 [[INC_I_PHI]], [[TMP1]]
|
||||
; NEWPM-NEXT: br i1 [[EXITCOND_NOT_I]], label [[_ZN12FLOATVECPAIR6VECINCEV_EXIT]], label [[FOR_BODY7_I_FOR_BODY7_I_CRIT_EDGE]], !llvm.loop [[LOOP11:![0-9]+]]
|
||||
; NEWPM: for.body7.i.for.body7.i_crit_edge:
|
||||
; NEWPM-NEXT: [[INC_I_0]] = add nuw i32 [[INC_I_PHI]], 1
|
||||
; NEWPM-NEXT: br label [[FOR_BODY7_I]]
|
||||
; NEWPM-NEXT: [[INC_I]] = add nuw i32 [[J_011_I]], 1
|
||||
; NEWPM-NEXT: [[EXITCOND_NOT_I:%.*]] = icmp eq i32 [[INC_I]], [[TMP1]]
|
||||
; NEWPM-NEXT: br i1 [[EXITCOND_NOT_I]], label [[_ZN12FLOATVECPAIR6VECINCEV_EXIT]], label [[FOR_BODY7_I]], !llvm.loop [[LOOP11:![0-9]+]]
|
||||
; NEWPM: _ZN12FloatVecPair6vecIncEv.exit:
|
||||
; NEWPM-NEXT: ret void
|
||||
;
|
||||
|
|
|
@ -15,18 +15,18 @@ define void @vdiv(double* %x, double* %y, double %a, i32 %N) #0 {
|
|||
; CHECK-LABEL: @vdiv(
|
||||
; CHECK-NEXT: entry:
|
||||
; CHECK-NEXT: [[CMP1:%.*]] = icmp sgt i32 [[N:%.*]], 0
|
||||
; CHECK-NEXT: br i1 [[CMP1]], label [[FOR_BODY_LR_PH:%.*]], label [[FOR_END:%.*]]
|
||||
; CHECK: for.body.lr.ph:
|
||||
; CHECK-NEXT: br i1 [[CMP1]], label [[FOR_BODY_PREHEADER:%.*]], label [[FOR_END:%.*]]
|
||||
; CHECK: for.body.preheader:
|
||||
; CHECK-NEXT: [[WIDE_TRIP_COUNT:%.*]] = zext i32 [[N]] to i64
|
||||
; CHECK-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i32 [[N]], 4
|
||||
; CHECK-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[FOR_BODY_PREHEADER:%.*]], label [[VECTOR_MEMCHECK:%.*]]
|
||||
; CHECK-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[FOR_BODY_PREHEADER8:%.*]], label [[VECTOR_MEMCHECK:%.*]]
|
||||
; CHECK: vector.memcheck:
|
||||
; CHECK-NEXT: [[SCEVGEP:%.*]] = getelementptr double, double* [[X:%.*]], i64 [[WIDE_TRIP_COUNT]]
|
||||
; CHECK-NEXT: [[SCEVGEP6:%.*]] = getelementptr double, double* [[Y:%.*]], i64 [[WIDE_TRIP_COUNT]]
|
||||
; CHECK-NEXT: [[BOUND0:%.*]] = icmp ugt double* [[SCEVGEP6]], [[X]]
|
||||
; CHECK-NEXT: [[BOUND1:%.*]] = icmp ugt double* [[SCEVGEP]], [[Y]]
|
||||
; CHECK-NEXT: [[FOUND_CONFLICT:%.*]] = and i1 [[BOUND0]], [[BOUND1]]
|
||||
; CHECK-NEXT: br i1 [[FOUND_CONFLICT]], label [[FOR_BODY_PREHEADER]], label [[VECTOR_PH:%.*]]
|
||||
; CHECK-NEXT: br i1 [[FOUND_CONFLICT]], label [[FOR_BODY_PREHEADER8]], label [[VECTOR_PH:%.*]]
|
||||
; CHECK: vector.ph:
|
||||
; CHECK-NEXT: [[N_VEC:%.*]] = and i64 [[WIDE_TRIP_COUNT]], 4294967292
|
||||
; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <4 x double> poison, double [[A:%.*]], i32 0
|
||||
|
@ -49,39 +49,39 @@ define void @vdiv(double* %x, double* %y, double %a, i32 %N) #0 {
|
|||
; CHECK-NEXT: [[NITER:%.*]] = phi i64 [ [[UNROLL_ITER]], [[VECTOR_PH_NEW]] ], [ [[NITER_NSUB_3:%.*]], [[VECTOR_BODY]] ]
|
||||
; CHECK-NEXT: [[TMP8:%.*]] = getelementptr inbounds double, double* [[Y]], i64 [[INDEX]]
|
||||
; CHECK-NEXT: [[TMP9:%.*]] = bitcast double* [[TMP8]] to <4 x double>*
|
||||
; CHECK-NEXT: [[WIDE_LOAD:%.*]] = load <4 x double>, <4 x double>* [[TMP9]], align 8, [[TBAA3:!tbaa !.*]], !alias.scope !7
|
||||
; CHECK-NEXT: [[WIDE_LOAD:%.*]] = load <4 x double>, <4 x double>* [[TMP9]], align 8, !tbaa [[TBAA3:![0-9]+]], !alias.scope !7
|
||||
; CHECK-NEXT: [[TMP10:%.*]] = fmul fast <4 x double> [[WIDE_LOAD]], [[TMP4]]
|
||||
; CHECK-NEXT: [[TMP11:%.*]] = getelementptr inbounds double, double* [[X]], i64 [[INDEX]]
|
||||
; CHECK-NEXT: [[TMP12:%.*]] = bitcast double* [[TMP11]] to <4 x double>*
|
||||
; CHECK-NEXT: store <4 x double> [[TMP10]], <4 x double>* [[TMP12]], align 8, [[TBAA3]], !alias.scope !10, !noalias !7
|
||||
; CHECK-NEXT: store <4 x double> [[TMP10]], <4 x double>* [[TMP12]], align 8, !tbaa [[TBAA3]], !alias.scope !10, !noalias !7
|
||||
; CHECK-NEXT: [[INDEX_NEXT:%.*]] = or i64 [[INDEX]], 4
|
||||
; CHECK-NEXT: [[TMP13:%.*]] = getelementptr inbounds double, double* [[Y]], i64 [[INDEX_NEXT]]
|
||||
; CHECK-NEXT: [[TMP14:%.*]] = bitcast double* [[TMP13]] to <4 x double>*
|
||||
; CHECK-NEXT: [[WIDE_LOAD_1:%.*]] = load <4 x double>, <4 x double>* [[TMP14]], align 8, [[TBAA3]], !alias.scope !7
|
||||
; CHECK-NEXT: [[WIDE_LOAD_1:%.*]] = load <4 x double>, <4 x double>* [[TMP14]], align 8, !tbaa [[TBAA3]], !alias.scope !7
|
||||
; CHECK-NEXT: [[TMP15:%.*]] = fmul fast <4 x double> [[WIDE_LOAD_1]], [[TMP5]]
|
||||
; CHECK-NEXT: [[TMP16:%.*]] = getelementptr inbounds double, double* [[X]], i64 [[INDEX_NEXT]]
|
||||
; CHECK-NEXT: [[TMP17:%.*]] = bitcast double* [[TMP16]] to <4 x double>*
|
||||
; CHECK-NEXT: store <4 x double> [[TMP15]], <4 x double>* [[TMP17]], align 8, [[TBAA3]], !alias.scope !10, !noalias !7
|
||||
; CHECK-NEXT: store <4 x double> [[TMP15]], <4 x double>* [[TMP17]], align 8, !tbaa [[TBAA3]], !alias.scope !10, !noalias !7
|
||||
; CHECK-NEXT: [[INDEX_NEXT_1:%.*]] = or i64 [[INDEX]], 8
|
||||
; CHECK-NEXT: [[TMP18:%.*]] = getelementptr inbounds double, double* [[Y]], i64 [[INDEX_NEXT_1]]
|
||||
; CHECK-NEXT: [[TMP19:%.*]] = bitcast double* [[TMP18]] to <4 x double>*
|
||||
; CHECK-NEXT: [[WIDE_LOAD_2:%.*]] = load <4 x double>, <4 x double>* [[TMP19]], align 8, [[TBAA3]], !alias.scope !7
|
||||
; CHECK-NEXT: [[WIDE_LOAD_2:%.*]] = load <4 x double>, <4 x double>* [[TMP19]], align 8, !tbaa [[TBAA3]], !alias.scope !7
|
||||
; CHECK-NEXT: [[TMP20:%.*]] = fmul fast <4 x double> [[WIDE_LOAD_2]], [[TMP6]]
|
||||
; CHECK-NEXT: [[TMP21:%.*]] = getelementptr inbounds double, double* [[X]], i64 [[INDEX_NEXT_1]]
|
||||
; CHECK-NEXT: [[TMP22:%.*]] = bitcast double* [[TMP21]] to <4 x double>*
|
||||
; CHECK-NEXT: store <4 x double> [[TMP20]], <4 x double>* [[TMP22]], align 8, [[TBAA3]], !alias.scope !10, !noalias !7
|
||||
; CHECK-NEXT: store <4 x double> [[TMP20]], <4 x double>* [[TMP22]], align 8, !tbaa [[TBAA3]], !alias.scope !10, !noalias !7
|
||||
; CHECK-NEXT: [[INDEX_NEXT_2:%.*]] = or i64 [[INDEX]], 12
|
||||
; CHECK-NEXT: [[TMP23:%.*]] = getelementptr inbounds double, double* [[Y]], i64 [[INDEX_NEXT_2]]
|
||||
; CHECK-NEXT: [[TMP24:%.*]] = bitcast double* [[TMP23]] to <4 x double>*
|
||||
; CHECK-NEXT: [[WIDE_LOAD_3:%.*]] = load <4 x double>, <4 x double>* [[TMP24]], align 8, [[TBAA3]], !alias.scope !7
|
||||
; CHECK-NEXT: [[WIDE_LOAD_3:%.*]] = load <4 x double>, <4 x double>* [[TMP24]], align 8, !tbaa [[TBAA3]], !alias.scope !7
|
||||
; CHECK-NEXT: [[TMP25:%.*]] = fmul fast <4 x double> [[WIDE_LOAD_3]], [[TMP7]]
|
||||
; CHECK-NEXT: [[TMP26:%.*]] = getelementptr inbounds double, double* [[X]], i64 [[INDEX_NEXT_2]]
|
||||
; CHECK-NEXT: [[TMP27:%.*]] = bitcast double* [[TMP26]] to <4 x double>*
|
||||
; CHECK-NEXT: store <4 x double> [[TMP25]], <4 x double>* [[TMP27]], align 8, [[TBAA3]], !alias.scope !10, !noalias !7
|
||||
; CHECK-NEXT: store <4 x double> [[TMP25]], <4 x double>* [[TMP27]], align 8, !tbaa [[TBAA3]], !alias.scope !10, !noalias !7
|
||||
; CHECK-NEXT: [[INDEX_NEXT_3]] = add i64 [[INDEX]], 16
|
||||
; CHECK-NEXT: [[NITER_NSUB_3]] = add i64 [[NITER]], -4
|
||||
; CHECK-NEXT: [[NITER_NCMP_3:%.*]] = icmp eq i64 [[NITER_NSUB_3]], 0
|
||||
; CHECK-NEXT: br i1 [[NITER_NCMP_3]], label [[MIDDLE_BLOCK_UNR_LCSSA]], label [[VECTOR_BODY]], [[LOOP12:!llvm.loop !.*]]
|
||||
; CHECK-NEXT: br i1 [[NITER_NCMP_3]], label [[MIDDLE_BLOCK_UNR_LCSSA]], label [[VECTOR_BODY]], !llvm.loop [[LOOP12:![0-9]+]]
|
||||
; CHECK: middle.block.unr-lcssa:
|
||||
; CHECK-NEXT: [[INDEX_UNR:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT_3]], [[VECTOR_BODY]] ]
|
||||
; CHECK-NEXT: [[LCMP_MOD_NOT:%.*]] = icmp eq i64 [[XTRAITER]], 0
|
||||
|
@ -94,78 +94,78 @@ define void @vdiv(double* %x, double* %y, double %a, i32 %N) #0 {
|
|||
; CHECK-NEXT: [[EPIL_ITER:%.*]] = phi i64 [ [[XTRAITER]], [[VECTOR_BODY_EPIL_PREHEADER]] ], [ [[EPIL_ITER_SUB:%.*]], [[VECTOR_BODY_EPIL]] ]
|
||||
; CHECK-NEXT: [[TMP29:%.*]] = getelementptr inbounds double, double* [[Y]], i64 [[INDEX_EPIL]]
|
||||
; CHECK-NEXT: [[TMP30:%.*]] = bitcast double* [[TMP29]] to <4 x double>*
|
||||
; CHECK-NEXT: [[WIDE_LOAD_EPIL:%.*]] = load <4 x double>, <4 x double>* [[TMP30]], align 8, [[TBAA3]], !alias.scope !7
|
||||
; CHECK-NEXT: [[WIDE_LOAD_EPIL:%.*]] = load <4 x double>, <4 x double>* [[TMP30]], align 8, !tbaa [[TBAA3]], !alias.scope !7
|
||||
; CHECK-NEXT: [[TMP31:%.*]] = fmul fast <4 x double> [[WIDE_LOAD_EPIL]], [[TMP28]]
|
||||
; CHECK-NEXT: [[TMP32:%.*]] = getelementptr inbounds double, double* [[X]], i64 [[INDEX_EPIL]]
|
||||
; CHECK-NEXT: [[TMP33:%.*]] = bitcast double* [[TMP32]] to <4 x double>*
|
||||
; CHECK-NEXT: store <4 x double> [[TMP31]], <4 x double>* [[TMP33]], align 8, [[TBAA3]], !alias.scope !10, !noalias !7
|
||||
; CHECK-NEXT: store <4 x double> [[TMP31]], <4 x double>* [[TMP33]], align 8, !tbaa [[TBAA3]], !alias.scope !10, !noalias !7
|
||||
; CHECK-NEXT: [[INDEX_NEXT_EPIL]] = add i64 [[INDEX_EPIL]], 4
|
||||
; CHECK-NEXT: [[EPIL_ITER_SUB]] = add i64 [[EPIL_ITER]], -1
|
||||
; CHECK-NEXT: [[EPIL_ITER_CMP_NOT:%.*]] = icmp eq i64 [[EPIL_ITER_SUB]], 0
|
||||
; CHECK-NEXT: br i1 [[EPIL_ITER_CMP_NOT]], label [[MIDDLE_BLOCK]], label [[VECTOR_BODY_EPIL]], [[LOOP14:!llvm.loop !.*]]
|
||||
; CHECK-NEXT: br i1 [[EPIL_ITER_CMP_NOT]], label [[MIDDLE_BLOCK]], label [[VECTOR_BODY_EPIL]], !llvm.loop [[LOOP14:![0-9]+]]
|
||||
; CHECK: middle.block:
|
||||
; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[N_VEC]], [[WIDE_TRIP_COUNT]]
|
||||
; CHECK-NEXT: br i1 [[CMP_N]], label [[FOR_END]], label [[FOR_BODY_PREHEADER]]
|
||||
; CHECK: for.body.preheader:
|
||||
; CHECK-NEXT: [[INDVARS_IV_PH:%.*]] = phi i64 [ 0, [[VECTOR_MEMCHECK]] ], [ 0, [[FOR_BODY_LR_PH]] ], [ [[N_VEC]], [[MIDDLE_BLOCK]] ]
|
||||
; CHECK-NEXT: br i1 [[CMP_N]], label [[FOR_END]], label [[FOR_BODY_PREHEADER8]]
|
||||
; CHECK: for.body.preheader8:
|
||||
; CHECK-NEXT: [[INDVARS_IV_PH:%.*]] = phi i64 [ 0, [[VECTOR_MEMCHECK]] ], [ 0, [[FOR_BODY_PREHEADER]] ], [ [[N_VEC]], [[MIDDLE_BLOCK]] ]
|
||||
; CHECK-NEXT: [[TMP34:%.*]] = xor i64 [[INDVARS_IV_PH]], -1
|
||||
; CHECK-NEXT: [[TMP35:%.*]] = add nsw i64 [[TMP34]], [[WIDE_TRIP_COUNT]]
|
||||
; CHECK-NEXT: [[XTRAITER8:%.*]] = and i64 [[WIDE_TRIP_COUNT]], 3
|
||||
; CHECK-NEXT: [[LCMP_MOD9_NOT:%.*]] = icmp eq i64 [[XTRAITER8]], 0
|
||||
; CHECK-NEXT: br i1 [[LCMP_MOD9_NOT]], label [[FOR_BODY_PROL_LOOPEXIT:%.*]], label [[FOR_BODY_PROL_PREHEADER:%.*]]
|
||||
; CHECK-NEXT: [[XTRAITER9:%.*]] = and i64 [[WIDE_TRIP_COUNT]], 3
|
||||
; CHECK-NEXT: [[LCMP_MOD10_NOT:%.*]] = icmp eq i64 [[XTRAITER9]], 0
|
||||
; CHECK-NEXT: br i1 [[LCMP_MOD10_NOT]], label [[FOR_BODY_PROL_LOOPEXIT:%.*]], label [[FOR_BODY_PROL_PREHEADER:%.*]]
|
||||
; CHECK: for.body.prol.preheader:
|
||||
; CHECK-NEXT: [[TMP36:%.*]] = fdiv fast double 1.000000e+00, [[A]]
|
||||
; CHECK-NEXT: br label [[FOR_BODY_PROL:%.*]]
|
||||
; CHECK: for.body.prol:
|
||||
; CHECK-NEXT: [[INDVARS_IV_PROL:%.*]] = phi i64 [ [[INDVARS_IV_NEXT_PROL:%.*]], [[FOR_BODY_PROL]] ], [ [[INDVARS_IV_PH]], [[FOR_BODY_PROL_PREHEADER]] ]
|
||||
; CHECK-NEXT: [[PROL_ITER:%.*]] = phi i64 [ [[PROL_ITER_SUB:%.*]], [[FOR_BODY_PROL]] ], [ [[XTRAITER8]], [[FOR_BODY_PROL_PREHEADER]] ]
|
||||
; CHECK-NEXT: [[PROL_ITER:%.*]] = phi i64 [ [[PROL_ITER_SUB:%.*]], [[FOR_BODY_PROL]] ], [ [[XTRAITER9]], [[FOR_BODY_PROL_PREHEADER]] ]
|
||||
; CHECK-NEXT: [[ARRAYIDX_PROL:%.*]] = getelementptr inbounds double, double* [[Y]], i64 [[INDVARS_IV_PROL]]
|
||||
; CHECK-NEXT: [[T0_PROL:%.*]] = load double, double* [[ARRAYIDX_PROL]], align 8, [[TBAA3]]
|
||||
; CHECK-NEXT: [[T0_PROL:%.*]] = load double, double* [[ARRAYIDX_PROL]], align 8, !tbaa [[TBAA3]]
|
||||
; CHECK-NEXT: [[TMP37:%.*]] = fmul fast double [[T0_PROL]], [[TMP36]]
|
||||
; CHECK-NEXT: [[ARRAYIDX2_PROL:%.*]] = getelementptr inbounds double, double* [[X]], i64 [[INDVARS_IV_PROL]]
|
||||
; CHECK-NEXT: store double [[TMP37]], double* [[ARRAYIDX2_PROL]], align 8, [[TBAA3]]
|
||||
; CHECK-NEXT: store double [[TMP37]], double* [[ARRAYIDX2_PROL]], align 8, !tbaa [[TBAA3]]
|
||||
; CHECK-NEXT: [[INDVARS_IV_NEXT_PROL]] = add nuw nsw i64 [[INDVARS_IV_PROL]], 1
|
||||
; CHECK-NEXT: [[PROL_ITER_SUB]] = add i64 [[PROL_ITER]], -1
|
||||
; CHECK-NEXT: [[PROL_ITER_CMP_NOT:%.*]] = icmp eq i64 [[PROL_ITER_SUB]], 0
|
||||
; CHECK-NEXT: br i1 [[PROL_ITER_CMP_NOT]], label [[FOR_BODY_PROL_LOOPEXIT]], label [[FOR_BODY_PROL]], [[LOOP16:!llvm.loop !.*]]
|
||||
; CHECK-NEXT: br i1 [[PROL_ITER_CMP_NOT]], label [[FOR_BODY_PROL_LOOPEXIT]], label [[FOR_BODY_PROL]], !llvm.loop [[LOOP16:![0-9]+]]
|
||||
; CHECK: for.body.prol.loopexit:
|
||||
; CHECK-NEXT: [[INDVARS_IV_UNR:%.*]] = phi i64 [ [[INDVARS_IV_PH]], [[FOR_BODY_PREHEADER]] ], [ [[INDVARS_IV_NEXT_PROL]], [[FOR_BODY_PROL]] ]
|
||||
; CHECK-NEXT: [[INDVARS_IV_UNR:%.*]] = phi i64 [ [[INDVARS_IV_PH]], [[FOR_BODY_PREHEADER8]] ], [ [[INDVARS_IV_NEXT_PROL]], [[FOR_BODY_PROL]] ]
|
||||
; CHECK-NEXT: [[TMP38:%.*]] = icmp ult i64 [[TMP35]], 3
|
||||
; CHECK-NEXT: br i1 [[TMP38]], label [[FOR_END]], label [[FOR_BODY_PREHEADER_NEW:%.*]]
|
||||
; CHECK: for.body.preheader.new:
|
||||
; CHECK-NEXT: br i1 [[TMP38]], label [[FOR_END]], label [[FOR_BODY_PREHEADER8_NEW:%.*]]
|
||||
; CHECK: for.body.preheader8.new:
|
||||
; CHECK-NEXT: [[TMP39:%.*]] = fdiv fast double 1.000000e+00, [[A]]
|
||||
; CHECK-NEXT: [[TMP40:%.*]] = fdiv fast double 1.000000e+00, [[A]]
|
||||
; CHECK-NEXT: [[TMP41:%.*]] = fdiv fast double 1.000000e+00, [[A]]
|
||||
; CHECK-NEXT: [[TMP42:%.*]] = fdiv fast double 1.000000e+00, [[A]]
|
||||
; CHECK-NEXT: br label [[FOR_BODY:%.*]]
|
||||
; CHECK: for.body:
|
||||
; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[INDVARS_IV_UNR]], [[FOR_BODY_PREHEADER_NEW]] ], [ [[INDVARS_IV_NEXT_3:%.*]], [[FOR_BODY]] ]
|
||||
; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[INDVARS_IV_UNR]], [[FOR_BODY_PREHEADER8_NEW]] ], [ [[INDVARS_IV_NEXT_3:%.*]], [[FOR_BODY]] ]
|
||||
; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds double, double* [[Y]], i64 [[INDVARS_IV]]
|
||||
; CHECK-NEXT: [[T0:%.*]] = load double, double* [[ARRAYIDX]], align 8, [[TBAA3]]
|
||||
; CHECK-NEXT: [[T0:%.*]] = load double, double* [[ARRAYIDX]], align 8, !tbaa [[TBAA3]]
|
||||
; CHECK-NEXT: [[TMP43:%.*]] = fmul fast double [[T0]], [[TMP39]]
|
||||
; CHECK-NEXT: [[ARRAYIDX2:%.*]] = getelementptr inbounds double, double* [[X]], i64 [[INDVARS_IV]]
|
||||
; CHECK-NEXT: store double [[TMP43]], double* [[ARRAYIDX2]], align 8, [[TBAA3]]
|
||||
; CHECK-NEXT: store double [[TMP43]], double* [[ARRAYIDX2]], align 8, !tbaa [[TBAA3]]
|
||||
; CHECK-NEXT: [[INDVARS_IV_NEXT:%.*]] = add nuw nsw i64 [[INDVARS_IV]], 1
|
||||
; CHECK-NEXT: [[ARRAYIDX_1:%.*]] = getelementptr inbounds double, double* [[Y]], i64 [[INDVARS_IV_NEXT]]
|
||||
; CHECK-NEXT: [[T0_1:%.*]] = load double, double* [[ARRAYIDX_1]], align 8, [[TBAA3]]
|
||||
; CHECK-NEXT: [[T0_1:%.*]] = load double, double* [[ARRAYIDX_1]], align 8, !tbaa [[TBAA3]]
|
||||
; CHECK-NEXT: [[TMP44:%.*]] = fmul fast double [[T0_1]], [[TMP40]]
|
||||
; CHECK-NEXT: [[ARRAYIDX2_1:%.*]] = getelementptr inbounds double, double* [[X]], i64 [[INDVARS_IV_NEXT]]
|
||||
; CHECK-NEXT: store double [[TMP44]], double* [[ARRAYIDX2_1]], align 8, [[TBAA3]]
|
||||
; CHECK-NEXT: store double [[TMP44]], double* [[ARRAYIDX2_1]], align 8, !tbaa [[TBAA3]]
|
||||
; CHECK-NEXT: [[INDVARS_IV_NEXT_1:%.*]] = add nuw nsw i64 [[INDVARS_IV]], 2
|
||||
; CHECK-NEXT: [[ARRAYIDX_2:%.*]] = getelementptr inbounds double, double* [[Y]], i64 [[INDVARS_IV_NEXT_1]]
|
||||
; CHECK-NEXT: [[T0_2:%.*]] = load double, double* [[ARRAYIDX_2]], align 8, [[TBAA3]]
|
||||
; CHECK-NEXT: [[T0_2:%.*]] = load double, double* [[ARRAYIDX_2]], align 8, !tbaa [[TBAA3]]
|
||||
; CHECK-NEXT: [[TMP45:%.*]] = fmul fast double [[T0_2]], [[TMP41]]
|
||||
; CHECK-NEXT: [[ARRAYIDX2_2:%.*]] = getelementptr inbounds double, double* [[X]], i64 [[INDVARS_IV_NEXT_1]]
|
||||
; CHECK-NEXT: store double [[TMP45]], double* [[ARRAYIDX2_2]], align 8, [[TBAA3]]
|
||||
; CHECK-NEXT: store double [[TMP45]], double* [[ARRAYIDX2_2]], align 8, !tbaa [[TBAA3]]
|
||||
; CHECK-NEXT: [[INDVARS_IV_NEXT_2:%.*]] = add nuw nsw i64 [[INDVARS_IV]], 3
|
||||
; CHECK-NEXT: [[ARRAYIDX_3:%.*]] = getelementptr inbounds double, double* [[Y]], i64 [[INDVARS_IV_NEXT_2]]
|
||||
; CHECK-NEXT: [[T0_3:%.*]] = load double, double* [[ARRAYIDX_3]], align 8, [[TBAA3]]
|
||||
; CHECK-NEXT: [[T0_3:%.*]] = load double, double* [[ARRAYIDX_3]], align 8, !tbaa [[TBAA3]]
|
||||
; CHECK-NEXT: [[TMP46:%.*]] = fmul fast double [[T0_3]], [[TMP42]]
|
||||
; CHECK-NEXT: [[ARRAYIDX2_3:%.*]] = getelementptr inbounds double, double* [[X]], i64 [[INDVARS_IV_NEXT_2]]
|
||||
; CHECK-NEXT: store double [[TMP46]], double* [[ARRAYIDX2_3]], align 8, [[TBAA3]]
|
||||
; CHECK-NEXT: store double [[TMP46]], double* [[ARRAYIDX2_3]], align 8, !tbaa [[TBAA3]]
|
||||
; CHECK-NEXT: [[INDVARS_IV_NEXT_3]] = add nuw nsw i64 [[INDVARS_IV]], 4
|
||||
; CHECK-NEXT: [[EXITCOND_NOT_3:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT_3]], [[WIDE_TRIP_COUNT]]
|
||||
; CHECK-NEXT: br i1 [[EXITCOND_NOT_3]], label [[FOR_END]], label [[FOR_BODY]], [[LOOP17:!llvm.loop !.*]]
|
||||
; CHECK-NEXT: br i1 [[EXITCOND_NOT_3]], label [[FOR_END]], label [[FOR_BODY]], !llvm.loop [[LOOP17:![0-9]+]]
|
||||
; CHECK: for.end:
|
||||
; CHECK-NEXT: ret void
|
||||
;
|
||||
|
|
|
@ -76,18 +76,20 @@ define void @_Z4loopi(i32 %width) {
|
|||
; ROTATED_LATER_OLDPM-NEXT: [[CMP:%.*]] = icmp slt i32 [[WIDTH:%.*]], 1
|
||||
; ROTATED_LATER_OLDPM-NEXT: br i1 [[CMP]], label [[RETURN:%.*]], label [[FOR_COND_PREHEADER:%.*]]
|
||||
; ROTATED_LATER_OLDPM: for.cond.preheader:
|
||||
; ROTATED_LATER_OLDPM-NEXT: [[CMP13_NOT:%.*]] = icmp eq i32 [[WIDTH]], 1
|
||||
; ROTATED_LATER_OLDPM-NEXT: br i1 [[CMP13_NOT]], label [[FOR_COND_CLEANUP:%.*]], label [[FOR_BODY_PREHEADER:%.*]]
|
||||
; ROTATED_LATER_OLDPM: for.body.preheader:
|
||||
; ROTATED_LATER_OLDPM-NEXT: [[TMP0:%.*]] = add nsw i32 [[WIDTH]], -1
|
||||
; ROTATED_LATER_OLDPM-NEXT: [[EXITCOND_NOT3:%.*]] = icmp eq i32 [[TMP0]], 0
|
||||
; ROTATED_LATER_OLDPM-NEXT: br i1 [[EXITCOND_NOT3]], label [[FOR_COND_CLEANUP:%.*]], label [[FOR_BODY:%.*]]
|
||||
; ROTATED_LATER_OLDPM-NEXT: br label [[FOR_BODY:%.*]]
|
||||
; ROTATED_LATER_OLDPM: for.cond.cleanup:
|
||||
; ROTATED_LATER_OLDPM-NEXT: tail call void @f0()
|
||||
; ROTATED_LATER_OLDPM-NEXT: tail call void @f2()
|
||||
; ROTATED_LATER_OLDPM-NEXT: br label [[RETURN]]
|
||||
; ROTATED_LATER_OLDPM: for.body:
|
||||
; ROTATED_LATER_OLDPM-NEXT: [[I_04:%.*]] = phi i32 [ [[INC:%.*]], [[FOR_BODY]] ], [ 0, [[FOR_COND_PREHEADER]] ]
|
||||
; ROTATED_LATER_OLDPM-NEXT: [[I_04:%.*]] = phi i32 [ [[INC:%.*]], [[FOR_BODY]] ], [ 0, [[FOR_BODY_PREHEADER]] ]
|
||||
; ROTATED_LATER_OLDPM-NEXT: tail call void @f0()
|
||||
; ROTATED_LATER_OLDPM-NEXT: tail call void @f1()
|
||||
; ROTATED_LATER_OLDPM-NEXT: [[INC]] = add nuw i32 [[I_04]], 1
|
||||
; ROTATED_LATER_OLDPM-NEXT: [[INC]] = add nuw nsw i32 [[I_04]], 1
|
||||
; ROTATED_LATER_OLDPM-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i32 [[INC]], [[TMP0]]
|
||||
; ROTATED_LATER_OLDPM-NEXT: br i1 [[EXITCOND_NOT]], label [[FOR_COND_CLEANUP]], label [[FOR_BODY]]
|
||||
; ROTATED_LATER_OLDPM: return:
|
||||
|
@ -98,24 +100,24 @@ define void @_Z4loopi(i32 %width) {
|
|||
; ROTATED_LATER_NEWPM-NEXT: [[CMP:%.*]] = icmp slt i32 [[WIDTH:%.*]], 1
|
||||
; ROTATED_LATER_NEWPM-NEXT: br i1 [[CMP]], label [[RETURN:%.*]], label [[FOR_COND_PREHEADER:%.*]]
|
||||
; ROTATED_LATER_NEWPM: for.cond.preheader:
|
||||
; ROTATED_LATER_NEWPM-NEXT: [[CMP13_NOT:%.*]] = icmp eq i32 [[WIDTH]], 1
|
||||
; ROTATED_LATER_NEWPM-NEXT: br i1 [[CMP13_NOT]], label [[FOR_COND_CLEANUP:%.*]], label [[FOR_BODY_PREHEADER:%.*]]
|
||||
; ROTATED_LATER_NEWPM: for.body.preheader:
|
||||
; ROTATED_LATER_NEWPM-NEXT: [[TMP0:%.*]] = add nsw i32 [[WIDTH]], -1
|
||||
; ROTATED_LATER_NEWPM-NEXT: [[EXITCOND_NOT3:%.*]] = icmp eq i32 [[TMP0]], 0
|
||||
; ROTATED_LATER_NEWPM-NEXT: br i1 [[EXITCOND_NOT3]], label [[FOR_COND_CLEANUP:%.*]], label [[FOR_COND_PREHEADER_FOR_BODY_CRIT_EDGE:%.*]]
|
||||
; ROTATED_LATER_NEWPM: for.cond.preheader.for.body_crit_edge:
|
||||
; ROTATED_LATER_NEWPM-NEXT: [[INC_1:%.*]] = add nuw i32 0, 1
|
||||
; ROTATED_LATER_NEWPM-NEXT: [[INC_1:%.*]] = add nuw nsw i32 0, 1
|
||||
; ROTATED_LATER_NEWPM-NEXT: br label [[FOR_BODY:%.*]]
|
||||
; ROTATED_LATER_NEWPM: for.cond.cleanup:
|
||||
; ROTATED_LATER_NEWPM-NEXT: tail call void @f0()
|
||||
; ROTATED_LATER_NEWPM-NEXT: tail call void @f2()
|
||||
; ROTATED_LATER_NEWPM-NEXT: br label [[RETURN]]
|
||||
; ROTATED_LATER_NEWPM: for.body:
|
||||
; ROTATED_LATER_NEWPM-NEXT: [[INC_PHI:%.*]] = phi i32 [ [[INC_0:%.*]], [[FOR_BODY_FOR_BODY_CRIT_EDGE:%.*]] ], [ [[INC_1]], [[FOR_COND_PREHEADER_FOR_BODY_CRIT_EDGE]] ]
|
||||
; ROTATED_LATER_NEWPM-NEXT: [[INC_PHI:%.*]] = phi i32 [ [[INC_0:%.*]], [[FOR_BODY_FOR_BODY_CRIT_EDGE:%.*]] ], [ [[INC_1]], [[FOR_BODY_PREHEADER]] ]
|
||||
; ROTATED_LATER_NEWPM-NEXT: tail call void @f0()
|
||||
; ROTATED_LATER_NEWPM-NEXT: tail call void @f1()
|
||||
; ROTATED_LATER_NEWPM-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i32 [[INC_PHI]], [[TMP0]]
|
||||
; ROTATED_LATER_NEWPM-NEXT: br i1 [[EXITCOND_NOT]], label [[FOR_COND_CLEANUP]], label [[FOR_BODY_FOR_BODY_CRIT_EDGE]]
|
||||
; ROTATED_LATER_NEWPM: for.body.for.body_crit_edge:
|
||||
; ROTATED_LATER_NEWPM-NEXT: [[INC_0]] = add nuw i32 [[INC_PHI]], 1
|
||||
; ROTATED_LATER_NEWPM-NEXT: [[INC_0]] = add nuw nsw i32 [[INC_PHI]], 1
|
||||
; ROTATED_LATER_NEWPM-NEXT: br label [[FOR_BODY]]
|
||||
; ROTATED_LATER_NEWPM: return:
|
||||
; ROTATED_LATER_NEWPM-NEXT: ret void
|
||||
|
|
Loading…
Reference in New Issue