2016-02-19 12:09:40 +08:00
|
|
|
; Test the particular pass pipelines have the expected structure. This is
|
|
|
|
; particularly important in order to check that the implicit scheduling of the
|
|
|
|
; legacy pass manager doesn't introduce unexpected structural changes in the
|
|
|
|
; pass pipeline.
|
|
|
|
;
|
2020-08-18 08:48:04 +08:00
|
|
|
; RUN: opt -enable-new-pm=0 -disable-output -disable-verify -debug-pass=Structure \
|
2016-02-19 12:09:40 +08:00
|
|
|
; RUN: -O2 %s 2>&1 \
|
|
|
|
; RUN: | FileCheck %s --check-prefix=CHECK-O2
|
2019-02-06 12:29:39 +08:00
|
|
|
; RUN: llvm-profdata merge %S/Inputs/pass-pipelines.proftext -o %t.profdata
|
2020-08-18 08:48:04 +08:00
|
|
|
; RUN: opt -enable-new-pm=0 -disable-output -disable-verify -debug-pass=Structure \
|
2019-02-06 12:29:39 +08:00
|
|
|
; RUN: -pgo-kind=pgo-instr-use-pipeline -profile-file='%t.profdata' \
|
|
|
|
; RUN: -O2 %s 2>&1 \
|
|
|
|
; RUN: | FileCheck %s --check-prefix=CHECK-O2 --check-prefix=PGOUSE
|
2020-10-19 18:23:22 +08:00
|
|
|
; RUN: opt -enable-new-pm=0 -disable-output -disable-verify -debug-pass=Structure \
|
|
|
|
; RUN: -pgo-kind=pgo-instr-use-pipeline -profile-file='%t.profdata' \
|
|
|
|
; RUN: -hot-cold-split \
|
|
|
|
; RUN: -O2 %s 2>&1 \
|
|
|
|
; RUN: | FileCheck %s --check-prefix=CHECK-O2 --check-prefix=PGOUSE --check-prefix=SPLIT
|
2016-02-19 12:09:40 +08:00
|
|
|
;
|
|
|
|
; In the first pipeline there should just be a function pass manager, no other
|
|
|
|
; pass managers.
|
|
|
|
; CHECK-O2: Pass Arguments:
|
|
|
|
; CHECK-O2-NOT: Manager
|
|
|
|
; CHECK-O2: FunctionPass Manager
|
|
|
|
; CHECK-O2-NOT: Manager
|
|
|
|
;
|
|
|
|
; CHECK-O2: Pass Arguments:
|
|
|
|
; CHECK-O2: ModulePass Manager
|
|
|
|
; CHECK-O2-NOT: Manager
|
|
|
|
; First function pass pipeline just does early opts.
|
2019-11-01 19:43:51 +08:00
|
|
|
; CHECK-O2-COUNT-3: FunctionPass Manager
|
2016-02-19 12:09:40 +08:00
|
|
|
; CHECK-O2-NOT: Manager
|
|
|
|
; FIXME: It's a bit odd to do dead arg elim in the middle of early opts...
|
|
|
|
; CHECK-O2: Dead Argument Elimination
|
|
|
|
; CHECK-O2-NEXT: FunctionPass Manager
|
|
|
|
; CHECK-O2-NOT: Manager
|
2017-07-11 14:04:59 +08:00
|
|
|
; Very carefully assert the CGSCC pass pipeline as it is fragile and unusually
|
2016-02-19 12:09:40 +08:00
|
|
|
; susceptible to phase ordering issues.
|
|
|
|
; CHECK-O2: CallGraph Construction
|
2019-02-06 12:29:39 +08:00
|
|
|
; PGOUSE: Call Graph SCC Pass Manager
|
|
|
|
; PGOUSE: Function Integration/Inlining
|
|
|
|
; PGOUSE: PGOInstrumentationUsePass
|
|
|
|
; PGOUSE: PGOIndirectCallPromotion
|
|
|
|
; PGOUSE: CallGraph Construction
|
2016-02-19 12:09:40 +08:00
|
|
|
; CHECK-O2-NEXT: Globals Alias Analysis
|
|
|
|
; CHECK-O2-NEXT: Call Graph SCC Pass Manager
|
|
|
|
; CHECK-O2-NEXT: Remove unused exception handling info
|
|
|
|
; CHECK-O2-NEXT: Function Integration/Inlining
|
2019-11-07 13:20:06 +08:00
|
|
|
; CHECK-O2-NEXT: OpenMP specific optimizations
|
2016-02-19 12:09:40 +08:00
|
|
|
; CHECK-O2-NEXT: Deduce function attributes
|
|
|
|
; Next up is the main function pass pipeline. It shouldn't be split up and
|
|
|
|
; should contain the main loop pass pipeline as well.
|
|
|
|
; CHECK-O2-NEXT: FunctionPass Manager
|
|
|
|
; CHECK-O2-NOT: Manager
|
|
|
|
; CHECK-O2: Loop Pass Manager
|
[PassManager] Run additional LICM before LoopRotate
Loop rotation often has to perform code duplication
from header into preheader, which introduces PHI nodes.
>>! In D99204, @thopre wrote:
>
> With loop peeling, it is important that unnecessary PHIs be avoided or
> it will leads to spurious peeling. One source of such PHIs is loop
> rotation which creates PHIs for invariant loads. Those PHIs are
> particularly problematic since loop peeling is now run as part of simple
> loop unrolling before GVN is run, and are thus a source of spurious
> peeling.
>
> Note that while some of the load can be hoisted and eventually
> eliminated by instruction combine, this is not always possible due to
> alignment issue. In particular, the motivating example [1] was a load
> inside a class instance which cannot be hoisted because the `this'
> pointer has an alignment of 1.
>
> [1] http://lists.llvm.org/pipermail/llvm-dev/attachments/20210312/4ce73c47/attachment.cpp
Now, we could enhance LoopRotate to avoid duplicating code when not needed,
but instead hoist loop-invariant code, but isn't that a code duplication? (*sic*)
We have LICM, and in fact we already run it right after LoopRotation.
We could try to move it to before LoopRotation,
that is basically free from compile-time perspective:
https://llvm-compile-time-tracker.com/compare.php?from=6c93eb4477d88af046b915bc955c03693b2cbb58&to=a4bee6d07732b1184c436da489040b912f0dc271&stat=instructions
But, looking at stats, i think it isn't great that we would no longer do LICM after LoopRotation, in particular:
| statistic name | LoopRotate-LICM | LICM-LoopRotate | Δ | % | abs(%) |
| asm-printer.EmittedInsts | 9015930 | 9015799 | -131 | 0.00% | 0.00% |
| indvars.NumElimCmp | 3536 | 3544 | 8 | 0.23% | 0.23% |
| indvars.NumElimExt | 36725 | 36580 | -145 | -0.39% | 0.39% |
| indvars.NumElimIV | 1197 | 1187 | -10 | -0.84% | 0.84% |
| indvars.NumElimIdentity | 143 | 136 | -7 | -4.90% | 4.90% |
| indvars.NumElimRem | 4 | 5 | 1 | 25.00% | 25.00% |
| indvars.NumLFTR | 29842 | 29890 | 48 | 0.16% | 0.16% |
| indvars.NumReplaced | 2293 | 2227 | -66 | -2.88% | 2.88% |
| indvars.NumSimplifiedSDiv | 6 | 8 | 2 | 33.33% | 33.33% |
| indvars.NumWidened | 26438 | 26329 | -109 | -0.41% | 0.41% |
| instcount.TotalBlocks | 1178338 | 1173840 | -4498 | -0.38% | 0.38% |
| instcount.TotalFuncs | 111825 | 111829 | 4 | 0.00% | 0.00% |
| instcount.TotalInsts | 9905442 | 9896139 | -9303 | -0.09% | 0.09% |
| lcssa.NumLCSSA | 425871 | 423961 | -1910 | -0.45% | 0.45% |
| licm.NumHoisted | 378357 | 378753 | 396 | 0.10% | 0.10% |
| licm.NumMovedCalls | 2193 | 2208 | 15 | 0.68% | 0.68% |
| licm.NumMovedLoads | 35899 | 31821 | -4078 | -11.36% | 11.36% |
| licm.NumPromoted | 11178 | 11154 | -24 | -0.21% | 0.21% |
| licm.NumSunk | 13359 | 13587 | 228 | 1.71% | 1.71% |
| loop-delete.NumDeleted | 8547 | 8402 | -145 | -1.70% | 1.70% |
| loop-instsimplify.NumSimplified | 12876 | 11890 | -986 | -7.66% | 7.66% |
| loop-peel.NumPeeled | 1008 | 925 | -83 | -8.23% | 8.23% |
| loop-rotate.NumNotRotatedDueToHeaderSize | 368 | 365 | -3 | -0.82% | 0.82% |
| loop-rotate.NumRotated | 42015 | 42003 | -12 | -0.03% | 0.03% |
| loop-simplifycfg.NumLoopBlocksDeleted | 240 | 242 | 2 | 0.83% | 0.83% |
| loop-simplifycfg.NumLoopExitsDeleted | 497 | 20 | -477 | -95.98% | 95.98% |
| loop-simplifycfg.NumTerminatorsFolded | 618 | 336 | -282 | -45.63% | 45.63% |
| loop-unroll.NumCompletelyUnrolled | 11028 | 11032 | 4 | 0.04% | 0.04% |
| loop-unroll.NumUnrolled | 12608 | 12529 | -79 | -0.63% | 0.63% |
| mem2reg.NumDeadAlloca | 10222 | 10221 | -1 | -0.01% | 0.01% |
| mem2reg.NumPHIInsert | 192110 | 192106 | -4 | 0.00% | 0.00% |
| mem2reg.NumSingleStore | 637650 | 637643 | -7 | 0.00% | 0.00% |
| scalar-evolution.NumBruteForceTripCountsComputed | 814 | 812 | -2 | -0.25% | 0.25% |
| scalar-evolution.NumTripCountsComputed | 283108 | 282934 | -174 | -0.06% | 0.06% |
| scalar-evolution.NumTripCountsNotComputed | 106712 | 106718 | 6 | 0.01% | 0.01% |
| simple-loop-unswitch.NumBranches | 5178 | 4752 | -426 | -8.23% | 8.23% |
| simple-loop-unswitch.NumCostMultiplierSkipped | 914 | 503 | -411 | -44.97% | 44.97% |
| simple-loop-unswitch.NumSwitches | 20 | 18 | -2 | -10.00% | 10.00% |
| simple-loop-unswitch.NumTrivial | 183 | 95 | -88 | -48.09% | 48.09% |
... but that actually regresses LICM (-12% `licm.NumMovedLoads`),
loop-simplifycfg (`NumLoopExitsDeleted`, `NumTerminatorsFolded`),
simple-loop-unswitch (`NumTrivial`).
What if we instead have LICM both before and after LoopRotate?
| statistic name | LoopRotate-LICM | LICM-LoopRotate-LICM | Δ | % | abs(%) |
| asm-printer.EmittedInsts | 9015930 | 9014474 | -1456 | -0.02% | 0.02% |
| indvars.NumElimCmp | 3536 | 3546 | 10 | 0.28% | 0.28% |
| indvars.NumElimExt | 36725 | 36681 | -44 | -0.12% | 0.12% |
| indvars.NumElimIV | 1197 | 1185 | -12 | -1.00% | 1.00% |
| indvars.NumElimIdentity | 143 | 146 | 3 | 2.10% | 2.10% |
| indvars.NumElimRem | 4 | 5 | 1 | 25.00% | 25.00% |
| indvars.NumLFTR | 29842 | 29899 | 57 | 0.19% | 0.19% |
| indvars.NumReplaced | 2293 | 2299 | 6 | 0.26% | 0.26% |
| indvars.NumSimplifiedSDiv | 6 | 8 | 2 | 33.33% | 33.33% |
| indvars.NumWidened | 26438 | 26404 | -34 | -0.13% | 0.13% |
| instcount.TotalBlocks | 1178338 | 1173652 | -4686 | -0.40% | 0.40% |
| instcount.TotalFuncs | 111825 | 111829 | 4 | 0.00% | 0.00% |
| instcount.TotalInsts | 9905442 | 9895452 | -9990 | -0.10% | 0.10% |
| lcssa.NumLCSSA | 425871 | 425373 | -498 | -0.12% | 0.12% |
| licm.NumHoisted | 378357 | 383352 | 4995 | 1.32% | 1.32% |
| licm.NumMovedCalls | 2193 | 2204 | 11 | 0.50% | 0.50% |
| licm.NumMovedLoads | 35899 | 35755 | -144 | -0.40% | 0.40% |
| licm.NumPromoted | 11178 | 11163 | -15 | -0.13% | 0.13% |
| licm.NumSunk | 13359 | 14321 | 962 | 7.20% | 7.20% |
| loop-delete.NumDeleted | 8547 | 8538 | -9 | -0.11% | 0.11% |
| loop-instsimplify.NumSimplified | 12876 | 12041 | -835 | -6.48% | 6.48% |
| loop-peel.NumPeeled | 1008 | 924 | -84 | -8.33% | 8.33% |
| loop-rotate.NumNotRotatedDueToHeaderSize | 368 | 365 | -3 | -0.82% | 0.82% |
| loop-rotate.NumRotated | 42015 | 42005 | -10 | -0.02% | 0.02% |
| loop-simplifycfg.NumLoopBlocksDeleted | 240 | 241 | 1 | 0.42% | 0.42% |
| loop-simplifycfg.NumTerminatorsFolded | 618 | 619 | 1 | 0.16% | 0.16% |
| loop-unroll.NumCompletelyUnrolled | 11028 | 11029 | 1 | 0.01% | 0.01% |
| loop-unroll.NumUnrolled | 12608 | 12525 | -83 | -0.66% | 0.66% |
| mem2reg.NumPHIInsert | 192110 | 192073 | -37 | -0.02% | 0.02% |
| mem2reg.NumSingleStore | 637650 | 637652 | 2 | 0.00% | 0.00% |
| scalar-evolution.NumTripCountsComputed | 283108 | 282998 | -110 | -0.04% | 0.04% |
| scalar-evolution.NumTripCountsNotComputed | 106712 | 106691 | -21 | -0.02% | 0.02% |
| simple-loop-unswitch.NumBranches | 5178 | 5185 | 7 | 0.14% | 0.14% |
| simple-loop-unswitch.NumCostMultiplierSkipped | 914 | 925 | 11 | 1.20% | 1.20% |
| simple-loop-unswitch.NumTrivial | 183 | 179 | -4 | -2.19% | 2.19% |
| simple-loop-unswitch.NumBranches | 5178 | 4752 | -426 | -8.23% | 8.23% |
| simple-loop-unswitch.NumCostMultiplierSkipped | 914 | 503 | -411 | -44.97% | 44.97% |
| simple-loop-unswitch.NumSwitches | 20 | 18 | -2 | -10.00% | 10.00% |
| simple-loop-unswitch.NumTrivial | 183 | 95 | -88 | -48.09% | 48.09% |
I.e. we end up with less instructions, less peeling, more LICM activity,
also note how none of those 4 regressions are here. Namely:
| statistic name | LICM-LoopRotate | LICM-LoopRotate-LICM | Δ | % | abs(%) |
| asm-printer.EmittedInsts | 9015799 | 9014474 | -1325 | -0.01% | 0.01% |
| indvars.NumElimCmp | 3544 | 3546 | 2 | 0.06% | 0.06% |
| indvars.NumElimExt | 36580 | 36681 | 101 | 0.28% | 0.28% |
| indvars.NumElimIV | 1187 | 1185 | -2 | -0.17% | 0.17% |
| indvars.NumElimIdentity | 136 | 146 | 10 | 7.35% | 7.35% |
| indvars.NumLFTR | 29890 | 29899 | 9 | 0.03% | 0.03% |
| indvars.NumReplaced | 2227 | 2299 | 72 | 3.23% | 3.23% |
| indvars.NumWidened | 26329 | 26404 | 75 | 0.28% | 0.28% |
| instcount.TotalBlocks | 1173840 | 1173652 | -188 | -0.02% | 0.02% |
| instcount.TotalInsts | 9896139 | 9895452 | -687 | -0.01% | 0.01% |
| lcssa.NumLCSSA | 423961 | 425373 | 1412 | 0.33% | 0.33% |
| licm.NumHoisted | 378753 | 383352 | 4599 | 1.21% | 1.21% |
| licm.NumMovedCalls | 2208 | 2204 | -4 | -0.18% | 0.18% |
| licm.NumMovedLoads | 31821 | 35755 | 3934 | 12.36% | 12.36% |
| licm.NumPromoted | 11154 | 11163 | 9 | 0.08% | 0.08% |
| licm.NumSunk | 13587 | 14321 | 734 | 5.40% | 5.40% |
| loop-delete.NumDeleted | 8402 | 8538 | 136 | 1.62% | 1.62% |
| loop-instsimplify.NumSimplified | 11890 | 12041 | 151 | 1.27% | 1.27% |
| loop-peel.NumPeeled | 925 | 924 | -1 | -0.11% | 0.11% |
| loop-rotate.NumRotated | 42003 | 42005 | 2 | 0.00% | 0.00% |
| loop-simplifycfg.NumLoopBlocksDeleted | 242 | 241 | -1 | -0.41% | 0.41% |
| loop-simplifycfg.NumLoopExitsDeleted | 20 | 497 | 477 | 2385.00% | 2385.00% |
| loop-simplifycfg.NumTerminatorsFolded | 336 | 619 | 283 | 84.23% | 84.23% |
| loop-unroll.NumCompletelyUnrolled | 11032 | 11029 | -3 | -0.03% | 0.03% |
| loop-unroll.NumUnrolled | 12529 | 12525 | -4 | -0.03% | 0.03% |
| mem2reg.NumDeadAlloca | 10221 | 10222 | 1 | 0.01% | 0.01% |
| mem2reg.NumPHIInsert | 192106 | 192073 | -33 | -0.02% | 0.02% |
| mem2reg.NumSingleStore | 637643 | 637652 | 9 | 0.00% | 0.00% |
| scalar-evolution.NumBruteForceTripCountsComputed | 812 | 814 | 2 | 0.25% | 0.25% |
| scalar-evolution.NumTripCountsComputed | 282934 | 282998 | 64 | 0.02% | 0.02% |
| scalar-evolution.NumTripCountsNotComputed | 106718 | 106691 | -27 | -0.03% | 0.03% |
| simple-loop-unswitch.NumBranches | 4752 | 5185 | 433 | 9.11% | 9.11% |
| simple-loop-unswitch.NumCostMultiplierSkipped | 503 | 925 | 422 | 83.90% | 83.90% |
| simple-loop-unswitch.NumSwitches | 18 | 20 | 2 | 11.11% | 11.11% |
| simple-loop-unswitch.NumTrivial | 95 | 179 | 84 | 88.42% | 88.42% |
{F15983613} {F15983615} {F15983616}
(this is vanilla llvm testsuite + rawspeed + darktable)
As an example of the code where early LICM only is bad, see:
https://godbolt.org/z/GzEbacs4K
This does have an observable compile-time regression of +~0.5% geomean
https://llvm-compile-time-tracker.com/compare.php?from=7c5222e4d1a3a14f029e5f614c9aefd0fa505f1e&to=5d81826c3411982ca26e46b9d0aff34c80577664&stat=instructions
but i think that's basically nothing, and there's potential that it might
be avoidable in the future by fixing clang to produce alignment information
on function arguments, thus making the second run unneeded.
Differential Revision: https://reviews.llvm.org/D99249
2021-04-02 15:40:12 +08:00
|
|
|
; CHECK-O2-NOT: Manager
|
2021-07-08 20:24:03 +08:00
|
|
|
; FIXME: We shouldn't be pulling out to simplifycfg and instcombine and
|
2016-02-19 12:09:40 +08:00
|
|
|
; causing new loop pass managers.
|
|
|
|
; CHECK-O2: Simplify the CFG
|
|
|
|
; CHECK-O2-NOT: Manager
|
|
|
|
; CHECK-O2: Combine redundant instructions
|
|
|
|
; CHECK-O2-NOT: Manager
|
|
|
|
; CHECK-O2: Loop Pass Manager
|
|
|
|
; CHECK-O2-NOT: Manager
|
|
|
|
; FIXME: It isn't clear that we need yet another loop pass pipeline
|
|
|
|
; and run of LICM here.
|
|
|
|
; CHECK-O2-NOT: Manager
|
|
|
|
; CHECK-O2: Loop Pass Manager
|
|
|
|
; CHECK-O2-NEXT: Loop Invariant Code Motion
|
|
|
|
; CHECK-O2-NOT: Manager
|
|
|
|
; Next we break out of the main Function passes inside the CGSCC pipeline with
|
|
|
|
; a barrier pass.
|
|
|
|
; CHECK-O2: A No-Op Barrier Pass
|
2016-05-03 23:46:00 +08:00
|
|
|
; CHECK-O2-NEXT: Eliminate Available Externally
|
|
|
|
; Inferring function attribute should be right after the CGSCC pipeline, before
|
|
|
|
; any other optimizations/analyses.
|
2016-05-03 00:53:16 +08:00
|
|
|
; CHECK-O2-NEXT: CallGraph
|
|
|
|
; CHECK-O2-NEXT: Deduce function attributes in RPO
|
2016-02-19 12:09:40 +08:00
|
|
|
; CHECK-O2-NOT: Manager
|
2017-10-06 02:23:25 +08:00
|
|
|
; Reduce the size of the IR ASAP after the inliner.
|
|
|
|
; CHECK-O2-NEXT: Global Variable Optimizer
|
|
|
|
; CHECK-O2: Dead Global Elimination
|
2016-02-19 12:09:40 +08:00
|
|
|
; Next is the late function pass pipeline.
|
|
|
|
; CHECK-O2: FunctionPass Manager
|
|
|
|
; CHECK-O2-NOT: Manager
|
|
|
|
; We rotate loops prior to vectorization.
|
|
|
|
; CHECK-O2: Loop Pass Manager
|
|
|
|
; CHECK-O2-NEXT: Rotate Loops
|
|
|
|
; CHECK-O2-NOT: Manager
|
|
|
|
; CHECK-O2: Loop Vectorization
|
|
|
|
; CHECK-O2-NOT: Manager
|
|
|
|
; CHECK-O2: SLP Vectorizer
|
|
|
|
; CHECK-O2-NOT: Manager
|
|
|
|
; After vectorization we do partial unrolling.
|
|
|
|
; CHECK-O2: Loop Pass Manager
|
|
|
|
; CHECK-O2-NEXT: Unroll loops
|
|
|
|
; CHECK-O2-NOT: Manager
|
|
|
|
; After vectorization and unrolling we try to do any cleanup of inserted code,
|
|
|
|
; including a run of LICM. This shouldn't run in the same loop pass manager as
|
|
|
|
; the runtime unrolling though.
|
|
|
|
; CHECK-O2: Loop Pass Manager
|
|
|
|
; CHECK-O2-NEXT: Loop Invariant Code Motion
|
2020-10-19 18:23:22 +08:00
|
|
|
; SPLIT: Hot Cold Splitting
|
2016-11-09 08:58:19 +08:00
|
|
|
; CHECK-O2: FunctionPass Manager
|
|
|
|
; CHECK-O2: Loop Pass Manager
|
|
|
|
; CHECK-O2-NEXT: Loop Sink
|
2017-04-26 20:02:41 +08:00
|
|
|
; CHECK-O2: Simplify the CFG
|
2016-02-19 12:09:40 +08:00
|
|
|
; CHECK-O2-NOT: Manager
|
|
|
|
;
|
|
|
|
; FIXME: There really shouldn't be another pass manager, especially one that
|
|
|
|
; just builds the domtree. It doesn't even run the verifier.
|
|
|
|
; CHECK-O2: Pass Arguments:
|
2018-01-31 00:17:22 +08:00
|
|
|
; CHECK-O2: FunctionPass Manager
|
2016-02-19 12:09:40 +08:00
|
|
|
; CHECK-O2-NEXT: Dominator Tree Construction
|
|
|
|
|
|
|
|
define void @foo() {
|
|
|
|
ret void
|
|
|
|
}
|