llvm-project/llvm/test/Transforms/PhaseOrdering
Roman Lebedev 9c2469c1dd
[PassManager] `buildModuleOptimizationPipeline()`: schedule `LoopDeletion` pass run before vectorization passes
Test thanks to Michael Kuklinski from `#llvm`: https://godbolt.org/z/bdrah5Goo
originally inspired by Daniel Lemire's https://lemire.me/blog/2021/10/26/in-c-is-empty-faster-than-comparing-the-size-with-zero/

We manage to deduce that the answer does not require looping,
but we do that after the last `LoopDeletion` pass run,
so we end up being stuck with a dead loop.
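
As a rough illustration of the kind of source that hits this (a minimal sketch: the `Node`, `countNodes`, `isNotEmpty` and `selfCheck` names are hypothetical and not taken from the test file or the godbolt link), consider counting the nodes of a list only to ask whether the count is non-zero:

```cpp
#include <cassert>

// Hypothetical singly linked node, for illustration only.
struct Node {
  Node *next;
};

// Walks the whole list and counts its nodes.
static long countNodes(const Node *p) {
  long size = 0;
  while (p) {
    p = p->next;
    ++size;
  }
  return size;
}

// Only asks whether the count is non-zero. The result depends solely on
// whether `p` is null, so once the comparison is folded the loop in
// countNodes() no longer contributes anything to the answer.
bool isNotEmpty(const Node *p) { return countNodes(p) > 0; }

// Small sanity check of the intended semantics.
void selfCheck() {
  Node tail{nullptr};
  Node head{&tail};
  assert(!isNotEmpty(nullptr)); // empty list
  assert(isNotEmpty(&tail));    // one node
  assert(isNotEmpty(&head));    // two nodes
}
```

Whether a given compiler version actually leaves the loop behind for this exact snippet is an assumption here; the point is only that the loop becomes dead once the result is known not to depend on the trip count.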

As with all things SCEV, this comes with
an expected ~`+0.12%` compile-time regression:
https://llvm-compile-time-tracker.com/compare.php?from=0ae7bf124a9bca76dd9a91b2f7379168ff13f562&to=c2ae57c9b961aeb4a28c747266949340613a6d84&stat=instructions
(for comparison, doing this in the function simplification pipeline
would have been a ~`+0.5%` compile-time regression, see D112840)

Looking at the transformation stats over the vanilla test-suite, I think the result is rather expected:
```
| statistic name                                   |  baseline |  proposed |     Δ |      % | abs(%) |
|--------------------------------------------------|----------:|----------:|------:|-------:|-------:|
| scalar-evolution.NumBruteForceTripCountsComputed |       789 |       888 |    99 | 12.55% | 12.55% |
| scalar-evolution.NumTripCountsNotComputed        |    105592 |    117900 | 12308 | 11.66% | 11.66% |
| loop-delete.NumBackedgesBroken                   |       542 |       559 |    17 |  3.14% |  3.14% |
| regalloc.numExtends                              |        81 |        79 |    -2 | -2.47% |  2.47% |
| indvars.NumFoldedUser                            |       408 |       400 |    -8 | -1.96% |  1.96% |
| indvars.NumElimCmp                               |      3831 |      3758 |   -73 | -1.91% |  1.91% |
| scalar-evolution.NumTripCountsComputed           |    299759 |    304278 |  4519 |  1.51% |  1.51% |
| loop-delete.NumDeleted                           |      8055 |      8128 |    73 |  0.91% |  0.91% |
| machine-cse.NumCommutes                          |       111 |       110 |    -1 | -0.90% |  0.90% |
| globaldce.NumFunctions                           |      1187 |      1192 |     5 |  0.42% |  0.42% |
| codegenprepare.NumSelectsExpanded                |       277 |       278 |     1 |  0.36% |  0.36% |
| loop-unroll.NumRuntimeUnrolled                   |     13841 |     13791 |   -50 | -0.36% |  0.36% |
| machinelicm.NumPostRAHoisted                     |      1168 |      1172 |     4 |  0.34% |  0.34% |
| phi-node-elimination.NumCriticalEdgesSplit       |     83054 |     82879 |  -175 | -0.21% |  0.21% |
| machine-cse.NumPREs                              |      3085 |      3079 |    -6 | -0.19% |  0.19% |
| branch-folder.NumBranchOpts                      |    108122 |    107942 |  -180 | -0.17% |  0.17% |
| loop-unroll.NumUnrolled                          |     40136 |     40067 |   -69 | -0.17% |  0.17% |
| branch-folder.NumDeadBlocks                      |    130818 |    130607 |  -211 | -0.16% |  0.16% |
| codegenprepare.NumBlocksElim                     |     92856 |     92714 |  -142 | -0.15% |  0.15% |
| instsimplify.NumSimplified                       |    103263 |    103129 |  -134 | -0.13% |  0.13% |
| instcombine.NumConstProp                         |     26070 |     26102 |    32 |  0.12% |  0.12% |
| instsimplify.NumExpand                           |      1716 |      1718 |     2 |  0.12% |  0.12% |
| loop-unroll.NumCompletelyUnrolled                |      9236 |      9225 |   -11 | -0.12% |  0.12% |
| branch-folder.NumHoist                           |      2773 |      2770 |    -3 | -0.11% |  0.11% |
| regalloc.NumReloadsRemoved                       |     10822 |     10834 |    12 |  0.11% |  0.11% |
| regalloc.NumSnippets                             |     11394 |     11406 |    12 |  0.11% |  0.11% |
| machine-cse.NumCrossBBCSEs                       |      1052 |      1053 |     1 |  0.10% |  0.10% |
| machinelicm.NumCSEed                             |     99887 |     99784 |  -103 | -0.10% |  0.10% |
| branch-folder.NumTailMerge                       |     72501 |     72435 |   -66 | -0.09% |  0.09% |
| codegenprepare.NumExtUses                        |     22007 |     21987 |   -20 | -0.09% |  0.09% |
| local.NumRemoved                                 |     68232 |     68294 |    62 |  0.09% |  0.09% |
| loop-vectorize.LoopsAnalyzed                     |     75483 |     75413 |   -70 | -0.09% |  0.09% |
```

Note that I'm only changing the current PM, and not touching the obsolete PM.

This is an alternative to the function simplification pipeline variant
of the same change, D112840. It has less compile-time impact
(since the number of additional SCEV trip count calculations
is far lower than with D112840), and it is also
much more powerful/impactful (almost 2x more loops deleted).

I have checked, and doing this after loop rotation
is favorable (more loops deleted).

Reviewed By: mkazantsev

Differential Revision: https://reviews.llvm.org/D112851
2021-11-03 19:24:49 +03:00
AArch64 [VectorCombine] Add option to only run scalarization transforms. 2021-10-15 20:35:58 +01:00
ARM [ARM] Workaround tailpredication min/max costmodel 2021-08-30 19:19:51 +01:00
X86 [PhaseOrdering] add tests for x86 abs/max using SSE intrinsics (PR34047); NFC 2021-11-03 09:13:23 -04:00
2010-03-22-empty-baseclass.ll
PR6627.ll
assume-explosion.ll Revert "[NFC] remove explicit default value for strboolattr attribute in tests" 2021-05-24 19:43:40 +02:00
basic.ll
bitfield-bittests.ll
d83507-knowledge-retention-bug.ll [SimplifyCFG] Look for control flow changes instead of side effects. 2021-05-03 13:32:22 -07:00
deletion-of-loops-that-became-side-effect-free.ll [PassManager] `buildModuleOptimizationPipeline()`: schedule `LoopDeletion` pass run before vectorization passes 2021-11-03 19:24:49 +03:00
expect.ll [PassManager][PhaseOrdering] lower expects before running simplifyCFG 2021-04-12 15:07:53 -04:00
gdce.ll
globalaa-retained.ll
inlining-alignment-assumptions.ll Re-apply "[JumpThreading] Ignore free instructions" 2021-09-24 18:52:30 +02:00
instcombine-sroa-inttoptr.ll
lifetime-sanitizer.ll
loop-rotation-vs-common-code-hoisting.ll [NewPM] Remove SpeculateAroundPHIs pass 2021-06-15 20:35:55 +03:00
lto-licm.ll [opt] Remove some legacy PM flags 2021-09-13 15:50:03 -07:00
min-max-abs-cse.ll
minmax.ll
openmp-opt-module.ll [NewPM][test] Avoid using -enable-new-pm=1 since -passes implies new PM 2021-10-20 15:16:17 +02:00
partialord-ule.ll
pr32544.ll [PhaseOrdering] Add PR32544 test coverage 2021-04-25 11:05:32 +01:00
pr36760.ll [PhaseOrdering] Add second test case for PR36760 2021-04-20 17:27:24 +01:00
pr39116.ll [NFC] Added testcase for PR39116 2021-09-04 10:52:46 +02:00
pr39282.ll
pr40750.ll [NFC] Added testcase for PR40750 2021-09-02 22:44:03 +02:00
pr44461-br-to-switch-rotate.ll
pr45682.ll [PhaseOrdering] Add test case for PR45682 2021-04-21 15:07:00 +01:00
pr45687.ll
reassociate-after-unroll.ll
reassociate-instcombine.ll Add test to check we can instcombine after reassociate. NFC. 2021-10-21 12:27:26 -07:00
rotate.ll
scev-custom-dl.ll [test] Fixup tests with -analyze in llvm/test/Transforms 2021-09-04 16:45:51 -07:00
scev.ll [test] Fixup tests with -analyze in llvm/test/Transforms 2021-09-04 16:45:51 -07:00
simplifycfg-options.ll
two-shifts-by-sext.ll
unsigned-multiply-overflow-check.ll [InstCombine] Fully disable select to and/or i1 folding 2021-05-06 09:29:52 +09:00
vector-trunc-inseltpoison.ll
vector-trunc.ll