Commit Graph

106 Commits

Author SHA1 Message Date
Arthur Eubanks 0c509dbc7e [NewPM] Add options to PrintPassInstrumentation
To bring D99599's implementation in line with the existing
PrintPassInstrumentation, and to fix a FIXME, add more customizability
to PrintPassInstrumentation.

Introduce three new options. The first takes over the existing
"-debug-pass-manager-verbose" cl::opt.

The second and third option are specific to -fdebug-pass-structure. They
allow indentation, and also don't print analysis queries.

To avoid more golden file tests than necessary, prune down the
-fdebug-pass-structure tests.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D102196
2021-05-18 20:59:35 -07:00
Arthur Eubanks d14d84af2f [NewPM] Only invalidate modified functions' analyses in CGSCC passes
Previously, any change in any function in an SCC would cause all
analyses for all functions in the SCC to be invalidated. With this
change, we now manually invalidate analyses for functions we modify,
then let the pass manager know that all function analyses should be
preserved.

So far this only touches the inliner, argpromotion, funcattrs, and
updateCGAndAnalysisManager(), since they are the most used.

Slight compile time improvements:
http://llvm-compile-time-tracker.com/compare.php?from=326da4adcb8def2abdd530299d87ce951c0edec9&to=8942c7669f330082ef159f3c6c57c3c28484f4be&stat=instructions

Reviewed By: mtrofin

Differential Revision: https://reviews.llvm.org/D100917
2021-05-03 17:21:44 -07:00
Arthur Eubanks 2df3426fd1 [NewPM] Invalidate AAManager after populating GlobalsAA
GlobalsAA is only created at the beginning of the inliner pipeline.  If
an AAManager is cached from previous passes, it won't get rebuilt to
include the newly created GlobalsAA.

Reviewed By: mtrofin

Differential Revision: https://reviews.llvm.org/D101379
2021-05-03 16:37:32 -07:00
Evgeny Leviant 6a0283d0d2 [NewPM] Add an option to dump pass structure
Patch adds -debug-pass-structure option to dump pass structure when
new pass manager is used.

Differential revision: https://reviews.llvm.org/D99599
2021-04-29 10:29:42 +03:00
Sanjay Patel 661cc71a1c [PassManager][PhaseOrdering] lower expects before running simplifyCFG
Retry of 330619a3a6 that includes a clang test update.

Original commit message:

If we run passes before lowering llvm.expect intrinsics to metadata,
then those passes have no way to act on the hints provided by llvm.expect.
SimplifyCFG is the known offender, and we made it smarter about profile
metadata in D98898 <https://reviews.llvm.org/D98898>.

In the motivating example from https://llvm.org/PR49336 , this means we
were ignoring the recommended method for a programmer to tell the compiler
that a compare+branch is expensive. This change appears to solve that case -
the metadata survives to the backend, the compare order is as expected in IR,
and the backend does not do anything to reverse it.

We make the same change to the old pass manager to keep things synchronized.

Differential Revision: https://reviews.llvm.org/D100213
2021-04-12 15:07:53 -04:00
Sanjay Patel 23ac9d1e6e Revert "[PassManager][PhaseOrdering] lower expects before running simplifyCFG"
This reverts commit 330619a3a6.
There are clang tests that also need to be updated.
2021-04-12 13:58:54 -04:00
Sanjay Patel 330619a3a6 [PassManager][PhaseOrdering] lower expects before running simplifyCFG
If we run passes before lowering llvm.expect intrinsics to metadata,
then those passes have no way to act on the hints provided by llvm.expect.
SimplifyCFG is the known offender, and we made it smarter about profile
metadata in D98898.

In the motivating example from https://llvm.org/PR49336 , this means we
were ignoring the recommended method for a programmer to tell the compiler
that a compare+branch is expensive. This change appears to solve that case -
the metadata survives to the backend, the compare order is as expected in IR,
and the backend does not do anything to reverse it.

We make the same change to the old pass manager to keep things synchronized.

Differential Revision: https://reviews.llvm.org/D100213
2021-04-12 12:23:31 -04:00
Nikita Popov 59a2f67011 [LoopRotate] Don't split loop pass manager
After D99249 we use three different loop pass managers for LICM,
LoopRotate and LICM+LoopUnswitch. This happens because LazyBFI
and LazyBPI are not preserved by LoopRotate (note that D74640
is no longer needed). Avoid this by marking them as preserved.

My understanding of D86156 is that it is okay to simply preserve
them (which LoopUnswitch already does for the same reason) and
rely on callbacks to deal with deleted blocks.

Differential Revision: https://reviews.llvm.org/D99843
2021-04-08 22:05:18 +02:00
Roman Lebedev a26f1bf67e
[PassManager] Run additional LICM before LoopRotate
Loop rotation often has to perform code duplication
from header into preheader, which introduces PHI nodes.

>>! In D99204, @thopre wrote:
>
> With loop peeling, it is important that unnecessary PHIs be avoided or
> it will leads to spurious peeling. One source of such PHIs is loop
> rotation which creates PHIs for invariant loads. Those PHIs are
> particularly problematic since loop peeling is now run as part of simple
> loop unrolling before GVN is run, and are thus a source of spurious
> peeling.
>
> Note that while some of the load can be hoisted and eventually
> eliminated by instruction combine, this is not always possible due to
> alignment issue. In particular, the motivating example [1] was a load
> inside a class instance which cannot be hoisted because the `this'
> pointer has an alignment of 1.
>
> [1] http://lists.llvm.org/pipermail/llvm-dev/attachments/20210312/4ce73c47/attachment.cpp

Now, we could enhance LoopRotate to avoid duplicating code when not needed,
but instead hoist loop-invariant code, but isn't that a code duplication? (*sic*)
We have LICM, and in fact we already run it right after LoopRotation.

We could try to move it to before LoopRotation,
that is basically free from compile-time perspective:
https://llvm-compile-time-tracker.com/compare.php?from=6c93eb4477d88af046b915bc955c03693b2cbb58&to=a4bee6d07732b1184c436da489040b912f0dc271&stat=instructions
But, looking at stats, i think it isn't great that we would no longer do LICM after LoopRotation, in particular:
| statistic name                                   | LoopRotate-LICM | LICM-LoopRotate |     Δ |       % | abs(%) |
| asm-printer.EmittedInsts                         | 9015930         | 9015799         |  -131 |   0.00% |  0.00% |
| indvars.NumElimCmp                               | 3536            | 3544            |     8 |   0.23% |  0.23% |
| indvars.NumElimExt                               | 36725           | 36580           |  -145 |  -0.39% |  0.39% |
| indvars.NumElimIV                                | 1197            | 1187            |   -10 |  -0.84% |  0.84% |
| indvars.NumElimIdentity                          | 143             | 136             |    -7 |  -4.90% |  4.90% |
| indvars.NumElimRem                               | 4               | 5               |     1 |  25.00% | 25.00% |
| indvars.NumLFTR                                  | 29842           | 29890           |    48 |   0.16% |  0.16% |
| indvars.NumReplaced                              | 2293            | 2227            |   -66 |  -2.88% |  2.88% |
| indvars.NumSimplifiedSDiv                        | 6               | 8               |     2 |  33.33% | 33.33% |
| indvars.NumWidened                               | 26438           | 26329           |  -109 |  -0.41% |  0.41% |
| instcount.TotalBlocks                            | 1178338         | 1173840         | -4498 |  -0.38% |  0.38% |
| instcount.TotalFuncs                             | 111825          | 111829          |     4 |   0.00% |  0.00% |
| instcount.TotalInsts                             | 9905442         | 9896139         | -9303 |  -0.09% |  0.09% |
| lcssa.NumLCSSA                                   | 425871          | 423961          | -1910 |  -0.45% |  0.45% |
| licm.NumHoisted                                  | 378357          | 378753          |   396 |   0.10% |  0.10% |
| licm.NumMovedCalls                               | 2193            | 2208            |    15 |   0.68% |  0.68% |
| licm.NumMovedLoads                               | 35899           | 31821           | -4078 | -11.36% | 11.36% |
| licm.NumPromoted                                 | 11178           | 11154           |   -24 |  -0.21% |  0.21% |
| licm.NumSunk                                     | 13359           | 13587           |   228 |   1.71% |  1.71% |
| loop-delete.NumDeleted                           | 8547            | 8402            |  -145 |  -1.70% |  1.70% |
| loop-instsimplify.NumSimplified                  | 12876           | 11890           |  -986 |  -7.66% |  7.66% |
| loop-peel.NumPeeled                              | 1008            | 925             |   -83 |  -8.23% |  8.23% |
| loop-rotate.NumNotRotatedDueToHeaderSize         | 368             | 365             |    -3 |  -0.82% |  0.82% |
| loop-rotate.NumRotated                           | 42015           | 42003           |   -12 |  -0.03% |  0.03% |
| loop-simplifycfg.NumLoopBlocksDeleted            | 240             | 242             |     2 |   0.83% |  0.83% |
| loop-simplifycfg.NumLoopExitsDeleted             | 497             | 20              |  -477 | -95.98% | 95.98% |
| loop-simplifycfg.NumTerminatorsFolded            | 618             | 336             |  -282 | -45.63% | 45.63% |
| loop-unroll.NumCompletelyUnrolled                | 11028           | 11032           |     4 |   0.04% |  0.04% |
| loop-unroll.NumUnrolled                          | 12608           | 12529           |   -79 |  -0.63% |  0.63% |
| mem2reg.NumDeadAlloca                            | 10222           | 10221           |    -1 |  -0.01% |  0.01% |
| mem2reg.NumPHIInsert                             | 192110          | 192106          |    -4 |   0.00% |  0.00% |
| mem2reg.NumSingleStore                           | 637650          | 637643          |    -7 |   0.00% |  0.00% |
| scalar-evolution.NumBruteForceTripCountsComputed | 814             | 812             |    -2 |  -0.25% |  0.25% |
| scalar-evolution.NumTripCountsComputed           | 283108          | 282934          |  -174 |  -0.06% |  0.06% |
| scalar-evolution.NumTripCountsNotComputed        | 106712          | 106718          |     6 |   0.01% |  0.01% |
| simple-loop-unswitch.NumBranches                 | 5178            | 4752            |  -426 |  -8.23% |  8.23% |
| simple-loop-unswitch.NumCostMultiplierSkipped    | 914             | 503             |  -411 | -44.97% | 44.97% |
| simple-loop-unswitch.NumSwitches                 | 20              | 18              |    -2 | -10.00% | 10.00% |
| simple-loop-unswitch.NumTrivial                  | 183             | 95              |   -88 | -48.09% | 48.09% |

... but that actually regresses LICM (-12% `licm.NumMovedLoads`),
loop-simplifycfg (`NumLoopExitsDeleted`, `NumTerminatorsFolded`),
simple-loop-unswitch (`NumTrivial`).

What if we instead have LICM both before and after LoopRotate?
| statistic name                                | LoopRotate-LICM | LICM-LoopRotate-LICM |     Δ |       % | abs(%) |
| asm-printer.EmittedInsts                      | 9015930         | 9014474              | -1456 |  -0.02% |  0.02% |
| indvars.NumElimCmp                            | 3536            | 3546                 |    10 |   0.28% |  0.28% |
| indvars.NumElimExt                            | 36725           | 36681                |   -44 |  -0.12% |  0.12% |
| indvars.NumElimIV                             | 1197            | 1185                 |   -12 |  -1.00% |  1.00% |
| indvars.NumElimIdentity                       | 143             | 146                  |     3 |   2.10% |  2.10% |
| indvars.NumElimRem                            | 4               | 5                    |     1 |  25.00% | 25.00% |
| indvars.NumLFTR                               | 29842           | 29899                |    57 |   0.19% |  0.19% |
| indvars.NumReplaced                           | 2293            | 2299                 |     6 |   0.26% |  0.26% |
| indvars.NumSimplifiedSDiv                     | 6               | 8                    |     2 |  33.33% | 33.33% |
| indvars.NumWidened                            | 26438           | 26404                |   -34 |  -0.13% |  0.13% |
| instcount.TotalBlocks                         | 1178338         | 1173652              | -4686 |  -0.40% |  0.40% |
| instcount.TotalFuncs                          | 111825          | 111829               |     4 |   0.00% |  0.00% |
| instcount.TotalInsts                          | 9905442         | 9895452              | -9990 |  -0.10% |  0.10% |
| lcssa.NumLCSSA                                | 425871          | 425373               |  -498 |  -0.12% |  0.12% |
| licm.NumHoisted                               | 378357          | 383352               |  4995 |   1.32% |  1.32% |
| licm.NumMovedCalls                            | 2193            | 2204                 |    11 |   0.50% |  0.50% |
| licm.NumMovedLoads                            | 35899           | 35755                |  -144 |  -0.40% |  0.40% |
| licm.NumPromoted                              | 11178           | 11163                |   -15 |  -0.13% |  0.13% |
| licm.NumSunk                                  | 13359           | 14321                |   962 |   7.20% |  7.20% |
| loop-delete.NumDeleted                        | 8547            | 8538                 |    -9 |  -0.11% |  0.11% |
| loop-instsimplify.NumSimplified               | 12876           | 12041                |  -835 |  -6.48% |  6.48% |
| loop-peel.NumPeeled                           | 1008            | 924                  |   -84 |  -8.33% |  8.33% |
| loop-rotate.NumNotRotatedDueToHeaderSize      | 368             | 365                  |    -3 |  -0.82% |  0.82% |
| loop-rotate.NumRotated                        | 42015           | 42005                |   -10 |  -0.02% |  0.02% |
| loop-simplifycfg.NumLoopBlocksDeleted         | 240             | 241                  |     1 |   0.42% |  0.42% |
| loop-simplifycfg.NumTerminatorsFolded         | 618             | 619                  |     1 |   0.16% |  0.16% |
| loop-unroll.NumCompletelyUnrolled             | 11028           | 11029                |     1 |   0.01% |  0.01% |
| loop-unroll.NumUnrolled                       | 12608           | 12525                |   -83 |  -0.66% |  0.66% |
| mem2reg.NumPHIInsert                          | 192110          | 192073               |   -37 |  -0.02% |  0.02% |
| mem2reg.NumSingleStore                        | 637650          | 637652               |     2 |   0.00% |  0.00% |
| scalar-evolution.NumTripCountsComputed        | 283108          | 282998               |  -110 |  -0.04% |  0.04% |
| scalar-evolution.NumTripCountsNotComputed     | 106712          | 106691               |   -21 |  -0.02% |  0.02% |
| simple-loop-unswitch.NumBranches              | 5178            | 5185                 |     7 |   0.14% |  0.14% |
| simple-loop-unswitch.NumCostMultiplierSkipped | 914             | 925                  |    11 |   1.20% |  1.20% |
| simple-loop-unswitch.NumTrivial               | 183             | 179                  |    -4 |  -2.19% |  2.19% |
| simple-loop-unswitch.NumBranches              | 5178            | 4752                 |  -426 |  -8.23% |  8.23% |
| simple-loop-unswitch.NumCostMultiplierSkipped | 914             | 503                  |  -411 | -44.97% | 44.97% |
| simple-loop-unswitch.NumSwitches              | 20              | 18                   |    -2 | -10.00% | 10.00% |
| simple-loop-unswitch.NumTrivial               | 183             | 95                   |   -88 | -48.09% | 48.09% |

I.e. we end up with less instructions, less peeling, more LICM activity,
also note how none of those 4 regressions are here. Namely:

| statistic name                                   | LICM-LoopRotate | LICM-LoopRotate-LICM |     Δ |        % |   abs(%) |
| asm-printer.EmittedInsts                         | 9015799         | 9014474              | -1325 |   -0.01% |    0.01% |
| indvars.NumElimCmp                               | 3544            | 3546                 |     2 |    0.06% |    0.06% |
| indvars.NumElimExt                               | 36580           | 36681                |   101 |    0.28% |    0.28% |
| indvars.NumElimIV                                | 1187            | 1185                 |    -2 |   -0.17% |    0.17% |
| indvars.NumElimIdentity                          | 136             | 146                  |    10 |    7.35% |    7.35% |
| indvars.NumLFTR                                  | 29890           | 29899                |     9 |    0.03% |    0.03% |
| indvars.NumReplaced                              | 2227            | 2299                 |    72 |    3.23% |    3.23% |
| indvars.NumWidened                               | 26329           | 26404                |    75 |    0.28% |    0.28% |
| instcount.TotalBlocks                            | 1173840         | 1173652              |  -188 |   -0.02% |    0.02% |
| instcount.TotalInsts                             | 9896139         | 9895452              |  -687 |   -0.01% |    0.01% |
| lcssa.NumLCSSA                                   | 423961          | 425373               |  1412 |    0.33% |    0.33% |
| licm.NumHoisted                                  | 378753          | 383352               |  4599 |    1.21% |    1.21% |
| licm.NumMovedCalls                               | 2208            | 2204                 |    -4 |   -0.18% |    0.18% |
| licm.NumMovedLoads                               | 31821           | 35755                |  3934 |   12.36% |   12.36% |
| licm.NumPromoted                                 | 11154           | 11163                |     9 |    0.08% |    0.08% |
| licm.NumSunk                                     | 13587           | 14321                |   734 |    5.40% |    5.40% |
| loop-delete.NumDeleted                           | 8402            | 8538                 |   136 |    1.62% |    1.62% |
| loop-instsimplify.NumSimplified                  | 11890           | 12041                |   151 |    1.27% |    1.27% |
| loop-peel.NumPeeled                              | 925             | 924                  |    -1 |   -0.11% |    0.11% |
| loop-rotate.NumRotated                           | 42003           | 42005                |     2 |    0.00% |    0.00% |
| loop-simplifycfg.NumLoopBlocksDeleted            | 242             | 241                  |    -1 |   -0.41% |    0.41% |
| loop-simplifycfg.NumLoopExitsDeleted             | 20              | 497                  |   477 | 2385.00% | 2385.00% |
| loop-simplifycfg.NumTerminatorsFolded            | 336             | 619                  |   283 |   84.23% |   84.23% |
| loop-unroll.NumCompletelyUnrolled                | 11032           | 11029                |    -3 |   -0.03% |    0.03% |
| loop-unroll.NumUnrolled                          | 12529           | 12525                |    -4 |   -0.03% |    0.03% |
| mem2reg.NumDeadAlloca                            | 10221           | 10222                |     1 |    0.01% |    0.01% |
| mem2reg.NumPHIInsert                             | 192106          | 192073               |   -33 |   -0.02% |    0.02% |
| mem2reg.NumSingleStore                           | 637643          | 637652               |     9 |    0.00% |    0.00% |
| scalar-evolution.NumBruteForceTripCountsComputed | 812             | 814                  |     2 |    0.25% |    0.25% |
| scalar-evolution.NumTripCountsComputed           | 282934          | 282998               |    64 |    0.02% |    0.02% |
| scalar-evolution.NumTripCountsNotComputed        | 106718          | 106691               |   -27 |   -0.03% |    0.03% |
| simple-loop-unswitch.NumBranches                 | 4752            | 5185                 |   433 |    9.11% |    9.11% |
| simple-loop-unswitch.NumCostMultiplierSkipped    | 503             | 925                  |   422 |   83.90% |   83.90% |
| simple-loop-unswitch.NumSwitches                 | 18              | 20                   |     2 |   11.11% |   11.11% |
| simple-loop-unswitch.NumTrivial                  | 95              | 179                  |    84 |   88.42% |   88.42% |

{F15983613} {F15983615} {F15983616}
(this is vanilla llvm testsuite + rawspeed + darktable)

As an example of the code where early LICM only is bad, see:
https://godbolt.org/z/GzEbacs4K

This does have an observable compile-time regression of +~0.5% geomean
https://llvm-compile-time-tracker.com/compare.php?from=7c5222e4d1a3a14f029e5f614c9aefd0fa505f1e&to=5d81826c3411982ca26e46b9d0aff34c80577664&stat=instructions
but i think that's basically nothing, and there's potential that it might
be avoidable in the future by fixing clang to produce alignment information
on function arguments, thus making the second run unneeded.

Differential Revision: https://reviews.llvm.org/D99249
2021-04-02 11:11:42 +03:00
Krasimir Georgiev c51e91e046 Revert "[Passes] Add relative lookup table converter pass"
This reverts commit 5178ffc7cf.

Compiling `llvm-profdata` with a compiler build from this produces a
crashing binary.
2021-03-30 14:13:37 +02:00
Gulfem Savrun Yeniceri 5178ffc7cf [Passes] Add relative lookup table converter pass
Lookup tables generate non PIC-friendly code, which requires dynamic relocation as described in:
https://bugs.llvm.org/show_bug.cgi?id=45244

This patch adds a new pass that converts lookup tables to relative lookup tables to make them PIC-friendly.

Differential Revision: https://reviews.llvm.org/D94355
2021-03-29 21:53:32 +00:00
Gulfem Savrun Yeniceri 5fbe1fdf17 Revert "[Passes] Add relative lookup table converter pass"
This reverts commit 5fd001a5ff
because it broke clang-with-thin-lto-ubuntu bot.
2021-03-24 18:59:33 +00:00
Gulfem Savrun Yeniceri 5fd001a5ff [Passes] Add relative lookup table converter pass
Lookup tables generate non PIC-friendly code, which requires dynamic relocation as described in:
https://bugs.llvm.org/show_bug.cgi?id=45244

This patch adds a new pass that converts lookup tables to relative lookup tables to make them PIC-friendly.

Differential Revision: https://reviews.llvm.org/D94355
2021-03-24 17:31:18 +00:00
Gulfem Savrun Yeniceri e3a6d70c68 Revert "[Passes] Add relative lookup table converter pass"
This reverts commit 78a65cd945 which
caused buildbot failures.
2021-03-23 00:43:16 +00:00
Gulfem Savrun Yeniceri 78a65cd945 [Passes] Add relative lookup table converter pass
Lookup tables generate non PIC-friendly code, which requires dynamic relocation as described in:
https://bugs.llvm.org/show_bug.cgi?id=45244

This patch adds a new pass that converts lookup tables to relative lookup tables to make them PIC-friendly.

Differential Revision: https://reviews.llvm.org/D94355
2021-03-22 22:09:02 +00:00
Arthur Eubanks 040c1b49d7 Move EntryExitInstrumentation pass location
This seems to be more of a Clang thing rather than a generic LLVM thing,
so this moves it out of LLVM pipelines and as Clang extension hooks into
LLVM pipelines.

Move the post-inline EEInstrumentation out of the backend pipeline and
into a late pass, similar to other sanitizer passes. It doesn't fit
into the codegen pipeline.

Also fix up EntryExitInstrumentation not running at -O0 under the new
PM. PR49143

Reviewed By: hans

Differential Revision: https://reviews.llvm.org/D97608
2021-03-01 10:08:10 -08:00
Nikita Popov 71a8e4e7d6 [MemCopyOpt] Enable MemorySSA by default
This enables use of MemorySSA instead of MemDep in MemCpyOpt. To
allow this without significant compile-time impact, the MemCpyOpt
pass is moved directly before DSE (in the cases where this was not
already the case), which allows us to reuse the existing MemorySSA
analysis.

Unlike the MemDep-based implementation, the MemorySSA-based MemCpyOpt
can also perform simple optimizations across basic blocks.

Differential Revision: https://reviews.llvm.org/D94376
2021-02-19 18:06:25 +01:00
Roman Lebedev a78d8feb48
[LowerConstantIntrinsics] Preserve Dominator Tree, if avaliable 2021-01-30 01:14:50 +03:00
Roman Lebedev a8d74517dc
[PassManager] Run Induction Variable Simplification pass *after* Recognize loop idioms pass, not before
Currently, `-indvars` runs first, and then immediately after `-loop-idiom` does.
I'm not really sure if `-loop-idiom` requires `-indvars` to run beforehand,
but i'm *very* sure that `-indvars` requires `-loop-idiom` to run afterwards,
as it can be seen in the phase-ordering test.

LoopIdiom runs on two types of loops: countable ones, and uncountable ones.
For uncountable ones, IndVars obviously didn't make any change to them,
since they are uncountable, so for them the order should be irrelevant.
For countable ones, well, they should have been countable before IndVars
for IndVars to make any change to them, and since SCEV is used on them,
it shouldn't matter if IndVars have already canonicalized them.
So i don't really see why we'd want the current ordering.

Should this cause issues, it will give us a reproducer test case
that shows flaws in this logic, and we then could adjust accordingly.

While this is quite likely beneficial in-the-wild already,
it's a required part for the full motivational pattern
behind `left-shift-until-bittest` loop idiom (D91038).

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D91800
2020-11-25 19:20:07 +03:00
Florian Hahn 8dbe44cb29 Add pass to add !annotate metadata from @llvm.global.annotations.
This patch adds a new pass to add !annotation metadata for entries in
@llvm.global.anotations, which is generated  using
__attribute__((annotate("_name"))) on functions in Clang.

This has been discussed on llvm-dev as part of
    RFC: Combining Annotation Metadata and Remarks
    http://lists.llvm.org/pipermail/llvm-dev/2020-November/146393.html

Reviewed By: thegameg

Differential Revision: https://reviews.llvm.org/D91195
2020-11-16 14:57:11 +00:00
Florian Hahn 8bb6347939
Add !annotation metadata and remarks pass.
This patch adds a new !annotation metadata kind which can be used to
attach annotation strings to instructions.

It also adds a new pass that emits summary remarks per function with the
counts for each annotation kind.

The intended uses cases for this new metadata is annotating
'interesting' instructions and the remarks should provide additional
insight into transformations applied to a program.

To motivate this, consider these specific questions we would like to get answered:

* How many stores added for automatic variable initialization remain after optimizations? Where are they?
* How many runtime checks inserted by a frontend could be eliminated? Where are the ones that did not get eliminated?

Discussed on llvm-dev as part of 'RFC: Combining Annotation Metadata and Remarks'
(http://lists.llvm.org/pipermail/llvm-dev/2020-November/146393.html)

Reviewed By: thegameg, jdoerfert

Differential Revision: https://reviews.llvm.org/D91188
2020-11-13 13:24:10 +00:00
Florian Hahn 88241ffb56 [Passes] Move ADCE before DSE & LICM.
The adjustment seems to have very little impact on optimizations.
The only binary change with -O3 MultiSource/SPEC2000/SPEC2006 on X86 is
in consumer-typeset and the size there actually decreases by -0.1%, with
not significant changes in the stats.

On its own, it is mildly positive in terms of compile-time, most likely
due to LICM & DSE having to process slightly less instructions. It
should also be unlikely that DSE/LICM make much new code dead.

http://llvm-compile-time-tracker.com/compare.php?from=df63eedef64d715ce1f31843f7de9c11fe1e597f&to=e3bdfcf94a9eeae6e006d010464f0c1b3550577d&stat=instructions

With DSE & MemorySSA, it gives some nice compile-time improvements, due
to the fact that DSE can re-use the PDT from ADCE, if it does not make
any changes:

http://llvm-compile-time-tracker.com/compare.php?from=15fdd6cd7c24c745df1bb419e72ff66fd138aa7e&to=481f494515fc89cb7caea8d862e40f2c910dc994&stat=instructions

Reviewed By: xbolva00

Differential Revision: https://reviews.llvm.org/D87322
2020-10-21 10:30:56 +01:00
Hans Wennborg 0628bea513 Revert "[PM/CC1] Add -f[no-]split-cold-code CC1 option to toggle splitting"
This broke Chromium's PGO build, it seems because hot-cold-splitting got turned
on unintentionally. See comment on the code review for repro etc.

> This patch adds -f[no-]split-cold-code CC1 options to clang. This allows
> the splitting pass to be toggled on/off. The current method of passing
> `-mllvm -hot-cold-split=true` to clang isn't ideal as it may not compose
> correctly (say, with `-O0` or `-Oz`).
>
> To implement the -fsplit-cold-code option, an attribute is applied to
> functions to indicate that they may be considered for splitting. This
> removes some complexity from the old/new PM pipeline builders, and
> behaves as expected when LTO is enabled.
>
> Co-authored by: Saleem Abdulrasool <compnerd@compnerd.org>
> Differential Revision: https://reviews.llvm.org/D57265
> Reviewed By: Aditya Kumar, Vedant Kumar
> Reviewers: Teresa Johnson, Aditya Kumar, Fedor Sergeev, Philip Pfaffe, Vedant Kumar

This reverts commit 273c299d5d.
2020-10-19 12:31:14 +02:00
Florian Hahn 51ff04567b Recommit "[DSE] Switch to MemorySSA-backed DSE by default."
After investigation by @asbirlea, the issue that caused the
revert appears to be an issue in the original source, rather
than a problem with the compiler.

This patch enables MemorySSA DSE again.

This reverts commit 915310bf14.
2020-10-16 09:02:53 +01:00
Vedant Kumar 273c299d5d [PM/CC1] Add -f[no-]split-cold-code CC1 option to toggle splitting
This patch adds -f[no-]split-cold-code CC1 options to clang. This allows
the splitting pass to be toggled on/off. The current method of passing
`-mllvm -hot-cold-split=true` to clang isn't ideal as it may not compose
correctly (say, with `-O0` or `-Oz`).

To implement the -fsplit-cold-code option, an attribute is applied to
functions to indicate that they may be considered for splitting. This
removes some complexity from the old/new PM pipeline builders, and
behaves as expected when LTO is enabled.

Co-authored by: Saleem Abdulrasool <compnerd@compnerd.org>
Differential Revision: https://reviews.llvm.org/D57265
Reviewed By: Aditya Kumar, Vedant Kumar
Reviewers: Teresa Johnson, Aditya Kumar, Fedor Sergeev, Philip Pfaffe, Vedant Kumar
2020-10-15 23:13:33 +00:00
Roman Lebedev 03bd5198b6
[OldPM] Pass manager: run SROA after (simple) loop unrolling
I have stumbled into this pretty accidentally, when rewriting
some spaghetti-like code into something more structured,
which involved using some `std::array<>`s. And to my surprise,
the `alloca`s remained, causing about `+160%` perf regression.

https://llvm-compile-time-tracker.com/compare.php?from=bb6f4d32aac3eecb51909f4facc625219307ee68&to=d563e66f40f9d4d145cb2050e41cb961e2b37785&stat=instructions
suggests that this has geomean compile-time cost of `+0.08%`.

Note that D68593 / cecc0d27ad
already did this chage for NewPM, but left OldPM in a pessimized state.

This fixes [[ https://bugs.llvm.org/show_bug.cgi?id=40011 | PR40011 ]], [[ https://bugs.llvm.org/show_bug.cgi?id=42794 | PR42794 ]] and probably some other reports.

Reviewed By: nikic, xbolva00

Differential Revision: https://reviews.llvm.org/D87972
2020-10-04 11:53:50 +03:00
Florian Hahn 915310bf14 Revert "[DSE] Switch to MemorySSA-backed DSE by default."
There appears to be a mis-compile with MemorySSA-backed DSE in
combination with llvm.lifetime.end. It currently appears like
DSE is doing the right thing and the llvm.lifetime.end markers
are incorrect. The reverted patch uncovers the mis-compile.

This patch temporarily switches back to the legacy DSE
implementation, while we investigate.

This reverts commit 9d172c8e9c.
2020-09-26 18:35:27 +01:00
Florian Hahn 9d172c8e9c Recommit "[DSE] Switch to MemorySSA-backed DSE by default."
This switches to using DSE + MemorySSA by default again, after
fixing the issues reported after the first commit.

Notable fixes fc82006331, a0017c2bc2.

This reverts commit 3a59628f3c.
2020-09-18 11:05:00 +01:00
Wenlei He 2ea4c2c598 [BFI] Make BFI information available through loop passes inside LoopStandardAnalysisResults
~~D65060 uncovered that trying to use BFI in loop passes can lead to non-deterministic behavior when blocks are re-used while retaining old BFI data.~~

~~To make sure BFI is preserved through loop passes a Value Handle (VH) callback is registered on blocks themselves. When a block is freed it now also wipes out the accompanying BFI entry such that stale BFI data can no longer persist resolving the determinism issue. ~~

~~An optimistic approach would be to incrementally update BFI information throughout the loop passes rather than only invalidating them on removed blocks. The issues with that are:~~
~~1. It is not clear how BFI information should be incrementally updated: If a block is duplicated does its BFI information come with? How about if it's split/modified/moved around? ~~
~~2. Assuming we can address these problems the implementation here will be a massive undertaking. ~~

~~There's a known need of BFI in LICM analysis which requires correct but not incrementally updated BFI data. A follow-up change can register BFI in all loop passes so this preserved but potentially lossy data is available to any loop pass that wants it.~~

See: D75341 for an identical implementation of preserving BFI via VH callbacks. The previous statements do still apply but this change no longer has to be in this diff because it's already upstream 😄 .

This diff also moves BFI to be a part of LoopStandardAnalysisResults since the previous method using getCachedResults now (correctly!) statically asserts (D72893) that this data isn't static through the loop passes.

Testing
Ninja check

Reviewed By: asbirlea, nikic

Differential Revision: https://reviews.llvm.org/D86156
2020-09-15 16:16:24 -07:00
Florian Hahn 3a59628f3c Revert "[DSE] Switch to MemorySSA-backed DSE by default."
This reverts commit fb109c42d9.

Temporarily revert due to a mis-compile pointed out at D87163.
2020-09-15 18:07:56 +01:00
Florian Hahn fb109c42d9 [DSE] Switch to MemorySSA-backed DSE by default.
The tests have been updated and I plan to move them from the MSSA
directory up.

Some end-to-end tests needed small adjustments. One difference to the
legacy DSE is that legacy DSE also deletes trivially dead instructions
that are unrelated to memory operations. Because MemorySSA-backed DSE
just walks the MemorySSA, we only visit/check memory instructions. But
removing unrelated dead instructions is not really DSE's job and other
passes will clean up.

One noteworthy change is in llvm/test/Transforms/Coroutines/ArgAddr.ll,
but I think this comes down to legacy DSE not handling instructions that
may throw correctly in that case. To cover this with MemorySSA-backed
DSE, we need an update to llvm.coro.begin to treat it's return value to
belong to the same underlying object as the passed pointer.

There are some minor cases MemorySSA-backed DSE currently misses, e.g. related
to atomic operations, but I think those can be implemented after the switch.

This has been discussed on llvm-dev:
http://lists.llvm.org/pipermail/llvm-dev/2020-August/144417.html

For the MultiSource/SPEC2000/SPEC2006 the number of eliminated stores
goes from ~17500 (legayc DSE) to ~26300 (MemorySSA-backed). More numbers
and details in the thread on llvm-dev.

Impact on CTMark:
```
                                     Legacy Pass Manager
                        exec instrs    size-text
O3                       + 0.60%        - 0.27%
ReleaseThinLTO           + 1.00%        - 0.42%
ReleaseLTO-g.            + 0.77%        - 0.33%
RelThinLTO (link only)   + 0.87%        - 0.42%
RelLO-g (link only)      + 0.78%        - 0.33%
```
http://llvm-compile-time-tracker.com/compare.php?from=3f22e96d95c71ded906c67067d75278efb0a2525&to=ae8be4642533ff03803967ee9d7017c0d73b0ee0&stat=instructions
```
                                     New Pass Manager
                       exec instrs.   size-text
O3                       + 0.95%       - 0.25%
ReleaseThinLTO           + 1.34%       - 0.41%
ReleaseLTO-g.            + 1.71%       - 0.35%
RelThinLTO (link only)   + 0.96%       - 0.41%
RelLO-g (link only)      + 2.21%       - 0.35%
```
http://195.201.131.214:8000/compare.php?from=3f22e96d95c71ded906c67067d75278efb0a2525&to=ae8be4642533ff03803967ee9d7017c0d73b0ee0&stat=instructions

Reviewed By: asbirlea, xbolva00, nikic

Differential Revision: https://reviews.llvm.org/D87163
2020-09-10 22:24:32 +01:00
David Green 245f846c4e [MemCpyOptimizer] Change required analysis order for BasicAA/PhiValuesAnalysis
This is a followup to 1ccfb52a61, which made a number of changes
including the apparently innocuous reordering of required passes in
MemCpyOptimizer. This however altered the creation order of BasicAA vs
Phi Values analysis, meaning BasicAA did not pick up PhiValues as a
cached result. Instead if we require MemoryDependence first it will
require PhiValuesAnalysis allowing BasicAA to use it for better results.

I don't claim this is an excellent design, but it fixes a nasty little
regressions where a query later in JumpThreading was getting worse
results.

Differential Revision: https://reviews.llvm.org/D87027
2020-09-03 12:01:51 +01:00
Alina Sbirlea 1ccfb52a61 [MemCpyOptimizer] Preserve analyses and replace use of lambdas to get them.
Summary:
Analyses are preserved in MemCpyOptimizer.
Get analyses before running the pass and store the pointers, instead of
using lambdas and getting them every time on demand.

Reviewers: lenary, deadalnix, mehdi_amini, nikic, efriedma

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D74494
2020-09-01 17:35:40 -07:00
Arthur Eubanks 2af4c2b2b1 [NewPM] Pin various tests under Other/ to legacy PM
These all are legacy PM-specific or have a corresponding NPM RUN line.

Reviewed By: ychen

Differential Revision: https://reviews.llvm.org/D86124
2020-08-17 18:58:08 -07:00
Zequan Wu 1fbb719470 [LPM] Port CGProfilePass from NPM to LPM
Reviewers: hans, chandlerc!, asbirlea, nikic

Reviewed By: hans, nikic

Subscribers: steven_wu, dexonsmith, nikic, echristo, void, zhizhouy, cfe-commits, aeubanks, MaskRay, jvesely, nhaehnle, hiraditya, kerbowa, llvm-commits

Tags: #llvm, #clang

Differential Revision: https://reviews.llvm.org/D83013
2020-07-10 09:04:51 -07:00
Fangrui Song c025bdf25a Revert D83013 "[LPM] Port CGProfilePass from NPM to LPM"
This reverts commit c92a8c0a0f.

It breaks builds and has unaddressed review comments.
2020-07-09 13:34:04 -07:00
Zequan Wu c92a8c0a0f [LPM] Port CGProfilePass from NPM to LPM
Reviewers: hans, chandlerc!, asbirlea, nikic

Reviewed By: hans, nikic

Subscribers: steven_wu, dexonsmith, nikic, echristo, void, zhizhouy, cfe-commits, aeubanks, MaskRay, jvesely, nhaehnle, hiraditya, kerbowa, llvm-commits

Tags: #llvm, #clang

Differential Revision: https://reviews.llvm.org/D83013
2020-07-09 13:03:42 -07:00
Sanjay Patel 57bb4787d7 [Pass Manager] remove EarlyCSE as clean-up for VectorCombine
EarlyCSE was added with D75145, but the motivating test is
not regressed by removing the extra pass now. That might be
because VectorCombine altered the way it processes instructions,
or it might be from (re)moving VectorCombine in the pipeline.

The extra round of EarlyCSE appears to cost approximately
0.26% in compile-time as discussed in D80236, so we need some
evidence to justify its inclusion here, but we do not have
that (yet).

I suspect that between SLP and VectorCombine, we are creating
patterns that InstCombine and/or codegen are not prepared for,
but we will need to reduce those examples and include them as
PhaseOrdering and/or test-suite benchmarks.
2020-05-24 12:36:21 -04:00
Sanjay Patel 024098ae53 [VectorCombine] set preserve alias analysis
As noted in D80236, moving the pass in the pipeline exposed this
shortcoming. Extra work to recalculate the alias results showed
up as a compile-time slowdown.
2020-05-22 16:25:16 -04:00
Sanjay Patel 6438ea45e0 [VectorCombine] position pass after SLP in the optimization pipeline rather than before
There are 2 known problem patterns shown in the test diffs here:
vector horizontal ops (an x86 specialization) and vector reductions.

SLP has greater ability to match and fold those than vector-combine,
so let SLP have first chance at that.

This is a quick fix while we continue to improve vector-combine and
possibly canonicalize to reduction intrinsics.

In the longer term, we should improve matching of these patterns
because if they were created in the "bad" forms shown here, then we
would miss optimizing them.

I'm not sure what is happening with alias analysis on the addsub test.
The old pass manager now shows an extra line for that, and we see an
improvement that comes from SLP vectorizing a store. I don't know
what's missing with the new pass manager to make that happen.
Strangely, I can't reproduce the behavior if I compile from C++ with
clang and invoke the new PM with "-fexperimental-new-pass-manager".

Differential Revision: https://reviews.llvm.org/D80236
2020-05-22 12:22:44 -04:00
Evgeniy Brevnov 3e68a66704 [BPI][NFC] Reuse post dominantor tree from analysis manager when available
Summary: Currenlty BPI unconditionally creates post dominator tree each time. While this is not incorrect we can save compile time by reusing existing post dominator tree (when it's valid) provided by analysis manager.

Reviewers: skatkov, taewookoh, yrouban

Reviewed By: skatkov

Subscribers: hiraditya, steven_wu, dexonsmith, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D78987
2020-04-30 11:31:03 +07:00
Tarindu Jayatilaka b43b59fcc0 Expose `attributor-disable` to the new and old pass managers
The new and old pass managers (PassManagerBuilder.cpp and
PassBuilder.cpp) are exposed to an `extern` declaration of
`attributor-disable` option which will guard the addition of the
attributor passes to the pass pipelines.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D76871
2020-04-05 22:29:34 -05:00
Florian Hahn 2d6ecf4648 [SLP] Support vectorizing functions provided by vector libs.
It seems like the SLPVectorizer is currently not aware of vector
versions of functions provided by libraries like Accelerate [1].
This patch updates SLPVectorizer to use the same infrastructure
the LoopVectorizer uses to detect vectorizable library functions.

For calls, it computes the cost of an intrinsic call (existing behavior)
and the cost of a vector function library call, if available. Like
LoopVectorizer, it assumes the cost of the vector function is simply the
cost of a call to a vector function.

[1] https://developer.apple.com/documentation/accelerate

Reviewers: ABataev, RKSimon, spatel

Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D75878
2020-03-10 13:10:50 +00:00
Sanjay Patel 71a316883d [PassManager] adjust VectorCombine placement
The initial placement of vector-combine in the opt pipeline revealed phase ordering bugs:
https://bugs.llvm.org/show_bug.cgi?id=45015
https://bugs.llvm.org/show_bug.cgi?id=42022

This patch contains a few independent changes:

1. Move the pass up in the pipeline, so it happens just after loop-vectorization.
   This is only to keep vectorization passes together in the pipeline at the moment.
   I don't have evidence of interaction between these yet.
2. Add an -early-cse pass after -vector-combine to clean up redundant ops. This was
   partly proposed as far back as rL219644 (which is why it's effectively being moved
   in the old PM code). This is important because the subsequent -instcombine doesn't
   work as well without EarlyCSE. With the CSE, -instcombine is able to squash
   shuffles together in 1 of the tests (because those are simple "select" shuffles).
3. Remove the -vector-combine pass that was running after SLP. We may want to do that
   eventually, but I don't have a test case to support it yet.

Differential Revision: https://reviews.llvm.org/D75145
2020-03-04 11:10:49 -05:00
Alina Sbirlea 1326a5a4cf [LoopRotate] Get and update MSSA only if available in legacy pass manager.
Summary:
Potential fix for: https://bugs.llvm.org/show_bug.cgi?id=44889 and https://bugs.llvm.org/show_bug.cgi?id=44408

In the legacy pass manager, loop rotate need not compute MemorySSA when not being in the same loop pass manager with other loop passes.
There isn't currently a way to differentiate between the two cases, so this attempts to limit the usage in LoopRotate to only update MemorySSA when the analysis is already available.
The side-effect of this is that it will split the Loop pipeline.

This issue does not apply to the new pass manager, where we have a flag specifying if all loop passes in that loop pass manager preserve MemorySSA.

Reviewers: dmgreen, fedor.sergeev, nikic

Subscribers: Prazek, hiraditya, george.burgess.iv, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D74574
2020-02-14 10:47:26 -08:00
Alina Sbirlea 0cecafd647 [BasicAA] Make BasicAA a cfg pass.
Summary:
Part of the changes in D44564 made BasicAA not CFG only due to it using
PhiAnalysisValues which may have values invalidated.
Subsequent patches (rL340613) appear to have addressed this limitation.

BasicAA should not be invalidated by non-CFG-altering passes.
A concrete example is MemCpyOpt which preserves CFG, but we are testing
it invalidates BasicAA.

llvm-dev RFC: https://groups.google.com/forum/#!topic/llvm-dev/eSPXuWnNfzM

Reviewers: john.brawn, sebpop, hfinkel, brzycki

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D74353
2020-02-11 11:30:08 -08:00
Sanjay Patel a17f03bd93 [VectorCombine] new IR transform pass for partial vector ops
We have several bug reports that could be characterized as "reducing scalarization",
and this topic was also raised on llvm-dev recently:
http://lists.llvm.org/pipermail/llvm-dev/2020-January/138157.html
...so I'm proposing that we deal with these patterns in a new, lightweight IR vector
pass that runs before/after other vectorization passes.

There are 4 alternate options that I can think of to deal with this kind of problem
(and we've seen various attempts at all of these), but they all have flaws:

    InstCombine - can't happen without TTI, but we don't want target-specific
                  folds there.
    SDAG - too late to assist other vectorization passes; TLI is not equipped
           for these kind of cost queries; limited to a single basic block.
    CGP - too late to assist other vectorization passes; would need to re-implement
          basic cleanups like CSE/instcombine.
    SLP - doesn't fit with existing transforms; limited to a single basic block.

This initial patch/transform is based on existing code in AggressiveInstCombine:
we walk backwards through the function looking for a pattern match. But we diverge
from that cost-independent IR canonicalization pass by using TTI to decide if the
vector alternative is profitable.

We probably have at least 10 similar bug reports/patterns (binops, constants,
inserts, cheap shuffles, etc) that would fit in this pass as follow-up enhancements.
It's possible that we could iterate on a worklist to fix-point like InstCombine does,
but it's safer to start with a most basic case and evolve from there, so I didn't
try to do anything fancy with this initial implementation.

Differential Revision: https://reviews.llvm.org/D73480
2020-02-09 10:04:41 -05:00
Johannes Doerfert b0c77c36d2 [Attributor] Add an Attributor CGSCC pass and run it
In addition to the module pass, this patch introduces a CGSCC pass that
runs the Attributor on a strongly connected component of the call graph
(both old and new PM). The Attributor was always design to be used on a
subset of functions which makes this patch mostly mechanical.

The one change is that we give up `norecurse` deduction in the module
pass in favor of doing it during the CGSCC pass. This makes the
interfaces simpler but can be revisited if needed.

Reviewed By: hfinkel

Differential Revision: https://reviews.llvm.org/D70767
2020-02-08 21:27:34 -06:00
Johannes Doerfert 9548b74a83 [OpenMP] Introduce the OpenMPOpt transformation pass
The OpenMPOpt pass is a CGSCC pass in which OpenMP specific
optimizations can reside.

The OpenMPOpt pass uses the OpenMPKinds.def file to identify runtime
calls and their uses. This allows targeted transformations and eases
their implementation.

This initial patch deduplicates `__kmpc_global_thread_num` and
`omp_get_thread_num` calls. We can also identify arguments that are
equivalent to such a call result and use it instead. Later we can
determine "gtid" arguments based on the use in kernel functions etc.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D69930
2020-02-08 14:47:03 -06:00
Francesco Petrogalli 66c120f025 [VectorUtils] Rework the Vector Function Database (VFDatabase).
Summary:
This commits is a rework of the patch in
https://reviews.llvm.org/D67572.

The rework was requested to prevent out-of-tree performance regression
when vectorizing out-of-tree IR intrinsics. The vectorization of such
intrinsics is enquired via the static function `isTLIScalarize`. For
detail see the discussion in https://reviews.llvm.org/D67572.

Reviewers: uabelho, fhahn, sdesmalen

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D72734
2020-01-16 15:08:26 +00:00