This adds a new option to control AllowSpeculation added in D119965 when
using `-passes=...`.
This allows reproducing #54023 using opt.
Reviewed By: aeubanks
Differential Revision: https://reviews.llvm.org/D121944
For MachO, lower `@llvm.global_dtors` into `@llvm_global_ctors` with
`__cxa_atexit` calls to avoid emitting the deprecated `__mod_term_func`.
Reuse the existing `WebAssemblyLowerGlobalDtors.cpp` to accomplish this.
Enable fallback to the old behavior via Clang driver flag
(`-fregister-global-dtors-with-atexit`) or llc / code generation flag
(`-lower-global-dtors-via-cxa-atexit`). This escape hatch will be
removed in the future.
Differential Revision: https://reviews.llvm.org/D121736
For MachO, lower `@llvm.global_dtors` into `@llvm_global_ctors` with
`__cxa_atexit` calls to avoid emitting the deprecated `__mod_term_func`.
Reuse the existing `WebAssemblyLowerGlobalDtors.cpp` to accomplish this.
Enable fallback to the old behavior via Clang driver flag
(`-fregister-global-dtors-with-atexit`) or llc / code generation flag
(`-lower-global-dtors-via-cxa-atexit`). This escape hatch will be
removed in the future.
Differential Revision: https://reviews.llvm.org/D121327
RequireAnalysis<GlobalsAA> doesn't actually recompute GlobalsAA.
GlobalsAA isn't invalidated (unless specifically invalidated) because
it's self-updating via ValueHandles, but can be imprecise during the
self-updates.
Rather than invalidating GlobalsAA, which would invalidate AAManager and
any analyses that use AAManager, create a new pass that recomputes
GlobalsAA.
Fixes#53131.
Differential Revision: https://reviews.llvm.org/D121167
This patch adds PrettyStackEntries before running passes. The entries
include the pass name and the IR unit the pass runs on.
The information is used the print additional information when a pass
crashes, including the name and a reference to the IR unit on which it
crashed. This is similar to the behavior of the legacy pass manager.
The improved stack trace now includes:
Stack dump:
0. Program arguments: bin/opt -loop-vectorize -force-vector-width=4 crash.ll
1. Running pass 'ModuleToFunctionPassAdaptor' on module 'crash.ll'
2. Running pass 'LoopVectorizePass' on function '@a'
Reviewed By: aeubanks
Differential Revision: https://reviews.llvm.org/D120993
This is a clean-up patch. The functional pass was rolled into the module pass in D112732.
Reviewed By: vitalybuka, aeubanks
Differential Revision: https://reviews.llvm.org/D120674
This PR adds two extension points to the default LTO pipeline in PassBuilder, one at the beginning and one at the end. These two extension points already existed in the old pass manager, the aim is to replicate the same functionality in the new one.
Reviewed By: aeubanks
Differential Revision: https://reviews.llvm.org/D120491
LICM will speculatively hoist code outside of loops. This requires removing information, like alias analysis (https://github.com/llvm/llvm-project/issues/53794), range information (https://bugs.llvm.org/show_bug.cgi?id=50550), among others. Prior to https://reviews.llvm.org/D99249 , LICM would only be run after LoopRotate. Running Loop Rotate prior to LICM prevents a instruction hoist from being speculative, if it was conditionally executed by the iteration (as is commonly emitted by clang and other frontends). Adding the additional LICM pass first, however, forces all of these instructions to be considered speculative, even if they are not speculative after LoopRotate. This destroys information, resulting in performance losses for discarding this additional information.
This PR modifies LICM to accept a ``speculative'' parameter which allows LICM to be set to perform information-loss speculative hoists or not. Phase ordering is then modified to not perform the information-losing speculative hoists until after loop rotate is performed, preserving this additional information.
Reviewed By: lebedev.ri
Differential Revision: https://reviews.llvm.org/D119965
The LTO support for OpenMP offloading allows us to run the OpenMPOpt
pass during the LTO pipeline. This patch introduces an early run of the
Module pass and a late run of the CGSCC pass. These are quick no-ops if
there is no OpenMP in the module.
Depends on D118198
Differential Revision: https://reviews.llvm.org/D118611
Adding -debugify and -check-debugify in the PassRegistry will make
sure the passes are listed properly by -print-pipeline-passes as
well as -print-passes.
It also allows removal of the custom pipeline parsing callback that
has been used in the NewPMDriver.
Differential Revision: https://reviews.llvm.org/D118369
In D110057 we moved LoopFlatten to a LoopPassManager. This caused a performance
regression for our 64-bit targets (the 32-bit were unaffected), the pass is no
longer triggering for a motivating example. The reason is that the IR is just
very different than expected; we try to match loop statements and particular
uses of induction variables. The easiest is to just move LoopFlatten to a place
in the pipeline where the IR is as expected, which is just before
IndVarSimplify. This means we move it from LPM2 to LPM1, so that it actually
runs just a bit earlier from where it was running before. IndVarSimplify is
responsible for significant rewrites that are difficult to "look through" in
LoopFlatten.
Differential Revision: https://reviews.llvm.org/D116612
This was reverted because of a performance regression, which is fixed by
D116612 that I will commit directly after this change.
This reverts commit e92d63b467.
The global state refers to the number of the nodes currently in the
module, and the number of direct calls between nodes, across the
module.
Node counts are not a problem; edge counts are because we want strictly
the kind of edges that affect inlining (direct calls), and that is not
easily obtainable without iteration over the whole module.
This patch avoids relying on analysis invalidation because it turned out
to be too aggressive in some cases. It leverages the fact that Node
objects are stable - they do not get deleted while cgscc passes are
run over the module; and cgscc pass manager invariants.
Reviewed By: aeubanks
Differential Revision: https://reviews.llvm.org/D115847
This creates a way to configure MSAN to for eager checks that will be leveraged
by the introduction of a clang flag (-fsanitize-memory-param-retval).
This is redundant with the existing flag: -mllvm -msan-eager-checks.
Reviewed By: vitalybuka
Differential Revision: https://reviews.llvm.org/D116855
This commit caused performance regressions due to differences in the
expected code during loop flattening. Reverting it until the fix is
ready, which hopefully wont take too long.
This reverts commit 86825fc2fb.
This patch adds a couple of NewPM function passes (dot-dom and
dot-dom-only) that dump DomTree into .dot files.
Reviewed-By: aeubanks
Differential Revision: https://reviews.llvm.org/D116629
In D109958 it was noticed that we could optimise the pipeline and avoid
rerunning LoopSimplify/LCSSA for LoopFlatten by moving it to a LoopPassManager.
Differential Revision: https://reviews.llvm.org/D110057
Summary:
A new option exec-on-ir-changed is defined that allows one to specify an
exe that is called after each pass in the opt pipeline that changes the IR.
The exec-on-ir-change=exe option saves the IR in a temporary file and calls exe
with the name of the file and the name of the pass that just changed it after
each pass alters the IR. exe is also called with the initial IR. This
can be used, for example, to determine which pass corrupts the IR by having
exe as a script that calls llc and runs a test to see after which pass the
results change. The print-changed filtering options are respected. Note that
this is only supported with the new pass manager.
Author: Jamie Schmeiser <schmeise@ca.ibm.com>
Reviewed By: aeubanks (Arthur Eubanks)
Differential Revision: https://reviews.llvm.org/D110776
This patch uses a similar trick as in D113947 to only run the extra
passes after vectorization on functions where loops have been
vectorized.
The reason for running the 'extra vector passes' is
simplification/unswitching of the runtime checks created by LV, there
should be no need to run them if nothing got vectorized
To do that, a new dummy analysis ShouldRunExtraVectorPasses has been
added. If loops have been vectorized for a function, LV will cache the
analysis. At the moment it uses MadeCFGChanges as proxy for loop
vectorized, which isn't perfect (it could be too aggressive, e.g.
because no runtime checks have been added), but should be good enough
for now.
The extra passes are now managed by a new FunctionPassManager that
runs its passes only if ShouldRunExtraVectorPasses has been cached.
Without this patch, `-extra-vectorizer-passes` has the following
compile-time impact:
NewPM-O3: +4.86%
NewPM-ReleaseThinLTO: +3.56%
NewPM-ReleaseLTO-g: +7.17%
http://llvm-compile-time-tracker.com/compare.php?from=ead3979a92fc33add4710c4510d6906260dcb4ad&to=c292da649e2c6e88a31e702fdc474727d09c72bc&stat=instructions
With this patch, that gets reduced to
NewPM-O3: +1.43%
NewPM-ReleaseThinLTO: +1.00%
NewPM-ReleaseLTO-g: +1.58%
http://llvm-compile-time-tracker.com/compare.php?from=ead3979a92fc33add4710c4510d6906260dcb4ad&to=e67d86b57810011cf285eb9aa1944781be6096f0&stat=instructions
It is probably still too high to enable by default, but much better.
Reviewed By: aeubanks
Differential Revision: https://reviews.llvm.org/D115052
Reverts 02940d6d22. Fixes breakage in the modules build.
LLVM loops cannot represent irreducible structures in the CFG. This
change introduce the concept of cycles as a generalization of loops,
along with a CycleInfo analysis that discovers a nested
hierarchy of such cycles. This is based on Havlak (1997), Nesting of
Reducible and Irreducible Loops.
The cycle analysis is implemented as a generic template and then
instatiated for LLVM IR and Machine IR. The template relies on a new
GenericSSAContext template which must be specialized when used for
each IR.
This review is a restart of an older review request:
https://reviews.llvm.org/D83094
Original implementation by Nicolai Hähnle <nicolai.haehnle@amd.com>,
with recent refactoring by Sameer Sahasrabuddhe <sameer.sahasrabuddhe@amd.com>
Differential Revision: https://reviews.llvm.org/D112696
Summary:
A new option test-changed is defined that allows one to specify an
exe that is called after each pass in the opt pipeline that changes the IR.
The test-changed=exe option saves the IR in a temporary file and calls exe
with the name of the file and the name of the pass that just changed it after
each pass alters the IR. exe is also called with the initial IR. This
can be used, for example, to determine which pass corrupts the IR by having
exe as a script that calls llc and runs a test to see after which pass the
results change. The print-changed filtering options are respected.
Author: Jamie Schmeiser <schmeise@ca.ibm.com>
Reviewed By: aeubanks (Arthur Eubanks)
Differential Revision: https://reviews.llvm.org/D110776
Summary:
The Colours array is apparently the source of TSAN errors. It is
unnecessary and was there to ease readability of the code. Remove it to
clean up the TSAN errors.
Author: Jamie Schmeiser <schmeise@ca.ibm.com>
Reviewed By: aeubanks (Arthur Eubanks)
Differential Revision: https://reviews.llvm.org/D115175
LLVM loops cannot represent irreducible structures in the CFG. This
change introduce the concept of cycles as a generalization of loops,
along with a CycleInfo analysis that discovers a nested
hierarchy of such cycles. This is based on Havlak (1997), Nesting of
Reducible and Irreducible Loops.
The cycle analysis is implemented as a generic template and then
instatiated for LLVM IR and Machine IR. The template relies on a new
GenericSSAContext template which must be specialized when used for
each IR.
This review is a restart of an older review request:
https://reviews.llvm.org/D83094
Original implementation by Nicolai Hähnle <nicolai.haehnle@amd.com>,
with recent refactoring by Sameer Sahasrabuddhe <sameer.sahasrabuddhe@amd.com>
Differential Revision: https://reviews.llvm.org/D112696
MergeFunctions (as well as HotColdSplitting an IROutliner) are
incorrectly scheduled under the new pass manager. The code makes
it look like they run towards the end of the module optimization
pipeline (as they should), while in reality the run at the start.
This is because the OptimizePM populated around them is only
scheduled later.
I'm fixing this by moving these three passes until after OptimizePM
to avoid splitting the function pass pipeline. It doesn't seem
important to me that some of the function passes run after these
late module passes.
Differential Revision: https://reviews.llvm.org/D115098
Swap AIC and IC neighbouring in pipeline. This looks more natural and even
almost has no effect for now (three slightly touched tests of test-suite). Also
this could be the first step towards merging AIC (or its part) to -O2 pipeline.
After several changes in AIC (like D108091, D108201, D107766, D109515, D109236)
there've been observed several regressions (like PR52078, PR52253, PR52289)
that were fixed in different passes (see D111330, D112721) by extending their
functionality, but these regressions were exposed since changed AIC prevents IC
from making some of early optimizations.
This is common problem and it should be fixed by just moving AIC after IC
which looks more logically by itself: make aggressive instruction combining
only after failed ordinary one.
Fixes PR52289
Reviewed By: spatel, RKSimon
Differential Revision: https://reviews.llvm.org/D113179
Add an -enable-merge-functions option to allow testing of function
merging as it will actually happen in the optimization pipeline.
Based on that add a test where we currently produce two identical
functions without merging them due to incorrect pass scheduling
under the new pass manager.
The FunctionSimplificationPipeline could effectively reduce the size of .text section when module inliner is enabled.
Reviewed By: kazu
Differential Revision: https://reviews.llvm.org/D114704
In a CGSCC pass manager, we may visit the same function multiple times
due to SCC mutations. In the inliner pipeline, this results in running
the function simplification pipeline on a function multiple times even
if it hasn't been changed since the last function simplification
pipeline run.
We use a newly introduced analysis to keep track of whether or not a
function has changed since the last time the function simplification
pipeline has run on it. If we see this analysis available for a function
in a CGSCCToFunctionPassAdaptor, we skip running the function passes on
the function. The analysis is queried at the end of the function passes
so that it's available after the first time the function simplification
pipeline runs on a function. This is a per-adaptor option so it doesn't
apply to every adaptor.
The goal of this is to improve compile times. However, currently we
can't turn this on by default at least for the higher optimization
levels since the function simplification pipeline is not robust enough
to be idempotent in many cases, resulting in performance regressions if
we stop running the function simplification pipeline on a function
multiple times. We may be able to turn this on for -O1 in the near
future, but turning this on for higher optimization levels would require
more investment in the function simplification pipeline.
Heavily inspired by D98103.
Example compile time improvements with flag turned on:
https://llvm-compile-time-tracker.com/compare.php?from=998dc4a5d3491d2ae8cbe742d2e13bc1b0cacc5f&to=5c27c913687d3d5559ef3ab42b5a3d513531d61c&stat=instructions
Reviewed By: asbirlea, nikic
Differential Revision: https://reviews.llvm.org/D113947
Previously, any change in any function in an SCC would cause all
analyses for all functions in the SCC to be invalidated. With this
change, we now manually invalidate analyses for functions we modify,
then let the pass manager know that all function analyses should be
preserved since we've already handled function analysis invalidation.
So far this only touches the inliner, argpromotion, function-attrs, and
updateCGAndAnalysisManager(), since they are the most used.
This is part of an effort to investigate running the function
simplification pipeline less on functions we visit multiple times in the
inliner pipeline.
However, this causes major memory regressions especially on larger IR.
To counteract this, turn on the option to eagerly invalidate function
analyses. This invalidates analyses on functions immediately after
they're processed in a module or scc to function adaptor for specific
parts of the pipeline.
Within an SCC, if a pass only modifies one function, other functions in
the SCC do not have their analyses invalidated, so in later function
passes in the SCC pass manager the analyses may still be cached. It is
only after the function passes that the eager invalidation takes effect.
For the default pipelines this makes sense because the inliner pipeline
runs the function simplification pipeline after all other SCC passes
(except CoroSplit which doesn't request any analyses).
Overall this has mostly positive effects on compile time and positive effects on memory usage.
https://llvm-compile-time-tracker.com/compare.php?from=7f627596977624730f9298a1b69883af1555765e&to=39e824e0d3ca8a517502f13032dfa67304841c90&stat=instructionshttps://llvm-compile-time-tracker.com/compare.php?from=7f627596977624730f9298a1b69883af1555765e&to=39e824e0d3ca8a517502f13032dfa67304841c90&stat=max-rss
D113196 shows that we slightly regressed compile times in exchange for
some memory improvements when turning on eager invalidation. D100917
shows that we slightly improved compile times in exchange for major
memory regressions in some cases when invalidating less in SCC passes.
Turning these on at the same time keeps the memory improvements while
keeping compile times neutral/slightly positive.
Reviewed By: asbirlea, nikic
Differential Revision: https://reviews.llvm.org/D113304
To be more consistent with other pass struct names.
There are still more passes that don't end with "Pass", but these are the important ones.
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D112935