VectorizerStart extension is module callback in old PM, but is function
callback in new PM. We lack a module extension point between end of
buildModuleSimplificationPipeline and the function optimization
(including vectorizer) pipeline. So this patch adds a new module
extension point before the function optimization pipeline.
Reviewed By: aeubanks
Differential Revision: https://reviews.llvm.org/D122296
For MachO, lower `@llvm.global_dtors` into `@llvm_global_ctors` with
`__cxa_atexit` calls to avoid emitting the deprecated `__mod_term_func`.
Reuse the existing `WebAssemblyLowerGlobalDtors.cpp` to accomplish this.
Enable fallback to the old behavior via Clang driver flag
(`-fregister-global-dtors-with-atexit`) or llc / code generation flag
(`-lower-global-dtors-via-cxa-atexit`). This escape hatch will be
removed in the future.
Differential Revision: https://reviews.llvm.org/D121736
CoroSplit lowers various coroutine intrinsics. It's a CGSCC pass and
CGSCC passes don't run on unreachable functions. Normally GlobalDCE will
come along and delete unreachable functions, but we don't run GlobalDCE
under -O0, so an unreachable function with coroutine intrinsics may
never have CoroSplit run on it.
This patch adds GlobalDCE when coroutines intrinsics are present. It
also now runs all coroutine passes conditional when coroutine intrinsics
are present. This should also solve the -O0 regression reported in
D105877 due to LazyCallGraph construction.
Fixes https://github.com/llvm/llvm-project/issues/54117
Reviewed By: ChuanqiXu
Differential Revision: https://reviews.llvm.org/D122275
This adds a new option to control AllowSpeculation added in D119965 when
using `-passes=...`.
This allows reproducing #54023 using opt.
Reviewed By: aeubanks
Differential Revision: https://reviews.llvm.org/D121944
For MachO, lower `@llvm.global_dtors` into `@llvm_global_ctors` with
`__cxa_atexit` calls to avoid emitting the deprecated `__mod_term_func`.
Reuse the existing `WebAssemblyLowerGlobalDtors.cpp` to accomplish this.
Enable fallback to the old behavior via Clang driver flag
(`-fregister-global-dtors-with-atexit`) or llc / code generation flag
(`-lower-global-dtors-via-cxa-atexit`). This escape hatch will be
removed in the future.
Differential Revision: https://reviews.llvm.org/D121736
For MachO, lower `@llvm.global_dtors` into `@llvm_global_ctors` with
`__cxa_atexit` calls to avoid emitting the deprecated `__mod_term_func`.
Reuse the existing `WebAssemblyLowerGlobalDtors.cpp` to accomplish this.
Enable fallback to the old behavior via Clang driver flag
(`-fregister-global-dtors-with-atexit`) or llc / code generation flag
(`-lower-global-dtors-via-cxa-atexit`). This escape hatch will be
removed in the future.
Differential Revision: https://reviews.llvm.org/D121327
RequireAnalysis<GlobalsAA> doesn't actually recompute GlobalsAA.
GlobalsAA isn't invalidated (unless specifically invalidated) because
it's self-updating via ValueHandles, but can be imprecise during the
self-updates.
Rather than invalidating GlobalsAA, which would invalidate AAManager and
any analyses that use AAManager, create a new pass that recomputes
GlobalsAA.
Fixes#53131.
Differential Revision: https://reviews.llvm.org/D121167
This patch adds PrettyStackEntries before running passes. The entries
include the pass name and the IR unit the pass runs on.
The information is used the print additional information when a pass
crashes, including the name and a reference to the IR unit on which it
crashed. This is similar to the behavior of the legacy pass manager.
The improved stack trace now includes:
Stack dump:
0. Program arguments: bin/opt -loop-vectorize -force-vector-width=4 crash.ll
1. Running pass 'ModuleToFunctionPassAdaptor' on module 'crash.ll'
2. Running pass 'LoopVectorizePass' on function '@a'
Reviewed By: aeubanks
Differential Revision: https://reviews.llvm.org/D120993
This is a clean-up patch. The functional pass was rolled into the module pass in D112732.
Reviewed By: vitalybuka, aeubanks
Differential Revision: https://reviews.llvm.org/D120674
This PR adds two extension points to the default LTO pipeline in PassBuilder, one at the beginning and one at the end. These two extension points already existed in the old pass manager, the aim is to replicate the same functionality in the new one.
Reviewed By: aeubanks
Differential Revision: https://reviews.llvm.org/D120491
LICM will speculatively hoist code outside of loops. This requires removing information, like alias analysis (https://github.com/llvm/llvm-project/issues/53794), range information (https://bugs.llvm.org/show_bug.cgi?id=50550), among others. Prior to https://reviews.llvm.org/D99249 , LICM would only be run after LoopRotate. Running Loop Rotate prior to LICM prevents a instruction hoist from being speculative, if it was conditionally executed by the iteration (as is commonly emitted by clang and other frontends). Adding the additional LICM pass first, however, forces all of these instructions to be considered speculative, even if they are not speculative after LoopRotate. This destroys information, resulting in performance losses for discarding this additional information.
This PR modifies LICM to accept a ``speculative'' parameter which allows LICM to be set to perform information-loss speculative hoists or not. Phase ordering is then modified to not perform the information-losing speculative hoists until after loop rotate is performed, preserving this additional information.
Reviewed By: lebedev.ri
Differential Revision: https://reviews.llvm.org/D119965
The LTO support for OpenMP offloading allows us to run the OpenMPOpt
pass during the LTO pipeline. This patch introduces an early run of the
Module pass and a late run of the CGSCC pass. These are quick no-ops if
there is no OpenMP in the module.
Depends on D118198
Differential Revision: https://reviews.llvm.org/D118611
Adding -debugify and -check-debugify in the PassRegistry will make
sure the passes are listed properly by -print-pipeline-passes as
well as -print-passes.
It also allows removal of the custom pipeline parsing callback that
has been used in the NewPMDriver.
Differential Revision: https://reviews.llvm.org/D118369
In D110057 we moved LoopFlatten to a LoopPassManager. This caused a performance
regression for our 64-bit targets (the 32-bit were unaffected), the pass is no
longer triggering for a motivating example. The reason is that the IR is just
very different than expected; we try to match loop statements and particular
uses of induction variables. The easiest is to just move LoopFlatten to a place
in the pipeline where the IR is as expected, which is just before
IndVarSimplify. This means we move it from LPM2 to LPM1, so that it actually
runs just a bit earlier from where it was running before. IndVarSimplify is
responsible for significant rewrites that are difficult to "look through" in
LoopFlatten.
Differential Revision: https://reviews.llvm.org/D116612
This was reverted because of a performance regression, which is fixed by
D116612 that I will commit directly after this change.
This reverts commit e92d63b467.
The global state refers to the number of the nodes currently in the
module, and the number of direct calls between nodes, across the
module.
Node counts are not a problem; edge counts are because we want strictly
the kind of edges that affect inlining (direct calls), and that is not
easily obtainable without iteration over the whole module.
This patch avoids relying on analysis invalidation because it turned out
to be too aggressive in some cases. It leverages the fact that Node
objects are stable - they do not get deleted while cgscc passes are
run over the module; and cgscc pass manager invariants.
Reviewed By: aeubanks
Differential Revision: https://reviews.llvm.org/D115847
This creates a way to configure MSAN to for eager checks that will be leveraged
by the introduction of a clang flag (-fsanitize-memory-param-retval).
This is redundant with the existing flag: -mllvm -msan-eager-checks.
Reviewed By: vitalybuka
Differential Revision: https://reviews.llvm.org/D116855
This commit caused performance regressions due to differences in the
expected code during loop flattening. Reverting it until the fix is
ready, which hopefully wont take too long.
This reverts commit 86825fc2fb.
This patch adds a couple of NewPM function passes (dot-dom and
dot-dom-only) that dump DomTree into .dot files.
Reviewed-By: aeubanks
Differential Revision: https://reviews.llvm.org/D116629
In D109958 it was noticed that we could optimise the pipeline and avoid
rerunning LoopSimplify/LCSSA for LoopFlatten by moving it to a LoopPassManager.
Differential Revision: https://reviews.llvm.org/D110057
Summary:
A new option exec-on-ir-changed is defined that allows one to specify an
exe that is called after each pass in the opt pipeline that changes the IR.
The exec-on-ir-change=exe option saves the IR in a temporary file and calls exe
with the name of the file and the name of the pass that just changed it after
each pass alters the IR. exe is also called with the initial IR. This
can be used, for example, to determine which pass corrupts the IR by having
exe as a script that calls llc and runs a test to see after which pass the
results change. The print-changed filtering options are respected. Note that
this is only supported with the new pass manager.
Author: Jamie Schmeiser <schmeise@ca.ibm.com>
Reviewed By: aeubanks (Arthur Eubanks)
Differential Revision: https://reviews.llvm.org/D110776
This patch uses a similar trick as in D113947 to only run the extra
passes after vectorization on functions where loops have been
vectorized.
The reason for running the 'extra vector passes' is
simplification/unswitching of the runtime checks created by LV, there
should be no need to run them if nothing got vectorized
To do that, a new dummy analysis ShouldRunExtraVectorPasses has been
added. If loops have been vectorized for a function, LV will cache the
analysis. At the moment it uses MadeCFGChanges as proxy for loop
vectorized, which isn't perfect (it could be too aggressive, e.g.
because no runtime checks have been added), but should be good enough
for now.
The extra passes are now managed by a new FunctionPassManager that
runs its passes only if ShouldRunExtraVectorPasses has been cached.
Without this patch, `-extra-vectorizer-passes` has the following
compile-time impact:
NewPM-O3: +4.86%
NewPM-ReleaseThinLTO: +3.56%
NewPM-ReleaseLTO-g: +7.17%
http://llvm-compile-time-tracker.com/compare.php?from=ead3979a92fc33add4710c4510d6906260dcb4ad&to=c292da649e2c6e88a31e702fdc474727d09c72bc&stat=instructions
With this patch, that gets reduced to
NewPM-O3: +1.43%
NewPM-ReleaseThinLTO: +1.00%
NewPM-ReleaseLTO-g: +1.58%
http://llvm-compile-time-tracker.com/compare.php?from=ead3979a92fc33add4710c4510d6906260dcb4ad&to=e67d86b57810011cf285eb9aa1944781be6096f0&stat=instructions
It is probably still too high to enable by default, but much better.
Reviewed By: aeubanks
Differential Revision: https://reviews.llvm.org/D115052
Reverts 02940d6d22. Fixes breakage in the modules build.
LLVM loops cannot represent irreducible structures in the CFG. This
change introduce the concept of cycles as a generalization of loops,
along with a CycleInfo analysis that discovers a nested
hierarchy of such cycles. This is based on Havlak (1997), Nesting of
Reducible and Irreducible Loops.
The cycle analysis is implemented as a generic template and then
instatiated for LLVM IR and Machine IR. The template relies on a new
GenericSSAContext template which must be specialized when used for
each IR.
This review is a restart of an older review request:
https://reviews.llvm.org/D83094
Original implementation by Nicolai Hähnle <nicolai.haehnle@amd.com>,
with recent refactoring by Sameer Sahasrabuddhe <sameer.sahasrabuddhe@amd.com>
Differential Revision: https://reviews.llvm.org/D112696
Summary:
A new option test-changed is defined that allows one to specify an
exe that is called after each pass in the opt pipeline that changes the IR.
The test-changed=exe option saves the IR in a temporary file and calls exe
with the name of the file and the name of the pass that just changed it after
each pass alters the IR. exe is also called with the initial IR. This
can be used, for example, to determine which pass corrupts the IR by having
exe as a script that calls llc and runs a test to see after which pass the
results change. The print-changed filtering options are respected.
Author: Jamie Schmeiser <schmeise@ca.ibm.com>
Reviewed By: aeubanks (Arthur Eubanks)
Differential Revision: https://reviews.llvm.org/D110776
Summary:
The Colours array is apparently the source of TSAN errors. It is
unnecessary and was there to ease readability of the code. Remove it to
clean up the TSAN errors.
Author: Jamie Schmeiser <schmeise@ca.ibm.com>
Reviewed By: aeubanks (Arthur Eubanks)
Differential Revision: https://reviews.llvm.org/D115175
LLVM loops cannot represent irreducible structures in the CFG. This
change introduce the concept of cycles as a generalization of loops,
along with a CycleInfo analysis that discovers a nested
hierarchy of such cycles. This is based on Havlak (1997), Nesting of
Reducible and Irreducible Loops.
The cycle analysis is implemented as a generic template and then
instatiated for LLVM IR and Machine IR. The template relies on a new
GenericSSAContext template which must be specialized when used for
each IR.
This review is a restart of an older review request:
https://reviews.llvm.org/D83094
Original implementation by Nicolai Hähnle <nicolai.haehnle@amd.com>,
with recent refactoring by Sameer Sahasrabuddhe <sameer.sahasrabuddhe@amd.com>
Differential Revision: https://reviews.llvm.org/D112696
MergeFunctions (as well as HotColdSplitting an IROutliner) are
incorrectly scheduled under the new pass manager. The code makes
it look like they run towards the end of the module optimization
pipeline (as they should), while in reality the run at the start.
This is because the OptimizePM populated around them is only
scheduled later.
I'm fixing this by moving these three passes until after OptimizePM
to avoid splitting the function pass pipeline. It doesn't seem
important to me that some of the function passes run after these
late module passes.
Differential Revision: https://reviews.llvm.org/D115098
Swap AIC and IC neighbouring in pipeline. This looks more natural and even
almost has no effect for now (three slightly touched tests of test-suite). Also
this could be the first step towards merging AIC (or its part) to -O2 pipeline.
After several changes in AIC (like D108091, D108201, D107766, D109515, D109236)
there've been observed several regressions (like PR52078, PR52253, PR52289)
that were fixed in different passes (see D111330, D112721) by extending their
functionality, but these regressions were exposed since changed AIC prevents IC
from making some of early optimizations.
This is common problem and it should be fixed by just moving AIC after IC
which looks more logically by itself: make aggressive instruction combining
only after failed ordinary one.
Fixes PR52289
Reviewed By: spatel, RKSimon
Differential Revision: https://reviews.llvm.org/D113179
Add an -enable-merge-functions option to allow testing of function
merging as it will actually happen in the optimization pipeline.
Based on that add a test where we currently produce two identical
functions without merging them due to incorrect pass scheduling
under the new pass manager.
The FunctionSimplificationPipeline could effectively reduce the size of .text section when module inliner is enabled.
Reviewed By: kazu
Differential Revision: https://reviews.llvm.org/D114704