Enable performing mandatory inlinings upfront, by reusing the same logic
as the full inliner, instead of the AlwaysInliner. This has the
following benefits:
- reduce code duplication - one inliner codebase
- open the opportunity to help the full inliner by performing additional
function passes after the mandatory inlinings, but before th full
inliner. Performing the mandatory inlinings first simplifies the problem
the full inliner needs to solve: less call sites, more contextualization, and,
depending on the additional function optimization passes run between the
2 inliners, higher accuracy of cost models / decision policies.
Note that this patch does not yet enable much in terms of post-always
inline function optimization.
Differential Revision: https://reviews.llvm.org/D91567
This change introduces a new clang switch `-fpseudo-probe-for-profiling` to enable AutoFDO with pseudo instrumentation. Please refer to https://reviews.llvm.org/D86193 for the whole story.
One implication from pseudo-probe instrumentation is that the profile is now sensitive to CFG changes. We perform the pseudo instrumentation very early in the pre-LTO pipeline, before any CFG transformation. This ensures that the CFG instrumented and annotated is stable and optimization-resilient.
The early instrumentation also allows the inliner to duplicate probes for inlined instances. When a probe along with the other instructions of a callee function are inlined into its caller function, the GUID of the callee function goes with the probe. This allows samples collected on inlined probes to be reported for the original callee function.
Reviewed By: wmi
Differential Revision: https://reviews.llvm.org/D86502
Reverting commit due to address sanitizer errors.
> Extracting the similar regions is the first step in the IROutliner.
>
> Using the IRSimilarityIdentifier, we collect the SimilarityGroups and
> sort them by how many instructions will be removed. Each
> IRSimilarityCandidate is used to define an OutlinableRegion. Each
> region is ordered by their occurrence in the Module and the regions that
> are not compatible with previously outlined regions are discarded.
>
> Each region is then extracted with the CodeExtractor into its own
> function.
>
> We test that correctly extract in:
> test/Transforms/IROutliner/extraction.ll
> test/Transforms/IROutliner/address-taken.ll
> test/Transforms/IROutliner/outlining-same-globals.ll
> test/Transforms/IROutliner/outlining-same-constants.ll
> test/Transforms/IROutliner/outlining-different-structure.ll
>
> Reviewers: paquette, jroelofs, yroux
>
> Differential Revision: https://reviews.llvm.org/D86975
This reverts commit bf899e8913.
Extracting the similar regions is the first step in the IROutliner.
Using the IRSimilarityIdentifier, we collect the SimilarityGroups and
sort them by how many instructions will be removed. Each
IRSimilarityCandidate is used to define an OutlinableRegion. Each
region is ordered by their occurrence in the Module and the regions that
are not compatible with previously outlined regions are discarded.
Each region is then extracted with the CodeExtractor into its own
function.
We test that correctly extract in:
test/Transforms/IROutliner/extraction.ll
test/Transforms/IROutliner/address-taken.ll
test/Transforms/IROutliner/outlining-same-globals.ll
test/Transforms/IROutliner/outlining-same-constants.ll
test/Transforms/IROutliner/outlining-different-structure.ll
Reviewers: paquette, jroelofs, yroux
Differential Revision: https://reviews.llvm.org/D86975
Currently, `-indvars` runs first, and then immediately after `-loop-idiom` does.
I'm not really sure if `-loop-idiom` requires `-indvars` to run beforehand,
but i'm *very* sure that `-indvars` requires `-loop-idiom` to run afterwards,
as it can be seen in the phase-ordering test.
LoopIdiom runs on two types of loops: countable ones, and uncountable ones.
For uncountable ones, IndVars obviously didn't make any change to them,
since they are uncountable, so for them the order should be irrelevant.
For countable ones, well, they should have been countable before IndVars
for IndVars to make any change to them, and since SCEV is used on them,
it shouldn't matter if IndVars have already canonicalized them.
So i don't really see why we'd want the current ordering.
Should this cause issues, it will give us a reproducer test case
that shows flaws in this logic, and we then could adjust accordingly.
While this is quite likely beneficial in-the-wild already,
it's a required part for the full motivational pattern
behind `left-shift-until-bittest` loop idiom (D91038).
Reviewed By: dmgreen
Differential Revision: https://reviews.llvm.org/D91800
This matches the legacy PM's EP_ModuleOptimizerEarly. Some backends use
this extension point and adding the pass somewhere else like
PipelineStartEPCallback doesn't work.
Reviewed By: ychen
Differential Revision: https://reviews.llvm.org/D91804
The devirtualization wrapper misses cases where if it wraps a pass
manager, an individual pass may devirtualize an indirect call created by
a previous pass. For example, inlining may create a new indirect call
which is devirtualized by instcombine. Currently the devirtualization
wrapper will not see that because it only checks cgscc edges at the very
beginning and end of the pass (manager) it wraps.
This fixes some tests testing this exact behavior in the legacy PM.
Instead of checking WeakTrackingVHs for CallBases at the very beginning
and end of the pass it wraps, check every time
updateCGAndAnalysisManagerForPass() is called.
check-llvm and check-clang with -abort-on-max-devirt-iterations-reached
on by default doesn't show any failures outside of tests specifically
testing it so it doesn't needlessly rerun passes more than necessary.
(The NPM -O2/3 pipeline run the inliner/function simplification pipeline
under a devirtualization repeater pass up to 4 times by default).
http://llvm-compile-time-tracker.com/?config=O3&stat=instructions&remote=aeubanks
shows that 7zip has ~1% compile time regression. I looked at it and saw
that there indeed was devirtualization happening that was not previously
caught, so now it reruns the CGSCC pipeline on some SCCs, which is WAI.
The initial land assumed CallBase WeakTrackingVHs would always be
CallBases, but they can be RAUW'd with undef.
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D89587
The devirtualization wrapper misses cases where if it wraps a pass
manager, an individual pass may devirtualize an indirect call created by
a previous pass. For example, inlining may create a new indirect call
which is devirtualized by instcombine. Currently the devirtualization
wrapper will not see that because it only checks cgscc edges at the very
beginning and end of the pass (manager) it wraps.
This fixes some tests testing this exact behavior in the legacy PM.
Instead of checking WeakTrackingVHs for CallBases at the very beginning
and end of the pass it wraps, check every time
updateCGAndAnalysisManagerForPass() is called.
check-llvm and check-clang with -abort-on-max-devirt-iterations-reached
on by default doesn't show any failures outside of tests specifically
testing it so it doesn't needlessly rerun passes more than necessary.
(The NPM -O2/3 pipeline run the inliner/function simplification pipeline
under a devirtualization repeater pass up to 4 times by default).
http://llvm-compile-time-tracker.com/?config=O3&stat=instructions&remote=aeubanks
shows that 7zip has ~1% compile time regression. I looked at it and saw
that there indeed was devirtualization happening that was not previously
caught, so now it reruns the CGSCC pipeline on some SCCs, which is WAI.
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D89587
Summary:
[NFC intended] Refactor the code for printChanged for reuse and to facilitate
subsequent reporters of changes to the IR in the new pass manager.
Create abstract template base classes for common functionality and give
classes more appropriate names. The base classes handle all of the
determination of when a function or pass is "interesting" and should be
reported or filtered out. They have pure virtual functions which are called
when a change by a pass has been recognized so the derived class need only
provide the overrides to present the information about the changing IR.
There are at least 2 more change reporters to come (which were presented
in my tutorial at the 2020 llvm developer's meeting) that derive from
these classes.
Respond to review comments: move function out of line, remove inline keyword,
remove unneeded qualifiers, simplify comparison.
Author: Jamie Schmeiser <schmeise@ca.ibm.com>
Reviewed By: aeubanks (Arthur Eubanks), madhur13490 (Madhur Amilkanthwar)
Differential Revision: https://reviews.llvm.org/D87000
This reuses the existing lower-matrix-intrinsics pass rather than going
the legacy pass route of creating a new pass.
Use this new variant in the NPM -O0 pipeline.
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D91811
This moves handling of alwaysinline, coroutines, matrix lowering, PGO,
and LTO-required passes into PassBuilder. Much of this is replicated
between Clang and opt. Other out-of-tree users also replicate some of
this, such as Rust [1] replicating the alwaysinline, LTO, and PGO
passes.
The LTO passes are also now run in
build(Thin)LTOPreLinkDefaultPipeline() since they are semantically
required for (Thin)LTO.
[1]: f5230fbf76/compiler/rustc_llvm/llvm-wrapper/PassWrapper.cpp (L896)
Reviewed By: tejohnson
Differential Revision: https://reviews.llvm.org/D91585
The design of the PreservedCFG Checker (landed with the commit
28012e00d8) has a fundamental flaw which makes it incorrect.
The checker is based on the PreservedAnalyses result returned
by functional passes: if CFGAnalyses is in the returned
PreservedAnalyses set, then the checker asserts that the CFG
snapshot saved before the pass is equal to the CFG snapshot
taken after the the pass. The problem is in passes that change
CFG and invalidate CFGAnalyses on their own. Such passes do not
return CFGanalyses in the returned PreservedAnalyses. So the
checker mistakenly expects CFG unchanged. As an example see the
class TestSimplifyCFGInvalidatingAnalysisPass in the new tests.
It is interesting that the bug was not found in LLVM. That is
because the CFG checker ran only if CFGAnalyses was checked
incorrectly:
if (!PassPA.allAnalysesInSetPreserved<CFGAnalyses>())
return;
but must be checked as follows:
auto PAC = PA.getChecker<PreservedCFGCheckerAnalysis>();
if (!(PAC.preserved() ||
PAC.preservedSet<AllAnalysesOn<Function>>() ||
PAC.preservedSet<CFGAnalyses>())
return;
A fully redesigned checker will be sent as a separate follow-up
patch.
Reviewed By: Serguei Katkov, Jakub Kuderski
Differential Revision: https://reviews.llvm.org/D91324
This patch adds a new pass to add !annotation metadata for entries in
@llvm.global.anotations, which is generated using
__attribute__((annotate("_name"))) on functions in Clang.
This has been discussed on llvm-dev as part of
RFC: Combining Annotation Metadata and Remarks
http://lists.llvm.org/pipermail/llvm-dev/2020-November/146393.html
Reviewed By: thegameg
Differential Revision: https://reviews.llvm.org/D91195
This is used to test RemoveRedundantDbgInstrs(), which is used by other
passes.
Reviewed By: ychen
Differential Revision: https://reviews.llvm.org/D91477
This patch adds a new !annotation metadata kind which can be used to
attach annotation strings to instructions.
It also adds a new pass that emits summary remarks per function with the
counts for each annotation kind.
The intended uses cases for this new metadata is annotating
'interesting' instructions and the remarks should provide additional
insight into transformations applied to a program.
To motivate this, consider these specific questions we would like to get answered:
* How many stores added for automatic variable initialization remain after optimizations? Where are they?
* How many runtime checks inserted by a frontend could be eliminated? Where are the ones that did not get eliminated?
Discussed on llvm-dev as part of 'RFC: Combining Annotation Metadata and Remarks'
(http://lists.llvm.org/pipermail/llvm-dev/2020-November/146393.html)
Reviewed By: thegameg, jdoerfert
Differential Revision: https://reviews.llvm.org/D91188
No longer rely on an external tool to build the llvm component layout.
Instead, leverage the existing `add_llvm_componentlibrary` cmake function and
introduce `add_llvm_component_group` to accurately describe component behavior.
These function store extra properties in the created targets. These properties
are processed once all components are defined to resolve library dependencies
and produce the header expected by llvm-config.
Differential Revision: https://reviews.llvm.org/D90848
Summary:
Add an option -print-before-changed that modifies the print-changed
behaviour so that it prints the IR before a pass that changed it in
addition to printing the IR after the pass. Note that the option
does nothing in isolation. The filtering options work as expected.
Lit tests are included.
Author: Jamie Schmeiser <schmeise@ca.ibm.com>
Reviewed By: aeubanks (Arthur Eubanks)
Differential Revision: https://reviews.llvm.org/D88757
Some targets may add required passes via
TargetMachine::registerPassBuilderCallbacks(). We need to run those even
under -O0. As an example, BPFTargetMachine adds
BPFAbstractMemberAccessPass, a required pass.
This also allows us to clean up BackendUtil.cpp (and out-of-tree Rust
usage of the NPM) by allowing us to share added passes like coroutines
and sanitizers between -O0 and other optimization levels.
Since callbacks may end up not adding passes, we need to check if the
pass managers are empty before adding them, so PassManager now has an
isEmpty() function. For example, polly adds callbacks but doesn't always
add passes in those callbacks, so this is necessary to keep
-debug-pass-manager tests' output from changing depending on if polly is
enabled or not.
Tests are a continuation of those added in
https://reviews.llvm.org/D89083.
Reviewed By: asbirlea, Meinersbur
Differential Revision: https://reviews.llvm.org/D89158
This is a prep step for widening induction variables in LoopFlatten if this is
posssible (D90640), to avoid having to perform certain overflow checks. Since
IndVarSimplify may already widen induction variables, we want to run
LoopFlatten just before IndVarSimplify. This is a minor reshuffle as both
passes were already close after each other.
Differential Revision: https://reviews.llvm.org/D90402
This converts LoopFlatten from a LoopPass to a FunctionPass so that we don't
run into problems of a loop pass deleting a (inner)loop.
Differential Revision: https://reviews.llvm.org/D90940
This instruments a should-run-optional-pass callback using the existing
OptBisect class to decide if new passes should be skipped. Passes that
force isRequired never reach this at all, so they are not included in
"BISECT:" output nor its pass count.
The test case is resurrected from r267022, an early version of D19172
that had new pass manager support (later reverted and redone without).
Reviewed By: aeubanks
Differential Revision: https://reviews.llvm.org/D87951
Some targets may add required passes via
TargetMachine::registerPassBuilderCallbacks(). We need to run those even
under -O0. As an example, BPFTargetMachine adds
BPFAbstractMemberAccessPass, a required pass.
This also allows us to clean up BackendUtil.cpp (and out-of-tree Rust
usage of the NPM) by allowing us to share added passes like coroutines
and sanitizers between -O0 and other optimization levels.
Tests are a continuation of those added in
https://reviews.llvm.org/D89083.
In order to prevent TargetMachines from adding unnecessary optimization
passes at -O0, TargetMachine::registerPassBuilderCallbacks() will be
changed to take an OptimizationLevel, but that will be done separately.
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D89158
This allows those instrumentation to log when they decide to skip a
pass. This provides extra helpful info for optnone functions and also
will help with opt-bisect.
Have OptNoneInstrumentation print when it skips due to seeing optnone.
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D90545
Make DebugLogging a member variable so that users of PassBuilder don't
need to pass it around so much.
Move call to TargetMachine::registerPassBuilderCallbacks() within
PassBuilder so users don't need to remember to call it.
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D90437
GVN Load PRE can split the backedge causing breaking the loop structure where the latch
contains the conditional branch with for example induction variable.
Different optimizations expect this form of the loop, so it is better to preserve it for some time.
This CL adds an option to control an ability to split backedge.
Default value is true so technically it is NFC and current behavior is not changed.
Reviewers: fedor.sergeev, mkazantsev, nikic, reames, fhahn
Reviewed By: mkazasntsev
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D89854
`-separate-const-offset-from-gep` has not yet be ported, so some tests are not updated.
Reviewed By: aeubanks
Differential Revision: https://reviews.llvm.org/D90149
This doesn't support -structurizecfg-skip-uniform-regions since that
would require porting LegacyDivergenceAnalysis.
The NPM doesn't support adding a non-analysis pass as a dependency of
another, so I had to add -lowerswitch to some tests or pin them to the
legacy PM.
This is the only RegionPass in tree, so I simply copied the logic for
finding all Regions from the legacy PM's RGManager into
StructurizeCFG::run().
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D89026
It was already disabled under -Oz in
buildFunctionSimplificationPipeline(), but not in
buildModuleOptimizationPipeline()/addPGOInstrPasses().
Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D89927