2014-04-21 16:08:50 +08:00
//===- PassRegistry.def - Registry of passes --------------------*- C++ -*-===//
//
// The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//===----------------------------------------------------------------------===//
//
// This file is used as the registry of passes that are part of the core LLVM
// libraries. This file describes both transformation passes and analyses.
// Analyses are registered while transformation passes have names registered
// that can be used when providing a textual pass pipeline.
//
//===----------------------------------------------------------------------===//

// NOTE: NO INCLUDE GUARD DESIRED!
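The file has no include guard, so a client can `#define` only the macros it needs and then `#include` the registry. The trailing `#undef`s reset everything before the next inclusion, and the `#ifndef` blocks supply empty defaults for any macro the client left undefined. Below is a minimal, self-contained sketch of that X-macro pattern. Instead of including this file, it inlines a two-entry table as the hypothetical `REGISTRY_ENTRIES` macro. The `NoOpModulePass` and `VerifierPass` structs here are toy stand-ins, not LLVM's classes.

```cpp
#include <cassert>
#include <string>

// Hypothetical toy pass types; only what they report matters here.
struct NoOpModulePass {
  const char *kind() const { return "no-op"; }
};
struct VerifierPass {
  const char *kind() const { return "verifier"; }
};

// Hypothetical inline stand-in for the registry's contents. A real client
// would #include "PassRegistry.def" at each expansion point instead.
#define REGISTRY_ENTRIES                                                       \
  MODULE_PASS("no-op-module", NoOpModulePass())                                \
  MODULE_PASS("verify", VerifierPass())

// Expansion 1: is this pipeline element a registered module pass name?
bool isModulePassName(const std::string &Name) {
#define MODULE_PASS(NAME, CREATE_PASS)                                         \
  if (Name == NAME)                                                            \
    return true;
  REGISTRY_ENTRIES
#undef MODULE_PASS
  return false;
}

// Expansion 2: same table, different meaning; construct the named pass.
std::string createdPassKind(const std::string &Name) {
#define MODULE_PASS(NAME, CREATE_PASS)                                         \
  if (Name == NAME)                                                            \
    return CREATE_PASS.kind();
  REGISTRY_ENTRIES
#undef MODULE_PASS
  return "";
}
```

Each expansion reads the same table, so adding one `MODULE_PASS` line makes a pass visible to every consumer at once.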
2014-04-21 16:20:10 +08:00
#ifndef MODULE_ANALYSIS
#define MODULE_ANALYSIS(NAME, CREATE_PASS)
#endif
2016-03-10 19:24:11 +08:00
MODULE_ANALYSIS("callgraph", CallGraphAnalysis())
2014-04-21 16:20:10 +08:00
MODULE_ANALYSIS("lcg", LazyCallGraphAnalysis())
2016-08-12 21:53:02 +08:00
MODULE_ANALYSIS("module-summary", ModuleSummaryIndexAnalysis())
2015-01-06 10:50:06 +08:00
MODULE_ANALYSIS("no-op-module", NoOpModuleAnalysis())
2016-06-04 06:54:26 +08:00
MODULE_ANALYSIS("profile-summary", ProfileSummaryAnalysis())
2018-11-27 07:05:48 +08:00
MODULE_ANALYSIS("stack-safety", StackSafetyGlobalAnalysis())
2015-01-15 19:39:46 +08:00
MODULE_ANALYSIS("targetlibinfo", TargetLibraryAnalysis())
2016-05-10 03:57:29 +08:00
MODULE_ANALYSIS("verify", VerifierAnalysis())
2018-09-21 01:08:45 +08:00
MODULE_ANALYSIS("pass-instrumentation", PassInstrumentationAnalysis(PIC))
2016-03-11 17:15:11 +08:00

#ifndef MODULE_ALIAS_ANALYSIS
2016-07-06 08:26:41 +08:00
#define MODULE_ALIAS_ANALYSIS(NAME, CREATE_PASS) \
2016-03-11 17:15:11 +08:00
  MODULE_ANALYSIS(NAME, CREATE_PASS)
#endif
MODULE_ALIAS_ANALYSIS("globals-aa", GlobalsAA())
#undef MODULE_ALIAS_ANALYSIS
2014-04-21 16:20:10 +08:00
#undef MODULE_ANALYSIS
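If a client does not define `MODULE_ALIAS_ANALYSIS`, the block above makes it fall back to `MODULE_ANALYSIS`. By default, then, alias analyses are registered as ordinary module analyses, while a client that cares about them specifically can still intercept them. The sketch below shows both cases using a hypothetical inline table, `ANALYSIS_SECTION`. The real file provides the fallback through `#ifndef`; the first function writes it out by hand.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical inline stand-in for the module-analysis section of the
// registry; the second argument is unused in this sketch.
#define ANALYSIS_SECTION                                                       \
  MODULE_ANALYSIS("callgraph", 0)                                              \
  MODULE_ALIAS_ANALYSIS("globals-aa", 0)

// Client that only cares about MODULE_ANALYSIS. The fallback below mirrors
// the registry's "#ifndef MODULE_ALIAS_ANALYSIS" default, so alias analyses
// are collected as well.
std::vector<std::string> allModuleAnalyses() {
  std::vector<std::string> Names;
#define MODULE_ANALYSIS(NAME, CREATE_PASS) Names.push_back(NAME);
#define MODULE_ALIAS_ANALYSIS(NAME, CREATE_PASS)                               \
  MODULE_ANALYSIS(NAME, CREATE_PASS)
  ANALYSIS_SECTION
#undef MODULE_ALIAS_ANALYSIS
#undef MODULE_ANALYSIS
  return Names;
}

// Client interested only in alias analyses: MODULE_ANALYSIS expands to nothing.
std::vector<std::string> aliasAnalysesOnly() {
  std::vector<std::string> Names;
#define MODULE_ANALYSIS(NAME, CREATE_PASS)
#define MODULE_ALIAS_ANALYSIS(NAME, CREATE_PASS) Names.push_back(NAME);
  ANALYSIS_SECTION
#undef MODULE_ALIAS_ANALYSIS
#undef MODULE_ANALYSIS
  return Names;
}
```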

2014-04-21 16:08:50 +08:00
#ifndef MODULE_PASS
#define MODULE_PASS(NAME, CREATE_PASS)
#endif
[PM] Port the always inliner to the new pass manager in a much more
minimal and boring form than the old pass manager's version.
This pass does the very minimal amount of work necessary to inline
functions declared as always-inline. It doesn't support a wide array of
things that the legacy pass manager did support, but is also ... about
20 lines of code. So it has that going for it. Notably things this
doesn't support:
- Array alloca merging
- To support the above, bottom-up inlining with careful history
tracking and call graph updates
- DCE of the functions that become dead after this inlining.
- Inlining through call instructions with the always_inline attribute.
Instead, it focuses on inlining functions with that attribute.
The first I've omitted because I'm hoping to just turn it off for the
primary pass manager. If that doesn't pan out, I can add it here but it
will be reasonably expensive to do so.
The second should really be handled by running global-dce after the
inliner. I don't want to re-implement the non-trivial logic necessary to
do comdat-correct DCE of functions. This means the -O0 pipeline will
have to be at least 'always-inline,global-dce', but that seems
reasonable to me. If others are seriously worried about this I'd like to
hear about it and understand why. Again, this is all solvable by
factoring that logic into a utility and calling it here, but I'd like to
wait to do that until there is a clear reason why the existing
pass-based factoring won't work.
The final point is a serious one. I can fairly easily add support for
this, but it seems both costly and a confusing construct for the use
case of the always inliner running at -O0. This attribute can of course
still impact the normal inliner easily (although I find that
a questionable re-use of the same attribute). I've started a discussion
to sort out what semantics we want here and based on that can figure out
if it makes sense to have this complexity at O0 or not.
One other advantage of this design is that it should be quite a bit
faster due to checking for whether the function is a viable candidate
for inlining exactly once per function instead of doing it for each call
site.
Anyways, hopefully a reasonable starting point for this pass.
Differential Revision: https://reviews.llvm.org/D23299
llvm-svn: 278896
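The closing point, checking viability once per function rather than once per call site, can be illustrated with a toy model. Everything here is hypothetical: `Function`, `CallSite`, `isInlineViable`, and the `ViabilityChecks` counter exist only to show the shape of the loop. None of it is `AlwaysInlinerPass` itself.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical, heavily simplified IR model (not LLVM's API).
struct Function {
  std::string Name;
  bool AlwaysInline;
  bool IsDeclaration;
};

struct CallSite {
  const Function *Caller;
  const Function *Callee;
};

// Counts how often the (potentially expensive) legality check runs.
int ViabilityChecks = 0;

// Stand-in for a real "can this body be inlined at all?" check.
bool isInlineViable(const Function &F) {
  ++ViabilityChecks;
  return !F.IsDeclaration; // no body, nothing to inline
}

// Checks each always-inline function once, then selects matching call sites
// with a cheap set lookup instead of re-checking the callee per call.
std::vector<CallSite> selectCallsToInline(const std::vector<Function> &Fns,
                                          const std::vector<CallSite> &Calls) {
  std::unordered_set<const Function *> Viable;
  for (const Function &F : Fns)
    if (F.AlwaysInline && isInlineViable(F))
      Viable.insert(&F);

  std::vector<CallSite> ToInline;
  for (const CallSite &CS : Calls)
    if (Viable.count(CS.Callee))
      ToInline.push_back(CS);
  return ToInline;
}
```

Whatever the number of calls to an always-inline function, its check runs once.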
2016-08-17 10:56:20 +08:00
MODULE_PASS("always-inline", AlwaysInlinerPass())
2017-10-25 21:40:08 +08:00
MODULE_PASS("called-value-propagation", CalledValuePropagationPass())
2018-07-16 08:28:24 +08:00
MODULE_PASS("cg-profile", CGProfilePass())
2016-05-05 08:51:09 +08:00
MODULE_PASS("constmerge", ConstantMergePass())
2016-07-09 11:25:35 +08:00
MODULE_PASS("cross-dso-cfi", CrossDSOCFIPass())
2016-06-12 17:16:39 +08:00
MODULE_PASS("deadargelim", DeadArgumentEliminationPass())
2016-05-05 10:37:32 +08:00
MODULE_PASS("elim-avail-extern", EliminateAvailableExternallyPass())
2015-12-27 16:13:45 +08:00
MODULE_PASS("forceattrs", ForceFunctionAttrsPass())
2016-07-19 05:22:24 +08:00
MODULE_PASS("function-import", FunctionImportPass())
2016-05-04 03:39:15 +08:00
MODULE_PASS("globaldce", GlobalDCEPass())
2016-04-26 08:28:01 +08:00
MODULE_PASS("globalopt", GlobalOptPass())
2016-11-21 08:28:23 +08:00
MODULE_PASS("globalsplit", GlobalSplitPass())
2018-10-03 13:55:20 +08:00
MODULE_PASS("hotcoldsplit", HotColdSplittingPass())
2015-12-27 16:41:34 +08:00
MODULE_PASS("inferattrs", InferFunctionAttrsPass())
2016-06-05 13:12:23 +08:00
MODULE_PASS("insert-gcov-profiling", GCOVProfilerPass())
2016-04-19 01:47:38 +08:00
MODULE_PASS("instrprof", InstrProfiling())
2016-06-05 13:15:45 +08:00
MODULE_PASS("internalize", InternalizePass())
2015-01-06 17:06:35 +08:00
MODULE_PASS("invalidate<all>", InvalidateAllAnalysesPass())
2016-05-06 05:05:36 +08:00
MODULE_PASS("ipsccp", IPSCCPPass())
2018-07-19 22:51:32 +08:00
MODULE_PASS("lowertypetests", LowerTypeTestsPass(nullptr, nullptr))
2016-09-17 01:18:16 +08:00
MODULE_PASS("name-anon-globals", NameAnonGlobalPass())
2015-01-06 10:37:55 +08:00
MODULE_PASS("no-op-module", NoOpModulePass())
2016-06-28 00:50:18 +08:00
MODULE_PASS("partial-inliner", PartialInlinerPass())
2016-05-17 00:31:07 +08:00
MODULE_PASS("pgo-icall-prom", PGOIndirectCallPromotion())
2016-05-06 13:49:19 +08:00
MODULE_PASS("pgo-instr-gen", PGOInstrumentationGen())
2016-05-11 05:59:52 +08:00
MODULE_PASS("pgo-instr-use", PGOInstrumentationUse())
2016-06-25 04:13:42 +08:00
MODULE_PASS("pre-isel-intrinsic-lowering", PreISelIntrinsicLoweringPass())
2016-06-04 06:54:26 +08:00
MODULE_PASS("print-profile-summary", ProfileSummaryPrinterPass(dbgs()))
2016-06-04 05:14:26 +08:00
MODULE_PASS("print-callgraph", CallGraphPrinterPass(dbgs()))
2016-06-04 06:54:26 +08:00
MODULE_PASS("print", PrintModulePass(dbgs()))
2016-03-10 19:24:06 +08:00
MODULE_PASS("print-lcg", LazyCallGraphPrinterPass(dbgs()))
2016-06-18 17:17:32 +08:00
MODULE_PASS("print-lcg-dot", LazyCallGraphDOTPrinterPass(dbgs()))
2018-11-27 07:05:48 +08:00
MODULE_PASS("print-stack-safety", StackSafetyGlobalPrinterPass(dbgs()))
2017-12-15 17:32:11 +08:00
MODULE_PASS("rewrite-statepoints-for-gc", RewriteStatepointsForGC())
2016-07-26 04:52:00 +08:00
MODULE_PASS("rewrite-symbols", RewriteSymbolPass())
[PM] Port ReversePostOrderFunctionAttrs to the new PM
Below are my super rough notes when porting. They can probably serve as
a basic guide for porting other passes to the new PM. As I port more
passes I'll expand and generalize this and make a proper
docs/HowToPortToNewPassManager.rst document. There is also missing
documentation for general concepts and APIs in the new PM, which still
needs to be written.
Once there is proper documentation in place we can put up a list of
passes that have to be ported and game-ify/crowdsource the rest of the
porting (at least of the middle end; the backend is still unclear).
I will however be taking personal responsibility for ensuring that the
LLD/ELF LTO pipeline is ported in a timely fashion. The remaining passes
to be ported are (do something like
`git grep "<the string in the bullet point below>"` to find the pass):
General Scalar:
[ ] Simplify the CFG
[ ] Jump Threading
[ ] MemCpy Optimization
[ ] Promote Memory to Register
[ ] MergedLoadStoreMotion
[ ] Lazy Value Information Analysis
General IPO:
[ ] Dead Argument Elimination
[ ] Deduce function attributes in RPO
Loop stuff / vectorization stuff:
[ ] Alignment from assumptions
[ ] Canonicalize natural loops
[ ] Delete dead loops
[ ] Loop Access Analysis
[ ] Loop Invariant Code Motion
[ ] Loop Vectorization
[ ] SLP Vectorizer
[ ] Unroll loops
Devirtualization / CFI:
[ ] Cross-DSO CFI
[ ] Whole program devirtualization
[ ] Lower bitset metadata
CGSCC passes:
[ ] Function Integration/Inlining
[ ] Remove unused exception handling info
[ ] Promote 'by reference' arguments to scalars
Please let me know if you are interested in working on any of the passes
in the above list (e.g. reply to the post-commit thread for this patch).
I'll probably be tackling "General Scalar" and "General IPO" first FWIW.
Steps as I port "Deduce function attributes in RPO"
---------------------------------------------------
(note: if you are doing any work based on these notes, please leave a
note in the post-commit review thread for this commit with any
improvements / suggestions / incompleteness you ran into!)
Note: "Deduce function attributes in RPO" is a module pass.
1. Do preparatory refactoring.
Do preparatory factoring. In this case all I had to do was to pull out a static helper (r272503).
(TODO: give more advice here e.g. if pass holds state or something)
2. Rename the old pass class.
llvm/lib/Transforms/IPO/FunctionAttrs.cpp
Rename class ReversePostOrderFunctionAttrs -> ReversePostOrderFunctionAttrsLegacyPass
in preparation for adding a class ReversePostOrderFunctionAttrs as the pass in the new PM.
(edit: actually wait what? The new class name will be
ReversePostOrderFunctionAttrsPass, so it doesn't conflict. So this step is
sort of useless churn).
llvm/include/llvm/InitializePasses.h
llvm/lib/LTO/LTOCodeGenerator.cpp
llvm/lib/Transforms/IPO/IPO.cpp
llvm/lib/Transforms/IPO/FunctionAttrs.cpp
Rename initializeReversePostOrderFunctionAttrsPass -> initializeReversePostOrderFunctionAttrsLegacyPassPass
(note that the "PassPass" thing falls out of `s/ReversePostOrderFunctionAttrs/ReversePostOrderFunctionAttrsLegacyPass/`)
Note that the INITIALIZE_PASS macro is what creates this identifier name, so renaming the class requires this renaming too.
Note that createReversePostOrderFunctionAttrsPass does not need to be
renamed since its name is not generated from the class name.
3. Add the new PM pass class.
In the new PM all passes need to have their
declaration in a header somewhere, so you will often need to add a header.
In this case
llvm/include/llvm/Transforms/IPO/FunctionAttrs.h is already there because
PostOrderFunctionAttrsPass was already ported.
The file-level comment from the .cpp file can be used as the file-level
comment for the new header. You may want to tweak the wording slightly
from "this file implements" to "this file provides" or similar.
Add declaration for the new PM pass in this header:
    class ReversePostOrderFunctionAttrsPass
        : public PassInfoMixin<ReversePostOrderFunctionAttrsPass> {
    public:
      PreservedAnalyses run(Module &M, AnalysisManager<Module> &AM);
    };
Its name should end with `Pass` for consistency (note that this doesn't
collide with the names of most old PM passes). E.g. call it
`<name of the old PM pass>Pass`.
Also, move the doxygen comment from the old PM pass to the declaration of
this class in the header.
Also, include the header that declares the new PM class
(`llvm/Transforms/IPO/FunctionAttrs.h`) at the top of the .cpp file (in this
case, it was already done when the other pass in this file was ported).
Now define the `run` method for the new class.
The main things here are:
a) Use AM.getResult<...>(M) to get results instead of `getAnalysis<...>()`
b) If the old PM pass would have returned "false" (i.e. `Changed ==
false`), then you should return PreservedAnalyses::all();
c) In the old PM getAnalysisUsage method, observe the calls
`AU.addPreserved<...>();`.
In the case `Changed == true`, for each preserved analysis you should
call `PA.preserve<...>()` on a PreservedAnalyses object and return it.
E.g.:
    PreservedAnalyses PA;
    PA.preserve<CallGraphAnalysis>();
    return PA;
Note that calls to skipModule/skipFunction are not supported in the new PM
currently, so optnone and optimization bisect support do not work. You can
just drop those calls for now.
4. Add the pass to the new PM pass registry to make it available in opt.
In llvm/lib/Passes/PassBuilder.cpp add a #include for your header.
`#include "llvm/Transforms/IPO/FunctionAttrs.h"`
In this case there is already an include (from when
PostOrderFunctionAttrsPass was ported).
Add your pass to llvm/lib/Passes/PassRegistry.def
In this case, I added
`MODULE_PASS("rpo-functionattrs", ReversePostOrderFunctionAttrsPass())`
The string is from the `INITIALIZE_PASS*` macros used in the old pass
manager.
Then choose a test that uses the pass and use the new PM `-passes=...` to
run it.
E.g. in this case there is a test that does:
; RUN: opt < %s -basicaa -functionattrs -rpo-functionattrs -S | FileCheck %s
I have added the line:
; RUN: opt < %s -aa-pipeline=basic-aa -passes='require<targetlibinfo>,cgscc(function-attrs),rpo-functionattrs' -S | FileCheck %s
The `-aa-pipeline=basic-aa` and
`require<targetlibinfo>,cgscc(function-attrs)` are what is needed to run
functionattrs in the new PM (note that in the new PM "functionattrs"
becomes "function-attrs" for some reason). This is just pulled from
`readattrs.ll` which contains the change from when functionattrs was ported
to the new PM.
Adding rpo-functionattrs causes the pass that was just ported to run.
llvm-svn: 272505
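Rules (b) and (c) from step 3 can be exercised without LLVM by using toy types. In this sketch, `ToyPreservedAnalyses`, `CallGraphAnalysis`, `OtherAnalysis`, `Module`, `deduceAttrsInRPO`, and `ToyRPOFunctionAttrsPass` are hypothetical mimics that only borrow ideas from the notes; none of them is LLVM's real class. The real `run` also takes an `AnalysisManager<Module> &` argument, which is omitted here.

```cpp
#include <cassert>
#include <set>
#include <string>

// Toy analysis tags; each only needs a unique identifier for this sketch.
struct CallGraphAnalysis { static const char *id() { return "callgraph"; } };
struct OtherAnalysis { static const char *id() { return "other"; } };

// Toy mimic of the preserved-set idea from the notes; not LLVM's class.
class ToyPreservedAnalyses {
  bool All = false;
  std::set<std::string> Preserved;

public:
  static ToyPreservedAnalyses all() {
    ToyPreservedAnalyses PA;
    PA.All = true;
    return PA;
  }
  template <typename AnalysisT> void preserve() {
    Preserved.insert(AnalysisT::id());
  }
  template <typename AnalysisT> bool isPreserved() const {
    return All || Preserved.count(AnalysisT::id()) != 0;
  }
  bool areAllPreserved() const { return All; }
};

struct Module {
  bool NeedsAttrs; // toy state: is there anything left to deduce?
};

// Hypothetical stand-in for the legacy pass's transformation logic.
bool deduceAttrsInRPO(Module &M) {
  bool Changed = M.NeedsAttrs;
  M.NeedsAttrs = false;
  return Changed;
}

struct ToyRPOFunctionAttrsPass {
  ToyPreservedAnalyses run(Module &M) {
    if (!deduceAttrsInRPO(M))
      return ToyPreservedAnalyses::all(); // rule (b): nothing changed
    ToyPreservedAnalyses PA;              // rule (c): keep only what the old
    PA.preserve<CallGraphAnalysis>();     // pass listed via addPreserved<>()
    return PA;
  }
};
```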
2016-06-12 15:48:51 +08:00
MODULE_PASS("rpo-functionattrs", ReversePostOrderFunctionAttrsPass())
2016-05-28 07:20:16 +08:00
MODULE_PASS("sample-profile", SampleProfileLoaderPass())
2015-10-31 07:28:12 +08:00
MODULE_PASS("strip-dead-prototypes", StripDeadPrototypesPass())
2018-01-10 03:39:35 +08:00
MODULE_PASS("synthetic-counts-propagation", SyntheticCountsPropagation())
2018-07-19 22:51:32 +08:00
MODULE_PASS("wholeprogramdevirt", WholeProgramDevirtPass(nullptr, nullptr))
2015-01-05 08:08:53 +08:00
MODULE_PASS("verify", VerifierPass())
2014-04-21 16:08:50 +08:00
#undef MODULE_PASS

2014-04-21 19:12:00 +08:00
#ifndef CGSCC_ANALYSIS
#define CGSCC_ANALYSIS(NAME, CREATE_PASS)
#endif
2015-01-06 10:50:06 +08:00
CGSCC_ANALYSIS("no-op-cgscc", NoOpCGSCCAnalysis())
[PM] Support invalidation of inner analysis managers from a pass over the outer IR unit.
Summary:
This never really got implemented, and was very hard to test before
a lot of the refactoring changes to make things more robust. But now we
can test it thoroughly and cleanly, especially at the CGSCC level.
The core idea is that when an inner analysis manager proxy receives the
invalidation event for the outer IR unit, it needs to walk the inner IR
units and propagate it to the inner analysis manager for each of those
units. For example, each function in the SCC needs to get an
invalidation event when the SCC gets one.
The function / module interaction is somewhat boring here. This really
becomes interesting in the face of analysis-backed IR units. This patch
effectively handles all of the CGSCC layer's needs -- both invalidating
SCC analysis and invalidating function analysis when an SCC gets
invalidated.
However, this second aspect doesn't really handle the
LoopAnalysisManager well at this point. That one will need some change
of design in order to fully integrate, because unlike the call graph,
the entire function behind a LoopAnalysis's results can vanish out from
under us, and we won't even have a cached API to access. I'd like to try
to separate solving the loop problems into a subsequent patch though in
order to keep this more focused so I've adapted them to the API and
updated the tests that immediately fail, but I've not added the level of
testing and validation at that layer that I have at the CGSCC layer.
An important aspect of this change is that the proxy for the
FunctionAnalysisManager at the SCC pass layer doesn't work like the
other proxies for an inner IR unit as it doesn't directly manage the
FunctionAnalysisManager and invalidation or clearing of it. This would
create an ever worsening problem of dual ownership of this
responsibility, split between the module-level FAM proxy and this
SCC-level FAM proxy. Instead, this patch changes the SCC-level FAM proxy
to work in terms of the module-level proxy and defer to it to handle
much of the updates. It only does SCC-specific invalidation. This will
become more important in subsequent patches that support more complex
invalidation scenarios.
Reviewers: jlebar
Subscribers: mehdi_amini, mcrosier, mzolotukhin, llvm-commits
Differential Revision: https://reviews.llvm.org/D27197
llvm-svn: 289317
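The core idea of the patch, forwarding an outer unit's invalidation to every inner unit it contains, can be sketched with a toy manager keyed by unit name. `InnerAnalysisManager` and `InnerAnalysisManagerProxy` here are hypothetical simplifications. The real proxies also consult `PreservedAnalyses`, and the SCC-level function proxy defers to the module-level one as described above; the sketch models none of this.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical model: caches analysis results per inner IR unit (for
// example, per function), keyed by unit name. Not LLVM's API.
class InnerAnalysisManager {
  std::map<std::string, std::set<std::string>> Cached;

public:
  void cacheResult(const std::string &Unit, const std::string &Analysis) {
    Cached[Unit].insert(Analysis);
  }
  void invalidateUnit(const std::string &Unit) { Cached.erase(Unit); }
  bool hasCachedResult(const std::string &Unit,
                       const std::string &Analysis) const {
    auto It = Cached.find(Unit);
    return It != Cached.end() && It->second.count(Analysis) != 0;
  }
};

// Proxy held by the outer (for example, SCC-level) manager. When the outer
// unit is invalidated, it walks that unit's inner units and forwards the event.
class InnerAnalysisManagerProxy {
  InnerAnalysisManager &InnerAM;

public:
  explicit InnerAnalysisManagerProxy(InnerAnalysisManager &AM) : InnerAM(AM) {}
  void invalidateOuterUnit(const std::vector<std::string> &InnerUnits) {
    for (const std::string &U : InnerUnits)
      InnerAM.invalidateUnit(U);
  }
};
```

Invalidating an SCC containing `f` and `g` clears only those functions' cached results; other functions keep theirs.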
2016-12-10 14:34:44 +08:00
CGSCC_ANALYSIS("fam-proxy", FunctionAnalysisManagerCGSCCProxy())
2018-09-21 01:08:45 +08:00
CGSCC_ANALYSIS("pass-instrumentation", PassInstrumentationAnalysis(PIC))
2014-04-21 19:12:00 +08:00
#undef CGSCC_ANALYSIS

#ifndef CGSCC_PASS
#define CGSCC_PASS(NAME, CREATE_PASS)
#endif
2017-02-10 07:46:27 +08:00
CGSCC_PASS("argpromotion", ArgumentPromotionPass())
2015-01-06 17:06:35 +08:00
CGSCC_PASS("invalidate<all>", InvalidateAllAnalysesPass())
2016-02-18 19:03:11 +08:00
CGSCC_PASS("function-attrs", PostOrderFunctionAttrsPass())
[PM] Provide an initial, minimal port of the inliner to the new pass manager.
This doesn't implement *every* feature of the existing inliner, but
tries to implement the most important ones for building a functional
optimization pipeline and beginning to sort out bugs, regressions, and
other problems.
Notable, but intentional omissions:
- No alloca merging support. Why? Because it isn't clear we want to do
this at all. Active discussion and investigation is going on to remove
it, so for simplicity I omitted it.
- No support for trying to iterate on "internally" devirtualized calls.
Why? Because it adds what I suspect is inappropriate coupling for
little or no benefit. We will have an outer iteration system that
tracks devirtualization including that from function passes and
iterates already. We should improve that rather than approximate it
here.
- Optimization remarks. Why? Purely to make the patch smaller, no other
reason at all.
The last one I'll probably work on almost immediately. But I wanted to
skip it in the initial patch to try to focus the change as much as
possible as there is already a lot of code moving around and both of
these *could* be skipped without really disrupting the core logic.
A summary of the different things happening here:
1) Adding the usual new PM class and rigging.
2) Fixing minor underlying assumptions in the inline cost analysis or
inline logic that don't generally hold in the new PM world.
3) Adding the core pass logic which is in essence a loop over the calls
in the nodes in the call graph. This is a bit duplicated from the old
inliner, but only a handful of lines could realistically be shared.
(I tried at first, and it really didn't help anything.) All told,
this is only about 100 lines of code, and most of that is the
mechanics of wiring up analyses from the new PM world.
4) Updating the LazyCallGraph (in the new PM) based on the *newly
inlined* calls and references. This is very minimal because we cannot
form cycles.
5) When inlining removes the last use of a function, eagerly nuking the
body of the function so that any "one use remaining" inline cost
heuristics are immediately refined, and queuing these functions to be
completely deleted once inlining is complete and the call graph
updated to reflect that they have become dead.
6) After all the inlining for a particular function, updating the
LazyCallGraph and the CGSCC pass manager to reflect the
function-local simplifications that are done immediately and
internally by the inline utilities. These are the exact same
fundamental set of CG updates done by arbitrary function passes.
7) Adding a bunch of test cases to specifically target CGSCC and other
subtle aspects in the new PM world.
Many thanks to the careful review from Easwaran and Sanjoy and others!
Differential Revision: https://reviews.llvm.org/D24226
llvm-svn: 290161
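Item 5 can be sketched with a toy function model: drop a function's body as soon as its last use is inlined, but delete it only after inlining finishes. `Func`, `noteCallInlined`, and `deleteDeadFunctions` are hypothetical. The `Discardable` flag is an added assumption. It stands in for the rule that only functions nothing outside the module can reference may be deleted.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified function model (not LLVM's API).
struct Func {
  std::string Name;
  int UseCount;
  bool HasBody;
  bool Discardable; // assumption: nothing outside the module can reference it
};

// Called after one call to Callee has been inlined. When that was the last
// use, the body is dropped right away (so later cost queries see a dead
// function), but deletion is only queued.
void noteCallInlined(Func &Callee, std::vector<Func *> &DeadQueue) {
  --Callee.UseCount;
  if (Callee.UseCount == 0 && Callee.Discardable) {
    Callee.HasBody = false;
    DeadQueue.push_back(&Callee);
  }
}

// Runs once inlining and call graph updates are complete.
void deleteDeadFunctions(std::vector<Func> &Fns,
                         std::vector<Func *> &DeadQueue) {
  // Copy the names first: erasing from Fns invalidates the queued pointers.
  std::vector<std::string> DeadNames;
  for (const Func *F : DeadQueue)
    DeadNames.push_back(F->Name);
  DeadQueue.clear();

  Fns.erase(std::remove_if(Fns.begin(), Fns.end(),
                           [&](const Func &F) {
                             return std::find(DeadNames.begin(),
                                              DeadNames.end(),
                                              F.Name) != DeadNames.end();
                           }),
            Fns.end());
}
```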
2016-12-20 11:15:32 +08:00
CGSCC_PASS("inline", InlinerPass())
2015-01-06 10:37:55 +08:00
CGSCC_PASS("no-op-cgscc", NoOpCGSCCPass())
2014-04-21 19:12:00 +08:00
#undef CGSCC_PASS

2014-04-21 16:20:10 +08:00
#ifndef FUNCTION_ANALYSIS
#define FUNCTION_ANALYSIS(NAME, CREATE_PASS)
#endif
2016-02-14 07:32:00 +08:00
FUNCTION_ANALYSIS("aa", AAManager())
2016-12-19 16:22:17 +08:00
FUNCTION_ANALYSIS("assumptions", AssumptionAnalysis())
2016-05-06 05:13:27 +08:00
FUNCTION_ANALYSIS("block-freq", BlockFrequencyAnalysis())
2016-05-05 10:59:57 +08:00
FUNCTION_ANALYSIS("branch-prob", BranchProbabilityAnalysis())
2015-01-14 18:19:28 +08:00
FUNCTION_ANALYSIS("domtree", DominatorTreeAnalysis())
2016-02-26 01:54:07 +08:00
FUNCTION_ANALYSIS("postdomtree", PostDominatorTreeAnalysis())
2016-04-19 07:55:01 +08:00
FUNCTION_ANALYSIS("demanded-bits", DemandedBitsAnalysis())
2016-02-26 01:54:15 +08:00
FUNCTION_ANALYSIS("domfrontier", DominanceFrontierAnalysis())
2015-01-20 18:58:50 +08:00
FUNCTION_ANALYSIS("loops", LoopAnalysis())
2016-06-14 06:01:25 +08:00
FUNCTION_ANALYSIS("lazy-value-info", LazyValueAnalysis())
2016-05-13 06:19:39 +08:00
FUNCTION_ANALYSIS("da", DependenceAnalysis())
2016-03-10 08:55:30 +08:00
FUNCTION_ANALYSIS("memdep", MemoryDependenceAnalysis())
2016-06-02 05:30:40 +08:00
FUNCTION_ANALYSIS("memoryssa", MemorySSAAnalysis())
2018-06-28 22:13:06 +08:00
FUNCTION_ANALYSIS("phi-values", PhiValuesAnalysis())
2016-02-26 01:54:25 +08:00
FUNCTION_ANALYSIS("regions", RegionInfoAnalysis())
2015-01-06 10:50:06 +08:00
FUNCTION_ANALYSIS("no-op-function", NoOpFunctionAnalysis())
2016-07-19 00:29:21 +08:00
FUNCTION_ANALYSIS("opt-remark-emit", OptimizationRemarkEmitterAnalysis())
[PM] Port ScalarEvolution to the new pass manager.
This change makes ScalarEvolution a stand-alone object and just produces
one from a pass as needed. Making this work well requires making the
object movable, using references instead of overwritten pointers in
a number of places, and other refactorings.
I've also wired it up to the new pass manager and added a RUN line to
a test to exercise it under the new pass manager. This includes basic
printing support much like with other analyses.
But there is a big and somewhat scary change here. Prior to this patch
ScalarEvolution was never *actually* invalidated!!! Re-running the pass
just re-wired up the various other analyses and didn't remove any of the
existing entries in the SCEV caches or clear out anything at all. This
might seem OK, as everything in SCEV that can do so uses ValueHandles to track
updates to the values that serve as SCEV keys. However, this still means
that as we ran SCEV over each function in the module, we kept
accumulating more and more SCEVs into the cache. At the end, we would
have a SCEV cache with every value that we ever needed a SCEV for in the
entire module!!! Yowzers. The releaseMemory routine would dump all of
this, but that isn't really called during normal runs of the pipeline as
far as I can see.
To make matters worse, there *is* actually a key that we don't update
with value handles -- there is a map keyed off of Loop*s. Because
LoopInfo *does* release its memory from run to run, it is entirely
possible to run SCEV over one function, then over another function, and
then look up a Loop* from the second function but find an entry inserted
for the first function! Ouch.
To make matters still worse, there are plenty of updates that *don't*
trip a value handle. It seems incredibly unlikely that today GVN or
another pass that invalidates SCEV can update values in *just* such
a way that a subsequent run of SCEV will incorrectly find lookups in
a cache, but it is theoretically possible and would be a nightmare to
debug.
With this refactoring, I've fixed all this by actually destroying and
recreating the ScalarEvolution object from run to run. Technically, this
could increase the amount of malloc traffic we see, but then again it is
also technically correct. ;] I don't actually think we're suffering from
tons of malloc traffic from SCEV because if we were, the fact that we
never clear the memory would seem more likely to have come up as an
actual problem before now. So, I've made the simple fix here. If in fact
there are serious issues with too much allocation and deallocation,
I can work on a clever fix that preserves the allocations (while
clearing the data) between each run, but I'd prefer to do that kind of
optimization with a test case / benchmark that shows why we need such
cleverness (and that can test that we actually make it faster). It's
possible that this will make some things faster by making the SCEV
caches have higher locality (due to being significantly smaller) so
until there is a clear benchmark, I think the simple change is best.
Differential Revision: http://reviews.llvm.org/D12063
llvm-svn: 245193
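The fix described above is to build a fresh, movable result on every run instead of re-wiring one long-lived object. It can be illustrated generically. `CachingResult` and `CachingAnalysis` are hypothetical stand-ins for `ScalarEvolution` and `ScalarEvolutionAnalysis`. The only point is that a cache keyed on IR addresses, like the `Loop*`-keyed map mentioned above, cannot outlive the run that filled it.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>

// Hypothetical stand-in for an analysis result whose cache is keyed on the
// addresses of IR objects.
class CachingResult {
  std::unordered_map<const void *, long> Cache;

public:
  template <typename ComputeFn>
  long getOrCompute(const void *Key, ComputeFn Compute) {
    auto It = Cache.find(Key);
    if (It != Cache.end())
      return It->second; // cache hit: reuse the earlier answer
    long Value = Compute();
    Cache.emplace(Key, Value);
    return Value;
  }
  std::size_t cacheSize() const { return Cache.size(); }
};

// Each run returns a brand-new (movable) result, so entries from a previous
// run, including ones whose key address has since been reused, cannot leak.
struct CachingAnalysis {
  CachingResult run() { return CachingResult(); }
};
```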
2015-08-17 10:08:17 +08:00
FUNCTION_ANALYSIS("scalar-evolution", ScalarEvolutionAnalysis())
2018-11-27 05:57:47 +08:00
FUNCTION_ANALYSIS("stack-safety-local", StackSafetyAnalysis())
[PM] Rework how the TargetLibraryInfo pass integrates with the new pass
manager to support the actual uses of it. =]
When I ported instcombine to the new pass manager I discovered that it
didn't work because TLI wasn't available in the right places. This is
a somewhat surprising and/or subtle aspect of the new pass manager
design that came up before but I think is useful to be reminded of:
While the new pass manager *allows* a function pass to query a module
analysis, it requires that the module analysis is already run and cached
prior to the function pass manager starting up, possibly with
a 'require<foo>' style utility in the pass pipeline. This is an
intentional hurdle because using a module analysis from a function pass
*requires* that the module analysis is run prior to entering the
function pass manager. Otherwise the other functions in the module could
be in who-knows-what state, etc.
A somewhat surprising consequence of this design decision (at least to
me) is that you have to design a function pass that leverages
a module analysis to do so as an optional feature. Even if that means
your function pass does no work in the absence of the module analysis,
you have to handle that possibility and remain conservatively correct.
This is a natural consequence of things being able to invalidate the
module analysis and us being unable to re-run it. And it's a generally
good thing because it lets us reorder passes arbitrarily without
breaking correctness, etc.
This ends up causing problems in one case. What if we have a module
analysis that is *definitionally* impossible to invalidate? In the
places this might come up, the analysis is usually also definitionally
trivial to run even while other transformation passes run on the module,
regardless of the state of anything. And so, it follows that it is
natural to have a hard requirement on such analyses from a function
pass.
It turns out that TargetLibraryInfo is just such an analysis, and
InstCombine has a hard requirement on it.
The approach I've taken here is to produce an analysis that models this
flexibility by making it both a module and a function analysis. This
exposes the fact that it is in fact safe to compute at any point. We can
even make it a valid CGSCC analysis at some point if that is useful.
However, we don't want to have a copy of the actual target library info
state for each function! This state is specific to the triple. The
somewhat direct and blunt approach here is to turn TLI into a pimpl,
with the state and mutators in the implementation class and the query
routines primarily in the wrapper. Then the analysis can lazily
construct and cache the implementations, keyed on the triple, and
on-demand produce wrappers of them for each function.
One minor annoyance is that we will end up with a wrapper for each
function in the module. While this is a bit wasteful (one pointer per
function) it seems tolerable. And it has the advantage of ensuring that
we pay the absolute minimum synchronization cost to access this
information should we end up with a nice parallel function pass manager
in the future. We could look into trying to mark when analysis results
are especially cheap to recompute and more eagerly GC-ing the cached
results, or we could look at supporting a variant of analyses whose
results are specifically *not* cached and expected to just be used and
discarded by the consumer. Either way, these seem like incremental
enhancements that should happen when we start profiling the memory and
CPU usage of the new pass manager and not before.
The other minor annoyance is that if we end up using the TLI in both
a module pass and a function pass, those will be produced by two
separate analyses, and thus will point to separate copies of the
implementation state. While a minor issue, I dislike this and would like
to find a way to cleanly allow a single analysis instance to be used
across multiple IR unit managers. But I don't have a good solution to
this today, and I don't want to hold up all of the work waiting to come
up with one. This too seems like a reasonable thing to incrementally
improve later.
llvm-svn: 226981
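The pimpl split described above puts per-triple state in an implementation object and has the analysis hand out cheap per-function wrappers. It might look roughly like the sketch below. `LibInfoImpl`, `LibInfo`, and `LibInfoAnalysis` are hypothetical simplifications, not LLVM's actual TargetLibraryInfo interfaces.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <set>
#include <string>
#include <utility>

// Hypothetical implementation class: all per-triple state and mutators.
class LibInfoImpl {
  std::string Triple;
  std::set<std::string> Unavailable;

public:
  explicit LibInfoImpl(std::string T) : Triple(std::move(T)) {}
  void setUnavailable(const std::string &Fn) { Unavailable.insert(Fn); }
  bool has(const std::string &Fn) const { return Unavailable.count(Fn) == 0; }
  const std::string &triple() const { return Triple; }
};

// Hypothetical per-function wrapper: one pointer; queries forward to Impl.
class LibInfo {
  const LibInfoImpl *Impl;

public:
  explicit LibInfo(const LibInfoImpl &I) : Impl(&I) {}
  bool has(const std::string &Fn) const { return Impl->has(Fn); }
  const LibInfoImpl &impl() const { return *Impl; }
};

// Hypothetical analysis: lazily builds and caches one implementation per
// triple, then produces wrappers on demand.
class LibInfoAnalysis {
  std::map<std::string, std::unique_ptr<LibInfoImpl>> Impls;

public:
  LibInfo getForFunction(const std::string &Triple) {
    std::unique_ptr<LibInfoImpl> &Slot = Impls[Triple];
    if (!Slot)
      Slot.reset(new LibInfoImpl(Triple));
    return LibInfo(*Slot);
  }
  std::size_t numImpls() const { return Impls.size(); }
};
```

Each wrapper costs one pointer, matching the "one pointer per function" overhead the message accepts.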
2015-01-24 10:06:09 +08:00
FUNCTION_ANALYSIS("targetlibinfo", TargetLibraryAnalysis())
2015-02-01 18:11:22 +08:00
FUNCTION_ANALYSIS("targetir",
                  TM ? TM->getTargetIRAnalysis() : TargetIRAnalysis())
2016-05-10 03:57:29 +08:00
FUNCTION_ANALYSIS("verify", VerifierAnalysis())
2018-09-21 01:08:45 +08:00
FUNCTION_ANALYSIS("pass-instrumentation", PassInstrumentationAnalysis(PIC))
2016-02-18 17:45:17 +08:00

#ifndef FUNCTION_ALIAS_ANALYSIS
#define FUNCTION_ALIAS_ANALYSIS(NAME, CREATE_PASS) \
  FUNCTION_ANALYSIS(NAME, CREATE_PASS)
#endif
FUNCTION_ALIAS_ANALYSIS("basic-aa", BasicAA())
2016-07-06 08:26:41 +08:00
FUNCTION_ALIAS_ANALYSIS("cfl-anders-aa", CFLAndersAA())
FUNCTION_ALIAS_ANALYSIS("cfl-steens-aa", CFLSteensAA())
2016-02-20 12:01:45 +08:00
FUNCTION_ALIAS_ANALYSIS("scev-aa", SCEVAA())
2016-02-20 12:03:06 +08:00
FUNCTION_ALIAS_ANALYSIS("scoped-noalias-aa", ScopedNoAliasAA())
2016-02-20 12:04:52 +08:00
FUNCTION_ALIAS_ANALYSIS("type-based-aa", TypeBasedAA())
2016-02-18 17:45:17 +08:00
#undef FUNCTION_ALIAS_ANALYSIS
2014-04-21 16:20:10 +08:00
#undef FUNCTION_ANALYSIS

2014-04-21 16:08:50 +08:00
#ifndef FUNCTION_PASS
#define FUNCTION_PASS(NAME, CREATE_PASS)
#endif
2016-02-20 11:46:03 +08:00
FUNCTION_PASS("aa-eval", AAEvaluator())
2015-10-31 07:13:18 +08:00
FUNCTION_PASS("adce", ADCEPass())
2016-06-16 05:51:30 +08:00
FUNCTION_PASS("add-discriminators", AddDiscriminatorsPass())
2018-01-25 20:06:32 +08:00
FUNCTION_PASS("aggressive-instcombine", AggressiveInstCombinePass())
2016-06-15 14:18:01 +08:00
FUNCTION_PASS("alignment-from-assumptions", AlignmentFromAssumptionsPass())
2016-05-25 09:57:04 +08:00
FUNCTION_PASS("bdce", BDCEPass())
2017-11-14 09:30:04 +08:00
FUNCTION_PASS("bounds-checking", BoundsCheckingPass())
2016-07-23 02:04:25 +08:00
FUNCTION_PASS("break-crit-edges", BreakCriticalEdgesPass())
Recommit r317351: Add CallSiteSplitting pass
This recommits r317351 after fixing a buildbot failure.
Original commit message:
Summary:
This change adds a pass that tries to split a call site so that it can pass
more constrained arguments when an argument is predicated on the control flow.
This exposes better context to later passes (e.g., the inliner, jump
threading, or IPA-CP based function cloning).
As of now we support two cases:
1) If a call site is dominated by an OR condition and any of its arguments
   are predicated on this OR condition, try to split the condition with more
   constrained arguments. For example, in the code below, we try to split the
   call site since we can predicate the argument (ptr) based on the OR condition.
   Split from:
     if (!ptr || c)
       callee(ptr);
   to:
     if (!ptr)
       callee(null ptr)     // set the known constant value
     else if (c)
       callee(nonnull ptr)  // set non-null attribute in the argument
2) We can also split a call site based on constant incoming values of a PHI.
   For example, from:
     BB0:
       %c = icmp eq i32 %i1, %i2
       br i1 %c, label %BB2, label %BB1
     BB1:
       br label %BB2
     BB2:
       %p = phi i32 [ 0, %BB0 ], [ 1, %BB1 ]
       call void @bar(i32 %p)
   to:
     BB0:
       %c = icmp eq i32 %i1, %i2
       br i1 %c, label %BB2-split0, label %BB1
     BB1:
       br label %BB2-split1
     BB2-split0:
       call void @bar(i32 0)
       br label %BB2
     BB2-split1:
       call void @bar(i32 1)
       br label %BB2
     BB2:
       %p = phi i32 [ 0, %BB2-split0 ], [ 1, %BB2-split1 ]
llvm-svn: 317362
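The two splitting cases can be mirrored at the C++ source level. This sketch is an illustration only and is not the pass's implementation, which rewrites LLVM IR. `callee`, `bar`, and the counters are hypothetical stand-ins used to observe the calls. Each `...After` function must behave exactly like its `...Before` counterpart, while every call site receives a more constrained argument.

```cpp
#include <cassert>

// Hypothetical observation hooks standing in for whatever the real callees do,
// so that the before/after versions can be compared.
static int calls = 0;
static int lastArgWasNull = -1;
static int lastBarValue = -1;

static void callee(const int *ptr) {
  ++calls;
  lastArgWasNull = (ptr == nullptr) ? 1 : 0;
}

static void bar(int v) {
  ++calls;
  lastBarValue = v;
}

// Case 1, before: a single call site whose argument is predicated on an OR.
void orBefore(const int *ptr, bool c) {
  if (!ptr || c)
    callee(ptr);
}

// Case 1, after: one call site per disjunct, each with a more constrained
// argument (the known null constant, or a pointer known to be non-null).
void orAfter(const int *ptr, bool c) {
  if (!ptr)
    callee(nullptr);
  else if (c)
    callee(ptr);
}

// Case 2, before: the argument is a PHI of two constants.
void phiBefore(int i1, int i2) {
  int p = (i1 == i2) ? 0 : 1;
  bar(p);
}

// Case 2, after: the call is duplicated into each predecessor, so each copy
// receives a constant argument.
void phiAfter(int i1, int i2) {
  if (i1 == i2)
    bar(0);
  else
    bar(1);
}
```

Once each call site has a constant or non-null argument, later passes such as the inliner can specialize it.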
2017-11-04 04:41:16 +08:00
FUNCTION_PASS("callsite-splitting", CallSiteSplittingPass())
2016-07-02 08:16:47 +08:00
FUNCTION_PASS("consthoist", ConstantHoistingPass())
2018-09-05 01:19:13 +08:00
FUNCTION_PASS("chr", ControlHeightReductionPass())
2016-07-07 07:26:29 +08:00
FUNCTION_PASS("correlated-propagation", CorrelatedValuePropagationPass())
2016-04-23 03:40:41 +08:00
FUNCTION_PASS("dce", DCEPass())
2017-09-09 21:38:18 +08:00
FUNCTION_PASS("div-rem-pairs", DivRemPairsPass())
2016-05-18 05:38:13 +08:00
FUNCTION_PASS("dse", DSEPass())
2018-06-30 01:48:58 +08:00
FUNCTION_PASS("dot-cfg", CFGPrinterPass())
FUNCTION_PASS("dot-cfg-only", CFGOnlyPrinterPass())
2016-09-01 03:24:10 +08:00
FUNCTION_PASS("early-cse", EarlyCSEPass(/*UseMemorySSA=*/false))
FUNCTION_PASS("early-cse-memssa", EarlyCSEPass(/*UseMemorySSA=*/true))
2017-11-15 05:09:45 +08:00
FUNCTION_PASS("ee-instrument", EntryExitInstrumenterPass(/*PostInlining=*/false))
FUNCTION_PASS("post-inline-ee-instrument", EntryExitInstrumenterPass(/*PostInlining=*/true))
2016-07-15 21:45:20 +08:00
FUNCTION_PASS("gvn-hoist", GVNHoistPass())
2015-01-24 12:19:17 +08:00
FUNCTION_PASS("instcombine", InstCombinePass())
2018-06-30 07:36:03 +08:00
FUNCTION_PASS("instsimplify", InstSimplifyPass())
2015-01-06 17:06:35 +08:00
FUNCTION_PASS("invalidate<all>", InvalidateAllAnalysesPass())
2016-06-25 07:32:02 +08:00
FUNCTION_PASS("float2int", Float2IntPass())
2015-01-06 10:37:55 +08:00
FUNCTION_PASS("no-op-function", NoOpFunctionPass())
Conditionally eliminate library calls where the result value is not used
Summary:
For library calls whose result is not used, this pass shrink-wraps the call
in a condition that guards it. For example:
  sqrt(val);
is transformed to
  if (val < 0)
    sqrt(val);
Even if the result of the library call is not used, the compiler cannot
safely delete the call because the function can set errno on error
conditions.
Note that for many functions, the error condition depends solely on the
incoming parameter. This optimization generates the condition that can lead
to errno being set and uses it to shrink-wrap the call. Since the chances of
hitting the error condition are low, the runtime call is effectively eliminated.
These partially dead calls are usually the result of the C++ abstraction
penalty exposed by inlining. This optimization hits 108 times in 19 C/C++
programs in SPEC2006.
Reviewers: hfinkel, mehdi_amini, davidxl
Subscribers: modocache, mgorny, mehdi_amini, xur, llvm-commits, beanz
Differential Revision: https://reviews.llvm.org/D24414
llvm-svn: 284542
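Here is a minimal sketch of the shrink-wrapping idea. It assumes a hypothetical `sqrtWithErrno` that records domain errors in a hypothetical `fakeErrno` global rather than the real `errno`. This is not the pass's code. It only shows why guarding the call with `val < 0` preserves the observable error state while skipping the call for valid inputs.

```cpp
#include <cassert>
#include <cerrno>
#include <cmath>

// Hypothetical stand-in for errno, so the sketch does not depend on whether a
// given platform's sqrt actually writes errno (math_errhandling varies).
static int fakeErrno = 0;
static int libcallCount = 0;

// Hypothetical library routine with sqrt's domain: for negative inputs it
// reports an error as a side effect; otherwise it has no side effects.
static double sqrtWithErrno(double val) {
  ++libcallCount;
  if (val < 0) {
    fakeErrno = EDOM;
    return NAN;
  }
  return std::sqrt(val);
}

// Before: the result is dead, but the call must stay for its side effect.
void deadResultCall(double val) { (void)sqrtWithErrno(val); }

// After shrink-wrapping: the call runs only when it could set the error
// state, which is expected to be the rare case.
void shrinkWrappedCall(double val) {
  if (val < 0)
    (void)sqrtWithErrno(val);
}
```

Note that `-0.0 < 0` is false, matching the fact that `sqrt(-0.0)` is not a domain error.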
2016-10-19 05:36:27 +08:00
FUNCTION_PASS("libcalls-shrinkwrap", LibCallsShrinkWrapPass())
2016-05-14 06:52:35 +08:00
FUNCTION_PASS("loweratomic", LowerAtomicPass())
2015-01-24 19:13:02 +08:00
FUNCTION_PASS("lower-expect", LowerExpectIntrinsicPass())
2016-07-29 06:08:41 +08:00
FUNCTION_PASS("lower-guard-intrinsic", LowerGuardIntrinsicPass())
2016-05-19 06:55:34 +08:00
FUNCTION_PASS("guard-widening", GuardWideningPass())
2016-03-11 16:50:55 +08:00
FUNCTION_PASS("gvn", GVN())
2016-07-09 11:03:01 +08:00
FUNCTION_PASS("loop-simplify", LoopSimplifyPass())
2017-01-20 16:42:19 +08:00
FUNCTION_PASS("loop-sink", LoopSinkPass())
2016-08-13 01:28:27 +08:00
FUNCTION_PASS("lowerinvoke", LowerInvokePass())
2016-06-14 11:22:22 +08:00
FUNCTION_PASS("mem2reg", PromotePass())
2016-06-14 10:44:55 +08:00
FUNCTION_PASS("memcpyopt", MemCpyOptPass())
2016-06-18 03:10:09 +08:00
FUNCTION_PASS("mldst-motion", MergedLoadStoreMotionPass())
2016-07-22 06:28:52 +08:00
FUNCTION_PASS("nary-reassociate", NaryReassociatePass())
2016-12-23 00:35:02 +08:00
FUNCTION_PASS("newgvn", NewGVNPass())
2016-06-14 08:51:09 +08:00
FUNCTION_PASS("jump-threading", JumpThreadingPass())
2016-05-26 07:38:53 +08:00
FUNCTION_PASS("partially-inline-libcalls", PartiallyInlineLibCallsPass())
2016-06-10 03:44:46 +08:00
FUNCTION_PASS("lcssa", LCSSAPass())
2016-08-13 12:11:27 +08:00
FUNCTION_PASS("loop-data-prefetch", LoopDataPrefetchPass())
2017-01-27 09:32:26 +08:00
FUNCTION_PASS("loop-load-elim", LoopLoadEliminationPass())
2016-07-19 00:29:27 +08:00
FUNCTION_PASS("loop-distribute", LoopDistributePass())
2016-07-10 06:56:50 +08:00
FUNCTION_PASS("loop-vectorize", LoopVectorizePass())
2017-04-05 00:42:20 +08:00
FUNCTION_PASS("pgo-memop-opt", PGOMemOPSizeOpt())
2014-04-21 16:08:50 +08:00
FUNCTION_PASS("print", PrintFunctionPass(dbgs()))
2016-12-19 16:22:17 +08:00
FUNCTION_PASS("print<assumptions>", AssumptionPrinterPass(dbgs()))
2016-05-06 05:13:27 +08:00
FUNCTION_PASS("print<block-freq>", BlockFrequencyPrinterPass(dbgs()))
2016-05-05 10:59:57 +08:00
FUNCTION_PASS("print<branch-prob>", BranchProbabilityPrinterPass(dbgs()))
2015-01-14 18:19:28 +08:00
FUNCTION_PASS("print<domtree>", DominatorTreePrinterPass(dbgs()))
2016-02-26 01:54:07 +08:00
FUNCTION_PASS("print<postdomtree>", PostDominatorTreePrinterPass(dbgs()))
2016-04-19 07:55:01 +08:00
FUNCTION_PASS("print<demanded-bits>", DemandedBitsPrinterPass(dbgs()))
2016-02-26 01:54:15 +08:00
FUNCTION_PASS("print<domfrontier>", DominanceFrontierPrinterPass(dbgs()))
2015-01-20 18:58:50 +08:00
FUNCTION_PASS("print<loops>", LoopPrinterPass(dbgs()))
2016-06-02 05:30:40 +08:00
FUNCTION_PASS("print<memoryssa>", MemorySSAPrinterPass(dbgs()))
2018-06-28 22:13:06 +08:00
FUNCTION_PASS("print<phi-values>", PhiValuesPrinterPass(dbgs()))
2016-02-26 01:54:25 +08:00
FUNCTION_PASS("print<regions>", RegionInfoPrinterPass(dbgs()))
[PM] Port ScalarEvolution to the new pass manager.
This change makes ScalarEvolution a stand-alone object and just produces
one from a pass as needed. Making this work well requires making the
object movable, using references instead of overwritten pointers in
a number of places, and other refactorings.
I've also wired it up to the new pass manager and added a RUN line to
a test to exercise it under the new pass manager. This includes basic
printing support much like with other analyses.
But there is a big and somewhat scary change here. Prior to this patch
ScalarEvolution was never *actually* invalidated!!! Re-running the pass
just re-wired up the various other analyses and didn't remove any of the
existing entries in the SCEV caches or clear out anything at all. This
might seem OK, as everything in SCEV that can do so uses ValueHandles to track
updates to the values that serve as SCEV keys. However, this still means
that as we ran SCEV over each function in the module, we kept
accumulating more and more SCEVs into the cache. At the end, we would
have a SCEV cache with every value that we ever needed a SCEV for in the
entire module!!! Yowzers. The releaseMemory routine would dump all of
this, but that isn't really called during normal runs of the pipeline as
far as I can see.
To make matters worse, there *is* actually a key that we don't update
with value handles -- there is a map keyed off of Loop*s. Because
LoopInfo *does* release its memory from run to run, it is entirely
possible to run SCEV over one function, then over another function, and
then look up a Loop* from the second function but find an entry inserted
for the first function! Ouch.
To make matters still worse, there are plenty of updates that *don't*
trip a value handle. It seems incredibly unlikely that today GVN or
another pass that invalidates SCEV can update values in *just* such
a way that a subsequent run of SCEV will incorrectly find lookups in
a cache, but it is theoretically possible and would be a nightmare to
debug.
With this refactoring, I've fixed all this by actually destroying and
recreating the ScalarEvolution object from run to run. Technically, this
could increase the amount of malloc traffic we see, but then again it is
also technically correct. ;] I don't actually think we're suffering from
tons of malloc traffic from SCEV because if we were, the fact that we
never clear the memory would seem more likely to have come up as an
actual problem before now. So, I've made the simple fix here. If in fact
there are serious issues with too much allocation and deallocation,
I can work on a clever fix that preserves the allocations (while
clearing the data) between each run, but I'd prefer to do that kind of
optimization with a test case / benchmark that shows why we need such
cleverness (and that can test that we actually make it faster). It's
possible that this will make some things faster by making the SCEV
caches have higher locality (due to being significantly smaller) so
until there is a clear benchmark, I think the simple change is best.
Differential Revision: http://reviews.llvm.org/D12063
llvm-svn: 245193
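The stale-entry hazard with the Loop*-keyed map can be modeled in a few lines. This is a hypothetical model, not SCEV's actual data structures. `LoopKey` is an integer standing in for a Loop* address that the allocator may reuse after LoopInfo frees a loop, and `LoopCache` stands in for the analysis object. Keeping one object alive across functions returns an entry computed for the first function. Building a fresh object per run, which is the fix described above, avoids this.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical: an integer standing in for a Loop* address. Once LoopInfo
// frees a loop, the allocator may hand the same address to a new loop.
using LoopKey = unsigned;

// Hypothetical analysis object whose cache is keyed on loop addresses and,
// like the Loop*-keyed map described above, is not updated by value handles.
class LoopCache {
public:
  // Returns the cached description for Key, computing it on a miss.
  std::string lookup(LoopKey Key, const std::string &FnName) {
    auto It = Cache.find(Key);
    if (It != Cache.end())
      return It->second;  // stale if Key now names a different loop
    std::string Desc = FnName + ":loop" + std::to_string(Key);
    Cache.emplace(Key, Desc);
    return Desc;
  }

private:
  std::unordered_map<LoopKey, std::string> Cache;
};

// Old behavior: one object reused for every function and never cleared.
std::string runShared(LoopCache &Shared, const std::string &FnName,
                      LoopKey Key) {
  return Shared.lookup(Key, FnName);
}

// New behavior: a fresh object per run, so no entry outlives its function.
std::string runFresh(const std::string &FnName, LoopKey Key) {
  LoopCache PerRun;
  return PerRun.lookup(Key, FnName);
}
```

Destroying the object per run trades some allocation traffic for correctness, which is the trade-off the commit message argues for.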
2015-08-17 10:08:17 +08:00
FUNCTION_PASS("print<scalar-evolution>", ScalarEvolutionPrinterPass(dbgs()))
2018-11-27 05:57:47 +08:00
FUNCTION_PASS("print<stack-safety-local>", StackSafetyPrinterPass(dbgs()))
2016-04-27 07:39:29 +08:00
FUNCTION_PASS("reassociate", ReassociatePass())
2018-11-21 22:00:17 +08:00
FUNCTION_PASS("scalarizer", ScalarizerPass())
2016-05-18 23:18:25 +08:00
FUNCTION_PASS("sccp", SCCPPass())
2015-02-01 19:34:21 +08:00
FUNCTION_PASS("simplify-cfg", SimplifyCFGPass())
2016-04-23 03:54:10 +08:00
FUNCTION_PASS("sink", SinkingPass())
2016-06-15 16:43:40 +08:00
FUNCTION_PASS("slp-vectorizer", SLPVectorizerPass())
2016-08-02 05:48:33 +08:00
FUNCTION_PASS("speculative-execution", SpeculativeExecutionPass())
Add a new pass to speculate around PHI nodes with constant (integer) operands when profitable.
The core idea is to (re-)introduce some redundancies where their cost is
hidden by the cost of materializing immediates for constant operands of
PHI nodes. When the cost of the redundancies is covered by this,
avoiding materializing the immediate has numerous benefits:
1) Less register pressure
2) Potential for further folding / combining
3) Potential for more efficient instructions due to immediate operand
As a motivating example, consider the remarkably different cost on x86
of a SHL instruction with an immediate operand versus a register
operand.
This pattern turns up surprisingly frequently, but is somewhat rarely
obvious as a significant performance problem.
The pass is entirely target independent, but it does rely on the target
cost model in TTI to decide when to speculate things around the PHI
node. I've included x86-focused tests, but any target that sets up its
immediate cost model should benefit from this pass.
There is probably more that can be done in this space, but the pass
as-is is enough to get some important performance wins on our internal
benchmarks and should be generally performance neutral. Help with
more extensive benchmarking is always welcome.
One awkward part is that this pass has to be scheduled after
*everything* that can eliminate these kinds of redundancies. This
includes SimplifyCFG, GVN, etc. I'm open to suggestions about better
places to put this. We could in theory make it part of the codegen pass
pipeline, but there doesn't really seem to be a good reason for that --
it isn't "lowering" in any sense and only relies on pretty standard cost
model based TTI queries, so it seems to fit well with the "optimization"
pipeline model. Still, further thoughts on the pipeline position are
welcome.
I've also only implemented this in the new pass manager. If folks are
very interested, I can try to add it to the old PM as well, but I didn't
really see much point (my use case is already switched over to the new
PM).
I've tested this pretty heavily without issue. A wide range of
benchmarks internally show no change outside the noise, and I don't see
any significant changes in SPEC either. However, the size class
computation in tcmalloc is substantially improved by this, which turns
into a 2% to 4% win on the hottest path through tcmalloc for us, so
there are definitely important cases where this is going to make
a substantial difference.
Differential revision: https://reviews.llvm.org/D37467
llvm-svn: 319164
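Here is a source-level analogue of speculating around a PHI. It assumes a shift whose amount is a PHI of two constants; the values 3 and 7 are arbitrary. The real pass works on IR and only speculates when TTI's immediate cost model says it pays off. This sketch shows just the shape of the rewrite. Whether a particular compiler emits immediate-operand shifts for it is not guaranteed.

```cpp
#include <cassert>
#include <cstdint>

// Before: the shift amount is a PHI of two constants, so the amount must be
// materialized in a register (on x86 without BMI2, in CL) before the shift.
std::uint32_t shiftBefore(bool cond, std::uint32_t x) {
  std::uint32_t amount = cond ? 3u : 7u;  // PHI with constant incoming values
  return x << amount;
}

// After speculating the shift into both predecessors: each copy can use an
// immediate shift amount, and the PHI now merges the shifted results.
std::uint32_t shiftAfter(bool cond, std::uint32_t x) {
  std::uint32_t result;
  if (cond)
    result = x << 3;
  else
    result = x << 7;
  return result;
}
```

The speculated version executes one shift on each path, just like the original, so the redundancy it introduces is only in code size. This is the cost the pass weighs against materializing the immediate.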
2017-11-28 19:32:31 +08:00
FUNCTION_PASS("spec-phis", SpeculateAroundPHIsPass())
2015-09-12 17:09:14 +08:00
FUNCTION_PASS("sroa", SROA())
2016-07-07 07:48:41 +08:00
FUNCTION_PASS("tailcallelim", TailCallElimPass())
2016-07-08 11:32:49 +08:00
FUNCTION_PASS("unreachableblockelim", UnreachableBlockElimPass())
2017-08-03 04:35:29 +08:00
FUNCTION_PASS("unroll", LoopUnrollPass())
2018-10-31 22:33:14 +08:00
FUNCTION_PASS("unroll<peeling;no-runtime>",LoopUnrollPass(LoopUnrollOptions().setPeeling(true).setRuntime(false)))
2015-01-05 08:08:53 +08:00
FUNCTION_PASS("verify", VerifierPass())
2015-01-14 18:19:28 +08:00
FUNCTION_PASS("verify<domtree>", DominatorTreeVerifierPass())
2016-07-20 07:54:23 +08:00
FUNCTION_PASS("verify<loops>", LoopVerifierPass())
2016-06-02 05:30:40 +08:00
FUNCTION_PASS("verify<memoryssa>", MemorySSAVerifierPass())
2016-02-26 01:54:25 +08:00
FUNCTION_PASS("verify<regions>", RegionInfoVerifierPass())
2018-06-30 01:48:58 +08:00
FUNCTION_PASS("view-cfg", CFGViewerPass())
FUNCTION_PASS("view-cfg-only", CFGOnlyViewerPass())
2014-04-21 16:08:50 +08:00
#undef FUNCTION_PASS

2016-02-25 15:23:08 +08:00
#ifndef LOOP_ANALYSIS
#define LOOP_ANALYSIS(NAME, CREATE_PASS)
#endif
LOOP_ANALYSIS("no-op-loop", NoOpLoopAnalysis())
2016-07-09 05:21:44 +08:00
LOOP_ANALYSIS("access-info", LoopAccessAnalysis())
2016-07-17 06:51:33 +08:00
LOOP_ANALYSIS("ivusers", IVUsersAnalysis())
2018-09-21 01:08:45 +08:00
LOOP_ANALYSIS("pass-instrumentation", PassInstrumentationAnalysis(PIC))
2016-02-25 15:23:08 +08:00
#undef LOOP_ANALYSIS

#ifndef LOOP_PASS
#define LOOP_PASS(NAME, CREATE_PASS)
#endif
LOOP_PASS("invalidate<all>", InvalidateAllAnalysesPass())
2016-07-13 06:42:24 +08:00
LOOP_PASS("licm", LICMPass())
2016-07-13 02:45:51 +08:00
LOOP_PASS("loop-idiom", LoopIdiomRecognizePass())
2018-05-25 09:32:36 +08:00
LOOP_PASS("loop-instsimplify", LoopInstSimplifyPass())
2016-05-04 06:02:31 +08:00
LOOP_PASS("rotate", LoopRotatePass())
2016-02-25 15:23:08 +08:00
LOOP_PASS("no-op-loop", NoOpLoopPass())
LOOP_PASS("print", PrintLoopPass(dbgs()))
2016-07-15 02:28:29 +08:00
LOOP_PASS("loop-deletion", LoopDeletionPass())
2016-05-04 05:47:32 +08:00
LOOP_PASS("simplify-cfg", LoopSimplifyCFGPass())
2016-07-19 05:41:50 +08:00
LOOP_PASS("strength-reduce", LoopStrengthReducePass())
2016-06-06 02:01:19 +08:00
LOOP_PASS("indvars", IndVarSimplifyPass())
2018-03-15 19:01:19 +08:00
LOOP_PASS("irce", IRCEPass())
2018-07-01 20:47:30 +08:00
LOOP_PASS("unroll-and-jam", LoopUnrollAndJamPass())
2017-08-03 04:35:29 +08:00
LOOP_PASS("unroll-full", LoopFullUnrollPass())
[PM/LoopUnswitch] Introduce a new, simpler loop unswitch pass.
Currently, this pass only focuses on *trivial* loop unswitching. At that
reduced problem it remains significantly better than the current loop
unswitch:
- Old pass is worse than cubic complexity. New pass is (I think) linear.
- New pass is much simpler in its design by focusing on full unswitching. (See
below for details on this).
- New pass doesn't carry state for thresholds between pass iterations.
- New pass doesn't carry state for correctness (both miscompile and
infloop) between pass iterations.
- New pass produces substantially better code after unswitching.
- New pass can handle more trivial unswitch cases.
- New pass doesn't recompute the dominator tree for the entire function
and instead incrementally updates it.
I've ported all of the trivial unswitching test cases from the old pass
to the new one to make sure that major functionality isn't lost in the
process. For several of the test cases I've worked to improve the
precision and rigor of the CHECKs, but for many I've just updated them
to handle the new IR produced.
My initial motivation was the fact that the old pass carried state in
very unreliable ways between pass iterations, and these mechanisms were
incompatible with the new pass manager. However, I discovered many more
improvements to make along the way.
This pass makes two very significant assumptions that enable most of these
improvements:
1) Focus on *full* unswitching -- that is, completely removing whatever
control flow construct is being unswitched from the loop. In the case
of trivial unswitching, this means removing the trivial (exiting)
edge. In non-trivial unswitching, this means removing the branch or
switch itself. This is in opposition to *partial* unswitching where
some part of the unswitched control flow remains in the loop. Partial
unswitching only really applies to switches and to folded branches.
These are very similar to full unrolling and partial unrolling. The
full form is an effective canonicalization, the partial form needs
a complex cost model, cannot be iterated, isn't canonicalizing, and
should be a separate pass that runs very late (much like unrolling).
2) Leverage LLVM's Loop machinery to the fullest. The original unswitch
dates from a time when a great deal of LLVM's loop infrastructure was
missing, ineffective, and/or unreliable. As a consequence, a lot of
complexity was added which we no longer need.
With these two overarching principles, I think we can build a fast and
effective unswitcher that fits well into the new PM and the
canonicalization pipeline. Some of the remaining functionality around
partial unswitching may not be relevant today (I can't find many test cases
or benchmarks for it), but if it is, I'd like to add support for it
as a separate layer that runs very late in the pipeline.
Purely to make reviewing and introducing this code more manageable, I've
split this into first a trivial-unswitch-only pass and in the next patch
I'll add support for full non-trivial unswitching against a *fixed*
threshold, exactly like full unrolling. I even plan to re-use the
unrolling thresholds, as these are incredibly similar cost tradeoffs:
we're cloning a loop body in order to end up with simplified control
flow. We should only do that when the total growth is reasonably small.
One of the biggest changes with this pass compared to the previous one
is that previously, each individual trivial exiting edge from a switch
was unswitched separately as a branch. Now, we unswitch the entire
switch at once, with cases going to the various destinations. This lets
us unswitch multiple exiting edges in a single operation and also avoids
numerous extremely bad behaviors, where we would introduce 1000s of
branches to test for thousands of possible values, all of which would
take the exact same exit path bypassing the loop. Now we will use
a switch with 1000s of cases that can be efficiently lowered into
a jumptable. This avoids relying on somehow forming a switch out of the
branches or getting horrible code if that fails for any reason.
Another significant change is that this pass actively updates the CFG
based on unswitching. For trivial unswitching, this is actually very
easy because of the definition of loop simplified form. Doing this makes
the code coming out of loop unswitch dramatically more friendly. We
still should run loop-simplifycfg (at the least) after this to clean up,
but it will have to do a lot less work.
Finally, this pass makes far fewer attempts to simplify instructions
based on the unswitch. Something like loop-instsimplify, instcombine, or
GVN can be used to do increasingly powerful simplifications based on the
now dominating predicate. The old simplifications are things that
something like loop-instsimplify should get today or a very, very basic
loop-instcombine could get. Keeping that logic separate is a big
simplifying technique.
Most of the code in this pass that isn't in the old one has to do with
achieving specific goals:
- Updating the dominator tree as we go
- Unswitching all cases in a switch in a single step.
I think it is still shorter than just the trivial unswitching code in
the old pass despite having this functionality.
Differential Revision: https://reviews.llvm.org/D32409
llvm-svn: 301576
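Here is a source-level analogue of unswitching an entire switch in one step, as described above. `sumBefore` and `sumAfter` are hypothetical. The pass operates on IR in loop-simplified form, where the hoisted switch lands in the preheader and therefore only runs when the loop is entered. The `v.empty()` check models that here.

```cpp
#include <cassert>
#include <vector>

// Before: the loop-invariant switch on `mode` is evaluated on every
// iteration, and two of its cases leave the loop directly (trivial exits).
int sumBefore(const std::vector<int> &v, int mode) {
  int sum = 0;
  for (int x : v) {
    switch (mode) {
    case 1:
      return -1;  // exiting edge that does not depend on the iteration
    case 2:
      return -2;  // another exiting edge
    default:
      break;      // the only case that stays in the loop
    }
    sum += x;
  }
  return sum;
}

// After: the whole switch is unswitched at once. It runs a single time before
// the loop (only when the loop would be entered), and the loop body no longer
// contains it.
int sumAfter(const std::vector<int> &v, int mode) {
  int sum = 0;
  if (v.empty())
    return sum;  // loop guard: the original loop body never runs
  switch (mode) {
  case 1:
    return -1;
  case 2:
    return -2;
  default:
    break;
  }
  for (int x : v)
    sum += x;
  return sum;
}
```

All exiting cases are decided by one switch instead of a chain of branches, which mirrors the "1000s of cases in one jumptable" point above.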
2017-04-28 02:45:20 +08:00
LOOP_PASS("unswitch", SimpleLoopUnswitchPass())
2016-07-03 05:18:40 +08:00
LOOP_PASS("print-access-info", LoopAccessInfoPrinterPass(dbgs()))
2016-07-17 06:51:33 +08:00
LOOP_PASS("print<ivusers>", IVUsersPrinterPass(dbgs()))
2017-01-26 00:00:44 +08:00
LOOP_PASS("loop-predication", LoopPredicationPass())
2016-02-25 15:23:08 +08:00
#undef LOOP_PASS
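The `#ifndef`/`#define`/`#endif` and `#undef` pairs in this file follow the X-macro pattern. A client defines the macros it cares about and includes the file, and every entry then expands through those definitions. `FUNCTION_ALIAS_ANALYSIS` falls back to `FUNCTION_ANALYSIS` when the client leaves it undefined. A standalone example cannot include the real file, so this sketch uses a hypothetical list macro, `DEMO_REGISTRY`, holding a few of the names above. It also passes only a name rather than a `(NAME, CREATE_PASS)` pair. It illustrates the pattern and is not LLVM's PassBuilder code.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the registry's contents. Entries expand through
// whatever macros are defined at the point where DEMO_REGISTRY is used.
#define DEMO_REGISTRY                                                          \
  FUNCTION_ANALYSIS("targetlibinfo")                                           \
  FUNCTION_ALIAS_ANALYSIS("basic-aa")                                          \
  FUNCTION_ALIAS_ANALYSIS("type-based-aa")                                     \
  FUNCTION_PASS("instcombine")                                                 \
  FUNCTION_PASS("sroa")

// Client 1: collect every analysis name. Alias analyses get no special
// handling, so FUNCTION_ALIAS_ANALYSIS forwards to FUNCTION_ANALYSIS, just as
// the registry's #ifndef default does.
std::vector<std::string> analysisNames() {
  std::vector<std::string> Names;
#define FUNCTION_ANALYSIS(NAME) Names.push_back(NAME);
#define FUNCTION_ALIAS_ANALYSIS(NAME) FUNCTION_ANALYSIS(NAME)
#define FUNCTION_PASS(NAME)
  DEMO_REGISTRY
#undef FUNCTION_ANALYSIS
#undef FUNCTION_ALIAS_ANALYSIS
#undef FUNCTION_PASS
  return Names;
}

// Client 2: check whether a textual pipeline element names a function pass.
bool isFunctionPassName(const std::string &Name) {
#define FUNCTION_ANALYSIS(NAME)
#define FUNCTION_ALIAS_ANALYSIS(NAME)
#define FUNCTION_PASS(NAME)                                                    \
  if (Name == NAME)                                                            \
    return true;
  DEMO_REGISTRY
#undef FUNCTION_ANALYSIS
#undef FUNCTION_ALIAS_ANALYSIS
#undef FUNCTION_PASS
  return false;
}
```

Each client undefines the macros after use, which is why the registry itself can end every section with `#undef` and carry no include guard.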