2014-04-21 16:08:50 +08:00
|
|
|
//===- PassRegistry.def - Registry of passes --------------------*- C++ -*-===//
|
|
|
|
//
|
2019-01-19 16:50:56 +08:00
|
|
|
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
|
|
|
|
// See https://llvm.org/LICENSE.txt for license information.
|
|
|
|
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
|
2014-04-21 16:08:50 +08:00
|
|
|
//
|
|
|
|
//===----------------------------------------------------------------------===//
|
|
|
|
//
|
|
|
|
// This file is used as the registry of passes that are part of the core LLVM
|
|
|
|
// libraries. This file describes both transformation passes and analyses
|
|
|
|
// Analyses are registered while transformation passes have names registered
|
|
|
|
// that can be used when providing a textual pass pipeline.
|
|
|
|
//
|
|
|
|
//===----------------------------------------------------------------------===//
|
|
|
|
|
|
|
|
// NOTE: NO INCLUDE GUARD DESIRED!
|
|
|
|
|
2014-04-21 16:20:10 +08:00
|
|
|
#ifndef MODULE_ANALYSIS
|
|
|
|
#define MODULE_ANALYSIS(NAME, CREATE_PASS)
|
|
|
|
#endif
|
2016-03-10 19:24:11 +08:00
|
|
|
MODULE_ANALYSIS("callgraph", CallGraphAnalysis())
|
2014-04-21 16:20:10 +08:00
|
|
|
MODULE_ANALYSIS("lcg", LazyCallGraphAnalysis())
|
2016-08-12 21:53:02 +08:00
|
|
|
MODULE_ANALYSIS("module-summary", ModuleSummaryIndexAnalysis())
|
2015-01-06 10:50:06 +08:00
|
|
|
MODULE_ANALYSIS("no-op-module", NoOpModuleAnalysis())
|
2016-06-04 06:54:26 +08:00
|
|
|
MODULE_ANALYSIS("profile-summary", ProfileSummaryAnalysis())
|
2018-11-27 07:05:48 +08:00
|
|
|
MODULE_ANALYSIS("stack-safety", StackSafetyGlobalAnalysis())
|
2016-05-10 03:57:29 +08:00
|
|
|
MODULE_ANALYSIS("verify", VerifierAnalysis())
|
2018-09-21 01:08:45 +08:00
|
|
|
MODULE_ANALYSIS("pass-instrumentation", PassInstrumentationAnalysis(PIC))
|
2019-02-14 06:22:48 +08:00
|
|
|
MODULE_ANALYSIS("asan-globals-md", ASanGlobalsMetadataAnalysis())
|
2016-03-11 17:15:11 +08:00
|
|
|
|
|
|
|
#ifndef MODULE_ALIAS_ANALYSIS
|
2016-07-06 08:26:41 +08:00
|
|
|
#define MODULE_ALIAS_ANALYSIS(NAME, CREATE_PASS) \
|
2016-03-11 17:15:11 +08:00
|
|
|
MODULE_ANALYSIS(NAME, CREATE_PASS)
|
|
|
|
#endif
|
|
|
|
MODULE_ALIAS_ANALYSIS("globals-aa", GlobalsAA())
|
|
|
|
#undef MODULE_ALIAS_ANALYSIS
|
2014-04-21 16:20:10 +08:00
|
|
|
#undef MODULE_ANALYSIS
|
|
|
|
|
2014-04-21 16:08:50 +08:00
|
|
|
#ifndef MODULE_PASS
|
|
|
|
#define MODULE_PASS(NAME, CREATE_PASS)
|
|
|
|
#endif
|
[PM] Port the always inliner to the new pass manager in a much more
minimal and boring form than the old pass manager's version.
This pass does the very minimal amount of work necessary to inline
functions declared as always-inline. It doesn't support a wide array of
things that the legacy pass manager did support, but is alse ... about
20 lines of code. So it has that going for it. Notably things this
doesn't support:
- Array alloca merging
- To support the above, bottom-up inlining with careful history
tracking and call graph updates
- DCE of the functions that become dead after this inlining.
- Inlining through call instructions with the always_inline attribute.
Instead, it focuses on inlining functions with that attribute.
The first I've omitted because I'm hoping to just turn it off for the
primary pass manager. If that doesn't pan out, I can add it here but it
will be reasonably expensive to do so.
The second should really be handled by running global-dce after the
inliner. I don't want to re-implement the non-trivial logic necessary to
do comdat-correct DCE of functions. This means the -O0 pipeline will
have to be at least 'always-inline,global-dce', but that seems
reasonable to me. If others are seriously worried about this I'd like to
hear about it and understand why. Again, this is all solveable by
factoring that logic into a utility and calling it here, but I'd like to
wait to do that until there is a clear reason why the existing
pass-based factoring won't work.
The final point is a serious one. I can fairly easily add support for
this, but it seems both costly and a confusing construct for the use
case of the always inliner running at -O0. This attribute can of course
still impact the normal inliner easily (although I find that
a questionable re-use of the same attribute). I've started a discussion
to sort out what semantics we want here and based on that can figure out
if it makes sense ta have this complexity at O0 or not.
One other advantage of this design is that it should be quite a bit
faster due to checking for whether the function is a viable candidate
for inlining exactly once per function instead of doing it for each call
site.
Anyways, hopefully a reasonable starting point for this pass.
Differential Revision: https://reviews.llvm.org/D23299
llvm-svn: 278896
2016-08-17 10:56:20 +08:00
|
|
|
MODULE_PASS("always-inline", AlwaysInlinerPass())
|
[Attributor] Pass infrastructure and fixpoint framework
NOTE: Note that no attributes are derived yet. This patch will not go in
alone but only with others that derive attributes. The framework is
split for review purposes.
This commit introduces the Attributor pass infrastructure and fixpoint
iteration framework. Further patches will introduce abstract attributes
into this framework.
In a nutshell, the Attributor will update instances of abstract
arguments until a fixpoint, or a "timeout", is reached. Communication
between the Attributor and the abstract attributes that are derived is
restricted to the AbstractState and AbstractAttribute interfaces.
Please see the file comment in Attributor.h for detailed information
including design decisions and typical use case. Also consider the class
documentation for Attributor, AbstractState, and AbstractAttribute.
Reviewers: chandlerc, homerdin, hfinkel, fedor.sergeev, sanjoy, spatel, nlopes, nicholas, reames
Subscribers: mehdi_amini, mgorny, hiraditya, bollu, steven_wu, dexonsmith, dang, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59918
llvm-svn: 362578
2019-06-05 11:02:24 +08:00
|
|
|
MODULE_PASS("attributor", AttributorPass())
|
2017-10-25 21:40:08 +08:00
|
|
|
MODULE_PASS("called-value-propagation", CalledValuePropagationPass())
|
[ThinLTO] Handle chains of aliases
At -O0, globalopt is not run during the compile step, and we can have a
chain of an alias having an immediate aliasee of another alias. The
summaries are constructed assuming aliases in a canonical form
(flattened chains), and as a result only the base object but no
intermediate aliases were preserved.
Fix by adding a pass that canonicalize aliases, which ensures each
alias is a direct alias of the base object.
Reviewers: pcc, davidxl
Subscribers: mehdi_amini, inglorion, eraman, steven_wu, dexonsmith, arphaman, llvm-commits
Differential Revision: https://reviews.llvm.org/D54507
llvm-svn: 350423
2019-01-05 03:04:54 +08:00
|
|
|
MODULE_PASS("canonicalize-aliases", CanonicalizeAliasesPass())
|
2018-07-16 08:28:24 +08:00
|
|
|
MODULE_PASS("cg-profile", CGProfilePass())
|
2016-05-05 08:51:09 +08:00
|
|
|
MODULE_PASS("constmerge", ConstantMergePass())
|
2016-07-09 11:25:35 +08:00
|
|
|
MODULE_PASS("cross-dso-cfi", CrossDSOCFIPass())
|
2016-06-12 17:16:39 +08:00
|
|
|
MODULE_PASS("deadargelim", DeadArgumentEliminationPass())
|
2016-05-05 10:37:32 +08:00
|
|
|
MODULE_PASS("elim-avail-extern", EliminateAvailableExternallyPass())
|
2015-12-27 16:13:45 +08:00
|
|
|
MODULE_PASS("forceattrs", ForceFunctionAttrsPass())
|
2016-07-19 05:22:24 +08:00
|
|
|
MODULE_PASS("function-import", FunctionImportPass())
|
2016-05-04 03:39:15 +08:00
|
|
|
MODULE_PASS("globaldce", GlobalDCEPass())
|
2016-04-26 08:28:01 +08:00
|
|
|
MODULE_PASS("globalopt", GlobalOptPass())
|
2016-11-21 08:28:23 +08:00
|
|
|
MODULE_PASS("globalsplit", GlobalSplitPass())
|
2018-10-03 13:55:20 +08:00
|
|
|
MODULE_PASS("hotcoldsplit", HotColdSplittingPass())
|
2019-07-18 05:45:19 +08:00
|
|
|
MODULE_PASS("hwasan", HWAddressSanitizerPass(false, false))
|
|
|
|
MODULE_PASS("khwasan", HWAddressSanitizerPass(true, true))
|
2015-12-27 16:41:34 +08:00
|
|
|
MODULE_PASS("inferattrs", InferFunctionAttrsPass())
|
2016-06-05 13:12:23 +08:00
|
|
|
MODULE_PASS("insert-gcov-profiling", GCOVProfilerPass())
|
2019-03-01 04:13:38 +08:00
|
|
|
MODULE_PASS("instrorderfile", InstrOrderFilePass())
|
2016-04-19 01:47:38 +08:00
|
|
|
MODULE_PASS("instrprof", InstrProfiling())
|
2016-06-05 13:15:45 +08:00
|
|
|
MODULE_PASS("internalize", InternalizePass())
|
2015-01-06 17:06:35 +08:00
|
|
|
MODULE_PASS("invalidate<all>", InvalidateAllAnalysesPass())
|
2016-05-06 05:05:36 +08:00
|
|
|
MODULE_PASS("ipsccp", IPSCCPPass())
|
2018-07-19 22:51:32 +08:00
|
|
|
MODULE_PASS("lowertypetests", LowerTypeTestsPass(nullptr, nullptr))
|
2020-01-11 04:52:19 +08:00
|
|
|
MODULE_PASS("mergefunc", MergeFunctionsPass())
|
2016-09-17 01:18:16 +08:00
|
|
|
MODULE_PASS("name-anon-globals", NameAnonGlobalPass())
|
2015-01-06 10:37:55 +08:00
|
|
|
MODULE_PASS("no-op-module", NoOpModulePass())
|
2016-06-28 00:50:18 +08:00
|
|
|
MODULE_PASS("partial-inliner", PartialInlinerPass())
|
2016-05-17 00:31:07 +08:00
|
|
|
MODULE_PASS("pgo-icall-prom", PGOIndirectCallPromotion())
|
2016-05-06 13:49:19 +08:00
|
|
|
MODULE_PASS("pgo-instr-gen", PGOInstrumentationGen())
|
2016-05-11 05:59:52 +08:00
|
|
|
MODULE_PASS("pgo-instr-use", PGOInstrumentationUse())
|
2016-06-25 04:13:42 +08:00
|
|
|
MODULE_PASS("pre-isel-intrinsic-lowering", PreISelIntrinsicLoweringPass())
|
2016-06-04 06:54:26 +08:00
|
|
|
MODULE_PASS("print-profile-summary", ProfileSummaryPrinterPass(dbgs()))
|
2016-06-04 05:14:26 +08:00
|
|
|
MODULE_PASS("print-callgraph", CallGraphPrinterPass(dbgs()))
|
2016-06-04 06:54:26 +08:00
|
|
|
MODULE_PASS("print", PrintModulePass(dbgs()))
|
2016-03-10 19:24:06 +08:00
|
|
|
MODULE_PASS("print-lcg", LazyCallGraphPrinterPass(dbgs()))
|
2016-06-18 17:17:32 +08:00
|
|
|
MODULE_PASS("print-lcg-dot", LazyCallGraphDOTPrinterPass(dbgs()))
|
2018-11-27 07:05:48 +08:00
|
|
|
MODULE_PASS("print-stack-safety", StackSafetyGlobalPrinterPass(dbgs()))
|
2017-12-15 17:32:11 +08:00
|
|
|
MODULE_PASS("rewrite-statepoints-for-gc", RewriteStatepointsForGC())
|
2016-07-26 04:52:00 +08:00
|
|
|
MODULE_PASS("rewrite-symbols", RewriteSymbolPass())
|
[PM] Port ReversePostOrderFunctionAttrs to the new PM
Below are my super rough notes when porting. They can probably serve as
a basic guide for porting other passes to the new PM. As I port more
passes I'll expand and generalize this and make a proper
docs/HowToPortToNewPassManager.rst document. There is also missing
documentation for general concepts and API's in the new PM which will
require some documentation.
Once there is proper documentation in place we can put up a list of
passes that have to be ported and game-ify/crowdsource the rest of the
porting (at least of the middle end; the backend is still unclear).
I will however be taking personal responsibility for ensuring that the
LLD/ELF LTO pipeline is ported in a timely fashion. The remaining passes
to be ported are (do something like
`git grep "<the string in the bullet point below>"` to find the pass):
General Scalar:
[ ] Simplify the CFG
[ ] Jump Threading
[ ] MemCpy Optimization
[ ] Promote Memory to Register
[ ] MergedLoadStoreMotion
[ ] Lazy Value Information Analysis
General IPO:
[ ] Dead Argument Elimination
[ ] Deduce function attributes in RPO
Loop stuff / vectorization stuff:
[ ] Alignment from assumptions
[ ] Canonicalize natural loops
[ ] Delete dead loops
[ ] Loop Access Analysis
[ ] Loop Invariant Code Motion
[ ] Loop Vectorization
[ ] SLP Vectorizer
[ ] Unroll loops
Devirtualization / CFI:
[ ] Cross-DSO CFI
[ ] Whole program devirtualization
[ ] Lower bitset metadata
CGSCC passes:
[ ] Function Integration/Inlining
[ ] Remove unused exception handling info
[ ] Promote 'by reference' arguments to scalars
Please let me know if you are interested in working on any of the passes
in the above list (e.g. reply to the post-commit thread for this patch).
I'll probably be tackling "General Scalar" and "General IPO" first FWIW.
Steps as I port "Deduce function attributes in RPO"
---------------------------------------------------
(note: if you are doing any work based on these notes, please leave a
note in the post-commit review thread for this commit with any
improvements / suggestions / incompleteness you ran into!)
Note: "Deduce function attributes in RPO" is a module pass.
1. Do preparatory refactoring.
Do preparatory factoring. In this case all I had to do was to pull out a static helper (r272503).
(TODO: give more advice here e.g. if pass holds state or something)
2. Rename the old pass class.
llvm/lib/Transforms/IPO/FunctionAttrs.cpp
Rename class ReversePostOrderFunctionAttrs -> ReversePostOrderFunctionAttrsLegacyPass
in preparation for adding a class ReversePostOrderFunctionAttrs as the pass in the new PM.
(edit: actually wait what? The new class name will be
ReversePostOrderFunctionAttrsPass, so it doesn't conflict. So this step is
sort of useless churn).
llvm/include/llvm/InitializePasses.h
llvm/lib/LTO/LTOCodeGenerator.cpp
llvm/lib/Transforms/IPO/IPO.cpp
llvm/lib/Transforms/IPO/FunctionAttrs.cpp
Rename initializeReversePostOrderFunctionAttrsPass -> initializeReversePostOrderFunctionAttrsLegacyPassPass
(note that the "PassPass" thing falls out of `s/ReversePostOrderFunctionAttrs/ReversePostOrderFunctionAttrsLegacyPass/`)
Note that the INITIALIZE_PASS macro is what creates this identifier name, so renaming the class requires this renaming too.
Note that createReversePostOrderFunctionAttrsPass does not need to be
renamed since its name is not generated from the class name.
3. Add the new PM pass class.
In the new PM all passes need to have their
declaration in a header somewhere, so you will often need to add a header.
In this case
llvm/include/llvm/Transforms/IPO/FunctionAttrs.h is already there because
PostOrderFunctionAttrsPass was already ported.
The file-level comment from the .cpp file can be used as the file-level
comment for the new header. You may want to tweak the wording slightly
from "this file implements" to "this file provides" or similar.
Add declaration for the new PM pass in this header:
class ReversePostOrderFunctionAttrsPass
: public PassInfoMixin<ReversePostOrderFunctionAttrsPass> {
public:
PreservedAnalyses run(Module &M, AnalysisManager<Module> &AM);
};
Its name should end with `Pass` for consistency (note that this doesn't
collide with the names of most old PM passes). E.g. call it
`<name of the old PM pass>Pass`.
Also, move the doxygen comment from the old PM pass to the declaration of
this class in the header.
Also, include the declaration for the new PM class
`llvm/Transforms/IPO/FunctionAttrs.h` at the top of the file (in this case,
it was already done when the other pass in this file was ported).
Now define the `run` method for the new class.
The main things here are:
a) Use AM.getResult<...>(M) to get results instead of `getAnalysis<...>()`
b) If the old PM pass would have returned "false" (i.e. `Changed ==
false`), then you should return PreservedAnalyses::all();
c) In the old PM getAnalysisUsage method, observe the calls
`AU.addPreserved<...>();`.
In the case `Changed == true`, for each preserved analysis you should do
call `PA.preserve<...>()` on a PreservedAnalyses object and return it.
E.g.:
PreservedAnalyses PA;
PA.preserve<CallGraphAnalysis>();
return PA;
Note that calls to skipModule/skipFunction are not supported in the new PM
currently, so optnone and optimization bisect support do not work. You can
just drop those calls for now.
4. Add the pass to the new PM pass registry to make it available in opt.
In llvm/lib/Passes/PassBuilder.cpp add a #include for your header.
`#include "llvm/Transforms/IPO/FunctionAttrs.h"`
In this case there is already an include (from when
PostOrderFunctionAttrsPass was ported).
Add your pass to llvm/lib/Passes/PassRegistry.def
In this case, I added
`MODULE_PASS("rpo-functionattrs", ReversePostOrderFunctionAttrsPass())`
The string is from the `INITIALIZE_PASS*` macros used in the old pass
manager.
Then choose a test that uses the pass and use the new PM `-passes=...` to
run it.
E.g. in this case there is a test that does:
; RUN: opt < %s -basicaa -functionattrs -rpo-functionattrs -S | FileCheck %s
I have added the line:
; RUN: opt < %s -aa-pipeline=basic-aa -passes='require<targetlibinfo>,cgscc(function-attrs),rpo-functionattrs' -S | FileCheck %s
The `-aa-pipeline=basic-aa` and
`require<targetlibinfo>,cgscc(function-attrs)` are what is needed to run
functionattrs in the new PM (note that in the new PM "functionattrs"
becomes "function-attrs" for some reason). This is just pulled from
`readattrs.ll` which contains the change from when functionattrs was ported
to the new PM.
Adding rpo-functionattrs causes the pass that was just ported to run.
llvm-svn: 272505
2016-06-12 15:48:51 +08:00
|
|
|
MODULE_PASS("rpo-functionattrs", ReversePostOrderFunctionAttrsPass())
|
2016-05-28 07:20:16 +08:00
|
|
|
MODULE_PASS("sample-profile", SampleProfileLoaderPass())
|
2015-10-31 07:28:12 +08:00
|
|
|
MODULE_PASS("strip-dead-prototypes", StripDeadPrototypesPass())
|
2018-01-10 03:39:35 +08:00
|
|
|
MODULE_PASS("synthetic-counts-propagation", SyntheticCountsPropagation())
|
2018-07-19 22:51:32 +08:00
|
|
|
MODULE_PASS("wholeprogramdevirt", WholeProgramDevirtPass(nullptr, nullptr))
|
2015-01-05 08:08:53 +08:00
|
|
|
MODULE_PASS("verify", VerifierPass())
|
2019-05-09 14:09:35 +08:00
|
|
|
MODULE_PASS("asan-module", ModuleAddressSanitizerPass(/*CompileKernel=*/false, false, true, false))
|
2019-10-11 16:47:03 +08:00
|
|
|
MODULE_PASS("msan-module", MemorySanitizerPass({}))
|
|
|
|
MODULE_PASS("tsan-module", ThreadSanitizerPass())
|
2019-05-09 14:09:35 +08:00
|
|
|
MODULE_PASS("kasan-module", ModuleAddressSanitizerPass(/*CompileKernel=*/true, false, true, false))
|
2019-07-26 04:53:15 +08:00
|
|
|
MODULE_PASS("sancov-module", ModuleSanitizerCoveragePass())
|
2019-07-10 02:49:29 +08:00
|
|
|
MODULE_PASS("poison-checking", PoisonCheckingPass())
|
2014-04-21 16:08:50 +08:00
|
|
|
#undef MODULE_PASS
|
|
|
|
|
2014-04-21 19:12:00 +08:00
|
|
|
#ifndef CGSCC_ANALYSIS
|
|
|
|
#define CGSCC_ANALYSIS(NAME, CREATE_PASS)
|
|
|
|
#endif
|
2015-01-06 10:50:06 +08:00
|
|
|
CGSCC_ANALYSIS("no-op-cgscc", NoOpCGSCCAnalysis())
|
[PM] Support invalidation of inner analysis managers from a pass over the outer IR unit.
Summary:
This never really got implemented, and was very hard to test before
a lot of the refactoring changes to make things more robust. But now we
can test it thoroughly and cleanly, especially at the CGSCC level.
The core idea is that when an inner analysis manager proxy receives the
invalidation event for the outer IR unit, it needs to walk the inner IR
units and propagate it to the inner analysis manager for each of those
units. For example, each function in the SCC needs to get an
invalidation event when the SCC gets one.
The function / module interaction is somewhat boring here. This really
becomes interesting in the face of analysis-backed IR units. This patch
effectively handles all of the CGSCC layer's needs -- both invalidating
SCC analysis and invalidating function analysis when an SCC gets
invalidated.
However, this second aspect doesn't really handle the
LoopAnalysisManager well at this point. That one will need some change
of design in order to fully integrate, because unlike the call graph,
the entire function behind a LoopAnalysis's results can vanish out from
under us, and we won't even have a cached API to access. I'd like to try
to separate solving the loop problems into a subsequent patch though in
order to keep this more focused so I've adapted them to the API and
updated the tests that immediately fail, but I've not added the level of
testing and validation at that layer that I have at the CGSCC layer.
An important aspect of this change is that the proxy for the
FunctionAnalysisManager at the SCC pass layer doesn't work like the
other proxies for an inner IR unit as it doesn't directly manage the
FunctionAnalysisManager and invalidation or clearing of it. This would
create an ever worsening problem of dual ownership of this
responsibility, split between the module-level FAM proxy and this
SCC-level FAM proxy. Instead, this patch changes the SCC-level FAM proxy
to work in terms of the module-level proxy and defer to it to handle
much of the updates. It only does SCC-specific invalidation. This will
become more important in subsequent patches that support more complex
invalidaiton scenarios.
Reviewers: jlebar
Subscribers: mehdi_amini, mcrosier, mzolotukhin, llvm-commits
Differential Revision: https://reviews.llvm.org/D27197
llvm-svn: 289317
2016-12-10 14:34:44 +08:00
|
|
|
CGSCC_ANALYSIS("fam-proxy", FunctionAnalysisManagerCGSCCProxy())
|
2018-09-21 01:08:45 +08:00
|
|
|
CGSCC_ANALYSIS("pass-instrumentation", PassInstrumentationAnalysis(PIC))
|
2014-04-21 19:12:00 +08:00
|
|
|
#undef CGSCC_ANALYSIS
|
|
|
|
|
|
|
|
#ifndef CGSCC_PASS
|
|
|
|
#define CGSCC_PASS(NAME, CREATE_PASS)
|
|
|
|
#endif
|
2017-02-10 07:46:27 +08:00
|
|
|
CGSCC_PASS("argpromotion", ArgumentPromotionPass())
|
2015-01-06 17:06:35 +08:00
|
|
|
CGSCC_PASS("invalidate<all>", InvalidateAllAnalysesPass())
|
2016-02-18 19:03:11 +08:00
|
|
|
CGSCC_PASS("function-attrs", PostOrderFunctionAttrsPass())
|
[PM] Provide an initial, minimal port of the inliner to the new pass manager.
This doesn't implement *every* feature of the existing inliner, but
tries to implement the most important ones for building a functional
optimization pipeline and beginning to sort out bugs, regressions, and
other problems.
Notable, but intentional omissions:
- No alloca merging support. Why? Because it isn't clear we want to do
this at all. Active discussion and investigation is going on to remove
it, so for simplicity I omitted it.
- No support for trying to iterate on "internally" devirtualized calls.
Why? Because it adds what I suspect is inappropriate coupling for
little or no benefit. We will have an outer iteration system that
tracks devirtualization including that from function passes and
iterates already. We should improve that rather than approximate it
here.
- Optimization remarks. Why? Purely to make the patch smaller, no other
reason at all.
The last one I'll probably work on almost immediately. But I wanted to
skip it in the initial patch to try to focus the change as much as
possible as there is already a lot of code moving around and both of
these *could* be skipped without really disrupting the core logic.
A summary of the different things happening here:
1) Adding the usual new PM class and rigging.
2) Fixing minor underlying assumptions in the inline cost analysis or
inline logic that don't generally hold in the new PM world.
3) Adding the core pass logic which is in essence a loop over the calls
in the nodes in the call graph. This is a bit duplicated from the old
inliner, but only a handful of lines could realistically be shared.
(I tried at first, and it really didn't help anything.) All told,
this is only about 100 lines of code, and most of that is the
mechanics of wiring up analyses from the new PM world.
4) Updating the LazyCallGraph (in the new PM) based on the *newly
inlined* calls and references. This is very minimal because we cannot
form cycles.
5) When inlining removes the last use of a function, eagerly nuking the
body of the function so that any "one use remaining" inline cost
heuristics are immediately refined, and queuing these functions to be
completely deleted once inlining is complete and the call graph
updated to reflect that they have become dead.
6) After all the inlining for a particular function, updating the
LazyCallGraph and the CGSCC pass manager to reflect the
function-local simplifications that are done immediately and
internally by the inline utilties. These are the exact same
fundamental set of CG updates done by arbitrary function passes.
7) Adding a bunch of test cases to specifically target CGSCC and other
subtle aspects in the new PM world.
Many thanks to the careful review from Easwaran and Sanjoy and others!
Differential Revision: https://reviews.llvm.org/D24226
llvm-svn: 290161
2016-12-20 11:15:32 +08:00
|
|
|
CGSCC_PASS("inline", InlinerPass())
|
2015-01-06 10:37:55 +08:00
|
|
|
CGSCC_PASS("no-op-cgscc", NoOpCGSCCPass())
|
2014-04-21 19:12:00 +08:00
|
|
|
#undef CGSCC_PASS
|
|
|
|
|
2014-04-21 16:20:10 +08:00
|
|
|
#ifndef FUNCTION_ANALYSIS
|
|
|
|
#define FUNCTION_ANALYSIS(NAME, CREATE_PASS)
|
|
|
|
#endif
|
2016-02-14 07:32:00 +08:00
|
|
|
FUNCTION_ANALYSIS("aa", AAManager())
|
2016-12-19 16:22:17 +08:00
|
|
|
FUNCTION_ANALYSIS("assumptions", AssumptionAnalysis())
|
2016-05-06 05:13:27 +08:00
|
|
|
FUNCTION_ANALYSIS("block-freq", BlockFrequencyAnalysis())
|
2016-05-05 10:59:57 +08:00
|
|
|
FUNCTION_ANALYSIS("branch-prob", BranchProbabilityAnalysis())
|
2015-01-14 18:19:28 +08:00
|
|
|
FUNCTION_ANALYSIS("domtree", DominatorTreeAnalysis())
|
2016-02-26 01:54:07 +08:00
|
|
|
FUNCTION_ANALYSIS("postdomtree", PostDominatorTreeAnalysis())
|
2016-04-19 07:55:01 +08:00
|
|
|
FUNCTION_ANALYSIS("demanded-bits", DemandedBitsAnalysis())
|
2016-02-26 01:54:15 +08:00
|
|
|
FUNCTION_ANALYSIS("domfrontier", DominanceFrontierAnalysis())
|
2015-01-20 18:58:50 +08:00
|
|
|
FUNCTION_ANALYSIS("loops", LoopAnalysis())
|
2016-06-14 06:01:25 +08:00
|
|
|
FUNCTION_ANALYSIS("lazy-value-info", LazyValueAnalysis())
|
2016-05-13 06:19:39 +08:00
|
|
|
FUNCTION_ANALYSIS("da", DependenceAnalysis())
|
2016-03-10 08:55:30 +08:00
|
|
|
FUNCTION_ANALYSIS("memdep", MemoryDependenceAnalysis())
|
2016-06-02 05:30:40 +08:00
|
|
|
FUNCTION_ANALYSIS("memoryssa", MemorySSAAnalysis())
|
2018-06-28 22:13:06 +08:00
|
|
|
FUNCTION_ANALYSIS("phi-values", PhiValuesAnalysis())
|
2016-02-26 01:54:25 +08:00
|
|
|
FUNCTION_ANALYSIS("regions", RegionInfoAnalysis())
|
2015-01-06 10:50:06 +08:00
|
|
|
FUNCTION_ANALYSIS("no-op-function", NoOpFunctionAnalysis())
|
2016-07-19 00:29:21 +08:00
|
|
|
FUNCTION_ANALYSIS("opt-remark-emit", OptimizationRemarkEmitterAnalysis())
|
[PM] Port ScalarEvolution to the new pass manager.
This change makes ScalarEvolution a stand-alone object and just produces
one from a pass as needed. Making this work well requires making the
object movable, using references instead of overwritten pointers in
a number of places, and other refactorings.
I've also wired it up to the new pass manager and added a RUN line to
a test to exercise it under the new pass manager. This includes basic
printing support much like with other analyses.
But there is a big and somewhat scary change here. Prior to this patch
ScalarEvolution was never *actually* invalidated!!! Re-running the pass
just re-wired up the various other analyses and didn't remove any of the
existing entries in the SCEV caches or clear out anything at all. This
might seem OK as everything in SCEV that can uses ValueHandles to track
updates to the values that serve as SCEV keys. However, this still means
that as we ran SCEV over each function in the module, we kept
accumulating more and more SCEVs into the cache. At the end, we would
have a SCEV cache with every value that we ever needed a SCEV for in the
entire module!!! Yowzers. The releaseMemory routine would dump all of
this, but that isn't realy called during normal runs of the pipeline as
far as I can see.
To make matters worse, there *is* actually a key that we don't update
with value handles -- there is a map keyed off of Loop*s. Because
LoopInfo *does* release its memory from run to run, it is entirely
possible to run SCEV over one function, then over another function, and
then lookup a Loop* from the second function but find an entry inserted
for the first function! Ouch.
To make matters still worse, there are plenty of updates that *don't*
trip a value handle. It seems incredibly unlikely that today GVN or
another pass that invalidates SCEV can update values in *just* such
a way that a subsequent run of SCEV will incorrectly find lookups in
a cache, but it is theoretically possible and would be a nightmare to
debug.
With this refactoring, I've fixed all this by actually destroying and
recreating the ScalarEvolution object from run to run. Technically, this
could increase the amount of malloc traffic we see, but then again it is
also technically correct. ;] I don't actually think we're suffering from
tons of malloc traffic from SCEV because if we were, the fact that we
never clear the memory would seem more likely to have come up as an
actual problem before now. So, I've made the simple fix here. If in fact
there are serious issues with too much allocation and deallocation,
I can work on a clever fix that preserves the allocations (while
clearing the data) between each run, but I'd prefer to do that kind of
optimization with a test case / benchmark that shows why we need such
cleverness (and that can test that we actually make it faster). It's
possible that this will make some things faster by making the SCEV
caches have higher locality (due to being significantly smaller) so
until there is a clear benchmark, I think the simple change is best.
Differential Revision: http://reviews.llvm.org/D12063
llvm-svn: 245193
2015-08-17 10:08:17 +08:00
|
|
|
FUNCTION_ANALYSIS("scalar-evolution", ScalarEvolutionAnalysis())
|
2018-11-27 05:57:47 +08:00
|
|
|
FUNCTION_ANALYSIS("stack-safety-local", StackSafetyAnalysis())
|
[PM] Rework how the TargetLibraryInfo pass integrates with the new pass
manager to support the actual uses of it. =]
When I ported instcombine to the new pass manager I discover that it
didn't work because TLI wasn't available in the right places. This is
a somewhat surprising and/or subtle aspect of the new pass manager
design that came up before but I think is useful to be reminded of:
While the new pass manager *allows* a function pass to query a module
analysis, it requires that the module analysis is already run and cached
prior to the function pass manager starting up, possibly with
a 'require<foo>' style utility in the pass pipeline. This is an
intentional hurdle because using a module analysis from a function pass
*requires* that the module analysis is run prior to entering the
function pass manager. Otherwise the other functions in the module could
be in who-knows-what state, etc.
A somewhat surprising consequence of this design decision (at least to
me) is that you have to design a function pass that leverages
a module analysis to do so as an optional feature. Even if that means
your function pass does no work in the absence of the module analysis,
you have to handle that possibility and remain conservatively correct.
This is a natural consequence of things being able to invalidate the
module analysis and us being unable to re-run it. And it's a generally
good thing because it lets us reorder passes arbitrarily without
breaking correctness, etc.
This ends up causing problems in one case. What if we have a module
analysis that is *definitionally* impossible to invalidate. In the
places this might come up, the analysis is usually also definitionally
trivial to run even while other transformation passes run on the module,
regardless of the state of anything. And so, it follows that it is
natural to have a hard requirement on such analyses from a function
pass.
It turns out, that TargetLibraryInfo is just such an analysis, and
InstCombine has a hard requirement on it.
The approach I've taken here is to produce an analysis that models this
flexibility by making it both a module and a function analysis. This
exposes the fact that it is in fact safe to compute at any point. We can
even make it a valid CGSCC analysis at some point if that is useful.
However, we don't want to have a copy of the actual target library info
state for each function! This state is specific to the triple. The
somewhat direct and blunt approach here is to turn TLI into a pimpl,
with the state and mutators in the implementation class and the query
routines primarily in the wrapper. Then the analysis can lazily
construct and cache the implementations, keyed on the triple, and
on-demand produce wrappers of them for each function.
One minor annoyance is that we will end up with a wrapper for each
function in the module. While this is a bit wasteful (one pointer per
function) it seems tolerable. And it has the advantage of ensuring that
we pay the absolute minimum synchronization cost to access this
information should we end up with a nice parallel function pass manager
in the future. We could look into trying to mark when analysis results
are especially cheap to recompute and more eagerly GC-ing the cached
results, or we could look at supporting a variant of analyses whose
results are specifically *not* cached and expected to just be used and
discarded by the consumer. Either way, these seem like incremental
enhancements that should happen when we start profiling the memory and
CPU usage of the new pass manager and not before.
The other minor annoyance is that if we end up using the TLI in both
a module pass and a function pass, those will be produced by two
separate analyses, and thus will point to separate copies of the
implementation state. While a minor issue, I dislike this and would like
to find a way to cleanly allow a single analysis instance to be used
across multiple IR unit managers. But I don't have a good solution to
this today, and I don't want to hold up all of the work waiting to come
up with one. This too seems like a reasonable thing to incrementally
improve later.
llvm-svn: 226981
2015-01-24 10:06:09 +08:00
|
|
|
FUNCTION_ANALYSIS("targetlibinfo", TargetLibraryAnalysis())
|
2015-02-01 18:11:22 +08:00
|
|
|
FUNCTION_ANALYSIS("targetir",
|
|
|
|
TM ? TM->getTargetIRAnalysis() : TargetIRAnalysis())
|
2016-05-10 03:57:29 +08:00
|
|
|
FUNCTION_ANALYSIS("verify", VerifierAnalysis())
|
2018-09-21 01:08:45 +08:00
|
|
|
FUNCTION_ANALYSIS("pass-instrumentation", PassInstrumentationAnalysis(PIC))
|
2016-02-18 17:45:17 +08:00
|
|
|
|
|
|
|
#ifndef FUNCTION_ALIAS_ANALYSIS
|
|
|
|
#define FUNCTION_ALIAS_ANALYSIS(NAME, CREATE_PASS) \
|
|
|
|
FUNCTION_ANALYSIS(NAME, CREATE_PASS)
|
|
|
|
#endif
|
|
|
|
FUNCTION_ALIAS_ANALYSIS("basic-aa", BasicAA())
|
2016-07-06 08:26:41 +08:00
|
|
|
FUNCTION_ALIAS_ANALYSIS("cfl-anders-aa", CFLAndersAA())
|
|
|
|
FUNCTION_ALIAS_ANALYSIS("cfl-steens-aa", CFLSteensAA())
|
2016-02-20 12:01:45 +08:00
|
|
|
FUNCTION_ALIAS_ANALYSIS("scev-aa", SCEVAA())
|
2016-02-20 12:03:06 +08:00
|
|
|
FUNCTION_ALIAS_ANALYSIS("scoped-noalias-aa", ScopedNoAliasAA())
|
2016-02-20 12:04:52 +08:00
|
|
|
FUNCTION_ALIAS_ANALYSIS("type-based-aa", TypeBasedAA())
|
2016-02-18 17:45:17 +08:00
|
|
|
#undef FUNCTION_ALIAS_ANALYSIS
|
2014-04-21 16:20:10 +08:00
|
|
|
#undef FUNCTION_ANALYSIS
|
|
|
|
|
2014-04-21 16:08:50 +08:00
|
|
|
#ifndef FUNCTION_PASS
|
|
|
|
#define FUNCTION_PASS(NAME, CREATE_PASS)
|
|
|
|
#endif
|
2016-02-20 11:46:03 +08:00
|
|
|
FUNCTION_PASS("aa-eval", AAEvaluator())
|
2015-10-31 07:13:18 +08:00
|
|
|
FUNCTION_PASS("adce", ADCEPass())
|
2016-06-16 05:51:30 +08:00
|
|
|
FUNCTION_PASS("add-discriminators", AddDiscriminatorsPass())
|
2018-01-25 20:06:32 +08:00
|
|
|
FUNCTION_PASS("aggressive-instcombine", AggressiveInstCombinePass())
|
2020-02-02 21:46:59 +08:00
|
|
|
FUNCTION_PASS("assume-builder", AssumeBuilderPass())
|
2016-06-15 14:18:01 +08:00
|
|
|
FUNCTION_PASS("alignment-from-assumptions", AlignmentFromAssumptionsPass())
|
2016-05-25 09:57:04 +08:00
|
|
|
FUNCTION_PASS("bdce", BDCEPass())
|
2017-11-14 09:30:04 +08:00
|
|
|
FUNCTION_PASS("bounds-checking", BoundsCheckingPass())
|
2016-07-23 02:04:25 +08:00
|
|
|
FUNCTION_PASS("break-crit-edges", BreakCriticalEdgesPass())
|
Recommit r317351 : Add CallSiteSplitting pass
This recommit r317351 after fixing a buildbot failure.
Original commit message:
Summary:
This change add a pass which tries to split a call-site to pass
more constrained arguments if its argument is predicated in the control flow
so that we can expose better context to the later passes (e.g, inliner, jump
threading, or IPA-CP based function cloning, etc.).
As of now we support two cases :
1) If a call site is dominated by an OR condition and if any of its arguments
are predicated on this OR condition, try to split the condition with more
constrained arguments. For example, in the code below, we try to split the
call site since we can predicate the argument (ptr) based on the OR condition.
Split from :
if (!ptr || c)
callee(ptr);
to :
if (!ptr)
callee(null ptr) // set the known constant value
else if (c)
callee(nonnull ptr) // set non-null attribute in the argument
2) We can also split a call-site based on constant incoming values of a PHI
For example,
from :
BB0:
%c = icmp eq i32 %i1, %i2
br i1 %c, label %BB2, label %BB1
BB1:
br label %BB2
BB2:
%p = phi i32 [ 0, %BB0 ], [ 1, %BB1 ]
call void @bar(i32 %p)
to
BB0:
%c = icmp eq i32 %i1, %i2
br i1 %c, label %BB2-split0, label %BB1
BB1:
br label %BB2-split1
BB2-split0:
call void @bar(i32 0)
br label %BB2
BB2-split1:
call void @bar(i32 1)
br label %BB2
BB2:
%p = phi i32 [ 0, %BB2-split0 ], [ 1, %BB2-split1 ]
llvm-svn: 317362
2017-11-04 04:41:16 +08:00
|
|
|
FUNCTION_PASS("callsite-splitting", CallSiteSplittingPass())
|
2016-07-02 08:16:47 +08:00
|
|
|
FUNCTION_PASS("consthoist", ConstantHoistingPass())
|
2018-09-05 01:19:13 +08:00
|
|
|
FUNCTION_PASS("chr", ControlHeightReductionPass())
|
2016-07-07 07:26:29 +08:00
|
|
|
FUNCTION_PASS("correlated-propagation", CorrelatedValuePropagationPass())
|
2016-04-23 03:40:41 +08:00
|
|
|
FUNCTION_PASS("dce", DCEPass())
|
2017-09-09 21:38:18 +08:00
|
|
|
FUNCTION_PASS("div-rem-pairs", DivRemPairsPass())
|
2016-05-18 05:38:13 +08:00
|
|
|
FUNCTION_PASS("dse", DSEPass())
|
2018-06-30 01:48:58 +08:00
|
|
|
FUNCTION_PASS("dot-cfg", CFGPrinterPass())
|
|
|
|
FUNCTION_PASS("dot-cfg-only", CFGOnlyPrinterPass())
|
2016-09-01 03:24:10 +08:00
|
|
|
FUNCTION_PASS("early-cse", EarlyCSEPass(/*UseMemorySSA=*/false))
|
|
|
|
FUNCTION_PASS("early-cse-memssa", EarlyCSEPass(/*UseMemorySSA=*/true))
|
2017-11-15 05:09:45 +08:00
|
|
|
FUNCTION_PASS("ee-instrument", EntryExitInstrumenterPass(/*PostInlining=*/false))
|
Introduce llvm.experimental.widenable_condition intrinsic
This patch introduces a new instinsic `@llvm.experimental.widenable_condition`
that allows explicit representation for guards. It is an alternative to using
`@llvm.experimental.guard` intrinsic that does not contain implicit control flow.
We keep finding places where `@llvm.experimental.guard` is not supported or
treated too conservatively, and there are 2 reasons to that:
- `@llvm.experimental.guard` has memory write side effect to model implicit control flow,
and this sometimes confuses passes and analyzes that work with memory;
- Not all passes and analysis are aware of the semantics of guards. These passes treat them
as regular throwing call and have no idea that the condition of guard may be used to prove
something. One well-known place which had caused us troubles in the past is explicit loop
iteration count calculation in SCEV. Another example is new loop unswitching which is not
aware of guards. Whenever a new pass appears, we potentially have this problem there.
Rather than go and fix all these places (and commit to keep track of them and add support
in future), it seems more reasonable to leverage the existing optimizer's logic as much as possible.
The only significant difference between guards and regular explicit branches is that guard's condition
can be widened. It means that a guard contains (explicitly or implicitly) a `deopt` block successor,
and it is always legal to go there no matter what the guard condition is. The other successor is
a guarded block, and it is only legal to go there if the condition is true.
This patch introduces a new explicit form of guards alternative to `@llvm.experimental.guard`
intrinsic. Now a widenable guard can be represented in the CFG explicitly like this:
%widenable_condition = call i1 @llvm.experimental.widenable.condition()
%new_condition = and i1 %cond, %widenable_condition
br i1 %new_condition, label %guarded, label %deopt
guarded:
; Guarded instructions
deopt:
call type @llvm.experimental.deoptimize(<args...>) [ "deopt"(<deopt_args...>) ]
The new intrinsic `@llvm.experimental.widenable.condition` has semantics of an
`undef`, but the intrinsic prevents the optimizer from folding it early. This form
should exploit all optimization boons provided to `br` instuction, and it still can be
widened by replacing the result of `@llvm.experimental.widenable.condition()`
with `and` with any arbitrary boolean value (as long as the branch that is taken when
it is `false` has a deopt and has no side-effects).
For more motivation, please check llvm-dev discussion "[llvm-dev] Giving up using
implicit control flow in guards".
This patch introduces this new intrinsic with respective LangRef changes and a pass
that converts old-style guards (expressed as intrinsics) into the new form.
The naming discussion is still ungoing. Merging this to unblock further items. We can
later change the name of this intrinsic.
Reviewed By: reames, fedor.sergeev, sanjoy
Differential Revision: https://reviews.llvm.org/D51207
llvm-svn: 348593
2018-12-07 22:39:46 +08:00
|
|
|
FUNCTION_PASS("make-guards-explicit", MakeGuardsExplicitPass())
|
2017-11-15 05:09:45 +08:00
|
|
|
FUNCTION_PASS("post-inline-ee-instrument", EntryExitInstrumenterPass(/*PostInlining=*/true))
|
2016-07-15 21:45:20 +08:00
|
|
|
FUNCTION_PASS("gvn-hoist", GVNHoistPass())
|
2015-01-24 12:19:17 +08:00
|
|
|
FUNCTION_PASS("instcombine", InstCombinePass())
|
2018-06-30 07:36:03 +08:00
|
|
|
FUNCTION_PASS("instsimplify", InstSimplifyPass())
|
2015-01-06 17:06:35 +08:00
|
|
|
FUNCTION_PASS("invalidate<all>", InvalidateAllAnalysesPass())
|
2020-01-28 05:33:34 +08:00
|
|
|
FUNCTION_PASS("irce", IRCEPass())
|
2016-06-25 07:32:02 +08:00
|
|
|
FUNCTION_PASS("float2int", Float2IntPass())
|
2015-01-06 10:37:55 +08:00
|
|
|
FUNCTION_PASS("no-op-function", NoOpFunctionPass())
|
Conditionally eliminate library calls where the result value is not used
Summary:
This pass shrink-wraps a condition to some library calls where the call
result is not used. For example:
sqrt(val);
is transformed to
if (val < 0)
sqrt(val);
Even if the result of library call is not being used, the compiler cannot
safely delete the call because the function can set errno on error
conditions.
Note in many functions, the error condition solely depends on the incoming
parameter. In this optimization, we can generate the condition can lead to
the errno to shrink-wrap the call. Since the chances of hitting the error
condition is low, the runtime call is effectively eliminated.
These partially dead calls are usually results of C++ abstraction penalty
exposed by inlining. This optimization hits 108 times in 19 C/C++ programs
in SPEC2006.
Reviewers: hfinkel, mehdi_amini, davidxl
Subscribers: modocache, mgorny, mehdi_amini, xur, llvm-commits, beanz
Differential Revision: https://reviews.llvm.org/D24414
llvm-svn: 284542
2016-10-19 05:36:27 +08:00
|
|
|
FUNCTION_PASS("libcalls-shrinkwrap", LibCallsShrinkWrapPass())
|
2019-11-12 03:42:18 +08:00
|
|
|
FUNCTION_PASS("inject-tli-mappings", InjectTLIMappings())
|
2016-05-14 06:52:35 +08:00
|
|
|
FUNCTION_PASS("loweratomic", LowerAtomicPass())
|
2015-01-24 19:13:02 +08:00
|
|
|
FUNCTION_PASS("lower-expect", LowerExpectIntrinsicPass())
|
2016-07-29 06:08:41 +08:00
|
|
|
FUNCTION_PASS("lower-guard-intrinsic", LowerGuardIntrinsicPass())
|
2019-10-15 00:15:14 +08:00
|
|
|
FUNCTION_PASS("lower-constant-intrinsics", LowerConstantIntrinsicsPass())
|
[Matrix] Add first set of matrix intrinsics and initial lowering pass.
This is the first patch adding an initial set of matrix intrinsics and a
corresponding lowering pass. This has been discussed on llvm-dev:
http://lists.llvm.org/pipermail/llvm-dev/2019-October/136240.html
The first patch introduces four new intrinsics (transpose, multiply,
columnwise load and store) and a LowerMatrixIntrinsics pass, that
lowers those intrinsics to vector operations.
Matrixes are embedded in a 'flat' vector (e.g. a 4 x 4 float matrix
embedded in a <16 x float> vector) and the intrinsics take the dimension
information as parameters. Those parameters need to be ConstantInt.
For the memory layout, we initially assume column-major, but in the RFC
we also described how to extend the intrinsics to support row-major as
well.
For the initial lowering, we split the input of the intrinsics into a
set of column vectors, transform those column vectors and concatenate
the result columns to a flat result vector.
This allows us to lower the intrinsics without any shape propagation, as
mentioned in the RFC. In follow-up patches, we plan to submit the
following improvements:
* Shape propagation to eliminate the embedding/splitting for each
intrinsic.
* Fused & tiled lowering of multiply and other operations.
* Optimization remarks highlighting matrix expressions and costs.
* Generate loops for operations on large matrixes.
* More general block processing for operation on large vectors,
exploiting shape information.
We would like to add dedicated transpose, columnwise load and store
intrinsics, even though they are not strictly necessary. For example, we
could instead emit a large shufflevector instruction instead of the
transpose. But we expect that to
(1) become unwieldy for larger matrixes (even for 16x16 matrixes,
the resulting shufflevector masks would be huge),
(2) risk instcombine making small changes, causing us to fail to
detect the transpose, preventing better lowerings
For the load/store, we are additionally planning on exploiting the
intrinsics for better alias analysis.
Reviewers: anemet, Gerolf, reames, hfinkel, andrew.w.kaylor, efriedma, rengolin
Reviewed By: anemet
Differential Revision: https://reviews.llvm.org/D70456
2019-12-12 23:27:28 +08:00
|
|
|
FUNCTION_PASS("lower-matrix-intrinsics", LowerMatrixIntrinsicsPass())
|
2019-01-31 17:10:17 +08:00
|
|
|
FUNCTION_PASS("lower-widenable-condition", LowerWidenableConditionPass())
|
2016-05-19 06:55:34 +08:00
|
|
|
FUNCTION_PASS("guard-widening", GuardWideningPass())
|
2018-12-07 16:23:37 +08:00
|
|
|
FUNCTION_PASS("load-store-vectorizer", LoadStoreVectorizerPass())
|
2016-07-09 11:03:01 +08:00
|
|
|
FUNCTION_PASS("loop-simplify", LoopSimplifyPass())
|
2017-01-20 16:42:19 +08:00
|
|
|
FUNCTION_PASS("loop-sink", LoopSinkPass())
|
2016-08-13 01:28:27 +08:00
|
|
|
FUNCTION_PASS("lowerinvoke", LowerInvokePass())
|
2016-06-14 11:22:22 +08:00
|
|
|
FUNCTION_PASS("mem2reg", PromotePass())
|
2016-06-14 10:44:55 +08:00
|
|
|
FUNCTION_PASS("memcpyopt", MemCpyOptPass())
|
2019-05-23 20:35:26 +08:00
|
|
|
FUNCTION_PASS("mergeicmps", MergeICmpsPass())
|
2016-07-22 06:28:52 +08:00
|
|
|
FUNCTION_PASS("nary-reassociate", NaryReassociatePass())
|
2016-12-23 00:35:02 +08:00
|
|
|
FUNCTION_PASS("newgvn", NewGVNPass())
|
2016-06-14 08:51:09 +08:00
|
|
|
FUNCTION_PASS("jump-threading", JumpThreadingPass())
|
2016-05-26 07:38:53 +08:00
|
|
|
FUNCTION_PASS("partially-inline-libcalls", PartiallyInlineLibCallsPass())
|
2016-06-10 03:44:46 +08:00
|
|
|
FUNCTION_PASS("lcssa", LCSSAPass())
|
2016-08-13 12:11:27 +08:00
|
|
|
FUNCTION_PASS("loop-data-prefetch", LoopDataPrefetchPass())
|
2017-01-27 09:32:26 +08:00
|
|
|
FUNCTION_PASS("loop-load-elim", LoopLoadEliminationPass())
|
2019-04-18 02:53:27 +08:00
|
|
|
FUNCTION_PASS("loop-fuse", LoopFusePass())
|
2016-07-19 00:29:27 +08:00
|
|
|
FUNCTION_PASS("loop-distribute", LoopDistributePass())
|
2017-04-05 00:42:20 +08:00
|
|
|
FUNCTION_PASS("pgo-memop-opt", PGOMemOPSizeOpt())
|
2014-04-21 16:08:50 +08:00
|
|
|
FUNCTION_PASS("print", PrintFunctionPass(dbgs()))
|
2016-12-19 16:22:17 +08:00
|
|
|
FUNCTION_PASS("print<assumptions>", AssumptionPrinterPass(dbgs()))
|
2016-05-06 05:13:27 +08:00
|
|
|
FUNCTION_PASS("print<block-freq>", BlockFrequencyPrinterPass(dbgs()))
|
2016-05-05 10:59:57 +08:00
|
|
|
FUNCTION_PASS("print<branch-prob>", BranchProbabilityPrinterPass(dbgs()))
|
2019-01-08 22:06:58 +08:00
|
|
|
FUNCTION_PASS("print<da>", DependenceAnalysisPrinterPass(dbgs()))
|
2015-01-14 18:19:28 +08:00
|
|
|
FUNCTION_PASS("print<domtree>", DominatorTreePrinterPass(dbgs()))
|
2016-02-26 01:54:07 +08:00
|
|
|
FUNCTION_PASS("print<postdomtree>", PostDominatorTreePrinterPass(dbgs()))
|
2016-04-19 07:55:01 +08:00
|
|
|
FUNCTION_PASS("print<demanded-bits>", DemandedBitsPrinterPass(dbgs()))
|
2016-02-26 01:54:15 +08:00
|
|
|
FUNCTION_PASS("print<domfrontier>", DominanceFrontierPrinterPass(dbgs()))
|
2015-01-20 18:58:50 +08:00
|
|
|
FUNCTION_PASS("print<loops>", LoopPrinterPass(dbgs()))
|
2016-06-02 05:30:40 +08:00
|
|
|
FUNCTION_PASS("print<memoryssa>", MemorySSAPrinterPass(dbgs()))
|
2018-06-28 22:13:06 +08:00
|
|
|
FUNCTION_PASS("print<phi-values>", PhiValuesPrinterPass(dbgs()))
|
2016-02-26 01:54:25 +08:00
|
|
|
FUNCTION_PASS("print<regions>", RegionInfoPrinterPass(dbgs()))
|
[PM] Port ScalarEvolution to the new pass manager.
This change makes ScalarEvolution a stand-alone object and just produces
one from a pass as needed. Making this work well requires making the
object movable, using references instead of overwritten pointers in
a number of places, and other refactorings.
I've also wired it up to the new pass manager and added a RUN line to
a test to exercise it under the new pass manager. This includes basic
printing support much like with other analyses.
But there is a big and somewhat scary change here. Prior to this patch
ScalarEvolution was never *actually* invalidated!!! Re-running the pass
just re-wired up the various other analyses and didn't remove any of the
existing entries in the SCEV caches or clear out anything at all. This
might seem OK as everything in SCEV that can uses ValueHandles to track
updates to the values that serve as SCEV keys. However, this still means
that as we ran SCEV over each function in the module, we kept
accumulating more and more SCEVs into the cache. At the end, we would
have a SCEV cache with every value that we ever needed a SCEV for in the
entire module!!! Yowzers. The releaseMemory routine would dump all of
this, but that isn't realy called during normal runs of the pipeline as
far as I can see.
To make matters worse, there *is* actually a key that we don't update
with value handles -- there is a map keyed off of Loop*s. Because
LoopInfo *does* release its memory from run to run, it is entirely
possible to run SCEV over one function, then over another function, and
then lookup a Loop* from the second function but find an entry inserted
for the first function! Ouch.
To make matters still worse, there are plenty of updates that *don't*
trip a value handle. It seems incredibly unlikely that today GVN or
another pass that invalidates SCEV can update values in *just* such
a way that a subsequent run of SCEV will incorrectly find lookups in
a cache, but it is theoretically possible and would be a nightmare to
debug.
With this refactoring, I've fixed all this by actually destroying and
recreating the ScalarEvolution object from run to run. Technically, this
could increase the amount of malloc traffic we see, but then again it is
also technically correct. ;] I don't actually think we're suffering from
tons of malloc traffic from SCEV because if we were, the fact that we
never clear the memory would seem more likely to have come up as an
actual problem before now. So, I've made the simple fix here. If in fact
there are serious issues with too much allocation and deallocation,
I can work on a clever fix that preserves the allocations (while
clearing the data) between each run, but I'd prefer to do that kind of
optimization with a test case / benchmark that shows why we need such
cleverness (and that can test that we actually make it faster). It's
possible that this will make some things faster by making the SCEV
caches have higher locality (due to being significantly smaller) so
until there is a clear benchmark, I think the simple change is best.
Differential Revision: http://reviews.llvm.org/D12063
llvm-svn: 245193
2015-08-17 10:08:17 +08:00
|
|
|
FUNCTION_PASS("print<scalar-evolution>", ScalarEvolutionPrinterPass(dbgs()))
|
2018-11-27 05:57:47 +08:00
|
|
|
FUNCTION_PASS("print<stack-safety-local>", StackSafetyPrinterPass(dbgs()))
|
2016-04-27 07:39:29 +08:00
|
|
|
FUNCTION_PASS("reassociate", ReassociatePass())
|
2018-11-21 22:00:17 +08:00
|
|
|
FUNCTION_PASS("scalarizer", ScalarizerPass())
|
2016-05-18 23:18:25 +08:00
|
|
|
FUNCTION_PASS("sccp", SCCPPass())
|
2016-04-23 03:54:10 +08:00
|
|
|
FUNCTION_PASS("sink", SinkingPass())
|
2016-06-15 16:43:40 +08:00
|
|
|
FUNCTION_PASS("slp-vectorizer", SLPVectorizerPass())
|
2016-08-02 05:48:33 +08:00
|
|
|
FUNCTION_PASS("speculative-execution", SpeculativeExecutionPass())
|
Add a new pass to speculate around PHI nodes with constant (integer) operands when profitable.
The core idea is to (re-)introduce some redundancies where their cost is
hidden by the cost of materializing immediates for constant operands of
PHI nodes. When the cost of the redundancies is covered by this,
avoiding materializing the immediate has numerous benefits:
1) Less register pressure
2) Potential for further folding / combining
3) Potential for more efficient instructions due to immediate operand
As a motivating example, consider the remarkably different cost on x86
of a SHL instruction with an immediate operand versus a register
operand.
This pattern turns up surprisingly frequently, but is somewhat rarely
obvious as a significant performance problem.
The pass is entirely target independent, but it does rely on the target
cost model in TTI to decide when to speculate things around the PHI
node. I've included x86-focused tests, but any target that sets up its
immediate cost model should benefit from this pass.
There is probably more that can be done in this space, but the pass
as-is is enough to get some important performance on our internal
benchmarks, and should be generally performance neutral, but help with
more extensive benchmarking is always welcome.
One awkward part is that this pass has to be scheduled after
*everything* that can eliminate these kinds of redundancies. This
includes SimplifyCFG, GVN, etc. I'm open to suggestions about better
places to put this. We could in theory make it part of the codegen pass
pipeline, but there doesn't really seem to be a good reason for that --
it isn't "lowering" in any sense and only relies on pretty standard cost
model based TTI queries, so it seems to fit well with the "optimization"
pipeline model. Still, further thoughts on the pipeline position are
welcome.
I've also only implemented this in the new pass manager. If folks are
very interested, I can try to add it to the old PM as well, but I didn't
really see much point (my use case is already switched over to the new
PM).
I've tested this pretty heavily without issue. A wide range of
benchmarks internally show no change outside the noise, and I don't see
any significant changes in SPEC either. However, the size class
computation in tcmalloc is substantially improved by this, which turns
into a 2% to 4% win on the hottest path through tcmalloc for us, so
there are definitely important cases where this is going to make
a substantial difference.
Differential revision: https://reviews.llvm.org/D37467
llvm-svn: 319164
2017-11-28 19:32:31 +08:00
|
|
|
FUNCTION_PASS("spec-phis", SpeculateAroundPHIsPass())
|
2015-09-12 17:09:14 +08:00
|
|
|
FUNCTION_PASS("sroa", SROA())
|
2016-07-07 07:48:41 +08:00
|
|
|
FUNCTION_PASS("tailcallelim", TailCallElimPass())
|
2016-07-08 11:32:49 +08:00
|
|
|
FUNCTION_PASS("unreachableblockelim", UnreachableBlockElimPass())
|
2020-01-10 00:15:53 +08:00
|
|
|
FUNCTION_PASS("unroll-and-jam", LoopUnrollAndJamPass())
|
2015-01-05 08:08:53 +08:00
|
|
|
FUNCTION_PASS("verify", VerifierPass())
|
2015-01-14 18:19:28 +08:00
|
|
|
FUNCTION_PASS("verify<domtree>", DominatorTreeVerifierPass())
|
2016-07-20 07:54:23 +08:00
|
|
|
FUNCTION_PASS("verify<loops>", LoopVerifierPass())
|
2016-06-02 05:30:40 +08:00
|
|
|
FUNCTION_PASS("verify<memoryssa>", MemorySSAVerifierPass())
|
2016-02-26 01:54:25 +08:00
|
|
|
FUNCTION_PASS("verify<regions>", RegionInfoVerifierPass())
|
2019-03-31 18:15:39 +08:00
|
|
|
FUNCTION_PASS("verify<safepoint-ir>", SafepointIRVerifierPass())
|
2019-11-19 15:16:39 +08:00
|
|
|
FUNCTION_PASS("verify<scalar-evolution>", ScalarEvolutionVerifierPass())
|
2018-06-30 01:48:58 +08:00
|
|
|
FUNCTION_PASS("view-cfg", CFGViewerPass())
|
|
|
|
FUNCTION_PASS("view-cfg-only", CFGOnlyViewerPass())
|
[Unroll/UnrollAndJam/Vectorizer/Distribute] Add followup loop attributes.
When multiple loop transformation are defined in a loop's metadata, their order of execution is defined by the order of their respective passes in the pass pipeline. For instance, e.g.
#pragma clang loop unroll_and_jam(enable)
#pragma clang loop distribute(enable)
is the same as
#pragma clang loop distribute(enable)
#pragma clang loop unroll_and_jam(enable)
and will try to loop-distribute before Unroll-And-Jam because the LoopDistribute pass is scheduled after UnrollAndJam pass. UnrollAndJamPass only supports one inner loop, i.e. it will necessarily fail after loop distribution. It is not possible to specify another execution order. Also,t the order of passes in the pipeline is subject to change between versions of LLVM, optimization options and which pass manager is used.
This patch adds 'followup' attributes to various loop transformation passes. These attributes define which attributes the resulting loop of a transformation should have. For instance,
!0 = !{!0, !1, !2}
!1 = !{!"llvm.loop.unroll_and_jam.enable"}
!2 = !{!"llvm.loop.unroll_and_jam.followup_inner", !3}
!3 = !{!"llvm.loop.distribute.enable"}
defines a loop ID (!0) to be unrolled-and-jammed (!1) and then the attribute !3 to be added to the jammed inner loop, which contains the instruction to distribute the inner loop.
Currently, in both pass managers, pass execution is in a fixed order and UnrollAndJamPass will not execute again after LoopDistribute. We hope to fix this in the future by allowing pass managers to run passes until a fixpoint is reached, use Polly to perform these transformations, or add a loop transformation pass which takes the order issue into account.
For mandatory/forced transformations (e.g. by having been declared by #pragma omp simd), the user must be notified when a transformation could not be performed. It is not possible that the responsible pass emits such a warning because the transformation might be 'hidden' in a followup attribute when it is executed, or it is not present in the pipeline at all. For this reason, this patche introduces a WarnMissedTransformations pass, to warn about orphaned transformations.
Since this changes the user-visible diagnostic message when a transformation is applied, two test cases in the clang repository need to be updated.
To ensure that no other transformation is executed before the intended one, the attribute `llvm.loop.disable_nonforced` can be added which should disable transformation heuristics before the intended transformation is applied. E.g. it would be surprising if a loop is distributed before a #pragma unroll_and_jam is applied.
With more supported code transformations (loop fusion, interchange, stripmining, offloading, etc.), transformations can be used as building blocks for more complex transformations (e.g. stripmining+stripmining+interchange -> tiling).
Reviewed By: hfinkel, dmgreen
Differential Revision: https://reviews.llvm.org/D49281
Differential Revision: https://reviews.llvm.org/D55288
llvm-svn: 348944
2018-12-13 01:32:52 +08:00
|
|
|
FUNCTION_PASS("transform-warning", WarnMissedTransformationsPass())
|
2019-02-14 06:22:48 +08:00
|
|
|
FUNCTION_PASS("asan", AddressSanitizerPass(false, false, false))
|
2019-05-09 14:09:35 +08:00
|
|
|
FUNCTION_PASS("kasan", AddressSanitizerPass(true, false, false))
|
2019-02-05 05:02:49 +08:00
|
|
|
FUNCTION_PASS("msan", MemorySanitizerPass({}))
|
2019-05-09 14:09:35 +08:00
|
|
|
FUNCTION_PASS("kmsan", MemorySanitizerPass({0, false, /*Kernel=*/true}))
|
2019-01-16 17:28:01 +08:00
|
|
|
FUNCTION_PASS("tsan", ThreadSanitizerPass())
|
2014-04-21 16:08:50 +08:00
|
|
|
#undef FUNCTION_PASS
|
2016-02-25 15:23:08 +08:00
|
|
|
|
2019-01-10 18:01:53 +08:00
|
|
|
#ifndef FUNCTION_PASS_WITH_PARAMS
|
|
|
|
#define FUNCTION_PASS_WITH_PARAMS(NAME, CREATE_PASS, PARSER)
|
|
|
|
#endif
|
|
|
|
FUNCTION_PASS_WITH_PARAMS("unroll",
|
2019-02-05 05:02:49 +08:00
|
|
|
[](LoopUnrollOptions Opts) {
|
|
|
|
return LoopUnrollPass(Opts);
|
|
|
|
},
|
|
|
|
parseLoopUnrollOptions)
|
|
|
|
FUNCTION_PASS_WITH_PARAMS("msan",
|
|
|
|
[](MemorySanitizerOptions Opts) {
|
|
|
|
return MemorySanitizerPass(Opts);
|
|
|
|
},
|
|
|
|
parseMSanPassOptions)
|
2019-04-15 16:57:53 +08:00
|
|
|
FUNCTION_PASS_WITH_PARAMS("simplify-cfg",
|
|
|
|
[](SimplifyCFGOptions Opts) {
|
|
|
|
return SimplifyCFGPass(Opts);
|
|
|
|
},
|
|
|
|
parseSimplifyCFGOptions)
|
2019-04-18 16:46:11 +08:00
|
|
|
FUNCTION_PASS_WITH_PARAMS("loop-vectorize",
|
|
|
|
[](LoopVectorizeOptions Opts) {
|
|
|
|
return LoopVectorizePass(Opts);
|
|
|
|
},
|
|
|
|
parseLoopVectorizeOptions)
|
[MergedLoadStoreMotion] Sink stores to BB with more than 2 predecessors
If we have:
bb5:
br i1 %arg3, label %bb6, label %bb7
bb6:
%tmp = getelementptr inbounds i32, i32* %arg1, i64 2
store i32 3, i32* %tmp, align 4
br label %bb9
bb7:
%tmp8 = getelementptr inbounds i32, i32* %arg1, i64 2
store i32 3, i32* %tmp8, align 4
br label %bb9
bb9: ; preds = %bb4, %bb6, %bb7
...
We can't sink stores directly into bb9.
This patch creates new BB that is successor of %bb6 and %bb7
and sinks stores into that block.
SplitFooterBB is the parameter to the pass that controls
that behavior.
Change-Id: I7fdf50a772b84633e4b1b860e905bf7e3e29940f
Differential: https://reviews.llvm.org/D66234
llvm-svn: 371089
2019-09-06 01:00:32 +08:00
|
|
|
FUNCTION_PASS_WITH_PARAMS("mldst-motion",
|
|
|
|
[](MergedLoadStoreMotionOptions Opts) {
|
|
|
|
return MergedLoadStoreMotionPass(Opts);
|
|
|
|
},
|
|
|
|
parseMergedLoadStoreMotionOptions)
|
2020-01-17 01:31:24 +08:00
|
|
|
FUNCTION_PASS_WITH_PARAMS("gvn",
|
|
|
|
[](GVNOptions Opts) {
|
|
|
|
return GVN(Opts);
|
|
|
|
},
|
|
|
|
parseGVNOptions)
|
2019-01-10 18:01:53 +08:00
|
|
|
#undef FUNCTION_PASS_WITH_PARAMS
|
|
|
|
|
2016-02-25 15:23:08 +08:00
|
|
|
#ifndef LOOP_ANALYSIS
|
|
|
|
#define LOOP_ANALYSIS(NAME, CREATE_PASS)
|
|
|
|
#endif
|
|
|
|
LOOP_ANALYSIS("no-op-loop", NoOpLoopAnalysis())
|
2016-07-09 05:21:44 +08:00
|
|
|
LOOP_ANALYSIS("access-info", LoopAccessAnalysis())
|
Data Dependence Graph Basics
Summary:
This is the first patch in a series of patches that will implement data dependence graph in LLVM. Many of the ideas used in this implementation are based on the following paper:
D. J. Kuck, R. H. Kuhn, D. A. Padua, B. Leasure, and M. Wolfe (1981). DEPENDENCE GRAPHS AND COMPILER OPTIMIZATIONS.
This patch contains support for a basic DDGs containing only atomic nodes (one node for each instruction). The edges are two fold: def-use edges and memory-dependence edges.
The implementation takes a list of basic-blocks and only considers dependencies among instructions in those basic blocks. Any dependencies coming into or going out of instructions that do not belong to those basic blocks are ignored.
The algorithm for building the graph involves the following steps in order:
1. For each instruction in the range of basic blocks to consider, create an atomic node in the resulting graph.
2. For each node in the graph establish def-use edges to/from other nodes in the graph.
3. For each pair of nodes containing memory instruction(s) create memory edges between them. This part of the algorithm goes through the instructions in lexicographical order and creates edges in reverse order if the sink of the dependence occurs before the source of it.
Authored By: bmahjour
Reviewer: Meinersbur, fhahn, myhsu, xtian, dmgreen, kbarton, jdoerfert
Reviewed By: Meinersbur, fhahn, myhsu
Subscribers: ychen, arphaman, simoll, a.elovikov, mgorny, hiraditya, jfb, wuzish, llvm-commits, jsji, Whitney, etiotto
Tag: #llvm
Differential Revision: https://reviews.llvm.org/D65350
llvm-svn: 372238
2019-09-19 01:43:45 +08:00
|
|
|
LOOP_ANALYSIS("ddg", DDGAnalysis())
|
2016-07-17 06:51:33 +08:00
|
|
|
LOOP_ANALYSIS("ivusers", IVUsersAnalysis())
|
2018-09-21 01:08:45 +08:00
|
|
|
LOOP_ANALYSIS("pass-instrumentation", PassInstrumentationAnalysis(PIC))
|
2016-02-25 15:23:08 +08:00
|
|
|
#undef LOOP_ANALYSIS
|
|
|
|
|
|
|
|
#ifndef LOOP_PASS
|
|
|
|
#define LOOP_PASS(NAME, CREATE_PASS)
|
|
|
|
#endif
|
|
|
|
LOOP_PASS("invalidate<all>", InvalidateAllAnalysesPass())
|
2016-07-13 06:42:24 +08:00
|
|
|
LOOP_PASS("licm", LICMPass())
|
2016-07-13 02:45:51 +08:00
|
|
|
LOOP_PASS("loop-idiom", LoopIdiomRecognizePass())
|
2018-05-25 09:32:36 +08:00
|
|
|
LOOP_PASS("loop-instsimplify", LoopInstSimplifyPass())
|
2016-05-04 06:02:31 +08:00
|
|
|
LOOP_PASS("rotate", LoopRotatePass())
|
2016-02-25 15:23:08 +08:00
|
|
|
LOOP_PASS("no-op-loop", NoOpLoopPass())
|
|
|
|
LOOP_PASS("print", PrintLoopPass(dbgs()))
|
2016-07-15 02:28:29 +08:00
|
|
|
LOOP_PASS("loop-deletion", LoopDeletionPass())
|
2016-05-04 05:47:32 +08:00
|
|
|
LOOP_PASS("simplify-cfg", LoopSimplifyCFGPass())
|
2016-07-19 05:41:50 +08:00
|
|
|
LOOP_PASS("strength-reduce", LoopStrengthReducePass())
|
2016-06-06 02:01:19 +08:00
|
|
|
LOOP_PASS("indvars", IndVarSimplifyPass())
|
2017-08-03 04:35:29 +08:00
|
|
|
LOOP_PASS("unroll-full", LoopFullUnrollPass())
|
2016-07-03 05:18:40 +08:00
|
|
|
LOOP_PASS("print-access-info", LoopAccessInfoPrinterPass(dbgs()))
|
Data Dependence Graph Basics
Summary:
This is the first patch in a series of patches that will implement data dependence graph in LLVM. Many of the ideas used in this implementation are based on the following paper:
D. J. Kuck, R. H. Kuhn, D. A. Padua, B. Leasure, and M. Wolfe (1981). DEPENDENCE GRAPHS AND COMPILER OPTIMIZATIONS.
This patch contains support for a basic DDGs containing only atomic nodes (one node for each instruction). The edges are two fold: def-use edges and memory-dependence edges.
The implementation takes a list of basic-blocks and only considers dependencies among instructions in those basic blocks. Any dependencies coming into or going out of instructions that do not belong to those basic blocks are ignored.
The algorithm for building the graph involves the following steps in order:
1. For each instruction in the range of basic blocks to consider, create an atomic node in the resulting graph.
2. For each node in the graph establish def-use edges to/from other nodes in the graph.
3. For each pair of nodes containing memory instruction(s) create memory edges between them. This part of the algorithm goes through the instructions in lexicographical order and creates edges in reverse order if the sink of the dependence occurs before the source of it.
Authored By: bmahjour
Reviewer: Meinersbur, fhahn, myhsu, xtian, dmgreen, kbarton, jdoerfert
Reviewed By: Meinersbur, fhahn, myhsu
Subscribers: ychen, arphaman, simoll, a.elovikov, mgorny, hiraditya, jfb, wuzish, llvm-commits, jsji, Whitney, etiotto
Tag: #llvm
Differential Revision: https://reviews.llvm.org/D65350
llvm-svn: 372238
2019-09-19 01:43:45 +08:00
|
|
|
LOOP_PASS("print<ddg>", DDGAnalysisPrinterPass(dbgs()))
|
2016-07-17 06:51:33 +08:00
|
|
|
LOOP_PASS("print<ivusers>", IVUsersPrinterPass(dbgs()))
|
Title: Loop Cache Analysis
Summary: Implement a new analysis to estimate the number of cache lines
required by a loop nest.
The analysis is largely based on the following paper:
Compiler Optimizations for Improving Data Locality
By: Steve Carr, Katherine S. McKinley, Chau-Wen Tseng
http://www.cs.utexas.edu/users/mckinley/papers/asplos-1994.pdf
The analysis considers temporal reuse (accesses to the same memory
location) and spatial reuse (accesses to memory locations within a cache
line). For simplicity the analysis considers memory accesses in the
innermost loop in a loop nest, and thus determines the number of cache
lines used when the loop L in loop nest LN is placed in the innermost
position.
The result of the analysis can be used to drive several transformations.
As an example, loop interchange could use it determine which loops in a
perfect loop nest should be interchanged to maximize cache reuse.
Similarly, loop distribution could be enhanced to take into
consideration cache reuse between arrays when distributing a loop to
eliminate vectorization inhibiting dependencies.
The general approach taken to estimate the number of cache lines used by
the memory references in the inner loop of a loop nest is:
Partition memory references that exhibit temporal or spatial reuse into
reference groups.
For each loop L in the a loop nest LN: a. Compute the cost of the
reference group b. Compute the 'cache cost' of the loop nest by summing
up the reference groups costs
For further details of the algorithm please refer to the paper.
Authored By: etiotto
Reviewers: hfinkel, Meinersbur, jdoerfert, kbarton, bmahjour, anemet,
fhahn
Reviewed By: Meinersbur
Subscribers: reames, nemanjai, MaskRay, wuzish, Hahnfeld, xusx595,
venkataramanan.kumar.llvm, greened, dmgreen, steleman, fhahn, xblvaOO,
Whitney, mgorny, hiraditya, mgrang, jsji, llvm-commits
Tag: LLVM
Differential Revision: https://reviews.llvm.org/D63459
llvm-svn: 368439
2019-08-09 21:56:29 +08:00
|
|
|
LOOP_PASS("print<loop-cache-cost>", LoopCachePrinterPass(dbgs()))
|
2017-01-26 00:00:44 +08:00
|
|
|
LOOP_PASS("loop-predication", LoopPredicationPass())
|
2019-04-19 03:17:14 +08:00
|
|
|
LOOP_PASS("guard-widening", GuardWideningPass())
|
2016-02-25 15:23:08 +08:00
|
|
|
#undef LOOP_PASS
|
2019-04-22 18:35:07 +08:00
|
|
|
|
|
|
|
#ifndef LOOP_PASS_WITH_PARAMS
|
|
|
|
#define LOOP_PASS_WITH_PARAMS(NAME, CREATE_PASS, PARSER)
|
|
|
|
#endif
|
|
|
|
LOOP_PASS_WITH_PARAMS("unswitch",
|
|
|
|
[](bool NonTrivial) {
|
|
|
|
return SimpleLoopUnswitchPass(NonTrivial);
|
|
|
|
},
|
|
|
|
parseLoopUnswitchOptions)
|
|
|
|
#undef LOOP_PASS_WITH_PARAMS
|