llvm-project/llvm/lib/Analysis
Michael Zolotukhin b7b8052982 [Unroll] Implement a conservative and monotonically increasing cost tracking system during the full unroll heuristic analysis that avoids counting any instruction cost until that instruction becomes "live" through a side-effect or use outside the...
Summary:
...loop after the last iteration.

This is really hard to do correctly. The core problem is that we need to
model liveness through the induction PHIs from iteration to iteration in
order to get the correct results, and we need to correctly de-duplicate
the common subgraphs of instructions feeding some subset of the
induction PHIs. All of this can be driven either from a side effect at
some iteration or from the loop values used after the loop finishes.

This patch implements this by storing the forward-propagating analysis
of each instruction in a cache to recall whether it was free and whether
it has become live and thus counted toward the total unroll cost. Then,
at each sink for a value in the loop, we recursively walk back through
every value that feeds the sink, including looping back through the
iterations as needed, until we have marked the entire input graph as
live. Because we cache this, we never visit instructions more than twice
-- once when we analyze them and put them into the cache, and once when
we count their cost towards the unrolled loop. Also, because the cache
is only two bits and because we are dealing with relatively small
iteration counts, we can store all of this very densely in memory to
avoid this from becoming an excessively slow analysis.

The code here is still pretty gross. I would appreciate suggestions
about better ways to factor or split this up, I've stared too long at
the algorithmic side to really have a good sense of what the design
should probably look at.

Also, it might seem like we should do all of this bottom-up, but I think
that is a red herring. Specifically, the simplification power is *much*
greater working top-down. We can forward propagate very effectively,
even across strange and interesting recurrances around the backedge.
Because we use data to propagate, this doesn't cause a state space
explosion. Doing this level of constant folding, etc, would be very
expensive to do bottom-up because it wouldn't be until the last moment
that you could collapse everything. The current solution is essentially
a top-down simplification with a bottom-up cost accounting which seems
to get the best of both worlds. It makes the simplification incremental
and powerful while leaving everything dead until we *know* it is needed.

Finally, a core property of this approach is its *monotonicity*. At all
times, the current UnrolledCost is a conservatively low estimate. This
ensures that we will never early-exit from the analysis due to exceeding
a threshold when if we had continued, the cost would have gone back
below the threshold. These kinds of bugs can cause incredibly hard to
track down random changes to behavior.

We could use a techinque similar (but much simpler) within the inliner
as well to avoid considering speculated code in the inline cost.

Reviewers: chandlerc

Subscribers: sanjoy, mzolotukhin, llvm-commits

Differential Revision: http://reviews.llvm.org/D11758

llvm-svn: 269388
2016-05-13 01:42:39 +00:00
..
AliasAnalysis.cpp NFC: make AtomicOrdering an enum class 2016-04-06 21:19:33 +00:00
AliasAnalysisEvaluator.cpp [PM] Make the AnalysisManager parameter to run methods a reference. 2016-03-11 11:05:24 +00:00
AliasSetTracker.cpp [AliasSetTracker] Correctly handle changing the size of an entry 2016-04-14 22:00:11 +00:00
Analysis.cpp [PM] Port of the DepndenceAnalysis to the new PM. 2016-05-12 22:19:39 +00:00
AssumptionCache.cpp [PM] Make the AnalysisManager parameter to run methods a reference. 2016-03-11 11:05:24 +00:00
BasicAliasAnalysis.cpp [BasicAA] Compare GEP indices based on value (Fix PR27418) 2016-05-11 15:45:43 +00:00
BitSetUtils.cpp Re-apply r269081 and r269082 with a fix for MSVC. 2016-05-10 18:07:21 +00:00
BlockFrequencyInfo.cpp [PM] port Branch Frequency Analaysis pass to new PM 2016-05-05 21:13:27 +00:00
BlockFrequencyInfoImpl.cpp fix spelling; NFC 2016-05-09 16:07:45 +00:00
BranchProbabilityInfo.cpp [PM] Port Branch Probability Analysis pass to the new pass manager. 2016-05-05 02:59:57 +00:00
CFG.cpp Avoid overly large SmallPtrSet/SmallSet 2016-01-30 01:24:31 +00:00
CFGPrinter.cpp Revert r240137 (Fixed/added namespace ending comments using clang-tidy. NFC) 2015-06-23 09:49:53 +00:00
CFLAliasAnalysis.cpp [CFLAA] Fix a use-of-invalid-pointer bug. 2016-05-02 18:09:19 +00:00
CGSCCPassManager.cpp [NFC] Header cleanup 2016-04-18 09:17:29 +00:00
CMakeLists.txt Re-apply r269081 and r269082 with a fix for MSVC. 2016-05-10 18:07:21 +00:00
CallGraph.cpp [PM] Make the AnalysisManager parameter to run methods a reference. 2016-03-11 11:05:24 +00:00
CallGraphSCCPass.cpp Re-commit optimization bisect support (r267022) without new pass manager support. 2016-04-22 22:06:11 +00:00
CallPrinter.cpp [CG] Rename the DOT printing pass to actually reference "DOT". 2016-03-10 11:04:40 +00:00
CaptureTracking.cpp Fold compares irrespective of whether allocation can be elided 2016-05-03 14:58:21 +00:00
CodeMetrics.cpp use range-based for loop; NFCI 2016-03-08 20:53:48 +00:00
ConstantFolding.cpp [ConstantFolding, ValueTracking] Fold constants involving bitcasts of ConstantVector 2016-05-04 06:13:33 +00:00
CostModel.cpp [CodeGen] Teach LLVM how to lower @llvm.{min,max}num to {MIN,MAX}NAN 2016-04-14 07:13:24 +00:00
Delinearization.cpp [NFC] Header cleanup 2016-04-18 09:17:29 +00:00
DemandedBits.cpp Port DemandedBits to the new pass manager. 2016-04-18 23:55:01 +00:00
DependenceAnalysis.cpp [PM] Port of the DepndenceAnalysis to the new PM. 2016-05-12 22:19:39 +00:00
DivergenceAnalysis.cpp DivergenceAnalysis: Fix crash with no return blocks 2016-05-09 16:57:08 +00:00
DomPrinter.cpp Introduce analysis pass to compute PostDominators in the new pass manager. NFC 2016-02-25 17:54:07 +00:00
DominanceFrontier.cpp [PM] Make the AnalysisManager parameter to run methods a reference. 2016-03-11 11:05:24 +00:00
EHPersonalities.cpp Add Rust's personality function to the list of known personality functions 2016-03-15 20:35:45 +00:00
GlobalsModRef.cpp Don't IPO over functions that can be de-refined 2016-04-08 00:48:30 +00:00
IVUsers.cpp Annotate dump() methods with LLVM_DUMP_METHOD, addressing Richard Smith r259192 post commit comment. 2016-01-29 20:50:44 +00:00
InlineCost.cpp Revert r269131 2016-05-10 23:26:04 +00:00
InstCount.cpp Revert r240137 (Fixed/added namespace ending comments using clang-tidy. NFC) 2015-06-23 09:49:53 +00:00
InstructionSimplify.cpp [InstSimplify] use computeKnownBits on shift amount operands 2016-05-10 20:46:54 +00:00
Interval.cpp
IntervalPartition.cpp
IteratedDominanceFrontier.cpp Correct IDF calculator for ReverseIDF 2016-04-19 06:13:28 +00:00
LLVMBuild.txt Revert r269131 2016-05-10 23:26:04 +00:00
LazyCallGraph.cpp [PM] Make the AnalysisManager parameter to run methods a reference. 2016-03-11 11:05:24 +00:00
LazyValueInfo.cpp [LVI] Add an API to LazyValueInfo so that it can export ConstantRanges 2016-05-02 19:58:00 +00:00
Lint.cpp [opaque pointer types] [NFC] FindAvailableLoadedValue: take LoadInst instead of just the pointer. 2016-01-22 01:51:51 +00:00
Loads.cpp NFC. Introduce Value::isPointerDereferenceable 2016-05-11 14:43:28 +00:00
LoopAccessAnalysis.cpp [LAA] Use std::min. NFC 2016-05-12 21:41:53 +00:00
LoopInfo.cpp [LoopUnroll] Unroll loops which have exit blocks to EH pads 2016-05-03 03:57:40 +00:00
LoopPass.cpp Re-commit optimization bisect support (r267022) without new pass manager support. 2016-04-22 22:06:11 +00:00
LoopPassManager.cpp PM: Check that loop passes preserve a basic set of analyses 2016-05-03 21:35:08 +00:00
LoopUnrollAnalyzer.cpp [Unroll] Implement a conservative and monotonically increasing cost tracking system during the full unroll heuristic analysis that avoids counting any instruction cost until that instruction becomes "live" through a side-effect or use outside the... 2016-05-13 01:42:39 +00:00
MemDepPrinter.cpp [PM] Port memdep to the new pass manager. 2016-03-10 00:55:30 +00:00
MemDerefPrinter.cpp NFC. Move isDereferenceable to Loads.h/cpp 2016-02-24 12:49:04 +00:00
MemoryBuiltins.cpp Calculate __builtin_object_size when pointer depends on a condition 2016-04-13 12:25:25 +00:00
MemoryDependenceAnalysis.cpp NFC: make AtomicOrdering an enum class 2016-04-06 21:19:33 +00:00
MemoryLocation.cpp [TLI] Unify LibFunc signature checking. NFCI. 2016-04-27 19:04:35 +00:00
ModuleDebugInfoPrinter.cpp Revert r240137 (Fixed/added namespace ending comments using clang-tidy. NFC) 2015-06-23 09:49:53 +00:00
ModuleSummaryAnalysis.cpp ThinLTO: fix assertion and refactor check for hidden use from inline ASM in a helper function 2016-05-06 08:25:33 +00:00
ObjCARCAliasAnalysis.cpp [PM] Make the AnalysisManager parameter to run methods a reference. 2016-03-11 11:05:24 +00:00
ObjCARCAnalysisUtils.cpp [ARC] Pull the ObjC ARC components that really serve the role of 2015-08-20 08:06:03 +00:00
ObjCARCInstKind.cpp Add support for objc_unsafeClaimAutoreleasedReturnValue to the 2016-01-27 19:05:08 +00:00
OrderedBasicBlock.cpp [CaptureTracker] Provide an ordered basic block to PointerMayBeCapturedBefore 2015-07-31 14:31:35 +00:00
PHITransAddr.cpp Annotate dump() methods with LLVM_DUMP_METHOD, addressing Richard Smith r259192 post commit comment. 2016-01-29 20:50:44 +00:00
PostDominators.cpp [PM] Make the AnalysisManager parameter to run methods a reference. 2016-03-11 11:05:24 +00:00
PtrUseVisitor.cpp Update SetVector to rely on the underlying set's insert to return a pair<iterator, bool> 2014-11-19 07:49:26 +00:00
README.txt
RegionInfo.cpp [NFC] Header cleanup 2016-04-18 09:17:29 +00:00
RegionPass.cpp Change range-based for-loops to be -Wrange-loop-analysis clean. 2015-04-15 01:21:15 +00:00
RegionPrinter.cpp [RegionInfo] Add debug-time region viewer functions 2015-08-10 13:21:59 +00:00
ScalarEvolution.cpp [SCEV] Be more aggressive around proving no-wrap 2016-05-11 17:41:26 +00:00
ScalarEvolutionAliasAnalysis.cpp [PM] Make the AnalysisManager parameter to run methods a reference. 2016-03-11 11:05:24 +00:00
ScalarEvolutionExpander.cpp [SCEVExpander] Fix a failed cast<> assertion 2016-05-11 17:41:41 +00:00
ScalarEvolutionNormalization.cpp Remove emacs mode markers from .cpp files. NFC 2016-04-24 17:55:41 +00:00
ScopedNoAliasAA.cpp [PM] Make the AnalysisManager parameter to run methods a reference. 2016-03-11 11:05:24 +00:00
SparsePropagation.cpp Analysis: Remove implicit ilist iterator conversions 2015-10-10 00:53:03 +00:00
StratifiedSets.h [NFC] Header cleanup 2016-04-18 09:17:29 +00:00
TargetLibraryInfo.cpp [X86] Promote several single precision FP libcalls on Windows 2016-05-08 08:15:50 +00:00
TargetTransformInfo.cpp [TTI] Add hook for vector extract with extension 2016-04-27 15:20:21 +00:00
Trace.cpp Annotate dump() methods with LLVM_DUMP_METHOD, addressing Richard Smith r259192 post commit comment. 2016-01-29 20:50:44 +00:00
TypeBasedAliasAnalysis.cpp [PM] Make the AnalysisManager parameter to run methods a reference. 2016-03-11 11:05:24 +00:00
ValueTracking.cpp [ValueTracking] Use guards to prove non-nullness of a value 2016-05-10 02:35:44 +00:00
VectorUtils.cpp Revert "[VectorUtils] Query number of sign bits to allow more truncations" 2016-05-10 12:27:23 +00:00

README.txt

Analysis Opportunities:

//===---------------------------------------------------------------------===//

In test/Transforms/LoopStrengthReduce/quadradic-exit-value.ll, the
ScalarEvolution expression for %r is this:

  {1,+,3,+,2}<loop>

Outside the loop, this could be evaluated simply as (%n * %n), however
ScalarEvolution currently evaluates it as

  (-2 + (2 * (trunc i65 (((zext i64 (-2 + %n) to i65) * (zext i64 (-1 + %n) to i65)) /u 2) to i64)) + (3 * %n))

In addition to being much more complicated, it involves i65 arithmetic,
which is very inefficient when expanded into code.

//===---------------------------------------------------------------------===//

In formatValue in test/CodeGen/X86/lsr-delayed-fold.ll,

ScalarEvolution is forming this expression:

((trunc i64 (-1 * %arg5) to i32) + (trunc i64 %arg5 to i32) + (-1 * (trunc i64 undef to i32)))

This could be folded to

(-1 * (trunc i64 undef to i32))

//===---------------------------------------------------------------------===//