forked from OSchip/llvm-project
36d4421f50
This patch adds - New arguments to getMinPrefetchStride() to let the target decide on a per-loop basis if software prefetching should be done even with a stride within the limit of the hw prefetcher. - New TTI hook enableWritePrefetching() to let a target do write prefetching by default (defaults to false). - In LoopDataPrefetch: - A search through the whole loop to gather information before emitting any prefetches. This way the target can get information via new arguments to getMinPrefetchStride() and emit prefetches more selectively. Collected information includes: Does the loop have a call, how many memory accesses, how many of them are strided, how many prefetches will cover them. This is NFC to before as long as the target does not change its definition of getMinPrefetchStride(). - If a previous access to the same exact address was 'read', and the current one is 'write', make it a 'write' prefetch. - If two accesses that are covered by the same prefetch do not dominate each other, put the prefetch in a block that dominates both of them. - If a ConstantMaxTripCount is less than ItersAhead, then skip the loop. - A SystemZ implementation of getMinPrefetchStride(). Review: Ulrich Weigand, Michael Kruse Differential Revision: https://reviews.llvm.org/D70228 |
||
---|---|---|
.. | ||
AliasAnalysis.cpp | ||
AliasAnalysisEvaluator.cpp | ||
AliasAnalysisSummary.cpp | ||
AliasAnalysisSummary.h | ||
AliasSetTracker.cpp | ||
Analysis.cpp | ||
AssumptionCache.cpp | ||
BasicAliasAnalysis.cpp | ||
BlockFrequencyInfo.cpp | ||
BlockFrequencyInfoImpl.cpp | ||
BranchProbabilityInfo.cpp | ||
CFG.cpp | ||
CFGPrinter.cpp | ||
CFLAndersAliasAnalysis.cpp | ||
CFLGraph.h | ||
CFLSteensAliasAnalysis.cpp | ||
CGSCCPassManager.cpp | ||
CMakeLists.txt | ||
CallGraph.cpp | ||
CallGraphSCCPass.cpp | ||
CallPrinter.cpp | ||
CaptureTracking.cpp | ||
CmpInstAnalysis.cpp | ||
CodeMetrics.cpp | ||
ConstantFolding.cpp | ||
CostModel.cpp | ||
DDG.cpp | ||
Delinearization.cpp | ||
DemandedBits.cpp | ||
DependenceAnalysis.cpp | ||
DependenceGraphBuilder.cpp | ||
DivergenceAnalysis.cpp | ||
DomPrinter.cpp | ||
DomTreeUpdater.cpp | ||
DominanceFrontier.cpp | ||
EHPersonalities.cpp | ||
GlobalsModRef.cpp | ||
GuardUtils.cpp | ||
IVDescriptors.cpp | ||
IVUsers.cpp | ||
IndirectCallPromotionAnalysis.cpp | ||
InlineCost.cpp | ||
InstCount.cpp | ||
InstructionPrecedenceTracking.cpp | ||
InstructionSimplify.cpp | ||
Interval.cpp | ||
IntervalPartition.cpp | ||
LLVMBuild.txt | ||
LazyBlockFrequencyInfo.cpp | ||
LazyBranchProbabilityInfo.cpp | ||
LazyCallGraph.cpp | ||
LazyValueInfo.cpp | ||
LegacyDivergenceAnalysis.cpp | ||
Lint.cpp | ||
Loads.cpp | ||
LoopAccessAnalysis.cpp | ||
LoopAnalysisManager.cpp | ||
LoopCacheAnalysis.cpp | ||
LoopInfo.cpp | ||
LoopNestAnalysis.cpp | ||
LoopPass.cpp | ||
LoopUnrollAnalyzer.cpp | ||
MemDepPrinter.cpp | ||
MemDerefPrinter.cpp | ||
MemoryBuiltins.cpp | ||
MemoryDependenceAnalysis.cpp | ||
MemoryLocation.cpp | ||
MemorySSA.cpp | ||
MemorySSAUpdater.cpp | ||
ModuleDebugInfoPrinter.cpp | ||
ModuleSummaryAnalysis.cpp | ||
MustExecute.cpp | ||
ObjCARCAliasAnalysis.cpp | ||
ObjCARCAnalysisUtils.cpp | ||
ObjCARCInstKind.cpp | ||
OptimizationRemarkEmitter.cpp | ||
OrderedInstructions.cpp | ||
PHITransAddr.cpp | ||
PhiValues.cpp | ||
PostDominators.cpp | ||
ProfileSummaryInfo.cpp | ||
PtrUseVisitor.cpp | ||
README.txt | ||
RegionInfo.cpp | ||
RegionPass.cpp | ||
RegionPrinter.cpp | ||
ScalarEvolution.cpp | ||
ScalarEvolutionAliasAnalysis.cpp | ||
ScalarEvolutionExpander.cpp | ||
ScalarEvolutionNormalization.cpp | ||
ScopedNoAliasAA.cpp | ||
StackSafetyAnalysis.cpp | ||
StratifiedSets.h | ||
SyncDependenceAnalysis.cpp | ||
SyntheticCountsUtils.cpp | ||
TargetLibraryInfo.cpp | ||
TargetTransformInfo.cpp | ||
Trace.cpp | ||
TypeBasedAliasAnalysis.cpp | ||
TypeMetadataUtils.cpp | ||
VFABIDemangling.cpp | ||
ValueLattice.cpp | ||
ValueLatticeUtils.cpp | ||
ValueTracking.cpp | ||
VectorUtils.cpp |
README.txt
Analysis Opportunities: //===---------------------------------------------------------------------===// In test/Transforms/LoopStrengthReduce/quadradic-exit-value.ll, the ScalarEvolution expression for %r is this: {1,+,3,+,2}<loop> Outside the loop, this could be evaluated simply as (%n * %n), however ScalarEvolution currently evaluates it as (-2 + (2 * (trunc i65 (((zext i64 (-2 + %n) to i65) * (zext i64 (-1 + %n) to i65)) /u 2) to i64)) + (3 * %n)) In addition to being much more complicated, it involves i65 arithmetic, which is very inefficient when expanded into code. //===---------------------------------------------------------------------===// In formatValue in test/CodeGen/X86/lsr-delayed-fold.ll, ScalarEvolution is forming this expression: ((trunc i64 (-1 * %arg5) to i32) + (trunc i64 %arg5 to i32) + (-1 * (trunc i64 undef to i32))) This could be folded to (-1 * (trunc i64 undef to i32)) //===---------------------------------------------------------------------===//