2010-12-27 03:39:38 +08:00
|
|
|
//===-- LoopIdiomRecognize.cpp - Loop idiom recognition -------------------===//
|
|
|
|
//
|
|
|
|
// The LLVM Compiler Infrastructure
|
|
|
|
//
|
|
|
|
// This file is distributed under the University of Illinois Open Source
|
|
|
|
// License. See LICENSE.TXT for details.
|
|
|
|
//
|
|
|
|
//===----------------------------------------------------------------------===//
|
|
|
|
//
|
|
|
|
// This pass implements an idiom recognizer that transforms simple loops into a
|
|
|
|
// non-loop form. In cases where this kicks in, it can be a significant
|
|
|
|
// performance win.
|
|
|
|
//
|
2016-08-12 02:28:33 +08:00
|
|
|
// If compiling for code size we avoid idiom recognition if the resulting
|
|
|
|
// code could be larger than the code for the original loop. One way this could
|
|
|
|
// happen is if the loop is not removable after idiom recognition due to the
|
|
|
|
// presence of non-idiom instructions. The initial implementation of the
|
|
|
|
// heuristics applies to idioms in multi-block loops.
|
|
|
|
//
|
2010-12-27 03:39:38 +08:00
|
|
|
//===----------------------------------------------------------------------===//
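As a rough illustration of the "simple loops into a non-loop form" idea described above, the sketch below pairs a byte-store loop and a load+store loop with the library calls they correspond to (the NumMemSet and NumMemCpy statistics in this file count such rewrites). The helper names (fillWithLoop, fillWithMemset, copyWithLoop, copyWithMemcpy) are hypothetical and exist only for this example; the pass itself works on LLVM IR, not on C++ source.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical helper: a loop that stores the same byte to consecutive
// addresses, the shape usually associated with a memset idiom.
void fillWithLoop(unsigned char *Dst, std::size_t N, unsigned char V) {
  for (std::size_t I = 0; I != N; ++I)
    Dst[I] = V;
}

// The equivalent non-loop form: a single library call.
void fillWithMemset(unsigned char *Dst, std::size_t N, unsigned char V) {
  std::memset(Dst, V, N);
}

// Hypothetical helper: a loop that loads from one array and stores to
// another with the same stride, the shape associated with a memcpy idiom.
// The two ranges are assumed not to overlap.
void copyWithLoop(int *Dst, const int *Src, std::size_t N) {
  for (std::size_t I = 0; I != N; ++I)
    Dst[I] = Src[I];
}

// The equivalent non-loop form. The byte count is the element count
// times the element size.
void copyWithMemcpy(int *Dst, const int *Src, std::size_t N) {
  std::memcpy(Dst, Src, N * sizeof(int));
}
```

Both versions of each pair leave the destination in the same state; the difference is only that the call form lets a tuned library routine do the work.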
|
2011-01-03 02:32:09 +08:00
|
|
|
//
|
|
|
|
// TODO List:
|
|
|
|
//
|
|
|
|
// Future loop memory idioms to recognize:
|
2012-11-02 16:33:25 +08:00
|
|
|
// memcmp, memmove, strlen, etc.
|
2011-01-03 02:32:09 +08:00
|
|
|
// Future floating point idioms to recognize in -ffast-math mode:
|
|
|
|
// fpowi
|
|
|
|
// Future integer operation idioms to recognize:
|
|
|
|
// ctpop, ctlz, cttz
|
|
|
|
//
|
|
|
|
// Beware that isel's default lowering for ctpop is highly inefficient for
|
|
|
|
// i64 and larger types when i64 is legal and the value has few bits set. It
|
|
|
|
// would be good to enhance isel to emit a loop for ctpop in this case.
|
|
|
|
//
|
2011-01-03 09:10:08 +08:00
|
|
|
// This could recognize common matrix multiplies and dot product idioms and
|
2011-01-03 07:19:45 +08:00
|
|
|
// replace them with calls to BLAS (if linked in??).
|
|
|
|
//
|
2011-01-03 02:32:09 +08:00
|
|
|
//===----------------------------------------------------------------------===//
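The TODO list above mentions ctpop and notes that a loop can be a good way to compute it when the value has few bits set. A minimal sketch of such a loop follows, assuming the classic "clear the lowest set bit" formulation; whether this is exactly the pattern matched by recognizePopcount is an assumption, since the matching code is not shown in this section. The names popcountLoop and popcountReference are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: counts set bits by clearing the lowest set bit on
// every iteration, so the loop runs once per set bit rather than once per
// bit position.
unsigned popcountLoop(std::uint64_t X) {
  unsigned Count = 0;
  while (X != 0) {
    X &= X - 1; // Clears the lowest set bit of X.
    ++Count;
  }
  return Count;
}

// Straightforward reference that inspects all 64 bit positions; used only
// to cross-check the loop above.
unsigned popcountReference(std::uint64_t X) {
  unsigned Count = 0;
  for (unsigned Bit = 0; Bit != 64; ++Bit)
    Count += static_cast<unsigned>((X >> Bit) & 1);
  return Count;
}
```

For a value with a single bit set, popcountLoop finishes after one iteration, which is the "few bits set" case the comment above has in mind.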
|
2010-12-27 03:39:38 +08:00
|
|
|
|
2016-07-13 02:45:51 +08:00
|
|
|
#include "llvm/Transforms/Scalar/LoopIdiomRecognize.h"
|
2016-01-26 10:27:47 +08:00
|
|
|
#include "llvm/ADT/MapVector.h"
|
|
|
|
#include "llvm/ADT/SetVector.h"
|
2012-06-29 20:38:19 +08:00
|
|
|
#include "llvm/ADT/Statistic.h"
|
2010-12-28 02:39:08 +08:00
|
|
|
#include "llvm/Analysis/AliasAnalysis.h"
|
[PM/AA] Rebuild LLVM's alias analysis infrastructure in a way compatible
with the new pass manager, and no longer relying on analysis groups.
This builds essentially a ground-up new AA infrastructure stack for
LLVM. The core ideas are the same that are used throughout the new pass
manager: type erased polymorphism and direct composition. The design is
as follows:
- FunctionAAResults is a type-erasing alias analysis results aggregation
interface to walk a single query across a range of results from
different alias analyses. Currently this is function-specific as we
always assume that aliasing queries are *within* a function.
- AAResultBase is a CRTP utility providing stub implementations of
various parts of the alias analysis result concept, notably in several
cases in terms of other more general parts of the interface. This can
be used to implement only a narrow part of the interface rather than
the entire interface. This isn't really ideal, this logic should be
hoisted into FunctionAAResults as currently it will cause
a significant amount of redundant work, but it faithfully models the
behavior of the prior infrastructure.
- All the alias analysis passes are ported to be wrapper passes for the
legacy PM and new-style analysis passes for the new PM with a shared
result object. In some cases (most notably CFL), this is an extremely
naive approach that we should revisit when we can specialize for the
new pass manager.
- BasicAA has been restructured to reflect that it is much more
fundamentally a function analysis because it uses dominator trees and
loop info that need to be constructed for each function.
All of the references to getting alias analysis results have been
updated to use the new aggregation interface. All the preservation and
other pass management code has been updated accordingly.
The way the FunctionAAResultsWrapperPass works is to detect the
available alias analyses when run, and add them to the results object.
This means that we should be able to continue to respect when various
passes are added to the pipeline, for example adding CFL or adding TBAA
passes should just cause their results to be available and to get folded
into this. The exception to this rule is BasicAA which really needs to
be a function pass due to using dominator trees and loop info. As
a consequence, the FunctionAAResultsWrapperPass directly depends on
BasicAA and always includes it in the aggregation.
This has significant implications for preserving analyses. Generally,
most passes shouldn't bother preserving FunctionAAResultsWrapperPass
because rebuilding the results just updates the set of known AA passes.
The exception to this rule are LoopPass instances which need to preserve
all the function analyses that the loop pass manager will end up
needing. This means preserving both BasicAAWrapperPass and the
aggregating FunctionAAResultsWrapperPass.
Now, when preserving an alias analysis, you do so by directly preserving
that analysis. This is only necessary for non-immutable-pass-provided
alias analyses though, and there are only three of interest: BasicAA,
GlobalsAA (formerly GlobalsModRef), and SCEVAA. Usually BasicAA is
preserved when needed because it (like DominatorTree and LoopInfo) is
marked as a CFG-only pass. I've expanded GlobalsAA into the preserved
set everywhere we previously were preserving all of AliasAnalysis, and
I've added SCEVAA in the intersection of that with where we preserve
SCEV itself.
One significant challenge to all of this is that the CGSCC passes were
actually using the alias analysis implementations by taking advantage of
a pretty amazing set of loopholes in the old pass manager's analysis
management code which allowed analysis groups to slide through in many
cases. Moving away from analysis groups makes this problem much more
obvious. To fix it, I've leveraged the flexibility the design of the new
PM components provides to just directly construct the relevant alias
analyses for the relevant functions in the IPO passes that need them.
This is a bit hacky, but should go away with the new pass manager, and
is already in many ways cleaner than the prior state.
Another significant challenge is that various facilities of the old
alias analysis infrastructure just don't fit any more. The most
significant of these is the alias analysis 'counter' pass. That pass
relied on the ability to snoop on AA queries at different points in the
analysis group chain. Instead, I'm planning to build printing
functionality directly into the aggregation layer. I've not included
that in this patch merely to keep it smaller.
Note that all of this needs a nearly complete rewrite of the AA
documentation. I'm planning to do that, but I'd like to make sure the
new design settles, and to flesh out a bit more of what it looks like in
the new pass manager first.
Differential Revision: http://reviews.llvm.org/D12080
llvm-svn: 247167
2015-09-10 01:55:00 +08:00
|
|
|
#include "llvm/Analysis/BasicAliasAnalysis.h"
|
|
|
|
#include "llvm/Analysis/GlobalsModRef.h"
|
2016-01-26 10:27:47 +08:00
|
|
|
#include "llvm/Analysis/LoopAccessAnalysis.h"
|
2016-07-13 02:45:51 +08:00
|
|
|
#include "llvm/Analysis/LoopPass.h"
|
2015-09-10 01:55:00 +08:00
|
|
|
#include "llvm/Analysis/ScalarEvolutionAliasAnalysis.h"
|
2015-11-24 05:09:13 +08:00
|
|
|
#include "llvm/Analysis/ScalarEvolutionExpander.h"
|
2012-06-29 20:38:19 +08:00
|
|
|
#include "llvm/Analysis/ScalarEvolutionExpressions.h"
|
2015-03-24 03:32:43 +08:00
|
|
|
#include "llvm/Analysis/TargetLibraryInfo.h"
|
2013-01-07 11:08:10 +08:00
|
|
|
#include "llvm/Analysis/TargetTransformInfo.h"
|
2010-12-27 04:45:45 +08:00
|
|
|
#include "llvm/Analysis/ValueTracking.h"
|
2013-01-02 19:36:10 +08:00
|
|
|
#include "llvm/IR/DataLayout.h"
|
2014-01-13 17:26:24 +08:00
|
|
|
#include "llvm/IR/Dominators.h"
|
2013-01-02 19:36:10 +08:00
|
|
|
#include "llvm/IR/IRBuilder.h"
|
|
|
|
#include "llvm/IR/IntrinsicInst.h"
|
|
|
|
#include "llvm/IR/Module.h"
|
2012-06-29 20:38:19 +08:00
|
|
|
#include "llvm/Support/Debug.h"
|
|
|
|
#include "llvm/Support/raw_ostream.h"
|
2016-07-13 02:45:51 +08:00
|
|
|
#include "llvm/Transforms/Scalar.h"
|
2017-01-11 17:43:56 +08:00
|
|
|
#include "llvm/Transforms/Scalar/LoopPassManager.h"
|
2016-04-28 03:04:50 +08:00
|
|
|
#include "llvm/Transforms/Utils/BuildLibCalls.h"
|
2010-12-27 08:03:23 +08:00
|
|
|
#include "llvm/Transforms/Utils/Local.h"
|
[LPM] Factor all of the loop analysis usage updates into a common helper
routine.
We were getting this wrong in small ways and generally being very
inconsistent about it across loop passes. Instead, let's have a common
place where we do this. One minor downside is that this will require
some analyses like SCEV in more places than they are strictly needed.
However, this seems benign as these analyses are complete no-ops, and
without this consistency we can in many cases end up with the legacy
pass manager scheduling deciding to split up a loop pass pipeline in
order to run the function analysis half-way through. It is very, very
annoying to fix these without just being very pedantic across the board.
The only loop passes I've not updated here are ones that use
AU.setPreservesAll() such as IVUsers (an analysis) and the pass printer.
They seemed less relevant.
With this patch, almost all of the problems in PR24804 around loop pass
pipelines are fixed. The one remaining issue is that we run simplify-cfg
and instcombine in the middle of the loop pass pipeline. We've recently
added some loop variants of these passes that would seem substantially
cleaner to use, but this at least gets us much closer to the previous
state. Notably, the seven loop pass managers are down to three.
I've not updated the loop passes using LoopAccessAnalysis because that
analysis hasn't been fully wired into LoopSimplify/LCSSA, and it isn't
clear that those transforms want to support those forms anyways. They
all run late anyways, so this is harmless. Similarly, LSR is left alone
because it already carefully manages its forms and doesn't need to get
fused into a single loop pass manager with a bunch of other loop passes.
LoopReroll didn't use loop simplified form previously, and I've updated
the test case to match the trivially different output.
Finally, I've also factored all the pass initialization for the passes
that use this technique as well, so that should be done regularly and
reliably.
Thanks to James for the help reviewing and thinking about this stuff,
and Ben for help thinking about it as well!
Differential Revision: http://reviews.llvm.org/D17435
llvm-svn: 261316
2016-02-19 18:45:18 +08:00
|
|
|
#include "llvm/Transforms/Utils/LoopUtils.h"
|
2010-12-27 03:39:38 +08:00
|
|
|
using namespace llvm;
|
|
|
|
|
2014-04-22 10:55:47 +08:00
|
|
|
#define DEBUG_TYPE "loop-idiom"
|
|
|
|
|
2012-11-02 16:33:25 +08:00
|
|
|
STATISTIC(NumMemSet, "Number of memset's formed from loop stores");
|
|
|
|
STATISTIC(NumMemCpy, "Number of memcpy's formed from loop load+stores");
|
2010-12-27 03:39:38 +08:00
|
|
|
|
2016-08-12 02:28:33 +08:00
|
|
|
static cl::opt<bool> UseLIRCodeSizeHeurs(
|
|
|
|
"use-lir-code-size-heurs",
|
|
|
|
cl::desc("Use loop idiom recognition code size heuristics when compiling "
|
|
|
|
"with -Os/-Oz"),
|
|
|
|
cl::init(true), cl::Hidden);
|
|
|
|
|
2010-12-27 03:39:38 +08:00
|
|
|
namespace {
|
2012-12-09 11:12:46 +08:00
|
|
|
|
2016-07-13 02:45:51 +08:00
|
|
|
class LoopIdiomRecognize {
|
2015-08-13 07:06:37 +08:00
|
|
|
Loop *CurLoop;
|
2015-08-14 08:21:10 +08:00
|
|
|
AliasAnalysis *AA;
|
2015-08-13 07:06:37 +08:00
|
|
|
DominatorTree *DT;
|
2015-08-13 17:27:01 +08:00
|
|
|
LoopInfo *LI;
|
2015-08-13 07:06:37 +08:00
|
|
|
ScalarEvolution *SE;
|
|
|
|
TargetLibraryInfo *TLI;
|
|
|
|
const TargetTransformInfo *TTI;
|
2015-11-07 00:33:57 +08:00
|
|
|
const DataLayout *DL;
|
2016-08-12 02:28:33 +08:00
|
|
|
bool ApplyCodeSizeHeuristics;
|
2015-08-13 07:06:37 +08:00
|
|
|
|
|
|
|
public:
|
2016-07-13 02:45:51 +08:00
|
|
|
explicit LoopIdiomRecognize(AliasAnalysis *AA, DominatorTree *DT,
|
|
|
|
LoopInfo *LI, ScalarEvolution *SE,
|
|
|
|
TargetLibraryInfo *TLI,
|
|
|
|
const TargetTransformInfo *TTI,
|
|
|
|
const DataLayout *DL)
|
|
|
|
: CurLoop(nullptr), AA(AA), DT(DT), LI(LI), SE(SE), TLI(TLI), TTI(TTI),
|
|
|
|
DL(DL) {}
|
2010-12-27 03:39:38 +08:00
|
|
|
|
2016-07-13 02:45:51 +08:00
|
|
|
bool runOnLoop(Loop *L);
|
2012-12-09 11:12:46 +08:00
|
|
|
|
2015-08-13 07:06:37 +08:00
|
|
|
private:
|
2015-11-12 07:00:59 +08:00
|
|
|
typedef SmallVector<StoreInst *, 8> StoreList;
|
2016-01-26 10:27:47 +08:00
|
|
|
typedef MapVector<Value *, StoreList> StoreListMap;
|
|
|
|
StoreListMap StoreRefsForMemset;
|
|
|
|
StoreListMap StoreRefsForMemsetPattern;
|
2016-01-05 05:43:14 +08:00
|
|
|
StoreList StoreRefsForMemcpy;
|
|
|
|
bool HasMemset;
|
|
|
|
bool HasMemsetPattern;
|
|
|
|
bool HasMemcpy;
|
2015-11-12 07:00:59 +08:00
|
|
|
|
2015-08-13 08:10:03 +08:00
|
|
|
/// \name Countable Loop Idiom Handling
|
|
|
|
/// @{
|
|
|
|
|
2015-08-13 07:06:37 +08:00
|
|
|
bool runOnCountableLoop();
|
2015-08-13 08:10:03 +08:00
|
|
|
bool runOnLoopBlock(BasicBlock *BB, const SCEV *BECount,
|
|
|
|
SmallVectorImpl<BasicBlock *> &ExitBlocks);
|
|
|
|
|
2015-11-12 07:00:59 +08:00
|
|
|
void collectStores(BasicBlock *BB);
|
2016-01-26 10:27:47 +08:00
|
|
|
bool isLegalStore(StoreInst *SI, bool &ForMemset, bool &ForMemsetPattern,
|
|
|
|
bool &ForMemcpy);
|
|
|
|
bool processLoopStores(SmallVectorImpl<StoreInst *> &SL, const SCEV *BECount,
|
|
|
|
bool ForMemset);
|
2015-08-13 08:10:03 +08:00
|
|
|
bool processLoopMemSet(MemSetInst *MSI, const SCEV *BECount);
|
|
|
|
|
|
|
|
bool processLoopStridedStore(Value *DestPtr, unsigned StoreSize,
|
2016-01-05 05:43:14 +08:00
|
|
|
unsigned StoreAlignment, Value *StoredVal,
|
2016-01-26 10:27:47 +08:00
|
|
|
Instruction *TheStore,
|
|
|
|
SmallPtrSetImpl<Instruction *> &Stores,
|
|
|
|
const SCEVAddRecExpr *Ev, const SCEV *BECount,
|
2016-08-12 02:28:33 +08:00
|
|
|
bool NegStride, bool IsLoopMemset = false);
|
2016-01-05 05:43:14 +08:00
|
|
|
bool processLoopStoreOfLoopLoad(StoreInst *SI, const SCEV *BECount);
|
2016-08-12 02:28:33 +08:00
|
|
|
bool avoidLIRForMultiBlockLoop(bool IsMemset = false,
|
|
|
|
bool IsLoopMemset = false);
|
2015-08-13 08:10:03 +08:00
|
|
|
|
|
|
|
/// @}
|
|
|
|
/// \name Noncountable Loop Idiom Handling
|
|
|
|
/// @{
|
|
|
|
|
|
|
|
bool runOnNoncountableLoop();
|
|
|
|
|
2015-08-13 08:44:29 +08:00
|
|
|
bool recognizePopcount();
|
|
|
|
void transformLoopToPopcount(BasicBlock *PreCondBB, Instruction *CntInst,
|
|
|
|
PHINode *CntPhi, Value *Var);
|
|
|
|
|
2015-08-13 08:10:03 +08:00
|
|
|
/// @}
|
2015-08-13 07:06:37 +08:00
|
|
|
};
|
|
|
|
|
2016-07-13 02:45:51 +08:00
|
|
|
class LoopIdiomRecognizeLegacyPass : public LoopPass {
|
|
|
|
public:
|
|
|
|
static char ID;
|
|
|
|
explicit LoopIdiomRecognizeLegacyPass() : LoopPass(ID) {
|
|
|
|
initializeLoopIdiomRecognizeLegacyPassPass(
|
|
|
|
*PassRegistry::getPassRegistry());
|
|
|
|
}
|
|
|
|
|
|
|
|
bool runOnLoop(Loop *L, LPPassManager &LPM) override {
|
|
|
|
if (skipLoop(L))
|
|
|
|
return false;
|
|
|
|
|
|
|
|
AliasAnalysis *AA = &getAnalysis<AAResultsWrapperPass>().getAAResults();
|
|
|
|
DominatorTree *DT = &getAnalysis<DominatorTreeWrapperPass>().getDomTree();
|
|
|
|
LoopInfo *LI = &getAnalysis<LoopInfoWrapperPass>().getLoopInfo();
|
|
|
|
ScalarEvolution *SE = &getAnalysis<ScalarEvolutionWrapperPass>().getSE();
|
|
|
|
TargetLibraryInfo *TLI =
|
|
|
|
&getAnalysis<TargetLibraryInfoWrapperPass>().getTLI();
|
|
|
|
const TargetTransformInfo *TTI =
|
|
|
|
&getAnalysis<TargetTransformInfoWrapperPass>().getTTI(
|
|
|
|
*L->getHeader()->getParent());
|
|
|
|
const DataLayout *DL = &L->getHeader()->getModule()->getDataLayout();
|
|
|
|
|
|
|
|
LoopIdiomRecognize LIR(AA, DT, LI, SE, TLI, TTI, DL);
|
|
|
|
return LIR.runOnLoop(L);
|
|
|
|
}
|
|
|
|
|
|
|
|
/// This transformation requires natural loop information & requires that
|
|
|
|
/// loop preheaders be inserted into the CFG.
|
|
|
|
///
|
|
|
|
void getAnalysisUsage(AnalysisUsage &AU) const override {
|
|
|
|
AU.addRequired<TargetLibraryInfoWrapperPass>();
|
|
|
|
AU.addRequired<TargetTransformInfoWrapperPass>();
|
|
|
|
getLoopAnalysisUsage(AU);
|
|
|
|
}
|
|
|
|
};
|
2015-08-13 07:06:37 +08:00
|
|
|
} // End anonymous namespace.
|
2010-12-27 03:39:38 +08:00
|
|
|
|
2017-01-11 14:23:21 +08:00
|
|
|
PreservedAnalyses LoopIdiomRecognizePass::run(Loop &L, LoopAnalysisManager &AM,
|
|
|
|
LoopStandardAnalysisResults &AR,
|
|
|
|
LPMUpdater &) {
|
2016-07-13 02:45:51 +08:00
|
|
|
const auto *DL = &L.getHeader()->getModule()->getDataLayout();
|
|
|
|
|
2017-01-11 14:23:21 +08:00
|
|
|
LoopIdiomRecognize LIR(&AR.AA, &AR.DT, &AR.LI, &AR.SE, &AR.TLI, &AR.TTI, DL);
|
2016-07-13 02:45:51 +08:00
|
|
|
if (!LIR.runOnLoop(&L))
|
|
|
|
return PreservedAnalyses::all();
|
|
|
|
|
|
|
|
return getLoopPassPreservedAnalyses();
|
|
|
|
}
|
|
|
|
|
|
|
|
char LoopIdiomRecognizeLegacyPass::ID = 0;
|
|
|
|
INITIALIZE_PASS_BEGIN(LoopIdiomRecognizeLegacyPass, "loop-idiom",
|
|
|
|
"Recognize loop idioms", false, false)
|
2016-02-19 18:45:18 +08:00
|
|
|
INITIALIZE_PASS_DEPENDENCY(LoopPass)
|
2015-01-15 18:41:28 +08:00
|
|
|
INITIALIZE_PASS_DEPENDENCY(TargetLibraryInfoWrapperPass)
|
[PM] Change the core design of the TTI analysis to use a polymorphic
type erased interface and a single analysis pass rather than an
extremely complex analysis group.
The end result is that the TTI analysis can contain a type erased
implementation that supports the polymorphic TTI interface. We can build
one from a target-specific implementation or from a dummy one in the IR.
I've also factored all of the code into "mix-in"-able base classes,
including CRTP base classes to facilitate calling back up to the most
specialized form when delegating horizontally across the surface. These
aren't as clean as I would like and I'm planning to work on cleaning
some of this up, but I wanted to start by putting into the right form.
There are a number of reasons for this change, and this particular
design. The first and foremost reason is that an analysis group is
complete overkill, and the chaining delegation strategy was so opaque,
confusing, and high overhead that TTI was suffering greatly for it.
Several of the TTI functions had failed to be implemented in all places
because of the chaining-based delegation making there be no checking of
this. A few other functions were implemented with incorrect delegation.
The message to me was very clear working on this -- the delegation and
analysis group structure was too confusing to be useful here.
The other reason of course is that this is *much* more natural fit for
the new pass manager. This will lay the ground work for a type-erased
per-function info object that can look up the correct subtarget and even
cache it.
Yet another benefit is that this will significantly simplify the
interaction of the pass managers and the TargetMachine. See the future
work below.
The downside of this change is that it is very, very verbose. I'm going
to work to improve that, but it is somewhat an implementation necessity
in C++ to do type erasure. =/ I discussed this design really extensively
with Eric and Hal prior to going down this path, and afterward showed
them the result. No one was really thrilled with it, but there doesn't
seem to be a substantially better alternative. Using a base class and
virtual method dispatch would make the code much shorter, but as
discussed in the update to the programmer's manual and elsewhere,
a polymorphic interface feels like the more principled approach even if
this is perhaps the least compelling example of it. ;]
Ultimately, there is still a lot more to be done here, but this was the
huge chunk that I couldn't really split things out of because this was
the interface change to TTI. I've tried to minimize all the other parts
of this. The follow up work should include at least:
1) Improving the TargetMachine interface by having it directly return
a TTI object. Because we have a non-pass object with value semantics
and an internal type erasure mechanism, we can narrow the interface
of the TargetMachine to *just* do what we need: build and return
a TTI object that we can then insert into the pass pipeline.
2) Make the TTI object be fully specialized for a particular function.
This will include splitting off a minimal form of it which is
sufficient for the inliner and the old pass manager.
3) Add a new pass manager analysis which produces TTI objects from the
target machine for each function. This may actually be done as part
of #2 in order to use the new analysis to implement #2.
4) Work on narrowing the API between TTI and the targets so that it is
easier to understand and less verbose to type erase.
5) Work on narrowing the API between TTI and its clients so that it is
easier to understand and less verbose to forward.
6) Try to improve the CRTP-based delegation. I feel like this code is
just a bit messy and exacerbating the complexity of implementing
the TTI in each target.
Many thanks to Eric and Hal for their help here. I ended up blocked on
this somewhat more abruptly than I expected, and so I appreciate getting
it sorted out very quickly.
Differential Revision: http://reviews.llvm.org/D7293
llvm-svn: 227669
2015-01-31 11:43:40 +08:00
|
|
|
INITIALIZE_PASS_DEPENDENCY(TargetTransformInfoWrapperPass)
|
2016-07-13 02:45:51 +08:00
|
|
|
INITIALIZE_PASS_END(LoopIdiomRecognizeLegacyPass, "loop-idiom",
|
|
|
|
"Recognize loop idioms", false, false)
|
2010-12-27 03:39:38 +08:00
|
|
|
|
2016-07-13 02:45:51 +08:00
|
|
|
Pass *llvm::createLoopIdiomPass() { return new LoopIdiomRecognizeLegacyPass(); }
|
2010-12-27 03:39:38 +08:00
|
|
|
|
2016-06-21 00:03:25 +08:00
|
|
|
static void deleteDeadInstruction(Instruction *I) {
|
2015-02-08 05:37:08 +08:00
|
|
|
I->replaceAllUsesWith(UndefValue::get(I->getType()));
|
|
|
|
I->eraseFromParent();
|
2012-11-02 16:33:25 +08:00
|
|
|
}
|
|
|
|
|
2012-12-09 11:12:46 +08:00
|
|
|
//===----------------------------------------------------------------------===//
|
|
|
|
//
|
2015-08-13 08:44:29 +08:00
|
|
|
// Implementation of LoopIdiomRecognize
|
2012-12-09 11:12:46 +08:00
|
|
|
//
|
|
|
|
//===----------------------------------------------------------------------===//
|
|
|
|
|
2016-07-13 02:45:51 +08:00
|
|
|
bool LoopIdiomRecognize::runOnLoop(Loop *L) {
|
2015-08-13 08:44:29 +08:00
|
|
|
CurLoop = L;
|
|
|
|
// If the loop could not be converted to canonical form, it must have an
|
|
|
|
// indirectbr in it; just give up.
|
|
|
|
if (!L->getLoopPreheader())
|
2012-12-09 11:12:46 +08:00
|
|
|
return false;
|
|
|
|
|
2015-08-13 08:44:29 +08:00
|
|
|
// Disable loop idiom recognition if the function's name is a common idiom:
// turning the body of memset or memcpy into a call to itself would recurse.
|
|
|
|
StringRef Name = L->getHeader()->getParent()->getName();
|
|
|
|
if (Name == "memset" || Name == "memcpy")
|
2015-08-13 07:55:56 +08:00
|
|
|
return false;
|
2013-07-23 02:59:58 +08:00
|
|
|
|
2016-08-12 02:28:33 +08:00
|
|
|
// Determine if code size heuristics need to be applied.
|
|
|
|
ApplyCodeSizeHeuristics =
|
|
|
|
L->getHeader()->getParent()->optForSize() && UseLIRCodeSizeHeurs;
|
|
|
|
|
[Analysis] Add LibFunc_ prefix to enums in TargetLibraryInfo. (NFC)
Summary:
The LibFunc::Func enum holds enumerators named for libc functions.
Unfortunately, there are real situations, including libc implementations, where
function names are actually macros (musl uses "#define fopen64 fopen", for
example; any other transitively visible macro would have similar effects).
Strictly speaking, a conforming C++ Standard Library should provide any such
macros as functions instead (via <cstdio>). However, there are some "library"
functions which are not part of the standard, and thus not subject to this
rule (fopen64, for example). So, in order to be both portable and consistent,
the enum should not use the bare function names.
The old enum naming used a namespace LibFunc and an enum Func, with bare
enumerators. This patch changes LibFunc to be an enum with enumerators prefixed
with "LibFunc_". (Unfortunately, a scoped enum is not sufficient to override
macros.)
There are additional changes required in clang.
Reviewers: rsmith
Subscribers: mehdi_amini, mzolotukhin, nemanjai, llvm-commits
Differential Revision: https://reviews.llvm.org/D28476
llvm-svn: 292848
2017-01-24 07:16:46 +08:00
|
|
|
HasMemset = TLI->has(LibFunc_memset);
|
|
|
|
HasMemsetPattern = TLI->has(LibFunc_memset_pattern16);
|
|
|
|
HasMemcpy = TLI->has(LibFunc_memcpy);
|
2016-01-05 05:43:14 +08:00
|
|
|
|
|
|
|
if (HasMemset || HasMemsetPattern || HasMemcpy)
|
|
|
|
if (SE->hasLoopInvariantBackedgeTakenCount(L))
|
|
|
|
return runOnCountableLoop();
|
2015-08-13 09:03:26 +08:00
|
|
|
|
2015-08-13 08:44:29 +08:00
|
|
|
return runOnNoncountableLoop();
|
2012-12-09 11:12:46 +08:00
|
|
|
}
|
|
|
|
|
2015-08-13 08:44:29 +08:00
|
|
|
bool LoopIdiomRecognize::runOnCountableLoop() {
  const SCEV *BECount = SE->getBackedgeTakenCount(CurLoop);
  assert(!isa<SCEVCouldNotCompute>(BECount) &&
         "runOnCountableLoop() called on a loop without a predictable "
         "backedge-taken count");

  // If this loop executes exactly one time, then it should be peeled, not
  // optimized by this pass.
  if (const SCEVConstant *BECst = dyn_cast<SCEVConstant>(BECount))
    if (BECst->getAPInt() == 0)
      return false;

  SmallVector<BasicBlock *, 8> ExitBlocks;
  CurLoop->getUniqueExitBlocks(ExitBlocks);

  DEBUG(dbgs() << "loop-idiom Scanning: F["
               << CurLoop->getHeader()->getParent()->getName() << "] Loop %"
               << CurLoop->getHeader()->getName() << "\n");

  bool MadeChange = false;

  // The following transforms hoist stores/memsets into the loop pre-header.
  // Give up if the loop has instructions that may throw.
  LoopSafetyInfo SafetyInfo;
  computeLoopSafetyInfo(&SafetyInfo, CurLoop);
  if (SafetyInfo.MayThrow)
    return MadeChange;

  // Scan all the blocks in the loop that are not in subloops.
  for (auto *BB : CurLoop->getBlocks()) {
    // Ignore blocks in subloops.
    if (LI->getLoopFor(BB) != CurLoop)
      continue;

    MadeChange |= runOnLoopBlock(BB, BECount, ExitBlocks);
  }
  return MadeChange;
}

static unsigned getStoreSizeInBytes(StoreInst *SI, const DataLayout *DL) {
  uint64_t SizeInBits = DL->getTypeSizeInBits(SI->getValueOperand()->getType());
  // Callers only pass stores already accepted by isLegalStore(), so the size
  // must be a whole number of bytes and must fit in 32 bits.
  assert(!(SizeInBits & 7) && (SizeInBits >> 32) == 0 &&
         "Don't overflow unsigned.");
  return (unsigned)SizeInBits >> 3;
}

static APInt getStoreStride(const SCEVAddRecExpr *StoreEv) {
  const SCEVConstant *ConstStride = cast<SCEVConstant>(StoreEv->getOperand(1));
  return ConstStride->getAPInt();
}

/// getMemSetPatternValue - If a strided store of the specified value is safe to
/// turn into a memset_pattern16, return a ConstantArray of 16 bytes that should
/// be passed in. Otherwise, return null.
///
/// Note that we don't ever attempt to use memset_pattern8 or 4, because these
/// just replicate their input array and then pass on to memset_pattern16.
static Constant *getMemSetPatternValue(Value *V, const DataLayout *DL) {
  // If the value isn't a constant, we can't promote it to being in a constant
  // array. We could theoretically do a store to an alloca or something, but
  // that doesn't seem worthwhile.
  Constant *C = dyn_cast<Constant>(V);
  if (!C)
    return nullptr;

  // Only handle simple values that are a power of two bytes in size.
  uint64_t Size = DL->getTypeSizeInBits(V->getType());
  if (Size == 0 || (Size & 7) || (Size & (Size - 1)))
    return nullptr;

  // Don't care enough about darwin/ppc to implement this.
  if (DL->isBigEndian())
    return nullptr;

  // Convert to size in bytes.
  Size /= 8;

  // TODO: If C is larger than 16 bytes, we can try slicing it in half to see
  // if the top and bottom are the same (e.g. for vectors and large integers).
  if (Size > 16)
    return nullptr;

  // If the constant is exactly 16 bytes, just use it.
  if (Size == 16)
    return C;

  // Otherwise, we'll use an array of the constants.
  unsigned ArraySize = 16 / Size;
  ArrayType *AT = ArrayType::get(V->getType(), ArraySize);
  return ConstantArray::get(AT, std::vector<Constant *>(ArraySize, C));
}

bool LoopIdiomRecognize::isLegalStore(StoreInst *SI, bool &ForMemset,
                                      bool &ForMemsetPattern, bool &ForMemcpy) {
  // Don't touch volatile or atomic stores.
  if (!SI->isSimple())
    return false;

  // Don't convert stores of non-integral pointer types to memsets (which store
  // integers).
  if (DL->isNonIntegralPointerType(SI->getValueOperand()->getType()))
    return false;

  // Avoid merging nontemporal stores.
  if (SI->getMetadata(LLVMContext::MD_nontemporal))
    return false;

  Value *StoredVal = SI->getValueOperand();
  Value *StorePtr = SI->getPointerOperand();

  // Reject stores that are so large that they overflow an unsigned.
  uint64_t SizeInBits = DL->getTypeSizeInBits(StoredVal->getType());
  if ((SizeInBits & 7) || (SizeInBits >> 32) != 0)
    return false;

  // See if the pointer expression is an AddRec like {base,+,1} on the current
  // loop, which indicates a strided store. If we have something else, it's a
  // random store we can't handle.
  const SCEVAddRecExpr *StoreEv =
      dyn_cast<SCEVAddRecExpr>(SE->getSCEV(StorePtr));
  if (!StoreEv || StoreEv->getLoop() != CurLoop || !StoreEv->isAffine())
    return false;

  // Check to see if we have a constant stride.
  if (!isa<SCEVConstant>(StoreEv->getOperand(1)))
    return false;

  // See if the store can be turned into a memset.

  // If the stored value is a byte-wise value (like i32 -1), then it may be
  // turned into a memset of i8 -1, assuming that all the consecutive bytes
  // are stored. A store of i32 0x01020304 can never be turned into a memset,
  // but it can be turned into memset_pattern if the target supports it.
  Value *SplatValue = isBytewiseValue(StoredVal);
  Constant *PatternValue = nullptr;

  // If we're allowed to form a memset, and the stored value would be
  // acceptable for memset, use it.
  if (HasMemset && SplatValue &&
      // Verify that the stored value is loop invariant. If not, we can't
      // promote the memset.
      CurLoop->isLoopInvariant(SplatValue)) {
    // It looks like we can use SplatValue.
    ForMemset = true;
    return true;
  } else if (HasMemsetPattern &&
             // Don't create memset_pattern16s in non-default address spaces.
             StorePtr->getType()->getPointerAddressSpace() == 0 &&
             (PatternValue = getMemSetPatternValue(StoredVal, DL))) {
    // It looks like we can use PatternValue!
    ForMemsetPattern = true;
    return true;
  }

  // Otherwise, see if the store can be turned into a memcpy.
  if (HasMemcpy) {
    // Check to see if the stride matches the size of the store. If so, then we
    // know that every byte is touched in the loop.
    APInt Stride = getStoreStride(StoreEv);
    unsigned StoreSize = getStoreSizeInBytes(SI, DL);
    if (StoreSize != Stride && StoreSize != -Stride)
      return false;

    // The stored value must come from a simple (non-volatile) load.
    LoadInst *LI = dyn_cast<LoadInst>(SI->getValueOperand());
    if (!LI || !LI->isSimple())
      return false;

    // See if the pointer expression is an AddRec like {base,+,1} on the current
    // loop, which indicates a strided load. If we have something else, it's a
    // random load we can't handle.
    const SCEVAddRecExpr *LoadEv =
        dyn_cast<SCEVAddRecExpr>(SE->getSCEV(LI->getPointerOperand()));
    if (!LoadEv || LoadEv->getLoop() != CurLoop || !LoadEv->isAffine())
      return false;

    // The store and load must share the same stride.
    if (StoreEv->getOperand(1) != LoadEv->getOperand(1))
      return false;

    // Success. This store can be converted into a memcpy.
    ForMemcpy = true;
    return true;
  }
  // This store can't be transformed into a memset/memcpy.
  return false;
}

void LoopIdiomRecognize::collectStores(BasicBlock *BB) {
  StoreRefsForMemset.clear();
  StoreRefsForMemsetPattern.clear();
  StoreRefsForMemcpy.clear();
  for (Instruction &I : *BB) {
    StoreInst *SI = dyn_cast<StoreInst>(&I);
    if (!SI)
      continue;

    bool ForMemset = false;
    bool ForMemsetPattern = false;
    bool ForMemcpy = false;
    // Make sure this is a strided store with a constant stride.
    if (!isLegalStore(SI, ForMemset, ForMemsetPattern, ForMemcpy))
      continue;

    // Save the store locations.
    if (ForMemset) {
      // Find the base pointer.
      Value *Ptr = GetUnderlyingObject(SI->getPointerOperand(), *DL);
      StoreRefsForMemset[Ptr].push_back(SI);
    } else if (ForMemsetPattern) {
      // Find the base pointer.
      Value *Ptr = GetUnderlyingObject(SI->getPointerOperand(), *DL);
      StoreRefsForMemsetPattern[Ptr].push_back(SI);
    } else if (ForMemcpy)
      StoreRefsForMemcpy.push_back(SI);
  }
}

/// runOnLoopBlock - Process the specified block, which lives in a counted loop
/// with the specified backedge count. This block is known to be in the current
/// loop and not in any subloops.
bool LoopIdiomRecognize::runOnLoopBlock(
    BasicBlock *BB, const SCEV *BECount,
    SmallVectorImpl<BasicBlock *> &ExitBlocks) {
  // We can only promote stores in this block if they are unconditionally
  // executed in the loop. For a block to be unconditionally executed, it has
  // to dominate all the exit blocks of the loop. Verify this now.
  for (unsigned i = 0, e = ExitBlocks.size(); i != e; ++i)
    if (!DT->dominates(BB, ExitBlocks[i]))
      return false;

  bool MadeChange = false;
  // Look for store instructions, which may be optimized to memset/memcpy.
  collectStores(BB);

  // Look for a single store or sets of stores with a common base, which can be
  // optimized into a memset (memset_pattern). The latter most commonly happens
  // with structs and hand-unrolled loops.
  for (auto &SL : StoreRefsForMemset)
    MadeChange |= processLoopStores(SL.second, BECount, /*ForMemset=*/true);

  for (auto &SL : StoreRefsForMemsetPattern)
    MadeChange |= processLoopStores(SL.second, BECount, /*ForMemset=*/false);

  // Optimize the store into a memcpy if it is fed by a similarly strided load.
  for (auto &SI : StoreRefsForMemcpy)
    MadeChange |= processLoopStoreOfLoopLoad(SI, BECount);

  for (BasicBlock::iterator I = BB->begin(), E = BB->end(); I != E;) {
    Instruction *Inst = &*I++;
    // Look for memset instructions, which may be optimized to a larger memset.
    if (MemSetInst *MSI = dyn_cast<MemSetInst>(Inst)) {
      WeakVH InstPtr(&*I);
      if (!processLoopMemSet(MSI, BECount))
        continue;
      MadeChange = true;

      // If processing the memset invalidated our iterator, start over from the
      // top of the block.
      if (!InstPtr)
        I = BB->begin();
      continue;
    }
  }

  return MadeChange;
}

/// processLoopStores - See if these stores can be promoted to a memset.
bool LoopIdiomRecognize::processLoopStores(SmallVectorImpl<StoreInst *> &SL,
                                           const SCEV *BECount,
                                           bool ForMemset) {
  // Try to find consecutive stores that can be transformed into memsets.
  SetVector<StoreInst *> Heads, Tails;
  SmallDenseMap<StoreInst *, StoreInst *> ConsecutiveChain;

  // Do a quadratic search on all of the given stores and find
  // all of the pairs of stores that follow each other.
  SmallVector<unsigned, 16> IndexQueue;
  for (unsigned i = 0, e = SL.size(); i < e; ++i) {
    assert(SL[i]->isSimple() && "Expected only non-volatile stores.");

    Value *FirstStoredVal = SL[i]->getValueOperand();
    Value *FirstStorePtr = SL[i]->getPointerOperand();
    const SCEVAddRecExpr *FirstStoreEv =
        cast<SCEVAddRecExpr>(SE->getSCEV(FirstStorePtr));
    APInt FirstStride = getStoreStride(FirstStoreEv);
    unsigned FirstStoreSize = getStoreSizeInBytes(SL[i], DL);

    // See if we can optimize just this store in isolation.
    if (FirstStride == FirstStoreSize || -FirstStride == FirstStoreSize) {
      Heads.insert(SL[i]);
      continue;
    }

    Value *FirstSplatValue = nullptr;
    Constant *FirstPatternValue = nullptr;

    if (ForMemset)
      FirstSplatValue = isBytewiseValue(FirstStoredVal);
    else
      FirstPatternValue = getMemSetPatternValue(FirstStoredVal, DL);

    assert((FirstSplatValue || FirstPatternValue) &&
           "Expected either splat value or pattern value.");

    IndexQueue.clear();
    // If a store has multiple consecutive store candidates, search the SL
    // array in this order: from i+1 to e, then from i-1 to 0. Pairing with
    // the immediately succeeding or preceding candidate usually gives the
    // best chance of finding a memset opportunity.
    unsigned j = 0;
    for (j = i + 1; j < e; ++j)
      IndexQueue.push_back(j);
    for (j = i; j > 0; --j)
      IndexQueue.push_back(j - 1);

    for (auto &k : IndexQueue) {
      assert(SL[k]->isSimple() && "Expected only non-volatile stores.");
      Value *SecondStorePtr = SL[k]->getPointerOperand();
      const SCEVAddRecExpr *SecondStoreEv =
          cast<SCEVAddRecExpr>(SE->getSCEV(SecondStorePtr));
      APInt SecondStride = getStoreStride(SecondStoreEv);

      if (FirstStride != SecondStride)
        continue;

      Value *SecondStoredVal = SL[k]->getValueOperand();
      Value *SecondSplatValue = nullptr;
      Constant *SecondPatternValue = nullptr;

      if (ForMemset)
        SecondSplatValue = isBytewiseValue(SecondStoredVal);
      else
        SecondPatternValue = getMemSetPatternValue(SecondStoredVal, DL);

      assert((SecondSplatValue || SecondPatternValue) &&
             "Expected either splat value or pattern value.");

      if (isConsecutiveAccess(SL[i], SL[k], *DL, *SE, false)) {
        if (ForMemset) {
          if (FirstSplatValue != SecondSplatValue)
            continue;
        } else {
          if (FirstPatternValue != SecondPatternValue)
            continue;
        }
        Tails.insert(SL[k]);
        Heads.insert(SL[i]);
        ConsecutiveChain[SL[i]] = SL[k];
        break;
      }
    }
  }

  // We may run into multiple chains that merge into a single chain. We mark the
  // stores that we transformed so that we don't visit the same store twice.
  SmallPtrSet<Value *, 16> TransformedStores;
  bool Changed = false;

  // For stores that start but don't end a link in the chain:
  for (SetVector<StoreInst *>::iterator it = Heads.begin(), e = Heads.end();
       it != e; ++it) {
    if (Tails.count(*it))
      continue;

    // We found a store instr that starts a chain. Now follow the chain and try
    // to transform it.
    SmallPtrSet<Instruction *, 8> AdjacentStores;
    StoreInst *I = *it;

    StoreInst *HeadStore = I;
    unsigned StoreSize = 0;

    // Collect the chain into a list.
    while (Tails.count(I) || Heads.count(I)) {
      if (TransformedStores.count(I))
        break;
      AdjacentStores.insert(I);

      StoreSize += getStoreSizeInBytes(I, DL);
      // Move to the next value in the chain.
      I = ConsecutiveChain[I];
    }

    Value *StoredVal = HeadStore->getValueOperand();
    Value *StorePtr = HeadStore->getPointerOperand();
    const SCEVAddRecExpr *StoreEv = cast<SCEVAddRecExpr>(SE->getSCEV(StorePtr));
    APInt Stride = getStoreStride(StoreEv);

    // Check to see if the stride matches the size of the stores. If so, then
    // we know that every byte is touched in the loop.
    if (StoreSize != Stride && StoreSize != -Stride)
      continue;

    bool NegStride = StoreSize == -Stride;

    if (processLoopStridedStore(StorePtr, StoreSize, HeadStore->getAlignment(),
                                StoredVal, HeadStore, AdjacentStores, StoreEv,
                                BECount, NegStride)) {
      TransformedStores.insert(AdjacentStores.begin(), AdjacentStores.end());
      Changed = true;
    }
  }

  return Changed;
}

/// processLoopMemSet - See if this memset can be promoted to a large memset.
bool LoopIdiomRecognize::processLoopMemSet(MemSetInst *MSI,
                                           const SCEV *BECount) {
  // We can only handle non-volatile memsets with a constant size.
  if (MSI->isVolatile() || !isa<ConstantInt>(MSI->getLength()))
    return false;

  // If we're not allowed to hack on memset, we fail.
  if (!HasMemset)
    return false;

  Value *Pointer = MSI->getDest();

  // See if the pointer expression is an AddRec like {base,+,1} on the current
  // loop, which indicates a strided store. If we have something else, it's a
  // random store we can't handle.
  const SCEVAddRecExpr *Ev = dyn_cast<SCEVAddRecExpr>(SE->getSCEV(Pointer));
  if (!Ev || Ev->getLoop() != CurLoop || !Ev->isAffine())
    return false;

  // Reject memsets that are so large that they overflow an unsigned.
  uint64_t SizeInBytes = cast<ConstantInt>(MSI->getLength())->getZExtValue();
  if ((SizeInBytes >> 32) != 0)
    return false;

  // Check to see if the stride matches the size of the memset. If so, then we
  // know that every byte is touched in the loop.
  const SCEVConstant *ConstStride = dyn_cast<SCEVConstant>(Ev->getOperand(1));
  if (!ConstStride)
    return false;

  APInt Stride = ConstStride->getAPInt();
  if (SizeInBytes != Stride && SizeInBytes != -Stride)
    return false;

  // Verify that the memset value is loop invariant. If not, we can't promote
  // the memset.
  Value *SplatValue = MSI->getValue();
  if (!SplatValue || !CurLoop->isLoopInvariant(SplatValue))
    return false;

  SmallPtrSet<Instruction *, 1> MSIs;
  MSIs.insert(MSI);
  bool NegStride = SizeInBytes == -Stride;
  return processLoopStridedStore(Pointer, (unsigned)SizeInBytes,
                                 MSI->getAlignment(), SplatValue, MSI, MSIs, Ev,
                                 BECount, NegStride, /*IsLoopMemset=*/true);
}

/// mayLoopAccessLocation - Return true if the specified loop might access the
/// specified pointer location, which is a loop-strided access. The 'Access'
/// argument specifies which forms of access are forbidden (read or write).
static bool
mayLoopAccessLocation(Value *Ptr, ModRefInfo Access, Loop *L,
                      const SCEV *BECount, unsigned StoreSize,
                      AliasAnalysis &AA,
                      SmallPtrSetImpl<Instruction *> &IgnoredStores) {
  // Get the location that may be stored across the loop. Since the access is
  // strided positively through memory, we say that the modified location starts
  // at the pointer and has unknown (unbounded) size.
  uint64_t AccessSize = MemoryLocation::UnknownSize;

  // If the loop iterates a fixed number of times, we can refine the access size
  // to be exactly the size of the memset, which is (BECount+1)*StoreSize.
  if (const SCEVConstant *BECst = dyn_cast<SCEVConstant>(BECount))
    AccessSize = (BECst->getValue()->getZExtValue() + 1) * StoreSize;

  // TODO: For this to be really effective, we have to dive into the pointer
  // operand in the store. A store to &A[i] over 100 iterations will always be
  // reported as may-alias with a store to &A[100]; we need StoreLoc to be "A"
  // with a size of 100, which will then no-alias a store to &A[100].
  MemoryLocation StoreLoc(Ptr, AccessSize);

  for (Loop::block_iterator BI = L->block_begin(), E = L->block_end(); BI != E;
       ++BI)
    for (Instruction &I : **BI)
      if (IgnoredStores.count(&I) == 0 &&
          (AA.getModRefInfo(&I, StoreLoc) & Access))
        return true;

  return false;
}

// If we have a negative stride, Start refers to the end of the memory location
// we're trying to memset. Therefore, we need to recompute the base pointer,
// which is just Start - BECount*Size.
static const SCEV *getStartForNegStride(const SCEV *Start, const SCEV *BECount,
                                        Type *IntPtr, unsigned StoreSize,
                                        ScalarEvolution *SE) {
  const SCEV *Index = SE->getTruncateOrZeroExtend(BECount, IntPtr);
  if (StoreSize != 1)
    Index = SE->getMulExpr(Index, SE->getConstant(IntPtr, StoreSize),
                           SCEV::FlagNUW);
  return SE->getMinusSCEV(Start, Index);
}

/// processLoopStridedStore - We see a strided store of some value. If we can
/// transform this into a memset or memset_pattern in the loop preheader, do so.
bool LoopIdiomRecognize::processLoopStridedStore(
    Value *DestPtr, unsigned StoreSize, unsigned StoreAlignment,
    Value *StoredVal, Instruction *TheStore,
    SmallPtrSetImpl<Instruction *> &Stores, const SCEVAddRecExpr *Ev,
    const SCEV *BECount, bool NegStride, bool IsLoopMemset) {
  Value *SplatValue = isBytewiseValue(StoredVal);
  Constant *PatternValue = nullptr;

  if (!SplatValue)
    PatternValue = getMemSetPatternValue(StoredVal, DL);

  assert((SplatValue || PatternValue) &&
         "Expected either splat value or pattern value.");

  // The trip count of the loop and the base pointer of the addrec SCEV are
  // guaranteed to be loop invariant, which means that they should dominate the
  // header. This allows us to insert code for them in the preheader.
  unsigned DestAS = DestPtr->getType()->getPointerAddressSpace();
  BasicBlock *Preheader = CurLoop->getLoopPreheader();
  IRBuilder<> Builder(Preheader->getTerminator());
  SCEVExpander Expander(*SE, *DL, "loop-idiom");

  Type *DestInt8PtrTy = Builder.getInt8PtrTy(DestAS);
  Type *IntPtr = Builder.getIntPtrTy(*DL, DestAS);

  const SCEV *Start = Ev->getStart();
  // Handle negative strided loops.
  if (NegStride)
    Start = getStartForNegStride(Start, BECount, IntPtr, StoreSize, SE);

  // Okay, we have a strided store "p[i]" of a splattable value. We can turn
  // this into a memset in the loop preheader now if we want. However, this
  // would be unsafe to do if there is anything else in the loop that may read
  // or write to the aliased location. Check for any overlap by generating the
  // base pointer and checking the region.
  Value *BasePtr =
      Expander.expandCodeFor(Start, DestInt8PtrTy, Preheader->getTerminator());
  if (mayLoopAccessLocation(BasePtr, MRI_ModRef, CurLoop, BECount, StoreSize,
                            *AA, Stores)) {
    Expander.clear();
    // If we generated new code for the base pointer, clean up.
    RecursivelyDeleteTriviallyDeadInstructions(BasePtr, TLI);
    return false;
  }

  if (avoidLIRForMultiBlockLoop(/*IsMemset=*/true, IsLoopMemset))
    return false;

  // Okay, everything looks good, insert the memset.

  // The number of stored bytes is (BECount+1)*StoreSize. Expand the trip count
  // out to pointer size if it isn't already.
  BECount = SE->getTruncateOrZeroExtend(BECount, IntPtr);

  const SCEV *NumBytesS =
      SE->getAddExpr(BECount, SE->getOne(IntPtr), SCEV::FlagNUW);
  if (StoreSize != 1) {
    NumBytesS = SE->getMulExpr(NumBytesS, SE->getConstant(IntPtr, StoreSize),
                               SCEV::FlagNUW);
  }

  Value *NumBytes =
      Expander.expandCodeFor(NumBytesS, IntPtr, Preheader->getTerminator());

  CallInst *NewCall;
  if (SplatValue) {
    NewCall =
        Builder.CreateMemSet(BasePtr, SplatValue, NumBytes, StoreAlignment);
  } else {
    // Everything is emitted in the default address space.
    Type *Int8PtrTy = DestInt8PtrTy;

    Module *M = TheStore->getModule();
    Value *MSP =
        M->getOrInsertFunction("memset_pattern16", Builder.getVoidTy(),
                               Int8PtrTy, Int8PtrTy, IntPtr);
    inferLibFuncAttributes(*M->getFunction("memset_pattern16"), *TLI);

    // Otherwise we should form a memset_pattern16. PatternValue is known to be
    // a constant array of 16 bytes. Plop the value into a mergeable global.
    GlobalVariable *GV = new GlobalVariable(*M, PatternValue->getType(), true,
                                            GlobalValue::PrivateLinkage,
                                            PatternValue, ".memset_pattern");
    GV->setUnnamedAddr(GlobalValue::UnnamedAddr::Global); // Ok to merge these.
    GV->setAlignment(16);
    Value *PatternPtr = ConstantExpr::getBitCast(GV, Int8PtrTy);
    NewCall = Builder.CreateCall(MSP, {BasePtr, PatternPtr, NumBytes});
  }

  DEBUG(dbgs() << " Formed memset: " << *NewCall << "\n"
               << " from store to: " << *Ev << " at: " << *TheStore << "\n");
  NewCall->setDebugLoc(TheStore->getDebugLoc());

  // Okay, the memset has been formed. Zap the original store and anything that
  // feeds into it.
  for (auto *I : Stores)
    deleteDeadInstruction(I);
  ++NumMemSet;
  return true;
}

/// If the stored value is a strided load in the same loop with the same stride
|
|
|
|
/// this may be transformable into a memcpy. This kicks in for stuff like
|
|
|
|
/// for (i) A[i] = B[i];
|
2016-01-05 05:43:14 +08:00
|
|
|
bool LoopIdiomRecognize::processLoopStoreOfLoopLoad(StoreInst *SI,
|
|
|
|
const SCEV *BECount) {
|
|
|
|
assert(SI->isSimple() && "Expected only non-volatile stores.");
|
|
|
|
|
|
|
|
Value *StorePtr = SI->getPointerOperand();
|
|
|
|
const SCEVAddRecExpr *StoreEv = cast<SCEVAddRecExpr>(SE->getSCEV(StorePtr));
|
2016-02-13 03:05:27 +08:00
|
|
|
APInt Stride = getStoreStride(StoreEv);
|
2016-01-05 05:43:14 +08:00
|
|
|
unsigned StoreSize = getStoreSizeInBytes(SI, DL);
|
|
|
|
bool NegStride = StoreSize == -Stride;
|
2011-03-15 00:48:10 +08:00
|
|
|
|
2015-11-20 02:22:21 +08:00
|
|
|
// The store must be fed by a non-volatile load.
|
2016-01-05 05:43:14 +08:00
|
|
|
LoadInst *LI = cast<LoadInst>(SI->getValueOperand());
|
|
|
|
assert(LI->isSimple() && "Expected only non-volatile loads.");
|
2015-11-20 02:22:21 +08:00
|
|
|
|
|
|
|
// See if the pointer expression is an AddRec like {base,+,1} on the current
|
|
|
|
// loop, which indicates a strided load. If we have something else, it's a
|
|
|
|
// random load we can't handle.
|
2015-11-20 02:25:11 +08:00
|
|
|
const SCEVAddRecExpr *LoadEv =
|
2016-01-05 05:43:14 +08:00
|
|
|
cast<SCEVAddRecExpr>(SE->getSCEV(LI->getPointerOperand()));
|
2010-12-27 04:45:45 +08:00
|
|
|
|
2015-08-13 08:44:29 +08:00
|
|
|
// The trip count of the loop and the base pointer of the addrec SCEV are
|
|
|
|
// guaranteed to be loop invariant, which means that they should dominate the
|
|
|
|
// header. This allows us to insert code for it in the preheader.
|
|
|
|
BasicBlock *Preheader = CurLoop->getLoopPreheader();
|
|
|
|
IRBuilder<> Builder(Preheader->getTerminator());
|
2015-11-07 00:33:57 +08:00
|
|
|
SCEVExpander Expander(*SE, *DL, "loop-idiom");
|
2011-03-15 00:48:10 +08:00
|
|
|
|
2015-11-14 05:51:02 +08:00
|
|
|
const SCEV *StrStart = StoreEv->getStart();
|
|
|
|
unsigned StrAS = SI->getPointerAddressSpace();
|
|
|
|
Type *IntPtrTy = Builder.getIntPtrTy(*DL, StrAS);
|
|
|
|
|
|
|
|
// Handle negative strided loops.
|
|
|
|
if (NegStride)
|
|
|
|
StrStart = getStartForNegStride(StrStart, BECount, IntPtrTy, StoreSize, SE);
|
|
|
|
|
2015-08-13 08:44:29 +08:00
|
|
|
// Okay, we have a strided store "p[i]" of a loaded value. We can turn
|
|
|
|
// this into a memcpy in the loop preheader now if we want. However, this
|
|
|
|
// would be unsafe to do if there is anything else in the loop that may read
|
|
|
|
// or write the memory region we're storing to. This includes the load that
|
|
|
|
// feeds the stores. Check for an alias by generating the base address and
|
|
|
|
// checking everything.
|
|
|
|
Value *StoreBasePtr = Expander.expandCodeFor(
|
2015-11-14 05:51:02 +08:00
|
|
|
StrStart, Builder.getInt8PtrTy(StrAS), Preheader->getTerminator());
|
2011-03-15 00:48:10 +08:00
|
|
|
|
2016-01-26 10:27:47 +08:00
|
|
|
SmallPtrSet<Instruction *, 1> Stores;
|
|
|
|
Stores.insert(SI);
|
2015-08-13 08:44:29 +08:00
|
|
|
if (mayLoopAccessLocation(StoreBasePtr, MRI_ModRef, CurLoop, BECount,
|
2016-01-26 10:27:47 +08:00
|
|
|
StoreSize, *AA, Stores)) {
|
2015-08-13 08:44:29 +08:00
|
|
|
Expander.clear();
|
|
|
|
// If we generated new code for the base pointer, clean up.
|
|
|
|
RecursivelyDeleteTriviallyDeadInstructions(StoreBasePtr, TLI);
|
2010-12-27 04:45:45 +08:00
|
|
|
return false;
|
2011-02-21 10:08:54 +08:00
|
|
|
}
|
2011-02-20 03:31:39 +08:00
|
|
|
|
2015-11-14 05:51:02 +08:00
|
|
|
const SCEV *LdStart = LoadEv->getStart();
|
|
|
|
unsigned LdAS = LI->getPointerAddressSpace();
|
|
|
|
|
|
|
|
// Handle negative strided loops.
|
|
|
|
if (NegStride)
|
|
|
|
LdStart = getStartForNegStride(LdStart, BECount, IntPtrTy, StoreSize, SE);
|
|
|
|
|
2015-08-13 08:44:29 +08:00
|
|
|
// For a memcpy, we have to make sure that the input array is not being
|
|
|
|
// mutated by the loop.
|
|
|
|
Value *LoadBasePtr = Expander.expandCodeFor(
|
2015-11-14 05:51:02 +08:00
|
|
|
LdStart, Builder.getInt8PtrTy(LdAS), Preheader->getTerminator());
|
2010-12-27 07:42:51 +08:00
|
|
|
|
2015-08-13 08:44:29 +08:00
|
|
|
if (mayLoopAccessLocation(LoadBasePtr, MRI_Mod, CurLoop, BECount, StoreSize,
|
2016-01-26 10:27:47 +08:00
|
|
|
*AA, Stores)) {
|
2015-08-13 08:44:29 +08:00
|
|
|
Expander.clear();
|
|
|
|
// If we generated new code for the base pointer, clean up.
|
|
|
|
RecursivelyDeleteTriviallyDeadInstructions(LoadBasePtr, TLI);
|
|
|
|
RecursivelyDeleteTriviallyDeadInstructions(StoreBasePtr, TLI);
|
|
|
|
return false;
|
2011-01-02 11:37:56 +08:00
|
|
|
}
|
2010-12-27 04:45:45 +08:00
|
|
|
|
2016-08-12 02:28:33 +08:00
|
|
|
if (avoidLIRForMultiBlockLoop())
|
|
|
|
return false;
|
|
|
|
|
2015-08-13 08:44:29 +08:00
|
|
|
// Okay, everything is safe, we can transform this!
|
2010-12-27 03:39:38 +08:00
|
|
|
|
2015-08-13 08:44:29 +08:00
|
|
|
// The # stored bytes is (BECount+1)*Size. Expand the trip count out to
|
|
|
|
// pointer size if it isn't already.
|
|
|
|
BECount = SE->getTruncateOrZeroExtend(BECount, IntPtrTy);
|
2011-01-04 15:46:33 +08:00
|
|
|
|
2015-08-13 08:44:29 +08:00
|
|
|
const SCEV *NumBytesS =
|
2015-09-23 09:59:04 +08:00
|
|
|
SE->getAddExpr(BECount, SE->getOne(IntPtrTy), SCEV::FlagNUW);
|
2015-08-13 08:44:29 +08:00
|
|
|
if (StoreSize != 1)
|
|
|
|
NumBytesS = SE->getMulExpr(NumBytesS, SE->getConstant(IntPtrTy, StoreSize),
|
|
|
|
SCEV::FlagNUW);
|
2011-03-15 00:48:10 +08:00
|
|
|
|
2015-08-13 08:44:29 +08:00
|
|
|
Value *NumBytes =
|
|
|
|
Expander.expandCodeFor(NumBytesS, IntPtrTy, Preheader->getTerminator());
|
2011-03-15 00:48:10 +08:00
|
|
|
|
2015-08-13 08:44:29 +08:00
|
|
|
CallInst *NewCall =
|
|
|
|
Builder.CreateMemCpy(StoreBasePtr, LoadBasePtr, NumBytes,
|
2015-11-19 13:56:52 +08:00
|
|
|
std::min(SI->getAlignment(), LI->getAlignment()));
|
2015-08-13 08:44:29 +08:00
|
|
|
NewCall->setDebugLoc(SI->getDebugLoc());
|
2011-01-04 15:46:33 +08:00
|
|
|
|
2015-08-13 08:44:29 +08:00
|
|
|
DEBUG(dbgs() << " Formed memcpy: " << *NewCall << "\n"
|
|
|
|
<< " from load ptr=" << *LoadEv << " at: " << *LI << "\n"
|
|
|
|
<< " from store ptr=" << *StoreEv << " at: " << *SI << "\n");
|
2011-03-15 00:48:10 +08:00
|
|
|
|
2015-10-14 04:59:16 +08:00
|
|
|
// Okay, the memcpy has been formed. Zap the original store and anything that
|
2015-08-13 08:44:29 +08:00
|
|
|
// feeds into it.
|
2016-06-21 00:07:38 +08:00
|
|
|
deleteDeadInstruction(SI);
|
2015-08-13 08:44:29 +08:00
|
|
|
++NumMemCpy;
|
|
|
|
return true;
|
|
|
|
}
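// Illustration only (not part of the pass): a minimal source-level sketch of
// the rewrite performed by processLoopStoreOfLoopLoad, assuming a forward
// unit-stride loop over non-overlapping int arrays. The helper names
// copyLoop, copyAsMemcpy and copiesAgree are hypothetical. The byte count
// follows the comment above: (BECount + 1) * StoreSize.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical helper: the element-by-element loop the pass looks for,
// i.e. a strided store of a strided load: dst[i] = src[i].
void copyLoop(int *dst, const int *src, std::size_t n) {
  for (std::size_t i = 0; i != n; ++i)
    dst[i] = src[i];
}

// Hypothetical helper: what the loop collapses to. For n > 0 the loop's
// backedge is taken n - 1 times, so the number of bytes copied is
// (BECount + 1) * StoreSize. The n == 0 case stands in for the loop guard.
void copyAsMemcpy(int *dst, const int *src, std::size_t n) {
  if (n == 0)
    return;
  std::size_t beCount = n - 1;
  std::size_t numBytes = (beCount + 1) * sizeof(int);
  std::memcpy(dst, src, numBytes);
}

// Runs both forms on separate destination buffers and compares the results.
bool copiesAgree(std::size_t n) {
  std::vector<int> src(n), a(n, -1), b(n, -1);
  for (std::size_t i = 0; i != n; ++i)
    src[i] = static_cast<int>(i * 7 + 3);
  copyLoop(a.data(), src.data(), n);
  copyAsMemcpy(b.data(), src.data(), n);
  return a == b;
}
```

// The two forms only agree when nothing else in the loop reads or writes the
// destination and nothing writes the source, which is what the two
// mayLoopAccessLocation checks above guard against.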
|
2011-03-15 00:48:10 +08:00
|
|
|
|
2016-08-12 02:28:33 +08:00
|
|
|
// When compiling for codesize we avoid idiom recognition for a multi-block loop
|
|
|
|
// unless it is a loop_memset idiom or a memset/memcpy idiom in a nested loop.
|
|
|
|
//
|
|
|
|
bool LoopIdiomRecognize::avoidLIRForMultiBlockLoop(bool IsMemset,
|
|
|
|
bool IsLoopMemset) {
|
|
|
|
if (ApplyCodeSizeHeuristics && CurLoop->getNumBlocks() > 1) {
|
|
|
|
if (!CurLoop->getParentLoop() && (!IsMemset || !IsLoopMemset)) {
|
|
|
|
DEBUG(dbgs() << " " << CurLoop->getHeader()->getParent()->getName()
|
|
|
|
<< " : LIR " << (IsMemset ? "Memset" : "Memcpy")
|
|
|
|
<< " avoided: multi-block top-level loop\n");
|
|
|
|
return true;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
return false;
|
|
|
|
}
|
|
|
|
|
2015-08-13 08:44:29 +08:00
|
|
|
bool LoopIdiomRecognize::runOnNoncountableLoop() {
|
2015-11-10 00:56:06 +08:00
|
|
|
return recognizePopcount();
|
2011-01-04 15:46:33 +08:00
|
|
|
}
|
|
|
|
|
2015-08-13 08:44:29 +08:00
|
|
|
/// Check if the given conditional branch is based on the comparison between
|
|
|
|
/// a variable and zero, and if the variable is non-zero, the control yields to
|
|
|
|
/// the loop entry. If the branch matches the behavior, the variable involved
|
2017-01-06 05:40:08 +08:00
|
|
|
/// in the comparison is returned. This function will be called to see if the
|
2015-08-13 08:44:29 +08:00
|
|
|
/// precondition and postcondition of the loop are in desirable form.
|
|
|
|
static Value *matchCondition(BranchInst *BI, BasicBlock *LoopEntry) {
|
|
|
|
if (!BI || !BI->isConditional())
|
|
|
|
return nullptr;
|
2012-11-02 16:33:25 +08:00
|
|
|
|
2015-08-13 08:44:29 +08:00
|
|
|
ICmpInst *Cond = dyn_cast<ICmpInst>(BI->getCondition());
|
|
|
|
if (!Cond)
|
|
|
|
return nullptr;
|
2012-11-02 16:33:25 +08:00
|
|
|
|
2015-08-13 08:44:29 +08:00
|
|
|
ConstantInt *CmpZero = dyn_cast<ConstantInt>(Cond->getOperand(1));
|
|
|
|
if (!CmpZero || !CmpZero->isZero())
|
|
|
|
return nullptr;
|
2012-11-02 16:33:25 +08:00
|
|
|
|
2015-08-13 08:44:29 +08:00
|
|
|
ICmpInst::Predicate Pred = Cond->getPredicate();
|
|
|
|
if ((Pred == ICmpInst::ICMP_NE && BI->getSuccessor(0) == LoopEntry) ||
|
|
|
|
(Pred == ICmpInst::ICMP_EQ && BI->getSuccessor(1) == LoopEntry))
|
|
|
|
return Cond->getOperand(0);
|
2012-11-02 16:33:25 +08:00
|
|
|
|
2015-08-13 08:44:29 +08:00
|
|
|
return nullptr;
|
2012-11-02 16:33:25 +08:00
|
|
|
}
|
|
|
|
|
2015-08-13 08:44:29 +08:00
|
|
|
/// Return true iff the idiom is detected in the loop.
|
2011-02-20 03:31:39 +08:00
|
|
|
///
|
2015-08-13 08:44:29 +08:00
|
|
|
/// Additionally:
|
|
|
|
/// 1) \p CntInst is set to the instruction counting the population bit.
|
|
|
|
/// 2) \p CntPhi is set to the corresponding phi node.
|
|
|
|
/// 3) \p Var is set to the value whose population bits are being counted.
|
|
|
|
///
|
|
|
|
/// The core idiom we are trying to detect is:
|
|
|
|
/// \code
|
|
|
|
/// if (x0 == 0)
|
|
|
|
/// goto loop-exit // the precondition of the loop
|
|
|
|
/// cnt0 = init-val;
|
|
|
|
/// do {
|
|
|
|
/// x1 = phi (x0, x2);
|
|
|
|
/// cnt1 = phi(cnt0, cnt2);
|
|
|
|
///
|
|
|
|
/// cnt2 = cnt1 + 1;
|
|
|
|
/// ...
|
|
|
|
/// x2 = x1 & (x1 - 1);
|
|
|
|
/// ...
|
|
|
|
/// } while (x2 != 0);
|
|
|
|
///
|
|
|
|
/// loop-exit:
|
|
|
|
/// \endcode
|
|
|
|
static bool detectPopcountIdiom(Loop *CurLoop, BasicBlock *PreCondBB,
|
|
|
|
Instruction *&CntInst, PHINode *&CntPhi,
|
|
|
|
Value *&Var) {
|
|
|
|
// step 1: Check to see if the loop-back branch matches this pattern:
|
|
|
|
// "if (a!=0) goto loop-entry".
|
|
|
|
BasicBlock *LoopEntry;
|
|
|
|
Instruction *DefX2, *CountInst;
|
|
|
|
Value *VarX1, *VarX0;
|
|
|
|
PHINode *PhiX, *CountPhi;
|
2011-03-15 00:48:10 +08:00
|
|
|
|
2015-08-13 08:44:29 +08:00
|
|
|
DefX2 = CountInst = nullptr;
|
|
|
|
VarX1 = VarX0 = nullptr;
|
|
|
|
PhiX = CountPhi = nullptr;
|
|
|
|
LoopEntry = *(CurLoop->block_begin());
|
2011-03-15 00:48:10 +08:00
|
|
|
|
2015-08-13 08:44:29 +08:00
|
|
|
// step 1: Check if the loop-back branch is in desirable form.
|
|
|
|
{
|
|
|
|
if (Value *T = matchCondition(
|
|
|
|
dyn_cast<BranchInst>(LoopEntry->getTerminator()), LoopEntry))
|
|
|
|
DefX2 = dyn_cast<Instruction>(T);
|
|
|
|
else
|
|
|
|
return false;
|
|
|
|
}
|
2011-02-20 03:31:39 +08:00
|
|
|
|
2015-08-13 08:44:29 +08:00
|
|
|
// step 2: detect instructions corresponding to "x2 = x1 & (x1 - 1)"
|
|
|
|
{
|
|
|
|
if (!DefX2 || DefX2->getOpcode() != Instruction::And)
|
|
|
|
return false;
|
2011-02-19 06:22:15 +08:00
|
|
|
|
2015-08-13 08:44:29 +08:00
|
|
|
BinaryOperator *SubOneOp;
|
2011-03-15 00:48:10 +08:00
|
|
|
|
2015-08-13 08:44:29 +08:00
|
|
|
if ((SubOneOp = dyn_cast<BinaryOperator>(DefX2->getOperand(0))))
|
|
|
|
VarX1 = DefX2->getOperand(1);
|
|
|
|
else {
|
|
|
|
VarX1 = DefX2->getOperand(0);
|
|
|
|
SubOneOp = dyn_cast<BinaryOperator>(DefX2->getOperand(1));
|
|
|
|
}
|
|
|
|
if (!SubOneOp)
|
|
|
|
return false;
|
2011-02-20 03:56:44 +08:00
|
|
|
|
2015-08-13 08:44:29 +08:00
|
|
|
Instruction *SubInst = cast<Instruction>(SubOneOp);
|
|
|
|
ConstantInt *Dec = dyn_cast<ConstantInt>(SubInst->getOperand(1));
|
|
|
|
if (!Dec ||
|
|
|
|
!((SubInst->getOpcode() == Instruction::Sub && Dec->isOne()) ||
|
|
|
|
(SubInst->getOpcode() == Instruction::Add &&
|
|
|
|
Dec->isAllOnesValue()))) {
|
|
|
|
return false;
|
|
|
|
}
|
|
|
|
}
|
2011-02-20 03:31:39 +08:00
|
|
|
|
2015-08-13 08:44:29 +08:00
|
|
|
// step 3: Check the recurrence of variable X
|
|
|
|
{
|
|
|
|
PhiX = dyn_cast<PHINode>(VarX1);
|
|
|
|
if (!PhiX ||
|
|
|
|
(PhiX->getOperand(0) != DefX2 && PhiX->getOperand(1) != DefX2)) {
|
|
|
|
return false;
|
|
|
|
}
|
|
|
|
}
|
2011-03-15 00:48:10 +08:00
|
|
|
|
2015-08-13 08:44:29 +08:00
|
|
|
// step 4: Find the instruction which counts the population: cnt2 = cnt1 + 1
|
|
|
|
{
|
|
|
|
CountInst = nullptr;
|
2015-10-14 03:26:58 +08:00
|
|
|
for (BasicBlock::iterator Iter = LoopEntry->getFirstNonPHI()->getIterator(),
|
2015-08-13 08:44:29 +08:00
|
|
|
IterE = LoopEntry->end();
|
|
|
|
Iter != IterE; Iter++) {
|
2015-10-14 03:26:58 +08:00
|
|
|
Instruction *Inst = &*Iter;
|
2015-08-13 08:44:29 +08:00
|
|
|
if (Inst->getOpcode() != Instruction::Add)
|
|
|
|
continue;
|
2013-09-11 13:09:42 +08:00
|
|
|
|
2015-08-13 08:44:29 +08:00
|
|
|
ConstantInt *Inc = dyn_cast<ConstantInt>(Inst->getOperand(1));
|
|
|
|
if (!Inc || !Inc->isOne())
|
|
|
|
continue;
|
|
|
|
|
|
|
|
PHINode *Phi = dyn_cast<PHINode>(Inst->getOperand(0));
|
|
|
|
if (!Phi || Phi->getParent() != LoopEntry)
|
|
|
|
continue;
|
|
|
|
|
|
|
|
// Check if the result of the instruction is live out of the loop.
|
|
|
|
bool LiveOutLoop = false;
|
|
|
|
for (User *U : Inst->users()) {
|
|
|
|
if ((cast<Instruction>(U))->getParent() != LoopEntry) {
|
|
|
|
LiveOutLoop = true;
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
if (LiveOutLoop) {
|
|
|
|
CountInst = Inst;
|
|
|
|
CountPhi = Phi;
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
if (!CountInst)
|
|
|
|
return false;
|
2011-02-20 03:31:39 +08:00
|
|
|
}
|
2011-03-15 00:48:10 +08:00
|
|
|
|
2015-08-13 08:44:29 +08:00
|
|
|
// step 5: check if the precondition is in this form:
|
|
|
|
// "if (x != 0) goto loop-head ; else goto somewhere-we-don't-care;"
|
|
|
|
{
|
|
|
|
auto *PreCondBr = dyn_cast<BranchInst>(PreCondBB->getTerminator());
|
|
|
|
Value *T = matchCondition(PreCondBr, CurLoop->getLoopPreheader());
|
|
|
|
if (T != PhiX->getOperand(0) && T != PhiX->getOperand(1))
|
|
|
|
return false;
|
2011-06-28 13:04:16 +08:00
|
|
|
|
2015-08-13 08:44:29 +08:00
|
|
|
CntInst = CountInst;
|
|
|
|
CntPhi = CountPhi;
|
|
|
|
Var = T;
|
|
|
|
}
|
|
|
|
|
|
|
|
return true;
|
|
|
|
}
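// Illustration only (not part of the pass): a minimal sketch of the
// source-level loop shape that detectPopcountIdiom matches, assuming a 32-bit
// value. The names countBitsLoop and countBitsReference are hypothetical.
// Each iteration clears the lowest set bit, so the loop runs once per set bit.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper mirroring the idiom: a zero test guarding the loop
// (the precondition), then a do-while with cnt2 = cnt1 + 1 and
// x2 = x1 & (x1 - 1), exiting when x2 becomes zero.
unsigned countBitsLoop(std::uint32_t x0, unsigned init) {
  unsigned cnt = init;
  if (x0 == 0)
    return cnt; // precondition: skip the loop entirely
  std::uint32_t x = x0;
  do {
    cnt = cnt + 1;   // cnt2 = cnt1 + 1
    x = x & (x - 1); // x2 = x1 & (x1 - 1), clears the lowest set bit
  } while (x != 0);
  return cnt;
}

// Hypothetical portable reference: test each of the 32 bits in turn.
unsigned countBitsReference(std::uint32_t x) {
  unsigned n = 0;
  for (int i = 0; i < 32; ++i)
    n += (x >> i) & 1u;
  return n;
}
```

// With init == 0 the loop result equals the population count of x0; a
// non-zero init is simply added on top, which is why the transformation
// later inserts an extra add for a non-zero initial counter.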
|
2013-09-11 13:09:42 +08:00
|
|
|
|
2015-08-13 08:44:29 +08:00
|
|
|
/// Recognizes a population count idiom in a non-countable loop.
|
|
|
|
///
|
|
|
|
/// If detected, transforms the relevant code to issue the popcount intrinsic
|
|
|
|
/// function call, and returns true; otherwise, returns false.
|
|
|
|
bool LoopIdiomRecognize::recognizePopcount() {
|
|
|
|
if (TTI->getPopcntSupport(32) != TargetTransformInfo::PSK_FastHardware)
|
2012-11-02 16:33:25 +08:00
|
|
|
return false;
|
|
|
|
|
2015-08-13 08:44:29 +08:00
|
|
|
// Counting population is usually done with a few arithmetic instructions.
|
2015-08-19 06:41:58 +08:00
|
|
|
// Such instructions can be easily "absorbed" by vacant slots in a
|
2015-08-13 08:44:29 +08:00
|
|
|
// non-compact loop. Therefore, recognizing popcount idiom only makes sense
|
|
|
|
// in a compact loop.
|
2011-05-23 01:39:56 +08:00
|
|
|
|
2015-08-13 19:25:38 +08:00
|
|
|
// Give up if the loop has multiple blocks or multiple backedges.
|
|
|
|
if (CurLoop->getNumBackEdges() != 1 || CurLoop->getNumBlocks() != 1)
|
2015-08-13 08:44:29 +08:00
|
|
|
return false;
|
2011-03-15 00:48:10 +08:00
|
|
|
|
2015-08-13 19:25:38 +08:00
|
|
|
BasicBlock *LoopBody = *(CurLoop->block_begin());
|
|
|
|
if (LoopBody->size() >= 20) {
|
|
|
|
// The loop is too big, bail out.
|
2015-08-13 08:44:29 +08:00
|
|
|
return false;
|
2015-08-13 19:25:38 +08:00
|
|
|
}
|
2011-03-15 00:48:10 +08:00
|
|
|
|
2015-08-13 08:44:29 +08:00
|
|
|
// It should have a preheader containing nothing but an unconditional branch.
|
2015-08-13 19:25:38 +08:00
|
|
|
BasicBlock *PH = CurLoop->getLoopPreheader();
|
2016-10-08 02:39:43 +08:00
|
|
|
if (!PH || &PH->front() != PH->getTerminator())
|
2015-08-13 08:44:29 +08:00
|
|
|
return false;
|
2015-08-13 19:25:38 +08:00
|
|
|
auto *EntryBI = dyn_cast<BranchInst>(PH->getTerminator());
|
2015-08-13 08:44:29 +08:00
|
|
|
if (!EntryBI || EntryBI->isConditional())
|
|
|
|
return false;
|
2011-03-15 00:48:10 +08:00
|
|
|
|
2015-08-13 08:44:29 +08:00
|
|
|
// It should have a precondition block where the generated popcount intrinsic
|
|
|
|
// function can be inserted.
|
2015-08-13 19:25:38 +08:00
|
|
|
auto *PreCondBB = PH->getSinglePredecessor();
|
2015-08-13 08:44:29 +08:00
|
|
|
if (!PreCondBB)
|
|
|
|
return false;
|
|
|
|
auto *PreCondBI = dyn_cast<BranchInst>(PreCondBB->getTerminator());
|
|
|
|
if (!PreCondBI || PreCondBI->isUnconditional())
|
|
|
|
return false;
|
2013-09-11 13:09:42 +08:00
|
|
|
|
2015-08-13 08:44:29 +08:00
|
|
|
Instruction *CntInst;
|
|
|
|
PHINode *CntPhi;
|
|
|
|
Value *Val;
|
|
|
|
if (!detectPopcountIdiom(CurLoop, PreCondBB, CntInst, CntPhi, Val))
|
|
|
|
return false;
|
2011-03-15 00:48:10 +08:00
|
|
|
|
2015-08-13 08:44:29 +08:00
|
|
|
transformLoopToPopcount(PreCondBB, CntInst, CntPhi, Val);
|
|
|
|
return true;
|
|
|
|
}
|
2011-03-15 00:48:10 +08:00
|
|
|
|
2015-08-13 08:44:29 +08:00
|
|
|
static CallInst *createPopcntIntrinsic(IRBuilder<> &IRBuilder, Value *Val,
|
2016-06-12 23:39:02 +08:00
|
|
|
const DebugLoc &DL) {
|
2015-08-13 08:44:29 +08:00
|
|
|
Value *Ops[] = {Val};
|
|
|
|
Type *Tys[] = {Val->getType()};
|
2011-03-15 00:48:10 +08:00
|
|
|
|
2015-08-13 08:44:29 +08:00
|
|
|
Module *M = IRBuilder.GetInsertBlock()->getParent()->getParent();
|
|
|
|
Value *Func = Intrinsic::getDeclaration(M, Intrinsic::ctpop, Tys);
|
|
|
|
CallInst *CI = IRBuilder.CreateCall(Func, Ops);
|
|
|
|
CI->setDebugLoc(DL);
|
|
|
|
|
|
|
|
return CI;
|
2010-12-27 07:42:51 +08:00
|
|
|
}
|
|
|
|
|
2015-08-13 08:44:29 +08:00
|
|
|
void LoopIdiomRecognize::transformLoopToPopcount(BasicBlock *PreCondBB,
|
|
|
|
Instruction *CntInst,
|
|
|
|
PHINode *CntPhi, Value *Var) {
|
|
|
|
BasicBlock *PreHead = CurLoop->getLoopPreheader();
|
|
|
|
auto *PreCondBr = dyn_cast<BranchInst>(PreCondBB->getTerminator());
|
|
|
|
const DebugLoc DL = CntInst->getDebugLoc();
|
2011-03-15 00:48:10 +08:00
|
|
|
|
2015-08-13 08:44:29 +08:00
|
|
|
// Assume that, before the transformation, the loop is as follows:
|
|
|
|
// if (x) // the precondition
|
|
|
|
// do { cnt++; x &= x - 1; } while(x);
|
2011-03-15 00:48:10 +08:00
|
|
|
|
2015-08-13 08:44:29 +08:00
|
|
|
// Step 1: Insert the ctpop instruction at the end of the precondition block
|
|
|
|
IRBuilder<> Builder(PreCondBr);
|
|
|
|
Value *PopCnt, *PopCntZext, *NewCount, *TripCnt;
|
|
|
|
{
|
|
|
|
PopCnt = createPopcntIntrinsic(Builder, Var, DL);
|
|
|
|
NewCount = PopCntZext =
|
|
|
|
Builder.CreateZExtOrTrunc(PopCnt, cast<IntegerType>(CntPhi->getType()));
|
2011-06-28 13:04:16 +08:00
|
|
|
|
2015-08-13 08:44:29 +08:00
|
|
|
if (NewCount != PopCnt)
|
|
|
|
(cast<Instruction>(NewCount))->setDebugLoc(DL);
|
2012-11-02 16:33:25 +08:00
|
|
|
|
2015-08-13 08:44:29 +08:00
|
|
|
// TripCnt is exactly the number of iterations the loop has
|
|
|
|
TripCnt = NewCount;
|
|
|
|
|
|
|
|
// If the population counter's initial value is not zero, insert Add Inst.
|
|
|
|
Value *CntInitVal = CntPhi->getIncomingValueForBlock(PreHead);
|
|
|
|
ConstantInt *InitConst = dyn_cast<ConstantInt>(CntInitVal);
|
|
|
|
if (!InitConst || !InitConst->isZero()) {
|
|
|
|
NewCount = Builder.CreateAdd(NewCount, CntInitVal);
|
|
|
|
(cast<Instruction>(NewCount))->setDebugLoc(DL);
|
|
|
|
}
|
2012-11-02 16:33:25 +08:00
|
|
|
}
|
|
|
|
|
2015-08-19 14:22:33 +08:00
|
|
|
// Step 2: Replace the precondition from "if (x == 0) goto loop-exit" to
|
2015-08-19 14:25:30 +08:00
|
|
|
// "if (NewCount == 0) goto loop-exit". Without this change, the intrinsic
|
2015-08-13 08:44:29 +08:00
|
|
|
// function would be partially dead code, and downstream passes will drag
|
|
|
|
// it back from the precondition block to the preheader.
|
|
|
|
{
|
|
|
|
ICmpInst *PreCond = cast<ICmpInst>(PreCondBr->getCondition());
|
2011-05-23 01:39:56 +08:00
|
|
|
|
2015-08-13 08:44:29 +08:00
|
|
|
Value *Opnd0 = PopCntZext;
|
|
|
|
Value *Opnd1 = ConstantInt::get(PopCntZext->getType(), 0);
|
|
|
|
if (PreCond->getOperand(0) != Var)
|
|
|
|
std::swap(Opnd0, Opnd1);
|
2012-11-02 16:33:25 +08:00
|
|
|
|
2015-08-13 08:44:29 +08:00
|
|
|
ICmpInst *NewPreCond = cast<ICmpInst>(
|
|
|
|
Builder.CreateICmp(PreCond->getPredicate(), Opnd0, Opnd1));
|
|
|
|
PreCondBr->setCondition(NewPreCond);
|
2011-06-28 13:04:16 +08:00
|
|
|
|
2015-08-13 08:44:29 +08:00
|
|
|
RecursivelyDeleteTriviallyDeadInstructions(PreCond, TLI);
|
|
|
|
}
|
2011-03-15 00:48:10 +08:00
|
|
|
|
2015-08-13 08:44:29 +08:00
|
|
|
// Step 3: Note that the population count is exactly the trip count of the
|
2015-08-19 14:25:30 +08:00
|
|
|
// loop in question, which enables us to convert the loop from a noncountable
|
2015-08-13 08:44:29 +08:00
|
|
|
// loop into a countable one. The benefit is twofold:
|
|
|
|
//
|
2015-08-19 14:22:33 +08:00
|
|
|
// - If the loop only counts population, the entire loop becomes dead after
|
|
|
|
// the transformation. It is a lot easier to prove a countable loop dead
|
|
|
|
// than to prove a noncountable one. (In some C dialects, an infinite loop
|
2015-08-13 08:44:29 +08:00
|
|
|
// isn't dead even if it computes nothing useful. In general, DCE needs
|
|
|
|
// to prove a noncountable loop finite before safely deleting it.)
|
|
|
|
//
|
|
|
|
// - If the loop also performs something else, it remains alive.
|
|
|
|
// Since it is transformed to countable form, it can be aggressively
|
|
|
|
// optimized by some optimizations which are in general not applicable
|
|
|
|
// to a noncountable loop.
|
|
|
|
//
|
|
|
|
// After this step, this loop (conceptually) would look like the following:
|
|
|
|
// newcnt = __builtin_ctpop(x);
|
|
|
|
// t = newcnt;
|
|
|
|
// if (x)
|
|
|
|
// do { cnt++; x &= x-1; t--; } while (t > 0);
|
|
|
|
BasicBlock *Body = *(CurLoop->block_begin());
|
|
|
|
{
|
|
|
|
auto *LbBr = dyn_cast<BranchInst>(Body->getTerminator());
|
|
|
|
ICmpInst *LbCond = cast<ICmpInst>(LbBr->getCondition());
|
|
|
|
Type *Ty = TripCnt->getType();
|
2011-03-15 00:48:10 +08:00
|
|
|
|
2015-10-14 03:26:58 +08:00
|
|
|
PHINode *TcPhi = PHINode::Create(Ty, 2, "tcphi", &Body->front());
|
2011-03-15 00:48:10 +08:00
|
|
|
|
2015-08-13 08:44:29 +08:00
|
|
|
Builder.SetInsertPoint(LbCond);
|
|
|
|
Instruction *TcDec = cast<Instruction>(
|
2015-08-19 14:25:30 +08:00
|
|
|
Builder.CreateSub(TcPhi, ConstantInt::get(Ty, 1),
|
|
|
|
"tcdec", false, true));
|
2011-03-15 00:48:10 +08:00
|
|
|
|
2015-08-13 08:44:29 +08:00
|
|
|
TcPhi->addIncoming(TripCnt, PreHead);
|
|
|
|
TcPhi->addIncoming(TcDec, Body);
|
2011-06-28 13:04:16 +08:00
|
|
|
|
2015-08-13 08:44:29 +08:00
|
|
|
CmpInst::Predicate Pred =
|
|
|
|
(LbBr->getSuccessor(0) == Body) ? CmpInst::ICMP_UGT : CmpInst::ICMP_SLE;
|
|
|
|
LbCond->setPredicate(Pred);
|
|
|
|
LbCond->setOperand(0, TcDec);
|
2015-08-19 14:22:33 +08:00
|
|
|
LbCond->setOperand(1, ConstantInt::get(Ty, 0));
|
2015-08-13 08:44:29 +08:00
|
|
|
}
|
2015-08-13 08:10:03 +08:00
|
|
|
|
2015-08-13 08:44:29 +08:00
|
|
|
// Step 4: All the references to the original population counter outside
|
|
|
|
// the loop are replaced with the NewCount -- the value returned from
|
|
|
|
// __builtin_ctpop().
|
|
|
|
CntInst->replaceUsesOutsideBlock(NewCount, Body);
|
2015-08-13 08:10:03 +08:00
|
|
|
|
2015-08-13 08:44:29 +08:00
|
|
|
// Step 5: Forget the "non-computable" trip-count SCEV associated with the
|
|
|
|
// loop. The loop would otherwise not be deleted even if it becomes empty.
|
|
|
|
SE->forgetLoop(CurLoop);
|
2015-08-13 08:10:03 +08:00
|
|
|
}
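// Illustration only (not part of the pass): a minimal sketch of the
// conceptual loop shape after transformLoopToPopcount, assuming a 32-bit
// value. popcount32 is a hypothetical portable stand-in for the ctpop
// intrinsic; PopcountResult and transformedLoop are likewise hypothetical
// names. The loop is now driven by a decrementing trip counter, and users
// outside the loop read NewCount instead of the in-loop counter.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the ctpop intrinsic.
unsigned popcount32(std::uint32_t x) {
  unsigned n = 0;
  for (int i = 0; i < 32; ++i)
    n += (x >> i) & 1u;
  return n;
}

struct PopcountResult {
  unsigned newCount;   // value seen by users outside the loop
  unsigned iterations; // how often the (now countable) loop body ran
};

// Hypothetical helper modelling the post-transformation control flow.
PopcountResult transformedLoop(std::uint32_t x, unsigned init) {
  unsigned pop = popcount32(x);   // Step 1: computed in the precondition block
  unsigned newCount = pop + init; // add the counter's initial value
  unsigned iterations = 0;
  if (pop != 0) {                 // Step 2: precondition tests the popcount
    unsigned t = pop;             // Step 3: trip counter starts at popcount
    do {
      ++iterations;
      x &= x - 1;
      --t;
    } while (t > 0);
  }
  return {newCount, iterations};  // Step 4: outside users read newCount
}
```

// If nothing else in the body has side effects, the remaining loop only
// decrements t and can be deleted by later passes once its trip count is
// known, which is the point of forgetting the old SCEV in Step 5.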
|