//===- llvm/Analysis/TargetTransformInfo.cpp ------------------------------===//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//

#include "llvm/Analysis/TargetTransformInfo.h"
Sink all InitializePasses.h includes
This file lists every pass in LLVM, and is included by Pass.h, which is
very popular. Every time we add, remove, or rename a pass in LLVM, it
causes lots of recompilation.
I found this fact by looking at this table, which is sorted by the
number of times a file was changed over the last 100,000 git commits
multiplied by the number of object files that depend on it in the
current checkout:
recompiles  touches  affected_files  header
    342380       95            3604  llvm/include/llvm/ADT/STLExtras.h
    314730      234            1345  llvm/include/llvm/InitializePasses.h
    307036      118            2602  llvm/include/llvm/ADT/APInt.h
    213049       59            3611  llvm/include/llvm/Support/MathExtras.h
    170422       47            3626  llvm/include/llvm/Support/Compiler.h
    162225       45            3605  llvm/include/llvm/ADT/Optional.h
    158319       63            2513  llvm/include/llvm/ADT/Triple.h
    140322       39            3598  llvm/include/llvm/ADT/StringRef.h
    137647       59            2333  llvm/include/llvm/Support/Error.h
    131619       73            1803  llvm/include/llvm/Support/FileSystem.h
Before this change, touching InitializePasses.h would cause 1345 files
to recompile. After this change, touching it only causes 550 compiles in
an incremental rebuild.
Reviewers: bkramer, asbirlea, bollu, jdoerfert
Differential Revision: https://reviews.llvm.org/D70211
2019-11-14 05:15:01 +08:00
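The "recompiles" column in the table above is the product of the other two numeric columns. A minimal sketch of that arithmetic follows; the function name recompileCost is hypothetical, and the last check is only a projection that assumes the touch count stays at 234 while the dependent count drops to the 550 quoted in the message.

```cpp
#include <cassert>
#include <cstdint>

// Estimated recompile cost of a header as the commit message defines it:
// number of times the header was touched, multiplied by the number of
// object files that depend on it. (Hypothetical helper, not LLVM code.)
constexpr std::uint64_t recompileCost(std::uint64_t Touches,
                                      std::uint64_t AffectedFiles) {
  return Touches * AffectedFiles;
}

// Rows taken from the table in the commit message.
static_assert(recompileCost(95, 3604) == 342380, "STLExtras.h row");
static_assert(recompileCost(234, 1345) == 314730, "InitializePasses.h row");

// Assumption: same 234 touches, but only 550 dependents after the change.
static_assert(recompileCost(234, 550) == 128700, "projected cost");
```

Under that assumption the header would drop from second place in the table to below the last listed row.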
#include "llvm/Analysis/CFG.h"
#include "llvm/Analysis/LoopIterator.h"
[PM] Change the core design of the TTI analysis to use a polymorphic
type erased interface and a single analysis pass rather than an
extremely complex analysis group.
The end result is that the TTI analysis can contain a type erased
implementation that supports the polymorphic TTI interface. We can build
one from a target-specific implementation or from a dummy one in the IR.
I've also factored all of the code into "mix-in"-able base classes,
including CRTP base classes to facilitate calling back up to the most
specialized form when delegating horizontally across the surface. These
aren't as clean as I would like and I'm planning to work on cleaning
some of this up, but I wanted to start by putting it into the right form.
There are a number of reasons for this change, and this particular
design. The first and foremost reason is that an analysis group is
complete overkill, and the chaining delegation strategy was so opaque,
confusing, and high overhead that TTI was suffering greatly for it.
Several of the TTI functions had failed to be implemented in all places
because the chaining-based delegation meant nothing checked for this.
A few other functions were implemented with incorrect delegation.
The message to me was very clear working on this -- the delegation and
analysis group structure was too confusing to be useful here.
The other reason of course is that this is a *much* more natural fit for
the new pass manager. This will lay the ground work for a type-erased
per-function info object that can look up the correct subtarget and even
cache it.
Yet another benefit is that this will significantly simplify the
interaction of the pass managers and the TargetMachine. See the future
work below.
The downside of this change is that it is very, very verbose. I'm going
to work to improve that, but it is somewhat an implementation necessity
in C++ to do type erasure. =/ I discussed this design really extensively
with Eric and Hal prior to going down this path, and afterward showed
them the result. No one was really thrilled with it, but there doesn't
seem to be a substantially better alternative. Using a base class and
virtual method dispatch would make the code much shorter, but as
discussed in the update to the programmer's manual and elsewhere,
a polymorphic interface feels like the more principled approach even if
this is perhaps the least compelling example of it. ;]
Ultimately, there is still a lot more to be done here, but this was the
huge chunk that I couldn't really split things out of because this was
the interface change to TTI. I've tried to minimize all the other parts
of this. The follow up work should include at least:
1) Improving the TargetMachine interface by having it directly return
a TTI object. Because we have a non-pass object with value semantics
and an internal type erasure mechanism, we can narrow the interface
of the TargetMachine to *just* do what we need: build and return
a TTI object that we can then insert into the pass pipeline.
2) Make the TTI object be fully specialized for a particular function.
This will include splitting off a minimal form of it which is
sufficient for the inliner and the old pass manager.
3) Add a new pass manager analysis which produces TTI objects from the
target machine for each function. This may actually be done as part
of #2 in order to use the new analysis to implement #2.
4) Work on narrowing the API between TTI and the targets so that it is
easier to understand and less verbose to type erase.
5) Work on narrowing the API between TTI and its clients so that it is
easier to understand and less verbose to forward.
6) Try to improve the CRTP-based delegation. I feel like this code is
just a bit messy and exacerbating the complexity of implementing
the TTI in each target.
Many thanks to Eric and Hal for their help here. I ended up blocked on
this somewhat more abruptly than I expected, and so I appreciate getting
it sorted out very quickly.
Differential Revision: http://reviews.llvm.org/D7293
llvm-svn: 227669
2015-01-31 11:43:40 +08:00
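The design described in the message above combines two techniques: a value-semantic wrapper that type-erases its implementation, and CRTP base classes that let defaults call back into the most specialized implementation. The following is a minimal sketch of that combination only. Every name in it (CostImplBase, NoTargetImpl, FancyTargetImpl, CostInfo, getBaseCost, getScaledCost) is hypothetical and chosen for illustration; it is not the actual TTI interface.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// CRTP base: supplies defaults and calls back into the most derived type.
template <typename Derived> struct CostImplBase {
  int getBaseCost() const { return 1; }
  // Delegates "horizontally" to whichever getBaseCost() Derived provides.
  int getScaledCost(int N) const {
    return N * static_cast<const Derived *>(this)->getBaseCost();
  }
};

// Default implementation: inherits every default from the CRTP base.
struct NoTargetImpl : CostImplBase<NoTargetImpl> {};

// Target implementation: overrides one hook by shadowing it, no virtual call.
struct FancyTargetImpl : CostImplBase<FancyTargetImpl> {
  int getBaseCost() const { return 4; }
};

// Value-semantic wrapper that type-erases any implementation.
class CostInfo {
  // The only virtual dispatch lives behind this private interface.
  struct Concept {
    virtual ~Concept() = default;
    virtual int getScaledCost(int N) const = 0;
  };
  // One Model instantiation per concrete implementation type.
  template <typename T> struct Model final : Concept {
    T Impl;
    explicit Model(T I) : Impl(std::move(I)) {}
    int getScaledCost(int N) const override { return Impl.getScaledCost(N); }
  };
  std::unique_ptr<Concept> Impl;

public:
  template <typename T>
  explicit CostInfo(T I) : Impl(new Model<T>(std::move(I))) {}
  CostInfo(CostInfo &&) = default;

  // Every public method has to be forwarded by hand, which is the
  // verbosity the commit message complains about.
  int getScaledCost(int N) const { return Impl->getScaledCost(N); }
};
```

With these definitions, CostInfo(NoTargetImpl{}).getScaledCost(3) uses the default base cost, while CostInfo(FancyTargetImpl{}).getScaledCost(3) picks up the override through the CRTP callback, and clients see a single non-template type in both cases.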
#include "llvm/Analysis/TargetTransformInfoImpl.h"
#include "llvm/IR/CFG.h"
#include "llvm/IR/DataLayout.h"
#include "llvm/IR/Dominators.h"
#include "llvm/IR/Instruction.h"
#include "llvm/IR/Instructions.h"
#include "llvm/IR/IntrinsicInst.h"
#include "llvm/IR/Module.h"
#include "llvm/IR/Operator.h"
#include "llvm/IR/PatternMatch.h"
#include "llvm/InitializePasses.h"
#include "llvm/Support/CommandLine.h"
#include "llvm/Support/ErrorHandling.h"
#include <utility>

using namespace llvm;
using namespace PatternMatch;

#define DEBUG_TYPE "tti"

static cl::opt<bool> EnableReduxCost("costmodel-reduxcost", cl::init(false),
                                     cl::Hidden,
                                     cl::desc("Recognize reduction patterns."));

namespace {
/// No-op implementation of the TTI interface using the utility base
/// classes.
///
/// This is used when no target specific information is available.
struct NoTTIImpl : TargetTransformInfoImplCRTPBase<NoTTIImpl> {
  explicit NoTTIImpl(const DataLayout &DL)
      : TargetTransformInfoImplCRTPBase<NoTTIImpl>(DL) {}
};
} // namespace

bool HardwareLoopInfo::canAnalyze(LoopInfo &LI) {
  // If the loop has irreducible control flow, it cannot be converted to a
  // hardware loop.
  LoopBlocksRPO RPOT(L);
  RPOT.perform(&LI);
  if (containsIrreducibleCFG<const BasicBlock *>(RPOT, LI))
    return false;
  return true;
}

IntrinsicCostAttributes::IntrinsicCostAttributes(const IntrinsicInst &I) :
    II(&I), RetTy(I.getType()), IID(I.getIntrinsicID()) {

  FunctionType *FTy = I.getCalledFunction()->getFunctionType();
  ParamTys.insert(ParamTys.begin(), FTy->param_begin(), FTy->param_end());
  Arguments.insert(Arguments.begin(), I.arg_begin(), I.arg_end());
  if (auto *FPMO = dyn_cast<FPMathOperator>(&I))
    FMF = FPMO->getFastMathFlags();
}

IntrinsicCostAttributes::IntrinsicCostAttributes(Intrinsic::ID Id,
                                                 const CallBase &CI) :
    II(dyn_cast<IntrinsicInst>(&CI)), RetTy(CI.getType()), IID(Id) {

  if (const auto *FPMO = dyn_cast<FPMathOperator>(&CI))
    FMF = FPMO->getFastMathFlags();

  Arguments.insert(Arguments.begin(), CI.arg_begin(), CI.arg_end());
  FunctionType *FTy =
      CI.getCalledFunction()->getFunctionType();
  ParamTys.insert(ParamTys.begin(), FTy->param_begin(), FTy->param_end());
}

IntrinsicCostAttributes::IntrinsicCostAttributes(Intrinsic::ID Id,
                                                 const CallBase &CI,
                                                 ElementCount Factor)
    : RetTy(CI.getType()), IID(Id), VF(Factor) {

  assert(!Factor.isScalable() && "Scalable vectors are not yet supported");
  if (auto *FPMO = dyn_cast<FPMathOperator>(&CI))
    FMF = FPMO->getFastMathFlags();

  Arguments.insert(Arguments.begin(), CI.arg_begin(), CI.arg_end());
  FunctionType *FTy =
      CI.getCalledFunction()->getFunctionType();
  ParamTys.insert(ParamTys.begin(), FTy->param_begin(), FTy->param_end());
}

IntrinsicCostAttributes::IntrinsicCostAttributes(Intrinsic::ID Id,
                                                 const CallBase &CI,
                                                 ElementCount Factor,
                                                 unsigned ScalarCost)
    : RetTy(CI.getType()), IID(Id), VF(Factor), ScalarizationCost(ScalarCost) {

  if (const auto *FPMO = dyn_cast<FPMathOperator>(&CI))
    FMF = FPMO->getFastMathFlags();

  Arguments.insert(Arguments.begin(), CI.arg_begin(), CI.arg_end());
  FunctionType *FTy =
      CI.getCalledFunction()->getFunctionType();
  ParamTys.insert(ParamTys.begin(), FTy->param_begin(), FTy->param_end());
}

IntrinsicCostAttributes::IntrinsicCostAttributes(Intrinsic::ID Id, Type *RTy,
                                                 ArrayRef<Type *> Tys,
                                                 FastMathFlags Flags) :
    RetTy(RTy), IID(Id), FMF(Flags) {
  ParamTys.insert(ParamTys.begin(), Tys.begin(), Tys.end());
}

IntrinsicCostAttributes::IntrinsicCostAttributes(Intrinsic::ID Id, Type *RTy,
                                                 ArrayRef<Type *> Tys,
                                                 FastMathFlags Flags,
                                                 unsigned ScalarCost) :
    RetTy(RTy), IID(Id), FMF(Flags), ScalarizationCost(ScalarCost) {
  ParamTys.insert(ParamTys.begin(), Tys.begin(), Tys.end());
}

IntrinsicCostAttributes::IntrinsicCostAttributes(Intrinsic::ID Id, Type *RTy,
                                                 ArrayRef<Type *> Tys,
                                                 FastMathFlags Flags,
                                                 unsigned ScalarCost,
                                                 const IntrinsicInst *I) :
    II(I), RetTy(RTy), IID(Id), FMF(Flags), ScalarizationCost(ScalarCost) {
  ParamTys.insert(ParamTys.begin(), Tys.begin(), Tys.end());
}

IntrinsicCostAttributes::IntrinsicCostAttributes(Intrinsic::ID Id, Type *RTy,
                                                 ArrayRef<Type *> Tys) :
    RetTy(RTy), IID(Id) {
  ParamTys.insert(ParamTys.begin(), Tys.begin(), Tys.end());
}

IntrinsicCostAttributes::IntrinsicCostAttributes(Intrinsic::ID Id, Type *Ty,
                                                 ArrayRef<const Value *> Args)
    : RetTy(Ty), IID(Id) {

  Arguments.insert(Arguments.begin(), Args.begin(), Args.end());
  ParamTys.reserve(Arguments.size());
  for (unsigned Idx = 0, Size = Arguments.size(); Idx != Size; ++Idx)
    ParamTys.push_back(Arguments[Idx]->getType());
}

bool HardwareLoopInfo::isHardwareLoopCandidate(ScalarEvolution &SE,
                                               LoopInfo &LI, DominatorTree &DT,
                                               bool ForceNestedLoop,
                                               bool ForceHardwareLoopPHI) {
  SmallVector<BasicBlock *, 4> ExitingBlocks;
  L->getExitingBlocks(ExitingBlocks);

  for (BasicBlock *BB : ExitingBlocks) {
    // If we pass the updated counter back through a phi, we need to know
    // which latch the updated value will be coming from.
    if (!L->isLoopLatch(BB)) {
      if (ForceHardwareLoopPHI || CounterInReg)
        continue;
    }

    const SCEV *EC = SE.getExitCount(L, BB);
    if (isa<SCEVCouldNotCompute>(EC))
      continue;
    if (const SCEVConstant *ConstEC = dyn_cast<SCEVConstant>(EC)) {
      if (ConstEC->getValue()->isZero())
        continue;
    } else if (!SE.isLoopInvariant(EC, L))
      continue;

    if (SE.getTypeSizeInBits(EC->getType()) > CountType->getBitWidth())
      continue;

    // If this exiting block is contained in a nested loop, it is not eligible
    // for insertion of the branch-and-decrement since the inner loop would
    // end up messing up the value in the CTR.
    if (!IsNestingLegal && LI.getLoopFor(BB) != L && !ForceNestedLoop)
      continue;

    // We now have a loop-invariant count of loop iterations (which is not the
    // constant zero) for which we know that this loop will not exit via this
    // exiting block.

    // We need to make sure that this block will run on every loop iteration.
    // For this to be true, we must dominate all blocks with backedges. Such
    // blocks are in-loop predecessors to the header block.
    bool NotAlways = false;
    for (BasicBlock *Pred : predecessors(L->getHeader())) {
      if (!L->contains(Pred))
        continue;

      if (!DT.dominates(BB, Pred)) {
        NotAlways = true;
        break;
      }
    }

    if (NotAlways)
      continue;

    // Make sure this block ends with a conditional branch.
    Instruction *TI = BB->getTerminator();
    if (!TI)
      continue;

    if (BranchInst *BI = dyn_cast<BranchInst>(TI)) {
      if (!BI->isConditional())
        continue;

      ExitBranch = BI;
    } else
      continue;

    // Note that this block may not be the loop latch block, even if the loop
    // has a latch block.
    ExitBlock = BB;
    TripCount = SE.getAddExpr(EC, SE.getOne(EC->getType()));

    if (!EC->getType()->isPointerTy() && EC->getType() != CountType)
      TripCount = SE.getZeroExtendExpr(TripCount, CountType);

    break;
  }

  if (!ExitBlock)
    return false;
  return true;
}
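
The TripCount computation in the function above adds one to the exit count in the exit count's own type and only afterwards zero-extends the result to CountType. The sketch below mimics that ordering with plain fixed-width integers. It assumes an 8-bit exit count, a 32-bit counter, and wrapping arithmetic for the narrow add; the helper name tripCountFromExitCount is hypothetical and no ScalarEvolution API is involved.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for TripCount = zext(ExitCount + 1), using an 8-bit
// exit count and a 32-bit counter instead of SCEV expressions.
std::uint32_t tripCountFromExitCount(std::uint8_t ExitCount) {
  // The add happens in the narrow type first, so it can wrap around.
  std::uint8_t Narrow = static_cast<std::uint8_t>(ExitCount + 1);
  // Only the already-wrapped result is widened to the counter type.
  return static_cast<std::uint32_t>(Narrow);
}
```

For an exit count of 9 this yields 10 iterations. For the largest 8-bit exit count, 255, the narrow add wraps and the widened result is 0 rather than 256, which is the kind of corner case a consumer of TripCount may need to keep in mind under these assumptions.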

TargetTransformInfo::TargetTransformInfo(const DataLayout &DL)
    : TTIImpl(new Model<NoTTIImpl>(NoTTIImpl(DL))) {}

TargetTransformInfo::~TargetTransformInfo() {}

TargetTransformInfo::TargetTransformInfo(TargetTransformInfo &&Arg)
    : TTIImpl(std::move(Arg.TTIImpl)) {}

Switch TargetTransformInfo from an immutable analysis pass that requires
a TargetMachine to construct (and thus isn't always available), to an
analysis group that supports layered implementations much like
AliasAnalysis does. This is a pretty massive change, with a few parts
that I was unable to easily separate (sorry), so I'll walk through it.
The first step of this conversion was to make TargetTransformInfo an
analysis group, and to sink the nonce implementations in
ScalarTargetTransformInfo and VectorTargetTransformInfo into
a NoTargetTransformInfo pass. This allows other passes to add a hard
requirement on TTI, and assume they will always get at least one
implementation.
The TargetTransformInfo analysis group leverages the delegation chaining
trick that AliasAnalysis uses, where the base class for the analysis
group delegates to the previous analysis *pass*, allowing all but the
NoFoo analysis passes to only implement the parts of the interfaces they
support. It also introduces a new trick where each pass in the group
retains a pointer to the top-most pass that has been initialized. This
allows passes to implement one API in terms of another API and benefit
when some other pass above them in the stack has more precise results
for the second API.
The second step of this conversion is to create a pass that implements
the TargetTransformInfo analysis using the target-independent
abstractions in the code generator. This replaces the
ScalarTargetTransformImpl and VectorTargetTransformImpl classes in
lib/Target with a single pass in lib/CodeGen called
BasicTargetTransformInfo. This class actually provides most of the TTI
functionality, basing it upon the TargetLowering abstraction and other
information in the target independent code generator.
The third step of the conversion adds support to all TargetMachines to
register custom analysis passes. This allows building those passes with
access to TargetLowering or other target-specific classes, and it also
allows each target to customize the set of analysis passes desired in
the pass manager. The baseline LLVMTargetMachine implements this
interface to add the BasicTTI pass to the pass manager, and all of the
tools that want to support target-aware TTI passes call this routine on
whatever target machine they end up with to add the appropriate passes.
The fourth step of the conversion created target-specific TTI analysis
passes for the X86 and ARM backends. These passes contain the custom
logic that was previously in their extensions of the
ScalarTargetTransformInfo and VectorTargetTransformInfo interfaces.
I separated them into their own file, as now all of the interface bits
are private and they just expose a function to create the pass itself.
Then I extended these target machines to set up a custom set of analysis
passes, first adding BasicTTI as a fallback, and then adding their
customized TTI implementations.
The fourth step required logic that was shared between the target
independent layer and the specific targets to move to a different
interface, as they no longer derive from each other. As a consequence,
helper functions were added to TargetLowering representing the common
logic needed both in the target implementation and the codegen
implementation of the TTI pass. While technically this is the only
change that could have been committed separately, it would have been
a nightmare to extract.
The final step of the conversion was just to delete all the old
boilerplate. This got rid of the ScalarTargetTransformInfo and
VectorTargetTransformInfo classes, all of the support in all of the
targets for producing instances of them, and all of the support in the
tools for manually constructing a pass based around them.
Now that TTI is a relatively normal analysis group, two things become
straightforward. First, we can sink it into lib/Analysis which is a more
natural layer for it to live. Second, clients of this interface can
depend on it *always* being available which will simplify their code and
behavior. These (and other) simplifications will follow in subsequent
commits; this one is clearly big enough.
Finally, I'm very aware that much of the comments and documentation
needs to be updated. As soon as I had this working, and plausibly well
commented, I wanted to get it committed and in front of the build bots.
I'll be doing a few passes over documentation later if it sticks.
Commits to update DragonEgg and Clang will be made presently.
llvm-svn: 171681
2013-01-07 09:37:14 +08:00
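The "delegation chaining trick" from the message above, together with the pointer to the top-most initialized pass, can be sketched in a few lines. This is an illustration of the idea only, not the historical analysis-group code: the types ChainedInfo, NoInfo, BasicInfo, TargetInfo, the cost values, and the helper stackedMulCost are all hypothetical.

```cpp
#include <cassert>

// Hypothetical stand-in for one pass in the analysis group.
struct ChainedInfo {
  ChainedInfo *Prev = nullptr; // next implementation down the stack
  ChainedInfo *Top = nullptr;  // top-most implementation initialized so far

  virtual ~ChainedInfo() = default;

  // Stack this pass above Below and tell every pass in the stack about
  // the new top.
  void pushOnto(ChainedInfo *Below) {
    Prev = Below;
    for (ChainedInfo *P = this; P; P = P->Prev)
      P->Top = this;
  }

  // Default behaviour: delegate to the previous pass in the chain.
  virtual int getAddCost() const { return Prev->getAddCost(); }
  virtual int getMulCost() const { return Prev->getMulCost(); }
};

// The "NoFoo" pass at the bottom has nothing to delegate to, so it must
// implement the whole interface.
struct NoInfo : ChainedInfo {
  int getAddCost() const override { return 1; }
  int getMulCost() const override { return 1; }
};

// A middle layer that implements one API in terms of another, asking the
// top of the stack so that more precise layers above it are honoured.
struct BasicInfo : ChainedInfo {
  int getMulCost() const override { return 3 * Top->getAddCost(); }
};

// A target layer that only knows a more precise add cost.
struct TargetInfo : ChainedInfo {
  int getAddCost() const override { return 2; }
};

// Builds NoInfo <- BasicInfo (<- TargetInfo) and queries the top of it.
int stackedMulCost(bool WithTarget) {
  NoInfo No;
  BasicInfo Basic;
  TargetInfo Target;
  No.pushOnto(nullptr);
  Basic.pushOnto(&No);
  if (WithTarget)
    Target.pushOnto(&Basic);
  return No.Top->getMulCost();
}
```

In this sketch the multiply cost computed by the middle layer changes from 3 to 6 once the target layer is stacked on top, even though the target layer never mentions multiplies. A layer that forgets to override a hook silently falls through to the one below it, which is the unchecked behaviour the later redesign objects to.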
the TTI in each target.
Many thanks to Eric and Hal for their help here. I ended up blocked on
this somewhat more abruptly than I expected, and so I appreciate getting
it sorted out very quickly.
Differential Revision: http://reviews.llvm.org/D7293
llvm-svn: 227669
2015-01-31 11:43:40 +08:00
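The type-erased design described in the commit message above can be sketched roughly as follows. This is a minimal illustration, not the actual LLVM code: `CostInfo`, `Concept`, `Model`, `DummyTarget`, and `ToyGpuTarget` are hypothetical names, and the returned values are arbitrary. A value-semantic wrapper owns a pointer to an abstract interface, and a templated adapter forwards each call to whatever concrete implementation it was built from.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical value-semantic wrapper; stands in for a TTI-like object.
class CostInfo {
  // Abstract interface hidden inside the wrapper.
  struct Concept {
    virtual ~Concept() = default;
    virtual unsigned getInliningThresholdMultiplier() const = 0;
    virtual bool hasBranchDivergence() const = 0;
  };

  // Adapter for any type T that provides the same member functions.
  template <typename T> struct Model final : Concept {
    T Target;
    explicit Model(T Tgt) : Target(std::move(Tgt)) {}
    unsigned getInliningThresholdMultiplier() const override {
      return Target.getInliningThresholdMultiplier();
    }
    bool hasBranchDivergence() const override {
      return Target.hasBranchDivergence();
    }
  };

  std::unique_ptr<Concept> Impl;

public:
  // Build the wrapper from any concrete implementation.
  template <typename T>
  explicit CostInfo(T Tgt)
      : Impl(std::make_unique<Model<T>>(std::move(Tgt))) {}

  CostInfo(CostInfo &&) = default;
  CostInfo &operator=(CostInfo &&) = default;

  // Public methods simply forward to the erased implementation.
  unsigned getInliningThresholdMultiplier() const {
    return Impl->getInliningThresholdMultiplier();
  }
  bool hasBranchDivergence() const { return Impl->hasBranchDivergence(); }
};

// Hypothetical target-independent fallback.
struct DummyTarget {
  unsigned getInliningThresholdMultiplier() const { return 1; }
  bool hasBranchDivergence() const { return false; }
};

// Hypothetical target-specific implementation; the values are arbitrary.
struct ToyGpuTarget {
  unsigned getInliningThresholdMultiplier() const { return 7; }
  bool hasBranchDivergence() const { return true; }
};
```

Clients only ever see `CostInfo`; the concrete implementations need no common base class, which is the trade-off against plain virtual dispatch that the message discusses.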
|
|
|
TargetTransformInfo &TargetTransformInfo::operator=(TargetTransformInfo &&RHS) {
|
|
|
|
TTIImpl = std::move(RHS.TTIImpl);
|
|
|
|
return *this;
|
2013-01-05 19:43:11 +08:00
|
|
|
}
|
|
|
|
|
2016-04-15 09:38:48 +08:00
|
|
|
unsigned TargetTransformInfo::getInliningThresholdMultiplier() const {
|
|
|
|
return TTIImpl->getInliningThresholdMultiplier();
|
|
|
|
}
|
|
|
|
|
[AMDGPU] Tune inlining parameters for AMDGPU target
Summary:
Since the target gains no significant advantage from vectorization,
the vector instructions threshold bonus should be optional.
The amdgpu-inline-arg-alloca-cost parameter default value and the target
InliningThresholdMultiplier value are tuned accordingly.
Reviewers: arsenm, rampitec
Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, eraman, hiraditya, haicheng, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D64642
llvm-svn: 366348
2019-07-18 00:51:29 +08:00
|
|
|
int TargetTransformInfo::getInlinerVectorBonusPercent() const {
|
|
|
|
return TTIImpl->getInlinerVectorBonusPercent();
|
|
|
|
}
|
|
|
|
|
2016-07-09 05:48:05 +08:00
|
|
|
int TargetTransformInfo::getGEPCost(Type *PointeeType, const Value *Ptr,
|
2020-04-28 21:11:27 +08:00
|
|
|
ArrayRef<const Value *> Operands,
|
|
|
|
TTI::TargetCostKind CostKind) const {
|
|
|
|
return TTIImpl->getGEPCost(PointeeType, Ptr, Operands, CostKind);
|
2016-07-09 05:48:05 +08:00
|
|
|
}
|
|
|
|
|
2020-04-15 20:43:26 +08:00
|
|
|
unsigned TargetTransformInfo::getEstimatedNumberOfCaseClusters(
|
2019-10-30 02:30:30 +08:00
|
|
|
const SwitchInst &SI, unsigned &JTSize, ProfileSummaryInfo *PSI,
|
|
|
|
BlockFrequencyInfo *BFI) const {
|
|
|
|
return TTIImpl->getEstimatedNumberOfCaseClusters(SI, JTSize, PSI, BFI);
|
[InlineCost] Improve the cost heuristic for Switch
Summary:
The motivating example below has 13 cases but only 2 distinct targets:
```
lor.lhs.false2: ; preds = %if.then
switch i32 %Status, label %if.then27 [
i32 -7012, label %if.end35
i32 -10008, label %if.end35
i32 -10016, label %if.end35
i32 15000, label %if.end35
i32 14013, label %if.end35
i32 10114, label %if.end35
i32 10107, label %if.end35
i32 10105, label %if.end35
i32 10013, label %if.end35
i32 10011, label %if.end35
i32 7008, label %if.end35
i32 7007, label %if.end35
i32 5002, label %if.end35
]
```
which is compiled into a balanced binary tree like this on AArch64 (similar on X86)
```
.LBB853_9: // %lor.lhs.false2
mov w8, #10012
cmp w19, w8
b.gt .LBB853_14
// BB#10: // %lor.lhs.false2
mov w8, #5001
cmp w19, w8
b.gt .LBB853_18
// BB#11: // %lor.lhs.false2
mov w8, #-10016
cmp w19, w8
b.eq .LBB853_23
// BB#12: // %lor.lhs.false2
mov w8, #-10008
cmp w19, w8
b.eq .LBB853_23
// BB#13: // %lor.lhs.false2
mov w8, #-7012
cmp w19, w8
b.eq .LBB853_23
b .LBB853_3
.LBB853_14: // %lor.lhs.false2
mov w8, #14012
cmp w19, w8
b.gt .LBB853_21
// BB#15: // %lor.lhs.false2
mov w8, #-10105
add w8, w19, w8
cmp w8, #9 // =9
b.hi .LBB853_17
// BB#16: // %lor.lhs.false2
orr w9, wzr, #0x1
lsl w8, w9, w8
mov w9, #517
and w8, w8, w9
cbnz w8, .LBB853_23
.LBB853_17: // %lor.lhs.false2
mov w8, #10013
cmp w19, w8
b.eq .LBB853_23
b .LBB853_3
.LBB853_18: // %lor.lhs.false2
mov w8, #-7007
add w8, w19, w8
cmp w8, #2 // =2
b.lo .LBB853_23
// BB#19: // %lor.lhs.false2
mov w8, #5002
cmp w19, w8
b.eq .LBB853_23
// BB#20: // %lor.lhs.false2
mov w8, #10011
cmp w19, w8
b.eq .LBB853_23
b .LBB853_3
.LBB853_21: // %lor.lhs.false2
mov w8, #14013
cmp w19, w8
b.eq .LBB853_23
// BB#22: // %lor.lhs.false2
mov w8, #15000
cmp w19, w8
b.ne .LBB853_3
```
However, the inline cost model estimates the cost to be linear with the number
of distinct targets and the cost of the above switch is just 2 InstrCosts.
The function containing this switch is then inlined about 900 times.
This change uses the general way of switch lowering for the inline heuristic. It
estimates the number of case clusters with the suitability check for a jump table
or bit test. Considering the binary search tree built for the clusters, this
change modifies the model to be linear with the size of the balanced binary
tree. The model is off by default for now:
-inline-generic-switch-cost=false
This change was originally proposed by Haicheng in D29870.
Reviewers: hans, bmakam, chandlerc, eraman, haicheng, mcrosier
Reviewed By: hans
Subscribers: joerg, aemerson, llvm-commits, rengolin
Differential Revision: https://reviews.llvm.org/D31085
llvm-svn: 301649
2017-04-29 00:04:03 +08:00
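The idea of charging a switch in proportion to the size of its balanced binary search tree, rather than to the number of distinct targets, can be sketched as below. This is an illustration under stated assumptions: `estimateSwitchCost` is a hypothetical helper, and both the small-switch cutoff of 3 clusters and the `3 * N / 2 - 1` compare estimate are assumptions made for the sketch, not values confirmed by the commit message.

```cpp
#include <cassert>

// Hypothetical helper: estimates the cost of a switch that is lowered as a
// binary search over NumClusters case clusters. InstrCost is the assumed
// cost of a single instruction.
unsigned estimateSwitchCost(unsigned NumClusters, unsigned InstrCost) {
  if (NumClusters == 0)
    return 0;
  // Few clusters: assume one compare and one branch per cluster.
  if (NumClusters <= 3)
    return NumClusters * 2 * InstrCost;
  // Assumed estimate of the compares needed in a balanced tree.
  unsigned ExpectedCompares = 3 * NumClusters / 2 - 1;
  // Each compare is paired with a conditional branch.
  return ExpectedCompares * 2 * InstrCost;
}
```

With an instruction cost of 5, a switch with 13 clusters is charged 180 under this sketch, whereas a model that only counts 2 distinct targets would charge 10.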
|
|
|
}
|
|
|
|
|
2017-06-29 21:42:12 +08:00
|
|
|
int TargetTransformInfo::getUserCost(const User *U,
|
2020-04-27 16:02:14 +08:00
|
|
|
ArrayRef<const Value *> Operands,
|
|
|
|
enum TargetCostKind CostKind) const {
|
|
|
|
int Cost = TTIImpl->getUserCost(U, Operands, CostKind);
|
2020-05-26 19:17:26 +08:00
|
|
|
assert((CostKind == TTI::TCK_RecipThroughput || Cost >= 0) &&
|
|
|
|
"TTI should not produce negative costs!");
|
2015-08-06 02:08:10 +08:00
|
|
|
return Cost;
|
2013-01-21 09:27:39 +08:00
|
|
|
}
|
|
|
|
|
2013-07-27 08:01:07 +08:00
|
|
|
bool TargetTransformInfo::hasBranchDivergence() const {
|
2015-01-31 11:43:40 +08:00
|
|
|
return TTIImpl->hasBranchDivergence();
|
2013-07-27 08:01:07 +08:00
|
|
|
}
|
|
|
|
|
Resubmit: [DA][TTI][AMDGPU] Add option to select GPUDA with TTI
Summary:
Enable the new divergence analysis by default for AMDGPU.
Resubmit with test updates since GPUDA was causing failures on Windows.
Reviewers: rampitec, nhaehnle, arsenm, thakis
Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73315
2020-01-20 23:25:20 +08:00
|
|
|
bool TargetTransformInfo::useGPUDivergenceAnalysis() const {
|
|
|
|
return TTIImpl->useGPUDivergenceAnalysis();
|
|
|
|
}
|
|
|
|
|
Divergence analysis for GPU programs
Summary:
Some optimizations such as jump threading and loop unswitching can negatively
affect performance when applied to divergent branches. The divergence analysis
added in this patch conservatively estimates which branches in a GPU program
can diverge. This information can then help LLVM to run certain optimizations
selectively.
Test Plan: test/Analysis/DivergenceAnalysis/NVPTX/diverge.ll
Reviewers: resistor, hfinkel, eliben, meheff, jholewinski
Subscribers: broune, bjarke.roune, madhur13490, tstellarAMD, dberlin, echristo, jholewinski, llvm-commits
Differential Revision: http://reviews.llvm.org/D8576
llvm-svn: 234567
2015-04-10 13:03:50 +08:00
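A conservative divergence estimate of the kind described above can be sketched as a worklist propagation from known sources of divergence to their users. This is a toy model, not the LLVM pass: `propagateDivergence` and the integer value ids are hypothetical, and the sketch covers only data dependences, ignoring divergence introduced through control flow.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Users[V] lists the values that use value V as an operand.
// Sources lists the values assumed to be sources of divergence
// (for example, a per-thread id). Returns a flag per value.
std::vector<bool>
propagateDivergence(const std::vector<std::vector<std::size_t>> &Users,
                    const std::vector<std::size_t> &Sources) {
  std::vector<bool> Divergent(Users.size(), false);
  std::vector<std::size_t> Worklist;

  // Seed the worklist with the known sources.
  for (std::size_t S : Sources) {
    if (!Divergent[S]) {
      Divergent[S] = true;
      Worklist.push_back(S);
    }
  }

  // Anything computed from a divergent value is conservatively divergent.
  while (!Worklist.empty()) {
    std::size_t V = Worklist.back();
    Worklist.pop_back();
    for (std::size_t U : Users[V]) {
      if (!Divergent[U]) {
        Divergent[U] = true;
        Worklist.push_back(U);
      }
    }
  }
  return Divergent;
}
```

For example, with value 0 as a thread id, value 1 as a constant, value 2 computed from both, value 3 a branch condition on value 2, and value 4 computed only from the constant, the values 0, 2, and 3 are marked divergent while 1 and 4 are not.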
|
|
|
bool TargetTransformInfo::isSourceOfDivergence(const Value *V) const {
|
|
|
|
return TTIImpl->isSourceOfDivergence(V);
|
|
|
|
}
|
|
|
|
|
2017-06-16 03:33:10 +08:00
|
|
|
bool llvm::TargetTransformInfo::isAlwaysUniform(const Value *V) const {
|
|
|
|
return TTIImpl->isAlwaysUniform(V);
|
|
|
|
}
|
|
|
|
|
2017-01-31 07:02:12 +08:00
|
|
|
unsigned TargetTransformInfo::getFlatAddressSpace() const {
|
|
|
|
return TTIImpl->getFlatAddressSpace();
|
|
|
|
}
|
|
|
|
|
2019-08-15 02:13:00 +08:00
|
|
|
bool TargetTransformInfo::collectFlatAddressOperands(
|
2020-04-15 20:43:26 +08:00
|
|
|
SmallVectorImpl<int> &OpIndexes, Intrinsic::ID IID) const {
|
2019-08-15 02:13:00 +08:00
|
|
|
return TTIImpl->collectFlatAddressOperands(OpIndexes, IID);
|
|
|
|
}
|
|
|
|
|
2020-06-10 03:07:08 +08:00
|
|
|
bool TargetTransformInfo::isNoopAddrSpaceCast(unsigned FromAS,
|
|
|
|
unsigned ToAS) const {
|
|
|
|
return TTIImpl->isNoopAddrSpaceCast(FromAS, ToAS);
|
|
|
|
}
|
|
|
|
|
2020-11-07 19:47:57 +08:00
|
|
|
unsigned TargetTransformInfo::getAssumedAddrSpace(const Value *V) const {
|
|
|
|
return TTIImpl->getAssumedAddrSpace(V);
|
|
|
|
}
|
|
|
|
|
2020-05-16 02:54:51 +08:00
|
|
|
Value *TargetTransformInfo::rewriteIntrinsicWithAddressSpace(
|
|
|
|
IntrinsicInst *II, Value *OldV, Value *NewV) const {
|
2019-08-15 02:13:00 +08:00
|
|
|
return TTIImpl->rewriteIntrinsicWithAddressSpace(II, OldV, NewV);
|
|
|
|
}
|
|
|
|
|
2013-01-22 19:26:02 +08:00
|
|
|
bool TargetTransformInfo::isLoweredToCall(const Function *F) const {
|
2015-01-31 11:43:40 +08:00
|
|
|
return TTIImpl->isLoweredToCall(F);
|
2013-01-22 19:26:02 +08:00
|
|
|
}
|
|
|
|
|
2019-06-07 15:35:30 +08:00
|
|
|
bool TargetTransformInfo::isHardwareLoopProfitable(
|
2020-04-15 20:43:26 +08:00
|
|
|
Loop *L, ScalarEvolution &SE, AssumptionCache &AC,
|
|
|
|
TargetLibraryInfo *LibInfo, HardwareLoopInfo &HWLoopInfo) const {
|
2019-06-07 15:35:30 +08:00
|
|
|
return TTIImpl->isHardwareLoopProfitable(L, SE, AC, LibInfo, HWLoopInfo);
|
|
|
|
}
|
|
|
|
|
2020-04-15 20:43:26 +08:00
|
|
|
bool TargetTransformInfo::preferPredicateOverEpilogue(
|
|
|
|
Loop *L, LoopInfo *LI, ScalarEvolution &SE, AssumptionCache &AC,
|
|
|
|
TargetLibraryInfo *TLI, DominatorTree *DT,
|
|
|
|
const LoopAccessInfo *LAI) const {
|
2019-11-06 17:58:36 +08:00
|
|
|
return TTIImpl->preferPredicateOverEpilogue(L, LI, SE, AC, TLI, DT, LAI);
|
|
|
|
}
|
|
|
|
|
2020-06-10 00:19:57 +08:00
|
|
|
bool TargetTransformInfo::emitGetActiveLaneMask() const {
|
|
|
|
return TTIImpl->emitGetActiveLaneMask();
|
2020-05-29 16:05:41 +08:00
|
|
|
}
|
|
|
|
|
2020-06-03 21:56:40 +08:00
|
|
|
Optional<Instruction *>
|
|
|
|
TargetTransformInfo::instCombineIntrinsic(InstCombiner &IC,
|
|
|
|
IntrinsicInst &II) const {
|
|
|
|
return TTIImpl->instCombineIntrinsic(IC, II);
|
|
|
|
}
|
|
|
|
|
|
|
|
Optional<Value *> TargetTransformInfo::simplifyDemandedUseBitsIntrinsic(
|
|
|
|
InstCombiner &IC, IntrinsicInst &II, APInt DemandedMask, KnownBits &Known,
|
|
|
|
bool &KnownBitsComputed) const {
|
|
|
|
return TTIImpl->simplifyDemandedUseBitsIntrinsic(IC, II, DemandedMask, Known,
|
|
|
|
KnownBitsComputed);
|
|
|
|
}
|
|
|
|
|
|
|
|
Optional<Value *> TargetTransformInfo::simplifyDemandedVectorEltsIntrinsic(
|
|
|
|
InstCombiner &IC, IntrinsicInst &II, APInt DemandedElts, APInt &UndefElts,
|
|
|
|
APInt &UndefElts2, APInt &UndefElts3,
|
|
|
|
std::function<void(Instruction *, unsigned, APInt, APInt &)>
|
|
|
|
SimplifyAndSetOp) const {
|
|
|
|
return TTIImpl->simplifyDemandedVectorEltsIntrinsic(
|
|
|
|
IC, II, DemandedElts, UndefElts, UndefElts2, UndefElts3,
|
|
|
|
SimplifyAndSetOp);
|
|
|
|
}
|
|
|
|
|
2015-01-31 11:43:40 +08:00
|
|
|
void TargetTransformInfo::getUnrollingPreferences(
|
[LoopUnroll] Pass SCEV to getUnrollingPreferences hook. NFCI.
Reviewers: sanjoy, anna, reames, apilipenko, igor-laevsky, mkuper
Subscribers: jholewinski, arsenm, mzolotukhin, nemanjai, nhaehnle, javed.absar, mcrosier, llvm-commits
Differential Revision: https://reviews.llvm.org/D34531
llvm-svn: 306554
2017-06-28 23:53:17 +08:00
|
|
|
Loop *L, ScalarEvolution &SE, UnrollingPreferences &UP) const {
|
|
|
|
return TTIImpl->getUnrollingPreferences(L, SE, UP);
|
2013-09-12 03:25:43 +08:00
|
|
|
}
|
|
|
|
|
[NFC] Separate Peeling Properties into its own struct (re-land after minor fix)
Summary:
This patch separates the peeling specific parameters from the UnrollingPreferences,
and creates a new struct called PeelingPreferences. Functions which used the
UnrollingPreferences struct for peeling have been updated to use the PeelingPreferences struct.
Author: sidbav (Sidharth Baveja)
Reviewers: Whitney (Whitney Tsang), Meinersbur (Michael Kruse), skatkov (Serguei Katkov), ashlykov (Arkady Shlykov), bogner (Justin Bogner), hfinkel (Hal Finkel), anhtuyen (Anh Tuyen Tran), nikic (Nikita Popov)
Reviewed By: Meinersbur (Michael Kruse)
Subscribers: fhahn (Florian Hahn), hiraditya (Aditya Kumar), llvm-commits, LLVM
Tag: LLVM
Differential Revision: https://reviews.llvm.org/D80580
2020-07-11 02:38:08 +08:00
|
|
|
void TargetTransformInfo::getPeelingPreferences(Loop *L, ScalarEvolution &SE,
|
|
|
|
PeelingPreferences &PP) const {
|
|
|
|
return TTIImpl->getPeelingPreferences(L, SE, PP);
|
|
|
|
}
|
|
|
|
|
2013-01-05 19:43:11 +08:00
|
|
|
bool TargetTransformInfo::isLegalAddImmediate(int64_t Imm) const {
|
2015-01-31 11:43:40 +08:00
|
|
|
return TTIImpl->isLegalAddImmediate(Imm);
|
2013-01-05 19:43:11 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
bool TargetTransformInfo::isLegalICmpImmediate(int64_t Imm) const {
|
[PM] Change the core design of the TTI analysis to use a polymorphic
type erased interface and a single analysis pass rather than an
extremely complex analysis group.
The end result is that the TTI analysis can contain a type erased
implementation that supports the polymorphic TTI interface. We can build
one from a target-specific implementation or from a dummy one in the IR.
I've also factored all of the code into "mix-in"-able base classes,
including CRTP base classes to facilitate calling back up to the most
specialized form when delegating horizontally across the surface. These
aren't as clean as I would like and I'm planning to work on cleaning
some of this up, but I wanted to start by putting it into the right form.
There are a number of reasons for this change, and this particular
design. The first and foremost reason is that an analysis group is
complete overkill, and the chaining delegation strategy was so opaque,
confusing, and high overhead that TTI was suffering greatly for it.
Several of the TTI functions had failed to be implemented in all places
because of the chaining-based delegation making there be no checking of
this. A few other functions were implemented with incorrect delegation.
The message to me was very clear working on this -- the delegation and
analysis group structure was too confusing to be useful here.
The other reason of course is that this is a *much* more natural fit for
the new pass manager. This will lay the ground work for a type-erased
per-function info object that can look up the correct subtarget and even
cache it.
Yet another benefit is that this will significantly simplify the
interaction of the pass managers and the TargetMachine. See the future
work below.
The downside of this change is that it is very, very verbose. I'm going
to work to improve that, but it is somewhat an implementation necessity
in C++ to do type erasure. =/ I discussed this design really extensively
with Eric and Hal prior to going down this path, and afterward showed
them the result. No one was really thrilled with it, but there doesn't
seem to be a substantially better alternative. Using a base class and
virtual method dispatch would make the code much shorter, but as
discussed in the update to the programmer's manual and elsewhere,
a polymorphic interface feels like the more principled approach even if
this is perhaps the least compelling example of it. ;]
Ultimately, there is still a lot more to be done here, but this was the
huge chunk that I couldn't really split things out of because this was
the interface change to TTI. I've tried to minimize all the other parts
of this. The follow up work should include at least:
1) Improving the TargetMachine interface by having it directly return
a TTI object. Because we have a non-pass object with value semantics
and an internal type erasure mechanism, we can narrow the interface
of the TargetMachine to *just* do what we need: build and return
a TTI object that we can then insert into the pass pipeline.
2) Make the TTI object be fully specialized for a particular function.
This will include splitting off a minimal form of it which is
sufficient for the inliner and the old pass manager.
3) Add a new pass manager analysis which produces TTI objects from the
target machine for each function. This may actually be done as part
of #2 in order to use the new analysis to implement #2.
4) Work on narrowing the API between TTI and the targets so that it is
easier to understand and less verbose to type erase.
5) Work on narrowing the API between TTI and its clients so that it is
easier to understand and less verbose to forward.
6) Try to improve the CRTP-based delegation. I feel like this code is
just a bit messy and exacerbating the complexity of implementing
the TTI in each target.
Many thanks to Eric and Hal for their help here. I ended up blocked on
this somewhat more abruptly than I expected, and so I appreciate getting
it sorted out very quickly.
Differential Revision: http://reviews.llvm.org/D7293
llvm-svn: 227669
2015-01-31 11:43:40 +08:00
|
|
|
return TTIImpl->isLegalICmpImmediate(Imm);
|
2013-01-05 19:43:11 +08:00
|
|
|
}
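Every wrapper in this file forwards to `TTIImpl`, the type-erased implementation the commit message above describes. A minimal sketch of that pattern follows, assuming a single query. `MiniTTI`, `DummyImpl`, and `Imm12Impl` are hypothetical stand-ins, not LLVM classes, and the real interface carries many more methods.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <utility>

// Hypothetical, heavily simplified stand-in for a type-erased TTI wrapper.
class MiniTTI {
  // Abstract interface hidden inside the wrapper.
  struct Concept {
    virtual ~Concept() = default;
    virtual bool isLegalAddImmediate(int64_t Imm) const = 0;
  };

  // Adapts any type T that provides the required member function.
  template <typename T> struct Model final : Concept {
    T Impl;
    explicit Model(T Impl) : Impl(std::move(Impl)) {}
    bool isLegalAddImmediate(int64_t Imm) const override {
      return Impl.isLegalAddImmediate(Imm);
    }
  };

  std::unique_ptr<Concept> TTIImpl;

public:
  // Build the wrapper from any implementation; T needs no common base class.
  template <typename T>
  explicit MiniTTI(T Impl)
      : TTIImpl(std::make_unique<Model<T>>(std::move(Impl))) {}

  // Public query: a one-line forward, like the wrappers in this file.
  bool isLegalAddImmediate(int64_t Imm) const {
    return TTIImpl->isLegalAddImmediate(Imm);
  }
};

// Hypothetical "dummy" implementation: nothing is legal.
struct DummyImpl {
  bool isLegalAddImmediate(int64_t) const { return false; }
};

// Hypothetical target implementation with an assumed 12-bit signed range.
struct Imm12Impl {
  bool isLegalAddImmediate(int64_t Imm) const {
    return Imm >= -2048 && Imm <= 2047;
  }
};
```

The verbosity the commit message complains about is visible even here: each query is spelled out three times, in `Concept`, in `Model`, and in the public wrapper.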
|
|
|
|
|
2015-01-31 11:43:40 +08:00
|
|
|
bool TargetTransformInfo::isLegalAddressingMode(Type *Ty, GlobalValue *BaseGV,
|
|
|
|
int64_t BaseOffset,
|
2020-04-15 20:43:26 +08:00
|
|
|
bool HasBaseReg, int64_t Scale,
|
2017-07-21 19:59:37 +08:00
|
|
|
unsigned AddrSpace,
|
|
|
|
Instruction *I) const {
|
2015-01-31 11:43:40 +08:00
|
|
|
return TTIImpl->isLegalAddressingMode(Ty, BaseGV, BaseOffset, HasBaseReg,
|
2017-07-21 19:59:37 +08:00
|
|
|
Scale, AddrSpace, I);
|
2014-12-04 17:40:44 +08:00
|
|
|
}
|
|
|
|
|
2017-06-06 07:37:00 +08:00
|
|
|
bool TargetTransformInfo::isLSRCostLess(LSRCost &C1, LSRCost &C2) const {
|
|
|
|
return TTIImpl->isLSRCostLess(C1, C2);
|
|
|
|
}
|
|
|
|
|
2020-10-27 10:29:22 +08:00
|
|
|
bool TargetTransformInfo::isNumRegsMajorCostOfLSR() const {
|
|
|
|
return TTIImpl->isNumRegsMajorCostOfLSR();
|
2020-10-21 11:25:27 +08:00
|
|
|
}
|
|
|
|
|
2020-05-05 21:25:23 +08:00
|
|
|
bool TargetTransformInfo::isProfitableLSRChainElement(Instruction *I) const {
|
|
|
|
return TTIImpl->isProfitableLSRChainElement(I);
|
|
|
|
}
|
|
|
|
|
2018-02-06 07:43:05 +08:00
|
|
|
bool TargetTransformInfo::canMacroFuseCmp() const {
|
|
|
|
return TTIImpl->canMacroFuseCmp();
|
|
|
|
}
|
|
|
|
|
2019-07-03 09:49:03 +08:00
|
|
|
bool TargetTransformInfo::canSaveCmp(Loop *L, BranchInst **BI,
|
|
|
|
ScalarEvolution *SE, LoopInfo *LI,
|
|
|
|
DominatorTree *DT, AssumptionCache *AC,
|
|
|
|
TargetLibraryInfo *LibInfo) const {
|
|
|
|
return TTIImpl->canSaveCmp(L, BI, SE, LI, DT, AC, LibInfo);
|
|
|
|
}
|
|
|
|
|
2018-03-26 21:10:09 +08:00
|
|
|
bool TargetTransformInfo::shouldFavorPostInc() const {
|
|
|
|
return TTIImpl->shouldFavorPostInc();
|
|
|
|
}
|
|
|
|
|
2019-02-07 21:32:54 +08:00
|
|
|
bool TargetTransformInfo::shouldFavorBackedgeIndex(const Loop *L) const {
|
|
|
|
return TTIImpl->shouldFavorBackedgeIndex(L);
|
|
|
|
}
|
|
|
|
|
2019-10-14 18:00:21 +08:00
|
|
|
bool TargetTransformInfo::isLegalMaskedStore(Type *DataType,
|
2020-05-19 10:16:06 +08:00
|
|
|
Align Alignment) const {
|
2019-10-14 18:00:21 +08:00
|
|
|
return TTIImpl->isLegalMaskedStore(DataType, Alignment);
|
2014-12-04 17:40:44 +08:00
|
|
|
}
|
|
|
|
|
2019-10-14 18:00:21 +08:00
|
|
|
bool TargetTransformInfo::isLegalMaskedLoad(Type *DataType,
|
2020-05-19 10:16:06 +08:00
|
|
|
Align Alignment) const {
|
2019-10-14 18:00:21 +08:00
|
|
|
return TTIImpl->isLegalMaskedLoad(DataType, Alignment);
|
2013-01-05 19:43:11 +08:00
|
|
|
}
|
|
|
|
|
2019-06-18 01:20:08 +08:00
|
|
|
bool TargetTransformInfo::isLegalNTStore(Type *DataType,
|
2019-09-27 20:54:21 +08:00
|
|
|
Align Alignment) const {
|
2019-06-18 01:20:08 +08:00
|
|
|
return TTIImpl->isLegalNTStore(DataType, Alignment);
|
|
|
|
}
|
|
|
|
|
2019-09-27 20:54:21 +08:00
|
|
|
bool TargetTransformInfo::isLegalNTLoad(Type *DataType, Align Alignment) const {
|
2019-06-18 01:20:08 +08:00
|
|
|
return TTIImpl->isLegalNTLoad(DataType, Alignment);
|
|
|
|
}
|
|
|
|
|
2019-12-18 16:42:53 +08:00
|
|
|
bool TargetTransformInfo::isLegalMaskedGather(Type *DataType,
|
2020-05-19 10:16:06 +08:00
|
|
|
Align Alignment) const {
|
2019-12-18 16:42:53 +08:00
|
|
|
return TTIImpl->isLegalMaskedGather(DataType, Alignment);
|
2015-10-25 23:37:55 +08:00
|
|
|
}
|
|
|
|
|
2019-12-18 16:42:53 +08:00
|
|
|
bool TargetTransformInfo::isLegalMaskedScatter(Type *DataType,
|
2020-05-19 10:16:06 +08:00
|
|
|
Align Alignment) const {
|
2019-12-18 16:42:53 +08:00
|
|
|
return TTIImpl->isLegalMaskedScatter(DataType, Alignment);
|
2015-10-25 23:37:55 +08:00
|
|
|
}
|
|
|
|
|
2019-03-22 01:38:52 +08:00
|
|
|
bool TargetTransformInfo::isLegalMaskedCompressStore(Type *DataType) const {
|
|
|
|
return TTIImpl->isLegalMaskedCompressStore(DataType);
|
|
|
|
}
|
|
|
|
|
|
|
|
bool TargetTransformInfo::isLegalMaskedExpandLoad(Type *DataType) const {
|
|
|
|
return TTIImpl->isLegalMaskedExpandLoad(DataType);
|
|
|
|
}
|
|
|
|
|
2017-09-09 21:38:18 +08:00
|
|
|
bool TargetTransformInfo::hasDivRemOp(Type *DataType, bool IsSigned) const {
|
|
|
|
return TTIImpl->hasDivRemOp(DataType, IsSigned);
|
|
|
|
}
|
|
|
|
|
2017-10-25 04:31:44 +08:00
|
|
|
bool TargetTransformInfo::hasVolatileVariant(Instruction *I,
|
|
|
|
unsigned AddrSpace) const {
|
|
|
|
return TTIImpl->hasVolatileVariant(I, AddrSpace);
|
|
|
|
}
|
|
|
|
|
2017-05-24 21:42:56 +08:00
|
|
|
bool TargetTransformInfo::prefersVectorizedAddressing() const {
|
|
|
|
return TTIImpl->prefersVectorizedAddressing();
|
|
|
|
}
|
|
|
|
|
2013-06-01 05:29:03 +08:00
|
|
|
int TargetTransformInfo::getScalingFactorCost(Type *Ty, GlobalValue *BaseGV,
|
|
|
|
int64_t BaseOffset,
|
2020-04-15 20:43:26 +08:00
|
|
|
bool HasBaseReg, int64_t Scale,
|
2015-06-08 04:12:03 +08:00
|
|
|
unsigned AddrSpace) const {
|
2015-08-06 02:08:10 +08:00
|
|
|
int Cost = TTIImpl->getScalingFactorCost(Ty, BaseGV, BaseOffset, HasBaseReg,
|
|
|
|
Scale, AddrSpace);
|
|
|
|
assert(Cost >= 0 && "TTI should not produce negative costs!");
|
|
|
|
return Cost;
|
2013-06-01 05:29:03 +08:00
|
|
|
}
|
|
|
|
|
2017-07-21 19:59:37 +08:00
|
|
|
bool TargetTransformInfo::LSRWithInstrQueries() const {
|
|
|
|
return TTIImpl->LSRWithInstrQueries();
|
|
|
|
}
|
|
|
|
|
2013-01-05 19:43:11 +08:00
|
|
|
bool TargetTransformInfo::isTruncateFree(Type *Ty1, Type *Ty2) const {
|
2015-01-31 11:43:40 +08:00
|
|
|
return TTIImpl->isTruncateFree(Ty1, Ty2);
|
2013-01-05 19:43:11 +08:00
|
|
|
}
|
|
|
|
|
2015-02-24 03:15:16 +08:00
|
|
|
bool TargetTransformInfo::isProfitableToHoist(Instruction *I) const {
|
|
|
|
return TTIImpl->isProfitableToHoist(I);
|
|
|
|
}
|
|
|
|
|
2018-03-29 06:28:50 +08:00
|
|
|
bool TargetTransformInfo::useAA() const { return TTIImpl->useAA(); }
|
|
|
|
|
2013-01-05 19:43:11 +08:00
|
|
|
bool TargetTransformInfo::isTypeLegal(Type *Ty) const {
|
2015-01-31 11:43:40 +08:00
|
|
|
return TTIImpl->isTypeLegal(Ty);
|
2013-01-05 19:43:11 +08:00
|
|
|
}
|
|
|
|
|
2020-11-12 20:33:36 +08:00
|
|
|
unsigned TargetTransformInfo::getRegUsageForType(Type *Ty) const {
|
|
|
|
return TTIImpl->getRegUsageForType(Ty);
|
|
|
|
}
|
|
|
|
|
2013-01-05 19:43:11 +08:00
|
|
|
bool TargetTransformInfo::shouldBuildLookupTables() const {
|
2015-01-31 11:43:40 +08:00
|
|
|
return TTIImpl->shouldBuildLookupTables();
|
2013-01-05 19:43:11 +08:00
|
|
|
}
|
2020-04-15 20:43:26 +08:00
|
|
|
bool TargetTransformInfo::shouldBuildLookupTablesForConstant(
|
|
|
|
Constant *C) const {
|
2016-10-07 16:48:24 +08:00
|
|
|
return TTIImpl->shouldBuildLookupTablesForConstant(C);
|
|
|
|
}
|
2013-01-05 19:43:11 +08:00
|
|
|
|
2018-01-31 00:17:22 +08:00
|
|
|
bool TargetTransformInfo::useColdCCForColdCall(Function &F) const {
|
|
|
|
return TTIImpl->useColdCCForColdCall(F);
|
|
|
|
}
|
|
|
|
|
2020-05-05 23:57:55 +08:00
|
|
|
unsigned
|
|
|
|
TargetTransformInfo::getScalarizationOverhead(VectorType *Ty,
|
|
|
|
const APInt &DemandedElts,
|
|
|
|
bool Insert, bool Extract) const {
|
2020-04-29 18:39:13 +08:00
|
|
|
return TTIImpl->getScalarizationOverhead(Ty, DemandedElts, Insert, Extract);
|
2017-01-26 15:03:25 +08:00
|
|
|
}
|
|
|
|
|
2020-04-15 20:43:26 +08:00
|
|
|
unsigned TargetTransformInfo::getOperandsScalarizationOverhead(
|
|
|
|
ArrayRef<const Value *> Args, unsigned VF) const {
|
2017-01-26 15:03:25 +08:00
|
|
|
return TTIImpl->getOperandsScalarizationOverhead(Args, VF);
|
|
|
|
}
|
|
|
|
|
2017-04-12 20:41:37 +08:00
|
|
|
bool TargetTransformInfo::supportsEfficientVectorElementLoadStore() const {
|
|
|
|
return TTIImpl->supportsEfficientVectorElementLoadStore();
|
|
|
|
}
|
|
|
|
|
2020-04-15 20:43:26 +08:00
|
|
|
bool TargetTransformInfo::enableAggressiveInterleaving(
|
|
|
|
bool LoopHasReductions) const {
|
2015-03-07 07:12:04 +08:00
|
|
|
return TTIImpl->enableAggressiveInterleaving(LoopHasReductions);
|
|
|
|
}
|
|
|
|
|
2019-06-25 16:04:13 +08:00
|
|
|
TargetTransformInfo::MemCmpExpansionOptions
|
|
|
|
TargetTransformInfo::enableMemCmpExpansion(bool OptSize, bool IsZeroCmp) const {
|
|
|
|
return TTIImpl->enableMemCmpExpansion(OptSize, IsZeroCmp);
|
2017-06-01 01:12:38 +08:00
|
|
|
}
|
|
|
|
|
2015-08-10 22:50:54 +08:00
|
|
|
bool TargetTransformInfo::enableInterleavedAccessVectorization() const {
|
|
|
|
return TTIImpl->enableInterleavedAccessVectorization();
|
|
|
|
}
|
|
|
|
|
2018-10-14 16:50:06 +08:00
|
|
|
bool TargetTransformInfo::enableMaskedInterleavedAccessVectorization() const {
|
|
|
|
return TTIImpl->enableMaskedInterleavedAccessVectorization();
|
|
|
|
}
|
|
|
|
|
2016-04-15 04:42:18 +08:00
|
|
|
bool TargetTransformInfo::isFPVectorizationPotentiallyUnsafe() const {
|
|
|
|
return TTIImpl->isFPVectorizationPotentiallyUnsafe();
|
|
|
|
}
|
|
|
|
|
2016-08-05 00:38:44 +08:00
|
|
|
bool TargetTransformInfo::allowsMisalignedMemoryAccesses(LLVMContext &Context,
|
|
|
|
unsigned BitWidth,
|
2016-07-12 04:46:17 +08:00
|
|
|
unsigned AddressSpace,
|
|
|
|
unsigned Alignment,
|
|
|
|
bool *Fast) const {
|
2020-04-15 20:43:26 +08:00
|
|
|
return TTIImpl->allowsMisalignedMemoryAccesses(Context, BitWidth,
|
|
|
|
AddressSpace, Alignment, Fast);
|
2016-07-12 04:46:17 +08:00
|
|
|
}
|
|
|
|
|
2013-01-07 11:16:03 +08:00
|
|
|
TargetTransformInfo::PopcntSupportKind
|
|
|
|
TargetTransformInfo::getPopcntSupport(unsigned IntTyWidthInBit) const {
|
[PM] Change the core design of the TTI analysis to use a polymorphic
type erased interface and a single analysis pass rather than an
extremely complex analysis group.
The end result is that the TTI analysis can contain a type erased
implementation that supports the polymorphic TTI interface. We can build
one from a target-specific implementation or from a dummy one in the IR.
I've also factored all of the code into "mix-in"-able base classes,
including CRTP base classes to facilitate calling back up to the most
specialized form when delegating horizontally across the surface. These
aren't as clean as I would like and I'm planning to work on cleaning
some of this up, but I wanted to start by putting it into the right form.
There are a number of reasons for this change, and this particular
design. The first and foremost reason is that an analysis group is
complete overkill, and the chaining delegation strategy was so opaque,
confusing, and high overhead that TTI was suffering greatly for it.
Several of the TTI functions had failed to be implemented in all places
because of the chaining-based delegation making there be no checking of
this. A few other functions were implemented with incorrect delegation.
The message to me was very clear working on this -- the delegation and
analysis group structure was too confusing to be useful here.
The other reason, of course, is that this is a *much* more natural fit for
the new pass manager. This will lay the groundwork for a type-erased
per-function info object that can look up the correct subtarget and even
cache it.
Yet another benefit is that this will significantly simplify the
interaction of the pass managers and the TargetMachine. See the future
work below.
The downside of this change is that it is very, very verbose. I'm going
to work to improve that, but it is somewhat an implementation necessity
in C++ to do type erasure. =/ I discussed this design really extensively
with Eric and Hal prior to going down this path, and afterward showed
them the result. No one was really thrilled with it, but there doesn't
seem to be a substantially better alternative. Using a base class and
virtual method dispatch would make the code much shorter, but as
discussed in the update to the programmer's manual and elsewhere,
a polymorphic interface feels like the more principled approach even if
this is perhaps the least compelling example of it. ;]
Ultimately, there is still a lot more to be done here, but this was the
huge chunk that I couldn't really split things out of because this was
the interface change to TTI. I've tried to minimize all the other parts
of this. The follow up work should include at least:
1) Improving the TargetMachine interface by having it directly return
a TTI object. Because we have a non-pass object with value semantics
and an internal type erasure mechanism, we can narrow the interface
of the TargetMachine to *just* do what we need: build and return
a TTI object that we can then insert into the pass pipeline.
2) Make the TTI object be fully specialized for a particular function.
This will include splitting off a minimal form of it which is
sufficient for the inliner and the old pass manager.
3) Add a new pass manager analysis which produces TTI objects from the
target machine for each function. This may actually be done as part
of #2 in order to use the new analysis to implement #2.
4) Work on narrowing the API between TTI and the targets so that it is
easier to understand and less verbose to type erase.
5) Work on narrowing the API between TTI and its clients so that it is
easier to understand and less verbose to forward.
6) Try to improve the CRTP-based delegation. I feel like this code is
just a bit messy and exacerbating the complexity of implementing
the TTI in each target.
Many thanks to Eric and Hal for their help here. I ended up blocked on
this somewhat more abruptly than I expected, and so I appreciate getting
it sorted out very quickly.
Differential Revision: http://reviews.llvm.org/D7293
llvm-svn: 227669
2015-01-31 11:43:40 +08:00
|
|
|
return TTIImpl->getPopcntSupport(IntTyWidthInBit);
|
2013-01-05 19:43:11 +08:00
|
|
|
}
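The type-erased design described in the [PM] commit message above can be illustrated with a minimal, self-contained sketch. Every name here (`SketchTTI`, `SketchBaseline`, `SketchFastTarget`) is hypothetical and chosen for illustration; this is not LLVM's actual class layout. It only shows the general idea of a value-semantics wrapper that forwards each query to a hidden implementation through a `TTIImpl`-style pointer.

```cpp
#include <memory>
#include <utility>

// Hypothetical sketch of a type-erased interface; not LLVM's real classes.
class SketchTTI {
  // Abstract interface that hides the concrete implementation type.
  struct Concept {
    virtual ~Concept() = default;
    virtual bool haveFastSqrt() const = 0;
  };

  // Wraps any type T that provides a matching haveFastSqrt() member.
  template <typename T> struct Model final : Concept {
    T Impl;
    explicit Model(T I) : Impl(std::move(I)) {}
    bool haveFastSqrt() const override { return Impl.haveFastSqrt(); }
  };

  std::unique_ptr<Concept> TTIImpl;

public:
  // Build the wrapper from any implementation object, passed by value.
  template <typename T>
  explicit SketchTTI(T Impl)
      : TTIImpl(std::make_unique<Model<T>>(std::move(Impl))) {}

  // Public query: simply forwards to the erased implementation.
  bool haveFastSqrt() const { return TTIImpl->haveFastSqrt(); }
};

// Two hypothetical implementations with no common base class.
struct SketchBaseline {
  bool haveFastSqrt() const { return false; }
};
struct SketchFastTarget {
  bool haveFastSqrt() const { return true; }
};
```

The implementations need no shared base class; the wrapper only requires that they expose the member functions it forwards to, which is the property the commit message refers to as a polymorphic, type-erased interface.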
|
|
|
|
|
2013-08-23 18:27:02 +08:00
|
|
|
bool TargetTransformInfo::haveFastSqrt(Type *Ty) const {
|
2015-01-31 11:43:40 +08:00
|
|
|
return TTIImpl->haveFastSqrt(Ty);
|
2013-08-23 18:27:02 +08:00
|
|
|
}
|
|
|
|
|
2017-11-28 05:15:43 +08:00
|
|
|
bool TargetTransformInfo::isFCmpOrdCheaperThanFCmpZero(Type *Ty) const {
|
|
|
|
return TTIImpl->isFCmpOrdCheaperThanFCmpZero(Ty);
|
|
|
|
}
|
|
|
|
|
2015-08-06 02:08:10 +08:00
|
|
|
int TargetTransformInfo::getFPOpCost(Type *Ty) const {
|
|
|
|
int Cost = TTIImpl->getFPOpCost(Ty);
|
|
|
|
assert(Cost >= 0 && "TTI should not produce negative costs!");
|
|
|
|
return Cost;
|
2015-02-05 10:09:33 +08:00
|
|
|
}
|
|
|
|
|
2016-07-14 15:44:20 +08:00
|
|
|
int TargetTransformInfo::getIntImmCodeSizeCost(unsigned Opcode, unsigned Idx,
|
|
|
|
const APInt &Imm,
|
|
|
|
Type *Ty) const {
|
|
|
|
int Cost = TTIImpl->getIntImmCodeSizeCost(Opcode, Idx, Imm, Ty);
|
|
|
|
assert(Cost >= 0 && "TTI should not produce negative costs!");
|
|
|
|
return Cost;
|
|
|
|
}
|
|
|
|
|
2020-04-28 21:11:27 +08:00
|
|
|
int TargetTransformInfo::getIntImmCost(const APInt &Imm, Type *Ty,
|
|
|
|
TTI::TargetCostKind CostKind) const {
|
|
|
|
int Cost = TTIImpl->getIntImmCost(Imm, Ty, CostKind);
|
2015-08-06 02:08:10 +08:00
|
|
|
assert(Cost >= 0 && "TTI should not produce negative costs!");
|
|
|
|
return Cost;
|
2013-01-05 19:43:11 +08:00
|
|
|
}
|
|
|
|
|
2020-09-22 19:54:10 +08:00
|
|
|
int TargetTransformInfo::getIntImmCostInst(unsigned Opcode, unsigned Idx,
|
|
|
|
const APInt &Imm, Type *Ty,
|
|
|
|
TTI::TargetCostKind CostKind,
|
|
|
|
Instruction *Inst) const {
|
|
|
|
int Cost = TTIImpl->getIntImmCostInst(Opcode, Idx, Imm, Ty, CostKind, Inst);
|
2015-08-06 02:08:10 +08:00
|
|
|
assert(Cost >= 0 && "TTI should not produce negative costs!");
|
|
|
|
return Cost;
|
2014-01-25 10:02:55 +08:00
|
|
|
}
|
|
|
|
|
2020-04-28 21:11:27 +08:00
|
|
|
int
|
|
|
|
TargetTransformInfo::getIntImmCostIntrin(Intrinsic::ID IID, unsigned Idx,
|
|
|
|
const APInt &Imm, Type *Ty,
|
|
|
|
TTI::TargetCostKind CostKind) const {
|
|
|
|
int Cost = TTIImpl->getIntImmCostIntrin(IID, Idx, Imm, Ty, CostKind);
|
2015-08-06 02:08:10 +08:00
|
|
|
assert(Cost >= 0 && "TTI should not produce negative costs!");
|
|
|
|
return Cost;
|
2014-01-25 10:02:55 +08:00
|
|
|
}
|
|
|
|
|
recommit: [LoopVectorize][PowerPC] Estimate int and float register pressure separately in loop-vectorize
In loop-vectorize, the interleave count and vectorization factor depend on the number of target registers. Currently, it does not
estimate register pressure separately for different register classes (in particular, scalar float
types should not be counted together with scalar int types), so the estimate is inaccurate. Specifically,
it causes too much interleaving/unrolling, which results in too many register spills in the loop body and hurts performance.
So we need to classify the register classes at the IR level. Importantly, these are abstract register classes,
not the target register classes that the backend provides in its .td files. They are used to establish the mapping between
the types of IR values and the number of simultaneous live ranges to which we'd like to limit some set of those types.
For example, on the POWER target the register count is special when VSX is enabled. When VSX is enabled, there are 32 int scalar registers (GPR)
and 64 float scalar registers (VSR), while int and float vector registers both number 64 (VSR). So there should be 2 kinds of register class when VSX is enabled,
and 3 kinds of register class when VSX is NOT enabled.
On the POWER target, this gives a large (~30%) performance improvement in one specific benchmark (503.bwaves_r) of SPEC2017 and no other obvious regressions.
Differential revision: https://reviews.llvm.org/D67148
llvm-svn: 374634
2019-10-12 10:53:04 +08:00
|
|
|
unsigned TargetTransformInfo::getNumberOfRegisters(unsigned ClassID) const {
|
|
|
|
return TTIImpl->getNumberOfRegisters(ClassID);
|
|
|
|
}
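The abstract register classes described in the commit message above can be sketched as a small mapping from value kinds to class IDs and register counts. This is a sketch under stated assumptions: it models only the VSX-enabled case, uses the counts quoted in the commit message (32 GPRs, 64 VSRs), and all names (`SketchGPR`, `SketchVSR`, `sketchRegisterClassForType`, and so on) are hypothetical rather than the PowerPC backend's real identifiers.

```cpp
// Hypothetical abstract register classes for the VSX-enabled case only.
enum SketchRegClass : unsigned { SketchGPR = 0, SketchVSR = 1 };

// Scalar integers map to GPR; scalar floats and all vectors map to VSR.
inline unsigned sketchRegisterClassForType(bool Vector, bool IsFloatingPoint) {
  if (Vector || IsFloatingPoint)
    return SketchVSR;
  return SketchGPR;
}

// Register counts as quoted in the commit message: 32 GPRs and 64 VSRs.
inline unsigned sketchNumberOfRegisters(unsigned ClassID) {
  return ClassID == SketchVSR ? 64u : 32u;
}

// True when the simultaneous live ranges of one class do not exceed the
// number of registers available in that class.
inline bool sketchFitsInRegisters(unsigned ClassID, unsigned LiveRanges) {
  return LiveRanges <= sketchNumberOfRegisters(ClassID);
}
```

Counting live ranges per class rather than in one pool is what lets a loop with, say, 40 live float values pass the check while 40 live integer values would not.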
|
|
|
|
|
2020-04-15 20:43:26 +08:00
|
|
|
unsigned TargetTransformInfo::getRegisterClassForType(bool Vector,
|
|
|
|
Type *Ty) const {
|
2019-10-12 10:53:04 +08:00
|
|
|
return TTIImpl->getRegisterClassForType(Vector, Ty);
|
|
|
|
}
|
|
|
|
|
2020-04-15 20:43:26 +08:00
|
|
|
const char *TargetTransformInfo::getRegisterClassName(unsigned ClassID) const {
|
2019-10-12 10:53:04 +08:00
|
|
|
return TTIImpl->getRegisterClassName(ClassID);
|
2013-01-05 19:43:11 +08:00
|
|
|
}
|
|
|
|
|
2013-01-10 06:29:00 +08:00
|
|
|
unsigned TargetTransformInfo::getRegisterBitWidth(bool Vector) const {
|
2015-01-31 11:43:40 +08:00
|
|
|
return TTIImpl->getRegisterBitWidth(Vector);
|
2013-01-10 06:29:00 +08:00
|
|
|
}
|
|
|
|
|
2017-05-16 05:15:01 +08:00
|
|
|
unsigned TargetTransformInfo::getMinVectorRegisterBitWidth() const {
|
|
|
|
return TTIImpl->getMinVectorRegisterBitWidth();
|
|
|
|
}
|
|
|
|
|
2020-12-18 00:15:28 +08:00
|
|
|
Optional<unsigned> TargetTransformInfo::getMaxVScale() const {
|
|
|
|
return TTIImpl->getMaxVScale();
|
|
|
|
}
|
|
|
|
|
2018-03-28 00:14:11 +08:00
|
|
|
bool TargetTransformInfo::shouldMaximizeVectorBandwidth(bool OptSize) const {
|
|
|
|
return TTIImpl->shouldMaximizeVectorBandwidth(OptSize);
|
|
|
|
}
|
|
|
|
|
2018-04-14 04:16:32 +08:00
|
|
|
unsigned TargetTransformInfo::getMinimumVF(unsigned ElemWidth) const {
|
|
|
|
return TTIImpl->getMinimumVF(ElemWidth);
|
|
|
|
}
|
|
|
|
|
2020-11-25 02:42:43 +08:00
|
|
|
unsigned TargetTransformInfo::getMaximumVF(unsigned ElemWidth,
|
|
|
|
unsigned Opcode) const {
|
|
|
|
return TTIImpl->getMaximumVF(ElemWidth, Opcode);
|
|
|
|
}
|
|
|
|
|
2017-04-04 03:20:07 +08:00
|
|
|
bool TargetTransformInfo::shouldConsiderAddressTypePromotion(
|
|
|
|
const Instruction &I, bool &AllowPromotionWithoutCommonHeader) const {
|
|
|
|
return TTIImpl->shouldConsiderAddressTypePromotion(
|
|
|
|
I, AllowPromotionWithoutCommonHeader);
|
|
|
|
}
|
|
|
|
|
2016-01-22 02:28:36 +08:00
|
|
|
unsigned TargetTransformInfo::getCacheLineSize() const {
|
|
|
|
return TTIImpl->getCacheLineSize();
|
|
|
|
}
|
|
|
|
|
2020-04-15 20:43:26 +08:00
|
|
|
llvm::Optional<unsigned>
|
|
|
|
TargetTransformInfo::getCacheSize(CacheLevel Level) const {
|
Model cache size and associativity in TargetTransformInfo
Summary:
We add the precise cache sizes and associativity for the following Intel
architectures:
- Penryn
- Nehalem
- Westmere
- Sandy Bridge
- Ivy Bridge
- Haswell
- Broadwell
- Skylake
- Kabylake
For several months, Polly has used a performance model for BLAS computations that
derives optimal cache and register tile sizes from cache and latency
information (based on ideas from "Analytical Modeling Is Enough for High-Performance BLIS", by Tze Meng Low, published in TOMS 2016).
While bootstrapping this model, these target values have been kept in Polly.
However, as our implementation is now rather mature, it seems time to teach
LLVM itself about cache sizes.
Interestingly, L1 and L2 cache sizes are pretty constant across
micro-architectures, hence a set of architecture specific default values
seems like a good start. They can be expanded to more target specific values,
in case certain newer architectures require different values. For now, values for a set
of Intel architectures are provided.
Just as a little teaser, for a simple gemm kernel this model allows us to
improve performance from 1.2s to 0.27s. For gemm kernels with less optimal
memory layouts even larger speedups can be reported.
Reviewers: Meinersbur, bollu, singam-sanjay, hfinkel, gareevroman, fhahn, sebpop, efriedma, asb
Reviewed By: fhahn, asb
Subscribers: lsaba, asb, pollydev, llvm-commits
Differential Revision: https://reviews.llvm.org/D37051
llvm-svn: 311647
2017-08-24 17:46:25 +08:00
|
|
|
return TTIImpl->getCacheSize(Level);
|
|
|
|
}
|
|
|
|
|
2020-04-15 20:43:26 +08:00
|
|
|
llvm::Optional<unsigned>
|
|
|
|
TargetTransformInfo::getCacheAssociativity(CacheLevel Level) const {
|
2017-08-24 17:46:25 +08:00
|
|
|
return TTIImpl->getCacheAssociativity(Level);
|
|
|
|
}
|
|
|
|
|
2016-01-28 06:21:25 +08:00
|
|
|
unsigned TargetTransformInfo::getPrefetchDistance() const {
|
|
|
|
return TTIImpl->getPrefetchDistance();
|
|
|
|
}
|
|
|
|
|
2020-04-15 20:43:26 +08:00
|
|
|
unsigned TargetTransformInfo::getMinPrefetchStride(
|
|
|
|
unsigned NumMemAccesses, unsigned NumStridedMemAccesses,
|
|
|
|
unsigned NumPrefetches, bool HasCall) const {
|
2019-10-31 23:05:58 +08:00
|
|
|
return TTIImpl->getMinPrefetchStride(NumMemAccesses, NumStridedMemAccesses,
|
|
|
|
NumPrefetches, HasCall);
|
2016-03-18 08:27:38 +08:00
|
|
|
}
|
|
|
|
|
2016-03-18 08:27:43 +08:00
|
|
|
unsigned TargetTransformInfo::getMaxPrefetchIterationsAhead() const {
|
|
|
|
return TTIImpl->getMaxPrefetchIterationsAhead();
|
|
|
|
}
|
|
|
|
|
2019-10-31 23:05:58 +08:00
|
|
|
bool TargetTransformInfo::enableWritePrefetching() const {
|
|
|
|
return TTIImpl->enableWritePrefetching();
|
|
|
|
}
|
|
|
|
|
2015-05-07 01:12:25 +08:00
|
|
|
unsigned TargetTransformInfo::getMaxInterleaveFactor(unsigned VF) const {
|
|
|
|
return TTIImpl->getMaxInterleaveFactor(VF);
|
2013-01-09 09:15:42 +08:00
|
|
|
}
|
|
|
|
|
2018-10-05 22:34:04 +08:00
|
|
|
TargetTransformInfo::OperandValueKind
|
2020-06-23 21:07:44 +08:00
|
|
|
TargetTransformInfo::getOperandInfo(const Value *V,
|
|
|
|
OperandValueProperties &OpProps) {
|
2018-10-05 22:34:04 +08:00
|
|
|
OperandValueKind OpInfo = OK_AnyValue;
|
|
|
|
OpProps = OP_None;
|
|
|
|
|
2020-06-23 21:07:44 +08:00
|
|
|
if (const auto *CI = dyn_cast<ConstantInt>(V)) {
|
2018-10-05 22:34:04 +08:00
|
|
|
if (CI->getValue().isPowerOf2())
|
|
|
|
OpProps = OP_PowerOf2;
|
|
|
|
return OK_UniformConstantValue;
|
|
|
|
}
|
|
|
|
|
2018-11-14 23:04:08 +08:00
|
|
|
// A broadcast shuffle creates a uniform value.
|
|
|
|
// TODO: Add support for non-zero index broadcasts.
|
|
|
|
// TODO: Add support for different source vector width.
|
2020-06-23 21:07:44 +08:00
|
|
|
if (const auto *ShuffleInst = dyn_cast<ShuffleVectorInst>(V))
|
2018-11-14 23:04:08 +08:00
|
|
|
if (ShuffleInst->isZeroEltSplat())
|
|
|
|
OpInfo = OK_UniformValue;
|
|
|
|
|
2018-10-05 22:34:04 +08:00
|
|
|
const Value *Splat = getSplatValue(V);
|
|
|
|
|
|
|
|
// Check for a splat of a constant or for a non uniform vector of constants
|
|
|
|
// and check if the constant(s) are all powers of two.
|
|
|
|
if (isa<ConstantVector>(V) || isa<ConstantDataVector>(V)) {
|
|
|
|
OpInfo = OK_NonUniformConstantValue;
|
|
|
|
if (Splat) {
|
|
|
|
OpInfo = OK_UniformConstantValue;
|
|
|
|
if (auto *CI = dyn_cast<ConstantInt>(Splat))
|
|
|
|
if (CI->getValue().isPowerOf2())
|
|
|
|
OpProps = OP_PowerOf2;
|
2020-06-23 21:07:44 +08:00
|
|
|
} else if (const auto *CDS = dyn_cast<ConstantDataSequential>(V)) {
|
2018-10-05 22:34:04 +08:00
|
|
|
OpProps = OP_PowerOf2;
|
|
|
|
for (unsigned I = 0, E = CDS->getNumElements(); I != E; ++I) {
|
|
|
|
if (auto *CI = dyn_cast<ConstantInt>(CDS->getElementAsConstant(I)))
|
|
|
|
if (CI->getValue().isPowerOf2())
|
|
|
|
continue;
|
|
|
|
OpProps = OP_None;
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
// Check for a splat of a uniform value. This is not loop aware, so return
|
|
|
|
// true only for the obviously uniform cases (argument, globalvalue)
|
|
|
|
if (Splat && (isa<Argument>(Splat) || isa<GlobalValue>(Splat)))
|
|
|
|
OpInfo = OK_UniformValue;
|
|
|
|
|
|
|
|
return OpInfo;
|
|
|
|
}
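The element-wise power-of-two check that `getOperandInfo` performs on a non-splat constant vector can be sketched without any LLVM dependencies. The names below (`SketchProps`, `isPowerOf2Sketch`, `classifyElementsSketch`) are hypothetical, and the sketch assumes every element is an unsigned integer, whereas the original also resets the property to none when an element is not a `ConstantInt`.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the OP_None / OP_PowerOf2 properties.
enum class SketchProps { None, PowerOf2 };

// True when exactly one bit of V is set.
inline bool isPowerOf2Sketch(std::uint64_t V) {
  return V != 0 && (V & (V - 1)) == 0;
}

// Start from PowerOf2 and fall back to None at the first element that is not
// a power of two, mirroring the continue/break structure of the loop above.
inline SketchProps
classifyElementsSketch(const std::vector<std::uint64_t> &Elts) {
  for (std::uint64_t E : Elts)
    if (!isPowerOf2Sketch(E))
      return SketchProps::None;
  return SketchProps::PowerOf2;
}
```

As in the original loop, a single non-conforming element is enough to drop the property for the whole vector.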
|
|
|
|
|
2015-08-06 02:08:10 +08:00
|
|
|
int TargetTransformInfo::getArithmeticInstrCost(
|
2020-04-28 21:11:27 +08:00
|
|
|
unsigned Opcode, Type *Ty, TTI::TargetCostKind CostKind,
|
|
|
|
OperandValueKind Opd1Info,
|
[PM] Change the core design of the TTI analysis to use a polymorphic
type erased interface and a single analysis pass rather than an
extremely complex analysis group.
The end result is that the TTI analysis can contain a type erased
implementation that supports the polymorphic TTI interface. We can build
one from a target-specific implementation or from a dummy one in the IR.
I've also factored all of the code into "mix-in"-able base classes,
including CRTP base classes to facilitate calling back up to the most
specialized form when delegating horizontally across the surface. These
aren't as clean as I would like and I'm planning to work on cleaning
some of this up, but I wanted to start by putting into the right form.
There are a number of reasons for this change, and this particular
design. The first and foremost reason is that an analysis group is
complete overkill, and the chaining delegation strategy was so opaque,
confusing, and high overhead that TTI was suffering greatly for it.
Several of the TTI functions had failed to be implemented in all places
because of the chaining-based delegation making there be no checking of
this. A few other functions were implemented with incorrect delegation.
The message to me was very clear working on this -- the delegation and
analysis group structure was too confusing to be useful here.
The other reason of course is that this is *much* more natural fit for
the new pass manager. This will lay the ground work for a type-erased
per-function info object that can look up the correct subtarget and even
cache it.
Yet another benefit is that this will significantly simplify the
interaction of the pass managers and the TargetMachine. See the future
work below.
The downside of this change is that it is very, very verbose. I'm going
to work to improve that, but it is somewhat an implementation necessity
in C++ to do type erasure. =/ I discussed this design really extensively
with Eric and Hal prior to going down this path, and afterward showed
them the result. No one was really thrilled with it, but there doesn't
seem to be a substantially better alternative. Using a base class and
virtual method dispatch would make the code much shorter, but as
discussed in the update to the programmer's manual and elsewhere,
a polymorphic interface feels like the more principled approach even if
this is perhaps the least compelling example of it. ;]
Ultimately, there is still a lot more to be done here, but this was the
huge chunk that I couldn't really split things out of because this was
the interface change to TTI. I've tried to minimize all the other parts
of this. The follow up work should include at least:
1) Improving the TargetMachine interface by having it directly return
a TTI object. Because we have a non-pass object with value semantics
and an internal type erasure mechanism, we can narrow the interface
of the TargetMachine to *just* do what we need: build and return
a TTI object that we can then insert into the pass pipeline.
2) Make the TTI object be fully specialized for a particular function.
This will include splitting off a minimal form of it which is
sufficient for the inliner and the old pass manager.
3) Add a new pass manager analysis which produces TTI objects from the
target machine for each function. This may actually be done as part
of #2 in order to use the new analysis to implement #2.
4) Work on narrowing the API between TTI and the targets so that it is
easier to understand and less verbose to type erase.
5) Work on narrowing the API between TTI and its clients so that it is
easier to understand and less verbose to forward.
6) Try to improve the CRTP-based delegation. I feel like this code is
just a bit messy and exacerbating the complexity of implementing
the TTI in each target.
Many thanks to Eric and Hal for their help here. I ended up blocked on
this somewhat more abruptly than I expected, and so I appreciate getting
it sorted out very quickly.
Differential Revision: http://reviews.llvm.org/D7293
llvm-svn: 227669
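The trade-off described above, between a base class with virtual dispatch and a value-semantic type-erased wrapper, can be illustrated with a minimal sketch. This is not the actual TTI implementation: `CostInfo`, `GenericTarget`, `FoldingTarget` and `getShiftCost` are hypothetical names chosen for illustration, and the real interface forwards many more methods.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical value-semantic wrapper using the Concept/Model idiom.
// None of these names are taken from LLVM.
class CostInfo {
  // Internal abstract interface; clients never see it.
  struct Concept {
    virtual ~Concept() = default;
    virtual int getShiftCost() const = 0;
  };

  // One Model per concrete target type; it forwards to the wrapped object.
  template <typename T> struct Model final : Concept {
    explicit Model(T Target) : Target(std::move(Target)) {}
    int getShiftCost() const override { return Target.getShiftCost(); }
    T Target;
  };

  std::unique_ptr<Concept> Impl;

public:
  // Any type with a matching getShiftCost() can be wrapped; it does not
  // need to inherit from anything.
  template <typename T>
  explicit CostInfo(T Target)
      : Impl(std::make_unique<Model<T>>(std::move(Target))) {}

  // Public forwarding method, checking the result as the wrappers in this
  // file do.
  int getShiftCost() const {
    int Cost = Impl->getShiftCost();
    assert(Cost >= 0 && "costs should not be negative");
    return Cost;
  }
};

// Two unrelated illustrative "targets" with made-up cost values.
struct GenericTarget {
  int getShiftCost() const { return 1; }
};
struct FoldingTarget {
  int getShiftCost() const { return 0; }
};
```

The verbosity the message complains about is visible even here: every forwarded method has to be written three times (in `Concept`, in `Model`, and in the public wrapper).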
2015-01-31 11:43:40 +08:00
|
|
|
OperandValueKind Opd2Info, OperandValueProperties Opd1PropInfo,
|
[ARM] Teach the Arm cost model that a Shift can be folded into other instructions
This attempts to teach the cost model in Arm that code such as:
%s = shl i32 %a, 3
%a = and i32 %s, %b
Can under Arm or Thumb2 become:
and r0, r1, r2, lsl #3
So the cost of the shift can essentially be free. To do this without
trying to artificially adjust the cost of the "and" instruction, it
needs to get the users of the shl and check if they are a type of
instruction that the shift can be folded into. And so it needs to have
access to the actual instruction in getArithmeticInstrCost, which if
available is added as an extra parameter much like getCastInstrCost.
We otherwise limit it to shifts with a single user, which should
hopefully handle most of the cases. The instructions that the shift can
be folded into include ADC, ADD, AND, BIC, CMP, EOR, MVN, ORR, ORN, RSB,
SBC and SUB. At the IR level this translates to Add, Sub, And, Or, Xor
and ICmp.
Differential Revision: https://reviews.llvm.org/D70966
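The rule described in this message (a shift is free when its single user is an instruction that can fold it) can be sketched in isolation. The following is a toy model only: `Op`, `Inst`, `canFoldShiftInto` and `shiftCost` are hypothetical names, the cost values are illustrative, and the real ARM cost model works on `llvm::Instruction` and handles more cases.

```cpp
#include <cassert>
#include <vector>

// Toy opcodes; a hypothetical stand-in for LLVM IR opcodes.
enum class Op { Shl, Add, Sub, And, Or, Xor, ICmp, Mul, Store };

// Toy instruction: an opcode plus the instructions that use its result.
struct Inst {
  Op Opcode;
  std::vector<const Inst *> Users;
};

// True for the IR opcodes the message lists as able to absorb a shift.
bool canFoldShiftInto(Op Opcode) {
  switch (Opcode) {
  case Op::Add:
  case Op::Sub:
  case Op::And:
  case Op::Or:
  case Op::Xor:
  case Op::ICmp:
    return true;
  default:
    return false;
  }
}

// Returns 0 when the shift has exactly one user and that user can fold it;
// otherwise returns BaseCost.
int shiftCost(const Inst &Shift, int BaseCost = 1) {
  if (Shift.Opcode != Op::Shl)
    return BaseCost;
  if (Shift.Users.size() == 1 &&
      canFoldShiftInto(Shift.Users.front()->Opcode))
    return 0;
  return BaseCost;
}
```

Restricting the check to single-user shifts keeps the sketch conservative: with several users the shifted value would still have to be materialized for at least one of them.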
2019-12-08 23:33:24 +08:00
|
|
|
OperandValueProperties Opd2PropInfo, ArrayRef<const Value *> Args,
|
|
|
|
const Instruction *CxtI) const {
|
|
|
|
int Cost = TTIImpl->getArithmeticInstrCost(
|
2020-04-28 21:11:27 +08:00
|
|
|
Opcode, Ty, CostKind, Opd1Info, Opd2Info, Opd1PropInfo, Opd2PropInfo,
|
|
|
|
Args, CxtI);
|
2015-08-06 02:08:10 +08:00
|
|
|
assert(Cost >= 0 && "TTI should not produce negative costs!");
|
|
|
|
return Cost;
|
2013-01-05 19:43:11 +08:00
|
|
|
}
|
|
|
|
|
2020-04-17 20:29:31 +08:00
|
|
|
int TargetTransformInfo::getShuffleCost(ShuffleKind Kind, VectorType *Ty,
|
|
|
|
int Index, VectorType *SubTp) const {
|
2015-08-06 02:08:10 +08:00
|
|
|
int Cost = TTIImpl->getShuffleCost(Kind, Ty, Index, SubTp);
|
|
|
|
assert(Cost >= 0 && "TTI should not produce negative costs!");
|
|
|
|
return Cost;
|
2013-01-05 19:43:11 +08:00
|
|
|
}
|
|
|
|
|
[Analysis] TTI: Add CastContextHint for getCastInstrCost
Currently, getCastInstrCost has limited information about the cast it's
rating, often just the opcode and types. Sometimes there is a context
instruction as well, but it isn't trustworthy: for instance, when the
vectorizer is rating a plan, it calls getCastInstrCost with the old
instructions when, in fact, it's trying to evaluate the cost of the
instruction post-vectorization. Thus, the current system can get the
cost of certain casts incorrect as the correct cost can vary greatly
based on the context in which it's used.
For example, if the vectorizer queries getCastInstrCost to evaluate the
cost of a sext(load) with tail predication enabled, getCastInstrCost
will think it's free most of the time, but it's not always free. On ARM
MVE, a VLD2 group cannot be extended like a normal VLDR can. Similar
situations can come up with how masked loads can be extended when being
split.
To fix that, this patch adds a new parameter to getCastInstrCost to give
it a hint about the context of the cast. It adds a CastContextHint enum
which contains the type of the load/store being created by the
vectorizer - one for each of the types it can produce.
Original patch by Pierre van Houtryve
Differential Revision: https://reviews.llvm.org/D79162
2020-07-29 20:32:53 +08:00
TTI::CastContextHint
TargetTransformInfo::getCastContextHint(const Instruction *I) {
  if (!I)
    return CastContextHint::None;

  auto getLoadStoreKind = [](const Value *V, unsigned LdStOp, unsigned MaskedOp,
                             unsigned GatScatOp) {
    const Instruction *I = dyn_cast<Instruction>(V);
    if (!I)
      return CastContextHint::None;

    if (I->getOpcode() == LdStOp)
      return CastContextHint::Normal;

    if (const IntrinsicInst *II = dyn_cast<IntrinsicInst>(I)) {
      if (II->getIntrinsicID() == MaskedOp)
        return TTI::CastContextHint::Masked;
      if (II->getIntrinsicID() == GatScatOp)
        return TTI::CastContextHint::GatherScatter;
    }

    return TTI::CastContextHint::None;
  };

  switch (I->getOpcode()) {
  case Instruction::ZExt:
  case Instruction::SExt:
  case Instruction::FPExt:
    return getLoadStoreKind(I->getOperand(0), Instruction::Load,
                            Intrinsic::masked_load, Intrinsic::masked_gather);
  case Instruction::Trunc:
  case Instruction::FPTrunc:
    if (I->hasOneUse())
      return getLoadStoreKind(*I->user_begin(), Instruction::Store,
                              Intrinsic::masked_store,
                              Intrinsic::masked_scatter);
    break;
  default:
    return CastContextHint::None;
  }

  return TTI::CastContextHint::None;
}
2020-04-15 20:43:26 +08:00
|
|
|
int TargetTransformInfo::getCastInstrCost(unsigned Opcode, Type *Dst, Type *Src,
|
2020-07-29 20:32:53 +08:00
|
|
|
CastContextHint CCH,
|
2020-04-28 21:11:27 +08:00
|
|
|
TTI::TargetCostKind CostKind,
|
2020-04-15 20:43:26 +08:00
|
|
|
const Instruction *I) const {
|
|
|
|
assert((I == nullptr || I->getOpcode() == Opcode) &&
|
|
|
|
"Opcode should reflect passed instruction.");
|
2020-07-29 20:32:53 +08:00
|
|
|
int Cost = TTIImpl->getCastInstrCost(Opcode, Dst, Src, CCH, CostKind, I);
|
2015-08-06 02:08:10 +08:00
|
|
|
assert(Cost >= 0 && "TTI should not produce negative costs!");
|
|
|
|
return Cost;
|
2013-01-05 19:43:11 +08:00
|
|
|
}
|
|
|
|
|
2016-04-27 23:20:21 +08:00
int TargetTransformInfo::getExtractWithExtendCost(unsigned Opcode, Type *Dst,
                                                  VectorType *VecTy,
                                                  unsigned Index) const {
  int Cost = TTIImpl->getExtractWithExtendCost(Opcode, Dst, VecTy, Index);
  assert(Cost >= 0 && "TTI should not produce negative costs!");
  return Cost;
}
2020-04-28 21:11:27 +08:00
|
|
|
int TargetTransformInfo::getCFInstrCost(unsigned Opcode,
|
|
|
|
TTI::TargetCostKind CostKind) const {
|
|
|
|
int Cost = TTIImpl->getCFInstrCost(Opcode, CostKind);
|
2015-08-06 02:08:10 +08:00
|
|
|
assert(Cost >= 0 && "TTI should not produce negative costs!");
|
|
|
|
return Cost;
|
2013-01-05 19:43:11 +08:00
|
|
|
}
|
|
|
|
|
2015-08-06 02:08:10 +08:00
|
|
|
int TargetTransformInfo::getCmpSelInstrCost(unsigned Opcode, Type *ValTy,
|
2020-04-15 20:43:26 +08:00
|
|
|
Type *CondTy,
|
2020-11-02 20:40:34 +08:00
|
|
|
CmpInst::Predicate VecPred,
|
2020-04-28 21:11:27 +08:00
|
|
|
TTI::TargetCostKind CostKind,
|
2020-04-15 20:43:26 +08:00
|
|
|
const Instruction *I) const {
|
|
|
|
assert((I == nullptr || I->getOpcode() == Opcode) &&
|
|
|
|
"Opcode should reflect passed instruction.");
|
2020-11-02 20:40:34 +08:00
|
|
|
int Cost =
|
|
|
|
TTIImpl->getCmpSelInstrCost(Opcode, ValTy, CondTy, VecPred, CostKind, I);
|
2015-08-06 02:08:10 +08:00
|
|
|
assert(Cost >= 0 && "TTI should not produce negative costs!");
|
|
|
|
return Cost;
|
2013-01-05 19:43:11 +08:00
|
|
|
}
|
|
|
|
|
2015-08-06 02:08:10 +08:00
|
|
|
int TargetTransformInfo::getVectorInstrCost(unsigned Opcode, Type *Val,
|
|
|
|
unsigned Index) const {
|
|
|
|
int Cost = TTIImpl->getVectorInstrCost(Opcode, Val, Index);
|
|
|
|
assert(Cost >= 0 && "TTI should not produce negative costs!");
|
|
|
|
return Cost;
|
2013-01-05 19:43:11 +08:00
|
|
|
}
|
|
|
|
|
2015-08-06 02:08:10 +08:00
|
|
|
int TargetTransformInfo::getMemoryOpCost(unsigned Opcode, Type *Src,
|
2020-05-19 10:16:06 +08:00
|
|
|
Align Alignment, unsigned AddressSpace,
|
2020-04-28 21:11:27 +08:00
|
|
|
TTI::TargetCostKind CostKind,
|
2017-04-12 19:49:08 +08:00
|
|
|
const Instruction *I) const {
|
2020-04-15 20:43:26 +08:00
|
|
|
assert((I == nullptr || I->getOpcode() == Opcode) &&
|
|
|
|
"Opcode should reflect passed instruction.");
|
2020-04-28 21:11:27 +08:00
|
|
|
int Cost = TTIImpl->getMemoryOpCost(Opcode, Src, Alignment, AddressSpace,
|
|
|
|
CostKind, I);
|
2015-08-06 02:08:10 +08:00
|
|
|
assert(Cost >= 0 && "TTI should not produce negative costs!");
|
|
|
|
return Cost;
|
2013-01-05 19:43:11 +08:00
|
|
|
}
|
|
|
|
|
2020-06-26 18:14:16 +08:00
|
|
|
int TargetTransformInfo::getMaskedMemoryOpCost(
|
|
|
|
unsigned Opcode, Type *Src, Align Alignment, unsigned AddressSpace,
|
|
|
|
TTI::TargetCostKind CostKind) const {
|
2015-08-06 02:08:10 +08:00
|
|
|
int Cost =
|
2020-04-28 21:11:27 +08:00
|
|
|
TTIImpl->getMaskedMemoryOpCost(Opcode, Src, Alignment, AddressSpace,
|
|
|
|
CostKind);
|
2015-08-06 02:08:10 +08:00
|
|
|
assert(Cost >= 0 && "TTI should not produce negative costs!");
|
|
|
|
return Cost;
|
2015-01-25 16:44:46 +08:00
|
|
|
}
|
|
|
|
|
2020-06-26 19:08:27 +08:00
|
|
|
int TargetTransformInfo::getGatherScatterOpCost(
|
|
|
|
unsigned Opcode, Type *DataTy, const Value *Ptr, bool VariableMask,
|
|
|
|
Align Alignment, TTI::TargetCostKind CostKind, const Instruction *I) const {
|
2015-12-29 04:10:59 +08:00
|
|
|
int Cost = TTIImpl->getGatherScatterOpCost(Opcode, DataTy, Ptr, VariableMask,
|
2020-04-28 21:11:27 +08:00
|
|
|
Alignment, CostKind, I);
|
2015-12-29 04:10:59 +08:00
|
|
|
assert(Cost >= 0 && "TTI should not produce negative costs!");
|
|
|
|
return Cost;
|
|
|
|
}
|
|
|
|
|
2015-08-06 02:08:10 +08:00
|
|
|
int TargetTransformInfo::getInterleavedMemoryOpCost(
|
[LoopVectorize] Teach Loop Vectorizer about interleaved memory accesses.
Interleaved memory accesses are grouped and vectorized into vector load/store and shufflevector.
E.g. for (i = 0; i < N; i+=2) {
a = A[i]; // load of even element
b = A[i+1]; // load of odd element
... // operations on a, b, c, d
A[i] = c; // store of even element
A[i+1] = d; // store of odd element
}
The loads of even and odd elements are identified as an interleave load group, which will be transformed into vectorized IR like:
%wide.vec = load <8 x i32>, <8 x i32>* %ptr
%vec.even = shufflevector <8 x i32> %wide.vec, <8 x i32> undef, <4 x i32> <i32 0, i32 2, i32 4, i32 6>
%vec.odd = shufflevector <8 x i32> %wide.vec, <8 x i32> undef, <4 x i32> <i32 1, i32 3, i32 5, i32 7>
The stores of even and odd elements are identified as an interleave store group, which will be transformed into vectorized IR like:
%interleaved.vec = shufflevector <4 x i32> %vec.even, %vec.odd, <8 x i32> <i32 0, i32 4, i32 1, i32 5, i32 2, i32 6, i32 3, i32 7>
store <8 x i32> %interleaved.vec, <8 x i32>* %ptr
This optimization is currently disabled by default. To try it, add '-enable-interleaved-mem-accesses=true'.
llvm-svn: 239291
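The shuffle masks in the IR above follow a simple pattern, which the sketch below reproduces for an arbitrary interleave factor. The helper names `strideMask` and `interleaveMask` are hypothetical and only illustrate how the lane indices are derived; they are not claimed to be the functions the vectorizer itself uses.

```cpp
#include <cassert>
#include <vector>

// Mask selecting every Factor-th lane of a wide vector, starting at Index.
// With Index = 0, Factor = 2, VF = 4 this yields <0, 2, 4, 6>, i.e. the
// "even" elements of the wide load.
std::vector<unsigned> strideMask(unsigned Index, unsigned Factor,
                                 unsigned VF) {
  std::vector<unsigned> Mask;
  for (unsigned I = 0; I < VF; ++I)
    Mask.push_back(Index + I * Factor);
  return Mask;
}

// Mask interleaving Factor concatenated vectors of VF lanes each.
// With VF = 4, Factor = 2 this yields <0, 4, 1, 5, 2, 6, 3, 7>, the mask
// used before the wide store.
std::vector<unsigned> interleaveMask(unsigned VF, unsigned Factor) {
  std::vector<unsigned> Mask;
  for (unsigned I = 0; I < VF; ++I)
    for (unsigned J = 0; J < Factor; ++J)
      Mask.push_back(J * VF + I);
  return Mask;
}
```

A load group of factor 2 therefore needs one wide load plus one stride shuffle per member, and a store group needs one interleave shuffle plus one wide store, which is the shape of the cost that getInterleavedMemoryOpCost is asked to estimate.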
2015-06-08 14:39:56 +08:00
|
|
|
unsigned Opcode, Type *VecTy, unsigned Factor, ArrayRef<unsigned> Indices,
|
2020-06-26 19:00:53 +08:00
|
|
|
Align Alignment, unsigned AddressSpace, TTI::TargetCostKind CostKind,
|
2020-04-28 21:11:27 +08:00
|
|
|
bool UseMaskForCond, bool UseMaskForGaps) const {
|
2020-04-15 20:43:26 +08:00
|
|
|
int Cost = TTIImpl->getInterleavedMemoryOpCost(
|
2020-04-28 21:11:27 +08:00
|
|
|
Opcode, VecTy, Factor, Indices, Alignment, AddressSpace, CostKind,
|
|
|
|
UseMaskForCond, UseMaskForGaps);
|
2015-08-06 02:08:10 +08:00
|
|
|
assert(Cost >= 0 && "TTI should not produce negative costs!");
|
|
|
|
return Cost;
|
2015-06-08 14:39:56 +08:00
|
|
|
}
|
|
|
|
|
2020-05-20 16:18:42 +08:00
|
|
|
int
|
|
|
|
TargetTransformInfo::getIntrinsicInstrCost(const IntrinsicCostAttributes &ICA,
|
|
|
|
TTI::TargetCostKind CostKind) const {
|
|
|
|
int Cost = TTIImpl->getIntrinsicInstrCost(ICA, CostKind);
|
2015-12-29 04:10:59 +08:00
|
|
|
assert(Cost >= 0 && "TTI should not produce negative costs!");
|
|
|
|
return Cost;
|
|
|
|
}
|
|
|
|
|
2015-08-06 02:08:10 +08:00
|
|
|
int TargetTransformInfo::getCallInstrCost(Function *F, Type *RetTy,
|
2020-04-28 21:11:27 +08:00
|
|
|
ArrayRef<Type *> Tys,
|
|
|
|
TTI::TargetCostKind CostKind) const {
|
|
|
|
int Cost = TTIImpl->getCallInstrCost(F, RetTy, Tys, CostKind);
|
2015-08-06 02:08:10 +08:00
|
|
|
assert(Cost >= 0 && "TTI should not produce negative costs!");
|
|
|
|
return Cost;
|
2015-03-18 03:26:23 +08:00
|
|
|
}
|
|
|
|
|
2013-01-05 19:43:11 +08:00
|
|
|
unsigned TargetTransformInfo::getNumberOfParts(Type *Tp) const {
|
2015-01-31 11:43:40 +08:00
|
|
|
return TTIImpl->getNumberOfParts(Tp);
|
2013-01-05 19:43:11 +08:00
|
|
|
}
|
|
|
|
|
2015-08-06 02:08:10 +08:00
|
|
|
int TargetTransformInfo::getAddressComputationCost(Type *Tp,
|
2017-01-05 22:03:41 +08:00
|
|
|
ScalarEvolution *SE,
|
|
|
|
const SCEV *Ptr) const {
|
|
|
|
int Cost = TTIImpl->getAddressComputationCost(Tp, SE, Ptr);
|
2015-08-06 02:08:10 +08:00
|
|
|
assert(Cost >= 0 && "TTI should not produce negative costs!");
|
|
|
|
return Cost;
|
2013-02-08 22:50:48 +08:00
|
|
|
}
|
2013-01-05 19:43:11 +08:00
|
|
|
|
2019-04-30 18:28:50 +08:00
int TargetTransformInfo::getMemcpyCost(const Instruction *I) const {
  int Cost = TTIImpl->getMemcpyCost(I);
  assert(Cost >= 0 && "TTI should not produce negative costs!");
  return Cost;
}
2020-04-17 20:29:31 +08:00
|
|
|
int TargetTransformInfo::getArithmeticReductionCost(unsigned Opcode,
|
|
|
|
VectorType *Ty,
|
2020-04-28 21:11:27 +08:00
|
|
|
bool IsPairwiseForm,
|
|
|
|
TTI::TargetCostKind CostKind) const {
|
|
|
|
int Cost = TTIImpl->getArithmeticReductionCost(Opcode, Ty, IsPairwiseForm,
|
|
|
|
CostKind);
|
2015-08-06 02:08:10 +08:00
|
|
|
assert(Cost >= 0 && "TTI should not produce negative costs!");
|
|
|
|
return Cost;
|
2015-01-31 11:43:40 +08:00
|
|
|
}
|
|
|
|
|
2020-04-28 21:11:27 +08:00
|
|
|
int TargetTransformInfo::getMinMaxReductionCost(
|
|
|
|
VectorType *Ty, VectorType *CondTy, bool IsPairwiseForm, bool IsUnsigned,
|
|
|
|
TTI::TargetCostKind CostKind) const {
|
2017-09-08 21:49:36 +08:00
|
|
|
int Cost =
|
2020-04-28 21:11:27 +08:00
|
|
|
TTIImpl->getMinMaxReductionCost(Ty, CondTy, IsPairwiseForm, IsUnsigned,
|
|
|
|
CostKind);
|
2017-09-08 21:49:36 +08:00
|
|
|
assert(Cost >= 0 && "TTI should not produce negative costs!");
|
|
|
|
return Cost;
|
|
|
|
}
|
|
|
|
|
[LV][ARM] Inloop reduction cost modelling
This adds cost modelling for the inloop vectorization added in
745bf6cf4471. Up until now they have been modelled as the original
underlying instruction, usually an add. This happens to work OK for MVE
with instructions that are reducing into the same type as they are
working on. But MVE's instructions can perform the equivalent of an
extended MLA as a single instruction:
%sa = sext <16 x i8> A to <16 x i32>
%sb = sext <16 x i8> B to <16 x i32>
%m = mul <16 x i32> %sa, %sb
%r = vecreduce.add(%m)
->
R = VMLADAV A, B
There are other instructions for performing add reductions of
v4i32/v8i16/v16i8 into i32 (VADDV), for doing the same with v4i32->i64
(VADDLV) and for performing a v4i32/v8i16 MLA into an i64 (VMLALDAV).
The i64 are particularly interesting as there are no native i64 add/mul
instructions, leading to the i64 add and mul naturally getting very
high costs.
Also worth mentioning, under NEON there is the concept of a sdot/udot
instruction which performs a partial reduction from a v16i8 to a v4i32.
They extend and mul/sum the first four elements from the inputs into the
first element of the output, repeating for each of the four output
lanes. They could possibly be represented in the same way as above in
llvm, so long as a vecreduce.add could perform a partial reduction. The
vectorizer would then produce a combination of in-loop and outer-loop
reductions to efficiently use the sdot and udot instructions. Although
this patch does not do that yet, it does suggest that separating the
input reduction type from the produced result type is a useful concept
to model. It also shows that a MLA reduction as a single instruction is
fairly common.
This patch attempts to improve the cost modelling of in-loop reductions
by:
- Adding some pattern matching in the loop vectorizer cost model to
match extended reduction patterns that are optionally extended and/or
MLA patterns. This marks the cost of the reduction instruction correctly
and the sext/zext/mul leading up to it as free, which is otherwise
difficult to tell and may get a very high cost. (In the long run this
can hopefully be replaced by vplan producing a single node and costing
it correctly, but that is not yet something that vplan can do).
- getExtendedAddReductionCost is added to query the cost of these
extended reduction patterns.
- Expanded the ARM costs to account for these expanded sizes, which is a
fairly simple change in itself.
- Some minor alterations to allow inloop reductions larger than the highest
vector width and i64 MVE reductions.
- An extra InLoopReductionImmediateChains map was added to the vectorizer
for it to efficiently detect which instructions are reductions in the
cost model.
- The tests have some updates to show what I believe is optimal
vectorization and where we are now.
Put together, this can greatly improve performance for reduction loops
under MVE.
Differential Revision: https://reviews.llvm.org/D93476
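As a reference for what the extended MLA reduction pattern computes, here is a scalar sketch of the sext/mul/vecreduce.add sequence shown earlier in this message. The function name `extendedMlaReduce` is hypothetical; the code only models the arithmetic that a single instruction such as VMLADAV is described as performing, and says nothing about its cost.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Scalar model of: sign-extend both <16 x i8> inputs to i32, multiply
// lane-wise, then add-reduce the products into one i32 result.
int32_t extendedMlaReduce(const std::array<int8_t, 16> &A,
                          const std::array<int8_t, 16> &B) {
  int32_t Sum = 0;
  for (std::size_t I = 0; I < A.size(); ++I)
    Sum += static_cast<int32_t>(A[I]) * static_cast<int32_t>(B[I]);
  return Sum;
}
```

Costing the four IR instructions separately would charge for two extends, a wide multiply and a reduction, which is why the message proposes marking the leading sext/zext/mul as free and attributing the whole pattern to the reduction.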
2021-01-22 05:03:41 +08:00
InstructionCost TargetTransformInfo::getExtendedAddReductionCost(
    bool IsMLA, bool IsUnsigned, Type *ResTy, VectorType *Ty,
    TTI::TargetCostKind CostKind) const {
  return TTIImpl->getExtendedAddReductionCost(IsMLA, IsUnsigned, ResTy, Ty,
                                              CostKind);
}
2015-01-31 11:43:40 +08:00
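To make the "polymorphic type erased interface" described in the commit message above more concrete, here is a minimal, self-contained sketch of the Concept/Model idiom. It is not LLVM's actual code: `CostInfo`, `DummyTarget`, `FancyTarget`, and `getCallCost` are hypothetical names chosen for illustration, and the real TTI interface is far wider.

```cpp
#include <memory>
#include <utility>

// Hypothetical miniature of a type-erased cost interface (illustrative only).
class CostInfo {
  // Abstract interface hidden inside the wrapper.
  struct Concept {
    virtual ~Concept() = default;
    virtual unsigned getCallCost(unsigned NumArgs) const = 0;
  };

  // Adapts any type T that provides getCallCost() to the Concept interface.
  template <typename T> struct Model final : Concept {
    T Target;
    explicit Model(T Tgt) : Target(std::move(Tgt)) {}
    unsigned getCallCost(unsigned NumArgs) const override {
      return Target.getCallCost(NumArgs);
    }
  };

  std::unique_ptr<Concept> Erased;

public:
  // Build the wrapper from any implementation; T needs no common base class.
  template <typename T>
  explicit CostInfo(T Tgt)
      : Erased(std::make_unique<Model<T>>(std::move(Tgt))) {}

  // Public entry point: a one-line forward to the erased implementation.
  unsigned getCallCost(unsigned NumArgs) const {
    return Erased->getCallCost(NumArgs);
  }
};

// Two unrelated implementations (assumed cost formulas, purely for the demo).
struct DummyTarget {
  unsigned getCallCost(unsigned NumArgs) const { return NumArgs + 1; }
};
struct FancyTarget {
  unsigned getCallCost(unsigned NumArgs) const { return 2 * NumArgs; }
};
```

With this shape, `CostInfo(DummyTarget{}).getCallCost(3)` and `CostInfo(FancyTarget{}).getCallCost(3)` both go through the same wrapper, which mirrors the one-line `TTIImpl->...` forwarding functions in this file. The verbosity the commit message complains about comes from repeating each method in the concept, the model, and the public wrapper.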
|
|
|
unsigned
|
|
|
|
TargetTransformInfo::getCostOfKeepingLiveOverCall(ArrayRef<Type *> Tys) const {
|
|
|
|
return TTIImpl->getCostOfKeepingLiveOverCall(Tys);
|
Costmodel: Add support for horizontal vector reductions

Upcoming SLP vectorization improvements will want to be able to estimate costs
of horizontal reductions. Add infrastructure to support this.

We model reductions as a series of (shufflevector,add) tuples ultimately
followed by an extractelement. For example, for an add-reduction of <4 x float>
we could generate the following sequence:

   (v0, v1, v2, v3)
      \   \  /  /
        \  \  /
          +  +
   (v0+v2, v1+v3, undef, undef)
      \      /
   ((v0+v2) + (v1+v3), undef, undef, undef)

   %rdx.shuf = shufflevector <4 x float> %rdx, <4 x float> undef,
                             <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>
   %bin.rdx = fadd <4 x float> %rdx, %rdx.shuf
   %rdx.shuf7 = shufflevector <4 x float> %bin.rdx, <4 x float> undef,
                              <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
   %bin.rdx8 = fadd <4 x float> %bin.rdx, %rdx.shuf7
   %r = extractelement <4 x float> %bin.rdx8, i32 0

This commit adds a cost model interface "getReductionCost(Opcode, Ty, Pairwise)"
that will allow clients to ask for the cost of such a reduction (as backends
might generate more efficient code than the cost of the individual instructions
summed up). This interface is exercised by the CostModel analysis pass which
looks for reduction patterns like the one above - starting at extractelements -
and if it sees a matching sequence will call the cost model interface.

We will also support a second form of pairwise reduction that is well supported
on common architectures (haddps, vpadd, faddp).

   (v0, v1, v2, v3)
     \   /    \  /
   (v0+v1, v2+v3, undef, undef)
      \      /
   ((v0+v1)+(v2+v3), undef, undef, undef)

   %rdx.shuf.0.0 = shufflevector <4 x float> %rdx, <4 x float> undef,
                                 <4 x i32> <i32 0, i32 2, i32 undef, i32 undef>
   %rdx.shuf.0.1 = shufflevector <4 x float> %rdx, <4 x float> undef,
                                 <4 x i32> <i32 1, i32 3, i32 undef, i32 undef>
   %bin.rdx.0 = fadd <4 x float> %rdx.shuf.0.0, %rdx.shuf.0.1
   %rdx.shuf.1.0 = shufflevector <4 x float> %bin.rdx.0, <4 x float> undef,
                                 <4 x i32> <i32 0, i32 undef, i32 undef, i32 undef>
   %rdx.shuf.1.1 = shufflevector <4 x float> %bin.rdx.0, <4 x float> undef,
                                 <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
   %bin.rdx.1 = fadd <4 x float> %rdx.shuf.1.0, %rdx.shuf.1.1
   %r = extractelement <4 x float> %bin.rdx.1, i32 0

llvm-svn: 190876
2013-09-18 02:06:50 +08:00
|
|
|
}
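The two reduction shapes described in the horizontal-reduction commit message above can be modelled on plain scalars. The following is a minimal sketch, not part of LLVM: `Vec4`, `reduceSplitting`, and `reducePairwise` are hypothetical names, and each function mirrors the lane arithmetic of the corresponding IR sequence for a 4-element vector.

```cpp
#include <array>

using Vec4 = std::array<float, 4>;

// Splitting form: the upper half is shuffled down onto the lower half
// (mask <2, 3, undef, undef>), then lane 1 is shuffled onto lane 0
// (mask <1, undef, undef, undef>).
inline float reduceSplitting(const Vec4 &V) {
  float Lane0 = V[0] + V[2];
  float Lane1 = V[1] + V[3];
  return Lane0 + Lane1; // value read by the final extractelement at index 0
}

// Pairwise form: even lanes (mask <0, 2, ...>) are added to odd lanes
// (mask <1, 3, ...>), then the two partial sums are added the same way.
inline float reducePairwise(const Vec4 &V) {
  float Pair0 = V[0] + V[1];
  float Pair1 = V[2] + V[3];
  return Pair0 + Pair1; // value read by the final extractelement at index 0
}
```

Both forms take log2(4) = 2 add steps. They differ only in which lanes are combined at each step, and that difference is what the shuffle masks encode and what the matchers in this file check.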
|
|
|
|
|
2015-01-31 11:43:40 +08:00
|
|
|
bool TargetTransformInfo::getTgtMemIntrinsic(IntrinsicInst *Inst,
|
|
|
|
MemIntrinsicInfo &Info) const {
|
|
|
|
return TTIImpl->getTgtMemIntrinsic(Inst, Info);
|
2014-08-05 20:30:34 +08:00
|
|
|
}
|
|
|
|
|
2017-06-07 00:45:25 +08:00
|
|
|
unsigned TargetTransformInfo::getAtomicMemIntrinsicMaxElementSize() const {
|
|
|
|
return TTIImpl->getAtomicMemIntrinsicMaxElementSize();
|
|
|
|
}
|
|
|
|
|
2015-01-27 06:51:15 +08:00
|
|
|
Value *TargetTransformInfo::getOrCreateResultFromMemIntrinsic(
|
|
|
|
IntrinsicInst *Inst, Type *ExpectedType) const {
|
2015-01-31 11:43:40 +08:00
|
|
|
return TTIImpl->getOrCreateResultFromMemIntrinsic(Inst, ExpectedType);
|
2015-01-27 06:51:15 +08:00
|
|
|
}
|
|
|
|
|
2020-04-15 20:43:26 +08:00
|
|
|
Type *TargetTransformInfo::getMemcpyLoopLoweringType(
|
|
|
|
LLVMContext &Context, Value *Length, unsigned SrcAddrSpace,
|
|
|
|
unsigned DestAddrSpace, unsigned SrcAlign, unsigned DestAlign) const {
|
2020-02-15 02:22:53 +08:00
|
|
|
return TTIImpl->getMemcpyLoopLoweringType(Context, Length, SrcAddrSpace,
|
2020-04-15 20:43:26 +08:00
|
|
|
DestAddrSpace, SrcAlign, DestAlign);
|
2017-07-07 10:00:06 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
void TargetTransformInfo::getMemcpyLoopResidualLoweringType(
|
|
|
|
SmallVectorImpl<Type *> &OpsOut, LLVMContext &Context,
|
2020-04-15 20:43:26 +08:00
|
|
|
unsigned RemainingBytes, unsigned SrcAddrSpace, unsigned DestAddrSpace,
|
2020-02-15 02:22:53 +08:00
|
|
|
unsigned SrcAlign, unsigned DestAlign) const {
|
2017-07-07 10:00:06 +08:00
|
|
|
TTIImpl->getMemcpyLoopResidualLoweringType(OpsOut, Context, RemainingBytes,
|
2020-02-15 02:22:53 +08:00
|
|
|
SrcAddrSpace, DestAddrSpace,
|
2017-07-07 10:00:06 +08:00
|
|
|
SrcAlign, DestAlign);
|
|
|
|
}
|
|
|
|
|
2015-07-30 06:09:48 +08:00
|
|
|
bool TargetTransformInfo::areInlineCompatible(const Function *Caller,
|
|
|
|
const Function *Callee) const {
|
|
|
|
return TTIImpl->areInlineCompatible(Caller, Callee);
|
2018-03-26 21:10:09 +08:00
|
|
|
}
|
|
|
|
|
2019-01-16 13:15:31 +08:00
|
|
|
bool TargetTransformInfo::areFunctionArgsABICompatible(
|
|
|
|
const Function *Caller, const Function *Callee,
|
|
|
|
SmallPtrSetImpl<Argument *> &Args) const {
|
|
|
|
return TTIImpl->areFunctionArgsABICompatible(Caller, Callee, Args);
|
|
|
|
}
|
|
|
|
|
2018-03-26 21:10:09 +08:00
|
|
|
bool TargetTransformInfo::isIndexedLoadLegal(MemIndexedMode Mode,
|
|
|
|
Type *Ty) const {
|
|
|
|
return TTIImpl->isIndexedLoadLegal(Mode, Ty);
|
|
|
|
}
|
|
|
|
|
|
|
|
bool TargetTransformInfo::isIndexedStoreLegal(MemIndexedMode Mode,
|
|
|
|
Type *Ty) const {
|
|
|
|
return TTIImpl->isIndexedStoreLegal(Mode, Ty);
|
2015-07-02 09:11:47 +08:00
|
|
|
}
|
|
|
|
|
2016-10-03 18:31:34 +08:00
|
|
|
unsigned TargetTransformInfo::getLoadStoreVecRegBitWidth(unsigned AS) const {
|
|
|
|
return TTIImpl->getLoadStoreVecRegBitWidth(AS);
|
|
|
|
}
|
|
|
|
|
|
|
|
bool TargetTransformInfo::isLegalToVectorizeLoad(LoadInst *LI) const {
|
|
|
|
return TTIImpl->isLegalToVectorizeLoad(LI);
|
|
|
|
}
|
|
|
|
|
|
|
|
bool TargetTransformInfo::isLegalToVectorizeStore(StoreInst *SI) const {
|
|
|
|
return TTIImpl->isLegalToVectorizeStore(SI);
|
|
|
|
}
|
|
|
|
|
|
|
|
bool TargetTransformInfo::isLegalToVectorizeLoadChain(
|
2020-06-26 22:14:27 +08:00
|
|
|
unsigned ChainSizeInBytes, Align Alignment, unsigned AddrSpace) const {
|
2016-10-03 18:31:34 +08:00
|
|
|
return TTIImpl->isLegalToVectorizeLoadChain(ChainSizeInBytes, Alignment,
|
|
|
|
AddrSpace);
|
|
|
|
}
|
|
|
|
|
|
|
|
bool TargetTransformInfo::isLegalToVectorizeStoreChain(
|
2020-06-26 22:14:27 +08:00
|
|
|
unsigned ChainSizeInBytes, Align Alignment, unsigned AddrSpace) const {
|
2016-10-03 18:31:34 +08:00
|
|
|
return TTIImpl->isLegalToVectorizeStoreChain(ChainSizeInBytes, Alignment,
|
|
|
|
AddrSpace);
|
|
|
|
}
|
|
|
|
|
|
|
|
unsigned TargetTransformInfo::getLoadVectorFactor(unsigned VF,
|
|
|
|
unsigned LoadSize,
|
|
|
|
unsigned ChainSizeInBytes,
|
|
|
|
VectorType *VecTy) const {
|
|
|
|
return TTIImpl->getLoadVectorFactor(VF, LoadSize, ChainSizeInBytes, VecTy);
|
|
|
|
}
|
|
|
|
|
|
|
|
unsigned TargetTransformInfo::getStoreVectorFactor(unsigned VF,
|
|
|
|
unsigned StoreSize,
|
|
|
|
unsigned ChainSizeInBytes,
|
|
|
|
VectorType *VecTy) const {
|
|
|
|
return TTIImpl->getStoreVectorFactor(VF, StoreSize, ChainSizeInBytes, VecTy);
|
|
|
|
}
|
|
|
|
|
2020-04-15 20:43:26 +08:00
|
|
|
bool TargetTransformInfo::useReductionIntrinsic(unsigned Opcode, Type *Ty,
|
|
|
|
ReductionFlags Flags) const {
|
2017-05-09 18:43:25 +08:00
|
|
|
return TTIImpl->useReductionIntrinsic(Opcode, Ty, Flags);
|
|
|
|
}
|
|
|
|
|
2020-09-13 00:47:04 +08:00
|
|
|
bool TargetTransformInfo::preferInLoopReduction(unsigned Opcode, Type *Ty,
|
|
|
|
ReductionFlags Flags) const {
|
|
|
|
return TTIImpl->preferInLoopReduction(Opcode, Ty, Flags);
|
|
|
|
}
|
|
|
|
|
2020-08-21 15:48:12 +08:00
|
|
|
bool TargetTransformInfo::preferPredicatedReductionSelect(
|
|
|
|
unsigned Opcode, Type *Ty, ReductionFlags Flags) const {
|
|
|
|
return TTIImpl->preferPredicatedReductionSelect(Opcode, Ty, Flags);
|
|
|
|
}
|
|
|
|
|
2017-05-10 17:42:49 +08:00
|
|
|
bool TargetTransformInfo::shouldExpandReduction(const IntrinsicInst *II) const {
|
|
|
|
return TTIImpl->shouldExpandReduction(II);
|
|
|
|
}
|
2017-05-09 18:43:25 +08:00
|
|
|
|
2019-06-18 07:20:29 +08:00
|
|
|
unsigned TargetTransformInfo::getGISelRematGlobalCost() const {
|
|
|
|
return TTIImpl->getGISelRematGlobalCost();
|
|
|
|
}
|
|
|
|
|
2020-12-09 01:40:13 +08:00
|
|
|
bool TargetTransformInfo::supportsScalableVectors() const {
|
|
|
|
return TTIImpl->supportsScalableVectors();
|
|
|
|
}
|
|
|
|
|
2017-09-09 06:29:17 +08:00
|
|
|
int TargetTransformInfo::getInstructionLatency(const Instruction *I) const {
|
|
|
|
return TTIImpl->getInstructionLatency(I);
|
|
|
|
}
|
|
|
|
|
|
|
|
static bool matchPairwiseShuffleMask(ShuffleVectorInst *SI, bool IsLeft,
|
|
|
|
unsigned Level) {
|
|
|
|
// We don't need a shuffle if we just want to have element 0 in position 0 of
|
|
|
|
// the vector.
|
|
|
|
if (!SI && Level == 0 && IsLeft)
|
|
|
|
return true;
|
|
|
|
else if (!SI)
|
|
|
|
return false;
|
|
|
|
|
2020-07-23 05:36:48 +08:00
|
|
|
SmallVector<int, 32> Mask(
|
|
|
|
cast<FixedVectorType>(SI->getType())->getNumElements(), -1);
|
2017-09-09 06:29:17 +08:00
|
|
|
|
|
|
|
// Build a mask of 0, 2, ... (left) or 1, 3, ... (right) depending on whether
|
|
|
|
// we look at the left or right side.
|
|
|
|
for (unsigned i = 0, e = (1 << Level), val = !IsLeft; i != e; ++i, val += 2)
|
|
|
|
Mask[i] = val;
|
|
|
|
|
2020-04-01 04:08:59 +08:00
|
|
|
ArrayRef<int> ActualMask = SI->getShuffleMask();
|
2017-09-09 06:29:17 +08:00
|
|
|
return Mask == ActualMask;
|
|
|
|
}
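The mask that `matchPairwiseShuffleMask` compares against can be reproduced without any LLVM types. This is a hedged stand-alone sketch: `expectedPairwiseMask` is a hypothetical helper, `-1` stands for an undef lane as in the function above, and the caller is assumed to pass a `Level` with `(1 << Level) <= NumElts`.

```cpp
#include <vector>

// Builds the mask expected at a given level of a pairwise reduction:
// 0, 2, 4, ... for the left operand or 1, 3, 5, ... for the right operand
// in the first (1 << Level) lanes, and -1 (undef) everywhere else.
inline std::vector<int> expectedPairwiseMask(unsigned NumElts, bool IsLeft,
                                             unsigned Level) {
  std::vector<int> Mask(NumElts, -1);
  unsigned Val = IsLeft ? 0u : 1u;
  for (unsigned I = 0, E = 1u << Level; I != E; ++I, Val += 2)
    Mask[I] = static_cast<int>(Val);
  return Mask;
}
```

For a 4-element vector, level 1 gives `<0, 2, undef, undef>` and `<1, 3, undef, undef>`, and level 0 gives `<0, undef, ...>` and `<1, undef, ...>`. Level 0 is the step closest to the `extractelement`, because matching walks from the root of the reduction back towards its input.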
|
|
|
|
|
2020-06-15 15:27:14 +08:00
|
|
|
static Optional<TTI::ReductionData> getReductionData(Instruction *I) {
|
2017-09-09 06:29:17 +08:00
|
|
|
Value *L, *R;
|
|
|
|
if (m_BinOp(m_Value(L), m_Value(R)).match(I))
|
2020-06-15 15:27:14 +08:00
|
|
|
return TTI::ReductionData(TTI::RK_Arithmetic, I->getOpcode(), L, R);
|
2017-09-09 06:29:17 +08:00
|
|
|
if (auto *SI = dyn_cast<SelectInst>(I)) {
|
|
|
|
if (m_SMin(m_Value(L), m_Value(R)).match(SI) ||
|
|
|
|
m_SMax(m_Value(L), m_Value(R)).match(SI) ||
|
|
|
|
m_OrdFMin(m_Value(L), m_Value(R)).match(SI) ||
|
|
|
|
m_OrdFMax(m_Value(L), m_Value(R)).match(SI) ||
|
|
|
|
m_UnordFMin(m_Value(L), m_Value(R)).match(SI) ||
|
|
|
|
m_UnordFMax(m_Value(L), m_Value(R)).match(SI)) {
|
|
|
|
auto *CI = cast<CmpInst>(SI->getCondition());
|
2020-06-15 15:27:14 +08:00
|
|
|
return TTI::ReductionData(TTI::RK_MinMax, CI->getOpcode(), L, R);
|
2018-07-31 03:41:25 +08:00
|
|
|
}
|
2017-09-09 06:29:17 +08:00
|
|
|
if (m_UMin(m_Value(L), m_Value(R)).match(SI) ||
|
|
|
|
m_UMax(m_Value(L), m_Value(R)).match(SI)) {
|
|
|
|
auto *CI = cast<CmpInst>(SI->getCondition());
|
2020-06-15 15:27:14 +08:00
|
|
|
return TTI::ReductionData(TTI::RK_UnsignedMinMax, CI->getOpcode(), L, R);
|
2017-09-09 06:29:17 +08:00
|
|
|
}
|
|
|
|
}
|
|
|
|
return llvm::None;
|
|
|
|
}
|
|
|
|
|
2020-06-15 15:27:14 +08:00
|
|
|
static TTI::ReductionKind matchPairwiseReductionAtLevel(Instruction *I,
|
|
|
|
unsigned Level,
|
|
|
|
unsigned NumLevels) {
|
2017-09-09 06:29:17 +08:00
|
|
|
// Match one level of pairwise operations.
|
|
|
|
// %rdx.shuf.0.0 = shufflevector <4 x float> %rdx, <4 x float> undef,
|
|
|
|
// <4 x i32> <i32 0, i32 2 , i32 undef, i32 undef>
|
|
|
|
// %rdx.shuf.0.1 = shufflevector <4 x float> %rdx, <4 x float> undef,
|
|
|
|
// <4 x i32> <i32 1, i32 3, i32 undef, i32 undef>
|
|
|
|
// %bin.rdx.0 = fadd <4 x float> %rdx.shuf.0.0, %rdx.shuf.0.1
|
|
|
|
if (!I)
|
2020-06-15 15:27:14 +08:00
|
|
|
return TTI::RK_None;
|
2017-09-09 06:29:17 +08:00
|
|
|
|
|
|
|
assert(I->getType()->isVectorTy() && "Expecting a vector type");
|
|
|
|
|
2020-06-15 15:27:14 +08:00
|
|
|
Optional<TTI::ReductionData> RD = getReductionData(I);
|
2017-09-09 06:29:17 +08:00
|
|
|
if (!RD)
|
2020-06-15 15:27:14 +08:00
|
|
|
return TTI::RK_None;
|
2017-09-09 06:29:17 +08:00
|
|
|
|
|
|
|
ShuffleVectorInst *LS = dyn_cast<ShuffleVectorInst>(RD->LHS);
|
|
|
|
if (!LS && Level)
|
2020-06-15 15:27:14 +08:00
|
|
|
return TTI::RK_None;
|
2017-09-09 06:29:17 +08:00
|
|
|
ShuffleVectorInst *RS = dyn_cast<ShuffleVectorInst>(RD->RHS);
|
|
|
|
if (!RS && Level)
|
2020-06-15 15:27:14 +08:00
|
|
|
return TTI::RK_None;
|
2017-09-09 06:29:17 +08:00
|
|
|
|
|
|
|
// On level 0 we can omit one shufflevector instruction.
|
|
|
|
if (!Level && !RS && !LS)
|
2020-06-15 15:27:14 +08:00
|
|
|
return TTI::RK_None;
|
2017-09-09 06:29:17 +08:00
|
|
|
|
|
|
|
// Shuffle inputs must match.
|
|
|
|
Value *NextLevelOpL = LS ? LS->getOperand(0) : nullptr;
|
|
|
|
Value *NextLevelOpR = RS ? RS->getOperand(0) : nullptr;
|
|
|
|
Value *NextLevelOp = nullptr;
|
|
|
|
if (NextLevelOpR && NextLevelOpL) {
|
|
|
|
// If we have two shuffles their operands must match.
|
|
|
|
if (NextLevelOpL != NextLevelOpR)
|
2020-06-15 15:27:14 +08:00
|
|
|
return TTI::RK_None;
|
2017-09-09 06:29:17 +08:00
|
|
|
|
|
|
|
NextLevelOp = NextLevelOpL;
|
|
|
|
} else if (Level == 0 && (NextLevelOpR || NextLevelOpL)) {
|
|
|
|
// On the first level we can omit the shufflevector <0, undef,...>. So the
|
|
|
|
// input to the other shufflevector <1, undef> must match with one of the
|
|
|
|
// inputs to the current binary operation.
|
|
|
|
// Example:
|
|
|
|
// %NextLevelOpL = shufflevector %R, <1, undef ...>
|
|
|
|
// %BinOp = fadd %NextLevelOpL, %R
|
|
|
|
if (NextLevelOpL && NextLevelOpL != RD->RHS)
|
2020-06-15 15:27:14 +08:00
|
|
|
return TTI::RK_None;
|
2017-09-09 06:29:17 +08:00
|
|
|
else if (NextLevelOpR && NextLevelOpR != RD->LHS)
|
2020-06-15 15:27:14 +08:00
|
|
|
return TTI::RK_None;
|
2017-09-09 06:29:17 +08:00
|
|
|
|
|
|
|
NextLevelOp = NextLevelOpL ? RD->RHS : RD->LHS;
|
|
|
|
} else
|
2020-06-15 15:27:14 +08:00
|
|
|
return TTI::RK_None;
|
2017-09-09 06:29:17 +08:00
|
|
|
|
|
|
|
// Check that the next level's binary operation exists and matches with the
|
|
|
|
// current one.
|
|
|
|
if (Level + 1 != NumLevels) {
|
2020-07-02 19:13:23 +08:00
|
|
|
if (!isa<Instruction>(NextLevelOp))
|
|
|
|
return TTI::RK_None;
|
2020-06-15 15:27:14 +08:00
|
|
|
Optional<TTI::ReductionData> NextLevelRD =
|
2017-09-09 06:29:17 +08:00
|
|
|
getReductionData(cast<Instruction>(NextLevelOp));
|
|
|
|
if (!NextLevelRD || !RD->hasSameData(*NextLevelRD))
|
2020-06-15 15:27:14 +08:00
|
|
|
return TTI::RK_None;
|
2017-09-09 06:29:17 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
// Shuffle mask for pairwise operation must match.
|
|
|
|
if (matchPairwiseShuffleMask(LS, /*IsLeft=*/true, Level)) {
|
|
|
|
if (!matchPairwiseShuffleMask(RS, /*IsLeft=*/false, Level))
|
2020-06-15 15:27:14 +08:00
|
|
|
return TTI::RK_None;
|
2017-09-09 06:29:17 +08:00
|
|
|
} else if (matchPairwiseShuffleMask(RS, /*IsLeft=*/true, Level)) {
|
|
|
|
if (!matchPairwiseShuffleMask(LS, /*IsLeft=*/false, Level))
|
2020-06-15 15:27:14 +08:00
|
|
|
return TTI::RK_None;
|
2017-09-09 06:29:17 +08:00
|
|
|
} else {
|
2020-06-15 15:27:14 +08:00
|
|
|
return TTI::RK_None;
|
2017-09-09 06:29:17 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
if (++Level == NumLevels)
|
|
|
|
return RD->Kind;
|
|
|
|
|
|
|
|
// Match next level.
|
2020-07-02 19:13:23 +08:00
|
|
|
return matchPairwiseReductionAtLevel(dyn_cast<Instruction>(NextLevelOp), Level,
|
2017-09-09 06:29:17 +08:00
|
|
|
NumLevels);
|
|
|
|
}
|
|
|
|
|
2020-06-15 15:27:14 +08:00
|
|
|
TTI::ReductionKind TTI::matchPairwiseReduction(
|
|
|
|
const ExtractElementInst *ReduxRoot, unsigned &Opcode, VectorType *&Ty) {
|
2017-09-09 06:29:17 +08:00
|
|
|
if (!EnableReduxCost)
|
2020-06-15 15:27:14 +08:00
|
|
|
return TTI::RK_None;
|
2017-09-09 06:29:17 +08:00
|
|
|
|
|
|
|
// Need to extract the first element.
|
|
|
|
ConstantInt *CI = dyn_cast<ConstantInt>(ReduxRoot->getOperand(1));
|
|
|
|
unsigned Idx = ~0u;
|
|
|
|
if (CI)
|
|
|
|
Idx = CI->getZExtValue();
|
|
|
|
if (Idx != 0)
|
2020-06-15 15:27:14 +08:00
|
|
|
return TTI::RK_None;
|
2017-09-09 06:29:17 +08:00
|
|
|
|
|
|
|
auto *RdxStart = dyn_cast<Instruction>(ReduxRoot->getOperand(0));
|
|
|
|
if (!RdxStart)
|
2020-06-15 15:27:14 +08:00
|
|
|
return TTI::RK_None;
|
|
|
|
Optional<TTI::ReductionData> RD = getReductionData(RdxStart);
|
2017-09-09 06:29:17 +08:00
|
|
|
if (!RD)
|
2020-06-15 15:27:14 +08:00
|
|
|
return TTI::RK_None;
|
2017-09-09 06:29:17 +08:00
|
|
|
|
2020-07-23 05:36:48 +08:00
|
|
|
auto *VecTy = cast<FixedVectorType>(RdxStart->getType());
|
2020-04-10 03:19:23 +08:00
|
|
|
unsigned NumVecElems = VecTy->getNumElements();
|
2017-09-09 06:29:17 +08:00
|
|
|
if (!isPowerOf2_32(NumVecElems))
|
2020-06-15 15:27:14 +08:00
|
|
|
return TTI::RK_None;
|
2017-09-09 06:29:17 +08:00
|
|
|
|
|
|
|
// We look for a sequence of shuffle,shuffle,add triples like the following
|
|
|
|
// that build a pairwise reduction tree.
|
2018-07-31 03:41:25 +08:00
|
|
|
//
|
2017-09-09 06:29:17 +08:00
|
|
|
// (X0, X1, X2, X3)
|
|
|
|
// (X0 + X1, X2 + X3, undef, undef)
|
|
|
|
// ((X0 + X1) + (X2 + X3), undef, undef, undef)
|
2018-07-31 03:41:25 +08:00
|
|
|
//
|
2017-09-09 06:29:17 +08:00
|
|
|
// %rdx.shuf.0.0 = shufflevector <4 x float> %rdx, <4 x float> undef,
|
|
|
|
// <4 x i32> <i32 0, i32 2 , i32 undef, i32 undef>
|
|
|
|
// %rdx.shuf.0.1 = shufflevector <4 x float> %rdx, <4 x float> undef,
|
|
|
|
// <4 x i32> <i32 1, i32 3, i32 undef, i32 undef>
|
|
|
|
// %bin.rdx.0 = fadd <4 x float> %rdx.shuf.0.0, %rdx.shuf.0.1
|
|
|
|
// %rdx.shuf.1.0 = shufflevector <4 x float> %bin.rdx.0, <4 x float> undef,
|
|
|
|
// <4 x i32> <i32 0, i32 undef, i32 undef, i32 undef>
|
|
|
|
// %rdx.shuf.1.1 = shufflevector <4 x float> %bin.rdx.0, <4 x float> undef,
|
|
|
|
// <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
|
|
|
|
// %bin.rdx8 = fadd <4 x float> %rdx.shuf.1.0, %rdx.shuf.1.1
|
|
|
|
// %r = extractelement <4 x float> %bin.rdx8, i32 0
|
|
|
|
if (matchPairwiseReductionAtLevel(RdxStart, 0, Log2_32(NumVecElems)) ==
|
2020-06-15 15:27:14 +08:00
|
|
|
TTI::RK_None)
|
|
|
|
return TTI::RK_None;
|
2017-09-09 06:29:17 +08:00
|
|
|
|
|
|
|
Opcode = RD->Opcode;
|
|
|
|
Ty = VecTy;
|
|
|
|
|
|
|
|
return RD->Kind;
|
|
|
|
}
|
|
|
|
|
|
|
|
static std::pair<Value *, ShuffleVectorInst *>
|
|
|
|
getShuffleAndOtherOprd(Value *L, Value *R) {
|
|
|
|
ShuffleVectorInst *S = nullptr;
|
|
|
|
|
|
|
|
if ((S = dyn_cast<ShuffleVectorInst>(L)))
|
|
|
|
return std::make_pair(R, S);
|
|
|
|
|
|
|
|
S = dyn_cast<ShuffleVectorInst>(R);
|
|
|
|
return std::make_pair(L, S);
|
|
|
|
}
|
|
|
|
|
2020-06-15 15:27:14 +08:00
|
|
|
TTI::ReductionKind TTI::matchVectorSplittingReduction(
|
|
|
|
const ExtractElementInst *ReduxRoot, unsigned &Opcode, VectorType *&Ty) {
|
|
|
|
|
2017-09-09 06:29:17 +08:00
|
|
|
if (!EnableReduxCost)
|
2020-06-15 15:27:14 +08:00
|
|
|
return TTI::RK_None;
|
2017-09-09 06:29:17 +08:00
|
|
|
|
|
|
|
// Need to extract the first element.
|
|
|
|
ConstantInt *CI = dyn_cast<ConstantInt>(ReduxRoot->getOperand(1));
|
|
|
|
unsigned Idx = ~0u;
|
|
|
|
if (CI)
|
|
|
|
Idx = CI->getZExtValue();
|
|
|
|
if (Idx != 0)
|
2020-06-15 15:27:14 +08:00
|
|
|
return TTI::RK_None;
|
2017-09-09 06:29:17 +08:00
|
|
|
|
|
|
|
auto *RdxStart = dyn_cast<Instruction>(ReduxRoot->getOperand(0));
|
|
|
|
if (!RdxStart)
|
2020-06-15 15:27:14 +08:00
|
|
|
return TTI::RK_None;
|
|
|
|
Optional<TTI::ReductionData> RD = getReductionData(RdxStart);
|
2017-09-09 06:29:17 +08:00
|
|
|
if (!RD)
|
2020-06-15 15:27:14 +08:00
|
|
|
return TTI::RK_None;
|
2017-09-09 06:29:17 +08:00
|
|
|
|
2020-07-23 05:36:48 +08:00
|
|
|
auto *VecTy = cast<FixedVectorType>(ReduxRoot->getOperand(0)->getType());
|
2020-04-10 03:19:23 +08:00
|
|
|
unsigned NumVecElems = VecTy->getNumElements();
|
2017-09-09 06:29:17 +08:00
|
|
|
if (!isPowerOf2_32(NumVecElems))
|
2020-06-15 15:27:14 +08:00
|
|
|
return TTI::RK_None;
|
2017-09-09 06:29:17 +08:00
|
|
|
|
|
|
|
// We look for a sequence of shuffles and adds like the following matching one
|
|
|
|
// fadd, shuffle vector pair at a time.
|
2018-07-31 03:41:25 +08:00
|
|
|
//
|
2017-09-09 06:29:17 +08:00
|
|
|
// %rdx.shuf = shufflevector <4 x float> %rdx, <4 x float> undef,
|
|
|
|
// <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>
|
|
|
|
// %bin.rdx = fadd <4 x float> %rdx, %rdx.shuf
|
|
|
|
// %rdx.shuf7 = shufflevector <4 x float> %bin.rdx, <4 x float> undef,
|
|
|
|
// <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
|
|
|
|
// %bin.rdx8 = fadd <4 x float> %bin.rdx, %rdx.shuf7
|
|
|
|
// %r = extractelement <4 x float> %bin.rdx8, i32 0
|
|
|
|
|
|
|
|
unsigned MaskStart = 1;
|
|
|
|
Instruction *RdxOp = RdxStart;
|
2018-07-31 03:41:25 +08:00
|
|
|
SmallVector<int, 32> ShuffleMask(NumVecElems, 0);
|
2017-09-09 06:29:17 +08:00
|
|
|
unsigned NumVecElemsRemain = NumVecElems;
|
|
|
|
while (NumVecElemsRemain - 1) {
|
|
|
|
// Check for the right reduction operation.
|
|
|
|
if (!RdxOp)
|
2020-06-15 15:27:14 +08:00
|
|
|
return TTI::RK_None;
|
|
|
|
Optional<TTI::ReductionData> RDLevel = getReductionData(RdxOp);
|
2017-09-09 06:29:17 +08:00
|
|
|
if (!RDLevel || !RDLevel->hasSameData(*RD))
|
2020-06-15 15:27:14 +08:00
|
|
|
return TTI::RK_None;
|
2017-09-09 06:29:17 +08:00
|
|
|
|
|
|
|
Value *NextRdxOp;
|
|
|
|
ShuffleVectorInst *Shuffle;
|
|
|
|
std::tie(NextRdxOp, Shuffle) =
|
|
|
|
getShuffleAndOtherOprd(RDLevel->LHS, RDLevel->RHS);
|
|
|
|
|
|
|
|
// Check that the current reduction operation and the shuffle use the same value.
|
|
|
|
if (Shuffle == nullptr)
|
2020-06-15 15:27:14 +08:00
|
|
|
return TTI::RK_None;
|
2017-09-09 06:29:17 +08:00
|
|
|
if (Shuffle->getOperand(0) != NextRdxOp)
|
2020-06-15 15:27:14 +08:00
|
|
|
return TTI::RK_None;
|
2017-09-09 06:29:17 +08:00
|
|
|
|
|
|
|
// Check that the shuffle mask matches.
|
|
|
|
for (unsigned j = 0; j != MaskStart; ++j)
|
|
|
|
ShuffleMask[j] = MaskStart + j;
|
|
|
|
// Fill the rest of the mask with -1 for undef.
|
|
|
|
std::fill(&ShuffleMask[MaskStart], ShuffleMask.end(), -1);
|
|
|
|
|
2020-04-01 04:08:59 +08:00
|
|
|
ArrayRef<int> Mask = Shuffle->getShuffleMask();
|
2017-09-09 06:29:17 +08:00
|
|
|
if (ShuffleMask != Mask)
|
2020-06-15 15:27:14 +08:00
|
|
|
return TTI::RK_None;
|
2017-09-09 06:29:17 +08:00
|
|
|
|
|
|
|
RdxOp = dyn_cast<Instruction>(NextRdxOp);
|
|
|
|
NumVecElemsRemain /= 2;
|
|
|
|
MaskStart *= 2;
|
|
|
|
}
|
|
|
|
|
|
|
|
Opcode = RD->Opcode;
|
|
|
|
Ty = VecTy;
|
|
|
|
return RD->Kind;
|
|
|
|
}
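The sequence of masks that the loop in `matchVectorSplittingReduction` expects can be listed with a small stand-alone helper. This is a sketch under stated assumptions, not LLVM code: `expectedSplittingMasks` is a hypothetical name, `NumElts` is assumed to be a power of two (the function above rejects anything else), and `-1` stands for undef.

```cpp
#include <vector>

// Returns the shuffle masks of a splitting reduction in the order the matcher
// visits them: starting at the operation feeding the extractelement and
// walking back towards the original vector.
inline std::vector<std::vector<int>> expectedSplittingMasks(unsigned NumElts) {
  std::vector<std::vector<int>> Masks;
  for (unsigned MaskStart = 1; MaskStart < NumElts; MaskStart *= 2) {
    std::vector<int> Mask(NumElts, -1);
    // The first MaskStart lanes pull from lanes MaskStart .. 2*MaskStart-1.
    for (unsigned J = 0; J != MaskStart; ++J)
      Mask[J] = static_cast<int>(MaskStart + J);
    Masks.push_back(Mask);
  }
  return Masks;
}
```

For four elements this yields `<1, undef, undef, undef>` followed by `<2, 3, undef, undef>`, which are the masks of `%rdx.shuf7` and `%rdx.shuf` in the comment inside the function, visited in reverse program order.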
|
|
|
|
|
2020-09-24 01:46:26 +08:00
|
|
|
TTI::ReductionKind
|
|
|
|
TTI::matchVectorReduction(const ExtractElementInst *Root, unsigned &Opcode,
|
|
|
|
VectorType *&Ty, bool &IsPairwise) {
|
|
|
|
TTI::ReductionKind RdxKind = matchVectorSplittingReduction(Root, Opcode, Ty);
|
|
|
|
if (RdxKind != TTI::ReductionKind::RK_None) {
|
|
|
|
IsPairwise = false;
|
|
|
|
return RdxKind;
|
|
|
|
}
|
|
|
|
IsPairwise = true;
|
|
|
|
return matchPairwiseReduction(Root, Opcode, Ty);
|
|
|
|
}

int TargetTransformInfo::getInstructionThroughput(const Instruction *I) const {
  TTI::TargetCostKind CostKind = TTI::TCK_RecipThroughput;

  switch (I->getOpcode()) {
  case Instruction::GetElementPtr:
  case Instruction::Ret:
  case Instruction::PHI:
  case Instruction::Br:
  case Instruction::Add:
  case Instruction::FAdd:
  case Instruction::Sub:
  case Instruction::FSub:
  case Instruction::Mul:
  case Instruction::FMul:
  case Instruction::UDiv:
  case Instruction::SDiv:
  case Instruction::FDiv:
  case Instruction::URem:
  case Instruction::SRem:
  case Instruction::FRem:
  case Instruction::Shl:
  case Instruction::LShr:
  case Instruction::AShr:
  case Instruction::And:
  case Instruction::Or:
  case Instruction::Xor:
  case Instruction::FNeg:
  case Instruction::Select:
  case Instruction::ICmp:
  case Instruction::FCmp:
  case Instruction::Store:
  case Instruction::Load:
  case Instruction::ZExt:
  case Instruction::SExt:
  case Instruction::FPToUI:
  case Instruction::FPToSI:
  case Instruction::FPExt:
  case Instruction::PtrToInt:
  case Instruction::IntToPtr:
  case Instruction::SIToFP:
  case Instruction::UIToFP:
  case Instruction::Trunc:
  case Instruction::FPTrunc:
  case Instruction::BitCast:
  case Instruction::AddrSpaceCast:
  case Instruction::ExtractElement:
  case Instruction::InsertElement:
[CostModel] Model all `extractvalue`s as free.
Summary:
As discussed in https://reviews.llvm.org/D65148#1606412,
`extractvalue` don't actually generate any code,
so we should treat them as free.
Reviewers: craig.topper, RKSimon, jnspaulsson, greened, asb, t.p.northover, jmolloy, dmgreen
Reviewed By: jmolloy
Subscribers: javed.absar, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D66098
llvm-svn: 370339
2019-08-29 19:50:30 +08:00
  case Instruction::ExtractValue:
  case Instruction::ShuffleVector:
  case Instruction::Call:
    return getUserCost(I, CostKind);
  default:
    // We don't have any information on this instruction.
    return -1;
  }
}

TargetTransformInfo::Concept::~Concept() {}

TargetIRAnalysis::TargetIRAnalysis() : TTICallback(&getDefaultTTI) {}

TargetIRAnalysis::TargetIRAnalysis(
    std::function<Result(const Function &)> TTICallback)
    : TTICallback(std::move(TTICallback)) {}

TargetIRAnalysis::Result TargetIRAnalysis::run(const Function &F,
                                               FunctionAnalysisManager &) {
  return TTICallback(F);
}

AnalysisKey TargetIRAnalysis::Key;

TargetIRAnalysis::Result TargetIRAnalysis::getDefaultTTI(const Function &F) {
  return Result(F.getParent()->getDataLayout());
}

// Register the basic pass.
INITIALIZE_PASS(TargetTransformInfoWrapperPass, "tti",
                "Target Transform Information", false, true)
char TargetTransformInfoWrapperPass::ID = 0;

void TargetTransformInfoWrapperPass::anchor() {}

TargetTransformInfoWrapperPass::TargetTransformInfoWrapperPass()
    : ImmutablePass(ID) {
  initializeTargetTransformInfoWrapperPassPass(
      *PassRegistry::getPassRegistry());
}

TargetTransformInfoWrapperPass::TargetTransformInfoWrapperPass(
    TargetIRAnalysis TIRA)
    : ImmutablePass(ID), TIRA(std::move(TIRA)) {
  initializeTargetTransformInfoWrapperPassPass(
      *PassRegistry::getPassRegistry());
}

TargetTransformInfo &TargetTransformInfoWrapperPass::getTTI(const Function &F) {
  FunctionAnalysisManager DummyFAM;
  TTI = TIRA.run(F, DummyFAM);
  return *TTI;
}

ImmutablePass *
llvm::createTargetTransformInfoWrapperPass(TargetIRAnalysis TIRA) {
  return new TargetTransformInfoWrapperPass(std::move(TIRA));
}