//===- llvm/Analysis/TargetTransformInfo.cpp ------------------------------===//
//
//                     The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//===----------------------------------------------------------------------===//

#include "llvm/Analysis/TargetTransformInfo.h"

[PM] Change the core design of the TTI analysis to use a polymorphic
type erased interface and a single analysis pass rather than an
extremely complex analysis group.
The end result is that the TTI analysis can contain a type erased
implementation that supports the polymorphic TTI interface. We can build
one from a target-specific implementation or from a dummy one in the IR.
I've also factored all of the code into "mix-in"-able base classes,
including CRTP base classes to facilitate calling back up to the most
specialized form when delegating horizontally across the surface. These
aren't as clean as I would like and I'm planning to work on cleaning
some of this up, but I wanted to start by putting it into the right form.
There are a number of reasons for this change, and this particular
design. The first and foremost reason is that an analysis group is
complete overkill, and the chaining delegation strategy was so opaque,
confusing, and high overhead that TTI was suffering greatly for it.
Several of the TTI functions had failed to be implemented in all places
because the chaining-based delegation meant that nothing checked for
this. A few other functions were implemented with incorrect delegation.
The message to me was very clear working on this -- the delegation and
analysis group structure was too confusing to be useful here.
The other reason of course is that this is a *much* more natural fit for
the new pass manager. This will lay the ground work for a type-erased
per-function info object that can look up the correct subtarget and even
cache it.
Yet another benefit is that this will significantly simplify the
interaction of the pass managers and the TargetMachine. See the future
work below.
The downside of this change is that it is very, very verbose. I'm going
to work to improve that, but it is somewhat an implementation necessity
in C++ to do type erasure. =/ I discussed this design really extensively
with Eric and Hal prior to going down this path, and afterward showed
them the result. No one was really thrilled with it, but there doesn't
seem to be a substantially better alternative. Using a base class and
virtual method dispatch would make the code much shorter, but as
discussed in the update to the programmer's manual and elsewhere,
a polymorphic interface feels like the more principled approach even if
this is perhaps the least compelling example of it. ;]
Ultimately, there is still a lot more to be done here, but this was the
huge chunk that I couldn't really split things out of because this was
the interface change to TTI. I've tried to minimize all the other parts
of this. The follow up work should include at least:
1) Improving the TargetMachine interface by having it directly return
a TTI object. Because we have a non-pass object with value semantics
and an internal type erasure mechanism, we can narrow the interface
of the TargetMachine to *just* do what we need: build and return
a TTI object that we can then insert into the pass pipeline.
2) Make the TTI object be fully specialized for a particular function.
This will include splitting off a minimal form of it which is
sufficient for the inliner and the old pass manager.
3) Add a new pass manager analysis which produces TTI objects from the
target machine for each function. This may actually be done as part
of #2 in order to use the new analysis to implement #2.
4) Work on narrowing the API between TTI and the targets so that it is
easier to understand and less verbose to type erase.
5) Work on narrowing the API between TTI and its clients so that it is
easier to understand and less verbose to forward.
6) Try to improve the CRTP-based delegation. I feel like this code is
just a bit messy and exacerbating the complexity of implementing
the TTI in each target.
Many thanks to Eric and Hal for their help here. I ended up blocked on
this somewhat more abruptly than I expected, and so I appreciate getting
it sorted out very quickly.
Differential Revision: http://reviews.llvm.org/D7293
llvm-svn: 227669
2015-01-31 11:43:40 +08:00
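
The design described in the commit message above (a type-erased wrapper with value semantics, plus CRTP base classes that call back into the most specialized implementation) can be illustrated with a much smaller sketch. Everything here is hypothetical and exists only for illustration: `CostInfo`, `CostImplBase`, `NoTargetImpl`, `FancyTargetImpl`, and `getPairCost` are not LLVM names, and the real interface is far wider.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical CRTP mix-in: supplies defaults and calls back into the most
// derived implementation when one query is built on top of another.
template <typename Derived> struct CostImplBase {
  int getOperationCost(unsigned /*Opcode*/) const { return 1; }

  int getPairCost(unsigned A, unsigned B) const {
    const Derived &D = static_cast<const Derived &>(*this);
    return D.getOperationCost(A) + D.getOperationCost(B);
  }
};

// Uses every default, similar in spirit to NoTTIImpl in this file.
struct NoTargetImpl : CostImplBase<NoTargetImpl> {};

// Hypothetical target: overrides one query; getPairCost picks the override
// up through the CRTP cast without any virtual call.
struct FancyTargetImpl : CostImplBase<FancyTargetImpl> {
  int getOperationCost(unsigned Opcode) const { return Opcode == 0 ? 0 : 4; }
};

// Value-semantic wrapper that hides the concrete implementation type.
class CostInfo {
  // Abstract interface seen by the wrapper.
  struct Concept {
    virtual ~Concept() = default;
    virtual int getOperationCost(unsigned Opcode) const = 0;
    virtual int getPairCost(unsigned A, unsigned B) const = 0;
  };

  // Adapter that forwards each virtual call to a concrete implementation.
  template <typename T> struct Model final : Concept {
    T Impl;
    explicit Model(T I) : Impl(std::move(I)) {}
    int getOperationCost(unsigned Opcode) const override {
      return Impl.getOperationCost(Opcode);
    }
    int getPairCost(unsigned A, unsigned B) const override {
      return Impl.getPairCost(A, B);
    }
  };

  std::unique_ptr<Concept> Erased;

public:
  template <typename T>
  explicit CostInfo(T I) : Erased(new Model<T>(std::move(I))) {}
  CostInfo(CostInfo &&) = default;
  CostInfo &operator=(CostInfo &&) = default;

  int getOperationCost(unsigned Opcode) const {
    int Cost = Erased->getOperationCost(Opcode);
    assert(Cost >= 0 && "costs should not be negative");
    return Cost;
  }
  int getPairCost(unsigned A, unsigned B) const {
    return Erased->getPairCost(A, B);
  }
};
```

In this sketch the verbosity the commit message complains about is already visible: each query appears once in the mix-in, once in `Concept`, once in `Model`, and once in the public wrapper. What the duplication buys is that the compiler rejects a `Model` that forgets to forward a query, which a chained delegation scheme would not catch.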
#include "llvm/Analysis/TargetTransformInfoImpl.h"
#include "llvm/IR/CallSite.h"
#include "llvm/IR/DataLayout.h"
#include "llvm/IR/Instruction.h"
#include "llvm/IR/Instructions.h"
#include "llvm/IR/IntrinsicInst.h"
#include "llvm/IR/Module.h"
#include "llvm/IR/Operator.h"
#include "llvm/IR/PatternMatch.h"
#include "llvm/Support/CommandLine.h"
#include "llvm/Support/ErrorHandling.h"
#include <utility>

using namespace llvm;
using namespace PatternMatch;

#define DEBUG_TYPE "tti"

static cl::opt<bool> EnableReduxCost("costmodel-reduxcost", cl::init(false),
                                     cl::Hidden,
                                     cl::desc("Recognize reduction patterns."));

namespace {
/// No-op implementation of the TTI interface using the utility base
/// classes.
///
/// This is used when no target specific information is available.
struct NoTTIImpl : TargetTransformInfoImplCRTPBase<NoTTIImpl> {
  explicit NoTTIImpl(const DataLayout &DL)
      : TargetTransformInfoImplCRTPBase<NoTTIImpl>(DL) {}
};
}

TargetTransformInfo::TargetTransformInfo(const DataLayout &DL)
    : TTIImpl(new Model<NoTTIImpl>(NoTTIImpl(DL))) {}

TargetTransformInfo::~TargetTransformInfo() {}

TargetTransformInfo::TargetTransformInfo(TargetTransformInfo &&Arg)
    : TTIImpl(std::move(Arg.TTIImpl)) {}

Switch TargetTransformInfo from an immutable analysis pass that requires
a TargetMachine to construct (and thus isn't always available), to an
analysis group that supports layered implementations much like
AliasAnalysis does. This is a pretty massive change, with a few parts
that I was unable to easily separate (sorry), so I'll walk through it.
The first step of this conversion was to make TargetTransformInfo an
analysis group, and to sink the nonce implementations in
ScalarTargetTransformInfo and VectorTargetTransformInfo into
a NoTargetTransformInfo pass. This allows other passes to add a hard
requirement on TTI, and assume they will always get at least one
implementation.
The TargetTransformInfo analysis group leverages the delegation chaining
trick that AliasAnalysis uses, where the base class for the analysis
group delegates to the previous analysis *pass*, allowing all but the
NoFoo analysis passes to only implement the parts of the interfaces they
support. It also introduces a new trick where each pass in the group
retains a pointer to the top-most pass that has been initialized. This
allows passes to implement one API in terms of another API and benefit
when some other pass above them in the stack has more precise results
for the second API.
The second step of this conversion is to create a pass that implements
the TargetTransformInfo analysis using the target-independent
abstractions in the code generator. This replaces the
ScalarTargetTransformImpl and VectorTargetTransformImpl classes in
lib/Target with a single pass in lib/CodeGen called
BasicTargetTransformInfo. This class actually provides most of the TTI
functionality, basing it upon the TargetLowering abstraction and other
information in the target independent code generator.
The third step of the conversion adds support to all TargetMachines to
register custom analysis passes. This allows building those passes with
access to TargetLowering or other target-specific classes, and it also
allows each target to customize the set of analysis passes desired in
the pass manager. The baseline LLVMTargetMachine implements this
interface to add the BasicTTI pass to the pass manager, and all of the
tools that want to support target-aware TTI passes call this routine on
whatever target machine they end up with to add the appropriate passes.
The fourth step of the conversion created target-specific TTI analysis
passes for the X86 and ARM backends. These passes contain the custom
logic that was previously in their extensions of the
ScalarTargetTransformInfo and VectorTargetTransformInfo interfaces.
I separated them into their own file, as now all of the interface bits
are private and they just expose a function to create the pass itself.
Then I extended these target machines to set up a custom set of analysis
passes, first adding BasicTTI as a fallback, and then adding their
customized TTI implementations.
The fourth step required logic that was shared between the target
independent layer and the specific targets to move to a different
interface, as they no longer derive from each other. As a consequence,
helper functions were added to TargetLowering representing the common
logic needed both in the target implementation and the codegen
implementation of the TTI pass. While technically this is the only
change that could have been committed separately, it would have been
a nightmare to extract.
The final step of the conversion was just to delete all the old
boilerplate. This got rid of the ScalarTargetTransformInfo and
VectorTargetTransformInfo classes, all of the support in all of the
targets for producing instances of them, and all of the support in the
tools for manually constructing a pass based around them.
Now that TTI is a relatively normal analysis group, two things become
straightforward. First, we can sink it into lib/Analysis which is a more
natural layer for it to live. Second, clients of this interface can
depend on it *always* being available which will simplify their code and
behavior. These (and other) simplifications will follow in subsequent
commits; this one is clearly big enough.
Finally, I'm very aware that much of the comments and documentation
needs to be updated. As soon as I had this working, and plausibly well
commented, I wanted to get it committed and in front of the build bots.
I'll be doing a few passes over documentation later if it sticks.
Commits to update DragonEgg and Clang will be made presently.
llvm-svn: 171681
2013-01-07 09:37:14 +08:00

TargetTransformInfo &TargetTransformInfo::operator=(TargetTransformInfo &&RHS) {
  TTIImpl = std::move(RHS.TTIImpl);
  return *this;
}

int TargetTransformInfo::getOperationCost(unsigned Opcode, Type *Ty,
                                          Type *OpTy) const {
  int Cost = TTIImpl->getOperationCost(Opcode, Ty, OpTy);
  assert(Cost >= 0 && "TTI should not produce negative costs!");
  return Cost;
}

int TargetTransformInfo::getCallCost(FunctionType *FTy, int NumArgs) const {
  int Cost = TTIImpl->getCallCost(FTy, NumArgs);
  assert(Cost >= 0 && "TTI should not produce negative costs!");
  return Cost;
}

int TargetTransformInfo::getCallCost(const Function *F,
                                     ArrayRef<const Value *> Arguments) const {
  int Cost = TTIImpl->getCallCost(F, Arguments);
  assert(Cost >= 0 && "TTI should not produce negative costs!");
  return Cost;
}

unsigned TargetTransformInfo::getInliningThresholdMultiplier() const {
  return TTIImpl->getInliningThresholdMultiplier();
}

int TargetTransformInfo::getGEPCost(Type *PointeeType, const Value *Ptr,
                                    ArrayRef<const Value *> Operands) const {
  return TTIImpl->getGEPCost(PointeeType, Ptr, Operands);
}

int TargetTransformInfo::getExtCost(const Instruction *I,
                                    const Value *Src) const {
  return TTIImpl->getExtCost(I, Src);
}

int TargetTransformInfo::getIntrinsicCost(
    Intrinsic::ID IID, Type *RetTy, ArrayRef<const Value *> Arguments) const {
  int Cost = TTIImpl->getIntrinsicCost(IID, RetTy, Arguments);
  assert(Cost >= 0 && "TTI should not produce negative costs!");
  return Cost;
}

[InlineCost] Improve the cost heuristic for Switch
Summary:
The motivating example below has 13 cases but only 2 distinct targets:
```
lor.lhs.false2: ; preds = %if.then
switch i32 %Status, label %if.then27 [
i32 -7012, label %if.end35
i32 -10008, label %if.end35
i32 -10016, label %if.end35
i32 15000, label %if.end35
i32 14013, label %if.end35
i32 10114, label %if.end35
i32 10107, label %if.end35
i32 10105, label %if.end35
i32 10013, label %if.end35
i32 10011, label %if.end35
i32 7008, label %if.end35
i32 7007, label %if.end35
i32 5002, label %if.end35
]
```
which is compiled into a balanced binary tree like this on AArch64 (similar on X86)
```
.LBB853_9: // %lor.lhs.false2
mov w8, #10012
cmp w19, w8
b.gt .LBB853_14
// BB#10: // %lor.lhs.false2
mov w8, #5001
cmp w19, w8
b.gt .LBB853_18
// BB#11: // %lor.lhs.false2
mov w8, #-10016
cmp w19, w8
b.eq .LBB853_23
// BB#12: // %lor.lhs.false2
mov w8, #-10008
cmp w19, w8
b.eq .LBB853_23
// BB#13: // %lor.lhs.false2
mov w8, #-7012
cmp w19, w8
b.eq .LBB853_23
b .LBB853_3
.LBB853_14: // %lor.lhs.false2
mov w8, #14012
cmp w19, w8
b.gt .LBB853_21
// BB#15: // %lor.lhs.false2
mov w8, #-10105
add w8, w19, w8
cmp w8, #9 // =9
b.hi .LBB853_17
// BB#16: // %lor.lhs.false2
orr w9, wzr, #0x1
lsl w8, w9, w8
mov w9, #517
and w8, w8, w9
cbnz w8, .LBB853_23
.LBB853_17: // %lor.lhs.false2
mov w8, #10013
cmp w19, w8
b.eq .LBB853_23
b .LBB853_3
.LBB853_18: // %lor.lhs.false2
mov w8, #-7007
add w8, w19, w8
cmp w8, #2 // =2
b.lo .LBB853_23
// BB#19: // %lor.lhs.false2
mov w8, #5002
cmp w19, w8
b.eq .LBB853_23
// BB#20: // %lor.lhs.false2
mov w8, #10011
cmp w19, w8
b.eq .LBB853_23
b .LBB853_3
.LBB853_21: // %lor.lhs.false2
mov w8, #14013
cmp w19, w8
b.eq .LBB853_23
// BB#22: // %lor.lhs.false2
mov w8, #15000
cmp w19, w8
b.ne .LBB853_3
```
However, the inline cost model estimates the cost to be linear with the number
of distinct targets and the cost of the above switch is just 2 InstrCosts.
The function containing this switch is then inlined about 900 times.
This change uses the general way of switch lowering for the inline heuristic. It
estimates the number of case clusters with the suitability check for a jump table
or bit test. Considering the binary search tree built for the clusters, this
change modifies the model to be linear with the size of the balanced binary
tree. The model is off by default for now :
-inline-generic-switch-cost=false
This change was originally proposed by Haicheng in D29870.
Reviewers: hans, bmakam, chandlerc, eraman, haicheng, mcrosier
Reviewed By: hans
Subscribers: joerg, aemerson, llvm-commits, rengolin
Differential Revision: https://reviews.llvm.org/D31085
llvm-svn: 301649
2017-04-29 00:04:03 +08:00
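
To make the difference between the two cost models in the commit message above concrete, here is a minimal sketch. It is not the code from that change: the function names are hypothetical, the closed form `n + n/2 - 1` for the expected number of comparisons in a balanced tree over `n` case clusters is an assumption made for illustration, and so is charging two instructions (a compare and a conditional branch) per comparison.

```cpp
#include <cstdint>

// Older model as described above: cost grows with the number of distinct
// branch targets, regardless of how many cases reach them.
inline std::int64_t distinctTargetCost(unsigned NumTargets,
                                       std::int64_t InstrCost) {
  return static_cast<std::int64_t>(NumTargets) * InstrCost;
}

// Hypothetical tree-based model: cost grows with the size of a balanced
// binary search tree built over the case clusters.
inline std::int64_t treeSwitchCost(unsigned NumClusters,
                                   std::int64_t InstrCost) {
  const std::int64_t N = NumClusters;
  // Assumption: each comparison costs one compare plus one conditional branch.
  if (N <= 3)
    return N * 2 * InstrCost;
  // Assumed closed form for leaf plus non-leaf comparisons: n + n/2 - 1.
  const std::int64_t ExpectedCompares = 3 * N / 2 - 1;
  return ExpectedCompares * 2 * InstrCost;
}
```

With a per-instruction cost of 5 (also an assumed value), the older model charges 10 for a switch with 2 distinct targets, while the sketch charges 180 if all 13 cases ended up in separate clusters. That is an upper bound for the motivating example, since jump tables and bit tests merge some cases into a single cluster.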

unsigned
TargetTransformInfo::getEstimatedNumberOfCaseClusters(const SwitchInst &SI,
                                                      unsigned &JTSize) const {
  return TTIImpl->getEstimatedNumberOfCaseClusters(SI, JTSize);
}

int TargetTransformInfo::getUserCost(const User *U,
    ArrayRef<const Value *> Operands) const {
  int Cost = TTIImpl->getUserCost(U, Operands);
  assert(Cost >= 0 && "TTI should not produce negative costs!");
  return Cost;
}

bool TargetTransformInfo::hasBranchDivergence() const {
[PM] Change the core design of the TTI analysis to use a polymorphic
type erased interface and a single analysis pass rather than an
extremely complex analysis group.
The end result is that the TTI analysis can contain a type erased
implementation that supports the polymorphic TTI interface. We can build
one from a target-specific implementation or from a dummy one in the IR.
I've also factored all of the code into "mix-in"-able base classes,
including CRTP base classes to facilitate calling back up to the most
specialized form when delegating horizontally across the surface. These
aren't as clean as I would like and I'm planning to work on cleaning
some of this up, but I wanted to start by putting into the right form.
There are a number of reasons for this change, and this particular
design. The first and foremost reason is that an analysis group is
complete overkill, and the chaining delegation strategy was so opaque,
confusing, and high overhead that TTI was suffering greatly for it.
Several of the TTI functions had failed to be implemented in all places
because of the chaining-based delegation making there be no checking of
this. A few other functions were implemented with incorrect delegation.
The message to me was very clear working on this -- the delegation and
analysis group structure was too confusing to be useful here.
The other reason of course is that this is *much* more natural fit for
the new pass manager. This will lay the ground work for a type-erased
per-function info object that can look up the correct subtarget and even
cache it.
Yet another benefit is that this will significantly simplify the
interaction of the pass managers and the TargetMachine. See the future
work below.
The downside of this change is that it is very, very verbose. I'm going
to work to improve that, but it is somewhat an implementation necessity
in C++ to do type erasure. =/ I discussed this design really extensively
with Eric and Hal prior to going down this path, and afterward showed
them the result. No one was really thrilled with it, but there doesn't
seem to be a substantially better alternative. Using a base class and
virtual method dispatch would make the code much shorter, but as
discussed in the update to the programmer's manual and elsewhere,
a polymorphic interface feels like the more principled approach even if
this is perhaps the least compelling example of it. ;]
Ultimately, there is still a lot more to be done here, but this was the
huge chunk that I couldn't really split things out of because this was
the interface change to TTI. I've tried to minimize all the other parts
of this. The follow-up work should include at least:
1) Improve the TargetMachine interface by having it directly return
   a TTI object. Because we have a non-pass object with value semantics
   and an internal type erasure mechanism, we can narrow the interface
   of the TargetMachine to *just* do what we need: build and return
   a TTI object that we can then insert into the pass pipeline.
2) Make the TTI object fully specialized for a particular function.
   This will include splitting off a minimal form of it that is
   sufficient for the inliner and the old pass manager.
3) Add a new pass manager analysis that produces TTI objects from the
   target machine for each function. This may actually be done as part
   of #2 in order to use the new analysis to implement #2.
4) Narrow the API between TTI and the targets so that it is
   easier to understand and less verbose to type erase.
5) Narrow the API between TTI and its clients so that it is
   easier to understand and less verbose to forward.
6) Try to improve the CRTP-based delegation. I feel like this code is
   just a bit messy and exacerbates the complexity of implementing
   the TTI in each target.
Many thanks to Eric and Hal for their help here. I ended up blocked on
this somewhat more abruptly than I expected, and so I appreciate getting
it sorted out very quickly.
Differential Revision: http://reviews.llvm.org/D7293
llvm-svn: 227669
2015-01-31 11:43:40 +08:00
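The design described in the commit message above can be illustrated with a small, self-contained sketch. It is not LLVM's actual code: `ToyTTI`, `ToyImplBase`, `ToyGPUImpl`, and `ToyNoTargetImpl` are hypothetical names, and the rule that `isLegalICmpImmediate` falls back to `isLegalAddImmediate` is an assumption made only to show CRTP delegation. The sketch shows a value-semantic wrapper that hides any implementation type behind an internal `Concept`/`Model` pair, plus a CRTP base that supplies defaults and calls back into the most specialized class.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <utility>

// CRTP base with conservative defaults (hypothetical, for illustration only).
template <typename Derived> class ToyImplBase {
public:
  bool hasBranchDivergence() const { return false; }
  bool isLegalAddImmediate(std::int64_t) const { return false; }
  // Delegates "horizontally" to the most specialized isLegalAddImmediate.
  // This fallback rule is an assumption of the sketch, not LLVM behavior.
  bool isLegalICmpImmediate(std::int64_t Imm) const {
    return static_cast<const Derived *>(this)->isLegalAddImmediate(Imm);
  }
};

// A made-up target: overrides two hooks and inherits the third.
class ToyGPUImpl : public ToyImplBase<ToyGPUImpl> {
public:
  bool hasBranchDivergence() const { return true; }
  bool isLegalAddImmediate(std::int64_t Imm) const {
    return Imm >= -128 && Imm <= 127;
  }
};

// A "no target" implementation that keeps every default.
class ToyNoTargetImpl : public ToyImplBase<ToyNoTargetImpl> {};

// Type-erased wrapper: clients see only this class, never the impl type.
class ToyTTI {
  struct Concept {
    virtual ~Concept() = default;
    virtual bool hasBranchDivergence() const = 0;
    virtual bool isLegalAddImmediate(std::int64_t Imm) const = 0;
    virtual bool isLegalICmpImmediate(std::int64_t Imm) const = 0;
  };

  // One Model instantiation per implementation type; each method forwards.
  template <typename T> struct Model final : Concept {
    explicit Model(T I) : Impl(std::move(I)) {}
    bool hasBranchDivergence() const override {
      return Impl.hasBranchDivergence();
    }
    bool isLegalAddImmediate(std::int64_t Imm) const override {
      return Impl.isLegalAddImmediate(Imm);
    }
    bool isLegalICmpImmediate(std::int64_t Imm) const override {
      return Impl.isLegalICmpImmediate(Imm);
    }
    T Impl;
  };

  std::unique_ptr<Concept> TTIImpl;

public:
  template <typename T>
  explicit ToyTTI(T Impl) : TTIImpl(new Model<T>(std::move(Impl))) {}

  // Public forwarding methods, mirroring the shape of the functions in this
  // file.
  bool hasBranchDivergence() const { return TTIImpl->hasBranchDivergence(); }
  bool isLegalAddImmediate(std::int64_t Imm) const {
    return TTIImpl->isLegalAddImmediate(Imm);
  }
  bool isLegalICmpImmediate(std::int64_t Imm) const {
    return TTIImpl->isLegalICmpImmediate(Imm);
  }
};
```

Constructing `ToyTTI GPU{ToyGPUImpl()};` and calling `GPU.isLegalICmpImmediate(100)` goes through three hops: the wrapper's forwarding method, the `Model` override, and the CRTP base, which calls back into `ToyGPUImpl::isLegalAddImmediate`. The verbosity the commit message complains about is visible even at this scale, since every hook is spelled out three times.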
  return TTIImpl->hasBranchDivergence();
}

Divergence analysis for GPU programs
Summary:
Some optimizations such as jump threading and loop unswitching can negatively
affect performance when applied to divergent branches. The divergence analysis
added in this patch conservatively estimates which branches in a GPU program
can diverge. This information can then help LLVM to run certain optimizations
selectively.
Test Plan: test/Analysis/DivergenceAnalysis/NVPTX/diverge.ll
Reviewers: resistor, hfinkel, eliben, meheff, jholewinski
Subscribers: broune, bjarke.roune, madhur13490, tstellarAMD, dberlin, echristo, jholewinski, llvm-commits
Differential Revision: http://reviews.llvm.org/D8576
llvm-svn: 234567
2015-04-10 13:03:50 +08:00
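As a rough illustration of the idea in the summary above, the following toy sketch propagates divergence along data dependencies only. It is not the analysis added by that patch: `ToyValue`, `computeDivergence`, and `mayUnswitchOn` are hypothetical names, values are assumed to be listed in topological order, and control-dependence effects are ignored entirely. The two flags loosely correspond to the kind of per-value queries that `isSourceOfDivergence` and `isAlwaysUniform` expose.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// A toy SSA-like value: operands are indices of earlier values in the list.
struct ToyValue {
  std::vector<std::size_t> Operands;
  bool IsSource = false;        // e.g. a thread-id read (assumption)
  bool IsAlwaysUniform = false; // e.g. a broadcast-style result (assumption)
};

// Conservative data-flow estimate: a value is divergent if it is a source of
// divergence or uses a divergent operand, unless it is always uniform.
// Assumes every operand index is smaller than the index of its user.
std::vector<bool> computeDivergence(const std::vector<ToyValue> &Values) {
  std::vector<bool> Divergent(Values.size(), false);
  for (std::size_t I = 0; I < Values.size(); ++I) {
    const ToyValue &V = Values[I];
    if (V.IsAlwaysUniform)
      continue; // stays uniform regardless of its operands
    bool D = V.IsSource;
    for (std::size_t Op : V.Operands)
      D = D || Divergent[Op];
    Divergent[I] = D;
  }
  return Divergent;
}

// A transform such as loop unswitching could skip branches whose condition
// may diverge and only act on conditions known to be uniform.
bool mayUnswitchOn(const std::vector<bool> &Divergent, std::size_t Cond) {
  return !Divergent[Cond];
}
```

With four values, where value 0 is a divergence source, value 1 has no operands, value 2 uses both, and value 3 uses value 2 but is marked always uniform, the sketch reports values 0 and 2 as divergent and allows unswitching only on values 1 and 3.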
bool TargetTransformInfo::isSourceOfDivergence(const Value *V) const {
  return TTIImpl->isSourceOfDivergence(V);
}

bool llvm::TargetTransformInfo::isAlwaysUniform(const Value *V) const {
  return TTIImpl->isAlwaysUniform(V);
}

unsigned TargetTransformInfo::getFlatAddressSpace() const {
  return TTIImpl->getFlatAddressSpace();
}

bool TargetTransformInfo::isLoweredToCall(const Function *F) const {
  return TTIImpl->isLoweredToCall(F);
}

[LoopUnroll] Pass SCEV to getUnrollingPreferences hook. NFCI.
Reviewers: sanjoy, anna, reames, apilipenko, igor-laevsky, mkuper
Subscribers: jholewinski, arsenm, mzolotukhin, nemanjai, nhaehnle, javed.absar, mcrosier, llvm-commits
Differential Revision: https://reviews.llvm.org/D34531
llvm-svn: 306554
2017-06-28 23:53:17 +08:00
void TargetTransformInfo::getUnrollingPreferences(
    Loop *L, ScalarEvolution &SE, UnrollingPreferences &UP) const {
  return TTIImpl->getUnrollingPreferences(L, SE, UP);
}

bool TargetTransformInfo::isLegalAddImmediate(int64_t Imm) const {
  return TTIImpl->isLegalAddImmediate(Imm);
}

bool TargetTransformInfo::isLegalICmpImmediate(int64_t Imm) const {
  return TTIImpl->isLegalICmpImmediate(Imm);
}

bool TargetTransformInfo::isLegalAddressingMode(Type *Ty, GlobalValue *BaseGV,
                                                int64_t BaseOffset,
                                                bool HasBaseReg,
                                                int64_t Scale,
                                                unsigned AddrSpace,
                                                Instruction *I) const {
  return TTIImpl->isLegalAddressingMode(Ty, BaseGV, BaseOffset, HasBaseReg,
                                        Scale, AddrSpace, I);
}

bool TargetTransformInfo::isLSRCostLess(LSRCost &C1, LSRCost &C2) const {
  return TTIImpl->isLSRCostLess(C1, C2);
}

bool TargetTransformInfo::canMacroFuseCmp() const {
  return TTIImpl->canMacroFuseCmp();
}

bool TargetTransformInfo::shouldFavorPostInc() const {
  return TTIImpl->shouldFavorPostInc();
}

bool TargetTransformInfo::isLegalMaskedStore(Type *DataType) const {
  return TTIImpl->isLegalMaskedStore(DataType);
}

bool TargetTransformInfo::isLegalMaskedLoad(Type *DataType) const {
  return TTIImpl->isLegalMaskedLoad(DataType);
}

bool TargetTransformInfo::isLegalMaskedGather(Type *DataType) const {
  return TTIImpl->isLegalMaskedGather(DataType);
}

bool TargetTransformInfo::isLegalMaskedScatter(Type *DataType) const {
  return TTIImpl->isLegalMaskedScatter(DataType);
}

bool TargetTransformInfo::hasDivRemOp(Type *DataType, bool IsSigned) const {
  return TTIImpl->hasDivRemOp(DataType, IsSigned);
}

bool TargetTransformInfo::hasVolatileVariant(Instruction *I,
                                             unsigned AddrSpace) const {
  return TTIImpl->hasVolatileVariant(I, AddrSpace);
}

bool TargetTransformInfo::prefersVectorizedAddressing() const {
  return TTIImpl->prefersVectorizedAddressing();
}

int TargetTransformInfo::getScalingFactorCost(Type *Ty, GlobalValue *BaseGV,
                                              int64_t BaseOffset,
                                              bool HasBaseReg,
                                              int64_t Scale,
                                              unsigned AddrSpace) const {
  int Cost = TTIImpl->getScalingFactorCost(Ty, BaseGV, BaseOffset, HasBaseReg,
                                           Scale, AddrSpace);
  assert(Cost >= 0 && "TTI should not produce negative costs!");
  return Cost;
}

bool TargetTransformInfo::LSRWithInstrQueries() const {
  return TTIImpl->LSRWithInstrQueries();
}

bool TargetTransformInfo::isTruncateFree(Type *Ty1, Type *Ty2) const {
[PM] Change the core design of the TTI analysis to use a polymorphic
type erased interface and a single analysis pass rather than an
extremely complex analysis group.
The end result is that the TTI analysis can contain a type erased
implementation that supports the polymorphic TTI interface. We can build
one from a target-specific implementation or from a dummy one in the IR.
I've also factored all of the code into "mix-in"-able base classes,
including CRTP base classes to facilitate calling back up to the most
specialized form when delegating horizontally across the surface. These
aren't as clean as I would like and I'm planning to work on cleaning
some of this up, but I wanted to start by putting into the right form.
There are a number of reasons for this change, and this particular
design. The first and foremost reason is that an analysis group is
complete overkill, and the chaining delegation strategy was so opaque,
confusing, and high overhead that TTI was suffering greatly for it.
Several of the TTI functions had failed to be implemented in all places
because of the chaining-based delegation making there be no checking of
this. A few other functions were implemented with incorrect delegation.
The message to me was very clear working on this -- the delegation and
analysis group structure was too confusing to be useful here.
The other reason of course is that this is *much* more natural fit for
the new pass manager. This will lay the ground work for a type-erased
per-function info object that can look up the correct subtarget and even
cache it.
Yet another benefit is that this will significantly simplify the
interaction of the pass managers and the TargetMachine. See the future
work below.
The downside of this change is that it is very, very verbose. I'm going
to work to improve that, but it is somewhat an implementation necessity
in C++ to do type erasure. =/ I discussed this design really extensively
with Eric and Hal prior to going down this path, and afterward showed
them the result. No one was really thrilled with it, but there doesn't
seem to be a substantially better alternative. Using a base class and
virtual method dispatch would make the code much shorter, but as
discussed in the update to the programmer's manual and elsewhere,
a polymorphic interface feels like the more principled approach even if
this is perhaps the least compelling example of it. ;]
Ultimately, there is still a lot more to be done here, but this was the
huge chunk that I couldn't really split things out of because this was
the interface change to TTI. I've tried to minimize all the other parts
of this. The follow up work should include at least:
1) Improving the TargetMachine interface by having it directly return
a TTI object. Because we have a non-pass object with value semantics
and an internal type erasure mechanism, we can narrow the interface
of the TargetMachine to *just* do what we need: build and return
a TTI object that we can then insert into the pass pipeline.
2) Make the TTI object be fully specialized for a particular function.
This will include splitting off a minimal form of it which is
sufficient for the inliner and the old pass manager.
3) Add a new pass manager analysis which produces TTI objects from the
target machine for each function. This may actually be done as part
of #2 in order to use the new analysis to implement #2.
4) Work on narrowing the API between TTI and the targets so that it is
easier to understand and less verbose to type erase.
5) Work on narrowing the API between TTI and its clients so that it is
easier to understand and less verbose to forward.
6) Try to improve the CRTP-based delegation. I feel like this code is
just a bit messy and exacerbating the complexity of implementing
the TTI in each target.
Many thanks to Eric and Hal for their help here. I ended up blocked on
this somewhat more abruptly than I expected, and so I appreciate getting
it sorted out very quickly.
Differential Revision: http://reviews.llvm.org/D7293
llvm-svn: 227669
2015-01-31 11:43:40 +08:00
|
|
|
return TTIImpl->isTruncateFree(Ty1, Ty2);
|
2013-01-05 19:43:11 +08:00
|
|
|
}
|
|
|
|
|
bool TargetTransformInfo::isProfitableToHoist(Instruction *I) const {
  return TTIImpl->isProfitableToHoist(I);
}

bool TargetTransformInfo::useAA() const { return TTIImpl->useAA(); }

bool TargetTransformInfo::isTypeLegal(Type *Ty) const {
  return TTIImpl->isTypeLegal(Ty);
}

unsigned TargetTransformInfo::getJumpBufAlignment() const {
  return TTIImpl->getJumpBufAlignment();
}

unsigned TargetTransformInfo::getJumpBufSize() const {
  return TTIImpl->getJumpBufSize();
}

bool TargetTransformInfo::shouldBuildLookupTables() const {
  return TTIImpl->shouldBuildLookupTables();
}
bool TargetTransformInfo::shouldBuildLookupTablesForConstant(Constant *C) const {
  return TTIImpl->shouldBuildLookupTablesForConstant(C);
}

bool TargetTransformInfo::useColdCCForColdCall(Function &F) const {
  return TTIImpl->useColdCCForColdCall(F);
}

unsigned TargetTransformInfo::
getScalarizationOverhead(Type *Ty, bool Insert, bool Extract) const {
  return TTIImpl->getScalarizationOverhead(Ty, Insert, Extract);
}

unsigned TargetTransformInfo::
getOperandsScalarizationOverhead(ArrayRef<const Value *> Args,
                                 unsigned VF) const {
  return TTIImpl->getOperandsScalarizationOverhead(Args, VF);
}

bool TargetTransformInfo::supportsEfficientVectorElementLoadStore() const {
  return TTIImpl->supportsEfficientVectorElementLoadStore();
}

bool TargetTransformInfo::enableAggressiveInterleaving(bool LoopHasReductions) const {
  return TTIImpl->enableAggressiveInterleaving(LoopHasReductions);
}

const TargetTransformInfo::MemCmpExpansionOptions *
TargetTransformInfo::enableMemCmpExpansion(bool IsZeroCmp) const {
  return TTIImpl->enableMemCmpExpansion(IsZeroCmp);
}

bool TargetTransformInfo::enableInterleavedAccessVectorization() const {
  return TTIImpl->enableInterleavedAccessVectorization();
}

bool TargetTransformInfo::enableMaskedInterleavedAccessVectorization() const {
  return TTIImpl->enableMaskedInterleavedAccessVectorization();
}

bool TargetTransformInfo::isFPVectorizationPotentiallyUnsafe() const {
  return TTIImpl->isFPVectorizationPotentiallyUnsafe();
}

bool TargetTransformInfo::allowsMisalignedMemoryAccesses(LLVMContext &Context,
                                                         unsigned BitWidth,
                                                         unsigned AddressSpace,
                                                         unsigned Alignment,
                                                         bool *Fast) const {
  return TTIImpl->allowsMisalignedMemoryAccesses(Context, BitWidth, AddressSpace,
                                                 Alignment, Fast);
}

TargetTransformInfo::PopcntSupportKind
TargetTransformInfo::getPopcntSupport(unsigned IntTyWidthInBit) const {
  return TTIImpl->getPopcntSupport(IntTyWidthInBit);
}

bool TargetTransformInfo::haveFastSqrt(Type *Ty) const {
  return TTIImpl->haveFastSqrt(Ty);
}

bool TargetTransformInfo::isFCmpOrdCheaperThanFCmpZero(Type *Ty) const {
  return TTIImpl->isFCmpOrdCheaperThanFCmpZero(Ty);
}

int TargetTransformInfo::getFPOpCost(Type *Ty) const {
  int Cost = TTIImpl->getFPOpCost(Ty);
  assert(Cost >= 0 && "TTI should not produce negative costs!");
  return Cost;
}

int TargetTransformInfo::getIntImmCodeSizeCost(unsigned Opcode, unsigned Idx,
                                               const APInt &Imm,
                                               Type *Ty) const {
  int Cost = TTIImpl->getIntImmCodeSizeCost(Opcode, Idx, Imm, Ty);
  assert(Cost >= 0 && "TTI should not produce negative costs!");
  return Cost;
}

2015-08-06 02:08:10 +08:00
|
|
|
int TargetTransformInfo::getIntImmCost(const APInt &Imm, Type *Ty) const {
|
|
|
|
int Cost = TTIImpl->getIntImmCost(Imm, Ty);
|
|
|
|
assert(Cost >= 0 && "TTI should not produce negative costs!");
|
|
|
|
return Cost;
|
2013-01-05 19:43:11 +08:00
|
|
|
}
|
|
|
|
|
2015-08-06 02:08:10 +08:00
|
|
|
int TargetTransformInfo::getIntImmCost(unsigned Opcode, unsigned Idx,
|
|
|
|
const APInt &Imm, Type *Ty) const {
|
|
|
|
int Cost = TTIImpl->getIntImmCost(Opcode, Idx, Imm, Ty);
|
|
|
|
assert(Cost >= 0 && "TTI should not produce negative costs!");
|
|
|
|
return Cost;
|
2014-01-25 10:02:55 +08:00
|
|
|
}
|
|
|
|
|
2015-08-06 02:08:10 +08:00
|
|
|
int TargetTransformInfo::getIntImmCost(Intrinsic::ID IID, unsigned Idx,
|
|
|
|
const APInt &Imm, Type *Ty) const {
|
|
|
|
int Cost = TTIImpl->getIntImmCost(IID, Idx, Imm, Ty);
|
|
|
|
assert(Cost >= 0 && "TTI should not produce negative costs!");
|
|
|
|
return Cost;
|
2014-01-25 10:02:55 +08:00
|
|
|
}
|
|
|
|
|
2013-01-05 19:43:11 +08:00
|
|
|
unsigned TargetTransformInfo::getNumberOfRegisters(bool Vector) const {
|
2015-01-31 11:43:40 +08:00
|
|
|
return TTIImpl->getNumberOfRegisters(Vector);
|
2013-01-05 19:43:11 +08:00
|
|
|
}
|
|
|
|
|
2013-01-10 06:29:00 +08:00
|
|
|
unsigned TargetTransformInfo::getRegisterBitWidth(bool Vector) const {
|
2015-01-31 11:43:40 +08:00
|
|
|
return TTIImpl->getRegisterBitWidth(Vector);
|
2013-01-10 06:29:00 +08:00
|
|
|
}
|
|
|
|
|
2017-05-16 05:15:01 +08:00
|
|
|
unsigned TargetTransformInfo::getMinVectorRegisterBitWidth() const {
|
|
|
|
return TTIImpl->getMinVectorRegisterBitWidth();
|
|
|
|
}
|
|
|
|
|
2018-03-28 00:14:11 +08:00
|
|
|
bool TargetTransformInfo::shouldMaximizeVectorBandwidth(bool OptSize) const {
|
|
|
|
return TTIImpl->shouldMaximizeVectorBandwidth(OptSize);
|
|
|
|
}
|
|
|
|
|
2018-04-14 04:16:32 +08:00
|
|
|
unsigned TargetTransformInfo::getMinimumVF(unsigned ElemWidth) const {
|
|
|
|
return TTIImpl->getMinimumVF(ElemWidth);
|
|
|
|
}
|
|
|
|
|
2017-04-04 03:20:07 +08:00
|
|
|
bool TargetTransformInfo::shouldConsiderAddressTypePromotion(
|
|
|
|
const Instruction &I, bool &AllowPromotionWithoutCommonHeader) const {
|
|
|
|
return TTIImpl->shouldConsiderAddressTypePromotion(
|
|
|
|
I, AllowPromotionWithoutCommonHeader);
|
|
|
|
}
|
|
|
|
|
2016-01-22 02:28:36 +08:00
|
|
|
unsigned TargetTransformInfo::getCacheLineSize() const {
|
|
|
|
return TTIImpl->getCacheLineSize();
|
|
|
|
}
|
|
|
|
|
Model cache size and associativity in TargetTransformInfo
Summary:
We add the precise cache sizes and associativity for the following Intel
architectures:
- Penryn
- Nehalem
- Westmere
- Sandy Bridge
- Ivy Bridge
- Haswell
- Broadwell
- Skylake
- Kabylake
For several months, Polly has used a performance model for BLAS computations that
derives optimal cache and register tile sizes from cache and latency
information (based on ideas from "Analytical Modeling Is Enough for High-Performance BLIS", by Tze Meng Low published at TOMS 2016).
While bootstrapping this model, these target values have been kept in Polly.
However, as our implementation is now rather mature, it seems time to teach
LLVM itself about cache sizes.
Interestingly, L1 and L2 cache sizes are pretty constant across
micro-architectures, hence a set of architecture specific default values
seems like a good start. They can be expanded to more target specific values,
in case certain newer architectures require different values. For now a set
of Intel architectures are provided.
Just as a little teaser, for a simple gemm kernel this model allows us to
improve performance from 1.2s to 0.27s. For gemm kernels with less optimal
memory layouts even larger speedups can be reported.
Reviewers: Meinersbur, bollu, singam-sanjay, hfinkel, gareevroman, fhahn, sebpop, efriedma, asb
Reviewed By: fhahn, asb
Subscribers: lsaba, asb, pollydev, llvm-commits
Differential Revision: https://reviews.llvm.org/D37051
llvm-svn: 311647
2017-08-24 17:46:25 +08:00
|
|
|
llvm::Optional<unsigned> TargetTransformInfo::getCacheSize(CacheLevel Level)
|
|
|
|
const {
|
|
|
|
return TTIImpl->getCacheSize(Level);
|
|
|
|
}
|
|
|
|
|
|
|
|
llvm::Optional<unsigned> TargetTransformInfo::getCacheAssociativity(
|
|
|
|
CacheLevel Level) const {
|
|
|
|
return TTIImpl->getCacheAssociativity(Level);
|
|
|
|
}
|
|
|
|
|
2016-01-28 06:21:25 +08:00
|
|
|
unsigned TargetTransformInfo::getPrefetchDistance() const {
|
|
|
|
return TTIImpl->getPrefetchDistance();
|
|
|
|
}
|
|
|
|
|
2016-03-18 08:27:38 +08:00
|
|
|
unsigned TargetTransformInfo::getMinPrefetchStride() const {
|
|
|
|
return TTIImpl->getMinPrefetchStride();
|
|
|
|
}
|
|
|
|
|
2016-03-18 08:27:43 +08:00
|
|
|
unsigned TargetTransformInfo::getMaxPrefetchIterationsAhead() const {
|
|
|
|
return TTIImpl->getMaxPrefetchIterationsAhead();
|
|
|
|
}
|
|
|
|
|
2015-05-07 01:12:25 +08:00
|
|
|
unsigned TargetTransformInfo::getMaxInterleaveFactor(unsigned VF) const {
|
|
|
|
return TTIImpl->getMaxInterleaveFactor(VF);
|
2013-01-09 09:15:42 +08:00
|
|
|
}
|
|
|
|
|
2018-10-05 22:34:04 +08:00
|
|
|
TargetTransformInfo::OperandValueKind
|
2018-11-13 21:45:10 +08:00
|
|
|
TargetTransformInfo::getOperandInfo(Value *V, OperandValueProperties &OpProps) {
|
2018-10-05 22:34:04 +08:00
|
|
|
OperandValueKind OpInfo = OK_AnyValue;
|
|
|
|
OpProps = OP_None;
|
|
|
|
|
|
|
|
if (auto *CI = dyn_cast<ConstantInt>(V)) {
|
|
|
|
if (CI->getValue().isPowerOf2())
|
|
|
|
OpProps = OP_PowerOf2;
|
|
|
|
return OK_UniformConstantValue;
|
|
|
|
}
|
|
|
|
|
2018-11-14 23:04:08 +08:00
|
|
|
// A broadcast shuffle creates a uniform value.
|
|
|
|
// TODO: Add support for non-zero index broadcasts.
|
|
|
|
// TODO: Add support for different source vector width.
|
|
|
|
if (auto *ShuffleInst = dyn_cast<ShuffleVectorInst>(V))
|
|
|
|
if (ShuffleInst->isZeroEltSplat())
|
|
|
|
OpInfo = OK_UniformValue;
|
|
|
|
|
2018-10-05 22:34:04 +08:00
|
|
|
const Value *Splat = getSplatValue(V);
|
|
|
|
|
|
|
|
// Check for a splat of a constant or for a non-uniform vector of constants
|
|
|
|
// and check if the constant(s) are all powers of two.
|
|
|
|
if (isa<ConstantVector>(V) || isa<ConstantDataVector>(V)) {
|
|
|
|
OpInfo = OK_NonUniformConstantValue;
|
|
|
|
if (Splat) {
|
|
|
|
OpInfo = OK_UniformConstantValue;
|
|
|
|
if (auto *CI = dyn_cast<ConstantInt>(Splat))
|
|
|
|
if (CI->getValue().isPowerOf2())
|
|
|
|
OpProps = OP_PowerOf2;
|
|
|
|
} else if (auto *CDS = dyn_cast<ConstantDataSequential>(V)) {
|
|
|
|
OpProps = OP_PowerOf2;
|
|
|
|
for (unsigned I = 0, E = CDS->getNumElements(); I != E; ++I) {
|
|
|
|
if (auto *CI = dyn_cast<ConstantInt>(CDS->getElementAsConstant(I)))
|
|
|
|
if (CI->getValue().isPowerOf2())
|
|
|
|
continue;
|
|
|
|
OpProps = OP_None;
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
// Check for a splat of a uniform value. This is not loop aware, so report
|
|
|
|
// OK_UniformValue only for the obviously uniform cases (argument, global value).
|
|
|
|
if (Splat && (isa<Argument>(Splat) || isa<GlobalValue>(Splat)))
|
|
|
|
OpInfo = OK_UniformValue;
|
|
|
|
|
|
|
|
return OpInfo;
|
|
|
|
}
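The constant-vector branch of getOperandInfo above can be sketched without LLVM by treating a constant vector as a list of plain integers. The names `Kind`, `Props`, `OperandInfo`, and `classifyConstantVector` are hypothetical and only mirror the idea: a splat is a uniform constant, anything else is a non-uniform constant, and the power-of-two property holds only when every element is a power of two.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified counterparts of OperandValueKind and
// OperandValueProperties.
enum class Kind { AnyValue, UniformConstant, NonUniformConstant };
enum class Props { None, PowerOf2 };

struct OperandInfo {
  Kind K;
  Props P;
};

// True when exactly one bit is set; zero is not a power of two.
static bool isPowerOf2(std::uint64_t V) { return V != 0 && (V & (V - 1)) == 0; }

// Classify a constant vector given as plain integers.
OperandInfo classifyConstantVector(const std::vector<std::uint64_t> &Elts) {
  if (Elts.empty())
    return {Kind::AnyValue, Props::None};

  bool IsSplat = true;   // every element equals the first one
  bool AllPow2 = true;   // every element is a power of two
  for (std::uint64_t E : Elts) {
    IsSplat = IsSplat && (E == Elts.front());
    AllPow2 = AllPow2 && isPowerOf2(E);
  }

  return {IsSplat ? Kind::UniformConstant : Kind::NonUniformConstant,
          AllPow2 ? Props::PowerOf2 : Props::None};
}
```

For a splat, checking every element is equivalent to checking the splat value once, which is what the function above does through getSplatValue.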
|
|
|
|
|
2015-08-06 02:08:10 +08:00
|
|
|
int TargetTransformInfo::getArithmeticInstrCost(
|
2015-01-31 11:43:40 +08:00
|
|
|
unsigned Opcode, Type *Ty, OperandValueKind Opd1Info,
|
|
|
|
OperandValueKind Opd2Info, OperandValueProperties Opd1PropInfo,
|
[X86] updating TTI costs for arithmetic instructions on X86\SLM arch.
updated instructions:
pmulld, pmullw, pmulhw, mulsd, mulps, mulpd, divss, divps, divsd, divpd, addpd and subpd.
Special optimization case: pmulld is replaced with a pmullw\pmulhw\pshuf sequence
when the real operand bitwidth is <= 16.
Differential Revision: https://reviews.llvm.org/D28104
llvm-svn: 291657
2017-01-11 16:23:37 +08:00
|
|
|
OperandValueProperties Opd2PropInfo,
|
|
|
|
ArrayRef<const Value *> Args) const {
|
2015-08-06 02:08:10 +08:00
|
|
|
int Cost = TTIImpl->getArithmeticInstrCost(Opcode, Ty, Opd1Info, Opd2Info,
|
2017-01-11 16:23:37 +08:00
|
|
|
Opd1PropInfo, Opd2PropInfo, Args);
|
2015-08-06 02:08:10 +08:00
|
|
|
assert(Cost >= 0 && "TTI should not produce negative costs!");
|
|
|
|
return Cost;
|
2013-01-05 19:43:11 +08:00
|
|
|
}
|
|
|
|
|
2015-08-06 02:08:10 +08:00
|
|
|
int TargetTransformInfo::getShuffleCost(ShuffleKind Kind, Type *Ty, int Index,
|
|
|
|
Type *SubTp) const {
|
|
|
|
int Cost = TTIImpl->getShuffleCost(Kind, Ty, Index, SubTp);
|
|
|
|
assert(Cost >= 0 && "TTI should not produce negative costs!");
|
|
|
|
return Cost;
|
2013-01-05 19:43:11 +08:00
|
|
|
}
|
|
|
|
|
2015-08-06 02:08:10 +08:00
|
|
|
int TargetTransformInfo::getCastInstrCost(unsigned Opcode, Type *Dst,
|
2017-04-12 19:49:08 +08:00
|
|
|
Type *Src, const Instruction *I) const {
|
|
|
|
assert((I == nullptr || I->getOpcode() == Opcode) &&
|
|
|
|
"Opcode should reflect passed instruction.");
|
|
|
|
int Cost = TTIImpl->getCastInstrCost(Opcode, Dst, Src, I);
|
2015-08-06 02:08:10 +08:00
|
|
|
assert(Cost >= 0 && "TTI should not produce negative costs!");
|
|
|
|
return Cost;
|
2013-01-05 19:43:11 +08:00
|
|
|
}
|
|
|
|
|
2016-04-27 23:20:21 +08:00
|
|
|
int TargetTransformInfo::getExtractWithExtendCost(unsigned Opcode, Type *Dst,
|
|
|
|
VectorType *VecTy,
|
|
|
|
unsigned Index) const {
|
|
|
|
int Cost = TTIImpl->getExtractWithExtendCost(Opcode, Dst, VecTy, Index);
|
|
|
|
assert(Cost >= 0 && "TTI should not produce negative costs!");
|
|
|
|
return Cost;
|
|
|
|
}
|
|
|
|
|
2015-08-06 02:08:10 +08:00
|
|
|
int TargetTransformInfo::getCFInstrCost(unsigned Opcode) const {
|
|
|
|
int Cost = TTIImpl->getCFInstrCost(Opcode);
|
|
|
|
assert(Cost >= 0 && "TTI should not produce negative costs!");
|
|
|
|
return Cost;
|
2013-01-05 19:43:11 +08:00
|
|
|
}
|
|
|
|
|
2015-08-06 02:08:10 +08:00
|
|
|
int TargetTransformInfo::getCmpSelInstrCost(unsigned Opcode, Type *ValTy,
|
2017-04-12 19:49:08 +08:00
|
|
|
Type *CondTy, const Instruction *I) const {
|
|
|
|
assert((I == nullptr || I->getOpcode() == Opcode) &&
|
|
|
|
"Opcode should reflect passed instruction.");
|
|
|
|
int Cost = TTIImpl->getCmpSelInstrCost(Opcode, ValTy, CondTy, I);
|
2015-08-06 02:08:10 +08:00
|
|
|
assert(Cost >= 0 && "TTI should not produce negative costs!");
|
|
|
|
return Cost;
|
2013-01-05 19:43:11 +08:00
|
|
|
}
|
|
|
|
|
2015-08-06 02:08:10 +08:00
|
|
|
int TargetTransformInfo::getVectorInstrCost(unsigned Opcode, Type *Val,
|
|
|
|
unsigned Index) const {
|
|
|
|
int Cost = TTIImpl->getVectorInstrCost(Opcode, Val, Index);
|
|
|
|
assert(Cost >= 0 && "TTI should not produce negative costs!");
|
|
|
|
return Cost;
|
2013-01-05 19:43:11 +08:00
|
|
|
}
|
|
|
|
|
2015-08-06 02:08:10 +08:00
|
|
|
int TargetTransformInfo::getMemoryOpCost(unsigned Opcode, Type *Src,
|
|
|
|
unsigned Alignment,
|
2017-04-12 19:49:08 +08:00
|
|
|
unsigned AddressSpace,
|
|
|
|
const Instruction *I) const {
|
|
|
|
assert((I == nullptr || I->getOpcode() == Opcode) &&
|
|
|
|
"Opcode should reflect passed instruction.");
|
|
|
|
int Cost = TTIImpl->getMemoryOpCost(Opcode, Src, Alignment, AddressSpace, I);
|
2015-08-06 02:08:10 +08:00
|
|
|
assert(Cost >= 0 && "TTI should not produce negative costs!");
|
|
|
|
return Cost;
|
2013-01-05 19:43:11 +08:00
|
|
|
}
|
|
|
|
|
2015-08-06 02:08:10 +08:00
|
|
|
int TargetTransformInfo::getMaskedMemoryOpCost(unsigned Opcode, Type *Src,
|
|
|
|
unsigned Alignment,
|
|
|
|
unsigned AddressSpace) const {
|
|
|
|
int Cost =
|
|
|
|
TTIImpl->getMaskedMemoryOpCost(Opcode, Src, Alignment, AddressSpace);
|
|
|
|
assert(Cost >= 0 && "TTI should not produce negative costs!");
|
|
|
|
return Cost;
|
2015-01-25 16:44:46 +08:00
|
|
|
}
|
|
|
|
|
2015-12-29 04:10:59 +08:00
|
|
|
int TargetTransformInfo::getGatherScatterOpCost(unsigned Opcode, Type *DataTy,
|
|
|
|
Value *Ptr, bool VariableMask,
|
|
|
|
unsigned Alignment) const {
|
|
|
|
int Cost = TTIImpl->getGatherScatterOpCost(Opcode, DataTy, Ptr, VariableMask,
|
|
|
|
Alignment);
|
|
|
|
assert(Cost >= 0 && "TTI should not produce negative costs!");
|
|
|
|
return Cost;
|
|
|
|
}
|
|
|
|
|
2015-08-06 02:08:10 +08:00
|
|
|
int TargetTransformInfo::getInterleavedMemoryOpCost(
|
[LoopVectorize] Teach Loop Vectorizer about interleaved memory accesses.
Interleaved memory accesses are grouped and vectorized into vector load/store and shufflevector.
E.g. for (i = 0; i < N; i+=2) {
a = A[i]; // load of even element
b = A[i+1]; // load of odd element
... // operations on a, b, c, d
A[i] = c; // store of even element
A[i+1] = d; // store of odd element
}
The loads of even and odd elements are identified as an interleave load group, which will be transformed into vectorized IR like:
%wide.vec = load <8 x i32>, <8 x i32>* %ptr
%vec.even = shufflevector <8 x i32> %wide.vec, <8 x i32> undef, <4 x i32> <i32 0, i32 2, i32 4, i32 6>
%vec.odd = shufflevector <8 x i32> %wide.vec, <8 x i32> undef, <4 x i32> <i32 1, i32 3, i32 5, i32 7>
The stores of even and odd elements are identified as an interleave store group, which will be transformed into vectorized IR like:
%interleaved.vec = shufflevector <4 x i32> %vec.even, %vec.odd, <8 x i32> <i32 0, i32 4, i32 1, i32 5, i32 2, i32 6, i32 3, i32 7>
store <8 x i32> %interleaved.vec, <8 x i32>* %ptr
This optimization is currently disabled by default. To try it, add '-enable-interleaved-mem-accesses=true'.
llvm-svn: 239291
2015-06-08 14:39:56 +08:00
|
|
|
unsigned Opcode, Type *VecTy, unsigned Factor, ArrayRef<unsigned> Indices,
|
2018-10-31 17:57:56 +08:00
|
|
|
unsigned Alignment, unsigned AddressSpace, bool UseMaskForCond,
|
|
|
|
bool UseMaskForGaps) const {
|
|
|
|
int Cost = TTIImpl->getInterleavedMemoryOpCost(Opcode, VecTy, Factor, Indices,
|
|
|
|
Alignment, AddressSpace,
|
|
|
|
UseMaskForCond,
|
|
|
|
UseMaskForGaps);
|
2015-08-06 02:08:10 +08:00
|
|
|
assert(Cost >= 0 && "TTI should not produce negative costs!");
|
|
|
|
return Cost;
|
2015-06-08 14:39:56 +08:00
|
|
|
}
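The interleave groups described in the commit message above reduce to two index patterns: a strided selection that splits a wide load into its members, and the inverse weave that rebuilds the wide vector before a store. The following standalone sketch models both on `std::vector<int>`. The helper names `extractMember` and `interleaveMembers` are hypothetical, and the sketch assumes `Factor >= 1` and equally sized members.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: pick every Factor-th element of Wide, starting at
// Index. With Factor = 2 and Index = 0 this matches the shuffle mask
// <0, 2, 4, 6> from the commit message. Assumes Factor >= 1.
std::vector<int> extractMember(const std::vector<int> &Wide, std::size_t Factor,
                               std::size_t Index) {
  std::vector<int> Member;
  for (std::size_t I = Index; I < Wide.size(); I += Factor)
    Member.push_back(Wide[I]);
  return Member;
}

// Hypothetical helper: weave equally sized members back into one wide vector.
// For two members of four lanes this matches the mask <0, 4, 1, 5, 2, 6, 3, 7>.
std::vector<int>
interleaveMembers(const std::vector<std::vector<int>> &Members) {
  std::vector<int> Wide;
  if (Members.empty())
    return Wide;
  for (std::size_t Lane = 0; Lane < Members.front().size(); ++Lane)
    for (const std::vector<int> &M : Members)
      Wide.push_back(M[Lane]);
  return Wide;
}
```

Splitting an eight-element vector with `Factor = 2` yields the even and odd members, and passing those two members to `interleaveMembers` reproduces the original vector. The `Factor` and `Indices` parameters of getInterleavedMemoryOpCost describe the same grouping to the cost model.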
|
|
|
|
|
2015-08-06 02:08:10 +08:00
|
|
|
int TargetTransformInfo::getIntrinsicInstrCost(Intrinsic::ID ID, Type *RetTy,
|
2017-03-14 14:35:36 +08:00
|
|
|
ArrayRef<Type *> Tys, FastMathFlags FMF,
|
|
|
|
unsigned ScalarizationCostPassed) const {
|
|
|
|
int Cost = TTIImpl->getIntrinsicInstrCost(ID, RetTy, Tys, FMF,
|
|
|
|
ScalarizationCostPassed);
|
2015-08-06 02:08:10 +08:00
|
|
|
assert(Cost >= 0 && "TTI should not produce negative costs!");
|
|
|
|
return Cost;
|
2013-01-05 19:43:11 +08:00
|
|
|
}
|
|
|
|
|
2015-12-29 04:10:59 +08:00
|
|
|
int TargetTransformInfo::getIntrinsicInstrCost(Intrinsic::ID ID, Type *RetTy,
|
2017-03-14 14:35:36 +08:00
|
|
|
ArrayRef<Value *> Args, FastMathFlags FMF, unsigned VF) const {
|
|
|
|
int Cost = TTIImpl->getIntrinsicInstrCost(ID, RetTy, Args, FMF, VF);
|
2015-12-29 04:10:59 +08:00
|
|
|
assert(Cost >= 0 && "TTI should not produce negative costs!");
|
|
|
|
return Cost;
|
|
|
|
}
|
|
|
|
|
2015-08-06 02:08:10 +08:00
|
|
|
int TargetTransformInfo::getCallInstrCost(Function *F, Type *RetTy,
|
|
|
|
ArrayRef<Type *> Tys) const {
|
|
|
|
int Cost = TTIImpl->getCallInstrCost(F, RetTy, Tys);
|
|
|
|
assert(Cost >= 0 && "TTI should not produce negative costs!");
|
|
|
|
return Cost;
|
2015-03-18 03:26:23 +08:00
|
|
|
}
|
|
|
|
|
2013-01-05 19:43:11 +08:00
|
|
|
unsigned TargetTransformInfo::getNumberOfParts(Type *Tp) const {
|
2015-01-31 11:43:40 +08:00
|
|
|
return TTIImpl->getNumberOfParts(Tp);
|
2013-01-05 19:43:11 +08:00
|
|
|
}
|
|
|
|
|
2015-08-06 02:08:10 +08:00
|
|
|
int TargetTransformInfo::getAddressComputationCost(Type *Tp,
|
2017-01-05 22:03:41 +08:00
|
|
|
ScalarEvolution *SE,
|
|
|
|
const SCEV *Ptr) const {
|
|
|
|
int Cost = TTIImpl->getAddressComputationCost(Tp, SE, Ptr);
|
2015-08-06 02:08:10 +08:00
|
|
|
assert(Cost >= 0 && "TTI should not produce negative costs!");
|
|
|
|
return Cost;
|
2013-02-08 22:50:48 +08:00
|
|
|
}
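The forwarding wrappers above all follow the same shape: the public object holds a type-erased implementation in TTIImpl, forwards the call, and sanity-checks the result. A minimal sketch of that pattern, assuming a toy interface with one cost query; ToyTTI, ToyTargetImpl, Concept and Model are hypothetical names for illustration and are not the real LLVM classes:

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical stand-in for a target-specific implementation.
struct ToyTargetImpl {
  int getAddressComputationCost(bool IsVector) const {
    return IsVector ? 10 : 0;
  }
};

// Wrapper that type-erases any implementation providing
// getAddressComputationCost(bool).
class ToyTTI {
  // Abstract interface hidden behind the wrapper.
  struct Concept {
    virtual ~Concept() = default;
    virtual int getAddressComputationCost(bool IsVector) const = 0;
  };

  // Adapter that forwards the virtual call to a concrete implementation.
  template <typename T> struct Model final : Concept {
    explicit Model(T Impl) : Impl(std::move(Impl)) {}
    int getAddressComputationCost(bool IsVector) const override {
      return Impl.getAddressComputationCost(IsVector);
    }
    T Impl;
  };

  std::unique_ptr<Concept> TTIImpl;

public:
  template <typename T>
  explicit ToyTTI(T Impl) : TTIImpl(new Model<T>(std::move(Impl))) {}

  // Same shape as the wrappers in this file: forward, check, return.
  int getAddressComputationCost(bool IsVector) const {
    int Cost = TTIImpl->getAddressComputationCost(IsVector);
    assert(Cost >= 0 && "TTI should not produce negative costs!");
    return Cost;
  }
};
```

Constructing `ToyTTI TTI{ToyTargetImpl{}}` and calling `TTI.getAddressComputationCost(true)` goes through the erased Model and comes back through the non-negative-cost check.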
|
2013-01-05 19:43:11 +08:00
|
|
|
|
2017-07-31 22:19:32 +08:00
|
|
|
int TargetTransformInfo::getArithmeticReductionCost(unsigned Opcode, Type *Ty,
|
|
|
|
bool IsPairwiseForm) const {
|
|
|
|
int Cost = TTIImpl->getArithmeticReductionCost(Opcode, Ty, IsPairwiseForm);
|
2015-08-06 02:08:10 +08:00
|
|
|
assert(Cost >= 0 && "TTI should not produce negative costs!");
|
|
|
|
return Cost;
|
2015-01-31 11:43:40 +08:00
|
|
|
}
|
|
|
|
|
2017-09-08 21:49:36 +08:00
|
|
|
int TargetTransformInfo::getMinMaxReductionCost(Type *Ty, Type *CondTy,
|
|
|
|
bool IsPairwiseForm,
|
|
|
|
bool IsUnsigned) const {
|
|
|
|
int Cost =
|
|
|
|
TTIImpl->getMinMaxReductionCost(Ty, CondTy, IsPairwiseForm, IsUnsigned);
|
|
|
|
assert(Cost >= 0 && "TTI should not produce negative costs!");
|
|
|
|
return Cost;
|
|
|
|
}
|
|
|
|
|
2015-01-31 11:43:40 +08:00
|
|
|
unsigned
|
|
|
|
TargetTransformInfo::getCostOfKeepingLiveOverCall(ArrayRef<Type *> Tys) const {
|
|
|
|
return TTIImpl->getCostOfKeepingLiveOverCall(Tys);
|
Costmodel: Add support for horizontal vector reductions
Upcoming SLP vectorization improvements will want to be able to estimate costs
of horizontal reductions. Add infrastructure to support this.
We model reductions as a series of (shufflevector,add) tuples ultimately
followed by an extractelement. For example, for an add-reduction of <4 x float>
we could generate the following sequence:
  (v0, v1, v2, v3)
    \   \  /  /
      \  \  /
        +  +
  (v0+v2, v1+v3, undef, undef)
         \      /
  ((v0+v2) + (v1+v3), undef, undef, undef)
%rdx.shuf = shufflevector <4 x float> %rdx, <4 x float> undef,
<4 x i32> <i32 2, i32 3, i32 undef, i32 undef>
%bin.rdx = fadd <4 x float> %rdx, %rdx.shuf
%rdx.shuf7 = shufflevector <4 x float> %bin.rdx, <4 x float> undef,
<4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
%bin.rdx8 = fadd <4 x float> %bin.rdx, %rdx.shuf7
%r = extractelement <4 x float> %bin.rdx8, i32 0
This commit adds a cost model interface "getReductionCost(Opcode, Ty, Pairwise)"
that will allow clients to ask for the cost of such a reduction (as backends
might generate more efficient code than the cost of the individual instructions
summed up). This interface is exercised by the CostModel analysis pass which
looks for reduction patterns like the one above - starting at extractelements -
and if it sees a matching sequence will call the cost model interface.
We will also support a second form of pairwise reduction that is well supported
on common architectures (haddps, vpadd, faddp).
  (v0, v1, v2, v3)
    \   /    \  /
  (v0+v1, v2+v3, undef, undef)
     \      /
  ((v0+v1)+(v2+v3), undef, undef, undef)
%rdx.shuf.0.0 = shufflevector <4 x float> %rdx, <4 x float> undef,
<4 x i32> <i32 0, i32 2 , i32 undef, i32 undef>
%rdx.shuf.0.1 = shufflevector <4 x float> %rdx, <4 x float> undef,
<4 x i32> <i32 1, i32 3, i32 undef, i32 undef>
%bin.rdx.0 = fadd <4 x float> %rdx.shuf.0.0, %rdx.shuf.0.1
%rdx.shuf.1.0 = shufflevector <4 x float> %bin.rdx.0, <4 x float> undef,
<4 x i32> <i32 0, i32 undef, i32 undef, i32 undef>
%rdx.shuf.1.1 = shufflevector <4 x float> %bin.rdx.0, <4 x float> undef,
<4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
%bin.rdx.1 = fadd <4 x float> %rdx.shuf.1.0, %rdx.shuf.1.1
%r = extractelement <4 x float> %bin.rdx.1, i32 0
llvm-svn: 190876
2013-09-18 02:06:50 +08:00
|
|
|
}
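The commit message above describes two shapes of horizontal reduction: one that repeatedly folds the upper half of the vector onto the lower half, and a pairwise one that adds adjacent lanes. A small scalar sketch of both orders, assuming a power-of-two width; reduceSplitting and reducePairwise are hypothetical helper names, and the code only mirrors the order of the additions, not any LLVM API:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Splitting form: add the upper half onto the lower half, then repeat
// on the lower half until one lane is left.
float reduceSplitting(std::vector<float> V) {
  assert(!V.empty() && (V.size() & (V.size() - 1)) == 0 &&
         "expected a power-of-two width");
  for (std::size_t Half = V.size() / 2; Half >= 1; Half /= 2)
    for (std::size_t I = 0; I != Half; ++I)
      V[I] += V[I + Half]; // models shufflevector + add
  return V[0];             // models extractelement ..., i32 0
}

// Pairwise form: add lanes (0,1), (2,3), ... and pack the sums into the
// low lanes, then repeat on the packed sums.
float reducePairwise(std::vector<float> V) {
  assert(!V.empty() && (V.size() & (V.size() - 1)) == 0 &&
         "expected a power-of-two width");
  for (std::size_t Width = V.size(); Width > 1; Width /= 2)
    for (std::size_t I = 0; I != Width / 2; ++I)
      V[I] = V[2 * I] + V[2 * I + 1]; // even-lane shuffle + odd-lane shuffle
  return V[0];
}
```

Both functions compute the same sum for exactly representable inputs; they differ only in which lanes meet at each step, which is what lets a backend price the two forms differently.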
|
|
|
|
|
2015-01-31 11:43:40 +08:00
|
|
|
bool TargetTransformInfo::getTgtMemIntrinsic(IntrinsicInst *Inst,
|
|
|
|
MemIntrinsicInfo &Info) const {
|
|
|
|
return TTIImpl->getTgtMemIntrinsic(Inst, Info);
|
2014-08-05 20:30:34 +08:00
|
|
|
}
|
|
|
|
|
2017-06-07 00:45:25 +08:00
|
|
|
unsigned TargetTransformInfo::getAtomicMemIntrinsicMaxElementSize() const {
|
|
|
|
return TTIImpl->getAtomicMemIntrinsicMaxElementSize();
|
|
|
|
}
|
|
|
|
|
2015-01-27 06:51:15 +08:00
|
|
|
Value *TargetTransformInfo::getOrCreateResultFromMemIntrinsic(
|
|
|
|
IntrinsicInst *Inst, Type *ExpectedType) const {
|
2015-01-31 11:43:40 +08:00
|
|
|
return TTIImpl->getOrCreateResultFromMemIntrinsic(Inst, ExpectedType);
|
2015-01-27 06:51:15 +08:00
|
|
|
}
|
|
|
|
|
2017-07-07 10:00:06 +08:00
|
|
|
Type *TargetTransformInfo::getMemcpyLoopLoweringType(LLVMContext &Context,
|
|
|
|
Value *Length,
|
|
|
|
unsigned SrcAlign,
|
|
|
|
unsigned DestAlign) const {
|
|
|
|
return TTIImpl->getMemcpyLoopLoweringType(Context, Length, SrcAlign,
|
|
|
|
DestAlign);
|
|
|
|
}
|
|
|
|
|
|
|
|
void TargetTransformInfo::getMemcpyLoopResidualLoweringType(
|
|
|
|
SmallVectorImpl<Type *> &OpsOut, LLVMContext &Context,
|
|
|
|
unsigned RemainingBytes, unsigned SrcAlign, unsigned DestAlign) const {
|
|
|
|
TTIImpl->getMemcpyLoopResidualLoweringType(OpsOut, Context, RemainingBytes,
|
|
|
|
SrcAlign, DestAlign);
|
|
|
|
}
|
|
|
|
|
2015-07-30 06:09:48 +08:00
|
|
|
bool TargetTransformInfo::areInlineCompatible(const Function *Caller,
|
|
|
|
const Function *Callee) const {
|
|
|
|
return TTIImpl->areInlineCompatible(Caller, Callee);
|
2018-03-26 21:10:09 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
bool TargetTransformInfo::isIndexedLoadLegal(MemIndexedMode Mode,
|
|
|
|
Type *Ty) const {
|
|
|
|
return TTIImpl->isIndexedLoadLegal(Mode, Ty);
|
|
|
|
}
|
|
|
|
|
|
|
|
bool TargetTransformInfo::isIndexedStoreLegal(MemIndexedMode Mode,
|
|
|
|
Type *Ty) const {
|
|
|
|
return TTIImpl->isIndexedStoreLegal(Mode, Ty);
|
2015-07-02 09:11:47 +08:00
|
|
|
}
|
|
|
|
|
2016-10-03 18:31:34 +08:00
|
|
|
unsigned TargetTransformInfo::getLoadStoreVecRegBitWidth(unsigned AS) const {
|
|
|
|
return TTIImpl->getLoadStoreVecRegBitWidth(AS);
|
|
|
|
}
|
|
|
|
|
|
|
|
bool TargetTransformInfo::isLegalToVectorizeLoad(LoadInst *LI) const {
|
|
|
|
return TTIImpl->isLegalToVectorizeLoad(LI);
|
|
|
|
}
|
|
|
|
|
|
|
|
bool TargetTransformInfo::isLegalToVectorizeStore(StoreInst *SI) const {
|
|
|
|
return TTIImpl->isLegalToVectorizeStore(SI);
|
|
|
|
}
|
|
|
|
|
|
|
|
bool TargetTransformInfo::isLegalToVectorizeLoadChain(
|
|
|
|
unsigned ChainSizeInBytes, unsigned Alignment, unsigned AddrSpace) const {
|
|
|
|
return TTIImpl->isLegalToVectorizeLoadChain(ChainSizeInBytes, Alignment,
|
|
|
|
AddrSpace);
|
|
|
|
}
|
|
|
|
|
|
|
|
bool TargetTransformInfo::isLegalToVectorizeStoreChain(
|
|
|
|
unsigned ChainSizeInBytes, unsigned Alignment, unsigned AddrSpace) const {
|
|
|
|
return TTIImpl->isLegalToVectorizeStoreChain(ChainSizeInBytes, Alignment,
|
|
|
|
AddrSpace);
|
|
|
|
}
|
|
|
|
|
|
|
|
unsigned TargetTransformInfo::getLoadVectorFactor(unsigned VF,
|
|
|
|
unsigned LoadSize,
|
|
|
|
unsigned ChainSizeInBytes,
|
|
|
|
VectorType *VecTy) const {
|
|
|
|
return TTIImpl->getLoadVectorFactor(VF, LoadSize, ChainSizeInBytes, VecTy);
|
|
|
|
}
|
|
|
|
|
|
|
|
unsigned TargetTransformInfo::getStoreVectorFactor(unsigned VF,
|
|
|
|
unsigned StoreSize,
|
|
|
|
unsigned ChainSizeInBytes,
|
|
|
|
VectorType *VecTy) const {
|
|
|
|
return TTIImpl->getStoreVectorFactor(VF, StoreSize, ChainSizeInBytes, VecTy);
|
|
|
|
}
|
|
|
|
|
2017-05-09 18:43:25 +08:00
|
|
|
bool TargetTransformInfo::useReductionIntrinsic(unsigned Opcode,
|
|
|
|
Type *Ty, ReductionFlags Flags) const {
|
|
|
|
return TTIImpl->useReductionIntrinsic(Opcode, Ty, Flags);
|
|
|
|
}
|
|
|
|
|
2017-05-10 17:42:49 +08:00
|
|
|
bool TargetTransformInfo::shouldExpandReduction(const IntrinsicInst *II) const {
|
|
|
|
return TTIImpl->shouldExpandReduction(II);
|
|
|
|
}
|
2017-05-09 18:43:25 +08:00
|
|
|
|
2017-09-09 06:29:17 +08:00
|
|
|
int TargetTransformInfo::getInstructionLatency(const Instruction *I) const {
|
|
|
|
return TTIImpl->getInstructionLatency(I);
|
|
|
|
}
|
|
|
|
|
|
|
|
static bool matchPairwiseShuffleMask(ShuffleVectorInst *SI, bool IsLeft,
|
|
|
|
unsigned Level) {
|
|
|
|
// We don't need a shuffle if we just want to have element 0 in position 0 of
|
|
|
|
// the vector.
|
|
|
|
if (!SI && Level == 0 && IsLeft)
|
|
|
|
return true;
|
|
|
|
else if (!SI)
|
|
|
|
return false;
|
|
|
|
|
|
|
|
SmallVector<int, 32> Mask(SI->getType()->getVectorNumElements(), -1);
|
|
|
|
|
|
|
|
// Build a mask of 0, 2, ... (left) or 1, 3, ... (right) depending on whether
|
|
|
|
// we look at the left or right side.
|
|
|
|
for (unsigned i = 0, e = (1 << Level), val = !IsLeft; i != e; ++i, val += 2)
|
|
|
|
Mask[i] = val;
|
|
|
|
|
|
|
|
SmallVector<int, 16> ActualMask = SI->getShuffleMask();
|
|
|
|
return Mask == ActualMask;
|
|
|
|
}
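The mask construction in matchPairwiseShuffleMask can be tried in isolation. A minimal sketch, assuming plain std::vector in place of SmallVector and -1 for undef lanes as in the function above; buildPairwiseMask is a hypothetical helper name:

```cpp
#include <cassert>
#include <vector>

// Builds the mask the matcher compares against: lanes 0, 2, 4, ... for the
// left shuffle or 1, 3, 5, ... for the right one. Only the first
// (1 << Level) lanes are defined; the rest stay at -1 (undef).
std::vector<int> buildPairwiseMask(unsigned NumElems, bool IsLeft,
                                   unsigned Level) {
  assert((1u << Level) <= NumElems && "level too deep for this vector width");
  std::vector<int> Mask(NumElems, -1);
  int Val = IsLeft ? 0 : 1;
  for (unsigned I = 0, E = 1u << Level; I != E; ++I, Val += 2)
    Mask[I] = Val;
  return Mask;
}
```

For a 4-element vector, level 1 yields <0, 2, undef, undef> on the left and <1, 3, undef, undef> on the right, while level 0 yields <0, undef, undef, undef> and <1, undef, undef, undef>.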
|
|
|
|
|
|
|
|
namespace {
|
|
|
|
/// Kind of the reduction data.
|
|
|
|
enum ReductionKind {
|
|
|
|
RK_None, ///< Not a reduction.
|
|
|
|
RK_Arithmetic, ///< Binary reduction data.
|
|
|
|
RK_MinMax, ///< Min/max reduction data.
|
|
|
|
RK_UnsignedMinMax, ///< Unsigned min/max reduction data.
|
|
|
|
};
|
|
|
|
/// Contains opcode + LHS/RHS parts of the reduction operations.
|
|
|
|
struct ReductionData {
|
|
|
|
ReductionData() = delete;
|
|
|
|
ReductionData(ReductionKind Kind, unsigned Opcode, Value *LHS, Value *RHS)
|
|
|
|
: Opcode(Opcode), LHS(LHS), RHS(RHS), Kind(Kind) {
|
|
|
|
assert(Kind != RK_None && "expected binary or min/max reduction only.");
|
|
|
|
}
|
|
|
|
unsigned Opcode = 0;
|
|
|
|
Value *LHS = nullptr;
|
|
|
|
Value *RHS = nullptr;
|
|
|
|
ReductionKind Kind = RK_None;
|
|
|
|
bool hasSameData(const ReductionData &RD) const {
|
|
|
|
return Kind == RD.Kind && Opcode == RD.Opcode;
|
|
|
|
}
|
|
|
|
};
|
|
|
|
} // namespace
|
|
|
|
|
|
|
|
static Optional<ReductionData> getReductionData(Instruction *I) {
|
|
|
|
Value *L, *R;
|
|
|
|
if (m_BinOp(m_Value(L), m_Value(R)).match(I))
|
2018-07-31 03:41:25 +08:00
|
|
|
return ReductionData(RK_Arithmetic, I->getOpcode(), L, R);
|
2017-09-09 06:29:17 +08:00
|
|
|
if (auto *SI = dyn_cast<SelectInst>(I)) {
|
|
|
|
if (m_SMin(m_Value(L), m_Value(R)).match(SI) ||
|
|
|
|
m_SMax(m_Value(L), m_Value(R)).match(SI) ||
|
|
|
|
m_OrdFMin(m_Value(L), m_Value(R)).match(SI) ||
|
|
|
|
m_OrdFMax(m_Value(L), m_Value(R)).match(SI) ||
|
|
|
|
m_UnordFMin(m_Value(L), m_Value(R)).match(SI) ||
|
|
|
|
m_UnordFMax(m_Value(L), m_Value(R)).match(SI)) {
|
|
|
|
auto *CI = cast<CmpInst>(SI->getCondition());
|
2018-07-31 03:41:25 +08:00
|
|
|
return ReductionData(RK_MinMax, CI->getOpcode(), L, R);
|
|
|
|
}
|
2017-09-09 06:29:17 +08:00
|
|
|
if (m_UMin(m_Value(L), m_Value(R)).match(SI) ||
|
|
|
|
m_UMax(m_Value(L), m_Value(R)).match(SI)) {
|
|
|
|
auto *CI = cast<CmpInst>(SI->getCondition());
|
|
|
|
return ReductionData(RK_UnsignedMinMax, CI->getOpcode(), L, R);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
return llvm::None;
|
|
|
|
}
|
|
|
|
|
|
|
|
static ReductionKind matchPairwiseReductionAtLevel(Instruction *I,
|
|
|
|
unsigned Level,
|
|
|
|
unsigned NumLevels) {
|
|
|
|
// Match one level of pairwise operations.
|
|
|
|
// %rdx.shuf.0.0 = shufflevector <4 x float> %rdx, <4 x float> undef,
|
|
|
|
// <4 x i32> <i32 0, i32 2 , i32 undef, i32 undef>
|
|
|
|
// %rdx.shuf.0.1 = shufflevector <4 x float> %rdx, <4 x float> undef,
|
|
|
|
// <4 x i32> <i32 1, i32 3, i32 undef, i32 undef>
|
|
|
|
// %bin.rdx.0 = fadd <4 x float> %rdx.shuf.0.0, %rdx.shuf.0.1
|
|
|
|
if (!I)
|
|
|
|
return RK_None;
|
|
|
|
|
|
|
|
assert(I->getType()->isVectorTy() && "Expecting a vector type");
|
|
|
|
|
|
|
|
Optional<ReductionData> RD = getReductionData(I);
|
|
|
|
if (!RD)
|
|
|
|
return RK_None;
|
|
|
|
|
|
|
|
ShuffleVectorInst *LS = dyn_cast<ShuffleVectorInst>(RD->LHS);
|
|
|
|
if (!LS && Level)
|
|
|
|
return RK_None;
|
|
|
|
ShuffleVectorInst *RS = dyn_cast<ShuffleVectorInst>(RD->RHS);
|
|
|
|
if (!RS && Level)
|
|
|
|
return RK_None;
|
|
|
|
|
|
|
|
// On level 0 we can omit one shufflevector instruction.
|
|
|
|
if (!Level && !RS && !LS)
|
|
|
|
return RK_None;
|
|
|
|
|
|
|
|
// Shuffle inputs must match.
|
|
|
|
Value *NextLevelOpL = LS ? LS->getOperand(0) : nullptr;
|
|
|
|
Value *NextLevelOpR = RS ? RS->getOperand(0) : nullptr;
|
|
|
|
Value *NextLevelOp = nullptr;
|
|
|
|
if (NextLevelOpR && NextLevelOpL) {
|
|
|
|
// If we have two shuffles their operands must match.
|
|
|
|
if (NextLevelOpL != NextLevelOpR)
|
|
|
|
return RK_None;
|
|
|
|
|
|
|
|
NextLevelOp = NextLevelOpL;
|
|
|
|
} else if (Level == 0 && (NextLevelOpR || NextLevelOpL)) {
|
|
|
|
// On the first level we can omit the shufflevector <0, undef,...>. So the
|
|
|
|
// input to the other shufflevector <1, undef> must match with one of the
|
|
|
|
// inputs to the current binary operation.
|
|
|
|
// Example:
|
|
|
|
// %NextLevelOpL = shufflevector %R, <1, undef ...>
|
|
|
|
// %BinOp = fadd %NextLevelOpL, %R
|
|
|
|
if (NextLevelOpL && NextLevelOpL != RD->RHS)
|
|
|
|
return RK_None;
|
|
|
|
else if (NextLevelOpR && NextLevelOpR != RD->LHS)
|
|
|
|
return RK_None;
|
|
|
|
|
|
|
|
NextLevelOp = NextLevelOpL ? RD->RHS : RD->LHS;
|
|
|
|
} else
|
|
|
|
return RK_None;
|
|
|
|
|
|
|
|
// Check that the next level's binary operation exists and matches with the
|
|
|
|
// current one.
|
|
|
|
if (Level + 1 != NumLevels) {
|
|
|
|
Optional<ReductionData> NextLevelRD =
|
|
|
|
getReductionData(cast<Instruction>(NextLevelOp));
|
|
|
|
if (!NextLevelRD || !RD->hasSameData(*NextLevelRD))
|
|
|
|
return RK_None;
|
|
|
|
}
|
|
|
|
|
|
|
|
// Shuffle mask for pairwise operation must match.
|
|
|
|
if (matchPairwiseShuffleMask(LS, /*IsLeft=*/true, Level)) {
|
|
|
|
if (!matchPairwiseShuffleMask(RS, /*IsLeft=*/false, Level))
|
|
|
|
return RK_None;
|
|
|
|
} else if (matchPairwiseShuffleMask(RS, /*IsLeft=*/true, Level)) {
|
|
|
|
if (!matchPairwiseShuffleMask(LS, /*IsLeft=*/false, Level))
|
|
|
|
return RK_None;
|
|
|
|
} else {
|
|
|
|
return RK_None;
|
|
|
|
}
|
|
|
|
|
|
|
|
if (++Level == NumLevels)
|
|
|
|
return RD->Kind;
|
|
|
|
|
|
|
|
// Match next level.
|
|
|
|
return matchPairwiseReductionAtLevel(cast<Instruction>(NextLevelOp), Level,
|
|
|
|
NumLevels);
|
|
|
|
}

static ReductionKind matchPairwiseReduction(const ExtractElementInst *ReduxRoot,
                                            unsigned &Opcode, Type *&Ty) {
  if (!EnableReduxCost)
    return RK_None;

  // Need to extract the first element.
  ConstantInt *CI = dyn_cast<ConstantInt>(ReduxRoot->getOperand(1));
  unsigned Idx = ~0u;
  if (CI)
    Idx = CI->getZExtValue();
  if (Idx != 0)
    return RK_None;

  auto *RdxStart = dyn_cast<Instruction>(ReduxRoot->getOperand(0));
  if (!RdxStart)
    return RK_None;
  Optional<ReductionData> RD = getReductionData(RdxStart);
  if (!RD)
    return RK_None;

  Type *VecTy = RdxStart->getType();
  unsigned NumVecElems = VecTy->getVectorNumElements();
  if (!isPowerOf2_32(NumVecElems))
    return RK_None;

  // We look for a sequence of shuffle,shuffle,add triples like the following
  // that builds a pairwise reduction tree.
  //
  //  (X0, X1, X2, X3)
  //   (X0 + X1, X2 + X3, undef, undef)
  //    ((X0 + X1) + (X2 + X3), undef, undef, undef)
  //
  // %rdx.shuf.0.0 = shufflevector <4 x float> %rdx, <4 x float> undef,
  //       <4 x i32> <i32 0, i32 2, i32 undef, i32 undef>
  // %rdx.shuf.0.1 = shufflevector <4 x float> %rdx, <4 x float> undef,
  //       <4 x i32> <i32 1, i32 3, i32 undef, i32 undef>
  // %bin.rdx.0 = fadd <4 x float> %rdx.shuf.0.0, %rdx.shuf.0.1
  // %rdx.shuf.1.0 = shufflevector <4 x float> %bin.rdx.0, <4 x float> undef,
  //       <4 x i32> <i32 0, i32 undef, i32 undef, i32 undef>
  // %rdx.shuf.1.1 = shufflevector <4 x float> %bin.rdx.0, <4 x float> undef,
  //       <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
  // %bin.rdx8 = fadd <4 x float> %rdx.shuf.1.0, %rdx.shuf.1.1
  // %r = extractelement <4 x float> %bin.rdx8, i32 0
  if (matchPairwiseReductionAtLevel(RdxStart, 0, Log2_32(NumVecElems)) ==
      RK_None)
    return RK_None;

  Opcode = RD->Opcode;
  Ty = VecTy;

  return RD->Kind;
}
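
The reduction tree described in the comment of `matchPairwiseReduction` can be modelled outside of LLVM. The following is a minimal sketch, not LLVM code: `pairwiseStep` and `pairwiseReduce` are hypothetical helpers that operate on a plain `std::vector<float>`, where one step stands in for the two shufflevectors (even lanes, odd lanes) plus the vector add.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of LLVM): one pairwise step. It models the two
// shuffles with masks <0, 2, ...> and <1, 3, ...> followed by the add, and
// keeps only the defined lanes (the undef tail is dropped).
std::vector<float> pairwiseStep(const std::vector<float> &V) {
  std::vector<float> Out;
  for (std::size_t I = 0; I + 1 < V.size(); I += 2)
    Out.push_back(V[I] + V[I + 1]); // even lane + odd lane
  return Out;
}

// Hypothetical helper: apply pairwise steps until one lane remains. The result
// corresponds to the final "extractelement ..., i32 0". The input size is
// assumed to be a power of two, as the matcher above requires.
float pairwiseReduce(std::vector<float> V) {
  while (V.size() > 1)
    V = pairwiseStep(V);
  return V.empty() ? 0.0f : V[0];
}
```

For a four-element input this takes two steps, which is the `Log2_32(NumVecElems)` level count passed to `matchPairwiseReductionAtLevel`.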

static std::pair<Value *, ShuffleVectorInst *>
getShuffleAndOtherOprd(Value *L, Value *R) {
  ShuffleVectorInst *S = nullptr;

  if ((S = dyn_cast<ShuffleVectorInst>(L)))
    return std::make_pair(R, S);

  S = dyn_cast<ShuffleVectorInst>(R);
  return std::make_pair(L, S);
}

static ReductionKind
matchVectorSplittingReduction(const ExtractElementInst *ReduxRoot,
                              unsigned &Opcode, Type *&Ty) {
  if (!EnableReduxCost)
    return RK_None;

  // Need to extract the first element.
  ConstantInt *CI = dyn_cast<ConstantInt>(ReduxRoot->getOperand(1));
  unsigned Idx = ~0u;
  if (CI)
    Idx = CI->getZExtValue();
  if (Idx != 0)
    return RK_None;

  auto *RdxStart = dyn_cast<Instruction>(ReduxRoot->getOperand(0));
  if (!RdxStart)
    return RK_None;
  Optional<ReductionData> RD = getReductionData(RdxStart);
  if (!RD)
    return RK_None;

  Type *VecTy = ReduxRoot->getOperand(0)->getType();
  unsigned NumVecElems = VecTy->getVectorNumElements();
  if (!isPowerOf2_32(NumVecElems))
    return RK_None;

  // We look for a sequence of shuffles and adds like the following, matching
  // one (fadd, shufflevector) pair at a time.
  //
  // %rdx.shuf = shufflevector <4 x float> %rdx, <4 x float> undef,
  //                           <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>
  // %bin.rdx = fadd <4 x float> %rdx, %rdx.shuf
  // %rdx.shuf7 = shufflevector <4 x float> %bin.rdx, <4 x float> undef,
  //                          <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
  // %bin.rdx8 = fadd <4 x float> %bin.rdx, %rdx.shuf7
  // %r = extractelement <4 x float> %bin.rdx8, i32 0

  unsigned MaskStart = 1;
  Instruction *RdxOp = RdxStart;
  SmallVector<int, 32> ShuffleMask(NumVecElems, 0);
  unsigned NumVecElemsRemain = NumVecElems;
  while (NumVecElemsRemain - 1) {
    // Check for the right reduction operation.
    if (!RdxOp)
      return RK_None;
    Optional<ReductionData> RDLevel = getReductionData(RdxOp);
    if (!RDLevel || !RDLevel->hasSameData(*RD))
      return RK_None;

    Value *NextRdxOp;
    ShuffleVectorInst *Shuffle;
    std::tie(NextRdxOp, Shuffle) =
        getShuffleAndOtherOprd(RDLevel->LHS, RDLevel->RHS);

    // Check that the current reduction operation and the shuffle use the same
    // value.
    if (Shuffle == nullptr)
      return RK_None;
    if (Shuffle->getOperand(0) != NextRdxOp)
      return RK_None;

    // Check that the shuffle mask matches.
    for (unsigned j = 0; j != MaskStart; ++j)
      ShuffleMask[j] = MaskStart + j;
    // Fill the rest of the mask with -1 for undef.
    std::fill(&ShuffleMask[MaskStart], ShuffleMask.end(), -1);

    SmallVector<int, 16> Mask = Shuffle->getShuffleMask();
    if (ShuffleMask != Mask)
      return RK_None;

    RdxOp = dyn_cast<Instruction>(NextRdxOp);
    NumVecElemsRemain /= 2;
    MaskStart *= 2;
  }

  Opcode = RD->Opcode;
  Ty = VecTy;
  return RD->Kind;
}
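
The mask bookkeeping in the loop of `matchVectorSplittingReduction` is easier to follow in isolation. The sketch below reproduces only that part with standard containers; `expectedSplitMasks` is a hypothetical helper, not an LLVM function, and `-1` stands for an undef lane as in the loop above.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper (not part of LLVM): list the shuffle masks the matcher
// expects, in the order it visits them. It starts at the operand of the
// extractelement and walks towards the input vector, so the narrowest mask
// comes first.
std::vector<std::vector<int>> expectedSplitMasks(unsigned NumVecElems) {
  std::vector<std::vector<int>> Masks;
  if (NumVecElems == 0)
    return Masks;
  std::vector<int> ShuffleMask(NumVecElems, 0);
  unsigned MaskStart = 1;
  for (unsigned Remain = NumVecElems; Remain > 1; Remain /= 2) {
    // The first MaskStart lanes select elements MaskStart .. 2*MaskStart-1.
    for (unsigned J = 0; J != MaskStart; ++J)
      ShuffleMask[J] = static_cast<int>(MaskStart + J);
    // Every remaining lane is undef.
    std::fill(ShuffleMask.begin() + MaskStart, ShuffleMask.end(), -1);
    Masks.push_back(ShuffleMask);
    MaskStart *= 2;
  }
  return Masks;
}
```

For four elements this yields `<1, -1, -1, -1>` and then `<2, 3, -1, -1>`, which are the masks of `%rdx.shuf7` and `%rdx.shuf` in the IR comment, read from the bottom up.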

int TargetTransformInfo::getInstructionThroughput(const Instruction *I) const {
  switch (I->getOpcode()) {
  case Instruction::GetElementPtr:
    return getUserCost(I);

  case Instruction::Ret:
  case Instruction::PHI:
  case Instruction::Br: {
    return getCFInstrCost(I->getOpcode());
  }
  case Instruction::Add:
  case Instruction::FAdd:
  case Instruction::Sub:
  case Instruction::FSub:
  case Instruction::Mul:
  case Instruction::FMul:
  case Instruction::UDiv:
  case Instruction::SDiv:
  case Instruction::FDiv:
  case Instruction::URem:
  case Instruction::SRem:
  case Instruction::FRem:
  case Instruction::Shl:
  case Instruction::LShr:
  case Instruction::AShr:
  case Instruction::And:
  case Instruction::Or:
  case Instruction::Xor: {
    TargetTransformInfo::OperandValueKind Op1VK, Op2VK;
    TargetTransformInfo::OperandValueProperties Op1VP, Op2VP;
    Op1VK = getOperandInfo(I->getOperand(0), Op1VP);
    Op2VK = getOperandInfo(I->getOperand(1), Op2VP);
    SmallVector<const Value *, 2> Operands(I->operand_values());
    return getArithmeticInstrCost(I->getOpcode(), I->getType(), Op1VK, Op2VK,
                                  Op1VP, Op2VP, Operands);
  }
  case Instruction::Select: {
    const SelectInst *SI = cast<SelectInst>(I);
    Type *CondTy = SI->getCondition()->getType();
    return getCmpSelInstrCost(I->getOpcode(), I->getType(), CondTy, I);
  }
  case Instruction::ICmp:
  case Instruction::FCmp: {
    Type *ValTy = I->getOperand(0)->getType();
    return getCmpSelInstrCost(I->getOpcode(), ValTy, I->getType(), I);
  }
  case Instruction::Store: {
    const StoreInst *SI = cast<StoreInst>(I);
    Type *ValTy = SI->getValueOperand()->getType();
    return getMemoryOpCost(I->getOpcode(), ValTy,
                           SI->getAlignment(),
                           SI->getPointerAddressSpace(), I);
  }
  case Instruction::Load: {
    const LoadInst *LI = cast<LoadInst>(I);
    return getMemoryOpCost(I->getOpcode(), I->getType(),
                           LI->getAlignment(),
                           LI->getPointerAddressSpace(), I);
  }
  case Instruction::ZExt:
  case Instruction::SExt:
  case Instruction::FPToUI:
  case Instruction::FPToSI:
  case Instruction::FPExt:
  case Instruction::PtrToInt:
  case Instruction::IntToPtr:
  case Instruction::SIToFP:
  case Instruction::UIToFP:
  case Instruction::Trunc:
  case Instruction::FPTrunc:
  case Instruction::BitCast:
  case Instruction::AddrSpaceCast: {
    Type *SrcTy = I->getOperand(0)->getType();
    return getCastInstrCost(I->getOpcode(), I->getType(), SrcTy, I);
  }
  case Instruction::ExtractElement: {
    const ExtractElementInst *EEI = cast<ExtractElementInst>(I);
    ConstantInt *CI = dyn_cast<ConstantInt>(I->getOperand(1));
    unsigned Idx = -1;
    if (CI)
      Idx = CI->getZExtValue();

    // Try to match a reduction sequence (series of shufflevector and vector
    // adds followed by an extractelement).
    unsigned ReduxOpCode;
    Type *ReduxType;

    switch (matchVectorSplittingReduction(EEI, ReduxOpCode, ReduxType)) {
    case RK_Arithmetic:
      return getArithmeticReductionCost(ReduxOpCode, ReduxType,
                                        /*IsPairwiseForm=*/false);
    case RK_MinMax:
      return getMinMaxReductionCost(
          ReduxType, CmpInst::makeCmpResultType(ReduxType),
          /*IsPairwiseForm=*/false, /*IsUnsigned=*/false);
    case RK_UnsignedMinMax:
      return getMinMaxReductionCost(
          ReduxType, CmpInst::makeCmpResultType(ReduxType),
          /*IsPairwiseForm=*/false, /*IsUnsigned=*/true);
    case RK_None:
      break;
    }

    switch (matchPairwiseReduction(EEI, ReduxOpCode, ReduxType)) {
    case RK_Arithmetic:
      return getArithmeticReductionCost(ReduxOpCode, ReduxType,
                                        /*IsPairwiseForm=*/true);
    case RK_MinMax:
      return getMinMaxReductionCost(
          ReduxType, CmpInst::makeCmpResultType(ReduxType),
          /*IsPairwiseForm=*/true, /*IsUnsigned=*/false);
    case RK_UnsignedMinMax:
      return getMinMaxReductionCost(
          ReduxType, CmpInst::makeCmpResultType(ReduxType),
          /*IsPairwiseForm=*/true, /*IsUnsigned=*/true);
    case RK_None:
      break;
    }

    return getVectorInstrCost(I->getOpcode(),
                              EEI->getOperand(0)->getType(), Idx);
  }
  case Instruction::InsertElement: {
    const InsertElementInst *IE = cast<InsertElementInst>(I);
    ConstantInt *CI = dyn_cast<ConstantInt>(IE->getOperand(2));
    unsigned Idx = -1;
    if (CI)
      Idx = CI->getZExtValue();
    return getVectorInstrCost(I->getOpcode(),
                              IE->getType(), Idx);
  }
  case Instruction::ShuffleVector: {
    const ShuffleVectorInst *Shuffle = cast<ShuffleVectorInst>(I);
    Type *Ty = Shuffle->getType();
    Type *SrcTy = Shuffle->getOperand(0)->getType();

    // TODO: Identify and add costs for insert subvector, etc.
    int SubIndex;
    if (Shuffle->isExtractSubvectorMask(SubIndex))
      return TTIImpl->getShuffleCost(SK_ExtractSubvector, SrcTy, SubIndex, Ty);

    if (Shuffle->changesLength())
      return -1;

    if (Shuffle->isIdentity())
      return 0;

    if (Shuffle->isReverse())
      return TTIImpl->getShuffleCost(SK_Reverse, Ty, 0, nullptr);

    if (Shuffle->isSelect())
      return TTIImpl->getShuffleCost(SK_Select, Ty, 0, nullptr);

    if (Shuffle->isTranspose())
      return TTIImpl->getShuffleCost(SK_Transpose, Ty, 0, nullptr);

    if (Shuffle->isZeroEltSplat())
      return TTIImpl->getShuffleCost(SK_Broadcast, Ty, 0, nullptr);

    if (Shuffle->isSingleSource())
      return TTIImpl->getShuffleCost(SK_PermuteSingleSrc, Ty, 0, nullptr);

    return TTIImpl->getShuffleCost(SK_PermuteTwoSrc, Ty, 0, nullptr);
  }
  case Instruction::Call:
    if (const IntrinsicInst *II = dyn_cast<IntrinsicInst>(I)) {
      SmallVector<Value *, 4> Args(II->arg_operands());

      FastMathFlags FMF;
      if (auto *FPMO = dyn_cast<FPMathOperator>(II))
        FMF = FPMO->getFastMathFlags();

      return getIntrinsicInstrCost(II->getIntrinsicID(), II->getType(),
                                   Args, FMF);
    }
    return -1;
  default:
    // We don't have any information on this instruction.
    return -1;
  }
}
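
The element-index handling in the ExtractElement and InsertElement cases (`unsigned Idx = -1;`) and in the reduction matchers (`unsigned Idx = ~0u;`) relies on the same sentinel value for "index is not a compile-time constant". A small sketch of that idiom follows; `constantIndexOrSentinel` is a hypothetical helper written for illustration, and the boolean parameter stands in for the `dyn_cast<ConstantInt>` check.

```cpp
#include <climits>

// Hypothetical helper (not part of LLVM): return the constant lane index, or
// the all-ones sentinel when the index is not a known constant.
unsigned constantIndexOrSentinel(bool HasConstantIndex,
                                 unsigned long long Value) {
  unsigned Idx = -1; // converts to UINT_MAX, the same value as ~0u
  if (HasConstantIndex)
    Idx = static_cast<unsigned>(Value);
  return Idx;
}
```

Because both spellings produce `UINT_MAX`, the `Idx != 0` test in the matchers rejects a non-constant index just as it rejects any constant index other than zero.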
TargetTransformInfo::Concept::~Concept() {}
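
The out-of-line `Concept` destructor belongs to the type-erased interface that the commit message for this line describes. As a rough illustration of that general pattern, and not of the actual TTI classes, here is a miniature version. All names in it (`CostInfo`, `Model`, `DefaultImpl`, `FancyTargetImpl`, `getCost`) are hypothetical.

```cpp
#include <memory>
#include <utility>

// Hypothetical miniature of a type-erased cost interface.
class CostInfo {
  // Abstract interface that hides the concrete implementation type.
  struct Concept {
    virtual ~Concept(); // defined out of line below
    virtual int getCost(int Opcode) const = 0;
  };

  // Adapter that forwards the virtual call to any type with getCost(int).
  template <typename T> struct Model final : Concept {
    explicit Model(T Target) : Target(std::move(Target)) {}
    int getCost(int Opcode) const override { return Target.getCost(Opcode); }
    T Target;
  };

  std::unique_ptr<Concept> Impl;

public:
  template <typename T>
  explicit CostInfo(T Target) : Impl(new Model<T>(std::move(Target))) {}

  int getCost(int Opcode) const { return Impl->getCost(Opcode); }
};

// Out-of-line destructor, mirroring the definition above.
CostInfo::Concept::~Concept() {}

// Two unrelated implementations; neither derives from a common base class.
struct DefaultImpl {
  int getCost(int) const { return 1; }
};
struct FancyTargetImpl {
  int getCost(int Opcode) const { return Opcode == 0 ? 0 : 4; }
};
```

Clients only see `CostInfo`, so `CostInfo(DefaultImpl{})` and `CostInfo(FancyTargetImpl{})` can be used interchangeably even though the two implementation types share no base class.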

TargetIRAnalysis::TargetIRAnalysis() : TTICallback(&getDefaultTTI) {}

TargetIRAnalysis::TargetIRAnalysis(
    std::function<Result(const Function &)> TTICallback)
    : TTICallback(std::move(TTICallback)) {}

TargetIRAnalysis::Result TargetIRAnalysis::run(const Function &F,
                                               FunctionAnalysisManager &) {
  return TTICallback(F);
}

AnalysisKey TargetIRAnalysis::Key;

TargetIRAnalysis::Result TargetIRAnalysis::getDefaultTTI(const Function &F) {
  return Result(F.getParent()->getDataLayout());
}
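
`TargetIRAnalysis` stores a `std::function` callback and falls back to `getDefaultTTI` when none is supplied. The same shape can be sketched with stand-in types. Everything below (`Fn`, `Info`, `InfoAnalysis`) is hypothetical and only mirrors the constructor and `run` structure shown above.

```cpp
#include <functional>
#include <string>
#include <utility>

// Hypothetical stand-ins for llvm::Function and the analysis result.
struct Fn {
  std::string Name;
};
struct Info {
  std::string Source;
};

class InfoAnalysis {
  std::function<Info(const Fn &)> Callback;

  // Fallback used when no target-specific callback is supplied.
  static Info getDefault(const Fn &) { return Info{"default"}; }

public:
  InfoAnalysis() : Callback(&getDefault) {}
  explicit InfoAnalysis(std::function<Info(const Fn &)> Callback)
      : Callback(std::move(Callback)) {}

  // Each query simply forwards to whichever callback was installed.
  Info run(const Fn &F) const { return Callback(F); }
};
```

A default-constructed `InfoAnalysis` answers every query through `getDefault`, while a caller that knows the target can pass a lambda to produce a per-function result instead.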
// Register the basic pass.
INITIALIZE_PASS(TargetTransformInfoWrapperPass, "tti",
                "Target Transform Information", false, true)
char TargetTransformInfoWrapperPass::ID = 0;

void TargetTransformInfoWrapperPass::anchor() {}
TargetTransformInfoWrapperPass::TargetTransformInfoWrapperPass()
    : ImmutablePass(ID) {
llvm-svn: 227669
2015-01-31 11:43:40 +08:00
|
|
|
initializeTargetTransformInfoWrapperPassPass(
|
|
|
|
*PassRegistry::getPassRegistry());
|
|
|
|
}
|
|
|
|
|
|
|
|
TargetTransformInfoWrapperPass::TargetTransformInfoWrapperPass(
|
2015-02-01 20:26:09 +08:00
|
|
|
TargetIRAnalysis TIRA)
|
|
|
|
: ImmutablePass(ID), TIRA(std::move(TIRA)) {
|
2015-01-31 11:43:40 +08:00
|
|
|
initializeTargetTransformInfoWrapperPassPass(
|
|
|
|
*PassRegistry::getPassRegistry());
|
|
|
|
}
|
2013-01-05 19:43:11 +08:00
|
|
|
|
2015-09-17 07:38:13 +08:00
|
|
|
TargetTransformInfo &TargetTransformInfoWrapperPass::getTTI(const Function &F) {
|
2016-08-09 08:28:15 +08:00
|
|
|
FunctionAnalysisManager DummyFAM;
|
2016-06-17 08:11:01 +08:00
|
|
|
TTI = TIRA.run(F, DummyFAM);
|
2015-02-01 20:26:09 +08:00
|
|
|
return *TTI;
|
|
|
|
}
|
|
|
|
|
2015-01-31 19:17:59 +08:00
|
|
|
ImmutablePass *
|
2015-02-01 20:26:09 +08:00
|
|
|
llvm::createTargetTransformInfoWrapperPass(TargetIRAnalysis TIRA) {
|
|
|
|
return new TargetTransformInfoWrapperPass(std::move(TIRA));
|
2013-01-05 19:43:11 +08:00
|
|
|
}
|
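The wrapper pass above stores a TargetIRAnalysis by value and calls its run method to rebuild the TTI each time getTTI is asked about a function. As a rough illustration of that shape, here is a minimal standalone sketch of a type-erased value object plus a wrapper that rebuilds it per function. It is not LLVM code: every name in it (ToyTTI, ToyWideImpl, ToyNarrowImpl, ToyFunction, ToyWrapper, makeToyWrapper) is hypothetical, the register counts are made up, and the real interface is far larger.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <utility>

// Hypothetical target implementations; they share no base class.
struct ToyWideImpl {
  unsigned getNumberOfRegisters() const { return 32; }
};
struct ToyNarrowImpl {
  unsigned getNumberOfRegisters() const { return 8; }
};

// Hypothetical stand-in for an IR function.
struct ToyFunction {
  bool WantsWideTarget;
};

// Value type that hides the concrete implementation behind Concept/Model.
class ToyTTI {
  struct Concept {
    virtual ~Concept() = default;
    virtual unsigned getNumberOfRegisters() const = 0;
  };
  // Model forwards each Concept call to the wrapped implementation.
  template <typename T> struct Model final : Concept {
    explicit Model(T I) : Impl(std::move(I)) {}
    unsigned getNumberOfRegisters() const override {
      return Impl.getNumberOfRegisters();
    }
    T Impl;
  };
  std::unique_ptr<Concept> Ptr;

public:
  // Erase the static type of any implementation with the right methods.
  template <typename T>
  explicit ToyTTI(T Impl) : Ptr(new Model<T>(std::move(Impl))) {}
  ToyTTI(ToyTTI &&) = default;
  ToyTTI &operator=(ToyTTI &&) = default;

  unsigned getNumberOfRegisters() const {
    assert(Ptr && "use of a moved-from ToyTTI");
    return Ptr->getNumberOfRegisters();
  }
};

// Holds a builder callback by value and rebuilds the result on each query.
class ToyWrapper {
  std::function<ToyTTI(const ToyFunction &)> Builder;
  std::unique_ptr<ToyTTI> Current;

public:
  explicit ToyWrapper(std::function<ToyTTI(const ToyFunction &)> B)
      : Builder(std::move(B)) {}

  // The returned reference is only valid until the next call to get().
  ToyTTI &get(const ToyFunction &F) {
    Current.reset(new ToyTTI(Builder(F)));
    return *Current;
  }
};

// Builds a wrapper whose callback picks an implementation per function.
ToyWrapper makeToyWrapper() {
  return ToyWrapper([](const ToyFunction &F) {
    return F.WantsWideTarget ? ToyTTI(ToyWideImpl{}) : ToyTTI(ToyNarrowImpl{});
  });
}
```

Callers only ever see ToyTTI, so adding another implementation means writing a plain struct with the expected methods. The cost is the forwarding boilerplate in Concept and Model, which grows with every method on the interface.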