Switch TargetTransformInfo from an immutable analysis pass that requires
a TargetMachine to construct (and thus isn't always available), to an
analysis group that supports layered implementations much like
AliasAnalysis does. This is a pretty massive change, with a few parts
that I was unable to easily separate (sorry), so I'll walk through it.
The first step of this conversion was to make TargetTransformInfo an
analysis group, and to sink the nonce implementations in
ScalarTargetTransformInfo and VectorTargetTransformInfo into
a NoTargetTransformInfo pass. This allows other passes to add a hard
requirement on TTI and assume they will always get at least one
implementation.
The TargetTransformInfo analysis group leverages the delegation chaining
trick that AliasAnalysis uses, where the base class for the analysis
group delegates to the previous analysis *pass*, allowing all but the
NoFoo analysis passes to only implement the parts of the interfaces they
support. It also introduces a new trick where each pass in the group
retains a pointer to the top-most pass that has been initialized. This
allows passes to implement one API in terms of another API and benefit
when some other pass above them in the stack has more precise results
for the second API.
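The delegation chain and the "top-most pass" pointer described above can be sketched in a few dozen lines of C++. This is a simplified, hypothetical model, not the actual analysis-group machinery. TTILayer, NoTTILayer, SlowDivLayer, TTIStack and the opcode constants are all invented for illustration. Each layer delegates queries it does not override to the layer below. The bottom layer answers the vector query through the top of the stack, so a more precise scalar answer from a higher layer is picked up.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical opcodes, for illustration only.
constexpr unsigned kAdd = 0;
constexpr unsigned kDiv = 1;

// Simplified analysis-group layer (illustrative names, not LLVM's API).
// Each layer knows the layer below it (Prev) and the top-most layer (Top).
class TTILayer {
public:
  virtual ~TTILayer() = default;

  void setPrev(const TTILayer *P) { Prev = P; }
  void setTop(const TTILayer *T) { Top = T; }

  // By default every query is delegated to the previous layer, so a layer
  // only overrides the queries it can answer more precisely.
  virtual unsigned getScalarCost(unsigned Opcode) const {
    assert(Prev && "the bottom layer must answer every query");
    return Prev->getScalarCost(Opcode);
  }
  virtual unsigned getVectorCost(unsigned Opcode, unsigned Lanes) const {
    assert(Prev && "the bottom layer must answer every query");
    return Prev->getVectorCost(Opcode, Lanes);
  }

protected:
  const TTILayer *Prev = nullptr;
  const TTILayer *Top = nullptr;
};

// Bottom "NoTTI"-style layer: answers everything conservatively.
class NoTTILayer : public TTILayer {
public:
  unsigned getScalarCost(unsigned) const override { return 1; }
  // Implemented in terms of the scalar query, asked through Top so that a
  // more precise layer higher in the stack refines this answer too.
  unsigned getVectorCost(unsigned Opcode, unsigned Lanes) const override {
    return Lanes * Top->getScalarCost(Opcode);
  }
};

// A target layer that only knows that division is expensive.
class SlowDivLayer : public TTILayer {
public:
  unsigned getScalarCost(unsigned Opcode) const override {
    if (Opcode == kDiv)
      return 20;
    return TTILayer::getScalarCost(Opcode); // delegate down the chain
  }
};

class TTIStack {
public:
  void push(std::unique_ptr<TTILayer> L) {
    L->setPrev(Layers.empty() ? nullptr : Layers.back().get());
    Layers.push_back(std::move(L));
    // Every layer sees the newest layer as the top of the stack.
    for (auto &Layer : Layers)
      Layer->setTop(Layers.back().get());
  }
  const TTILayer &top() const {
    assert(!Layers.empty() && "empty TTI stack");
    return *Layers.back();
  }

private:
  std::vector<std::unique_ptr<TTILayer>> Layers;
};
```

Suppose NoTTILayer is pushed first and SlowDivLayer sits on top. A 4-lane divide query then falls through to the bottom layer. That layer asks the top of the stack for the scalar divide cost, so it reports 4 * 20 rather than 4 * 1.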
The second step of this conversion is to create a pass that implements
the TargetTransformInfo analysis using the target-independent
abstractions in the code generator. This replaces the
ScalarTargetTransformImpl and VectorTargetTransformImpl classes in
lib/Target with a single pass in lib/CodeGen called
BasicTargetTransformInfo. This class actually provides most of the TTI
functionality, basing it upon the TargetLowering abstraction and other
information in the target independent code generator.
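The sketch below gives a rough idea of what basing cost answers on lowering information means. It derives an arithmetic cost purely from per-operation legalization actions. LoweringInfo, BasicCostModel, the enums and the cost numbers are hypothetical simplifications, loosely modeled on TargetLowering's operation actions. The real BasicTargetTransformInfo logic is considerably more detailed.

```cpp
#include <cassert>
#include <map>
#include <utility>

// Simplified stand-ins loosely modeled on TargetLowering's per-operation
// legalization actions; the real interface is far richer.
enum class Op { Add, Mul, Div };
enum class ValueType { i32, v4i32, v8i32 };
enum class LegalizeAction { Legal, Expand };

class LoweringInfo {
public:
  void setOperationAction(Op O, ValueType T, LegalizeAction A) {
    Actions[{O, T}] = A;
  }
  // Operations the target never declared are assumed to need expansion.
  LegalizeAction getOperationAction(Op O, ValueType T) const {
    auto It = Actions.find({O, T});
    return It == Actions.end() ? LegalizeAction::Expand : It->second;
  }

private:
  std::map<std::pair<Op, ValueType>, LegalizeAction> Actions;
};

// Target-independent cost model that derives its answers only from the
// lowering tables, in the spirit of BasicTargetTransformInfo.
class BasicCostModel {
public:
  explicit BasicCostModel(const LoweringInfo &L) : TLI(L) {}

  unsigned getArithmeticInstrCost(Op O, ValueType T) const {
    switch (TLI.getOperationAction(O, T)) {
    case LegalizeAction::Legal:
      return 1; // assume one native instruction
    case LegalizeAction::Expand:
      return ExpandCost; // assume a fixed expansion penalty
    }
    assert(false && "unhandled legalize action");
    return ExpandCost;
  }

private:
  static constexpr unsigned ExpandCost = 10; // hypothetical penalty
  const LoweringInfo &TLI;
};
```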
The third step of the conversion adds support to all TargetMachines to
register custom analysis passes. This allows building those passes with
access to TargetLowering or other target-specific classes, and it also
allows each target to customize the set of analysis passes desired in
the pass manager. The baseline LLVMTargetMachine implements this
interface to add the BasicTTI pass to the pass manager, and all of the
tools that want to support target-aware TTI passes call this routine on
whatever target machine they end up with to add the appropriate passes.
The fourth step of the conversion created target-specific TTI analysis
passes for the X86 and ARM backends. These passes contain the custom
logic that was previously in their extensions of the
ScalarTargetTransformInfo and VectorTargetTransformInfo interfaces.
I separated them into their own file, as now all of the interface bits
are private and they just expose a function to create the pass itself.
Then I extended these target machines to set up a custom set of analysis
passes, first adding BasicTTI as a fallback, and then adding their
customized TTI implementations.
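In stripped-down form, the target-machine hook and its ordering (fallback first, target-specific layer second) might look like the sketch below. The pass list just records names. TargetMachineSketch, X86TargetMachineSketch, addAnalysisPasses and buildAnalysisPipeline are hypothetical names for illustration, not the real TargetMachine interface.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a pass manager: it only records, in order, the
// names of the analysis passes it was asked to add.
struct AnalysisPassList {
  std::vector<std::string> Names;
  void add(std::string Name) { Names.push_back(std::move(Name)); }
};

class TargetMachineSketch {
public:
  virtual ~TargetMachineSketch() = default;
  // Baseline hook: code-generator-backed targets get the basic TTI.
  virtual void addAnalysisPasses(AnalysisPassList &PM) const {
    PM.add("BasicTTI");
  }
};

class X86TargetMachineSketch : public TargetMachineSketch {
public:
  void addAnalysisPasses(AnalysisPassList &PM) const override {
    // Add the fallback first, then the target-specific layer on top of it.
    TargetMachineSketch::addAnalysisPasses(PM);
    PM.add("X86TTI");
  }
};

// Tool-side code: it only knows the base class and lets whichever target
// machine it ended up with decide which analyses to register.
AnalysisPassList buildAnalysisPipeline(const TargetMachineSketch &TM) {
  AnalysisPassList PM;
  TM.addAnalysisPasses(PM);
  assert(!PM.Names.empty() && "expected at least one TTI implementation");
  return PM;
}
```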
The fourth step also required logic that was shared between the
target-independent layer and the specific targets to move to a different
interface, as they no longer derive from each other. As a consequence,
helper functions were added to TargetLowering representing the common
logic needed both in the target implementation and the codegen
implementation of the TTI pass. While technically this is the only
change that could have been committed separately, it would have been
a nightmare to extract.
The final step of the conversion was just to delete all the old
boilerplate. This got rid of the ScalarTargetTransformInfo and
VectorTargetTransformInfo classes, all of the support in all of the
targets for producing instances of them, and all of the support in the
tools for manually constructing a pass based around them.
Now that TTI is a relatively normal analysis group, two things become
straightforward. First, we can sink it into lib/Analysis, which is a more
natural layer for it to live in. Second, clients of this interface can
depend on it *always* being available, which will simplify their code and
behavior. These (and other) simplifications will follow in subsequent
commits; this one is clearly big enough.
Finally, I'm very aware that many of the comments and much of the
documentation need to be updated. As soon as I had this working, and plausibly well
commented, I wanted to get it committed and in front of the build bots.
I'll be doing a few passes over documentation later if it sticks.
Commits to update DragonEgg and Clang will be made presently.
llvm-svn: 171681
2013-01-07 09:37:14 +08:00
//===-- X86TargetTransformInfo.cpp - X86 specific TTI pass ----------------===//
//
2019-01-19 16:50:56 +08:00
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//
/// \file
/// This file implements a TargetTransformInfo analysis pass specific to the
/// X86 target machine. It uses the target's detailed information to provide
/// more precise answers to certain TTI queries, while letting the target
/// independent and default TTI implementations handle the rest.
///
//===----------------------------------------------------------------------===//
2016-10-12 21:24:13 +08:00
/// A note about the cost model numbers used below: they correspond to a
/// "generic" X86 CPU rather than to a concrete CPU model. Usually the numbers
/// correspond to the CPU where the feature appeared for the first time. For
/// example, if we check Subtarget.hasSSE42() in the lookups below, the cost is
/// based on Nehalem, as that was the first CPU to support that feature level
/// and thus most likely has the worst-case cost.
/// Some examples of other technologies/CPUs:
///   SSE 3   - Pentium4 / Athlon64
///   SSE 4.1 - Penryn
///   SSE 4.2 - Nehalem
///   AVX     - Sandy Bridge
///   AVX2    - Haswell
///   AVX-512 - Xeon Phi / Skylake
/// And some examples of target-dependent instruction costs (latency):
///                   divss     sqrtss     rsqrtss
///   AMD K7          11-16     19         3
///   Piledriver      9-24      13-15      5
///   Jaguar          14        16         2
///   Pentium II,III  18        30         2
///   Nehalem         7-14      7-18       3
///   Haswell         10-13     11         5
/// TODO: Develop and implement the target-dependent cost model and
/// specialize cost numbers for different Cost Model Targets such as
/// throughput, code size, latency and uop count.
//===----------------------------------------------------------------------===//
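The note above explains two things: each cost is keyed on a subtarget feature level, and it reflects the first (and so likely slowest) CPU that had that feature. The sketch below shows that lookup order, checking the most capable feature first and falling back to older tables. It rests on some assumptions. CostEntry, lookupCost, SubtargetFeatures and getArithmeticCost are hypothetical names, not the llvm/CodeGen/CostTable.h interface. The table contents are placeholders, not real X86 costs.

```cpp
#include <algorithm>
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical cost-table entry: (operation, type) -> cost.
struct CostEntry {
  std::string Op;
  std::string Type;
  unsigned Cost;
};

std::optional<unsigned> lookupCost(const std::vector<CostEntry> &Table,
                                   const std::string &Op,
                                   const std::string &Type) {
  auto It = std::find_if(Table.begin(), Table.end(), [&](const CostEntry &E) {
    return E.Op == Op && E.Type == Type;
  });
  if (It == Table.end())
    return std::nullopt;
  return It->Cost;
}

// Hypothetical subset of subtarget features.
struct SubtargetFeatures {
  bool HasSSE42 = false;
  bool HasAVX2 = false;
};

// Placeholder numbers, NOT real X86 costs. Each table is meant to hold the
// worst-case cost on the first CPU that introduced the feature.
unsigned getArithmeticCost(const SubtargetFeatures &ST, const std::string &Op,
                           const std::string &Type) {
  assert(!Op.empty() && "empty operation name");
  static const std::vector<CostEntry> AVX2Table = {{"mul", "v8i32", 2}};
  static const std::vector<CostEntry> SSE42Table = {{"mul", "v4i32", 6},
                                                    {"mul", "v8i32", 12}};
  constexpr unsigned DefaultCost = 20; // hypothetical fallback

  // Check the most capable feature level first, then fall back.
  if (ST.HasAVX2) {
    if (auto C = lookupCost(AVX2Table, Op, Type))
      return *C;
  }
  if (ST.HasSSE42) {
    if (auto C = lookupCost(SSE42Table, Op, Type))
      return *C;
  }
  return DefaultCost;
}
```

A subtarget with AVX2 still falls through to the SSE 4.2 table for a type the AVX2 table does not list. This mirrors the "most specific feature first" ordering the note implies.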
2015-01-31 19:17:59 +08:00
#include "X86TargetTransformInfo.h"
2013-01-07 11:08:10 +08:00
#include "llvm/Analysis/TargetTransformInfo.h"
[PM] Change the core design of the TTI analysis to use a polymorphic
type erased interface and a single analysis pass rather than an
extremely complex analysis group.
The end result is that the TTI analysis can contain a type erased
implementation that supports the polymorphic TTI interface. We can build
one from a target-specific implementation or from a dummy one in the IR.
I've also factored all of the code into "mix-in"-able base classes,
including CRTP base classes to facilitate calling back up to the most
specialized form when delegating horizontally across the surface. These
aren't as clean as I would like, and I'm planning to work on cleaning
some of this up, but I wanted to start by putting it into the right form.
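The CRTP trick mentioned above lets a shared base class implement one query in terms of another while still calling the most specialized override, without virtual dispatch. The following is a minimal, hypothetical sketch, not the actual base-class code. BasicCostMixin, GenericTarget, SlowDivTarget and the opcode and cost values are invented.

```cpp
#include <cassert>

// Hypothetical CRTP mix-in: it implements one query in terms of another
// and dispatches statically to the most derived class, so an override in
// the target class is honored without any virtual calls.
template <typename Derived>
class BasicCostMixin {
public:
  unsigned getScalarOpCost(unsigned /*Opcode*/) const { return 1; }

  // "Horizontal" delegation: the vector query is built on the scalar
  // query, called through the most specialized class.
  unsigned getVectorOpCost(unsigned Opcode, unsigned Lanes) const {
    assert(Lanes > 0 && "a vector needs at least one lane");
    return Lanes * impl().getScalarOpCost(Opcode);
  }

private:
  const Derived &impl() const { return static_cast<const Derived &>(*this); }
};

// Uses the mix-in's defaults unchanged.
class GenericTarget : public BasicCostMixin<GenericTarget> {};

// Overrides only the scalar query; the vector query picks it up.
class SlowDivTarget : public BasicCostMixin<SlowDivTarget> {
public:
  static constexpr unsigned DivOpcode = 7; // hypothetical opcode number

  unsigned getScalarOpCost(unsigned Opcode) const {
    if (Opcode == DivOpcode)
      return 15;
    return BasicCostMixin<SlowDivTarget>::getScalarOpCost(Opcode);
  }
};
```

SlowDivTarget only redefines the scalar query. Even so, its 4-lane divide cost becomes 4 * 15, because getVectorOpCost reaches the scalar cost through impl().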
There are a number of reasons for this change, and this particular
design. The first and foremost reason is that an analysis group is
complete overkill, and the chaining delegation strategy was so opaque,
confusing, and high overhead that TTI was suffering greatly for it.
Several of the TTI functions had failed to be implemented in all places
because the chaining-based delegation provided no checking of this. A few
other functions were implemented with incorrect delegation.
The message to me was very clear working on this -- the delegation and
analysis group structure was too confusing to be useful here.
The other reason, of course, is that this is a *much* more natural fit for
the new pass manager. This will lay the ground work for a type-erased
per-function info object that can look up the correct subtarget and even
cache it.
Yet another benefit is that this will significantly simplify the
interaction of the pass managers and the TargetMachine. See the future
work below.
The downside of this change is that it is very, very verbose. I'm going
to work to improve that, but it is somewhat an implementation necessity
in C++ to do type erasure. =/ I discussed this design really extensively
with Eric and Hal prior to going down this path, and afterward showed
them the result. No one was really thrilled with it, but there doesn't
seem to be a substantially better alternative. Using a base class and
virtual method dispatch would make the code much shorter, but as
discussed in the update to the programmer's manual and elsewhere,
a polymorphic interface feels like the more principled approach even if
this is perhaps the least compelling example of it. ;]
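To make the type-erasure versus virtual-base trade-off concrete, here is a minimal sketch of the common "concept/model" idiom. A small value type holds any implementation that provides the query, and the implementations need no shared base class. CostModel, DefaultCosts, MulAwareCosts and the single getArithmeticCost query are hypothetical. This is a generic illustration of the technique, not the actual TargetTransformInfo code.

```cpp
#include <cassert>
#include <memory>
#include <type_traits>
#include <utility>

// A value type that can hold any implementation providing
// getArithmeticCost(unsigned) const, with no common base class required.
class CostModel {
public:
  template <typename ImplT,
            typename = std::enable_if_t<
                !std::is_same_v<std::decay_t<ImplT>, CostModel>>>
  explicit CostModel(ImplT Impl)
      : Self(std::make_unique<Model<ImplT>>(std::move(Impl))) {}

  // Value semantics: copies deep-copy the erased implementation.
  CostModel(const CostModel &Other)
      : Self(Other.Self ? Other.Self->clone() : nullptr) {}
  CostModel &operator=(const CostModel &Other) {
    Self = Other.Self ? Other.Self->clone() : nullptr;
    return *this;
  }
  CostModel(CostModel &&) = default;
  CostModel &operator=(CostModel &&) = default;

  unsigned getArithmeticCost(unsigned Opcode) const {
    assert(Self && "use of a moved-from CostModel");
    return Self->getArithmeticCost(Opcode);
  }

private:
  // The "concept": an internal virtual interface hidden from clients.
  struct Concept {
    virtual ~Concept() = default;
    virtual unsigned getArithmeticCost(unsigned Opcode) const = 0;
    virtual std::unique_ptr<Concept> clone() const = 0;
  };

  // The "model": adapts an arbitrary implementation to the concept.
  template <typename ImplT> struct Model final : Concept {
    explicit Model(ImplT I) : Impl(std::move(I)) {}
    unsigned getArithmeticCost(unsigned Opcode) const override {
      return Impl.getArithmeticCost(Opcode);
    }
    std::unique_ptr<Concept> clone() const override {
      return std::make_unique<Model>(Impl);
    }
    ImplT Impl;
  };

  std::unique_ptr<Concept> Self;
};

// Two unrelated implementations that share no base class.
struct DefaultCosts {
  unsigned getArithmeticCost(unsigned) const { return 1; }
};
struct MulAwareCosts {
  unsigned MulOpcode;
  unsigned MulCost;
  unsigned getArithmeticCost(unsigned Opcode) const {
    return Opcode == MulOpcode ? MulCost : 1;
  }
};
```

The sketch also shows the verbosity the message warns about. Every query has to be spelled out three times: once in the wrapper, once in Concept, and once in Model.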
Ultimately, there is still a lot more to be done here, but this was the
huge chunk that I couldn't really split things out of because this was
the interface change to TTI. I've tried to minimize all the other parts
of this. The follow up work should include at least:
1) Improving the TargetMachine interface by having it directly return
a TTI object. Because we have a non-pass object with value semantics
and an internal type erasure mechanism, we can narrow the interface
of the TargetMachine to *just* do what we need: build and return
a TTI object that we can then insert into the pass pipeline.
2) Make the TTI object be fully specialized for a particular function.
This will include splitting off a minimal form of it which is
sufficient for the inliner and the old pass manager.
3) Add a new pass manager analysis which produces TTI objects from the
target machine for each function. This may actually be done as part
of #2 in order to use the new analysis to implement #2.
4) Work on narrowing the API between TTI and the targets so that it is
easier to understand and less verbose to type erase.
5) Work on narrowing the API between TTI and its clients so that it is
easier to understand and less verbose to forward.
6) Try to improve the CRTP-based delegation. I feel like this code is
just a bit messy and exacerbating the complexity of implementing
the TTI in each target.
Many thanks to Eric and Hal for their help here. I ended up blocked on
this somewhat more abruptly than I expected, and so I appreciate getting
it sorted out very quickly.
Differential Revision: http://reviews.llvm.org/D7293
llvm-svn: 227669
2015-01-31 11:43:40 +08:00
#include "llvm/CodeGen/BasicTTIImpl.h"
2017-11-17 09:07:10 +08:00
#include "llvm/CodeGen/CostTable.h"
#include "llvm/CodeGen/TargetLowering.h"
2014-01-25 10:02:55 +08:00
#include "llvm/IR/IntrinsicInst.h"
#include "llvm/Support/Debug.h"
2015-10-07 07:24:35 +08:00
using namespace llvm;
2014-04-22 10:41:26 +08:00
#define DEBUG_TYPE "x86tti"
|
|
|
|
|
Switch TargetTransformInfo from an immutable analysis pass that requires
a TargetMachine to construct (and thus isn't always available), to an
analysis group that supports layered implementations much like
AliasAnalysis does. This is a pretty massive change, with a few parts
that I was unable to easily separate (sorry), so I'll walk through it.
The first step of this conversion was to make TargetTransformInfo an
analysis group, and to sink the nonce implementations in
ScalarTargetTransformInfo and VectorTargetTranformInfo into
a NoTargetTransformInfo pass. This allows other passes to add a hard
requirement on TTI, and assume they will always get at least on
implementation.
The TargetTransformInfo analysis group leverages the delegation chaining
trick that AliasAnalysis uses, where the base class for the analysis
group delegates to the previous analysis *pass*, allowing all but tho
NoFoo analysis passes to only implement the parts of the interfaces they
support. It also introduces a new trick where each pass in the group
retains a pointer to the top-most pass that has been initialized. This
allows passes to implement one API in terms of another API and benefit
when some other pass above them in the stack has more precise results
for the second API.
The second step of this conversion is to create a pass that implements
the TargetTransformInfo analysis using the target-independent
abstractions in the code generator. This replaces the
ScalarTargetTransformImpl and VectorTargetTransformImpl classes in
lib/Target with a single pass in lib/CodeGen called
BasicTargetTransformInfo. This class actually provides most of the TTI
functionality, basing it upon the TargetLowering abstraction and other
information in the target independent code generator.
The third step of the conversion adds support to all TargetMachines to
register custom analysis passes. This allows building those passes with
access to TargetLowering or other target-specific classes, and it also
allows each target to customize the set of analysis passes desired in
the pass manager. The baseline LLVMTargetMachine implements this
interface to add the BasicTTI pass to the pass manager, and all of the
tools that want to support target-aware TTI passes call this routine on
whatever target machine they end up with to add the appropriate passes.
The fourth step of the conversion created target-specific TTI analysis
passes for the X86 and ARM backends. These passes contain the custom
logic that was previously in their extensions of the
ScalarTargetTransformInfo and VectorTargetTransformInfo interfaces.
I separated them into their own file, as now all of the interface bits
are private and they just expose a function to create the pass itself.
Then I extended these target machines to set up a custom set of analysis
passes, first adding BasicTTI as a fallback, and then adding their
customized TTI implementations.
The fourth step required logic that was shared between the
target-independent layer and the specific targets to move to a different
interface, as they no longer derive from each other. As a consequence,
helper functions were added to TargetLowering representing the common
logic needed both in the target implementation and the codegen
implementation of the TTI pass. While technically this is the only
change that could have been committed separately, it would have been
a nightmare to extract.
The final step of the conversion was just to delete all the old
boilerplate. This got rid of the ScalarTargetTransformInfo and
VectorTargetTransformInfo classes, all of the support in all of the
targets for producing instances of them, and all of the support in the
tools for manually constructing a pass based around them.
Now that TTI is a relatively normal analysis group, two things become
straightforward. First, we can sink it into lib/Analysis, which is a more
natural layer for it to live in. Second, clients of this interface can
depend on it *always* being available, which will simplify their code and
behavior. These (and other) simplifications will follow in subsequent
commits; this one is clearly big enough.
Finally, I'm very aware that many of the comments and much of the documentation
need to be updated. As soon as I had this working, and plausibly well
commented, I wanted to get it committed and in front of the build bots.
I'll be doing a few passes over documentation later if it sticks.
Commits to update DragonEgg and Clang will be made presently.
llvm-svn: 171681
2013-01-07 09:37:14 +08:00
//===----------------------------------------------------------------------===//
//
// X86 cost model.
//
//===----------------------------------------------------------------------===//
[PM] Change the core design of the TTI analysis to use a polymorphic
type erased interface and a single analysis pass rather than an
extremely complex analysis group.
The end result is that the TTI analysis can contain a type erased
implementation that supports the polymorphic TTI interface. We can build
one from a target-specific implementation or from a dummy one in the IR.
I've also factored all of the code into "mix-in"-able base classes,
including CRTP base classes to facilitate calling back up to the most
specialized form when delegating horizontally across the surface. These
aren't as clean as I would like and I'm planning to work on cleaning
some of this up, but I wanted to start by putting it into the right form.
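The CRTP delegation mentioned above can be sketched roughly as follows. This simplified illustration assumes a single hypothetical cost query. `CostModelBase`, `ToyTargetCostModel` and the opcode numbering are invented for the example and do not correspond to LLVM's actual base classes. The base class calls back up through the most-derived type with a `static_cast`, so a target's override is picked up without virtual dispatch.

```cpp
// Hypothetical CRTP sketch; class names and the opcode numbering are
// illustrative, not LLVM's.
template <typename Derived> class CostModelBase {
public:
  // Default answer: every scalar operation costs one unit.
  unsigned getScalarCost(unsigned /*Opcode*/) const { return 1; }

  // A query defined in terms of another query. It calls through Derived,
  // so a target's override of getScalarCost is used here too.
  unsigned getVectorCost(unsigned Opcode, unsigned NumElts) const {
    return NumElts * impl().getScalarCost(Opcode);
  }

private:
  const Derived &impl() const { return static_cast<const Derived &>(*this); }
};

// Uses every default.
class DefaultCostModel : public CostModelBase<DefaultCostModel> {};

// Overrides only the scalar cost; getVectorCost is inherited unchanged.
class ToyTargetCostModel : public CostModelBase<ToyTargetCostModel> {
public:
  static constexpr unsigned DivOpcode = 1; // hypothetical opcode number
  unsigned getScalarCost(unsigned Opcode) const {
    return Opcode == DivOpcode ? 20 : 1;
  }
};
```

`ToyTargetCostModel` inherits `getVectorCost` unchanged. Its 4-wide divide still costs 4 × 20 because the base calls the derived `getScalarCost`.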
There are a number of reasons for this change, and this particular
design. The first and foremost reason is that an analysis group is
complete overkill, and the chaining delegation strategy was so opaque,
confusing, and high overhead that TTI was suffering greatly for it.
Several of the TTI functions had not been implemented in all places
because the chaining-based delegation provided no checking of
this. A few other functions were implemented with incorrect delegation.
The message to me was very clear working on this -- the delegation and
analysis group structure was too confusing to be useful here.
The other reason, of course, is that this is a *much* more natural fit for
the new pass manager. This will lay the groundwork for a type-erased
per-function info object that can look up the correct subtarget and even
cache it.
Yet another benefit is that this will significantly simplify the
interaction of the pass managers and the TargetMachine. See the future
work below.
The downside of this change is that it is very, very verbose. I'm going
to work to improve that, but it is somewhat of an implementation necessity
in C++ to do type erasure. =/ I discussed this design really extensively
with Eric and Hal prior to going down this path, and afterward showed
them the result. No one was really thrilled with it, but there doesn't
seem to be a substantially better alternative. Using a base class and
virtual method dispatch would make the code much shorter, but as
discussed in the update to the programmer's manual and elsewhere,
a polymorphic interface feels like the more principled approach even if
this is perhaps the least compelling example of it. ;]
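Below is a minimal sketch of the type-erasure pattern described above: an internal concept/model pair sits behind a handle with value semantics. `CostModel`, `DummyImpl` and `ToyX86Impl` are hypothetical names, and the single query shown is only illustrative. The real TTI interface is far larger. Each query has to be spelled out in the handle, in `Concept`, and in `Model`, and that repetition is where the verbosity comes from.

```cpp
#include <memory>
#include <utility>

// Hypothetical type-erased handle with value semantics. Any type providing
// `unsigned getNumberOfRegisters(bool) const` can be stored in it.
class CostModel {
public:
  template <typename ImplT>
  CostModel(ImplT Impl) : Self(new Model<ImplT>(std::move(Impl))) {}

  CostModel(const CostModel &Other) : Self(Other.Self->clone()) {}
  CostModel(CostModel &&Other) noexcept = default;
  CostModel &operator=(CostModel Other) noexcept {
    Self = std::move(Other.Self);
    return *this;
  }

  // Each query is written three times: here, in Concept, and in Model.
  unsigned getNumberOfRegisters(bool Vector) const {
    return Self->getNumberOfRegisters(Vector);
  }

private:
  struct Concept {
    virtual ~Concept() = default;
    virtual std::unique_ptr<Concept> clone() const = 0;
    virtual unsigned getNumberOfRegisters(bool Vector) const = 0;
  };

  template <typename ImplT> struct Model final : Concept {
    explicit Model(ImplT I) : Impl(std::move(I)) {}
    std::unique_ptr<Concept> clone() const override {
      return std::unique_ptr<Concept>(new Model(Impl));
    }
    unsigned getNumberOfRegisters(bool Vector) const override {
      return Impl.getNumberOfRegisters(Vector);
    }
    ImplT Impl;
  };

  std::unique_ptr<Concept> Self;
};

// Two unrelated implementations; neither needs a common base class.
struct DummyImpl {
  unsigned getNumberOfRegisters(bool /*Vector*/) const { return 8; }
};

struct ToyX86Impl {
  bool Is64Bit; // hypothetical subtarget flag
  unsigned getNumberOfRegisters(bool /*Vector*/) const {
    return Is64Bit ? 16 : 8;
  }
};
```

A virtual base class would need only one declaration per query. The trade-off is that implementations must inherit from that base, and clients would hold pointers instead of copyable values.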
Ultimately, there is still a lot more to be done here, but this was the
huge chunk that I couldn't really split things out of because this was
the interface change to TTI. I've tried to minimize all the other parts
of this. The follow-up work should include at least:
1) Improving the TargetMachine interface by having it directly return
a TTI object. Because we have a non-pass object with value semantics
and an internal type erasure mechanism, we can narrow the interface
of the TargetMachine to *just* do what we need: build and return
a TTI object that we can then insert into the pass pipeline.
2) Make the TTI object be fully specialized for a particular function.
This will include splitting off a minimal form of it which is
sufficient for the inliner and the old pass manager.
3) Add a new pass manager analysis which produces TTI objects from the
target machine for each function. This may actually be done as part
of #2 in order to use the new analysis to implement #2.
4) Work on narrowing the API between TTI and the targets so that it is
easier to understand and less verbose to type erase.
5) Work on narrowing the API between TTI and its clients so that it is
easier to understand and less verbose to forward.
6) Try to improve the CRTP-based delegation. I feel like this code is
just a bit messy and exacerbating the complexity of implementing
the TTI in each target.
Many thanks to Eric and Hal for their help here. I ended up blocked on
this somewhat more abruptly than I expected, and so I appreciate getting
it sorted out very quickly.
Differential Revision: http://reviews.llvm.org/D7293
llvm-svn: 227669
2015-01-31 11:43:40 +08:00
TargetTransformInfo::PopcntSupportKind
X86TTIImpl::getPopcntSupport(unsigned TyWidth) {
assert(isPowerOf2_32(TyWidth) && "Ty width must be power of 2");
// TODO: Currently the __builtin_popcount() implementation using SSE3
// instructions is inefficient. Once the problem is fixed, we should
2013-09-08 08:47:31 +08:00
// call ST->hasSSE3() instead of ST->hasPOPCNT().
return ST->hasPOPCNT() ? TTI::PSK_FastHardware : TTI::PSK_Software;
}
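The sketch below gives context for what the `PSK_Software` answer implies. It shows a portable software popcount of the kind a target may fall back to when POPCNT is unavailable. `PopcntKind`, `popcount32Software` and `popcountCostEstimate` are hypothetical names. The cost figure is simply the number of ALU operations in the software version, not a value from the X86 cost model.

```cpp
#include <cstdint>

// Hypothetical stand-in for the two answers getPopcntSupport() gives above.
enum class PopcntKind { Software, FastHardware };

// Branch-free "SWAR" popcount built from shifts, masks, and one multiply:
// the kind of expansion usable without a POPCNT instruction.
unsigned popcount32Software(std::uint32_t V) {
  V = V - ((V >> 1) & 0x55555555u);                 // 2-bit partial counts
  V = (V & 0x33333333u) + ((V >> 2) & 0x33333333u); // 4-bit partial counts
  V = (V + (V >> 4)) & 0x0F0F0F0Fu;                 // 8-bit partial counts
  return (V * 0x01010101u) >> 24;                   // sum the four bytes
}

// Illustrative cost: 1 for a single POPCNT, 12 for the ALU operations
// (shifts, masks, adds/subs, multiply) in popcount32Software.
unsigned popcountCostEstimate(PopcntKind Kind) {
  return Kind == PopcntKind::FastHardware ? 1 : 12;
}
```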
Model cache size and associativity in TargetTransformInfo
Summary:
We add the precise cache sizes and associativity for the following Intel
architectures:
- Penryn
- Nehalem
- Westmere
- Sandy Bridge
- Ivy Bridge
- Haswell
- Broadwell
- Skylake
- Kabylake
For several months, Polly has used a performance model for BLAS computations that
derives optimal cache and register tile sizes from cache and latency
information (based on ideas from "Analytical Modeling Is Enough for High-Performance BLIS" by Tze Meng Low, published in TOMS 2016).
While bootstrapping this model, these target values have been kept in Polly.
However, as our implementation is now rather mature, it seems time to teach
LLVM itself about cache sizes.
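As a rough illustration of how cache values can feed such a model, the sketch below derives the k_c blocking parameter from L1 size and associativity. It follows the general shape of the analytical model in the cited paper. It is a simplification built on assumptions, not Polly's implementation. `CacheLevelInfo`, `computeKc` and the micro-kernel shape (`Mr`, `Nr`) are hypothetical.

```cpp
#include <cmath>

// Hypothetical cache description; in practice the values would come from
// cache-size and associativity queries like the ones added by this change.
struct CacheLevelInfo {
  unsigned SizeBytes;
  unsigned Associativity;
};

// Sketch of the k_c blocking parameter for a GEMM micro-kernel. Mr x Nr is
// the micro-kernel shape and ElemBytes the element size; all three are
// caller-chosen assumptions, not values provided by LLVM.
unsigned computeKc(CacheLevelInfo L1, unsigned Mr, unsigned Nr,
                   unsigned ElemBytes) {
  // Ways of each L1 set available to the Mr x Kc sliver of A: one way is
  // held back, and the remaining ways are split between the A and B
  // slivers in the ratio Mr : Nr.
  unsigned WaysForA = static_cast<unsigned>(std::floor(
      (L1.Associativity - 1) / (1.0 + static_cast<double>(Nr) / Mr)));
  // Bytes covered by one way across all sets = SizeBytes / Associativity.
  unsigned BytesPerWay = L1.SizeBytes / L1.Associativity;
  // The A sliver has Mr rows of Kc elements each.
  return WaysForA * BytesPerWay / (Mr * ElemBytes);
}
```

Take a 32 KByte, 8-way L1D, a hypothetical 4 x 8 micro-kernel and 8-byte doubles. Two ways go to the A sliver, which gives k_c = 2 × 4096 / (4 × 8) = 256.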
Interestingly, L1 and L2 cache sizes are pretty constant across
micro-architectures, hence a set of architecture-specific default values
seems like a good start. They can be expanded to more target-specific values
in case certain newer architectures require different values. For now, values
for a set of Intel architectures are provided.
Just as a little teaser, for a simple gemm kernel this model allows us to
reduce the run time from 1.2s to 0.27s. For gemm kernels with less optimal
memory layouts even larger speedups can be reported.
Reviewers: Meinersbur, bollu, singam-sanjay, hfinkel, gareevroman, fhahn, sebpop, efriedma, asb
Reviewed By: fhahn, asb
Subscribers: lsaba, asb, pollydev, llvm-commits
Differential Revision: https://reviews.llvm.org/D37051
llvm-svn: 311647
2017-08-24 17:46:25 +08:00
llvm::Optional<unsigned> X86TTIImpl::getCacheSize(
TargetTransformInfo::CacheLevel Level) const {
switch (Level) {
case TargetTransformInfo::CacheLevel::L1D:
2017-11-23 02:23:40 +08:00
// - Penryn
// - Nehalem
// - Westmere
// - Sandy Bridge
// - Ivy Bridge
// - Haswell
// - Broadwell
// - Skylake
// - Kabylake
return 32 * 1024; // 32 KByte
case TargetTransformInfo::CacheLevel::L2D:
2017-11-23 02:23:40 +08:00
// - Penryn
// - Nehalem
// - Westmere
// - Sandy Bridge
// - Ivy Bridge
// - Haswell
// - Broadwell
// - Skylake
// - Kabylake
return 256 * 1024; // 256 KByte
}
llvm_unreachable("Unknown TargetTransformInfo::CacheLevel");
}
llvm::Optional<unsigned> X86TTIImpl::getCacheAssociativity(
TargetTransformInfo::CacheLevel Level) const {
2017-11-23 02:23:40 +08:00
// - Penryn
// - Nehalem
// - Westmere
// - Sandy Bridge
// - Ivy Bridge
// - Haswell
// - Broadwell
// - Skylake
// - Kabylake
switch (Level) {
case TargetTransformInfo::CacheLevel::L1D:
LLVM_FALLTHROUGH;
case TargetTransformInfo::CacheLevel::L2D:
return 8;
}
llvm_unreachable("Unknown TargetTransformInfo::CacheLevel");
}
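The two hooks above return the size and associativity for each cache level. As a small worked example of how those numbers combine, the sketch below computes the number of sets and the set that an address maps to. The helper names are hypothetical. The 64-byte line size is an assumption, since these hooks do not model line size.

```cpp
// Hypothetical helpers (not LLVM APIs) relating the size and associativity
// values above. The 64-byte line size is an assumption.
constexpr unsigned numCacheSets(unsigned SizeBytes, unsigned Ways,
                                unsigned LineBytes) {
  return SizeBytes / (Ways * LineBytes);
}

// Set that a byte address maps to in a set-associative cache.
constexpr unsigned long long cacheSetIndex(unsigned long long Addr,
                                           unsigned LineBytes,
                                           unsigned NumSets) {
  return (Addr / LineBytes) % NumSets;
}

// 32 KByte, 8-way L1D with 64-byte lines: 64 sets.
static_assert(numCacheSets(32 * 1024, 8, 64) == 64, "L1D sets");
// 256 KByte, 8-way L2 with 64-byte lines: 512 sets.
static_assert(numCacheSets(256 * 1024, 8, 64) == 512, "L2 sets");
// Addresses 4 KByte apart land in the same L1D set, so at most 8 such lines
// (one per way) can be resident at once.
static_assert(cacheSetIndex(0, 64, 64) == cacheSetIndex(4096, 64, 64),
              "4 KByte stride maps to one L1D set");
```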
recommit: [LoopVectorize][PowerPC] Estimate int and float register pressure separately in loop-vectorize
In loop-vectorize, the interleave count and the vectorization factor depend on the number of target registers. Currently, it does not
estimate register pressure separately for each register class (in particular, scalar float values should not be counted in the same class as scalar int values),
so the estimate is inaccurate. Specifically,
it causes excessive interleaving/unrolling, resulting in too many register spills in the loop body and hurting performance.
So we need to classify the register classes at the IR level. Importantly, these are abstract register classes,
not the backend target register classes defined in the .td files. They are used to establish the mapping between
the types of IR values and the number of simultaneous live ranges to which we'd like to limit some set of those types.
For example, the register counts on the POWER target are special when VSX is enabled: there are 32 scalar int registers (GPRs)
but 64 scalar float registers (VSRs), and both int and float vectors use the 64 VSRs. So there should be 2 kinds of register class when VSX is enabled,
and 3 kinds of register class when VSX is NOT enabled.
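The per-class idea can be sketched as follows: each abstract register class caps the interleave count on its own, and the smallest cap wins. This is a simplified, hypothetical illustration, not the LoopVectorize implementation. The names, the power-of-two rounding and the omission of loop-invariant registers are all assumptions. The register counts in the test are the POWER/VSX figures quoted above.

```cpp
#include <algorithm>
#include <limits>
#include <map>

// Hypothetical abstract register classes at the IR level (not backend
// register classes from a .td file).
enum class RegClass { ScalarInt, ScalarFloat, Vector };

// Largest power of two that is <= X, or 0 when X is 0.
unsigned powerOf2Floor(unsigned X) {
  if (X == 0)
    return 0;
  unsigned P = 1;
  while (P <= X / 2)
    P *= 2;
  return P;
}

// Simplified sketch: each class limits the interleave count independently,
// and the tightest limit wins. NumRegs must have an entry for every class
// that appears in MaxLive.
unsigned selectInterleaveCount(const std::map<RegClass, unsigned> &NumRegs,
                               const std::map<RegClass, unsigned> &MaxLive) {
  const unsigned Unconstrained = std::numeric_limits<unsigned>::max();
  unsigned IC = Unconstrained;
  for (const auto &Entry : MaxLive) {
    if (Entry.second == 0)
      continue;
    unsigned Limit = powerOf2Floor(NumRegs.at(Entry.first) / Entry.second);
    IC = std::min(IC, Limit);
  }
  // Nothing live, or more live values than registers: fall back to 1.
  return (IC == Unconstrained || IC == 0) ? 1u : IC;
}
```

With VSX enabled (32 GPRs, 64 VSRs for scalar floats, 64 VSRs for vectors), 12 simultaneously live scalar ints cap the count at 2, even though the float and vector classes alone would allow more.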
On the POWER target, it yields a big (~+30%) performance improvement on one specific benchmark (503.bwaves_r) of SPEC2017, with no other obvious regressions.
Differential revision: https://reviews.llvm.org/D67148
llvm-svn: 374634
2019-10-12 10:53:04 +08:00
unsigned X86TTIImpl::getNumberOfRegisters(unsigned ClassID) const {
bool Vector = (ClassID == 1);
2013-01-10 06:29:00 +08:00
if (Vector && !ST->hasSSE1())
return 0;
2014-07-10 02:22:33 +08:00
if (ST->is64Bit()) {
if (Vector && ST->hasAVX512())
return 32;
|
Switch TargetTransformInfo from an immutable analysis pass that requires
a TargetMachine to construct (and thus isn't always available), to an
analysis group that supports layered implementations much like
AliasAnalysis does. This is a pretty massive change, with a few parts
that I was unable to easily separate (sorry), so I'll walk through it.
The first step of this conversion was to make TargetTransformInfo an
analysis group, and to sink the nonce implementations in
ScalarTargetTransformInfo and VectorTargetTranformInfo into
a NoTargetTransformInfo pass. This allows other passes to add a hard
requirement on TTI, and assume they will always get at least on
implementation.
The TargetTransformInfo analysis group leverages the delegation chaining
trick that AliasAnalysis uses, where the base class for the analysis
group delegates to the previous analysis *pass*, allowing all but tho
NoFoo analysis passes to only implement the parts of the interfaces they
support. It also introduces a new trick where each pass in the group
retains a pointer to the top-most pass that has been initialized. This
allows passes to implement one API in terms of another API and benefit
when some other pass above them in the stack has more precise results
for the second API.
The second step of this conversion is to create a pass that implements
the TargetTransformInfo analysis using the target-independent
abstractions in the code generator. This replaces the
ScalarTargetTransformImpl and VectorTargetTransformImpl classes in
lib/Target with a single pass in lib/CodeGen called
BasicTargetTransformInfo. This class actually provides most of the TTI
functionality, basing it upon the TargetLowering abstraction and other
information in the target independent code generator.
The third step of the conversion adds support to all TargetMachines to
register custom analysis passes. This allows building those passes with
access to TargetLowering or other target-specific classes, and it also
allows each target to customize the set of analysis passes desired in
the pass manager. The baseline LLVMTargetMachine implements this
interface to add the BasicTTI pass to the pass manager, and all of the
tools that want to support target-aware TTI passes call this routine on
whatever target machine they end up with to add the appropriate passes.
The fourth step of the conversion created target-specific TTI analysis
passes for the X86 and ARM backends. These passes contain the custom
logic that was previously in their extensions of the
ScalarTargetTransformInfo and VectorTargetTransformInfo interfaces.
I separated them into their own file, as now all of the interface bits
are private and they just expose a function to create the pass itself.
Then I extended these target machines to set up a custom set of analysis
passes, first adding BasicTTI as a fallback, and then adding their
customized TTI implementations.
The fourth step also required moving logic that was shared between the
target-independent layer and the specific targets to a different
interface, as they no longer derive from each other. As a consequence,
helper functions were added to TargetLowering representing the common
logic needed both in the target implementation and the codegen
implementation of the TTI pass. While technically this is the only
change that could have been committed separately, it would have been
a nightmare to extract.
The final step of the conversion was just to delete all the old
boilerplate. This got rid of the ScalarTargetTransformInfo and
VectorTargetTransformInfo classes, all of the support in all of the
targets for producing instances of them, and all of the support in the
tools for manually constructing a pass based around them.
Now that TTI is a relatively normal analysis group, two things become
straightforward. First, we can sink it into lib/Analysis, which is a more
natural layer for it to live in. Second, clients of this interface can
depend on it *always* being available, which will simplify their code and
behavior. These (and other) simplifications will follow in subsequent
commits; this one is clearly big enough.
Finally, I'm very aware that many of the comments and much of the
documentation need to be updated. As soon as I had this working, and plausibly well
commented, I wanted to get it committed and in front of the build bots.
I'll be doing a few passes over documentation later if it sticks.
Commits to update DragonEgg and Clang will be made presently.
llvm-svn: 171681
2013-01-07 09:37:14 +08:00
return 16;
2014-07-10 02:22:33 +08:00
}
2013-01-07 09:37:14 +08:00
return 8;
}
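The TTI commit message describes two mechanisms: delegation chaining to the previous pass, and a pointer to the top-most pass. Both can be shown in plain C++. This is a toy model built on assumptions. The names TTILayer, NoTTILayer, ToyTargetLayer, pushLayer and getVectorRegisterFileBits are hypothetical, and LLVM's legacy pass-manager plumbing is left out entirely (analysis-group registration, initialization order).

```cpp
#include <cassert>

// Hypothetical model of a layered analysis group; none of these names are
// LLVM APIs. Each layer keeps Prev (the layer beneath it) and Top (the
// top-most initialized layer).
struct TTILayer {
  TTILayer *Prev = nullptr;
  TTILayer *Top = nullptr;
  virtual ~TTILayer() = default;

  // By default a query is forwarded to the previous layer, so a layer only
  // overrides the queries it can answer more precisely.
  virtual unsigned getNumberOfRegisters(bool Vector) const {
    return Prev->getNumberOfRegisters(Vector);
  }
  virtual unsigned getRegisterBitWidth(bool Vector) const {
    return Prev->getRegisterBitWidth(Vector);
  }

  // One API built on two others. Querying through Top (not this) lets a
  // lower layer benefit from more precise answers higher in the stack.
  unsigned getVectorRegisterFileBits() const {
    return Top->getNumberOfRegisters(true) * Top->getRegisterBitWidth(true);
  }
};

// Bottom of the stack (the "NoFoo" role): answers everything conservatively.
struct NoTTILayer : TTILayer {
  unsigned getNumberOfRegisters(bool) const override { return 8; }
  unsigned getRegisterBitWidth(bool) const override { return 32; }
};

// A target layer that only knows its register widths.
struct ToyTargetLayer : TTILayer {
  unsigned getRegisterBitWidth(bool Vector) const override {
    return Vector ? 256 : 64;
  }
};

// Push L onto the stack and make it the Top of every layer below it.
void pushLayer(TTILayer *&Stack, TTILayer &L) {
  L.Prev = Stack;
  for (TTILayer *P = &L; P != nullptr; P = P->Prev)
    P->Top = &L;
  Stack = &L;
}
```

Once ToyTargetLayer is pushed, asking the bottom NoTTILayer for getVectorRegisterFileBits uses the 256-bit width from the layer above it. The register count is still answered by delegation down to the bottom layer.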

2017-04-06 04:51:38 +08:00
unsigned X86TTIImpl::getRegisterBitWidth(bool Vector) const {
2018-01-20 08:26:08 +08:00
unsigned PreferVectorWidth = ST->getPreferVectorWidth();
2013-01-10 06:29:00 +08:00
if (Vector) {
2018-01-20 08:26:08 +08:00
if (ST->hasAVX512() && PreferVectorWidth >= 512)
2017-01-05 17:51:02 +08:00
return 512;
2018-01-20 08:26:08 +08:00
if (ST->hasAVX() && PreferVectorWidth >= 256)
2017-01-05 17:51:02 +08:00
return 256;
2018-01-20 08:26:08 +08:00
if (ST->hasSSE1() && PreferVectorWidth >= 128)
2017-01-05 17:51:02 +08:00
return 128;
2013-01-10 06:29:00 +08:00
return 0;
}

if (ST->is64Bit())
return 64;

2015-10-07 07:24:35 +08:00
return 32;
2013-01-10 06:29:00 +08:00
}
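The vector branch of getRegisterBitWidth picks the widest ISA width that does not exceed the subtarget's preferred vector width. Here is a minimal standalone sketch of that selection. It assumes a hypothetical SubtargetFeatures struct in place of X86Subtarget, and its default preferred width of 512 is also an assumption of the sketch.

```cpp
#include <cassert>

// Hypothetical stand-in for the X86Subtarget queries used in the listing.
struct SubtargetFeatures {
  bool HasSSE1 = false;
  bool HasAVX = false;
  bool HasAVX512 = false;
  bool Is64Bit = false;
  unsigned PreferVectorWidth = 512; // assumed default for this sketch
};

// Vector: widest supported width not exceeding the preferred width (0 if no
// SSE). Scalar: general-purpose register width.
unsigned registerBitWidth(const SubtargetFeatures &ST, bool Vector) {
  if (Vector) {
    if (ST.HasAVX512 && ST.PreferVectorWidth >= 512)
      return 512;
    if (ST.HasAVX && ST.PreferVectorWidth >= 256)
      return 256;
    if (ST.HasSSE1 && ST.PreferVectorWidth >= 128)
      return 128;
    return 0;
  }
  return ST.Is64Bit ? 64 : 32;
}
```

With AVX-512 available but a preferred width of 256, the sketch reports 256-bit vector registers. getLoadStoreVecRegBitWidth in the listing simply forwards to this vector query.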

2017-04-06 04:51:38 +08:00
unsigned X86TTIImpl::getLoadStoreVecRegBitWidth(unsigned) const {
return getRegisterBitWidth(true);
}

2015-05-07 01:12:25 +08:00
unsigned X86TTIImpl::getMaxInterleaveFactor(unsigned VF) {
// If the loop will not be vectorized, don't interleave the loop.
// Let the regular unroller unroll the loop, which saves the overflow
// check and memory check cost.
if (VF == 1)
return 1;

2013-01-09 09:15:42 +08:00
if (ST->isAtom())
return 1;

// Sandybridge and Haswell have multiple execution ports and pipelined
// vector units.
if (ST->hasAVX())
return 4;

return 2;
}
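getMaxInterleaveFactor is a short decision ladder, and its order matters: the scalar (VF == 1) and Atom checks come before the AVX check. Here is a standalone sketch that assumes a hypothetical CoreInfo struct in place of the subtarget queries.

```cpp
#include <cassert>

// Hypothetical stand-in for the subtarget predicates used in the listing.
struct CoreInfo {
  bool IsAtom;
  bool HasAVX;
};

// Same decision order as X86TTIImpl::getMaxInterleaveFactor in the listing.
unsigned maxInterleaveFactor(const CoreInfo &ST, unsigned VF) {
  if (VF == 1)   // not vectorized: leave unrolling to the regular unroller
    return 1;
  if (ST.IsAtom) // Atom: no interleaving
    return 1;
  if (ST.HasAVX) // multiple execution ports, pipelined vector units
    return 4;
  return 2;
}
```

Because of that ordering, an Atom-class core never interleaves, whatever vector features it reports.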

[ARM] Teach the Arm cost model that a Shift can be folded into other instructions
This attempts to teach the cost model in Arm that code such as:
%s = shl i32 %a, 3
%r = and i32 %s, %b
can, under Arm or Thumb2, become:
and r0, r1, r2, lsl #3
So the cost of the shift can essentially be free. To do this without
trying to artificially adjust the cost of the "and" instruction, it
needs to get the users of the shl and check if they are a type of
instruction that the shift can be folded into. And so it needs to have
access to the actual instruction in getArithmeticInstrCost, which, if
available, is added as an extra parameter much like getCastInstrCost.
We otherwise limit it to shifts with a single user, which should
hopefully handle most of the cases. The list of instructions that the
shift can be folded into includes ADC, ADD, AND, BIC, CMP, EOR, MVN, ORR,
ORN, RSB, SBC and SUB. This translates to Add, Sub, And, Or, Xor and
ICmp.
Differential Revision: https://reviews.llvm.org/D70966
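The ARM commit message gives a rule: a shl with a single user that can take a shifted operand costs nothing. That rule can be modelled on a toy IR. This is a hedged illustration, not the code from D70966. The Op enum, the Inst struct, the function names and the nominal non-folded cost of 1 are all assumptions.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified IR; not LLVM's Instruction class.
enum class Op { Shl, Add, Sub, And, Or, Xor, ICmp, Mul, UDiv };

struct Inst {
  Op Opcode;
  std::vector<const Inst *> Users;
};

// Opcodes the commit message lists as able to absorb a shifted operand.
bool canFoldShiftInto(Op O) {
  switch (O) {
  case Op::Add: case Op::Sub: case Op::And:
  case Op::Or:  case Op::Xor: case Op::ICmp:
    return true;
  default:
    return false;
  }
}

// A shl is free when its only user can fold it; otherwise it gets an
// assumed nominal cost of 1.
unsigned shlCost(const Inst &Shl) {
  if (Shl.Opcode == Op::Shl && Shl.Users.size() == 1 &&
      canFoldShiftInto(Shl.Users.front()->Opcode))
    return 0;
  return 1;
}
```

The single-user restriction keeps the sketch conservative. If a second user needed the shifted value, the shift would still have to be materialized.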
2019-12-08 23:33:24 +08:00
int X86TTIImpl::getArithmeticInstrCost(unsigned Opcode, Type *Ty,
TTI::OperandValueKind Op1Info,
TTI::OperandValueKind Op2Info,
TTI::OperandValueProperties Opd1PropInfo,
TTI::OperandValueProperties Opd2PropInfo,
ArrayRef<const Value *> Args,
const Instruction *CxtI) {
2013-01-07 09:37:14 +08:00
// Legalize the type.
2015-08-06 02:08:10 +08:00
std::pair<int, MVT> LT = TLI->getTypeLegalizationCost(DL, Ty);
2013-01-07 09:37:14 +08:00

int ISD = TLI->InstructionOpcodeToISD(Opcode);
assert(ISD && "Invalid opcode");

2018-03-25 23:58:12 +08:00
static const CostTblEntry GLMCostTable[] = {
{ ISD::FDIV, MVT::f32, 18 }, // divss
{ ISD::FDIV, MVT::v4f32, 35 }, // divps
{ ISD::FDIV, MVT::f64, 33 }, // divsd
{ ISD::FDIV, MVT::v2f64, 65 }, // divpd
};

2019-12-06 02:24:10 +08:00
if (ST->useGLMDivSqrtCosts())
2018-03-25 23:58:12 +08:00
if (const auto *Entry = CostTableLookup(GLMCostTable, ISD,
LT.second))
return LT.first * Entry->Cost;

[X86] updating TTI costs for arithmetic instructions on X86\SLM arch.
Updated instructions:
pmulld, pmullw, pmulhw, mulsd, mulps, mulpd, divss, divps, divsd, divpd, addpd and subpd.
Special optimization case which replaces pmulld with a pmullw/pmulhw/pshuf sequence
when the real bit width of the operands is <= 16.
Differential Revision: https://reviews.llvm.org/D28104
llvm-svn: 291657
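On SLM, a v4i32 multiply gets a cheaper pmullw-based sequence when both operands provably fit in a narrower element type. Here is a standalone sketch of that cost tiering. OperandWidth, slmV4I32MulCost and the LegalizationFactor parameter (standing in for LT.first) are hypothetical. The final fallback returns the pmulld entry (11) from the SLMCostTable in the listing.

```cpp
#include <algorithm>
#include <cassert>

// MinBits: narrowest element width the operand provably fits in.
// Signed: whether that width requires sign-extension.
struct OperandWidth {
  unsigned MinBits;
  bool Signed;
};

// Mirrors the SLM v4i32 multiply tiers in the listing.
unsigned slmV4I32MulCost(OperandWidth A, OperandWidth B,
                         unsigned LegalizationFactor) {
  bool SignedMode = A.Signed || B.Signed;
  unsigned MinSize = std::max(A.MinBits, B.MinBits);
  if (MinSize <= 7 || (!SignedMode && MinSize <= 8))
    return LegalizationFactor * 3; // pmullw + extend
  if (MinSize <= 15 || (!SignedMode && MinSize <= 16))
    return LegalizationFactor * 5; // pmullw/pmulhw/pshuf
  return LegalizationFactor * 11;  // plain pmulld (SLMCostTable entry)
}
```

The signedness flag only matters at the boundaries. An unsigned 8-bit or 16-bit operand still qualifies for the cheaper tier, while a signed one does not.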
2017-01-11 16:23:37 +08:00
static const CostTblEntry SLMCostTable[] = {
2018-03-25 23:58:12 +08:00
{ ISD::MUL, MVT::v4i32, 11 }, // pmulld
{ ISD::MUL, MVT::v8i16, 2 }, // pmullw
{ ISD::MUL, MVT::v16i8, 14 }, // extend/pmullw/trunc sequence.
{ ISD::FMUL, MVT::f64, 2 }, // mulsd
{ ISD::FMUL, MVT::v2f64, 4 }, // mulpd
{ ISD::FMUL, MVT::v4f32, 2 }, // mulps
{ ISD::FDIV, MVT::f32, 17 }, // divss
{ ISD::FDIV, MVT::v4f32, 39 }, // divps
{ ISD::FDIV, MVT::f64, 32 }, // divsd
{ ISD::FDIV, MVT::v2f64, 69 }, // divpd
{ ISD::FADD, MVT::v2f64, 2 }, // addpd
{ ISD::FSUB, MVT::v2f64, 2 }, // subpd
2017-07-02 20:16:15 +08:00
// v2i64/v4i64 mul is custom lowered as a series of long:
// multiplies(3), shifts(3) and adds(2)
2017-08-01 01:09:27 +08:00
// slm muldq version throughput is 2 and addq throughput 4
2018-01-30 20:18:51 +08:00
// thus: 3X2 (muldq throughput) + 3X1 (shift throughput) +
2017-08-01 01:09:27 +08:00
// 2X4 (addq throughput) = 17
2018-03-25 23:58:12 +08:00
{ ISD::MUL, MVT::v2i64, 17 },
2017-07-02 20:16:15 +08:00
// slm addq/subq throughput is 4
2018-03-25 23:58:12 +08:00
{ ISD::ADD, MVT::v2i64, 4 },
{ ISD::SUB, MVT::v2i64, 4 },
2017-01-11 16:23:37 +08:00
};

if (ST->isSLM()) {
if (Args.size() == 2 && ISD == ISD::MUL && LT.second == MVT::v4i32) {
// Check if the operands can be shrunk into a smaller datatype.
bool Op1Signed = false;
unsigned Op1MinSize = BaseT::minRequiredElementSize(Args[0], Op1Signed);
bool Op2Signed = false;
unsigned Op2MinSize = BaseT::minRequiredElementSize(Args[1], Op2Signed);

bool signedMode = Op1Signed | Op2Signed;
unsigned OpMinSize = std::max(Op1MinSize, Op2MinSize);

if (OpMinSize <= 7)
return LT.first * 3; // pmullw/sext
if (!signedMode && OpMinSize <= 8)
return LT.first * 3; // pmullw/zext
if (OpMinSize <= 15)
return LT.first * 5; // pmullw/pmulhw/pshuf
if (!signedMode && OpMinSize <= 16)
return LT.first * 5; // pmullw/pmulhw/pshuf
}
2018-03-25 23:58:12 +08:00

2017-01-11 16:23:37 +08:00
if (const auto *Entry = CostTableLookup(SLMCostTable, ISD,
LT.second)) {
return LT.first * Entry->Cost;
}
}

2018-07-08 00:53:30 +08:00
if ((ISD == ISD::SDIV || ISD == ISD::SREM || ISD == ISD::UDIV ||
ISD == ISD::UREM) &&
2018-05-22 18:40:09 +08:00
(Op2Info == TargetTransformInfo::OK_UniformConstantValue ||
Op2Info == TargetTransformInfo::OK_NonUniformConstantValue) &&
2014-08-25 12:56:54 +08:00
Opd2PropInfo == TargetTransformInfo::OP_PowerOf2) {
2018-07-08 00:53:30 +08:00
if (ISD == ISD::SDIV || ISD == ISD::SREM) {
2018-07-06 00:56:28 +08:00
// On X86, vector signed divisions by power-of-two constants are
// normally expanded to the sequence SRA + SRL + ADD + SRA.
// The OperandValue properties may not be the same as that of the previous
// operation; conservatively assume OP_None.
int Cost =
2 * getArithmeticInstrCost(Instruction::AShr, Ty, Op1Info, Op2Info,
TargetTransformInfo::OP_None,
TargetTransformInfo::OP_None);
Cost += getArithmeticInstrCost(Instruction::LShr, Ty, Op1Info, Op2Info,
TargetTransformInfo::OP_None,
TargetTransformInfo::OP_None);
Cost += getArithmeticInstrCost(Instruction::Add, Ty, Op1Info, Op2Info,
TargetTransformInfo::OP_None,
TargetTransformInfo::OP_None);

2018-07-08 00:53:30 +08:00
if (ISD == ISD::SREM) {
// For SREM: (X % C) is the equivalent of (X - (X/C)*C)
Cost += getArithmeticInstrCost(Instruction::Mul, Ty, Op1Info, Op2Info);
Cost += getArithmeticInstrCost(Instruction::Sub, Ty, Op1Info, Op2Info);
}

2018-07-06 00:56:28 +08:00
return Cost;
}

// Vector unsigned division/remainder will be simplified to shifts/masks.
if (ISD == ISD::UDIV)
return getArithmeticInstrCost(Instruction::LShr, Ty, Op1Info, Op2Info,
TargetTransformInfo::OP_None,
TargetTransformInfo::OP_None);

2019-11-07 03:10:13 +08:00
else // UREM
2018-07-06 00:56:28 +08:00
return getArithmeticInstrCost(Instruction::And, Ty, Op1Info, Op2Info,
TargetTransformInfo::OP_None,
TargetTransformInfo::OP_None);
2014-08-25 12:56:54 +08:00
}
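The power-of-two division costs in the block above are built from the scalar identities below. This is a minimal sketch in standard C++ for 32-bit lanes, assuming 1 <= K <= 30. The function names are hypothetical, and the vector lowering is assumed to apply the same operations lane by lane.

```cpp
#include <cassert>
#include <cstdint>

// Signed X / 2^K, rounding toward zero, via SRA + SRL + ADD + SRA.
// Arithmetic right shift of negative values and the unsigned-to-signed
// conversion behave as expected on mainstream compilers (guaranteed in C++20).
int32_t sdivPow2(int32_t X, unsigned K) {
  int32_t Sign = X >> 31;                                  // SRA: 0 or -1
  uint32_t Bias = static_cast<uint32_t>(Sign) >> (32 - K); // SRL: 0 or 2^K-1
  int32_t Biased =
      static_cast<int32_t>(static_cast<uint32_t>(X) + Bias); // ADD
  return Biased >> K;                                      // SRA
}

// Signed X % 2^K as X - (X / 2^K) * 2^K: the extra Mul + Sub.
int32_t sremPow2(int32_t X, unsigned K) {
  return X - sdivPow2(X, K) * (int32_t{1} << K);
}

// Unsigned cases reduce to a single shift or mask.
uint32_t udivPow2(uint32_t X, unsigned K) { return X >> K; }              // LShr
uint32_t uremPow2(uint32_t X, unsigned K) { return X & ((1u << K) - 1); } // And
```

This matches the shape of the cost formula in the listing. It charges 2 x AShr + LShr + Add for SDIV, adds Mul + Sub for SREM, and charges a single LShr or And for UDIV and UREM.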

2016-10-21 02:00:35 +08:00
static const CostTblEntry AVX512BWUniformConstCostTable[] = {
2017-01-08 22:14:36 +08:00
{ ISD::SHL, MVT::v64i8, 2 }, // psllw + pand.
{ ISD::SRL, MVT::v64i8, 2 }, // psrlw + pand.
{ ISD::SRA, MVT::v64i8, 4 }, // psrlw, pand, pxor, psubb.
2016-10-21 02:00:35 +08:00
};

if (Op2Info == TargetTransformInfo::OK_UniformConstantValue &&
ST->hasBWI()) {
if (const auto *Entry = CostTableLookup(AVX512BWUniformConstCostTable, ISD,
LT.second))
return LT.first * Entry->Cost;
}

static const CostTblEntry AVX512UniformConstCostTable[] = {
2017-01-15 03:24:23 +08:00
{ ISD::SRA, MVT::v2i64, 1 },
{ ISD::SRA, MVT::v4i64, 1 },
{ ISD::SRA, MVT::v8i64, 1 },
2016-10-21 02:00:35 +08:00
};

if (Op2Info == TargetTransformInfo::OK_UniformConstantValue &&
ST->hasAVX512()) {
if (const auto *Entry = CostTableLookup(AVX512UniformConstCostTable, ISD,
LT.second))
return LT.first * Entry->Cost;
}

2015-10-28 12:02:12 +08:00
static const CostTblEntry AVX2UniformConstCostTable[] = {
2017-01-08 22:14:36 +08:00
{ ISD::SHL, MVT::v32i8, 2 }, // psllw + pand.
{ ISD::SRL, MVT::v32i8, 2 }, // psrlw + pand.
{ ISD::SRA, MVT::v32i8, 4 }, // psrlw, pand, pxor, psubb.

2015-07-07 06:35:19 +08:00
{ ISD::SRA, MVT::v4i64, 4 }, // 2 x psrad + shuffle.
2014-04-26 22:53:05 +08:00
};

if (Op2Info == TargetTransformInfo::OK_UniformConstantValue &&
ST->hasAVX2()) {
2015-10-27 12:14:24 +08:00
if (const auto *Entry = CostTableLookup(AVX2UniformConstCostTable, ISD,
LT.second))
return LT.first * Entry->Cost;
2014-04-26 22:53:05 +08:00
}

2016-10-21 02:00:35 +08:00
static const CostTblEntry SSE2UniformConstCostTable[] = {
2017-05-15 02:52:15 +08:00
{ ISD::SHL, MVT::v16i8, 2 }, // psllw + pand.
{ ISD::SRL, MVT::v16i8, 2 }, // psrlw + pand.
{ ISD::SRA, MVT::v16i8, 4 }, // psrlw, pand, pxor, psubb.

{ ISD::SHL, MVT::v32i8, 4+2 }, // 2*(psllw + pand) + split.
{ ISD::SRL, MVT::v32i8, 4+2 }, // 2*(psrlw + pand) + split.
{ ISD::SRA, MVT::v32i8, 8+2 }, // 2*(psrlw, pand, pxor, psubb) + split.
2018-10-25 01:30:29 +08:00
};

// XOP has faster vXi8 shifts.
if (Op2Info == TargetTransformInfo::OK_UniformConstantValue &&
ST->hasSSE2() && !ST->hasXOP()) {
if (const auto *Entry =
CostTableLookup(SSE2UniformConstCostTable, ISD, LT.second))
return LT.first * Entry->Cost;
}

static const CostTblEntry AVX512BWConstCostTable[] = {
2018-10-25 02:44:12 +08:00
{ ISD::SDIV, MVT::v64i8, 14 }, // 2*ext+2*pmulhw sequence
{ ISD::SREM, MVT::v64i8, 16 }, // 2*ext+2*pmulhw+mul+sub sequence
{ ISD::UDIV, MVT::v64i8, 14 }, // 2*ext+2*pmulhw sequence
{ ISD::UREM, MVT::v64i8, 16 }, // 2*ext+2*pmulhw+mul+sub sequence
2018-10-25 01:30:29 +08:00
{ ISD::SDIV, MVT::v32i16, 6 }, // vpmulhw sequence
{ ISD::SREM, MVT::v32i16, 8 }, // vpmulhw+mul+sub sequence
{ ISD::UDIV, MVT::v32i16, 6 }, // vpmulhuw sequence
{ ISD::UREM, MVT::v32i16, 8 }, // vpmulhuw+mul+sub sequence
};

if ((Op2Info == TargetTransformInfo::OK_UniformConstantValue ||
Op2Info == TargetTransformInfo::OK_NonUniformConstantValue) &&
ST->hasBWI()) {
if (const auto *Entry =
CostTableLookup(AVX512BWConstCostTable, ISD, LT.second))
return LT.first * Entry->Cost;
}

static const CostTblEntry AVX512ConstCostTable[] = {
{ ISD::SDIV, MVT::v16i32, 15 }, // vpmuldq sequence
{ ISD::SREM, MVT::v16i32, 17 }, // vpmuldq+mul+sub sequence
{ ISD::UDIV, MVT::v16i32, 15 }, // vpmuludq sequence
{ ISD::UREM, MVT::v16i32, 17 }, // vpmuludq+mul+sub sequence
};

if ((Op2Info == TargetTransformInfo::OK_UniformConstantValue ||
Op2Info == TargetTransformInfo::OK_NonUniformConstantValue) &&
ST->hasAVX512()) {
if (const auto *Entry =
CostTableLookup(AVX512ConstCostTable, ISD, LT.second))
return LT.first * Entry->Cost;
}

static const CostTblEntry AVX2ConstCostTable[] = {
2018-10-25 02:44:12 +08:00
{ ISD::SDIV, MVT::v32i8, 14 }, // 2*ext+2*pmulhw sequence
{ ISD::SREM, MVT::v32i8, 16 }, // 2*ext+2*pmulhw+mul+sub sequence
{ ISD::UDIV, MVT::v32i8, 14 }, // 2*ext+2*pmulhw sequence
{ ISD::UREM, MVT::v32i8, 16 }, // 2*ext+2*pmulhw+mul+sub sequence
2018-10-25 01:30:29 +08:00
{ ISD::SDIV, MVT::v16i16, 6 }, // vpmulhw sequence
{ ISD::SREM, MVT::v16i16, 8 }, // vpmulhw+mul+sub sequence
{ ISD::UDIV, MVT::v16i16, 6 }, // vpmulhuw sequence
{ ISD::UREM, MVT::v16i16, 8 }, // vpmulhuw+mul+sub sequence
{ ISD::SDIV, MVT::v8i32, 15 }, // vpmuldq sequence
{ ISD::SREM, MVT::v8i32, 19 }, // vpmuldq+mul+sub sequence
{ ISD::UDIV, MVT::v8i32, 15 }, // vpmuludq sequence
{ ISD::UREM, MVT::v8i32, 19 }, // vpmuludq+mul+sub sequence
};
2017-05-15 02:52:15 +08:00

2018-10-25 01:30:29 +08:00
if ((Op2Info == TargetTransformInfo::OK_UniformConstantValue ||
Op2Info == TargetTransformInfo::OK_NonUniformConstantValue) &&
ST->hasAVX2()) {
if (const auto *Entry = CostTableLookup(AVX2ConstCostTable, ISD, LT.second))
return LT.first * Entry->Cost;
}
|
|
|
|
|
|
|
|
static const CostTblEntry SSE2ConstCostTable[] = {
    { ISD::SDIV, MVT::v32i8, 28+2 }, // 4*ext+4*pmulhw sequence + split.
    { ISD::SREM, MVT::v32i8, 32+2 }, // 4*ext+4*pmulhw+mul+sub sequence + split.
    { ISD::SDIV, MVT::v16i8, 14 }, // 2*ext+2*pmulhw sequence
    { ISD::SREM, MVT::v16i8, 16 }, // 2*ext+2*pmulhw+mul+sub sequence
    { ISD::UDIV, MVT::v32i8, 28+2 }, // 4*ext+4*pmulhw sequence + split.
    { ISD::UREM, MVT::v32i8, 32+2 }, // 4*ext+4*pmulhw+mul+sub sequence + split.
    { ISD::UDIV, MVT::v16i8, 14 }, // 2*ext+2*pmulhw sequence
    { ISD::UREM, MVT::v16i8, 16 }, // 2*ext+2*pmulhw+mul+sub sequence
    { ISD::SDIV, MVT::v16i16, 12+2 }, // 2*pmulhw sequence + split.
    { ISD::SREM, MVT::v16i16, 16+2 }, // 2*pmulhw+mul+sub sequence + split.
    { ISD::SDIV, MVT::v8i16, 6 }, // pmulhw sequence
    { ISD::SREM, MVT::v8i16, 8 }, // pmulhw+mul+sub sequence
    { ISD::UDIV, MVT::v16i16, 12+2 }, // 2*pmulhuw sequence + split.
    { ISD::UREM, MVT::v16i16, 16+2 }, // 2*pmulhuw+mul+sub sequence + split.
    { ISD::UDIV, MVT::v8i16, 6 }, // pmulhuw sequence
    { ISD::UREM, MVT::v8i16, 8 }, // pmulhuw+mul+sub sequence
    { ISD::SDIV, MVT::v8i32, 38+2 }, // 2*pmuludq sequence + split.
    { ISD::SREM, MVT::v8i32, 48+2 }, // 2*pmuludq+mul+sub sequence + split.
    { ISD::SDIV, MVT::v4i32, 19 }, // pmuludq sequence
    { ISD::SREM, MVT::v4i32, 24 }, // pmuludq+mul+sub sequence
    { ISD::UDIV, MVT::v8i32, 30+2 }, // 2*pmuludq sequence + split.
    { ISD::UREM, MVT::v8i32, 40+2 }, // 2*pmuludq+mul+sub sequence + split.
    { ISD::UDIV, MVT::v4i32, 15 }, // pmuludq sequence
    { ISD::UREM, MVT::v4i32, 20 }, // pmuludq+mul+sub sequence
  };

  if ((Op2Info == TargetTransformInfo::OK_UniformConstantValue ||
       Op2Info == TargetTransformInfo::OK_NonUniformConstantValue) &&
      ST->hasSSE2()) {
    // pmuldq sequence.
    if (ISD == ISD::SDIV && LT.second == MVT::v8i32 && ST->hasAVX())
      return LT.first * 32;
    if (ISD == ISD::SREM && LT.second == MVT::v8i32 && ST->hasAVX())
      return LT.first * 38;
    if (ISD == ISD::SDIV && LT.second == MVT::v4i32 && ST->hasSSE41())
      return LT.first * 15;
    if (ISD == ISD::SREM && LT.second == MVT::v4i32 && ST->hasSSE41())
      return LT.first * 20;

    if (const auto *Entry = CostTableLookup(SSE2ConstCostTable, ISD, LT.second))
      return LT.first * Entry->Cost;
  }

  static const CostTblEntry AVX2UniformCostTable[] = {
    // Uniform splats are cheaper for the following instructions.
    { ISD::SHL, MVT::v16i16, 1 }, // psllw.
    { ISD::SRL, MVT::v16i16, 1 }, // psrlw.
    { ISD::SRA, MVT::v16i16, 1 }, // psraw.
  };

  if (ST->hasAVX2() &&
      ((Op2Info == TargetTransformInfo::OK_UniformConstantValue) ||
       (Op2Info == TargetTransformInfo::OK_UniformValue))) {
    if (const auto *Entry =
            CostTableLookup(AVX2UniformCostTable, ISD, LT.second))
      return LT.first * Entry->Cost;
  }

  static const CostTblEntry SSE2UniformCostTable[] = {
    // Uniform splats are cheaper for the following instructions.
    { ISD::SHL, MVT::v8i16, 1 }, // psllw.
    { ISD::SHL, MVT::v4i32, 1 }, // pslld
    { ISD::SHL, MVT::v2i64, 1 }, // psllq.

    { ISD::SRL, MVT::v8i16, 1 }, // psrlw.
    { ISD::SRL, MVT::v4i32, 1 }, // psrld.
    { ISD::SRL, MVT::v2i64, 1 }, // psrlq.

    { ISD::SRA, MVT::v8i16, 1 }, // psraw.
    { ISD::SRA, MVT::v4i32, 1 }, // psrad.
  };

  if (ST->hasSSE2() &&
      ((Op2Info == TargetTransformInfo::OK_UniformConstantValue) ||
       (Op2Info == TargetTransformInfo::OK_UniformValue))) {
    if (const auto *Entry =
            CostTableLookup(SSE2UniformCostTable, ISD, LT.second))
      return LT.first * Entry->Cost;
  }

  static const CostTblEntry AVX512DQCostTable[] = {
    { ISD::MUL, MVT::v2i64, 1 },
    { ISD::MUL, MVT::v4i64, 1 },
    { ISD::MUL, MVT::v8i64, 1 }
  };

  // Look for AVX512DQ lowering tricks for custom cases.
  if (ST->hasDQI())
    if (const auto *Entry = CostTableLookup(AVX512DQCostTable, ISD, LT.second))
      return LT.first * Entry->Cost;

  static const CostTblEntry AVX512BWCostTable[] = {
    { ISD::SHL, MVT::v8i16, 1 }, // vpsllvw
    { ISD::SRL, MVT::v8i16, 1 }, // vpsrlvw
    { ISD::SRA, MVT::v8i16, 1 }, // vpsravw

    { ISD::SHL, MVT::v16i16, 1 }, // vpsllvw
    { ISD::SRL, MVT::v16i16, 1 }, // vpsrlvw
    { ISD::SRA, MVT::v16i16, 1 }, // vpsravw

    { ISD::SHL, MVT::v32i16, 1 }, // vpsllvw
    { ISD::SRL, MVT::v32i16, 1 }, // vpsrlvw
    { ISD::SRA, MVT::v32i16, 1 }, // vpsravw

    { ISD::SHL, MVT::v64i8, 11 }, // vpblendvb sequence.
    { ISD::SRL, MVT::v64i8, 11 }, // vpblendvb sequence.
    { ISD::SRA, MVT::v64i8, 24 }, // vpblendvb sequence.

    { ISD::MUL, MVT::v64i8, 11 }, // extend/pmullw/trunc sequence.
    { ISD::MUL, MVT::v32i8, 4 }, // extend/pmullw/trunc sequence.
    { ISD::MUL, MVT::v16i8, 4 }, // extend/pmullw/trunc sequence.
  };

  // Look for AVX512BW lowering tricks for custom cases.
  if (ST->hasBWI())
    if (const auto *Entry = CostTableLookup(AVX512BWCostTable, ISD, LT.second))
      return LT.first * Entry->Cost;

  static const CostTblEntry AVX512CostTable[] = {
    { ISD::SHL, MVT::v16i32, 1 },
    { ISD::SRL, MVT::v16i32, 1 },
    { ISD::SRA, MVT::v16i32, 1 },

    { ISD::SHL, MVT::v8i64, 1 },
    { ISD::SRL, MVT::v8i64, 1 },

    { ISD::SRA, MVT::v2i64, 1 },
    { ISD::SRA, MVT::v4i64, 1 },
    { ISD::SRA, MVT::v8i64, 1 },

    { ISD::MUL, MVT::v32i8, 13 }, // extend/pmullw/trunc sequence.
    { ISD::MUL, MVT::v16i8, 5 }, // extend/pmullw/trunc sequence.
    { ISD::MUL, MVT::v16i32, 1 }, // pmulld (Skylake from agner.org)
    { ISD::MUL, MVT::v8i32, 1 }, // pmulld (Skylake from agner.org)
    { ISD::MUL, MVT::v4i32, 1 }, // pmulld (Skylake from agner.org)
    { ISD::MUL, MVT::v8i64, 8 }, // 3*pmuludq/3*shift/2*add

    { ISD::FADD, MVT::v8f64, 1 }, // Skylake from http://www.agner.org/
    { ISD::FSUB, MVT::v8f64, 1 }, // Skylake from http://www.agner.org/
    { ISD::FMUL, MVT::v8f64, 1 }, // Skylake from http://www.agner.org/

    { ISD::FADD, MVT::v16f32, 1 }, // Skylake from http://www.agner.org/
    { ISD::FSUB, MVT::v16f32, 1 }, // Skylake from http://www.agner.org/
    { ISD::FMUL, MVT::v16f32, 1 }, // Skylake from http://www.agner.org/
  };

  if (ST->hasAVX512())
    if (const auto *Entry = CostTableLookup(AVX512CostTable, ISD, LT.second))
      return LT.first * Entry->Cost;

  static const CostTblEntry AVX2ShiftCostTable[] = {
    // Shifts on v4i64/v8i32 on AVX2 are legal even though we declare them as
    // custom, so that we can detect the cases where the shift amount is a
    // scalar.
    { ISD::SHL, MVT::v4i32, 1 },
    { ISD::SRL, MVT::v4i32, 1 },
    { ISD::SRA, MVT::v4i32, 1 },
    { ISD::SHL, MVT::v8i32, 1 },
    { ISD::SRL, MVT::v8i32, 1 },
    { ISD::SRA, MVT::v8i32, 1 },
    { ISD::SHL, MVT::v2i64, 1 },
    { ISD::SRL, MVT::v2i64, 1 },
    { ISD::SHL, MVT::v4i64, 1 },
    { ISD::SRL, MVT::v4i64, 1 },
  };

  // Look for AVX2 lowering tricks.
  if (ST->hasAVX2()) {
    if (ISD == ISD::SHL && LT.second == MVT::v16i16 &&
        (Op2Info == TargetTransformInfo::OK_UniformConstantValue ||
         Op2Info == TargetTransformInfo::OK_NonUniformConstantValue))
      // On AVX2, a packed v16i16 shift left by a constant build_vector
      // is lowered into a vector multiply (vpmullw).
      return getArithmeticInstrCost(Instruction::Mul, Ty, Op1Info, Op2Info,
                                    TargetTransformInfo::OP_None,
                                    TargetTransformInfo::OP_None);

    if (const auto *Entry = CostTableLookup(AVX2ShiftCostTable, ISD, LT.second))
      return LT.first * Entry->Cost;
  }

  static const CostTblEntry XOPShiftCostTable[] = {
    // 128-bit shifts take 1 cycle, but right shifts require a negation
    // beforehand.
    { ISD::SHL, MVT::v16i8, 1 },
    { ISD::SRL, MVT::v16i8, 2 },
    { ISD::SRA, MVT::v16i8, 2 },
    { ISD::SHL, MVT::v8i16, 1 },
    { ISD::SRL, MVT::v8i16, 2 },
    { ISD::SRA, MVT::v8i16, 2 },
    { ISD::SHL, MVT::v4i32, 1 },
    { ISD::SRL, MVT::v4i32, 2 },
    { ISD::SRA, MVT::v4i32, 2 },
    { ISD::SHL, MVT::v2i64, 1 },
    { ISD::SRL, MVT::v2i64, 2 },
    { ISD::SRA, MVT::v2i64, 2 },
    // 256-bit shifts require splitting if AVX2 didn't catch them above.
    { ISD::SHL, MVT::v32i8, 2+2 },
    { ISD::SRL, MVT::v32i8, 4+2 },
    { ISD::SRA, MVT::v32i8, 4+2 },
    { ISD::SHL, MVT::v16i16, 2+2 },
    { ISD::SRL, MVT::v16i16, 4+2 },
    { ISD::SRA, MVT::v16i16, 4+2 },
    { ISD::SHL, MVT::v8i32, 2+2 },
    { ISD::SRL, MVT::v8i32, 4+2 },
    { ISD::SRA, MVT::v8i32, 4+2 },
    { ISD::SHL, MVT::v4i64, 2+2 },
    { ISD::SRL, MVT::v4i64, 4+2 },
    { ISD::SRA, MVT::v4i64, 4+2 },
  };

  // Look for XOP lowering tricks.
  if (ST->hasXOP()) {
    // If the right shift is constant then we'll fold the negation so
    // it's as cheap as a left shift.
    int ShiftISD = ISD;
    if ((ShiftISD == ISD::SRL || ShiftISD == ISD::SRA) &&
        (Op2Info == TargetTransformInfo::OK_UniformConstantValue ||
         Op2Info == TargetTransformInfo::OK_NonUniformConstantValue))
      ShiftISD = ISD::SHL;
    if (const auto *Entry =
            CostTableLookup(XOPShiftCostTable, ShiftISD, LT.second))
      return LT.first * Entry->Cost;
  }

  static const CostTblEntry SSE2UniformShiftCostTable[] = {
    // Uniform splats are cheaper for the following instructions.
    { ISD::SHL, MVT::v16i16, 2+2 }, // 2*psllw + split.
    { ISD::SHL, MVT::v8i32, 2+2 }, // 2*pslld + split.
    { ISD::SHL, MVT::v4i64, 2+2 }, // 2*psllq + split.

    { ISD::SRL, MVT::v16i16, 2+2 }, // 2*psrlw + split.
    { ISD::SRL, MVT::v8i32, 2+2 }, // 2*psrld + split.
    { ISD::SRL, MVT::v4i64, 2+2 }, // 2*psrlq + split.

    { ISD::SRA, MVT::v16i16, 2+2 }, // 2*psraw + split.
    { ISD::SRA, MVT::v8i32, 2+2 }, // 2*psrad + split.
    { ISD::SRA, MVT::v2i64, 4 }, // 2*psrad + shuffle.
    { ISD::SRA, MVT::v4i64, 8+2 }, // 2*(2*psrad + shuffle) + split.
  };

  if (ST->hasSSE2() &&
      ((Op2Info == TargetTransformInfo::OK_UniformConstantValue) ||
       (Op2Info == TargetTransformInfo::OK_UniformValue))) {
    // Handle AVX2 uniform v4i64 ISD::SRA here; it's not worth a table entry.
    if (ISD == ISD::SRA && LT.second == MVT::v4i64 && ST->hasAVX2())
      return LT.first * 4; // 2*psrad + shuffle.

    if (const auto *Entry =
            CostTableLookup(SSE2UniformShiftCostTable, ISD, LT.second))
      return LT.first * Entry->Cost;
  }

  if (ISD == ISD::SHL &&
      Op2Info == TargetTransformInfo::OK_NonUniformConstantValue) {
    MVT VT = LT.second;
    // A vector shift left by a non-uniform constant can be lowered
    // into a vector multiply.
    if (((VT == MVT::v8i16 || VT == MVT::v4i32) && ST->hasSSE2()) ||
        ((VT == MVT::v16i16 || VT == MVT::v8i32) && ST->hasAVX()))
      ISD = ISD::MUL;
  }

  static const CostTblEntry AVX2CostTable[] = {
    { ISD::SHL, MVT::v32i8, 11 }, // vpblendvb sequence.
    { ISD::SHL, MVT::v16i16, 10 }, // extend/vpsllvd/pack sequence.

    { ISD::SRL, MVT::v32i8, 11 }, // vpblendvb sequence.
    { ISD::SRL, MVT::v16i16, 10 }, // extend/vpsrlvd/pack sequence.

    { ISD::SRA, MVT::v32i8, 24 }, // vpblendvb sequence.
    { ISD::SRA, MVT::v16i16, 10 }, // extend/vpsravd/pack sequence.
    { ISD::SRA, MVT::v2i64, 4 }, // srl/xor/sub sequence.
    { ISD::SRA, MVT::v4i64, 4 }, // srl/xor/sub sequence.

    { ISD::SUB, MVT::v32i8, 1 }, // psubb
    { ISD::ADD, MVT::v32i8, 1 }, // paddb
    { ISD::SUB, MVT::v16i16, 1 }, // psubw
    { ISD::ADD, MVT::v16i16, 1 }, // paddw
    { ISD::SUB, MVT::v8i32, 1 }, // psubd
    { ISD::ADD, MVT::v8i32, 1 }, // paddd
    { ISD::SUB, MVT::v4i64, 1 }, // psubq
    { ISD::ADD, MVT::v4i64, 1 }, // paddq

    { ISD::MUL, MVT::v32i8, 17 }, // extend/pmullw/trunc sequence.
    { ISD::MUL, MVT::v16i8, 7 }, // extend/pmullw/trunc sequence.
    { ISD::MUL, MVT::v16i16, 1 }, // pmullw
    { ISD::MUL, MVT::v8i32, 2 }, // pmulld (Haswell from agner.org)
    { ISD::MUL, MVT::v4i64, 8 }, // 3*pmuludq/3*shift/2*add

    { ISD::FADD, MVT::v4f64, 1 }, // Haswell from http://www.agner.org/
    { ISD::FADD, MVT::v8f32, 1 }, // Haswell from http://www.agner.org/
    { ISD::FSUB, MVT::v4f64, 1 }, // Haswell from http://www.agner.org/
    { ISD::FSUB, MVT::v8f32, 1 }, // Haswell from http://www.agner.org/
    { ISD::FMUL, MVT::v4f64, 1 }, // Haswell from http://www.agner.org/
    { ISD::FMUL, MVT::v8f32, 1 }, // Haswell from http://www.agner.org/

    { ISD::FDIV, MVT::f32, 7 }, // Haswell from http://www.agner.org/
    { ISD::FDIV, MVT::v4f32, 7 }, // Haswell from http://www.agner.org/
    { ISD::FDIV, MVT::v8f32, 14 }, // Haswell from http://www.agner.org/
    { ISD::FDIV, MVT::f64, 14 }, // Haswell from http://www.agner.org/
    { ISD::FDIV, MVT::v2f64, 14 }, // Haswell from http://www.agner.org/
    { ISD::FDIV, MVT::v4f64, 28 }, // Haswell from http://www.agner.org/
  };

  // Look for AVX2 lowering tricks for custom cases.
  if (ST->hasAVX2())
    if (const auto *Entry = CostTableLookup(AVX2CostTable, ISD, LT.second))
      return LT.first * Entry->Cost;

  static const CostTblEntry AVX1CostTable[] = {
    // We don't have to scalarize unsupported ops. We can issue two half-sized
    // operations and we only need to extract the upper YMM half.
    // Two ops + 1 extract + 1 insert = 4.
    { ISD::MUL, MVT::v16i16, 4 },
    { ISD::MUL, MVT::v8i32, 4 },
    { ISD::SUB, MVT::v32i8, 4 },
    { ISD::ADD, MVT::v32i8, 4 },
    { ISD::SUB, MVT::v16i16, 4 },
    { ISD::ADD, MVT::v16i16, 4 },
    { ISD::SUB, MVT::v8i32, 4 },
    { ISD::ADD, MVT::v8i32, 4 },
    { ISD::SUB, MVT::v4i64, 4 },
    { ISD::ADD, MVT::v4i64, 4 },

    // A v4i64 multiply is custom lowered as two split v2i64 vectors that are
    // then lowered as a series of long multiplies (3), shifts (3) and adds (2).
    // Because we believe v4i64 to be a legal type, we must also include the
    // extract+insert in the cost table. Therefore, the cost here is 18
    // (2 * 8 for the halves, plus 1 extract and 1 insert) instead of 8.
    { ISD::MUL, MVT::v4i64, 18 },

    { ISD::MUL, MVT::v32i8, 26 }, // extend/pmullw/trunc sequence.

    { ISD::FDIV, MVT::f32, 14 }, // SNB from http://www.agner.org/
    { ISD::FDIV, MVT::v4f32, 14 }, // SNB from http://www.agner.org/
    { ISD::FDIV, MVT::v8f32, 28 }, // SNB from http://www.agner.org/
    { ISD::FDIV, MVT::f64, 22 }, // SNB from http://www.agner.org/
    { ISD::FDIV, MVT::v2f64, 22 }, // SNB from http://www.agner.org/
    { ISD::FDIV, MVT::v4f64, 44 }, // SNB from http://www.agner.org/
  };

  if (ST->hasAVX())
    if (const auto *Entry = CostTableLookup(AVX1CostTable, ISD, LT.second))
      return LT.first * Entry->Cost;

  static const CostTblEntry SSE42CostTable[] = {
    { ISD::FADD, MVT::f64, 1 }, // Nehalem from http://www.agner.org/
    { ISD::FADD, MVT::f32, 1 }, // Nehalem from http://www.agner.org/
    { ISD::FADD, MVT::v2f64, 1 }, // Nehalem from http://www.agner.org/
    { ISD::FADD, MVT::v4f32, 1 }, // Nehalem from http://www.agner.org/

    { ISD::FSUB, MVT::f64, 1 }, // Nehalem from http://www.agner.org/
    { ISD::FSUB, MVT::f32, 1 }, // Nehalem from http://www.agner.org/
    { ISD::FSUB, MVT::v2f64, 1 }, // Nehalem from http://www.agner.org/
    { ISD::FSUB, MVT::v4f32, 1 }, // Nehalem from http://www.agner.org/

    { ISD::FMUL, MVT::f64, 1 }, // Nehalem from http://www.agner.org/
    { ISD::FMUL, MVT::f32, 1 }, // Nehalem from http://www.agner.org/
    { ISD::FMUL, MVT::v2f64, 1 }, // Nehalem from http://www.agner.org/
    { ISD::FMUL, MVT::v4f32, 1 }, // Nehalem from http://www.agner.org/

    { ISD::FDIV, MVT::f32, 14 }, // Nehalem from http://www.agner.org/
    { ISD::FDIV, MVT::v4f32, 14 }, // Nehalem from http://www.agner.org/
    { ISD::FDIV, MVT::f64, 22 }, // Nehalem from http://www.agner.org/
    { ISD::FDIV, MVT::v2f64, 22 }, // Nehalem from http://www.agner.org/
  };

  if (ST->hasSSE42())
    if (const auto *Entry = CostTableLookup(SSE42CostTable, ISD, LT.second))
      return LT.first * Entry->Cost;

  static const CostTblEntry SSE41CostTable[] = {
    { ISD::SHL, MVT::v16i8, 11 }, // pblendvb sequence.
    { ISD::SHL, MVT::v32i8, 2*11+2 }, // pblendvb sequence + split.
    { ISD::SHL, MVT::v8i16, 14 }, // pblendvb sequence.
    { ISD::SHL, MVT::v16i16, 2*14+2 }, // pblendvb sequence + split.
    { ISD::SHL, MVT::v4i32, 4 }, // pslld/paddd/cvttps2dq/pmulld
    { ISD::SHL, MVT::v8i32, 2*4+2 }, // pslld/paddd/cvttps2dq/pmulld + split

    { ISD::SRL, MVT::v16i8, 12 }, // pblendvb sequence.
    { ISD::SRL, MVT::v32i8, 2*12+2 }, // pblendvb sequence + split.
    { ISD::SRL, MVT::v8i16, 14 }, // pblendvb sequence.
    { ISD::SRL, MVT::v16i16, 2*14+2 }, // pblendvb sequence + split.
    { ISD::SRL, MVT::v4i32, 11 }, // Shift each lane + blend.
    { ISD::SRL, MVT::v8i32, 2*11+2 }, // Shift each lane + blend + split.

    { ISD::SRA, MVT::v16i8, 24 }, // pblendvb sequence.
    { ISD::SRA, MVT::v32i8, 2*24+2 }, // pblendvb sequence + split.
    { ISD::SRA, MVT::v8i16, 14 }, // pblendvb sequence.
    { ISD::SRA, MVT::v16i16, 2*14+2 }, // pblendvb sequence + split.
    { ISD::SRA, MVT::v4i32, 12 }, // Shift each lane + blend.
    { ISD::SRA, MVT::v8i32, 2*12+2 }, // Shift each lane + blend + split.

    { ISD::MUL, MVT::v4i32, 2 } // pmulld (Nehalem from agner.org)
  };

  if (ST->hasSSE41())
    if (const auto *Entry = CostTableLookup(SSE41CostTable, ISD, LT.second))
      return LT.first * Entry->Cost;

  static const CostTblEntry SSE2CostTable[] = {
    // We don't correctly identify costs of casts because they are marked as
    // custom.
    { ISD::SHL, MVT::v16i8, 26 }, // cmpgtb sequence.
    { ISD::SHL, MVT::v8i16, 32 }, // cmpgtb sequence.
    { ISD::SHL, MVT::v4i32, 2*5 }, // We optimized this using mul.
    { ISD::SHL, MVT::v2i64, 4 }, // splat+shuffle sequence.
    { ISD::SHL, MVT::v4i64, 2*4+2 }, // splat+shuffle sequence + split.

    { ISD::SRL, MVT::v16i8, 26 }, // cmpgtb sequence.
    { ISD::SRL, MVT::v8i16, 32 }, // cmpgtb sequence.
    { ISD::SRL, MVT::v4i32, 16 }, // Shift each lane + blend.
    { ISD::SRL, MVT::v2i64, 4 }, // splat+shuffle sequence.
    { ISD::SRL, MVT::v4i64, 2*4+2 }, // splat+shuffle sequence + split.

    { ISD::SRA, MVT::v16i8, 54 }, // unpacked cmpgtb sequence.
    { ISD::SRA, MVT::v8i16, 32 }, // cmpgtb sequence.
    { ISD::SRA, MVT::v4i32, 16 }, // Shift each lane + blend.
    { ISD::SRA, MVT::v2i64, 12 }, // srl/xor/sub sequence.
    { ISD::SRA, MVT::v4i64, 2*12+2 }, // srl/xor/sub sequence + split.

    { ISD::MUL, MVT::v16i8, 12 }, // extend/pmullw/trunc sequence.
    { ISD::MUL, MVT::v8i16, 1 }, // pmullw
    { ISD::MUL, MVT::v4i32, 6 }, // 3*pmuludq/4*shuffle
    { ISD::MUL, MVT::v2i64, 8 }, // 3*pmuludq/3*shift/2*add

    { ISD::FDIV, MVT::f32, 23 }, // Pentium IV from http://www.agner.org/
    { ISD::FDIV, MVT::v4f32, 39 }, // Pentium IV from http://www.agner.org/
    { ISD::FDIV, MVT::f64, 38 }, // Pentium IV from http://www.agner.org/
    { ISD::FDIV, MVT::v2f64, 69 }, // Pentium IV from http://www.agner.org/

    { ISD::FADD, MVT::f32, 2 }, // Pentium IV from http://www.agner.org/
    { ISD::FADD, MVT::f64, 2 }, // Pentium IV from http://www.agner.org/

    { ISD::FSUB, MVT::f32, 2 }, // Pentium IV from http://www.agner.org/
    { ISD::FSUB, MVT::f64, 2 }, // Pentium IV from http://www.agner.org/
  };

  if (ST->hasSSE2())
    if (const auto *Entry = CostTableLookup(SSE2CostTable, ISD, LT.second))
      return LT.first * Entry->Cost;

  static const CostTblEntry SSE1CostTable[] = {
    { ISD::FDIV, MVT::f32, 17 }, // Pentium III from http://www.agner.org/
    { ISD::FDIV, MVT::v4f32, 34 }, // Pentium III from http://www.agner.org/

    { ISD::FADD, MVT::f32, 1 }, // Pentium III from http://www.agner.org/
    { ISD::FADD, MVT::v4f32, 2 }, // Pentium III from http://www.agner.org/

    { ISD::FSUB, MVT::f32, 1 }, // Pentium III from http://www.agner.org/
    { ISD::FSUB, MVT::v4f32, 2 }, // Pentium III from http://www.agner.org/

    { ISD::ADD, MVT::i8, 1 }, // Pentium III from http://www.agner.org/
    { ISD::ADD, MVT::i16, 1 }, // Pentium III from http://www.agner.org/
    { ISD::ADD, MVT::i32, 1 }, // Pentium III from http://www.agner.org/

    { ISD::SUB, MVT::i8, 1 }, // Pentium III from http://www.agner.org/
    { ISD::SUB, MVT::i16, 1 }, // Pentium III from http://www.agner.org/
    { ISD::SUB, MVT::i32, 1 }, // Pentium III from http://www.agner.org/
  };

  if (ST->hasSSE1())
    if (const auto *Entry = CostTableLookup(SSE1CostTable, ISD, LT.second))
      return LT.first * Entry->Cost;

  // It is not a good idea to vectorize division. We have to scalarize it and
  // in the process we will often end up having to spill regular
  // registers. The overhead of division is going to dominate most kernels
  // anyway, so try hard to prevent vectorization of division; it is
  // generally a bad idea. Assume somewhat arbitrarily that we have to be able
  // to hide "20 cycles" for each lane.
  if (LT.second.isVector() && (ISD == ISD::SDIV || ISD == ISD::SREM ||
                               ISD == ISD::UDIV || ISD == ISD::UREM)) {
    int ScalarCost = getArithmeticInstrCost(
        Opcode, Ty->getScalarType(), Op1Info, Op2Info,
        TargetTransformInfo::OP_None, TargetTransformInfo::OP_None);
    return 20 * LT.first * LT.second.getVectorNumElements() * ScalarCost;
  }

  // Fallback to the default implementation.
|
[PM] Change the core design of the TTI analysis to use a polymorphic
type erased interface and a single analysis pass rather than an
extremely complex analysis group.
The end result is that the TTI analysis can contain a type erased
implementation that supports the polymorphic TTI interface. We can build
one from a target-specific implementation or from a dummy one in the IR.
I've also factored all of the code into "mix-in"-able base classes,
including CRTP base classes to facilitate calling back up to the most
specialized form when delegating horizontally across the surface. These
aren't as clean as I would like and I'm planning to work on cleaning
some of this up, but I wanted to start by putting into the right form.
There are a number of reasons for this change, and this particular
design. The first and foremost reason is that an analysis group is
complete overkill, and the chaining delegation strategy was so opaque,
confusing, and high overhead that TTI was suffering greatly for it.
Several of the TTI functions had failed to be implemented in all places
because of the chaining-based delegation making there be no checking of
this. A few other functions were implemented with incorrect delegation.
The message to me was very clear working on this -- the delegation and
analysis group structure was too confusing to be useful here.
The other reason of course is that this is *much* more natural fit for
the new pass manager. This will lay the ground work for a type-erased
per-function info object that can look up the correct subtarget and even
cache it.
Yet another benefit is that this will significantly simplify the
interaction of the pass managers and the TargetMachine. See the future
work below.
The downside of this change is that it is very, very verbose. I'm going
to work to improve that, but it is somewhat an implementation necessity
in C++ to do type erasure. =/ I discussed this design really extensively
with Eric and Hal prior to going down this path, and afterward showed
them the result. No one was really thrilled with it, but there doesn't
seem to be a substantially better alternative. Using a base class and
virtual method dispatch would make the code much shorter, but as
discussed in the update to the programmer's manual and elsewhere,
a polymorphic interface feels like the more principled approach even if
this is perhaps the least compelling example of it. ;]
Ultimately, there is still a lot more to be done here, but this was the
huge chunk that I couldn't really split things out of because this was
the interface change to TTI. I've tried to minimize all the other parts
of this. The follow up work should include at least:
1) Improving the TargetMachine interface by having it directly return
a TTI object. Because we have a non-pass object with value semantics
and an internal type erasure mechanism, we can narrow the interface
of the TargetMachine to *just* do what we need: build and return
a TTI object that we can then insert into the pass pipeline.
2) Make the TTI object be fully specialized for a particular function.
This will include splitting off a minimal form of it which is
sufficient for the inliner and the old pass manager.
3) Add a new pass manager analysis which produces TTI objects from the
target machine for each function. This may actually be done as part
of #2 in order to use the new analysis to implement #2.
4) Work on narrowing the API between TTI and the targets so that it is
easier to understand and less verbose to type erase.
5) Work on narrowing the API between TTI and its clients so that it is
easier to understand and less verbose to forward.
6) Try to improve the CRTP-based delegation. I feel like this code is
just a bit messy and is exacerbating the complexity of implementing
the TTI in each target.
Many thanks to Eric and Hal for their help here. I ended up blocked on
this somewhat more abruptly than I expected, and so I appreciate getting
it sorted out very quickly.
Differential Revision: http://reviews.llvm.org/D7293
llvm-svn: 227669
2015-01-31 11:43:40 +08:00
  return BaseT::getArithmeticInstrCost(Opcode, Ty, Op1Info, Op2Info);
Switch TargetTransformInfo from an immutable analysis pass that requires
a TargetMachine to construct (and thus isn't always available), to an
analysis group that supports layered implementations much like
AliasAnalysis does. This is a pretty massive change, with a few parts
that I was unable to easily separate (sorry), so I'll walk through it.
The first step of this conversion was to make TargetTransformInfo an
analysis group, and to sink the nonce implementations in
ScalarTargetTransformInfo and VectorTargetTransformInfo into
a NoTargetTransformInfo pass. This allows other passes to add a hard
requirement on TTI, and assume they will always get at least one
implementation.
The TargetTransformInfo analysis group leverages the delegation chaining
trick that AliasAnalysis uses, where the base class for the analysis
group delegates to the previous analysis *pass*, allowing all but the
NoFoo analysis passes to only implement the parts of the interfaces they
support. It also introduces a new trick where each pass in the group
retains a pointer to the top-most pass that has been initialized. This
allows passes to implement one API in terms of another API and benefit
when some other pass above them in the stack has more precise results
for the second API.
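A minimal sketch of the chaining trick described above, using hypothetical `CostLayer`, `NoCostInfo`, and `TargetCostInfo` types (these are not LLVM classes): each layer forwards queries it does not implement to the layer below, the bottom layer answers everything, and every layer keeps a `Top` pointer so that an API implemented in terms of another API consults the most precise layer on the stack.

```cpp
#include <cassert>

// Hypothetical reduction of the layered "analysis group"; not LLVM's API.
struct CostLayer {
  CostLayer *Prev = nullptr; // next, less precise layer down the stack
  CostLayer *Top = this;     // most precise layer currently on the stack

  virtual ~CostLayer() = default;

  // By default, every query is delegated to the layer below.
  virtual int getScalarCost() const { return Prev->getScalarCost(); }
  virtual int getVectorCost(int NumElts) const {
    return Prev->getVectorCost(NumElts);
  }

  // Stack this layer on top of Below and point every layer's Top at it.
  void pushOnto(CostLayer &Below) {
    Prev = &Below;
    for (CostLayer *L = this; L != nullptr; L = L->Prev)
      L->Top = this;
  }
};

// Bottom of the stack (the "NoFoo" role): must answer every query.
struct NoCostInfo : CostLayer {
  int getScalarCost() const override { return 1; }
  // One API implemented in terms of another, asked of the *top* layer so a
  // more precise scalar cost from above also improves this answer.
  int getVectorCost(int NumElts) const override {
    return NumElts * Top->getScalarCost();
  }
};

// A "target" layer that only overrides the query it knows better.
struct TargetCostInfo : CostLayer {
  int getScalarCost() const override { return 3; }
};
```

After `Target.pushOnto(Base)`, a vector query on either layer falls through to `NoCostInfo`, yet returns 4 * 3 = 12 for four elements because the scalar cost comes from `Top`. Nothing in this scheme checks that an overriding layer also delegates correctly.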
The second step of this conversion is to create a pass that implements
the TargetTransformInfo analysis using the target-independent
abstractions in the code generator. This replaces the
ScalarTargetTransformImpl and VectorTargetTransformImpl classes in
lib/Target with a single pass in lib/CodeGen called
BasicTargetTransformInfo. This class actually provides most of the TTI
functionality, basing it upon the TargetLowering abstraction and other
information in the target independent code generator.
The third step of the conversion adds support to all TargetMachines to
register custom analysis passes. This allows building those passes with
access to TargetLowering or other target-specific classes, and it also
allows each target to customize the set of analysis passes desired in
the pass manager. The baseline LLVMTargetMachine implements this
interface to add the BasicTTI pass to the pass manager, and all of the
tools that want to support target-aware TTI passes call this routine on
whatever target machine they end up with to add the appropriate passes.
The fourth step of the conversion created target-specific TTI analysis
passes for the X86 and ARM backends. These passes contain the custom
logic that was previously in their extensions of the
ScalarTargetTransformInfo and VectorTargetTransformInfo interfaces.
I separated them into their own file, as now all of the interface bits
are private and they just expose a function to create the pass itself.
Then I extended these target machines to set up a custom set of analysis
passes, first adding BasicTTI as a fallback, and then adding their
customized TTI implementations.
The fourth step required logic that was shared between the target
independent layer and the specific targets to move to a different
interface, as they no longer derive from each other. As a consequence,
helper functions were added to TargetLowering representing the common
logic needed both in the target implementation and the codegen
implementation of the TTI pass. While technically this is the only
change that could have been committed separately, it would have been
a nightmare to extract.
The final step of the conversion was just to delete all the old
boilerplate. This got rid of the ScalarTargetTransformInfo and
VectorTargetTransformInfo classes, all of the support in all of the
targets for producing instances of them, and all of the support in the
tools for manually constructing a pass based around them.
Now that TTI is a relatively normal analysis group, two things become
straightforward. First, we can sink it into lib/Analysis, which is a more
natural layer for it to live in. Second, clients of this interface can
depend on it *always* being available, which will simplify their code and
behavior. These (and other) simplifications will follow in subsequent
commits; this one is clearly big enough.
Finally, I'm very aware that many of the comments and much of the
documentation need to be updated. As soon as I had this working, and plausibly well
commented, I wanted to get it committed and in front of the build bots.
I'll be doing a few passes over documentation later if it sticks.
Commits to update DragonEgg and Clang will be made presently.
llvm-svn: 171681
2013-01-07 09:37:14 +08:00
}
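`X86TTIImpl::getShuffleCost`, which follows, rests on three pieces of arithmetic: a type that legalizes into `LT.first` registers multiplies the per-register table cost; the cost tables are consulted in a fixed order from the newest feature level down, each guarded by a subtarget check, and the first hit wins; and split permutes are priced as a count of two-source shuffles. The standalone sketch below reproduces only that arithmetic. The `Shuffle`, `Entry`, `Table`, `lookup`, and `tieredCost` names and the string-typed vector types are hypothetical simplifications of `TTI::ShuffleKind`, `MVT`, `CostTblEntry`, and `CostTableLookup`.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-ins for TTI::ShuffleKind / MVT / CostTblEntry.
enum class Shuffle { Broadcast, Reverse, PermuteSingleSrc, PermuteTwoSrc };

struct Entry {
  Shuffle Kind;
  std::string LegalTy; // legal register type, e.g. "v8i32"
  int Cost;            // cost of the shuffle on ONE legal register
};

using Table = std::vector<Entry>;

// Linear search for an exact (kind, type) match.
std::optional<int> lookup(const Table &Tbl, Shuffle K, const std::string &Ty) {
  for (const Entry &E : Tbl)
    if (E.Kind == K && E.LegalTy == Ty)
      return E.Cost;
  return std::nullopt;
}

// Tiers are assumed to be pre-filtered to the features the subtarget has and
// ordered from the most to the least capable. The first hit wins and is
// scaled by NumParts, the number of legal registers the type splits into
// (the role LT.first plays in the real code).
std::optional<int> tieredCost(const std::vector<Table> &Tiers, Shuffle K,
                              const std::string &LegalTy, int NumParts) {
  for (const Table &Tbl : Tiers)
    if (std::optional<int> C = lookup(Tbl, K, LegalTy))
      return NumParts * *C;
  return std::nullopt; // no table matched: fall back to a generic estimate
}

// Split single-source permute: (NumOfSrcs - 1) two-source shuffles per
// destination register, as in the real code's formula.
int splitSingleSrcShuffles(unsigned VecBytes, unsigned LegalBytes,
                           unsigned NumOfDests) {
  unsigned NumOfSrcs = (VecBytes + LegalBytes - 1) / LegalBytes; // round up
  return static_cast<int>((NumOfSrcs - 1) * NumOfDests);
}

// Split two-source permute: NumParts destinations, each needing
// (2 * NumParts - 1) shuffles; the product becomes the new split factor.
int splitTwoSrcFactor(int NumParts) { return NumParts * (NumParts * 2 - 1); }
```

With a sample AVX2-like tier holding `{Reverse, v8i32, 1}` ahead of an AVX1-like tier holding `{Reverse, v16i16, 4}`, a `v16i16` reverse split into two registers misses the first tier and costs 2 * 4 = 8; a 128-byte single-source permute legalized to 32-byte registers with four destinations is priced as (4 - 1) * 4 = 12 two-source shuffles.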
2015-08-06 02:08:10 +08:00
int X86TTIImpl::getShuffleCost(TTI::ShuffleKind Kind, Type *Tp, int Index,
                               Type *SubTp) {
2017-01-05 22:33:32 +08:00
  // 64-bit packed float vectors (v2f32) are widened to type v4f32.
2019-08-08 00:24:26 +08:00
  // 64-bit packed integer vectors (v2i32) are widened to type v4i32.
2017-01-05 22:33:32 +08:00
  std::pair<int, MVT> LT = TLI->getTypeLegalizationCost(DL, Tp);
2013-01-07 09:37:14 +08:00

2018-10-24 00:45:26 +08:00
  // Treat Transpose as 2-op shuffles - there's no difference in lowering.
  if (Kind == TTI::SK_Transpose)
    Kind = TTI::SK_PermuteTwoSrc;

2017-01-06 01:56:19 +08:00
  // For Broadcasts we are splatting the first element from the first input
  // register, so only need to reference that input and all the output
  // registers are the same.
  if (Kind == TTI::SK_Broadcast)
    LT.first = 1;

2018-11-12 23:48:06 +08:00
  // Subvector extractions are free if they start at the beginning of a
  // vector and cheap if the subvectors are aligned.
  if (Kind == TTI::SK_ExtractSubvector && LT.second.isVector()) {
    int NumElts = LT.second.getVectorNumElements();
    if ((Index % NumElts) == 0)
      return 0;
    std::pair<int, MVT> SubLT = TLI->getTypeLegalizationCost(DL, SubTp);
    if (SubLT.second.isVector()) {
      int NumSubElts = SubLT.second.getVectorNumElements();
      if ((Index % NumSubElts) == 0 && (NumElts % NumSubElts) == 0)
        return SubLT.first;
2019-08-16 01:29:42 +08:00
      // Handle some cases for widening legalization. For now we only handle
      // cases where the original subvector was naturally aligned and evenly
      // fit in its legalized subvector type.
      // FIXME: Remove some of the alignment restrictions.
      // FIXME: We can use permq for 64-bit or larger extracts from 256-bit
      // vectors.
      int OrigSubElts = SubTp->getVectorNumElements();
2019-09-30 07:32:37 +08:00
      if (NumSubElts > OrigSubElts &&
2019-08-16 01:29:42 +08:00
          (Index % OrigSubElts) == 0 && (NumSubElts % OrigSubElts) == 0 &&
          LT.second.getVectorElementType() ==
              SubLT.second.getVectorElementType() &&
          LT.second.getVectorElementType().getSizeInBits() ==
              Tp->getVectorElementType()->getPrimitiveSizeInBits()) {
        assert(NumElts >= NumSubElts && NumElts > OrigSubElts &&
               "Unexpected number of elements!");
        Type *VecTy = VectorType::get(Tp->getVectorElementType(),
                                      LT.second.getVectorNumElements());
        Type *SubTy = VectorType::get(Tp->getVectorElementType(),
                                      SubLT.second.getVectorNumElements());
        int ExtractIndex = alignDown((Index % NumElts), NumSubElts);
        int ExtractCost = getShuffleCost(TTI::SK_ExtractSubvector, VecTy,
                                         ExtractIndex, SubTy);

        // If the original size is 32-bits or more, we can use pshufd. Otherwise
        // if we have SSSE3 we can use pshufb.
        if (SubTp->getPrimitiveSizeInBits() >= 32 || ST->hasSSSE3())
          return ExtractCost + 1; // pshufd or pshufb

        assert(SubTp->getPrimitiveSizeInBits() == 16 &&
               "Unexpected vector size");

        return ExtractCost + 2; // worst case pshufhw + pshufd
      }
2018-11-12 23:48:06 +08:00
    }
  }
2018-11-10 03:04:27 +08:00

2017-01-06 01:56:19 +08:00
  // We are going to permute multiple sources and the result will be in multiple
  // destinations. Providing an accurate cost only for splits where the element
  // type remains the same.
  if (Kind == TTI::SK_PermuteSingleSrc && LT.first != 1) {
    MVT LegalVT = LT.second;
2018-01-10 03:08:22 +08:00
    if (LegalVT.isVector() &&
        LegalVT.getVectorElementType().getSizeInBits() ==
2017-01-06 01:56:19 +08:00
            Tp->getVectorElementType()->getPrimitiveSizeInBits() &&
        LegalVT.getVectorNumElements() < Tp->getVectorNumElements()) {

      unsigned VecTySize = DL.getTypeStoreSize(Tp);
      unsigned LegalVTSize = LegalVT.getStoreSize();
      // Number of source vectors after legalization:
      unsigned NumOfSrcs = (VecTySize + LegalVTSize - 1) / LegalVTSize;
      // Number of destination vectors after legalization:
      unsigned NumOfDests = LT.first;

      Type *SingleOpTy = VectorType::get(Tp->getVectorElementType(),
                                         LegalVT.getVectorNumElements());

      unsigned NumOfShuffles = (NumOfSrcs - 1) * NumOfDests;
      return NumOfShuffles *
             getShuffleCost(TTI::SK_PermuteTwoSrc, SingleOpTy, 0, nullptr);
    }

    return BaseT::getShuffleCost(Kind, Tp, Index, SubTp);
  }

  // For 2-input shuffles, we must account for splitting the 2 inputs into many.
  if (Kind == TTI::SK_PermuteTwoSrc && LT.first != 1) {
2017-01-02 18:37:52 +08:00
    // We assume that source and destination have the same vector type.
    int NumOfDests = LT.first;
    int NumOfShufflesPerDest = LT.first * 2 - 1;
2017-01-06 01:56:19 +08:00
    LT.first = NumOfDests * NumOfShufflesPerDest;
2014-06-20 12:32:48 +08:00
  }

2017-01-06 01:56:19 +08:00
  static const CostTblEntry AVX512VBMIShuffleTbl[] = {
2018-11-10 03:04:27 +08:00
    {TTI::SK_Reverse, MVT::v64i8, 1}, // vpermb
    {TTI::SK_Reverse, MVT::v32i8, 1}, // vpermb
2017-01-06 01:56:19 +08:00

2018-11-10 03:04:27 +08:00
    {TTI::SK_PermuteSingleSrc, MVT::v64i8, 1}, // vpermb
    {TTI::SK_PermuteSingleSrc, MVT::v32i8, 1}, // vpermb
2017-01-06 01:56:19 +08:00

2018-11-10 03:04:27 +08:00
    {TTI::SK_PermuteTwoSrc, MVT::v64i8, 1}, // vpermt2b
    {TTI::SK_PermuteTwoSrc, MVT::v32i8, 1}, // vpermt2b
    {TTI::SK_PermuteTwoSrc, MVT::v16i8, 1} // vpermt2b
2017-01-06 01:56:19 +08:00
  };

  if (ST->hasVBMI())
    if (const auto *Entry =
            CostTableLookup(AVX512VBMIShuffleTbl, Kind, LT.second))
      return LT.first * Entry->Cost;

  static const CostTblEntry AVX512BWShuffleTbl[] = {
2018-11-10 03:04:27 +08:00
    {TTI::SK_Broadcast, MVT::v32i16, 1}, // vpbroadcastw
    {TTI::SK_Broadcast, MVT::v64i8, 1}, // vpbroadcastb

    {TTI::SK_Reverse, MVT::v32i16, 1}, // vpermw
    {TTI::SK_Reverse, MVT::v16i16, 1}, // vpermw
    {TTI::SK_Reverse, MVT::v64i8, 2}, // pshufb + vshufi64x2

    {TTI::SK_PermuteSingleSrc, MVT::v32i16, 1}, // vpermw
    {TTI::SK_PermuteSingleSrc, MVT::v16i16, 1}, // vpermw
    {TTI::SK_PermuteSingleSrc, MVT::v8i16, 1}, // vpermw
    {TTI::SK_PermuteSingleSrc, MVT::v64i8, 8}, // extend to v32i16
    {TTI::SK_PermuteSingleSrc, MVT::v32i8, 3}, // vpermw + zext/trunc

    {TTI::SK_PermuteTwoSrc, MVT::v32i16, 1}, // vpermt2w
    {TTI::SK_PermuteTwoSrc, MVT::v16i16, 1}, // vpermt2w
    {TTI::SK_PermuteTwoSrc, MVT::v8i16, 1}, // vpermt2w
    {TTI::SK_PermuteTwoSrc, MVT::v32i8, 3}, // zext + vpermt2w + trunc
    {TTI::SK_PermuteTwoSrc, MVT::v64i8, 19}, // 6 * v32i8 + 1
    {TTI::SK_PermuteTwoSrc, MVT::v16i8, 3} // zext + vpermt2w + trunc
2017-01-06 01:56:19 +08:00
  };

  if (ST->hasBWI())
    if (const auto *Entry =
            CostTableLookup(AVX512BWShuffleTbl, Kind, LT.second))
      return LT.first * Entry->Cost;

  static const CostTblEntry AVX512ShuffleTbl[] = {
2018-11-10 03:04:27 +08:00
    {TTI::SK_Broadcast, MVT::v8f64, 1}, // vbroadcastpd
    {TTI::SK_Broadcast, MVT::v16f32, 1}, // vbroadcastps
    {TTI::SK_Broadcast, MVT::v8i64, 1}, // vpbroadcastq
    {TTI::SK_Broadcast, MVT::v16i32, 1}, // vpbroadcastd

    {TTI::SK_Reverse, MVT::v8f64, 1}, // vpermpd
    {TTI::SK_Reverse, MVT::v16f32, 1}, // vpermps
    {TTI::SK_Reverse, MVT::v8i64, 1}, // vpermq
    {TTI::SK_Reverse, MVT::v16i32, 1}, // vpermd

    {TTI::SK_PermuteSingleSrc, MVT::v8f64, 1}, // vpermpd
    {TTI::SK_PermuteSingleSrc, MVT::v4f64, 1}, // vpermpd
    {TTI::SK_PermuteSingleSrc, MVT::v2f64, 1}, // vpermpd
    {TTI::SK_PermuteSingleSrc, MVT::v16f32, 1}, // vpermps
    {TTI::SK_PermuteSingleSrc, MVT::v8f32, 1}, // vpermps
    {TTI::SK_PermuteSingleSrc, MVT::v4f32, 1}, // vpermps
    {TTI::SK_PermuteSingleSrc, MVT::v8i64, 1}, // vpermq
    {TTI::SK_PermuteSingleSrc, MVT::v4i64, 1}, // vpermq
    {TTI::SK_PermuteSingleSrc, MVT::v2i64, 1}, // vpermq
    {TTI::SK_PermuteSingleSrc, MVT::v16i32, 1}, // vpermd
    {TTI::SK_PermuteSingleSrc, MVT::v8i32, 1}, // vpermd
    {TTI::SK_PermuteSingleSrc, MVT::v4i32, 1}, // vpermd
    {TTI::SK_PermuteSingleSrc, MVT::v16i8, 1}, // pshufb

    {TTI::SK_PermuteTwoSrc, MVT::v8f64, 1}, // vpermt2pd
    {TTI::SK_PermuteTwoSrc, MVT::v16f32, 1}, // vpermt2ps
    {TTI::SK_PermuteTwoSrc, MVT::v8i64, 1}, // vpermt2q
    {TTI::SK_PermuteTwoSrc, MVT::v16i32, 1}, // vpermt2d
    {TTI::SK_PermuteTwoSrc, MVT::v4f64, 1}, // vpermt2pd
    {TTI::SK_PermuteTwoSrc, MVT::v8f32, 1}, // vpermt2ps
    {TTI::SK_PermuteTwoSrc, MVT::v4i64, 1}, // vpermt2q
    {TTI::SK_PermuteTwoSrc, MVT::v8i32, 1}, // vpermt2d
    {TTI::SK_PermuteTwoSrc, MVT::v2f64, 1}, // vpermt2pd
    {TTI::SK_PermuteTwoSrc, MVT::v4f32, 1}, // vpermt2ps
    {TTI::SK_PermuteTwoSrc, MVT::v2i64, 1}, // vpermt2q
    {TTI::SK_PermuteTwoSrc, MVT::v4i32, 1} // vpermt2d
2017-01-06 01:56:19 +08:00
  };

  if (ST->hasAVX512())
    if (const auto *Entry = CostTableLookup(AVX512ShuffleTbl, Kind, LT.second))
      return LT.first * Entry->Cost;

  static const CostTblEntry AVX2ShuffleTbl[] = {
2018-11-10 03:04:27 +08:00
    {TTI::SK_Broadcast, MVT::v4f64, 1}, // vbroadcastpd
    {TTI::SK_Broadcast, MVT::v8f32, 1}, // vbroadcastps
    {TTI::SK_Broadcast, MVT::v4i64, 1}, // vpbroadcastq
    {TTI::SK_Broadcast, MVT::v8i32, 1}, // vpbroadcastd
    {TTI::SK_Broadcast, MVT::v16i16, 1}, // vpbroadcastw
    {TTI::SK_Broadcast, MVT::v32i8, 1}, // vpbroadcastb

    {TTI::SK_Reverse, MVT::v4f64, 1}, // vpermpd
    {TTI::SK_Reverse, MVT::v8f32, 1}, // vpermps
    {TTI::SK_Reverse, MVT::v4i64, 1}, // vpermq
    {TTI::SK_Reverse, MVT::v8i32, 1}, // vpermd
    {TTI::SK_Reverse, MVT::v16i16, 2}, // vperm2i128 + pshufb
    {TTI::SK_Reverse, MVT::v32i8, 2}, // vperm2i128 + pshufb

    {TTI::SK_Select, MVT::v16i16, 1}, // vpblendvb
    {TTI::SK_Select, MVT::v32i8, 1}, // vpblendvb

    {TTI::SK_PermuteSingleSrc, MVT::v4f64, 1}, // vpermpd
    {TTI::SK_PermuteSingleSrc, MVT::v8f32, 1}, // vpermps
    {TTI::SK_PermuteSingleSrc, MVT::v4i64, 1}, // vpermq
    {TTI::SK_PermuteSingleSrc, MVT::v8i32, 1}, // vpermd
    {TTI::SK_PermuteSingleSrc, MVT::v16i16, 4}, // vperm2i128 + 2*vpshufb
2017-02-03 04:27:13 +08:00
                                                // + vpblendvb
2018-11-10 03:04:27 +08:00
    {TTI::SK_PermuteSingleSrc, MVT::v32i8, 4}, // vperm2i128 + 2*vpshufb
2017-08-11 02:29:34 +08:00
                                               // + vpblendvb

2018-11-10 03:04:27 +08:00
    {TTI::SK_PermuteTwoSrc, MVT::v4f64, 3}, // 2*vpermpd + vblendpd
    {TTI::SK_PermuteTwoSrc, MVT::v8f32, 3}, // 2*vpermps + vblendps
    {TTI::SK_PermuteTwoSrc, MVT::v4i64, 3}, // 2*vpermq + vpblendd
    {TTI::SK_PermuteTwoSrc, MVT::v8i32, 3}, // 2*vpermd + vpblendd
    {TTI::SK_PermuteTwoSrc, MVT::v16i16, 7}, // 2*vperm2i128 + 4*vpshufb
                                             // + vpblendvb
    {TTI::SK_PermuteTwoSrc, MVT::v32i8, 7}, // 2*vperm2i128 + 4*vpshufb
                                            // + vpblendvb
2017-01-06 01:56:19 +08:00
  };

  if (ST->hasAVX2())
    if (const auto *Entry = CostTableLookup(AVX2ShuffleTbl, Kind, LT.second))
      return LT.first * Entry->Cost;

2017-08-16 21:50:20 +08:00
  static const CostTblEntry XOPShuffleTbl[] = {
2018-11-10 03:04:27 +08:00
    {TTI::SK_PermuteSingleSrc, MVT::v4f64, 2}, // vperm2f128 + vpermil2pd
    {TTI::SK_PermuteSingleSrc, MVT::v8f32, 2}, // vperm2f128 + vpermil2ps
    {TTI::SK_PermuteSingleSrc, MVT::v4i64, 2}, // vperm2f128 + vpermil2pd
    {TTI::SK_PermuteSingleSrc, MVT::v8i32, 2}, // vperm2f128 + vpermil2ps
    {TTI::SK_PermuteSingleSrc, MVT::v16i16, 4}, // vextractf128 + 2*vpperm
                                                // + vinsertf128
    {TTI::SK_PermuteSingleSrc, MVT::v32i8, 4}, // vextractf128 + 2*vpperm
                                               // + vinsertf128

    {TTI::SK_PermuteTwoSrc, MVT::v16i16, 9}, // 2*vextractf128 + 6*vpperm
                                             // + vinsertf128
    {TTI::SK_PermuteTwoSrc, MVT::v8i16, 1}, // vpperm
    {TTI::SK_PermuteTwoSrc, MVT::v32i8, 9}, // 2*vextractf128 + 6*vpperm
                                            // + vinsertf128
    {TTI::SK_PermuteTwoSrc, MVT::v16i8, 1} // vpperm
2017-08-16 21:50:20 +08:00
  };

  if (ST->hasXOP())
    if (const auto *Entry = CostTableLookup(XOPShuffleTbl, Kind, LT.second))
      return LT.first * Entry->Cost;

2017-01-06 01:56:19 +08:00
  static const CostTblEntry AVX1ShuffleTbl[] = {
2018-11-10 03:04:27 +08:00
    {TTI::SK_Broadcast, MVT::v4f64, 2}, // vperm2f128 + vpermilpd
    {TTI::SK_Broadcast, MVT::v8f32, 2}, // vperm2f128 + vpermilps
    {TTI::SK_Broadcast, MVT::v4i64, 2}, // vperm2f128 + vpermilpd
    {TTI::SK_Broadcast, MVT::v8i32, 2}, // vperm2f128 + vpermilps
    {TTI::SK_Broadcast, MVT::v16i16, 3}, // vpshuflw + vpshufd + vinsertf128
    {TTI::SK_Broadcast, MVT::v32i8, 2}, // vpshufb + vinsertf128

    {TTI::SK_Reverse, MVT::v4f64, 2}, // vperm2f128 + vpermilpd
    {TTI::SK_Reverse, MVT::v8f32, 2}, // vperm2f128 + vpermilps
    {TTI::SK_Reverse, MVT::v4i64, 2}, // vperm2f128 + vpermilpd
    {TTI::SK_Reverse, MVT::v8i32, 2}, // vperm2f128 + vpermilps
    {TTI::SK_Reverse, MVT::v16i16, 4}, // vextractf128 + 2*pshufb
                                       // + vinsertf128
    {TTI::SK_Reverse, MVT::v32i8, 4}, // vextractf128 + 2*pshufb
                                      // + vinsertf128

    {TTI::SK_Select, MVT::v4i64, 1}, // vblendpd
    {TTI::SK_Select, MVT::v4f64, 1}, // vblendpd
    {TTI::SK_Select, MVT::v8i32, 1}, // vblendps
    {TTI::SK_Select, MVT::v8f32, 1}, // vblendps
    {TTI::SK_Select, MVT::v16i16, 3}, // vpand + vpandn + vpor
    {TTI::SK_Select, MVT::v32i8, 3}, // vpand + vpandn + vpor

    {TTI::SK_PermuteSingleSrc, MVT::v4f64, 2}, // vperm2f128 + vshufpd
    {TTI::SK_PermuteSingleSrc, MVT::v4i64, 2}, // vperm2f128 + vshufpd
    {TTI::SK_PermuteSingleSrc, MVT::v8f32, 4}, // 2*vperm2f128 + 2*vshufps
    {TTI::SK_PermuteSingleSrc, MVT::v8i32, 4}, // 2*vperm2f128 + 2*vshufps
    {TTI::SK_PermuteSingleSrc, MVT::v16i16, 8}, // vextractf128 + 4*pshufb
2017-08-11 01:27:20 +08:00
                                                // + 2*por + vinsertf128
2018-11-10 03:04:27 +08:00
    {TTI::SK_PermuteSingleSrc, MVT::v32i8, 8}, // vextractf128 + 4*pshufb
2017-08-11 01:27:20 +08:00
                                               // + 2*por + vinsertf128
2017-08-11 03:02:51 +08:00

2018-11-10 03:04:27 +08:00
    {TTI::SK_PermuteTwoSrc, MVT::v4f64, 3}, // 2*vperm2f128 + vshufpd
    {TTI::SK_PermuteTwoSrc, MVT::v4i64, 3}, // 2*vperm2f128 + vshufpd
    {TTI::SK_PermuteTwoSrc, MVT::v8f32, 4}, // 2*vperm2f128 + 2*vshufps
    {TTI::SK_PermuteTwoSrc, MVT::v8i32, 4}, // 2*vperm2f128 + 2*vshufps
    {TTI::SK_PermuteTwoSrc, MVT::v16i16, 15}, // 2*vextractf128 + 8*pshufb
                                              // + 4*por + vinsertf128
    {TTI::SK_PermuteTwoSrc, MVT::v32i8, 15}, // 2*vextractf128 + 8*pshufb
                                             // + 4*por + vinsertf128
2017-01-06 01:56:19 +08:00
  };

  if (ST->hasAVX())
    if (const auto *Entry = CostTableLookup(AVX1ShuffleTbl, Kind, LT.second))
      return LT.first * Entry->Cost;

  static const CostTblEntry SSE41ShuffleTbl[] = {
2018-11-10 03:04:27 +08:00
    {TTI::SK_Select, MVT::v2i64, 1}, // pblendw
    {TTI::SK_Select, MVT::v2f64, 1}, // movsd
    {TTI::SK_Select, MVT::v4i32, 1}, // pblendw
    {TTI::SK_Select, MVT::v4f32, 1}, // blendps
    {TTI::SK_Select, MVT::v8i16, 1}, // pblendw
    {TTI::SK_Select, MVT::v16i8, 1} // pblendvb
2017-01-06 01:56:19 +08:00
  };

  if (ST->hasSSE41())
    if (const auto *Entry = CostTableLookup(SSE41ShuffleTbl, Kind, LT.second))
      return LT.first * Entry->Cost;

  static const CostTblEntry SSSE3ShuffleTbl[] = {
2018-11-10 03:04:27 +08:00
    {TTI::SK_Broadcast, MVT::v8i16, 1}, // pshufb
    {TTI::SK_Broadcast, MVT::v16i8, 1}, // pshufb
2017-01-06 01:56:19 +08:00

2018-11-10 03:04:27 +08:00
    {TTI::SK_Reverse, MVT::v8i16, 1}, // pshufb
    {TTI::SK_Reverse, MVT::v16i8, 1}, // pshufb
2017-01-06 01:56:19 +08:00

2018-11-10 03:04:27 +08:00
    {TTI::SK_Select, MVT::v8i16, 3}, // 2*pshufb + por
    {TTI::SK_Select, MVT::v16i8, 3}, // 2*pshufb + por
2017-02-03 04:27:13 +08:00

2018-11-10 03:04:27 +08:00
    {TTI::SK_PermuteSingleSrc, MVT::v8i16, 1}, // pshufb
    {TTI::SK_PermuteSingleSrc, MVT::v16i8, 1}, // pshufb
2017-08-11 01:27:20 +08:00

2018-11-10 03:04:27 +08:00
    {TTI::SK_PermuteTwoSrc, MVT::v8i16, 3}, // 2*pshufb + por
    {TTI::SK_PermuteTwoSrc, MVT::v16i8, 3}, // 2*pshufb + por
2017-01-06 01:56:19 +08:00
  };

  if (ST->hasSSSE3())
    if (const auto *Entry = CostTableLookup(SSSE3ShuffleTbl, Kind, LT.second))
      return LT.first * Entry->Cost;

  static const CostTblEntry SSE2ShuffleTbl[] = {
2018-11-10 03:04:27 +08:00
    {TTI::SK_Broadcast, MVT::v2f64, 1}, // shufpd
    {TTI::SK_Broadcast, MVT::v2i64, 1}, // pshufd
    {TTI::SK_Broadcast, MVT::v4i32, 1}, // pshufd
    {TTI::SK_Broadcast, MVT::v8i16, 2}, // pshuflw + pshufd
    {TTI::SK_Broadcast, MVT::v16i8, 3}, // unpck + pshuflw + pshufd

    {TTI::SK_Reverse, MVT::v2f64, 1}, // shufpd
    {TTI::SK_Reverse, MVT::v2i64, 1}, // pshufd
    {TTI::SK_Reverse, MVT::v4i32, 1}, // pshufd
    {TTI::SK_Reverse, MVT::v8i16, 3}, // pshuflw + pshufhw + pshufd
    {TTI::SK_Reverse, MVT::v16i8, 9}, // 2*pshuflw + 2*pshufhw
                                      // + 2*pshufd + 2*unpck + packus

    {TTI::SK_Select, MVT::v2i64, 1}, // movsd
    {TTI::SK_Select, MVT::v2f64, 1}, // movsd
    {TTI::SK_Select, MVT::v4i32, 2}, // 2*shufps
    {TTI::SK_Select, MVT::v8i16, 3}, // pand + pandn + por
    {TTI::SK_Select, MVT::v16i8, 3}, // pand + pandn + por

    {TTI::SK_PermuteSingleSrc, MVT::v2f64, 1}, // shufpd
    {TTI::SK_PermuteSingleSrc, MVT::v2i64, 1}, // pshufd
    {TTI::SK_PermuteSingleSrc, MVT::v4i32, 1}, // pshufd
    {TTI::SK_PermuteSingleSrc, MVT::v8i16, 5}, // 2*pshuflw + 2*pshufhw
2017-08-11 01:27:20 +08:00
                                               // + pshufd/unpck
    { TTI::SK_PermuteSingleSrc, MVT::v16i8, 10 }, // 2*pshuflw + 2*pshufhw
                                                  // + 2*pshufd + 2*unpck + 2*packus

    { TTI::SK_PermuteTwoSrc, MVT::v2f64, 1 }, // shufpd
    { TTI::SK_PermuteTwoSrc, MVT::v2i64, 1 }, // shufpd
    { TTI::SK_PermuteTwoSrc, MVT::v4i32, 2 }, // 2*{unpck,movsd,pshufd}
2017-08-11 03:32:35 +08:00
    { TTI::SK_PermuteTwoSrc, MVT::v8i16, 8 }, // blend+permute
    { TTI::SK_PermuteTwoSrc, MVT::v16i8, 13 }, // blend+permute
2017-01-06 01:56:19 +08:00
  };

  if (ST->hasSSE2())
    if (const auto *Entry = CostTableLookup(SSE2ShuffleTbl, Kind, LT.second))
      return LT.first * Entry->Cost;

  static const CostTblEntry SSE1ShuffleTbl[] = {
2017-08-11 01:27:20 +08:00
    { TTI::SK_Broadcast, MVT::v4f32, 1 }, // shufps
    { TTI::SK_Reverse, MVT::v4f32, 1 }, // shufps
[CostModel] Replace ShuffleKind::SK_Alternate with ShuffleKind::SK_Select (PR33744)
As discussed on PR33744, this patch relaxes ShuffleKind::SK_Alternate which requires shuffle masks to only match an alternating pattern from its 2 sources:
e.g. v4f32: <0,5,2,7> or <4,1,6,3>
This seems far too restrictive, as most SIMD hardware will implement it using a general blend/bit-select instruction, so this patch replaces it with SK_Select, permitting elements from either source as long as they are inline (result element i comes from element i of one of the two sources):
e.g. v4f32: <0,5,2,7>, <4,1,6,3>, <0,1,6,7>, <4,1,2,3> etc.
This initial patch just updates the name and cost model shuffle mask analysis, later patch reviews will update SLP to better utilise this - it still limits itself to SK_Alternate style patterns.
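The difference between the two rules can be stated as a check on the shuffle mask. In the sketch below (hypothetical helper names, not the cost model's actual code), indices 0..N-1 name elements of the first source and N..2N-1 elements of the second, as in a `shufflevector` mask; undef (-1) lanes are not handled. A select-style mask only requires that result lane i come from lane i of either source, while the older alternate-style mask additionally requires the source to switch on every lane.

```cpp
#include <cassert>
#include <vector>

// Mask convention: 0..N-1 pick from the first source, N..2N-1 from the second.

// True if every result lane i takes lane i of one of the two sources.
bool isSelectLike(const std::vector<int> &Mask) {
  const int N = static_cast<int>(Mask.size());
  for (int I = 0; I < N; ++I)
    if (Mask[I] != I && Mask[I] != I + N)
      return false;
  return true;
}

// The older, stricter rule: in-place lanes that also alternate sources.
bool isAlternateLike(const std::vector<int> &Mask) {
  if (!isSelectLike(Mask))
    return false;
  const int N = static_cast<int>(Mask.size());
  for (int I = 1; I < N; ++I)
    if ((Mask[I] >= N) == (Mask[I - 1] >= N))
      return false; // two neighbouring lanes came from the same source
  return true;
}
```

Every mask listed for SK_Select above passes `isSelectLike`, while only `<0,5,2,7>` and `<4,1,6,3>` also pass `isAlternateLike`.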
Differential Revision: https://reviews.llvm.org/D47985
llvm-svn: 334513
2018-06-13 00:12:29 +08:00
    { TTI::SK_Select, MVT::v4f32, 2 }, // 2*shufps
2017-08-11 01:27:20 +08:00
    { TTI::SK_PermuteSingleSrc, MVT::v4f32, 1 }, // shufps
    { TTI::SK_PermuteTwoSrc, MVT::v4f32, 2 }, // 2*shufps
2017-01-06 01:56:19 +08:00
  };

  if (ST->hasSSE1())
    if (const auto *Entry = CostTableLookup(SSE1ShuffleTbl, Kind, LT.second))
      return LT.first * Entry->Cost;

[PM] Change the core design of the TTI analysis to use a polymorphic
type erased interface and a single analysis pass rather than an
extremely complex analysis group.
The end result is that the TTI analysis can contain a type erased
implementation that supports the polymorphic TTI interface. We can build
one from a target-specific implementation or from a dummy one in the IR.
I've also factored all of the code into "mix-in"-able base classes,
including CRTP base classes to facilitate calling back up to the most
specialized form when delegating horizontally across the surface. These
aren't as clean as I would like and I'm planning to work on cleaning
some of this up, but I wanted to start by putting into the right form.
There are a number of reasons for this change, and this particular
design. The first and foremost reason is that an analysis group is
complete overkill, and the chaining delegation strategy was so opaque,
confusing, and high overhead that TTI was suffering greatly for it.
Several of the TTI functions had failed to be implemented in all places
because of the chaining-based delegation making there be no checking of
this. A few other functions were implemented with incorrect delegation.
The message to me was very clear working on this -- the delegation and
analysis group structure was too confusing to be useful here.
The other reason of course is that this is *much* more natural fit for
the new pass manager. This will lay the ground work for a type-erased
per-function info object that can look up the correct subtarget and even
cache it.
Yet another benefit is that this will significantly simplify the
interaction of the pass managers and the TargetMachine. See the future
work below.
The downside of this change is that it is very, very verbose. I'm going
to work to improve that, but it is somewhat an implementation necessity
in C++ to do type erasure. =/ I discussed this design really extensively
with Eric and Hal prior to going down this path, and afterward showed
them the result. No one was really thrilled with it, but there doesn't
seem to be a substantially better alternative. Using a base class and
virtual method dispatch would make the code much shorter, but as
discussed in the update to the programmer's manual and elsewhere,
a polymorphic interface feels like the more principled approach even if
this is perhaps the least compelling example of it. ;]
Ultimately, there is still a lot more to be done here, but this was the
huge chunk that I couldn't really split things out of because this was
the interface change to TTI. I've tried to minimize all the other parts
of this. The follow up work should include at least:
1) Improving the TargetMachine interface by having it directly return
a TTI object. Because we have a non-pass object with value semantics
and an internal type erasure mechanism, we can narrow the interface
of the TargetMachine to *just* do what we need: build and return
a TTI object that we can then insert into the pass pipeline.
2) Make the TTI object be fully specialized for a particular function.
This will include splitting off a minimal form of it which is
sufficient for the inliner and the old pass manager.
3) Add a new pass manager analysis which produces TTI objects from the
target machine for each function. This may actually be done as part
of #2 in order to use the new analysis to implement #2.
4) Work on narrowing the API between TTI and the targets so that it is
easier to understand and less verbose to type erase.
5) Work on narrowing the API between TTI and its clients so that it is
easier to understand and less verbose to forward.
6) Try to improve the CRTP-based delegation. I feel like this code is
just a bit messy and exacerbating the complexity of implementing
the TTI in each target.
Many thanks to Eric and Hal for their help here. I ended up blocked on
this somewhat more abruptly than I expected, and so I appreciate getting
it sorted out very quickly.
Differential Revision: http://reviews.llvm.org/D7293
llvm-svn: 227669
2015-01-31 11:43:40 +08:00
|
|
|
  return BaseT::getShuffleCost(Kind, Tp, Index, SubTp);
}
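When no SSE table matches, `getShuffleCost` defers to `BaseT::getShuffleCost`, the generic implementation that the X86 class derives from. LLVM layers these implementations with the curiously recurring template pattern (CRTP). The target class overrides only the queries it knows better. The generic base can still call back into the most derived implementation. The sketch below shows that delegation in miniature. `GenericCostModel`, `TargetCostModel` and all costs are hypothetical and do not reflect LLVM's actual classes or numbers.

```cpp
#include <cassert>

// Generic, target-independent layer parameterized on the most derived type.
template <typename DerivedT> class GenericCostModel {
public:
  // Naive estimate: one unit of cost per element.
  int getShuffleCost(int NumElts) const {
    assert(NumElts > 0 && "vector must have elements");
    return NumElts;
  }

  // Written in terms of the *derived* getShuffleCost, so a target override
  // is picked up even though this helper lives in the base.
  int getTwoShuffleCost(int NumElts) const {
    const auto &Self = static_cast<const DerivedT &>(*this);
    return 2 * Self.getShuffleCost(NumElts);
  }
};

class TargetCostModel : public GenericCostModel<TargetCostModel> {
  using BaseT = GenericCostModel<TargetCostModel>;

public:
  int getShuffleCost(int NumElts) const {
    // "Table hit": pretend one instruction handles any 4-element shuffle.
    if (NumElts == 4)
      return 1;
    // No target-specific knowledge: defer to the generic layer.
    return BaseT::getShuffleCost(NumElts);
  }
};
```

The design choice here is static dispatch. The base reaches the target override through `static_cast` rather than a virtual call, and the target falls back to the base with a qualified `BaseT::` call.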
int X86TTIImpl::getCastInstrCost(unsigned Opcode, Type *Dst, Type *Src,
                                 const Instruction *I) {
  int ISD = TLI->InstructionOpcodeToISD(Opcode);
  assert(ISD && "Invalid opcode");
  // FIXME: Need a better cost-table design to handle non-simple types, whose
  // combinations (elem_num x src_type x dst_type) are potentially massive.
  static const TypeConversionCostTblEntry AVX512BWConversionTbl[] = {
    { ISD::SIGN_EXTEND, MVT::v32i16, MVT::v32i8, 1 },
    { ISD::ZERO_EXTEND, MVT::v32i16, MVT::v32i8, 1 },

    // Mask sign extend has an instruction.
    { ISD::SIGN_EXTEND, MVT::v8i16, MVT::v8i1, 1 },
    { ISD::SIGN_EXTEND, MVT::v16i8, MVT::v16i1, 1 },
    { ISD::SIGN_EXTEND, MVT::v16i16, MVT::v16i1, 1 },
    { ISD::SIGN_EXTEND, MVT::v32i8, MVT::v32i1, 1 },
    { ISD::SIGN_EXTEND, MVT::v32i16, MVT::v32i1, 1 },
    { ISD::SIGN_EXTEND, MVT::v64i8, MVT::v64i1, 1 },

    // Mask zero extend is a load + broadcast.
    { ISD::ZERO_EXTEND, MVT::v8i16, MVT::v8i1, 2 },
    { ISD::ZERO_EXTEND, MVT::v16i8, MVT::v16i1, 2 },
    { ISD::ZERO_EXTEND, MVT::v16i16, MVT::v16i1, 2 },
    { ISD::ZERO_EXTEND, MVT::v32i8, MVT::v32i1, 2 },
    { ISD::ZERO_EXTEND, MVT::v32i16, MVT::v32i1, 2 },
    { ISD::ZERO_EXTEND, MVT::v64i8, MVT::v64i1, 2 },
  };
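Each conversion row is keyed by three fields: ISD opcode, destination type, then source type. The same types can carry different costs for different opcodes. In the AVX512BW table, sign-extending a v8i1 mask to v8i16 has a dedicated instruction (cost 1), while zero-extending it is a load + broadcast (cost 2). LLVM's helper for these tables is `ConvertCostTableLookup`. The version below is a simplified, hypothetical stand-in using toy enums, meant only to show the first-match search and the destination-before-source argument order.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for ISD opcodes and MVT values.
enum class Op { SIGN_EXTEND, ZERO_EXTEND };
enum class VT { v32i16, v32i8, v8i16, v8i1 };

struct ConvCostEntry {
  Op ISD;
  VT Dst;
  VT Src;
  int Cost;
};

// A few rows copied from AVX512BWConversionTbl.
static const ConvCostEntry BWTblSketch[] = {
    {Op::SIGN_EXTEND, VT::v32i16, VT::v32i8, 1},
    {Op::ZERO_EXTEND, VT::v32i16, VT::v32i8, 1},
    {Op::SIGN_EXTEND, VT::v8i16, VT::v8i1, 1}, // mask sext: one instruction
    {Op::ZERO_EXTEND, VT::v8i16, VT::v8i1, 2}, // mask zext: load + broadcast
};

// Returns the first row matching (opcode, Dst, Src), or nullptr.
// Argument order mirrors the table columns: destination before source.
template <std::size_t N>
const ConvCostEntry *lookupConvCost(const ConvCostEntry (&Tbl)[N], Op ISD,
                                    VT Dst, VT Src) {
  for (const ConvCostEntry &E : Tbl)
    if (E.ISD == ISD && E.Dst == Dst && E.Src == Src)
      return &E;
  return nullptr;
}
```

Swapping the two type arguments silently turns a hit into a miss, so call sites need to pass the destination type first.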
  static const TypeConversionCostTblEntry AVX512DQConversionTbl[] = {
    { ISD::SINT_TO_FP, MVT::v2f32, MVT::v2i64, 1 },
    { ISD::SINT_TO_FP, MVT::v2f64, MVT::v2i64, 1 },
    { ISD::SINT_TO_FP, MVT::v4f32, MVT::v4i64, 1 },
    { ISD::SINT_TO_FP, MVT::v4f64, MVT::v4i64, 1 },
    { ISD::SINT_TO_FP, MVT::v8f32, MVT::v8i64, 1 },
    { ISD::SINT_TO_FP, MVT::v8f64, MVT::v8i64, 1 },

    { ISD::UINT_TO_FP, MVT::v2f32, MVT::v2i64, 1 },
    { ISD::UINT_TO_FP, MVT::v2f64, MVT::v2i64, 1 },
    { ISD::UINT_TO_FP, MVT::v4f32, MVT::v4i64, 1 },
    { ISD::UINT_TO_FP, MVT::v4f64, MVT::v4i64, 1 },
    { ISD::UINT_TO_FP, MVT::v8f32, MVT::v8i64, 1 },
    { ISD::UINT_TO_FP, MVT::v8f64, MVT::v8i64, 1 },

    { ISD::FP_TO_SINT, MVT::v2i64, MVT::v2f32, 1 },
    { ISD::FP_TO_SINT, MVT::v4i64, MVT::v4f32, 1 },
    { ISD::FP_TO_SINT, MVT::v8i64, MVT::v8f32, 1 },
    { ISD::FP_TO_SINT, MVT::v2i64, MVT::v2f64, 1 },
    { ISD::FP_TO_SINT, MVT::v4i64, MVT::v4f64, 1 },
    { ISD::FP_TO_SINT, MVT::v8i64, MVT::v8f64, 1 },

    { ISD::FP_TO_UINT, MVT::v2i64, MVT::v2f32, 1 },
    { ISD::FP_TO_UINT, MVT::v4i64, MVT::v4f32, 1 },
    { ISD::FP_TO_UINT, MVT::v8i64, MVT::v8f32, 1 },
    { ISD::FP_TO_UINT, MVT::v2i64, MVT::v2f64, 1 },
    { ISD::FP_TO_UINT, MVT::v4i64, MVT::v4f64, 1 },
    { ISD::FP_TO_UINT, MVT::v8i64, MVT::v8f64, 1 },
  };
  // TODO: For AVX512DQ + AVX512VL, we also have cheap casts for 128-bit and
  // 256-bit wide vectors.

  static const TypeConversionCostTblEntry AVX512FConversionTbl[] = {
    { ISD::FP_EXTEND, MVT::v8f64, MVT::v8f32, 1 },
    { ISD::FP_EXTEND, MVT::v8f64, MVT::v16f32, 3 },
    { ISD::FP_ROUND, MVT::v8f32, MVT::v8f64, 1 },

    { ISD::TRUNCATE, MVT::v16i8, MVT::v16i32, 1 },
    { ISD::TRUNCATE, MVT::v16i16, MVT::v16i32, 1 },
    { ISD::TRUNCATE, MVT::v8i16, MVT::v8i64, 1 },
    { ISD::TRUNCATE, MVT::v8i32, MVT::v8i64, 1 },

    // v16i1 -> v16i32 - load + broadcast
    { ISD::SIGN_EXTEND, MVT::v16i32, MVT::v16i1, 2 },
    { ISD::ZERO_EXTEND, MVT::v16i32, MVT::v16i1, 2 },
    { ISD::SIGN_EXTEND, MVT::v16i32, MVT::v16i8, 1 },
    { ISD::ZERO_EXTEND, MVT::v16i32, MVT::v16i8, 1 },
    { ISD::SIGN_EXTEND, MVT::v16i32, MVT::v16i16, 1 },
    { ISD::ZERO_EXTEND, MVT::v16i32, MVT::v16i16, 1 },
    { ISD::SIGN_EXTEND, MVT::v8i64, MVT::v8i8, 1 },
    { ISD::ZERO_EXTEND, MVT::v8i64, MVT::v8i8, 1 },
    { ISD::SIGN_EXTEND, MVT::v8i64, MVT::v8i16, 1 },
    { ISD::ZERO_EXTEND, MVT::v8i64, MVT::v8i16, 1 },
    { ISD::SIGN_EXTEND, MVT::v8i64, MVT::v8i32, 1 },
    { ISD::ZERO_EXTEND, MVT::v8i64, MVT::v8i32, 1 },

    { ISD::SINT_TO_FP, MVT::v8f64, MVT::v8i1, 4 },
    { ISD::SINT_TO_FP, MVT::v16f32, MVT::v16i1, 3 },
    { ISD::SINT_TO_FP, MVT::v8f64, MVT::v8i8, 2 },
    { ISD::SINT_TO_FP, MVT::v16f32, MVT::v16i8, 2 },
    { ISD::SINT_TO_FP, MVT::v8f64, MVT::v8i16, 2 },
    { ISD::SINT_TO_FP, MVT::v16f32, MVT::v16i16, 2 },
    { ISD::SINT_TO_FP, MVT::v16f32, MVT::v16i32, 1 },
    { ISD::SINT_TO_FP, MVT::v8f64, MVT::v8i32, 1 },

    { ISD::UINT_TO_FP, MVT::v8f64, MVT::v8i1, 4 },
    { ISD::UINT_TO_FP, MVT::v16f32, MVT::v16i1, 3 },
    { ISD::UINT_TO_FP, MVT::v2f64, MVT::v2i8, 2 },
    { ISD::UINT_TO_FP, MVT::v4f64, MVT::v4i8, 2 },
    { ISD::UINT_TO_FP, MVT::v8f32, MVT::v8i8, 2 },
    { ISD::UINT_TO_FP, MVT::v8f64, MVT::v8i8, 2 },
    { ISD::UINT_TO_FP, MVT::v16f32, MVT::v16i8, 2 },
    { ISD::UINT_TO_FP, MVT::v2f64, MVT::v2i16, 5 },
    { ISD::UINT_TO_FP, MVT::v4f64, MVT::v4i16, 2 },
    { ISD::UINT_TO_FP, MVT::v8f32, MVT::v8i16, 2 },
    { ISD::UINT_TO_FP, MVT::v8f64, MVT::v8i16, 2 },
    { ISD::UINT_TO_FP, MVT::v16f32, MVT::v16i16, 2 },
    { ISD::UINT_TO_FP, MVT::v2f32, MVT::v2i32, 2 },
    { ISD::UINT_TO_FP, MVT::v2f64, MVT::v2i32, 1 },
    { ISD::UINT_TO_FP, MVT::v4f32, MVT::v4i32, 1 },
    { ISD::UINT_TO_FP, MVT::v4f64, MVT::v4i32, 1 },
    { ISD::UINT_TO_FP, MVT::v8f32, MVT::v8i32, 1 },
    { ISD::UINT_TO_FP, MVT::v8f64, MVT::v8i32, 1 },
    { ISD::UINT_TO_FP, MVT::v16f32, MVT::v16i32, 1 },
    { ISD::UINT_TO_FP, MVT::v2f32, MVT::v2i64, 5 },
    { ISD::UINT_TO_FP, MVT::v8f32, MVT::v8i64, 26 },
    { ISD::UINT_TO_FP, MVT::v2f64, MVT::v2i64, 5 },
    { ISD::UINT_TO_FP, MVT::v4f64, MVT::v4i64, 5 },
    { ISD::UINT_TO_FP, MVT::v8f64, MVT::v8i64, 5 },

    { ISD::UINT_TO_FP, MVT::f32, MVT::i64, 1 },
    { ISD::UINT_TO_FP, MVT::f64, MVT::i64, 1 },
    { ISD::FP_TO_UINT, MVT::i64, MVT::f32, 1 },
    { ISD::FP_TO_UINT, MVT::i64, MVT::f64, 1 },

    { ISD::FP_TO_UINT, MVT::v2i32, MVT::v2f32, 1 },
    { ISD::FP_TO_UINT, MVT::v4i32, MVT::v4f32, 1 },
    { ISD::FP_TO_UINT, MVT::v4i32, MVT::v4f64, 1 },
    { ISD::FP_TO_UINT, MVT::v8i32, MVT::v8f32, 1 },
    { ISD::FP_TO_UINT, MVT::v8i32, MVT::v8f64, 1 },
    { ISD::FP_TO_UINT, MVT::v8i16, MVT::v8f64, 2 },
    { ISD::FP_TO_UINT, MVT::v8i8, MVT::v8f64, 2 },
    { ISD::FP_TO_UINT, MVT::v16i32, MVT::v16f32, 1 },
    { ISD::FP_TO_UINT, MVT::v16i16, MVT::v16f32, 2 },
    { ISD::FP_TO_UINT, MVT::v16i8, MVT::v16f32, 2 },
  };
  static const TypeConversionCostTblEntry AVX2ConversionTbl[] = {
    { ISD::SIGN_EXTEND, MVT::v4i64, MVT::v4i1, 3 },
    { ISD::ZERO_EXTEND, MVT::v4i64, MVT::v4i1, 3 },
    { ISD::SIGN_EXTEND, MVT::v8i32, MVT::v8i1, 3 },
    { ISD::ZERO_EXTEND, MVT::v8i32, MVT::v8i1, 3 },
    { ISD::SIGN_EXTEND, MVT::v4i64, MVT::v4i8, 1 },
    { ISD::ZERO_EXTEND, MVT::v4i64, MVT::v4i8, 1 },
    { ISD::SIGN_EXTEND, MVT::v8i32, MVT::v8i8, 1 },
    { ISD::ZERO_EXTEND, MVT::v8i32, MVT::v8i8, 1 },
    { ISD::SIGN_EXTEND, MVT::v16i16, MVT::v16i8, 1 },
    { ISD::ZERO_EXTEND, MVT::v16i16, MVT::v16i8, 1 },
    { ISD::SIGN_EXTEND, MVT::v4i64, MVT::v4i16, 1 },
    { ISD::ZERO_EXTEND, MVT::v4i64, MVT::v4i16, 1 },
    { ISD::SIGN_EXTEND, MVT::v8i32, MVT::v8i16, 1 },
    { ISD::ZERO_EXTEND, MVT::v8i32, MVT::v8i16, 1 },
    { ISD::SIGN_EXTEND, MVT::v4i64, MVT::v4i32, 1 },
    { ISD::ZERO_EXTEND, MVT::v4i64, MVT::v4i32, 1 },

    { ISD::TRUNCATE, MVT::v4i8, MVT::v4i64, 2 },
    { ISD::TRUNCATE, MVT::v4i16, MVT::v4i64, 2 },
    { ISD::TRUNCATE, MVT::v4i32, MVT::v4i64, 2 },
    { ISD::TRUNCATE, MVT::v8i8, MVT::v8i32, 2 },
    { ISD::TRUNCATE, MVT::v8i16, MVT::v8i32, 2 },
    { ISD::TRUNCATE, MVT::v8i32, MVT::v8i64, 4 },

    { ISD::FP_EXTEND, MVT::v8f64, MVT::v8f32, 3 },
    { ISD::FP_ROUND, MVT::v8f32, MVT::v8f64, 3 },

    { ISD::UINT_TO_FP, MVT::v8f32, MVT::v8i32, 8 },
  };
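The tables run from the widest feature set (AVX-512 BW, DQ, F) down to AVX2 and AVX, and the same conversion can appear in several of them at different costs. For example, sign-extending v8i16 to v8i32 costs 1 in the AVX2 table but 4 in the AVX table, and the AVX512F table has no row for it. The code that consults these tables lies outside this excerpt. The sketch below assumes, as the ordering suggests, that the most capable available feature level is queried first, the first hit wins, and a generic estimate is the last resort. `Features`, `kNoEntry` and the per-table functions are hypothetical.

```cpp
#include <cassert>

// Hypothetical stand-in for subtarget feature queries such as ST->hasAVX2().
struct Features {
  bool AVX512F;
  bool AVX2;
  bool AVX;
};

// Sentinel meaning "this table has no row for the conversion".
const int kNoEntry = -1;

// Per-table costs for SIGN_EXTEND v8i16 -> v8i32, copied from the tables
// above; AVX512FConversionTbl has no such row.
int avx512fSextCost() { return kNoEntry; }
int avx2SextCost() { return 1; }
int avxSextCost() { return 4; }

// Consult the most capable feature level first; the first table hit wins.
int sextV8I16ToV8I32Cost(const Features &F, int GenericCost) {
  if (F.AVX512F && avx512fSextCost() != kNoEntry)
    return avx512fSextCost();
  if (F.AVX2 && avx2SextCost() != kNoEntry)
    return avx2SextCost();
  if (F.AVX && avxSextCost() != kNoEntry)
    return avxSextCost();
  // No table matched: fall back to a generic, target-independent estimate.
  return GenericCost;
}
```

Under this scheme, an AVX-512 machine with no AVX512F row for the conversion still gets the cheaper AVX2 cost. A wider table therefore only needs a row when that feature level actually changes the estimate.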
  static const TypeConversionCostTblEntry AVXConversionTbl[] = {
    { ISD::SIGN_EXTEND, MVT::v4i64, MVT::v4i1, 6 },
    { ISD::ZERO_EXTEND, MVT::v4i64, MVT::v4i1, 4 },
    { ISD::SIGN_EXTEND, MVT::v8i32, MVT::v8i1, 7 },
    { ISD::ZERO_EXTEND, MVT::v8i32, MVT::v8i1, 4 },
    { ISD::SIGN_EXTEND, MVT::v4i64, MVT::v4i8, 4 },
    { ISD::ZERO_EXTEND, MVT::v4i64, MVT::v4i8, 4 },
    { ISD::SIGN_EXTEND, MVT::v8i32, MVT::v8i8, 4 },
    { ISD::ZERO_EXTEND, MVT::v8i32, MVT::v8i8, 4 },
    { ISD::SIGN_EXTEND, MVT::v16i16, MVT::v16i8, 4 },
    { ISD::ZERO_EXTEND, MVT::v16i16, MVT::v16i8, 4 },
    { ISD::SIGN_EXTEND, MVT::v4i64, MVT::v4i16, 4 },
    { ISD::ZERO_EXTEND, MVT::v4i64, MVT::v4i16, 3 },
    { ISD::SIGN_EXTEND, MVT::v8i32, MVT::v8i16, 4 },
    { ISD::ZERO_EXTEND, MVT::v8i32, MVT::v8i16, 4 },
    { ISD::SIGN_EXTEND, MVT::v4i64, MVT::v4i32, 4 },
    { ISD::ZERO_EXTEND, MVT::v4i64, MVT::v4i32, 4 },

    { ISD::TRUNCATE, MVT::v16i8, MVT::v16i16, 4 },
    { ISD::TRUNCATE, MVT::v8i8, MVT::v8i32, 4 },
    { ISD::TRUNCATE, MVT::v8i16, MVT::v8i32, 5 },
    { ISD::TRUNCATE, MVT::v4i8, MVT::v4i64, 4 },
    { ISD::TRUNCATE, MVT::v4i16, MVT::v4i64, 4 },
    { ISD::TRUNCATE, MVT::v4i32, MVT::v4i64, 4 },
    { ISD::TRUNCATE, MVT::v8i8, MVT::v8i64, 11 },
    { ISD::TRUNCATE, MVT::v8i16, MVT::v8i64, 9 },
    { ISD::TRUNCATE, MVT::v8i32, MVT::v8i64, 9 },
    { ISD::TRUNCATE, MVT::v16i8, MVT::v16i64, 11 },

    { ISD::SINT_TO_FP, MVT::v4f32, MVT::v4i1, 3 },
    { ISD::SINT_TO_FP, MVT::v4f64, MVT::v4i1, 3 },
    { ISD::SINT_TO_FP, MVT::v8f32, MVT::v8i1, 8 },
    { ISD::SINT_TO_FP, MVT::v4f32, MVT::v4i8, 3 },
    { ISD::SINT_TO_FP, MVT::v4f64, MVT::v4i8, 3 },
    { ISD::SINT_TO_FP, MVT::v8f32, MVT::v8i8, 8 },
    { ISD::SINT_TO_FP, MVT::v4f32, MVT::v4i16, 3 },
    { ISD::SINT_TO_FP, MVT::v4f64, MVT::v4i16, 3 },
    { ISD::SINT_TO_FP, MVT::v8f32, MVT::v8i16, 5 },
    { ISD::SINT_TO_FP, MVT::v4f32, MVT::v4i32, 1 },
    { ISD::SINT_TO_FP, MVT::v4f64, MVT::v4i32, 1 },
    { ISD::SINT_TO_FP, MVT::v8f32, MVT::v8i32, 1 },

    { ISD::UINT_TO_FP, MVT::v4f32, MVT::v4i1, 7 },
    { ISD::UINT_TO_FP, MVT::v4f64, MVT::v4i1, 7 },
    { ISD::UINT_TO_FP, MVT::v8f32, MVT::v8i1, 6 },
    { ISD::UINT_TO_FP, MVT::v4f32, MVT::v4i8, 2 },
    { ISD::UINT_TO_FP, MVT::v4f64, MVT::v4i8, 2 },
    { ISD::UINT_TO_FP, MVT::v8f32, MVT::v8i8, 5 },
    { ISD::UINT_TO_FP, MVT::v4f32, MVT::v4i16, 2 },
    { ISD::UINT_TO_FP, MVT::v4f64, MVT::v4i16, 2 },
    { ISD::UINT_TO_FP, MVT::v8f32, MVT::v8i16, 5 },
    { ISD::UINT_TO_FP, MVT::v2f64, MVT::v2i32, 6 },
    { ISD::UINT_TO_FP, MVT::v4f32, MVT::v4i32, 6 },
    { ISD::UINT_TO_FP, MVT::v4f64, MVT::v4i32, 6 },
    { ISD::UINT_TO_FP, MVT::v8f32, MVT::v8i32, 9 },
    { ISD::UINT_TO_FP, MVT::v2f64, MVT::v2i64, 5 },
    { ISD::UINT_TO_FP, MVT::v4f64, MVT::v4i64, 6 },
    // The generic code to compute the scalar overhead is currently broken.
    // Work around this limitation by estimating the scalarization overhead
    // here. We have roughly 10 instructions per scalar element.
    // Multiply that by the vector width.
    // FIXME: remove that when PR19268 is fixed.
    { ISD::SINT_TO_FP, MVT::v4f64, MVT::v4i64, 13 },

    { ISD::FP_TO_SINT, MVT::v4i8, MVT::v4f32, 1 },
    { ISD::FP_TO_SINT, MVT::v8i8, MVT::v8f32, 7 },
    // This node is expanded into scalarized operations, but BasicTTI is overly
    // optimistic in estimating its cost. It computes 3 per element (one
    // vector-extract, one scalar conversion and one vector-insert). The
    // problem is that the inserts form a read-modify-write chain, so latency
    // should be factored in too. Inflate the cost per element by 1 (3 + 1 = 4).
    { ISD::FP_TO_UINT, MVT::v8i32, MVT::v8f32, 8*4 },
    { ISD::FP_TO_UINT, MVT::v4i32, MVT::v4f64, 4*4 },

    { ISD::FP_EXTEND, MVT::v4f64, MVT::v4f32, 1 },
    { ISD::FP_ROUND, MVT::v4f32, MVT::v4f64, 1 },
2013-01-07 09:37:14 +08:00
  };

  static const TypeConversionCostTblEntry SSE41ConversionTbl[] = {
    { ISD::ZERO_EXTEND, MVT::v4i64, MVT::v4i8, 2 },
    { ISD::SIGN_EXTEND, MVT::v4i64, MVT::v4i8, 2 },
    { ISD::ZERO_EXTEND, MVT::v4i64, MVT::v4i16, 2 },
    { ISD::SIGN_EXTEND, MVT::v4i64, MVT::v4i16, 2 },
    { ISD::ZERO_EXTEND, MVT::v4i64, MVT::v4i32, 2 },
    { ISD::SIGN_EXTEND, MVT::v4i64, MVT::v4i32, 2 },

    { ISD::ZERO_EXTEND, MVT::v4i16, MVT::v4i8, 1 },
    { ISD::SIGN_EXTEND, MVT::v4i16, MVT::v4i8, 2 },
    { ISD::ZERO_EXTEND, MVT::v4i32, MVT::v4i8, 1 },
    { ISD::SIGN_EXTEND, MVT::v4i32, MVT::v4i8, 1 },
    { ISD::ZERO_EXTEND, MVT::v8i16, MVT::v8i8, 1 },
    { ISD::SIGN_EXTEND, MVT::v8i16, MVT::v8i8, 1 },
    { ISD::ZERO_EXTEND, MVT::v8i32, MVT::v8i8, 2 },
    { ISD::SIGN_EXTEND, MVT::v8i32, MVT::v8i8, 2 },
    { ISD::ZERO_EXTEND, MVT::v16i16, MVT::v16i8, 2 },
    { ISD::SIGN_EXTEND, MVT::v16i16, MVT::v16i8, 2 },
    { ISD::ZERO_EXTEND, MVT::v16i32, MVT::v16i8, 4 },
    { ISD::SIGN_EXTEND, MVT::v16i32, MVT::v16i8, 4 },
    { ISD::ZERO_EXTEND, MVT::v4i32, MVT::v4i16, 1 },
    { ISD::SIGN_EXTEND, MVT::v4i32, MVT::v4i16, 1 },
    { ISD::ZERO_EXTEND, MVT::v8i32, MVT::v8i16, 2 },
    { ISD::SIGN_EXTEND, MVT::v8i32, MVT::v8i16, 2 },
    { ISD::ZERO_EXTEND, MVT::v16i32, MVT::v16i16, 4 },
    { ISD::SIGN_EXTEND, MVT::v16i32, MVT::v16i16, 4 },

    { ISD::TRUNCATE, MVT::v4i8, MVT::v4i16, 2 },
    { ISD::TRUNCATE, MVT::v8i8, MVT::v8i16, 1 },
    { ISD::TRUNCATE, MVT::v4i8, MVT::v4i32, 1 },
    { ISD::TRUNCATE, MVT::v4i16, MVT::v4i32, 1 },
    { ISD::TRUNCATE, MVT::v8i8, MVT::v8i32, 3 },
    { ISD::TRUNCATE, MVT::v8i16, MVT::v8i32, 3 },
    { ISD::TRUNCATE, MVT::v16i16, MVT::v16i32, 6 },
    { ISD::TRUNCATE, MVT::v2i8, MVT::v2i64, 1 }, // PSHUFB

    { ISD::UINT_TO_FP, MVT::f32, MVT::i64, 4 },
    { ISD::UINT_TO_FP, MVT::f64, MVT::i64, 4 },
  };

  static const TypeConversionCostTblEntry SSE2ConversionTbl[] = {
    // These are somewhat magic numbers justified by looking at the output of
    // Intel's IACA, running some kernels and making sure when we take
    // legalization into account the throughput will be overestimated.
    { ISD::SINT_TO_FP, MVT::v4f32, MVT::v16i8, 8 },
    { ISD::SINT_TO_FP, MVT::v2f64, MVT::v16i8, 16*10 },
    { ISD::SINT_TO_FP, MVT::v4f32, MVT::v8i16, 15 },
    { ISD::SINT_TO_FP, MVT::v2f64, MVT::v8i16, 8*10 },
    { ISD::SINT_TO_FP, MVT::v4f32, MVT::v4i32, 5 },
    { ISD::SINT_TO_FP, MVT::v2f64, MVT::v4i32, 2*10 },
    { ISD::SINT_TO_FP, MVT::v2f64, MVT::v2i32, 2*10 },
    { ISD::SINT_TO_FP, MVT::v4f32, MVT::v2i64, 15 },
    { ISD::SINT_TO_FP, MVT::v2f64, MVT::v2i64, 2*10 },

    { ISD::UINT_TO_FP, MVT::v2f64, MVT::v16i8, 16*10 },
    { ISD::UINT_TO_FP, MVT::v4f32, MVT::v16i8, 8 },
    { ISD::UINT_TO_FP, MVT::v4f32, MVT::v8i16, 15 },
    { ISD::UINT_TO_FP, MVT::v2f64, MVT::v8i16, 8*10 },
    { ISD::UINT_TO_FP, MVT::v2f64, MVT::v4i32, 4*10 },
    { ISD::UINT_TO_FP, MVT::v4f32, MVT::v4i32, 8 },
    { ISD::UINT_TO_FP, MVT::v2f64, MVT::v2i64, 6 },
    { ISD::UINT_TO_FP, MVT::v4f32, MVT::v2i64, 15 },

    { ISD::FP_TO_SINT, MVT::v4i16, MVT::v4f32, 2 },
    { ISD::FP_TO_SINT, MVT::v2i16, MVT::v2f64, 2 },

    { ISD::FP_TO_SINT, MVT::v2i32, MVT::v2f64, 3 },

    { ISD::UINT_TO_FP, MVT::f32, MVT::i64, 6 },
    { ISD::UINT_TO_FP, MVT::f64, MVT::i64, 6 },

    { ISD::FP_TO_UINT, MVT::i64, MVT::f32, 4 },
    { ISD::FP_TO_UINT, MVT::i64, MVT::f64, 4 },

    { ISD::ZERO_EXTEND, MVT::v4i16, MVT::v4i8, 1 },
    { ISD::SIGN_EXTEND, MVT::v4i16, MVT::v4i8, 6 },
    { ISD::ZERO_EXTEND, MVT::v4i32, MVT::v4i8, 2 },
    { ISD::SIGN_EXTEND, MVT::v4i32, MVT::v4i8, 3 },
    { ISD::ZERO_EXTEND, MVT::v4i64, MVT::v4i8, 4 },
    { ISD::SIGN_EXTEND, MVT::v4i64, MVT::v4i8, 8 },
    { ISD::ZERO_EXTEND, MVT::v8i16, MVT::v8i8, 1 },
    { ISD::SIGN_EXTEND, MVT::v8i16, MVT::v8i8, 2 },
    { ISD::ZERO_EXTEND, MVT::v8i32, MVT::v8i8, 6 },
    { ISD::SIGN_EXTEND, MVT::v8i32, MVT::v8i8, 6 },
    { ISD::ZERO_EXTEND, MVT::v16i16, MVT::v16i8, 3 },
    { ISD::SIGN_EXTEND, MVT::v16i16, MVT::v16i8, 4 },
    { ISD::ZERO_EXTEND, MVT::v16i32, MVT::v16i8, 9 },
    { ISD::SIGN_EXTEND, MVT::v16i32, MVT::v16i8, 12 },
    { ISD::ZERO_EXTEND, MVT::v4i32, MVT::v4i16, 1 },
    { ISD::SIGN_EXTEND, MVT::v4i32, MVT::v4i16, 2 },
    { ISD::ZERO_EXTEND, MVT::v4i64, MVT::v4i16, 3 },
    { ISD::SIGN_EXTEND, MVT::v4i64, MVT::v4i16, 10 },
    { ISD::ZERO_EXTEND, MVT::v8i32, MVT::v8i16, 3 },
    { ISD::SIGN_EXTEND, MVT::v8i32, MVT::v8i16, 4 },
    { ISD::ZERO_EXTEND, MVT::v16i32, MVT::v16i16, 6 },
    { ISD::SIGN_EXTEND, MVT::v16i32, MVT::v16i16, 8 },
    { ISD::ZERO_EXTEND, MVT::v4i64, MVT::v4i32, 3 },
    { ISD::SIGN_EXTEND, MVT::v4i64, MVT::v4i32, 5 },

    { ISD::TRUNCATE, MVT::v2i8, MVT::v2i16, 2 }, // PAND+PACKUSWB
    { ISD::TRUNCATE, MVT::v4i8, MVT::v4i16, 4 },
    { ISD::TRUNCATE, MVT::v8i8, MVT::v8i16, 2 },
    { ISD::TRUNCATE, MVT::v16i8, MVT::v16i16, 3 },
    { ISD::TRUNCATE, MVT::v2i8, MVT::v2i32, 3 }, // PAND+3*PACKUSWB
    { ISD::TRUNCATE, MVT::v2i16, MVT::v2i32, 1 },
    { ISD::TRUNCATE, MVT::v4i8, MVT::v4i32, 3 },
    { ISD::TRUNCATE, MVT::v4i16, MVT::v4i32, 3 },
    { ISD::TRUNCATE, MVT::v8i8, MVT::v8i32, 4 },
    { ISD::TRUNCATE, MVT::v16i8, MVT::v16i32, 7 },
    { ISD::TRUNCATE, MVT::v8i16, MVT::v8i32, 5 },
    { ISD::TRUNCATE, MVT::v16i16, MVT::v16i32, 10 },
    { ISD::TRUNCATE, MVT::v2i8, MVT::v2i64, 4 }, // PAND+3*PACKUSWB
    { ISD::TRUNCATE, MVT::v2i16, MVT::v2i64, 2 }, // PSHUFD+PSHUFLW
    { ISD::TRUNCATE, MVT::v2i32, MVT::v2i64, 1 }, // PSHUFD
  };

  std::pair<int, MVT> LTSrc = TLI->getTypeLegalizationCost(DL, Src);
  std::pair<int, MVT> LTDest = TLI->getTypeLegalizationCost(DL, Dst);

  if (ST->hasSSE2() && !ST->hasAVX()) {
    if (const auto *Entry = ConvertCostTableLookup(SSE2ConversionTbl, ISD,
                                                   LTDest.second, LTSrc.second))
      return LTSrc.first * Entry->Cost;
  }

  EVT SrcTy = TLI->getValueType(DL, Src);
  EVT DstTy = TLI->getValueType(DL, Dst);

  // The function getSimpleVT only handles simple value types.
  if (!SrcTy.isSimple() || !DstTy.isSimple())
    return BaseT::getCastInstrCost(Opcode, Dst, Src);

  MVT SimpleSrcTy = SrcTy.getSimpleVT();
  MVT SimpleDstTy = DstTy.getSimpleVT();

  // Make sure that neither type is going to be split before using the
  // AVX512 tables. This handles -mprefer-vector-width=256
  // with -min-legal-vector-width<=256
  if (TLI->getTypeAction(SimpleSrcTy) != TargetLowering::TypeSplitVector &&
      TLI->getTypeAction(SimpleDstTy) != TargetLowering::TypeSplitVector) {
    if (ST->hasBWI())
      if (const auto *Entry = ConvertCostTableLookup(AVX512BWConversionTbl, ISD,
                                                     SimpleDstTy, SimpleSrcTy))
        return Entry->Cost;

    if (ST->hasDQI())
      if (const auto *Entry = ConvertCostTableLookup(AVX512DQConversionTbl, ISD,
                                                     SimpleDstTy, SimpleSrcTy))
        return Entry->Cost;

    if (ST->hasAVX512())
      if (const auto *Entry = ConvertCostTableLookup(AVX512FConversionTbl, ISD,
                                                     SimpleDstTy, SimpleSrcTy))
        return Entry->Cost;
  }

  if (ST->hasAVX2()) {
    if (const auto *Entry = ConvertCostTableLookup(AVX2ConversionTbl, ISD,
                                                   SimpleDstTy, SimpleSrcTy))
      return Entry->Cost;
  }

  if (ST->hasAVX()) {
    if (const auto *Entry = ConvertCostTableLookup(AVXConversionTbl, ISD,
                                                   SimpleDstTy, SimpleSrcTy))
      return Entry->Cost;
  }

  if (ST->hasSSE41()) {
    if (const auto *Entry = ConvertCostTableLookup(SSE41ConversionTbl, ISD,
                                                   SimpleDstTy, SimpleSrcTy))
      return Entry->Cost;
  }

  if (ST->hasSSE2()) {
    if (const auto *Entry = ConvertCostTableLookup(SSE2ConversionTbl, ISD,
                                                   SimpleDstTy, SimpleSrcTy))
      return Entry->Cost;
  }

  return BaseT::getCastInstrCost(Opcode, Dst, Src, I);
}
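The lookup strategy in `getCastInstrCost` above works as follows. It tries the most specific subtarget table first. On a miss, it falls through to less specific tables, and finally to the generic `BaseT` cost. On the pre-AVX SSE2 path, it scales the per-part cost by the number of parts produced by type legalization. This strategy can be sketched in isolation as below.

This is a simplified, hypothetical model and not LLVM's API. The names `Op`, `VT`, `ConvEntry`, `Features`, `lookup`, `castCost`, `legalizedSSE2Cost` and the fallback cost of 10 are illustrative stand-ins for `ISD`, `MVT`, `ConvertCostTableLookup` and `BaseT::getCastInstrCost`. Only the few table rows are copied from the SSE4.1 and SSE2 tables in the listing.

```cpp
// Hypothetical, simplified model of the table cascade in getCastInstrCost.
// Op, VT, ConvEntry, Features, lookup, castCost, legalizedSSE2Cost and
// FallbackCost are illustrative stand-ins, not LLVM's ISD/MVT/CostTable API.
#include <cassert>
#include <cstddef>

enum class Op { ZeroExtend, SignExtend, Truncate };
enum class VT { v4i8, v4i16, v4i32, v4i64, v16i8, v16i16 };

struct ConvEntry {
  Op ISD;
  VT Dst;
  VT Src;
  int Cost;
};

struct Features {
  bool SSE2;
  bool SSE41;
};

// A few rows copied from the SSE4.1 and SSE2 tables in the listing.
const ConvEntry SSE41Tbl[] = {
    {Op::ZeroExtend, VT::v4i64, VT::v4i8, 2},
    {Op::SignExtend, VT::v4i64, VT::v4i8, 2},
    {Op::Truncate, VT::v4i16, VT::v4i32, 1},
};
const ConvEntry SSE2Tbl[] = {
    {Op::ZeroExtend, VT::v4i64, VT::v4i8, 4},
    {Op::SignExtend, VT::v4i64, VT::v4i8, 8},
    {Op::Truncate, VT::v4i16, VT::v4i32, 3},
    {Op::Truncate, VT::v16i8, VT::v16i16, 3},
};

// Assumed stand-in for BaseT::getCastInstrCost when no table row matches.
constexpr int FallbackCost = 10;

// Linear search for an exact (ISD, Dst, Src) match; nullptr if absent.
template <std::size_t N>
const ConvEntry *lookup(const ConvEntry (&Tbl)[N], Op ISD, VT Dst, VT Src) {
  for (const ConvEntry &E : Tbl)
    if (E.ISD == ISD && E.Dst == Dst && E.Src == Src)
      return &E;
  return nullptr;
}

// Most specific feature level first: the first matching row wins, and a
// miss falls through to the next, less specific table.
int castCost(const Features &F, Op ISD, VT Dst, VT Src) {
  if (F.SSE41)
    if (const ConvEntry *E = lookup(SSE41Tbl, ISD, Dst, Src))
      return E->Cost;
  if (F.SSE2)
    if (const ConvEntry *E = lookup(SSE2Tbl, ISD, Dst, Src))
      return E->Cost;
  return FallbackCost;
}

// Pre-AVX path: look up the legalized types and scale the per-part cost by
// the number of parts legalization produced (LTSrc.first in the listing).
int legalizedSSE2Cost(int NumSrcParts, Op ISD, VT LegalDst, VT LegalSrc) {
  assert(NumSrcParts >= 1 && "legalization yields at least one part");
  if (const ConvEntry *E = lookup(SSE2Tbl, ISD, LegalDst, LegalSrc))
    return NumSrcParts * E->Cost;
  return FallbackCost;
}
```

Here are some results from the model:

- On a machine with SSE4.1 (which also has SSE2, as `ST->hasSSE2()` would report), a `v4i8` to `v4i64` sign extension hits the SSE4.1 row and costs 2.
- On an SSE2-only machine, the same sign extension gets the SSE2 row and costs 8.
- A `v16i16` to `v16i8` truncate has no SSE4.1 row in the listing, so it falls through to the SSE2 table.

Ordering the checks from most to least specific lets newer ISA tables override older ones without duplicating every row.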

int X86TTIImpl::getCmpSelInstrCost(unsigned Opcode, Type *ValTy, Type *CondTy,
                                   const Instruction *I) {
  // Legalize the type.
  std::pair<int, MVT> LT = TLI->getTypeLegalizationCost(DL, ValTy);
2013-01-07 09:37:14 +08:00
MVT MTy = LT.second;
int ISD = TLI->InstructionOpcodeToISD(Opcode);
assert(ISD && "Invalid opcode");

2019-01-22 20:29:38 +08:00
unsigned ExtraCost = 0;
if (I && (Opcode == Instruction::ICmp || Opcode == Instruction::FCmp)) {
// Some vector comparison predicates cost extra instructions.
if (MTy.isVector() &&
!((ST->hasXOP() && (!ST->hasAVX2() || MTy.is128BitVector())) ||
(ST->hasAVX512() && 32 <= MTy.getScalarSizeInBits()) ||
ST->hasBWI())) {
switch (cast<CmpInst>(I)->getPredicate()) {
case CmpInst::Predicate::ICMP_NE:
// xor(cmpeq(x,y),-1)
ExtraCost = 1;
break;
case CmpInst::Predicate::ICMP_SGE:
case CmpInst::Predicate::ICMP_SLE:
// xor(cmpgt(x,y),-1)
ExtraCost = 1;
break;
case CmpInst::Predicate::ICMP_ULT:
case CmpInst::Predicate::ICMP_UGT:
// cmpgt(xor(x,signbit),xor(y,signbit))
// xor(cmpeq(pmaxu(x,y),x),-1)
ExtraCost = 2;
break;
case CmpInst::Predicate::ICMP_ULE:
case CmpInst::Predicate::ICMP_UGE:
if ((ST->hasSSE41() && MTy.getScalarSizeInBits() == 32) ||
(ST->hasSSE2() && MTy.getScalarSizeInBits() < 32)) {
// cmpeq(psubus(x,y),0)
// cmpeq(pminu(x,y),x)
ExtraCost = 1;
} else {
// xor(cmpgt(xor(x,signbit),xor(y,signbit)),-1)
ExtraCost = 3;
}
break;
default:
break;
}
}
}

2019-09-26 18:14:38 +08:00
static const CostTblEntry SLMCostTbl[] = {
// slm pcmpeq/pcmpgt throughput is 2
{ ISD::SETCC, MVT::v2i64, 2 },
};

2019-01-20 20:28:13 +08:00
static const CostTblEntry AVX512BWCostTbl[] = {
{ ISD::SETCC, MVT::v32i16, 1 },
{ ISD::SETCC, MVT::v64i8, 1 },
2019-01-20 21:55:01 +08:00
{ ISD::SELECT, MVT::v32i16, 1 },
{ ISD::SELECT, MVT::v64i8, 1 },
2016-05-10 05:14:38 +08:00
};

2019-01-20 20:28:13 +08:00
static const CostTblEntry AVX512CostTbl[] = {
{ ISD::SETCC, MVT::v8i64, 1 },
{ ISD::SETCC, MVT::v16i32, 1 },
{ ISD::SETCC, MVT::v8f64, 1 },
{ ISD::SETCC, MVT::v16f32, 1 },
2019-01-20 21:55:01 +08:00
{ ISD::SELECT, MVT::v8i64, 1 },
{ ISD::SELECT, MVT::v16i32, 1 },
{ ISD::SELECT, MVT::v8f64, 1 },
{ ISD::SELECT, MVT::v16f32, 1 },
2019-01-20 20:28:13 +08:00
};

static const CostTblEntry AVX2CostTbl[] = {
{ ISD::SETCC, MVT::v4i64, 1 },
{ ISD::SETCC, MVT::v8i32, 1 },
{ ISD::SETCC, MVT::v16i16, 1 },
{ ISD::SETCC, MVT::v32i8, 1 },
2019-01-20 21:55:01 +08:00
{ ISD::SELECT, MVT::v4i64, 1 }, // pblendvb
{ ISD::SELECT, MVT::v8i32, 1 }, // pblendvb
{ ISD::SELECT, MVT::v16i16, 1 }, // pblendvb
{ ISD::SELECT, MVT::v32i8, 1 }, // pblendvb
2013-01-07 09:37:14 +08:00
};

2015-10-28 12:02:12 +08:00
static const CostTblEntry AVX1CostTbl[] = {
2013-01-21 04:57:20 +08:00
{ ISD::SETCC, MVT::v4f64, 1 },
{ ISD::SETCC, MVT::v8f32, 1 },
2013-01-07 09:37:14 +08:00
// AVX1 does not support 8-wide integer compare.
2013-01-21 04:57:20 +08:00
{ ISD::SETCC, MVT::v4i64, 4 },
{ ISD::SETCC, MVT::v8i32, 4 },
{ ISD::SETCC, MVT::v16i16, 4 },
{ ISD::SETCC, MVT::v32i8, 4 },
2019-01-20 21:55:01 +08:00
{ ISD::SELECT, MVT::v4f64, 1 }, // vblendvpd
{ ISD::SELECT, MVT::v8f32, 1 }, // vblendvps
{ ISD::SELECT, MVT::v4i64, 1 }, // vblendvpd
{ ISD::SELECT, MVT::v8i32, 1 }, // vblendvps
{ ISD::SELECT, MVT::v16i16, 3 }, // vandps + vandnps + vorps
{ ISD::SELECT, MVT::v32i8, 3 }, // vandps + vandnps + vorps
2013-01-07 09:37:14 +08:00
};

2019-01-20 20:28:13 +08:00
static const CostTblEntry SSE42CostTbl[] = {
{ ISD::SETCC, MVT::v2f64, 1 },
{ ISD::SETCC, MVT::v4f32, 1 },
{ ISD::SETCC, MVT::v2i64, 1 },
2014-09-16 15:57:37 +08:00
};

2019-01-20 21:55:01 +08:00
static const CostTblEntry SSE41CostTbl[] = {
{ ISD::SELECT, MVT::v2f64, 1 }, // blendvpd
{ ISD::SELECT, MVT::v4f32, 1 }, // blendvps
{ ISD::SELECT, MVT::v2i64, 1 }, // pblendvb
{ ISD::SELECT, MVT::v4i32, 1 }, // pblendvb
{ ISD::SELECT, MVT::v8i16, 1 }, // pblendvb
{ ISD::SELECT, MVT::v16i8, 1 }, // pblendvb
};

2019-01-20 20:28:13 +08:00
static const CostTblEntry SSE2CostTbl[] = {
2019-01-20 21:21:43 +08:00
{ ISD::SETCC, MVT::v2f64, 2 },
{ ISD::SETCC, MVT::f64, 1 },
2019-01-20 20:28:13 +08:00
{ ISD::SETCC, MVT::v2i64, 8 },
{ ISD::SETCC, MVT::v4i32, 1 },
{ ISD::SETCC, MVT::v8i16, 1 },
{ ISD::SETCC, MVT::v16i8, 1 },
2019-01-20 21:55:01 +08:00
{ ISD::SELECT, MVT::v2f64, 3 }, // andpd + andnpd + orpd
{ ISD::SELECT, MVT::v2i64, 3 }, // pand + pandn + por
{ ISD::SELECT, MVT::v4i32, 3 }, // pand + pandn + por
{ ISD::SELECT, MVT::v8i16, 3 }, // pand + pandn + por
{ ISD::SELECT, MVT::v16i8, 3 }, // pand + pandn + por
2018-04-07 21:24:33 +08:00
};

2019-01-20 21:21:43 +08:00
static const CostTblEntry SSE1CostTbl[] = {
{ ISD::SETCC, MVT::v4f32, 2 },
{ ISD::SETCC, MVT::f32, 1 },
2019-01-20 21:55:01 +08:00
{ ISD::SELECT, MVT::v4f32, 3 }, // andps + andnps + orps
2019-01-20 21:21:43 +08:00
};

2019-09-26 18:14:38 +08:00
if (ST->isSLM())
if (const auto *Entry = CostTableLookup(SLMCostTbl, ISD, MTy))
return LT.first * (ExtraCost + Entry->Cost);

2018-04-07 21:24:33 +08:00
if (ST->hasBWI())
if (const auto *Entry = CostTableLookup(AVX512BWCostTbl, ISD, MTy))
2019-01-22 20:29:38 +08:00
return LT.first * (ExtraCost + Entry->Cost);
2018-04-07 21:24:33 +08:00

2015-10-27 12:14:24 +08:00
if (ST->hasAVX512())
if (const auto *Entry = CostTableLookup(AVX512CostTbl, ISD, MTy))
2019-01-22 20:29:38 +08:00
return LT.first * (ExtraCost + Entry->Cost);
2014-09-16 15:57:37 +08:00

2015-10-27 12:14:24 +08:00
if (ST->hasAVX2())
if (const auto *Entry = CostTableLookup(AVX2CostTbl, ISD, MTy))
2019-01-22 20:29:38 +08:00
return LT.first * (ExtraCost + Entry->Cost);
2013-01-07 09:37:14 +08:00
2015-10-27 12:14:24 +08:00
if (ST->hasAVX())
if (const auto *Entry = CostTableLookup(AVX1CostTbl, ISD, MTy))
2019-01-22 20:29:38 +08:00
return LT.first * (ExtraCost + Entry->Cost);
Switch TargetTransformInfo from an immutable analysis pass that requires
a TargetMachine to construct (and thus isn't always available), to an
analysis group that supports layered implementations much like
AliasAnalysis does. This is a pretty massive change, with a few parts
that I was unable to easily separate (sorry), so I'll walk through it.
The first step of this conversion was to make TargetTransformInfo an
analysis group, and to sink the nonce implementations in
ScalarTargetTransformInfo and VectorTargetTranformInfo into
a NoTargetTransformInfo pass. This allows other passes to add a hard
requirement on TTI, and assume they will always get at least on
implementation.
The TargetTransformInfo analysis group leverages the delegation chaining
trick that AliasAnalysis uses, where the base class for the analysis
group delegates to the previous analysis *pass*, allowing all but tho
NoFoo analysis passes to only implement the parts of the interfaces they
support. It also introduces a new trick where each pass in the group
retains a pointer to the top-most pass that has been initialized. This
allows passes to implement one API in terms of another API and benefit
when some other pass above them in the stack has more precise results
for the second API.
The second step of this conversion is to create a pass that implements
the TargetTransformInfo analysis using the target-independent
abstractions in the code generator. This replaces the
ScalarTargetTransformImpl and VectorTargetTransformImpl classes in
lib/Target with a single pass in lib/CodeGen called
BasicTargetTransformInfo. This class actually provides most of the TTI
functionality, basing it upon the TargetLowering abstraction and other
information in the target independent code generator.
The third step of the conversion adds support to all TargetMachines to
register custom analysis passes. This allows building those passes with
access to TargetLowering or other target-specific classes, and it also
allows each target to customize the set of analysis passes desired in
the pass manager. The baseline LLVMTargetMachine implements this
interface to add the BasicTTI pass to the pass manager, and all of the
tools that want to support target-aware TTI passes call this routine on
whatever target machine they end up with to add the appropriate passes.
The fourth step of the conversion created target-specific TTI analysis
passes for the X86 and ARM backends. These passes contain the custom
logic that was previously in their extensions of the
ScalarTargetTransformInfo and VectorTargetTransformInfo interfaces.
I separated them into their own file, as now all of the interface bits
are private and they just expose a function to create the pass itself.
Then I extended these target machines to set up a custom set of analysis
passes, first adding BasicTTI as a fallback, and then adding their
customized TTI implementations.
The fourth step required logic that was shared between the target
independent layer and the specific targets to move to a different
interface, as they no longer derive from each other. As a consequence,
a helper functions were added to TargetLowering representing the common
logic needed both in the target implementation and the codegen
implementation of the TTI pass. While technically this is the only
change that could have been committed separately, it would have been
a nightmare to extract.
The final step of the conversion was just to delete all the old
boilerplate. This got rid of the ScalarTargetTransformInfo and
VectorTargetTransformInfo classes, all of the support in all of the
targets for producing instances of them, and all of the support in the
tools for manually constructing a pass based around them.
Now that TTI is a relatively normal analysis group, two things become
straightforward. First, we can sink it into lib/Analysis, which is a more
natural layer for it to live in. Second, clients of this interface can
depend on it *always* being available, which will simplify their code and
behavior. These (and other) simplifications will follow in subsequent
commits; this one is clearly big enough.
Finally, I'm very aware that many of the comments and much of the
documentation need to be updated. As soon as I had this working and
plausibly well commented, I wanted to get it committed and in front of the
build bots.
I'll be doing a few passes over documentation later if it sticks.
Commits to update DragonEgg and Clang will be made presently.
llvm-svn: 171681
2013-01-07 09:37:14 +08:00

2015-10-27 12:14:24 +08:00
  if (ST->hasSSE42())
    if (const auto *Entry = CostTableLookup(SSE42CostTbl, ISD, MTy))
2019-01-22 20:29:38 +08:00
      return LT.first * (ExtraCost + Entry->Cost);
2013-01-07 09:37:14 +08:00

2019-01-20 21:55:01 +08:00
  if (ST->hasSSE41())
    if (const auto *Entry = CostTableLookup(SSE41CostTbl, ISD, MTy))
2019-01-22 20:29:38 +08:00
      return LT.first * (ExtraCost + Entry->Cost);
2019-01-20 21:55:01 +08:00

2016-05-10 05:14:38 +08:00
  if (ST->hasSSE2())
    if (const auto *Entry = CostTableLookup(SSE2CostTbl, ISD, MTy))
2019-01-22 20:29:38 +08:00
      return LT.first * (ExtraCost + Entry->Cost);
2016-05-10 05:14:38 +08:00

2019-01-20 21:21:43 +08:00
  if (ST->hasSSE1())
    if (const auto *Entry = CostTableLookup(SSE1CostTbl, ISD, MTy))
2019-01-22 20:29:38 +08:00
      return LT.first * (ExtraCost + Entry->Cost);
2019-01-20 21:21:43 +08:00

2017-04-12 19:49:08 +08:00
  return BaseT::getCmpSelInstrCost(Opcode, ValTy, CondTy, I);
2013-01-07 09:37:14 +08:00
}
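The tail of `getCmpSelInstrCost` above follows the same pattern as the longer chain in `getIntrinsicInstrCost` below. It checks cost tables from the most capable subtarget feature down and takes the first entry that matches the legalized type. It then scales that entry by `LT.first`, and otherwise falls back to the `BaseT` implementation. The standalone model below is simplified. It uses strings instead of `ISD` opcodes and `MVT`s, and its type and function names are hypothetical. `NumParts` stands in for `LT.first`, roughly the number of legal pieces that `getTypeLegalizationCost` splits the type into. `ExtraCost` stands in for the adjustment computed earlier in `getCmpSelInstrCost`, which this excerpt does not show.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-ins: strings replace ISD opcodes and legal MVTs.
struct CostEntry {
  std::string Op;
  std::string Type;
  int Cost;
};

// One tier per subtarget feature (e.g. the result of ST->hasSSSE3()).
struct FeatureTable {
  bool Available;
  std::vector<CostEntry> Entries;
};

// Linear search, like a cost-table lookup: first matching entry or nothing.
std::optional<int> lookup(const std::vector<CostEntry> &Tbl,
                          const std::string &Op, const std::string &Ty) {
  for (const auto &E : Tbl)
    if (E.Op == Op && E.Type == Ty)
      return E.Cost;
  return std::nullopt;
}

// Tiers must be ordered from the most to the least capable feature level.
// The first available tier with a matching entry wins; NumParts models
// LT.first and FallbackCost models deferring to the generic BaseT answer.
int tieredCost(const std::vector<FeatureTable> &Tiers, const std::string &Op,
               const std::string &Ty, int NumParts, int ExtraCost,
               int FallbackCost) {
  for (const auto &T : Tiers) {
    if (!T.Available)
      continue;
    if (auto C = lookup(T.Entries, Op, Ty))
      return NumParts * (ExtraCost + *C);
  }
  return FallbackCost;
}
```

On an SSSE3-only subtarget, a `v8i32` operation would be expected to legalize into two `v4i32` halves, so a table hit of 11 yields 22. When a more specific tier is unavailable, the lookup falls through to the next one.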

2017-06-07 00:45:25 +08:00
unsigned X86TTIImpl::getAtomicMemIntrinsicMaxElementSize() const { return 16; }
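The `getIntrinsicInstrCost` implementation that follows contains an `AVX1CostTbl`. Its entries commented "2 x 128-bit Op + extract/insert" are consistent with a simple rule: twice the cost of the matching 128-bit operation from the SSE-level tables, plus 2 for moving the upper half out and back in. For example, `CTLZ` on `v4i32` costs 18 in `SSSE3CostTbl`, and `CTLZ` on `v8i32` costs 38 = 2 × 18 + 2 in `AVX1CostTbl`. The helper below is a hypothetical way to express that arithmetic. Treating the extract/insert overhead as exactly 2 is an assumption read off the table values, not a documented formula.

```cpp
#include <cassert>

// Hypothetical helper, not part of LLVM: a 256-bit integer op on AVX1 is
// modelled as two 128-bit ops plus one extract and one insert.
constexpr int splitAVX1Cost(int Cost128) {
  constexpr int ExtractInsert = 2; // assumption: 1 extract + 1 insert
  return 2 * Cost128 + ExtractInsert;
}

// Spot checks against values in this file (128-bit table -> AVX1CostTbl).
static_assert(splitAVX1Cost(18) == 38, "CTLZ v4i32 (SSSE3) -> v8i32");
static_assert(splitAVX1Cost(11) == 24, "CTPOP v4i32 (SSSE3) -> v8i32");
static_assert(splitAVX1Cost(5) == 12, "BITREVERSE v16i8 (SSSE3) -> v32i8");
static_assert(splitAVX1Cost(3) == 8, "UADDSAT v4i32 (SSE42) -> v8i32");
static_assert(splitAVX1Cost(2) == 6, "USUBSAT v4i32 (SSE42) -> v8i32");
```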

2016-05-24 16:17:50 +08:00
int X86TTIImpl::getIntrinsicInstrCost(Intrinsic::ID IID, Type *RetTy,
2017-03-14 14:35:36 +08:00
                                      ArrayRef<Type *> Tys, FastMathFlags FMF,
                                      unsigned ScalarizationCostPassed) {
2016-07-20 18:41:28 +08:00
  // Costs should match the codegen from:
  // BITREVERSE: llvm\test\CodeGen\X86\vector-bitreverse.ll
  // BSWAP: llvm\test\CodeGen\X86\bswap-vector.ll
2016-08-04 18:51:41 +08:00
  // CTLZ: llvm\test\CodeGen\X86\vector-lzcnt-*.ll
2016-07-20 18:41:28 +08:00
  // CTPOP: llvm\test\CodeGen\X86\vector-popcnt-*.ll
2016-08-04 18:51:41 +08:00
  // CTTZ: llvm\test\CodeGen\X86\vector-tzcnt-*.ll
2017-05-18 05:02:18 +08:00
  static const CostTblEntry AVX512CDCostTbl[] = {
    { ISD::CTLZ, MVT::v8i64, 1 },
    { ISD::CTLZ, MVT::v16i32, 1 },
    { ISD::CTLZ, MVT::v32i16, 8 },
    { ISD::CTLZ, MVT::v64i8, 20 },
    { ISD::CTLZ, MVT::v4i64, 1 },
    { ISD::CTLZ, MVT::v8i32, 1 },
    { ISD::CTLZ, MVT::v16i16, 4 },
    { ISD::CTLZ, MVT::v32i8, 10 },
    { ISD::CTLZ, MVT::v2i64, 1 },
    { ISD::CTLZ, MVT::v4i32, 1 },
    { ISD::CTLZ, MVT::v8i16, 4 },
    { ISD::CTLZ, MVT::v16i8, 4 },
  };
2017-05-18 03:20:20 +08:00
  static const CostTblEntry AVX512BWCostTbl[] = {
    { ISD::BITREVERSE, MVT::v8i64, 5 },
    { ISD::BITREVERSE, MVT::v16i32, 5 },
    { ISD::BITREVERSE, MVT::v32i16, 5 },
    { ISD::BITREVERSE, MVT::v64i8, 5 },
2017-05-18 05:02:18 +08:00
    { ISD::CTLZ, MVT::v8i64, 23 },
    { ISD::CTLZ, MVT::v16i32, 22 },
    { ISD::CTLZ, MVT::v32i16, 18 },
    { ISD::CTLZ, MVT::v64i8, 17 },
2017-05-18 18:42:34 +08:00
    { ISD::CTPOP, MVT::v8i64, 7 },
    { ISD::CTPOP, MVT::v16i32, 11 },
    { ISD::CTPOP, MVT::v32i16, 9 },
    { ISD::CTPOP, MVT::v64i8, 6 },
2017-05-18 04:22:54 +08:00
    { ISD::CTTZ, MVT::v8i64, 10 },
    { ISD::CTTZ, MVT::v16i32, 14 },
    { ISD::CTTZ, MVT::v32i16, 12 },
    { ISD::CTTZ, MVT::v64i8, 9 },
2019-01-03 19:38:42 +08:00
    { ISD::SADDSAT, MVT::v32i16, 1 },
    { ISD::SADDSAT, MVT::v64i8, 1 },
    { ISD::SSUBSAT, MVT::v32i16, 1 },
    { ISD::SSUBSAT, MVT::v64i8, 1 },
    { ISD::UADDSAT, MVT::v32i16, 1 },
    { ISD::UADDSAT, MVT::v64i8, 1 },
    { ISD::USUBSAT, MVT::v32i16, 1 },
    { ISD::USUBSAT, MVT::v64i8, 1 },
2017-05-18 03:20:20 +08:00
  };
  static const CostTblEntry AVX512CostTbl[] = {
    { ISD::BITREVERSE, MVT::v8i64, 36 },
    { ISD::BITREVERSE, MVT::v16i32, 24 },
2017-05-18 05:02:18 +08:00
    { ISD::CTLZ, MVT::v8i64, 29 },
    { ISD::CTLZ, MVT::v16i32, 35 },
2017-05-18 18:42:34 +08:00
    { ISD::CTPOP, MVT::v8i64, 16 },
    { ISD::CTPOP, MVT::v16i32, 24 },
2017-05-18 04:22:54 +08:00
    { ISD::CTTZ, MVT::v8i64, 20 },
    { ISD::CTTZ, MVT::v16i32, 28 },
2019-01-16 02:43:41 +08:00
    { ISD::USUBSAT, MVT::v16i32, 2 }, // pmaxud + psubd
    { ISD::USUBSAT, MVT::v2i64, 2 }, // pmaxuq + psubq
    { ISD::USUBSAT, MVT::v4i64, 2 }, // pmaxuq + psubq
    { ISD::USUBSAT, MVT::v8i64, 2 }, // pmaxuq + psubq
2019-01-29 03:19:09 +08:00
    { ISD::UADDSAT, MVT::v16i32, 3 }, // not + pminud + paddd
    { ISD::UADDSAT, MVT::v2i64, 3 }, // not + pminuq + paddq
    { ISD::UADDSAT, MVT::v4i64, 3 }, // not + pminuq + paddq
    { ISD::UADDSAT, MVT::v8i64, 3 }, // not + pminuq + paddq
2017-05-18 03:20:20 +08:00
  };
2016-05-24 16:17:50 +08:00
  static const CostTblEntry XOPCostTbl[] = {
    { ISD::BITREVERSE, MVT::v4i64, 4 },
    { ISD::BITREVERSE, MVT::v8i32, 4 },
    { ISD::BITREVERSE, MVT::v16i16, 4 },
    { ISD::BITREVERSE, MVT::v32i8, 4 },
    { ISD::BITREVERSE, MVT::v2i64, 1 },
    { ISD::BITREVERSE, MVT::v4i32, 1 },
    { ISD::BITREVERSE, MVT::v8i16, 1 },
    { ISD::BITREVERSE, MVT::v16i8, 1 },
    { ISD::BITREVERSE, MVT::i64, 3 },
    { ISD::BITREVERSE, MVT::i32, 3 },
    { ISD::BITREVERSE, MVT::i16, 3 },
    { ISD::BITREVERSE, MVT::i8, 3 }
  };
2016-06-12 03:23:02 +08:00
  static const CostTblEntry AVX2CostTbl[] = {
    { ISD::BITREVERSE, MVT::v4i64, 5 },
    { ISD::BITREVERSE, MVT::v8i32, 5 },
    { ISD::BITREVERSE, MVT::v16i16, 5 },
2016-06-21 07:08:21 +08:00
    { ISD::BITREVERSE, MVT::v32i8, 5 },
    { ISD::BSWAP, MVT::v4i64, 1 },
    { ISD::BSWAP, MVT::v8i32, 1 },
2016-07-20 18:41:28 +08:00
    { ISD::BSWAP, MVT::v16i16, 1 },
2016-08-04 18:51:41 +08:00
    { ISD::CTLZ, MVT::v4i64, 23 },
    { ISD::CTLZ, MVT::v8i32, 18 },
    { ISD::CTLZ, MVT::v16i16, 14 },
    { ISD::CTLZ, MVT::v32i8, 9 },
2016-07-20 18:41:28 +08:00
    { ISD::CTPOP, MVT::v4i64, 7 },
    { ISD::CTPOP, MVT::v8i32, 11 },
    { ISD::CTPOP, MVT::v16i16, 9 },
2016-08-04 18:51:41 +08:00
    { ISD::CTPOP, MVT::v32i8, 6 },
    { ISD::CTTZ, MVT::v4i64, 10 },
    { ISD::CTTZ, MVT::v8i32, 14 },
    { ISD::CTTZ, MVT::v16i16, 12 },
2016-10-31 20:10:53 +08:00
    { ISD::CTTZ, MVT::v32i8, 9 },
2019-01-03 19:38:42 +08:00
    { ISD::SADDSAT, MVT::v16i16, 1 },
    { ISD::SADDSAT, MVT::v32i8, 1 },
    { ISD::SSUBSAT, MVT::v16i16, 1 },
    { ISD::SSUBSAT, MVT::v32i8, 1 },
    { ISD::UADDSAT, MVT::v16i16, 1 },
    { ISD::UADDSAT, MVT::v32i8, 1 },
2019-01-29 03:19:09 +08:00
    { ISD::UADDSAT, MVT::v8i32, 3 }, // not + pminud + paddd
2019-01-03 19:38:42 +08:00
    { ISD::USUBSAT, MVT::v16i16, 1 },
    { ISD::USUBSAT, MVT::v32i8, 1 },
2019-01-16 02:43:41 +08:00
    { ISD::USUBSAT, MVT::v8i32, 2 }, // pmaxud + psubd
2016-10-31 20:10:53 +08:00
    { ISD::FSQRT, MVT::f32, 7 }, // Haswell from http://www.agner.org/
    { ISD::FSQRT, MVT::v4f32, 7 }, // Haswell from http://www.agner.org/
    { ISD::FSQRT, MVT::v8f32, 14 }, // Haswell from http://www.agner.org/
    { ISD::FSQRT, MVT::f64, 14 }, // Haswell from http://www.agner.org/
    { ISD::FSQRT, MVT::v2f64, 14 }, // Haswell from http://www.agner.org/
    { ISD::FSQRT, MVT::v4f64, 28 }, // Haswell from http://www.agner.org/
2016-06-12 03:23:02 +08:00
  };
  static const CostTblEntry AVX1CostTbl[] = {
2017-05-08 04:58:55 +08:00
    { ISD::BITREVERSE, MVT::v4i64, 12 }, // 2 x 128-bit Op + extract/insert
    { ISD::BITREVERSE, MVT::v8i32, 12 }, // 2 x 128-bit Op + extract/insert
    { ISD::BITREVERSE, MVT::v16i16, 12 }, // 2 x 128-bit Op + extract/insert
    { ISD::BITREVERSE, MVT::v32i8, 12 }, // 2 x 128-bit Op + extract/insert
2016-06-21 07:08:21 +08:00
    { ISD::BSWAP, MVT::v4i64, 4 },
    { ISD::BSWAP, MVT::v8i32, 4 },
2016-07-20 18:41:28 +08:00
    { ISD::BSWAP, MVT::v16i16, 4 },
2017-05-08 04:58:55 +08:00
    { ISD::CTLZ, MVT::v4i64, 48 }, // 2 x 128-bit Op + extract/insert
    { ISD::CTLZ, MVT::v8i32, 38 }, // 2 x 128-bit Op + extract/insert
    { ISD::CTLZ, MVT::v16i16, 30 }, // 2 x 128-bit Op + extract/insert
    { ISD::CTLZ, MVT::v32i8, 20 }, // 2 x 128-bit Op + extract/insert
    { ISD::CTPOP, MVT::v4i64, 16 }, // 2 x 128-bit Op + extract/insert
    { ISD::CTPOP, MVT::v8i32, 24 }, // 2 x 128-bit Op + extract/insert
    { ISD::CTPOP, MVT::v16i16, 20 }, // 2 x 128-bit Op + extract/insert
    { ISD::CTPOP, MVT::v32i8, 14 }, // 2 x 128-bit Op + extract/insert
    { ISD::CTTZ, MVT::v4i64, 22 }, // 2 x 128-bit Op + extract/insert
    { ISD::CTTZ, MVT::v8i32, 30 }, // 2 x 128-bit Op + extract/insert
    { ISD::CTTZ, MVT::v16i16, 26 }, // 2 x 128-bit Op + extract/insert
    { ISD::CTTZ, MVT::v32i8, 20 }, // 2 x 128-bit Op + extract/insert
2019-01-03 19:38:42 +08:00
    { ISD::SADDSAT, MVT::v16i16, 4 }, // 2 x 128-bit Op + extract/insert
    { ISD::SADDSAT, MVT::v32i8, 4 }, // 2 x 128-bit Op + extract/insert
    { ISD::SSUBSAT, MVT::v16i16, 4 }, // 2 x 128-bit Op + extract/insert
    { ISD::SSUBSAT, MVT::v32i8, 4 }, // 2 x 128-bit Op + extract/insert
    { ISD::UADDSAT, MVT::v16i16, 4 }, // 2 x 128-bit Op + extract/insert
    { ISD::UADDSAT, MVT::v32i8, 4 }, // 2 x 128-bit Op + extract/insert
2019-01-29 03:19:09 +08:00
    { ISD::UADDSAT, MVT::v8i32, 8 }, // 2 x 128-bit Op + extract/insert
2019-01-03 19:38:42 +08:00
    { ISD::USUBSAT, MVT::v16i16, 4 }, // 2 x 128-bit Op + extract/insert
    { ISD::USUBSAT, MVT::v32i8, 4 }, // 2 x 128-bit Op + extract/insert
2019-01-16 02:43:41 +08:00
    { ISD::USUBSAT, MVT::v8i32, 6 }, // 2 x 128-bit Op + extract/insert
2016-10-31 20:10:53 +08:00
    { ISD::FSQRT, MVT::f32, 14 }, // SNB from http://www.agner.org/
    { ISD::FSQRT, MVT::v4f32, 14 }, // SNB from http://www.agner.org/
    { ISD::FSQRT, MVT::v8f32, 28 }, // SNB from http://www.agner.org/
    { ISD::FSQRT, MVT::f64, 21 }, // SNB from http://www.agner.org/
    { ISD::FSQRT, MVT::v2f64, 21 }, // SNB from http://www.agner.org/
    { ISD::FSQRT, MVT::v4f64, 43 }, // SNB from http://www.agner.org/
  };
2018-03-25 23:58:12 +08:00
  static const CostTblEntry GLMCostTbl[] = {
    { ISD::FSQRT, MVT::f32, 19 }, // sqrtss
    { ISD::FSQRT, MVT::v4f32, 37 }, // sqrtps
    { ISD::FSQRT, MVT::f64, 34 }, // sqrtsd
    { ISD::FSQRT, MVT::v2f64, 67 }, // sqrtpd
  };
  static const CostTblEntry SLMCostTbl[] = {
    { ISD::FSQRT, MVT::f32, 20 }, // sqrtss
    { ISD::FSQRT, MVT::v4f32, 40 }, // sqrtps
    { ISD::FSQRT, MVT::f64, 35 }, // sqrtsd
    { ISD::FSQRT, MVT::v2f64, 70 }, // sqrtpd
  };
2016-10-31 20:10:53 +08:00
  static const CostTblEntry SSE42CostTbl[] = {
2019-01-16 02:43:41 +08:00
    { ISD::USUBSAT, MVT::v4i32, 2 }, // pmaxud + psubd
2019-01-29 03:19:09 +08:00
    { ISD::UADDSAT, MVT::v4i32, 3 }, // not + pminud + paddd
2017-03-15 19:57:42 +08:00
    { ISD::FSQRT, MVT::f32, 18 }, // Nehalem from http://www.agner.org/
    { ISD::FSQRT, MVT::v4f32, 18 }, // Nehalem from http://www.agner.org/
2016-06-12 03:23:02 +08:00
  };
  static const CostTblEntry SSSE3CostTbl[] = {
    { ISD::BITREVERSE, MVT::v2i64, 5 },
    { ISD::BITREVERSE, MVT::v4i32, 5 },
    { ISD::BITREVERSE, MVT::v8i16, 5 },
2016-06-21 07:08:21 +08:00
    { ISD::BITREVERSE, MVT::v16i8, 5 },
    { ISD::BSWAP, MVT::v2i64, 1 },
    { ISD::BSWAP, MVT::v4i32, 1 },
2016-07-20 18:41:28 +08:00
    { ISD::BSWAP, MVT::v8i16, 1 },
2016-08-04 18:51:41 +08:00
    { ISD::CTLZ, MVT::v2i64, 23 },
    { ISD::CTLZ, MVT::v4i32, 18 },
    { ISD::CTLZ, MVT::v8i16, 14 },
    { ISD::CTLZ, MVT::v16i8, 9 },
2016-07-20 18:41:28 +08:00
    { ISD::CTPOP, MVT::v2i64, 7 },
    { ISD::CTPOP, MVT::v4i32, 11 },
    { ISD::CTPOP, MVT::v8i16, 9 },
2016-08-04 18:51:41 +08:00
    { ISD::CTPOP, MVT::v16i8, 6 },
    { ISD::CTTZ, MVT::v2i64, 10 },
    { ISD::CTTZ, MVT::v4i32, 14 },
    { ISD::CTTZ, MVT::v8i16, 12 },
    { ISD::CTTZ, MVT::v16i8, 9 }
2016-06-21 07:08:21 +08:00
  };
  static const CostTblEntry SSE2CostTbl[] = {
2017-03-16 03:34:55 +08:00
    { ISD::BITREVERSE, MVT::v2i64, 29 },
    { ISD::BITREVERSE, MVT::v4i32, 27 },
    { ISD::BITREVERSE, MVT::v8i16, 27 },
    { ISD::BITREVERSE, MVT::v16i8, 20 },
2016-06-21 07:08:21 +08:00
    { ISD::BSWAP, MVT::v2i64, 7 },
    { ISD::BSWAP, MVT::v4i32, 7 },
2016-07-20 18:41:28 +08:00
    { ISD::BSWAP, MVT::v8i16, 7 },
2016-11-08 22:10:28 +08:00
    { ISD::CTLZ, MVT::v2i64, 25 },
    { ISD::CTLZ, MVT::v4i32, 26 },
    { ISD::CTLZ, MVT::v8i16, 20 },
    { ISD::CTLZ, MVT::v16i8, 17 },
2016-07-20 18:41:28 +08:00
    { ISD::CTPOP, MVT::v2i64, 12 },
    { ISD::CTPOP, MVT::v4i32, 15 },
    { ISD::CTPOP, MVT::v8i16, 13 },
2016-08-04 18:51:41 +08:00
    { ISD::CTPOP, MVT::v16i8, 10 },
    { ISD::CTTZ, MVT::v2i64, 14 },
    { ISD::CTTZ, MVT::v4i32, 18 },
    { ISD::CTTZ, MVT::v8i16, 16 },
2016-10-31 20:10:53 +08:00
    { ISD::CTTZ, MVT::v16i8, 13 },
2019-01-03 19:38:42 +08:00
    { ISD::SADDSAT, MVT::v8i16, 1 },
    { ISD::SADDSAT, MVT::v16i8, 1 },
    { ISD::SSUBSAT, MVT::v8i16, 1 },
    { ISD::SSUBSAT, MVT::v16i8, 1 },
    { ISD::UADDSAT, MVT::v8i16, 1 },
    { ISD::UADDSAT, MVT::v16i8, 1 },
    { ISD::USUBSAT, MVT::v8i16, 1 },
    { ISD::USUBSAT, MVT::v16i8, 1 },
2016-10-31 20:10:53 +08:00
    { ISD::FSQRT, MVT::f64, 32 }, // Nehalem from http://www.agner.org/
    { ISD::FSQRT, MVT::v2f64, 32 }, // Nehalem from http://www.agner.org/
  };
  static const CostTblEntry SSE1CostTbl[] = {
2017-03-15 19:57:42 +08:00
    { ISD::FSQRT, MVT::f32, 28 }, // Pentium III from http://www.agner.org/
    { ISD::FSQRT, MVT::v4f32, 56 }, // Pentium III from http://www.agner.org/
2016-06-12 03:23:02 +08:00
  };
2019-10-15 00:30:17 +08:00
  static const CostTblEntry LZCNT64CostTbl[] = { // 64-bit targets
    { ISD::CTLZ, MVT::i64, 1 },
  };
  static const CostTblEntry LZCNT32CostTbl[] = { // 32 or 64-bit targets
    { ISD::CTLZ, MVT::i32, 1 },
    { ISD::CTLZ, MVT::i16, 1 },
    { ISD::CTLZ, MVT::i8, 1 },
  };
2019-10-14 22:07:43 +08:00
  static const CostTblEntry POPCNT64CostTbl[] = { // 64-bit targets
    { ISD::CTPOP, MVT::i64, 1 },
  };
  static const CostTblEntry POPCNT32CostTbl[] = { // 32 or 64-bit targets
    { ISD::CTPOP, MVT::i32, 1 },
    { ISD::CTPOP, MVT::i16, 1 },
    { ISD::CTPOP, MVT::i8, 1 },
  };
2017-03-16 03:34:55 +08:00
  static const CostTblEntry X64CostTbl[] = { // 64-bit targets
2019-01-24 20:10:20 +08:00
    { ISD::BITREVERSE, MVT::i64, 14 },
2019-10-15 00:30:17 +08:00
    { ISD::CTLZ, MVT::i64, 4 }, // BSR+XOR or BSR+XOR+CMOV
2019-10-14 22:07:43 +08:00
    { ISD::CTPOP, MVT::i64, 10 },
2019-01-24 21:36:45 +08:00
    { ISD::SADDO, MVT::i64, 1 },
2019-01-24 20:10:20 +08:00
    { ISD::UADDO, MVT::i64, 1 },
2017-03-16 03:34:55 +08:00
  };
  static const CostTblEntry X86CostTbl[] = { // 32 or 64-bit targets
    { ISD::BITREVERSE, MVT::i32, 14 },
    { ISD::BITREVERSE, MVT::i16, 14 },
2019-01-24 20:10:20 +08:00
    { ISD::BITREVERSE, MVT::i8, 11 },
2019-10-15 00:30:17 +08:00
    { ISD::CTLZ, MVT::i32, 4 }, // BSR+XOR or BSR+XOR+CMOV
    { ISD::CTLZ, MVT::i16, 4 }, // BSR+XOR or BSR+XOR+CMOV
    { ISD::CTLZ, MVT::i8, 4 }, // BSR+XOR or BSR+XOR+CMOV
2019-10-14 22:07:43 +08:00
    { ISD::CTPOP, MVT::i32, 8 },
    { ISD::CTPOP, MVT::i16, 9 },
    { ISD::CTPOP, MVT::i8, 7 },
2019-01-24 21:36:45 +08:00
    { ISD::SADDO, MVT::i32, 1 },
    { ISD::SADDO, MVT::i16, 1 },
    { ISD::SADDO, MVT::i8, 1 },
2019-01-24 20:10:20 +08:00
    { ISD::UADDO, MVT::i32, 1 },
    { ISD::UADDO, MVT::i16, 1 },
    { ISD::UADDO, MVT::i8, 1 },
2017-03-16 03:34:55 +08:00
  };
2016-05-24 16:17:50 +08:00

2019-01-24 20:10:20 +08:00
  Type *OpTy = RetTy;
2016-05-24 16:17:50 +08:00
  unsigned ISD = ISD::DELETED_NODE;
  switch (IID) {
  default:
    break;
  case Intrinsic::bitreverse:
    ISD = ISD::BITREVERSE;
    break;
2016-06-21 07:08:21 +08:00
  case Intrinsic::bswap:
    ISD = ISD::BSWAP;
    break;
2016-08-04 18:51:41 +08:00
  case Intrinsic::ctlz:
    ISD = ISD::CTLZ;
    break;
2016-07-20 18:41:28 +08:00
  case Intrinsic::ctpop:
    ISD = ISD::CTPOP;
    break;
2016-08-04 18:51:41 +08:00
  case Intrinsic::cttz:
    ISD = ISD::CTTZ;
    break;
2019-01-03 19:38:42 +08:00
  case Intrinsic::sadd_sat:
    ISD = ISD::SADDSAT;
    break;
  case Intrinsic::ssub_sat:
    ISD = ISD::SSUBSAT;
    break;
  case Intrinsic::uadd_sat:
    ISD = ISD::UADDSAT;
    break;
  case Intrinsic::usub_sat:
    ISD = ISD::USUBSAT;
    break;
2016-10-31 20:10:53 +08:00
  case Intrinsic::sqrt:
    ISD = ISD::FSQRT;
    break;
2019-01-24 21:36:45 +08:00
  case Intrinsic::sadd_with_overflow:
  case Intrinsic::ssub_with_overflow:
    // SSUBO has same costs so don't duplicate.
    ISD = ISD::SADDO;
    OpTy = RetTy->getContainedType(0);
    break;
2019-01-24 20:10:20 +08:00
  case Intrinsic::uadd_with_overflow:
  case Intrinsic::usub_with_overflow:
    // USUBO has same costs so don't duplicate.
    ISD = ISD::UADDO;
    OpTy = RetTy->getContainedType(0);
    break;
2016-05-24 16:17:50 +08:00
  }

2018-11-20 02:57:31 +08:00
  if (ISD != ISD::DELETED_NODE) {
    // Legalize the type.
2019-01-24 20:10:20 +08:00
    std::pair<int, MVT> LT = TLI->getTypeLegalizationCost(DL, OpTy);
2018-11-20 02:57:31 +08:00
    MVT MTy = LT.second;
2016-05-24 16:17:50 +08:00

2018-11-20 02:57:31 +08:00
    // Attempt to lookup cost.
2019-12-06 02:24:10 +08:00
    if (ST->useGLMDivSqrtCosts())
2018-11-20 02:57:31 +08:00
      if (const auto *Entry = CostTableLookup(GLMCostTbl, ISD, MTy))
        return LT.first * Entry->Cost;
2018-03-25 23:58:12 +08:00

2018-11-20 02:57:31 +08:00
    if (ST->isSLM())
      if (const auto *Entry = CostTableLookup(SLMCostTbl, ISD, MTy))
        return LT.first * Entry->Cost;
2018-03-25 23:58:12 +08:00

2018-11-20 02:57:31 +08:00
    if (ST->hasCDI())
      if (const auto *Entry = CostTableLookup(AVX512CDCostTbl, ISD, MTy))
        return LT.first * Entry->Cost;
2017-05-18 05:02:18 +08:00

2018-11-20 02:57:31 +08:00
    if (ST->hasBWI())
      if (const auto *Entry = CostTableLookup(AVX512BWCostTbl, ISD, MTy))
        return LT.first * Entry->Cost;
2017-05-18 03:20:20 +08:00

2018-11-20 02:57:31 +08:00
    if (ST->hasAVX512())
      if (const auto *Entry = CostTableLookup(AVX512CostTbl, ISD, MTy))
        return LT.first * Entry->Cost;
2017-05-18 03:20:20 +08:00

2018-11-20 02:57:31 +08:00
    if (ST->hasXOP())
      if (const auto *Entry = CostTableLookup(XOPCostTbl, ISD, MTy))
        return LT.first * Entry->Cost;
2016-05-24 16:17:50 +08:00

2018-11-20 02:57:31 +08:00
    if (ST->hasAVX2())
      if (const auto *Entry = CostTableLookup(AVX2CostTbl, ISD, MTy))
        return LT.first * Entry->Cost;
2016-06-12 03:23:02 +08:00

2018-11-20 02:57:31 +08:00
    if (ST->hasAVX())
      if (const auto *Entry = CostTableLookup(AVX1CostTbl, ISD, MTy))
        return LT.first * Entry->Cost;
2016-06-12 03:23:02 +08:00

2018-11-20 02:57:31 +08:00
    if (ST->hasSSE42())
      if (const auto *Entry = CostTableLookup(SSE42CostTbl, ISD, MTy))
        return LT.first * Entry->Cost;
2016-10-31 20:10:53 +08:00

2018-11-20 02:57:31 +08:00
    if (ST->hasSSSE3())
      if (const auto *Entry = CostTableLookup(SSSE3CostTbl, ISD, MTy))
        return LT.first * Entry->Cost;
2016-06-12 03:23:02 +08:00

2018-11-20 02:57:31 +08:00
    if (ST->hasSSE2())
      if (const auto *Entry = CostTableLookup(SSE2CostTbl, ISD, MTy))
        return LT.first * Entry->Cost;
2016-06-21 07:08:21 +08:00

2018-11-20 02:57:31 +08:00
    if (ST->hasSSE1())
      if (const auto *Entry = CostTableLookup(SSE1CostTbl, ISD, MTy))
        return LT.first * Entry->Cost;
2016-10-31 20:10:53 +08:00

2019-10-15 00:30:17 +08:00
    if (ST->hasLZCNT()) {
      if (ST->is64Bit())
        if (const auto *Entry = CostTableLookup(LZCNT64CostTbl, ISD, MTy))
          return LT.first * Entry->Cost;

      if (const auto *Entry = CostTableLookup(LZCNT32CostTbl, ISD, MTy))
        return LT.first * Entry->Cost;
    }

2019-10-14 22:07:43 +08:00
    if (ST->hasPOPCNT()) {
      if (ST->is64Bit())
        if (const auto *Entry = CostTableLookup(POPCNT64CostTbl, ISD, MTy))
          return LT.first * Entry->Cost;

      if (const auto *Entry = CostTableLookup(POPCNT32CostTbl, ISD, MTy))
        return LT.first * Entry->Cost;
    }

2019-10-15 00:30:17 +08:00
    // TODO - add BMI (TZCNT) scalar handling
2019-10-14 22:07:43 +08:00

2018-11-20 02:57:31 +08:00
    if (ST->is64Bit())
      if (const auto *Entry = CostTableLookup(X64CostTbl, ISD, MTy))
        return LT.first * Entry->Cost;
2017-03-16 03:34:55 +08:00

2018-11-20 02:57:31 +08:00
    if (const auto *Entry = CostTableLookup(X86CostTbl, ISD, MTy))
      return LT.first * Entry->Cost;
  }
2017-03-16 03:34:55 +08:00

2017-03-14 14:35:36 +08:00
  return BaseT::getIntrinsicInstrCost(IID, RetTy, Tys, FMF, ScalarizationCostPassed);
2016-05-24 16:17:50 +08:00
}
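The `getIntrinsicInstrCost` overload that follows maps `fshl` to `ROTL` when its first two operands are identical. Its XOP comment expresses `ROTR` as a rotate by a negated amount (`VPROT(X,SUB(0,Y))`), which is consistent with the XOP table charging more for `ROTR` than for the matching `ROTL`. Both identities can be checked on plain 32-bit integers. The functions below are a hypothetical scalar sketch of the `llvm.fshl` semantics, not LLVM code: concatenate the operands, shift left by the amount modulo the bit width, and keep the high half.

```cpp
#include <cassert>
#include <cstdint>

// fshl(X, Y, Z): treat X:Y as a 64-bit value (X high), shift left by
// Z mod 32, and return the high 32 bits.
uint32_t fshl32(uint32_t X, uint32_t Y, uint32_t Z) {
  unsigned S = Z % 32;
  if (S == 0)
    return X; // avoids the undefined shift Y >> 32
  return (X << S) | (Y >> (32 - S));
}

// With both inputs equal, a funnel shift is a rotate.
uint32_t rotl32(uint32_t X, uint32_t Z) { return fshl32(X, X, Z); }

// Rotating right by Z equals rotating left by (0 - Z) mod 32, i.e. a
// rotate-left with a negated amount.
uint32_t rotr32(uint32_t X, uint32_t Z) { return rotl32(X, 0u - Z); }
```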
|
|
|
|
|
|
|
|
int X86TTIImpl::getIntrinsicInstrCost(Intrinsic::ID IID, Type *RetTy,
                                      ArrayRef<Value *> Args, FastMathFlags FMF,
                                      unsigned VF) {
  static const CostTblEntry AVX512CostTbl[] = {
    { ISD::ROTL,       MVT::v8i64,   1 },
    { ISD::ROTL,       MVT::v4i64,   1 },
    { ISD::ROTL,       MVT::v2i64,   1 },
    { ISD::ROTL,       MVT::v16i32,  1 },
    { ISD::ROTL,       MVT::v8i32,   1 },
    { ISD::ROTL,       MVT::v4i32,   1 },
    { ISD::ROTR,       MVT::v8i64,   1 },
    { ISD::ROTR,       MVT::v4i64,   1 },
    { ISD::ROTR,       MVT::v2i64,   1 },
    { ISD::ROTR,       MVT::v16i32,  1 },
    { ISD::ROTR,       MVT::v8i32,   1 },
    { ISD::ROTR,       MVT::v4i32,   1 }
  };
  // XOP: ROTL = VPROT(X,Y), ROTR = VPROT(X,SUB(0,Y))
  static const CostTblEntry XOPCostTbl[] = {
    { ISD::ROTL,       MVT::v4i64,   4 },
    { ISD::ROTL,       MVT::v8i32,   4 },
    { ISD::ROTL,       MVT::v16i16,  4 },
    { ISD::ROTL,       MVT::v32i8,   4 },
    { ISD::ROTL,       MVT::v2i64,   1 },
    { ISD::ROTL,       MVT::v4i32,   1 },
    { ISD::ROTL,       MVT::v8i16,   1 },
    { ISD::ROTL,       MVT::v16i8,   1 },
    { ISD::ROTR,       MVT::v4i64,   6 },
    { ISD::ROTR,       MVT::v8i32,   6 },
    { ISD::ROTR,       MVT::v16i16,  6 },
    { ISD::ROTR,       MVT::v32i8,   6 },
    { ISD::ROTR,       MVT::v2i64,   2 },
    { ISD::ROTR,       MVT::v4i32,   2 },
    { ISD::ROTR,       MVT::v8i16,   2 },
    { ISD::ROTR,       MVT::v16i8,   2 }
  };
  static const CostTblEntry X64CostTbl[] = { // 64-bit targets
    { ISD::ROTL,       MVT::i64,     1 },
    { ISD::ROTR,       MVT::i64,     1 },
    { ISD::FSHL,       MVT::i64,     4 }
  };
  static const CostTblEntry X86CostTbl[] = { // 32 or 64-bit targets
    { ISD::ROTL,       MVT::i32,     1 },
    { ISD::ROTL,       MVT::i16,     1 },
    { ISD::ROTL,       MVT::i8,      1 },
    { ISD::ROTR,       MVT::i32,     1 },
    { ISD::ROTR,       MVT::i16,     1 },
    { ISD::ROTR,       MVT::i8,      1 },
    { ISD::FSHL,       MVT::i32,     4 },
    { ISD::FSHL,       MVT::i16,     4 },
    { ISD::FSHL,       MVT::i8,      4 }
  };

  unsigned ISD = ISD::DELETED_NODE;
  switch (IID) {
  default:
    break;
  case Intrinsic::fshl:
    ISD = ISD::FSHL;
    if (Args[0] == Args[1])
      ISD = ISD::ROTL;
    break;
  case Intrinsic::fshr:
    // FSHR has the same costs as FSHL, so don't duplicate the table entries.
    ISD = ISD::FSHL;
    if (Args[0] == Args[1])
      ISD = ISD::ROTR;
    break;
  }

  if (ISD != ISD::DELETED_NODE) {
    // Legalize the type.
    std::pair<int, MVT> LT = TLI->getTypeLegalizationCost(DL, RetTy);
    MVT MTy = LT.second;

    // Attempt to look up the cost.
    if (ST->hasAVX512())
      if (const auto *Entry = CostTableLookup(AVX512CostTbl, ISD, MTy))
        return LT.first * Entry->Cost;

    if (ST->hasXOP())
      if (const auto *Entry = CostTableLookup(XOPCostTbl, ISD, MTy))
        return LT.first * Entry->Cost;

    if (ST->is64Bit())
      if (const auto *Entry = CostTableLookup(X64CostTbl, ISD, MTy))
        return LT.first * Entry->Cost;

    if (const auto *Entry = CostTableLookup(X86CostTbl, ISD, MTy))
      return LT.first * Entry->Cost;
  }

  return BaseT::getIntrinsicInstrCost(IID, RetTy, Args, FMF, VF);
}

int X86TTIImpl::getVectorInstrCost(unsigned Opcode, Type *Val, unsigned Index) {
  static const CostTblEntry SLMCostTbl[] = {
    { ISD::EXTRACT_VECTOR_ELT,       MVT::i8,      4 },
    { ISD::EXTRACT_VECTOR_ELT,       MVT::i16,     4 },
    { ISD::EXTRACT_VECTOR_ELT,       MVT::i32,     4 },
    { ISD::EXTRACT_VECTOR_ELT,       MVT::i64,     7 }
  };

  assert(Val->isVectorTy() && "This must be a vector type");

  Type *ScalarType = Val->getScalarType();

  if (Index != -1U) {
    // Legalize the type.
    std::pair<int, MVT> LT = TLI->getTypeLegalizationCost(DL, Val);

    // This type is legalized to a scalar type.
    if (!LT.second.isVector())
      return 0;

    // The type may be split. Normalize the index to the new type.
    unsigned Width = LT.second.getVectorNumElements();
    Index = Index % Width;

    if (Index == 0) {
      // Floating point scalars are already located in index #0.
      if (ScalarType->isFloatingPointTy())
        return 0;

      // Assume movd/movq XMM <-> GPR is relatively cheap on all targets.
      if (ScalarType->isIntegerTy())
        return 1;
    }

    int ISD = TLI->InstructionOpcodeToISD(Opcode);
    assert(ISD && "Unexpected vector opcode");
    MVT MScalarTy = LT.second.getScalarType();
    if (ST->isSLM())
      if (auto *Entry = CostTableLookup(SLMCostTbl, ISD, MScalarTy))
        return LT.first * Entry->Cost;
  }

  // Add to the base cost if we know that the extracted element of a vector is
  // destined to be moved to and used in the integer register file.
  int RegisterFileMoveCost = 0;
  if (Opcode == Instruction::ExtractElement && ScalarType->isPointerTy())
    RegisterFileMoveCost = 1;

  return BaseT::getVectorInstrCost(Opcode, Val, Index) + RegisterFileMoveCost;
}

2019-10-22 23:16:52 +08:00
|
|
|
int X86TTIImpl::getMemoryOpCost(unsigned Opcode, Type *Src,
|
|
|
|
MaybeAlign Alignment, unsigned AddressSpace,
|
|
|
|
const Instruction *I) {
|
2013-12-05 13:44:44 +08:00
|
|
|
// Handle non-power-of-two vectors such as <3 x float>
|
2013-06-28 01:52:04 +08:00
|
|
|
if (VectorType *VTy = dyn_cast<VectorType>(Src)) {
|
|
|
|
unsigned NumElem = VTy->getVectorNumElements();
|
|
|
|
|
|
|
|
// Handle a few common cases:
|
|
|
|
// <3 x float>
|
|
|
|
if (NumElem == 3 && VTy->getScalarSizeInBits() == 32)
|
|
|
|
// Cost = 64 bit store + extract + 32 bit store.
|
|
|
|
return 3;
|
|
|
|
|
|
|
|
// <3 x double>
|
|
|
|
if (NumElem == 3 && VTy->getScalarSizeInBits() == 64)
|
|
|
|
// Cost = 128 bit store + unpack + 64 bit store.
|
|
|
|
return 3;
|
|
|
|
|
2013-12-05 13:44:44 +08:00
|
|
|
// Assume that all other non-power-of-two numbers are scalarized.
|
2013-06-28 01:52:04 +08:00
|
|
|
if (!isPowerOf2_32(NumElem)) {
|
2015-08-06 02:08:10 +08:00
|
|
|
int Cost = BaseT::getMemoryOpCost(Opcode, VTy->getScalarType(), Alignment,
|
|
|
|
AddressSpace);
|
|
|
|
int SplitCost = getScalarizationOverhead(Src, Opcode == Instruction::Load,
|
|
|
|
Opcode == Instruction::Store);
|
2013-06-28 01:52:04 +08:00
|
|
|
return NumElem * Cost + SplitCost;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
Switch TargetTransformInfo from an immutable analysis pass that requires
a TargetMachine to construct (and thus isn't always available), to an
analysis group that supports layered implementations much like
AliasAnalysis does. This is a pretty massive change, with a few parts
that I was unable to easily separate (sorry), so I'll walk through it.
The first step of this conversion was to make TargetTransformInfo an
analysis group, and to sink the nonce implementations in
ScalarTargetTransformInfo and VectorTargetTranformInfo into
a NoTargetTransformInfo pass. This allows other passes to add a hard
requirement on TTI, and assume they will always get at least on
implementation.
The TargetTransformInfo analysis group leverages the delegation chaining
trick that AliasAnalysis uses, where the base class for the analysis
group delegates to the previous analysis *pass*, allowing all but tho
NoFoo analysis passes to only implement the parts of the interfaces they
support. It also introduces a new trick where each pass in the group
retains a pointer to the top-most pass that has been initialized. This
allows passes to implement one API in terms of another API and benefit
when some other pass above them in the stack has more precise results
for the second API.
The second step of this conversion is to create a pass that implements
the TargetTransformInfo analysis using the target-independent
abstractions in the code generator. This replaces the
ScalarTargetTransformImpl and VectorTargetTransformImpl classes in
lib/Target with a single pass in lib/CodeGen called
BasicTargetTransformInfo. This class actually provides most of the TTI
functionality, basing it upon the TargetLowering abstraction and other
information in the target independent code generator.
The third step of the conversion adds support to all TargetMachines to
register custom analysis passes. This allows building those passes with
access to TargetLowering or other target-specific classes, and it also
allows each target to customize the set of analysis passes desired in
the pass manager. The baseline LLVMTargetMachine implements this
interface to add the BasicTTI pass to the pass manager, and all of the
tools that want to support target-aware TTI passes call this routine on
whatever target machine they end up with to add the appropriate passes.
The fourth step of the conversion created target-specific TTI analysis
passes for the X86 and ARM backends. These passes contain the custom
logic that was previously in their extensions of the
ScalarTargetTransformInfo and VectorTargetTransformInfo interfaces.
I separated them into their own file, as now all of the interface bits
are private and they just expose a function to create the pass itself.
Then I extended these target machines to set up a custom set of analysis
passes, first adding BasicTTI as a fallback, and then adding their
customized TTI implementations.
The fourth step also required moving logic that was shared between the
target-independent layer and the specific targets to a different
interface, as they no longer derive from each other. As a consequence,
helper functions were added to TargetLowering representing the common
logic needed both in the target implementation and the codegen
implementation of the TTI pass. While technically this is the only
change that could have been committed separately, it would have been
a nightmare to extract.
The final step of the conversion was just to delete all the old
boilerplate. This got rid of the ScalarTargetTransformInfo and
VectorTargetTransformInfo classes, all of the support in all of the
targets for producing instances of them, and all of the support in the
tools for manually constructing a pass based around them.
Now that TTI is a relatively normal analysis group, two things become
straightforward. First, we can sink it into lib/Analysis, which is a more
natural layer for it to live in. Second, clients of this interface can
depend on it *always* being available, which will simplify their code and
behavior. These (and other) simplifications will follow in subsequent
commits; this one is clearly big enough.
Finally, I'm very aware that many of the comments and much of the
documentation need to be updated. As soon as I had this working, and plausibly well
commented, I wanted to get it committed and in front of the build bots.
I'll be doing a few passes over documentation later if it sticks.
Commits to update DragonEgg and Clang will be made presently.
llvm-svn: 171681
2013-01-07 09:37:14 +08:00
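The delegation-chaining scheme described in r171681 can be modelled in a few dozen lines of C++. This is a minimal sketch under stated assumptions, not the historical LLVM code. The class names (CostInfo, NoCostInfo, TargetCostInfo), the two query methods, and the pushOnto and demo helpers are all hypothetical. Each pass forwards any query it does not answer to the pass below it. Every pass also keeps a pointer to the top-most pass, so one API can be implemented in terms of another and still pick up more precise answers from higher in the stack.

```cpp
#include <cassert>

// Hypothetical base of an AliasAnalysis-style analysis group.
class CostInfo {
public:
  virtual ~CostInfo() = default;

  // Push this pass on top of an existing stack, then repoint every
  // pass below it at the new top-most pass.
  void pushOnto(CostInfo *Below) {
    Prev = Below;
    Top = this;
    for (CostInfo *P = Prev; P; P = P->Prev)
      P->Top = this;
  }

  // Default behaviour: forward the query to the previous pass.
  virtual unsigned scalarOpCost() const { return Prev->scalarOpCost(); }
  virtual unsigned vectorOpCost(unsigned NumElts) const {
    return Prev->vectorOpCost(NumElts);
  }

protected:
  CostInfo *Prev = nullptr; // previous pass in the stack
  CostInfo *Top = this;     // top-most pass pushed so far
};

// Bottom of the stack ("NoFoo"): answers every query conservatively.
class NoCostInfo : public CostInfo {
public:
  unsigned scalarOpCost() const override { return 1; }
  unsigned vectorOpCost(unsigned NumElts) const override {
    // One API in terms of another, dispatched through Top so that a
    // more precise scalar cost from a higher pass is used.
    return NumElts * Top->scalarOpCost();
  }
};

// A hypothetical target layer that only refines the scalar cost.
class TargetCostInfo : public CostInfo {
public:
  unsigned scalarOpCost() const override { return 3; }
};

// Hypothetical demo helpers.
unsigned demoVectorCost(unsigned NumElts) {
  NoCostInfo Base;
  TargetCostInfo Target;
  Target.pushOnto(&Base);
  return Target.vectorOpCost(NumElts);
}

unsigned baseOnlyVectorCost(unsigned NumElts) {
  NoCostInfo Base;
  return Base.vectorOpCost(NumElts);
}
```

Here demoVectorCost(4) returns 12. TargetCostInfo does not override vectorOpCost, so the query falls through to NoCostInfo. NoCostInfo multiplies by the scalar cost it obtains through Top, which is the target layer's value of 3.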
|
|
|
// Legalize the type.
|
2015-08-06 02:08:10 +08:00
|
|
|
std::pair<int, MVT> LT = TLI->getTypeLegalizationCost(DL, Src);
|
2013-01-07 09:37:14 +08:00
|
|
|
assert((Opcode == Instruction::Load || Opcode == Instruction::Store) &&
|
|
|
|
"Invalid Opcode");
|
|
|
|
|
|
|
|
// Each load/store unit costs 1.
|
2015-08-06 02:08:10 +08:00
|
|
|
int Cost = LT.first * 1;
|
2013-01-07 09:37:14 +08:00
|
|
|
|
2016-03-10 06:23:33 +08:00
|
|
|
// This isn't exactly right. We're using slow unaligned 32-byte accesses as a
|
|
|
|
// proxy for a double-pumped AVX memory interface such as on Sandybridge.
|
|
|
|
if (LT.second.getStoreSize() == 32 && ST->isUnalignedMem32Slow())
|
|
|
|
Cost *= 2;
|
2013-01-07 09:37:14 +08:00
|
|
|
|
|
|
|
return Cost;
|
|
|
|
}
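The arithmetic in the lines of getMemoryOpCost shown above can be restated as a small standalone model. This is an illustrative sketch, not LLVM code. The function modelMemoryOpCost and its parameters are hypothetical stand-ins for LT.first, LT.second.getStoreSize() and ST->isUnalignedMem32Slow().

```cpp
#include <cassert>

// Hypothetical restatement of the tail of getMemoryOpCost shown above.
//   NumLegalParts       ~ LT.first (legal operations after type legalization)
//   LegalStoreSizeBytes ~ LT.second.getStoreSize()
//   SlowUnaligned32     ~ ST->isUnalignedMem32Slow()
int modelMemoryOpCost(int NumLegalParts, unsigned LegalStoreSizeBytes,
                      bool SlowUnaligned32) {
  int Cost = NumLegalParts * 1; // each load/store unit costs 1
  // Slow unaligned 32-byte accesses serve as a proxy for a double-pumped
  // 256-bit memory interface, so charge twice as much.
  if (LegalStoreSizeBytes == 32 && SlowUnaligned32)
    Cost *= 2;
  return Cost;
}
```

Take a vector that legalizes into two 256-bit halves on a subtarget where unaligned 32-byte accesses are slow. For it, modelMemoryOpCost(2, 32, true) yields 4.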
|
2013-07-13 03:16:07 +08:00
|
|
|
|
2015-08-06 02:08:10 +08:00
|
|
|
int X86TTIImpl::getMaskedMemoryOpCost(unsigned Opcode, Type *SrcTy,
|
|
|
|
unsigned Alignment,
|
|
|
|
unsigned AddressSpace) {
|
2019-06-03 02:06:42 +08:00
|
|
|
bool IsLoad = (Instruction::Load == Opcode);
|
|
|
|
bool IsStore = (Instruction::Store == Opcode);
|
|
|
|
|
2015-01-25 16:44:46 +08:00
|
|
|
VectorType *SrcVTy = dyn_cast<VectorType>(SrcTy);
|
|
|
|
if (!SrcVTy)
|
|
|
|
// For a scalar type, take the regular memory-op cost (no mask).
|
2019-10-22 23:16:52 +08:00
|
|
|
return getMemoryOpCost(Opcode, SrcTy, MaybeAlign(Alignment), AddressSpace);
|
2015-01-25 16:44:46 +08:00
|
|
|
|
|
|
|
unsigned NumElem = SrcVTy->getVectorNumElements();
|
|
|
|
VectorType *MaskTy =
|
2019-06-03 02:06:42 +08:00
|
|
|
VectorType::get(Type::getInt8Ty(SrcVTy->getContext()), NumElem);
|
2019-10-14 18:00:21 +08:00
|
|
|
if ((IsLoad && !isLegalMaskedLoad(SrcVTy, MaybeAlign(Alignment))) ||
|
|
|
|
(IsStore && !isLegalMaskedStore(SrcVTy, MaybeAlign(Alignment))) ||
|
|
|
|
!isPowerOf2_32(NumElem)) {
|
2015-01-25 16:44:46 +08:00
|
|
|
// Scalarization
|
2015-08-06 02:08:10 +08:00
|
|
|
int MaskSplitCost = getScalarizationOverhead(MaskTy, false, true);
|
|
|
|
int ScalarCompareCost = getCmpSelInstrCost(
|
2016-04-14 12:36:40 +08:00
|
|
|
Instruction::ICmp, Type::getInt8Ty(SrcVTy->getContext()), nullptr);
|
2015-08-06 02:08:10 +08:00
|
|
|
int BranchCost = getCFInstrCost(Instruction::Br);
|
|
|
|
int MaskCmpCost = NumElem * (BranchCost + ScalarCompareCost);
|
|
|
|
|
2019-06-03 02:06:42 +08:00
|
|
|
int ValueSplitCost = getScalarizationOverhead(SrcVTy, IsLoad, IsStore);
|
2015-08-06 02:08:10 +08:00
|
|
|
int MemopCost =
|
[PM] Change the core design of the TTI analysis to use a polymorphic
type erased interface and a single analysis pass rather than an
extremely complex analysis group.
The end result is that the TTI analysis can contain a type erased
implementation that supports the polymorphic TTI interface. We can build
one from a target-specific implementation or from a dummy one in the IR.
I've also factored all of the code into "mix-in"-able base classes,
including CRTP base classes to facilitate calling back up to the most
specialized form when delegating horizontally across the surface. These
aren't as clean as I would like, and I'm planning to work on cleaning
some of this up, but I wanted to start by putting it into the right form.
There are a number of reasons for this change, and this particular
design. The first and foremost reason is that an analysis group is
complete overkill, and the chaining delegation strategy was so opaque,
confusing, and high overhead that TTI was suffering greatly for it.
Several of the TTI functions had failed to be implemented in all places
because the chaining-based delegation provided no checking of this.
A few other functions were implemented with incorrect delegation.
Working on this, the message to me was very clear: the delegation and
analysis group structure was too confusing to be useful here.
The other reason, of course, is that this is a *much* more natural fit for
the new pass manager. This will lay the groundwork for a type-erased
per-function info object that can look up the correct subtarget and even
cache it.
Yet another benefit is that this will significantly simplify the
interaction of the pass managers and the TargetMachine. See the future
work below.
The downside of this change is that it is very, very verbose. I'm going
to work to improve that, but it is somewhat of an implementation necessity
when doing type erasure in C++. =/ I discussed this design really extensively
with Eric and Hal prior to going down this path, and afterward showed
them the result. No one was really thrilled with it, but there doesn't
seem to be a substantially better alternative. Using a base class and
virtual method dispatch would make the code much shorter, but as
discussed in the update to the programmer's manual and elsewhere,
a polymorphic interface feels like the more principled approach even if
this is perhaps the least compelling example of it. ;]
Ultimately, there is still a lot more to be done here, but this was the
huge chunk that I couldn't really split things out of because this was
the interface change to TTI. I've tried to minimize all the other parts
of this. The follow-up work should include at least:
1) Improving the TargetMachine interface by having it directly return
a TTI object. Because we have a non-pass object with value semantics
and an internal type erasure mechanism, we can narrow the interface
of the TargetMachine to *just* do what we need: build and return
a TTI object that we can then insert into the pass pipeline.
2) Make the TTI object be fully specialized for a particular function.
This will include splitting off a minimal form of it which is
sufficient for the inliner and the old pass manager.
3) Add a new pass manager analysis which produces TTI objects from the
target machine for each function. This may actually be done as part
of #2 in order to use the new analysis to implement #2.
4) Work on narrowing the API between TTI and the targets so that it is
easier to understand and less verbose to type erase.
5) Work on narrowing the API between TTI and its clients so that it is
easier to understand and less verbose to forward.
6) Try to improve the CRTP-based delegation. I feel like this code is
just a bit messy and exacerbates the complexity of implementing
the TTI in each target.
Many thanks to Eric and Hal for their help here. I ended up blocked on
this somewhat more abruptly than I expected, and so I appreciate getting
it sorted out very quickly.
Differential Revision: http://reviews.llvm.org/D7293
llvm-svn: 227669
2015-01-31 11:43:40 +08:00
|
|
|
NumElem * BaseT::getMemoryOpCost(Opcode, SrcVTy->getScalarType(),
|
2019-10-22 23:16:52 +08:00
|
|
|
MaybeAlign(Alignment), AddressSpace);
|
2015-01-25 16:44:46 +08:00
|
|
|
return MemopCost + ValueSplitCost + MaskSplitCost + MaskCmpCost;
|
|
|
|
}
|
|
|
|
|
|
|
|
// Legalize the type.
|
2015-08-06 02:08:10 +08:00
|
|
|
std::pair<int, MVT> LT = TLI->getTypeLegalizationCost(DL, SrcVTy);
|
2015-10-29 02:15:46 +08:00
|
|
|
auto VT = TLI->getValueType(DL, SrcVTy);
|
2015-08-06 02:08:10 +08:00
|
|
|
int Cost = 0;
|
2015-10-29 02:15:46 +08:00
|
|
|
if (VT.isSimple() && LT.second != VT.getSimpleVT() &&
|
2015-01-25 16:44:46 +08:00
|
|
|
LT.second.getVectorNumElements() == NumElem)
|
|
|
|
// Promotion requires expand/truncate for data and a shuffle for mask.
|
2019-04-07 21:26:09 +08:00
|
|
|
Cost += getShuffleCost(TTI::SK_PermuteTwoSrc, SrcVTy, 0, nullptr) +
|
|
|
|
getShuffleCost(TTI::SK_PermuteTwoSrc, MaskTy, 0, nullptr);
|
2015-01-31 11:43:40 +08:00
|
|
|
|
2015-01-25 16:44:46 +08:00
|
|
|
else if (LT.second.getVectorNumElements() > NumElem) {
|
|
|
|
VectorType *NewMaskTy = VectorType::get(MaskTy->getVectorElementType(),
|
|
|
|
LT.second.getVectorNumElements());
|
|
|
|
// Expanding requires filling the mask with zeroes.
|
2015-01-31 11:43:40 +08:00
|
|
|
Cost += getShuffleCost(TTI::SK_InsertSubvector, NewMaskTy, 0, MaskTy);
|
2015-01-25 16:44:46 +08:00
|
|
|
}
|
2019-06-03 02:06:42 +08:00
|
|
|
|
[CostModel][X86] Improve masked load/store AVX1/AVX2 costs
A mixture of internal tests and a review of the scheduler models indicates that we're overestimating the cost of a masked load, which we currently estimate at 4x a regular memory op; more realistic values indicate that it's closer to 2x. Masked store costs are a lot more diverse, but 8x is roughly in the middle of the range.
e.g. SandyBridge
defm : X86WriteRes<WriteFMaskedLoad, [SBPort23,SBPort05], 8, [1,2], 3>;
defm : X86WriteRes<WriteFMaskedLoadY, [SBPort23,SBPort05], 9, [1,2], 3>;
defm : X86WriteRes<WriteFMaskedStore, [SBPort4,SBPort01,SBPort23], 5, [1,1,1], 3>;
defm : X86WriteRes<WriteFMaskedStoreY, [SBPort4,SBPort01,SBPort23], 5, [1,1,1], 3>;
e.g. Btver2
defm : X86WriteRes<WriteFMaskedLoad, [JLAGU, JFPU01, JFPX], 6, [1, 2, 2], 1>;
defm : X86WriteRes<WriteFMaskedLoadY, [JLAGU, JFPU01, JFPX], 6, [2, 4, 4], 2>;
defm : X86WriteRes<WriteFMaskedStore, [JSAGU, JFPU01, JFPX], 6, [1, 1, 4], 1>;
defm : X86WriteRes<WriteFMaskedStoreY, [JSAGU, JFPU01, JFPX], 6, [2, 2, 4], 2>;
Differential Revision: https://reviews.llvm.org/D61257
llvm-svn: 362338
2019-06-03 04:37:02 +08:00
  // Pre-AVX512: each maskmov load costs ~2 and each maskmov store ~8 (per legal vector).
2015-01-25 16:44:46 +08:00
  if (!ST->hasAVX512())
2019-06-03 04:37:02 +08:00
    return Cost + LT.first * (IsLoad ? 2 : 8);
2015-01-25 16:44:46 +08:00

  // AVX-512 masked load/store is cheaper.
2019-06-03 02:06:42 +08:00
  return Cost + LT.first;
2015-01-25 16:44:46 +08:00
}
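On pre-AVX512 targets, the final step of the masked load/store cost charges about 2 per masked load and 8 per masked store for each legalized vector, which matches the 2x/8x estimates in the commit message. AVX-512 charges 1. The following standalone sketch covers only that step and makes these assumptions:

- `ExtraCost` stands in for the shuffle/extract cost already accumulated in `Cost`.
- `NumParts` stands in for `LT.first`.
- `maskedMemOpCost` is a hypothetical name, not an LLVM function.

```cpp
#include <cassert>

// Hypothetical restatement of the last step of the X86 masked load/store
// cost. ExtraCost stands in for the shuffle/extract cost already
// accumulated in `Cost`; NumParts stands in for LT.first, the number of
// legal vectors the type is split into.
int maskedMemOpCost(int ExtraCost, int NumParts, bool IsLoad, bool HasAVX512) {
  assert(NumParts >= 1 && "a legalized type has at least one part");
  if (!HasAVX512)
    // Pre-AVX512 maskmov: ~2 per masked load, ~8 per masked store.
    return ExtraCost + NumParts * (IsLoad ? 2 : 8);
  // AVX-512 masked load/store: one unit per legal vector.
  return ExtraCost + NumParts;
}
```

For example, take a masked store of a type that splits into two legal vectors on an AVX2 target. It is charged 2 * 8 = 16 on top of `ExtraCost`, versus 2 with AVX-512.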

2017-01-05 22:03:41 +08:00
int X86TTIImpl::getAddressComputationCost(Type *Ty, ScalarEvolution *SE,
                                          const SCEV *Ptr) {
2013-07-13 03:16:07 +08:00
  // Address computations in vectorized code with non-consecutive addresses will
  // likely result in more instructions compared to scalar code where the
  // computation can more often be merged into the index mode. The resulting
  // extra micro-ops can significantly decrease throughput.
2019-05-06 04:03:51 +08:00
  const unsigned NumVectorInstToHideOverhead = 10;
2013-07-13 03:16:07 +08:00

2017-01-05 22:03:41 +08:00
  // Cost modeling of Strided Access Computation is hidden by the indexing
  // modes of X86 regardless of the stride value. We don't believe that there
  // is a difference between constant strided access in general and constant
  // strided value which is less than or equal to 64.
  // Even in the case of (loop invariant) stride whose value is not known at
  // compile time, the address computation will not incur more than one extra
  // ADD instruction.
  if (Ty->isVectorTy() && SE) {
    if (!BaseT::isStridedAccess(Ptr))
      return NumVectorInstToHideOverhead;
    if (!BaseT::getConstantStrideStep(SE, Ptr))
      return 1;
  }
2013-07-13 03:16:07 +08:00

2017-01-05 22:03:41 +08:00
  return BaseT::getAddressComputationCost(Ty, SE, Ptr);
2013-07-13 03:16:07 +08:00
}
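When the type is a vector and ScalarEvolution is available, the address-cost logic reduces to three cases:

- A non-strided (gather-like) pointer pays a flat 10.
- A strided pointer whose step is not a compile-time constant pays 1, for the extra ADD.
- A constant stride defers to the base implementation.

The sketch below shows that decision. It assumes the SCEV queries (`isStridedAccess`, `getConstantStrideStep`) have already been summarized into a hypothetical `PtrKind` enum and that the base cost is passed in. `addressComputationCost` is also a hypothetical name.

```cpp
#include <cassert>

// Hypothetical summary of what ScalarEvolution proved about the pointer;
// stands in for BaseT::isStridedAccess and BaseT::getConstantStrideStep.
enum class PtrKind { NonStrided, StridedUnknownStep, StridedConstantStep };

// BaseCost stands in for BaseT::getAddressComputationCost(Ty, SE, Ptr).
int addressComputationCost(bool IsVectorTy, bool HaveSCEV, PtrKind Kind,
                           int BaseCost) {
  assert(BaseCost >= 0 && "costs are non-negative");
  const int NumVectorInstToHideOverhead = 10;
  if (IsVectorTy && HaveSCEV) {
    if (Kind == PtrKind::NonStrided)
      return NumVectorInstToHideOverhead; // gather-like addressing
    if (Kind == PtrKind::StridedUnknownStep)
      return 1; // loop-invariant stride: at most one extra ADD
  }
  // Constant strides, scalar types, and the no-SCEV case use the base cost.
  return BaseCost;
}
```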
2013-09-20 01:48:48 +08:00

2017-07-31 22:19:32 +08:00
int X86TTIImpl::getArithmeticReductionCost(unsigned Opcode, Type *ValTy,
                                           bool IsPairwise) {
2014-12-04 13:20:33 +08:00
  // We use the Intel Architecture Code Analyzer (IACA) to measure the throughput
  // and use that as the cost.

2019-11-07 01:55:23 +08:00
  static const CostTblEntry SLMCostTblPairWise[] = {
    { ISD::FADD, MVT::v2f64, 3 },
    { ISD::ADD, MVT::v2i64, 5 },
  };

2019-10-12 21:21:50 +08:00
  static const CostTblEntry SSE2CostTblPairWise[] = {
2013-09-20 01:48:48 +08:00
    { ISD::FADD, MVT::v2f64, 2 },
    { ISD::FADD, MVT::v4f32, 4 },
    { ISD::ADD, MVT::v2i64, 2 }, // The data reported by the IACA tool is "1.6".
2019-08-08 00:24:26 +08:00
    { ISD::ADD, MVT::v2i32, 2 }, // FIXME: chosen to be less than v4i32.
2013-09-20 01:48:48 +08:00
    { ISD::ADD, MVT::v4i32, 3 }, // The data reported by the IACA tool is "3.5".
2019-08-08 00:24:26 +08:00
    { ISD::ADD, MVT::v2i16, 3 }, // FIXME: chosen to be less than v4i16
    { ISD::ADD, MVT::v4i16, 4 }, // FIXME: chosen to be less than v8i16
2013-09-20 01:48:48 +08:00
    { ISD::ADD, MVT::v8i16, 5 },
2019-10-12 21:21:50 +08:00
    { ISD::ADD, MVT::v2i8, 2 },
    { ISD::ADD, MVT::v4i8, 2 },
    { ISD::ADD, MVT::v8i8, 2 },
    { ISD::ADD, MVT::v16i8, 3 },
2013-09-20 01:48:48 +08:00
  };
2014-12-04 13:20:33 +08:00

2015-10-28 12:02:12 +08:00
  static const CostTblEntry AVX1CostTblPairWise[] = {
2013-09-20 01:48:48 +08:00
    { ISD::FADD, MVT::v4f64, 5 },
    { ISD::FADD, MVT::v8f32, 7 },
    { ISD::ADD, MVT::v2i64, 1 }, // The data reported by the IACA tool is "1.5".
    { ISD::ADD, MVT::v4i64, 5 }, // The data reported by the IACA tool is "4.8".
    { ISD::ADD, MVT::v8i32, 5 },
2019-10-12 21:21:50 +08:00
    { ISD::ADD, MVT::v16i16, 6 },
    { ISD::ADD, MVT::v32i8, 4 },
2013-09-20 01:48:48 +08:00
  };

2019-11-07 01:55:23 +08:00
  static const CostTblEntry SLMCostTblNoPairWise[] = {
    { ISD::FADD, MVT::v2f64, 3 },
    { ISD::ADD, MVT::v2i64, 5 },
  };

2019-10-12 21:21:50 +08:00
  static const CostTblEntry SSE2CostTblNoPairWise[] = {
2013-09-20 01:48:48 +08:00
    { ISD::FADD, MVT::v2f64, 2 },
    { ISD::FADD, MVT::v4f32, 4 },
    { ISD::ADD, MVT::v2i64, 2 }, // The data reported by the IACA tool is "1.6".
2019-08-08 00:24:26 +08:00
    { ISD::ADD, MVT::v2i32, 2 }, // FIXME: chosen to be less than v4i32
2013-09-20 01:48:48 +08:00
    { ISD::ADD, MVT::v4i32, 3 }, // The data reported by the IACA tool is "3.3".
2019-08-08 00:24:26 +08:00
    { ISD::ADD, MVT::v2i16, 2 }, // The data reported by the IACA tool is "4.3".
    { ISD::ADD, MVT::v4i16, 3 }, // The data reported by the IACA tool is "4.3".
2013-09-20 01:48:48 +08:00
    { ISD::ADD, MVT::v8i16, 4 }, // The data reported by the IACA tool is "4.3".
2019-10-12 21:21:50 +08:00
    { ISD::ADD, MVT::v2i8, 2 },
    { ISD::ADD, MVT::v4i8, 2 },
    { ISD::ADD, MVT::v8i8, 2 },
    { ISD::ADD, MVT::v16i8, 3 },
2013-09-20 01:48:48 +08:00
  };
2014-12-04 13:20:33 +08:00

2015-10-28 12:02:12 +08:00
  static const CostTblEntry AVX1CostTblNoPairWise[] = {
2013-09-20 01:48:48 +08:00
    { ISD::FADD, MVT::v4f64, 3 },
2019-10-12 21:21:50 +08:00
    { ISD::FADD, MVT::v4f32, 3 },
2013-09-20 01:48:48 +08:00
    { ISD::FADD, MVT::v8f32, 4 },
    { ISD::ADD, MVT::v2i64, 1 }, // The data reported by the IACA tool is "1.5".
    { ISD::ADD, MVT::v4i64, 3 },
    { ISD::ADD, MVT::v8i32, 5 },
2019-10-12 21:21:50 +08:00
    { ISD::ADD, MVT::v16i16, 5 },
    { ISD::ADD, MVT::v32i8, 4 },
2013-09-20 01:48:48 +08:00
  };
2014-12-04 13:20:33 +08:00

2019-08-08 00:24:26 +08:00
  int ISD = TLI->InstructionOpcodeToISD(Opcode);
  assert(ISD && "Invalid opcode");

  // Before legalizing the type, give a chance to look up illegal narrow types
  // in the table.
  // FIXME: Is there a better way to do this?
  EVT VT = TLI->getValueType(DL, ValTy);
2019-09-30 07:32:37 +08:00
  if (VT.isSimple()) {
2019-08-08 00:24:26 +08:00
    MVT MTy = VT.getSimpleVT();
    if (IsPairwise) {
2019-11-07 01:55:23 +08:00
      if (ST->isSLM())
        if (const auto *Entry = CostTableLookup(SLMCostTblPairWise, ISD, MTy))
          return Entry->Cost;

2019-08-08 00:24:26 +08:00
      if (ST->hasAVX())
        if (const auto *Entry = CostTableLookup(AVX1CostTblPairWise, ISD, MTy))
          return Entry->Cost;

2019-10-12 21:21:50 +08:00
      if (ST->hasSSE2())
        if (const auto *Entry = CostTableLookup(SSE2CostTblPairWise, ISD, MTy))
2019-08-08 00:24:26 +08:00
          return Entry->Cost;
    } else {
2019-11-07 01:55:23 +08:00
      if (ST->isSLM())
        if (const auto *Entry = CostTableLookup(SLMCostTblNoPairWise, ISD, MTy))
          return Entry->Cost;

2019-08-08 00:24:26 +08:00
      if (ST->hasAVX())
        if (const auto *Entry = CostTableLookup(AVX1CostTblNoPairWise, ISD, MTy))
          return Entry->Cost;

2019-10-12 21:21:50 +08:00
      if (ST->hasSSE2())
        if (const auto *Entry = CostTableLookup(SSE2CostTblNoPairWise, ISD, MTy))
2019-08-08 00:24:26 +08:00
          return Entry->Cost;
    }
  }

  std::pair<int, MVT> LT = TLI->getTypeLegalizationCost(DL, ValTy);

  MVT MTy = LT.second;

2013-09-20 01:48:48 +08:00
  if (IsPairwise) {
2019-11-07 01:55:23 +08:00
    if (ST->isSLM())
      if (const auto *Entry = CostTableLookup(SLMCostTblPairWise, ISD, MTy))
        return LT.first * Entry->Cost;

2015-10-27 12:14:24 +08:00
    if (ST->hasAVX())
      if (const auto *Entry = CostTableLookup(AVX1CostTblPairWise, ISD, MTy))
        return LT.first * Entry->Cost;
2014-12-04 13:20:33 +08:00

2019-10-12 21:21:50 +08:00
    if (ST->hasSSE2())
      if (const auto *Entry = CostTableLookup(SSE2CostTblPairWise, ISD, MTy))
2015-10-27 12:14:24 +08:00
        return LT.first * Entry->Cost;
2013-09-20 01:48:48 +08:00
  } else {
2019-11-07 01:55:23 +08:00
    if (ST->isSLM())
      if (const auto *Entry = CostTableLookup(SLMCostTblNoPairWise, ISD, MTy))
        return LT.first * Entry->Cost;

2015-10-27 12:14:24 +08:00
    if (ST->hasAVX())
      if (const auto *Entry = CostTableLookup(AVX1CostTblNoPairWise, ISD, MTy))
        return LT.first * Entry->Cost;
2014-12-04 13:20:33 +08:00

2019-10-12 21:21:50 +08:00
    if (ST->hasSSE2())
      if (const auto *Entry = CostTableLookup(SSE2CostTblNoPairWise, ISD, MTy))
2015-10-27 12:14:24 +08:00
        return LT.first * Entry->Cost;
2013-09-20 01:48:48 +08:00
  }

2019-11-05 14:23:16 +08:00
  // FIXME: These assume a naive kshift+binop lowering, which is probably
  // conservative in most cases.
  // FIXME: This doesn't cost large types like v128i1 correctly.
  static const CostTblEntry AVX512BoolReduction[] = {
    { ISD::AND, MVT::v2i1, 3 },
    { ISD::AND, MVT::v4i1, 5 },
    { ISD::AND, MVT::v8i1, 7 },
    { ISD::AND, MVT::v16i1, 9 },
    { ISD::AND, MVT::v32i1, 11 },
    { ISD::AND, MVT::v64i1, 13 },
    { ISD::OR, MVT::v2i1, 3 },
    { ISD::OR, MVT::v4i1, 5 },
    { ISD::OR, MVT::v8i1, 7 },
    { ISD::OR, MVT::v16i1, 9 },
    { ISD::OR, MVT::v32i1, 11 },
    { ISD::OR, MVT::v64i1, 13 },
  };

2019-04-17 18:58:19 +08:00
  static const CostTblEntry AVX2BoolReduction[] = {
    { ISD::AND, MVT::v16i16, 2 }, // vpmovmskb + cmp
    { ISD::AND, MVT::v32i8, 2 }, // vpmovmskb + cmp
    { ISD::OR, MVT::v16i16, 2 }, // vpmovmskb + cmp
    { ISD::OR, MVT::v32i8, 2 }, // vpmovmskb + cmp
  };

  static const CostTblEntry AVX1BoolReduction[] = {
    { ISD::AND, MVT::v4i64, 2 }, // vmovmskpd + cmp
    { ISD::AND, MVT::v8i32, 2 }, // vmovmskps + cmp
    { ISD::AND, MVT::v16i16, 4 }, // vextractf128 + vpand + vpmovmskb + cmp
    { ISD::AND, MVT::v32i8, 4 }, // vextractf128 + vpand + vpmovmskb + cmp
    { ISD::OR, MVT::v4i64, 2 }, // vmovmskpd + cmp
    { ISD::OR, MVT::v8i32, 2 }, // vmovmskps + cmp
    { ISD::OR, MVT::v16i16, 4 }, // vextractf128 + vpor + vpmovmskb + cmp
    { ISD::OR, MVT::v32i8, 4 }, // vextractf128 + vpor + vpmovmskb + cmp
  };

  static const CostTblEntry SSE2BoolReduction[] = {
    { ISD::AND, MVT::v2i64, 2 }, // movmskpd + cmp
    { ISD::AND, MVT::v4i32, 2 }, // movmskps + cmp
    { ISD::AND, MVT::v8i16, 2 }, // pmovmskb + cmp
    { ISD::AND, MVT::v16i8, 2 }, // pmovmskb + cmp
    { ISD::OR, MVT::v2i64, 2 }, // movmskpd + cmp
    { ISD::OR, MVT::v4i32, 2 }, // movmskps + cmp
    { ISD::OR, MVT::v8i16, 2 }, // pmovmskb + cmp
    { ISD::OR, MVT::v16i8, 2 }, // pmovmskb + cmp
  };

  // Handle bool allof/anyof patterns.
2019-10-27 01:26:04 +08:00
  if (!IsPairwise && ValTy->getVectorElementType()->isIntegerTy(1)) {
2019-11-05 14:23:16 +08:00
    if (ST->hasAVX512())
      if (const auto *Entry = CostTableLookup(AVX512BoolReduction, ISD, MTy))
        return LT.first * Entry->Cost;
2019-04-17 18:58:19 +08:00
    if (ST->hasAVX2())
      if (const auto *Entry = CostTableLookup(AVX2BoolReduction, ISD, MTy))
        return LT.first * Entry->Cost;
    if (ST->hasAVX())
      if (const auto *Entry = CostTableLookup(AVX1BoolReduction, ISD, MTy))
        return LT.first * Entry->Cost;
    if (ST->hasSSE2())
      if (const auto *Entry = CostTableLookup(SSE2BoolReduction, ISD, MTy))
        return LT.first * Entry->Cost;
  }

2017-07-31 22:19:32 +08:00
  return BaseT::getArithmeticReductionCost(Opcode, ValTy, IsPairwise);
2013-09-20 01:48:48 +08:00
}
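Every post-legalization lookup in `getArithmeticReductionCost` follows the same pattern:

1. Try the most specific subtarget table first.
2. On a miss, fall through to the next table.
3. Scale the per-vector cost by `LT.first`, the number of legal vectors the type splits into.

The standalone sketch below models the pairwise path only, using a few rows copied from the SLM, AVX1 and SSE2 pairwise tables. The following are hypothetical stand-ins for `ISD`, `MVT`, `CostTblEntry` and `CostTableLookup`:

- the enums
- `CostEntry`
- `lookup`
- `reductionCost`

The sketch does not model the pre-legalization narrow-type lookup or the bool reductions.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for ISD opcodes, MVTs and llvm::CostTblEntry.
enum Opc { FADD, ADD };
enum VecTy { v2f64, v4f32, v2i64, v4i64 };
struct CostEntry {
  Opc ISD;
  VecTy Type;
  int Cost;
};

// Linear search in the spirit of CostTableLookup: first matching row wins.
template <std::size_t N>
const CostEntry *lookup(const CostEntry (&Tbl)[N], Opc ISD, VecTy Ty) {
  for (const CostEntry &E : Tbl)
    if (E.ISD == ISD && E.Type == Ty)
      return &E;
  return nullptr;
}

struct Features {
  bool IsSLM, HasAVX, HasSSE2;
};

// NumParts plays the role of LT.first; FallbackCost plays the role of
// BaseT::getArithmeticReductionCost.
int reductionCost(const Features &F, Opc ISD, VecTy LegalTy, int NumParts,
                  int FallbackCost) {
  // A few rows copied from the SLM, AVX1 and SSE2 pairwise tables.
  static const CostEntry SLM[] = {{FADD, v2f64, 3}, {ADD, v2i64, 5}};
  static const CostEntry AVX1[] = {{ADD, v2i64, 1}, {ADD, v4i64, 5}};
  static const CostEntry SSE2[] = {
      {FADD, v2f64, 2}, {FADD, v4f32, 4}, {ADD, v2i64, 2}};
  assert(NumParts >= 1 && "legalization yields at least one part");
  // Most specific subtarget first; a miss falls through to the next tier.
  if (F.IsSLM)
    if (const CostEntry *E = lookup(SLM, ISD, LegalTy))
      return NumParts * E->Cost;
  if (F.HasAVX)
    if (const CostEntry *E = lookup(AVX1, ISD, LegalTy))
      return NumParts * E->Cost;
  if (F.HasSSE2)
    if (const CostEntry *E = lookup(SSE2, ISD, LegalTy))
      return NumParts * E->Cost;
  return FallbackCost;
}
```

An AVX target still reaches the SSE2 row for `FADD v4f32`, because the AVX1 pairwise table above has no entry for it.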
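The SSE2, AVX1 and AVX2 bool-reduction rows price an all-of (AND) or any-of (OR) reduction of an `i1` vector at roughly two instructions. The first is a movemask that packs one bit per lane into a general-purpose register; the trailing comments name `movmskps`, `pmovmskb` and so on. The second is a compare.

Below is a portable scalar emulation of that lowering, for illustration only:

- `moveMask`, `allOf` and `anyOf` are hypothetical helpers.
- Each lane is assumed to hold a compare result (-1 for true, 0 for false).
- Real code would use the SSE2 `_mm_movemask_epi8` intrinsic or leave it to the backend.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Portable emulation of (v)pmovmskb: gather the sign bit of each byte lane
// into the low bits of an integer mask.
template <std::size_t N>
std::uint32_t moveMask(const std::array<std::int8_t, N> &Lanes) {
  static_assert(N <= 32, "mask must fit in 32 bits");
  std::uint32_t Mask = 0;
  for (std::size_t I = 0; I < N; ++I)
    if (Lanes[I] < 0)
      Mask |= std::uint32_t(1) << I;
  return Mask;
}

// AND-reduction (all-of): every lane's mask bit is set, i.e. one compare
// of the mask against the all-ones pattern for N lanes.
template <std::size_t N>
bool allOf(const std::array<std::int8_t, N> &Lanes) {
  const std::uint64_t Full = (std::uint64_t(1) << N) - 1;
  return std::uint64_t(moveMask(Lanes)) == Full;
}

// OR-reduction (any-of): one compare of the mask against zero.
template <std::size_t N>
bool anyOf(const std::array<std::int8_t, N> &Lanes) {
  return moveMask(Lanes) != 0;
}
```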
2017-09-08 21:49:36 +08:00
int X86TTIImpl::getMinMaxReductionCost(Type *ValTy, Type *CondTy,
                                       bool IsPairwise, bool IsUnsigned) {
  std::pair<int, MVT> LT = TLI->getTypeLegalizationCost(DL, ValTy);

  MVT MTy = LT.second;

  int ISD;
  if (ValTy->isIntOrIntVectorTy()) {
    ISD = IsUnsigned ? ISD::UMIN : ISD::SMIN;
  } else {
    assert(ValTy->isFPOrFPVectorTy() &&
           "Expected floating-point or integer vector type.");
    ISD = ISD::FMINNUM;
  }

  // We use the Intel Architecture Code Analyzer (IACA) to measure the throughput
  // and use that as the cost.

2019-05-12 01:12:52 +08:00
  static const CostTblEntry SSE1CostTblPairWise[] = {
    {ISD::FMINNUM, MVT::v4f32, 4},
  };

  static const CostTblEntry SSE2CostTblPairWise[] = {
2017-09-08 21:49:36 +08:00
    {ISD::FMINNUM, MVT::v2f64, 3},
2019-05-12 01:12:52 +08:00
    {ISD::SMIN, MVT::v2i64, 6},
    {ISD::UMIN, MVT::v2i64, 8},
    {ISD::SMIN, MVT::v4i32, 6},
    {ISD::UMIN, MVT::v4i32, 8},
    {ISD::SMIN, MVT::v8i16, 4},
    {ISD::UMIN, MVT::v8i16, 6},
    {ISD::SMIN, MVT::v16i8, 8},
    {ISD::UMIN, MVT::v16i8, 6},
  };

  static const CostTblEntry SSE41CostTblPairWise[] = {
2017-09-08 21:49:36 +08:00
    {ISD::FMINNUM, MVT::v4f32, 2},
2019-05-12 01:12:52 +08:00
    {ISD::SMIN, MVT::v2i64, 9},
    {ISD::UMIN, MVT::v2i64, 10},
2017-09-08 21:49:36 +08:00
    {ISD::SMIN, MVT::v4i32, 1}, // The data reported by the IACA is "1.5"
    {ISD::UMIN, MVT::v4i32, 2}, // The data reported by the IACA is "1.8"
    {ISD::SMIN, MVT::v8i16, 2},
    {ISD::UMIN, MVT::v8i16, 2},
2019-05-12 01:12:52 +08:00
    {ISD::SMIN, MVT::v16i8, 3},
    {ISD::UMIN, MVT::v16i8, 3},
  };

  static const CostTblEntry SSE42CostTblPairWise[] = {
    {ISD::SMIN, MVT::v2i64, 7}, // The data reported by the IACA is "6.8"
    {ISD::UMIN, MVT::v2i64, 8}, // The data reported by the IACA is "8.6"
2017-09-08 21:49:36 +08:00
  };

  static const CostTblEntry AVX1CostTblPairWise[] = {
    {ISD::FMINNUM, MVT::v4f32, 1},
    {ISD::FMINNUM, MVT::v4f64, 1},
    {ISD::FMINNUM, MVT::v8f32, 2},
    {ISD::SMIN, MVT::v2i64, 3},
    {ISD::UMIN, MVT::v2i64, 3},
    {ISD::SMIN, MVT::v4i32, 1},
    {ISD::UMIN, MVT::v4i32, 1},
    {ISD::SMIN, MVT::v8i16, 1},
    {ISD::UMIN, MVT::v8i16, 1},
2019-05-12 01:12:52 +08:00
    {ISD::SMIN, MVT::v16i8, 2},
    {ISD::UMIN, MVT::v16i8, 2},
    {ISD::SMIN, MVT::v4i64, 7},
    {ISD::UMIN, MVT::v4i64, 7},
2017-09-08 21:49:36 +08:00
    {ISD::SMIN, MVT::v8i32, 3},
    {ISD::UMIN, MVT::v8i32, 3},
2019-05-12 01:12:52 +08:00
    {ISD::SMIN, MVT::v16i16, 3},
    {ISD::UMIN, MVT::v16i16, 3},
    {ISD::SMIN, MVT::v32i8, 3},
    {ISD::UMIN, MVT::v32i8, 3},
2017-09-08 21:49:36 +08:00
  };

  static const CostTblEntry AVX2CostTblPairWise[] = {
    {ISD::SMIN, MVT::v4i64, 2},
    {ISD::UMIN, MVT::v4i64, 2},
    {ISD::SMIN, MVT::v8i32, 1},
    {ISD::UMIN, MVT::v8i32, 1},
    {ISD::SMIN, MVT::v16i16, 1},
    {ISD::UMIN, MVT::v16i16, 1},
    {ISD::SMIN, MVT::v32i8, 2},
    {ISD::UMIN, MVT::v32i8, 2},
  };

  static const CostTblEntry AVX512CostTblPairWise[] = {
    {ISD::FMINNUM, MVT::v8f64, 1},
    {ISD::FMINNUM, MVT::v16f32, 2},
    {ISD::SMIN, MVT::v8i64, 2},
    {ISD::UMIN, MVT::v8i64, 2},
    {ISD::SMIN, MVT::v16i32, 1},
    {ISD::UMIN, MVT::v16i32, 1},
  };

2019-05-12 01:12:52 +08:00
  static const CostTblEntry SSE1CostTblNoPairWise[] = {
    {ISD::FMINNUM, MVT::v4f32, 4},
  };

  static const CostTblEntry SSE2CostTblNoPairWise[] = {
2017-09-08 21:49:36 +08:00
    {ISD::FMINNUM, MVT::v2f64, 3},
2019-05-12 01:12:52 +08:00
    {ISD::SMIN, MVT::v2i64, 6},
    {ISD::UMIN, MVT::v2i64, 8},
    {ISD::SMIN, MVT::v4i32, 6},
    {ISD::UMIN, MVT::v4i32, 8},
    {ISD::SMIN, MVT::v8i16, 4},
    {ISD::UMIN, MVT::v8i16, 6},
    {ISD::SMIN, MVT::v16i8, 8},
    {ISD::UMIN, MVT::v16i8, 6},
  };

  static const CostTblEntry SSE41CostTblNoPairWise[] = {
2017-09-08 21:49:36 +08:00
    {ISD::FMINNUM, MVT::v4f32, 3},
2019-05-12 01:12:52 +08:00
    {ISD::SMIN, MVT::v2i64, 9},
    {ISD::UMIN, MVT::v2i64, 11},
2017-09-08 21:49:36 +08:00
    {ISD::SMIN, MVT::v4i32, 1}, // The data reported by the IACA is "1.5"
    {ISD::UMIN, MVT::v4i32, 2}, // The data reported by the IACA is "1.8"
    {ISD::SMIN, MVT::v8i16, 1}, // The data reported by the IACA is "1.5"
    {ISD::UMIN, MVT::v8i16, 2}, // The data reported by the IACA is "1.8"
2019-05-12 01:12:52 +08:00
    {ISD::SMIN, MVT::v16i8, 3},
    {ISD::UMIN, MVT::v16i8, 3},
  };

  static const CostTblEntry SSE42CostTblNoPairWise[] = {
    {ISD::SMIN, MVT::v2i64, 7}, // The data reported by the IACA is "6.8"
    {ISD::UMIN, MVT::v2i64, 9}, // The data reported by the IACA is "8.6"
2017-09-08 21:49:36 +08:00
  };

  static const CostTblEntry AVX1CostTblNoPairWise[] = {
    {ISD::FMINNUM, MVT::v4f32, 1},
    {ISD::FMINNUM, MVT::v4f64, 1},
    {ISD::FMINNUM, MVT::v8f32, 1},
    {ISD::SMIN, MVT::v2i64, 3},
    {ISD::UMIN, MVT::v2i64, 3},
    {ISD::SMIN, MVT::v4i32, 1},
    {ISD::UMIN, MVT::v4i32, 1},
    {ISD::SMIN, MVT::v8i16, 1},
    {ISD::UMIN, MVT::v8i16, 1},
2019-05-12 01:12:52 +08:00
    {ISD::SMIN, MVT::v16i8, 2},
    {ISD::UMIN, MVT::v16i8, 2},
    {ISD::SMIN, MVT::v4i64, 7},
    {ISD::UMIN, MVT::v4i64, 7},
2017-09-08 21:49:36 +08:00
    {ISD::SMIN, MVT::v8i32, 2},
    {ISD::UMIN, MVT::v8i32, 2},
2019-05-12 01:12:52 +08:00
    {ISD::SMIN, MVT::v16i16, 2},
    {ISD::UMIN, MVT::v16i16, 2},
    {ISD::SMIN, MVT::v32i8, 2},
    {ISD::UMIN, MVT::v32i8, 2},
2017-09-08 21:49:36 +08:00
  };

  static const CostTblEntry AVX2CostTblNoPairWise[] = {
    {ISD::SMIN, MVT::v4i64, 1},
    {ISD::UMIN, MVT::v4i64, 1},
    {ISD::SMIN, MVT::v8i32, 1},
    {ISD::UMIN, MVT::v8i32, 1},
    {ISD::SMIN, MVT::v16i16, 1},
    {ISD::UMIN, MVT::v16i16, 1},
    {ISD::SMIN, MVT::v32i8, 1},
    {ISD::UMIN, MVT::v32i8, 1},
  };

  static const CostTblEntry AVX512CostTblNoPairWise[] = {
    {ISD::FMINNUM, MVT::v8f64, 1},
    {ISD::FMINNUM, MVT::v16f32, 2},
    {ISD::SMIN, MVT::v8i64, 1},
    {ISD::UMIN, MVT::v8i64, 1},
    {ISD::SMIN, MVT::v16i32, 1},
    {ISD::UMIN, MVT::v16i32, 1},
  };

  if (IsPairwise) {
    if (ST->hasAVX512())
      if (const auto *Entry = CostTableLookup(AVX512CostTblPairWise, ISD, MTy))
        return LT.first * Entry->Cost;

    if (ST->hasAVX2())
      if (const auto *Entry = CostTableLookup(AVX2CostTblPairWise, ISD, MTy))
        return LT.first * Entry->Cost;

    if (ST->hasAVX())
      if (const auto *Entry = CostTableLookup(AVX1CostTblPairWise, ISD, MTy))
        return LT.first * Entry->Cost;

    if (ST->hasSSE42())
      if (const auto *Entry = CostTableLookup(SSE42CostTblPairWise, ISD, MTy))
        return LT.first * Entry->Cost;
2019-05-12 01:12:52 +08:00

    if (ST->hasSSE41())
      if (const auto *Entry = CostTableLookup(SSE41CostTblPairWise, ISD, MTy))
        return LT.first * Entry->Cost;

    if (ST->hasSSE2())
      if (const auto *Entry = CostTableLookup(SSE2CostTblPairWise, ISD, MTy))
        return LT.first * Entry->Cost;

    if (ST->hasSSE1())
      if (const auto *Entry = CostTableLookup(SSE1CostTblPairWise, ISD, MTy))
        return LT.first * Entry->Cost;
2017-09-08 21:49:36 +08:00
  } else {
    if (ST->hasAVX512())
      if (const auto *Entry =
              CostTableLookup(AVX512CostTblNoPairWise, ISD, MTy))
        return LT.first * Entry->Cost;

    if (ST->hasAVX2())
      if (const auto *Entry = CostTableLookup(AVX2CostTblNoPairWise, ISD, MTy))
        return LT.first * Entry->Cost;

    if (ST->hasAVX())
      if (const auto *Entry = CostTableLookup(AVX1CostTblNoPairWise, ISD, MTy))
        return LT.first * Entry->Cost;

    if (ST->hasSSE42())
      if (const auto *Entry = CostTableLookup(SSE42CostTblNoPairWise, ISD, MTy))
        return LT.first * Entry->Cost;
2019-05-12 01:12:52 +08:00

    if (ST->hasSSE41())
      if (const auto *Entry = CostTableLookup(SSE41CostTblNoPairWise, ISD, MTy))
        return LT.first * Entry->Cost;

    if (ST->hasSSE2())
      if (const auto *Entry = CostTableLookup(SSE2CostTblNoPairWise, ISD, MTy))
        return LT.first * Entry->Cost;

    if (ST->hasSSE1())
      if (const auto *Entry = CostTableLookup(SSE1CostTblNoPairWise, ISD, MTy))
        return LT.first * Entry->Cost;
2017-09-08 21:49:36 +08:00
  }

  return BaseT::getMinMaxReductionCost(ValTy, CondTy, IsPairwise, IsUnsigned);
}
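`getMinMaxReductionCost` works in two stages:

1. Map the value type to an opcode: SMIN or UMIN for integer vectors, FMINNUM for floating point.
2. Walk the subtarget tables from AVX-512 down to SSE1.

The sketch below restates the opcode choice and only the non-pairwise `v4i32` rows of the SSE4.1 and SSE2 tables. It makes these assumptions:

- There is no AVX; on an AVX target the AVX1 table would answer first.
- The type is already legal, so the `LT.first` factor is 1.
- The SSE4.2 table has no `v4i32` rows, so an SSE4.2 target reaches the SSE4.1 row.

`selectMinMaxOp` and `v4i32MinCost` are hypothetical names.

```cpp
#include <cassert>
#include <optional>

enum class MinMaxOp { SMIN, UMIN, FMINNUM };

// Opcode choice as in getMinMaxReductionCost: integer vectors use signed or
// unsigned min; anything else must be floating point.
MinMaxOp selectMinMaxOp(bool IsIntVector, bool IsFPVector, bool IsUnsigned) {
  if (IsIntVector)
    return IsUnsigned ? MinMaxOp::UMIN : MinMaxOp::SMIN;
  assert(IsFPVector && "Expected floating-point or integer vector type.");
  (void)IsFPVector; // avoid an unused-parameter warning under NDEBUG
  return MinMaxOp::FMINNUM;
}

// Hypothetical lookup covering only the non-pairwise v4i32 rows of the
// SSE4.1 table (SMIN 1, UMIN 2) and the SSE2 table (SMIN 6, UMIN 8).
// std::nullopt means "no row", i.e. defer to the generic base cost.
std::optional<int> v4i32MinCost(MinMaxOp Op, bool HasSSE41, bool HasSSE2) {
  if (Op == MinMaxOp::FMINNUM)
    return std::nullopt; // FMINNUM has no v4i32 row
  const bool Unsigned = Op == MinMaxOp::UMIN;
  if (HasSSE41)
    return Unsigned ? 2 : 1;
  if (HasSSE2)
    return Unsigned ? 8 : 6;
  return std::nullopt;
}
```

The drop from 6 to 1 for a signed `v4i32` reduction plausibly reflects that SSE4.1 added packed 32-bit min instructions (`pminsd`/`pminud`), which SSE2 lacks.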
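The `[PM]` commit (D7293) that annotates `getIntImmCost` replaces the TTI analysis group with a single value-semantic object. That object type-erases either a target-specific implementation or a default one. The sketch below is a rough, minimal illustration of the general C++ type-erasure pattern behind that design: an internal abstract "concept" plus a templated "model".

This is not LLVM's `TargetTransformInfo` code. `CostInterface`, `DefaultImpl` and `X86LikeImpl` are hypothetical names. Charging 2 for immediates wider than 32 bits is an assumption made for the sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <utility>

// Hypothetical type-erased cost interface. Clients hold one value-semantic
// object; any implementation type with a matching member function can be
// wrapped without deriving from a common base class.
class CostInterface {
  // Internal abstract interface: the only place virtual dispatch appears.
  struct Concept {
    virtual ~Concept() = default;
    virtual int getIntImmCost(std::int64_t Val) const = 0;
  };

  // Adapts an arbitrary implementation type to Concept.
  template <typename ImplT> struct Model final : Concept {
    ImplT Impl;
    explicit Model(ImplT I) : Impl(std::move(I)) {}
    int getIntImmCost(std::int64_t Val) const override {
      return Impl.getIntImmCost(Val);
    }
  };

  std::unique_ptr<const Concept> Ptr;

public:
  template <typename ImplT>
  explicit CostInterface(ImplT Impl)
      : Ptr(std::make_unique<Model<ImplT>>(std::move(Impl))) {}

  int getIntImmCost(std::int64_t Val) const {
    assert(Ptr && "using a moved-from CostInterface");
    return Ptr->getIntImmCost(Val);
  }
};

// Target-independent fallback: every immediate costs one basic unit.
struct DefaultImpl {
  int getIntImmCost(std::int64_t) const { return 1; }
};

// Mirrors the zero / 32-bit split of X86TTIImpl::getIntImmCost
// (free, then basic); the cost of 2 for wider values is assumed.
struct X86LikeImpl {
  int getIntImmCost(std::int64_t Val) const {
    if (Val == 0)
      return 0;
    if (Val >= INT32_MIN && Val <= INT32_MAX)
      return 1;
    return 2;
  }
};
```

A client can then write `CostInterface Target{X86LikeImpl{}};` and call `Target.getIntImmCost(...)` without knowing which implementation it holds. This is the kind of non-pass object with value semantics the commit message describes.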
2018-05-01 23:54:18 +08:00
/// Calculate the cost of materializing a 64-bit value. This helper
2014-06-10 08:32:29 +08:00
/// method might only calculate a fraction of a larger immediate. Therefore it
/// is valid to return a cost of ZERO.
2015-08-06 02:08:10 +08:00
int X86TTIImpl::getIntImmCost(int64_t Val) {
2014-06-10 08:32:29 +08:00
  if (Val == 0)
2015-01-31 11:43:40 +08:00
    return TTI::TCC_Free;
2014-06-10 08:32:29 +08:00

  if (isInt<32>(Val))
2015-01-31 11:43:40 +08:00
    return TTI::TCC_Basic;
2014-06-10 08:32:29 +08:00

2015-01-31 11:43:40 +08:00
|
|
|
return 2 * TTI::TCC_Basic;
|
2014-06-10 08:32:29 +08:00
|
|
|
}
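The commit message in this blame describes the new TTI as a non-pass object with value semantics and an internal type erasure mechanism. Below is a minimal, generic C++14 sketch of that idiom: an internal `Concept` interface and a templated `Model` wrapper, held behind a copyable value type. It is not LLVM's actual `TargetTransformInfo` code. `CostModel`, `DefaultCosts`, `X86LikeCosts` and the single `immCost` query are hypothetical names chosen only for illustration.

```cpp
#include <cstdint>
#include <memory>
#include <type_traits>
#include <utility>

// Hypothetical value-semantic, type-erased cost model with one query.
class CostModel {
  // Internal interface every wrapped implementation is adapted to.
  struct Concept {
    virtual ~Concept() = default;
    virtual int immCost(long long v) const = 0;
    virtual std::unique_ptr<Concept> clone() const = 0;
  };
  // Adapter that forwards each query to a concrete implementation T.
  template <typename T> struct Model final : Concept {
    T impl;
    explicit Model(T i) : impl(std::move(i)) {}
    int immCost(long long v) const override { return impl.immCost(v); }
    std::unique_ptr<Concept> clone() const override {
      return std::make_unique<Model>(impl);
    }
  };
  std::unique_ptr<Concept> self;

public:
  // Wrap any type with a compatible immCost member (excluding CostModel).
  template <typename T, typename = std::enable_if_t<
                            !std::is_same<std::decay_t<T>, CostModel>::value>>
  CostModel(T impl) : self(std::make_unique<Model<T>>(std::move(impl))) {}
  CostModel(const CostModel &o) : self(o.self->clone()) {} // deep copy
  CostModel(CostModel &&) noexcept = default;
  CostModel &operator=(CostModel o) { self = std::move(o.self); return *this; }

  int immCost(long long v) const { return self->immCost(v); }
};

// Hypothetical target-independent fallback: every immediate costs one unit.
struct DefaultCosts {
  int immCost(long long) const { return 1; }
};

// Hypothetical x86-flavoured rule: 0 is free, sign-extended 32-bit values
// cost one unit, anything wider costs two.
struct X86LikeCosts {
  int immCost(long long v) const {
    if (v == 0)
      return 0;
    return (v >= INT32_MIN && v <= INT32_MAX) ? 1 : 2;
  }
};
```

Any type with a matching `immCost` member can be stored in a `CostModel`, and copies are deep, so callers can pass the object around by value without knowing which implementation it wraps. The cost is the forwarding boilerplate through `Concept` and `Model`, which grows with every query added to the interface.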
|
|
|
|
|
2015-08-06 02:08:10 +08:00
|
|
|
int X86TTIImpl::getIntImmCost(const APInt &Imm, Type *Ty) {
|
2014-01-25 10:02:55 +08:00
|
|
|
assert(Ty->isIntegerTy());
|
|
|
|
|
|
|
|
unsigned BitSize = Ty->getPrimitiveSizeInBits();
|
|
|
|
if (BitSize == 0)
|
|
|
|
return ~0U;
|
|
|
|
|
2014-05-20 05:00:53 +08:00
|
|
|
// Never hoist constants larger than 128bit, because this might lead to
|
|
|
|
// incorrect code generation or assertions in codegen.
|
|
|
|
// Fixme: Create a cost model for types larger than i128 once the codegen
|
|
|
|
// issues have been fixed.
|
|
|
|
if (BitSize > 128)
|
2015-01-31 11:43:40 +08:00
|
|
|
return TTI::TCC_Free;
|
2014-05-20 05:00:53 +08:00
|
|
|
|
2014-03-21 14:04:45 +08:00
|
|
|
if (Imm == 0)
|
2015-01-31 11:43:40 +08:00
|
|
|
return TTI::TCC_Free;
|
2014-03-21 14:04:45 +08:00
|
|
|
|
2014-06-10 08:32:29 +08:00
|
|
|
// Sign-extend all constants to a multiple of 64-bit.
|
|
|
|
APInt ImmVal = Imm;
|
2018-07-29 02:21:45 +08:00
|
|
|
if (BitSize % 64 != 0)
|
|
|
|
ImmVal = Imm.sext(alignTo(BitSize, 64));
|
2014-06-10 08:32:29 +08:00
|
|
|
|
|
|
|
// Split the constant into 64-bit chunks and calculate the cost for each
|
|
|
|
// chunk.
|
2015-08-06 02:08:10 +08:00
|
|
|
int Cost = 0;
|
2014-06-10 08:32:29 +08:00
|
|
|
for (unsigned ShiftVal = 0; ShiftVal < BitSize; ShiftVal += 64) {
|
|
|
|
APInt Tmp = ImmVal.ashr(ShiftVal).sextOrTrunc(64);
|
|
|
|
int64_t Val = Tmp.getSExtValue();
|
|
|
|
Cost += getIntImmCost(Val);
|
|
|
|
}
|
2016-04-06 03:27:39 +08:00
|
|
|
// We need at least one instruction to materialize the constant.
|
2015-08-06 02:08:10 +08:00
|
|
|
return std::max(1, Cost);
|
2014-01-25 10:02:55 +08:00
|
|
|
}
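The `APInt` overload of `getIntImmCost` above refuses to cost constants wider than 128 bits, sign-extends the immediate to a multiple of 64 bits, costs each 64-bit chunk separately, and charges at least one instruction. The C++11 sketch below models that loop on plain little-endian `uint64_t` words instead of `APInt`. The per-chunk rule (zero is free, a value that fits a sign-extended 32-bit immediate costs `TCC_Basic`, anything else costs twice that) is an assumption about the `int64_t` overload, whose return statements appear earlier in this blame but whose conditions do not. `immCost`, `chunkCost`, `kFree` and `kBasic` are hypothetical names, and zero-width constants are not modelled.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for TTI::TCC_Free and TTI::TCC_Basic.
constexpr int kFree = 0;
constexpr int kBasic = 1;

// Assumed per-chunk rule: zero is free, a sign-extended 32-bit value costs
// one instruction, anything else costs two.
int chunkCost(int64_t v) {
  if (v == 0)
    return kFree;
  if (v >= INT32_MIN && v <= INT32_MAX)
    return kBasic;
  return 2 * kBasic;
}

// words: the constant's low `bitSize` bits as little-endian 64-bit words.
int immCost(const std::vector<uint64_t> &words, unsigned bitSize) {
  assert(bitSize > 0 && "zero-width constants are not modelled here");
  if (bitSize > 128)
    return kFree; // never hoist constants wider than i128
  const unsigned numChunks = (bitSize + 63) / 64;
  assert(words.size() >= numChunks);
  std::vector<uint64_t> chunks(words.begin(), words.begin() + numChunks);

  // Sign-extend the (possibly partial) top chunk from bit (bitSize - 1).
  const unsigned topBits = bitSize % 64;
  if (topBits != 0) {
    const uint64_t mask = (uint64_t{1} << topBits) - 1;
    uint64_t top = chunks.back() & mask;
    if ((top >> (topBits - 1)) & 1)
      top |= ~mask;
    chunks.back() = top;
  }

  // Sum the per-chunk costs; at least one instruction is always needed.
  int cost = 0;
  for (uint64_t c : chunks)
    cost += chunkCost(static_cast<int64_t>(c));
  return std::max(1, cost);
}
```

For example, a 96-bit constant whose upper 32 bits are all ones splits into a zero low chunk (free) and a high chunk that sign-extends to -1 (one instruction), so the sketch returns 1.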
|
|
|
|
|
2019-12-12 03:54:58 +08:00
|
|
|
int X86TTIImpl::getIntImmCostInst(unsigned Opcode, unsigned Idx, const APInt &Imm,
|
2015-08-06 02:08:10 +08:00
|
|
|
Type *Ty) {
|
2014-01-25 10:02:55 +08:00
|
|
|
assert(Ty->isIntegerTy());
|
|
|
|
|
|
|
|
unsigned BitSize = Ty->getPrimitiveSizeInBits();
|
2014-05-20 05:00:53 +08:00
|
|
|
// There is no cost model for constants with a bit size of 0. Return TCC_Free
|
|
|
|
// here, so that constant hoisting will ignore this constant.
|
2014-01-25 10:02:55 +08:00
|
|
|
if (BitSize == 0)
|
2015-01-31 11:43:40 +08:00
|
|
|
return TTI::TCC_Free;
|
2014-01-25 10:02:55 +08:00
|
|
|
|
2014-03-21 14:04:45 +08:00
|
|
|
unsigned ImmIdx = ~0U;
|
2014-01-25 10:02:55 +08:00
|
|
|
switch (Opcode) {
|
2015-01-31 11:43:40 +08:00
|
|
|
default:
|
|
|
|
return TTI::TCC_Free;
|
2014-03-21 14:04:45 +08:00
|
|
|
case Instruction::GetElementPtr:
|
2014-04-03 05:45:36 +08:00
|
|
|
// Always hoist the base address of a GetElementPtr. This prevents the
|
|
|
|
// creation of new constants for every base constant that gets constant
|
|
|
|
// folded with the offset.
|
2014-03-26 02:01:25 +08:00
|
|
|
if (Idx == 0)
|
2015-01-31 11:43:40 +08:00
|
|
|
return 2 * TTI::TCC_Basic;
|
|
|
|
return TTI::TCC_Free;
|
2014-03-21 14:04:45 +08:00
|
|
|
case Instruction::Store:
|
|
|
|
ImmIdx = 0;
|
|
|
|
break;
|
2015-12-21 02:41:54 +08:00
|
|
|
case Instruction::ICmp:
|
|
|
|
// This is an imperfect hack to prevent constant hoisting of
|
|
|
|
// compares that might be trying to check if a 64-bit value fits in
|
|
|
|
// 32-bits. The backend can optimize these cases using a right shift by 32.
|
|
|
|
// Ideally we would check the compare predicate here. There are also other
|
|
|
|
// similar immediates the backend can use shifts for.
|
|
|
|
if (Idx == 1 && Imm.getBitWidth() == 64) {
|
|
|
|
uint64_t ImmVal = Imm.getZExtValue();
|
|
|
|
if (ImmVal == 0x100000000ULL || ImmVal == 0xffffffff)
|
|
|
|
return TTI::TCC_Free;
|
|
|
|
}
|
|
|
|
ImmIdx = 1;
|
|
|
|
break;
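The `ICmp` special case exists because x86-64 compare instructions only accept a 32-bit immediate that is sign-extended to 64 bits, so neither `0x100000000` nor `0xffffffff` can be encoded directly in a 64-bit compare. For unsigned comparisons, either check can instead be answered by shifting right by 32 and testing for zero, which needs no wide immediate. The sketch below uses hypothetical helper names and illustrates only the arithmetic identity, not the backend's actual lowering. Because the predicate is not inspected, the cost hook treats these constants as free even for compares where the identity does not apply, which is why the comment calls it an imperfect hack.

```cpp
#include <cstdint>

// "Does x fit in 32 bits?" written as a compare against 2^32. On x86-64 the
// constant 0x100000000 has no imm32 encoding for CMP, so it would have to be
// materialized in a register first.
bool fitsViaCompare(uint64_t x) { return x < 0x100000000ULL; }

// The same question answered with a right shift by 32 and a zero test.
bool fitsViaShift(uint64_t x) { return (x >> 32) == 0; }

// The complementary check against 0xffffffff, which is also not encodable:
// as a sign-extended imm32, 0xffffffff would be read as -1.
bool exceedsViaCompare(uint64_t x) { return x > 0xffffffffULL; }
bool exceedsViaShift(uint64_t x) { return (x >> 32) != 0; }
```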
|
2015-10-06 10:50:24 +08:00
|
|
|
case Instruction::And:
|
|
|
|
// We support 64-bit ANDs with immediates with 32-bits of leading zeroes
|
|
|
|
// by using a 32-bit operation with implicit zero extension. Detect such
|
|
|
|
// immediates here as the normal path expects bit 31 to be sign extended.
|
|
|
|
if (Idx == 1 && Imm.getBitWidth() == 64 && isUInt<32>(Imm.getZExtValue()))
|
|
|
|
return TTI::TCC_Free;
|
2018-07-31 01:29:57 +08:00
|
|
|
ImmIdx = 1;
|
|
|
|
break;
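The `And` case treats a 64-bit mask with 32 leading zero bits as free. On x86-64, `and r64, imm32` sign-extends its immediate, so a mask such as `0x80000000` would turn into `0xFFFFFFFF80000000`. A 32-bit `and`, by contrast, writes a 32-bit result that is implicitly zero-extended into the full 64-bit register, which yields the intended mask. The sketch below models the two encodings with ordinary integer casts; the function names are hypothetical.

```cpp
#include <cstdint>

// Models `and r64, imm32`: the 32-bit immediate is sign-extended to 64 bits.
uint64_t and64SignExtImm(uint64_t x, uint32_t imm) {
  int64_t ext = static_cast<int32_t>(imm); // e.g. 0x80000000 -> -2^31
  return x & static_cast<uint64_t>(ext);
}

// Models `and r32, imm32` on x86-64: a 32-bit AND whose result is implicitly
// zero-extended, so the upper 32 bits of the destination become zero.
uint64_t and32ZeroExt(uint64_t x, uint32_t imm) {
  return static_cast<uint64_t>(static_cast<uint32_t>(x) & imm);
}
```

For any mask with 32 leading zeros, `and32ZeroExt` matches a true 64-bit `x & mask`, while `and64SignExtImm` diverges as soon as bit 31 of the mask is set, which is the case the comment says the normal path mishandles.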
|
2014-01-25 10:02:55 +08:00
|
|
|
case Instruction::Add:
|
|
|
|
case Instruction::Sub:
|
2018-07-31 01:29:57 +08:00
|
|
|
// For add/sub, we can use the opposite instruction for INT32_MIN.
|
|
|
|
if (Idx == 1 && Imm.getBitWidth() == 64 && Imm.getZExtValue() == 0x80000000)
|
|
|
|
return TTI::TCC_Free;
|
|
|
|
ImmIdx = 1;
|
|
|
|
break;
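For `Add` and `Sub`, the 64-bit immediate `0x80000000` (+2^31) does not fit a sign-extended 32-bit immediate, but its negation, INT32_MIN, does. Adding +2^31 is therefore the same as subtracting the encodable immediate `0x80000000` read as -2^31, and subtracting +2^31 is the same as adding it. A sketch of that wrap-around arithmetic, with hypothetical helper names:

```cpp
#include <cstdint>

// imm32 operands of 64-bit x86 ALU instructions are sign-extended.
int64_t signExtImm32(uint32_t imm) { return static_cast<int32_t>(imm); }

// x + 2^31 written directly: needs the 64-bit constant 0x80000000.
uint64_t addDirect(uint64_t x) { return x + 0x80000000ULL; }

// Same result as `sub r64, imm32` with imm32 = 0x80000000 (i.e. -2^31).
uint64_t subOpposite(uint64_t x) {
  return x - static_cast<uint64_t>(signExtImm32(0x80000000u));
}

// x - 2^31 written directly, and the equivalent `add r64, imm32` form.
uint64_t subDirect(uint64_t x) { return x - 0x80000000ULL; }
uint64_t addOpposite(uint64_t x) {
  return x + static_cast<uint64_t>(signExtImm32(0x80000000u));
}
```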
|
2014-01-25 10:02:55 +08:00
|
|
|
case Instruction::UDiv:
|
|
|
|
case Instruction::SDiv:
|
|
|
|
case Instruction::URem:
|
|
|
|
case Instruction::SRem:
|
2018-10-12 07:14:35 +08:00
|
|
|
// Division by constant is typically expanded later into a different
|
|
|
|
// instruction sequence. This completely changes the constants.
|
|
|
|
// Report them as "free" to stop ConstantHoist from marking them as opaque.
|
|
|
|
return TTI::TCC_Free;
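To see why division by a constant "completely changes the constants", consider a standard strength-reduction identity (an illustration, not code from this file): unsigned 32-bit division by 3 can be computed as a multiply by `0xAAAAAAAB` followed by a right shift of 33. The divisor 3 never appears in the emitted sequence, so hoisting it as an opaque constant would only block that rewrite. `divBy3ViaMultiply` is a hypothetical name.

```cpp
#include <cstdint>

// Unsigned x / 3 for every 32-bit x, without a divide instruction.
// 0xAAAAAAAB == (2^33 + 1) / 3, so the rounding error stays below 1/6.
uint32_t divBy3ViaMultiply(uint32_t x) {
  return static_cast<uint32_t>((static_cast<uint64_t>(x) * 0xAAAAAAABULL) >> 33);
}
```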
|
|
|
|
case Instruction::Mul:
|
2014-01-25 10:02:55 +08:00
|
|
|
case Instruction::Or:
|
|
|
|
case Instruction::Xor:
|
2014-03-21 14:04:45 +08:00
|
|
|
ImmIdx = 1;
|
|
|
|
break;
|
2014-05-01 03:17:32 +08:00
|
|
|
// Always return TCC_Free for the shift value of a shift instruction.
|
|
|
|
case Instruction::Shl:
|
|
|
|
case Instruction::LShr:
|
|
|
|
case Instruction::AShr:
|
|
|
|
if (Idx == 1)
|
[PM] Change the core design of the TTI analysis to use a polymorphic
type erased interface and a single analysis pass rather than an
extremely complex analysis group.
The end result is that the TTI analysis can contain a type erased
implementation that supports the polymorphic TTI interface. We can build
one from a target-specific implementation or from a dummy one in the IR.
I've also factored all of the code into "mix-in"-able base classes,
including CRTP base classes to facilitate calling back up to the most
specialized form when delegating horizontally across the surface. These
aren't as clean as I would like and I'm planning to work on cleaning
some of this up, but I wanted to start by putting into the right form.
There are a number of reasons for this change, and this particular
design. The first and foremost reason is that an analysis group is
complete overkill, and the chaining delegation strategy was so opaque,
confusing, and high overhead that TTI was suffering greatly for it.
Several of the TTI functions had failed to be implemented in all places
because of the chaining-based delegation making there be no checking of
this. A few other functions were implemented with incorrect delegation.
The message to me was very clear working on this -- the delegation and
analysis group structure was too confusing to be useful here.
The other reason of course is that this is *much* more natural fit for
the new pass manager. This will lay the ground work for a type-erased
per-function info object that can look up the correct subtarget and even
cache it.
Yet another benefit is that this will significantly simplify the
interaction of the pass managers and the TargetMachine. See the future
work below.
The downside of this change is that it is very, very verbose. I'm going
to work to improve that, but it is somewhat an implementation necessity
in C++ to do type erasure. =/ I discussed this design really extensively
with Eric and Hal prior to going down this path, and afterward showed
them the result. No one was really thrilled with it, but there doesn't
seem to be a substantially better alternative. Using a base class and
virtual method dispatch would make the code much shorter, but as
discussed in the update to the programmer's manual and elsewhere,
a polymorphic interface feels like the more principled approach even if
this is perhaps the least compelling example of it. ;]
Ultimately, there is still a lot more to be done here, but this was the
huge chunk that I couldn't really split things out of because this was
the interface change to TTI. I've tried to minimize all the other parts
of this. The follow up work should include at least:
1) Improving the TargetMachine interface by having it directly return
a TTI object. Because we have a non-pass object with value semantics
and an internal type erasure mechanism, we can narrow the interface
of the TargetMachine to *just* do what we need: build and return
a TTI object that we can then insert into the pass pipeline.
2) Make the TTI object be fully specialized for a particular function.
This will include splitting off a minimal form of it which is
sufficient for the inliner and the old pass manager.
3) Add a new pass manager analysis which produces TTI objects from the
target machine for each function. This may actually be done as part
of #2 in order to use the new analysis to implement #2.
4) Work on narrowing the API between TTI and the targets so that it is
easier to understand and less verbose to type erase.
5) Work on narrowing the API between TTI and its clients so that it is
easier to understand and less verbose to forward.
6) Try to improve the CRTP-based delegation. I feel like this code is
just a bit messy and exacerbates the complexity of implementing
the TTI in each target.
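Item 6 refers to CRTP-based delegation, where a base class calls back up to the most specialized implementation. A minimal sketch under assumed names (`CostImplBase`, `GenericImpl`, and `ExpensiveArithImpl` are hypothetical, as are the costs): a default hook written in terms of another hook picks up a derived class's redefinition through `static_cast`, with no virtual dispatch.

```cpp
#include <cassert>

// Hypothetical CRTP base: Derived is the most specialized implementation.
template <typename Derived> class CostImplBase {
  // Downcast to the most-derived type; valid because Derived inherits from us.
  const Derived &self() const { return static_cast<const Derived &>(*this); }

public:
  // Default per-element arithmetic cost (arbitrary value for the sketch).
  int getArithmeticCost() const { return 1; }

  // Implemented in terms of another hook. Calling through self() means a
  // Derived that redefines getArithmeticCost() is used here, even though
  // getVectorArithmeticCost() itself is only defined in the base.
  int getVectorArithmeticCost(unsigned NumElts) const {
    return static_cast<int>(NumElts) * self().getArithmeticCost();
  }
};

// Uses every default.
struct GenericImpl : CostImplBase<GenericImpl> {};

// Redefines only the scalar hook; the vector hook follows automatically.
struct ExpensiveArithImpl : CostImplBase<ExpensiveArithImpl> {
  int getArithmeticCost() const { return 3; }
};
```

Because these calls are resolved at compile time and the derived member merely hides the base one, a derived class that misspells a hook silently falls back to the base default; there is no `override` check to catch it.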
Many thanks to Eric and Hal for their help here. I ended up blocked on
this somewhat more abruptly than I expected, and so I appreciate getting
it sorted out very quickly.
Differential Revision: http://reviews.llvm.org/D7293
llvm-svn: 227669
2015-01-31 11:43:40 +08:00
return TTI::TCC_Free;
2014-05-01 03:17:32 +08:00
break;
2014-01-25 10:02:55 +08:00
case Instruction::Trunc:
case Instruction::ZExt:
case Instruction::SExt:
case Instruction::IntToPtr:
case Instruction::PtrToInt:
case Instruction::BitCast:
2014-03-21 14:04:45 +08:00
case Instruction::PHI:
2014-01-25 10:02:55 +08:00
case Instruction::Call:
case Instruction::Select:
case Instruction::Ret:
case Instruction::Load:
2014-03-21 14:04:45 +08:00
break;
2014-01-25 10:02:55 +08:00
}
2014-03-21 14:04:45 +08:00
2014-06-10 08:32:29 +08:00
if (Idx == ImmIdx) {
2018-07-29 02:21:45 +08:00
int NumConstants = divideCeil(BitSize, 64);
2015-08-06 02:08:10 +08:00
int Cost = X86TTIImpl::getIntImmCost(Imm, Ty);
2015-01-31 11:43:40 +08:00
return (Cost <= NumConstants * TTI::TCC_Basic)
2015-08-06 02:08:10 +08:00
? static_cast<int>(TTI::TCC_Free)
2015-01-31 11:43:40 +08:00
: Cost;
2014-06-10 08:32:29 +08:00
}
2014-03-21 14:04:45 +08:00
2015-01-31 11:43:40 +08:00
return X86TTIImpl::getIntImmCost(Imm, Ty);
2014-01-25 10:02:55 +08:00
}
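For the immediate operand (`Idx == ImmIdx`), the function above compares the materialization cost against one `TTI::TCC_Basic` per 64-bit chunk of the constant and reports anything within that budget as `TTI::TCC_Free`, so constant hoisting does not consider it. The standalone sketch below restates just that rule; `immCostAtOperand` and `divideCeilU` are hypothetical helpers, and the local enum mirrors the values of `TTI::TCC_Free` (0) and `TTI::TCC_Basic` (1) instead of including LLVM's headers.

```cpp
#include <cassert>

// Mirrors TTI::TCC_Free (0) and TTI::TCC_Basic (1); defined locally so the
// sketch compiles without LLVM headers.
enum CostKind : int { TCCFree = 0, TCCBasic = 1 };

// Ceiling division, assumed to behave like llvm::divideCeil for these inputs.
constexpr unsigned divideCeilU(unsigned Numerator, unsigned Denominator) {
  return (Numerator + Denominator - 1) / Denominator;
}

// Hypothetical helper: cost of an immediate at the operand index the caller
// treats as the immediate slot. A constant of BitSize bits needs
// divideCeilU(BitSize, 64) 64-bit pieces; if materializing it costs no more
// than one basic instruction per piece, it is reported as free, otherwise
// the full materialization cost is returned unchanged.
int immCostAtOperand(unsigned BitSize, int MaterializationCost) {
  int NumConstants = static_cast<int>(divideCeilU(BitSize, 64));
  return MaterializationCost <= NumConstants * TCCBasic
             ? static_cast<int>(TCCFree)
             : MaterializationCost;
}
```

For example, a 128-bit constant that takes two basic instructions to build stays within its budget of two and is treated as free, while a 64-bit constant with the same cost is not.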
2019-12-12 03:54:58 +08:00
int X86TTIImpl::getIntImmCostIntrin(Intrinsic::ID IID, unsigned Idx,
const APInt &Imm, Type *Ty) {
2014-01-25 10:02:55 +08:00
assert(Ty->isIntegerTy());
unsigned BitSize = Ty->getPrimitiveSizeInBits();
2014-05-20 05:00:53 +08:00
// There is no cost model for constants with a bit size of 0. Return TCC_Free
// here, so that constant hoisting will ignore this constant.
2014-01-25 10:02:55 +08:00
if (BitSize == 0)
2015-01-31 11:43:40 +08:00
return TTI::TCC_Free;
2014-01-25 10:02:55 +08:00
switch (IID) {
2015-01-31 11:43:40 +08:00
default:
return TTI::TCC_Free;
2014-01-25 10:02:55 +08:00
case Intrinsic::sadd_with_overflow:
case Intrinsic::uadd_with_overflow:
case Intrinsic::ssub_with_overflow:
case Intrinsic::usub_with_overflow:
case Intrinsic::smul_with_overflow:
case Intrinsic::umul_with_overflow:
2014-03-21 14:04:45 +08:00
if ((Idx == 1) && Imm.getBitWidth() <= 64 && isInt<32>(Imm.getSExtValue()))
2015-01-31 11:43:40 +08:00
return TTI::TCC_Free;
2014-03-26 02:01:23 +08:00
break;
2014-01-25 10:02:55 +08:00
case Intrinsic::experimental_stackmap:
2014-03-26 02:01:23 +08:00
if ((Idx < 2) || (Imm.getBitWidth() <= 64 && isInt<64>(Imm.getSExtValue())))
2015-01-31 11:43:40 +08:00
return TTI::TCC_Free;
2014-03-26 02:01:23 +08:00
break;
2014-01-25 10:02:55 +08:00
case Intrinsic::experimental_patchpoint_void:
case Intrinsic::experimental_patchpoint_i64:
2014-03-26 02:01:23 +08:00
if ((Idx < 4) || (Imm.getBitWidth() <= 64 && isInt<64>(Imm.getSExtValue())))
2015-01-31 11:43:40 +08:00
      return TTI::TCC_Free;
2014-03-26 02:01:23 +08:00
    break;
2014-01-25 10:02:55 +08:00
  }
2015-01-31 11:43:40 +08:00
  return X86TTIImpl::getIntImmCost(Imm, Ty);
2014-01-25 10:02:55 +08:00
}
2015-07-14 12:03:49 +08:00

2017-08-20 20:34:29 +08:00
unsigned X86TTIImpl::getUserCost(const User *U,
                                 ArrayRef<const Value *> Operands) {
  if (isa<StoreInst>(U)) {
    Value *Ptr = U->getOperand(1);
    // Store instruction with index and scale costs 2 Uops.
    // Check the preceding GEP to identify non-const indices.
    if (auto *GEP = dyn_cast<GetElementPtrInst>(Ptr)) {
      if (!all_of(GEP->indices(), [](Value *V) { return isa<Constant>(V); }))
        return TTI::TCC_Basic * 2;
    }
    return TTI::TCC_Basic;
  }
  return BaseT::getUserCost(U, Operands);
}
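The store branch of `getUserCost` charges double the basic cost when the address comes from a GEP with any non-constant index, because index-plus-scale addressing costs an extra uop. The sketch below is a standalone model of that rule and does not use LLVM types. `AddressGEP` and `storeCost` are hypothetical, and `TTI::TCC_Basic` is assumed to be 1 here.

```cpp
#include <algorithm>
#include <cassert>
#include <optional>
#include <vector>

// Assumed value of TTI::TCC_Basic for this model.
constexpr unsigned kBasicCost = 1;

// Hypothetical summary of the GEP feeding a store's pointer operand.
struct AddressGEP {
  std::vector<bool> IndexIsConstant; // one entry per GEP index operand
};

// Models the store case of X86TTIImpl::getUserCost: any variable GEP index
// means index+scale addressing, which is charged as 2 uops.
unsigned storeCost(const std::optional<AddressGEP> &Gep) {
  assert(kBasicCost > 0);
  if (Gep && !std::all_of(Gep->IndexIsConstant.begin(),
                          Gep->IndexIsConstant.end(),
                          [](bool IsConst) { return IsConst; }))
    return kBasicCost * 2;
  return kBasicCost;
}
```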

2015-12-29 04:10:59 +08:00
// Return an average cost of Gather / Scatter instruction, maybe improved later
int X86TTIImpl::getGSVectorCost(unsigned Opcode, Type *SrcVTy, Value *Ptr,
                                unsigned Alignment, unsigned AddressSpace) {

  assert(isa<VectorType>(SrcVTy) && "Unexpected type in getGSVectorCost");
  unsigned VF = SrcVTy->getVectorNumElements();

  // Try to reduce index size from 64 bit (default for GEP)
  // to 32. It is essential for VF 16. If the index can't be reduced to 32, the
  // operation will use 16 x 64 indices which do not fit in a zmm and need
  // to be split. Also check that the base pointer is the same for all lanes,
  // and that there's at most one variable index.
  auto getIndexSizeInBits = [](Value *Ptr, const DataLayout& DL) {
    unsigned IndexSize = DL.getPointerSizeInBits();
    GetElementPtrInst *GEP = dyn_cast<GetElementPtrInst>(Ptr);
    if (IndexSize < 64 || !GEP)
      return IndexSize;
2016-05-24 16:17:50 +08:00

2015-12-29 04:10:59 +08:00
    unsigned NumOfVarIndices = 0;
    Value *Ptrs = GEP->getPointerOperand();
    if (Ptrs->getType()->isVectorTy() && !getSplatValue(Ptrs))
      return IndexSize;
    for (unsigned i = 1; i < GEP->getNumOperands(); ++i) {
      if (isa<Constant>(GEP->getOperand(i)))
        continue;
      Type *IndxTy = GEP->getOperand(i)->getType();
      if (IndxTy->isVectorTy())
        IndxTy = IndxTy->getVectorElementType();
      if ((IndxTy->getPrimitiveSizeInBits() == 64 &&
          !isa<SExtInst>(GEP->getOperand(i))) ||
         ++NumOfVarIndices > 1)
        return IndexSize; // 64
    }
    return (unsigned)32;
  };

  // Trying to reduce IndexSize to 32 bits for vector 16.
  // By default the IndexSize is equal to pointer size.
2017-11-20 16:18:12 +08:00
  unsigned IndexSize = (ST->hasAVX512() && VF >= 16)
                           ? getIndexSizeInBits(Ptr, DL)
                           : DL.getPointerSizeInBits();
2015-12-29 04:10:59 +08:00

2016-04-14 12:36:40 +08:00
  Type *IndexVTy = VectorType::get(IntegerType::get(SrcVTy->getContext(),
2015-12-29 04:10:59 +08:00
                                                    IndexSize), VF);
  std::pair<int, MVT> IdxsLT = TLI->getTypeLegalizationCost(DL, IndexVTy);
  std::pair<int, MVT> SrcLT = TLI->getTypeLegalizationCost(DL, SrcVTy);
  int SplitFactor = std::max(IdxsLT.first, SrcLT.first);
  if (SplitFactor > 1) {
    // Handle splitting of vector of pointers
    Type *SplitSrcTy = VectorType::get(SrcVTy->getScalarType(), VF / SplitFactor);
    return SplitFactor * getGSVectorCost(Opcode, SplitSrcTy, Ptr, Alignment,
                                         AddressSpace);
  }

  // The gather / scatter cost is given by Intel architects. It is a rough
  // number since we are looking at one instruction at a time.
2017-11-20 16:18:12 +08:00
  const int GSOverhead = (Opcode == Instruction::Load)
                             ? ST->getGatherOverhead()
                             : ST->getScatterOverhead();
2015-12-29 04:10:59 +08:00
  return GSOverhead + VF * getMemoryOpCost(Opcode, SrcVTy->getScalarType(),
2019-10-22 23:16:52 +08:00
                                           MaybeAlign(Alignment), AddressSpace);
2015-12-29 04:10:59 +08:00
}
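`getGSVectorCost` first tries to narrow 64-bit indices to 32 bits for 16-lane AVX-512 operations. It then splits the operation if either the index vector or the data vector needs more than one legal register, and finally charges a fixed gather/scatter overhead plus one scalar memory access per lane. The model below reproduces that control flow with plain integers. It assumes 512-bit registers and approximates `getTypeLegalizationCost` by a register count. All field values in the hypothetical `GSModel` are illustrative and not taken from any real subtarget.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical parameters; in LLVM these come from the subtarget and from
// TLI->getTypeLegalizationCost, not from a struct like this.
struct GSModel {
  unsigned RegBits = 512;    // widest legal vector register (zmm)
  unsigned PointerBits = 64; // default GEP index width
  bool HasAVX512 = true;
  bool IndicesFit32 = true;  // outcome of the getIndexSizeInBits analysis
  int GSOverhead = 2;        // stand-in for getGatherOverhead()
  int ScalarMemCost = 1;     // stand-in for getMemoryOpCost on one element
};

// Number of legal registers needed for VF elements of ElemBits each.
static int numParts(unsigned VF, unsigned ElemBits, unsigned RegBits) {
  unsigned Bits = VF * ElemBits;
  return static_cast<int>(std::max(1u, (Bits + RegBits - 1) / RegBits));
}

// Models getGSVectorCost: narrow indices to 32 bits when possible, split
// vectors that need more than one register, then charge a fixed overhead
// plus one scalar memory access per lane.
int gsVectorCost(const GSModel &M, unsigned VF, unsigned ElemBits) {
  assert(VF > 0 && "expected a vector type");
  unsigned IndexBits =
      (M.HasAVX512 && VF >= 16 && M.IndicesFit32) ? 32 : M.PointerBits;
  int SplitFactor = std::max(numParts(VF, IndexBits, M.RegBits),
                             numParts(VF, ElemBits, M.RegBits));
  if (SplitFactor > 1)
    return SplitFactor *
           gsVectorCost(M, VF / static_cast<unsigned>(SplitFactor), ElemBits);
  return M.GSOverhead + static_cast<int>(VF) * M.ScalarMemCost;
}
```

With these numbers, a 16 x i32 gather whose indices narrow to 32 bits fits in one zmm and costs 2 + 16 = 18. If the indices stay 64-bit, the index vector needs two registers, so the model charges two 8-lane gathers (2 x 10 = 20). This is why the index-narrowing step matters for VF 16.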

/// Return the cost of full scalarization of gather / scatter operation.
///
/// Opcode - Load or Store instruction.
/// SrcVTy - The type of the data vector that should be gathered or scattered.
/// VariableMask - The mask is non-constant at compile time.
/// Alignment - Alignment for one element.
/// AddressSpace - pointer[s] address space.
///
int X86TTIImpl::getGSScalarCost(unsigned Opcode, Type *SrcVTy,
                                bool VariableMask, unsigned Alignment,
                                unsigned AddressSpace) {
  unsigned VF = SrcVTy->getVectorNumElements();

  int MaskUnpackCost = 0;
  if (VariableMask) {
    VectorType *MaskTy =
2016-04-14 12:36:40 +08:00
      VectorType::get(Type::getInt1Ty(SrcVTy->getContext()), VF);
2015-12-29 04:10:59 +08:00
    MaskUnpackCost = getScalarizationOverhead(MaskTy, false, true);
    int ScalarCompareCost =
2016-04-14 12:36:40 +08:00
      getCmpSelInstrCost(Instruction::ICmp, Type::getInt1Ty(SrcVTy->getContext()),
2015-12-29 04:10:59 +08:00
                         nullptr);
    int BranchCost = getCFInstrCost(Instruction::Br);
    MaskUnpackCost += VF * (BranchCost + ScalarCompareCost);
  }

  // The cost of the scalar loads/stores.
  int MemoryOpCost = VF * getMemoryOpCost(Opcode, SrcVTy->getScalarType(),
2019-10-22 23:16:52 +08:00
                                          MaybeAlign(Alignment), AddressSpace);
2015-12-29 04:10:59 +08:00

  int InsertExtractCost = 0;
  if (Opcode == Instruction::Load)
    for (unsigned i = 0; i < VF; ++i)
      // Add the cost of inserting each scalar load into the vector
      InsertExtractCost +=
        getVectorInstrCost(Instruction::InsertElement, SrcVTy, i);
  else
    for (unsigned i = 0; i < VF; ++i)
      // Add the cost of extracting each element out of the data vector
      InsertExtractCost +=
        getVectorInstrCost(Instruction::ExtractElement, SrcVTy, i);

  return MemoryOpCost + MaskUnpackCost + InsertExtractCost;
}
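`getGSScalarCost` adds three terms. The first is VF scalar memory operations. The second applies only for a variable mask: extracting each mask bit, plus a compare and a branch per lane. The third is one insertelement (gather) or extractelement (scatter) per lane. The sketch below restates that sum with per-operation costs supplied as parameters. It assumes every lane costs the same, although the real `getVectorInstrCost` and `getScalarizationOverhead` queries can vary by lane index. `ScalarCosts` and `gsScalarCost` are hypothetical names.

```cpp
#include <cassert>

// Hypothetical per-operation costs standing in for the TTI queries used above.
struct ScalarCosts {
  int MemOp = 1;           // getMemoryOpCost for one element
  int MaskExtract = 1;     // per-lane share of getScalarizationOverhead(MaskTy, false, true)
  int Compare = 1;         // getCmpSelInstrCost(ICmp, i1)
  int Branch = 1;          // getCFInstrCost(Br)
  int InsertOrExtract = 1; // getVectorInstrCost(Insert/ExtractElement, ...)
};

// Same three-term sum as X86TTIImpl::getGSScalarCost, with uniform lane costs.
int gsScalarCost(const ScalarCosts &C, unsigned VF, bool VariableMask) {
  assert(VF > 0 && "expected a vector type");
  int N = static_cast<int>(VF);
  int MaskUnpackCost = 0;
  if (VariableMask)
    MaskUnpackCost = N * C.MaskExtract + N * (C.Branch + C.Compare);
  int MemoryOpCost = N * C.MemOp;
  int InsertExtractCost = N * C.InsertOrExtract;
  return MemoryOpCost + MaskUnpackCost + InsertExtractCost;
}
```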

/// Calculate the cost of Gather / Scatter operation
int X86TTIImpl::getGatherScatterOpCost(unsigned Opcode, Type *SrcVTy,
                                       Value *Ptr, bool VariableMask,
                                       unsigned Alignment) {
  assert(SrcVTy->isVectorTy() && "Unexpected data type for Gather/Scatter");
  unsigned VF = SrcVTy->getVectorNumElements();
  PointerType *PtrTy = dyn_cast<PointerType>(Ptr->getType());
  if (!PtrTy && Ptr->getType()->isVectorTy())
    PtrTy = dyn_cast<PointerType>(Ptr->getType()->getVectorElementType());
  assert(PtrTy && "Unexpected type for Ptr argument");
  unsigned AddressSpace = PtrTy->getAddressSpace();

  bool Scalarize = false;
2019-12-18 16:42:53 +08:00
  if ((Opcode == Instruction::Load &&
       !isLegalMaskedGather(SrcVTy, MaybeAlign(Alignment))) ||
      (Opcode == Instruction::Store &&
       !isLegalMaskedScatter(SrcVTy, MaybeAlign(Alignment))))
2015-12-29 04:10:59 +08:00
    Scalarize = true;
  // Gather / Scatter for vector 2 is not profitable on KNL / SKX
  // Vector-4 of gather/scatter instruction does not exist on KNL.
  // We can extend it to 8 elements, but zeroing upper bits of
  // the mask vector will add more instructions. Right now we give the scalar
2017-01-02 18:37:52 +08:00
  // cost of vector-4 for KNL. TODO: Check, maybe the gather/scatter instruction
  // is better in the VariableMask case.
2017-11-20 16:18:12 +08:00
  if (ST->hasAVX512() && (VF == 2 || (VF == 4 && !ST->hasVLX())))
2015-12-29 04:10:59 +08:00
    Scalarize = true;

  if (Scalarize)
2017-01-02 18:37:52 +08:00
    return getGSScalarCost(Opcode, SrcVTy, VariableMask, Alignment,
                           AddressSpace);
2015-12-29 04:10:59 +08:00

  return getGSVectorCost(Opcode, SrcVTy, Ptr, Alignment, AddressSpace);
}
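The choice between the scalarized and the vector cost in `getGatherScatterOpCost` reduces to a small predicate. Scalarize if the masked gather/scatter is not legal at all. On AVX-512, also scalarize 2-lane operations and, without AVX512VL (as on KNL), 4-lane operations. The sketch below isolates that decision. `GatherFeatures` and `shouldScalarizeGS` are hypothetical, and `GatherLegal` stands for the result of `isLegalMaskedGather`/`isLegalMaskedScatter`.

```cpp
#include <cassert>

// Hypothetical flattened subtarget facts used by the decision.
struct GatherFeatures {
  bool HasAVX512 = false;
  bool HasVLX = false;
  bool GatherLegal = false; // isLegalMaskedGather / isLegalMaskedScatter result
};

// True when getGatherScatterOpCost falls back to getGSScalarCost.
bool shouldScalarizeGS(const GatherFeatures &F, unsigned VF) {
  assert(VF > 0 && "expected a vector type");
  if (!F.GatherLegal)
    return true;
  // 2-wide gathers are not profitable; 4-wide ones need AVX512VL.
  if (F.HasAVX512 && (VF == 2 || (VF == 4 && !F.HasVLX)))
    return true;
  return false;
}
```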

2017-08-08 03:56:34 +08:00
bool X86TTIImpl::isLSRCostLess(TargetTransformInfo::LSRCost &C1,
                               TargetTransformInfo::LSRCost &C2) {
  // X86-specific: the instruction count has first priority.
  return std::tie(C1.Insns, C1.NumRegs, C1.AddRecCost,
                  C1.NumIVMuls, C1.NumBaseAdds,
                  C1.ScaleCost, C1.ImmCost, C1.SetupCost) <
         std::tie(C2.Insns, C2.NumRegs, C2.AddRecCost,
                  C2.NumIVMuls, C2.NumBaseAdds,
                  C2.ScaleCost, C2.ImmCost, C2.SetupCost);
}
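`isLSRCostLess` relies on `std::tie` building tuples of references, whose `operator<` compares element by element. The first field therefore dominates, and later fields only break ties. The sketch below shows the effect on a reduced, hypothetical copy of `LSRCost` that keeps only four of the fields. A solution with fewer instructions wins even if it uses many more registers.

```cpp
#include <cassert>
#include <tuple>

// Reduced, hypothetical copy of TargetTransformInfo::LSRCost (fields subset).
struct LSRCost {
  unsigned Insns;
  unsigned NumRegs;
  unsigned AddRecCost;
  unsigned SetupCost;
};

// Lexicographic comparison: instruction count first, register count only
// breaks ties, and so on down the list.
bool isLSRCostLess(const LSRCost &C1, const LSRCost &C2) {
  return std::tie(C1.Insns, C1.NumRegs, C1.AddRecCost, C1.SetupCost) <
         std::tie(C2.Insns, C2.NumRegs, C2.AddRecCost, C2.SetupCost);
}
```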

2018-02-06 07:43:05 +08:00
bool X86TTIImpl::canMacroFuseCmp() {
2019-03-28 22:12:46 +08:00
  return ST->hasMacroFusion() || ST->hasBranchFusion();
2018-02-06 07:43:05 +08:00
}

2019-10-14 18:00:21 +08:00
bool X86TTIImpl::isLegalMaskedLoad(Type *DataTy, MaybeAlign Alignment) {
2019-03-08 15:33:43 +08:00
  if (!ST->hasAVX())
    return false;

2017-11-16 14:02:05 +08:00
  // The backend can't handle a single element vector.
  if (isa<VectorType>(DataTy) && DataTy->getVectorNumElements() == 1)
    return false;
2015-10-19 15:43:38 +08:00
  Type *ScalarTy = DataTy->getScalarType();
2015-07-14 12:03:49 +08:00

2019-03-08 15:33:43 +08:00
  if (ScalarTy->isPointerTy())
    return true;

  if (ScalarTy->isFloatTy() || ScalarTy->isDoubleTy())
    return true;

  if (!ScalarTy->isIntegerTy())
    return false;

  unsigned IntWidth = ScalarTy->getIntegerBitWidth();
  return IntWidth == 32 || IntWidth == 64 ||
         ((IntWidth == 8 || IntWidth == 16) && ST->hasBWI());
2015-07-14 12:03:49 +08:00
}
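The checks in `isLegalMaskedLoad` form a small decision table. AVX is required, and single-element vectors are rejected. Pointer, float and double elements are accepted. Integer elements must be 32 or 64 bits wide, or 8/16 bits when AVX512BW (`hasBWI`) is available. The sketch below encodes that table without LLVM types. `ElemKind`, `ElemType` and `isLegalMaskedLoadModel` are hypothetical, and `NumElts == 0` is used to mean a scalar (non-vector) query.

```cpp
#include <cassert>

enum class ElemKind { Pointer, Float, Double, Integer, Other };

struct ElemType {
  ElemKind Kind;
  unsigned IntBits = 0; // only meaningful for ElemKind::Integer
};

// Mirrors the element-type checks in X86TTIImpl::isLegalMaskedLoad.
// NumElts == 0 stands for a scalar (non-vector) query.
bool isLegalMaskedLoadModel(bool HasAVX, bool HasBWI, ElemType Ty,
                            unsigned NumElts) {
  if (!HasAVX)
    return false;
  if (NumElts == 1) // the backend can't handle single-element vectors
    return false;
  switch (Ty.Kind) {
  case ElemKind::Pointer:
  case ElemKind::Float:
  case ElemKind::Double:
    return true;
  case ElemKind::Integer:
    return Ty.IntBits == 32 || Ty.IntBits == 64 ||
           ((Ty.IntBits == 8 || Ty.IntBits == 16) && HasBWI);
  case ElemKind::Other:
    break;
  }
  return false;
}
```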
2014-12-04 17:40:44 +08:00

2019-10-14 18:00:21 +08:00
bool X86TTIImpl::isLegalMaskedStore(Type *DataType, MaybeAlign Alignment) {
  return isLegalMaskedLoad(DataType, Alignment);
2014-12-04 17:40:44 +08:00
}

2019-09-27 20:54:21 +08:00
bool X86TTIImpl::isLegalNTLoad(Type *DataType, Align Alignment) {
2019-06-18 01:20:08 +08:00
  unsigned DataSize = DL.getTypeStoreSize(DataType);
  // The only supported nontemporal loads are for aligned vectors of 16 or 32
  // bytes. Note that 32-byte nontemporal vector loads are supported by AVX2
  // (the equivalent stores only require AVX).
  if (Alignment >= DataSize && (DataSize == 16 || DataSize == 32))
    return DataSize == 16 ? ST->hasSSE1() : ST->hasAVX2();

  return false;
}

2019-09-27 20:54:21 +08:00
bool X86TTIImpl::isLegalNTStore(Type *DataType, Align Alignment) {
2019-06-18 01:20:08 +08:00
  unsigned DataSize = DL.getTypeStoreSize(DataType);

  // SSE4A supports nontemporal stores of float and double at arbitrary
  // alignment.
  if (ST->hasSSE4A() && (DataType->isFloatTy() || DataType->isDoubleTy()))
    return true;

  // Besides the SSE4A subtarget exception above, only aligned stores are
  // available nontemporally on any other subtarget. And only stores with a size
  // of 4..32 bytes (powers of 2, only) are permitted.
  if (Alignment < DataSize || DataSize < 4 || DataSize > 32 ||
      !isPowerOf2_32(DataSize))
    return false;

  // 32-byte vector nontemporal stores are supported by AVX (the equivalent
  // loads require AVX2).
  if (DataSize == 32)
    return ST->hasAVX();
  else if (DataSize == 16)
    return ST->hasSSE1();
  return true;
}
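The nontemporal rules above are asymmetric. Loads must be aligned 16- or 32-byte vectors, and the 32-byte case needs AVX2. Stores accept any aligned power-of-two size from 4 to 32 bytes, where 32 bytes needs only AVX, and SSE4A additionally allows unaligned scalar float/double stores. The sketch below restates both predicates over plain integers so the asymmetry is easy to check. `NTFeatures`, `ntLoadLegal` and `ntStoreLegal` are hypothetical, and sizes and alignments are in bytes.

```cpp
#include <cassert>

// Hypothetical flattened subtarget features.
struct NTFeatures {
  bool SSE1 = false, SSE4A = false, AVX = false, AVX2 = false;
};

static bool isPow2(unsigned X) { return X != 0 && (X & (X - 1)) == 0; }

// Load rule from isLegalNTLoad: only aligned 16- or 32-byte vectors.
bool ntLoadLegal(const NTFeatures &F, unsigned Size, unsigned Align) {
  if (Align >= Size && (Size == 16 || Size == 32))
    return Size == 16 ? F.SSE1 : F.AVX2;
  return false;
}

// Store rule from isLegalNTStore. IsScalarFP models float/double data.
bool ntStoreLegal(const NTFeatures &F, unsigned Size, unsigned Align,
                  bool IsScalarFP) {
  if (F.SSE4A && IsScalarFP)
    return true; // SSE4A exception: any alignment
  if (Align < Size || Size < 4 || Size > 32 || !isPow2(Size))
    return false;
  if (Size == 32)
    return F.AVX;
  if (Size == 16)
    return F.SSE1;
  return true;
}
```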

2019-03-22 01:38:52 +08:00
bool X86TTIImpl::isLegalMaskedExpandLoad(Type *DataTy) {
  if (!isa<VectorType>(DataTy))
    return false;

  if (!ST->hasAVX512())
    return false;

  // The backend can't handle a single element vector.
  if (DataTy->getVectorNumElements() == 1)
    return false;

  Type *ScalarTy = DataTy->getVectorElementType();

  if (ScalarTy->isFloatTy() || ScalarTy->isDoubleTy())
    return true;

  if (!ScalarTy->isIntegerTy())
    return false;

  unsigned IntWidth = ScalarTy->getIntegerBitWidth();
  return IntWidth == 32 || IntWidth == 64 ||
         ((IntWidth == 8 || IntWidth == 16) && ST->hasVBMI2());
}

bool X86TTIImpl::isLegalMaskedCompressStore(Type *DataTy) {
  return isLegalMaskedExpandLoad(DataTy);
}

2019-12-18 16:42:53 +08:00
bool X86TTIImpl::isLegalMaskedGather(Type *DataTy, MaybeAlign Alignment) {
2019-03-08 15:33:43 +08:00
  // Some CPUs have better gather performance than others.
  // TODO: Remove the explicit ST->hasAVX512()? That would mean we would only
  // enable gather with a -march.
  if (!(ST->hasAVX512() || (ST->hasFastGather() && ST->hasAVX2())))
    return false;

2015-10-25 23:37:55 +08:00
  // This function is now called in two cases: from the Loop Vectorizer
  // and from the Scalarizer.
  // When the Loop Vectorizer asks about legality of the feature,
  // the vectorization factor is not calculated yet. The Loop Vectorizer
  // sends a scalar type and the decision is based on the width of the
  // scalar element.
  // Later on, the cost model will estimate the usage of this intrinsic based
  // on the vector type.
  // The Scalarizer asks again about legality. It sends a vector type.
  // In this case we can reject non-power-of-2 vectors.
2017-11-16 14:02:05 +08:00
  // We also reject single element vectors as the type legalizer can't
  // scalarize them.
  if (isa<VectorType>(DataTy)) {
    unsigned NumElts = DataTy->getVectorNumElements();
    if (NumElts == 1 || !isPowerOf2_32(NumElts))
      return false;
  }
2015-10-25 23:37:55 +08:00
  Type *ScalarTy = DataTy->getScalarType();
2019-03-08 15:33:43 +08:00
  if (ScalarTy->isPointerTy())
    return true;
2015-10-25 23:37:55 +08:00

2019-03-08 15:33:43 +08:00
  if (ScalarTy->isFloatTy() || ScalarTy->isDoubleTy())
    return true;

  if (!ScalarTy->isIntegerTy())
    return false;

  unsigned IntWidth = ScalarTy->getIntegerBitWidth();
  return IntWidth == 32 || IntWidth == 64;
2015-10-25 23:37:55 +08:00
}
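`isLegalMaskedGather` first gates on features: AVX-512, or AVX2 on a CPU marked as having fast gathers. When the query carries a vector type (the Scalarizer case), it also requires a power-of-two lane count greater than one. Finally it accepts pointer, float, double, or 32/64-bit integer elements. The sketch below follows the same order over flattened inputs. `GatherQuery` and `isLegalMaskedGatherModel` are hypothetical, and `NumElts == 0` models the Loop Vectorizer's scalar-type query.

```cpp
#include <cassert>

// Hypothetical flattened query. NumElts == 0 models the Loop Vectorizer's
// scalar-type query, where the vector width is not known yet.
struct GatherQuery {
  unsigned NumElts;
  bool IsFPOrPointer; // float, double or pointer elements
  unsigned IntBits;   // element width when the element is an integer
};

static bool isPow2(unsigned X) { return X != 0 && (X & (X - 1)) == 0; }

// Mirrors X86TTIImpl::isLegalMaskedGather.
bool isLegalMaskedGatherModel(bool HasAVX512, bool HasAVX2, bool HasFastGather,
                              const GatherQuery &Q) {
  if (!(HasAVX512 || (HasFastGather && HasAVX2)))
    return false;
  // Vector queries (from the Scalarizer) must have 2, 4, 8, ... lanes.
  if (Q.NumElts != 0 && (Q.NumElts == 1 || !isPow2(Q.NumElts)))
    return false;
  if (Q.IsFPOrPointer)
    return true;
  return Q.IntBits == 32 || Q.IntBits == 64;
}
```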

2019-12-18 16:42:53 +08:00
bool X86TTIImpl::isLegalMaskedScatter(Type *DataType, MaybeAlign Alignment) {
2017-11-20 16:18:12 +08:00
  // AVX2 doesn't support scatter
  if (!ST->hasAVX512())
    return false;
2019-12-18 16:42:53 +08:00
  return isLegalMaskedGather(DataType, Alignment);
2015-10-25 23:37:55 +08:00
}

2017-09-09 21:38:18 +08:00
bool X86TTIImpl::hasDivRemOp(Type *DataType, bool IsSigned) {
  EVT VT = TLI->getValueType(DL, DataType);
  return TLI->isOperationLegal(IsSigned ? ISD::SDIVREM : ISD::UDIVREM, VT);
}

2017-11-28 05:15:43 +08:00
bool X86TTIImpl::isFCmpOrdCheaperThanFCmpZero(Type *Ty) {
  return false;
}

2015-07-30 06:09:48 +08:00
bool X86TTIImpl::areInlineCompatible(const Function *Caller,
                                     const Function *Callee) const {
2015-07-02 09:11:50 +08:00
  const TargetMachine &TM = getTLI()->getTargetMachine();

  // Treat this as a subset check on subtarget features.
  const FeatureBitset &CallerBits =
      TM.getSubtargetImpl(*Caller)->getFeatureBits();
  const FeatureBitset &CalleeBits =
      TM.getSubtargetImpl(*Callee)->getFeatureBits();

2019-02-20 01:05:11 +08:00
  FeatureBitset RealCallerBits = CallerBits & ~InlineFeatureIgnoreList;
  FeatureBitset RealCalleeBits = CalleeBits & ~InlineFeatureIgnoreList;
  return (RealCallerBits & RealCalleeBits) == RealCalleeBits;
2015-07-02 09:11:50 +08:00
}
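`areInlineCompatible` treats inlining as legal when, after masking out the features on `InlineFeatureIgnoreList`, the callee's feature set is a subset of the caller's. In other words, `(Caller & Callee) == Callee`. The sketch below shows the same test with `std::bitset`. The eight-bit width and the bit patterns in the usage checks are hypothetical, and real subtarget feature bitsets are much wider.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>

constexpr std::size_t kNumFeatures = 8; // hypothetical feature count
using Features = std::bitset<kNumFeatures>;

// Callee may be inlined if, after masking out the ignored features, every
// feature it needs is also enabled in the caller (callee is a subset).
bool areInlineCompatibleModel(const Features &Caller, const Features &Callee,
                              const Features &IgnoreList) {
  Features RealCaller = Caller & ~IgnoreList;
  Features RealCallee = Callee & ~IgnoreList;
  return (RealCaller & RealCallee) == RealCallee;
}
```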
2016-10-21 05:04:31 +08:00

2019-02-20 04:12:20 +08:00
bool X86TTIImpl::areFunctionArgsABICompatible(
    const Function *Caller, const Function *Callee,
    SmallPtrSetImpl<Argument *> &Args) const {
  if (!BaseT::areFunctionArgsABICompatible(Caller, Callee, Args))
    return false;

  // If we get here, we know the target features match. If one function
  // considers 512-bit vectors legal and the other does not, consider them
  // incompatible.
  // FIXME Look at the arguments and only consider 512 bit or larger vectors?
  const TargetMachine &TM = getTLI()->getTargetMachine();

  return TM.getSubtarget<X86Subtarget>(*Caller).useAVX512Regs() ==
         TM.getSubtarget<X86Subtarget>(*Callee).useAVX512Regs();
}

2019-06-25 16:04:13 +08:00
X86TTIImpl::TTI::MemCmpExpansionOptions
X86TTIImpl::enableMemCmpExpansion(bool OptSize, bool IsZeroCmp) const {
  TTI::MemCmpExpansionOptions Options;
  Options.MaxNumLoads = TLI->getMaxExpandSizeMemcmp(OptSize);
  Options.NumLoadsPerBlock = 2;
  if (IsZeroCmp) {
    // Only enable vector loads for equality comparison. Right now the vector
    // version is not as fast for three way compare (see #33329).
2019-06-26 15:06:49 +08:00
    const unsigned PreferredWidth = ST->getPreferVectorWidth();
2019-10-04 15:42:34 +08:00
    if (PreferredWidth >= 512 && ST->hasAVX512()) Options.LoadSizes.push_back(64);
2019-10-31 18:30:53 +08:00
    if (PreferredWidth >= 256 && ST->hasAVX()) Options.LoadSizes.push_back(32);
2019-06-26 15:06:49 +08:00
    if (PreferredWidth >= 128 && ST->hasSSE2()) Options.LoadSizes.push_back(16);
2019-10-31 18:30:53 +08:00
    // All GPR and vector loads can be unaligned.
2018-12-20 21:01:04 +08:00
    Options.AllowOverlappingLoads = true;
2019-06-25 16:04:13 +08:00
  }
  if (ST->is64Bit()) {
    Options.LoadSizes.push_back(8);
  }
  Options.LoadSizes.push_back(4);
  Options.LoadSizes.push_back(2);
  Options.LoadSizes.push_back(1);
  return Options;
2017-06-20 23:58:30 +08:00
}
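`enableMemCmpExpansion` builds the list of load widths (widest first) that the memcmp expansion may use. Vector widths are added only for equality (zero) comparisons and only up to the preferred vector width. The 8-byte load requires 64-bit mode, and 4/2/1-byte loads are always allowed. The sketch below reproduces only the `LoadSizes` computation, not `MaxNumLoads` or `AllowOverlappingLoads`. `MemCmpFeatures` and `memcmpLoadSizes` are hypothetical names.

```cpp
#include <cassert>
#include <vector>

// Hypothetical flattened subtarget state.
struct MemCmpFeatures {
  bool Is64Bit = true;
  bool SSE2 = true, AVX = false, AVX512 = false;
  unsigned PreferVectorWidth = 256; // stand-in for getPreferVectorWidth()
};

// Load sizes in bytes, widest first, following enableMemCmpExpansion.
std::vector<unsigned> memcmpLoadSizes(const MemCmpFeatures &F, bool IsZeroCmp) {
  std::vector<unsigned> Sizes;
  if (IsZeroCmp) { // vector loads only for equality comparisons
    if (F.PreferVectorWidth >= 512 && F.AVX512) Sizes.push_back(64);
    if (F.PreferVectorWidth >= 256 && F.AVX) Sizes.push_back(32);
    if (F.PreferVectorWidth >= 128 && F.SSE2) Sizes.push_back(16);
  }
  if (F.Is64Bit)
    Sizes.push_back(8);
  for (unsigned S : {4u, 2u, 1u})
    Sizes.push_back(S);
  assert(!Sizes.empty());
  return Sizes;
}
```

Note that a 512-bit capable subtarget still gets no 64-byte loads unless the preferred vector width is at least 512, matching the `PreferredWidth` checks above.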

2016-10-21 05:04:31 +08:00
bool X86TTIImpl::enableInterleavedAccessVectorization() {
  // TODO: We expect this to be beneficial regardless of arch,
  // but there are currently some unexplained performance artifacts on Atom.
  // As a temporary solution, disable on Atom.
2017-01-25 17:14:48 +08:00
  return !(ST->isAtom());
2016-10-21 05:04:31 +08:00
}
2017-01-02 18:37:52 +08:00

2017-06-25 16:26:25 +08:00
// Get estimation for interleaved load/store operations for AVX2.
// \p Factor is the interleaved-access factor (stride) - number of
// (interleaved) elements in the group.
// \p Indices contains the indices for a strided load: when the
// interleaved load has gaps they indicate which elements are used.
// If Indices is empty (or if the number of indices is equal to the size
// of the interleaved-access as given in \p Factor) the access has no gaps.
//
// As opposed to AVX-512, AVX2 does not have generic shuffles that allow
// computing the cost using a generic formula as a function of generic
// shuffles. We therefore use a lookup table instead, filled according to
// the instruction sequences that codegen currently generates.
int X86TTIImpl::getInterleavedMemoryOpCostAVX2(unsigned Opcode, Type *VecTy,
                                               unsigned Factor,
                                               ArrayRef<unsigned> Indices,
                                               unsigned Alignment,
2018-10-14 16:50:06 +08:00
                                               unsigned AddressSpace,
2018-10-31 17:57:56 +08:00
                                               bool UseMaskForCond,
                                               bool UseMaskForGaps) {
2018-10-14 16:50:06 +08:00

2018-10-31 17:57:56 +08:00
  if (UseMaskForCond || UseMaskForGaps)
2018-10-14 16:50:06 +08:00
    return BaseT::getInterleavedMemoryOpCost(Opcode, VecTy, Factor, Indices,
2018-10-31 17:57:56 +08:00
                                             Alignment, AddressSpace,
                                             UseMaskForCond, UseMaskForGaps);
2017-06-25 16:26:25 +08:00
|
|
|
|
|
|
|
// We currently Support only fully-interleaved groups, with no gaps.
|
|
|
|
// TODO: Support also strided loads (interleaved-groups with gaps).
|
|
|
|
if (Indices.size() && Indices.size() != Factor)
|
|
|
|
return BaseT::getInterleavedMemoryOpCost(Opcode, VecTy, Factor, Indices,
|
|
|
|
Alignment, AddressSpace);
|
|
|
|
|
|
|
|
// VecTy for interleave memop is <VF*Factor x Elt>.
|
|
|
|
// So, for VF=4, Interleave Factor = 3, Element type = i32 we have
|
|
|
|
// VecTy = <12 x i32>.
|
|
|
|
MVT LegalVT = getTLI()->getTypeLegalizationCost(DL, VecTy).second;
|
|
|
|
|
|
|
|
// This function can be called with VecTy=<6xi128>, Factor=3, in which case
|
|
|
|
// the VF=2, while v2i128 is an unsupported MVT vector type
|
|
|
|
// (see MachineValueType.h::getVectorVT()).
|
|
|
|
if (!LegalVT.isVector())
|
|
|
|
return BaseT::getInterleavedMemoryOpCost(Opcode, VecTy, Factor, Indices,
|
|
|
|
Alignment, AddressSpace);
|
|
|
|
|
|
|
|
unsigned VF = VecTy->getVectorNumElements() / Factor;
|
|
|
|
Type *ScalarTy = VecTy->getVectorElementType();
|
2017-08-01 01:09:27 +08:00
|
|
|
|
2017-06-25 16:26:25 +08:00
|
|
|
// Calculate the number of memory operations (NumOfMemOps), required
|
|
|
|
// for load/store the VecTy.
|
|
|
|
unsigned VecTySize = DL.getTypeStoreSize(VecTy);
|
|
|
|
unsigned LegalVTSize = LegalVT.getStoreSize();
|
|
|
|
unsigned NumOfMemOps = (VecTySize + LegalVTSize - 1) / LegalVTSize;
|
|
|
|
|
|
|
|
// Get the cost of one memory operation.
|
|
|
|
Type *SingleMemOpTy = VectorType::get(VecTy->getVectorElementType(),
|
|
|
|
LegalVT.getVectorNumElements());
|
2019-10-22 23:16:52 +08:00
|
|
|
unsigned MemOpCost = getMemoryOpCost(Opcode, SingleMemOpTy,
|
|
|
|
MaybeAlign(Alignment), AddressSpace);
|
2017-08-01 01:09:27 +08:00
|
|
|
|
2017-06-25 16:26:25 +08:00
|
|
|
VectorType *VT = VectorType::get(ScalarTy, VF);
|
|
|
|
EVT ETy = TLI->getValueType(DL, VT);
|
|
|
|
if (!ETy.isSimple())
|
|
|
|
return BaseT::getInterleavedMemoryOpCost(Opcode, VecTy, Factor, Indices,
|
|
|
|
Alignment, AddressSpace);
|

  // TODO: Complete for other data-types and strides.
  // Each combination of Stride, ElementTy and VF results in a different
  // sequence; the cost tables are therefore accessed with:
  // Factor (stride) and VectorType=VFxElemType.
  // The cost accounts only for the shuffle sequence; the cost of the
  // loads/stores is accounted for separately.
  static const CostTblEntry AVX2InterleavedLoadTbl[] = {
    { 2, MVT::v4i64,  6 }, // (load 8i64 and) deinterleave into 2 x 4i64
    { 2, MVT::v4f64,  6 }, // (load 8f64 and) deinterleave into 2 x 4f64

    { 3, MVT::v2i8,  10 }, // (load 6i8 and) deinterleave into 3 x 2i8
    { 3, MVT::v4i8,   4 }, // (load 12i8 and) deinterleave into 3 x 4i8
    { 3, MVT::v8i8,   9 }, // (load 24i8 and) deinterleave into 3 x 8i8
    { 3, MVT::v16i8, 11 }, // (load 48i8 and) deinterleave into 3 x 16i8
    { 3, MVT::v32i8, 13 }, // (load 96i8 and) deinterleave into 3 x 32i8
    { 3, MVT::v8f32, 17 }, // (load 24f32 and) deinterleave into 3 x 8f32

    { 4, MVT::v2i8,  12 }, // (load 8i8 and) deinterleave into 4 x 2i8
    { 4, MVT::v4i8,   4 }, // (load 16i8 and) deinterleave into 4 x 4i8
    { 4, MVT::v8i8,  20 }, // (load 32i8 and) deinterleave into 4 x 8i8
    { 4, MVT::v16i8, 39 }, // (load 64i8 and) deinterleave into 4 x 16i8
    { 4, MVT::v32i8, 80 }, // (load 128i8 and) deinterleave into 4 x 32i8

    { 8, MVT::v8f32, 40 }  // (load 64f32 and) deinterleave into 8 x 8f32
  };

  static const CostTblEntry AVX2InterleavedStoreTbl[] = {
    { 2, MVT::v4i64,  6 }, // interleave 2 x 4i64 into 8i64 (and store)
    { 2, MVT::v4f64,  6 }, // interleave 2 x 4f64 into 8f64 (and store)

    { 3, MVT::v2i8,   7 }, // interleave 3 x 2i8 into 6i8 (and store)
    { 3, MVT::v4i8,   8 }, // interleave 3 x 4i8 into 12i8 (and store)
    { 3, MVT::v8i8,  11 }, // interleave 3 x 8i8 into 24i8 (and store)
    { 3, MVT::v16i8, 11 }, // interleave 3 x 16i8 into 48i8 (and store)
    { 3, MVT::v32i8, 13 }, // interleave 3 x 32i8 into 96i8 (and store)

    { 4, MVT::v2i8,  12 }, // interleave 4 x 2i8 into 8i8 (and store)
    { 4, MVT::v4i8,   9 }, // interleave 4 x 4i8 into 16i8 (and store)
    { 4, MVT::v8i8,  10 }, // interleave 4 x 8i8 into 32i8 (and store)
    { 4, MVT::v16i8, 10 }, // interleave 4 x 16i8 into 64i8 (and store)
    { 4, MVT::v32i8, 12 }  // interleave 4 x 32i8 into 128i8 (and store)
  };

  if (Opcode == Instruction::Load) {
    if (const auto *Entry =
            CostTableLookup(AVX2InterleavedLoadTbl, Factor, ETy.getSimpleVT()))
      return NumOfMemOps * MemOpCost + Entry->Cost;
  } else {
    assert(Opcode == Instruction::Store &&
           "Expected Store Instruction at this point");
    if (const auto *Entry =
            CostTableLookup(AVX2InterleavedStoreTbl, Factor, ETy.getSimpleVT()))
      return NumOfMemOps * MemOpCost + Entry->Cost;
  }

  return BaseT::getInterleavedMemoryOpCost(Opcode, VecTy, Factor, Indices,
                                           Alignment, AddressSpace);
}
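The AVX2 path reduces to two pieces of arithmetic: a ceiling division that counts how many legal-width memory operations cover the interleaved vector (the same formula `getInterleavedMemoryOpCostAVX512` uses for `NumOfMemOps`), and a lookup keyed on (Factor, type) whose shuffle cost is added to `NumOfMemOps * MemOpCost`. The sketch below restates this outside LLVM under stated assumptions: `SimpleVT`, `CostEntry`, `lookup`, `numOfMemOps` and `interleavedCost` are hypothetical stand-ins for `MVT`, `CostTblEntry` and `CostTableLookup` (whose first key field carries the interleave factor here), the table is a trimmed copy of three AVX2 load entries, and the per-operation cost is supplied by the caller rather than queried from a subtarget.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>

// Hypothetical stand-in for llvm::MVT, limited to the types used below.
enum class SimpleVT { v16i8, v32i8, v8f32 };

// Hypothetical stand-in for llvm::CostTblEntry: (factor, type, cost).
struct CostEntry {
  unsigned Factor;
  SimpleVT Type;
  unsigned Cost;
};

// Linear search keyed on (Factor, Type), in the spirit of CostTableLookup.
template <std::size_t N>
const CostEntry *lookup(const CostEntry (&Tbl)[N], unsigned Factor,
                        SimpleVT Ty) {
  for (const CostEntry &E : Tbl)
    if (E.Factor == Factor && E.Type == Ty)
      return &E;
  return nullptr;
}

// Ceiling division: number of legal-width memory ops covering VecTySize bytes.
unsigned numOfMemOps(unsigned VecTySize, unsigned LegalVTSize) {
  assert(LegalVTSize != 0 && "legal type must have a non-zero size");
  return (VecTySize + LegalVTSize - 1) / LegalVTSize;
}

// Memory-op cost plus table shuffle cost; std::nullopt means "fall back".
std::optional<unsigned> interleavedCost(unsigned Factor, SimpleVT Ty,
                                        unsigned NumOfMemOps,
                                        unsigned MemOpCost) {
  // Trimmed copy of three AVX2InterleavedLoadTbl entries.
  static const CostEntry LoadTbl[] = {
      {3, SimpleVT::v16i8, 11},
      {3, SimpleVT::v32i8, 13},
      {3, SimpleVT::v8f32, 17},
  };
  if (const CostEntry *E = lookup(LoadTbl, Factor, Ty))
    return NumOfMemOps * MemOpCost + E->Cost;
  return std::nullopt;
}
```

With these definitions, `numOfMemOps(48, 32)` is 2 (48 bytes of data over assumed 32-byte legal vectors), and `interleavedCost(3, SimpleVT::v16i8, 2, 1)` is 2 * 1 + 11 = 13. A (Factor, type) pair missing from the table yields `std::nullopt`, which corresponds to falling back to `BaseT`.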

// Get estimation for interleaved load/store operations and strided load.
// \p Indices contains indices for strided load.
// \p Factor - the factor of interleaving.
// AVX-512 provides 3-src shuffles that significantly reduce the cost.
int X86TTIImpl::getInterleavedMemoryOpCostAVX512(unsigned Opcode, Type *VecTy,
                                                 unsigned Factor,
                                                 ArrayRef<unsigned> Indices,
                                                 unsigned Alignment,
                                                 unsigned AddressSpace,
                                                 bool UseMaskForCond,
                                                 bool UseMaskForGaps) {
  if (UseMaskForCond || UseMaskForGaps)
    return BaseT::getInterleavedMemoryOpCost(Opcode, VecTy, Factor, Indices,
                                             Alignment, AddressSpace,
                                             UseMaskForCond, UseMaskForGaps);

  // VecTy for interleave memop is <VF*Factor x Elt>.
  // So, for VF=4, Interleave Factor = 3, Element type = i32 we have
  // VecTy = <12 x i32>.

  // Calculate the number of memory operations (NumOfMemOps) required
  // to load/store the VecTy.
  MVT LegalVT = getTLI()->getTypeLegalizationCost(DL, VecTy).second;
  unsigned VecTySize = DL.getTypeStoreSize(VecTy);
  unsigned LegalVTSize = LegalVT.getStoreSize();
  unsigned NumOfMemOps = (VecTySize + LegalVTSize - 1) / LegalVTSize;

  // Get the cost of one memory operation.
  Type *SingleMemOpTy = VectorType::get(VecTy->getVectorElementType(),
                                        LegalVT.getVectorNumElements());
  unsigned MemOpCost = getMemoryOpCost(Opcode, SingleMemOpTy,
                                       MaybeAlign(Alignment), AddressSpace);

  unsigned VF = VecTy->getVectorNumElements() / Factor;
  MVT VT = MVT::getVectorVT(MVT::getVT(VecTy->getScalarType()), VF);

  if (Opcode == Instruction::Load) {
    // The tables (AVX512InterleavedLoadTbl and AVX512InterleavedStoreTbl)
    // contain the cost of the optimized shuffle sequence that the
    // X86InterleavedAccess pass will generate.
    // The cost of loads and stores is computed separately from the table.

    // X86InterleavedAccess supports only the following interleaved-access
    // groups.
    static const CostTblEntry AVX512InterleavedLoadTbl[] = {
        {3, MVT::v16i8, 12}, // (load 48i8 and) deinterleave into 3 x 16i8
        {3, MVT::v32i8, 14}, // (load 96i8 and) deinterleave into 3 x 32i8
        {3, MVT::v64i8, 22}, // (load 192i8 and) deinterleave into 3 x 64i8
    };

    if (const auto *Entry =
            CostTableLookup(AVX512InterleavedLoadTbl, Factor, VT))
      return NumOfMemOps * MemOpCost + Entry->Cost;
    // If an entry does not exist, fall back to the default implementation.

    // Kind of shuffle depends on number of loaded values.
    // If we load the entire data in one register, we can use a 1-src shuffle.
    // Otherwise, we'll merge 2 sources in each operation.
    TTI::ShuffleKind ShuffleKind =
        (NumOfMemOps > 1) ? TTI::SK_PermuteTwoSrc : TTI::SK_PermuteSingleSrc;

    unsigned ShuffleCost =
        getShuffleCost(ShuffleKind, SingleMemOpTy, 0, nullptr);

    unsigned NumOfLoadsInInterleaveGrp =
        Indices.size() ? Indices.size() : Factor;
    Type *ResultTy = VectorType::get(VecTy->getVectorElementType(),
                                     VecTy->getVectorNumElements() / Factor);
    unsigned NumOfResults =
        getTLI()->getTypeLegalizationCost(DL, ResultTy).first *
        NumOfLoadsInInterleaveGrp;

    // About half of the loads may be folded into shuffles when we have only
    // one result. If we have more than one result, we do not fold loads at all.
    unsigned NumOfUnfoldedLoads =
        NumOfResults > 1 ? NumOfMemOps : NumOfMemOps / 2;

    // Get the number of shuffle operations per result.
    unsigned NumOfShufflesPerResult =
        std::max((unsigned)1, (unsigned)(NumOfMemOps - 1));

    // The SK_PermuteTwoSrc shuffle clobbers one of its source operands.
    // When we have more than one destination, we need additional instructions
    // to keep the sources.
    unsigned NumOfMoves = 0;
    if (NumOfResults > 1 && ShuffleKind == TTI::SK_PermuteTwoSrc)
      NumOfMoves = NumOfResults * NumOfShufflesPerResult / 2;

    int Cost = NumOfResults * NumOfShufflesPerResult * ShuffleCost +
               NumOfUnfoldedLoads * MemOpCost + NumOfMoves;

    return Cost;
  }

  // Store.
  assert(Opcode == Instruction::Store &&
         "Expected Store Instruction at this point");
  // X86InterleavedAccess supports only the following interleaved-access
  // groups.
  static const CostTblEntry AVX512InterleavedStoreTbl[] = {
      {3, MVT::v16i8, 12}, // interleave 3 x 16i8 into 48i8 (and store)
      {3, MVT::v32i8, 14}, // interleave 3 x 32i8 into 96i8 (and store)
      {3, MVT::v64i8, 26}, // interleave 3 x 64i8 into 192i8 (and store)

      {4, MVT::v8i8, 10},  // interleave 4 x 8i8 into 32i8 (and store)
      {4, MVT::v16i8, 11}, // interleave 4 x 16i8 into 64i8 (and store)
      {4, MVT::v32i8, 14}, // interleave 4 x 32i8 into 128i8 (and store)
      {4, MVT::v64i8, 24}  // interleave 4 x 64i8 into 256i8 (and store)
  };

  if (const auto *Entry =
          CostTableLookup(AVX512InterleavedStoreTbl, Factor, VT))
    return NumOfMemOps * MemOpCost + Entry->Cost;
  // If an entry does not exist, fall back to the default implementation.

  // There are no strided stores yet, and a store can't be folded into a
  // shuffle.
  unsigned NumOfSources = Factor; // The number of values to be merged.
  unsigned ShuffleCost =
      getShuffleCost(TTI::SK_PermuteTwoSrc, SingleMemOpTy, 0, nullptr);
  unsigned NumOfShufflesPerStore = NumOfSources - 1;

  // The SK_PermuteTwoSrc shuffle clobbers one of its source operands.
  // We need additional instructions to keep the sources.
  unsigned NumOfMoves = NumOfMemOps * NumOfShufflesPerStore / 2;
  int Cost = NumOfMemOps * (MemOpCost + NumOfShufflesPerStore * ShuffleCost) +
             NumOfMoves;
  return Cost;
}
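When neither AVX-512 table has an entry, the function above falls back to a closed-form estimate built from `NumOfMemOps`, `NumOfResults`, `MemOpCost` and `ShuffleCost`. The two functions below are a hypothetical, LLVM-free restatement of that arithmetic, assuming the costs are already-known integers (in the pass they come from `getMemoryOpCost` and `getShuffleCost`) and that `NumOfMemOps` is at least 1; the `TwoSrc` flag plays the role of the `ShuffleKind == TTI::SK_PermuteTwoSrc` check.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical restatement of the AVX-512 load fallback estimate.
unsigned avx512LoadFallbackCost(unsigned NumOfMemOps, unsigned NumOfResults,
                                unsigned MemOpCost, unsigned ShuffleCost) {
  assert(NumOfMemOps > 0 && "need at least one memory operation");
  // Two-source permutes are needed once the data spans several registers.
  bool TwoSrc = NumOfMemOps > 1;
  // With a single result, about half of the loads fold into shuffles.
  unsigned NumOfUnfoldedLoads =
      NumOfResults > 1 ? NumOfMemOps : NumOfMemOps / 2;
  unsigned NumOfShufflesPerResult = std::max(1u, NumOfMemOps - 1);
  // A two-source permute clobbers one source; extra moves preserve it.
  unsigned NumOfMoves = 0;
  if (NumOfResults > 1 && TwoSrc)
    NumOfMoves = NumOfResults * NumOfShufflesPerResult / 2;
  return NumOfResults * NumOfShufflesPerResult * ShuffleCost +
         NumOfUnfoldedLoads * MemOpCost + NumOfMoves;
}

// Hypothetical restatement of the AVX-512 store fallback estimate.
unsigned avx512StoreFallbackCost(unsigned Factor, unsigned NumOfMemOps,
                                 unsigned MemOpCost, unsigned ShuffleCost) {
  assert(Factor > 0 && "interleave factor must be positive");
  // Merging Factor sources pairwise takes Factor - 1 shuffles per store.
  unsigned NumOfShufflesPerStore = Factor - 1;
  unsigned NumOfMoves = NumOfMemOps * NumOfShufflesPerStore / 2;
  return NumOfMemOps * (MemOpCost + NumOfShufflesPerStore * ShuffleCost) +
         NumOfMoves;
}
```

For instance, with unit costs, two memory operations and three results, the load estimate is 3 * 1 * 1 + 2 * 1 + 1 = 6, while a factor-3 store spread over two memory operations costs 2 * (1 + 2) + 2 = 8. The integer division in the move counts mirrors the original, so odd products round down.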

int X86TTIImpl::getInterleavedMemoryOpCost(unsigned Opcode, Type *VecTy,
                                           unsigned Factor,
                                           ArrayRef<unsigned> Indices,
                                           unsigned Alignment,
                                           unsigned AddressSpace,
                                           bool UseMaskForCond,
                                           bool UseMaskForGaps) {
  auto isSupportedOnAVX512 = [](Type *VecTy, bool HasBW) {
    Type *EltTy = VecTy->getVectorElementType();
    if (EltTy->isFloatTy() || EltTy->isDoubleTy() || EltTy->isIntegerTy(64) ||
        EltTy->isIntegerTy(32) || EltTy->isPointerTy())
      return true;
    if (EltTy->isIntegerTy(16) || EltTy->isIntegerTy(8))
      return HasBW;
    return false;
  };
  if (ST->hasAVX512() && isSupportedOnAVX512(VecTy, ST->hasBWI()))
    return getInterleavedMemoryOpCostAVX512(Opcode, VecTy, Factor, Indices,
                                            Alignment, AddressSpace,
                                            UseMaskForCond, UseMaskForGaps);
  if (ST->hasAVX2())
    return getInterleavedMemoryOpCostAVX2(Opcode, VecTy, Factor, Indices,
                                          Alignment, AddressSpace,
                                          UseMaskForCond, UseMaskForGaps);

  return BaseT::getInterleavedMemoryOpCost(Opcode, VecTy, Factor, Indices,
                                           Alignment, AddressSpace,
                                           UseMaskForCond, UseMaskForGaps);
}
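The dispatcher tries the most capable implementation first: AVX-512 when the element type is supported, then AVX2, then the target-independent base. Its decision can be modelled without LLVM types as sketched below; `EltKind`, `CostPath`, `isSupportedOnAVX512` (as a free function) and `selectCostPath` are hypothetical names, and the boolean flags stand in for `ST->hasAVX512()`, `ST->hasBWI()` and `ST->hasAVX2()`.

```cpp
#include <cassert>

// Hypothetical element-type enum standing in for llvm::Type queries.
enum class EltKind { F32, F64, I64, I32, Ptr, I16, I8, Other };

// Mirrors the isSupportedOnAVX512 lambda: 32/64-bit elements and pointers
// are always accepted; 8/16-bit elements additionally need BWI.
bool isSupportedOnAVX512(EltKind K, bool HasBW) {
  switch (K) {
  case EltKind::F32:
  case EltKind::F64:
  case EltKind::I64:
  case EltKind::I32:
  case EltKind::Ptr:
    return true;
  case EltKind::I16:
  case EltKind::I8:
    return HasBW;
  default:
    return false;
  }
}

enum class CostPath { AVX512, AVX2, Base };

// Same order of checks as the dispatcher: AVX-512, then AVX2, then base.
CostPath selectCostPath(EltKind K, bool HasAVX512, bool HasBWI, bool HasAVX2) {
  if (HasAVX512 && isSupportedOnAVX512(K, HasBWI))
    return CostPath::AVX512;
  if (HasAVX2)
    return CostPath::AVX2;
  return CostPath::Base;
}
```

One consequence is visible in the model: on a subtarget with AVX-512 but without BWI, an interleave group of `i8` elements is costed by the AVX2 path rather than the AVX-512 one.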