This patch adds support for the vector merge even word and vector merge odd word
instructions introduced in POWER8.
Phabricator review: http://reviews.llvm.org/D10704
llvm-svn: 240650
This patch corresponds to review:
http://reviews.llvm.org/D10096
This is the back end portion of the patch related to D10095.
The patch adds the instructions and back end intrinsics for:
vbpermq
vgbbd
llvm-svn: 239505
in POWER8:
vadduqm
vaddeuqm
vaddcuq
vaddecuq
vsubuqm
vsubeuqm
vsubcuq
vsubecuq
In addition to adding the instructions themselves, it also adds support for the
v1i128 type for intrinsics (Intrinsics.td, Function.cpp, and
IntrinsicEmitter.cpp).
http://reviews.llvm.org/D9081
llvm-svn: 238144
This patch adds support for the following new instructions in the
Power ISA 2.07:
vpksdss
vpksdus
vpkudus
vpkudum
vupkhsw
vupklsw
These instructions are available through the vec_packs, vec_packsu,
vec_unpackh, and vec_unpackl built-in interfaces. These are
lane-sensitive instructions, so the built-ins have different
implementations for big- and little-endian, and the instructions must
be marked as killing the vector swap optimization for now.
The first three instructions perform saturating pack operations. The
fourth performs a modulo pack operation, which means it can be
represented with a vector shuffle, and conversely the appropriate
vector shuffles may cause this instruction to be generated. The other
instructions are only generated via built-in support for now.
Appropriate tests have been added.
There is a companion patch to clang for the rest of this support.
llvm-svn: 237499
It adds v1i128 to the appropriate register classes and checks parameter passing
and return values.
This is related to http://reviews.llvm.org/D9081, which will add instructions
that exploit the v1i128 datatype.
Phabricator review: http://reviews.llvm.org/D9475
llvm-svn: 236503
[DebugInfo] Add debug locations to constant SD nodes
This adds debug location to constant nodes of Selection DAG and updates
all places that create constants to pass debug locations
(see PR13269).
Can't guarantee that all locations are correct, but in a lot of cases choice
is obvious, so most of them should be. At least all tests pass.
Tests for these changes do not cover everything, instead just check it for
SDNodes, ARM and AArch64 where it's easy to get incorrect locations on
constants.
This is not complete fix as FastISel contains workaround for wrong debug
locations, which drops locations from instructions on processing constants,
but there isn't currently a way to use debug locations from constants there
as llvm::Constant doesn't cache it (yet). Although this is a bit different
issue, not directly related to these changes.
Differential Revision: http://reviews.llvm.org/D9084
llvm-svn: 235989
This adds debug location to constant nodes of Selection DAG and updates
all places that create constants to pass debug locations
(see PR13269).
Can't guarantee that all locations are correct, but in a lot of cases choice
is obvious, so most of them should be. At least all tests pass.
Tests for these changes do not cover everything, instead just check it for
SDNodes, ARM and AArch64 where it's easy to get incorrect locations on
constants.
This is not complete fix as FastISel contains workaround for wrong debug
locations, which drops locations from instructions on processing constants,
but there isn't currently a way to use debug locations from constants there
as llvm::Constant doesn't cache it (yet). Although this is a bit different
issue, not directly related to these changes.
Differential Revision: http://reviews.llvm.org/D9084
llvm-svn: 235977
This patch adds a new SSA MI pass that runs on little-endian PPC64
code with VSX enabled. Loads and stores of 4x32 and 2x64 vectors
without alignment constraints are accomplished for little-endian using
lxvd2x/xxswapd and xxswapd/stxvd2x. The existence of the additional
xxswapd instructions hurts performance in comparison with big-endian
code, but they are necessary in the general case to support correct
semantics.
However, the general case does not apply to most vector code. Many
vector instructions are lane-insensitive; they do not "care" which
lanes the parallel computations are performed within, provided that
the resulting data is stored into the correct locations. Thus this
pass looks for computations that perform only lane-insensitive
operations, and remove the unnecessary swaps from loads and stores in
such computations.
Future improvements will allow computations using certain
lane-sensitive operations to also be optimized in this manner, by
modifying the lane-sensitive operations to account for the permuted
order of the lanes. However, this patch only adds the infrastructure
to permit this; no lane-sensitive operations are optimized at this
time.
This code is heavily exercised by the various vectorizing applications
in the projects/test-suite tree. For the time being, I have only added
one simple test case to demonstrate what the pass is doing. Although
it is quite simple, it provides coverage for much of the code,
including the special case handling of copies and subreg-to-reg
operations feeding the swaps. I plan to add additional tests in the
future as I fill in more of the "special handling" code.
Two existing tests were affected, because they expected the swaps to
be present, but they are now removed.
llvm-svn: 235910
The PowerPC backend had a number of loads that were marked as canFoldAsLoad
(and I'm partially at fault here for copying around the relevant line of
TableGen definitions without really looking at what it meant). This is not
right; PPC (non-memory) instructions don't support direct memory operands, and
so there is nothing a 'foldable' instruction could be folded into.
Noticed by inspection, no test case.
The one thing we might lose by doing this is ability to fold some loads into
stackmap/patchpoint pseudo-instructions. However, this was untested, and would
not obviously have worked for extending loads, and I'd rather re-add support
for that once it can be tested.
llvm-svn: 231982
veqv (vector equivalence)
vnand
vorc
I increased the AddedComplexity for these instructions to 500 to ensure they are generated instead of issuing other VSX instructions.
Phabricator review: http://reviews.llvm.org/D7469
llvm-svn: 228580
Patch by Kit Barton.
Add the vector count leading zeros instruction for byte, halfword,
word, and doubleword sizes. This is a fairly straightforward addition
after the changes made for vpopcnt:
1. Add the correct definitions for the various instructions in
PPCInstrAltivec.td
2. Make the CTLZ operation legal on vector types when using P8Altivec
in PPCISelLowering.cpp
Test Plan
Created new test case in test/CodeGen/PowerPC/vec_clz.ll to check the
instructions are being generated when the CTLZ operation is used in
LLVM.
Check the encoding and decoding in test/MC/PowerPC/ppc_encoding_vmx.s
and test/Disassembler/PowerPC/ppc_encoding_vmx.txt respectively.
llvm-svn: 228301
Patch by Kit Barton.
Add the vector population count instructions for byte, halfword, word,
and doubleword sizes. There are two major changes here:
PPCISelLowering.cpp: Make CTPOP legal for vector types.
PPCRegisterInfo.td: Added v2i64 to the VRRC register
definition. This is needed for the doubleword variations of the
integer ops that were added in P8.
Test Plan
Test the instruction vpcnt* encoding/decoding in ppc64-encoding-vmx.s
Test the generation of the vpopcnt instructions for various vector
data types. When adding the v2i64 type to the Vector Register set, I
also needed to add the appropriate bit conversion patterns between
v2i64 and the existing vector types. Testing for these conversions
were also added in the test case by passing a different vector type as
a parameter into the test functions. There is also a run step that
will ensure the vpopcnt instructions are generated when the vsx
feature is disabled.
llvm-svn: 228046
Commits r213915 and r214718 fix recognition of shuffle masks for vmrg*
and vpku*um instructions for a little-endian target, by swapping the
input arguments. The vsldoi instruction requires similar treatment,
and also needs its shift count adjusted for little endian.
Reviewed by Ulrich Weigand.
This is a bug fix candidate for release 3.5 (and hopefully the last of
those for PowerPC).
llvm-svn: 214923
In commit r213915, Bill fixed little-endian usage of vmrgh* and vmrgl*
by swapping the input arguments. As it turns out, the exact same fix
is also required for the vpkuhum/vpkuwum patterns.
This fixes another regression in llvmpipe when vector support is
enabled.
Reviewed by Bill Schmidt.
llvm-svn: 214718
when let can do the same thing. Keep the 64bit variants as codegen-only.
While they have a different register class, the encoding is the same for
32bit and 64bit mode. Having both present would otherwise confuse the
disassembler.
llvm-svn: 214636
Because the PowerPC vmrgh* and vmrgl* instructions have a built-in
big-endian bias, it is necessary to swap their inputs in little-endian
mode when using them to implement a vector shuffle. This was
previously missed in the vector LE implementation.
There was already logic to distinguish between unary and "normal"
vmrg* vector shuffles, so this patch extends that logic to use a third
option: "swapped" vmrg* vector shuffles that are used for little
endian in place of the "normal" ones.
I've updated the vec-shuffle-le.ll test to check for the expected
register ordering on the generated instructions.
This bug was discovered when testing the LE and ELFv2 patches for
safety if they were backported to 3.4. A different vectorization
decision was made in 3.4 than on mainline trunk, and that exposed the
problem. I've verified this fix takes care of that issue.
llvm-svn: 213915
Various masks on shufflevector instructions are recognizable as
specific PowerPC instructions (vector pack, vector merge, etc.).
There is existing code in PPCISelLowering.cpp to recognize the correct
patterns for big endian code. The masks for these instructions are
different for little endian code due to the big-endian numbering
employed by these instructions. This patch adds the recognition code
for little endian.
I've added a new test case test/CodeGen/PowerPC/vec_shuffle_le.ll for
this. The existing recognizer test (vec_shuffle.ll) is unnecessarily
verbose and difficult to read, so I felt it was better to add a new
test rather than modify the old one.
llvm-svn: 210536
I'm under the impression that we used to infer the isCommutable flag from the
instruction-associated pattern. Regardless, we don't seem to do this (at least
by default) any more. I've gone through all of our instruction definitions, and
marked as commutative all of those that should be trivial to commute (by
exchanging the first two operands). There has been special code for the RL*
instructions, and that's not changed.
Before this change, we had the following commutative instructions:
RLDIMI
RLDIMIo
RLWIMI
RLWIMI8
RLWIMI8o
RLWIMIo
XSADDDP
XSMULDP
XVADDDP
XVADDSP
XVMULDP
XVMULSP
After:
ADD4
ADD4o
ADD8
ADD8o
ADDC
ADDC8
ADDC8o
ADDCo
ADDE
ADDE8
ADDE8o
ADDEo
AND
AND8
AND8o
ANDo
CRAND
CREQV
CRNAND
CRNOR
CROR
CRXOR
EQV
EQV8
EQV8o
EQVo
FADD
FADDS
FADDSo
FADDo
FMADD
FMADDS
FMADDSo
FMADDo
FMSUB
FMSUBS
FMSUBSo
FMSUBo
FMUL
FMULS
FMULSo
FMULo
FNMADD
FNMADDS
FNMADDSo
FNMADDo
FNMSUB
FNMSUBS
FNMSUBSo
FNMSUBo
MULHD
MULHDU
MULHDUo
MULHDo
MULHW
MULHWU
MULHWUo
MULHWo
MULLD
MULLDo
MULLW
MULLWo
NAND
NAND8
NAND8o
NANDo
NOR
NOR8
NOR8o
NORo
OR
OR8
OR8o
ORo
RLDIMI
RLDIMIo
RLWIMI
RLWIMI8
RLWIMI8o
RLWIMIo
VADDCUW
VADDFP
VADDSBS
VADDSHS
VADDSWS
VADDUBM
VADDUBS
VADDUHM
VADDUHS
VADDUWM
VADDUWS
VAND
VAVGSB
VAVGSH
VAVGSW
VAVGUB
VAVGUH
VAVGUW
VMADDFP
VMAXFP
VMAXSB
VMAXSH
VMAXSW
VMAXUB
VMAXUH
VMAXUW
VMHADDSHS
VMHRADDSHS
VMINFP
VMINSB
VMINSH
VMINSW
VMINUB
VMINUH
VMINUW
VMLADDUHM
VMULESB
VMULESH
VMULEUB
VMULEUH
VMULOSB
VMULOSH
VMULOUB
VMULOUH
VNMSUBFP
VOR
VXOR
XOR
XOR8
XOR8o
XORo
XSADDDP
XSMADDADP
XSMAXDP
XSMINDP
XSMSUBADP
XSMULDP
XSNMADDADP
XSNMSUBADP
XVADDDP
XVADDSP
XVMADDADP
XVMADDASP
XVMAXDP
XVMAXSP
XVMINDP
XVMINSP
XVMSUBADP
XVMSUBASP
XVMULDP
XVMULSP
XVNMADDADP
XVNMADDASP
XVNMSUBADP
XVNMSUBASP
XXLAND
XXLNOR
XXLOR
XXLXOR
This is a by-inspection change, and I'm not sure how to write a reliable test
case. I would like advice on this, however.
llvm-svn: 204609
This adds the IIC_ prefix to the instruction itinerary class names, giving the
PPC backend a naming convention for itinerary classes that is more consistent
with that used by the X86 and ARM backends.
Instruction scheduling in the PPC backend needs a bunch of cleanup and
improvement (especially for the ooo cores). This is just a preliminary step.
No functionality change intended.
llvm-svn: 195890
Use the new instruction deprecation feature to mark mftb (now replaced with
mfspr) and dst (along with the other Altivec cache control instructions) as
deprecated when targeting cores supporting at least ISA v2.03.
llvm-svn: 190605
We had patterns to match v4i32 immAllZerosV -> V_SET0, but not patterns for
v8i16 (which occurs in the test case) or v16i8. The same was true for
V_SETALLONES (so I added the associated patterns for those as well).
Another bug found by llvm-stress.
llvm-svn: 186108
A couple of AltiVec patterns are just specialized forms of the
generic instruction pattern, and should therefore be marked
isCodeGenOnly to avoid confusing the asm parser:
VCFSX_0, VCTUXS_0, VCFUX_0, VCTSXS_0, and V_SETALLONES.
Noticed by inspection of the generated PPCGenAsmMatcher.inc.
llvm-svn: 185533
In the default PowerPC assembler syntax, registers are specified simply
by number, so they cannot be distinguished from immediate values (without
looking at the opcode). This means that the default operand matching logic
for the asm parser does not work, and we need to specify custom matchers.
Since those can only be specified with RegisterOperand classes and not
directly on the RegisterClass, all instructions patterns used by the asm
parser need to use a RegisterOperand (instead of a RegisterClass) for
all their register operands.
This patch adds one RegisterOperand for each RegisterClass, using the
same name as the class, just in lower case, and updates all instruction
patterns to use RegisterOperand instead of RegisterClass operands.
llvm-svn: 180611
When testing the asm parser, I noticed wrong encodings for the
above instructions (wrong sub-opcodes).
Tests will be added together with the asm parser.
llvm-svn: 180608
This patch follows up on work done by Bill Schmidt in r178277,
and replaces most of the remaining uses of VRRC in ISEL DAG patterns.
The resulting .inc files are identical except for comments, so
no change in code generation is expected.
llvm-svn: 178656
When unsafe FP math operations are enabled, we can use the fre[s] and
frsqrte[s] instructions, which generate reciprocal (sqrt) estimates, together
with some Newton iteration, in order to quickly generate floating-point
division and sqrt results. All of these instructions are separately optional,
and so each has its own feature flag (except for the Altivec instructions,
which are covered under the existing Altivec flag). Doing this is not only
faster than using the IEEE-compliant fdiv/fsqrt instructions, but allows these
computations to be pipelined with other computations in order to hide their
overall latency.
I've also added a couple of missing fnmsub patterns which turned out to be
missing (but are necessary for good code generation of the Newton iterations).
Altivec needs a similar fix, but that will probably be more complicated because
fneg is expanded for Altivec's v4f32.
llvm-svn: 178617
This follows up Ulrich Weigand's work in PPCInstrInfo.td and
PPCInstr64Bit.td by doing the corresponding work for most of the
Altivec patterns. I have not been able to do anything for the
following classes of instructions:
(1) Vector logicals. These don't have corresponding intrinsics and
don't have a single obvious vector type. So far as I can tell I need
to leave these as VRRC. Affected instructions are: VAND, VANDC,
VNOR, VOR, VXOR, V_SET0.
(2) Instructions that make use of vector shuffle. The selection code
promotes all shuffles to v16i8, so any pattern that matches on a
shuffle is constrained. I haven't found any way to make the patterns
match on their natural types, so I plan to leave these as VRRC.
Affected instructions are: VMRG*, VSPLTB, VSPLTH, VSPLTW, VPKUHUM,
VPKUWUM.
No change in behavior is anticipated.
llvm-svn: 178277
There remain a number of patterns that cannot (and should not)
be handled by the asm parser, in particular all the Pseudo patterns.
This commit marks those patterns as isCodeGenOnly.
No change in generated code.
llvm-svn: 178008
In preparation for the addition of other SIMD ISA extensions (such as QPX) we
need to make sure that all Altivec patterns are properly predicated on having
Altivec support.
No functionality change intended (one test case needed to be updated b/c it
assumed that Altivec intrinsics would be supported without enabling Altivec
support).
llvm-svn: 177152
instruction (vmaddfp) to conform with IEEE to ensure the sign of a zero
result when resulting product is -0.0.
The -0.0 vector addend to vmaddfp is generated by a creating a vector
with full bits sets and then shifting each elements by 31-bits to the
left, resulting in a vector of 0x80000000 (or -0.0 as float).
The 'buildvec_canonicalize.ll' was adjusted to reflect this change and
the 'vec_mul.ll' was complemented with the float vector multiplication
test.
llvm-svn: 168998
This patch lowers the llvm.floor, llvm.ceil, llvm.trunc, and
llvm.nearbyint to Altivec instruction when using 4 single-precision
float vectors.
llvm-svn: 168086
Loads and stores can have different pipeline behavior, especially on
embedded chips. This change allows those differences to be expressed.
Except for the 440 scheduler, there are no functionality changes.
On the 440, the latency adjustment is only by one cycle, and so this
probably does not affect much. Nevertheless, it will make a larger
difference in the future and this removes a FIXME from the 440 itin.
llvm-svn: 153821
PR2957
ISD::VECTOR_SHUFFLE now stores an array of integers representing the shuffle
mask internal to the node, rather than taking a BUILD_VECTOR of ConstantSDNodes
as the shuffle mask. A value of -1 represents UNDEF.
In addition to eliminating the creation of illegal BUILD_VECTORS just to
represent shuffle masks, we are better about canonicalizing the shuffle mask,
resulting in substantially better code for some classes of shuffles.
llvm-svn: 70225