Implemente patterns to extract HWord and Byte vector elements and convert to
quad-precision.
Differential Revision: https://reviews.llvm.org/D46774
llvm-svn: 333377
The match pattern in the definition of LXSDX is xoaddr, so the Pseudo
instruction XFLOADf64 never gets selected. XFLOADf64 expands to LXSDX/LFDX post
RA based on the register pressure. To avoid ambiguity, we need to remove the
select pattern for LXSDX, same as what was done for LXSD. STXSDX also have
the same issue.
Patch by Qing Shan Zhang (steven.zhang).
Differential Revision: https://reviews.llvm.org/D47178
llvm-svn: 333150
Implemente patterns to extract [Un]signed Word vector element and convert to
quad-precision.
Differential Revision: https://reviews.llvm.org/D46536
llvm-svn: 333115
Implemente patterns to extract [Un]signed DWord vector element and convert to
quad-precision.
Differential Revision: https://reviews.llvm.org/D46333
llvm-svn: 333112
xsrqpi is currently using Z23Form_1.
The instruction format is xsrqpi R,VRT,VRB,RMC.
Rathar than bits 11-15 being used for FRA, it should have
bits 11-14 reserved and bit 15 for R. This patch adds a new
class Z23Form_4 to fix the instruction format.
Differential Revision: https://reviews.llvm.org/D46761
llvm-svn: 332253
Legalize and emit code for truncate and convert float128 to (un)signed short
and (un)signed char.
Differential Revision: https://reviews.llvm.org/D46194
llvm-svn: 331797
Existing DAG combine only handles conversions for FP_TO_SINT:
"{f32, f64} x { i32, i16 }"
This patch simplifies the code to handle:
"{ FP_TO_SINT, FP_TO_UINT } x { f64, f32 } x { i64, i32, i16, i8 }"
Differential Revision: https://reviews.llvm.org/D46102
llvm-svn: 331778
Legalize and emit code for converting unsigned HWord/Char to QP:
xscvsdqp
xscvudqp
Only covering patterns for unsigned forms cause we don't have part-word
sign-extending integer loads into VSX registers.
Differential Revision: https://reviews.llvm.org/D45494
llvm-svn: 330278
Legalize and emit code for converting (Un)Signed Word to quad-precision via:
xscvsdqp
xscvudqp
Differential Revision: https://reviews.llvm.org/D45389
llvm-svn: 330273
Legalize and emit code for quad-precision floating point operation xscvdpqp
and add option to guard the quad precision operation support.
Differential Revision: https://reviews.llvm.org/D44746
llvm-svn: 328558
A new function getOpcodeForSpill should now be the only place to get
the opcode for a given spilled register.
Differential Revision: https://reviews.llvm.org/D43086
llvm-svn: 328556
The following set of instructions was originally planned to be added for Power 9
and so code was added to support them. However, a decision was made later on to
withdraw support for these instructions in the hardware.
xscmpnedp
xvcmpnesp
xvcmpnedp
This patch removes support for the instructions that were not added.
Differential Revision: https://reviews.llvm.org/D43641
llvm-svn: 325918
This patch extends on to rL307174 to not use the power9 vector extract with
variable index instructions when extracting word element 1. For such cases,
the existing selection of MFVSRWZ provides a better sequence.
Differential Revision: https://reviews.llvm.org/D38287
llvm-svn: 319049
The VSX versions have the advantage of a full 64-register target whereas the FP
ones have the advantage of lower latency and higher throughput. So what we’re
after is using the faster instructions in low register pressure situations and
using the larger register file in high register pressure situations.
The heuristic chooses between the following 7 pairs of instructions.
PPC::LXSSPX vs PPC::LFSX
PPC::LXSDX vs PPC::LFDX
PPC::STXSSPX vs PPC::STFSX
PPC::STXSDX vs PPC::STFDX
PPC::LXSIWAX vs PPC::LFIWAX
PPC::LXSIWZX vs PPC::LFIWZX
PPC::STXSIWX vs PPC::STFIWX
Differential Revision: https://reviews.llvm.org/D38486
llvm-svn: 318651
This patch updates register allocation to enable spilling gprs to
volatile vector registers rather than the stack. It can be enabled
for Power9 with option -ppc-enable-gpr-to-vsr-spills.
Differential Revision: https://reviews.llvm.org/D34815
llvm-svn: 313886
Add codegen for VSX word extract conversion from signed/unsigned to single/double
precision.
For UINT_TO_FP:
Extract word unsigned and convert to float was implemented in https://reviews.llvm.org/D20239.
Here we will add the missing extract integer and conversion to double. This
utilizes the new P9 instruction xxextractuw to extracting an integer element
when the result will be converted to double thereby saving 2 direct moves
(VSR <-> GPR).
For SINT_TO_FP:
We will implement the following sequence which will also reduce the number of
instructions by saving 2 direct moves.
v4i32->f32:
xxspltw
xvcvsxwsp
xscvspdpn
v4i32->f64:
xxspltw
xvcvsxwdp
Differential Revision: https://reviews.llvm.org/D35859
llvm-svn: 310866
As outlined in the PR, we didn't ensure that displacements for DQ-Form
instructions are multiples of 16. Since the instruction encoding encodes
a quad-word displacement, a sub-16 byte displacement is meaningless and
ends up being encoded incorrectly.
Fixes https://bugs.llvm.org/show_bug.cgi?id=33671.
Differential Revision: https://reviews.llvm.org/D35007
llvm-svn: 307934
This patch adds the exploitation for new power 9 instructions which extract
variable elements from vectors:
VEXTUBLX
VEXTUBRX
VEXTUHLX
VEXTUHRX
VEXTUWLX
VEXTUWRX
Differential Revision: https://reviews.llvm.org/D34032
Commit on behalf of Zaara Syeda (syzaara@ca.ibm.com)
llvm-svn: 307174
This patch adds on to the exploitation added by https://reviews.llvm.org/D33510.
This now catches build vector nodes where the inputs are coming from sign
extended vector extract elements where the indices used by the vector extract
are not correct. We can still use the new hardware instructions by adding a
shuffle to move the elements to the correct indices. I introduced a new PPCISD
node here because adding a vector_shuffle and changing the elements of the
vector_extracts was getting undone by another DAG combine.
Commit on behalf of Zaara Syeda (syzaara@ca.ibm.com)
Differential Revision: https://reviews.llvm.org/D34009
llvm-svn: 307169
Power9 has instructions that will reverse the bytes within an element for all
sizes (half-word, word, double-word and quad-word). These can be used for the
vec_revb builtins in altivec.h. However, we implement these to match vector
shuffle nodes as that will cover both the builtins and vector shuffles that
occur in the SDAG through other means.
Differential Revision: https://reviews.llvm.org/D33690
llvm-svn: 305214
There are some VectorShuffle Nodes in SDAG which can be selected to XXPERMDI
Instruction, this patch recognizes them and does the selection to improve
the PPC performance.
Differential Revision: https://reviews.llvm.org/D33404
llvm-svn: 304298
Summary
clang -c -mcpu=pwr9 test/CodeGen/PowerPC/build-vector-tests.ll causes an assertion failure during the binary encoding.
The failure occurs when a D-form load instruction takes two register operands instead of a register + an immediate.
This patch fixes the problem and also adds an assertion to catch this failure earlier before the binary encoding (i.e. during lit test).
The fix is from Nemanja Ivanovic @nemanjai.
Differential Revision: https://reviews.llvm.org/D33482
llvm-svn: 304133
There are some VectorShuffle Nodes in SDAG which can be selected to XXSLDWI
instruction, this patch recognizes them and does the selection to improve the
PPC performance.
llvm-svn: 303822
According to Power ISA V3.0 document, the first source operand of mtvsrdd is constant 0 if r0 is specified. So the corresponding register constraint should be g8rc_nox0.
This bug caused wrong output generated by 401.bzip2 when -mcpu=power9 and fdo are specified.
Differential Revision: https://reviews.llvm.org/D32880
llvm-svn: 302834
Fixes PR30730.
This is a re-commit of a pulled commit. The commit was pulled because some
software projects contained uses of Altivec vectors that violated alignment
requirements. Known issues have now been fixed.
Committing on behalf of Lei Huang.
Differential Revision: https://reviews.llvm.org/D26861
llvm-svn: 301892
mfvrd and mffprd are both alias to mfvrsd.
This patch enables correct parsing of the aliases, but we still emit a mfvrsd.
Committing on behalf of brunoalr (Bruno Rosa).
Differential Revision: https://reviews.llvm.org/D29177
llvm-svn: 297849
1) Explicitly sets mayLoad/mayStore property in the tablegen files on load/store
instructions.
2) Updated the flags on a number of intrinsics indicating that they write
memory.
3) Added SDNPMemOperand flags for some target dependent SDNodes so that they
propagate their memory operand
Review: https://reviews.llvm.org/D28818
llvm-svn: 293200
In some situations, the BUILD_VECTOR node that builds a v18i8 vector by
a splat of an i8 constant will end up with signed 8-bit values and other
situations, it'll end up with unsigned ones. Handle both situations.
Fixes PR31340.
llvm-svn: 289804
This is the final patch in the series of patches that improves
BUILD_VECTOR handling on PowerPC. This adds a few peephole optimizations
to remove redundant instructions. It also adds a large test case which
encompasses a large set of code patterns that build vectors - this test
case was the motivator for this series of patches.
Differential Revision: https://reviews.llvm.org/D26066
llvm-svn: 288800
This commit caused some miscompiles that did not show up on any of the bots.
Reverting until we can investigate the cause of those failures.
llvm-svn: 288214
This patch corresponds to review:
https://reviews.llvm.org/D25912
This is the first patch in a series of 4 that improve the lowering and combining
for BUILD_VECTOR nodes on PowerPC.
llvm-svn: 288152
In rL283190, I added some InstAlias definitions to generate extended mnemonics
for some uses of the XXPERMDI instruction. However, when the assembler matches
these extended mnemonics, it matches the new instruction in situations where it
should match the old one.
This patch removes these definitions and accomplishes that by defining these
mnemonics with additional instructions that are isCodeGenOnly.
Fixes PR31127.
llvm-svn: 287765
This patch implements all the overloads for vec_xl_be and vec_xst_be. On BE,
they behaves exactly the same with vec_xl and vec_xst, therefore they are
simply implemented by defining a matching macro. On LE, they are implemented
by defining new builtins and intrinsics. For int/float/long long/double, it
is just a load (lxvw4x/lxvd2x) or store(stxvw4x/stxvd2x). For char/char/short,
we also need some extra shuffling before or after call the builtins to get the
desired BE order. For int128, simply call vec_xl or vec_xst.
llvm-svn: 286967
add an intrinsic to expose the 'VSX Scalar Convert Half-Precision to
Single-Precision' instruction.
Differential review: https://reviews.llvm.org/D26536
llvm-svn: 286862