[PPC64LE] Teach swap optimization about the doubleword splat idiom

With a previous patch, the VSX swap optimization is able to recognize
the doubleword load-splat idiom that can be implemented using lxvdsx.
However, that does not cover a doubleword splat where the source is a
register.  We can implement this using xxspltd (a special form of
xxpermdi).  This patch teaches the swap optimization pass about this
idiom.

As a prerequisite, it also permits swap optimization to succeed for
all forms of SUBREG_TO_REG.  Previously we were conservative and only
allowed SUBREG_TO_REG when it copied a full register.  However, on
reflection any form of SUBREG_TO_REG is safe in and of itself, so long
as an unsafe operation is not performed on its result.  In particular,
a widening SUBREG_TO_REG often occurs as an input to a doubleword
splat idiom, particularly in auto-vectorized code.
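
The reasoning above can be sketched as follows. This is a simplified model, not the pass's actual data structure: `Entry` and `webIsOptimizable` are hypothetical names, and the real pass tracks more state per instruction. The point is that an individual SUBREG_TO_REG can always be marked swappable, because the group of connected instructions is still rejected if any other member is unsafe.

```cpp
#include <vector>

// Hypothetical, simplified per-instruction record (an assumption for
// illustration; the real pass records additional flags).
struct Entry {
  bool isSwappable; // lane-insensitive, safe inside a swap-optimized region
  bool isSwap;      // an explicit doubleword swap that may be removed
};

// A connected group of vector instructions can be optimized only if
// every member is either swappable or a swap. One unsafe member is
// enough to reject the whole group.
bool webIsOptimizable(const std::vector<Entry> &web) {
  for (const Entry &e : web)
    if (!e.isSwappable && !e.isSwap)
      return false;
  return true;
}
```

Under this model, marking every SUBREG_TO_REG as swappable cannot make an unsafe group pass, since the unsafe consumer still fails the check.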

The doubleword splat idiom is an XXPERMDI operation where both source
registers are identical, and the selection mask is either 0 (splat the
first element) or 3 (splat the second element).  To determine whether
the registers are identical, we use the existing mechanism for looking
through "copy-like" operations.  That mechanism has a side effect of
marking the XXPERMDI operation as using a physical register, which
would invalidate its presence in a swap-optimized region.  This is
correct for the form of XXPERMDI that performs a swap and hence would
be removed, but is not what we want for a doubleword-splat variety of
XXPERMDI.  Therefore we reset the physical-register flag on the
XXPERMDI when it represents a splat.
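
As an illustration of the classification described above, here is a minimal sketch that models XXPERMDI on a two-doubleword value. The names `V2`, `xxpermdi` (as a C++ function), `Kind`, and `classify` are hypothetical and do not appear in the patch; `sameSource` stands in for the result of comparing the two registers after looking through copy-like operations.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical model of a 128-bit vector as two doublewords.
using V2 = std::array<std::uint64_t, 2>;

// Model of xxpermdi: the high bit of the 2-bit mask selects a
// doubleword of a, the low bit selects a doubleword of b.
V2 xxpermdi(const V2 &a, const V2 &b, unsigned dm) {
  assert(dm < 4 && "the selection mask is a 2-bit field");
  return {a[(dm >> 1) & 1], b[dm & 1]};
}

enum class Kind { Swap, Splat, Other };

// Classification used in this sketch: with identical sources, mask 2
// is a doubleword swap and masks 0 and 3 are doubleword splats.
Kind classify(bool sameSource, unsigned dm) {
  if (!sameSource)
    return Kind::Other;
  if (dm == 2)
    return Kind::Swap;
  if (dm == 0 || dm == 3)
    return Kind::Splat;
  return Kind::Other;
}
```

With `s = {10, 20}`, `xxpermdi(s, s, 0)` yields `{10, 10}`, `xxpermdi(s, s, 3)` yields `{20, 20}`, and `xxpermdi(s, s, 2)` yields `{20, 10}`. Both doublewords of a splat result are equal, so swapping that result leaves it unchanged.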

A simple test case is added to verify that we generate the splat and
that we also remove the xxswapd instructions that would otherwise be
associated with the load and store of another operand.

llvm-svn: 241285
Bill Schmidt 2015-07-02 17:03:06 +00:00
parent 2cd195166f
commit 7c691fee1c
2 changed files with 46 additions and 12 deletions

@@ -260,7 +260,7 @@ bool PPCVSXSwapRemoval::gatherVectorInstructions() {
         // select, compare, etc.).
         SwapVector[VecIdx].IsSwappable = 1;
         break;
-      case PPC::XXPERMDI:
+      case PPC::XXPERMDI: {
         // This is a swap if it is of the form XXPERMDI t, s, s, 2.
         // Unfortunately, MachineCSE ignores COPY and SUBREG_TO_REG, so we
         // can also see XXPERMDI t, SUBREG_TO_REG(s), SUBREG_TO_REG(s), 2,
@@ -268,9 +268,8 @@ bool PPCVSXSwapRemoval::gatherVectorInstructions() {
         // SUBREG_TO_REG to find the real source value for comparison.
         // If the real source value is a physical register, then mark the
         // XXPERMDI as mentioning a physical register.
-        // Any other form of XXPERMDI is lane-sensitive and unsafe
-        // for the optimization.
-        if (MI.getOperand(3).getImm() == 2) {
+        int immed = MI.getOperand(3).getImm();
+        if (immed == 2) {
           unsigned trueReg1 = lookThruCopyLike(MI.getOperand(1).getReg(),
                                                VecIdx);
           unsigned trueReg2 = lookThruCopyLike(MI.getOperand(2).getReg(),
@@ -278,7 +277,26 @@ bool PPCVSXSwapRemoval::gatherVectorInstructions() {
           if (trueReg1 == trueReg2)
             SwapVector[VecIdx].IsSwap = 1;
         }
+        // This is a doubleword splat if it is of the form
+        // XXPERMDI t, s, s, 0 or XXPERMDI t, s, s, 3. As above we
+        // must look through chains of copy-likes to find the source
+        // register. We turn off the marking for mention of a physical
+        // register, because splatting it is safe; the optimization
+        // will not swap the value in the physical register.
+        else if (immed == 0 || immed == 3) {
+          unsigned trueReg1 = lookThruCopyLike(MI.getOperand(1).getReg(),
+                                               VecIdx);
+          unsigned trueReg2 = lookThruCopyLike(MI.getOperand(2).getReg(),
+                                               VecIdx);
+          if (trueReg1 == trueReg2) {
+            SwapVector[VecIdx].IsSwappable = 1;
+            SwapVector[VecIdx].MentionsPhysVR = 0;
+          }
+        }
+        // Any other form of XXPERMDI is lane-sensitive and unsafe
+        // for the optimization.
         break;
+      }
       case PPC::LVX:
         // Non-permuting loads are currently unsafe. We can use special
         // handling for this in the future. By not marking these as
@@ -307,14 +325,6 @@ bool PPCVSXSwapRemoval::gatherVectorInstructions() {
         SwapVector[VecIdx].IsStore = 1;
         SwapVector[VecIdx].IsSwap = 1;
         break;
-      case PPC::SUBREG_TO_REG:
-        // These are fine provided they are moving between full vector
-        // register classes. For example, the VRs are a subset of the
-        // VSRs, but each VR and each VSR is a full 128-bit register.
-        if (isVecReg(MI.getOperand(0).getReg()) &&
-            isVecReg(MI.getOperand(2).getReg()))
-          SwapVector[VecIdx].IsSwappable = 1;
-        break;
       case PPC::COPY:
         // These are fine provided they are moving between full vector
         // register classes.

@@ -0,0 +1,24 @@
+; RUN: llc -mcpu=pwr8 -mtriple=powerpc64le-unknown-linux-gnu -O3 < %s | FileCheck %s
+; This test verifies that VSX swap optimization works for the
+; doubleword splat idiom.
+@a = external global <2 x double>, align 16
+@b = external global <2 x double>, align 16
+define void @test(double %s) {
+entry:
+  %0 = insertelement <2 x double> undef, double %s, i32 0
+  %1 = shufflevector <2 x double> %0, <2 x double> undef, <2 x i32> zeroinitializer
+  %2 = load <2 x double>, <2 x double>* @a, align 16
+  %3 = fadd <2 x double> %0, %2
+  store <2 x double> %3, <2 x double>* @b, align 16
+  ret void
+}
+; CHECK-LABEL: @test
+; CHECK: xxspltd
+; CHECK: lxvd2x
+; CHECK: xvadddp
+; CHECK: stxvd2x
+; CHECK-NOT: xxswapd