[PowerPC] Generate unaligned vector loads using intrinsics instead of regular loads

Altivec vector loads on PowerPC have an interesting property: they always load
from an aligned address (rounding down the address actually provided if
necessary). To perform an actual unaligned load, you therefore emit two load
instructions, one with the original address and one offset by one vector
length, and then use a special permutation (vperm, with a control vector from
lvsl) to extract the desired bytes.
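
For reference, the classic C-intrinsic form of this idiom (from altivec.h)
looks roughly like the sketch below. This is the big-endian form, purely for
illustration; it is not code touched by this patch, and on little-endian
targets the operands are reversed and lvsr is used instead, as noted in the
lowering code.

  #include <altivec.h>

  /* Load 16 bytes from a possibly-unaligned pointer p using two
     always-aligned loads and a permute. */
  static vector unsigned char load_unaligned(const unsigned char *p) {
    vector unsigned char lo   = vec_ld(0, p);   /* loads from p rounded down */
    vector unsigned char hi   = vec_ld(16, p);  /* next vector-length block  */
    vector unsigned char mask = vec_lvsl(0, p); /* permute control from p    */
    return vec_perm(lo, hi, mask);              /* 16 bytes starting at p    */
  }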

When this was originally implemented, I generated these two loads using regular
ISD::LOAD nodes, now marked as aligned. Unfortunately, there is a problem with
this:

The alignment of a load does not contribute to its identity, and SDNodes
are uniqued. So, imagine that we have some unaligned load L1. The routine
creates two loads, L1(aligned) and (L1+16)(aligned). Further imagine that a
load (L1+16)(unaligned), with the same chain operand as L1, already existed.
When (L1+16)(aligned) is created as part of the lowering of L1, that load *is*
the pre-existing (L1+16)(unaligned) node, just now marked as aligned (because
the new alignment overwrites the old). The original users of (L1+16)(unaligned)
therefore see the aligned load's value, data intended only as an input to the
permutation that yields L1's result, and (L1+16)(unaligned) no longer exists to
receive its own permutation-based expansion. This was PR19991.
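
To make the uniquing hazard concrete, here is a small self-contained toy; it is
not LLVM code, and the map, the key, and the ToyLoad type are invented purely
for illustration. Nodes are keyed on everything except alignment, so "creating"
the aligned clone simply returns, and mutates, the pre-existing node:

  #include <cassert>
  #include <map>
  #include <utility>

  // Toy "DAG": loads are uniqued by (base, offset) only; alignment is
  // deliberately not part of the key, mirroring SDNode CSE.
  struct ToyLoad { unsigned Align; };
  static std::map<std::pair<const void *, int>, ToyLoad> Nodes;

  static ToyLoad *getLoad(const void *Base, int Offset, unsigned Align) {
    auto Key = std::make_pair(Base, Offset);
    auto It = Nodes.find(Key);
    if (It != Nodes.end()) {
      It->second.Align = Align; // the new alignment overwrites the old
      return &It->second;
    }
    return &Nodes.emplace(Key, ToyLoad{Align}).first->second;
  }

  int main() {
    char Buf[64];
    ToyLoad *Unaligned = getLoad(Buf, 16, 1);  // pre-existing (L1+16)(unaligned)
    ToyLoad *Aligned   = getLoad(Buf, 16, 16); // (L1+16)(aligned) from lowering L1
    assert(Unaligned == Aligned); // same node: the original users now see a load
                                  // that claims to be aligned
    return 0;
  }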

A second potential problem has to do with the MMOs on these loads, which AA
can use during instruction scheduling to break chain-based dependencies. If the
new "aligned" loads simply reuse the MMO from the original unaligned load, that
MMO does not reflect the fact that the base load may read data from below the
original address. Normally this would not matter, but the load might be
combined with the load pair generated for a previous vector, and then the
dependency on the otherwise-ignored lower bytes can matter.
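
As a purely hypothetical illustration: suppose vectors are loaded from p and
from p+16, and both are expanded this way. The expansion of the load from p+16
begins with an aligned load that actually reads from align_down(p+16), i.e. up
to 15 bytes below p+16, and that load may be shared with the pair generated for
the vector at p. If its MMO still describes only the 16 bytes starting at p+16,
AA is free to conclude that a store to the bytes just below p+16 cannot alias
it, and the scheduler may then drop the chain-based dependency that was the
only thing ordering the two.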

To fix both problems, the necessary loads are now generated using
ppc_altivec_lvx intrinsics instead of regular ISD::LOAD nodes. These intrinsic
loads are provided with MMOs covering a conservative address range.
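
Concretely, for the 16-byte vectors handled here, the base load's MMO starts
at offset -(16-1) = -15 from the original and spans 2*16-1 = 31 bytes, covering
[p-15, p+15]; the offset load's MMO starts at offset 1 and likewise spans 31
bytes, covering [p+1, p+31]. Whatever aligned address each lvx ends up reading
from, every byte it can touch lies inside its MMO, so AA cannot prove a
spurious independence.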

Unfortunately, I no longer have a failing test case (since PR19991 was
reported, other changes in CodeGen have forced this bug back into hiding).
Nevertheless, this should fix the underlying problem.

llvm-svn: 214481
Hal Finkel 2014-08-01 05:20:41 +00:00
parent f39ebbe613
commit b6d0d6b263
2 changed files with 33 additions and 52 deletions


@@ -6217,7 +6217,10 @@ MemSDNode::MemSDNode(unsigned Opc, unsigned Order, DebugLoc dl, SDVTList VTs,
   assert(isVolatile() == MMO->isVolatile() && "Volatile encoding error!");
   assert(isNonTemporal() == MMO->isNonTemporal() &&
          "Non-temporal encoding error!");
-  assert(memvt.getStoreSize() == MMO->getSize() && "Size mismatch!");
+  // We check here that the size of the memory operand fits within the size of
+  // the MMO. This is because the MMO might indicate only a possible address
+  // range instead of specifying the affected memory addresses precisely.
+  assert(memvt.getStoreSize() <= MMO->getSize() && "Size mismatch!");
 }
 
 MemSDNode::MemSDNode(unsigned Opc, unsigned Order, DebugLoc dl, SDVTList VTs,
@@ -6227,7 +6230,7 @@ MemSDNode::MemSDNode(unsigned Opc, unsigned Order, DebugLoc dl, SDVTList VTs,
   SubclassData = encodeMemSDNodeFlags(0, ISD::UNINDEXED, MMO->isVolatile(),
                                       MMO->isNonTemporal(), MMO->isInvariant());
   assert(isVolatile() == MMO->isVolatile() && "Volatile encoding error!");
-  assert(memvt.getStoreSize() == MMO->getSize() && "Size mismatch!");
+  assert(memvt.getStoreSize() <= MMO->getSize() && "Size mismatch!");
 }
 
 /// Profile - Gather unique data for the node.

@@ -8486,17 +8486,25 @@ SDValue PPCTargetLowering::PerformDAGCombine(SDNode *N,
                             Intrinsic::ppc_altivec_lvsl);
       SDValue PermCntl = BuildIntrinsicOp(Intr, Ptr, DAG, dl, MVT::v16i8);
 
-      // Refine the alignment of the original load (a "new" load created here
-      // which was identical to the first except for the alignment would be
-      // merged with the existing node regardless).
+      // Create the new MMO for the new base load. It is like the original MMO,
+      // but represents an area in memory almost twice the vector size centered
+      // on the original address. If the address is unaligned, we might start
+      // reading up to (sizeof(vector)-1) bytes below the address of the
+      // original unaligned load.
       MachineFunction &MF = DAG.getMachineFunction();
-      MachineMemOperand *MMO =
-        MF.getMachineMemOperand(LD->getPointerInfo(),
-                                LD->getMemOperand()->getFlags(),
-                                LD->getMemoryVT().getStoreSize(),
-                                ABIAlignment);
-      LD->refineAlignment(MMO);
-      SDValue BaseLoad = SDValue(LD, 0);
+      MachineMemOperand *BaseMMO =
+        MF.getMachineMemOperand(LD->getMemOperand(),
+                                -LD->getMemoryVT().getStoreSize()+1,
+                                2*LD->getMemoryVT().getStoreSize()-1);
+
+      // Create the new base load.
+      SDValue LDXIntID = DAG.getTargetConstant(Intrinsic::ppc_altivec_lvx,
+                                               getPointerTy());
+      SDValue BaseLoadOps[] = { Chain, LDXIntID, Ptr };
+      SDValue BaseLoad =
+        DAG.getMemIntrinsicNode(ISD::INTRINSIC_W_CHAIN, dl,
+                                DAG.getVTList(MVT::v4i32, MVT::Other),
+                                BaseLoadOps, MVT::v4i32, BaseMMO);
 
       // Note that the value of IncOffset (which is provided to the next
       // load's pointer info offset value, and thus used to calculate the
@@ -8518,21 +8526,18 @@ SDValue PPCTargetLowering::PerformDAGCombine(SDNode *N,
       SDValue Increment = DAG.getConstant(IncValue, getPointerTy());
       Ptr = DAG.getNode(ISD::ADD, dl, Ptr.getValueType(), Ptr, Increment);
 
+      MachineMemOperand *ExtraMMO =
+        MF.getMachineMemOperand(LD->getMemOperand(),
+                                1, 2*LD->getMemoryVT().getStoreSize()-1);
+      SDValue ExtraLoadOps[] = { Chain, LDXIntID, Ptr };
       SDValue ExtraLoad =
-        DAG.getLoad(VT, dl, Chain, Ptr,
-                    LD->getPointerInfo().getWithOffset(IncOffset),
-                    LD->isVolatile(), LD->isNonTemporal(),
-                    LD->isInvariant(), ABIAlignment);
+        DAG.getMemIntrinsicNode(ISD::INTRINSIC_W_CHAIN, dl,
+                                DAG.getVTList(MVT::v4i32, MVT::Other),
+                                ExtraLoadOps, MVT::v4i32, ExtraMMO);
 
       SDValue TF = DAG.getNode(ISD::TokenFactor, dl, MVT::Other,
                                BaseLoad.getValue(1), ExtraLoad.getValue(1));
 
-      if (BaseLoad.getValueType() != MVT::v4i32)
-        BaseLoad = DAG.getNode(ISD::BITCAST, dl, MVT::v4i32, BaseLoad);
-
-      if (ExtraLoad.getValueType() != MVT::v4i32)
-        ExtraLoad = DAG.getNode(ISD::BITCAST, dl, MVT::v4i32, ExtraLoad);
-
       // Because vperm has a big-endian bias, we must reverse the order
       // of the input vectors and complement the permute control vector
       // when generating little endian code. We have already handled the
@@ -8549,36 +8554,9 @@ SDValue PPCTargetLowering::PerformDAGCombine(SDNode *N,
       if (VT != MVT::v4i32)
         Perm = DAG.getNode(ISD::BITCAST, dl, VT, Perm);
 
-      // Now we need to be really careful about how we update the users of the
-      // original load. We cannot just call DCI.CombineTo (or
-      // DAG.ReplaceAllUsesWith for that matter), because the load still has
-      // uses created here (the permutation for example) that need to stay.
-      SDNode::use_iterator UI = N->use_begin(), UE = N->use_end();
-      while (UI != UE) {
-        SDUse &Use = UI.getUse();
-        SDNode *User = *UI;
-        // Note: BaseLoad is checked here because it might not be N, but a
-        // bitcast of N.
-        if (User == Perm.getNode() || User == BaseLoad.getNode() ||
-            User == TF.getNode() || Use.getResNo() > 1) {
-          ++UI;
-          continue;
-        }
-
-        SDValue To = Use.getResNo() ? TF : Perm;
-
-        ++UI;
-        SmallVector<SDValue, 8> Ops;
-        for (const SDUse &O : User->ops()) {
-          if (O == Use)
-            Ops.push_back(To);
-          else
-            Ops.push_back(O);
-        }
-
-        DAG.UpdateNodeOperands(User, Ops);
-      }
-
+      // The output of the permutation is our loaded result, the TokenFactor is
+      // our new chain.
+      DCI.CombineTo(N, Perm, TF);
       return SDValue(N, 0);
     }
   }