In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled.

Retrying after fixing the previous attempt by removing load-store
factoring through token factors in favor of improved token factor
operand pruning.

Simplify Consecutive Merge Store Candidate Search

Now that address aliasing is much less conservative, push through a
simplified store-merging search which only checks for parallel stores
through the chain subgraph. This is cleaner, as it separates the
handling of non-interfering loads/stores from the store-merging logic.

When merging stores, search up the chain through a single load, and
find all possible candidate stores by looking down through that load
and a TokenFactor to all stores visited. This improves the quality of
the output SelectionDAG and generally the output CodeGen (with some
exceptions).
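
For a rough illustration (hypothetical IR, not taken from the patch's
tests), consider a pair of adjacent stores whose values come from loads:

    define void @copy_pair(i32* noalias %dst, i32* noalias %src) {
      %src1 = getelementptr i32, i32* %src, i64 1
      %dst1 = getelementptr i32, i32* %dst, i64 1
      %a = load i32, i32* %src
      %b = load i32, i32* %src1
      store i32 %a, i32* %dst
      store i32 %b, i32* %dst1
      ret void
    }

In the SelectionDAG the two stores are not chained directly to one
another; they hang off the loads (and a TokenFactor) instead. The new
candidate search climbs from one store up through the intervening load
to the shared chain root and looks back down through the load and
TokenFactor, so both stores are still found and can be considered for
merging into a single wider access where the target allows it.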

Additional Minor Changes:

   1. Finishes removing unused AliasLoad code
   2. Unifies the chain aggregation in the merged stores across
      code paths
   3. Re-add the Store node to the worklist after calling
      SimplifyDemandedBits.
   4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
      arbitrary, but seemed sufficient to not cause regressions in
      tests.

This finishes the change Matt Arsenault started in r246307 and
jyknight's original patch.

Many tests required some changes, as memory operations are now
reorderable. Some tests that relied on the old ordering were changed to
use volatile memory operations.

Noteworthy tests:

    CodeGen/AArch64/argument-blocks.ll -
      It's not entirely clear what the test_varargs_stackalign test is
      supposed to be asserting, but the new code looks right.

    CodeGen/AArch64/arm64-memset-inline.ll -
    CodeGen/AArch64/arm64-stur.ll -
    CodeGen/ARM/memset-inline.ll -

      The backend now generates *worse* code due to store merging
      succeeding, as we do not do a 16-byte constant-zero store
      efficiently.

    CodeGen/AArch64/merge-store.ll -
      Improved, but there still seems to be an extraneous vector insert
      from an element to itself?

    CodeGen/PowerPC/ppc64-align-long-double.ll -
      Worse code emitted in this case, due to the improved store->load
      forwarding.

    CodeGen/X86/dag-merge-fast-accesses.ll -
    CodeGen/X86/MergeConsecutiveStores.ll -
    CodeGen/X86/stores-merging.ll -
    CodeGen/Mips/load-store-left-right.ll -
      Restored correct merging of non-aligned stores

    CodeGen/AMDGPU/promote-alloca-stored-pointer-value.ll -
      Improved. Correctly merges buffer_store_dword calls

    CodeGen/AMDGPU/si-triv-disjoint-mem-access.ll -
      Improved. Sidesteps loading a stored value and
      merges two stores

    CodeGen/X86/pr18023.ll -
      This test has been removed, as it was asserting incorrect
      behavior. Non-volatile stores *CAN* be moved past volatile loads,
      and now are.
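
      As a minimal hypothetical sketch (not the removed test itself),
      the non-volatile store below may legally be moved above the
      volatile load, since the two locations do not alias:

          define void @hoist_store(i32* %p, i32* noalias %q) {
            %v = load volatile i32, i32* %p
            ; non-volatile; may now be scheduled before the volatile load
            store i32 0, i32* %q
            ret void
          }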

    CodeGen/X86/vector-idiv.ll -
    CodeGen/X86/vector-lzcnt-128.ll -
      It's basically impossible to tell what these tests are actually
      testing. But it looks like the code got better due to the memory
      operations being recognized as non-aliasing.

    CodeGen/X86/win32-eh.ll -
      Both loads of the securitycookie are now merged.

Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle

Subscribers: wdng, nhaehnle, nemanjai, arsenm, weimingz, niravd, RKSimon, aemerson, qcolombet, dsanders, resistor, tstellarAMD, t.p.northover, spatel

Differential Revision: https://reviews.llvm.org/D14834

llvm-svn: 289659
Nirav Dave 2016-12-14 15:44:26 +00:00
parent ce452ceb6a
commit 8527ab0ad2
67 changed files with 2364 additions and 2587 deletions


@ -16,7 +16,6 @@
// //
//===----------------------------------------------------------------------===// //===----------------------------------------------------------------------===//
#include "llvm/CodeGen/SelectionDAG.h"
#include "llvm/ADT/SetVector.h" #include "llvm/ADT/SetVector.h"
#include "llvm/ADT/SmallBitVector.h" #include "llvm/ADT/SmallBitVector.h"
#include "llvm/ADT/SmallPtrSet.h" #include "llvm/ADT/SmallPtrSet.h"
@ -25,6 +24,7 @@
#include "llvm/Analysis/AliasAnalysis.h" #include "llvm/Analysis/AliasAnalysis.h"
#include "llvm/CodeGen/MachineFrameInfo.h" #include "llvm/CodeGen/MachineFrameInfo.h"
#include "llvm/CodeGen/MachineFunction.h" #include "llvm/CodeGen/MachineFunction.h"
#include "llvm/CodeGen/SelectionDAG.h"
#include "llvm/CodeGen/SelectionDAGTargetInfo.h" #include "llvm/CodeGen/SelectionDAGTargetInfo.h"
#include "llvm/IR/DataLayout.h" #include "llvm/IR/DataLayout.h"
#include "llvm/IR/DerivedTypes.h" #include "llvm/IR/DerivedTypes.h"
@ -40,6 +40,7 @@
#include "llvm/Target/TargetRegisterInfo.h" #include "llvm/Target/TargetRegisterInfo.h"
#include "llvm/Target/TargetSubtargetInfo.h" #include "llvm/Target/TargetSubtargetInfo.h"
#include <algorithm> #include <algorithm>
#include <set>
using namespace llvm; using namespace llvm;
#define DEBUG_TYPE "dagcombine" #define DEBUG_TYPE "dagcombine"
@ -52,10 +53,6 @@ STATISTIC(LdStFP2Int , "Number of fp load/store pairs transformed to int");
STATISTIC(SlicedLoads, "Number of load sliced"); STATISTIC(SlicedLoads, "Number of load sliced");
namespace { namespace {
static cl::opt<bool>
CombinerAA("combiner-alias-analysis", cl::Hidden,
cl::desc("Enable DAG combiner alias-analysis heuristics"));
static cl::opt<bool> static cl::opt<bool>
CombinerGlobalAA("combiner-global-alias-analysis", cl::Hidden, CombinerGlobalAA("combiner-global-alias-analysis", cl::Hidden,
cl::desc("Enable DAG combiner's use of IR alias analysis")); cl::desc("Enable DAG combiner's use of IR alias analysis"));
@ -417,15 +414,12 @@ namespace {
/// Holds a pointer to an LSBaseSDNode as well as information on where it /// Holds a pointer to an LSBaseSDNode as well as information on where it
/// is located in a sequence of memory operations connected by a chain. /// is located in a sequence of memory operations connected by a chain.
struct MemOpLink { struct MemOpLink {
MemOpLink (LSBaseSDNode *N, int64_t Offset, unsigned Seq): MemOpLink(LSBaseSDNode *N, int64_t Offset)
MemNode(N), OffsetFromBase(Offset), SequenceNum(Seq) { } : MemNode(N), OffsetFromBase(Offset) {}
// Ptr to the mem node. // Ptr to the mem node.
LSBaseSDNode *MemNode; LSBaseSDNode *MemNode;
// Offset from the base ptr. // Offset from the base ptr.
int64_t OffsetFromBase; int64_t OffsetFromBase;
// What is the sequence number of this mem node.
// Lowest mem operand in the DAG starts at zero.
unsigned SequenceNum;
}; };
/// This is a helper function for visitMUL to check the profitability /// This is a helper function for visitMUL to check the profitability
@ -440,7 +434,6 @@ namespace {
/// constant build_vector of the stored constant values in Stores. /// constant build_vector of the stored constant values in Stores.
SDValue getMergedConstantVectorStore(SelectionDAG &DAG, const SDLoc &SL, SDValue getMergedConstantVectorStore(SelectionDAG &DAG, const SDLoc &SL,
ArrayRef<MemOpLink> Stores, ArrayRef<MemOpLink> Stores,
SmallVectorImpl<SDValue> &Chains,
EVT Ty) const; EVT Ty) const;
/// This is a helper function for visitAND and visitZERO_EXTEND. Returns /// This is a helper function for visitAND and visitZERO_EXTEND. Returns
@ -455,18 +448,15 @@ namespace {
/// This is a helper function for MergeConsecutiveStores. When the source /// This is a helper function for MergeConsecutiveStores. When the source
/// elements of the consecutive stores are all constants or all extracted /// elements of the consecutive stores are all constants or all extracted
/// vector elements, try to merge them into one larger store. /// vector elements, try to merge them into one larger store.
/// \return number of stores that were merged into a merged store (always /// \return True if a merged store was created.
/// a prefix of \p StoreNode). bool MergeStoresOfConstantsOrVecElts(SmallVectorImpl<MemOpLink> &StoreNodes,
bool MergeStoresOfConstantsOrVecElts( EVT MemVT, unsigned NumStores,
SmallVectorImpl<MemOpLink> &StoreNodes, EVT MemVT, unsigned NumStores, bool IsConstantSrc, bool UseVector);
bool IsConstantSrc, bool UseVector);
/// This is a helper function for MergeConsecutiveStores. /// This is a helper function for MergeConsecutiveStores.
/// Stores that may be merged are placed in StoreNodes. /// Stores that may be merged are placed in StoreNodes.
/// Loads that may alias with those stores are placed in AliasLoadNodes. void getStoreMergeCandidates(StoreSDNode *St,
void getStoreMergeAndAliasCandidates( SmallVectorImpl<MemOpLink> &StoreNodes);
StoreSDNode* St, SmallVectorImpl<MemOpLink> &StoreNodes,
SmallVectorImpl<LSBaseSDNode*> &AliasLoadNodes);
/// Helper function for MergeConsecutiveStores. Checks if /// Helper function for MergeConsecutiveStores. Checks if
/// Candidate stores have indirect dependency through their /// Candidate stores have indirect dependency through their
@ -1575,7 +1565,7 @@ SDValue DAGCombiner::visitTokenFactor(SDNode *N) {
} }
SmallVector<SDNode *, 8> TFs; // List of token factors to visit. SmallVector<SDNode *, 8> TFs; // List of token factors to visit.
SmallVector<SDValue, 8> Ops; // Ops for replacing token factor. SmallVector<SDValue, 8> Ops; // Ops for replacing token factor.
SmallPtrSet<SDNode*, 16> SeenOps; SmallPtrSet<SDNode*, 16> SeenOps;
bool Changed = false; // If we should replace this token factor. bool Changed = false; // If we should replace this token factor.
@ -1619,6 +1609,81 @@ SDValue DAGCombiner::visitTokenFactor(SDNode *N) {
} }
} }
// Remove Nodes that are chained to another node in the list. Do so
// by walking up chains breath-first stopping when we've seen
// another operand. In general we must climb to the EntryNode, but we can exit
// early if we find all remaining work is associated with just one operand as
// no further pruning is possible.
// List of nodes to search through and original Ops from which they originate.
SmallVector<std::pair<SDNode *, unsigned>, 8> Worklist;
SmallVector<unsigned, 8> OpWorkCount; // Count of work for each Op.
SmallPtrSet<SDNode *, 16> SeenChains;
bool DidPruneOps = false;
unsigned NumLeftToConsider = 0;
for (const SDValue &Op : Ops) {
Worklist.push_back(std::make_pair(Op.getNode(), NumLeftToConsider++));
OpWorkCount.push_back(1);
}
auto AddToWorklist = [&](unsigned CurIdx, SDNode *Op, unsigned OpNumber) {
// If this is an Op, we can remove the op from the list. Remark any
// search associated with it as from the current OpNumber.
if (SeenOps.count(Op) != 0) {
DidPruneOps = true;
unsigned OrigOpNumber = 0;
while (Ops[OrigOpNumber].getNode() != Op && OrigOpNumber < Ops.size())
OrigOpNumber++;
assert((OrigOpNumber != Ops.size()) &&
"expected to find TokenFactor Operand");
// Re-mark worklist from OrigOpNumber to OpNumber
for (unsigned i = CurIdx + 1; i < Worklist.size(); ++i) {
if (Worklist[i].second == OrigOpNumber) {
Worklist[i].second = OpNumber;
}
}
OpWorkCount[OpNumber] += OpWorkCount[OrigOpNumber];
OpWorkCount[OrigOpNumber] = 0;
NumLeftToConsider--;
}
// Add if it's a new chain
if (SeenChains.insert(Op).second) {
OpWorkCount[OpNumber]++;
Worklist.push_back(std::make_pair(Op, OpNumber));
}
};
for (unsigned i = 0; i < Worklist.size(); ++i) {
// We need at least be consider at least 2 Ops to prune.
if (NumLeftToConsider <= 1)
break;
auto CurNode = Worklist[i].first;
auto CurOpNumber = Worklist[i].second;
assert((OpWorkCount[CurOpNumber] > 0) &&
"Node should not appear in worklist");
switch (CurNode->getOpcode()) {
case ISD::EntryToken:
// Hitting EntryToken is the only way for the search to terminate without
// hitting
// another operand's search. Prevent us from marking this operand
// considered.
NumLeftToConsider++;
break;
case ISD::TokenFactor:
for (const SDValue &Op : CurNode->op_values())
AddToWorklist(i, Op.getNode(), CurOpNumber);
break;
default:
if (auto *MemNode = dyn_cast<MemSDNode>(CurNode))
AddToWorklist(i, MemNode->getChain().getNode(), CurOpNumber);
break;
}
OpWorkCount[CurOpNumber]--;
if (OpWorkCount[CurOpNumber] == 0)
NumLeftToConsider--;
}
SDValue Result; SDValue Result;
// If we've changed things around then replace token factor. // If we've changed things around then replace token factor.
@ -1627,15 +1692,22 @@ SDValue DAGCombiner::visitTokenFactor(SDNode *N) {
// The entry token is the only possible outcome. // The entry token is the only possible outcome.
Result = DAG.getEntryNode(); Result = DAG.getEntryNode();
} else { } else {
// New and improved token factor. if (DidPruneOps) {
Result = DAG.getNode(ISD::TokenFactor, SDLoc(N), MVT::Other, Ops); SmallVector<SDValue, 8> PrunedOps;
//
for (const SDValue &Op : Ops) {
if (SeenChains.count(Op.getNode()) == 0)
PrunedOps.push_back(Op);
}
Result = DAG.getNode(ISD::TokenFactor, SDLoc(N), MVT::Other, PrunedOps);
} else {
Result = DAG.getNode(ISD::TokenFactor, SDLoc(N), MVT::Other, Ops);
}
} }
// Add users to worklist if AA is enabled, since it may introduce // Add users to worklist, since we may introduce a lot of new
// a lot of new chained token factors while removing memory deps. // chained token factors while removing memory deps.
bool UseAA = CombinerAA.getNumOccurrences() > 0 ? CombinerAA return CombineTo(N, Result, true /*add to worklist*/);
: DAG.getSubtarget().useAA();
return CombineTo(N, Result, UseAA /*add to worklist*/);
} }
return Result; return Result;
@ -10683,11 +10755,37 @@ SDValue DAGCombiner::visitLOAD(SDNode *N) {
// TODO: Handle TRUNCSTORE/LOADEXT // TODO: Handle TRUNCSTORE/LOADEXT
if (OptLevel != CodeGenOpt::None && if (OptLevel != CodeGenOpt::None &&
ISD::isNormalLoad(N) && !LD->isVolatile()) { ISD::isNormalLoad(N) && !LD->isVolatile()) {
if (ISD::isNON_TRUNCStore(Chain.getNode())) { // We can forward a direct store or a store off of a tokenfactor.
if (Chain->getOpcode() == ISD::TokenFactor) {
// If we find a potential match, make sure we are not
// sidestepping a chain dependence from the tokenfactor. This
// may happen if one operand of the token factor depends is
// chained off the other.
for (const SDValue &ChainOp : Chain->op_values()) {
if (ISD::isNON_TRUNCStore(ChainOp.getNode())) {
StoreSDNode *PrevST = cast<StoreSDNode>(ChainOp);
if (PrevST->getBasePtr() == Ptr &&
PrevST->getValue().getValueType() == N->getValueType(0)) {
// Make Sure PrevSt is not a predecessor to another node in
// the token factor as this may implicitly bypass that node.
SmallPtrSet<const SDNode *, 16> Visited;
SmallVector<const SDNode *, 8> Worklist;
// Worklist is all other chainops
for (const SDValue &OtherChainOp : Chain->op_values())
if (OtherChainOp != ChainOp)
Worklist.push_back(OtherChainOp.getNode());
// If it's not a predecssor forwarding is safe.
if (!SDNode::hasPredecessorHelper(PrevST, Visited, Worklist))
return CombineTo(N, PrevST->getOperand(1), Chain);
}
}
}
} else if (ISD::isNON_TRUNCStore(Chain.getNode())) {
StoreSDNode *PrevST = cast<StoreSDNode>(Chain); StoreSDNode *PrevST = cast<StoreSDNode>(Chain);
if (PrevST->getBasePtr() == Ptr && if (PrevST->getBasePtr() == Ptr &&
PrevST->getValue().getValueType() == N->getValueType(0)) PrevST->getValue().getValueType() == N->getValueType(0))
return CombineTo(N, Chain.getOperand(1), Chain); return CombineTo(N, PrevST->getOperand(1), Chain);
} }
} }
@ -10705,14 +10803,7 @@ SDValue DAGCombiner::visitLOAD(SDNode *N) {
} }
} }
bool UseAA = CombinerAA.getNumOccurrences() > 0 ? CombinerAA if (LD->isUnindexed()) {
: DAG.getSubtarget().useAA();
#ifndef NDEBUG
if (CombinerAAOnlyFunc.getNumOccurrences() &&
CombinerAAOnlyFunc != DAG.getMachineFunction().getName())
UseAA = false;
#endif
if (UseAA && LD->isUnindexed()) {
// Walk up chain skipping non-aliasing memory nodes. // Walk up chain skipping non-aliasing memory nodes.
SDValue BetterChain = FindBetterChain(N, Chain); SDValue BetterChain = FindBetterChain(N, Chain);
@ -11687,14 +11778,14 @@ bool DAGCombiner::isMulAddWithConstProfitable(SDNode *MulNode,
return false; return false;
} }
SDValue DAGCombiner::getMergedConstantVectorStore( SDValue DAGCombiner::getMergedConstantVectorStore(SelectionDAG &DAG,
SelectionDAG &DAG, const SDLoc &SL, ArrayRef<MemOpLink> Stores, const SDLoc &SL,
SmallVectorImpl<SDValue> &Chains, EVT Ty) const { ArrayRef<MemOpLink> Stores,
EVT Ty) const {
SmallVector<SDValue, 8> BuildVector; SmallVector<SDValue, 8> BuildVector;
for (unsigned I = 0, E = Ty.getVectorNumElements(); I != E; ++I) { for (unsigned I = 0, E = Ty.getVectorNumElements(); I != E; ++I) {
StoreSDNode *St = cast<StoreSDNode>(Stores[I].MemNode); StoreSDNode *St = cast<StoreSDNode>(Stores[I].MemNode);
Chains.push_back(St->getChain());
BuildVector.push_back(St->getValue()); BuildVector.push_back(St->getValue());
} }
@ -11710,21 +11801,8 @@ bool DAGCombiner::MergeStoresOfConstantsOrVecElts(
int64_t ElementSizeBytes = MemVT.getSizeInBits() / 8; int64_t ElementSizeBytes = MemVT.getSizeInBits() / 8;
LSBaseSDNode *FirstInChain = StoreNodes[0].MemNode; LSBaseSDNode *FirstInChain = StoreNodes[0].MemNode;
unsigned LatestNodeUsed = 0;
for (unsigned i=0; i < NumStores; ++i) {
// Find a chain for the new wide-store operand. Notice that some
// of the store nodes that we found may not be selected for inclusion
// in the wide store. The chain we use needs to be the chain of the
// latest store node which is *used* and replaced by the wide store.
if (StoreNodes[i].SequenceNum < StoreNodes[LatestNodeUsed].SequenceNum)
LatestNodeUsed = i;
}
SmallVector<SDValue, 8> Chains;
// The latest Node in the DAG. // The latest Node in the DAG.
LSBaseSDNode *LatestOp = StoreNodes[LatestNodeUsed].MemNode;
SDLoc DL(StoreNodes[0].MemNode); SDLoc DL(StoreNodes[0].MemNode);
SDValue StoredVal; SDValue StoredVal;
@ -11740,7 +11818,7 @@ bool DAGCombiner::MergeStoresOfConstantsOrVecElts(
assert(TLI.isTypeLegal(Ty) && "Illegal vector store"); assert(TLI.isTypeLegal(Ty) && "Illegal vector store");
if (IsConstantSrc) { if (IsConstantSrc) {
StoredVal = getMergedConstantVectorStore(DAG, DL, StoreNodes, Chains, Ty); StoredVal = getMergedConstantVectorStore(DAG, DL, StoreNodes, Ty);
} else { } else {
SmallVector<SDValue, 8> Ops; SmallVector<SDValue, 8> Ops;
for (unsigned i = 0; i < NumStores; ++i) { for (unsigned i = 0; i < NumStores; ++i) {
@ -11750,7 +11828,6 @@ bool DAGCombiner::MergeStoresOfConstantsOrVecElts(
if (Val.getValueType() != MemVT) if (Val.getValueType() != MemVT)
return false; return false;
Ops.push_back(Val); Ops.push_back(Val);
Chains.push_back(St->getChain());
} }
// Build the extracted vector elements back into a vector. // Build the extracted vector elements back into a vector.
@ -11770,7 +11847,6 @@ bool DAGCombiner::MergeStoresOfConstantsOrVecElts(
for (unsigned i = 0; i < NumStores; ++i) { for (unsigned i = 0; i < NumStores; ++i) {
unsigned Idx = IsLE ? (NumStores - 1 - i) : i; unsigned Idx = IsLE ? (NumStores - 1 - i) : i;
StoreSDNode *St = cast<StoreSDNode>(StoreNodes[Idx].MemNode); StoreSDNode *St = cast<StoreSDNode>(StoreNodes[Idx].MemNode);
Chains.push_back(St->getChain());
SDValue Val = St->getValue(); SDValue Val = St->getValue();
StoreInt <<= ElementSizeBytes * 8; StoreInt <<= ElementSizeBytes * 8;
@ -11788,7 +11864,11 @@ bool DAGCombiner::MergeStoresOfConstantsOrVecElts(
StoredVal = DAG.getConstant(StoreInt, DL, StoreTy); StoredVal = DAG.getConstant(StoreInt, DL, StoreTy);
} }
assert(!Chains.empty()); SmallVector<SDValue, 8> Chains;
// Gather all Chains we're inheriting
for (unsigned i = 0; i < NumStores; ++i)
Chains.push_back(StoreNodes[i].MemNode->getChain());
SDValue NewChain = DAG.getNode(ISD::TokenFactor, DL, MVT::Other, Chains); SDValue NewChain = DAG.getNode(ISD::TokenFactor, DL, MVT::Other, Chains);
SDValue NewStore = DAG.getStore(NewChain, DL, StoredVal, SDValue NewStore = DAG.getStore(NewChain, DL, StoredVal,
@ -11796,46 +11876,20 @@ bool DAGCombiner::MergeStoresOfConstantsOrVecElts(
FirstInChain->getPointerInfo(), FirstInChain->getPointerInfo(),
FirstInChain->getAlignment()); FirstInChain->getAlignment());
bool UseAA = CombinerAA.getNumOccurrences() > 0 ? CombinerAA // Replace all merged stores with the new store.
: DAG.getSubtarget().useAA(); for (unsigned i = 0; i < NumStores; ++i)
if (UseAA) { CombineTo(StoreNodes[i].MemNode, NewStore);
// Replace all merged stores with the new store.
for (unsigned i = 0; i < NumStores; ++i)
CombineTo(StoreNodes[i].MemNode, NewStore);
} else {
// Replace the last store with the new store.
CombineTo(LatestOp, NewStore);
// Erase all other stores.
for (unsigned i = 0; i < NumStores; ++i) {
if (StoreNodes[i].MemNode == LatestOp)
continue;
StoreSDNode *St = cast<StoreSDNode>(StoreNodes[i].MemNode);
// ReplaceAllUsesWith will replace all uses that existed when it was
// called, but graph optimizations may cause new ones to appear. For
// example, the case in pr14333 looks like
//
// St's chain -> St -> another store -> X
//
// And the only difference from St to the other store is the chain.
// When we change it's chain to be St's chain they become identical,
// get CSEed and the net result is that X is now a use of St.
// Since we know that St is redundant, just iterate.
while (!St->use_empty())
DAG.ReplaceAllUsesWith(SDValue(St, 0), St->getChain());
deleteAndRecombine(St);
}
}
StoreNodes.erase(StoreNodes.begin() + NumStores, StoreNodes.end()); StoreNodes.erase(StoreNodes.begin() + NumStores, StoreNodes.end());
return true; return true;
} }
void DAGCombiner::getStoreMergeAndAliasCandidates( void DAGCombiner::getStoreMergeCandidates(
StoreSDNode* St, SmallVectorImpl<MemOpLink> &StoreNodes, StoreSDNode *St, SmallVectorImpl<MemOpLink> &StoreNodes) {
SmallVectorImpl<LSBaseSDNode*> &AliasLoadNodes) {
// This holds the base pointer, index, and the offset in bytes from the base // This holds the base pointer, index, and the offset in bytes from the base
// pointer. // pointer.
BaseIndexOffset BasePtr = BaseIndexOffset::match(St->getBasePtr(), DAG); BaseIndexOffset BasePtr = BaseIndexOffset::match(St->getBasePtr(), DAG);
EVT MemVT = St->getMemoryVT();
// We must have a base and an offset. // We must have a base and an offset.
if (!BasePtr.Base.getNode()) if (!BasePtr.Base.getNode())
@ -11845,104 +11899,49 @@ void DAGCombiner::getStoreMergeAndAliasCandidates(
if (BasePtr.Base.isUndef()) if (BasePtr.Base.isUndef())
return; return;
// Walk up the chain and look for nodes with offsets from the same // We looking for a root node which is an ancestor to all mergable
// base pointer. Stop when reaching an instruction with a different kind // stores. We search up through a load, to our root and then down
// or instruction which has a different base pointer. // through all children. For instance we will find Store{1,2,3} if
EVT MemVT = St->getMemoryVT(); // St is Store1, Store2. or Store3 where the root is not a load
unsigned Seq = 0; // which always true for nonvolatile ops. TODO: Expand
StoreSDNode *Index = St; // the search to find all valid candidates through multiple layers of loads.
//
// Root
// |-------|-------|
// Load Load Store3
// | |
// Store1 Store2
//
// FIXME: We should be able to climb and
// descend TokenFactors to find candidates as well.
SDNode *RootNode = (St->getChain()).getNode();
bool UseAA = CombinerAA.getNumOccurrences() > 0 ? CombinerAA // Set of Parents of Candidates
: DAG.getSubtarget().useAA(); std::set<SDNode *> CandidateParents;
if (UseAA) { if (LoadSDNode *Ldn = dyn_cast<LoadSDNode>(RootNode)) {
// Look at other users of the same chain. Stores on the same chain do not RootNode = Ldn->getChain().getNode();
// alias. If combiner-aa is enabled, non-aliasing stores are canonicalized for (auto I = RootNode->use_begin(), E = RootNode->use_end(); I != E; ++I)
// to be on the same chain, so don't bother looking at adjacent chains. if (I.getOperandNo() == 0 && isa<LoadSDNode>(*I)) // walk down chain
CandidateParents.insert(*I);
} else
CandidateParents.insert(RootNode);
SDValue Chain = St->getChain(); // check all parents of mergable children
for (auto I = Chain->use_begin(), E = Chain->use_end(); I != E; ++I) { for (auto P = CandidateParents.begin(); P != CandidateParents.end(); ++P)
if (StoreSDNode *OtherST = dyn_cast<StoreSDNode>(*I)) { for (auto I = (*P)->use_begin(), E = (*P)->use_end(); I != E; ++I)
if (I.getOperandNo() != 0) if (I.getOperandNo() == 0)
continue; if (StoreSDNode *OtherST = dyn_cast<StoreSDNode>(*I)) {
if (OtherST->isVolatile() || OtherST->isIndexed())
if (OtherST->isVolatile() || OtherST->isIndexed()) continue;
continue; if (OtherST->getMemoryVT() != MemVT)
continue;
if (OtherST->getMemoryVT() != MemVT) BaseIndexOffset Ptr =
continue; BaseIndexOffset::match(OtherST->getBasePtr(), DAG);
if (Ptr.equalBaseIndex(BasePtr))
BaseIndexOffset Ptr = BaseIndexOffset::match(OtherST->getBasePtr(), DAG); StoreNodes.push_back(MemOpLink(OtherST, Ptr.Offset));
if (Ptr.equalBaseIndex(BasePtr))
StoreNodes.push_back(MemOpLink(OtherST, Ptr.Offset, Seq++));
}
}
return;
}
while (Index) {
// If the chain has more than one use, then we can't reorder the mem ops.
if (Index != St && !SDValue(Index, 0)->hasOneUse())
break;
// Find the base pointer and offset for this memory node.
BaseIndexOffset Ptr = BaseIndexOffset::match(Index->getBasePtr(), DAG);
// Check that the base pointer is the same as the original one.
if (!Ptr.equalBaseIndex(BasePtr))
break;
// The memory operands must not be volatile.
if (Index->isVolatile() || Index->isIndexed())
break;
// No truncation.
if (Index->isTruncatingStore())
break;
// The stored memory type must be the same.
if (Index->getMemoryVT() != MemVT)
break;
// We do not allow under-aligned stores in order to prevent
// overriding stores. NOTE: this is a bad hack. Alignment SHOULD
// be irrelevant here; what MATTERS is that we not move memory
// operations that potentially overlap past each-other.
if (Index->getAlignment() < MemVT.getStoreSize())
break;
// We found a potential memory operand to merge.
StoreNodes.push_back(MemOpLink(Index, Ptr.Offset, Seq++));
// Find the next memory operand in the chain. If the next operand in the
// chain is a store then move up and continue the scan with the next
// memory operand. If the next operand is a load save it and use alias
// information to check if it interferes with anything.
SDNode *NextInChain = Index->getChain().getNode();
while (1) {
if (StoreSDNode *STn = dyn_cast<StoreSDNode>(NextInChain)) {
// We found a store node. Use it for the next iteration.
Index = STn;
break;
} else if (LoadSDNode *Ldn = dyn_cast<LoadSDNode>(NextInChain)) {
if (Ldn->isVolatile()) {
Index = nullptr;
break;
} }
// Save the load node for later. Continue the scan.
AliasLoadNodes.push_back(Ldn);
NextInChain = Ldn->getChain().getNode();
continue;
} else {
Index = nullptr;
break;
}
}
}
} }
// We need to check that merging these stores does not cause a loop // We need to check that merging these stores does not cause a loop
@ -12004,64 +12003,35 @@ bool DAGCombiner::MergeConsecutiveStores(
if (MemVT.isVector() && IsLoadSrc) if (MemVT.isVector() && IsLoadSrc)
return false; return false;
// Only look at ends of store sequences. // Find potential store merge candidates by searching through chain sub-DAG
SDValue Chain = SDValue(St, 0); getStoreMergeCandidates(St, StoreNodes);
if (Chain->hasOneUse() && Chain->use_begin()->getOpcode() == ISD::STORE)
return false;
// Save the LoadSDNodes that we find in the chain.
// We need to make sure that these nodes do not interfere with
// any of the store nodes.
SmallVector<LSBaseSDNode*, 8> AliasLoadNodes;
getStoreMergeAndAliasCandidates(St, StoreNodes, AliasLoadNodes);
// Check if there is anything to merge. // Check if there is anything to merge.
if (StoreNodes.size() < 2) if (StoreNodes.size() < 2)
return false; return false;
// only do dependence check in AA case // Check that we can merge these candidates without causing a cycle
bool UseAA = CombinerAA.getNumOccurrences() > 0 ? CombinerAA if (!checkMergeStoreCandidatesForDependencies(StoreNodes))
: DAG.getSubtarget().useAA();
if (UseAA && !checkMergeStoreCandidatesForDependencies(StoreNodes))
return false; return false;
// Sort the memory operands according to their distance from the // Sort the memory operands according to their distance from the
// base pointer. As a secondary criteria: make sure stores coming // base pointer.
// later in the code come first in the list. This is important for
// the non-UseAA case, because we're merging stores into the FINAL
// store along a chain which potentially contains aliasing stores.
// Thus, if there are multiple stores to the same address, the last
// one can be considered for merging but not the others.
std::sort(StoreNodes.begin(), StoreNodes.end(), std::sort(StoreNodes.begin(), StoreNodes.end(),
[](MemOpLink LHS, MemOpLink RHS) { [](MemOpLink LHS, MemOpLink RHS) {
return LHS.OffsetFromBase < RHS.OffsetFromBase || return LHS.OffsetFromBase < RHS.OffsetFromBase;
(LHS.OffsetFromBase == RHS.OffsetFromBase && });
LHS.SequenceNum < RHS.SequenceNum);
});
// Scan the memory operations on the chain and find the first non-consecutive // Scan the memory operations on the chain and find the first non-consecutive
// store memory address. // store memory address.
unsigned LastConsecutiveStore = 0; unsigned LastConsecutiveStore = 0;
int64_t StartAddress = StoreNodes[0].OffsetFromBase; int64_t StartAddress = StoreNodes[0].OffsetFromBase;
for (unsigned i = 0, e = StoreNodes.size(); i < e; ++i) {
// Check that the addresses are consecutive starting from the second // Check that the addresses are consecutive starting from the second
// element in the list of stores. // element in the list of stores.
if (i > 0) { for (unsigned i = 1, e = StoreNodes.size(); i < e; ++i) {
int64_t CurrAddress = StoreNodes[i].OffsetFromBase; int64_t CurrAddress = StoreNodes[i].OffsetFromBase;
if (CurrAddress - StartAddress != (ElementSizeBytes * i)) if (CurrAddress - StartAddress != (ElementSizeBytes * i))
break;
}
// Check if this store interferes with any of the loads that we found.
// If we find a load that alias with this store. Stop the sequence.
if (any_of(AliasLoadNodes, [&](LSBaseSDNode *Ldn) {
return isAlias(Ldn, StoreNodes[i].MemNode);
}))
break; break;
// Mark this node as useful.
LastConsecutiveStore = i; LastConsecutiveStore = i;
} }
@ -12215,7 +12185,7 @@ bool DAGCombiner::MergeConsecutiveStores(
} }
// We found a potential memory operand to merge. // We found a potential memory operand to merge.
LoadNodes.push_back(MemOpLink(Ld, LdPtr.Offset, 0)); LoadNodes.push_back(MemOpLink(Ld, LdPtr.Offset));
} }
if (LoadNodes.size() < 2) if (LoadNodes.size() < 2)
@ -12304,22 +12274,8 @@ bool DAGCombiner::MergeConsecutiveStores(
// Collect the chains from all merged stores. // Collect the chains from all merged stores.
SmallVector<SDValue, 8> MergeStoreChains; SmallVector<SDValue, 8> MergeStoreChains;
MergeStoreChains.push_back(StoreNodes[0].MemNode->getChain()); for (unsigned i = 0; i < NumElem; ++i)
// The latest Node in the DAG.
unsigned LatestNodeUsed = 0;
for (unsigned i=1; i<NumElem; ++i) {
// Find a chain for the new wide-store operand. Notice that some
// of the store nodes that we found may not be selected for inclusion
// in the wide store. The chain we use needs to be the chain of the
// latest store node which is *used* and replaced by the wide store.
if (StoreNodes[i].SequenceNum < StoreNodes[LatestNodeUsed].SequenceNum)
LatestNodeUsed = i;
MergeStoreChains.push_back(StoreNodes[i].MemNode->getChain()); MergeStoreChains.push_back(StoreNodes[i].MemNode->getChain());
}
LSBaseSDNode *LatestOp = StoreNodes[LatestNodeUsed].MemNode;
// Find if it is better to use vectors or integers to load and store // Find if it is better to use vectors or integers to load and store
// to memory. // to memory.
@ -12354,23 +12310,9 @@ bool DAGCombiner::MergeConsecutiveStores(
SDValue(NewLoad.getNode(), 1)); SDValue(NewLoad.getNode(), 1));
} }
if (UseAA) { // Replace the all stores with the new store.
// Replace the all stores with the new store. for (unsigned i = 0; i < NumElem; ++i)
for (unsigned i = 0; i < NumElem; ++i) CombineTo(StoreNodes[i].MemNode, NewStore);
CombineTo(StoreNodes[i].MemNode, NewStore);
} else {
// Replace the last store with the new store.
CombineTo(LatestOp, NewStore);
// Erase all other stores.
for (unsigned i = 0; i < NumElem; ++i) {
// Remove all Store nodes.
if (StoreNodes[i].MemNode == LatestOp)
continue;
StoreSDNode *St = cast<StoreSDNode>(StoreNodes[i].MemNode);
DAG.ReplaceAllUsesOfValueWith(SDValue(St, 0), St->getChain());
deleteAndRecombine(St);
}
}
StoreNodes.erase(StoreNodes.begin() + NumElem, StoreNodes.end()); StoreNodes.erase(StoreNodes.begin() + NumElem, StoreNodes.end());
return true; return true;
@ -12529,19 +12471,7 @@ SDValue DAGCombiner::visitSTORE(SDNode *N) {
if (SDValue NewST = TransformFPLoadStorePair(N)) if (SDValue NewST = TransformFPLoadStorePair(N))
return NewST; return NewST;
bool UseAA = CombinerAA.getNumOccurrences() > 0 ? CombinerAA if (ST->isUnindexed()) {
: DAG.getSubtarget().useAA();
#ifndef NDEBUG
if (CombinerAAOnlyFunc.getNumOccurrences() &&
CombinerAAOnlyFunc != DAG.getMachineFunction().getName())
UseAA = false;
#endif
if (UseAA && ST->isUnindexed()) {
// FIXME: We should do this even without AA enabled. AA will just allow
// FindBetterChain to work in more situations. The problem with this is that
// any combine that expects memory operations to be on consecutive chains
// first needs to be updated to look for users of the same chain.
// Walk up chain skipping non-aliasing memory nodes, on this store and any // Walk up chain skipping non-aliasing memory nodes, on this store and any
// adjacent stores. // adjacent stores.
if (findBetterNeighborChains(ST)) { if (findBetterNeighborChains(ST)) {
@ -12575,8 +12505,13 @@ SDValue DAGCombiner::visitSTORE(SDNode *N) {
if (SimplifyDemandedBits( if (SimplifyDemandedBits(
Value, Value,
APInt::getLowBitsSet(Value.getScalarValueSizeInBits(), APInt::getLowBitsSet(Value.getScalarValueSizeInBits(),
ST->getMemoryVT().getScalarSizeInBits()))) ST->getMemoryVT().getScalarSizeInBits()))) {
// Re-visit the store if anything changed; SimplifyDemandedBits
// will add Value's node back to the worklist if necessary, but
// we also need to re-visit the Store node itself.
AddToWorklist(N);
return SDValue(N, 0); return SDValue(N, 0);
}
} }
// If this is a load followed by a store to the same location, then the store // If this is a load followed by a store to the same location, then the store
@ -15704,6 +15639,18 @@ SDValue DAGCombiner::FindBetterChain(SDNode *N, SDValue OldChain) {
return DAG.getNode(ISD::TokenFactor, SDLoc(N), MVT::Other, Aliases); return DAG.getNode(ISD::TokenFactor, SDLoc(N), MVT::Other, Aliases);
} }
// This function tries to collect a bunch of potentially interesting
// nodes to improve the chains of, all at once. This might seem
// redundant, as this function gets called when visiting every store
// node, so why not let the work be done on each store as it's visited?
//
// I believe this is mainly important because MergeConsecutiveStores
// is unable to deal with merging stores of different sizes, so unless
// we improve the chains of all the potential candidates up-front
// before running MergeConsecutiveStores, it might only see some of
// the nodes that will eventually be candidates, and then not be able
// to go from a partially-merged state to the desired final
// fully-merged state.
bool DAGCombiner::findBetterNeighborChains(StoreSDNode *St) { bool DAGCombiner::findBetterNeighborChains(StoreSDNode *St) {
// This holds the base pointer, index, and the offset in bytes from the base // This holds the base pointer, index, and the offset in bytes from the base
// pointer. // pointer.
@ -15739,10 +15686,8 @@ bool DAGCombiner::findBetterNeighborChains(StoreSDNode *St) {
if (!Ptr.equalBaseIndex(BasePtr)) if (!Ptr.equalBaseIndex(BasePtr))
break; break;
// Find the next memory operand in the chain. If the next operand in the // Walk up the chain to find the next store node, ignoring any
// chain is a store then move up and continue the scan with the next // intermediate loads. Any other kind of node will halt the loop.
// memory operand. If the next operand is a load save it and use alias
// information to check if it interferes with anything.
SDNode *NextInChain = Index->getChain().getNode(); SDNode *NextInChain = Index->getChain().getNode();
while (true) { while (true) {
if (StoreSDNode *STn = dyn_cast<StoreSDNode>(NextInChain)) { if (StoreSDNode *STn = dyn_cast<StoreSDNode>(NextInChain)) {
@ -15761,9 +15706,14 @@ bool DAGCombiner::findBetterNeighborChains(StoreSDNode *St) {
Index = nullptr; Index = nullptr;
break; break;
} }
} } // end while
} }
// At this point, ChainedStores lists all of the Store nodes
// reachable by iterating up through chain nodes matching the above
// conditions. For each such store identified, try to find an
// earlier chain to attach the store to which won't violate the
// required ordering.
bool MadeChangeToSt = false; bool MadeChangeToSt = false;
SmallVector<std::pair<StoreSDNode *, SDValue>, 8> BetterChains; SmallVector<std::pair<StoreSDNode *, SDValue>, 8> BetterChains;


@ -828,7 +828,7 @@ TargetLoweringBase::TargetLoweringBase(const TargetMachine &tm) : TM(tm) {
MinFunctionAlignment = 0; MinFunctionAlignment = 0;
PrefFunctionAlignment = 0; PrefFunctionAlignment = 0;
PrefLoopAlignment = 0; PrefLoopAlignment = 0;
GatherAllAliasesMaxDepth = 6; GatherAllAliasesMaxDepth = 18;
MinStackArgumentAlignment = 1; MinStackArgumentAlignment = 1;
// TODO: the default will be switched to 0 in the next commit, along // TODO: the default will be switched to 0 in the next commit, along
// with the Target-specific changes necessary. // with the Target-specific changes necessary.


@ -453,16 +453,6 @@ AMDGPUTargetLowering::AMDGPUTargetLowering(const TargetMachine &TM,
PredictableSelectIsExpensive = false; PredictableSelectIsExpensive = false;
// We want to find all load dependencies for long chains of stores to enable
// merging into very wide vectors. The problem is with vectors with > 4
// elements. MergeConsecutiveStores will attempt to merge these because x8/x16
// vectors are a legal type, even though we have to split the loads
// usually. When we can more precisely specify load legality per address
// space, we should be able to make FindBetterChain/MergeConsecutiveStores
// smarter so that they can figure out what to do in 2 iterations without all
// N > 4 stores on the same chain.
GatherAllAliasesMaxDepth = 16;
// FIXME: Need to really handle these. // FIXME: Need to really handle these.
MaxStoresPerMemcpy = 4096; MaxStoresPerMemcpy = 4096;
MaxStoresPerMemmove = 4096; MaxStoresPerMemmove = 4096;


@ -62,7 +62,7 @@ define i64 @test_hfa_ignores_gprs([7 x float], [2 x float] %in, i64, i64 %res) {
; but should go in an 8-byte aligned slot. ; but should go in an 8-byte aligned slot.
define void @test_varargs_stackalign() { define void @test_varargs_stackalign() {
; CHECK-LABEL: test_varargs_stackalign: ; CHECK-LABEL: test_varargs_stackalign:
; CHECK-DARWINPCS: stp {{w[0-9]+}}, {{w[0-9]+}}, [sp, #16] ; CHECK-DARWINPCS: str {{x[0-9]+}}, [sp, #16]
call void(...) @callee([3 x float] undef, [2 x float] [float 1.0, float 2.0]) call void(...) @callee([3 x float] undef, [2 x float] [float 1.0, float 2.0])
ret void ret void


@ -205,10 +205,7 @@ declare i32 @args_i32(i32, i32, i32, i32, i32, i32, i32, i32, i16 signext, i32,
define i32 @test8(i32 %argc, i8** nocapture %argv) nounwind { define i32 @test8(i32 %argc, i8** nocapture %argv) nounwind {
entry: entry:
; CHECK-LABEL: test8 ; CHECK-LABEL: test8
; CHECK: strb {{w[0-9]+}}, [sp, #3] ; CHECK: str w8, [sp]
; CHECK: strb wzr, [sp, #2]
; CHECK: strb {{w[0-9]+}}, [sp, #1]
; CHECK: strb wzr, [sp]
; CHECK: bl ; CHECK: bl
; FAST-LABEL: test8 ; FAST-LABEL: test8
; FAST: strb {{w[0-9]+}}, [sp] ; FAST: strb {{w[0-9]+}}, [sp]


@ -13,8 +13,8 @@ define void @t2() nounwind ssp {
entry: entry:
; CHECK-LABEL: t2: ; CHECK-LABEL: t2:
; CHECK: strh wzr, [sp, #32] ; CHECK: strh wzr, [sp, #32]
; CHECK: stp xzr, xzr, [sp, #16] ; CHECK: stp xzr, xzr, [sp, #8]
; CHECK: str xzr, [sp, #8] ; CHECK: str xzr, [sp, #24]
%buf = alloca [26 x i8], align 1 %buf = alloca [26 x i8], align 1
%0 = getelementptr inbounds [26 x i8], [26 x i8]* %buf, i32 0, i32 0 %0 = getelementptr inbounds [26 x i8], [26 x i8]* %buf, i32 0, i32 0
call void @llvm.memset.p0i8.i32(i8* %0, i8 0, i32 26, i32 1, i1 false) call void @llvm.memset.p0i8.i32(i8* %0, i8 0, i32 26, i32 1, i1 false)


@ -1370,8 +1370,11 @@ entry:
define void @merge_zr32_2_offset(i32* %p) { define void @merge_zr32_2_offset(i32* %p) {
; CHECK-LABEL: merge_zr32_2_offset: ; CHECK-LABEL: merge_zr32_2_offset:
; CHECK: // %entry ; CHECK: // %entry
; CHECK-NEXT: stp xzr, xzr, [x{{[0-9]+}}, #504] ; CHECK-NEXT: str xzr, [x0, #512]
; CHECK-NEXT: str xzr, [x0, #504]
; CHECK-NEXT: ret ; CHECK-NEXT: ret
; We should be able to merge these stores
; CHECKFIXME-NEXT: stp xzr, xzr, [x{{[0-9]+}}, #504]
entry: entry:
%p0 = getelementptr i32, i32* %p, i32 126 %p0 = getelementptr i32, i32* %p, i32 126
store i32 0, i32* %p0 store i32 0, i32* %p0
@ -1411,8 +1414,8 @@ entry:
define void @merge_zr32_3(i32* %p) { define void @merge_zr32_3(i32* %p) {
; CHECK-LABEL: merge_zr32_3: ; CHECK-LABEL: merge_zr32_3:
; CHECK: // %entry ; CHECK: // %entry
; CHECK-NEXT: movi v[[REG:[0-9]]].2d, #0000000000000000 ; CHECK-NEXT: stp xzr, xzr, [x[[REG:[0-9]+]]]
; CHECK-NEXT: stp q[[REG]], q[[REG]], [x{{[0-9]+}}] ; CHECK-NEXT: stp xzr, xzr, [x[[REG]], #16]
; CHECK-NEXT: ret ; CHECK-NEXT: ret
entry: entry:
store i32 0, i32* %p store i32 0, i32* %p
@ -1507,8 +1510,8 @@ entry:
define void @merge_zr64_2(i64* %p) { define void @merge_zr64_2(i64* %p) {
; CHECK-LABEL: merge_zr64_2: ; CHECK-LABEL: merge_zr64_2:
; CHECK: // %entry ; CHECK: // %entry
; CHECK-NEXT: movi v[[REG:[0-9]]].2d, #0000000000000000 ; CHECK-NEXT: stp xzr, xzr, [x[[REG:[0-9]+]]]
; CHECK-NEXT: stp q[[REG]], q[[REG]], [x{{[0-9]+}}] ; CHECK-NEXT: stp xzr, xzr, [x[[REG]], #16]
; CHECK-NEXT: ret ; CHECK-NEXT: ret
entry: entry:
store i64 0, i64* %p store i64 0, i64* %p


@ -4,8 +4,9 @@
@g0 = external global <3 x float>, align 16 @g0 = external global <3 x float>, align 16
@g1 = external global <3 x float>, align 4 @g1 = external global <3 x float>, align 4
; CHECK: ldr s[[R0:[0-9]+]], {{\[}}[[R1:x[0-9]+]]{{\]}}, #4 ; CHECK: ldr q[[R0:[0-9]+]], {{\[}}[[R1:x[0-9]+]], :lo12:g0
; CHECK: ld1{{\.?s?}} { v[[R0]]{{\.?s?}} }[1], {{\[}}[[R1]]{{\]}} ;; TODO: this next line seems like a redundant no-op move?
; CHECK: ins v0.s[1], v0.s[1]
; CHECK: str d[[R0]] ; CHECK: str d[[R0]]
define void @blam() { define void @blam() {


@ -1,5 +1,4 @@
; RUN: llc --combiner-alias-analysis=false < %s | FileCheck %s ; RUN: llc < %s | FileCheck %s
; RUN: llc --combiner-alias-analysis=true < %s | FileCheck %s
; This test checks that we do not merge stores together which have ; This test checks that we do not merge stores together which have
; dependencies through their non-chain operands (e.g. one store is the ; dependencies through their non-chain operands (e.g. one store is the


@ -1,13 +1,21 @@
; RUN: llc -O0 -mtriple=amdgcn--amdhsa -mcpu=fiji -mattr=+amdgpu-debugger-insert-nops -verify-machineinstrs < %s | FileCheck %s ; RUN: llc -O0 -mtriple=amdgcn--amdhsa -mcpu=fiji -mattr=+amdgpu-debugger-insert-nops -verify-machineinstrs < %s | FileCheck %s --check-prefix=CHECK
; RUN: llc -O0 -mtriple=amdgcn--amdhsa -mcpu=fiji -mattr=+amdgpu-debugger-insert-nops -verify-machineinstrs < %s | FileCheck %s --check-prefix=CHECKNOP
; CHECK: test01.cl:2:{{[0-9]+}} ; This test expects that we have one instance for each line in some order with "s_nop 0" instances after each.
; CHECK-NEXT: s_nop 0
; CHECK: test01.cl:3:{{[0-9]+}} ; Check that each line appears at least once
; CHECK-NEXT: s_nop 0 ; CHECK-DAG: test01.cl:2:3
; CHECK-DAG: test01.cl:3:3
; CHECK-DAG: test01.cl:4:3
; CHECK: test01.cl:4:{{[0-9]+}}
; CHECK-NEXT: s_nop 0 ; Check that each of each of the lines consists of the line output, followed by "s_nop 0"
; CHECKNOP: test01.cl:{{[234]}}:3
; CHECKNOP-NEXT: s_nop 0
; CHECKNOP: test01.cl:{{[234]}}:3
; CHECKNOP-NEXT: s_nop 0
; CHECKNOP: test01.cl:{{[234]}}:3
; CHECKNOP-NEXT: s_nop 0
; CHECK: test01.cl:5:{{[0-9]+}} ; CHECK: test01.cl:5:{{[0-9]+}}
; CHECK-NEXT: s_nop 0 ; CHECK-NEXT: s_nop 0
@ -21,7 +29,7 @@ entry:
call void @llvm.dbg.declare(metadata i32 addrspace(1)** %A.addr, metadata !17, metadata !18), !dbg !19 call void @llvm.dbg.declare(metadata i32 addrspace(1)** %A.addr, metadata !17, metadata !18), !dbg !19
%0 = load i32 addrspace(1)*, i32 addrspace(1)** %A.addr, align 4, !dbg !20 %0 = load i32 addrspace(1)*, i32 addrspace(1)** %A.addr, align 4, !dbg !20
%arrayidx = getelementptr inbounds i32, i32 addrspace(1)* %0, i32 0, !dbg !20 %arrayidx = getelementptr inbounds i32, i32 addrspace(1)* %0, i32 0, !dbg !20
store i32 1, i32 addrspace(1)* %arrayidx, align 4, !dbg !21 store i32 1, i32 addrspace(1)* %arrayidx, align 4, !dbg !20
%1 = load i32 addrspace(1)*, i32 addrspace(1)** %A.addr, align 4, !dbg !22 %1 = load i32 addrspace(1)*, i32 addrspace(1)** %A.addr, align 4, !dbg !22
%arrayidx1 = getelementptr inbounds i32, i32 addrspace(1)* %1, i32 1, !dbg !22 %arrayidx1 = getelementptr inbounds i32, i32 addrspace(1)* %1, i32 1, !dbg !22
store i32 2, i32 addrspace(1)* %arrayidx1, align 4, !dbg !23 store i32 2, i32 addrspace(1)* %arrayidx1, align 4, !dbg !23


@ -253,9 +253,8 @@ define void @dynamic_insertelement_v2i8(<2 x i8> addrspace(1)* %out, <2 x i8> %a
; GCN-DAG: buffer_store_byte v{{[0-9]+}}, off, s{{\[[0-9]+:[0-9]+\]}}, s{{[0-9]+}} offset:2 ; GCN-DAG: buffer_store_byte v{{[0-9]+}}, off, s{{\[[0-9]+:[0-9]+\]}}, s{{[0-9]+}} offset:2
; GCN-DAG: buffer_store_byte v{{[0-9]+}}, off, s{{\[[0-9]+:[0-9]+\]}}, s{{[0-9]+}} offset:1 ; GCN-DAG: buffer_store_byte v{{[0-9]+}}, off, s{{\[[0-9]+:[0-9]+\]}}, s{{[0-9]+}} offset:1
; GCN-DAG: buffer_store_byte v{{[0-9]+}}, off, s{{\[[0-9]+:[0-9]+\]}}, s{{[0-9]+}}{{$}} ; GCN-DAG: buffer_store_byte v{{[0-9]+}}, v{{[0-9]+}}, s{{\[[0-9]+:[0-9]+\]}}, s{{[0-9]+}} offen{{$}}
; GCN-DAG: buffer_store_byte v{{[0-9]+}}, v{{[0-9]+}}, s{{\[[0-9]+:[0-9]+\]}}, s{{[0-9]+}} offen{{$}}
; GCN: buffer_store_byte v{{[0-9]+}}, v{{[0-9]+}}, s{{\[[0-9]+:[0-9]+\]}}, s{{[0-9]+}} offen{{$}}
; GCN: buffer_load_ubyte ; GCN: buffer_load_ubyte
; GCN: buffer_load_ubyte ; GCN: buffer_load_ubyte


@ -1,8 +1,5 @@
; RUN: llc -march=amdgcn -verify-machineinstrs -amdgpu-load-store-vectorizer=0 < %s | FileCheck -check-prefix=SI -check-prefix=GCN -check-prefix=GCN-NOAA %s ; RUN: llc -march=amdgcn -verify-machineinstrs -amdgpu-load-store-vectorizer=0 < %s | FileCheck -check-prefix=SI -check-prefix=GCN -check-prefix=GCN-AA %s
; RUN: llc -march=amdgcn -mcpu=bonaire -verify-machineinstrs -amdgpu-load-store-vectorizer=0 < %s | FileCheck -check-prefix=SI -check-prefix=GCN -check-prefix=GCN-NOAA %s ; RUN: llc -march=amdgcn -mcpu=bonaire -verify-machineinstrs -amdgpu-load-store-vectorizer=0 < %s | FileCheck -check-prefix=SI -check-prefix=GCN -check-prefix=GCN-AA %s
; RUN: llc -march=amdgcn -verify-machineinstrs -combiner-alias-analysis -amdgpu-load-store-vectorizer=0 < %s | FileCheck -check-prefix=SI -check-prefix=GCN -check-prefix=GCN-AA %s
; RUN: llc -march=amdgcn -mcpu=bonaire -verify-machineinstrs -combiner-alias-analysis -amdgpu-load-store-vectorizer=0 < %s | FileCheck -check-prefix=SI -check-prefix=GCN -check-prefix=GCN-AA %s
; This test is mostly to test DAG store merging, so disable the vectorizer. ; This test is mostly to test DAG store merging, so disable the vectorizer.
; Run with devices with different unaligned load restrictions. ; Run with devices with different unaligned load restrictions.
@ -474,17 +471,9 @@ define void @merge_global_store_4_adjacent_loads_i8_natural_align(i8 addrspace(1
ret void ret void
} }
; This works once AA is enabled on the subtarget
; GCN-LABEL: {{^}}merge_global_store_4_vector_elts_loads_v4i32: ; GCN-LABEL: {{^}}merge_global_store_4_vector_elts_loads_v4i32:
; GCN: buffer_load_dwordx4 [[LOAD:v\[[0-9]+:[0-9]+\]]] ; GCN: buffer_load_dwordx4 [[LOAD:v\[[0-9]+:[0-9]+\]]]
; GCN: buffer_store_dwordx4 [[LOAD]]
; GCN-NOAA: buffer_store_dword v
; GCN-NOAA: buffer_store_dword v
; GCN-NOAA: buffer_store_dword v
; GCN-NOAA: buffer_store_dword v
; GCN-AA: buffer_store_dwordx4 [[LOAD]]
; GCN: s_endpgm ; GCN: s_endpgm
define void @merge_global_store_4_vector_elts_loads_v4i32(i32 addrspace(1)* %out, <4 x i32> addrspace(1)* %in) #0 { define void @merge_global_store_4_vector_elts_loads_v4i32(i32 addrspace(1)* %out, <4 x i32> addrspace(1)* %in) #0 {
%out.gep.1 = getelementptr i32, i32 addrspace(1)* %out, i32 1 %out.gep.1 = getelementptr i32, i32 addrspace(1)* %out, i32 1


@ -32,10 +32,10 @@
; HSA-ELT4-DAG: buffer_store_dword {{v[0-9]+}}, off, s[0:3], s9 offset:24{{$}} ; HSA-ELT4-DAG: buffer_store_dword {{v[0-9]+}}, off, s[0:3], s9 offset:24{{$}}
; HSA-ELT4-DAG: buffer_store_dword {{v[0-9]+}}, off, s[0:3], s9 offset:28{{$}} ; HSA-ELT4-DAG: buffer_store_dword {{v[0-9]+}}, off, s[0:3], s9 offset:28{{$}}
; HSA-ELT4: buffer_load_dword {{v[0-9]+}}, v{{[0-9]+}}, s[0:3], s9 offen{{$}} ; HSA-ELT4-DAG: buffer_load_dword {{v[0-9]+}}, v{{[0-9]+}}, s[0:3], s9 offen{{$}}
; HSA-ELT4: buffer_load_dword {{v[0-9]+}}, v{{[0-9]+}}, s[0:3], s9 offen offset:4{{$}} ; HSA-ELT4-DAG: buffer_load_dword {{v[0-9]+}}, v{{[0-9]+}}, s[0:3], s9 offen offset:4{{$}}
; HSA-ELT4: buffer_load_dword {{v[0-9]+}}, v{{[0-9]+}}, s[0:3], s9 offen offset:8{{$}} ; HSA-ELT4-DAG: buffer_load_dword {{v[0-9]+}}, v{{[0-9]+}}, s[0:3], s9 offen offset:8{{$}}
; HSA-ELT4: buffer_load_dword {{v[0-9]+}}, v{{[0-9]+}}, s[0:3], s9 offen offset:12{{$}} ; HSA-ELT4-DAG: buffer_load_dword {{v[0-9]+}}, v{{[0-9]+}}, s[0:3], s9 offen offset:12{{$}}
define void @private_elt_size_v4i32(<4 x i32> addrspace(1)* %out, i32 addrspace(1)* %index.array) #0 { define void @private_elt_size_v4i32(<4 x i32> addrspace(1)* %out, i32 addrspace(1)* %index.array) #0 {
entry: entry:
%tid = call i32 @llvm.amdgcn.workitem.id.x() %tid = call i32 @llvm.amdgcn.workitem.id.x()
@ -130,8 +130,8 @@ entry:
; HSA-ELT8: private_element_size = 2 ; HSA-ELT8: private_element_size = 2
; HSA-ELT4: private_element_size = 1 ; HSA-ELT4: private_element_size = 1
; HSA-ELTGE8-DAG: buffer_store_dwordx2 {{v\[[0-9]+:[0-9]+\]}}, off, s[0:3], s9{{$}} ; HSA-ELTGE8-DAG: buffer_store_dwordx2 {{v\[[0-9]+:[0-9]+\]}}, {{off|v[0-9]}}, s[0:3], s9{{$}}
; HSA-ELTGE8-DAG: buffer_store_dwordx2 {{v\[[0-9]+:[0-9]+\]}}, off, s[0:3], s9 offset:8 ; HSA-ELTGE8-DAG: buffer_store_dwordx2 {{v\[[0-9]+:[0-9]+\]}}, {{off|v[0-9]}}, s[0:3], s9 offset:8
; HSA-ELTGE8: buffer_load_dwordx2 {{v\[[0-9]+:[0-9]+\]}}, v{{[0-9]+}}, s[0:3], s9 offen ; HSA-ELTGE8: buffer_load_dwordx2 {{v\[[0-9]+:[0-9]+\]}}, v{{[0-9]+}}, s[0:3], s9 offen


@ -157,9 +157,8 @@ define void @reorder_global_load_local_store_global_load(i32 addrspace(1)* %out,
; FUNC-LABEL: @reorder_local_offsets ; FUNC-LABEL: @reorder_local_offsets
; CI: ds_read2_b32 {{v\[[0-9]+:[0-9]+\]}}, {{v[0-9]+}} offset0:100 offset1:102 ; CI: ds_read2_b32 {{v\[[0-9]+:[0-9]+\]}}, {{v[0-9]+}} offset0:100 offset1:102
; CI: ds_write2_b32 {{v[0-9]+}}, {{v[0-9]+}}, {{v[0-9]+}} offset0:3 offset1:100 ; CI-DAG: ds_write2_b32 {{v[0-9]+}}, v{{[0-9]+}}, v{{[0-9]+}} offset0:3 offset1:100
; CI: ds_read_b32 {{v[0-9]+}}, {{v[0-9]+}} offset:12 ; CI-DAG: ds_write_b32 {{v[0-9]+}}, {{v[0-9]+}} offset:408
; CI: ds_write_b32 {{v[0-9]+}}, {{v[0-9]+}} offset:408
; CI: buffer_store_dword ; CI: buffer_store_dword
; CI: s_endpgm ; CI: s_endpgm
define void @reorder_local_offsets(i32 addrspace(1)* nocapture %out, i32 addrspace(1)* noalias nocapture readnone %gptr, i32 addrspace(3)* noalias nocapture %ptr0) #0 { define void @reorder_local_offsets(i32 addrspace(1)* nocapture %out, i32 addrspace(1)* noalias nocapture readnone %gptr, i32 addrspace(3)* noalias nocapture %ptr0) #0 {
@ -181,12 +180,12 @@ define void @reorder_local_offsets(i32 addrspace(1)* nocapture %out, i32 addrspa
} }
; FUNC-LABEL: @reorder_global_offsets ; FUNC-LABEL: @reorder_global_offsets
; CI: buffer_load_dword {{v[0-9]+}}, off, {{s\[[0-9]+:[0-9]+\]}}, 0 offset:400 ; CI-DAG: buffer_load_dword {{v[0-9]+}}, off, {{s\[[0-9]+:[0-9]+\]}}, 0 offset:400
; CI: buffer_load_dword {{v[0-9]+}}, off, {{s\[[0-9]+:[0-9]+\]}}, 0 offset:408 ; CI-DAG: buffer_load_dword {{v[0-9]+}}, off, {{s\[[0-9]+:[0-9]+\]}}, 0 offset:408
; CI: buffer_store_dword {{v[0-9]+}}, off, {{s\[[0-9]+:[0-9]+\]}}, 0 offset:12 ; CI-DAG: buffer_store_dword {{v[0-9]+}}, off, {{s\[[0-9]+:[0-9]+\]}}, 0 offset:12
; CI: buffer_store_dword {{v[0-9]+}}, off, {{s\[[0-9]+:[0-9]+\]}}, 0 offset:400 ; CI-DAG: buffer_store_dword {{v[0-9]+}}, off, {{s\[[0-9]+:[0-9]+\]}}, 0 offset:400
; CI: buffer_store_dword {{v[0-9]+}}, off, {{s\[[0-9]+:[0-9]+\]}}, 0 offset:408 ; CI-DAG: buffer_store_dword {{v[0-9]+}}, off, {{s\[[0-9]+:[0-9]+\]}}, 0 offset:408
; CI: buffer_load_dword {{v[0-9]+}}, off, {{s\[[0-9]+:[0-9]+\]}}, 0 offset:12 ; CI: buffer_store_dword
; CI: s_endpgm ; CI: s_endpgm
define void @reorder_global_offsets(i32 addrspace(1)* nocapture %out, i32 addrspace(1)* noalias nocapture readnone %gptr, i32 addrspace(1)* noalias nocapture %ptr0) #0 { define void @reorder_global_offsets(i32 addrspace(1)* nocapture %out, i32 addrspace(1)* noalias nocapture readnone %gptr, i32 addrspace(1)* noalias nocapture %ptr0) #0 {
%ptr1 = getelementptr inbounds i32, i32 addrspace(1)* %ptr0, i32 3 %ptr1 = getelementptr inbounds i32, i32 addrspace(1)* %ptr0, i32 3


@ -12,7 +12,8 @@ define void @test_byval_8_bytes_alignment(i32 %i, ...) {
entry: entry:
; CHECK: sub sp, sp, #12 ; CHECK: sub sp, sp, #12
; CHECK: sub sp, sp, #4 ; CHECK: sub sp, sp, #4
; CHECK: stmib sp, {r1, r2, r3} ; CHECK: add r0, sp, #4
; CHECK: stm sp, {r0, r1, r2, r3}
%g = alloca i8* %g = alloca i8*
%g1 = bitcast i8** %g to i8* %g1 = bitcast i8** %g to i8*
call void @llvm.va_start(i8* %g1) call void @llvm.va_start(i8* %g1)


@ -1,5 +1,4 @@
; RUN: llc < %s -mtriple=armv7-apple-ios -O0 | FileCheck %s -check-prefix=NO-REALIGN ; RUN: llc < %s -mtriple=armv7-apple-ios -O0 | FileCheck %s
; RUN: llc < %s -mtriple=armv7-apple-ios -O0 | FileCheck %s -check-prefix=REALIGN
; rdar://12713765 ; rdar://12713765
; When realign-stack is set to false, make sure we are not creating stack ; When realign-stack is set to false, make sure we are not creating stack
@ -8,29 +7,31 @@
define void @test1(<16 x float>* noalias sret %agg.result) nounwind ssp "no-realign-stack" { define void @test1(<16 x float>* noalias sret %agg.result) nounwind ssp "no-realign-stack" {
entry: entry:
; NO-REALIGN-LABEL: test1 ; CHECK-LABEL: test1
; NO-REALIGN: mov r[[R2:[0-9]+]], r[[R1:[0-9]+]] ; CHECK: ldr r[[R1:[0-9]+]], [pc, r1]
; NO-REALIGN: vld1.32 {{{d[0-9]+, d[0-9]+}}}, [r[[R2]]:128]! ; CHECK: add r[[R2:[0-9]+]], r1, #48
; NO-REALIGN: vld1.64 {{{d[0-9]+, d[0-9]+}}}, [r[[R2]]:128] ; CHECK: vld1.64 {{{d[0-9]+, d[0-9]+}}}, [r[[R2]]:128]
; NO-REALIGN: add r[[R2:[0-9]+]], r[[R1]], #32 ; CHECK: mov r[[R2:[0-9]+]], r[[R1]]
; NO-REALIGN: vld1.64 {{{d[0-9]+, d[0-9]+}}}, [r[[R2]]:128] ; CHECK: vld1.32 {{{d[0-9]+, d[0-9]+}}}, [r[[R2]]:128]!
; NO-REALIGN: add r[[R2:[0-9]+]], r[[R1]], #48 ; CHECK: vld1.64 {{{d[0-9]+, d[0-9]+}}}, [r[[R2]]:128]
; NO-REALIGN: vld1.64 {{{d[0-9]+, d[0-9]+}}}, [r[[R2]]:128] ; CHECK: add r[[R1:[0-9]+]], r[[R1]], #32
; CHECK: vld1.64 {{{d[0-9]+, d[0-9]+}}}, [r[[R1]]:128]
; NO-REALIGN: add r[[R2:[0-9]+]], r[[R1:[0-9]+]], #48 ; CHECK: mov r[[R1:[0-9]+]], sp
; NO-REALIGN: vst1.64 {{{d[0-9]+, d[0-9]+}}}, [r[[R2]]:128] ; CHECK: vst1.64 {{{d[0-9]+, d[0-9]+}}}, [r[[R1]]:128]
; NO-REALIGN: add r[[R2:[0-9]+]], r[[R1]], #32 ; CHECK: add r[[R2:[0-9]+]], r[[R1]], #32
; NO-REALIGN: vst1.64 {{{d[0-9]+, d[0-9]+}}}, [r[[R2]]:128] ; CHECK: vst1.64 {{{d[0-9]+, d[0-9]+}}}, [r[[R2]]:128]
; NO-REALIGN: mov r[[R3:[0-9]+]], r[[R1]] ; CHECK: vld1.32 {{{d[0-9]+, d[0-9]+}}}, [r[[R1]]:128]!
; NO-REALIGN: vst1.32 {{{d[0-9]+, d[0-9]+}}}, [r[[R3]]:128]! ; CHECK: vst1.64 {{{d[0-9]+, d[0-9]+}}}, [r[[R1]]:128]
; NO-REALIGN: vst1.64 {{{d[0-9]+, d[0-9]+}}}, [r[[R3]]:128] ; CHECK: vld1.32 {{{d[0-9]+, d[0-9]+}}}, [r[[R2]]:128]!
; CHECK: vst1.64 {{{d[0-9]+, d[0-9]+}}}, [r[[R2]]:128]
; NO-REALIGN: add r[[R2:[0-9]+]], r[[R0:0]], #48 ; CHECK: vld1.64 {{{d[0-9]+, d[0-9]+}}}, [r[[R1]]:128]
; NO-REALIGN: vst1.64 {{{d[0-9]+, d[0-9]+}}}, [r[[R2]]:128] ; CHECK: vld1.64 {{{d[0-9]+, d[0-9]+}}}, [r[[R2]]:128]
; NO-REALIGN: add r[[R2:[0-9]+]], r[[R0]], #32 ; CHECK: add r[[R1:[0-9]+]], r0, #48
; NO-REALIGN: vst1.64 {{{d[0-9]+, d[0-9]+}}}, [r[[R2]]:128] ; CHECK: vst1.64 {{{d[0-9]+, d[0-9]+}}}, [r[[R1]]:128]
; NO-REALIGN: vst1.32 {{{d[0-9]+, d[0-9]+}}}, [r[[R0]]:128]! ; CHECK: add r[[R1:[0-9]+]], r0, #32
; NO-REALIGN: vst1.64 {{{d[0-9]+, d[0-9]+}}}, [r[[R0]]:128] ; CHECK: vst1.64 {{{d[0-9]+, d[0-9]+}}}, [r[[R1]]:128]
; CHECK: vst1.32 {{{d[0-9]+, d[0-9]+}}}, [r0:128]!
; CHECK: vst1.64 {{{d[0-9]+, d[0-9]+}}}, [r0:128]
%retval = alloca <16 x float>, align 16 %retval = alloca <16 x float>, align 16
%0 = load <16 x float>, <16 x float>* @T3_retval, align 16 %0 = load <16 x float>, <16 x float>* @T3_retval, align 16
store <16 x float> %0, <16 x float>* %retval store <16 x float> %0, <16 x float>* %retval
@ -41,32 +42,33 @@ entry:
define void @test2(<16 x float>* noalias sret %agg.result) nounwind ssp { define void @test2(<16 x float>* noalias sret %agg.result) nounwind ssp {
entry: entry:
; REALIGN-LABEL: test2 ; CHECK: ldr r[[R1:[0-9]+]], [pc, r1]
; REALIGN: bfc sp, #0, #6 ; CHECK: add r[[R2:[0-9]+]], r[[R1]], #48
; REALIGN: mov r[[R2:[0-9]+]], r[[R1:[0-9]+]] ; CHECK: vld1.64 {{{d[0-9]+, d[0-9]+}}}, [r[[R2]]:128]
; REALIGN: vld1.32 {{{d[0-9]+, d[0-9]+}}}, [r[[R2]]:128]! ; CHECK: mov r[[R2:[0-9]+]], r[[R1]]
; REALIGN: vld1.64 {{{d[0-9]+, d[0-9]+}}}, [r[[R2]]:128] ; CHECK: vld1.32 {{{d[0-9]+, d[0-9]+}}}, [r[[R2]]:128]!
; REALIGN: add r[[R2:[0-9]+]], r[[R1]], #32 ; CHECK: vld1.64 {{{d[0-9]+, d[0-9]+}}}, [r[[R2]]:128]
; REALIGN: vld1.64 {{{d[0-9]+, d[0-9]+}}}, [r[[R2]]:128] ; CHECK: add r[[R1:[0-9]+]], r[[R1]], #32
; REALIGN: add r[[R2:[0-9]+]], r[[R1]], #48 ; CHECK: vld1.64 {{{d[0-9]+, d[0-9]+}}}, [r[[R1]]:128]
; REALIGN: vld1.64 {{{d[0-9]+, d[0-9]+}}}, [r[[R2]]:128] ; CHECK: mov r[[R1:[0-9]+]], sp
; CHECK: vst1.64 {{{d[0-9]+, d[0-9]+}}}, [r[[R1]]:128]
; CHECK: orr r[[R2:[0-9]+]], r[[R1]], #32
; CHECK: vst1.64 {{{d[0-9]+, d[0-9]+}}}, [r[[R2]]:128]
; CHECK: vld1.32 {{{d[0-9]+, d[0-9]+}}}, [r[[R1]]:128]!
; CHECK: vst1.64 {{{d[0-9]+, d[0-9]+}}}, [r[[R1]]:128]
; CHECK: vld1.32 {{{d[0-9]+, d[0-9]+}}}, [r[[R2]]:128]!
; CHECK: vst1.64 {{{d[0-9]+, d[0-9]+}}}, [r[[R2]]:128]
; CHECK: vld1.64 {{{d[0-9]+, d[0-9]+}}}, [r[[R1]]:128]
; CHECK: vld1.64 {{{d[0-9]+, d[0-9]+}}}, [r[[R2]]:128]
; CHECK: add r[[R1:[0-9]+]], r0, #48
; CHECK: vst1.64 {{{d[0-9]+, d[0-9]+}}}, [r[[R1]]:128]
; CHECK: add r[[R1:[0-9]+]], r0, #32
; CHECK: vst1.64 {{{d[0-9]+, d[0-9]+}}}, [r[[R1]]:128]
; CHECK: vst1.32 {{{d[0-9]+, d[0-9]+}}}, [r0:128]!
; CHECK: vst1.64 {{{d[0-9]+, d[0-9]+}}}, [r0:128]
; REALIGN: orr r[[R2:[0-9]+]], r[[R1:[0-9]+]], #48 %retval = alloca <16 x float>, align 16
; REALIGN: vst1.64 {{{d[0-9]+, d[0-9]+}}}, [r[[R2]]:128]
; REALIGN: orr r[[R2:[0-9]+]], r[[R1]], #32
; REALIGN: vst1.64 {{{d[0-9]+, d[0-9]+}}}, [r[[R2]]:128]
; REALIGN: orr r[[R2:[0-9]+]], r[[R1]], #16
; REALIGN: vst1.64 {{{d[0-9]+, d[0-9]+}}}, [r[[R2]]:128]
; REALIGN: vst1.64 {{{d[0-9]+, d[0-9]+}}}, [r[[R1]]:128]
; REALIGN: add r[[R1:[0-9]+]], r[[R0:0]], #48
; REALIGN: vst1.64 {{{d[0-9]+, d[0-9]+}}}, [r[[R1]]:128]
; REALIGN: add r[[R1:[0-9]+]], r[[R0]], #32
; REALIGN: vst1.64 {{{d[0-9]+, d[0-9]+}}}, [r[[R1]]:128]
; REALIGN: vst1.32 {{{d[0-9]+, d[0-9]+}}}, [r[[R0]]:128]!
; REALIGN: vst1.64 {{{d[0-9]+, d[0-9]+}}}, [r[[R0]]:128]
%retval = alloca <16 x float>, align 16
%0 = load <16 x float>, <16 x float>* @T3_retval, align 16 %0 = load <16 x float>, <16 x float>* @T3_retval, align 16
store <16 x float> %0, <16 x float>* %retval store <16 x float> %0, <16 x float>* %retval
%1 = load <16 x float>, <16 x float>* %retval %1 = load <16 x float>, <16 x float>* %retval


@ -9,8 +9,6 @@ entry:
; CHECK-LABEL: t: ; CHECK-LABEL: t:
; CHECK: vpop {d8} ; CHECK: vpop {d8}
; CHECK-NOT: vpopne ; CHECK-NOT: vpopne
; CHECK: pop {r7, pc}
; CHECK: vpop {d8}
; CHECK: pop {r7, pc} ; CHECK: pop {r7, pc}
br i1 undef, label %if.else, label %if.then br i1 undef, label %if.else, label %if.then


@ -3,9 +3,15 @@
define void @t1(i8* nocapture %c) nounwind optsize { define void @t1(i8* nocapture %c) nounwind optsize {
entry: entry:
; CHECK-LABEL: t1: ; CHECK-LABEL: t1:
;; FIXME: like with arm64-memset-inline.ll, learning how to merge
;; stores made this code worse, since it now uses a vector move,
;; instead of just using an strd instruction taking two registers.
; CHECK: vmov.i32 d16, #0x0
; CHECK: vst1.32 {d16}, [r0:64]!
; CHECK: movs r1, #0 ; CHECK: movs r1, #0
; CHECK: strd r1, r1, [r0] ; CHECK: str r1, [r0]
; CHECK: str r1, [r0, #8]
call void @llvm.memset.p0i8.i64(i8* %c, i8 0, i64 12, i32 8, i1 false) call void @llvm.memset.p0i8.i64(i8* %c, i8 0, i64 12, i32 8, i1 false)
ret void ret void
} }
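For reference, the pattern exercised here is a 12-byte, 8-byte-aligned zero memset; a minimal standalone reproducer is sketched below (hand-written for illustration, the name @zero12 is not part of the test). Per the CHECK lines and FIXME above, this used to lower to a GPR strd pair plus a str, and after store merging it instead uses a 64-bit NEON store (vmov.i32/vst1.32) plus a trailing str.

declare void @llvm.memset.p0i8.i64(i8* nocapture, i8, i64, i32, i1)

; 12 bytes of zeros at 8-byte alignment: the first 8 bytes were previously
; stored with "strd r1, r1, [r0]", and are now stored through a d register.
define void @zero12(i8* nocapture %c) nounwind optsize {
entry:
  call void @llvm.memset.p0i8.i64(i8* %c, i8 0, i64 12, i32 8, i1 false)
  ret void
}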


@ -6,9 +6,9 @@ define void @multiple_store() {
; CHECK: movs [[VAL:r[0-9]+]], #42 ; CHECK: movs [[VAL:r[0-9]+]], #42
; CHECK: movt r[[BASE1]], #15 ; CHECK: movt r[[BASE1]], #15
; CHECK: str [[VAL]], [r[[BASE1]]] ; CHECK-DAG: str [[VAL]], [r[[BASE1]]]
; CHECK: str [[VAL]], [r[[BASE1]], #24] ; CHECK-DAG: str [[VAL]], [r[[BASE1]], #24]
; CHECK: str.w [[VAL]], [r[[BASE1]], #42] ; CHECK-DAG: str.w [[VAL]], [r[[BASE1]], #42]
; CHECK: movw r[[BASE2:[0-9]+]], #20394 ; CHECK: movw r[[BASE2:[0-9]+]], #20394
; CHECK: movt r[[BASE2]], #18 ; CHECK: movt r[[BASE2]], #18


@ -13,50 +13,55 @@
; Function Attrs: nounwind uwtable ; Function Attrs: nounwind uwtable
define i32 @ebpf_filter(%struct.__sk_buff* nocapture readnone %ebpf_packet) #0 section "socket1" { define i32 @ebpf_filter(%struct.__sk_buff* nocapture readnone %ebpf_packet) #0 section "socket1" {
; CHECK: r2 = r10
; CHECK: r2 += -2
; CHECK: r1 = 0
; CHECK: *(u16 *)(r2 + 6) = r1
; CHECK: *(u16 *)(r2 + 4) = r1
; CHECK: *(u16 *)(r2 + 2) = r1
; CHECK: r2 = 6
; CHECK: *(u8 *)(r10 - 7) = r2
; CHECK: r2 = 5
; CHECK: *(u8 *)(r10 - 8) = r2
; CHECK: r2 = 7
; CHECK: *(u8 *)(r10 - 6) = r2
; CHECK: r2 = 8
; CHECK: *(u8 *)(r10 - 5) = r2
; CHECK: r2 = 9
; CHECK: *(u8 *)(r10 - 4) = r2
; CHECK: r2 = 10
; CHECK: *(u8 *)(r10 - 3) = r2
; CHECK: *(u16 *)(r10 + 24) = r1
; CHECK: *(u16 *)(r10 + 22) = r1
; CHECK: *(u16 *)(r10 + 20) = r1
; CHECK: *(u16 *)(r10 + 18) = r1
; CHECK: *(u16 *)(r10 + 16) = r1
; CHECK: *(u16 *)(r10 + 14) = r1
; CHECK: *(u16 *)(r10 + 12) = r1
; CHECK: *(u16 *)(r10 + 10) = r1
; CHECK: *(u16 *)(r10 + 8) = r1
; CHECK: *(u16 *)(r10 + 6) = r1
; CHECK: *(u16 *)(r10 - 2) = r1
; CHECK: *(u16 *)(r10 + 26) = r1
; CHECK: r2 = r10
; CHECK: r2 += -8
; CHECK: r1 = <MCOperand Expr:(routing)>ll
; CHECK: call bpf_map_lookup_elem
; CHECK: exit
%key = alloca %struct.routing_key_2, align 1 %key = alloca %struct.routing_key_2, align 1
%1 = getelementptr inbounds %struct.routing_key_2, %struct.routing_key_2* %key, i64 0, i32 0, i64 0 %1 = getelementptr inbounds %struct.routing_key_2, %struct.routing_key_2* %key, i64 0, i32 0, i64 0
; CHECK: r1 = 5
; CHECK: *(u8 *)(r10 - 8) = r1
store i8 5, i8* %1, align 1 store i8 5, i8* %1, align 1
%2 = getelementptr inbounds %struct.routing_key_2, %struct.routing_key_2* %key, i64 0, i32 0, i64 1 %2 = getelementptr inbounds %struct.routing_key_2, %struct.routing_key_2* %key, i64 0, i32 0, i64 1
; CHECK: r1 = 6
; CHECK: *(u8 *)(r10 - 7) = r1
store i8 6, i8* %2, align 1 store i8 6, i8* %2, align 1
%3 = getelementptr inbounds %struct.routing_key_2, %struct.routing_key_2* %key, i64 0, i32 0, i64 2 %3 = getelementptr inbounds %struct.routing_key_2, %struct.routing_key_2* %key, i64 0, i32 0, i64 2
; CHECK: r1 = 7
; CHECK: *(u8 *)(r10 - 6) = r1
store i8 7, i8* %3, align 1 store i8 7, i8* %3, align 1
%4 = getelementptr inbounds %struct.routing_key_2, %struct.routing_key_2* %key, i64 0, i32 0, i64 3 %4 = getelementptr inbounds %struct.routing_key_2, %struct.routing_key_2* %key, i64 0, i32 0, i64 3
; CHECK: r1 = 8
; CHECK: *(u8 *)(r10 - 5) = r1
store i8 8, i8* %4, align 1 store i8 8, i8* %4, align 1
%5 = getelementptr inbounds %struct.routing_key_2, %struct.routing_key_2* %key, i64 0, i32 0, i64 4 %5 = getelementptr inbounds %struct.routing_key_2, %struct.routing_key_2* %key, i64 0, i32 0, i64 4
; CHECK: r1 = 9
; CHECK: *(u8 *)(r10 - 4) = r1
store i8 9, i8* %5, align 1 store i8 9, i8* %5, align 1
%6 = getelementptr inbounds %struct.routing_key_2, %struct.routing_key_2* %key, i64 0, i32 0, i64 5 %6 = getelementptr inbounds %struct.routing_key_2, %struct.routing_key_2* %key, i64 0, i32 0, i64 5
; CHECK: r1 = 10
; CHECK: *(u8 *)(r10 - 3) = r1
store i8 10, i8* %6, align 1 store i8 10, i8* %6, align 1
%7 = getelementptr inbounds %struct.routing_key_2, %struct.routing_key_2* %key, i64 1, i32 0, i64 0 %7 = getelementptr inbounds %struct.routing_key_2, %struct.routing_key_2* %key, i64 1, i32 0, i64 0
; CHECK: r1 = r10
; CHECK: r1 += -2
; CHECK: r2 = 0
; CHECK: *(u16 *)(r1 + 6) = r2
; CHECK: *(u16 *)(r1 + 4) = r2
; CHECK: *(u16 *)(r1 + 2) = r2
; CHECK: *(u16 *)(r10 + 24) = r2
; CHECK: *(u16 *)(r10 + 22) = r2
; CHECK: *(u16 *)(r10 + 20) = r2
; CHECK: *(u16 *)(r10 + 18) = r2
; CHECK: *(u16 *)(r10 + 16) = r2
; CHECK: *(u16 *)(r10 + 14) = r2
; CHECK: *(u16 *)(r10 + 12) = r2
; CHECK: *(u16 *)(r10 + 10) = r2
; CHECK: *(u16 *)(r10 + 8) = r2
; CHECK: *(u16 *)(r10 + 6) = r2
; CHECK: *(u16 *)(r10 - 2) = r2
; CHECK: *(u16 *)(r10 + 26) = r2
call void @llvm.memset.p0i8.i64(i8* %7, i8 0, i64 30, i32 1, i1 false) call void @llvm.memset.p0i8.i64(i8* %7, i8 0, i64 30, i32 1, i1 false)
%8 = call i32 (%struct.bpf_map_def*, %struct.routing_key_2*, ...) bitcast (i32 (...)* @bpf_map_lookup_elem to i32 (%struct.bpf_map_def*, %struct.routing_key_2*, ...)*)(%struct.bpf_map_def* nonnull @routing, %struct.routing_key_2* nonnull %key) #3 %8 = call i32 (%struct.bpf_map_def*, %struct.routing_key_2*, ...) bitcast (i32 (...)* @bpf_map_lookup_elem to i32 (%struct.bpf_map_def*, %struct.routing_key_2*, ...)*)(%struct.bpf_map_def* nonnull @routing, %struct.routing_key_2* nonnull %key) #3
ret i32 undef ret i32 undef


@ -1,4 +1,4 @@
; RUN: llc -march=msp430 -combiner-alias-analysis < %s | FileCheck %s ; RUN: llc -march=msp430 < %s | FileCheck %s
target datalayout = "e-p:16:8:8-i8:8:8-i16:8:8-i32:8:8" target datalayout = "e-p:16:8:8-i8:8:8-i16:8:8-i32:8:8"
target triple = "msp430-generic-generic" target triple = "msp430-generic-generic"
@foo = common global i16 0, align 2 @foo = common global i16 0, align 2


@ -63,39 +63,39 @@ entry:
; NEW-DAG: sd $5, 16([[R2]]) ; NEW-DAG: sd $5, 16([[R2]])
; O32 has run out of argument registers and starts using the stack ; O32 has run out of argument registers and starts using the stack
; O32-DAG: lw [[R3:\$([0-9]+|gp)]], 24($sp) ; O32-DAG: lw [[R3:\$([0-9]+|gp)]], 16($sp)
; O32-DAG: lw [[R4:\$([0-9]+|gp)]], 28($sp) ; O32-DAG: lw [[R4:\$([0-9]+|gp)]], 20($sp)
; O32-DAG: sw [[R3]], 24([[R2]]) ; O32-DAG: sw [[R3]], 24([[R2]])
; O32-DAG: sw [[R4]], 28([[R2]]) ; O32-DAG: sw [[R4]], 28([[R2]])
; NEW-DAG: sd $6, 24([[R2]]) ; NEW-DAG: sd $6, 24([[R2]])
; O32-DAG: lw [[R3:\$([0-9]+|gp)]], 32($sp) ; O32-DAG: lw [[R3:\$([0-9]+|gp)]], 24($sp)
; O32-DAG: lw [[R4:\$([0-9]+|gp)]], 36($sp) ; O32-DAG: lw [[R4:\$([0-9]+|gp)]], 28($sp)
; O32-DAG: sw [[R3]], 32([[R2]]) ; O32-DAG: sw [[R3]], 32([[R2]])
; O32-DAG: sw [[R4]], 36([[R2]]) ; O32-DAG: sw [[R4]], 36([[R2]])
; NEW-DAG: sd $7, 32([[R2]]) ; NEW-DAG: sd $7, 32([[R2]])
; O32-DAG: lw [[R3:\$([0-9]+|gp)]], 40($sp) ; O32-DAG: lw [[R3:\$([0-9]+|gp)]], 32($sp)
; O32-DAG: lw [[R4:\$([0-9]+|gp)]], 44($sp) ; O32-DAG: lw [[R4:\$([0-9]+|gp)]], 36($sp)
; O32-DAG: sw [[R3]], 40([[R2]]) ; O32-DAG: sw [[R3]], 40([[R2]])
; O32-DAG: sw [[R4]], 44([[R2]]) ; O32-DAG: sw [[R4]], 44([[R2]])
; NEW-DAG: sd $8, 40([[R2]]) ; NEW-DAG: sd $8, 40([[R2]])
; O32-DAG: lw [[R3:\$([0-9]+|gp)]], 48($sp) ; O32-DAG: lw [[R3:\$([0-9]+|gp)]], 40($sp)
; O32-DAG: lw [[R4:\$([0-9]+|gp)]], 52($sp) ; O32-DAG: lw [[R4:\$([0-9]+|gp)]], 44($sp)
; O32-DAG: sw [[R3]], 48([[R2]]) ; O32-DAG: sw [[R3]], 48([[R2]])
; O32-DAG: sw [[R4]], 52([[R2]]) ; O32-DAG: sw [[R4]], 52([[R2]])
; NEW-DAG: sd $9, 48([[R2]]) ; NEW-DAG: sd $9, 48([[R2]])
; O32-DAG: lw [[R3:\$([0-9]+|gp)]], 56($sp) ; O32-DAG: lw [[R3:\$([0-9]+|gp)]], 48($sp)
; O32-DAG: lw [[R4:\$([0-9]+|gp)]], 60($sp) ; O32-DAG: lw [[R4:\$([0-9]+|gp)]], 52($sp)
; O32-DAG: sw [[R3]], 56([[R2]]) ; O32-DAG: sw [[R3]], 56([[R2]])
; O32-DAG: sw [[R4]], 60([[R2]]) ; O32-DAG: sw [[R4]], 60([[R2]])
; NEW-DAG: sd $10, 56([[R2]]) ; NEW-DAG: sd $10, 56([[R2]])
; N32/N64 have run out of registers and starts using the stack too ; N32/N64 have run out of registers and starts using the stack too
; O32-DAG: lw [[R3:\$[0-9]+]], 64($sp) ; O32-DAG: lw [[R3:\$[0-9]+]], 56($sp)
; O32-DAG: lw [[R4:\$[0-9]+]], 68($sp) ; O32-DAG: lw [[R4:\$[0-9]+]], 60($sp)
; O32-DAG: sw [[R3]], 64([[R2]]) ; O32-DAG: sw [[R3]], 64([[R2]])
; O32-DAG: sw [[R4]], 68([[R2]]) ; O32-DAG: sw [[R4]], 68([[R2]])
; NEW-DAG: ld [[R3:\$[0-9]+]], 0($sp) ; NEW-DAG: ld [[R3:\$[0-9]+]], 0($sp)


@ -315,12 +315,11 @@ entry:
; Big-endian mode for N32/N64 must add an additional 4 to the offset due to byte ; Big-endian mode for N32/N64 must add an additional 4 to the offset due to byte
; order. ; order.
; O32-DAG: addiu [[GV:\$[0-9]+]], ${{[0-9]+}}, %lo(dwords) ; O32-DAG: addiu [[GV:\$[0-9]+]], ${{[0-9]+}}, %lo(dwords)
; O32-DAG: lw [[ARG1:\$[0-9]+]], 0([[VA]]) ; O32-DAG: lw [[ARG1:\$[0-9]+]], 0([[VA_TMP2]])
; O32-DAG: sw [[ARG1]], 8([[GV]]) ; O32-DAG: sw [[ARG1]], 8([[GV]])
; O32-DAG: lw [[VA:\$[0-9]+]], 0([[SP]]) ; O32-DAG: addiu [[VA3:\$[0-9]+]], [[VA2]], 4
; O32-DAG: addiu [[VA2:\$[0-9]+]], [[VA]], 4 ; O32-DAG: sw [[VA3]], 0([[SP]])
; O32-DAG: sw [[VA2]], 0([[SP]]) ; O32-DAG: lw [[ARG1:\$[0-9]+]], 4([[VA_TMP2]])
; O32-DAG: lw [[ARG1:\$[0-9]+]], 0([[VA]])
; O32-DAG: sw [[ARG1]], 12([[GV]]) ; O32-DAG: sw [[ARG1]], 12([[GV]])
; N32-DAG: addiu [[GV:\$[0-9]+]], ${{[0-9]+}}, %lo(dwords) ; N32-DAG: addiu [[GV:\$[0-9]+]], ${{[0-9]+}}, %lo(dwords)
@ -349,10 +348,9 @@ entry:
; Load the second argument from the variable portion and copy it to the global. ; Load the second argument from the variable portion and copy it to the global.
; O32-DAG: lw [[ARG2:\$[0-9]+]], 0([[VA]]) ; O32-DAG: lw [[ARG2:\$[0-9]+]], 0([[VA]])
; O32-DAG: sw [[ARG2]], 16([[GV]]) ; O32-DAG: sw [[ARG2]], 16([[GV]])
; O32-DAG: lw [[VA:\$[0-9]+]], 0([[SP]]) ; O32-DAG: addiu [[VA3:\$[0-9]+]], [[VA2]], 4
; O32-DAG: addiu [[VA2:\$[0-9]+]], [[VA]], 4 ; O32-DAG: sw [[VA3]], 0([[SP]])
; O32-DAG: sw [[VA2]], 0([[SP]]) ; O32-DAG: lw [[ARG2:\$[0-9]+]], 4([[VA_TMP2]])
; O32-DAG: lw [[ARG2:\$[0-9]+]], 0([[VA]])
; O32-DAG: sw [[ARG2]], 20([[GV]]) ; O32-DAG: sw [[ARG2]], 20([[GV]])
; NEW-DAG: ld [[ARG2:\$[0-9]+]], 0([[VA2]]) ; NEW-DAG: ld [[ARG2:\$[0-9]+]], 0([[VA2]])
@ -678,12 +676,11 @@ entry:
; Big-endian mode for N32/N64 must add an additional 4 to the offset due to byte ; Big-endian mode for N32/N64 must add an additional 4 to the offset due to byte
; order. ; order.
; O32-DAG: addiu [[GV:\$[0-9]+]], ${{[0-9]+}}, %lo(dwords) ; O32-DAG: addiu [[GV:\$[0-9]+]], ${{[0-9]+}}, %lo(dwords)
; O32-DAG: lw [[ARG1:\$[0-9]+]], 0([[VA]]) ; O32-DAG: lw [[ARG1:\$[0-9]+]], 0([[VA_TMP2]])
; O32-DAG: sw [[ARG1]], 8([[GV]]) ; O32-DAG: sw [[ARG1]], 8([[GV]])
; O32-DAG: lw [[VA:\$[0-9]+]], 0([[SP]]) ; O32-DAG: addiu [[VA3:\$[0-9]+]], [[VA2]], 4
; O32-DAG: addiu [[VA2:\$[0-9]+]], [[VA]], 4 ; O32-DAG: sw [[VA3]], 0([[SP]])
; O32-DAG: sw [[VA2]], 0([[SP]]) ; O32-DAG: lw [[ARG1:\$[0-9]+]], 4([[VA_TMP2]])
; O32-DAG: lw [[ARG1:\$[0-9]+]], 0([[VA]])
; O32-DAG: sw [[ARG1]], 12([[GV]]) ; O32-DAG: sw [[ARG1]], 12([[GV]])
; N32-DAG: addiu [[GV:\$[0-9]+]], ${{[0-9]+}}, %lo(dwords) ; N32-DAG: addiu [[GV:\$[0-9]+]], ${{[0-9]+}}, %lo(dwords)
@ -712,10 +709,9 @@ entry:
; Load the second argument from the variable portion and copy it to the global. ; Load the second argument from the variable portion and copy it to the global.
; O32-DAG: lw [[ARG2:\$[0-9]+]], 0([[VA]]) ; O32-DAG: lw [[ARG2:\$[0-9]+]], 0([[VA]])
; O32-DAG: sw [[ARG2]], 16([[GV]]) ; O32-DAG: sw [[ARG2]], 16([[GV]])
; O32-DAG: lw [[VA:\$[0-9]+]], 0([[SP]]) ; O32-DAG: addiu [[VA3:\$[0-9]+]], [[VA2]], 4
; O32-DAG: addiu [[VA2:\$[0-9]+]], [[VA]], 4
; O32-DAG: sw [[VA2]], 0([[SP]]) ; O32-DAG: sw [[VA2]], 0([[SP]])
; O32-DAG: lw [[ARG2:\$[0-9]+]], 0([[VA]]) ; O32-DAG: lw [[ARG2:\$[0-9]+]], 4([[VA_TMP2]])
; O32-DAG: sw [[ARG2]], 20([[GV]]) ; O32-DAG: sw [[ARG2]], 20([[GV]])
; NEW-DAG: ld [[ARG2:\$[0-9]+]], 0([[VA2]]) ; NEW-DAG: ld [[ARG2:\$[0-9]+]], 0([[VA2]])
@ -1040,10 +1036,9 @@ entry:
; O32-DAG: addiu [[GV:\$[0-9]+]], ${{[0-9]+}}, %lo(dwords) ; O32-DAG: addiu [[GV:\$[0-9]+]], ${{[0-9]+}}, %lo(dwords)
; O32-DAG: lw [[ARG1:\$[0-9]+]], 0([[VA]]) ; O32-DAG: lw [[ARG1:\$[0-9]+]], 0([[VA]])
; O32-DAG: sw [[ARG1]], 8([[GV]]) ; O32-DAG: sw [[ARG1]], 8([[GV]])
; O32-DAG: lw [[VA:\$[0-9]+]], 0([[SP]]) ; O32-DAG: addiu [[VA3:\$[0-9]+]], [[VA2]], 4
; O32-DAG: addiu [[VA2:\$[0-9]+]], [[VA]], 4 ; O32-DAG: sw [[VA3]], 0([[SP]])
; O32-DAG: sw [[VA2]], 0([[SP]]) ; O32-DAG: lw [[ARG1:\$[0-9]+]], 4([[VA_TMP2]])
; O32-DAG: lw [[ARG1:\$[0-9]+]], 0([[VA]])
; O32-DAG: sw [[ARG1]], 12([[GV]]) ; O32-DAG: sw [[ARG1]], 12([[GV]])
; N32-DAG: addiu [[GV:\$[0-9]+]], ${{[0-9]+}}, %lo(dwords) ; N32-DAG: addiu [[GV:\$[0-9]+]], ${{[0-9]+}}, %lo(dwords)
@ -1072,10 +1067,9 @@ entry:
; Load the second argument from the variable portion and copy it to the global. ; Load the second argument from the variable portion and copy it to the global.
; O32-DAG: lw [[ARG2:\$[0-9]+]], 0([[VA]]) ; O32-DAG: lw [[ARG2:\$[0-9]+]], 0([[VA]])
; O32-DAG: sw [[ARG2]], 16([[GV]]) ; O32-DAG: sw [[ARG2]], 16([[GV]])
; O32-DAG: lw [[VA:\$[0-9]+]], 0([[SP]]) ; O32-DAG: addiu [[VA3:\$[0-9]+]], [[VA2]], 4
; O32-DAG: addiu [[VA2:\$[0-9]+]], [[VA]], 4 ; O32-DAG: sw [[VA3]], 0([[SP]])
; O32-DAG: sw [[VA2]], 0([[SP]]) ; O32-DAG: lw [[ARG2:\$[0-9]+]], 4([[VA_TMP2]])
; O32-DAG: lw [[ARG2:\$[0-9]+]], 0([[VA]])
; O32-DAG: sw [[ARG2]], 20([[GV]]) ; O32-DAG: sw [[ARG2]], 20([[GV]])
; NEW-DAG: ld [[ARG2:\$[0-9]+]], 0([[VA2]]) ; NEW-DAG: ld [[ARG2:\$[0-9]+]], 0([[VA2]])


@ -132,20 +132,19 @@ entry:
define internal fastcc void @callee0(i32 %a0, i32 %a1, i32 %a2, i32 %a3, i32 %a4, i32 %a5, i32 %a6, i32 %a7, i32 %a8, i32 %a9, i32 %a10, i32 %a11, i32 %a12, i32 %a13, i32 %a14, i32 %a15, i32 %a16) nounwind noinline { define internal fastcc void @callee0(i32 %a0, i32 %a1, i32 %a2, i32 %a3, i32 %a4, i32 %a5, i32 %a6, i32 %a7, i32 %a8, i32 %a9, i32 %a10, i32 %a11, i32 %a12, i32 %a13, i32 %a14, i32 %a15, i32 %a16) nounwind noinline {
entry: entry:
; CHECK: callee0 ; CHECK: callee0
; CHECK: sw $4 ; CHECK-DAG: sw $4
; CHECK: sw $5 ; CHECK-DAG: sw $5
; CHECK: sw $6 ; CHECK-DAG: sw $7
; CHECK: sw $7 ; CHECK-DAG: sw $8
; CHECK: sw $8 ; CHECK-DAG: sw $9
; CHECK: sw $9 ; CHECK-DAG: sw $10
; CHECK: sw $10 ; CHECK-DAG: sw $11
; CHECK: sw $11 ; CHECK-DAG: sw $12
; CHECK: sw $12 ; CHECK-DAG: sw $13
; CHECK: sw $13 ; CHECK-DAG: sw $14
; CHECK: sw $14 ; CHECK-DAG: sw $15
; CHECK: sw $15 ; CHECK-DAG: sw $24
; CHECK: sw $24 ; CHECK-DAG: sw $3
; CHECK: sw $3
; t6, t7 and t8 are reserved in NaCl and cannot be used for fastcc. ; t6, t7 and t8 are reserved in NaCl and cannot be used for fastcc.
; CHECK-NACL-NOT: sw $14 ; CHECK-NACL-NOT: sw $14
@ -223,27 +222,27 @@ entry:
define internal fastcc void @callee1(float %a0, float %a1, float %a2, float %a3, float %a4, float %a5, float %a6, float %a7, float %a8, float %a9, float %a10, float %a11, float %a12, float %a13, float %a14, float %a15, float %a16, float %a17, float %a18, float %a19, float %a20) nounwind noinline { define internal fastcc void @callee1(float %a0, float %a1, float %a2, float %a3, float %a4, float %a5, float %a6, float %a7, float %a8, float %a9, float %a10, float %a11, float %a12, float %a13, float %a14, float %a15, float %a16, float %a17, float %a18, float %a19, float %a20) nounwind noinline {
entry: entry:
; CHECK: callee1 ; CHECK-LABEL: callee1:
; CHECK: swc1 $f0 ; CHECK-DAG: swc1 $f0
; CHECK: swc1 $f1 ; CHECK-DAG: swc1 $f1
; CHECK: swc1 $f2 ; CHECK-DAG: swc1 $f2
; CHECK: swc1 $f3 ; CHECK-DAG: swc1 $f3
; CHECK: swc1 $f4 ; CHECK-DAG: swc1 $f4
; CHECK: swc1 $f5 ; CHECK-DAG: swc1 $f5
; CHECK: swc1 $f6 ; CHECK-DAG: swc1 $f6
; CHECK: swc1 $f7 ; CHECK-DAG: swc1 $f7
; CHECK: swc1 $f8 ; CHECK-DAG: swc1 $f8
; CHECK: swc1 $f9 ; CHECK-DAG: swc1 $f9
; CHECK: swc1 $f10 ; CHECK-DAG: swc1 $f10
; CHECK: swc1 $f11 ; CHECK-DAG: swc1 $f11
; CHECK: swc1 $f12 ; CHECK-DAG: swc1 $f12
; CHECK: swc1 $f13 ; CHECK-DAG: swc1 $f13
; CHECK: swc1 $f14 ; CHECK-DAG: swc1 $f14
; CHECK: swc1 $f15 ; CHECK-DAG: swc1 $f15
; CHECK: swc1 $f16 ; CHECK-DAG: swc1 $f16
; CHECK: swc1 $f17 ; CHECK-DAG: swc1 $f17
; CHECK: swc1 $f18 ; CHECK-DAG: swc1 $f18
; CHECK: swc1 $f19 ; CHECK-DAG: swc1 $f19
store float %a0, float* @gf0, align 4 store float %a0, float* @gf0, align 4
store float %a1, float* @gf1, align 4 store float %a1, float* @gf1, align 4
@ -316,8 +315,6 @@ entry:
; NOODDSPREG-LABEL: callee2: ; NOODDSPREG-LABEL: callee2:
; NOODDSPREG: addiu $sp, $sp, -[[OFFSET:[0-9]+]]
; Check that first 10 arguments are received in even float registers ; Check that first 10 arguments are received in even float registers
; f0, f2, ... , f18. Check that 11th argument is received on stack. ; f0, f2, ... , f18. Check that 11th argument is received on stack.
@ -333,7 +330,7 @@ entry:
; NOODDSPREG-DAG: swc1 $f16, 32($[[R0]]) ; NOODDSPREG-DAG: swc1 $f16, 32($[[R0]])
; NOODDSPREG-DAG: swc1 $f18, 36($[[R0]]) ; NOODDSPREG-DAG: swc1 $f18, 36($[[R0]])
; NOODDSPREG-DAG: lwc1 $[[F0:f[0-9]*[02468]]], [[OFFSET]]($sp) ; NOODDSPREG-DAG: lwc1 $[[F0:f[0-9]*[02468]]], 0($sp)
; NOODDSPREG-DAG: swc1 $[[F0]], 40($[[R0]]) ; NOODDSPREG-DAG: swc1 $[[F0]], 40($[[R0]])
store float %a0, float* getelementptr ([11 x float], [11 x float]* @fa, i32 0, i32 0), align 4 store float %a0, float* getelementptr ([11 x float], [11 x float]* @fa, i32 0, i32 0), align 4
@ -397,7 +394,6 @@ entry:
; FP64-NOODDSPREG-LABEL: callee3: ; FP64-NOODDSPREG-LABEL: callee3:
; FP64-NOODDSPREG: addiu $sp, $sp, -[[OFFSET:[0-9]+]]
; Check that first 10 arguments are received in even float registers ; Check that first 10 arguments are received in even float registers
; f0, f2, ... , f18. Check that 11th argument is received on stack. ; f0, f2, ... , f18. Check that 11th argument is received on stack.
@ -414,7 +410,7 @@ entry:
; FP64-NOODDSPREG-DAG: sdc1 $f16, 64($[[R0]]) ; FP64-NOODDSPREG-DAG: sdc1 $f16, 64($[[R0]])
; FP64-NOODDSPREG-DAG: sdc1 $f18, 72($[[R0]]) ; FP64-NOODDSPREG-DAG: sdc1 $f18, 72($[[R0]])
; FP64-NOODDSPREG-DAG: ldc1 $[[F0:f[0-9]*[02468]]], [[OFFSET]]($sp) ; FP64-NOODDSPREG-DAG: ldc1 $[[F0:f[0-9]*[02468]]], 0($sp)
; FP64-NOODDSPREG-DAG: sdc1 $[[F0]], 80($[[R0]]) ; FP64-NOODDSPREG-DAG: sdc1 $[[F0]], 80($[[R0]])
store double %a0, double* getelementptr ([11 x double], [11 x double]* @da, i32 0, i32 0), align 8 store double %a0, double* getelementptr ([11 x double], [11 x double]* @da, i32 0, i32 0), align 8


@ -250,12 +250,18 @@ entry:
; MIPS64-EB: ld $[[PTR:[0-9]+]], %got_disp(struct_s0)( ; MIPS64-EB: ld $[[PTR:[0-9]+]], %got_disp(struct_s0)(
; MIPS64R6: ld $[[PTR:[0-9]+]], %got_disp(struct_s0)( ; MIPS64R6: ld $[[PTR:[0-9]+]], %got_disp(struct_s0)(
; FIXME: We should be able to do better than this on MIPS32r6/MIPS64r6 since ; MIPS32-DAG: lbu $[[R1:[0-9]+]], 0($[[PTR]])
; we have unaligned halfword load/store available ; MIPS32-DAG: sb $[[R1]], 2($[[PTR]])
; ALL-DAG: lbu $[[R1:[0-9]+]], 0($[[PTR]]) ; MIPS32-DAG: lbu $[[R2:[0-9]+]], 1($[[PTR]])
; ALL-DAG: sb $[[R1]], 2($[[PTR]]) ; MIPS32-DAG: sb $[[R2]], 3($[[PTR]])
; ALL-DAG: lbu $[[R1:[0-9]+]], 1($[[PTR]])
; ALL-DAG: sb $[[R1]], 3($[[PTR]]) ; MIPS32R6: lhu $[[R1:[0-9]+]], 0($[[PTR]])
; MIPS32R6: sh $[[R1]], 2($[[PTR]])
; MIPS64-DAG: lbu $[[R1:[0-9]+]], 0($[[PTR]])
; MIPS64-DAG: sb $[[R1]], 2($[[PTR]])
; MIPS64-DAG: lbu $[[R2:[0-9]+]], 1($[[PTR]])
; MIPS64-DAG: sb $[[R2]], 3($[[PTR]])
%0 = load %struct.S0, %struct.S0* getelementptr inbounds (%struct.S0, %struct.S0* @struct_s0, i32 0), align 1 %0 = load %struct.S0, %struct.S0* getelementptr inbounds (%struct.S0, %struct.S0* @struct_s0, i32 0), align 1
store %struct.S0 %0, %struct.S0* getelementptr inbounds (%struct.S0, %struct.S0* @struct_s0, i32 1), align 1 store %struct.S0 %0, %struct.S0* getelementptr inbounds (%struct.S0, %struct.S0* @struct_s0, i32 1), align 1
@ -268,37 +274,54 @@ entry:
; MIPS32-EL: lw $[[PTR:[0-9]+]], %got(struct_s1)( ; MIPS32-EL: lw $[[PTR:[0-9]+]], %got(struct_s1)(
; MIPS32-EB: lw $[[PTR:[0-9]+]], %got(struct_s1)( ; MIPS32-EB: lw $[[PTR:[0-9]+]], %got(struct_s1)(
; MIPS32-DAG: lbu $[[R1:[0-9]+]], 0($[[PTR]]) ; MIPS32-EL-DAG: lwl $[[R1:[0-9]+]], 3($[[PTR]])
; MIPS32-DAG: sb $[[R1]], 4($[[PTR]]) ; MIPS32-EL-DAG: lwr $[[R1]], 0($[[PTR]])
; MIPS32-DAG: lbu $[[R1:[0-9]+]], 1($[[PTR]]) ; MIPS32-EL-DAG: swl $[[R1]], 7($[[PTR]])
; MIPS32-DAG: sb $[[R1]], 5($[[PTR]]) ; MIPS32-EL-DAG: swr $[[R1]], 4($[[PTR]])
; MIPS32-DAG: lbu $[[R1:[0-9]+]], 2($[[PTR]]) ; MIPS32-EB-DAG: lwl $[[R1:[0-9]+]], 0($[[PTR]])
; MIPS32-DAG: sb $[[R1]], 6($[[PTR]]) ; MIPS32-EB-DAG: lwr $[[R1]], 3($[[PTR]])
; MIPS32-DAG: lbu $[[R1:[0-9]+]], 3($[[PTR]]) ; MIPS32-EB-DAG: swl $[[R1]], 4($[[PTR]])
; MIPS32-DAG: sb $[[R1]], 7($[[PTR]]) ; MIPS32-EB-DAG: swr $[[R1]], 7($[[PTR]])
; MIPS32-NOLEFTRIGHT-DAG: lbu $[[R1:[0-9]+]], 0($[[PTR]])
; MIPS32-NOLEFTRIGHT-DAG: sb $[[R1]], 4($[[PTR]])
; MIPS32-NOLEFTRIGHT-DAG: lbu $[[R1:[0-9]+]], 1($[[PTR]])
; MIPS32-NOLEFTRIGHT-DAG: sb $[[R1]], 5($[[PTR]])
; MIPS32-NOLEFTRIGHT-DAG: lbu $[[R1:[0-9]+]], 2($[[PTR]])
; MIPS32-NOLEFTRIGHT-DAG: sb $[[R1]], 6($[[PTR]])
; MIPS32-NOLEFTRIGHT-DAG: lbu $[[R1:[0-9]+]], 3($[[PTR]])
; MIPS32-NOLEFTRIGHT-DAG: sb $[[R1]], 7($[[PTR]])
; MIPS32R6: lw $[[PTR:[0-9]+]], %got(struct_s1)( ; MIPS32R6: lw $[[PTR:[0-9]+]], %got(struct_s1)(
; MIPS32R6-DAG: lhu $[[R1:[0-9]+]], 0($[[PTR]]) ; MIPS32R6-DAG: lw $[[R1:[0-9]+]], 0($[[PTR]])
; MIPS32R6-DAG: sh $[[R1]], 4($[[PTR]]) ; MIPS32R6-DAG: sw $[[R1]], 4($[[PTR]])
; MIPS32R6-DAG: lhu $[[R1:[0-9]+]], 2($[[PTR]])
; MIPS32R6-DAG: sh $[[R1]], 6($[[PTR]])
; MIPS64-EL: ld $[[PTR:[0-9]+]], %got_disp(struct_s1)( ; MIPS64-EL: ld $[[PTR:[0-9]+]], %got_disp(struct_s1)(
; MIPS64-EB: ld $[[PTR:[0-9]+]], %got_disp(struct_s1)( ; MIPS64-EB: ld $[[PTR:[0-9]+]], %got_disp(struct_s1)(
; MIPS64-DAG: lbu $[[R1:[0-9]+]], 0($[[PTR]])
; MIPS64-DAG: sb $[[R1]], 4($[[PTR]]) ; MIPS64-EL-DAG: lwl $[[R1:[0-9]+]], 3($[[PTR]])
; MIPS64-DAG: lbu $[[R1:[0-9]+]], 1($[[PTR]]) ; MIPS64-EL-DAG: lwr $[[R1]], 0($[[PTR]])
; MIPS64-DAG: sb $[[R1]], 5($[[PTR]]) ; MIPS64-EL-DAG: swl $[[R1]], 7($[[PTR]])
; MIPS64-DAG: lbu $[[R1:[0-9]+]], 2($[[PTR]]) ; MIPS64-EL-DAG: swr $[[R1]], 4($[[PTR]])
; MIPS64-DAG: sb $[[R1]], 6($[[PTR]])
; MIPS64-DAG: lbu $[[R1:[0-9]+]], 3($[[PTR]]) ; MIPS64-EB-DAG: lwl $[[R1:[0-9]+]], 0($[[PTR]])
; MIPS64-DAG: sb $[[R1]], 7($[[PTR]]) ; MIPS64-EB-DAG: lwr $[[R1]], 3($[[PTR]])
; MIPS64-EB-DAG: swl $[[R1]], 4($[[PTR]])
; MIPS64-EB-DAG: swr $[[R1]], 7($[[PTR]])
; MIPS64-NOLEFTRIGHT-DAG: lbu $[[R1:[0-9]+]], 0($[[PTR]])
; MIPS64-NOLEFTRIGHT-DAG: sb $[[R1]], 4($[[PTR]])
; MIPS64-NOLEFTRIGHT-DAG: lbu $[[R1:[0-9]+]], 1($[[PTR]])
; MIPS64-NOLEFTRIGHT-DAG: sb $[[R1]], 5($[[PTR]])
; MIPS64-NOLEFTRIGHT-DAG: lbu $[[R1:[0-9]+]], 2($[[PTR]])
; MIPS64-NOLEFTRIGHT-DAG: sb $[[R1]], 6($[[PTR]])
; MIPS64-NOLEFTRIGHT-DAG: lbu $[[R1:[0-9]+]], 3($[[PTR]])
; MIPS64-NOLEFTRIGHT-DAG: sb $[[R1]], 7($[[PTR]])
; MIPS64R6: ld $[[PTR:[0-9]+]], %got_disp(struct_s1)( ; MIPS64R6: ld $[[PTR:[0-9]+]], %got_disp(struct_s1)(
; MIPS64R6-DAG: lhu $[[R1:[0-9]+]], 0($[[PTR]]) ; MIPS64R6-DAG: lw $[[R1:[0-9]+]], 0($[[PTR]])
; MIPS64R6-DAG: sh $[[R1]], 4($[[PTR]]) ; MIPS64R6-DAG: sw $[[R1]], 4($[[PTR]])
; MIPS64R6-DAG: lhu $[[R1:[0-9]+]], 2($[[PTR]])
; MIPS64R6-DAG: sh $[[R1]], 6($[[PTR]])
%0 = load %struct.S1, %struct.S1* getelementptr inbounds (%struct.S1, %struct.S1* @struct_s1, i32 0), align 1 %0 = load %struct.S1, %struct.S1* getelementptr inbounds (%struct.S1, %struct.S1* @struct_s1, i32 0), align 1
store %struct.S1 %0, %struct.S1* getelementptr inbounds (%struct.S1, %struct.S1* @struct_s1, i32 1), align 1 store %struct.S1 %0, %struct.S1* getelementptr inbounds (%struct.S1, %struct.S1* @struct_s1, i32 1), align 1
@ -336,30 +359,21 @@ entry:
; MIPS32R6-DAG: sw $[[R1]], 12($[[PTR]]) ; MIPS32R6-DAG: sw $[[R1]], 12($[[PTR]])
; MIPS64-EL: ld $[[PTR:[0-9]+]], %got_disp(struct_s2)( ; MIPS64-EL: ld $[[PTR:[0-9]+]], %got_disp(struct_s2)(
; MIPS64-EL-DAG: lwl $[[R1:[0-9]+]], 3($[[PTR]])
; MIPS64-EL-DAG: lwr $[[R1]], 0($[[PTR]]) ; MIPS64-EL-DAG: ldl $[[R1:[0-9]+]], 7($[[PTR]])
; MIPS64-EL-DAG: swl $[[R1]], 11($[[PTR]]) ; MIPS64-EL-DAG: ldr $[[R1]], 0($[[PTR]])
; MIPS64-EL-DAG: swr $[[R1]], 8($[[PTR]]) ; MIPS64-EL-DAG: sdl $[[R1]], 15($[[PTR]])
; MIPS64-EL-DAG: lwl $[[R1:[0-9]+]], 7($[[PTR]]) ; MIPS64-EL-DAG: sdr $[[R1]], 8($[[PTR]])
; MIPS64-EL-DAG: lwr $[[R1]], 4($[[PTR]])
; MIPS64-EL-DAG: swl $[[R1]], 15($[[PTR]])
; MIPS64-EL-DAG: swr $[[R1]], 12($[[PTR]])
; MIPS64-EB: ld $[[PTR:[0-9]+]], %got_disp(struct_s2)( ; MIPS64-EB: ld $[[PTR:[0-9]+]], %got_disp(struct_s2)(
; MIPS64-EB-DAG: lwl $[[R1:[0-9]+]], 0($[[PTR]]) ; MIPS64-EB-DAG: ldl $[[R1:[0-9]+]], 0($[[PTR]])
; MIPS64-EB-DAG: lwr $[[R1]], 3($[[PTR]]) ; MIPS64-EB-DAG: ldr $[[R1]], 7($[[PTR]])
; MIPS64-EB-DAG: swl $[[R1]], 8($[[PTR]]) ; MIPS64-EB-DAG: sdl $[[R1]], 8($[[PTR]])
; MIPS64-EB-DAG: swr $[[R1]], 11($[[PTR]]) ; MIPS64-EB-DAG: sdr $[[R1]], 15($[[PTR]])
; MIPS64-EB-DAG: lwl $[[R1:[0-9]+]], 4($[[PTR]])
; MIPS64-EB-DAG: lwr $[[R1]], 7($[[PTR]])
; MIPS64-EB-DAG: swl $[[R1]], 12($[[PTR]])
; MIPS64-EB-DAG: swr $[[R1]], 15($[[PTR]])
; MIPS64R6: ld $[[PTR:[0-9]+]], %got_disp(struct_s2)( ; MIPS64R6: ld $[[PTR:[0-9]+]], %got_disp(struct_s2)(
; MIPS64R6-DAG: lw $[[R1:[0-9]+]], 0($[[PTR]]) ; MIPS64R6-DAG: ld $[[R1:[0-9]+]], 0($[[PTR]])
; MIPS64R6-DAG: sw $[[R1]], 8($[[PTR]]) ; MIPS64R6-DAG: sd $[[R1]], 8($[[PTR]])
; MIPS64R6-DAG: lw $[[R1:[0-9]+]], 4($[[PTR]])
; MIPS64R6-DAG: sw $[[R1]], 12($[[PTR]])
%0 = load %struct.S2, %struct.S2* getelementptr inbounds (%struct.S2, %struct.S2* @struct_s2, i32 0), align 1 %0 = load %struct.S2, %struct.S2* getelementptr inbounds (%struct.S2, %struct.S2* @struct_s2, i32 0), align 1
store %struct.S2 %0, %struct.S2* getelementptr inbounds (%struct.S2, %struct.S2* @struct_s2, i32 1), align 1 store %struct.S2 %0, %struct.S2* getelementptr inbounds (%struct.S2, %struct.S2* @struct_s2, i32 1), align 1
@ -416,17 +430,17 @@ entry:
; MIPS64-EL-DAG: lwl $[[R1:[0-9]+]], 3($[[PTR]]) ; MIPS64-EL-DAG: lwl $[[R1:[0-9]+]], 3($[[PTR]])
; MIPS64-EL-DAG: lwr $[[R1]], 0($[[PTR]]) ; MIPS64-EL-DAG: lwr $[[R1]], 0($[[PTR]])
; MIPS64-EB: ld $[[SPTR:[0-9]+]], %got_disp(arr)( ; MIPS64-EB: ld $[[SPTR:[0-9]+]], %got_disp(arr)(
; MIPS64-EB-DAG: lwl $[[R1:[0-9]+]], 0($[[PTR]])
; MIPS64-EB-DAG: lwr $[[R1]], 3($[[PTR]])
; MIPS64-EB-DAG: dsll $[[R1]], $[[R1]], 32
; MIPS64-EB-DAG: lbu $[[R2:[0-9]+]], 5($[[PTR]]) ; MIPS64-EB-DAG: lbu $[[R2:[0-9]+]], 5($[[PTR]])
; MIPS64-EB-DAG: lbu $[[R3:[0-9]+]], 4($[[PTR]]) ; MIPS64-EB-DAG: lbu $[[R3:[0-9]+]], 4($[[PTR]])
; MIPS64-EB-DAG: dsll $[[T0:[0-9]+]], $[[R3]], 8 ; MIPS64-EB-DAG: dsll $[[T0:[0-9]+]], $[[R3]], 8
; MIPS64-EB-DAG: or $[[T1:[0-9]+]], $[[T0]], $[[R2]] ; MIPS64-EB-DAG: or $[[T1:[0-9]+]], $[[T0]], $[[R2]]
; MIPS64-EB-DAG: dsll $[[T1]], $[[T1]], 16
; MIPS64-EB-DAG: or $[[T3:[0-9]+]], $[[R1]], $[[T1]]
; MIPS64-EB-DAG: lbu $[[R4:[0-9]+]], 6($[[PTR]]) ; MIPS64-EB-DAG: lbu $[[R4:[0-9]+]], 6($[[PTR]])
; MIPS64-EB-DAG: dsll $[[T1]], $[[T1]], 16
; MIPS64-EB-DAG: lwl $[[R1:[0-9]+]], 0($[[PTR]])
; MIPS64-EB-DAG: lwr $[[R1]], 3($[[PTR]])
; MIPS64-EB-DAG: dsll $[[R5:[0-9]+]], $[[R1]], 32
; MIPS64-EB-DAG: or $[[T3:[0-9]+]], $[[R5]], $[[T1]]
; MIPS64-EB-DAG: dsll $[[T4:[0-9]+]], $[[R4]], 8 ; MIPS64-EB-DAG: dsll $[[T4:[0-9]+]], $[[R4]], 8
; MIPS64-EB-DAG: or $4, $[[T3]], $[[T4]] ; MIPS64-EB-DAG: or $4, $[[T3]], $[[T4]]


@ -13,6 +13,6 @@ entry:
ret i32 0 ret i32 0
} }
; CHECK: li16 ${{[2-7]|16|17}}, 1
; CHECK: addiu ${{[0-9]+}}, $zero, 2148 ; CHECK: addiu ${{[0-9]+}}, $zero, 2148
; CHECK: li16 ${{[2-7]|16|17}}, 1
; CHECK: ori ${{[0-9]+}}, $zero, 33332 ; CHECK: ori ${{[0-9]+}}, $zero, 33332


@ -573,10 +573,10 @@ entry:
; ALL-LABEL: store_LD_LD: ; ALL-LABEL: store_LD_LD:
; ALL: ld $[[R0:[0-9]+]], %got_disp(gld1) ; ALL: ld $[[R0:[0-9]+]], %got_disp(gld1)
; ALL: ld $[[R1:[0-9]+]], 0($[[R0]])
; ALL: ld $[[R2:[0-9]+]], 8($[[R0]]) ; ALL: ld $[[R2:[0-9]+]], 8($[[R0]])
; ALL: ld $[[R3:[0-9]+]], %got_disp(gld0) ; ALL: ld $[[R3:[0-9]+]], %got_disp(gld0)
; ALL: sd $[[R2]], 8($[[R3]]) ; ALL: sd $[[R2]], 8($[[R3]])
; ALL: ld $[[R1:[0-9]+]], 0($[[R0]])
; ALL: sd $[[R1]], 0($[[R3]]) ; ALL: sd $[[R1]], 0($[[R3]])
define void @store_LD_LD() { define void @store_LD_LD() {


@ -130,12 +130,12 @@
; MM-MNO-PIC: addiu $[[R1:[0-9]+]], $[[R0]], %lo(_gp_disp) ; MM-MNO-PIC: addiu $[[R1:[0-9]+]], $[[R0]], %lo(_gp_disp)
; MM-MNO-PIC: addu $[[R2:[0-9]+]], $[[R1]], $25 ; MM-MNO-PIC: addu $[[R2:[0-9]+]], $[[R1]], $25
; MM-MNO-PIC: lw $[[R3:[0-9]+]], %got(g0)($[[R2]]) ; MM-MNO-PIC: lw $[[R3:[0-9]+]], %got(g0)($[[R2]])
; MM-MNO-PIC: lw16 $[[R4:[0-9]+]], 0($[[R3]]) ; MM-MNO-PIC-DAG: lw16 $[[R4:[0-9]+]], 0($[[R3]])
; MM-MNO-PIC: lw16 $[[R5:[0-9]+]], 4($[[R3]]) ; MM-MNO-PIC-DAG: lw16 $[[R5:[0-9]+]], 4($[[R3]])
; MM-MNO-LE-PIC: mtc1 $[[R4]], $f0 ; MM-MNO-LE-PIC-DAG: mtc1 $[[R4]], $f0
; MM-MNO-LE-PIC: mthc1 $[[R5]], $f0 ; MM-MNO-LE-PIC-DAG: mthc1 $[[R5]], $f0
; MM-MNO-BE-PIC: mtc1 $[[R5]], $f0 ; MM-MNO-BE-PIC-DAG: mtc1 $[[R5]], $f0
; MM-MNO-BE-PIC: mthc1 $[[R4]], $f0 ; MM-MNO-BE-PIC-DAG: mthc1 $[[R4]], $f0
; MM-STATIC-PIC: lui $[[R0:[0-9]+]], %hi(g0) ; MM-STATIC-PIC: lui $[[R0:[0-9]+]], %hi(g0)
; MM-STATIC-PIC: ldc1 $f0, %lo(g0)($[[R0]]) ; MM-STATIC-PIC: ldc1 $f0, %lo(g0)($[[R0]])
@ -214,13 +214,13 @@ entry:
; MM-MNO-PIC: lui $[[R0:[0-9]+]], %hi(_gp_disp) ; MM-MNO-PIC: lui $[[R0:[0-9]+]], %hi(_gp_disp)
; MM-MNO-PIC: addiu $[[R1:[0-9]+]], $[[R0]], %lo(_gp_disp) ; MM-MNO-PIC: addiu $[[R1:[0-9]+]], $[[R0]], %lo(_gp_disp)
; MM-MNO-PIC: addu $[[R2:[0-9]+]], $[[R1]], $25 ; MM-MNO-PIC: addu $[[R2:[0-9]+]], $[[R1]], $25
; MM-MNO-LE-PIC: mfc1 $[[R3:[0-9]+]], $f12 ; MM-MNO-LE-PIC-DAG: mfc1 $[[R3:[0-9]+]], $f12
; MM-MNO-BE-PIC: mfhc1 $[[R3:[0-9]+]], $f12 ; MM-MNO-BE-PIC-DAG: mfhc1 $[[R3:[0-9]+]], $f12
; MM-MNO-PIC: lw $[[R4:[0-9]+]], %got(g0)($[[R2]]) ; MM-MNO-PIC-DAG: lw $[[R4:[0-9]+]], %got(g0)($[[R2]])
; MM-MNO-PIC: sw16 $[[R3]], 0($[[R4]]) ; MM-MNO-PIC-DAG: sw16 $[[R3]], 0($[[R4]])
; MM-MNO-LE-PIC: mfhc1 $[[R5:[0-9]+]], $f12 ; MM-MNO-LE-PIC-DAG: mfhc1 $[[R5:[0-9]+]], $f12
; MM-MNO-BE-PIC: mfc1 $[[R5:[0-9]+]], $f12 ; MM-MNO-BE-PIC-DAG: mfc1 $[[R5:[0-9]+]], $f12
; MM-MNO-PIC: sw16 $[[R5]], 4($[[R4]]) ; MM-MNO-PIC-DAG: sw16 $[[R5]], 4($[[R4]])
; MM-STATIC-PIC: lui $[[R0:[0-9]+]], %hi(g0) ; MM-STATIC-PIC: lui $[[R0:[0-9]+]], %hi(g0)
; MM-STATIC-PIC: sdc1 $f12, %lo(g0)($[[R0]]) ; MM-STATIC-PIC: sdc1 $f12, %lo(g0)($[[R0]])
@ -267,8 +267,8 @@ entry:
; MM-MNO-PIC: sll16 $[[R0:[0-9]+]], $5, 3 ; MM-MNO-PIC: sll16 $[[R0:[0-9]+]], $5, 3
; MM-MNO-PIC: addu16 $[[R1:[0-9]+]], $4, $[[R0]] ; MM-MNO-PIC: addu16 $[[R1:[0-9]+]], $4, $[[R0]]
; MM-MNO-PIC: lw16 $[[R2:[0-9]+]], 0($[[R1]]) ; MM-MNO-PIC-DAG: lw16 $[[R2:[0-9]+]], 0($[[R1]])
; MM-MNO-PIC: lw16 $[[R3:[0-9]+]], 4($[[R1]]) ; MM-MNO-PIC-DAG: lw16 $[[R3:[0-9]+]], 4($[[R1]])
; MM-MNO-LE-PIC: mtc1 $[[R2]], $f0 ; MM-MNO-LE-PIC: mtc1 $[[R2]], $f0
; MM-MNO-LE-PIC: mthc1 $[[R3]], $f0 ; MM-MNO-LE-PIC: mthc1 $[[R3]], $f0
; MM-MNO-BE-PIC: mtc1 $[[R3]], $f0 ; MM-MNO-BE-PIC: mtc1 $[[R3]], $f0
@ -313,14 +313,14 @@ entry:
; MM: addu16 $[[R1:[0-9]+]], $6, $[[R0]] ; MM: addu16 $[[R1:[0-9]+]], $6, $[[R0]]
; MM: sdc1 $f12, 0($[[R1]]) ; MM: sdc1 $f12, 0($[[R1]])
; MM-MNO-PIC: sll16 $[[R0:[0-9]+]], $7, 3 ; MM-MNO-PIC: sll16 $[[R0:[0-9]+]], $7, 3
; MM-MNO-PIC: addu16 $[[R1:[0-9]+]], $6, $[[R0]] ; MM-MNO-PIC: addu16 $[[R1:[0-9]+]], $6, $[[R0]]
; MM-MNO-LE-PIC: mfc1 $[[R2:[0-9]+]], $f12 ; MM-MNO-LE-PIC-DAG: mfc1 $[[R2:[0-9]+]], $f12
; MM-MNO-BE-PIC: mfhc1 $[[R2:[0-9]+]], $f12 ; MM-MNO-BE-PIC-DAG: mfhc1 $[[R2:[0-9]+]], $f12
; MM-MNO-PIC: sw16 $[[R2]], 0($[[R1]]) ; MM-MNO-PIC-DAG: sw16 $[[R2]], 0($[[R1]])
; MM-MNO-LE-PIC: mfhc1 $[[R3:[0-9]+]], $f12 ; MM-MNO-LE-PIC-DAG: mfhc1 $[[R3:[0-9]+]], $f12
; MM-MNO-BE-PIC: mfc1 $[[R3:[0-9]+]], $f12 ; MM-MNO-BE-PIC-DAG: mfc1 $[[R3:[0-9]+]], $f12
; MM-MNO-PIC: sw16 $[[R3]], 4($[[R1]]) ; MM-MNO-PIC-DAG: sw16 $[[R3]], 4($[[R1]])
; MM-STATIC-PIC: sll16 $[[R0:[0-9]+]], $7, 3 ; MM-STATIC-PIC: sll16 $[[R0:[0-9]+]], $7, 3
; MM-STATIC-PIC: addu16 $[[R1:[0-9]+]], $6, $[[R0]] ; MM-STATIC-PIC: addu16 $[[R1:[0-9]+]], $6, $[[R0]]


@ -234,15 +234,15 @@ entry:
; MIPS32: insert.w $w[[W0]][1], $[[R1]] ; MIPS32: insert.w $w[[W0]][1], $[[R1]]
; MIPS32: insert.w $w[[W0]][3], $[[R1]] ; MIPS32: insert.w $w[[W0]][3], $[[R1]]
; MIPS64-N64: ld $[[R3:[0-9]+]], %got_disp(h) ; MIPS64-N64-DAG: ld $[[R3:[0-9]+]], %got_disp(h)
; MIPS64-N32: lw $[[R3:[0-9]+]], %got_disp(h) ; MIPS64-N32-DAG: lw $[[R3:[0-9]+]], %got_disp(h)
; MIPS64: dmfc1 $[[R1:[0-9]+]], $f[[F2]] ; MIPS64-DAG: dmfc1 $[[R1:[0-9]+]], $f[[F2]]
; MIPS64: fill.d $w[[W0:[0-9]+]], $[[R1]] ; MIPS64-DAG: fill.d $w[[W0:[0-9]+]], $[[R1]]
; ALL: fexdo.w $w[[W1:[0-9]+]], $w[[W0]], $w[[W0]] ; ALL-DAG: fexdo.w $w[[W1:[0-9]+]], $w[[W0]], $w[[W0]]
; ALL: fexdo.h $w[[W2:[0-9]+]], $w[[W1]], $w[[W1]] ; ALL-DAG: fexdo.h $w[[W2:[0-9]+]], $w[[W1]], $w[[W1]]
; MIPS32: lw $[[R3:[0-9]+]], %got(h) ; MIPS32-DAG: lw $[[R3:[0-9]+]], %got(h)
; ALL: copy_u.h $[[R2:[0-9]+]], $w[[W2]] ; ALL: copy_u.h $[[R2:[0-9]+]], $w[[W2]]
; ALL: sh $[[R2]], 0($[[R3]]) ; ALL: sh $[[R2]], 0($[[R3]])


@ -336,8 +336,8 @@ entry:
; CHECK: llvm_mips_st_b_valid_range_tests: ; CHECK: llvm_mips_st_b_valid_range_tests:
; CHECK: ld.b ; CHECK: ld.b
; CHECK: st.b [[R1:\$w[0-9]+]], -512( ; CHECK-DAG: st.b [[R1:\$w[0-9]+]], -512(
; CHECK: st.b [[R1:\$w[0-9]+]], 511( ; CHECK-DAG: st.b [[R1:\$w[0-9]+]], 511(
; CHECK: .size llvm_mips_st_b_valid_range_tests ; CHECK: .size llvm_mips_st_b_valid_range_tests
; ;
@ -351,10 +351,10 @@ entry:
} }
; CHECK: llvm_mips_st_b_invalid_range_tests: ; CHECK: llvm_mips_st_b_invalid_range_tests:
; CHECK: addiu $2, $1, -513 ; CHECK: addiu $2, $1, 512
; CHECK: ld.b ; CHECK: ld.b
; CHECK: st.b [[R1:\$w[0-9]+]], 0( ; CHECK: st.b [[R1:\$w[0-9]+]], 0(
; CHECK: addiu $1, $1, 512 ; CHECK: addiu $1, $1, -513
; CHECK: st.b [[R1:\$w[0-9]+]], 0( ; CHECK: st.b [[R1:\$w[0-9]+]], 0(
; CHECK: .size llvm_mips_st_b_invalid_range_tests ; CHECK: .size llvm_mips_st_b_invalid_range_tests
; ;
@ -404,8 +404,8 @@ entry:
; CHECK: llvm_mips_st_h_valid_range_tests: ; CHECK: llvm_mips_st_h_valid_range_tests:
; CHECK: ld.h ; CHECK: ld.h
; CHECK: st.h [[R1:\$w[0-9]+]], -1024( ; CHECK-DAG: st.h [[R1:\$w[0-9]+]], -1024(
; CHECK: st.h [[R1:\$w[0-9]+]], 1022( ; CHECK-DAG: st.h [[R1:\$w[0-9]+]], 1022(
; CHECK: .size llvm_mips_st_h_valid_range_tests ; CHECK: .size llvm_mips_st_h_valid_range_tests
; ;
@ -419,10 +419,10 @@ entry:
} }
; CHECK: llvm_mips_st_h_invalid_range_tests: ; CHECK: llvm_mips_st_h_invalid_range_tests:
; CHECK: addiu $2, $1, -1026 ; CHECK: addiu $2, $1, 1024
; CHECK: ld.h ; CHECK: ld.h
; CHECK: st.h [[R1:\$w[0-9]+]], 0( ; CHECK: st.h [[R1:\$w[0-9]+]], 0(
; CHECK: addiu $1, $1, 1024 ; CHECK: addiu $1, $1, -1026
; CHECK: st.h [[R1:\$w[0-9]+]], 0( ; CHECK: st.h [[R1:\$w[0-9]+]], 0(
; CHECK: .size llvm_mips_st_h_invalid_range_tests ; CHECK: .size llvm_mips_st_h_invalid_range_tests
; ;
@ -472,8 +472,8 @@ entry:
; CHECK: llvm_mips_st_w_valid_range_tests: ; CHECK: llvm_mips_st_w_valid_range_tests:
; CHECK: ld.w ; CHECK: ld.w
; CHECK: st.w [[R1:\$w[0-9]+]], -2048( ; CHECK-DAG: st.w [[R1:\$w[0-9]+]], -2048(
; CHECK: st.w [[R1:\$w[0-9]+]], 2044( ; CHECK-DAG: st.w [[R1:\$w[0-9]+]], 2044(
; CHECK: .size llvm_mips_st_w_valid_range_tests ; CHECK: .size llvm_mips_st_w_valid_range_tests
; ;
@ -487,10 +487,10 @@ entry:
} }
; CHECK: llvm_mips_st_w_invalid_range_tests: ; CHECK: llvm_mips_st_w_invalid_range_tests:
; CHECK: addiu $2, $1, -2052 ; CHECK: addiu $2, $1, 2048
; CHECK: ld.w ; CHECK: ld.w
; CHECK: st.w [[R1:\$w[0-9]+]], 0( ; CHECK: st.w [[R1:\$w[0-9]+]], 0(
; CHECK: addiu $1, $1, 2048 ; CHECK: addiu $1, $1, -2052
; CHECK: st.w [[R1:\$w[0-9]+]], 0( ; CHECK: st.w [[R1:\$w[0-9]+]], 0(
; CHECK: .size llvm_mips_st_w_invalid_range_tests ; CHECK: .size llvm_mips_st_w_invalid_range_tests
; ;
@ -540,8 +540,8 @@ entry:
; CHECK: llvm_mips_st_d_valid_range_tests: ; CHECK: llvm_mips_st_d_valid_range_tests:
; CHECK: ld.d ; CHECK: ld.d
; CHECK: st.d [[R1:\$w[0-9]+]], -4096( ; CHECK-DAG: st.d [[R1:\$w[0-9]+]], -4096(
; CHECK: st.d [[R1:\$w[0-9]+]], 4088( ; CHECK-DAG: st.d [[R1:\$w[0-9]+]], 4088(
; CHECK: .size llvm_mips_st_d_valid_range_tests ; CHECK: .size llvm_mips_st_d_valid_range_tests
; ;
@ -555,10 +555,10 @@ entry:
} }
; CHECK: llvm_mips_st_d_invalid_range_tests: ; CHECK: llvm_mips_st_d_invalid_range_tests:
; CHECK: addiu $2, $1, -4104 ; CHECK: addiu $2, $1, 4096
; CHECK: ld.d ; CHECK: ld.d
; CHECK: st.d [[R1:\$w[0-9]+]], 0( ; CHECK: st.d [[R1:\$w[0-9]+]], 0(
; CHECK: addiu $1, $1, 4096 ; CHECK: addiu $1, $1, -4104
; CHECK: st.d [[R1:\$w[0-9]+]], 0( ; CHECK: st.d [[R1:\$w[0-9]+]], 0(
; CHECK: .size llvm_mips_st_d_invalid_range_tests ; CHECK: .size llvm_mips_st_d_invalid_range_tests
; ;


@ -45,20 +45,18 @@ declare void @callee3(float, %struct.S3* byval, %struct.S1* byval)
define void @f2(float %f, %struct.S1* nocapture byval %s1) nounwind { define void @f2(float %f, %struct.S1* nocapture byval %s1) nounwind {
entry: entry:
; CHECK: addiu $sp, $sp, -48 ; CHECK: addiu $sp, $sp, -48
; CHECK: sw $7, 60($sp) ; CHECK-DAG: sw $7, 60($sp)
; CHECK: sw $6, 56($sp) ; CHECK-DAG: sw $6, 56($sp)
; CHECK: lw $4, 80($sp) ; CHECK-DAG: ldc1 $f[[F0:[0-9]+]], 72($sp)
; CHECK: ldc1 $f[[F0:[0-9]+]], 72($sp) ; CHECK-DAG: lw $[[R3:[0-9]+]], 64($sp)
; CHECK: lw $[[R3:[0-9]+]], 64($sp) ; CHECK-DAG: lw $[[R4:[0-9]+]], 68($sp)
; CHECK: lw $[[R4:[0-9]+]], 68($sp) ; CHECK-DAG: lh $[[R1:[0-9]+]], 58($sp)
; CHECK: lw $[[R2:[0-9]+]], 60($sp) ; CHECK-DAG: lb $[[R0:[0-9]+]], 56($sp)
; CHECK: lh $[[R1:[0-9]+]], 58($sp) ; CHECK-DAG: sw $[[R0]], 32($sp)
; CHECK: lb $[[R0:[0-9]+]], 56($sp) ; CHECK-DAG: sw $[[R1]], 28($sp)
; CHECK: sw $[[R0]], 32($sp) ; CHECK-DAG: sw $[[R4]], 20($sp)
; CHECK: sw $[[R1]], 28($sp) ; CHECK-DAG: sw $[[R3]], 16($sp)
; CHECK: sw $[[R2]], 24($sp) ; CHECK-DAG: sw $7, 24($sp)
; CHECK: sw $[[R4]], 20($sp)
; CHECK: sw $[[R3]], 16($sp)
; CHECK: mfc1 $6, $f[[F0]] ; CHECK: mfc1 $6, $f[[F0]]
%i2 = getelementptr inbounds %struct.S1, %struct.S1* %s1, i32 0, i32 5 %i2 = getelementptr inbounds %struct.S1, %struct.S1* %s1, i32 0, i32 5
@ -82,13 +80,11 @@ declare void @callee4(i32, double, i64, i32, i16 signext, i8 signext, float)
define void @f3(%struct.S2* nocapture byval %s2) nounwind { define void @f3(%struct.S2* nocapture byval %s2) nounwind {
entry: entry:
; CHECK: addiu $sp, $sp, -48 ; CHECK: addiu $sp, $sp, -48
; CHECK: sw $7, 60($sp) ; CHECK-DAG: sw $7, 60($sp)
; CHECK: sw $6, 56($sp) ; CHECK-DAG: sw $6, 56($sp)
; CHECK: sw $5, 52($sp) ; CHECK-DAG: sw $5, 52($sp)
; CHECK: sw $4, 48($sp) ; CHECK-DAG: sw $4, 48($sp)
; CHECK: lw $4, 48($sp) ; CHECK-DAG: sw $7, 24($sp)
; CHECK: lw $[[R0:[0-9]+]], 60($sp)
; CHECK: sw $[[R0]], 24($sp)
%arrayidx = getelementptr inbounds %struct.S2, %struct.S2* %s2, i32 0, i32 0, i32 0 %arrayidx = getelementptr inbounds %struct.S2, %struct.S2* %s2, i32 0, i32 0, i32 0
%tmp = load i32, i32* %arrayidx, align 4 %tmp = load i32, i32* %arrayidx, align 4
@ -101,14 +97,14 @@ entry:
define void @f4(float %f, %struct.S3* nocapture byval %s3, %struct.S1* nocapture byval %s1) nounwind { define void @f4(float %f, %struct.S3* nocapture byval %s3, %struct.S1* nocapture byval %s1) nounwind {
entry: entry:
; CHECK: addiu $sp, $sp, -48 ; CHECK: addiu $sp, $sp, -48
; CHECK: sw $7, 60($sp) ; CHECK-DAG: sw $7, 60($sp)
; CHECK: sw $6, 56($sp) ; CHECK-DAG: sw $6, 56($sp)
; CHECK: sw $5, 52($sp) ; CHECK-DAG: sw $5, 52($sp)
; CHECK: lw $4, 60($sp) ; CHECK-DAG: lw $[[R1:[0-9]+]], 80($sp)
; CHECK: lw $[[R1:[0-9]+]], 80($sp) ; CHECK-DAG: lb $[[R0:[0-9]+]], 52($sp)
; CHECK: lb $[[R0:[0-9]+]], 52($sp) ; CHECK-DAG: sw $[[R0]], 32($sp)
; CHECK: sw $[[R0]], 32($sp) ; CHECK-DAG: sw $[[R1]], 24($sp)
; CHECK: sw $[[R1]], 24($sp) ; CHECK: move $4, $7
%i = getelementptr inbounds %struct.S1, %struct.S1* %s1, i32 0, i32 2 %i = getelementptr inbounds %struct.S1, %struct.S1* %s1, i32 0, i32 2
%tmp = load i32, i32* %i, align 4 %tmp = load i32, i32* %i, align 4


@ -29,9 +29,9 @@ entry:
; CHECK-LABEL: va1: ; CHECK-LABEL: va1:
; CHECK: addiu $sp, $sp, -16 ; CHECK: addiu $sp, $sp, -16
; CHECK: sw $5, 20($sp)
; CHECK: sw $7, 28($sp) ; CHECK: sw $7, 28($sp)
; CHECK: sw $6, 24($sp) ; CHECK: sw $6, 24($sp)
; CHECK: sw $5, 20($sp)
; CHECK: lw $2, 20($sp) ; CHECK: lw $2, 20($sp)
} }
@ -83,8 +83,8 @@ entry:
; CHECK-LABEL: va3: ; CHECK-LABEL: va3:
; CHECK: addiu $sp, $sp, -16 ; CHECK: addiu $sp, $sp, -16
; CHECK: sw $7, 28($sp)
; CHECK: sw $6, 24($sp) ; CHECK: sw $6, 24($sp)
; CHECK: sw $7, 28($sp)
; CHECK: lw $2, 24($sp) ; CHECK: lw $2, 24($sp)
} }


@ -60,10 +60,9 @@ equal:
unequal: unequal:
ret i8* %array2_ptr ret i8* %array2_ptr
} }
; CHECK-LABEL: func2: ; CHECK-LABEL: func2:
; CHECK: ld [[REG2:[0-9]+]], 72(1) ; CHECK: cmpld {{([0-9]+,)?}}4, 6
; CHECK: cmpld {{([0-9]+,)?}}4, [[REG2]] ; CHECK: mr [[REG2:[0-9]+]], 6
; CHECK-DAG: std [[REG2]], -[[OFFSET1:[0-9]+]] ; CHECK-DAG: std [[REG2]], -[[OFFSET1:[0-9]+]]
; CHECK-DAG: std 4, -[[OFFSET2:[0-9]+]] ; CHECK-DAG: std 4, -[[OFFSET2:[0-9]+]]
; CHECK: ld 3, -[[OFFSET2]](1) ; CHECK: ld 3, -[[OFFSET2]](1)
@ -85,8 +84,8 @@ unequal:
; DARWIN64: mr ; DARWIN64: mr
; DARWIN64: mr r[[REG3:[0-9]+]], r[[REGA:[0-9]+]] ; DARWIN64: mr r[[REG3:[0-9]+]], r[[REGA:[0-9]+]]
; DARWIN64: cmpld {{(cr[0-9]+,)?}}r[[REGA]], r[[REG2]] ; DARWIN64: cmpld {{(cr[0-9]+,)?}}r[[REGA]], r[[REG2]]
; DARWIN64: std r[[REG3]], -[[OFFSET1:[0-9]+]]
; DARWIN64: std r[[REG2]], -[[OFFSET2:[0-9]+]] ; DARWIN64: std r[[REG2]], -[[OFFSET2:[0-9]+]]
; DARWIN64: std r[[REG3]], -[[OFFSET1:[0-9]+]]
; DARWIN64: ld r3, -[[OFFSET1]] ; DARWIN64: ld r3, -[[OFFSET1]]
; DARWIN64: ld r3, -[[OFFSET2]] ; DARWIN64: ld r3, -[[OFFSET2]]
@ -106,24 +105,24 @@ unequal:
} }
; CHECK-LABEL: func3: ; CHECK-LABEL: func3:
; CHECK: ld [[REG3:[0-9]+]], 72(1) ; CHECK: cmpld {{([0-9]+,)?}}4, 6
; CHECK: ld [[REG4:[0-9]+]], 56(1) ; CHECK: mr [[REG3:[0-9]+]], 6
; CHECK: cmpld {{([0-9]+,)?}}[[REG4]], [[REG3]] ; CHECK: mr [[REG4:[0-9]+]], 4
; CHECK: std [[REG3]], -[[OFFSET1:[0-9]+]](1)
; CHECK: std [[REG4]], -[[OFFSET2:[0-9]+]](1) ; CHECK: std [[REG4]], -[[OFFSET2:[0-9]+]](1)
; CHECK: std [[REG3]], -[[OFFSET1:[0-9]+]](1)
; CHECK: ld 3, -[[OFFSET2]](1) ; CHECK: ld 3, -[[OFFSET2]](1)
; CHECK: ld 3, -[[OFFSET1]](1) ; CHECK: ld 3, -[[OFFSET1]](1)
; DARWIN32: _func3: ; DARWIN32: _func3:
; DARWIN32: addi r[[REG1:[0-9]+]], r[[REGSP:[0-9]+]], 36 ; DARWIN32-DAG: addi r[[REG1:[0-9]+]], r[[REGSP:[0-9]+]], 36
; DARWIN32: addi r[[REG2:[0-9]+]], r[[REGSP]], 24 ; DARWIN32-DAG: addi r[[REG2:[0-9]+]], r[[REGSP]], 24
; DARWIN32: lwz r[[REG3:[0-9]+]], 44(r[[REGSP]]) ; DARWIN32-DAG: lwz r[[REG3:[0-9]+]], 44(r[[REGSP]])
; DARWIN32: lwz r[[REG4:[0-9]+]], 32(r[[REGSP]]) ; DARWIN32-DAG: lwz r[[REG4:[0-9]+]], 32(r[[REGSP]])
; DARWIN32: cmplw {{(cr[0-9]+,)?}}r[[REG4]], r[[REG3]] ; DARWIN32: cmplw {{(cr[0-9]+,)?}}r[[REG4]], r[[REG3]]
; DARWIN32: stw r[[REG3]], -[[OFFSET1:[0-9]+]] ; DARWIN32-DAG: stw r[[REG3]], -[[OFFSET1:[0-9]+]]
; DARWIN32: stw r[[REG4]], -[[OFFSET2:[0-9]+]] ; DARWIN32-DAG: stw r[[REG4]], -[[OFFSET2:[0-9]+]]
; DARWIN32: lwz r3, -[[OFFSET2]] ; DARWIN32-DAG: lwz r3, -[[OFFSET1:[0-9]+]]
; DARWIN32: lwz r3, -[[OFFSET1]] ; DARWIN32-DAG: lwz r3, -[[OFFSET2:[0-9]+]]
; DARWIN64: _func3: ; DARWIN64: _func3:
; DARWIN64: ld r[[REG3:[0-9]+]], 72(r1) ; DARWIN64: ld r[[REG3:[0-9]+]], 72(r1)


@ -24,10 +24,10 @@ entry:
} }
; CHECK-LABEL: foo: ; CHECK-LABEL: foo:
; CHECK: lfd 1 ; CHECK-DAG: lfd 1
; CHECK: lfd 2 ; CHECK-DAG: fmr 2
; CHECK: lfd 3 ; CHECK-DAG: lfd 3
; CHECK: lfd 4 ; CHECK-DAG: lfd 4
define { float, float } @oof() nounwind { define { float, float } @oof() nounwind {
entry: entry:
@ -50,6 +50,6 @@ entry:
} }
; CHECK-LABEL: oof: ; CHECK-LABEL: oof:
; CHECK: lfs 2 ; CHECK-DAG: lfs 2
; CHECK: lfs 1 ; CHECK-DAG: lfs 1


@ -18,14 +18,14 @@ entry:
ret void ret void
} }
; CHECK: std 6, 184(1) ; CHECK-DAG: std 3, 160(1)
; CHECK: std 5, 176(1) ; CHECK-DAG: std 6, 184(1)
; CHECK: std 4, 168(1) ; CHECK-DAG: std 5, 176(1)
; CHECK: std 3, 160(1) ; CHECK-DAG: std 4, 168(1)
; CHECK: lbz {{[0-9]+}}, 167(1) ; CHECK-DAG: lbz {{[0-9]+}}, 167(1)
; CHECK: lhz {{[0-9]+}}, 165(1) ; CHECK-DAG: lhz {{[0-9]+}}, 165(1)
; CHECK: stb {{[0-9]+}}, 55(1) ; CHECK-DAG: stb {{[0-9]+}}, 55(1)
; CHECK: sth {{[0-9]+}}, 53(1) ; CHECK-DAG: sth {{[0-9]+}}, 53(1)
; CHECK: lbz {{[0-9]+}}, 175(1) ; CHECK: lbz {{[0-9]+}}, 175(1)
; CHECK: lwz {{[0-9]+}}, 171(1) ; CHECK: lwz {{[0-9]+}}, 171(1)
; CHECK: stb {{[0-9]+}}, 63(1) ; CHECK: stb {{[0-9]+}}, 63(1)


@ -1,6 +1,6 @@
; RUN: llc -verify-machineinstrs -mcpu=pwr7 -O0 -fast-isel=false -mattr=-vsx < %s | FileCheck %s ; RUN: llc -verify-machineinstrs -mcpu=pwr7 -O0 -fast-isel=false -mattr=-vsx < %s | FileCheck %s
; RUN: llc -verify-machineinstrs -mcpu=pwr7 -O0 -fast-isel=false -mattr=+vsx < %s | FileCheck -check-prefix=CHECK-VSX %s ; RUN: llc -verify-machineinstrs -mcpu=pwr7 -O0 -fast-isel=false -mattr=+vsx < %s | FileCheck -check-prefix=CHECK-VSX %s
; RUN: llc -verify-machineinstrs -mcpu=pwr9 -O0 -fast-isel=false -mattr=+vsx < %s | FileCheck %s ; RUN: llc -verify-machineinstrs -mcpu=pwr9 -O0 -fast-isel=false -mattr=+vsx < %s | FileCheck -check-prefix=CHECK-P9 %s
; Verify internal alignment of long double in a struct. The double ; Verify internal alignment of long double in a struct. The double
; argument comes in in GPR3; GPR4 is skipped; GPRs 5 and 6 contain ; argument comes in in GPR3; GPR4 is skipped; GPRs 5 and 6 contain
@ -19,19 +19,42 @@ entry:
ret ppc_fp128 %0 ret ppc_fp128 %0
} }
; CHECK-DAG: std 6, 72(1) ;; FIXME: Sadly, we now have an extra store to a temp variable here,
; CHECK-DAG: std 5, 64(1) ;; which comes from (roughly):
; CHECK-DAG: std 4, 56(1) ;; store i64 <val> to i64* <frame>
; CHECK-DAG: std 3, 48(1) ;; bitcast (load i64* <frame>) to f64
; CHECK: lfd 1, 64(1) ;; The code now can elide the load, making:
; CHECK: lfd 2, 72(1) ;; store i64 <val> -> <frame>
;; bitcast i64 <val> to f64
;; Finally, the bitcast itself turns into a store/load pair.
;;
;; This behavior is new, because previously, llvm was accidentally
;; unable to detect that the load came directly from the store, and
;; elide it.
; CHECK-VSX-DAG: std 6, 72(1) ; CHECK: std 6, 72(1)
; CHECK-VSX-DAG: std 5, 64(1) ; CHECK: std 5, 64(1)
; CHECK-VSX-DAG: std 4, 56(1) ; CHECK: std 4, 56(1)
; CHECK-VSX-DAG: std 3, 48(1) ; CHECK: std 3, 48(1)
; CHECK-VSX: li 3, 16 ; CHECK: std 5, -16(1)
; CHECK-VSX: addi 4, 1, 48 ; CHECK: std 6, -8(1)
; CHECK-VSX: lxsdx 1, 4, 3 ; CHECK: lfd 1, -16(1)
; CHECK-VSX: li 3, 24 ; CHECK: lfd 2, -8(1)
; CHECK-VSX: lxsdx 2, 4, 3
; CHECK-VSX: std 6, 72(1)
; CHECK-VSX: std 5, 64(1)
; CHECK-VSX: std 4, 56(1)
; CHECK-VSX: std 3, 48(1)
; CHECK-VSX: std 5, -16(1)
; CHECK-VSX: std 6, -8(1)
; CHECK-VSX: addi 3, 1, -16
; CHECK-VSX: lxsdx 1, 0, 3
; CHECK-VSX: addi 3, 1, -8
; CHECK-VSX: lxsdx 2, 0, 3
; CHECK-P9: std 6, 72(1)
; CHECK-P9: std 5, 64(1)
; CHECK-P9: std 4, 56(1)
; CHECK-P9: std 3, 48(1)
; CHECK-P9: mtvsrd 1, 5
; CHECK-P9: mtvsrd 2, 6


@ -113,13 +113,13 @@ entry:
%add13 = add nsw i32 %add11, %6 %add13 = add nsw i32 %add11, %6
ret i32 %add13 ret i32 %add13
; CHECK: lha {{[0-9]+}}, 126(1) ; CHECK-DAG: lha {{[0-9]+}}, 126(1)
; CHECK: lha {{[0-9]+}}, 132(1) ; CHECK-DAG: lha {{[0-9]+}}, 132(1)
; CHECK: lbz {{[0-9]+}}, 119(1) ; CHECK-DAG: lbz {{[0-9]+}}, 119(1)
; CHECK: lwz {{[0-9]+}}, 140(1) ; CHECK-DAG: lwz {{[0-9]+}}, 140(1)
; CHECK: lwz {{[0-9]+}}, 144(1) ; CHECK-DAG: lwz {{[0-9]+}}, 144(1)
; CHECK: lwz {{[0-9]+}}, 152(1) ; CHECK-DAG: lwz {{[0-9]+}}, 152(1)
; CHECK: lwz {{[0-9]+}}, 160(1) ; CHECK-DAG: lwz {{[0-9]+}}, 160(1)
} }
define i32 @caller2() nounwind { define i32 @caller2() nounwind {
@ -205,11 +205,11 @@ entry:
%add13 = add nsw i32 %add11, %6 %add13 = add nsw i32 %add11, %6
ret i32 %add13 ret i32 %add13
; CHECK: lha {{[0-9]+}}, 126(1) ; CHECK-DAG: lha {{[0-9]+}}, 126(1)
; CHECK: lha {{[0-9]+}}, 133(1) ; CHECK-DAG: lha {{[0-9]+}}, 133(1)
; CHECK: lbz {{[0-9]+}}, 119(1) ; CHECK-DAG: lbz {{[0-9]+}}, 119(1)
; CHECK: lwz {{[0-9]+}}, 140(1) ; CHECK-DAG: lwz {{[0-9]+}}, 140(1)
; CHECK: lwz {{[0-9]+}}, 147(1) ; CHECK-DAG: lwz {{[0-9]+}}, 147(1)
; CHECK: lwz {{[0-9]+}}, 154(1) ; CHECK-DAG: lwz {{[0-9]+}}, 154(1)
; CHECK: lwz {{[0-9]+}}, 161(1) ; CHECK-DAG: lwz {{[0-9]+}}, 161(1)
} }


@ -59,6 +59,7 @@ entry:
%call = call i32 @callee1(%struct.s1* byval %p1, %struct.s2* byval %p2, %struct.s3* byval %p3, %struct.s4* byval %p4, %struct.s5* byval %p5, %struct.s6* byval %p6, %struct.s7* byval %p7) %call = call i32 @callee1(%struct.s1* byval %p1, %struct.s2* byval %p2, %struct.s3* byval %p3, %struct.s4* byval %p4, %struct.s5* byval %p5, %struct.s6* byval %p6, %struct.s7* byval %p7)
ret i32 %call ret i32 %call
; CHECK-LABEL: caller1
; CHECK: ld 9, 112(31) ; CHECK: ld 9, 112(31)
; CHECK: ld 8, 120(31) ; CHECK: ld 8, 120(31)
; CHECK: ld 7, 128(31) ; CHECK: ld 7, 128(31)
@ -97,20 +98,21 @@ entry:
%add13 = add nsw i32 %add11, %6 %add13 = add nsw i32 %add11, %6
ret i32 %add13 ret i32 %add13
; CHECK: std 9, 96(1) ; CHECK-LABEL: callee1
; CHECK: std 8, 88(1) ; CHECK-DAG: std 9, 96(1)
; CHECK: std 7, 80(1) ; CHECK-DAG: std 8, 88(1)
; CHECK: stw 6, 76(1) ; CHECK-DAG: std 7, 80(1)
; CHECK: stw 5, 68(1) ; CHECK-DAG: stw 6, 76(1)
; CHECK: sth 4, 62(1) ; CHECK-DAG: stw 5, 68(1)
; CHECK: stb 3, 55(1) ; CHECK-DAG: sth 4, 62(1)
; CHECK: lha {{[0-9]+}}, 62(1) ; CHECK-DAG: stb 3, 55(1)
; CHECK: lha {{[0-9]+}}, 68(1) ; CHECK-DAG: lha {{[0-9]+}}, 62(1)
; CHECK: lbz {{[0-9]+}}, 55(1) ; CHECK-DAG: lha {{[0-9]+}}, 68(1)
; CHECK: lwz {{[0-9]+}}, 76(1) ; CHECK-DAG: lbz {{[0-9]+}}, 55(1)
; CHECK: lwz {{[0-9]+}}, 80(1) ; CHECK-DAG: lwz {{[0-9]+}}, 76(1)
; CHECK: lwz {{[0-9]+}}, 88(1) ; CHECK-DAG: lwz {{[0-9]+}}, 80(1)
; CHECK: lwz {{[0-9]+}}, 96(1) ; CHECK-DAG: lwz {{[0-9]+}}, 88(1)
; CHECK-DAG: lwz {{[0-9]+}}, 96(1)
} }
define i32 @caller2() nounwind { define i32 @caller2() nounwind {
@ -139,6 +141,7 @@ entry:
%call = call i32 @callee2(%struct.t1* byval %p1, %struct.t2* byval %p2, %struct.t3* byval %p3, %struct.t4* byval %p4, %struct.t5* byval %p5, %struct.t6* byval %p6, %struct.t7* byval %p7) %call = call i32 @callee2(%struct.t1* byval %p1, %struct.t2* byval %p2, %struct.t3* byval %p3, %struct.t4* byval %p4, %struct.t5* byval %p5, %struct.t6* byval %p6, %struct.t7* byval %p7)
ret i32 %call ret i32 %call
; CHECK-LABEL: caller2
; CHECK: stb {{[0-9]+}}, 71(1) ; CHECK: stb {{[0-9]+}}, 71(1)
; CHECK: sth {{[0-9]+}}, 69(1) ; CHECK: sth {{[0-9]+}}, 69(1)
; CHECK: stb {{[0-9]+}}, 87(1) ; CHECK: stb {{[0-9]+}}, 87(1)
@ -184,18 +187,19 @@ entry:
%add13 = add nsw i32 %add11, %6 %add13 = add nsw i32 %add11, %6
ret i32 %add13 ret i32 %add13
; CHECK: std 9, 96(1) ; CHECK-LABEL: callee2
; CHECK: std 8, 88(1) ; CHECK-DAG: std 9, 96(1)
; CHECK: std 7, 80(1) ; CHECK-DAG: std 8, 88(1)
; CHECK: stw 6, 76(1) ; CHECK-DAG: std 7, 80(1)
; CHECK: std 5, 64(1) ; CHECK-DAG: stw 6, 76(1)
; CHECK: sth 4, 62(1) ; CHECK-DAG: std 5, 64(1)
; CHECK: stb 3, 55(1) ; CHECK-DAG: sth 4, 62(1)
; CHECK: lha {{[0-9]+}}, 62(1) ; CHECK-DAG: stb 3, 55(1)
; CHECK: lha {{[0-9]+}}, 69(1) ; CHECK-DAG: lha {{[0-9]+}}, 62(1)
; CHECK: lbz {{[0-9]+}}, 55(1) ; CHECK-DAG: lha {{[0-9]+}}, 69(1)
; CHECK: lwz {{[0-9]+}}, 76(1) ; CHECK-DAG: lbz {{[0-9]+}}, 55(1)
; CHECK: lwz {{[0-9]+}}, 83(1) ; CHECK-DAG: lwz {{[0-9]+}}, 76(1)
; CHECK: lwz {{[0-9]+}}, 90(1) ; CHECK-DAG: lwz {{[0-9]+}}, 83(1)
; CHECK: lwz {{[0-9]+}}, 97(1) ; CHECK-DAG: lwz {{[0-9]+}}, 90(1)
; CHECK-DAG: lwz {{[0-9]+}}, 97(1)
} }


@ -1,10 +1,7 @@
; Check that unaligned accesses are allowed in general. We check the ; Check that unaligned accesses are allowed in general. We check the
; few exceptions (like CRL) in their respective test files. ; few exceptions (like CRL) in their respective test files.
; ;
; FIXME: -combiner-alias-analysis (the default for SystemZ) stops ; RUN: llc < %s -mtriple=s390x-linux-gnu | FileCheck %s
; f1 from being optimized.
; RUN: llc < %s -mtriple=s390x-linux-gnu -combiner-alias-analysis=false \
; RUN: | FileCheck %s
; Check that these four byte stores become a single word store. ; Check that these four byte stores become a single word store.
define void @f1(i8 *%ptr) { define void @f1(i8 *%ptr) {


@ -9,9 +9,9 @@
define void @_Z19getClosestDiagonal3ii(%0* noalias sret, i32, i32) nounwind { define void @_Z19getClosestDiagonal3ii(%0* noalias sret, i32, i32) nounwind {
; CHECK: bl ___muldf3 ; CHECK: bl ___muldf3
; CHECK: bl ___muldf3
; CHECK: beq LBB0 ; CHECK: beq LBB0
; CHECK: bl ___muldf3 ; CHECK: bl ___muldf3
; CHECK: bl ___muldf3
; <label>:3 ; <label>:3
switch i32 %1, label %4 [ switch i32 %1, label %4 [
i32 0, label %5 i32 0, label %5


@ -74,15 +74,17 @@ define zeroext i16 @test6() {
} }
; Accessing the bottom of a large array shouldn't require materializing a base ; Accessing the bottom of a large array shouldn't require materializing a base
;
; CHECK: movs [[REG:r[0-9]+]], #1
; CHECK: str [[REG]], [sp, #16]
; CHECK: str [[REG]], [sp, #4]
define void @test7() { define void @test7() {
%arr = alloca [200 x i32], align 4 %arr = alloca [200 x i32], align 4
; CHECK: movs [[REG:r[0-9]+]], #1
; CHECK: str [[REG]], [sp, #4]
%arrayidx = getelementptr inbounds [200 x i32], [200 x i32]* %arr, i32 0, i32 1 %arrayidx = getelementptr inbounds [200 x i32], [200 x i32]* %arr, i32 0, i32 1
store i32 1, i32* %arrayidx, align 4 store i32 1, i32* %arrayidx, align 4
; CHECK: str [[REG]], [sp, #16]
%arrayidx1 = getelementptr inbounds [200 x i32], [200 x i32]* %arr, i32 0, i32 4 %arrayidx1 = getelementptr inbounds [200 x i32], [200 x i32]* %arr, i32 0, i32 4
store i32 1, i32* %arrayidx1, align 4 store i32 1, i32* %arrayidx1, align 4
@ -96,30 +98,36 @@ define void @test8() {
%arr1 = alloca [224 x i32], align 4 %arr1 = alloca [224 x i32], align 4
; CHECK: movs [[REG:r[0-9]+]], #1 ; CHECK: movs [[REG:r[0-9]+]], #1
; CHECK: str [[REG]], [sp] ; CHECK-DAG: str [[REG]], [sp]
%arr1idx1 = getelementptr inbounds [224 x i32], [224 x i32]* %arr1, i32 0, i32 0 %arr1idx1 = getelementptr inbounds [224 x i32], [224 x i32]* %arr1, i32 0, i32 0
store i32 1, i32* %arr1idx1, align 4 store i32 1, i32* %arr1idx1, align 4
; Offset in range for sp-based store, but not for non-sp-based store ; Offset in range for sp-based store, but not for non-sp-based store
; CHECK: str [[REG]], [sp, #128] ; CHECK-DAG: str [[REG]], [sp, #128]
%arr1idx2 = getelementptr inbounds [224 x i32], [224 x i32]* %arr1, i32 0, i32 32 %arr1idx2 = getelementptr inbounds [224 x i32], [224 x i32]* %arr1, i32 0, i32 32
store i32 1, i32* %arr1idx2, align 4 store i32 1, i32* %arr1idx2, align 4
; CHECK: str [[REG]], [sp, #896] ; CHECK-DAG: str [[REG]], [sp, #896]
%arr2idx1 = getelementptr inbounds [224 x i32], [224 x i32]* %arr2, i32 0, i32 0 %arr2idx1 = getelementptr inbounds [224 x i32], [224 x i32]* %arr2, i32 0, i32 0
store i32 1, i32* %arr2idx1, align 4 store i32 1, i32* %arr2idx1, align 4
; %arr2 is in range, but this element of it is not ; %arr2 is in range, but this element of it is not
; CHECK: str [[REG]], [{{r[0-9]+}}] ; CHECK-DAG: ldr [[RA:r[0-9]+]], .LCPI7_2
; CHECK-DAG: add [[RA]], sp
; CHECK-DAG: str [[REG]], [{{r[0-9]+}}]
%arr2idx2 = getelementptr inbounds [224 x i32], [224 x i32]* %arr2, i32 0, i32 32 %arr2idx2 = getelementptr inbounds [224 x i32], [224 x i32]* %arr2, i32 0, i32 32
store i32 1, i32* %arr2idx2, align 4 store i32 1, i32* %arr2idx2, align 4
; %arr3 is not in range ; %arr3 is not in range
; CHECK: str [[REG]], [{{r[0-9]+}}] ; CHECK-DAG: ldr [[RB:r[0-9]+]], .LCPI7_3
; CHECK-DAG: add [[RB]], sp
; CHECK-DAG: str [[REG]], [{{r[0-9]+}}]
%arr3idx1 = getelementptr inbounds [224 x i32], [224 x i32]* %arr3, i32 0, i32 0 %arr3idx1 = getelementptr inbounds [224 x i32], [224 x i32]* %arr3, i32 0, i32 0
store i32 1, i32* %arr3idx1, align 4 store i32 1, i32* %arr3idx1, align 4
; CHECK: str [[REG]], [{{r[0-9]+}}] ; CHECK-DAG: ldr [[RC:r[0-9]+]], .LCPI7_4
; CHECK-DAG: add [[RC]], sp
; CHECK-DAG: str [[REG]], [{{r[0-9]+}}]
%arr3idx2 = getelementptr inbounds [224 x i32], [224 x i32]* %arr3, i32 0, i32 32 %arr3idx2 = getelementptr inbounds [224 x i32], [224 x i32]* %arr3, i32 0, i32 32
store i32 1, i32* %arr3idx2, align 4 store i32 1, i32* %arr3idx2, align 4


@ -1,4 +1,4 @@
; RUN: llc < %s -combiner-alias-analysis -march=x86-64 -mcpu=core2 | FileCheck %s ; RUN: llc < %s -march=x86-64 -mcpu=core2 | FileCheck %s
target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64" target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64"
target triple = "x86_64-apple-darwin10.4" target triple = "x86_64-apple-darwin10.4"


@ -3,8 +3,8 @@
; CHECK: merge_stores_can ; CHECK: merge_stores_can
; CHECK: callq foo ; CHECK: callq foo
; CHECK: xorps %xmm0, %xmm0 ; CHECK: xorps %xmm0, %xmm0
; CHECK-NEXT: movl 36(%rsp), %ebp
; CHECK-NEXT: movups %xmm0 ; CHECK-NEXT: movups %xmm0
; CHECK-NEXT: movl 36(%rsp), %ebp
; CHECK: callq foo ; CHECK: callq foo
; CHECK: ret ; CHECK: ret
declare i32 @foo([10 x i32]* ) declare i32 @foo([10 x i32]* )


@ -292,16 +292,12 @@ block4: ; preds = %4, %.lr.ph
ret void ret void
} }
;; On x86, even unaligned copies should be merged to vector ops. ;; On x86, even unaligned copies can be merged to vector ops.
;; TODO: however, this cannot happen at the moment, due to brokenness
;; in MergeConsecutiveStores. See UseAA FIXME in DAGCombiner.cpp
;; visitSTORE.
; CHECK-LABEL: merge_loads_no_align: ; CHECK-LABEL: merge_loads_no_align:
; load: ; load:
; CHECK-NOT: vmovups ;; TODO ; CHECK: vmovups
; store: ; store:
; CHECK-NOT: vmovups ;; TODO ; CHECK: vmovups
; CHECK: ret ; CHECK: ret
define void @merge_loads_no_align(i32 %count, %struct.B* noalias nocapture %q, %struct.B* noalias nocapture %p) nounwind uwtable noinline ssp { define void @merge_loads_no_align(i32 %count, %struct.B* noalias nocapture %q, %struct.B* noalias nocapture %p) nounwind uwtable noinline ssp {
%a1 = icmp sgt i32 %count, 0 %a1 = icmp sgt i32 %count, 0
@ -549,8 +545,8 @@ define void @merge_vec_element_and_scalar_load([6 x i64]* %array) {
; CHECK-LABEL: merge_vec_element_and_scalar_load ; CHECK-LABEL: merge_vec_element_and_scalar_load
; CHECK: movq (%rdi), %rax ; CHECK: movq (%rdi), %rax
; CHECK-NEXT: movq 8(%rdi), %rcx
; CHECK-NEXT: movq %rax, 32(%rdi) ; CHECK-NEXT: movq %rax, 32(%rdi)
; CHECK-NEXT: movq 8(%rdi), %rax ; CHECK-NEXT: movq %rcx, 40(%rdi)
; CHECK-NEXT: movq %rax, 40(%rdi)
; CHECK-NEXT: retq ; CHECK-NEXT: retq
} }


@ -1173,10 +1173,6 @@ define void @ktest_2(<32 x float> %in, float * %base) {
; KNL-NEXT: kmovw %k0, %eax ; KNL-NEXT: kmovw %k0, %eax
; KNL-NEXT: vpinsrb $15, %eax, %xmm2, %xmm2 ; KNL-NEXT: vpinsrb $15, %eax, %xmm2, %xmm2
; KNL-NEXT: vinserti128 $1, %xmm3, %ymm2, %ymm2 ; KNL-NEXT: vinserti128 $1, %xmm3, %ymm2, %ymm2
; KNL-NEXT: vpsllw $7, %ymm2, %ymm2
; KNL-NEXT: vpand {{.*}}(%rip), %ymm2, %ymm2
; KNL-NEXT: vpxor %ymm3, %ymm3, %ymm3
; KNL-NEXT: vpcmpgtb %ymm2, %ymm3, %ymm2
; KNL-NEXT: vmovups 4(%rdi), %zmm3 {%k2} {z} ; KNL-NEXT: vmovups 4(%rdi), %zmm3 {%k2} {z}
; KNL-NEXT: vmovups 68(%rdi), %zmm4 {%k1} {z} ; KNL-NEXT: vmovups 68(%rdi), %zmm4 {%k1} {z}
; KNL-NEXT: vcmpltps %zmm4, %zmm1, %k0 ; KNL-NEXT: vcmpltps %zmm4, %zmm1, %k0


@ -11,9 +11,9 @@ define void @cftx020(double* nocapture %a) {
; CHECK-NEXT: vmovhpd {{.*#+}} xmm1 = xmm1[0],mem[0] ; CHECK-NEXT: vmovhpd {{.*#+}} xmm1 = xmm1[0],mem[0]
; CHECK-NEXT: vaddpd %xmm1, %xmm0, %xmm0 ; CHECK-NEXT: vaddpd %xmm1, %xmm0, %xmm0
; CHECK-NEXT: vmovupd (%rdi), %xmm1 ; CHECK-NEXT: vmovupd (%rdi), %xmm1
; CHECK-NEXT: vsubpd 16(%rdi), %xmm1, %xmm1
; CHECK-NEXT: vmovupd %xmm0, (%rdi) ; CHECK-NEXT: vmovupd %xmm0, (%rdi)
; CHECK-NEXT: vmovupd %xmm1, 16(%rdi) ; CHECK-NEXT: vsubpd 16(%rdi), %xmm1, %xmm0
; CHECK-NEXT: vmovupd %xmm0, 16(%rdi)
; CHECK-NEXT: retq ; CHECK-NEXT: retq
entry: entry:
%0 = load double, double* %a, align 8 %0 = load double, double* %a, align 8


@ -151,47 +151,47 @@ define <16 x i8> @_clearupper16xi8a(<16 x i8>) nounwind {
; SSE-NEXT: movaps %xmm0, -{{[0-9]+}}(%rsp) ; SSE-NEXT: movaps %xmm0, -{{[0-9]+}}(%rsp)
; SSE-NEXT: movzbl -{{[0-9]+}}(%rsp), %eax ; SSE-NEXT: movzbl -{{[0-9]+}}(%rsp), %eax
; SSE-NEXT: movd %eax, %xmm0 ; SSE-NEXT: movd %eax, %xmm0
; SSE-NEXT: movzbl -{{[0-9]+}}(%rsp), %r9d
; SSE-NEXT: movzbl -{{[0-9]+}}(%rsp), %edx
; SSE-NEXT: movzbl -{{[0-9]+}}(%rsp), %esi
; SSE-NEXT: movzbl -{{[0-9]+}}(%rsp), %r8d
; SSE-NEXT: movzbl -{{[0-9]+}}(%rsp), %edi
; SSE-NEXT: movzbl -{{[0-9]+}}(%rsp), %eax ; SSE-NEXT: movzbl -{{[0-9]+}}(%rsp), %eax
; SSE-NEXT: movd %eax, %xmm1 ; SSE-NEXT: movd %eax, %xmm1
; SSE-NEXT: punpcklbw {{.*#+}} xmm1 = xmm1[0],xmm0[0],xmm1[1],xmm0[1],xmm1[2],xmm0[2],xmm1[3],xmm0[3],xmm1[4],xmm0[4],xmm1[5],xmm0[5],xmm1[6],xmm0[6],xmm1[7],xmm0[7] ; SSE-NEXT: punpcklbw {{.*#+}} xmm1 = xmm1[0],xmm0[0],xmm1[1],xmm0[1],xmm1[2],xmm0[2],xmm1[3],xmm0[3],xmm1[4],xmm0[4],xmm1[5],xmm0[5],xmm1[6],xmm0[6],xmm1[7],xmm0[7]
; SSE-NEXT: movd %esi, %xmm0
; SSE-NEXT: movzbl -{{[0-9]+}}(%rsp), %eax ; SSE-NEXT: movzbl -{{[0-9]+}}(%rsp), %eax
; SSE-NEXT: movzbl -{{[0-9]+}}(%rsp), %esi ; SSE-NEXT: movd %eax, %xmm0
; SSE-NEXT: movzbl -{{[0-9]+}}(%rsp), %ecx ; SSE-NEXT: movzbl -{{[0-9]+}}(%rsp), %eax
; SSE-NEXT: movd %ecx, %xmm2 ; SSE-NEXT: movd %eax, %xmm2
; SSE-NEXT: punpcklbw {{.*#+}} xmm2 = xmm2[0],xmm0[0],xmm2[1],xmm0[1],xmm2[2],xmm0[2],xmm2[3],xmm0[3],xmm2[4],xmm0[4],xmm2[5],xmm0[5],xmm2[6],xmm0[6],xmm2[7],xmm0[7] ; SSE-NEXT: punpcklbw {{.*#+}} xmm2 = xmm2[0],xmm0[0],xmm2[1],xmm0[1],xmm2[2],xmm0[2],xmm2[3],xmm0[3],xmm2[4],xmm0[4],xmm2[5],xmm0[5],xmm2[6],xmm0[6],xmm2[7],xmm0[7]
; SSE-NEXT: punpcklbw {{.*#+}} xmm2 = xmm2[0],xmm1[0],xmm2[1],xmm1[1],xmm2[2],xmm1[2],xmm2[3],xmm1[3],xmm2[4],xmm1[4],xmm2[5],xmm1[5],xmm2[6],xmm1[6],xmm2[7],xmm1[7] ; SSE-NEXT: punpcklbw {{.*#+}} xmm2 = xmm2[0],xmm1[0],xmm2[1],xmm1[1],xmm2[2],xmm1[2],xmm2[3],xmm1[3],xmm2[4],xmm1[4],xmm2[5],xmm1[5],xmm2[6],xmm1[6],xmm2[7],xmm1[7]
; SSE-NEXT: movd %edx, %xmm0 ; SSE-NEXT: movzbl -{{[0-9]+}}(%rsp), %eax
; SSE-NEXT: movd %esi, %xmm1 ; SSE-NEXT: movd %eax, %xmm0
; SSE-NEXT: punpcklbw {{.*#+}} xmm1 = xmm1[0],xmm0[0],xmm1[1],xmm0[1],xmm1[2],xmm0[2],xmm1[3],xmm0[3],xmm1[4],xmm0[4],xmm1[5],xmm0[5],xmm1[6],xmm0[6],xmm1[7],xmm0[7] ; SSE-NEXT: movzbl -{{[0-9]+}}(%rsp), %eax
; SSE-NEXT: movd %edi, %xmm0 ; SSE-NEXT: movd %eax, %xmm3
; SSE-NEXT: movzbl -{{[0-9]+}}(%rsp), %ecx
; SSE-NEXT: movzbl -{{[0-9]+}}(%rsp), %edx
; SSE-NEXT: movd %edx, %xmm3
; SSE-NEXT: punpcklbw {{.*#+}} xmm3 = xmm3[0],xmm0[0],xmm3[1],xmm0[1],xmm3[2],xmm0[2],xmm3[3],xmm0[3],xmm3[4],xmm0[4],xmm3[5],xmm0[5],xmm3[6],xmm0[6],xmm3[7],xmm0[7] ; SSE-NEXT: punpcklbw {{.*#+}} xmm3 = xmm3[0],xmm0[0],xmm3[1],xmm0[1],xmm3[2],xmm0[2],xmm3[3],xmm0[3],xmm3[4],xmm0[4],xmm3[5],xmm0[5],xmm3[6],xmm0[6],xmm3[7],xmm0[7]
; SSE-NEXT: punpcklbw {{.*#+}} xmm3 = xmm3[0],xmm1[0],xmm3[1],xmm1[1],xmm3[2],xmm1[2],xmm3[3],xmm1[3],xmm3[4],xmm1[4],xmm3[5],xmm1[5],xmm3[6],xmm1[6],xmm3[7],xmm1[7] ; SSE-NEXT: movzbl -{{[0-9]+}}(%rsp), %eax
; SSE-NEXT: punpcklbw {{.*#+}} xmm3 = xmm3[0],xmm2[0],xmm3[1],xmm2[1],xmm3[2],xmm2[2],xmm3[3],xmm2[3],xmm3[4],xmm2[4],xmm3[5],xmm2[5],xmm3[6],xmm2[6],xmm3[7],xmm2[7] ; SSE-NEXT: movd %eax, %xmm0
; SSE-NEXT: movd %r9d, %xmm0 ; SSE-NEXT: movzbl -{{[0-9]+}}(%rsp), %eax
; SSE-NEXT: movd %eax, %xmm1 ; SSE-NEXT: movd %eax, %xmm1
; SSE-NEXT: punpcklbw {{.*#+}} xmm1 = xmm1[0],xmm0[0],xmm1[1],xmm0[1],xmm1[2],xmm0[2],xmm1[3],xmm0[3],xmm1[4],xmm0[4],xmm1[5],xmm0[5],xmm1[6],xmm0[6],xmm1[7],xmm0[7] ; SSE-NEXT: punpcklbw {{.*#+}} xmm1 = xmm1[0],xmm0[0],xmm1[1],xmm0[1],xmm1[2],xmm0[2],xmm1[3],xmm0[3],xmm1[4],xmm0[4],xmm1[5],xmm0[5],xmm1[6],xmm0[6],xmm1[7],xmm0[7]
; SSE-NEXT: movd %r8d, %xmm0 ; SSE-NEXT: punpcklbw {{.*#+}} xmm1 = xmm1[0],xmm3[0],xmm1[1],xmm3[1],xmm1[2],xmm3[2],xmm1[3],xmm3[3],xmm1[4],xmm3[4],xmm1[5],xmm3[5],xmm1[6],xmm3[6],xmm1[7],xmm3[7]
; SSE-NEXT: movd %ecx, %xmm2 ; SSE-NEXT: punpcklbw {{.*#+}} xmm1 = xmm1[0],xmm2[0],xmm1[1],xmm2[1],xmm1[2],xmm2[2],xmm1[3],xmm2[3],xmm1[4],xmm2[4],xmm1[5],xmm2[5],xmm1[6],xmm2[6],xmm1[7],xmm2[7]
; SSE-NEXT: movd {{.*#+}} xmm0 = mem[0],zero,zero,zero
; SSE-NEXT: movd {{.*#+}} xmm2 = mem[0],zero,zero,zero
; SSE-NEXT: punpcklbw {{.*#+}} xmm2 = xmm2[0],xmm0[0],xmm2[1],xmm0[1],xmm2[2],xmm0[2],xmm2[3],xmm0[3],xmm2[4],xmm0[4],xmm2[5],xmm0[5],xmm2[6],xmm0[6],xmm2[7],xmm0[7] ; SSE-NEXT: punpcklbw {{.*#+}} xmm2 = xmm2[0],xmm0[0],xmm2[1],xmm0[1],xmm2[2],xmm0[2],xmm2[3],xmm0[3],xmm2[4],xmm0[4],xmm2[5],xmm0[5],xmm2[6],xmm0[6],xmm2[7],xmm0[7]
; SSE-NEXT: punpcklbw {{.*#+}} xmm2 = xmm2[0],xmm1[0],xmm2[1],xmm1[1],xmm2[2],xmm1[2],xmm2[3],xmm1[3],xmm2[4],xmm1[4],xmm2[5],xmm1[5],xmm2[6],xmm1[6],xmm2[7],xmm1[7] ; SSE-NEXT: movd {{.*#+}} xmm3 = mem[0],zero,zero,zero
; SSE-NEXT: movd {{.*#+}} xmm0 = mem[0],zero,zero,zero
; SSE-NEXT: movd {{.*#+}} xmm1 = mem[0],zero,zero,zero
; SSE-NEXT: punpcklbw {{.*#+}} xmm1 = xmm1[0],xmm0[0],xmm1[1],xmm0[1],xmm1[2],xmm0[2],xmm1[3],xmm0[3],xmm1[4],xmm0[4],xmm1[5],xmm0[5],xmm1[6],xmm0[6],xmm1[7],xmm0[7]
; SSE-NEXT: movd {{.*#+}} xmm4 = mem[0],zero,zero,zero
; SSE-NEXT: movd {{.*#+}} xmm0 = mem[0],zero,zero,zero ; SSE-NEXT: movd {{.*#+}} xmm0 = mem[0],zero,zero,zero
; SSE-NEXT: punpcklbw {{.*#+}} xmm0 = xmm0[0],xmm3[0],xmm0[1],xmm3[1],xmm0[2],xmm3[2],xmm0[3],xmm3[3],xmm0[4],xmm3[4],xmm0[5],xmm3[5],xmm0[6],xmm3[6],xmm0[7],xmm3[7]
; SSE-NEXT: punpcklbw {{.*#+}} xmm0 = xmm0[0],xmm2[0],xmm0[1],xmm2[1],xmm0[2],xmm2[2],xmm0[3],xmm2[3],xmm0[4],xmm2[4],xmm0[5],xmm2[5],xmm0[6],xmm2[6],xmm0[7],xmm2[7]
; SSE-NEXT: movzbl -{{[0-9]+}}(%rsp), %eax
; SSE-NEXT: movd %eax, %xmm2
; SSE-NEXT: movzbl -{{[0-9]+}}(%rsp), %eax
; SSE-NEXT: movd %eax, %xmm3
; SSE-NEXT: punpcklbw {{.*#+}} xmm3 = xmm3[0],xmm2[0],xmm3[1],xmm2[1],xmm3[2],xmm2[2],xmm3[3],xmm2[3],xmm3[4],xmm2[4],xmm3[5],xmm2[5],xmm3[6],xmm2[6],xmm3[7],xmm2[7]
; SSE-NEXT: movzbl -{{[0-9]+}}(%rsp), %eax
; SSE-NEXT: movd %eax, %xmm2
; SSE-NEXT: movzbl -{{[0-9]+}}(%rsp), %eax
; SSE-NEXT: movd %eax, %xmm4
; SSE-NEXT: punpcklbw {{.*#+}} xmm4 = xmm4[0],xmm2[0],xmm4[1],xmm2[1],xmm4[2],xmm2[2],xmm4[3],xmm2[3],xmm4[4],xmm2[4],xmm4[5],xmm2[5],xmm4[6],xmm2[6],xmm4[7],xmm2[7]
; SSE-NEXT: punpcklbw {{.*#+}} xmm4 = xmm4[0],xmm3[0],xmm4[1],xmm3[1],xmm4[2],xmm3[2],xmm4[3],xmm3[3],xmm4[4],xmm3[4],xmm4[5],xmm3[5],xmm4[6],xmm3[6],xmm4[7],xmm3[7]
; SSE-NEXT: punpcklbw {{.*#+}} xmm0 = xmm0[0],xmm4[0],xmm0[1],xmm4[1],xmm0[2],xmm4[2],xmm0[3],xmm4[3],xmm0[4],xmm4[4],xmm0[5],xmm4[5],xmm0[6],xmm4[6],xmm0[7],xmm4[7] ; SSE-NEXT: punpcklbw {{.*#+}} xmm0 = xmm0[0],xmm4[0],xmm0[1],xmm4[1],xmm0[2],xmm4[2],xmm0[3],xmm4[3],xmm0[4],xmm4[4],xmm0[5],xmm4[5],xmm0[6],xmm4[6],xmm0[7],xmm4[7]
; SSE-NEXT: punpcklbw {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1],xmm0[2],xmm1[2],xmm0[3],xmm1[3],xmm0[4],xmm1[4],xmm0[5],xmm1[5],xmm0[6],xmm1[6],xmm0[7],xmm1[7] ; SSE-NEXT: punpcklbw {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1],xmm0[2],xmm1[2],xmm0[3],xmm1[3],xmm0[4],xmm1[4],xmm0[5],xmm1[5],xmm0[6],xmm1[6],xmm0[7],xmm1[7]
; SSE-NEXT: punpcklbw {{.*#+}} xmm0 = xmm0[0],xmm2[0],xmm0[1],xmm2[1],xmm0[2],xmm2[2],xmm0[3],xmm2[3],xmm0[4],xmm2[4],xmm0[5],xmm2[5],xmm0[6],xmm2[6],xmm0[7],xmm2[7]
; SSE-NEXT: punpcklbw {{.*#+}} xmm0 = xmm0[0],xmm3[0],xmm0[1],xmm3[1],xmm0[2],xmm3[2],xmm0[3],xmm3[3],xmm0[4],xmm3[4],xmm0[5],xmm3[5],xmm0[6],xmm3[6],xmm0[7],xmm3[7]
; SSE-NEXT: pand {{.*}}(%rip), %xmm0 ; SSE-NEXT: pand {{.*}}(%rip), %xmm0
; SSE-NEXT: retq ; SSE-NEXT: retq
; ;


@ -1,20 +0,0 @@
; RUN: llc < %s -march=x86-64 -combiner-global-alias-analysis -combiner-alias-analysis
target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128"
%struct.Hash_Key = type { [4 x i32], i32 }
@g_flipV_hashkey = external global %struct.Hash_Key, align 16 ; <%struct.Hash_Key*> [#uses=1]
define void @foo() nounwind {
%t0 = load i32, i32* undef, align 16 ; <i32> [#uses=1]
%t1 = load i32, i32* null, align 4 ; <i32> [#uses=1]
%t2 = srem i32 %t0, 32 ; <i32> [#uses=1]
%t3 = shl i32 1, %t2 ; <i32> [#uses=1]
%t4 = xor i32 %t3, %t1 ; <i32> [#uses=1]
store i32 %t4, i32* null, align 4
%t5 = getelementptr %struct.Hash_Key, %struct.Hash_Key* @g_flipV_hashkey, i64 0, i32 0, i64 0 ; <i32*> [#uses=2]
%t6 = load i32, i32* %t5, align 4 ; <i32> [#uses=1]
%t7 = shl i32 1, undef ; <i32> [#uses=1]
%t8 = xor i32 %t7, %t6 ; <i32> [#uses=1]
store i32 %t8, i32* %t5, align 4
unreachable
}


@ -1,23 +0,0 @@
; RUN: llc < %s --combiner-alias-analysis --combiner-global-alias-analysis
; PR4880
target datalayout = "e-p:32:32:32-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:32:64-f32:32:32-f64:32:64-v64:64:64-v128:128:128-a0:0:64-f80:32:32"
target triple = "i386-pc-linux-gnu"
%struct.alst_node = type { %struct.node }
%struct.arg_node = type { %struct.node, i8*, %struct.alst_node* }
%struct.arglst_node = type { %struct.alst_node, %struct.arg_node*, %struct.arglst_node* }
%struct.lam_node = type { %struct.alst_node, %struct.arg_node*, %struct.alst_node* }
%struct.node = type { i32 (...)**, %struct.node* }
define i32 @._ZN8lam_node18resolve_name_clashEP8arg_nodeP9alst_node._ZNK8lam_nodeeqERK8exp_node._ZN11arglst_nodeD0Ev(%struct.lam_node* %this.this, %struct.arg_node* %outer_arg, %struct.alst_node* %env.cmp, %struct.arglst_node* %this, i32 %functionID) {
comb_entry:
%.SV59 = alloca %struct.node* ; <%struct.node**> [#uses=1]
%0 = load i32 (...)**, i32 (...)*** null, align 4 ; <i32 (...)**> [#uses=1]
%1 = getelementptr inbounds i32 (...)*, i32 (...)** %0, i32 3 ; <i32 (...)**> [#uses=1]
%2 = load i32 (...)*, i32 (...)** %1, align 4 ; <i32 (...)*> [#uses=1]
store %struct.node* undef, %struct.node** %.SV59
%3 = bitcast i32 (...)* %2 to i32 (%struct.node*)* ; <i32 (%struct.node*)*> [#uses=1]
%4 = tail call i32 %3(%struct.node* undef) ; <i32> [#uses=0]
unreachable
}


@ -9,19 +9,22 @@ target triple = "i686-unknown-linux-gnu"
@.str = private unnamed_addr constant [4 x i8] c"%d\0A\00", align 1 @.str = private unnamed_addr constant [4 x i8] c"%d\0A\00", align 1
; CHECK-LABEL: func: ; CHECK-LABEL: func:
; This tests whether eax is properly saved/restored around the lahf/sahf ; This tests whether eax is properly saved/restored around the
; instruction sequences. ; lahf/sahf instruction sequences. We make mem op volatile to prevent
; their reordering to avoid spills.
define i32 @func() { define i32 @func() {
entry: entry:
%bval = load i8, i8* @b %bval = load i8, i8* @b
%inc = add i8 %bval, 1 %inc = add i8 %bval, 1
store i8 %inc, i8* @b store volatile i8 %inc, i8* @b
%cval = load i32, i32* @c %cval = load volatile i32, i32* @c
%inc1 = add nsw i32 %cval, 1 %inc1 = add nsw i32 %cval, 1
store i32 %inc1, i32* @c store volatile i32 %inc1, i32* @c
%aval = load i8, i8* @a %aval = load volatile i8, i8* @a
%inc2 = add i8 %aval, 1 %inc2 = add i8 %aval, 1
store i8 %inc2, i8* @a store volatile i8 %inc2, i8* @a
; Copy flags produced by the incb of %inc1 to a register, need to save+restore ; Copy flags produced by the incb of %inc1 to a register, need to save+restore
; eax around it. The flags will be reused by %tobool. ; eax around it. The flags will be reused by %tobool.
; CHECK: pushl %eax ; CHECK: pushl %eax


@ -51,19 +51,11 @@ define void @merge_vec_element_store(<4 x double> %v, double* %ptr) {
} }
;; TODO: FAST *should* be:
;; movups (%rdi), %xmm0
;; movups %xmm0, 40(%rdi)
;; ..but is not currently. See the UseAA FIXME in DAGCombiner.cpp
;; visitSTORE.
define void @merge_vec_load_and_stores(i64 *%ptr) { define void @merge_vec_load_and_stores(i64 *%ptr) {
; FAST-LABEL: merge_vec_load_and_stores: ; FAST-LABEL: merge_vec_load_and_stores:
; FAST: # BB#0: ; FAST: # BB#0:
; FAST-NEXT: movq (%rdi), %rax ; FAST-NEXT: movups (%rdi), %xmm0
; FAST-NEXT: movq 8(%rdi), %rcx ; FAST-NEXT: movups %xmm0, 40(%rdi)
; FAST-NEXT: movq %rax, 40(%rdi)
; FAST-NEXT: movq %rcx, 48(%rdi)
; FAST-NEXT: retq ; FAST-NEXT: retq
; ;
; SLOW-LABEL: merge_vec_load_and_stores: ; SLOW-LABEL: merge_vec_load_and_stores:


@ -1,9 +1,9 @@
; RUN: llc -march=x86 < %s | FileCheck %s ; RUN: llc -march=x86 < %s | FileCheck %s
; CHECK-LABEL: @bar ; CHECK-LABEL: @bar
; CHECK: movl $1074339512, ; CHECK-DAG: movl $1074339512,
; CHECK: movl $1374389535, ; CHECK-DAG: movl $1374389535,
; CHECK: movl $1078523331, ; CHECK-DAG: movl $1078523331,
define void @bar() unnamed_addr { define void @bar() unnamed_addr {
entry-block: entry-block:
%a = alloca double %a = alloca double


@ -18,13 +18,13 @@ target datalayout = "e-m:o-p:32:32-f64:32:64-f80:128-n8:16:32-S128"
; CHECK-NEXT: movdqa %xmm0, (%edx) ; CHECK-NEXT: movdqa %xmm0, (%edx)
; CHECK-NEXT: shll $4, %ecx ; CHECK-NEXT: shll $4, %ecx
; CHECK-NEXT: movl (%ecx,%edx), %esi ; CHECK-NEXT: movl (%ecx,%edx), %esi
; CHECK-NEXT: movl 12(%ecx,%edx), %edi ; CHECK-NEXT: movl 4(%ecx,%edx), %edi
; CHECK-NEXT: movl 8(%ecx,%edx), %ebx ; CHECK-NEXT: movl 8(%ecx,%edx), %ebx
; CHECK-NEXT: movl 4(%ecx,%edx), %edx ; CHECK-NEXT: movl 12(%ecx,%edx), %edx
; CHECK-NEXT: movl %esi, 12(%eax,%ecx) ; CHECK-NEXT: movl %esi, 12(%eax,%ecx)
; CHECK-NEXT: movl %edx, (%eax,%ecx) ; CHECK-NEXT: movl %edi, (%eax,%ecx)
; CHECK-NEXT: movl %ebx, 8(%eax,%ecx) ; CHECK-NEXT: movl %ebx, 8(%eax,%ecx)
; CHECK-NEXT: movl %edi, 4(%eax,%ecx) ; CHECK-NEXT: movl %edx, 4(%eax,%ecx)
; CHECK-NEXT: popl %esi ; CHECK-NEXT: popl %esi
; CHECK-NEXT: popl %edi ; CHECK-NEXT: popl %edi
; CHECK-NEXT: popl %ebx ; CHECK-NEXT: popl %ebx


@ -2,17 +2,17 @@
; RUN: grep adcl %t | count 7 ; RUN: grep adcl %t | count 7
; RUN: grep sbbl %t | count 7 ; RUN: grep sbbl %t | count 7
define void @add(i256* %p, i256* %q) nounwind { define void @add(i256* %p, i256* %q, i256* %r) nounwind {
%a = load i256, i256* %p %a = load i256, i256* %p
%b = load i256, i256* %q %b = load i256, i256* %q
%c = add i256 %a, %b %c = add i256 %a, %b
store i256 %c, i256* %p store i256 %c, i256* %r
ret void ret void
} }
define void @sub(i256* %p, i256* %q) nounwind { define void @sub(i256* %p, i256* %q, i256* %r) nounwind {
%a = load i256, i256* %p %a = load i256, i256* %p
%b = load i256, i256* %q %b = load i256, i256* %q
%c = sub i256 %a, %b %c = sub i256 %a, %b
store i256 %c, i256* %p store i256 %c, i256* %r
ret void ret void
} }


@ -55,8 +55,7 @@ target triple = "i386-apple-macosx10.5"
; ;
; CHECK-NEXT: L_e$non_lazy_ptr, [[E:%[a-z]+]] ; CHECK-NEXT: L_e$non_lazy_ptr, [[E:%[a-z]+]]
; CHECK-NEXT: movb [[D]], ([[E]]) ; CHECK-NEXT: movb [[D]], ([[E]])
; CHECK-NEXT: L_f$non_lazy_ptr, [[F:%[a-z]+]] ; CHECK-NEXT: movsbl ([[E]]), [[CONV:%[a-z]+]]
; CHECK-NEXT: movsbl ([[F]]), [[CONV:%[a-z]+]]
; CHECK-NEXT: movl $6, [[CONV:%[a-z]+]] ; CHECK-NEXT: movl $6, [[CONV:%[a-z]+]]
; The eflags is used in the next instruction. ; The eflags is used in the next instruction.
; If that instruction disappear, we are not exercising the bug ; If that instruction disappear, we are not exercising the bug
@ -96,7 +95,7 @@ for.end: ; preds = %for.cond.preheader
%.b3 = load i1, i1* @d, align 1 %.b3 = load i1, i1* @d, align 1
%tmp2 = select i1 %.b3, i8 0, i8 6 %tmp2 = select i1 %.b3, i8 0, i8 6
store i8 %tmp2, i8* @e, align 1 store i8 %tmp2, i8* @e, align 1
%tmp3 = load i8, i8* @f, align 1 %tmp3 = load i8, i8* @e, align 1
%conv = sext i8 %tmp3 to i32 %conv = sext i8 %tmp3 to i32
%add = add nsw i32 %conv, 1 %add = add nsw i32 %conv, 1
%rem = srem i32 %tmp1, %add %rem = srem i32 %tmp1, %add


@ -1,7 +1,6 @@
; RUN: llc -march=x86-64 < %s | FileCheck %s ; RUN: llc -march=x86-64 < %s
; Check for a sane output. This testcase used to crash. See PR29132. ; This testcase used to crash. See PR29132.
; CHECK: leal -1
target triple = "x86_64-unknown-linux-gnu" target triple = "x86_64-unknown-linux-gnu"


@ -1037,12 +1037,12 @@ define <2 x i64> @merge_2i64_i64_12_volatile(i64* %ptr) nounwind uwtable noinlin
define <4 x float> @merge_4f32_f32_2345_volatile(float* %ptr) nounwind uwtable noinline ssp { define <4 x float> @merge_4f32_f32_2345_volatile(float* %ptr) nounwind uwtable noinline ssp {
; SSE2-LABEL: merge_4f32_f32_2345_volatile: ; SSE2-LABEL: merge_4f32_f32_2345_volatile:
; SSE2: # BB#0: ; SSE2: # BB#0:
; SSE2-NEXT: movss {{.*#+}} xmm0 = mem[0],zero,zero,zero
; SSE2-NEXT: movss {{.*#+}} xmm1 = mem[0],zero,zero,zero ; SSE2-NEXT: movss {{.*#+}} xmm1 = mem[0],zero,zero,zero
; SSE2-NEXT: movss {{.*#+}} xmm0 = mem[0],zero,zero,zero
; SSE2-NEXT: movss {{.*#+}} xmm2 = mem[0],zero,zero,zero ; SSE2-NEXT: movss {{.*#+}} xmm2 = mem[0],zero,zero,zero
; SSE2-NEXT: movss {{.*#+}} xmm3 = mem[0],zero,zero,zero
; SSE2-NEXT: unpcklps {{.*#+}} xmm2 = xmm2[0],xmm1[0],xmm2[1],xmm1[1] ; SSE2-NEXT: unpcklps {{.*#+}} xmm2 = xmm2[0],xmm1[0],xmm2[1],xmm1[1]
; SSE2-NEXT: unpcklps {{.*#+}} xmm0 = xmm0[0],xmm3[0],xmm0[1],xmm3[1] ; SSE2-NEXT: movss {{.*#+}} xmm1 = mem[0],zero,zero,zero
; SSE2-NEXT: unpcklps {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1]
; SSE2-NEXT: unpcklps {{.*#+}} xmm0 = xmm0[0],xmm2[0],xmm0[1],xmm2[1] ; SSE2-NEXT: unpcklps {{.*#+}} xmm0 = xmm0[0],xmm2[0],xmm0[1],xmm2[1]
; SSE2-NEXT: retq ; SSE2-NEXT: retq
; ;
@ -1065,13 +1065,13 @@ define <4 x float> @merge_4f32_f32_2345_volatile(float* %ptr) nounwind uwtable n
; X32-SSE1-LABEL: merge_4f32_f32_2345_volatile: ; X32-SSE1-LABEL: merge_4f32_f32_2345_volatile:
; X32-SSE1: # BB#0: ; X32-SSE1: # BB#0:
; X32-SSE1-NEXT: movl {{[0-9]+}}(%esp), %eax ; X32-SSE1-NEXT: movl {{[0-9]+}}(%esp), %eax
; X32-SSE1-NEXT: movss {{.*#+}} xmm0 = mem[0],zero,zero,zero ; X32-SSE1-DAG: movss 8(%eax), %[[R0:xmm[0-3]]] # [[R0]] = mem[0],zero,zero,zero
; X32-SSE1-NEXT: movss {{.*#+}} xmm1 = mem[0],zero,zero,zero ; X32-SSE1-DAG: movss 12(%eax), %[[R1:xmm[0-3]]] # [[R1]] = mem[0],zero,zero,zero
; X32-SSE1-NEXT: movss {{.*#+}} xmm2 = mem[0],zero,zero,zero ; X32-SSE1-DAG: movss 16(%eax), %[[R2:xmm[0-3]]] # [[R2]] = mem[0],zero,zero,zero
; X32-SSE1-NEXT: movss {{.*#+}} xmm3 = mem[0],zero,zero,zero ; X32-SSE1-DAG: movss 20(%eax), %[[R3:xmm[0-3]]] # [[R3]] = mem[0],zero,zero,zero
; X32-SSE1-NEXT: unpcklps {{.*#+}} xmm2 = xmm2[0],xmm1[0],xmm2[1],xmm1[1] ; X32-SSE1-DAG: unpcklps %[[R2]], %[[R0]] # [[R0]] = [[R0]][0],[[R2]][0],[[R0]][1],[[R2]][1]
; X32-SSE1-NEXT: unpcklps {{.*#+}} xmm0 = xmm0[0],xmm3[0],xmm0[1],xmm3[1] ; X32-SSE1-DAG: unpcklps %[[R3]], %[[R1]] # [[R1]] = [[R1]][0],[[R3]][0],[[R1]][1],[[R3]][1]
; X32-SSE1-NEXT: unpcklps {{.*#+}} xmm0 = xmm0[0],xmm2[0],xmm0[1],xmm2[1] ; X32-SSE1-DAG: unpcklps %[[R1]], %[[R0]] # [[R0]] = [[R0]][0],[[R1]][0],[[R0]][1],[[R1]][1]
; X32-SSE1-NEXT: retl ; X32-SSE1-NEXT: retl
; ;
; X32-SSE41-LABEL: merge_4f32_f32_2345_volatile: ; X32-SSE41-LABEL: merge_4f32_f32_2345_volatile:


@ -682,10 +682,10 @@ define <16 x i16> @merge_16i16_i16_0uu3zzuuuuuzCuEF_volatile(i16* %ptr) nounwind
; AVX1: # BB#0: ; AVX1: # BB#0:
; AVX1-NEXT: vpxor %xmm0, %xmm0, %xmm0 ; AVX1-NEXT: vpxor %xmm0, %xmm0, %xmm0
; AVX1-NEXT: vpinsrw $0, (%rdi), %xmm0, %xmm1 ; AVX1-NEXT: vpinsrw $0, (%rdi), %xmm0, %xmm1
; AVX1-NEXT: vpinsrw $3, 6(%rdi), %xmm1, %xmm1
; AVX1-NEXT: vpinsrw $4, 24(%rdi), %xmm0, %xmm0 ; AVX1-NEXT: vpinsrw $4, 24(%rdi), %xmm0, %xmm0
; AVX1-NEXT: vpinsrw $6, 28(%rdi), %xmm0, %xmm0 ; AVX1-NEXT: vpinsrw $6, 28(%rdi), %xmm0, %xmm0
; AVX1-NEXT: vpinsrw $7, 30(%rdi), %xmm0, %xmm0 ; AVX1-NEXT: vpinsrw $7, 30(%rdi), %xmm0, %xmm0
; AVX1-NEXT: vpinsrw $3, 6(%rdi), %xmm1, %xmm1
; AVX1-NEXT: vinsertf128 $1, %xmm0, %ymm1, %ymm0 ; AVX1-NEXT: vinsertf128 $1, %xmm0, %ymm1, %ymm0
; AVX1-NEXT: retq ; AVX1-NEXT: retq
; ;
@ -693,10 +693,10 @@ define <16 x i16> @merge_16i16_i16_0uu3zzuuuuuzCuEF_volatile(i16* %ptr) nounwind
; AVX2: # BB#0: ; AVX2: # BB#0:
; AVX2-NEXT: vpxor %xmm0, %xmm0, %xmm0 ; AVX2-NEXT: vpxor %xmm0, %xmm0, %xmm0
; AVX2-NEXT: vpinsrw $0, (%rdi), %xmm0, %xmm1 ; AVX2-NEXT: vpinsrw $0, (%rdi), %xmm0, %xmm1
; AVX2-NEXT: vpinsrw $3, 6(%rdi), %xmm1, %xmm1
; AVX2-NEXT: vpinsrw $4, 24(%rdi), %xmm0, %xmm0 ; AVX2-NEXT: vpinsrw $4, 24(%rdi), %xmm0, %xmm0
; AVX2-NEXT: vpinsrw $6, 28(%rdi), %xmm0, %xmm0 ; AVX2-NEXT: vpinsrw $6, 28(%rdi), %xmm0, %xmm0
; AVX2-NEXT: vpinsrw $7, 30(%rdi), %xmm0, %xmm0 ; AVX2-NEXT: vpinsrw $7, 30(%rdi), %xmm0, %xmm0
; AVX2-NEXT: vpinsrw $3, 6(%rdi), %xmm1, %xmm1
; AVX2-NEXT: vinserti128 $1, %xmm0, %ymm1, %ymm0 ; AVX2-NEXT: vinserti128 $1, %xmm0, %ymm1, %ymm0
; AVX2-NEXT: retq ; AVX2-NEXT: retq
; ;
@ -704,10 +704,10 @@ define <16 x i16> @merge_16i16_i16_0uu3zzuuuuuzCuEF_volatile(i16* %ptr) nounwind
; AVX512F: # BB#0: ; AVX512F: # BB#0:
; AVX512F-NEXT: vpxor %xmm0, %xmm0, %xmm0 ; AVX512F-NEXT: vpxor %xmm0, %xmm0, %xmm0
; AVX512F-NEXT: vpinsrw $0, (%rdi), %xmm0, %xmm1 ; AVX512F-NEXT: vpinsrw $0, (%rdi), %xmm0, %xmm1
; AVX512F-NEXT: vpinsrw $3, 6(%rdi), %xmm1, %xmm1
; AVX512F-NEXT: vpinsrw $4, 24(%rdi), %xmm0, %xmm0 ; AVX512F-NEXT: vpinsrw $4, 24(%rdi), %xmm0, %xmm0
; AVX512F-NEXT: vpinsrw $6, 28(%rdi), %xmm0, %xmm0 ; AVX512F-NEXT: vpinsrw $6, 28(%rdi), %xmm0, %xmm0
; AVX512F-NEXT: vpinsrw $7, 30(%rdi), %xmm0, %xmm0 ; AVX512F-NEXT: vpinsrw $7, 30(%rdi), %xmm0, %xmm0
; AVX512F-NEXT: vpinsrw $3, 6(%rdi), %xmm1, %xmm1
; AVX512F-NEXT: vinserti128 $1, %xmm0, %ymm1, %ymm0 ; AVX512F-NEXT: vinserti128 $1, %xmm0, %ymm1, %ymm0
; AVX512F-NEXT: retq ; AVX512F-NEXT: retq
; ;
@ -716,10 +716,10 @@ define <16 x i16> @merge_16i16_i16_0uu3zzuuuuuzCuEF_volatile(i16* %ptr) nounwind
; X32-AVX-NEXT: movl {{[0-9]+}}(%esp), %eax ; X32-AVX-NEXT: movl {{[0-9]+}}(%esp), %eax
; X32-AVX-NEXT: vpxor %xmm0, %xmm0, %xmm0 ; X32-AVX-NEXT: vpxor %xmm0, %xmm0, %xmm0
; X32-AVX-NEXT: vpinsrw $0, (%eax), %xmm0, %xmm1 ; X32-AVX-NEXT: vpinsrw $0, (%eax), %xmm0, %xmm1
; X32-AVX-NEXT: vpinsrw $3, 6(%eax), %xmm1, %xmm1
; X32-AVX-NEXT: vpinsrw $4, 24(%eax), %xmm0, %xmm0 ; X32-AVX-NEXT: vpinsrw $4, 24(%eax), %xmm0, %xmm0
; X32-AVX-NEXT: vpinsrw $6, 28(%eax), %xmm0, %xmm0 ; X32-AVX-NEXT: vpinsrw $6, 28(%eax), %xmm0, %xmm0
; X32-AVX-NEXT: vpinsrw $7, 30(%eax), %xmm0, %xmm0 ; X32-AVX-NEXT: vpinsrw $7, 30(%eax), %xmm0, %xmm0
; X32-AVX-NEXT: vpinsrw $3, 6(%eax), %xmm1, %xmm1
; X32-AVX-NEXT: vinsertf128 $1, %xmm0, %ymm1, %ymm0 ; X32-AVX-NEXT: vinsertf128 $1, %xmm0, %ymm1, %ymm0
; X32-AVX-NEXT: retl ; X32-AVX-NEXT: retl
%ptr0 = getelementptr inbounds i16, i16* %ptr, i64 0 %ptr0 = getelementptr inbounds i16, i16* %ptr, i64 0


@ -21,11 +21,11 @@
; DBGDAG-DAG: [[LD2:t[0-9]+]]: i16,ch = load<LD2[%tmp81](align=1)> [[ENTRYTOKEN]], [[BASEPTR]], undef:i64 ; DBGDAG-DAG: [[LD2:t[0-9]+]]: i16,ch = load<LD2[%tmp81](align=1)> [[ENTRYTOKEN]], [[BASEPTR]], undef:i64
; DBGDAG-DAG: [[LD1:t[0-9]+]]: i8,ch = load<LD1[%tmp12]> [[ENTRYTOKEN]], [[ADDPTR]], undef:i64 ; DBGDAG-DAG: [[LD1:t[0-9]+]]: i8,ch = load<LD1[%tmp12]> [[ENTRYTOKEN]], [[ADDPTR]], undef:i64
; DBGDAG: [[LOADTOKEN:t[0-9]+]]: ch = TokenFactor [[LD2]]:1, [[LD1]]:1 ; DBGDAG-DAG: [[ST1:t[0-9]+]]: ch = store<ST1[%tmp14]> [[ENTRYTOKEN]], [[LD1]], t{{[0-9]+}}, undef:i64
; DBGDAG-DAG: [[LOADTOKEN:t[0-9]+]]: ch = TokenFactor [[LD2]]:1, [[LD1]]:1
; DBGDAG: [[ST2:t[0-9]+]]: ch = store<ST2[%tmp10](align=1)> [[LOADTOKEN]], [[LD2]], t{{[0-9]+}}, undef:i64
; DBGDAG-DAG: [[ST2:t[0-9]+]]: ch = store<ST2[%tmp10](align=1)> [[LOADTOKEN]], [[LD2]], t{{[0-9]+}}, undef:i64 ; DBGDAG: X86ISD::RET_FLAG t{{[0-9]+}},
; DBGDAG-DAG: [[ST1:t[0-9]+]]: ch = store<ST1[%tmp14]> [[ST2]], [[LD1]], t{{[0-9]+}}, undef:i64
; DBGDAG: X86ISD::RET_FLAG [[ST1]],
; DBGDAG: Type-legalized selection DAG: BB#0 'merge_store_partial_overlap_load:' ; DBGDAG: Type-legalized selection DAG: BB#0 'merge_store_partial_overlap_load:'
define void @merge_store_partial_overlap_load([4 x i8]* %tmp) { define void @merge_store_partial_overlap_load([4 x i8]* %tmp) {


@ -1,31 +0,0 @@
; RUN: llc < %s -mtriple x86_64-apple-macosx10.9.0 | FileCheck %s
; PR18023
; CHECK: movabsq $4294967296, %rcx
; CHECK: movq %rcx, (%rax)
; CHECK: movl $1, 4(%rax)
; CHECK: movl $0, 4(%rax)
; CHECK: movq $1, 4(%rax)
@c = common global i32 0, align 4
@a = common global [3 x i32] zeroinitializer, align 4
@b = common global i32 0, align 4
@.str = private unnamed_addr constant [4 x i8] c"%d\0A\00", align 1
define void @func() {
store i32 1, i32* getelementptr inbounds ([3 x i32], [3 x i32]* @a, i64 0, i64 1), align 4
store i32 0, i32* getelementptr inbounds ([3 x i32], [3 x i32]* @a, i64 0, i64 0), align 4
%1 = load volatile i32, i32* @b, align 4
store i32 1, i32* getelementptr inbounds ([3 x i32], [3 x i32]* @a, i64 0, i64 1), align 4
store i32 0, i32* getelementptr inbounds ([3 x i32], [3 x i32]* @a, i64 0, i64 1), align 4
%2 = load volatile i32, i32* @b, align 4
store i32 1, i32* getelementptr inbounds ([3 x i32], [3 x i32]* @a, i64 0, i64 1), align 4
store i32 0, i32* getelementptr inbounds ([3 x i32], [3 x i32]* @a, i64 0, i64 2), align 4
%3 = load volatile i32, i32* @b, align 4
store i32 3, i32* @c, align 4
%4 = load i32, i32* getelementptr inbounds ([3 x i32], [3 x i32]* @a, i64 0, i64 1), align 4
%call = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i64 0, i64 0), i32 %4)
ret void
}
declare i32 @printf(i8*, ...)


@ -1,8 +1,8 @@
; RUN: llc -mtriple=x86_64-unknown-unknown < %s | FileCheck %s ; RUN: llc -mtriple=x86_64-unknown-unknown < %s | FileCheck %s
; CHECK-LABEL: int32_float_pair ; CHECK-LABEL: int32_float_pair
; CHECK: movl %edi, (%rsi) ; CHECK-DAG: movl %edi, (%rsi)
; CHECK: movss %xmm0, 4(%rsi) ; CHECK-DAG: movss %xmm0, 4(%rsi)
define void @int32_float_pair(i32 %tmp1, float %tmp2, i64* %ref.tmp) { define void @int32_float_pair(i32 %tmp1, float %tmp2, i64* %ref.tmp) {
entry: entry:
%t0 = bitcast float %tmp2 to i32 %t0 = bitcast float %tmp2 to i32
@ -15,8 +15,8 @@ entry:
} }
; CHECK-LABEL: float_int32_pair ; CHECK-LABEL: float_int32_pair
; CHECK: movss %xmm0, (%rsi) ; CHECK-DAG: movss %xmm0, (%rsi)
; CHECK: movl %edi, 4(%rsi) ; CHECK-DAG: movl %edi, 4(%rsi)
define void @float_int32_pair(float %tmp1, i32 %tmp2, i64* %ref.tmp) { define void @float_int32_pair(float %tmp1, i32 %tmp2, i64* %ref.tmp) {
entry: entry:
%t0 = bitcast float %tmp1 to i32 %t0 = bitcast float %tmp1 to i32
@ -29,9 +29,9 @@ entry:
} }
; CHECK-LABEL: int16_float_pair ; CHECK-LABEL: int16_float_pair
; CHECK: movzwl %di, %eax ; CHECK-DAG: movzwl %di, %eax
; CHECK: movl %eax, (%rsi) ; CHECK-DAG: movl %eax, (%rsi)
; CHECK: movss %xmm0, 4(%rsi) ; CHECK-DAG: movss %xmm0, 4(%rsi)
define void @int16_float_pair(i16 signext %tmp1, float %tmp2, i64* %ref.tmp) { define void @int16_float_pair(i16 signext %tmp1, float %tmp2, i64* %ref.tmp) {
entry: entry:
%t0 = bitcast float %tmp2 to i32 %t0 = bitcast float %tmp2 to i32
@ -44,9 +44,9 @@ entry:
} }
; CHECK-LABEL: int8_float_pair ; CHECK-LABEL: int8_float_pair
; CHECK: movzbl %dil, %eax ; CHECK-DAG: movzbl %dil, %eax
; CHECK: movl %eax, (%rsi) ; CHECK-DAG: movl %eax, (%rsi)
; CHECK: movss %xmm0, 4(%rsi) ; CHECK-DAG: movss %xmm0, 4(%rsi)
define void @int8_float_pair(i8 signext %tmp1, float %tmp2, i64* %ref.tmp) { define void @int8_float_pair(i8 signext %tmp1, float %tmp2, i64* %ref.tmp) {
entry: entry:
%t0 = bitcast float %tmp2 to i32 %t0 = bitcast float %tmp2 to i32


@ -13,9 +13,9 @@ target triple = "x86_64-unknown-linux-gnu"
;; the same result in memory in the end. ;; the same result in memory in the end.
; CHECK-LABEL: redundant_stores_merging: ; CHECK-LABEL: redundant_stores_merging:
; CHECK: movl $123, e+8(%rip) ; CHECK: movabsq $528280977409, %rax
; CHECK: movabsq $1958505086977, %rax
; CHECK: movq %rax, e+4(%rip) ; CHECK: movq %rax, e+4(%rip)
; CHECK: movl $456, e+8(%rip)
define void @redundant_stores_merging() { define void @redundant_stores_merging() {
entry: entry:
store i32 1, i32* getelementptr inbounds (%structTy, %structTy* @e, i64 0, i32 1), align 4 store i32 1, i32* getelementptr inbounds (%structTy, %structTy* @e, i64 0, i32 1), align 4
@ -26,9 +26,9 @@ entry:
;; This variant tests PR25154. ;; This variant tests PR25154.
; CHECK-LABEL: redundant_stores_merging_reverse: ; CHECK-LABEL: redundant_stores_merging_reverse:
; CHECK: movl $123, e+8(%rip) ; CHECK: movabsq $528280977409, %rax
; CHECK: movabsq $1958505086977, %rax
; CHECK: movq %rax, e+4(%rip) ; CHECK: movq %rax, e+4(%rip)
; CHECK: movl $456, e+8(%rip)
define void @redundant_stores_merging_reverse() { define void @redundant_stores_merging_reverse() {
entry: entry:
store i32 123, i32* getelementptr inbounds (%structTy, %structTy* @e, i64 0, i32 2), align 4 store i32 123, i32* getelementptr inbounds (%structTy, %structTy* @e, i64 0, i32 2), align 4
@ -45,9 +45,8 @@ entry:
;; a movl, after the store to 3). ;; a movl, after the store to 3).
;; CHECK-LABEL: overlapping_stores_merging: ;; CHECK-LABEL: overlapping_stores_merging:
;; CHECK: movw $0, b+2(%rip) ;; CHECK: movl $1, b(%rip)
;; CHECK: movw $2, b+3(%rip) ;; CHECK: movw $2, b+3(%rip)
;; CHECK: movw $1, b(%rip)
define void @overlapping_stores_merging() { define void @overlapping_stores_merging() {
entry: entry:
store i16 0, i16* bitcast (i8* getelementptr inbounds ([8 x i8], [8 x i8]* @b, i64 0, i64 2) to i16*), align 2 store i16 0, i16* bitcast (i8* getelementptr inbounds ([8 x i8], [8 x i8]* @b, i64 0, i64 2) to i16*), align 2

File diff suppressed because it is too large

File diff suppressed because it is too large


@ -1,4 +1,4 @@
; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py ; NOTE: Assertions have been autogenerated by update_llc_test_checks.py
; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mcpu=x86-64 -mattr=+avx | FileCheck %s --check-prefix=ALL --check-prefix=AVX --check-prefix=AVX1 ; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mcpu=x86-64 -mattr=+avx | FileCheck %s --check-prefix=ALL --check-prefix=AVX --check-prefix=AVX1
; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mcpu=x86-64 -mattr=+avx2 | FileCheck %s --check-prefix=ALL --check-prefix=AVX --check-prefix=AVX2 ; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mcpu=x86-64 -mattr=+avx2 | FileCheck %s --check-prefix=ALL --check-prefix=AVX --check-prefix=AVX2
@ -18,7 +18,7 @@ define <4 x double> @var_shuffle_v4f64_v4f64_xxxx_i64(<4 x double> %x, i64 %i0,
; ALL-NEXT: vmovhpd {{.*#+}} xmm0 = xmm0[0],mem[0] ; ALL-NEXT: vmovhpd {{.*#+}} xmm0 = xmm0[0],mem[0]
; ALL-NEXT: vmovsd {{.*#+}} xmm1 = mem[0],zero ; ALL-NEXT: vmovsd {{.*#+}} xmm1 = mem[0],zero
; ALL-NEXT: vmovhpd {{.*#+}} xmm1 = xmm1[0],mem[0] ; ALL-NEXT: vmovhpd {{.*#+}} xmm1 = xmm1[0],mem[0]
; ALL-NEXT: vinsertf128 $1, %xmm1, %ymm0, %ymm0 ; ALL-NEXT: vinsertf128 $1, %xmm0, %ymm1, %ymm0
; ALL-NEXT: movq %rbp, %rsp ; ALL-NEXT: movq %rbp, %rsp
; ALL-NEXT: popq %rbp ; ALL-NEXT: popq %rbp
; ALL-NEXT: retq ; ALL-NEXT: retq
@ -67,7 +67,7 @@ define <4 x double> @var_shuffle_v4f64_v2f64_xxxx_i64(<2 x double> %x, i64 %i0,
; ALL-NEXT: vmovhpd {{.*#+}} xmm0 = xmm0[0],mem[0] ; ALL-NEXT: vmovhpd {{.*#+}} xmm0 = xmm0[0],mem[0]
; ALL-NEXT: vmovsd {{.*#+}} xmm1 = mem[0],zero ; ALL-NEXT: vmovsd {{.*#+}} xmm1 = mem[0],zero
; ALL-NEXT: vmovhpd {{.*#+}} xmm1 = xmm1[0],mem[0] ; ALL-NEXT: vmovhpd {{.*#+}} xmm1 = xmm1[0],mem[0]
; ALL-NEXT: vinsertf128 $1, %xmm1, %ymm0, %ymm0 ; ALL-NEXT: vinsertf128 $1, %xmm0, %ymm1, %ymm0
; ALL-NEXT: retq ; ALL-NEXT: retq
%x0 = extractelement <2 x double> %x, i64 %i0 %x0 = extractelement <2 x double> %x, i64 %i0
%x1 = extractelement <2 x double> %x, i64 %i1 %x1 = extractelement <2 x double> %x, i64 %i1
@ -90,11 +90,11 @@ define <4 x i64> @var_shuffle_v4i64_v4i64_xxxx_i64(<4 x i64> %x, i64 %i0, i64 %i
; AVX1-NEXT: vmovaps %ymm0, (%rsp) ; AVX1-NEXT: vmovaps %ymm0, (%rsp)
; AVX1-NEXT: vmovq {{.*#+}} xmm0 = mem[0],zero ; AVX1-NEXT: vmovq {{.*#+}} xmm0 = mem[0],zero
; AVX1-NEXT: vmovq {{.*#+}} xmm1 = mem[0],zero ; AVX1-NEXT: vmovq {{.*#+}} xmm1 = mem[0],zero
; AVX1-NEXT: vpunpcklqdq {{.*#+}} xmm0 = xmm0[0],xmm1[0] ; AVX1-NEXT: vpunpcklqdq {{.*#+}} xmm0 = xmm1[0],xmm0[0]
; AVX1-NEXT: vmovq {{.*#+}} xmm1 = mem[0],zero ; AVX1-NEXT: vmovq {{.*#+}} xmm1 = mem[0],zero
; AVX1-NEXT: vmovq {{.*#+}} xmm2 = mem[0],zero ; AVX1-NEXT: vmovq {{.*#+}} xmm2 = mem[0],zero
; AVX1-NEXT: vpunpcklqdq {{.*#+}} xmm1 = xmm1[0],xmm2[0] ; AVX1-NEXT: vpunpcklqdq {{.*#+}} xmm1 = xmm2[0],xmm1[0]
; AVX1-NEXT: vinsertf128 $1, %xmm1, %ymm0, %ymm0 ; AVX1-NEXT: vinsertf128 $1, %xmm0, %ymm1, %ymm0
; AVX1-NEXT: movq %rbp, %rsp ; AVX1-NEXT: movq %rbp, %rsp
; AVX1-NEXT: popq %rbp ; AVX1-NEXT: popq %rbp
; AVX1-NEXT: retq ; AVX1-NEXT: retq
@ -108,11 +108,11 @@ define <4 x i64> @var_shuffle_v4i64_v4i64_xxxx_i64(<4 x i64> %x, i64 %i0, i64 %i
; AVX2-NEXT: vmovaps %ymm0, (%rsp) ; AVX2-NEXT: vmovaps %ymm0, (%rsp)
; AVX2-NEXT: vmovq {{.*#+}} xmm0 = mem[0],zero ; AVX2-NEXT: vmovq {{.*#+}} xmm0 = mem[0],zero
; AVX2-NEXT: vmovq {{.*#+}} xmm1 = mem[0],zero ; AVX2-NEXT: vmovq {{.*#+}} xmm1 = mem[0],zero
; AVX2-NEXT: vpunpcklqdq {{.*#+}} xmm0 = xmm0[0],xmm1[0] ; AVX2-NEXT: vpunpcklqdq {{.*#+}} xmm0 = xmm1[0],xmm0[0]
; AVX2-NEXT: vmovq {{.*#+}} xmm1 = mem[0],zero ; AVX2-NEXT: vmovq {{.*#+}} xmm1 = mem[0],zero
; AVX2-NEXT: vmovq {{.*#+}} xmm2 = mem[0],zero ; AVX2-NEXT: vmovq {{.*#+}} xmm2 = mem[0],zero
; AVX2-NEXT: vpunpcklqdq {{.*#+}} xmm1 = xmm1[0],xmm2[0] ; AVX2-NEXT: vpunpcklqdq {{.*#+}} xmm1 = xmm2[0],xmm1[0]
; AVX2-NEXT: vinserti128 $1, %xmm1, %ymm0, %ymm0 ; AVX2-NEXT: vinserti128 $1, %xmm0, %ymm1, %ymm0
; AVX2-NEXT: movq %rbp, %rsp ; AVX2-NEXT: movq %rbp, %rsp
; AVX2-NEXT: popq %rbp ; AVX2-NEXT: popq %rbp
; AVX2-NEXT: retq ; AVX2-NEXT: retq
@ -137,7 +137,7 @@ define <4 x i64> @var_shuffle_v4i64_v4i64_xx00_i64(<4 x i64> %x, i64 %i0, i64 %i
; AVX1-NEXT: vmovaps %ymm0, (%rsp) ; AVX1-NEXT: vmovaps %ymm0, (%rsp)
; AVX1-NEXT: vmovq {{.*#+}} xmm0 = mem[0],zero ; AVX1-NEXT: vmovq {{.*#+}} xmm0 = mem[0],zero
; AVX1-NEXT: vmovq {{.*#+}} xmm1 = mem[0],zero ; AVX1-NEXT: vmovq {{.*#+}} xmm1 = mem[0],zero
; AVX1-NEXT: vpunpcklqdq {{.*#+}} xmm0 = xmm0[0],xmm1[0] ; AVX1-NEXT: vpunpcklqdq {{.*#+}} xmm0 = xmm1[0],xmm0[0]
; AVX1-NEXT: vpxor %xmm1, %xmm1, %xmm1 ; AVX1-NEXT: vpxor %xmm1, %xmm1, %xmm1
; AVX1-NEXT: vinsertf128 $1, %xmm1, %ymm0, %ymm0 ; AVX1-NEXT: vinsertf128 $1, %xmm1, %ymm0, %ymm0
; AVX1-NEXT: movq %rbp, %rsp ; AVX1-NEXT: movq %rbp, %rsp
@ -153,7 +153,7 @@ define <4 x i64> @var_shuffle_v4i64_v4i64_xx00_i64(<4 x i64> %x, i64 %i0, i64 %i
; AVX2-NEXT: vmovaps %ymm0, (%rsp) ; AVX2-NEXT: vmovaps %ymm0, (%rsp)
; AVX2-NEXT: vmovq {{.*#+}} xmm0 = mem[0],zero ; AVX2-NEXT: vmovq {{.*#+}} xmm0 = mem[0],zero
; AVX2-NEXT: vmovq {{.*#+}} xmm1 = mem[0],zero ; AVX2-NEXT: vmovq {{.*#+}} xmm1 = mem[0],zero
; AVX2-NEXT: vpunpcklqdq {{.*#+}} xmm0 = xmm0[0],xmm1[0] ; AVX2-NEXT: vpunpcklqdq {{.*#+}} xmm0 = xmm1[0],xmm0[0]
; AVX2-NEXT: vpxor %xmm1, %xmm1, %xmm1 ; AVX2-NEXT: vpxor %xmm1, %xmm1, %xmm1
; AVX2-NEXT: vinserti128 $1, %xmm1, %ymm0, %ymm0 ; AVX2-NEXT: vinserti128 $1, %xmm1, %ymm0, %ymm0
; AVX2-NEXT: movq %rbp, %rsp ; AVX2-NEXT: movq %rbp, %rsp
@ -176,11 +176,11 @@ define <4 x i64> @var_shuffle_v4i64_v2i64_xxxx_i64(<2 x i64> %x, i64 %i0, i64 %i
; AVX1-NEXT: vmovaps %xmm0, -{{[0-9]+}}(%rsp) ; AVX1-NEXT: vmovaps %xmm0, -{{[0-9]+}}(%rsp)
; AVX1-NEXT: vmovq {{.*#+}} xmm0 = mem[0],zero ; AVX1-NEXT: vmovq {{.*#+}} xmm0 = mem[0],zero
; AVX1-NEXT: vmovq {{.*#+}} xmm1 = mem[0],zero ; AVX1-NEXT: vmovq {{.*#+}} xmm1 = mem[0],zero
; AVX1-NEXT: vpunpcklqdq {{.*#+}} xmm0 = xmm0[0],xmm1[0] ; AVX1-NEXT: vpunpcklqdq {{.*#+}} xmm0 = xmm1[0],xmm0[0]
; AVX1-NEXT: vmovq {{.*#+}} xmm1 = mem[0],zero ; AVX1-NEXT: vmovq {{.*#+}} xmm1 = mem[0],zero
; AVX1-NEXT: vmovq {{.*#+}} xmm2 = mem[0],zero ; AVX1-NEXT: vmovq {{.*#+}} xmm2 = mem[0],zero
; AVX1-NEXT: vpunpcklqdq {{.*#+}} xmm1 = xmm1[0],xmm2[0] ; AVX1-NEXT: vpunpcklqdq {{.*#+}} xmm1 = xmm2[0],xmm1[0]
; AVX1-NEXT: vinsertf128 $1, %xmm1, %ymm0, %ymm0 ; AVX1-NEXT: vinsertf128 $1, %xmm0, %ymm1, %ymm0
; AVX1-NEXT: retq ; AVX1-NEXT: retq
; ;
; AVX2-LABEL: var_shuffle_v4i64_v2i64_xxxx_i64: ; AVX2-LABEL: var_shuffle_v4i64_v2i64_xxxx_i64:
@ -188,11 +188,11 @@ define <4 x i64> @var_shuffle_v4i64_v2i64_xxxx_i64(<2 x i64> %x, i64 %i0, i64 %i
; AVX2-NEXT: vmovaps %xmm0, -{{[0-9]+}}(%rsp) ; AVX2-NEXT: vmovaps %xmm0, -{{[0-9]+}}(%rsp)
; AVX2-NEXT: vmovq {{.*#+}} xmm0 = mem[0],zero ; AVX2-NEXT: vmovq {{.*#+}} xmm0 = mem[0],zero
; AVX2-NEXT: vmovq {{.*#+}} xmm1 = mem[0],zero ; AVX2-NEXT: vmovq {{.*#+}} xmm1 = mem[0],zero
; AVX2-NEXT: vpunpcklqdq {{.*#+}} xmm0 = xmm0[0],xmm1[0] ; AVX2-NEXT: vpunpcklqdq {{.*#+}} xmm0 = xmm1[0],xmm0[0]
; AVX2-NEXT: vmovq {{.*#+}} xmm1 = mem[0],zero ; AVX2-NEXT: vmovq {{.*#+}} xmm1 = mem[0],zero
; AVX2-NEXT: vmovq {{.*#+}} xmm2 = mem[0],zero ; AVX2-NEXT: vmovq {{.*#+}} xmm2 = mem[0],zero
; AVX2-NEXT: vpunpcklqdq {{.*#+}} xmm1 = xmm1[0],xmm2[0] ; AVX2-NEXT: vpunpcklqdq {{.*#+}} xmm1 = xmm2[0],xmm1[0]
; AVX2-NEXT: vinserti128 $1, %xmm1, %ymm0, %ymm0 ; AVX2-NEXT: vinserti128 $1, %xmm0, %ymm1, %ymm0
; AVX2-NEXT: retq ; AVX2-NEXT: retq
%x0 = extractelement <2 x i64> %x, i64 %i0 %x0 = extractelement <2 x i64> %x, i64 %i0
%x1 = extractelement <2 x i64> %x, i64 %i1 %x1 = extractelement <2 x i64> %x, i64 %i1
@ -210,29 +210,29 @@ define <8 x float> @var_shuffle_v8f32_v8f32_xxxxxxxx_i32(<8 x float> %x, i32 %i0
; AVX1: # BB#0: ; AVX1: # BB#0:
; AVX1-NEXT: pushq %rbp ; AVX1-NEXT: pushq %rbp
; AVX1-NEXT: movq %rsp, %rbp ; AVX1-NEXT: movq %rsp, %rbp
; AVX1-NEXT: pushq %rbx
; AVX1-NEXT: andq $-32, %rsp ; AVX1-NEXT: andq $-32, %rsp
; AVX1-NEXT: subq $64, %rsp ; AVX1-NEXT: subq $64, %rsp
; AVX1-NEXT: movslq %edi, %rax ; AVX1-NEXT: movslq %edi, %rax
; AVX1-NEXT: movslq %esi, %rsi ; AVX1-NEXT: movslq %esi, %rbx
; AVX1-NEXT: movslq %edx, %rdx ; AVX1-NEXT: movslq %edx, %r11
; AVX1-NEXT: movslq %ecx, %r11 ; AVX1-NEXT: movslq %ecx, %r10
; AVX1-NEXT: movslq %r8d, %r10 ; AVX1-NEXT: movslq %r8d, %rdi
; AVX1-NEXT: vmovaps %ymm0, (%rsp) ; AVX1-NEXT: vmovaps %ymm0, (%rsp)
; AVX1-NEXT: movslq %r9d, %r8 ; AVX1-NEXT: movslq %r9d, %rcx
; AVX1-NEXT: movslq 16(%rbp), %rdi ; AVX1-NEXT: movslq 16(%rbp), %rdx
; AVX1-NEXT: movslq 24(%rbp), %rcx ; AVX1-NEXT: movslq 24(%rbp), %rsi
; AVX1-NEXT: vmovss {{.*#+}} xmm0 = mem[0],zero,zero,zero ; AVX1-NEXT: vmovss {{.*#+}} xmm0 = mem[0],zero,zero,zero
; AVX1-NEXT: vinsertps {{.*#+}} xmm0 = xmm0[0],mem[0],xmm0[2,3]
; AVX1-NEXT: vinsertps {{.*#+}} xmm0 = xmm0[0,1],mem[0],xmm0[3]
; AVX1-NEXT: vinsertps {{.*#+}} xmm0 = xmm0[0,1,2],mem[0]
; AVX1-NEXT: vmovss {{.*#+}} xmm1 = mem[0],zero,zero,zero ; AVX1-NEXT: vmovss {{.*#+}} xmm1 = mem[0],zero,zero,zero
; AVX1-NEXT: vmovss {{.*#+}} xmm2 = mem[0],zero,zero,zero ; AVX1-NEXT: vinsertps {{.*#+}} xmm1 = xmm1[0],mem[0],xmm1[2,3]
; AVX1-NEXT: vinsertps {{.*#+}} xmm2 = xmm2[0],mem[0],xmm2[2,3] ; AVX1-NEXT: vinsertps {{.*#+}} xmm1 = xmm1[0,1],mem[0],xmm1[3]
; AVX1-NEXT: vinsertps {{.*#+}} xmm2 = xmm2[0,1],mem[0],xmm2[3] ; AVX1-NEXT: vinsertps {{.*#+}} xmm1 = xmm1[0,1,2],mem[0]
; AVX1-NEXT: vinsertps {{.*#+}} xmm2 = xmm2[0,1,2],mem[0] ; AVX1-NEXT: vinsertf128 $1, %xmm0, %ymm1, %ymm0
; AVX1-NEXT: vmovss {{.*#+}} xmm3 = mem[0],zero,zero,zero ; AVX1-NEXT: leaq -8(%rbp), %rsp
; AVX1-NEXT: vinsertps {{.*#+}} xmm3 = xmm3[0],mem[0],xmm3[2,3] ; AVX1-NEXT: popq %rbx
; AVX1-NEXT: vinsertps {{.*#+}} xmm0 = xmm3[0,1],xmm0[0],xmm3[3]
; AVX1-NEXT: vinsertps {{.*#+}} xmm0 = xmm0[0,1,2],xmm1[0]
; AVX1-NEXT: vinsertf128 $1, %xmm0, %ymm2, %ymm0
; AVX1-NEXT: movq %rbp, %rsp
; AVX1-NEXT: popq %rbp ; AVX1-NEXT: popq %rbp
; AVX1-NEXT: retq ; AVX1-NEXT: retq
; ;
@ -284,26 +284,26 @@ define <8 x float> @var_shuffle_v8f32_v8f32_xxxxxxxx_i32(<8 x float> %x, i32 %i0
define <8 x float> @var_shuffle_v8f32_v4f32_xxxxxxxx_i32(<4 x float> %x, i32 %i0, i32 %i1, i32 %i2, i32 %i3, i32 %i4, i32 %i5, i32 %i6, i32 %i7) nounwind { define <8 x float> @var_shuffle_v8f32_v4f32_xxxxxxxx_i32(<4 x float> %x, i32 %i0, i32 %i1, i32 %i2, i32 %i3, i32 %i4, i32 %i5, i32 %i6, i32 %i7) nounwind {
; ALL-LABEL: var_shuffle_v8f32_v4f32_xxxxxxxx_i32: ; ALL-LABEL: var_shuffle_v8f32_v4f32_xxxxxxxx_i32:
; ALL: # BB#0: ; ALL: # BB#0:
; ALL-NEXT: pushq %rbx
; ALL-NEXT: movslq %edi, %rax ; ALL-NEXT: movslq %edi, %rax
; ALL-NEXT: movslq %esi, %rsi ; ALL-NEXT: movslq %esi, %rbx
; ALL-NEXT: movslq %edx, %rdx ; ALL-NEXT: movslq %edx, %r11
; ALL-NEXT: movslq %ecx, %r11 ; ALL-NEXT: movslq %ecx, %r10
; ALL-NEXT: movslq %r8d, %r10 ; ALL-NEXT: movslq %r8d, %rdi
; ALL-NEXT: vmovaps %xmm0, -{{[0-9]+}}(%rsp) ; ALL-NEXT: vmovaps %xmm0, -{{[0-9]+}}(%rsp)
; ALL-NEXT: movslq %r9d, %r8 ; ALL-NEXT: movslq %r9d, %rcx
; ALL-NEXT: movslq {{[0-9]+}}(%rsp), %rdi ; ALL-NEXT: movslq {{[0-9]+}}(%rsp), %rdx
; ALL-NEXT: movslq {{[0-9]+}}(%rsp), %rcx ; ALL-NEXT: movslq {{[0-9]+}}(%rsp), %rsi
; ALL-NEXT: vmovss {{.*#+}} xmm0 = mem[0],zero,zero,zero ; ALL-NEXT: vmovss {{.*#+}} xmm0 = mem[0],zero,zero,zero
; ALL-NEXT: vinsertps {{.*#+}} xmm0 = xmm0[0],mem[0],xmm0[2,3]
; ALL-NEXT: vinsertps {{.*#+}} xmm0 = xmm0[0,1],mem[0],xmm0[3]
; ALL-NEXT: vinsertps {{.*#+}} xmm0 = xmm0[0,1,2],mem[0]
; ALL-NEXT: vmovss {{.*#+}} xmm1 = mem[0],zero,zero,zero ; ALL-NEXT: vmovss {{.*#+}} xmm1 = mem[0],zero,zero,zero
; ALL-NEXT: vmovss {{.*#+}} xmm2 = mem[0],zero,zero,zero ; ALL-NEXT: vinsertps {{.*#+}} xmm1 = xmm1[0],mem[0],xmm1[2,3]
; ALL-NEXT: vinsertps {{.*#+}} xmm2 = xmm2[0],mem[0],xmm2[2,3] ; ALL-NEXT: vinsertps {{.*#+}} xmm1 = xmm1[0,1],mem[0],xmm1[3]
; ALL-NEXT: vinsertps {{.*#+}} xmm2 = xmm2[0,1],mem[0],xmm2[3] ; ALL-NEXT: vinsertps {{.*#+}} xmm1 = xmm1[0,1,2],mem[0]
; ALL-NEXT: vinsertps {{.*#+}} xmm2 = xmm2[0,1,2],mem[0] ; ALL-NEXT: vinsertf128 $1, %xmm0, %ymm1, %ymm0
; ALL-NEXT: vmovss {{.*#+}} xmm3 = mem[0],zero,zero,zero ; ALL-NEXT: popq %rbx
; ALL-NEXT: vinsertps {{.*#+}} xmm3 = xmm3[0],mem[0],xmm3[2,3]
; ALL-NEXT: vinsertps {{.*#+}} xmm0 = xmm3[0,1],xmm0[0],xmm3[3]
; ALL-NEXT: vinsertps {{.*#+}} xmm0 = xmm0[0,1,2],xmm1[0]
; ALL-NEXT: vinsertf128 $1, %xmm0, %ymm2, %ymm0
; ALL-NEXT: retq ; ALL-NEXT: retq
%x0 = extractelement <4 x float> %x, i32 %i0 %x0 = extractelement <4 x float> %x, i32 %i0
%x1 = extractelement <4 x float> %x, i32 %i1 %x1 = extractelement <4 x float> %x, i32 %i1
@ -336,26 +336,19 @@ define <16 x i16> @var_shuffle_v16i16_v16i16_xxxxxxxxxxxxxxxx_i16(<16 x i16> %x,
; AVX1-NEXT: movzwl (%rsp,%rax,2), %eax ; AVX1-NEXT: movzwl (%rsp,%rax,2), %eax
; AVX1-NEXT: vmovd %eax, %xmm0 ; AVX1-NEXT: vmovd %eax, %xmm0
; AVX1-NEXT: movslq 40(%rbp), %rax ; AVX1-NEXT: movslq 40(%rbp), %rax
; AVX1-NEXT: movzwl (%rsp,%rax,2), %eax ; AVX1-NEXT: vpinsrw $1, (%rsp,%rax,2), %xmm0, %xmm0
; AVX1-NEXT: vpinsrw $1, %eax, %xmm0, %xmm0
; AVX1-NEXT: movslq 48(%rbp), %rax ; AVX1-NEXT: movslq 48(%rbp), %rax
; AVX1-NEXT: movzwl (%rsp,%rax,2), %eax ; AVX1-NEXT: vpinsrw $2, (%rsp,%rax,2), %xmm0, %xmm0
; AVX1-NEXT: vpinsrw $2, %eax, %xmm0, %xmm0
; AVX1-NEXT: movslq 56(%rbp), %rax ; AVX1-NEXT: movslq 56(%rbp), %rax
; AVX1-NEXT: movzwl (%rsp,%rax,2), %eax ; AVX1-NEXT: vpinsrw $3, (%rsp,%rax,2), %xmm0, %xmm0
; AVX1-NEXT: vpinsrw $3, %eax, %xmm0, %xmm0
; AVX1-NEXT: movslq 64(%rbp), %rax ; AVX1-NEXT: movslq 64(%rbp), %rax
; AVX1-NEXT: movzwl (%rsp,%rax,2), %eax ; AVX1-NEXT: vpinsrw $4, (%rsp,%rax,2), %xmm0, %xmm0
; AVX1-NEXT: vpinsrw $4, %eax, %xmm0, %xmm0
; AVX1-NEXT: movslq 72(%rbp), %rax ; AVX1-NEXT: movslq 72(%rbp), %rax
; AVX1-NEXT: movzwl (%rsp,%rax,2), %eax ; AVX1-NEXT: vpinsrw $5, (%rsp,%rax,2), %xmm0, %xmm0
; AVX1-NEXT: vpinsrw $5, %eax, %xmm0, %xmm0
; AVX1-NEXT: movslq 80(%rbp), %rax ; AVX1-NEXT: movslq 80(%rbp), %rax
; AVX1-NEXT: movzwl (%rsp,%rax,2), %eax ; AVX1-NEXT: vpinsrw $6, (%rsp,%rax,2), %xmm0, %xmm0
; AVX1-NEXT: vpinsrw $6, %eax, %xmm0, %xmm0
; AVX1-NEXT: movslq 88(%rbp), %rax ; AVX1-NEXT: movslq 88(%rbp), %rax
; AVX1-NEXT: movzwl (%rsp,%rax,2), %eax ; AVX1-NEXT: vpinsrw $7, (%rsp,%rax,2), %xmm0, %xmm0
; AVX1-NEXT: vpinsrw $7, %eax, %xmm0, %xmm0
; AVX1-NEXT: movslq %edi, %rax ; AVX1-NEXT: movslq %edi, %rax
; AVX1-NEXT: movzwl (%rsp,%rax,2), %eax ; AVX1-NEXT: movzwl (%rsp,%rax,2), %eax
; AVX1-NEXT: vmovd %eax, %xmm1 ; AVX1-NEXT: vmovd %eax, %xmm1
@ -370,11 +363,9 @@ define <16 x i16> @var_shuffle_v16i16_v16i16_xxxxxxxxxxxxxxxx_i16(<16 x i16> %x,
; AVX1-NEXT: movslq %r9d, %rax ; AVX1-NEXT: movslq %r9d, %rax
; AVX1-NEXT: vpinsrw $5, (%rsp,%rax,2), %xmm1, %xmm1 ; AVX1-NEXT: vpinsrw $5, (%rsp,%rax,2), %xmm1, %xmm1
; AVX1-NEXT: movslq 16(%rbp), %rax ; AVX1-NEXT: movslq 16(%rbp), %rax
; AVX1-NEXT: movzwl (%rsp,%rax,2), %eax ; AVX1-NEXT: vpinsrw $6, (%rsp,%rax,2), %xmm1, %xmm1
; AVX1-NEXT: vpinsrw $6, %eax, %xmm1, %xmm1
; AVX1-NEXT: movslq 24(%rbp), %rax ; AVX1-NEXT: movslq 24(%rbp), %rax
; AVX1-NEXT: movzwl (%rsp,%rax,2), %eax ; AVX1-NEXT: vpinsrw $7, (%rsp,%rax,2), %xmm1, %xmm1
; AVX1-NEXT: vpinsrw $7, %eax, %xmm1, %xmm1
; AVX1-NEXT: vinsertf128 $1, %xmm0, %ymm1, %ymm0 ; AVX1-NEXT: vinsertf128 $1, %xmm0, %ymm1, %ymm0
; AVX1-NEXT: movq %rbp, %rsp ; AVX1-NEXT: movq %rbp, %rsp
; AVX1-NEXT: popq %rbp ; AVX1-NEXT: popq %rbp
@ -391,26 +382,19 @@ define <16 x i16> @var_shuffle_v16i16_v16i16_xxxxxxxxxxxxxxxx_i16(<16 x i16> %x,
; AVX2-NEXT: movzwl (%rsp,%rax,2), %eax ; AVX2-NEXT: movzwl (%rsp,%rax,2), %eax
; AVX2-NEXT: vmovd %eax, %xmm0 ; AVX2-NEXT: vmovd %eax, %xmm0
; AVX2-NEXT: movslq 40(%rbp), %rax ; AVX2-NEXT: movslq 40(%rbp), %rax
; AVX2-NEXT: movzwl (%rsp,%rax,2), %eax ; AVX2-NEXT: vpinsrw $1, (%rsp,%rax,2), %xmm0, %xmm0
; AVX2-NEXT: vpinsrw $1, %eax, %xmm0, %xmm0
; AVX2-NEXT: movslq 48(%rbp), %rax ; AVX2-NEXT: movslq 48(%rbp), %rax
; AVX2-NEXT: movzwl (%rsp,%rax,2), %eax ; AVX2-NEXT: vpinsrw $2, (%rsp,%rax,2), %xmm0, %xmm0
; AVX2-NEXT: vpinsrw $2, %eax, %xmm0, %xmm0
; AVX2-NEXT: movslq 56(%rbp), %rax ; AVX2-NEXT: movslq 56(%rbp), %rax
; AVX2-NEXT: movzwl (%rsp,%rax,2), %eax ; AVX2-NEXT: vpinsrw $3, (%rsp,%rax,2), %xmm0, %xmm0
; AVX2-NEXT: vpinsrw $3, %eax, %xmm0, %xmm0
; AVX2-NEXT: movslq 64(%rbp), %rax ; AVX2-NEXT: movslq 64(%rbp), %rax
; AVX2-NEXT: movzwl (%rsp,%rax,2), %eax ; AVX2-NEXT: vpinsrw $4, (%rsp,%rax,2), %xmm0, %xmm0
; AVX2-NEXT: vpinsrw $4, %eax, %xmm0, %xmm0
; AVX2-NEXT: movslq 72(%rbp), %rax ; AVX2-NEXT: movslq 72(%rbp), %rax
; AVX2-NEXT: movzwl (%rsp,%rax,2), %eax ; AVX2-NEXT: vpinsrw $5, (%rsp,%rax,2), %xmm0, %xmm0
; AVX2-NEXT: vpinsrw $5, %eax, %xmm0, %xmm0
; AVX2-NEXT: movslq 80(%rbp), %rax ; AVX2-NEXT: movslq 80(%rbp), %rax
; AVX2-NEXT: movzwl (%rsp,%rax,2), %eax ; AVX2-NEXT: vpinsrw $6, (%rsp,%rax,2), %xmm0, %xmm0
; AVX2-NEXT: vpinsrw $6, %eax, %xmm0, %xmm0
; AVX2-NEXT: movslq 88(%rbp), %rax ; AVX2-NEXT: movslq 88(%rbp), %rax
; AVX2-NEXT: movzwl (%rsp,%rax,2), %eax ; AVX2-NEXT: vpinsrw $7, (%rsp,%rax,2), %xmm0, %xmm0
; AVX2-NEXT: vpinsrw $7, %eax, %xmm0, %xmm0
; AVX2-NEXT: movslq %edi, %rax ; AVX2-NEXT: movslq %edi, %rax
; AVX2-NEXT: movzwl (%rsp,%rax,2), %eax ; AVX2-NEXT: movzwl (%rsp,%rax,2), %eax
; AVX2-NEXT: vmovd %eax, %xmm1 ; AVX2-NEXT: vmovd %eax, %xmm1
@ -425,11 +409,9 @@ define <16 x i16> @var_shuffle_v16i16_v16i16_xxxxxxxxxxxxxxxx_i16(<16 x i16> %x,
; AVX2-NEXT: movslq %r9d, %rax ; AVX2-NEXT: movslq %r9d, %rax
; AVX2-NEXT: vpinsrw $5, (%rsp,%rax,2), %xmm1, %xmm1 ; AVX2-NEXT: vpinsrw $5, (%rsp,%rax,2), %xmm1, %xmm1
; AVX2-NEXT: movslq 16(%rbp), %rax ; AVX2-NEXT: movslq 16(%rbp), %rax
; AVX2-NEXT: movzwl (%rsp,%rax,2), %eax ; AVX2-NEXT: vpinsrw $6, (%rsp,%rax,2), %xmm1, %xmm1
; AVX2-NEXT: vpinsrw $6, %eax, %xmm1, %xmm1
; AVX2-NEXT: movslq 24(%rbp), %rax ; AVX2-NEXT: movslq 24(%rbp), %rax
; AVX2-NEXT: movzwl (%rsp,%rax,2), %eax ; AVX2-NEXT: vpinsrw $7, (%rsp,%rax,2), %xmm1, %xmm1
; AVX2-NEXT: vpinsrw $7, %eax, %xmm1, %xmm1
; AVX2-NEXT: vinserti128 $1, %xmm0, %ymm1, %ymm0 ; AVX2-NEXT: vinserti128 $1, %xmm0, %ymm1, %ymm0
; AVX2-NEXT: movq %rbp, %rsp ; AVX2-NEXT: movq %rbp, %rsp
; AVX2-NEXT: popq %rbp ; AVX2-NEXT: popq %rbp
@ -477,26 +459,19 @@ define <16 x i16> @var_shuffle_v16i16_v8i16_xxxxxxxxxxxxxxxx_i16(<8 x i16> %x, i
; AVX1-NEXT: movzwl -24(%rsp,%rax,2), %eax ; AVX1-NEXT: movzwl -24(%rsp,%rax,2), %eax
; AVX1-NEXT: vmovd %eax, %xmm0 ; AVX1-NEXT: vmovd %eax, %xmm0
; AVX1-NEXT: movslq {{[0-9]+}}(%rsp), %rax ; AVX1-NEXT: movslq {{[0-9]+}}(%rsp), %rax
; AVX1-NEXT: movzwl -24(%rsp,%rax,2), %eax ; AVX1-NEXT: vpinsrw $1, -24(%rsp,%rax,2), %xmm0, %xmm0
; AVX1-NEXT: vpinsrw $1, %eax, %xmm0, %xmm0
; AVX1-NEXT: movslq {{[0-9]+}}(%rsp), %rax ; AVX1-NEXT: movslq {{[0-9]+}}(%rsp), %rax
; AVX1-NEXT: movzwl -24(%rsp,%rax,2), %eax ; AVX1-NEXT: vpinsrw $2, -24(%rsp,%rax,2), %xmm0, %xmm0
; AVX1-NEXT: vpinsrw $2, %eax, %xmm0, %xmm0
; AVX1-NEXT: movslq {{[0-9]+}}(%rsp), %rax ; AVX1-NEXT: movslq {{[0-9]+}}(%rsp), %rax
; AVX1-NEXT: movzwl -24(%rsp,%rax,2), %eax ; AVX1-NEXT: vpinsrw $3, -24(%rsp,%rax,2), %xmm0, %xmm0
; AVX1-NEXT: vpinsrw $3, %eax, %xmm0, %xmm0
; AVX1-NEXT: movslq {{[0-9]+}}(%rsp), %rax ; AVX1-NEXT: movslq {{[0-9]+}}(%rsp), %rax
; AVX1-NEXT: movzwl -24(%rsp,%rax,2), %eax ; AVX1-NEXT: vpinsrw $4, -24(%rsp,%rax,2), %xmm0, %xmm0
; AVX1-NEXT: vpinsrw $4, %eax, %xmm0, %xmm0
; AVX1-NEXT: movslq {{[0-9]+}}(%rsp), %rax ; AVX1-NEXT: movslq {{[0-9]+}}(%rsp), %rax
; AVX1-NEXT: movzwl -24(%rsp,%rax,2), %eax ; AVX1-NEXT: vpinsrw $5, -24(%rsp,%rax,2), %xmm0, %xmm0
; AVX1-NEXT: vpinsrw $5, %eax, %xmm0, %xmm0
; AVX1-NEXT: movslq {{[0-9]+}}(%rsp), %rax ; AVX1-NEXT: movslq {{[0-9]+}}(%rsp), %rax
; AVX1-NEXT: movzwl -24(%rsp,%rax,2), %eax ; AVX1-NEXT: vpinsrw $6, -24(%rsp,%rax,2), %xmm0, %xmm0
; AVX1-NEXT: vpinsrw $6, %eax, %xmm0, %xmm0
; AVX1-NEXT: movslq {{[0-9]+}}(%rsp), %rax ; AVX1-NEXT: movslq {{[0-9]+}}(%rsp), %rax
; AVX1-NEXT: movzwl -24(%rsp,%rax,2), %eax ; AVX1-NEXT: vpinsrw $7, -24(%rsp,%rax,2), %xmm0, %xmm0
; AVX1-NEXT: vpinsrw $7, %eax, %xmm0, %xmm0
; AVX1-NEXT: movslq %edi, %rax ; AVX1-NEXT: movslq %edi, %rax
; AVX1-NEXT: movzwl -24(%rsp,%rax,2), %eax ; AVX1-NEXT: movzwl -24(%rsp,%rax,2), %eax
; AVX1-NEXT: vmovd %eax, %xmm1 ; AVX1-NEXT: vmovd %eax, %xmm1
@ -511,11 +486,9 @@ define <16 x i16> @var_shuffle_v16i16_v8i16_xxxxxxxxxxxxxxxx_i16(<8 x i16> %x, i
; AVX1-NEXT: movslq %r9d, %rax ; AVX1-NEXT: movslq %r9d, %rax
; AVX1-NEXT: vpinsrw $5, -24(%rsp,%rax,2), %xmm1, %xmm1 ; AVX1-NEXT: vpinsrw $5, -24(%rsp,%rax,2), %xmm1, %xmm1
; AVX1-NEXT: movslq {{[0-9]+}}(%rsp), %rax ; AVX1-NEXT: movslq {{[0-9]+}}(%rsp), %rax
; AVX1-NEXT: movzwl -24(%rsp,%rax,2), %eax ; AVX1-NEXT: vpinsrw $6, -24(%rsp,%rax,2), %xmm1, %xmm1
; AVX1-NEXT: vpinsrw $6, %eax, %xmm1, %xmm1
; AVX1-NEXT: movslq {{[0-9]+}}(%rsp), %rax ; AVX1-NEXT: movslq {{[0-9]+}}(%rsp), %rax
; AVX1-NEXT: movzwl -24(%rsp,%rax,2), %eax ; AVX1-NEXT: vpinsrw $7, -24(%rsp,%rax,2), %xmm1, %xmm1
; AVX1-NEXT: vpinsrw $7, %eax, %xmm1, %xmm1
; AVX1-NEXT: vinsertf128 $1, %xmm0, %ymm1, %ymm0 ; AVX1-NEXT: vinsertf128 $1, %xmm0, %ymm1, %ymm0
; AVX1-NEXT: retq ; AVX1-NEXT: retq
; ;
@@ -526,26 +499,19 @@ define <16 x i16> @var_shuffle_v16i16_v8i16_xxxxxxxxxxxxxxxx_i16(<8 x i16> %x, i
 ; AVX2-NEXT: movzwl -24(%rsp,%rax,2), %eax
 ; AVX2-NEXT: vmovd %eax, %xmm0
 ; AVX2-NEXT: movslq {{[0-9]+}}(%rsp), %rax
-; AVX2-NEXT: movzwl -24(%rsp,%rax,2), %eax
-; AVX2-NEXT: vpinsrw $1, %eax, %xmm0, %xmm0
+; AVX2-NEXT: vpinsrw $1, -24(%rsp,%rax,2), %xmm0, %xmm0
 ; AVX2-NEXT: movslq {{[0-9]+}}(%rsp), %rax
-; AVX2-NEXT: movzwl -24(%rsp,%rax,2), %eax
-; AVX2-NEXT: vpinsrw $2, %eax, %xmm0, %xmm0
+; AVX2-NEXT: vpinsrw $2, -24(%rsp,%rax,2), %xmm0, %xmm0
 ; AVX2-NEXT: movslq {{[0-9]+}}(%rsp), %rax
-; AVX2-NEXT: movzwl -24(%rsp,%rax,2), %eax
-; AVX2-NEXT: vpinsrw $3, %eax, %xmm0, %xmm0
+; AVX2-NEXT: vpinsrw $3, -24(%rsp,%rax,2), %xmm0, %xmm0
 ; AVX2-NEXT: movslq {{[0-9]+}}(%rsp), %rax
-; AVX2-NEXT: movzwl -24(%rsp,%rax,2), %eax
-; AVX2-NEXT: vpinsrw $4, %eax, %xmm0, %xmm0
+; AVX2-NEXT: vpinsrw $4, -24(%rsp,%rax,2), %xmm0, %xmm0
 ; AVX2-NEXT: movslq {{[0-9]+}}(%rsp), %rax
-; AVX2-NEXT: movzwl -24(%rsp,%rax,2), %eax
-; AVX2-NEXT: vpinsrw $5, %eax, %xmm0, %xmm0
+; AVX2-NEXT: vpinsrw $5, -24(%rsp,%rax,2), %xmm0, %xmm0
 ; AVX2-NEXT: movslq {{[0-9]+}}(%rsp), %rax
-; AVX2-NEXT: movzwl -24(%rsp,%rax,2), %eax
-; AVX2-NEXT: vpinsrw $6, %eax, %xmm0, %xmm0
+; AVX2-NEXT: vpinsrw $6, -24(%rsp,%rax,2), %xmm0, %xmm0
 ; AVX2-NEXT: movslq {{[0-9]+}}(%rsp), %rax
-; AVX2-NEXT: movzwl -24(%rsp,%rax,2), %eax
-; AVX2-NEXT: vpinsrw $7, %eax, %xmm0, %xmm0
+; AVX2-NEXT: vpinsrw $7, -24(%rsp,%rax,2), %xmm0, %xmm0
 ; AVX2-NEXT: movslq %edi, %rax
 ; AVX2-NEXT: movzwl -24(%rsp,%rax,2), %eax
 ; AVX2-NEXT: vmovd %eax, %xmm1
@@ -560,11 +526,9 @@ define <16 x i16> @var_shuffle_v16i16_v8i16_xxxxxxxxxxxxxxxx_i16(<8 x i16> %x, i
 ; AVX2-NEXT: movslq %r9d, %rax
 ; AVX2-NEXT: vpinsrw $5, -24(%rsp,%rax,2), %xmm1, %xmm1
 ; AVX2-NEXT: movslq {{[0-9]+}}(%rsp), %rax
-; AVX2-NEXT: movzwl -24(%rsp,%rax,2), %eax
-; AVX2-NEXT: vpinsrw $6, %eax, %xmm1, %xmm1
+; AVX2-NEXT: vpinsrw $6, -24(%rsp,%rax,2), %xmm1, %xmm1
 ; AVX2-NEXT: movslq {{[0-9]+}}(%rsp), %rax
-; AVX2-NEXT: movzwl -24(%rsp,%rax,2), %eax
-; AVX2-NEXT: vpinsrw $7, %eax, %xmm1, %xmm1
+; AVX2-NEXT: vpinsrw $7, -24(%rsp,%rax,2), %xmm1, %xmm1
 ; AVX2-NEXT: vinserti128 $1, %xmm0, %ymm1, %ymm0
 ; AVX2-NEXT: retq
 %x0 = extractelement <8 x i16> %x, i32 %i0
@@ -620,11 +584,11 @@ define <4 x i64> @mem_shuffle_v4i64_v4i64_xxxx_i64(<4 x i64> %x, i64* %i) nounwi
 ; AVX1-NEXT: vmovaps %ymm0, (%rsp)
 ; AVX1-NEXT: vmovq {{.*#+}} xmm0 = mem[0],zero
 ; AVX1-NEXT: vmovq {{.*#+}} xmm1 = mem[0],zero
+; AVX1-NEXT: vpunpcklqdq {{.*#+}} xmm0 = xmm1[0],xmm0[0]
+; AVX1-NEXT: vmovq {{.*#+}} xmm1 = mem[0],zero
 ; AVX1-NEXT: vmovq {{.*#+}} xmm2 = mem[0],zero
 ; AVX1-NEXT: vpunpcklqdq {{.*#+}} xmm1 = xmm2[0],xmm1[0]
-; AVX1-NEXT: vmovq {{.*#+}} xmm2 = mem[0],zero
-; AVX1-NEXT: vpunpcklqdq {{.*#+}} xmm0 = xmm0[0],xmm2[0]
-; AVX1-NEXT: vinsertf128 $1, %xmm1, %ymm0, %ymm0
+; AVX1-NEXT: vinsertf128 $1, %xmm0, %ymm1, %ymm0
 ; AVX1-NEXT: movq %rbp, %rsp
 ; AVX1-NEXT: popq %rbp
 ; AVX1-NEXT: retq
@@ -642,11 +606,11 @@ define <4 x i64> @mem_shuffle_v4i64_v4i64_xxxx_i64(<4 x i64> %x, i64* %i) nounwi
 ; AVX2-NEXT: vmovaps %ymm0, (%rsp)
 ; AVX2-NEXT: vmovq {{.*#+}} xmm0 = mem[0],zero
 ; AVX2-NEXT: vmovq {{.*#+}} xmm1 = mem[0],zero
+; AVX2-NEXT: vpunpcklqdq {{.*#+}} xmm0 = xmm1[0],xmm0[0]
+; AVX2-NEXT: vmovq {{.*#+}} xmm1 = mem[0],zero
 ; AVX2-NEXT: vmovq {{.*#+}} xmm2 = mem[0],zero
 ; AVX2-NEXT: vpunpcklqdq {{.*#+}} xmm1 = xmm2[0],xmm1[0]
-; AVX2-NEXT: vmovq {{.*#+}} xmm2 = mem[0],zero
-; AVX2-NEXT: vpunpcklqdq {{.*#+}} xmm0 = xmm0[0],xmm2[0]
-; AVX2-NEXT: vinserti128 $1, %xmm1, %ymm0, %ymm0
+; AVX2-NEXT: vinserti128 $1, %xmm0, %ymm1, %ymm0
 ; AVX2-NEXT: movq %rbp, %rsp
 ; AVX2-NEXT: popq %rbp
 ; AVX2-NEXT: retq
@@ -679,11 +643,11 @@ define <4 x i64> @mem_shuffle_v4i64_v2i64_xxxx_i64(<2 x i64> %x, i64* %i) nounwi
 ; AVX1-NEXT: vmovaps %xmm0, -{{[0-9]+}}(%rsp)
 ; AVX1-NEXT: vmovq {{.*#+}} xmm0 = mem[0],zero
 ; AVX1-NEXT: vmovq {{.*#+}} xmm1 = mem[0],zero
+; AVX1-NEXT: vpunpcklqdq {{.*#+}} xmm0 = xmm1[0],xmm0[0]
+; AVX1-NEXT: vmovq {{.*#+}} xmm1 = mem[0],zero
 ; AVX1-NEXT: vmovq {{.*#+}} xmm2 = mem[0],zero
 ; AVX1-NEXT: vpunpcklqdq {{.*#+}} xmm1 = xmm2[0],xmm1[0]
-; AVX1-NEXT: vmovq {{.*#+}} xmm2 = mem[0],zero
-; AVX1-NEXT: vpunpcklqdq {{.*#+}} xmm0 = xmm0[0],xmm2[0]
-; AVX1-NEXT: vinsertf128 $1, %xmm1, %ymm0, %ymm0
+; AVX1-NEXT: vinsertf128 $1, %xmm0, %ymm1, %ymm0
 ; AVX1-NEXT: retq
 ;
 ; AVX2-LABEL: mem_shuffle_v4i64_v2i64_xxxx_i64:
@@ -695,11 +659,11 @@ define <4 x i64> @mem_shuffle_v4i64_v2i64_xxxx_i64(<2 x i64> %x, i64* %i) nounwi
 ; AVX2-NEXT: vmovaps %xmm0, -{{[0-9]+}}(%rsp)
 ; AVX2-NEXT: vmovq {{.*#+}} xmm0 = mem[0],zero
 ; AVX2-NEXT: vmovq {{.*#+}} xmm1 = mem[0],zero
+; AVX2-NEXT: vpunpcklqdq {{.*#+}} xmm0 = xmm1[0],xmm0[0]
+; AVX2-NEXT: vmovq {{.*#+}} xmm1 = mem[0],zero
 ; AVX2-NEXT: vmovq {{.*#+}} xmm2 = mem[0],zero
 ; AVX2-NEXT: vpunpcklqdq {{.*#+}} xmm1 = xmm2[0],xmm1[0]
-; AVX2-NEXT: vmovq {{.*#+}} xmm2 = mem[0],zero
-; AVX2-NEXT: vpunpcklqdq {{.*#+}} xmm0 = xmm0[0],xmm2[0]
-; AVX2-NEXT: vinserti128 $1, %xmm1, %ymm0, %ymm0
+; AVX2-NEXT: vinserti128 $1, %xmm0, %ymm1, %ymm0
 ; AVX2-NEXT: retq
 %p0 = getelementptr inbounds i64, i64* %i, i32 0
 %p1 = getelementptr inbounds i64, i64* %i, i32 1


@@ -27,23 +27,26 @@ catch:
 ; CHECK-LABEL: _use_except_handler3:
 ; CHECK: pushl %ebp
-; CHECK: movl %esp, %ebp
-; CHECK: pushl %ebx
-; CHECK: pushl %edi
-; CHECK: pushl %esi
-; CHECK: subl ${{[0-9]+}}, %esp
-; CHECK: movl $-1, -16(%ebp)
-; CHECK: movl $L__ehtable$use_except_handler3, -20(%ebp)
-; CHECK: leal -28(%ebp), %[[node:[^ ,]*]]
-; CHECK: movl $__except_handler3, -24(%ebp)
-; CHECK: movl %fs:0, %[[next:[^ ,]*]]
-; CHECK: movl %[[next]], -28(%ebp)
-; CHECK: movl %[[node]], %fs:0
-; CHECK: calll _may_throw_or_crash
+; CHECK-NEXT: movl %esp, %ebp
+; CHECK-NEXT: pushl %ebx
+; CHECK-NEXT: pushl %edi
+; CHECK-NEXT: pushl %esi
+; CHECK-NEXT: subl ${{[0-9]+}}, %esp
+; CHECK-NEXT: movl %esp, -36(%ebp)
+; CHECK-NEXT: movl $-1, -16(%ebp)
+; CHECK-NEXT: movl $L__ehtable$use_except_handler3, -20(%ebp)
+; CHECK-NEXT: leal -28(%ebp), %[[node:[^ ,]*]]
+; CHECK-NEXT: movl $__except_handler3, -24(%ebp)
+; CHECK-NEXT: movl %fs:0, %[[next:[^ ,]*]]
+; CHECK-NEXT: movl %[[next]], -28(%ebp)
+; CHECK-NEXT: movl %[[node]], %fs:0
+; CHECK-NEXT: movl $0, -16(%ebp)
+; CHECK-NEXT: calll _may_throw_or_crash
 ; CHECK: movl -28(%ebp), %[[next:[^ ,]*]]
-; CHECK: movl %[[next]], %fs:0
+; CHECK-NEXT: movl %[[next]], %fs:0
 ; CHECK: retl
-; CHECK: LBB1_2: # %catch{{$}}
+; CHECK-NEXT: LBB1_2: # %catch{{$}}
 ; CHECK: .section .xdata,"dr"
 ; CHECK-LABEL: L__ehtable$use_except_handler3:
@@ -66,23 +69,37 @@ catch:
 ; CHECK-LABEL: _use_except_handler4:
 ; CHECK: pushl %ebp
-; CHECK: movl %esp, %ebp
-; CHECK: subl ${{[0-9]+}}, %esp
-; CHECK: movl %esp, -36(%ebp)
-; CHECK: movl $-2, -16(%ebp)
-; CHECK: movl $L__ehtable$use_except_handler4, %[[lsda:[^ ,]*]]
-; CHECK: xorl ___security_cookie, %[[lsda]]
-; CHECK: movl %[[lsda]], -20(%ebp)
-; CHECK: leal -28(%ebp), %[[node:[^ ,]*]]
-; CHECK: movl $__except_handler4, -24(%ebp)
-; CHECK: movl %fs:0, %[[next:[^ ,]*]]
-; CHECK: movl %[[next]], -28(%ebp)
-; CHECK: movl %[[node]], %fs:0
-; CHECK: calll _may_throw_or_crash
+; CHECK-NEXT: movl %esp, %ebp
+; CHECK-NEXT: pushl %ebx
+; CHECK-NEXT: pushl %edi
+; CHECK-NEXT: pushl %esi
+; CHECK-NEXT: subl ${{[0-9]+}}, %esp
+; CHECK-NEXT: movl %ebp, %eax
+; CHECK-NEXT: movl %esp, -36(%ebp)
+; CHECK-NEXT: movl $-2, -16(%ebp)
+; CHECK-NEXT: movl $L__ehtable$use_except_handler4, %[[lsda:[^ ,]*]]
+; CHECK-NEXT: movl ___security_cookie, %[[seccookie:[^ ,]*]]
+; CHECK-NEXT: xorl %[[seccookie]], %[[lsda]]
+; CHECK-NEXT: movl %[[lsda]], -20(%ebp)
+; CHECK-NEXT: xorl %[[seccookie]], %[[tmp1:[^ ,]*]]
+; CHECK-NEXT: movl %[[tmp1]], -40(%ebp)
+; CHECK-NEXT: leal -28(%ebp), %[[node:[^ ,]*]]
+; CHECK-NEXT: movl $__except_handler4, -24(%ebp)
+; CHECK-NEXT: movl %fs:0, %[[next:[^ ,]*]]
+; CHECK-NEXT: movl %[[next]], -28(%ebp)
+; CHECK-NEXT: movl %[[node]], %fs:0
+; CHECK-NEXT: movl $0, -16(%ebp)
+; CHECK-NEXT: calll _may_throw_or_crash
 ; CHECK: movl -28(%ebp), %[[next:[^ ,]*]]
-; CHECK: movl %[[next]], %fs:0
-; CHECK: retl
-; CHECK: LBB2_2: # %catch{{$}}
+; CHECK-NEXT: movl %[[next]], %fs:0
+; CHECK-NEXT: addl $28, %esp
+; CHECK-NEXT: popl %esi
+; CHECK-NEXT: popl %edi
+; CHECK-NEXT: popl %ebx
+; CHECK-NEXT: popl %ebp
+; CHECK-NEXT: retl
+; CHECK-NEXT: LBB2_2: # %catch{{$}}
 ; CHECK: .section .xdata,"dr"
 ; CHECK-LABEL: L__ehtable$use_except_handler4:
@@ -109,26 +126,33 @@ catch:
 ; CHECK-LABEL: _use_except_handler4_ssp:
 ; CHECK: pushl %ebp
-; CHECK: movl %esp, %ebp
-; CHECK: subl ${{[0-9]+}}, %esp
-; CHECK: movl %ebp, %[[ehguard:[^ ,]*]]
-; CHECK: movl %esp, -36(%ebp)
-; CHECK: movl $-2, -16(%ebp)
-; CHECK: movl $L__ehtable$use_except_handler4_ssp, %[[lsda:[^ ,]*]]
-; CHECK: xorl ___security_cookie, %[[lsda]]
-; CHECK: movl %[[lsda]], -20(%ebp)
-; CHECK: xorl ___security_cookie, %[[ehguard]]
-; CHECK: movl %[[ehguard]], -40(%ebp)
-; CHECK: leal -28(%ebp), %[[node:[^ ,]*]]
-; CHECK: movl $__except_handler4, -24(%ebp)
-; CHECK: movl %fs:0, %[[next:[^ ,]*]]
-; CHECK: movl %[[next]], -28(%ebp)
-; CHECK: movl %[[node]], %fs:0
-; CHECK: calll _may_throw_or_crash
+; CHECK-NEXT: movl %esp, %ebp
+; CHECK-NEXT: pushl %ebx
+; CHECK-NEXT: pushl %edi
+; CHECK-NEXT: pushl %esi
+; CHECK-NEXT: subl ${{[0-9]+}}, %esp
+; CHECK-NEXT: movl %ebp, %[[ehguard:[^ ,]*]]
+; CHECK-NEXT: movl %esp, -36(%ebp)
+; CHECK-NEXT: movl $-2, -16(%ebp)
+; CHECK-NEXT: movl $L__ehtable$use_except_handler4_ssp, %[[lsda:[^ ,]*]]
+; CHECK-NEXT: movl ___security_cookie, %[[seccookie:[^ ,]*]]
+; CHECK-NEXT: xorl %[[seccookie]], %[[lsda]]
+; CHECK-NEXT: movl %[[lsda]], -20(%ebp)
+; CHECK-NEXT: xorl %[[seccookie]], %[[ehguard]]
+; CHECK-NEXT: movl %[[ehguard]], -40(%ebp)
+; CHECK-NEXT: leal -28(%ebp), %[[node:[^ ,]*]]
+; CHECK-NEXT: movl $__except_handler4, -24(%ebp)
+; CHECK-NEXT: movl %fs:0, %[[next:[^ ,]*]]
+; CHECK-NEXT: movl %[[next]], -28(%ebp)
+; CHECK-NEXT: movl %[[node]], %fs:0
+; CHECK-NEXT: movl $0, -16(%ebp)
+; CHECK-NEXT: calll _may_throw_or_crash
 ; CHECK: movl -28(%ebp), %[[next:[^ ,]*]]
-; CHECK: movl %[[next]], %fs:0
+; CHECK-NEXT: movl %[[next]], %fs:0
 ; CHECK: retl
-; CHECK: [[catch:[^ ,]*]]: # %catch{{$}}
+; CHECK-NEXT: [[catch:[^ ,]*]]: # %catch{{$}}
 ; CHECK: .section .xdata,"dr"
 ; CHECK-LABEL: L__ehtable$use_except_handler4_ssp:
@@ -155,23 +179,26 @@ catch:
 ; CHECK-LABEL: _use_CxxFrameHandler3:
 ; CHECK: pushl %ebp
-; CHECK: movl %esp, %ebp
-; CHECK: subl ${{[0-9]+}}, %esp
-; CHECK: movl %esp, -28(%ebp)
-; CHECK: movl $-1, -16(%ebp)
-; CHECK: leal -24(%ebp), %[[node:[^ ,]*]]
-; CHECK: movl $___ehhandler$use_CxxFrameHandler3, -20(%ebp)
-; CHECK: movl %fs:0, %[[next:[^ ,]*]]
-; CHECK: movl %[[next]], -24(%ebp)
-; CHECK: movl %[[node]], %fs:0
-; CHECK: movl $0, -16(%ebp)
-; CHECK: calll _may_throw_or_crash
+; CHECK-NEXT: movl %esp, %ebp
+; CHECK-NEXT: pushl %ebx
+; CHECK-NEXT: pushl %edi
+; CHECK-NEXT: pushl %esi
+; CHECK-NEXT: subl ${{[0-9]+}}, %esp
+; CHECK-NEXT: movl %esp, -28(%ebp)
+; CHECK-NEXT: movl $-1, -16(%ebp)
+; CHECK-NEXT: leal -24(%ebp), %[[node:[^ ,]*]]
+; CHECK-NEXT: movl $___ehhandler$use_CxxFrameHandler3, -20(%ebp)
+; CHECK-NEXT: movl %fs:0, %[[next:[^ ,]*]]
+; CHECK-NEXT: movl %[[next]], -24(%ebp)
+; CHECK-NEXT: movl %[[node]], %fs:0
+; CHECK-NEXT: movl $0, -16(%ebp)
+; CHECK-NEXT: calll _may_throw_or_crash
 ; CHECK: movl -24(%ebp), %[[next:[^ ,]*]]
-; CHECK: movl %[[next]], %fs:0
+; CHECK-NEXT: movl %[[next]], %fs:0
 ; CHECK: retl
 ; CHECK: .section .xdata,"dr"
-; CHECK: .p2align 2
+; CHECK-NEXT: .p2align 2
 ; CHECK-LABEL: L__ehtable$use_CxxFrameHandler3:
 ; CHECK-NEXT: .long 429065506
 ; CHECK-NEXT: .long 2
@@ -185,8 +212,8 @@ catch:
 ; CHECK-LABEL: ___ehhandler$use_CxxFrameHandler3:
 ; CHECK: movl $L__ehtable$use_CxxFrameHandler3, %eax
-; CHECK: jmp ___CxxFrameHandler3 # TAILCALL
+; CHECK-NEXT: jmp ___CxxFrameHandler3 # TAILCALL
 ; CHECK: .safeseh __except_handler3
-; CHECK: .safeseh __except_handler4
-; CHECK: .safeseh ___ehhandler$use_CxxFrameHandler3
+; CHECK-NEXT: .safeseh __except_handler4
+; CHECK-NEXT: .safeseh ___ehhandler$use_CxxFrameHandler3


@@ -26,10 +26,10 @@ entry:
 ; CHECK-LABEL: test_vararg
 ; CHECK: extsp 6
 ; CHECK: stw lr, sp[1]
+; CHECK: stw r3, sp[6]
 ; CHECK: stw r0, sp[3]
 ; CHECK: stw r1, sp[4]
 ; CHECK: stw r2, sp[5]
-; CHECK: stw r3, sp[6]
 ; CHECK: ldaw r0, sp[3]
 ; CHECK: stw r0, sp[2]
 %list = alloca i8*, align 4