Commit Graph

1266 Commits

Author SHA1 Message Date
Matthew Simpson e2037d24f9 [LV] Model if-converted phi node costs
Phi nodes in non-header blocks are converted to select instructions after
if-conversion. This patch updates the cost model to account for the selects.

Differential Revision: https://reviews.llvm.org/D31906

llvm-svn: 300980
2017-04-21 14:14:54 +00:00
Easwaran Raman 76aba5f6d7 [SLP vectorizer] Allow phi node reordering in tryToVectorizeList.
In tryToVectorizeList, under a very limited circumstance (when entered
from tryToVectorizePair), the values may be reordered (swapped) and the
SLP tree is built with the new order. This extends that to the case when
starting from phis in vectorizeChainsInBlock when there are exactly two
phis. The textual order of phi nodes shouldn't really matter. Without
this change, the loop body in the accompnaying test case is fully vectorized
when we swap the orde of the phis but not with this order. While this
doesn't solve the phi-ordering problem in a general way (for more than 2
phis), this is simple fix that piggybacks on an existing mechanism and
is useful in cases like multiplying two complex numbers.

Differential revision: https://reviews.llvm.org/D32065

llvm-svn: 300574
2017-04-18 18:16:57 +00:00
Gil Rapaport fb1d915ab2 [LV] Cache block mask values
This patch is part of D28975's breakdown.

Add caching for block masks similar to the cache already used for edge masks,
replacing generation per user with reusing the first generated value which
dominates all uses.

Differential Revision: https://reviews.llvm.org/D32054

llvm-svn: 300557
2017-04-18 14:43:43 +00:00
Gil Rapaport 334f8fbe47 [LV] Remove implicit single basic block assumption
This patch is part of D28975's breakdown - no change in output intended.

LV's code currently assumes the vectorized loop is a single basic block up
until predicateInstructions() is called. This patch removes two manifestations
of this assumption (loop phi incoming values, dominator tree update) by
replacing the use of vectorLoopBody with the vectorized loop's latch/header.

Differential Revision: https://reviews.llvm.org/D32040

llvm-svn: 300310
2017-04-14 07:30:23 +00:00
Anna Thomas dcdb325fee [LV] Fix the vector code generation for first order recurrence
Summary:
In first order recurrences where phi's are used outside the loop,
we should generate an additional vector.extract of the second last element from
the vectorized phi update.
This is because we require the phi itself (which is the value at the second last
iteration of the vector loop) and not the phi's update within the loop.
Also fix the code gen when we just unroll, but don't vectorize.
Fixes PR32396.

Reviewers: mssimpso, mkuper, anemet

Subscribers: llvm-commits, mzolotukhin

Differential Revision: https://reviews.llvm.org/D31979

llvm-svn: 300238
2017-04-13 18:59:25 +00:00
Ayal Zaks cd712b6c49 [LV] Refactor ILV to provide vectorizeInstruction(); NFC
Refactoring InnerLoopVectorizer's vectorizeBlockInLoop() to provide
vectorizeInstruction(). Aligning DeadInstructions with its only user.
Facilitates driving the transformation by VPlan - follows
https://reviews.llvm.org/D28975 and its tentative breakdown.

Differential Revision: https://reviews.llvm.org/D31997

llvm-svn: 300183
2017-04-13 09:07:23 +00:00
Jonas Paulsson 22776892c9 [SLPVectorizer] Pass the right type argument to getCmpSelInstrCost()
In getEntryCost(), make the scalar type for a compare instruction that of the
operands, not i1. This is needed in order to call getCmpSelInstrCost() for a
compare in a sensible way, the same way as the LoopVectorizer does.

New test: test/Transforms/SLPVectorizer/SystemZ/SLP-cmp-cost-query.ll

Review: Matthew Simpson
https://reviews.llvm.org/D31601

llvm-svn: 300061
2017-04-12 13:29:25 +00:00
Jonas Paulsson 592dbea779 [LoopVectorizer] Improve handling of branches during cost estimation.
The cost for a branch after vectorization is very different depending on if
the vectorizer will if-convert the block (branch is eliminated), or if
scalarized and predicated blocks will be produced (branch duplicated before
each block). There is also the case of remaining scalar branches, such as the
back-edge branch.

This patch handles these cases differently with TTI based cost estimates.

Review: Matthew Simpson
https://reviews.llvm.org/D31175

llvm-svn: 300058
2017-04-12 13:13:15 +00:00
Jonas Paulsson da74ed42da [LoopVectorizer, TTI] New method supportsEfficientVectorElementLoadStore()
Since SystemZ supports vector element load/store instructions, there is no
need for extracts/inserts if a vector load/store gets scalarized.

This patch lets Target specify that it supports such instructions by means of
a new TTI hook that defaults to false.

The use for this is in the LoopVectorizer getScalarizationOverhead() method,
which will with this patch produce a smaller sum for a vector load/store on
SystemZ.

New test: test/Transforms/LoopVectorize/SystemZ/load-store-scalarization-cost.ll

Review: Adam Nemet
https://reviews.llvm.org/D30680

llvm-svn: 300056
2017-04-12 12:41:37 +00:00
Jonas Paulsson fccc7d66c3 [SystemZ] TargetTransformInfo cost functions implemented.
getArithmeticInstrCost(), getShuffleCost(), getCastInstrCost(),
getCmpSelInstrCost(), getVectorInstrCost(), getMemoryOpCost(),
getInterleavedMemoryOpCost() implemented.

Interleaved access vectorization enabled.

BasicTTIImpl::getCastInstrCost() improved to check for legal extending loads,
in which case the cost of the z/sext instruction becomes 0.

Review: Ulrich Weigand, Renato Golin.
https://reviews.llvm.org/D29631

llvm-svn: 300052
2017-04-12 11:49:08 +00:00
Anna Thomas 00dc1b74b7 [LV] Avoid vectorizing first order recurrence when phi uses are outside loop
In the vectorization of first order recurrence, we vectorize such
that the last element in the vector will be the one extracted to pass into the
scalar remainder loop. However, this is not true when there is a phi (other
than the primary induction variable) is used outside the loop.
In such a case, we need the value from the second last iteration (i.e.
the phi value), not the last iteration (which would be the phi update).
I've added a test case for this. Also see PR32396.

A follow up patch would generate the correct code gen for such cases,
and turn this vectorization on.

Differential Revision: https://reviews.llvm.org/D31910

Reviewers: mssimpso
llvm-svn: 299985
2017-04-11 21:02:00 +00:00
Matthew Simpson 11fe2e9f2b Reapply r298620: [LV] Vectorize GEPs
This patch reapplies r298620. The original patch was reverted because of two
issues. First, the patch exposed a bug in InstCombine that caused the Chromium
builds to fail (PR32414). This issue was fixed in r299017. Second, the patch
introduced a bug in the vectorizer's scalars analysis that caused test suite
builds to fail on SystemZ. The scalars analysis was too aggressive and marked a
memory instruction scalar, even though it was going to be vectorized. This
issue has been fixed in the current patch and several new test cases for the
scalars analysis have been added.

llvm-svn: 299770
2017-04-07 14:15:34 +00:00
Matthew Simpson b8ff4a4a70 [LV] Transform truncations of non-primary induction variables
The vectorizer tries to replace truncations of induction variables with new
induction variables having the smaller type. After r295063, this optimization
was applied to all integer induction variables, including non-primary ones.
When optimizing the truncation of a non-primary induction variable, we still
need to transform the new induction so that it has the correct start value.
This should fix PR32419.

Reference: https://bugs.llvm.org/show_bug.cgi?id=32419
llvm-svn: 298882
2017-03-27 20:07:38 +00:00
Ivan Krasin c2124e185c Revert r298620: [LV] Vectorize GEPs
Reason: breaks linking Chromium with LLD + ThinLTO (a pass crashes)
LLVM bug: https://bugs.llvm.org//show_bug.cgi?id=32413

Original change description:

[LV] Vectorize GEPs

This patch adds support for vectorizing GEPs. Previously, we only generated
vector GEPs on-demand when creating gather or scatter operations. All GEPs from
the original loop were scalarized by default, and if a pointer was to be stored
to memory, we would have to build up the pointer vector with insertelement
instructions.

With this patch, we will vectorize all GEPs that haven't already been marked
for scalarization.

The patch refines collectLoopScalars to more exactly identify the scalar GEPs.
The function now more closely resembles collectLoopUniforms. And the patch
moves vector GEP creation out of vectorizeMemoryInstruction and into the main
vectorization loop. The vector GEPs needed for gather and scatter operations
will have already been generated before vectoring the memory accesses.

Original Differential Revision: https://reviews.llvm.org/D30710

llvm-svn: 298735
2017-03-24 20:49:43 +00:00
Matthew Simpson 4e7b71bc86 [LV] Vectorize GEPs
This patch adds support for vectorizing GEPs. Previously, we only generated
vector GEPs on-demand when creating gather or scatter operations. All GEPs from
the original loop were scalarized by default, and if a pointer was to be stored
to memory, we would have to build up the pointer vector with insertelement
instructions.

With this patch, we will vectorize all GEPs that haven't already been marked
for scalarization.

The patch refines collectLoopScalars to more exactly identify the scalar GEPs.
The function now more closely resembles collectLoopUniforms. And the patch
moves vector GEP creation out of vectorizeMemoryInstruction and into the main
vectorization loop. The vector GEPs needed for gather and scatter operations
will have already been generated before vectoring the memory accesses.

Differential Revision: https://reviews.llvm.org/D30710

llvm-svn: 298620
2017-03-23 16:29:58 +00:00
Matthew Simpson 1fb4064531 [LV] Delete unneeded scalar GEP creation code
The code for generating scalar base pointers in vectorizeMemoryInstruction is
not needed. We currently scalarize all GEPs and maintain the scalarized values
in VectorLoopValueMap. The GEP cloning in this unneeded code is the same as
that in scalarizeInstruction. The test cases that changed as a result of this
patch changed because we were able to reuse the scalarized GEP that we
previously generated instead of cloning a new one.

Differential Revision: https://reviews.llvm.org/D30587

llvm-svn: 298615
2017-03-23 16:07:21 +00:00
Yi Kong 019e1c4f99 Test commit access
Remove some trailing whitespaces.

llvm-svn: 298379
2017-03-21 14:49:19 +00:00
Gil Rapaport 28d0f8ddf7 [LV] Refactor cross-iteration phi's back-patching; NFC
This patch refactors the PHisToFix loop as follows:

- The loop itself now resides in its own method.
- The new method iterates on scalar-loop's header; the PHIsToFix map formerly
  propagated as an output parameter and filled during phi widening is removed.
- The code handling reductions is moved into its own method, similar to the
  existing fixFirstOrderRecurrence().

Differential Revision: https://reviews.llvm.org/D30755

llvm-svn: 297740
2017-03-14 13:50:47 +00:00
Ayal Zaks 928ec40584 [LV] Refactor Cost Model's selectVectorizationFactor(); NFC
Refactoring Cost Model's selectVectorizationFactor() so that it handles only the
selection of the best VF from a pre-computed range of candidate VF's, extracting
early-exit criteria and the computation of a MaxVF upper-bound to other methods,
all driven by a newly introduced LoopVectorizationPlanner.

Differential Revision: https://reviews.llvm.org/D30653

llvm-svn: 297737
2017-03-14 13:07:04 +00:00
Jonas Paulsson a48ea231c0 [TargetTransformInfo] getIntrinsicInstrCost() scalarization estimation improved
getIntrinsicInstrCost() used to only compute scalarization cost based on types.
This patch improves this so that the actual arguments are checked when they are
available, in order to handle only unique non-constant operands.

Tests updates:

Analysis/CostModel/X86/arith-fp.ll
Transforms/LoopVectorize/AArch64/interleaved_cost.ll
Transforms/LoopVectorize/ARM/interleaved_cost.ll

The improvement in getOperandsScalarizationOverhead() to differentiate on
constants made it necessary to update the interleaved_cost.ll tests even
though they do not relate to intrinsics.

Review: Hal Finkel
https://reviews.llvm.org/D29540

llvm-svn: 297705
2017-03-14 06:35:36 +00:00
Gil Rapaport 00cb43908c [LV] Set memcheck metadata also for VF==1
This commit is a follow-up on r297580. It fixes the FIXME added temporarily
by that commit to keep the removal of Unroller's specialized version of
scalarizeInstruction() an NFC. See https://reviews.llvm.org/D30715 for details.

llvm-svn: 297610
2017-03-13 10:23:46 +00:00
Gil Rapaport a1e5a37d3f [LV] A unified scalarizeInstruction() for Vectorizer and Unroller; NFC
Unroller's specialized scalarizeInstruction() is mostly duplicating Vectorizer's
variant. OTOH Vectorizer's scalarizeInstruction() already supports the special
case of VF==1 except for avoiding mask-bit extraction in that case. This patch
removes Unroller's specialized version in favor of a unified method.

The only functional difference between the two variants seems to be setting
memcheck metadata for loads and stores only in Vectorizer's variant, which is a
bug in Unroller. To keep this patch an NFC the unified method doesn't set
memcheck metadata for VF==1.

Differential Revision: https://reviews.llvm.org/D30715

llvm-svn: 297580
2017-03-12 12:31:38 +00:00
Ayal Zaks 09cf3121d8 Test commit.
llvm-svn: 297579
2017-03-12 09:48:06 +00:00
Michael Kuperstein 5fb39a7966 [SLP] Revert everything that has to do with memory access sorting.
This reverts r293386, r294027, r294029 and r296411.

Turns out the SLP tree isn't actually a "tree" and we don't handle
accessing the same packet of loads in several different orders well,
causing miscompiles.

Revert until we can fix this properly.

llvm-svn: 297493
2017-03-10 18:59:07 +00:00
Daniel Berlin 04d9e746f1 Add support for DenseMap/DenseSet count and find using const pointers
Summary:
Similar to SmallPtrSet, this makes find and count work with both const
referneces and const pointers.

Reviewers: dblaikie

Subscribers: llvm-commits, mzolotukhin

Differential Revision: https://reviews.llvm.org/D30713

llvm-svn: 297424
2017-03-10 00:25:26 +00:00
Adam Nemet 8c83386f89 [SLP] Mark values in Dot that need to be extracted
llvm-svn: 297361
2017-03-09 05:48:03 +00:00
Adam Nemet 95da05c3f5 [SLP] Visualize SLP trees with -view-slp-tree
Analyzing larger trees is extremely difficult with the current debug output so
this adds GraphTraits and DOTGraphTraits on top of the VectorizableTree data
structure.  We can now display the SLP trees with Graphviz as in
https://reviews.llvm.org/F3132765.

I decorated the graph where a value needs to be gathered for one reason or
another.  These are the red nodes.

There are other improvement I am planning to make as I work through my case
here.  For example, I would also like to mark nodes that need to be extracted.

Differential Revision: https://reviews.llvm.org/D30731

llvm-svn: 297303
2017-03-08 18:47:50 +00:00
Matthew Simpson 3388de1349 [LV] Select legal insert point when fixing first-order recurrences
Because IRBuilder performs constant-folding, it's not guaranteed that an
instruction in the original loop map to an instruction in the vector loop. It
could map to a constant vector instead. The handling of first-order recurrences
was incorrectly making this assumption when setting the IRBuilder's insert
point.

llvm-svn: 297302
2017-03-08 18:18:20 +00:00
Matthew Simpson c86b2134c7 [LV] Consider users that are memory accesses in uniforms expansion step
When expanding the set of uniform instructions beyond the seed instructions
(e.g., consecutive pointers), we mark a new instruction uniform if all its
loop-varying users are uniform. We should also allow users that are consecutive
or interleaved memory accesses. This fixes cases where we have an instruction
that is used as the pointer operand of a consecutive access but also used by a
non-memory instruction that later becomes uniform as part of the expansion.

llvm-svn: 297179
2017-03-07 18:47:30 +00:00
Michael Kuperstein 768d013a03 [SLP] Revert r296863 due to miscompiles.
Details and reproducer are on the email thread for r296863.

llvm-svn: 297103
2017-03-06 23:54:51 +00:00
Mohammad Shahid bdac9f30c0 [SLP] Fixes the bug due to absence of in order uses of scalars which needs to be available
for VectorizeTree() API.This API uses it for proper mask computation to be used in shufflevector IR.
The fix is to compute the mask for out of order memory accesses while building the vectorizable tree
instead of actual vectorization of vectorizable tree.It also needs to recompute the proper Lane for
external use of vectorizable scalars based on shuffle mask.

Reviewers: mkuper

Differential Revision: https://reviews.llvm.org/D30159

Change-Id: Ide8773ce0ad3562f3cf4d1a0ad0f487e2f60ce5d
llvm-svn: 296863
2017-03-03 10:02:47 +00:00
Matthew Simpson 455c2ee394 [LV] Considier non-consecutive but vectorizable accesses for VF selection
When computing the smallest and largest types for selecting the maximum
vectorization factor, we currently ignore loads and stores of pointer types if
the memory access is non-consecutive. We do this because such accesses must be
scalarized regardless of vectorization factor, and thus shouldn't be considered
when determining the factor. This patch makes this check less aggressive by
also considering non-consecutive accesses that may be vectorized, such as
interleaved accesses. Because we don't know at the time of the check if an
accesses will certainly be vectorized (this is a cost model decision given a
particular VF), we consider all accesses that can potentially be vectorized.

Differential Revision: https://reviews.llvm.org/D30305

llvm-svn: 296747
2017-03-02 13:55:05 +00:00
Hans Wennborg cc4ff78c9d Revert r296575 "[SLP] Fixes the bug due to absence of in order uses of scalars which needs to be available"
It caused miscompiles, e.g. in Chromium (PR32109).

llvm-svn: 296654
2017-03-01 18:57:16 +00:00
Alexey Bataev 4a45efa431 [SLP] Preserve IR flags when vectorizing horizontal reductions.
Summary:
The SLP vectorizer should propagate IR-level optimization hints/flags
(nsw, nuw, exact, fast-math) when converting scalar horizontal
reductions instructions into vectors, just like for other vectorized
instructions.
It doe not include IR propagation for extra arguments, we need to handle
original scalar operations for extra args to propagate correct flags.

Reviewers: mkuper, mzolotukhin, hfinkel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D30418

llvm-svn: 296614
2017-03-01 12:43:39 +00:00
Alexey Bataev 74e5a36856 [SLP] Preserve IR flags for extra args.
Summary:
We should preserve IR flags for extra args. These IR flags should be
taken from original scalar operations, not from the reduction
operations.

Reviewers: mkuper, mzolotukhin, hfinkel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D30447

llvm-svn: 296613
2017-03-01 12:22:33 +00:00
Alexey Bataev dfec81107f [SLP] Fix for PR32038: extra add of PHI node when it is not required.
Summary:
If horizontal reduction tree starts from the binary operation that is
used in PHI node, but this PHI is not used in horizontal reduction, we
may end up with extra addition of this PHI node after vectorization.
Here is an example:
```
%phi = phi i32 [ %tmp, %end], ...
...
%tmp = add i32 %tmp1, %tmp2
end:
```
after vectorization we always have something like:

```
%phi = phi i32 [ %tmp, %end], ...
...
%red = extractelement <8 x 32> %vec.red, 0
%tmp = add i32 %red, %phi
end:
```
even if `%phi` is not used in reduction tree. Patch considers these PHI
nodes as extra arguments and considers them in the final result iff they
really used in reduction.

Reviewers: mkuper, hfinkel, mzolotukhin

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D30409

llvm-svn: 296606
2017-03-01 10:50:44 +00:00
Adam Nemet 15032a0455 [LV] These remark should have been missed remarks
The practice in LV is that we emit analysis remarks and then finally report
either a missed or applied remark on the final decision whether vectorization
is taking place.  On this code path, we were closing with an analysis remark.

llvm-svn: 296578
2017-03-01 04:31:15 +00:00
Mohammad Shahid 175ffa8c35 [SLP] Fixes the bug due to absence of in order uses of scalars which needs to be available
for VectorizeTree() API.This API uses it for proper mask computation to be used in shufflevector IR.
The fix is to compute the mask for out of order memory accesses while building the vectorizable tree
instead of actual vectorization of vectorizable tree.

Reviewers: mkuper

Differential Revision: https://reviews.llvm.org/D30159

Change-Id: Id1e287f073fa4959713ba545fa4254db5da8b40d
llvm-svn: 296575
2017-03-01 03:51:54 +00:00
Adam Nemet abe46080d4 Revert "(HEAD, origin/master, origin/HEAD, master) [LV] These should missed remarks"
This reverts commit r296544.

This got committed by accident.

llvm-svn: 296546
2017-02-28 23:54:27 +00:00
Adam Nemet e193da191d [LV] These should missed remarks
llvm-svn: 296544
2017-02-28 23:48:58 +00:00
Matthew Simpson bdc9c78880 [LV] Merge floating-point and integer induction widening code
This patch merges the existing floating-point induction variable widening code
into the integer induction variable widening code, creating a single set of
functions for both kinds of inductions. The primary motivation for doing this
is to enable vector phi node creation for floating-point induction variables.

Differential Revision: https://reviews.llvm.org/D30211

llvm-svn: 296145
2017-02-24 18:20:12 +00:00
Adam Nemet 41b019a39c [LAA] Remove unused LoopAccessReport
The need for this removed when I converted everything to use the opt-remark
classes directly with the streaming interface.

llvm-svn: 296017
2017-02-23 21:17:36 +00:00
Adam Nemet bf0e69532c [LV] Remove unused VectorizationReport
The need for this removed when I converted everything to use the opt-remark
classes directly with the streaming interface.

llvm-svn: 296016
2017-02-23 21:17:31 +00:00
Alexey Bataev f77d1656af [SLP] Fix for PR32036: Vectorized horizontal reduction returning wrong
result

Summary:
If the same value is used several times as an extra value, SLP
vectorizer takes it into account only once instead of actual number of
using.
For example:
```
int val = 1;
for (int y = 0; y < 8; y++) {
  for (int x = 0; x < 8; x++) {
    val = val + input[y * 8 + x] + 3;
  }
}
```
We have 2 extra rguments: `1` - initial value of horizontal reduction
and `3`, which is added 8*8 times to the reduction. Before the patch we
added `1` to the reduction value and added once `3`, though it must be
added 64 times.

Reviewers: mkuper, mzolotukhin

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D30262

llvm-svn: 295972
2017-02-23 13:37:09 +00:00
Alexey Bataev 14b370c1bf Revert "[SLP] Fix for PR32036: Vectorized horizontal reduction returning wrong"
This reverts commit 7c5141e577d9efd1c8e3087566a38ce6b3a41a84.

llvm-svn: 295957
2017-02-23 11:09:35 +00:00
Alexey Bataev 7ae653285d [SLP] Fix for PR32036: Vectorized horizontal reduction returning wrong
result

Summary:
If the same value is used several times as an extra value, SLP
vectorizer takes it into account only once instead of actual number of
using.
For example:
```
int val = 1;
for (int y = 0; y < 8; y++) {
  for (int x = 0; x < 8; x++) {
    val = val + input[y * 8 + x] + 3;
  }
}
```
We have 2 extra rguments: `1` - initial value of horizontal reduction
and `3`, which is added 8*8 times to the reduction. Before the patch we
added `1` to the reduction value and added once `3`, though it must be
added 64 times.

Reviewers: mkuper, mzolotukhin

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D30262

llvm-svn: 295956
2017-02-23 10:57:15 +00:00
Alexey Bataev 7337212e83 Revert "[SLP] Fix for PR32036: Vectorized horizontal reduction returning wrong"
This reverts commit d83c81ee6a8dea662808ac22b396d1bb0595c89d.

llvm-svn: 295951
2017-02-23 09:59:29 +00:00
Alexey Bataev 68f2402c61 [SLP] Fix for PR32036: Vectorized horizontal reduction returning wrong
result

Summary:
If the same value is used several times as an extra value, SLP
vectorizer takes it into account only once instead of actual number of
using.
For example:
```
int val = 1;
for (int y = 0; y < 8; y++) {
  for (int x = 0; x < 8; x++) {
    val = val + input[y * 8 + x] + 3;
  }
}
```
We have 2 extra rguments: `1` - initial value of horizontal reduction
and `3`, which is added 8*8 times to the reduction. Before the patch we
added `1` to the reduction value and added once `3`, though it must be
added 64 times.

Reviewers: mkuper, mzolotukhin

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D30262

llvm-svn: 295949
2017-02-23 09:40:38 +00:00
Matt Arsenault f0a88dbaab LoadStoreVectorizer: Split even sized illegal chains properly
Implement isLegalToVectorizeLoadChain for AMDGPU to avoid
producing private address spaces accesses that will need to be
split up later. This was doing the wrong thing in the case
where the queried chain was an even number of elements.

A possible <4 x i32> store was being split into
store <2 x i32>
store i32
store i32

rather than
store <2 x i32>
store <2 x i32>

when legal.

llvm-svn: 295933
2017-02-23 03:58:53 +00:00
Michael Kuperstein 6181c62b95 Revert r295868 because it breaks a different SLP lit test.
llvm-svn: 295906
2017-02-22 23:35:13 +00:00