Commit Graph

80 Commits

Author SHA1 Message Date
Eric Christopher cee313d288 Revert "Temporarily Revert "Add basic loop fusion pass.""
The reversion apparently deleted the test/Transforms directory.

Will be re-reverting again.

llvm-svn: 358552
2019-04-17 04:52:47 +00:00
Eric Christopher a863435128 Temporarily Revert "Add basic loop fusion pass."
As it's causing some bot failures (and per request from kbarton).

This reverts commit r358543/ab70da07286e618016e78247e4a24fcb84077fda.

llvm-svn: 358546
2019-04-17 02:12:23 +00:00
Simon Pilgrim ff3abef395 [SLPVectorizer] reorderInputsAccordingToOpcode - remove non-Instruction canonicalization
Remove attempts to commute non-Instructions to the LHS - the codegen changes appear to rely on chance more than anything else and also have a tendency to fight existing instcombine canonicalization which moves constants to the RHS of commutable binary ops.
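
A minimal LLVM IR sketch (mine, not from the patch) of the canonicalization referred to above; instcombine keeps constants on the RHS of commutable binary ops, so commuting operands back to the LHS tends to be undone:

```
define i32 @canon(i32 %x) {
  %r = add i32 %x, 7   ; instcombine rewrites 'add i32 7, %x' into this form
  ret i32 %r
}
```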

This is prep work towards:
(a) reusing reorderInputsAccordingToOpcode for alt-shuffles and removing the similar reorderAltShuffleOperands
(b) improving reordering to optimized cases with commutable and non-commutable instructions to still find splat/consecutive ops.

Differential Revision: https://reviews.llvm.org/D59738

llvm-svn: 356913
2019-03-25 15:53:55 +00:00
Sanjay Patel a0aaa11afc [SLP] fix variable names in test; NFC
'tmpXXX' conflicts with the auto-generated script regex names.
That could mask a bug or cause a spurious failure if the output changes.

llvm-svn: 356790
2019-03-22 18:33:11 +00:00
James Y Knight 786558882c Update GettingStarted guide to recommend that people use the new
official Git repository.

Remove the directions for using git-svn, and demote the prominence of
the svn instructions.

Also, fix a few other issues while I'm in there:

* Mention LLVM_ENABLE_PROJECTS more.
* Getting started doesn't need to mention test-suite, but should
  mention clang and the other projects.
* Remove mentions of "configure", since that's long gone.

I've also adjusted a few other mentions of svn to point to github, but
have not done so comprehensively.

Differential Revision: https://reviews.llvm.org/D56654

llvm-svn: 351130
2019-01-14 22:27:32 +00:00
Alexey Bataev ce2c8b3360 [SLP] Update test checks for the SLP vectorizer, NFC.
llvm-svn: 350967
2019-01-11 20:21:14 +00:00
Sanjay Patel 007416acc8 [SLPVectorizer] regenerate test checks; NFC
llvm-svn: 344848
2018-10-20 14:53:07 +00:00
Alexey Bataev 0edcd0278d [SLP] Fix insert point for reused extract instructions.
Summary:
Reworked the previously committed patch to insert shuffles for reused
extract element instructions in the correct position. The previous logic was
incorrect and could lead to crashes with PHIs and EH instructions.

Reviewers: efriedma, javed.absar

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D50143

llvm-svn: 339166
2018-08-07 19:21:05 +00:00
Alexey Bataev c0c3a6ed5e [SLP] Fix PR38339: Instruction does not dominate all uses!
Summary:
If the ExtractElement instructions can be optimized out during
vectorization and we need to reshuffle the parent vector, the resulting
shuffle instruction may be inserted in the wrong place, causing the
compiler to produce incorrect code.

Reviewers: spatel, RKSimon, mkuper, hfinkel, javed.absar

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D49928

llvm-svn: 338380
2018-07-31 14:02:43 +00:00
Simon Pilgrim 1e564504bb [SLPVectorizer] Relax alternate opcodes to accept any BinaryOperator pair
SLP currently only accepts (F)Add/(F)Sub alternate counterpart ops to be merged into an alternate shuffle.

This patch relaxes this to accept any pair of BinaryOperator opcodes instead, assuming the target's cost model accepts the vectorization+shuffle.
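
As an illustration only (not a test from the patch), a scalar pattern that becomes a candidate under the relaxed rule, with add in one lane and mul in the other; SLP can emit a vector add, a vector mul and a blending shuffle if the cost model agrees:

```
define <2 x i32> @alt_add_mul(<2 x i32> %a, <2 x i32> %b) {
  %a0 = extractelement <2 x i32> %a, i32 0
  %a1 = extractelement <2 x i32> %a, i32 1
  %b0 = extractelement <2 x i32> %b, i32 0
  %b1 = extractelement <2 x i32> %b, i32 1
  %r0 = add i32 %a0, %b0   ; lane 0: add
  %r1 = mul i32 %a1, %b1   ; lane 1: mul
  %v0 = insertelement <2 x i32> undef, i32 %r0, i32 0
  %v1 = insertelement <2 x i32> %v0, i32 %r1, i32 1
  ret <2 x i32> %v1
}
```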

Differential Revision: https://reviews.llvm.org/D48477

llvm-svn: 335349
2018-06-22 14:04:06 +00:00
Simon Pilgrim 9c8f9374b5 [CostModel][AArch64] Add some initial costs for SK_Select and SK_PermuteSingleSrc
AArch64 was only setting costs for SK_Transpose, which meant that many of the simpler shuffles (e.g. SK_Select and SK_PermuteSingleSrc for larger vector elements) were being severely overestimated by the default shuffle expansion.

This patch adds costs to help improve SLP performance and avoid a regression in reductions introduced by D48174.

I'm not very knowledgeable about AArch64 shuffle lowering so I've kept the extra costs to a minimum - someone who knows this code can add extra costs which should improve vectorization a lot more.

Differential Revision: https://reviews.llvm.org/D48172

llvm-svn: 335329
2018-06-22 09:45:31 +00:00
Simon Pilgrim 2e2f20a949 [SLPVectorizer] Relax "alternate" opcode vectorisation to work with any SK_Select shuffle pattern
D47985 saw the old SK_Alternate 'alternating' shuffle mask replaced with the SK_Select mask which accepts either input operand for each lane, equivalent to a vector select with a constant condition operand.
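
For illustration (my sketch, not from the commit), an SK_Select-style shuffle keeps every result lane at its own index but may take it from either source, exactly what a vector select with a constant condition would do; indices 4..7 refer to the second operand:

```
define <4 x i32> @select_shuffle(<4 x i32> %a, <4 x i32> %b) {
  ; lanes 0 and 2 come from %a, lanes 1 and 3 from %b
  %s = shufflevector <4 x i32> %a, <4 x i32> %b, <4 x i32> <i32 0, i32 5, i32 2, i32 7>
  ret <4 x i32> %s
}
```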

This patch updates SLPVectorizer to make full use of this SK_Select shuffle pattern by removing the 'isOdd()' limitation.

The AArch64 regression will be fixed by D48172.

Differential Revision: https://reviews.llvm.org/D48174

llvm-svn: 335130
2018-06-20 14:26:28 +00:00
Shiva Chen 2c864551df [DebugInfo] Add DILabel metadata and intrinsic llvm.dbg.label.
In order to set breakpoints on labels and list source code around
labels, we need to collect debug information for labels, i.e., the label
name, the function the label belongs to, the line number in the file, and
the address where the label is located. To keep this information in LLVM
IR and to allow the backend to generate debug information correctly, we
create a new kind of metadata for labels, DILabel. The format of DILabel
is

!DILabel(scope: !1, name: "foo", file: !2, line: 3)

We hope to preserve as much debug information as possible even when the
code is optimized, so we create a new kind of intrinsic for label
metadata to prevent the metadata from being eliminated along with its
basic block. The intrinsic persists as long as it is not itself optimized
out. The format of the intrinsic is

llvm.dbg.label(metadata !1)

It has only one argument: the DILabel metadata. The intrinsic
immediately follows the label, and the backend can retrieve the label
metadata through the intrinsic's parameter.

We also add DIBuilder APIs for labels to be used by frontends. A
frontend can use createLabel() to allocate DILabel objects and
insertLabel() to insert the llvm.dbg.label intrinsic into LLVM IR.

Differential Revision: https://reviews.llvm.org/D45024

Patch by Hsiangkai Wang.

llvm-svn: 331841
2018-05-09 02:40:45 +00:00
Matthew Simpson 661e6a02bd [SLP] Add additional test for transposable binary operations with reuse
llvm-svn: 331274
2018-05-01 15:59:26 +00:00
Davide Italiano bd3bf1660b [SLPVectorizer] Debug info shouldn't impact spill cost computation.
<rdar://problem/39794738>

(Also, PR32761).

Differential Revision:  https://reviews.llvm.org/D46199

llvm-svn: 331199
2018-04-30 16:57:33 +00:00
Matthew Simpson cfdec0ff70 [SLP] Add tests for transposable binary operations
These test cases are vectorizable, but we are currently unable to vectorize
them effectively.

llvm-svn: 330945
2018-04-26 14:50:04 +00:00
Haicheng Wu f7466f3164 [SLP] Use getExtractWithExtendCost() to compute the scalar cost of extractelement/ext pair
We use getExtractWithExtendCost to calculate the cost of extractelement and
s|zext together when computing the extract cost after vectorization, but we
calculate the cost of extractelement and s|zext separately when computing the
scalar cost, which is therefore larger than it should be.
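
A minimal example (mine) of the extractelement/ext pair in question; on targets such as AArch64 the extract and the extension can be a single instruction, so costing them separately overstates the scalar cost:

```
define i32 @extract_zext(<4 x i16> %v) {
  %e = extractelement <4 x i16> %v, i32 0
  %z = zext i16 %e to i32   ; costed together with the extract via getExtractWithExtendCost
  ret i32 %z
}
```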

Differential Revision: https://reviews.llvm.org/D45469

llvm-svn: 330143
2018-04-16 18:09:49 +00:00
Haicheng Wu 5ba379557d [SLP] update a test case. NFC.
llvm-svn: 329818
2018-04-11 15:09:49 +00:00
Haicheng Wu 7f0daaeb86 [SLP] Distinguish "demanded and shrinkable" from "demanded and not shrinkable" values when determining the minimum bitwidth
We use two approaches for determining the minimum bitwidth.

   * Demanded bits
   * Value tracking

If demanded bits doesn't result in a narrower type, we then try value tracking.
We need this if we want to root SLP trees with the indices of getelementptr
instructions since all the bits of the indices are demanded.

There is a missing piece, though. We need to be able to distinguish "demanded
and shrinkable" from "demanded and not shrinkable". For example, the bits of %i
in

%i = sext i32 %e1 to i64
%gep = getelementptr inbounds i64, i64* %p, i64 %i

are demanded, but we can shrink %i's type to i32 because it won't change the
result of the getelementptr. On the other hand, in

%tmp15 = sext i32 %tmp14 to i64
%tmp16 = insertvalue { i64, i64 } undef, i64 %tmp15, 0

it doesn't make sense to shrink %tmp15 and we can skip the value tracking.

Ideas are from Matthew Simpson!

Differential Revision: https://reviews.llvm.org/D44868

llvm-svn: 329035
2018-04-03 00:05:10 +00:00
Haicheng Wu b45f921678 [SLP] Add more checks to a test case. NFC.
llvm-svn: 328572
2018-03-26 18:59:28 +00:00
Haicheng Wu 0ec1dbe417 [SLP] Add a test case. NFC.
llvm-svn: 328546
2018-03-26 16:47:37 +00:00
Matthew Simpson 6c289a1c74 [SLP] Stop counting cost of gather sequences with multiple uses
When building the SLP tree, we look for reuse among the vectorized tree
entries. However, each gather sequence is represented by a unique tree entry,
even though the sequence may be identical to another one. This means, for
example, that a gather sequence with two uses will be counted twice when
computing the cost of the tree. We should only count the cost of the definition
of a gather sequence rather than its uses. During code generation, the
redundant gather sequences are emitted, but we optimize them away with CSE. So
it looks like this problem just affects the cost model.
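
A sketch (illustrative, not from the patch) of the emitted code: two identical gathers of %a and %b feed two vector users and are later merged by CSE, so the cost model should only charge for one of them:

```
define <2 x i32> @gathers(i32 %a, i32 %b, <2 x i32> %v, <2 x i32> %w) {
  ; first gather sequence
  %g0 = insertelement <2 x i32> undef, i32 %a, i32 0
  %g1 = insertelement <2 x i32> %g0, i32 %b, i32 1
  ; identical gather sequence emitted for a second tree entry
  %h0 = insertelement <2 x i32> undef, i32 %a, i32 0
  %h1 = insertelement <2 x i32> %h0, i32 %b, i32 1
  %x = add <2 x i32> %g1, %v
  %y = mul <2 x i32> %h1, %w
  %r = xor <2 x i32> %x, %y
  ret <2 x i32> %r
}
```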

Differential Revision: https://reviews.llvm.org/D44742

llvm-svn: 328316
2018-03-23 14:18:27 +00:00
Matthew Simpson b17fff79f0 [SLP] Add test case for a gather sequence with multiple uses
llvm-svn: 328133
2018-03-21 19:13:14 +00:00
Matthew Simpson eacfefd056 [AArch64] Implement getArithmeticReductionCost
This patch provides an implementation of getArithmeticReductionCost for
AArch64. We can specialize the cost of add reductions since they are computed
using the 'addv' instruction.
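
For example (my sketch, written with the current spelling of the reduction intrinsic; at the time of this commit it still lived under llvm.experimental.*), an integer add reduction that AArch64 lowers to a single addv:

```
define i32 @sum_v4i32(<4 x i32> %v) {
  %s = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> %v)   ; a single addv on AArch64
  ret i32 %s
}

declare i32 @llvm.vector.reduce.add.v4i32(<4 x i32>)
```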

Differential Revision: https://reviews.llvm.org/D44490

llvm-svn: 327702
2018-03-16 11:34:15 +00:00
Sanjay Patel 04d1d79ee5 [AArch64] add SLP test based on TSVC; NFC
This is a slight reduction of one of the benchmarks
that suffered with D43079. Cost model changes should
not cause this test to remain scalarized.

llvm-svn: 326217
2018-02-27 18:06:15 +00:00
Sanjay Patel d53da082a0 [AArch64] fix IR names to not be 'tmp' because that gives the CHECK script problems
llvm-svn: 325718
2018-02-21 20:48:14 +00:00
Sanjay Patel ffe51e450f [AArch64] add SLP test for matmul (PR36280); NFC
This is a slight reduction of one of the benchmarks
that suffered with D43079. Cost model changes should
not cause this test to remain scalarized.

llvm-svn: 325717
2018-02-21 20:34:16 +00:00
Sanjay Patel e6143904b9 revert r325515: [TTI CostModel] change default cost of FP ops to 1 (PR36280)
There are too many perf regressions resulting from this, so we need to 
investigate (and add tests for) targets like ARM and AArch64 before 
trying to reinstate.

llvm-svn: 325658
2018-02-21 01:42:52 +00:00
Sanjay Patel 3e8a76abfd [TTI CostModel] change default cost of FP ops to 1 (PR36280)
This change was mentioned at least as far back as:
https://bugs.llvm.org/show_bug.cgi?id=26837#c26
...and I found a real program that is harmed by this: 
Himeno running on AMD Jaguar gets 6% slower with SLP vectorization:
https://bugs.llvm.org/show_bug.cgi?id=36280
...but the change here appears to solve that bug only accidentally.

The div/rem costs for x86 look very wrong in some cases, but that's already true, 
so we can fix those in follow-up patches. There's also evidence that more cost model
changes are needed to solve SLP problems as shown in D42981, but that's an independent 
problem (though the solution may be adjusted after this change is made).

Differential Revision: https://reviews.llvm.org/D43079

llvm-svn: 325515
2018-02-19 16:11:44 +00:00
Alexey Bataev ca2396e673 [SLP] Take user instructions cost into consideration in insertelement vectorization.
Summary:
For a better vectorization result we should take into consideration the
cost of the user insertelement instructions when we try to
vectorize sequences that build the whole vector. I.e., if we have the
following scalar code:
```
<Scalar code>
insertelement <ScalarCode>, ...
```
we should count the cost of the final `insertelement` instructions as
part of the cost of the scalar code.
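
Concretely (an illustrative shape, not a test from the patch), such a build-vector sequence in IR looks like the following; the trailing insertelement chain is what now gets charged to the scalar side of the comparison:

```
define <4 x float> @build(float %a, float %b, float %c, float %d) {
  ; <Scalar code> producing %a..%d would sit here
  %v0 = insertelement <4 x float> undef, float %a, i32 0
  %v1 = insertelement <4 x float> %v0, float %b, i32 1
  %v2 = insertelement <4 x float> %v1, float %c, i32 2
  %v3 = insertelement <4 x float> %v2, float %d, i32 3
  ret <4 x float> %v3
}
```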

Reviewers: RKSimon, spatel, hfinkel, mkuper

Subscribers: javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D42657

llvm-svn: 324893
2018-02-12 14:54:48 +00:00
Alexey Bataev 90e29b81d6 [SLP] Add/update tests for SLP vectorizer, NFC.
llvm-svn: 322225
2018-01-10 21:29:18 +00:00
Adam Nemet 572a87c76f [SLP] Added more missed optimization remarks
Summary:
Added more remarks to the SLP pass, in particular "missed" optimization remarks.
Also proposed several tests for the new functionality.

Patch by Vladimir Miloserdov!

For reference you may look at: https://reviews.llvm.org/rL302811

Reviewers: anemet, fhahn

Reviewed By: anemet

Subscribers: javed.absar, lattner, petecoup, yakush, llvm-commits

Differential Revision: https://reviews.llvm.org/D38367

llvm-svn: 318307
2017-11-15 17:04:53 +00:00
Sam Elliott b0c9753691 Keep Optimization Remark Yaml in NewPM
Summary:
The New Pass Manager infrastructure was forgetting to keep around the optimization remark yaml file that the compiler might have been producing. This meant setting the option to '-' for stdout worked, but setting it to a filename didn't give file output (presumably it was deleted because compilation didn't explicitly keep it). This change just ensures that the file is kept if compilation succeeds.

So far I have updated one of the optimization remark output tests to add a version with the new pass manager. It is my intention for this patch to also include changes to all tests that use `-opt-remark-output=` but I wanted to get the code patch ready for review while I was making all those changes.

Fixes https://bugs.llvm.org/show_bug.cgi?id=33951

Reviewers: anemet, chandlerc

Reviewed By: anemet, chandlerc

Subscribers: javed.absar, chandlerc, fhahn, llvm-commits

Differential Revision: https://reviews.llvm.org/D36906

llvm-svn: 311271
2017-08-20 01:30:45 +00:00
Alexey Bataev 9581b42589 [SLP] General improvements of SLP vectorization process.
The patch tries to improve the existing two-pass vectorization analysis in the SLP vectorizer. What it does:

1. Defines key nodes that are the vectorization roots. Previously, vectorization started only if a StoreInst or ReturnInst was found. Now, vectorization starts for all instructions with no users and void type (terminators, StoreInst) + CallInsts.
2. CmpInsts, InsertElementInsts and InsertValueInsts are stored in an
array. This array is processed only after the vectorization of the
key node that follows these instructions is finished. Vectorization goes
in reverse order to try to vectorize as much code as possible.

Reviewers: mzolotukhin, Ayal, mkuper, gilr, hfinkel, RKSimon

Subscribers: ashahid, anemet, RKSimon, mssimpso, llvm-commits

Differential Revision: https://reviews.llvm.org/D29826

llvm-svn: 310260
2017-08-07 15:25:49 +00:00
Alexey Bataev 53d523c9eb Revert "[SLP] General improvements of SLP vectorization process."
This reverts commit r310255.

llvm-svn: 310257
2017-08-07 14:51:52 +00:00
Alexey Bataev faace8f1f1 [SLP] General improvements of SLP vectorization process.
Summary:
The patch tries to improve the existing two-pass vectorization analysis in the SLP vectorizer. What it does:
1. Defines key nodes that are the vectorization roots. Previously, vectorization started only if a StoreInst or ReturnInst was found. Now, vectorization starts for all instructions with no users and void type (terminators, StoreInst) + CallInsts.
2. CmpInsts, InsertElementInsts and InsertValueInsts are stored in an array. This array is processed only after the vectorization of the key node that follows these instructions is finished. Vectorization goes in reverse order to try to vectorize as much code as possible.

Reviewers: mzolotukhin, Ayal, mkuper, gilr, hfinkel, RKSimon

Subscribers: ashahid, anemet, RKSimon, mssimpso, llvm-commits

Differential Revision: https://reviews.llvm.org/D29826

llvm-svn: 310255
2017-08-07 14:03:17 +00:00
Alexey Bataev 65e9dd66f7 [SLPVectorizer] Test update, NFC.
llvm-svn: 309814
2017-08-02 14:22:53 +00:00
Amara Emerson c9916d7e97 Re-commit r302678, fixing PR33053.
The issue was that the AArch64 TTI hook allowed unpacked integer cmp reductions
which didn't have a lowering.

llvm-svn: 303211
2017-05-16 21:29:22 +00:00
Adam Nemet e29686e5c1 [SLP] Enable 64-bit wide vectorization on AArch64
ARM NEON has native support for half-sized vector registers (64 bits).  This
is beneficial, for example, for 2D and 3D graphics.  This patch adds the option
to lower MinVecRegSize from 128 via a TTI hook in the SLP vectorizer.

*** Performance Analysis

This change was motivated by some internal benchmarks but it is also
beneficial on SPEC and the LLVM testsuite.

The results are with -O3 and PGO.  A negative percentage is an improvement.
The testsuite was run with a sample size of 4.

** SPEC

* CFP2006/482.sphinx3  -3.34%

A pretty hot loop is SLP vectorized resulting in nice instruction reduction.
This used to be a +22% regression before rL299482.

* CFP2000/177.mesa     -3.34%
* CINT2000/256.bzip2   +6.97%

My current plan is to extend the fix in rL299482 to i16 which brings the
regression down to +2.5%.  There are also other problems with the codegen in
this loop so there is further room for improvement.

** LLVM testsuite

* SingleSource/Benchmarks/Misc/ReedSolomon               -10.75%

There are multiple small SLP vectorizations outside the hot code.  It's a bit
surprising that it adds up to 10%.  Some of this may be code-layout noise.

* MultiSource/Benchmarks/VersaBench/beamformer/beamformer -8.40%

The opt-viewer screenshot can be seen at F3218284.  We start at a colder store
but the tree leads us into the hottest loop.

* MultiSource/Applications/lambda-0.1.3/lambda            -2.68%
* MultiSource/Benchmarks/Bullet/bullet                    -2.18%

This is using 3D vectors.

* SingleSource/Benchmarks/Shootout-C++/Shootout-C++-lists +6.67%

Noise, binary is unchanged.

* MultiSource/Benchmarks/Ptrdist/anagram/anagram          +4.90%

There is an additional SLP in the cold code.  The test runs for ~1sec and
prints out over 2000 lines. This is most likely noise.

* MultiSource/Applications/aha/aha                        +1.63%
* MultiSource/Applications/JM/lencod/lencod               +1.41%
* SingleSource/Benchmarks/Misc/richards_benchmark         +1.15%

Differential Revision: https://reviews.llvm.org/D31965

llvm-svn: 303116
2017-05-15 21:15:01 +00:00
Hans Wennborg bd6e9e77a7 Revert r302678 "[AArch64] Enable use of reduction intrinsics."
This caused PR33053.

Original commit message:

> The new experimental reduction intrinsics can now be used, so I'm enabling this
> for AArch64. We will need this for SVE anyway, so it makes sense to do this for
> NEON reductions as well.
>
> The existing code to match shufflevector patterns is replaced with a direct
> lowering of the reductions to AArch64-specific nodes. Tests updated with the
> new, simpler, representation.
>
> Differential Revision: https://reviews.llvm.org/D32247

llvm-svn: 303115
2017-05-15 20:59:32 +00:00
Adam Nemet 0aca09fc6c [SLP] Emit optimization remarks
The approach I followed was to emit the remark after getTreeCost concludes
that SLP is profitable.  I initially tried emitting them after the
vectorizeRootInstruction calls in vectorizeChainsInBlock, but I vaguely
remember missing a few cases, for example in HorizontalReduction::tryToReduce.

ORE is placed in BoUpSLP so that it's available from everywhere (notably
HorizontalReduction::tryToReduce).

We use the first instruction in the root bundle as the locator for the remark.
In order to get a sense of how far the tree spans, I've included the size of
the tree in the remark.  This is not perfect, of course, but it gives you at
least a rough idea about the tree.  You can then follow up with -view-slp-tree
to really see the actual tree.

llvm-svn: 302811
2017-05-11 17:06:17 +00:00
Amara Emerson 816542ceb3 [AArch64] Enable use of reduction intrinsics.
The new experimental reduction intrinsics can now be used, so I'm enabling this
for AArch64. We will need this for SVE anyway, so it makes sense to do this for
NEON reductions as well.

The existing code to match shufflevector patterns is replaced with a direct
lowering of the reductions to AArch64-specific nodes. Tests updated with the
new, simpler, representation.
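
For reference (my sketch, not from the patch), the kind of shufflevector-based add reduction the old pattern matching recognized; with this change the reduction intrinsics are instead lowered directly to AArch64-specific nodes:

```
define i32 @shuffle_reduce(<4 x i32> %v) {
  ; log2-step shuffle reduction: the sum of all four lanes ends up in lane 0
  %s1 = shufflevector <4 x i32> %v, <4 x i32> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>
  %a1 = add <4 x i32> %v, %s1
  %s2 = shufflevector <4 x i32> %a1, <4 x i32> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
  %a2 = add <4 x i32> %a1, %s2
  %r  = extractelement <4 x i32> %a2, i32 0
  ret i32 %r
}
```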

Differential Revision: https://reviews.llvm.org/D32247

llvm-svn: 302678
2017-05-10 15:15:38 +00:00
Alexey Bataev dfec81107f [SLP] Fix for PR32038: extra add of PHI node when it is not required.
Summary:
If a horizontal reduction tree starts from a binary operation that is
used in a PHI node, but this PHI is not used in the horizontal reduction, we
may end up with an extra addition of this PHI node after vectorization.
Here is an example:
```
%phi = phi i32 [ %tmp, %end], ...
...
%tmp = add i32 %tmp1, %tmp2
end:
```
after vectorization we always have something like:

```
%phi = phi i32 [ %tmp, %end], ...
...
%red = extractelement <8 x 32> %vec.red, 0
%tmp = add i32 %red, %phi
end:
```
even if `%phi` is not used in the reduction tree. The patch treats these PHI
nodes as extra arguments and includes them in the final result iff they
are really used in the reduction.

Reviewers: mkuper, hfinkel, mzolotukhin

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D30409

llvm-svn: 296606
2017-03-01 10:50:44 +00:00
Alexey Bataev cb78d09d14 [SLP] A test for a fix of PR32038.
llvm-svn: 296349
2017-02-27 16:07:10 +00:00
Matthew Simpson df2ab917ad [SLP] Avoid signed integer overflow
The test case included with r279125 exposed an existing signed integer
overflow. Since getTreeCost can return INT_MAX, we can't sum this cost together
with other costs, such as getReductionCost.

This patch removes the possibility of assigning a cost of INT_MAX. Since we
were previously using INT_MAX as an indicator for "should not vectorize", we
now explicitly check this condition with "isTreeTinyAndNotFullyVectorizable"
before computing a cost.

This patch adds a run-line to the test case used for r279125 that ensures we
don't vectorize. Previously, this line would vectorize the test case by chance
due to undefined behavior in the cost calculation.

Differential Revision: https://reviews.llvm.org/D23723

llvm-svn: 279562
2016-08-23 20:48:50 +00:00
Matthew Simpson 235e479984 Reapply "[SLP] Initialize VectorizedValue when gathering"
The test case included in r279125 exposed existing undefined behavior in the
SLP vectorizer that it did not introduce. This patch reapplies the original
patch, but modifies the test case to avoid hitting the undefined behavior. This
allows us to close PR28330 while keeping the UBSan bot happy. The undefined
behavior the original test uncovered will be addressed in a follow-on patch.

Reference: https://llvm.org/bugs/show_bug.cgi?id=28330
llvm-svn: 279370
2016-08-20 14:49:02 +00:00
Vitaly Buka cc7db13bf0 Revert "[SLP] Initialize VectorizedValue when gathering" to fix ubsan bot.
This reverts commit r279125.

https://reviews.llvm.org/D23410

llvm-svn: 279363
2016-08-20 07:09:39 +00:00
Matthew Simpson 11db6b6b8c [SLP] Initialize VectorizedValue when gathering
We abort building vectorizable trees in some cases (e.g., if the maximum
recursion depth is reached, if the region size is too large, etc.). If this
happens for a reduction, we can be left with a root entry that needs to be
gathered. For these cases, we need make sure we actually set VectorizedValue to
the resulting vector.

This patch ensures we properly set VectorizedValue, and it also ensures the
insertelement sequence generated for the gathers is inserted at the correct
location.

Reference: https://llvm.org/bugs/show_bug.cgi?id=28330
Differential Revison: https://reviews.llvm.org/D23410

llvm-svn: 279125
2016-08-18 19:50:32 +00:00
Matthew Simpson e5dfb08fcb [TTI] Add hook for vector extract with extension
This change adds a new hook for estimating the cost of vector extracts followed
by zero- and sign-extensions. The motivating example for this change is the
SMOV and UMOV instructions on AArch64. These instructions move data from vector
to general purpose registers while performing the corresponding extension
(sign-extend for SMOV and zero-extend for UMOV) at the same time. For these
operations, TargetTransformInfo can assume the extensions are free and only
report the cost of the vector extract. The SLP vectorizer has been updated to
make use of the new hook.
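
To illustrate (my example, not from the patch), the sign-extending case that maps to SMOV; the corresponding zero-extend maps to UMOV:

```
define i64 @extract_sext(<2 x i32> %v) {
  %e = extractelement <2 x i32> %v, i32 1
  %s = sext i32 %e to i64   ; extract + sign-extend: a single smov on AArch64
  ret i64 %s
}
```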

Differential Revision: http://reviews.llvm.org/D18523

llvm-svn: 267725
2016-04-27 15:20:21 +00:00
Matthew Simpson 92821cb4a8 Reapply commit r259357 with a fix for PR26629
Commit r259357 was reverted because it caused PR26629. We were assuming all
roots of a vectorizable tree could be truncated to the same width, which is not
the case in general. This commit reapplies the patch along with a fix and a new
test case to ensure we don't regress because of this issue again. This should
fix PR26629.

llvm-svn: 261212
2016-02-18 14:14:40 +00:00