281 Commits

Author SHA1 Message Date
Alexey Bataev 992ac2d5c2 [SLP] Additional test for checking that instruction with extra args is
not reconstructed.

llvm-svn: 292911
2017-01-24 10:44:00 +00:00
Alexey Bataev 95d176242b [SLP] Additional test with extra args in horizontal reductions.
llvm-svn: 292821
2017-01-23 19:28:23 +00:00
Alexey Bataev 61d8e0003c [SLP] Additional test for SLP vectorizer with 31 reduction elements.
llvm-svn: 292783
2017-01-23 11:53:16 +00:00
Alexey Bataev 4fe77b9329 [SLP] Initial test for fix of PR31690.
llvm-svn: 292631
2017-01-20 18:40:21 +00:00
Alexey Bataev f5677329a6 [SLP] A new test for horizontal vectorization for non-power-of-2
instructions.

llvm-svn: 292626
2017-01-20 18:04:29 +00:00
Mohammad Shahid 5dc021bf45 [SLP] Add a base test for jumbled store
Change-Id: I905ce08a02c76a6896dcfd9629547417c99adc4a
llvm-svn: 292581
2017-01-20 06:05:33 +00:00
Alexey Bataev f86cca1a42 [SLP] Add a test for a fix for PR30787.
Add a test for PR30787: Failure to beneficially vectorize 'copyable'
elements in integer binary ops.

llvm-svn: 292416
2017-01-18 18:07:46 +00:00
Michael Kuperstein f69e64662b [SLP] Remove bogus assert.
The removed assert seems bogus - it's perfectly legal for the roots of the
vectorized subtrees to be equal even if the original scalar values aren't,
if the original scalars happen to be equivalent.

This fixes PR31599.

Differential Revision: https://reviews.llvm.org/D28539

llvm-svn: 291692
2017-01-11 19:23:57 +00:00
Simon Pilgrim 6cfb5caf05 Revert r290970 [SLPVectorizer] Regenerate test.
The check script uses variable names before they are declared, which FileCheck does not accept.

llvm-svn: 290971
2017-01-04 16:12:07 +00:00
Simon Pilgrim 4629b46bba [SLPVectorizer] Regenerate test.
Missed var name

llvm-svn: 290970
2017-01-04 16:01:55 +00:00
Simon Pilgrim 1d5b0377af Regenerate test.
llvm-svn: 290969
2017-01-04 15:52:41 +00:00
Michael Kuperstein cd7ad7130f [InstCombine] Canonicalize insert splat sequences into an insert + shuffle
This adds a combine that canonicalizes a chain of inserts which broadcasts
a value into a single insert + a splat shufflevector.

This fixes PR31286.

Differential Revision: https://reviews.llvm.org/D27992

llvm-svn: 290641
2016-12-28 00:18:08 +00:00
Alexey Bataev 4160264e30 [TEST] Initial commit of tests for minmax horizontal reductions.
llvm-svn: 289817
2016-12-15 13:21:29 +00:00
Matthew Simpson 92ce0230b5 [SLP] Fix sign-extends for type-shrinking
This patch ensures the correct minimum bit width during type-shrinking.
Previously when type-shrinking, we always sign-extended values back to their
original width. However, if we are going to sign-extend, and the sign bit is
unknown, we have to increase the minimum bit width by one bit so the
sign-extend will fill the upper bits correctly. If the sign bit is known to be
zero, we can perform a zero-extend instead. This should fix PR31243.

Reference: https://llvm.org/bugs/show_bug.cgi?id=31243
Differential Revision: https://reviews.llvm.org/D27466

llvm-svn: 289470
2016-12-12 21:11:04 +00:00
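
The width rule described in r289470 can be illustrated with plain integers. This is a minimal sketch, not the SLP vectorizer's code: the helper names `truncate_to`, `sign_extend`, `zero_extend` and `significant_bits` are hypothetical, and 32-bit values are assumed.

```c
#include <assert.h>
#include <stdint.h>

/* Keep only the low `bits` bits of v (1 <= bits <= 32). */
uint32_t truncate_to(uint32_t v, unsigned bits) {
    assert(bits >= 1 && bits <= 32);
    return bits == 32 ? v : (v & ((UINT32_C(1) << bits) - 1));
}

/* Shrink v to `bits` bits, then sign-extend back to 32 bits. */
int32_t sign_extend(uint32_t v, unsigned bits) {
    uint32_t narrow = truncate_to(v, bits);
    uint32_t sign = UINT32_C(1) << (bits - 1);
    int64_t wide = (int64_t)narrow;
    if (narrow & sign)
        wide -= (int64_t)1 << bits; /* upper bits are filled with ones */
    return (int32_t)wide;
}

/* Shrink v to `bits` bits, then zero-extend back to 32 bits. */
uint32_t zero_extend(uint32_t v, unsigned bits) {
    return truncate_to(v, bits);
}

/* Number of bits needed to hold v as an unsigned value (at least 1). */
unsigned significant_bits(uint32_t v) {
    unsigned n = 0;
    while (v != 0) {
        n++;
        v >>= 1;
    }
    return n ? n : 1;
}
```

The value 200 has 8 significant bits. Shrinking it to 8 bits and sign-extending yields -56, because bit 7 is treated as the sign bit. Either one extra bit (9 bits with sign-extension) or a zero-extend at 8 bits recovers 200, which is the distinction the commit message draws.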
Sanjoy Das 3336f681e3 [Verifier] Add verification for TBAA metadata
Summary:
This change adds some verification in the IR verifier around struct path
TBAA metadata.

Other than some basic sanity checks (e.g. we get constant integers where
we expect constant integers), this checks:

 - That by the time a struct access tuple `(base-type, offset)` is
   "reduced" to a scalar base type, the offset is `0`.  For instance, in
   C++ you can't start from, say `("struct-a", 16)`, and end up with
   `("int", 4)` -- by the time the base type is `"int"`, the offset
   better be zero.  In particular, a variant of this invariant is needed
   for `llvm::getMostGenericTBAA` to be correct.

 - That there are no cycles in a struct path.

 - That struct type nodes have their offsets listed in an ascending
   order.

 - That when generating the struct access path, you eventually reach the
   access type listed in the tbaa tag node.

Reviewers: dexonsmith, chandlerc, reames, mehdi_amini, manmanren

Subscribers: mcrosier, llvm-commits

Differential Revision: https://reviews.llvm.org/D26438

llvm-svn: 289402
2016-12-11 20:07:15 +00:00
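
Three of the checks listed in r289402 can be sketched on a toy type graph. This is an illustration under stated assumptions, not the IR verifier: `TypeNode`, `Field`, `fields_ascending` and `path_reaches` are hypothetical names, the access type is assumed to be a scalar, and cycles are caught with a simple step bound rather than a visited set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct TypeNode TypeNode;

typedef struct {
    size_t offset;        /* byte offset of the member */
    const TypeNode *type; /* type of the member */
} Field;

struct TypeNode {
    const char *name;
    size_t num_fields;    /* 0 means a scalar type */
    const Field *fields;
};

/* Check that member offsets are listed in ascending order. */
bool fields_ascending(const TypeNode *t) {
    assert(t != NULL);
    for (size_t i = 1; i < t->num_fields; i++)
        if (t->fields[i - 1].offset > t->fields[i].offset)
            return false;
    return true;
}

/* Reduce (base, offset) member by member. Succeeds only if the walk ends
   at the scalar `access` type with offset 0. Exceeding `max_steps` is
   treated as a cycle. Assumes ascending member offsets. */
bool path_reaches(const TypeNode *base, size_t offset,
                  const TypeNode *access, size_t max_steps) {
    assert(base != NULL && access != NULL);
    for (size_t step = 0; step < max_steps; step++) {
        if (base->num_fields == 0)
            return base == access && offset == 0;
        const Field *pick = NULL;
        for (size_t i = 0; i < base->num_fields; i++)
            if (base->fields[i].offset <= offset)
                pick = &base->fields[i]; /* last member at or below offset */
        if (pick == NULL)
            return false;
        offset -= pick->offset;
        base = pick->type;
    }
    return false;
}
```

With a struct that has `int` members at offsets 0 and 4, the tuple `(struct, 4)` reduces to `(int, 0)` and passes, while `(struct, 6)` reduces to `(int, 2)` and is rejected.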
Alexey Bataev 4f0d469d45 [SLP] Fix for PR6246: vectorization for scalar ops on vector elements.
When trying to vectorize trees that start at insertelement instructions,
tryToVectorizeList() uses a vectorization factor calculated as
MinVecRegSize/ScalarTypeSize. Sometimes this does not work because the tree
cost for this fixed vectorization factor is too high.
This patch tries different vectorization factors, from
max(PowerOf2Floor(NumberOfVectorizedValues), MinVecRegSize/ScalarTypeSize)
down to MinVecRegSize/ScalarTypeSize, and chooses the best one.

Differential Revision: https://reviews.llvm.org/D27215

llvm-svn: 289043
2016-12-08 11:57:51 +00:00
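
The search over vectorization factors described in the PR6246 fix can be sketched as a small loop. This is a hedged illustration, not the code from D27215: `choose_vf`, `cost_fn` and `example_cost` are hypothetical, halving the factor at each step is an assumption, and "best" is taken to mean lowest cost.

```c
#include <assert.h>
#include <limits.h>

/* Largest power of two that is <= n (n >= 1). */
unsigned power_of_2_floor(unsigned n) {
    assert(n >= 1);
    unsigned p = 1;
    while (p <= n / 2)
        p *= 2;
    return p;
}

/* Hypothetical cost callback: lower means more profitable. */
typedef int (*cost_fn)(unsigned vf);

/* Try factors from max(PowerOf2Floor(num_values), min_vf) down to min_vf,
   halving each time, and return the one with the lowest cost. */
unsigned choose_vf(unsigned num_values, unsigned min_vec_reg_size,
                   unsigned scalar_type_size, cost_fn cost) {
    assert(num_values >= 1 && scalar_type_size > 0);
    assert(min_vec_reg_size >= scalar_type_size);
    unsigned min_vf = min_vec_reg_size / scalar_type_size;
    unsigned max_vf = power_of_2_floor(num_values);
    if (max_vf < min_vf)
        max_vf = min_vf;

    unsigned best_vf = min_vf;
    int best_cost = INT_MAX;
    for (unsigned vf = max_vf; vf >= min_vf; vf /= 2) {
        int c = cost(vf);
        if (c < best_cost) {
            best_cost = c;
            best_vf = vf;
        }
    }
    return best_vf;
}

/* Toy cost model for demonstration only: a factor of 4 is profitable. */
int example_cost(unsigned vf) {
    return vf == 4 ? -3 : 1;
}
```

For 8 values of 64 bits each and a 128-bit minimum register size, the loop evaluates factors 8, 4 and 2 and returns 4 under the toy cost model.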
Simon Pilgrim e633741c3a [SLPVectorizer][X86] Tests to show missed buildvector sitofp/fptosi vectorizations
e.g.
buildvector(sitofp(i32), sitofp(i32), sitofp(i32), sitofp(i32)) --> sitofp(buildvector(i32, i32, i32, i32))

llvm-svn: 288807
2016-12-06 13:29:55 +00:00
Renato Golin 5b8e7ecdb3 Revert "[SLP] Fix for PR6246: vectorization for scalar ops on vector elements."
This reverts commit r288497, as it broke the AArch64 build of Compiler-RT's
builtins (twice: once in r288412 and once in r288497). We should investigate
this offline.

llvm-svn: 288508
2016-12-02 16:56:26 +00:00
Alexey Bataev e8e94a7176 [SLP] Fix for PR6246: vectorization for scalar ops on vector elements.
When trying to vectorize trees that start at insertelement instructions,
tryToVectorizeList() uses a vectorization factor calculated as
MinVecRegSize/ScalarTypeSize. Sometimes this does not work because the tree
cost for this fixed vectorization factor is too high.
This patch tries different vectorization factors, from
max(PowerOf2Floor(NumberOfVectorizedValues), MinVecRegSize/ScalarTypeSize)
down to MinVecRegSize/ScalarTypeSize, and chooses the best one.

Differential Revision: https://reviews.llvm.org/D27215

llvm-svn: 288497
2016-12-02 12:20:22 +00:00
Simon Pilgrim c70d3796fb [SLPVectorizer][X86] Add tests for vectorization of buildvector of scalar fp-ops (PR6246)
llvm-svn: 288492
2016-12-02 10:54:46 +00:00
Artem Belevich 704395a25a Revert "[SLP] Fix for PR6246: vectorization for scalar ops on vector elements."
This reverts r288412 which causes severe compile-time regression.

llvm-svn: 288431
2016-12-01 22:52:15 +00:00
Alexey Bataev 2c01af5904 [SLP] Fix for PR6246: vectorization for scalar ops on vector elements.
When trying to vectorize trees that start at insertelement instructions,
tryToVectorizeList() uses a vectorization factor calculated as
MinVecRegSize/ScalarTypeSize. Sometimes this does not work because the tree
cost for this fixed vectorization factor is too high.
This patch tries different vectorization factors, from
max(PowerOf2Floor(NumberOfVectorizedValues), MinVecRegSize/ScalarTypeSize)
down to MinVecRegSize/ScalarTypeSize, and chooses the best one.

Differential Revision: https://reviews.llvm.org/D27215

llvm-svn: 288412
2016-12-01 20:06:53 +00:00
Alexey Bataev 62af7252f1 [SLP] Fixed cost model for horizontal reduction.
Currently, when the cost of scalar operations is evaluated, the vector type is
used for the scalar operations. This patch fixes that and also fixes the
evaluation of the vector operations cost.
Several tests showed that the vector cost model is too optimistic. It
allowed vectorization of 8 or fewer add/fadd operations, even though scalar
code is faster. Vector code only provides better performance for 16 or more
operations.

Differential Revision: https://reviews.llvm.org/D26277

llvm-svn: 288398
2016-12-01 18:42:42 +00:00
Alexey Bataev fc617690ab [SLP] Additional tests with the cost of vector operations.
llvm-svn: 288377
2016-12-01 17:26:54 +00:00
Alexey Bataev e59a8351d0 Revert "[SLP] Additional tests with the cost of vector operations."
This reverts commit a61718435fc4118c82f8aa6133fd81f803789c1e.

llvm-svn: 288371
2016-12-01 16:45:04 +00:00
Alexey Bataev 2ff768475d [SLP] Additional tests with the cost of vector operations.
llvm-svn: 288369
2016-12-01 16:11:48 +00:00
Alexey Bataev e951e5eb7b [SLP] Add a new test for tree vectorization starting from insertelement
instruction.

llvm-svn: 288148
2016-11-29 15:37:52 +00:00
Alexey Bataev 4fa063ebc9 [SLPVectorizer] Improved support of partial tree vectorization.
Currently the SLP vectorizer tries to vectorize a binary operation and gives
up immediately after the first unsuccessful attempt. This patch improves
the situation by trying to vectorize all binary operations of all child
nodes in the binop tree.

Differential Revision: https://reviews.llvm.org/D25517

llvm-svn: 288115
2016-11-29 08:21:14 +00:00
Mohammad Shahid 2f5cb60b07 [SLP] Add new and update existing lit tests to provide more context for the incoming patch for vectorization of jumbled loads
Change-Id: Ifb9091bb0f84c1937c2c8bd2fc345734f250d2f9
llvm-svn: 287992
2016-11-27 03:35:31 +00:00
Simon Pilgrim 841d7ca463 [X86][AVX512] Add support for v2i64 fptosi/fptoui/sitofp/uitofp on AVX512DQ-only targets
Use 512-bit instructions with subvector insertion/extraction like we do in a number of similar circumstances

llvm-svn: 287882
2016-11-24 14:46:55 +00:00
Alexey Bataev 2eaacda53e [SLP] Add more tests for SLP Vectorizer.
llvm-svn: 287801
2016-11-23 20:10:32 +00:00
Simon Pilgrim 4e9b9cbee9 [X86][AVX512] Add support for v4i64 fptosi/fptoui/sitofp/uitofp on AVX512DQ-only targets
Use 512-bit instructions with subvector insertion/extraction like we do in a number of similar circumstances

llvm-svn: 287762
2016-11-23 14:01:18 +00:00
Simon Pilgrim 03cd8f887c [CostModel][X86] Add missing AVX512DQ v8i64 fptosi/sitofp costs
llvm-svn: 287760
2016-11-23 13:42:09 +00:00
Craig Topper 07f1c15995 [AVX-512] Support FCOPYSIGN for v16f32 and v8f64
Summary:
This extends FCOPYSIGN support to 512-bit vectors.

I've also added tests to show what the 128-bit and 256-bit cases look like with broadcast loads.

Reviewers: delena, zvi, RKSimon, spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D26791

llvm-svn: 287298
2016-11-18 02:25:34 +00:00
Vyacheslav Klochkov b3dc774a99 Fixed the lost FastMathFlags for CALL operations in SLPVectorizer.
Reviewer: Michael Zolotukhin.
Differential Revision: https://reviews.llvm.org/D26575

llvm-svn: 287064
2016-11-16 00:55:50 +00:00
Vyacheslav Klochkov f1a12fe0f5 Fixed the lost FastMathFlags for FCmp operations in SLPVectorizer.
Reviewer: Michael Zolotukhin.
Differential Revision: https://reviews.llvm.org/D26543

llvm-svn: 286626
2016-11-11 19:55:29 +00:00
Simon Pilgrim d02c55204b [VectorLegalizer] Expansion of CTLZ using CTPOP when possible
This patch avoids scalarization of CTLZ by instead expanding to use CTPOP (ref: "Hacker's Delight") when the necessary operations are available.

This also adds the necessary cost models for X86 SSE2 targets (the main beneficiary) to ensure vectorization only happens when it's useful.

Differential Revision: https://reviews.llvm.org/D25910

llvm-svn: 286233
2016-11-08 14:10:28 +00:00
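
The CTLZ-via-CTPOP expansion mentioned in r286233 follows a well-known bit trick: smear the highest set bit into every lower position, then count the zero bits that remain. The scalar sketch below shows the idea for 32-bit values. The function names are illustrative, and the vector legalizer applies the same steps per lane using target operations rather than this code.

```c
#include <assert.h>
#include <stdint.h>

/* Bit-parallel population count for a 32-bit value. */
unsigned popcount32(uint32_t x) {
    x = x - ((x >> 1) & 0x55555555u);
    x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u);
    x = (x + (x >> 4)) & 0x0F0F0F0Fu;
    return (x * 0x01010101u) >> 24;
}

/* Count leading zeros using only shifts, ORs, NOT and a popcount. */
unsigned ctlz32(uint32_t x) {
    /* Propagate the highest set bit into all lower bit positions. */
    x |= x >> 1;
    x |= x >> 2;
    x |= x >> 4;
    x |= x >> 8;
    x |= x >> 16;
    /* The bits still clear are exactly the leading zeros. */
    unsigned n = popcount32(~x);
    assert(n <= 32);
    return n;
}
```

An input of zero yields 32, since no bit is ever set and the complement is all ones.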
Alexey Bataev 46c0278e7d [SLP] Fix for PR30626: Compiler crash inside SLP Vectorizer.
After a successful horizontal reduction vectorization attempt for a PHI node,
the vectorizer tries to update the root binary op by combining the vectorized
tree and the ReductionPHI node. But during vectorization this ReductionPHI
can itself be vectorized and replaced by the `undef` value, while the
instruction itself is marked for deletion. This 'marked for deletion'
PHI node can then be used in a new binary operation, causing a "Use still
stuck around after Def is destroyed" crash upon PHI node deletion.

Also the test is fixed to make it perform actual testing.

Differential Revision: https://reviews.llvm.org/D25671

llvm-svn: 285286
2016-10-27 12:02:28 +00:00
Simon Pilgrim 4ddc92b6cd [X86][SSE] Add lowering to cvttpd2dq/cvttps2dq for fptosi v2f64/2f32 to 2i32
As discussed on PR28461 we currently miss the chance to lower "fptosi <2 x double> %arg to <2 x i32>" to cvttpd2dq due to its use of illegal types.

This patch adds support for fptosi to 2i32 from both 2f64 and 2f32.

It also recognises that cvttpd2dq zeroes the upper 64-bits of the xmm result (similar to D23797) - we still don't do this for the cvttpd2dq/cvttps2dq intrinsics - this can be done in a future patch.

Differential Revision: https://reviews.llvm.org/D23808

llvm-svn: 284459
2016-10-18 07:42:15 +00:00
Simon Pilgrim cfef627b1f [SLPVectorizer][X86] Add 512-bit sitofp/uitofp tests
llvm-svn: 283756
2016-10-10 14:28:06 +00:00
Simon Pilgrim 2c0733c678 [SLPVectorizer][X86] Add avx512 sitofp/uitofp tests
llvm-svn: 283751
2016-10-10 14:14:31 +00:00
Simon Pilgrim 6cadb5610e [SLPVectorizer][X86] Fixed alignments of scalar loads in sitofp/uitofp tests
Fixed copy+paste vector alignment to correct for per-element scalar loads

Increased to 512-bit data sizes in preparation of avx512 tests

llvm-svn: 283748
2016-10-10 14:10:41 +00:00
Alexey Bataev 6ad5da7c81 [SLPVectorizer] Fix for PR25748: reduction vectorization after loop
unrolling.

The following code is not vectorized by the SLPVectorizer:
```
 int test(unsigned int *p) {
  int sum = 0;
  for (int i = 0; i < 8; i++)
    sum += p[i];
  return sum;
 }
```
During optimization this loop is fully unrolled, and the SLPVectorizer is
unable to vectorize the result. This patch tries to fix the problem.

Differential Revision: https://reviews.llvm.org/D24796

llvm-svn: 283535
2016-10-07 09:39:22 +00:00
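
To make the PR25748 case concrete, the sketch below shows what the 8-iteration loop looks like after full unrolling, next to a pairwise reduction of the kind a horizontal-reduction vectorizer can form. It is scalar C for illustration only: the function names are hypothetical, and unsigned arithmetic is used so that reordering the additions cannot change the result.

```c
#include <assert.h>
#include <stddef.h>

/* The loop after full unrolling: one linear chain of eight adds. */
unsigned sum8_unrolled(const unsigned *p) {
    assert(p != NULL);
    return p[0] + p[1] + p[2] + p[3] + p[4] + p[5] + p[6] + p[7];
}

/* The same sum as a three-step pairwise reduction over 8 "lanes". */
unsigned sum8_tree(const unsigned *p) {
    assert(p != NULL);
    unsigned lanes[8];
    for (size_t i = 0; i < 8; i++)
        lanes[i] = p[i];
    /* Each pass adds the upper half of the live lanes onto the lower half. */
    for (size_t width = 4; width >= 1; width /= 2)
        for (size_t i = 0; i < width; i++)
            lanes[i] += lanes[i + width];
    return lanes[0];
}
```

Both functions return the same value for any input because unsigned addition wraps and is associative. The second form maps naturally onto a vector load followed by shuffle-and-add steps.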
Alexey Bataev 7e217c2402 [SLPVectorizer] Add a test with non-vectorizable IR.
llvm-svn: 283225
2016-10-04 15:07:23 +00:00
Sanjay Patel d27a21874b [x86, SSE/AVX] allow 128/256-bit lowering for copysign vector intrinsics (PR30433)
This should fix:
https://llvm.org/bugs/show_bug.cgi?id=30433

There are a couple of open questions about the codegen:
1. Should we let scalar ops be scalars and avoid vector constant loads/splats?
2. Should we have a pass to combine constants such as the inverted pair that we have here?

Differential Revision: https://reviews.llvm.org/D25165

llvm-svn: 283119
2016-10-03 16:38:27 +00:00
Simon Pilgrim 04e249b128 [SLPVectorizer][X86] Added fptosi/fptoui tests
llvm-svn: 283048
2016-10-01 19:35:59 +00:00
Simon Pilgrim 567c4fbdae [SLPVectorizer][X86] Added fcopysign tests
llvm-svn: 283046
2016-10-01 17:00:26 +00:00
Simon Pilgrim cceeb2a4fa [SLPVectorizer][X86] Added fabs tests
llvm-svn: 283045
2016-10-01 16:54:01 +00:00
Simon Pilgrim 34032cba14 Rename tests
llvm-svn: 281863
2016-09-18 20:25:41 +00:00
Matthew Simpson df2ab917ad [SLP] Avoid signed integer overflow
The test case included with r279125 exposed an existing signed integer
overflow. Since getTreeCost can return INT_MAX, we can't sum this cost together
with other costs, such as getReductionCost.

This patch removes the possibility of assigning a cost of INT_MAX. Since we
were previously using INT_MAX as an indicator for "should not vectorize", we
now explicitly check this condition with "isTreeTinyAndNotFullyVectorizable"
before computing a cost.

This patch adds a run-line to the test case used for r279125 that ensures we
don't vectorize. Previously, this line would vectorize the test case by chance
due to undefined behavior in the cost calculation.

Differential Revision: https://reviews.llvm.org/D23723

llvm-svn: 279562
2016-08-23 20:48:50 +00:00
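
The pattern described in r279562, replacing an INT_MAX sentinel with an explicit check before any costs are summed, can be sketched as follows. The names `add_costs` and `should_vectorize` are hypothetical, and the rule that a negative total means "profitable" is an assumption made for the example. The overflow guard in `add_costs` is an extra safety net, not something the commit message describes.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Add two costs. Returns false instead of overflowing a signed int. */
bool add_costs(int a, int b, int *out) {
    assert(out != NULL);
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
        return false;
    *out = a + b;
    return true;
}

/* Decide whether to vectorize. The "do not vectorize" condition is an
   explicit flag checked first, so no sentinel cost ever enters the sum. */
bool should_vectorize(bool tree_is_tiny_and_not_fully_vectorizable,
                      int tree_cost, int reduction_cost) {
    if (tree_is_tiny_and_not_fully_vectorizable)
        return false;
    int total;
    if (!add_costs(tree_cost, reduction_cost, &total))
        return false;
    return total < 0; /* assumption: negative total cost means profitable */
}
```

With a sentinel, `INT_MAX + reduction_cost` would be undefined behavior for any positive reduction cost. Checking the flag up front means the addition only ever sees real costs.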