Commit Graph

21 Commits

Author SHA1 Message Date
Sanjay Patel e50059f6b6 [x86] form reduction intrinsics from vectorizers instead of raw IR
Motivating examples are seen in the PhaseOrdering tests based on:
https://bugs.llvm.org/show_bug.cgi?id=43953#c2 - if we have
intrinsics there, some pass can fold them.

The intrinsics are still named "experimental" at this point, but
if there is no fallout from this patch, that will be a good
indicator that it is safe to finalize them.

Differential Revision: https://reviews.llvm.org/D80867
2020-06-05 12:38:49 -04:00
Craig Topper f4c67dfa92 [X86] More accurately model the cost of horizontal reductions.
This patch attempts to more accurately model the reduction of
power of 2 vectors of types we natively support. This takes into
account the narrowing of vectors that occur as we go from 512
bits to 256 bits, to 128 bits. It also takes into account the use
of wider elements in the shuffles for the first 2 steps of a
reduction from 128 bits. And uses a v8i16 shift for the final step
of vXi8 reduction.

The default implementation uses the legalized type for the arithmetic
for all levels. And uses the single source permute cost of the
legalized type for all levels. This penalizes things like
lack of v16i8 pshufb on pre-sse3 targets and the splitting and
joining that needs to be done for integer types on AVX1. We never
need v16i8 shuffle for a reduction and we only need split AVX1 ops
when type the type wide and needs to be split. I think we're still
over costing splits and joins for AVX1, but we're closer now.

I've also removed all pairwise special casing because I don't
think we ever want to generate that on X86. I've also adjusted
the add handling to more accurately account for any type splitting
that occurs before we reach a legal type.

Differential Revision: https://reviews.llvm.org/D76478
2020-03-22 14:20:15 -07:00
Alexey Bataev 8b1eeafb91 [SLP] Fix for PR31847: Assertion failed: (isLoopInvariant(Operands[i], L) && "SCEVAddRecExpr operand is not loop-invariant!")
Initially SLP vectorizer replaced all going-to-be-vectorized
instructions with Undef values. It may break ScalarEvaluation and may
cause a crash.
Reworked SLP vectorizer so that it does not replace vectorized
instructions by UndefValue anymore. Instead vectorized instructions are
marked for deletion inside if BoUpSLP class and deleted upon class
destruction.

Reviewers: mzolotukhin, mkuper, hfinkel, RKSimon, davide, spatel

Subscribers: RKSimon, Gerolf, anemet, hans, majnemer, llvm-commits, sanjoy

Differential Revision: https://reviews.llvm.org/D29641

llvm-svn: 373166
2019-09-29 14:18:06 +00:00
Jordan Rupprecht f98d2c099a Revert [SLP] Fix for PR31847: Assertion failed: (isLoopInvariant(Operands[i], L) && "SCEVAddRecExpr operand is not loop-invariant!")
This reverts r372626 (git commit 6a278d9073)

llvm-svn: 373019
2019-09-26 22:09:17 +00:00
Alexey Bataev 6a278d9073 [SLP] Fix for PR31847: Assertion failed: (isLoopInvariant(Operands[i], L) && "SCEVAddRecExpr operand is not loop-invariant!")
Summary:
Initially SLP vectorizer replaced all going-to-be-vectorized
instructions with Undef values. It may break ScalarEvaluation and may
cause a crash.
Reworked SLP vectorizer so that it does not replace vectorized
instructions by UndefValue anymore. Instead vectorized instructions are
marked for deletion inside if BoUpSLP class and deleted upon class
destruction.

Reviewers: mzolotukhin, mkuper, hfinkel, RKSimon, davide, spatel

Subscribers: RKSimon, Gerolf, anemet, hans, majnemer, llvm-commits, sanjoy

Differential Revision: https://reviews.llvm.org/D29641

llvm-svn: 372626
2019-09-23 16:25:03 +00:00
Eric Christopher cee313d288 Revert "Temporarily Revert "Add basic loop fusion pass.""
The reversion apparently deleted the test/Transforms directory.

Will be re-reverting again.

llvm-svn: 358552
2019-04-17 04:52:47 +00:00
Eric Christopher a863435128 Temporarily Revert "Add basic loop fusion pass."
As it's causing some bot failures (and per request from kbarton).

This reverts commit r358543/ab70da07286e618016e78247e4a24fcb84077fda.

llvm-svn: 358546
2019-04-17 02:12:23 +00:00
Craig Topper d1498ed8df [CostModel][X86] Fix overcounting arithmetic cost in illegal types in getArithmeticReductionCost/getMinMaxReductionCost
We were overcounting the number of arithmetic operations needed at each level before we reach a legal type. We were using the full vector type for that level, but we are going to split the input vector at that level in half. So the effective arithmetic operation cost at that level is half the width.

So for example on 8i32 on an sse target. Were were calculating the cost of an 8i32 op which is likely 2 for basic integer. Then after the loop we count 2 more v4i32 ops. For a total arith cost of 4. But if you look at the assembly there would only be 3 arithmetic ops.

There are still more bugs in this code that I'm going to work on next. The non pairwise code shouldn't count extract subvectors in the loop. There are no extracts, the types are split in registers. For pairwise we need to use 2 two src permute shuffles.

Differential Revision: https://reviews.llvm.org/D55397

llvm-svn: 348621
2018-12-07 18:20:56 +00:00
Simon Pilgrim 102854f4d4 [TTI] Reduction costs only need to include a single extract element cost (REAPPLIED)
We were adding the entire scalarization extraction cost for reductions, which returns the total cost of extracting every element of a vector type.

For reductions we don't need to do this - we just need to extract the 0'th element after the reduction pattern has completed.

Fixes PR37731

Rebased and reapplied after being reverted in rL347541 due to PR39774 - which was fixed by D54955/rL347759 and D55017/rL347997

Differential Revision: https://reviews.llvm.org/D54585

llvm-svn: 348076
2018-12-01 14:18:31 +00:00
Fedor Sergeev 8cd9d1b5ce Revert "[TTI] Reduction costs only need to include a single extract element cost"
This reverts commit r346970.
It was causing PR39774, a crash in slp-vectorizer on a rather simple loop
with just a bunch of 'and's in the body.

llvm-svn: 347541
2018-11-26 10:17:27 +00:00
Simon Pilgrim 924f193419 [TTI] Reduction costs only need to include a single extract element cost
We were adding the entire scalarization extraction cost for reductions, which returns the total cost of extracting every element of a vector type.

For reductions we don't need to do this - we just need to extract the 0'th element after the reduction pattern has completed.

Fixes PR37731

Differential Revision: https://reviews.llvm.org/D54585

llvm-svn: 346970
2018-11-15 17:42:53 +00:00
Simon Pilgrim 4dd692ec2a [SLPVectorizer][X86] Regenerate reduction tests and add PR37731 test
Cleanup check prefixes

llvm-svn: 346964
2018-11-15 16:08:25 +00:00
Simon Pilgrim fc8f1d7da7 [CostModel][X86] SK_ExtractSubvector is free if the subvector is at the start of the source vector
llvm-svn: 346538
2018-11-09 19:04:27 +00:00
Simon Pilgrim 532a0f122e [SLPVectorizer] Add basic support for mul/and/or/xor horizontal reductions
Expand arithmetic reduction to include mul/and/or/xor instructions.

This patch just fixes the SLPVectorizer - the effective reduction costs for AVX1+ are still poor (see rL344846) and will need to be improved before SLP sees this as a valid transform - but we can already see the effect on SSE2 tests.

This partially helps PR37731, but doesn't fix it all as it still falls over on the extraction/reduction order for some reason.

Differential Revision: https://reviews.llvm.org/D53473

llvm-svn: 345037
2018-10-23 15:13:09 +00:00
Simon Pilgrim 0c35aa114d [SLPVectorizer][X86] Add mul/and/or/xor unrolled reduction tests
We miss arithmetic reduction for everything but Add/FAdd (I assume because that's the only cases which x86 has horizontal ops for.....)

llvm-svn: 344849
2018-10-20 15:17:27 +00:00
Alexey Bataev 1e593fe73e [SLP] Update test checks, NFC.
llvm-svn: 324387
2018-02-06 20:00:05 +00:00
Alexey Bataev 62af7252f1 [SLP] Fixed cost model for horizontal reduction.
Currently when cost of scalar operations is evaluated the vector type is
used for scalar operations. Patch fixes this issue and fixes evaluation
of the vector operations cost.
Several test showed that vector cost model is too optimistic. It
allowed vectorization of 8 or less add/fadd operations, though scalar
code is faster. Actually, only for 16 or more operations vector code
provides better performance.

Differential Revision: https://reviews.llvm.org/D26277

llvm-svn: 288398
2016-12-01 18:42:42 +00:00
Alexey Bataev fc617690ab [SLP] Additional tests with the cost of vector operations.
llvm-svn: 288377
2016-12-01 17:26:54 +00:00
Alexey Bataev e59a8351d0 Revert "[SLP] Additional tests with the cost of vector operations."
This reverts commit a61718435fc4118c82f8aa6133fd81f803789c1e.

llvm-svn: 288371
2016-12-01 16:45:04 +00:00
Alexey Bataev 2ff768475d [SLP] Additional tests with the cost of vector operations.
llvm-svn: 288369
2016-12-01 16:11:48 +00:00
Alexey Bataev 6ad5da7c81 [SLPVectorizer] Fix for PR25748: reduction vectorization after loop
unrolling.

The next code is not vectorized by the SLPVectorizer:
```
 int test(unsigned int *p) {
  int sum = 0;
  for (int i = 0; i < 8; i++)
    sum += p[i];
  return sum;
 }
```
During optimization this loop is fully unrolled and SLPVectorizer is
unable to vectorize it. Patch tries to fix this problem.

Differential Revision: https://reviews.llvm.org/D24796

llvm-svn: 283535
2016-10-07 09:39:22 +00:00