Commit Graph

1045 Commits

Author SHA1 Message Date
Brendon Cahoon 3cc15e2cb6 [SLP] Fix assert from non-constant index in insertelement
A call to getInsertIndex() in getTreeCost() is returning None,
which causes an assert because a non-constant index value for
insertelement was not expected. This case occurs when the
insertelement index value is defined with a PHI.

Differential Revision: https://reviews.llvm.org/D120223
2022-02-22 15:57:14 -06:00
Alexey Bataev b3f4535a03 [SLP][NFC]Add a test for bottom to top reordering. 2022-02-22 13:46:34 -08:00
Philip Reames 0539a26d91 [SLP] Schedule only sub-graph of vectorizable instructions
SLP currently schedules all instructions within a scheduling window which stretches from the first instruction potentially vectorized to the last. This window can include a very large number of unrelated instructions which are not being considered for vectorization. This change switches the code to only schedule the sub-graph consisting of the instructions being vectorized and their transitive users.

This has the effect of greatly reducing the amount of work performed in large basic blocks, and thus greatly improves compile time on degenerate examples. To understand the effects, I added some statistics (not planned for upstream contribution). Here's an illustration from my motivating example:

Before this patch:

704357 SLP                          - Number of calcDeps actions
 699021 SLP                          - Number of schedule calls
   5598 SLP                          - Number of ReSchedule actions
     59 SLP                          - Number of ReScheduleOnFail actions
  10084 SLP                          - Number of schedule resets
   8523 SLP                          - Number of vector instructions generated

After this patch:

102895 SLP                          - Number of calcDeps actions
 161916 SLP                          - Number of schedule calls
   5637 SLP                          - Number of ReSchedule actions
     55 SLP                          - Number of ReScheduleOnFail actions
  10083 SLP                          - Number of schedule resets
   8403 SLP                          - Number of vector instructions generated

I do want to highlight that there is a small difference in number of generated vector instructions. This example is hitting the bailout due to maximum window size, and the change in scheduling is slightly perturbing when and how we hit it. This can be seen in the RescheduleOnFail counter change. Given that, I think we can safely ignore.

The downside of this change can be seen in the large test diff. We group all vectorizable instructions together at the bottom of the scheduling region. This means that vector instructions can move quite far from their original point in code. While maybe undesirable, I don't see this as being a major problem as this pass is not intended to be a general scheduling pass.

For context, it's worth noting that the pre-scheduling that SLP does while building the vector tree is exactly the sub-graph scheduling implemented by this patch.

Differential Revision: https://reviews.llvm.org/D118538
2022-02-22 10:15:55 -08:00
Alexey Bataev b0a0df9809 [SLP]Fix vectorization of the alternate cmp instruction with swapped predicates.
If the alternate cmp instruction is a swapped predicate of the main cmp
instruction, need to generate alternate instruction, not the one with
the swapped predicate. Also, the lane with the alternate opcode should
be selected only, if the corresponding operands are not compatible.

Correctness confirmed:
https://alive2.llvm.org/ce/z/94BG66

Differential Revision: https://reviews.llvm.org/D119855
2022-02-18 04:27:45 -08:00
Alexey Bataev 27f72eb25e [SLP][NFC]Add another test for swapped main/alternate cmp, NFC. 2022-02-17 09:55:16 -08:00
Shao-Ce SUN 21ac474392 [NFC] Correct typo `interger` to `integer` 2022-02-17 21:17:47 +08:00
Arthur Eubanks 826fae51d2 [SLPVectorizer][OpaquePtrs] Check GEP source element type
Fixes a miscompile with opaque pointers.

Reviewed By: #opaque-pointers, nikic

Differential Revision: https://reviews.llvm.org/D119980
2022-02-16 14:47:20 -08:00
Arthur Eubanks 4e24397805 [test][SLPVectorizer][OpaquePtr] Precommit test 2022-02-16 14:47:20 -08:00
Alexey Bataev d6371a7c60 [SLP][NFC]Add a test for miscompilation of alternate cmp instructions,
NFC.
2022-02-15 08:29:17 -08:00
Anton Afanasyev b7574b092a [SLP] Don't try to vectorize pair with insertelement
Particularly this breaks vectorization of insertelements where some of
intermediate (i.e. not last) insertelements are used externally.

Fixes PR52275
Fixes #51617

Differential Revision: https://reviews.llvm.org/D119679
2022-02-15 16:12:59 +03:00
Anton Afanasyev f16a9dffce [Test][SLP] Add tests for PR52275 2022-02-15 16:12:59 +03:00
Anton Afanasyev 954ea0f044 [SLP] Simplify indices processing for insertelements
Get rid of non-constant and undef indices of insertelements
at `buildTree()` stage. Fix bugs.

Differential Revision: https://reviews.llvm.org/D119623
2022-02-14 14:50:44 +03:00
Simon Pilgrim ea071884b0 [SLP][X86] Add common check prefix for horizontal reduction tests 2022-02-12 21:48:31 +00:00
Philip Reames 9fa3243ffc [tests] Add coverage for SLP reschedule event
This is slightly reduced from the crash reported against D117951.
2022-02-03 12:23:11 -08:00
Alexey Bataev 802ceb8343 [SLP]Excluded external uses from the reordering estimation.
Compiler adds the estimation for the external uses during operands
reordering analysis, which makes it tend to prefer duplicates in the
lanes rather than diamond/shuffled match in the graph. It changes the sizes of
the vector operands and may prevent some vectorization. We don't need
this kind of estimation for the analysis phase, because we just need to
choose the most compatible instruction and it does not matter if it has
external user or used in the non-matching lane. Instead, we count the number
of unique instruction in the lane and see if the reassociation changes
the number of unique scalars to be power of 2 or not. If we have power
of 2 unique scalars in the lane, it is considered more profitable rather
than having non-power-of-2 number of unique scalars.

Metric: SLP.NumVectorInstructions

                          test-suite :: MultiSource/Benchmarks/FreeBench/distray/distray.test   70.00   86.00   22.9%
                             test-suite :: External/SPEC/CFP2017rate/544.nab_r/544.nab_r.test  346.00  353.00    2.0%
                            test-suite :: External/SPEC/CFP2017speed/644.nab_s/644.nab_s.test  346.00  353.00    2.0%
                         test-suite :: MultiSource/Benchmarks/mediabench/gsm/toast/toast.test  235.00  239.00    1.7%
                  test-suite :: MultiSource/Benchmarks/MiBench/telecomm-gsm/telecomm-gsm.test  235.00  239.00    1.7%
                     test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 8723.00 8834.00    1.3%
                                 test-suite :: MultiSource/Applications/JM/ldecod/ldecod.test 1051.00 1064.00    1.2%
                         test-suite :: External/SPEC/CINT2017speed/625.x264_s/625.x264_s.test 1628.00 1646.00    1.1%
                          test-suite :: External/SPEC/CINT2017rate/525.x264_r/525.x264_r.test 1628.00 1646.00    1.1%
                       test-suite :: External/SPEC/CFP2017rate/510.parest_r/510.parest_r.test 9100.00 9184.00    0.9%
                     test-suite :: External/SPEC/CFP2017rate/538.imagick_r/538.imagick_r.test 3565.00 3577.00    0.3%
                    test-suite :: External/SPEC/CFP2017speed/638.imagick_s/638.imagick_s.test 3565.00 3577.00    0.3%
                       test-suite :: External/SPEC/CFP2017rate/511.povray_r/511.povray_r.test 4235.00 4245.00    0.2%
                              test-suite :: MultiSource/Benchmarks/tramp3d-v4/tramp3d-v4.test 1996.00 1998.00    0.1%
                                 test-suite :: MultiSource/Applications/JM/lencod/lencod.test 1671.00 1672.00    0.1%

test-suite :: MultiSource/Benchmarks/Prolangs-C/TimberWolfMC/timberwolfmc.test  783.00  782.00   -0.1%
                      test-suite :: SingleSource/Benchmarks/Misc/oourafft.test   69.00   68.00   -1.4%
        test-suite :: External/SPEC/CINT2017speed/641.leela_s/641.leela_s.test  207.00  192.00   -7.2%
         test-suite :: External/SPEC/CINT2017rate/541.leela_r/541.leela_r.test  207.00  192.00   -7.2%
 test-suite :: External/SPEC/CINT2017rate/531.deepsjeng_r/531.deepsjeng_r.test   89.00   80.00  -10.1%
test-suite :: External/SPEC/CINT2017speed/631.deepsjeng_s/631.deepsjeng_s.test   89.00   80.00  -10.1%
       test-suite :: MultiSource/Benchmarks/mediabench/jpeg/jpeg-6a/cjpeg.test  260.00  215.00  -17.3%
 test-suite :: MultiSource/Benchmarks/MiBench/consumer-jpeg/consumer-jpeg.test  256.00  211.00  -17.6%

MultiSource/Benchmarks/Prolangs-C/TimberWolfMC - pretty the same.
SingleSource/Benchmarks/Misc/oourafft.test - 2 <2 x > loads replaced by
one <4 x> load.
External/SPEC/CINT2017speed/641.leela_s - function gets vectorized and
not inlined anymore.
External/SPEC/CINT2017rate/541.leela_r - same
xternal/SPEC/CINT2017rate/531.deepsjeng_r - changed the order in
multi-block tree, the result is pretty the same.
External/SPEC/CINT2017speed/631.deepsjeng_s - same.
MultiSource/Benchmarks/mediabench/jpeg/jpeg-6a - the result is the same
as before.
MultiSource/Benchmarks/MiBench/consumer-jpeg - same.

Differential Revision: https://reviews.llvm.org/D116688
2022-02-03 06:50:06 -08:00
Alexey Bataev ad2a0ccf8f [SLP]Alternate vectorization for cmp instructions.
Added support for alternate ops vectorization of the cmp instructions.
It allows to vectorize either cmp instructions with same/swapped
predicate but different (swapped) operands kinds or cmp instructions
with different predicates and compatible operands kinds.

Differential Revision: https://reviews.llvm.org/D115955
2022-02-03 06:24:10 -08:00
Alexey Bataev 8a1dfbc4d8 Revert "[SLP]Alternate vectorization for cmp instructions."
This reverts commit 842a2360a8 to fix the
bugs reported by users in https://reviews.llvm.org/D115955#3291538.
2022-02-02 12:06:36 -08:00
Alexey Bataev 842a2360a8 [SLP]Alternate vectorization for cmp instructions.
Added support for alternate ops vectorization of the cmp instructions.
It allows to vectorize either cmp instructions with same/swapped
predicate but different (swapped) operands kinds or cmp instructions
with different predicates and compatible operands kinds.

Differential Revision: https://reviews.llvm.org/D115955
2022-02-02 10:32:52 -08:00
Benjamin Kramer 0c3d22a592 Revert "[SLP]Alternate vectorization for cmp instructions."
This reverts commit 83620bd2ad.

It's causing miscompilations, see review comments at
https://reviews.llvm.org/D115955
2022-02-02 13:08:51 +01:00
Alexey Bataev 83620bd2ad [SLP]Alternate vectorization for cmp instructions.
Added support for alternate ops vectorization of the cmp instructions.
It allows to vectorize either cmp instructions with same/swapped
predicate but different (swapped) operands kinds or cmp instructions
with different predicates and compatible operands kinds.

Differential Revision: https://reviews.llvm.org/D115955
2022-02-01 09:54:20 -08:00
Alexey Bataev 0dc33c0a9c [SLP][NFC]Add a test for alternate vectorization in cmp instructions
with same/swapped predicate.
2022-02-01 09:28:06 -08:00
Benjamin Kramer 5281f0dab2 Revert "[SLP]Alternate vectorization for cmp instructions."
This reverts commit afaaecc88c.

Crashes when compiling SciPy, test case https://reviews.llvm.org/P8276
2022-02-01 11:40:43 +01:00
Alexey Bataev afaaecc88c [SLP]Alternate vectorization for cmp instructions.
Added support for alternate ops vectorization of the cmp instructions.
It allows to vectorize either cmp instructions with same/swapped
predicate but different (swapped) operands kinds or cmp instructions
with different predicates and compatible operands kinds.

Differential Revision: https://reviews.llvm.org/D115955
2022-01-31 11:11:25 -08:00
Alexey Bataev cec8b614f3 [SLP]Do not reorder top nodes if they do not require reordering.
No need to reorder the top nodes, if they are not stores or
insertelement instructions and each node should be analized only
once, when the bottom-to-top analysis is performed.
We still endup with extractelements for the top node scalars and
the final shuffle just adds an extra cost and currently
crashes the compiler for PHI nodes.

Differential Revision: https://reviews.llvm.org/D116760
2022-01-28 09:16:18 -08:00
Philip Reames 15f7857412 [tests] Refresh autogen tests for SLP 2022-01-24 17:05:58 -08:00
eopXD 3cf15af2da [RISCV] Remove experimental prefix from rvv-related extensions.
Extensions affected: +v, +zve*, +zvl*

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D117860
2022-01-22 20:18:40 -08:00
Rosie Sumpter 552eb372cb [LoopVectorize] Pass a vector type to isLegalMaskedGather/Scatter
This is required to query the legality more precisely in the LoopVectorizer.

This adds another TTI function named 'forceScalarizeMaskedGather/Scatter'
function to work around the hack introduced for MVE, where
isLegalMaskedGather/Scatter would return an answer by second-guessing
where the function was called from, based on the Type passed in (vector
vs scalar). The new interface makes this explicit. It is also used by
X86 to check for vector widths where gather/scatters aren't profitable
(or don't exist) for certain subtargets.

Differential Revision: https://reviews.llvm.org/D115329
2022-01-12 13:34:12 +00:00
Kito Cheng f142c45f1e [RISCV] Set getMinVectorRegisterBitWidth to 16 if enable fixed length vector code gen for RVV
getMinVectorRegisterBitWidth means what vector types is supported in
this target, and actually RISC-V support all fixed length vector types with
vector length less than `getMinRVVVectorSizeInBits`, so set it to 16,
means 2 x i8, that is minimal fixed length vector size in theory.

That also fixed one issue, some testcase migth become non-vectorizable
when `-riscv-v-vector-bits-min` set to larger value, because the vector size is
smaller than `-riscv-v-vector-bits-min`.

For example, following code can vectorize by SLP with
`-riscv-v-vector-bits-min=128` or `-riscv-v-vector-bits-min=256`, but
can't vectorize `-riscv-v-vector-bits-min=512` or larger:

```
void foo(double *da) {
  da[0] = 0;
  da[1] = 1;
  da[2] = 2;
  da[3] = 3;
}
```

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D116534
2022-01-08 11:16:21 +08:00
Alexey Bataev d130df544d [SLP]Improve reordering for the nodes beeing used in alternate vectorization.
No need to include the order of the scalars beeing used as part of the
alternate vectorization into account when trying to reorder the whole
graph. Such elements better to reorder in the following phase because
the subtree still ends up in shuffle.

Part of D116688, fixes the regression in D116690.

Differential Revision: https://reviews.llvm.org/D116740
2022-01-06 11:18:57 -08:00
Alexey Bataev 7cb19fe493 [SLP]Initialize the lane with the given value instead of default 0.
There is a bug in the reordering analysis stage. If the element with the
given hash is not added to the map but has the same number of APOs and
instructions with same parent, but different instruction opcode, it will
be initalized with default values and then the counter is increased by
1. But the lane is not updated and default to 0 instead of the actual
   `Lane` value. It leads to the fact that the analysis is useless in
   many cases and default to lane 0 instead of actual lane with the
   minimum amount of APO operands.

Differential Revision: https://reviews.llvm.org/D116690
2022-01-06 10:57:11 -08:00
Alexey Bataev bf5a688252 [SLP][NFC]Add a test for the extra shuffle after alternate node, NFC. 2022-01-06 06:34:58 -08:00
Alexey Bataev 7171af7445 [SLP][NFC]Add a test for shuffled entries with different vector sizes,
NFC.
2021-12-27 07:35:35 -08:00
Alexey Bataev ab9078f3d3 [SLP]Fix PR52756: SLPVectorizer crashes with assertion VecTy == FinalVecTy.
Need to check for the number of the unique non-constant values since the
unique values may include several constants.

Differential Revision: https://reviews.llvm.org/D115939
2021-12-20 07:21:20 -08:00
Alexey Bataev 4459a11f4d Revert "[SLP]Fix PR52756: SLPVectorizer crashes with assertion VecTy == FinalVecTy."
This reverts commit fcaf290d02 to fix test
mismatch reported in https://lab.llvm.org/buildbot#builders/117/builds/3531
2021-12-20 07:21:18 -08:00
Alexey Bataev fcaf290d02 [SLP]Fix PR52756: SLPVectorizer crashes with assertion VecTy == FinalVecTy.
Need to check for the number of the unique non-constant values since the
unique values may include several constants.

Differential Revision: https://reviews.llvm.org/D115939
2021-12-20 05:15:01 -08:00
Alexey Bataev 65fc992579 [SLP]Early exit out of the reordering if shuffled/perfect diamond match found.
Need to early exit out of the reordering process if the perfect/shuffled match is found in the operands. Such pattern will result in not profitable reordering because of (false positive) external use of scalars.

Differential Revision: https://reviews.llvm.org/D115811
2021-12-16 11:09:49 -08:00
Alexey Bataev 292bbed6ab [SLP][NFC] Add a test for inefficient reordering, NFC. 2021-12-15 11:05:28 -08:00
Alexey Bataev 6f2e087631 [SLP]Do not represent splats as node with the reused scalars.
No need to represent splats as a node with the reused scalars, it may
increase the cost (currently pass just ignores extra shuffle cost and it
is still not correct).

Differential Revision: https://reviews.llvm.org/D115800
2021-12-15 06:33:11 -08:00
Alexey Bataev 46bbd254c1 [SLP][NFC]Add a test for broadcast cost with undefs, NFC. 2021-12-15 05:58:47 -08:00
Alexey Bataev f00da7c3bc [SLP][NFC]Update test checks, NFC. 2021-12-14 07:35:02 -08:00
Alexey Bataev bd05376986 [SLP]Improve multinode analysis.
Changes the preliminary multinode analysis:
1. Introduced scores for reversed loads/extractelements.
2. Improved shallow score calculation.
3. Lowered the cost of external uses (no need to consider it several times, just ones).
4. The initial lane for analysis is the one with the minimal possible
   reorderings.

These changes in general shall reduce compile time and improve the
reordering in many cases.

Part of D57059.

Differential Revision: https://reviews.llvm.org/D101109
2021-12-14 06:01:52 -08:00
Philip Reames e6ad9ef4e7 [instcombine] Canonicalize constant index type to i64 for extractelement/insertelement
The basic idea to this is that a) having a single canonical type makes CSE easier, and b) many of our transforms are inconsistent about which types we end up with based on visit order.

I'm restricting this to constants as for non-constants, we'd have to decide whether the simplicity was worth extra instructions. For constants, there are no extra instructions.

We chose the canonical type as i64 arbitrarily.  We might consider changing this to something else in the future if we have cause.

Differential Revision: https://reviews.llvm.org/D115387
2021-12-13 16:56:22 -08:00
Alexey Bataev e5b191a433 [SLP]Improve/fix reodering for gather nodes with extractelements/undefs.
If the gather node is a mix of undefvalues and exractelement
instructions, need to take the ordering for such nodes into account too.
It allows to reorder some (sub)trees and remove some extra shuffles,
improving overall vectorization.
Also, outlined common functionality into a separate function.

Differential Revision: https://reviews.llvm.org/D115358
2021-12-13 10:59:38 -08:00
Alexey Bataev 19c5cf4167 [SLP]Fix comparator for cmp instruction vectorization.
The comparator for the sort functions should provide strict weak
ordering relation between parameters. Current solution causes compiler
crash with some standard c++ library implementations, because it does
not meet this criteria. Tried to fix it + it improves the iverall
vectorization result.

Differential Revision: https://reviews.llvm.org/D115268
2021-12-09 10:57:57 -08:00
Alexey Bataev a101a9b64b [SLP]Fix compiler crash when calculating extract cost for undefs.
Need to add an extra check for potential undef values in
computeExtractCost function to avoid compiler crash on casting to
instructon.

Differential Revision: https://reviews.llvm.org/D115162
2021-12-06 10:46:13 -08:00
Alexey Bataev ba74bb3a22 [SLP]Fix reused extracts cost.
If the extractelement instruction is used multiple times in the
different tree entries (either vectorized, or gathered), need to
compensate the scalar cost of such instructions. They are completely
removed if all users are part of the tree but we need to compensate the
cost only once for each instruction.

Differential Revision: https://reviews.llvm.org/D114958
2021-12-02 10:52:00 -08:00
Alexey Bataev 8ceccbd321 [SLP]Outline and fix code for finding common insertelement vectors.
Need to outline the code for finding common vectors in insertelement
instructions into a separate function for future patches. It also
improves the process by adding some extra checks for early exit and
fixes a bug where it always finds the match because of erroneous compare
of the same values.

Differential Revision: https://reviews.llvm.org/D114909
2021-12-02 09:18:25 -08:00
Alexey Bataev 92fbd76af5 [SLP]Improve registering and merging of compatible shuffles.
If several shuffle instructions are emitted, some of them might
same/compatible (less defined) with the previously emitted ones. Such
shuffles can be removed safely, improving the total cost of the
vectorized code.

Differential Revision: https://reviews.llvm.org/D114087
2021-12-02 08:48:29 -08:00
Alexey Bataev 75106413d0 [SLP][NFC]Add a test for extractelements with many uses vectorization, NFC. 2021-12-02 06:30:37 -08:00
Alexey Bataev afc9e7517a [SLP]Improve cost model for the shuffled extracts.
Improved the calculation of the shuffled extracts, where possible. Need
to calculate the cost for the extracted scalars if some users are not
insertelements + improved the total estimation of the shuffled scalars
used in insertelements build vectors.

Differential Revision: https://reviews.llvm.org/D113782
2021-12-01 08:10:57 -08:00