Commit Graph

3398 Commits

Author SHA1 Message Date
Philip Reames 536095a27c [RISCV] Refine costs for i1 reductions
Our actual lowering for i1 reductions uses ctpop combined with possibly a vector negate and possibly a logic op afterwards. I believe ctpop to be low cost on all reasonable hardware.

The default costing implementation here was returning quite inconsistent costs. and/or were returning very high costs (because we seem to think moving into scalar registers is very expensive?) and others were returning lower but still too high (because of the assumed tree reduce strategy). While we should probably improve the generic costing strategy for i1 vectors, let's start by fixing the immediate problem.

Differential Revision: https://reviews.llvm.org/D127511
2022-06-10 13:21:52 -07:00
Philip Reames 679aa92d2e [RISCV] Minor test improvements for scalable scatter/gather tests added in 275b2e524 2022-06-10 13:13:49 -07:00
Philip Reames 275b2e5243 [RISCV] Add cost model coverage for scalable scatter/gather 2022-06-10 12:51:26 -07:00
Philip Reames eb912411e9 [RISCV] Add cost model coverage for mask reductions requiring legalization 2022-06-10 12:04:48 -07:00
Philip Reames d459530804 [RISCV] Fix accidental deletion of test lines in 2247e4d
Apparently, update_analyze_test_checks.sh does *not* warn on conflicting CHECKs, it just silently drops those lines from the generated test.  That is.. less than helpful.
2022-06-10 09:02:41 -07:00
Craig Topper e91051184c [RISCV] Mark FSIN and other math functions as Expand for scalable vectors.
This prevents them from being assumed legal by the cost model.

This matches what is done for AArch64 SVE.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D123799
2022-06-10 08:40:07 -07:00
Philip Reames 2247e4de83 [RISCV] Use common prefixes to reduce duplication in cost model tests 2022-06-10 08:36:14 -07:00
Philip Reames 2b83467d9e [RISCV] Broaden cost model coverage for fixed vectors w/i1 element type 2022-06-10 08:12:42 -07:00
Philip Reames 0e29a80fdc [RISCV] Add cost model for reverse shuffle
The majority of the cost appears to be forming the indices vector.

Differential Revision: https://reviews.llvm.org/D127141
2022-06-09 07:21:40 -07:00
Florian Hahn 20d798bd47
Recommit "[SCEV] Look through single value PHIs." (take 3)
This reverts commit 1fbdbb5595.

All known issues surfaced by this patch should have been fixed now.

The fixes included fixing issues with SCEV expansion in LV and DA's
reliance on LCSSA phis.
2022-06-09 15:20:10 +01:00
Bardia Mahjour c9677f6db4 [DA] Handle mismatching loop levels by considering them non-linear
To represent various loop levels within a nest, DA implements a special
numbering scheme (see comment atop establishNestingLevels). The goal of
this numbering scheme appears to be representing each unique loop
distinctively by using as little memory as possible. This numbering
scheme is simple when the source and destination of the dependence are
in the same loop. In such cases the level is simply the depth of the
loop in which src and dst reside. When the src and dst are not in the
same loop, we could run into the following situation exposed by
https://reviews.llvm.org/D71539. This patch fixes this by detecting
such cases in checkSubscripts and treating them as non-linear/non-affine.

Reviewed By: Meinersbur

Differential Revision: https://reviews.llvm.org/D110973
2022-06-08 11:15:37 -04:00
William Huang ba26e45ca9 [ValueTracking] Add support to deduce a PHI node being a power of 2 if each incoming value is a power of 2.
Reviewed By: davidxl

Differential Revision: https://reviews.llvm.org/D124889
2022-06-07 18:52:31 +00:00
David Green 53be6ab25c [ARM] Fix MVE getShuffleCost legalized type check
The MVE shuffle costing for VREV instructions was making incorrect
assumptions as to legalized vector types remaining as vectors. Add a
quick check to ensure they are indeed vectors before attempting to get
the number of elements.
2022-06-07 14:36:04 +01:00
Philip Reames beb06f3c53 [RISCV] Add cost model test coverage of scalable reductions 2022-06-06 14:32:30 -07:00
Matthias Braun 850d53a197 LTO: Decide upfront whether to use opaque/non-opaque pointer types
LTO code may end up mixing bitcode files from various sources varying in
their use of opaque pointer types. The current strategy to decide
between opaque / typed pointers upon the first bitcode file loaded does
not work here, since we could be loading a non-opaque bitcode file first
and would then be unable to load any files with opaque pointer types
later.

So for LTO this:
- Adds an `lto::Config::OpaquePointer` option and enforces an upfront
  decision between the two modes.
- Adds `-opaque-pointers`/`-no-opaque-pointers` options to the gold
  plugin; disabled by default.
- `--opaque-pointers`/`--no-opaque-pointers` options with
  `-plugin-opt=-opaque-pointers`/`-plugin-opt=-no-opaque-pointers`
  aliases to lld; disabled by default.
- Adds an `-lto-opaque-pointers` option to the `llvm-lto2` tool.
- Changes the clang driver to pass `-plugin-opt=-opaque-pointers` to
  the linker in LTO modes when clang was configured with opaque
  pointers enabled by default.

This fixes https://github.com/llvm/llvm-project/issues/55377

Differential Revision: https://reviews.llvm.org/D125847
2022-06-01 18:05:53 -07:00
Alexander Kornienko 7aa8a67882 Revert "[LAA] Initial support for runtime checks with pointer selects."
This reverts commit 5890b30105 as per discussion
on the review thread: https://reviews.llvm.org/D114487#3547560.
2022-06-01 15:24:27 +02:00
Nikita Popov 03aceab08b [ValueTracking] Enable -branch-on-poison-as-ub by default
Now that SimpleLoopUnswitch and other transforms no longer introduce
branch on poison, enable the -branch-on-poison-as-ub option by
default. The practical impact of this is mostly better flag
preservation in SCEV, and some freeze instructions no longer being
necessary.

Differential Revision: https://reviews.llvm.org/D125299
2022-06-01 10:46:06 +02:00
William Huang 35b0955aa5 [ValueTracking] Added support to deduce PHI Nodes values being a power of 2
Add Value Tracking support to deduce induction variable being a power of 2, allowing urem optimizations

Reviewed By: davidxl

Differential Revision: https://reviews.llvm.org/D126018
2022-05-26 20:30:31 +00:00
Florian Hahn 6af5f5697c
[SCEV] Collect conditions from assumes same way as for branches.
Also collect conditions from assume up-front in applyLoopGuards.
This allows re-using the logic to handle logical ANDs as assume
conditions.

It should should pave the road for a fix for #55645.
2022-05-26 18:17:13 +01:00
Florian Hahn 9c66ed9b73
[SCEV] Add test with loop guarded by assume with an AND condition.
Show a missed case where the AND is currently blocks applying the
information from the assume.
2022-05-26 16:08:32 +01:00
Ivan Kosarev ad1d60c3be [FileCheck] Catch missspelled directives.
Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D125604
2022-05-26 11:37:19 +01:00
David Green 75631438e3 [AArch64] Costmodel tests for llvm.vscale intrinsics. NFC
These shows that the cost of a @llvm.vscale is indeed 1, not 10.
2022-05-26 10:16:21 +01:00
Congzhe Cao 80ab16d0ed [NFC][LoopCacheAnalysis] Update test cases to make sure the outputs follow the right order
In this patch we change test cases from using "CHECK" to using
"CHECK-NEXT", which is to ensure the order of loops output by
loop cache analysis is correct. After D124725 we fixed the
non-deterministic output order hence we did not use "CHECK-DAG"
anymore, and now we should really use "CHECK-NEXT" to make sure
the loops in the output loop vector follow the right order.

Reviewed By: bmahjour, #loopoptwg

Differential Revision: https://reviews.llvm.org/D124984
2022-05-25 23:32:00 -04:00
Simon Pilgrim 6c80267d0f [CostModel][X86] getScalarizationOverhead - improve extraction costs for > 128-bit vectors
We were using the default getScalarizationOverhead expansion for extraction costs, which adds up all the individual element extraction costs.

This is fine for 128-bit vectors, but for 256/512-bit vectors each element extraction also has to account for extracting the upper 128-bit subvector extraction before it can handle the element. For scalarization costs we only need to extract each demanded subvector once.

Differential Revision: https://reviews.llvm.org/D125527
2022-05-24 15:18:08 +01:00
David Green b4dd9fc370 [ARM] Cost modelling for MVE vector fptoi_sat
Building on top of D125665, this adds MVE costs for fptosi.sat and
fptoui.sat, providing MVE is available and the types are legal.

Differential Revision: https://reviews.llvm.org/D125666
2022-05-20 11:00:34 +01:00
David Green 80aab0312a [ARM] Cost modelling for scalar fptoi_sat
Similar to D124357, this adds some cost modelling for fptoi_sat for Arm
targets. Where VFP2 is available (and FP64/FP16 for the relevant types),
the operations are legal as the Arm instructions naturally saturate.
Otherwise they will need an extra smin/smax clamp, similar to AArch64.

Differential Revision: https://reviews.llvm.org/D125665
2022-05-19 19:53:21 +01:00
William Huang 0f37ba7b23 [ValueTracking] Baseline tests for Power-of-2 value tracking on PHI nodes
Reviewed By: davidxl

Differential Revision: https://reviews.llvm.org/D124885
2022-05-19 18:30:14 +00:00
David Green 4c6a070a2c [AArch64] Teach perfect shuffles tables about D-lane movs
Similar to D123386, this adds D-Movs to the AArch64 perfect shuffle
tables, slightly lowering the costs a little more. This is a rough
improvement in general, especially if you ignore mov v0.16b, v2.16b type
moves that are often artefacts of the calling convention.

The D register movs are encoded as (0x4 | LaneIdx), and to generate a D
register move we are required to bitcast into a higher type, but it is
otherwise very similar to the S-lane mov's already supported.

Differential Revision: https://reviews.llvm.org/D125477
2022-05-17 18:16:45 +01:00
Florian Hahn 5890b30105
[LAA] Initial support for runtime checks with pointer selects.
Scaffolding support for generating runtime checks for multiple SCEV expressions
per pointer. The initial version just adds support for looking through
a single pointer select.

The more sophisticated logic for analyzing forks is in D108699

Reviewed By: huntergr

Differential Revision: https://reviews.llvm.org/D114487
2022-05-12 19:33:48 +01:00
Simon Pilgrim 47be07074a [CostModel][X86] Auto generate partial interleaved load LV costs using UTC_ARGS --filter control 2022-05-12 17:46:41 +01:00
Simon Pilgrim 14e83ada16 [CostModel][X86] Auto generate masked load/store LV costs using UTC_ARGS --filter control
Also fix a sse42 -> sse4.2 typo so that we actually test costs for sse4.2
2022-05-12 17:40:40 +01:00
Simon Pilgrim a5c45c4dc1 [CostModel][X86] Auto generate gather/scatter LV costs using UTC_ARGS --filter control
Also fix a sse42 -> sse4.2 typo so that we actually test costs for sse4.2
2022-05-12 17:39:06 +01:00
Craig Topper 39e63bd2d8 [IR][CostModel] A scalable vector shuffle can't be an identity or reverse shuffle.
Even if the minimum number of elements is 1 and the length doesn't change,
we don't know what vscale is so we can't classify it as identity mask. Instead it
is a zero element splat.

For reverse, we shouldn't classify it as a reverse unless there are at least 2 elements
in the mask. This applies to both fixed and scalable vectors. For fixed vectors, a single
element would be an identity shuffle. For scalable vector it's a zero elt splat.

Reviewed By: sdesmalen, liaolucy

Differential Revision: https://reviews.llvm.org/D124655
2022-05-09 21:37:25 -07:00
Nikita Popov 68e1ba8188 [SCEV] Fold umin_seq using known predicate
Fold %x umin_seq %y to %x if %x ule %y. This also subsumes the
special handling for constant operands, as if %y is constant this
folds to umin via implied poison reasoning, and if %x is constant
then either %x is not zero and it folds to umin, or it is known
zero, in which case it is ule anything.
2022-05-09 16:35:08 +02:00
Nikita Popov 7dddf12f44 [SCEV] Add more tests for umin_seq with known predicate (NFC) 2022-05-09 16:18:09 +02:00
Nikita Popov 18eaff1510 [ScalarEvolution] Fold %x umin_seq %y if %x cannot be zero
Fold %x umin_seq %y to %x umin %y if %x cannot be zero. They only
differ in semantics for %x==0.

More generally %x *_seq %y folds to %x * %y if %x cannot be the
saturation fold (though currently we only have umin_seq).
2022-05-09 15:11:05 +02:00
Nikita Popov 33f02de5df [ScalarEvolution] Add tests for umin_seq with non-zero operand (NFC) 2022-05-09 15:03:12 +02:00
David Green dccc69a38d [AArch64] Add extra reverse costs.
This adds some extra costs for reverse shuffles under AArch64, filling
in the i16/f16/i8 gaps in the cost model.

Differential Revision: https://reviews.llvm.org/D124786
2022-05-06 18:23:36 +01:00
Simon Pilgrim 3d107ce2b2 [CostModel][X86] Relax fcmp costs on SSE41 targets or later
Only pre-SSE41 targets double-pump the fp comparison ops
2022-05-06 13:29:40 +01:00
Simon Pilgrim cbfa857346 [CostModel][X86] Adjust 128-bit select costs to account for slow BLENDV op
Based off the script from D103695 - Jaguar, Bulldozer, Silvermont (et al) and Haswell all have slow BLENDV ops, so adjust the worse case cost values
2022-05-06 13:07:34 +01:00
Simon Pilgrim d21bf51494 [CostModel][X86] Adjust pre-SSE41 fp scalar select costs to account for vector ops
Based off the script from D103695, we now mainly use BLENDV or OR(AND,ANDN) to select scalar float/double ops
2022-05-06 11:41:55 +01:00
Simon Pilgrim f0e8c1d6d9 [CostModel][X86] Adjust 256-bit select costs to account for slow BLENDV op
Based off the script from D103695, on AVX1, Jaguar/Bulldozer both have low throughput for ymm select patterns (BLENDV + OR(AND,ANDN))), and even on AVX2 Haswell still struggles with BLENDV ops
2022-05-06 11:27:37 +01:00
Simon Pilgrim 4236a10717 [CostModel][X86] Add more complete float/double select cost test coverage
We were only testing basic vector types
2022-05-06 10:45:36 +01:00
Peter Waller 75f9e83ace [AArch64] Add -aarch64-insert-extract-base-cost
The new flag -aarch64-insert-extract-base-cost can be used to
set the value of AArch64Subtarget::getVectorInsertExtractBaseCost(),
for the purposes of experimentation.

Differential Revision: https://reviews.llvm.org/D124835
2022-05-05 10:35:45 +00:00
Nikita Popov 47c559d6c1 [SCEV] Fold umin_seq to umin using implied poison reasoning
Similar to how we convert logical and/or to bitwise and/or, we should
also convert umin_seq to umin based on implied poison reasoning. In
%x umin_seq %y, if %y being poison implies %x being poison, then we
don't need the sequential evaluation: Having %y contribute towards
the result will never make the result more poisonous. An important
corollary of this is that if %y is never poison, we also don't need
the sequential evaluation.

This avoids some of the regressions in D124910.

Differential Revision: https://reviews.llvm.org/D124921
2022-05-05 09:43:49 +02:00
Congzhe Cao 5e004fb787 [LoopCacheAnalysis][NFC] Add a test case for improved loop cache analysis cost calculation
Added a motivating test case for D123400 where the loopnest has a
suboptimal loop order j-i-k. After D123400 we ensure that the order
of loop cache analysis output is loop i-j-k, despite the suboptimal
order in the original loopnest.

Reviewed By: bmahjour, #loopoptwg

Differential Revision: https://reviews.llvm.org/D122776
2022-05-04 17:13:10 -04:00
Nikita Popov 2f64a6cf9c [SCEV] Add additional poison implication tests (NFC) 2022-05-04 15:23:39 +02:00
Nikita Popov b62e9f63bb [SCEV] Add poison implication tests for umin_seq (NFC) 2022-05-04 14:47:46 +02:00
Nikita Popov 2929c34da6 [SCEV] Regenerate test checks (NFC) 2022-05-03 17:43:05 +02:00
Bardia Mahjour ef4ecc3cef [LoopCacheAnalysis] Consider dimension depth of the subscript reference when calculating cost
Reviewed By: congzhe, etiotto

Differential Revision: https://reviews.llvm.org/D123400
2022-05-02 16:49:10 -04:00