Commit Graph

495 Commits

Author SHA1 Message Date
Jonas Paulsson 19871f848b [CodeMetrics] Don't let extends of i1 be free.
getUserCost() currently returns TCC_Free for any extend of a compare (i1)
result. It seems this is only true in a limited number of cases where for
example two compares are chained. Even in those types of cases it seems
unlikely that they are generally free, while they may be in some cases.

This patch therefore removes this special handling of cast of i1. No tests
are failing because of this.

If some target want the old behavior, it could override getUserCost().

Review: Hal Finkel, Chandler Carruth, Evgeny Astigeevich, Simon Pilgrim,
        Ulrich Weigand
https://reviews.llvm.org/D54742/new/

llvm-svn: 360970
2019-05-17 01:26:35 +00:00
Simon Pilgrim 6b10fde69b [CostModel][X86] Add min/max reduction costs for all SSE targets
The original costs stopped at SSE42, I've added conservative estimates for everything down to SSE1/SSE2 and moved some of the SSE42 costs to SSE41 (really only the addition of PCMPGT makes any difference).

I've also added missing vXi8 costs (we use PHMINPOSUW for i8/i16 for scarily quick results) and 256-bit vector costs for AVX1.

llvm-svn: 360528
2019-05-11 17:12:52 +00:00
David L. Jones fccb505f0f Revert "[llvm] r359313 - [PowerPC] Update P9 vector costs for insert/extract element"
This causes segfaults during optimized builds. More details, including a reproducer, are on the llvm-commits thread for r359313.

llvm-svn: 359648
2019-05-01 05:01:03 +00:00
Sjoerd Meijer ea31ddb36f [ARM] Implement TTI::getMemcpyCost
This implements TargetTransformInfo method getMemcpyCost, which estimates the
number of instructions to which a memcpy instruction expands to.

Differential Revision: https://reviews.llvm.org/D59787

llvm-svn: 359547
2019-04-30 10:28:50 +00:00
Roland Froese 4b17772b9e [PowerPC] Update P9 vector costs for insert/extract element
The PPC vector cost model values for insert/extract element reflect older
processors that lacked vector insert/extract and move-to/move-from VSR
instructions.  Update getVectorInstrCost to give appropriate values for when
the newer instructions are present.

Differential Revision: https://reviews.llvm.org/D60160

llvm-svn: 359313
2019-04-26 16:14:17 +00:00
David Green 0d741507f7 [ARM] Rewrite isLegalT2AddressImmediate
This does two main things, firstly adding some at least basic addressing modes
for i64 types, and secondly treats floats and doubles sensibly when there is no
fpu. The floating point change can help codesize in some cases, especially with
D60294.

Most backends seems to not consider the exact VT in isLegalAddressingMode,
instead switching on type size. That is now what this does when the target does
not have an fpu (as the float data will be loaded using LDR's). i64's currently
use the address range of an LDRD (even though they may be legalised and loaded
with an LDR). This is at least better than marking them all as illegal
addressing modes.

I have not attempted to do much with vectors yet. That will need changing once
MVE is added.

Differential Revision: https://reviews.llvm.org/D60677

llvm-svn: 358845
2019-04-21 09:54:29 +00:00
Roland Froese a5dd08cac2 [PowerPC] Add some PPC vec cost tests to prep for D60160 NFC
llvm-svn: 358699
2019-04-18 18:12:09 +00:00
Simon Pilgrim 9daacec816 [CostModel][X86] Add bool anyof/allof reduction costs
On pre-AVX512 targets we can use MOVMSK to extract reduced boolean results. This is properly optimized, annoyingly AVX512 isn't and produces code that is almost as bad as the (unchanged) costs suggest......

Differential Revision: https://reviews.llvm.org/D60403

llvm-svn: 358574
2019-04-17 10:58:19 +00:00
Simon Pilgrim 2ea8dbf564 [CostModel][X86] Add more exhaustive masked load/store/gather/scatter/expand/compress cost tests
llvm-svn: 357838
2019-04-06 12:08:37 +00:00
Sjoerd Meijer 633fb0f266 [TTI] getMemcpyCost
This adds new function getMemcpyCost to TTI so that the cost of a memcpy can be
modeled and queried. The default implementation returns Expensive, but targets
can override this function to model the cost more accurately.

Differential Revision: https://reviews.llvm.org/D59252

llvm-svn: 356555
2019-03-20 14:15:46 +00:00
Matt Arsenault e0c1f9e76d AMDGPU: Partially fix default device for HSA
There are a few different issues, mostly stemming from using
generation based checks for anything instead of subtarget
features. Stop adding flat-address-space as a feature for HSA, as it
should only be a device property. This was incorrectly allowing flat
instructions to select for SI.

Increase the default generation for HSA to avoid the encoding error
when emitting objects. This has some other side effects from various
checks which probably should be separate subtarget features (in the
cost model and for dealing with the DS offset folding issue).

Partial fix for bug 41070. It should probably be an error to try using
amdhsa without flat support.

llvm-svn: 356347
2019-03-17 21:31:35 +00:00
Tim Renouf e30aa6a136 [AMDGPU] Prepare for introduction of v3 and v5 MVTs
AMDGPU would like to have MVTs for v3i32, v3f32, v5i32, v5f32. This
commit does not add them, but makes preparatory changes:

* Fixed assumptions of power-of-2 vector type in kernel arg handling,
  and added v5 kernel arg tests and v3/v5 shader arg tests.

* Added v5 tests for cost analysis.

* Added vec3/vec5 arg test cases.

Some of this patch is from Matt Arsenault, also of AMD.

Differential Revision: https://reviews.llvm.org/D58928

Change-Id: I7279d6b4841464d2080eb255ef3c589e268eabcd
llvm-svn: 356342
2019-03-17 21:04:16 +00:00
Simon Pilgrim 42bf2dd629 [TTI] Add generic cost model for smul/umul overflow intrinsics
Based off smul/umul fixed costs and the implementation in TargetLowering::expandMULO.

llvm-svn: 354784
2019-02-25 13:30:23 +00:00
Simon Pilgrim 9caf0f0d15 [TTI] Add generic cost model for fixed point smul/umul
Based on an IR equivalent of target lowering's generic expansion - target specific costs will typically be lower (IR doesn't have a good mull/mulh equivalent) but we need a baseline.

Differential Revision: https://reviews.llvm.org/D57925

llvm-svn: 354774
2019-02-25 11:59:23 +00:00
Simon Pilgrim fbb3086fc8 [CostModel][X86] Add UMUL fixed point cost tests
llvm-svn: 353153
2019-02-05 10:55:38 +00:00
Nikita Popov 8e1a464e6a [CodeGen][X86] Expand UADDSAT to NOT+UMIN+ADD
Followup to D56636, this time handling the UADDSAT case by expanding
uadd.sat(a, b) to umin(a, ~b) + b.

Differential Revision: https://reviews.llvm.org/D56869

llvm-svn: 352409
2019-01-28 19:19:09 +00:00
Simon Pilgrim adca820927 [TTI] Add generic SADDSAT/SSUBSAT costs
Add generic costs calculation for SADDSAT/SSUBSAT intrinsics, this uses generic costs for sadd_with_overflow/ssub_with_overflow, an extra sign comparison + a selects based on the sign/overflow.

This completes PR40316

Differential Revision: https://reviews.llvm.org/D57239

llvm-svn: 352315
2019-01-27 13:51:59 +00:00
Nemanja Ivanovic 7d007ddedf [PowerPC] Update Vector Costs for P9
For the power9 CPU, vector operations consume a pair of execution units rather
than one execution unit like a scalar operation. Update the target transform
cost functions to reflect the higher cost of vector operations when targeting
Power9.

Patch by RolandF.

Differential revision: https://reviews.llvm.org/D55461

llvm-svn: 352261
2019-01-26 01:18:48 +00:00
Simon Pilgrim 30b206b5da [CostModel][X86] Add SMUL fixed point cost tests
llvm-svn: 352046
2019-01-24 13:48:20 +00:00
Simon Pilgrim 47ca8606ba [TTI] Add generic SADDO/SSUBO costs
Added x86 scalar sadd_with_overflow/ssub_with_overflow costs.

llvm-svn: 352045
2019-01-24 13:36:45 +00:00
Simon Pilgrim a131e4e296 [TTI] Add generic UADDSAT/USUBSAT costs
Add generic costs calculation for UADDSAT/USUBSAT intrinsics, this fallbacks to using generic costs for uadd_with_overflow/usub_with_overflow + a select.

Differential Revision: https://reviews.llvm.org/D56907

llvm-svn: 352044
2019-01-24 12:27:10 +00:00
Simon Pilgrim 2d1964b90f [TTI] Add generic UADDO/USUBO costs
Added x86 scalar uadd_with_overflow/usub_with_overflow costs.

Differential Revision: https://reviews.llvm.org/D56907

llvm-svn: 352043
2019-01-24 12:10:20 +00:00
Simon Pilgrim f87226eb70 [IR] Match intrinsic parameter by scalar/vectorwidth
This patch replaces the existing LLVMVectorSameWidth matcher with LLVMScalarOrSameVectorWidth.

The matching args must be either scalars or vectors with the same number of elements, but in either case the scalar/element type can differ, specified by LLVMScalarOrSameVectorWidth.

I've updated the _overflow intrinsics to demonstrate this - allowing it to return a i1 or <N x i1> overflow result, matching the scalar/vectorwidth of the other (add/sub/mul) result type.

The masked load/store/gather/scatter intrinsics have also been updated to use this, although as we specify the reference type to be llvm_anyvector_ty we guarantee the mask will be <N x i1> so no change in behaviour

Differential Revision: https://reviews.llvm.org/D57090

llvm-svn: 351957
2019-01-23 16:00:22 +00:00
Simon Pilgrim ee900efb30 [CostModel][X86] Add ICMP Predicate specific costs
First step towards PR40376, this patch adds support for getCmpSelInstrCost to use the (optional) Instruction CmpInst predicate to indicate the type of integer comparison we're performing and alter the costs accordingly.

Differential Revision: https://reviews.llvm.org/D57013

llvm-svn: 351810
2019-01-22 12:29:38 +00:00
Simon Pilgrim 44feb4a87b [CostModel][X86] Add XOP icmp cost tests (PR40376)
llvm-svn: 351741
2019-01-21 11:33:52 +00:00
Simon Pilgrim c934d3a01b [CostModel][X86] Add explicit vector select costs
Prior to SSE41 (and sometimes on AVX1), vector select has to be performed as a ((X & C)|(Y & ~C)) bit select.

Exposes a couple of issues with the min/max reduction costs (which only go down to SSE42 for some reason).

The increase pre-SSE41 selection costs also prevent a couple of tests from firing any longer, so I've either tweaked the target or added AVX tests as well to the existing SSE2 tests.

llvm-svn: 351685
2019-01-20 13:55:01 +00:00
Simon Pilgrim 1231904c48 [CostModel][X86] Add explicit fcmp costs for pre-SSE42 targets
Typical throughputs: cmpss/cmpps = 1cy and cmpsd/cmppd = 2cy before the Core2 era

llvm-svn: 351684
2019-01-20 13:21:43 +00:00
Simon Pilgrim 60e5a3accb [CostModel][X86] Split icmp/fcmp costs tests and test all comparison codes
llvm-svn: 351682
2019-01-20 12:10:42 +00:00
Simon Pilgrim 5d7182ecb6 [CostModel][X86] Add masked load/store/gather/scatter tests for SSE2/SSE42/AVX1 targets
llvm-svn: 351681
2019-01-20 11:23:01 +00:00
Simon Pilgrim a8b009fd14 [CostModel][X86] Add non-constant vselect cost tests
Also add AVX512 costs at the same time

llvm-svn: 351680
2019-01-20 11:19:35 +00:00
Nikita Popov d3b86b79fa Reapply "[CodeGen][X86] Expand USUBSAT to UMAX+SUB, also for vectors"
Related to https://bugs.llvm.org/show_bug.cgi?id=40123.

Rather than scalarizing, expand a vector USUBSAT into UMAX+SUB,
which produces much better code for X86.

Reapplying with updated SLPVectorizer tests.

Differential Revision: https://reviews.llvm.org/D56636

llvm-svn: 351219
2019-01-15 18:43:41 +00:00
Nikita Popov 5885eec35a Revert "[CodeGen][X86] Expand USUBSAT to UMAX+SUB, also for vectors"
This reverts commit r351125.

I missed test changes in an SLPVectorizer test, due to the cost model
changes. Reverting for now.

llvm-svn: 351129
2019-01-14 22:18:39 +00:00
Nikita Popov 8e9a8432a8 [CodeGen][X86] Expand USUBSAT to UMAX+SUB, also for vectors
Related to https://bugs.llvm.org/show_bug.cgi?id=40123.

Rather than scalarizing, expand a vector USUBSAT into UMAX+SUB,
which produces much better code for X86.

Differential Revision: https://reviews.llvm.org/D56636

llvm-svn: 351125
2019-01-14 21:43:30 +00:00
Simon Pilgrim c2054144ee [CostModel][X86] Fix SSE1 FADD/FSUB costs
Noticed in D56011 - handle the case that scalar fp ops are quicker on P3 than P4

Add the other costs so that we're not relying on the default "is legal/custom" cost logic.

llvm-svn: 350403
2019-01-04 16:55:57 +00:00
Simon Pilgrim 71d61567c0 [CostModel][X86] Add SSE1 fp cost tests
llvm-svn: 350401
2019-01-04 16:37:01 +00:00
Simon Pilgrim f0c533b7db [CostModel][X86] Add truncate cost tests to cover all legal destination types
We were only testing costs for legal source vector element counts

llvm-svn: 350323
2019-01-03 14:49:39 +00:00
Simon Pilgrim d824f99a6c [X86] Add ADD/SUB SSAT/USAT vector costs (PR40123)
Costs for real SSE2 instructions

llvm-svn: 350295
2019-01-03 11:38:42 +00:00
Simon Pilgrim 55ea89305d [X86] Add ADD/SUB SSAT/USAT cost tests (PR40123)
llvm-svn: 350293
2019-01-03 11:29:24 +00:00
Craig Topper c6bfb05762 [CostModel][X86] Don't count 2 shuffles on the last level of a pairwise arithmetic or min/max reduction
This is split from D55452 with the correct patch this time.

Pairwise reductions require two shuffles on every level but the last. On the last level the two shuffles are <1, u, u, u...> and <0, u, u, u...>, but <0, u, u, u...> will be dropped by InstCombine/DAGCombine as being an identity shuffle.

Differential Revision: https://reviews.llvm.org/D55615

llvm-svn: 349072
2018-12-13 19:08:10 +00:00
Craig Topper 84d62ce6c1 [CostModel][X86][AArch64] Adjust cost of the scalarization part of min/max reduction.
Summary: The comment says we need 3 extracts and a select at the end. But didn't we just account for the select in the vector cost above. Aren't we just extracting the single element after taking the min/max in the vector register?

Reviewers: RKSimon, spatel, ABataev

Reviewed By: RKSimon

Subscribers: javed.absar, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D55480

llvm-svn: 348739
2018-12-10 06:58:58 +00:00
Craig Topper d1498ed8df [CostModel][X86] Fix overcounting arithmetic cost in illegal types in getArithmeticReductionCost/getMinMaxReductionCost
We were overcounting the number of arithmetic operations needed at each level before we reach a legal type. We were using the full vector type for that level, but we are going to split the input vector at that level in half. So the effective arithmetic operation cost at that level is half the width.

So for example on 8i32 on an sse target. Were were calculating the cost of an 8i32 op which is likely 2 for basic integer. Then after the loop we count 2 more v4i32 ops. For a total arith cost of 4. But if you look at the assembly there would only be 3 arithmetic ops.

There are still more bugs in this code that I'm going to work on next. The non pairwise code shouldn't count extract subvectors in the loop. There are no extracts, the types are split in registers. For pairwise we need to use 2 two src permute shuffles.

Differential Revision: https://reviews.llvm.org/D55397

llvm-svn: 348621
2018-12-07 18:20:56 +00:00
Craig Topper 381b4fb0ab [X86] Remove -costmodel-reduxcost=true from the experimental vector reduction intrinsic tests as it appears to be unnecessary. NFC
I think this has something to do with matching reductions from extractelement, binops, and shuffles. But we're not matching here.

llvm-svn: 348340
2018-12-05 07:56:50 +00:00
Craig Topper b4719e5842 [X86] Add more cost model tests for vector reductions with narrow vector types. NFC
llvm-svn: 348339
2018-12-05 07:26:57 +00:00
Jonas Paulsson 8ae0f88b13 [SystemZ::TTI] Return zero cost for ICmp that becomes Load And Test.
A loaded value with multiple users compared with 0 will become a load and
test single instruction. The load is not folded in this case (multiple
users), but the compare instruction is eliminated.

This patch returns 0 cost for the icmp in these cases.

Review: Ulrich Weigand
https://reviews.llvm.org/D55111

llvm-svn: 348141
2018-12-03 14:30:18 +00:00
Simon Pilgrim 102854f4d4 [TTI] Reduction costs only need to include a single extract element cost (REAPPLIED)
We were adding the entire scalarization extraction cost for reductions, which returns the total cost of extracting every element of a vector type.

For reductions we don't need to do this - we just need to extract the 0'th element after the reduction pattern has completed.

Fixes PR37731

Rebased and reapplied after being reverted in rL347541 due to PR39774 - which was fixed by D54955/rL347759 and D55017/rL347997

Differential Revision: https://reviews.llvm.org/D54585

llvm-svn: 348076
2018-12-01 14:18:31 +00:00
Craig Topper 8e10e9423d [X86] Replace '-mcpu=skx' with -mattr=avx512f or -mattr=avx512bw in interleave/strided load/store cost model tests.
llvm-svn: 348056
2018-12-01 00:21:49 +00:00
Jonas Paulsson b1d014883c [SystemZ::TTI] i8/i16 operands extension costs revisited
Three minor changes to these extra costs:

* For ICmp instructions, instead of adding 2 all the time for extending each
  operand, this is only done if that operand is neither a load or an
  immediate.

* The operands extension costs for divides removed, because we now use a high
  cost already for the divide (20).

* The costs for lhsr/ashr extra costs removed as this did not seem useful.

Review: Ulrich Weigand
https://reviews.llvm.org/D55053

llvm-svn: 347961
2018-11-30 07:09:34 +00:00
Craig Topper 81f1b4a361 [X86] Make X86TTIImpl::getCastInstrCost properly handle the case where AVX512 is enabled, but 512-bit vectors aren't legal.
Unlike most cost model functions this code makes a lot of table lookups without using the results from getTypeLegalizationCost. This means 512-bit vectors can be looked up even when the type isn't legal.

This patch adds a check around the two tables that contain 512-bit types to make sure that neither of the types would be split by type legalization. Meaning 512 bit types are illegal. I wanted to write this in a somewhat generic way that uses type legalization query hooks. But if prefered, I can switch to just using is512BitVector and the subtarget feature.

Differential Revision: https://reviews.llvm.org/D54984

llvm-svn: 347786
2018-11-28 18:11:42 +00:00
Craig Topper d3bb036bc9 [X86] Add some cost model entries for sext/zext for avx512bw
This fixes some of scalarization costs reported for sext/zext using avx512bw. This does not fix all scalarization costs being reported. Just the worst.

I've restricted this only to combinations of types that are legal with avx512bw like v32i1/v64i1/v32i16/v64i8 and conversions between vXi1 and vXi8/vXi16 with legal vXi8/vXi16 result types.

Differential Revision: https://reviews.llvm.org/D54979

llvm-svn: 347785
2018-11-28 18:11:39 +00:00
Craig Topper f3b6f583e2 [X86] Add a combine for back to back VSRAI instructions
Expansion of SIGN_EXTEND_INREG can create a VSRAI instruction. If there is already a VSRAI after it, we should combine them into a larger VSRAI

Differential Revision: https://reviews.llvm.org/D54959

llvm-svn: 347784
2018-11-28 18:03:38 +00:00
Jonas Paulsson 06acb3a236 [SystemZ::TTI] Improve cost for compare of i64 with extended i32 load
CGF/CLGF compares an i64 register with a sign/zero extended loaded i32 value
in memory.

This patch makes such a load considered foldable and so gets a 0 cost.

Review: Ulrich Weigand
https://reviews.llvm.org/D54944

llvm-svn: 347735
2018-11-28 08:58:27 +00:00
Jonas Paulsson d6b7aca911 [SystemZ::TTI] Improve costs for i16 add, sub and mul against memory.
AH, SH and MH costs are already covered in the cases where LHS is 32 bits and
RHS is 16 bits of memory sign-extended to i32.

As these instructions are also used when LHS is i16, this patch recognizes
that the loads will get folded then as well.

Review: Ulrich Weigand
https://reviews.llvm.org/D54940

llvm-svn: 347734
2018-11-28 08:31:50 +00:00
Jonas Paulsson 011a503f25 [SystemZ::TTI] Improved cost values for comparison against memory.
Single instructions exist for i8 and i16 comparisons of memory against a
small immediate.

This patch makes sure that if the load in these cases has a single user (the
ICmp), it gets a 0 cost (folded), and also that the ICmp gets a cost of 1.

Review: Ulrich Weigand
https://reviews.llvm.org/D54897

llvm-svn: 347733
2018-11-28 08:08:05 +00:00
Jonas Paulsson 5da8e432b9 [SystemZ::TTI] Return zero cost for scalar load/store connected with a bswap.
Since byte-swapping loads and stores are supported, a 'load -> bswap' or
'bswap -> store' sequence should have the cost of one.

Review: Ulrich Weigand
https://reviews.llvm.org/D54870

llvm-svn: 347732
2018-11-28 07:52:34 +00:00
Craig Topper 078e58da15 [X86] Add test cases to show that we don't properly take -mprefer-vector-width=256 and -min-legal-vector-width=256 into account when costing sext/zext.
The check lines marked AVX256 in the zext256/sext256 functions should be closer to the AVX values which would take into account a splitting cost.

llvm-svn: 347722
2018-11-28 00:33:34 +00:00
Craig Topper 5e7dcc65bd [X86] Add exhaustive cost model testing for sext/zext for all vector types we reasonably support. Add cost model tests for truncating to vXi1.
Our sext/zext cost modeling was somewhat incomplete. And had no coverage for the fact that avx512bw v32i16/v64i8 types return a scalarization cost.

Truncates are a whole different mess because isTruncateFree is returning true for vectors when it shouldn't and that's the fall back for anything not in the tables.

llvm-svn: 347719
2018-11-27 22:46:05 +00:00
Craig Topper e535babe4c [X86] Add cost model tests for experimental.vector.reduce.* with -x86-experimental-vector-widening-legalization
llvm-svn: 347697
2018-11-27 19:44:40 +00:00
Craig Topper 2f9e5c40a6 [X86] Add cost model test for masked load an store with -x86-experimental-vector-widening-legalization
llvm-svn: 347696
2018-11-27 19:44:36 +00:00
Craig Topper 69cf86e327 [X86] Add cost model tests for fp_to_int/int_to_fp with -x86-experimental-vector-widening-legalization
llvm-svn: 347695
2018-11-27 19:44:34 +00:00
Craig Topper 1a2f9f765e [X86] Add cost model tests for shifts with -x86-experimental-vector-widening-legalization.
llvm-svn: 347694
2018-11-27 19:44:30 +00:00
Fedor Sergeev 8cd9d1b5ce Revert "[TTI] Reduction costs only need to include a single extract element cost"
This reverts commit r346970.
It was causing PR39774, a crash in slp-vectorizer on a rather simple loop
with just a bunch of 'and's in the body.

llvm-svn: 347541
2018-11-26 10:17:27 +00:00
Jonas Paulsson 96782c2c0b [SystemZTTIImpl] Give correct cost values for vector bswap intrinsics.
Implement getIntrinsicInstrCost() and return costs reflecting that bswap can
be done with a vperm per vector register.

Review: Ulrich Weigand
https://reviews.llvm.org/D54789

llvm-svn: 347445
2018-11-22 07:17:29 +00:00
Simon Pilgrim 924f193419 [TTI] Reduction costs only need to include a single extract element cost
We were adding the entire scalarization extraction cost for reductions, which returns the total cost of extracting every element of a vector type.

For reductions we don't need to do this - we just need to extract the 0'th element after the reduction pattern has completed.

Fixes PR37731

Differential Revision: https://reviews.llvm.org/D54585

llvm-svn: 346970
2018-11-15 17:42:53 +00:00
Simon Pilgrim 2b166c5044 [TTI] getOperandInfo - a broadcast shuffle means the result is OK_UniformValue
llvm-svn: 346868
2018-11-14 15:04:08 +00:00
Simon Pilgrim cdb170794b [CostModel] Add generic expansion funnel shift cost support
Add support for the expansion of funnelshift/rotates to getIntrinsicInstrCost.

This also required us to move the X86 fshl/fshr costs to the same place as the rotates to avoid expansion and get correct scalarization vs vectorization costs.

llvm-svn: 346854
2018-11-14 12:24:50 +00:00
Simon Pilgrim e827fe09b3 [CostModel][X86] Fix constant vector XOP rights shifts
We'll constant fold these cases so they are as cheap as vector left shift cases.

Noticed while improving funnel shift costs.

llvm-svn: 346760
2018-11-13 16:40:10 +00:00
Simon Pilgrim 2fe1076a08 [CostModel][X86] Add more cost tests for funnel shifts
Added full uniform/constant coverage for funnel shifts + rotates

llvm-svn: 346754
2018-11-13 12:11:15 +00:00
Simon Pilgrim 93c64e5c76 [CostModel][X86] Add funnel shift rotation special case costs
When we repeat the 2 shifting operands then this is a bit rotation - annoyingly this has to be done in the other getIntrinsicInstrCost than most intrinsics as we need to check the operands are the same.

llvm-svn: 346688
2018-11-12 18:27:54 +00:00
Simon Pilgrim 49e93d2f0e [CostModel][X86] Add SHLD/SHRD scalar funnel shift costs
The costs match the typical reg-reg cases - the RMW case can be a lot slower but we don't model that at this level

llvm-svn: 346683
2018-11-12 17:56:59 +00:00
Simon Pilgrim 095ea939c3 [CostModel][X86] Add some initial cost tests for funnel shifts
Still need to add full uniform/constant coverage but this is enough to check basic fshl/fshr cost handling

llvm-svn: 346670
2018-11-12 16:39:41 +00:00
Simon Pilgrim f4cd292ba2 [CostModel][X86] SK_ExtractSubvector is cheap if the (legal) subvector is aligned within the source vector
llvm-svn: 346664
2018-11-12 15:48:06 +00:00
Jonas Paulsson 5cea85dd59 [SystemZ::TTI] Improve accuracy of costs for vector fp <-> int conversions
Improve getCastInstrCost() by respecting the different types of Src and Dst
for vector integer <-> fp conversions.

This means that extracting from integer becomes more expensive (by the
extraction penalty), and the extraction from fp becomes cheaper (no longer
has a false extraction penalty).

Review: Ulrich Weigand
https://reviews.llvm.org/D54423

llvm-svn: 346663
2018-11-12 15:32:27 +00:00
Simon Pilgrim 47d38198eb [CostModel] Add more realistic SK_InsertSubvector generic costs.
Instead of defaulting to a cost = 1, expand to element extract/insert like we do for other shuffles.

llvm-svn: 346662
2018-11-12 15:20:24 +00:00
Simon Pilgrim 631f2bf51e [CostModel] Add more realistic SK_ExtractSubvector generic costs.
Instead of defaulting to a cost = 1, expand to element extract/insert like we do for other shuffles.

This exposes an issue in LoopVectorize which could call SK_ExtractSubvector with a scalar subvector type.

llvm-svn: 346656
2018-11-12 14:25:23 +00:00
Simon Pilgrim fc8f1d7da7 [CostModel][X86] SK_ExtractSubvector is free if the subvector is at the start of the source vector
llvm-svn: 346538
2018-11-09 19:04:27 +00:00
Simon Pilgrim d0c71609c5 [CostModel] Add SK_ExtractSubvector handling to getInstructionThroughput (PR39368)
Add ShuffleVectorInst::isExtractSubvectorMask helper to match shuffle masks.

llvm-svn: 346510
2018-11-09 16:28:19 +00:00
Jonas Paulsson cced2a2775 [SystemZ::TTI] Improve cost handling of uint/sint to fp conversions.
Let i8/i16 uint/sint to fp conversions cost 1 if operand is a load.

Since the load already does the extension, there is no extra cost (previously
returned 2).

Review: Ulrich Weigand
https://reviews.llvm.org/D54028

llvm-svn: 346009
2018-11-02 17:53:31 +00:00
Jonas Paulsson 6749c24f40 [SystemZ::TTI] Recognize the higher cost of scalar i1 -> fp conversion
Scalar i1 to fp conversions are done with a branch sequence, so it should
have a higher cost.

Review: Ulrich Weigand
https://reviews.llvm.org/D53924

llvm-svn: 345818
2018-11-01 09:05:32 +00:00
Jonas Paulsson f15a53bc81 [SystemZ::TTI] Accurate costs for i1->double vector conversions
This factors out a new method getBoolVecToIntConversionCost() containing the
code for vector sext/zext of i1, in order to reuse it for i1 to double vector
conversions.

Review: Ulrich Weigand
https://reviews.llvm.org/D53923

llvm-svn: 345817
2018-11-01 09:01:51 +00:00
Simon Pilgrim 44a9a71d2a [TTI] Fix uses of SK_ExtractSubvector shuffle costs (PR39368)
Correct costings of SK_ExtractSubvector requires the SubTy argument to indicate the type/size of the extracted subvector.

Unlike the rest of the shuffle kinds this means that the main Ty argument represents the source vector type not the destination!

I've done my best to fix a number of vectorizer uses:

SLP - the reduction epilogue costs should be using a SK_PermuteSingleSrc shuffle as these all occur at the hardware vector width - we're not extracting (illegal) subvector types. This is causing the cost model diffs as SK_ExtractSubvector costs are poorly handled and tend to just return 1 at the moment.

LV - I'm not clear on what the SK_ExtractSubvector should represents for recurrences - I've used a <1 x ?> subvector extraction as that seems to match the VF delta.

Differential Revision: https://reviews.llvm.org/D53573

llvm-svn: 345617
2018-10-30 18:10:02 +00:00
Jonas Paulsson af8e036c29 [SystemZ] Improve isFoldableLoad() for Sub, SDiv and UDiv.
Sub, SDiv and UDiv are not commutative, so only the RHS operand can fold a
load. This patch adds a check for this.

Review: Ulrich Weigand
https://reviews.llvm.org/D53791

llvm-svn: 345596
2018-10-30 13:41:03 +00:00
Craig Topper faf423ca74 [X86] Add -LABEL to some FileCheck checks. NFC
llvm-svn: 345407
2018-10-26 17:21:19 +00:00
Jonas Paulsson b7caa809e1 [SystemZ] Improve getMemoryOpCost() to find foldable loads that are converted.
The SystemZ backend can do arithmetic of memory by loading and then extending
one of the operands. Similarly, a load + truncate can be folded into an
operand.

This patch improves the SystemZ TTI cost function to recognize this.

Review: Ulrich Weigand
https://reviews.llvm.org/D52692

llvm-svn: 345327
2018-10-25 22:28:25 +00:00
Jonas Paulsson 4645711a8d [SystemZ] Improve handling and cost estimates of vector integer div/rem
Enable the DAG optimization that converts vector div/rem with constants into
multiply+shifts sequences by expanding them early. This is needed since
ISD::SMUL_LOHI is 'Custom' lowered on SystemZ, and will therefore not be
available to BuildSDIV after legalization.

Better cost values for these instructions based on how they will be
implemented (a constant divisor is cheaper).

Review: Ulrich Weigand
https://reviews.llvm.org/D53196

llvm-svn: 345321
2018-10-25 21:47:22 +00:00
Simon Pilgrim 53e8e145e9 [CostModel][X86] Add realistic vXi64 uitofp vXf64 costs
Match codegen improvements from D53649/rL345256

llvm-svn: 345263
2018-10-25 13:06:20 +00:00
Simon Pilgrim 0573b8d8b6 [CostModel][X86] Add realistic i64 uitofp f64 scalar costs
llvm-svn: 345261
2018-10-25 12:42:10 +00:00
Simon Pilgrim ac84005841 [CostModel][X86] Add vXi8 vector division by constants costs.
ISD::MULHS/ISD::MULHU lowering of vXi8 types means we expand these in TargetLowering BuildSDIV/BuildUDIV.

llvm-svn: 345175
2018-10-24 18:44:12 +00:00
Simon Pilgrim 2cce074e8c [CostModel][X86] Enable non-uniform vector division by constants costs.
Non-uniform division/remainder handling was added back at D49248/D50765 - so share the 'mul+sub' costs that already exist for uniform cases.

llvm-svn: 345164
2018-10-24 17:30:29 +00:00
Simon Pilgrim f04a04c2b6 [TTI][X86] Treat SK_Transpose shuffles as SK_PermuteTwoSrc - there's no difference in lowering.
llvm-svn: 345048
2018-10-23 16:45:26 +00:00
Simon Pilgrim f1d8b7c49e [CostModel][X86] Add transpose shuffle cost tests
llvm-svn: 345045
2018-10-23 16:27:14 +00:00
Simon Pilgrim 9a2ee1ea2d Add BROADCAST shuffle cost tests.
Part of a lot of cleanup necessary before PR39368.

llvm-svn: 345025
2018-10-23 13:14:54 +00:00
Simon Pilgrim 051feee5c2 Add BROADCAST shuffle cost tests.
Part of a lot of cleanup necessary before PR39368.

llvm-svn: 345023
2018-10-23 13:00:22 +00:00
Simon Pilgrim 8c3d87b8cf [ARM] Regenerate reverse shuffle costs
Came about while cleaning up general shuffle costs for PR39368

llvm-svn: 344966
2018-10-22 22:26:00 +00:00
Simon Pilgrim acd10b9f6c [CostModel][X86] Add some initial extract/insert subvector shuffle cost tests
Just f64/i64 tests initially to demonstrate PR39368

llvm-svn: 344857
2018-10-20 17:38:33 +00:00
Simon Pilgrim e612ab0c80 [CostModel][X86] Add integer vector reduction cost tests
llvm-svn: 344846
2018-10-20 14:29:59 +00:00
Jonas Paulsson bf66f38705 [SystemZ] Temporarily disable high VFs with integer div/rem.
Until mischeduler is clever enough to avoid spilling in a vectorized loop
with many (scalar) DLRs it is better to avoid high vectorization factors (8
and above).

llvm-svn: 344129
2018-10-10 09:30:29 +00:00
Jonas Paulsson 2c8b33770c [SystemZ] Take better care when computing needed vector registers in TTI.
A new function getNumVectorRegs() is better to use for the number of needed
vector registers instead of getNumberOfParts(). This is to make sure that the
number of vector registers (and typically operations) required for a vector
type is accurate.

getNumberOfParts() which was previously used works by splitting the vector
type until it is legal gives incorrect results for types with a non
power of two number of elements (rare).

A new static function getScalarSizeInBits() that also checks for a pointer
type and returns 64U for it since otherwise it gets a value of 0). Used in a
few places where Ty may be pointer.

Review: Ulrich Weigand
llvm-svn: 344115
2018-10-10 07:36:27 +00:00
Jonas Paulsson 77df2f2f38 [SystemZ] Adjust cost functions for subtargets that use LI + LOC instead of IPM
After recent improvements which makes better use of LOC instead of IPM, the
TTI cost functions also needs to be updated to reflect this.

This involves sext, zext and xor of i1.

The tests were updated so that for z13 the new costs are expected, while the
old costs are still checked for on zEC12.

Review: Ulrich Weigand
https://reviews.llvm.org/D51339

llvm-svn: 342207
2018-09-14 06:46:55 +00:00
Craig Topper a72012c206 [X86] Correct the cost of (v4i32 (fptoui (v4f64))) under AVX512F.
Summary: This was inheriting the cost from the AVX table, but should be legal under AVX512.

Reviewers: RKSimon

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D51267

llvm-svn: 340708
2018-08-26 18:47:44 +00:00
Simon Pilgrim 667a5b541f [TargetTransformInfo] Add pow2 analysis for scalar constants
Add ConstantInt analysis to getOperandInfo so we get more realistic div/rem expansion costs comparable to the vector costs.

llvm-svn: 336827
2018-07-11 17:51:27 +00:00