Commit Graph

329 Commits

Author SHA1 Message Date
Simon Pilgrim 4162d77744 [TTI] Add uniform/non-uniform constant Pow2 detection to TargetTransformInfo::getInstructionThroughput
This enables us to detect more fast path sdiv cases under cost analysis.

This patch also enables us to handle non-uniform-constant pow2 cases for X86 SDIV costs.

Found while working on D46276

Future patches can then extend the vectorizers to more fully support non-uniform pow2 cases.

Differential Revision: https://reviews.llvm.org/D46637

llvm-svn: 332969
2018-05-22 10:40:09 +00:00
Adhemerval Zanella f384bc7166 [AArch64] Improve cost of vector division by constant
With custom lowering for vector MULLH{S,U}, it is now profitable to
vectorize a divide by constant loop for the custom types (v16i8, v8i16,
and v4i32).  The cost if based on TargetLowering::Build{S,U}DIV which
uses a multiply by constant plus adjustment to express a divide by
constant.

Both {u,s}mull{2} are expressed as Instruction::Mul and shifts by
Instruction::AShr.

llvm-svn: 331873
2018-05-09 12:48:22 +00:00
Simon Pilgrim fe5c5277ed [CostModel][X86] Split off SLM checks
A future patch will require this and the diff is much better if we perform the split separately.

llvm-svn: 331867
2018-05-09 11:42:34 +00:00
Matthew Simpson b4096ebe26 [TTI, AArch64] Add transpose shuffle kind
This patch adds a new shuffle kind useful for transposing a 2xn matrix. These
transpose shuffle masks read corresponding even- or odd-numbered vector
elements from two n-dimensional source vectors and write each result into
consecutive elements of an n-dimensional destination vector. The transpose
shuffle kind is meant to model the TRN1 and TRN2 AArch64 instructions. As such,
this patch also considers transpose shuffles in the AArch64 implementation of
getShuffleCost.

Differential Revision: https://reviews.llvm.org/D45982

llvm-svn: 330941
2018-04-26 13:48:33 +00:00
Simon Pilgrim 0ae4bba911 [CostModel][X86] Add div/rem tests for non-uniform constant divisors
llvm-svn: 330852
2018-04-25 18:03:31 +00:00
Matthew Simpson 3fd67df3f8 [AArch64] Add cost model test case for transpose
This patch adds a cost model test case for vector shuffles having transpose
masks. The given costs are inaccurate and will be updated in a follow-on patch.

llvm-svn: 330625
2018-04-23 18:21:29 +00:00
Simon Pilgrim ab9798765c [CostModel][X86] Add vector element insert/extract cost tests
llvm-svn: 330439
2018-04-20 15:26:59 +00:00
Simon Pilgrim 863ffeb750 [CostModel][X86] Add srem/urem constant cost tests
llvm-svn: 330436
2018-04-20 15:01:03 +00:00
Simon Pilgrim 8a15d72550 [CostModel][X86] Add SLM/GLM/BtVer2 compare + division/remainder cost tests
llvm-svn: 330435
2018-04-20 14:50:34 +00:00
Simon Pilgrim cd9ccf8824 [CostModel][X86] Split off BtVer2 cost checks
llvm-svn: 330433
2018-04-20 13:50:33 +00:00
Simon Pilgrim 25b7782975 [CostModel][X86] Add GoldmontPlus cost tests
Just reuses goldmont costs atm

llvm-svn: 330432
2018-04-20 13:42:53 +00:00
Simon Pilgrim 34b397a318 [CostModel][X86] Add some specific cpu targets to the cost models
We're mostly testing with generic isa attributes, but PR36550 will require testing of specific target's scheduler models as well.

llvm-svn: 330056
2018-04-13 19:30:15 +00:00
Simon Pilgrim 3ede11b58c [CostModel][X86] Split fma arith costs tests from other fp tests
Was proving cumbersome to test with/without fma

llvm-svn: 330054
2018-04-13 19:12:32 +00:00
Simon Pilgrim 237730a196 [CostModel][X86] Regenerate latency/codesize cost tests
llvm-svn: 330052
2018-04-13 18:56:58 +00:00
Simon Pilgrim 0c07ccc4e3 [CostModel][X86] Regenerate cast conversion cost tests
llvm-svn: 330051
2018-04-13 18:56:05 +00:00
Simon Pilgrim e30db80d30 [CostModel][X86] Regenerate masked intrinsic cost tests
llvm-svn: 330050
2018-04-13 18:54:16 +00:00
Simon Pilgrim 8a8ff4f6d4 [CostModel][X86] Regenerate vector reduction cost tests with update_analyze_test_checks.py
NOTE: We're only really interested in the extractelement cost (which represents the entire reduction).
llvm-svn: 329504
2018-04-07 14:20:10 +00:00
Simon Pilgrim 7bd5ff8b4a [CostModel][X86] Regenerate vector select cost tests with update_analyze_test_checks.py
llvm-svn: 329502
2018-04-07 14:09:54 +00:00
Simon Pilgrim 495b660269 [CostModel][X86] Regenerate vector integer truncation cost tests with update_analyze_test_checks.py
llvm-svn: 329500
2018-04-07 14:05:35 +00:00
Simon Pilgrim 84d8498fc5 [CostModel][X86] Regenerate silvermont (and added goldmont) cost tests with update_analyze_test_checks.py
llvm-svn: 329499
2018-04-07 14:02:14 +00:00
Simon Pilgrim 80ce1dde44 [CostModel][X86] Fix v32i16/v64i8 SETCC costs on AVX512BW targets
llvm-svn: 329498
2018-04-07 13:24:33 +00:00
Simon Pilgrim a49a1b9ccc [CostModel][X86] Regenerate vector comparison cost tests with update_analyze_test_checks.py
llvm-svn: 329497
2018-04-07 12:47:35 +00:00
Simon Pilgrim 61f704e4bd [CostModel][X86] Regenerate bit count cost tests with update_analyze_test_checks.py
llvm-svn: 329413
2018-04-06 16:14:27 +00:00
Simon Pilgrim 63ae5579e7 [CostModel][X86] Regenerate vector shuffle cost tests with update_analyze_test_checks.py
llvm-svn: 329410
2018-04-06 16:00:28 +00:00
Simon Pilgrim d55ad63bfe [CostModel][X86] Regenerate bswap/bitreverse cost tests with update_analyze_test_checks.py
llvm-svn: 329407
2018-04-06 15:46:26 +00:00
Simon Pilgrim 74402acb00 [CostModel][X86] Regenerate integer extension/truncation cost tests with update_analyze_test_checks.py
llvm-svn: 329402
2018-04-06 15:28:26 +00:00
Simon Pilgrim 06fba8b204 [CostModel][X86] Regenerate integer division/remainder tests with update_analyze_test_checks.py
llvm-svn: 329401
2018-04-06 15:23:26 +00:00
Simon Pilgrim 60fc843fc6 [CostModel][X86] Regenerate vector shift cost tests with update_analyze_test_checks.py
llvm-svn: 329400
2018-04-06 15:14:34 +00:00
Simon Pilgrim ad768585ff [CostModel][X86] Regenerate int<->fp cost tests with update_analyze_test_checks.py
llvm-svn: 329398
2018-04-06 15:12:36 +00:00
Simon Pilgrim 5334a2c571 [UpdateTestChecks] Add update_analyze_test_checks.py for cost model analysis generation
The script allows the auto-generation of checks for cost model tests to speed up their creation and help improve coverage, which will help a lot with PR36550.

If the need arises we can add support for other analyze passes as well, but the cost models was the one I needed to get done - at the moment it just warns that any other analysis mode is unsupported.

I've regenerated a couple of x86 test files to show the effect.

Differential Revision: https://reviews.llvm.org/D45272

llvm-svn: 329390
2018-04-06 12:36:27 +00:00
Simon Pilgrim d152d55ab2 [X86][CostModel] Use generic SSE levels instead of particular CPUs for shuffle costs
llvm-svn: 329168
2018-04-04 11:14:12 +00:00
Craig Topper a985919d3e [X86] Update cost model for Goldmont. Add fsqrt costs for Silvermont
Add fdiv costs for Goldmont using table 16-17 of the Intel Optimization Manual. Also add overrides for FSQRT for Goldmont and Silvermont.

Reviewers: RKSimon

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D44644

llvm-svn: 328451
2018-03-25 15:58:12 +00:00
Matthew Simpson eacfefd056 [AArch64] Implement getArithmeticReductionCost
This patch provides an implementation of getArithmeticReductionCost for
AArch64. We can specialize the cost of add reductions since they are computed
using the 'addv' instruction.

Differential Revision: https://reviews.llvm.org/D44490

llvm-svn: 327702
2018-03-16 11:34:15 +00:00
Matthew Simpson 883e96c9d4 [TTI, AArch64] Allow the cost model analysis to test vector reduce intrinsics
This patch considers the experimental vector reduce intrinsics in the default
implementation of getIntrinsicInstrCost. The cost of these intrinsics is
computed with getArithmeticReductionCost and getMinMaxReductionCost. This patch
also adds a test case for AArch64 that indicates the costs we currently compute
for vector reduce intrinsics. These costs are inaccurate and will be updated in
a follow-on patch.

Differential Revision: https://reviews.llvm.org/D44489

llvm-svn: 327698
2018-03-16 10:00:30 +00:00
Evandro Menezes f9bd871d32 [AArch64] Adjust the cost of integer vector division
Since there is no instruction for integer vector division, factor in the
cost of singling out each element to be used with the scalar division
instruction.

Differential revision: https://reviews.llvm.org/D43974

llvm-svn: 326955
2018-03-07 22:35:32 +00:00
Simon Pilgrim 74ff6ff437 [X86] Add silvermont fp arithmetic cost model tests
Add silvermont to existing high coverage tests instead of repeating in slm-arith-costs.ll

llvm-svn: 326747
2018-03-05 22:13:22 +00:00
Simon Pilgrim 9929f90740 [X86][SSE] Reduce FADD/FSUB/FMUL costs on later targets (PR36280)
Agner's tables indicate that for SSE42+ targets (Core2 and later) we can reduce the FADD/FSUB/FMUL costs down to 1, which should fix the Himeno benchmark.

Note: the AVX512 FDIV costs look rather dodgy, but this isn't part of this patch.

Differential Revision: https://reviews.llvm.org/D43733

llvm-svn: 326133
2018-02-26 22:10:17 +00:00
Sanjay Patel e6143904b9 revert r325515: [TTI CostModel] change default cost of FP ops to 1 (PR36280)
There are too many perf regressions resulting from this, so we need to 
investigate (and add tests for) targets like ARM and AArch64 before 
trying to reinstate.

llvm-svn: 325658
2018-02-21 01:42:52 +00:00
Sanjay Patel 3e8a76abfd [TTI CostModel] change default cost of FP ops to 1 (PR36280)
This change was mentioned at least as far back as:
https://bugs.llvm.org/show_bug.cgi?id=26837#c26
...and I found a real program that is harmed by this: 
Himeno running on AMD Jaguar gets 6% slower with SLP vectorization:
https://bugs.llvm.org/show_bug.cgi?id=36280
...but the change here appears to solve that bug only accidentally.

The div/rem costs for x86 look very wrong in some cases, but that's already true, 
so we can fix those in follow-up patches. There's also evidence that more cost model
changes are needed to solve SLP problems as shown in D42981, but that's an independent 
problem (though the solution may be adjusted after this change is made).

Differential Revision: https://reviews.llvm.org/D43079

llvm-svn: 325515
2018-02-19 16:11:44 +00:00
Simon Pilgrim cb9a02f60e [X86][SSE] Increase PMULLD costs to better match hardware
Until Skylake, most hardware could only issue a PMULLD op every other cycle

llvm-svn: 324823
2018-02-10 19:27:10 +00:00
Yaxun Liu 2a22c5deff [AMDGPU] Switch to the new addr space mapping by default
This requires corresponding clang change.

Differential Revision: https://reviews.llvm.org/D40955

llvm-svn: 324101
2018-02-02 16:07:16 +00:00
Craig Topper d58c165545 [X86] Make v2i1 and v4i1 legal types without VLX
Summary:
There are few oddities that occur due to v1i1, v8i1, v16i1 being legal without v2i1 and v4i1 being legal when we don't have VLX. Particularly during legalization of v2i32/v4i32/v2i64/v4i64 masked gather/scatter/load/store. We end up promoting the mask argument to these during type legalization and then have to widen the promoted type to v8iX/v16iX and truncate it to get the element size back down to v8i1/v16i1 to use a 512-bit operation. Since need to fill the upper bits of the mask we have to fill with 0s at the promoted type.

It would be better if we could just have the v2i1/v4i1 types as legal so they don't undergo any promotion. Then we can just widen with 0s directly in a k register. There are no real v4i1/v2i1 instructions anyway. Everything is done on a larger register anyway.

This also fixes an issue that we couldn't implement a masked vextractf32x4 from zmm to xmm properly.

We now have to support widening more compares to 512-bit to get a mask result out so new tablegen patterns got added.

I had to hack the legalizer for widening the operand of a setcc a bit so it didn't try create a setcc returning v4i32, extract from it, then try to promote it using a sign extend to v2i1. Now we create the setcc with v4i1 if the original setcc's result type is v2i1. Then extract that and don't sign extend it at all.

There's definitely room for improvement with some follow up patches.

Reviewers: RKSimon, zvi, guyblank

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D41560

llvm-svn: 321967
2018-01-07 18:20:37 +00:00
Craig Topper 7034d401f8 [X86] Use mattr instead of mcpu in some of the cost model tests.
Based on the names of the check lines, features seems more appropriate that cpu.

Spotted while prototyping my patch to make 512-bit vectors illegal on SKX sometimes.

llvm-svn: 320959
2017-12-18 07:21:58 +00:00
Craig Topper fbf7b3bf3e [X86] Promote fp_to_sint v16f32->v16i16/v16i8 to avoid scalarization.
llvm-svn: 319266
2017-11-29 00:32:09 +00:00
Mohammed Agabaria 115f68ea3e [LV][X86] Support of AVX2 Gathers code generation and update the LV with this
This patch depends on: https://reviews.llvm.org/D35348

Support of pattern selection of masked gathers of AVX2 (X86\AVX2 code gen)
Update LoopVectorize to generate gathers for AVX2 processors.

Reviewers: delena, zvi, RKSimon, craig.topper, aaboud, igorb

Reviewed By: delena, RKSimon

Differential Revision: https://reviews.llvm.org/D35772

llvm-svn: 318641
2017-11-20 08:18:12 +00:00
Mohammed Agabaria 6e6d5326a1 [TTI][X86] update costs of interleaved load\store of i64\double
This patch contains more accurate cost of interelaved load\store of stride 2 for the types int64\double on AVX2.

Reviewers: delena, RKSimon, craig.topper, dorit

Reviewed By: dorit

Differential Revision: https://reviews.llvm.org/D40008

llvm-svn: 318385
2017-11-16 09:38:32 +00:00
Mohammed Agabaria 6691758364 [LV][X86] update the cost of interleaving mem. access of floats
Recommit:
This patch contains update of the costs of interleaved loads of v8f32 of stride 3 and 8.
fixed the location of the lit test it works with make check-all.

Differential Revision: https://reviews.llvm.org/D39403

llvm-svn: 317471
2017-11-06 10:56:20 +00:00
Mohammed Agabaria acd69dbc7c [REVERT][LV][X86] update the cost of interleaving mem. access of floats
reverted my changes will be committed later after fixing the failure
This patch contains update of the costs of interleaved loads of v8f32 of stride 3 and 8.

Differential Revision: https://reviews.llvm.org/D39403

llvm-svn: 317433
2017-11-05 09:36:54 +00:00
Mohammed Agabaria f74c767de6 [LV][X86] update the cost of interleaving mem. access of floats
This patch contains update of the costs of interleaved loads of v8f32 of stride 3 and 8.

Differential Revision: https://reviews.llvm.org/D39403

llvm-svn: 317432
2017-11-05 09:06:23 +00:00
Michael Zuckerman 49293264cc [AVX512][AVX2]Cost calculation for interleave load/store patterns {v8i8,v16i8,v32i8,v64i8}
This patch adds accurate instructions cost.
The formula presents two cases(stride 3 and stride 4) and calculates the cost according to the VF and stride.

Reviewers:
1. delena
2. Farhana
3. zvi
4. dorit
5. Ayal

Differential Revision: https://reviews.llvm.org/D38762

Change-Id: If4cfbd4ac0e63694e8144cb78c7fa34850647ff7
llvm-svn: 316072
2017-10-18 11:41:55 +00:00