llvm-project

Commit Graph

Author	SHA1	Message	Date
Simon Pilgrim	667a5b541f	[TargetTransformInfo] Add pow2 analysis for scalar constants Add ConstantInt analysis to getOperandInfo so we get more realistic div/rem expansion costs comparable to the vector costs. llvm-svn: 336827	2018-07-11 17:51:27 +00:00
Simon Pilgrim	dc113dc7ed	[CostModel][X86] Add SREM/UREM general and constant costs (PR38056) We penalize general SDIV/UDIV costs but don't do the same for SREM/UREM. This patch makes general vector SREM/UREM x20 as costly as scalar, the same approach as we do for SDIV/UDIV. The patch also extends the existing SDIV/UDIV constant costs for SREM/UREM - at the moment this means the additional cost of a MUL+SUB (see D48975). Differential Revision: https://reviews.llvm.org/D48980 llvm-svn: 336486	2018-07-07 16:53:30 +00:00
Simon Pilgrim	8c3765dc6b	[CostModel][X86] Add UDIV/UREM by pow2 costs Normally InstCombine would have simplified these to SRL/AND instructions but we may still see these during SLP vectorization etc. llvm-svn: 336371	2018-07-05 16:56:28 +00:00
Simon Pilgrim	ac193d4b5c	[CostModel][X86] Add cost tests for fp rounding intrinsics Add cost tests for fp ceil, floor, nearbyint, rint and trunc. llvm-svn: 336122	2018-07-02 17:07:01 +00:00
Adhemerval Zanella	cadcfed7aa	[AArch64] Add custom lowering for v4i8 trunc store This patch adds a custom trunc store lowering for v4i8 vector types. Since there is no v.4b register, the v4i8 is promoted to v4i16 (v.4h) and the default action for v4i8 is to extract each element and issue 4 byte stores. A better strategy would be to extend the promoted v4i16 to v8i16 (with undef elements) and extract and store the word lane which represents the v4i8 subvectors. The construction: define void @foo(<4 x i16> %x, i8* nocapture %p) { %0 = trunc <4 x i16> %x to <4 x i8> %1 = bitcast i8* %p to <4 x i8>* store <4 x i8> %0, <4 x i8>* %1, align 4, !tbaa !2 ret void } Can be optimized from: umov w8, v0.h[3] umov w9, v0.h[2] umov w10, v0.h[1] umov w11, v0.h[0] strb w8, [x0, #3] strb w9, [x0, #2] strb w10, [x0, #1] strb w11, [x0] ret To: xtn v0.8b, v0.8h str s0, [x0] ret The patch also adjusts the memory cost for autovectorization, so the C code: void foo (const int *src, int width, unsigned char *dst) { for (int i = 0; i < width; i++) *dst++ = *src++; } can be vectorized to: .LBB0_4: // %vector.body // =>This Inner Loop Header: Depth=1 ldr q0, [x0], #16 subs x12, x12, #4 // =4 xtn v0.4h, v0.4s xtn v0.8b, v0.8h st1 { v0.s }[0], [x2], #4 b.ne .LBB0_4 Instead of byte operations. llvm-svn: 335735	2018-06-27 13:58:46 +00:00
Simon Pilgrim	9c8f9374b5	[CostModel][AArch64] Add some initial costs for SK_Select and SK_PermuteSingleSrc AArch64 was only setting costs for SK_Transpose, which meant that many of the simpler shuffles (e.g. SK_Select and SK_PermuteSingleSrc for larger vector elements) were being severely overestimated by the default shuffle expansion. This patch adds costs to help improve SLP performance and avoid a regression in reductions introduced by D48174. I'm not very knowledgeable about AArch64 shuffle lowering so I've kept the extra costs to a minimum - someone who knows this code can add extra costs which should improve vectorization a lot more. Differential Revision: https://reviews.llvm.org/D48172 llvm-svn: 335329	2018-06-22 09:45:31 +00:00
Simon Pilgrim	2a9cde026c	[X86][AVX] Reduce v4f64/v4i64 shuffle costs (PR37882) These were being over cautious for costs for one/two op general shuffles - VSHUFPD doesn't have to replicate the same shuffle in both lanes like VSHUFPS does. llvm-svn: 335216	2018-06-21 11:37:13 +00:00
Simon Pilgrim	607a1e2196	[CostModel][AArch64] Add cost tests for ALTERNATE/SELECT style shuffle masks Precursor to fixing a regression with SLP vectorizer for supporting SELECT shuffles (vs the current ALTERNATE) llvm-svn: 334714	2018-06-14 14:20:20 +00:00
Simon Pilgrim	32702cc86a	[CostModel] Recognise REVERSE shuffle mask if the elements come from the second src llvm-svn: 334698	2018-06-14 09:35:00 +00:00
Simon Pilgrim	9fd634db22	[CostModel][X86] Test showing failure to recognise REVERSE shuffle mask if the elements come from the second src llvm-svn: 334623	2018-06-13 17:12:11 +00:00
Simon Pilgrim	54a138a0c5	[CostModel] Recognise BROADCAST shuffle mask if the elements come from the second src llvm-svn: 334620	2018-06-13 16:52:02 +00:00
Simon Pilgrim	5af0b99ea4	[CostModel][X86] Test showing failure to recognise BROADCAST shuffle mask if the elements come from the second src llvm-svn: 334616	2018-06-13 16:33:42 +00:00
Simon Pilgrim	e39fa6cbbb	[CostModel] Replace ShuffleKind::SK_Alternate with ShuffleKind::SK_Select (PR33744) As discussed on PR33744, this patch relaxes ShuffleKind::SK_Alternate which requires shuffle masks to only match an alternating pattern from its 2 sources: e.g. v4f32: <0,5,2,7> or <4,1,6,3> This seems far too restrictive as most SIMD hardware will implement it using a general blend/bit-select instruction, so this patch replaces it with SK_Select, permitting elements from either source as long as they are inline: e.g. v4f32: <0,5,2,7>, <4,1,6,3>, <0,1,6,7>, <4,1,2,3> etc. This initial patch just updates the name and cost model shuffle mask analysis, later patch reviews will update SLP to better utilise this - it still limits itself to SK_Alternate style patterns. Differential Revision: https://reviews.llvm.org/D47985 llvm-svn: 334513	2018-06-12 16:12:29 +00:00
Simon Pilgrim	0783921987	[CostModel] Treat Identity shuffle masks as zero cost As discussed on D47985, identity shuffle masks should probably be free. I've limited this to the case where the input and output types all match - but we could probably accept all cases. Differential Revision: https://reviews.llvm.org/D47986 llvm-svn: 334506	2018-06-12 14:47:13 +00:00
Simon Pilgrim	cfd96329f0	[CostModel][X86] Add extra Identity shuffle mask cost tests (D47986) llvm-svn: 334486	2018-06-12 09:18:13 +00:00
Simon Pilgrim	5297506625	[CostModel][X86] Add 'select' style shuffle costs tests (PR33744) llvm-svn: 334351	2018-06-09 16:08:25 +00:00
Simon Pilgrim	f2f043acbb	[X86][SSE] Use multiplication scale factors for v8i16 SHL on pre-AVX2 targets. Similar to v4i32 SHL, convert v8i16 shift amounts to scale factors instead to improve performance and reduce instruction count. We were already doing this for constant shifts, this adds variable shift support. Reduces the serial nature of the codegen, which relies on chains of plendvb/pand+pandn+por shifts. This is a step towards adding support for vXi16 vector rotates. Differential Revision: https://reviews.llvm.org/D47546 llvm-svn: 334023	2018-06-05 15:17:39 +00:00
Simon Pilgrim	4162d77744	[TTI] Add uniform/non-uniform constant Pow2 detection to TargetTransformInfo::getInstructionThroughput This enables us to detect more fast path sdiv cases under cost analysis. This patch also enables us to handle non-uniform-constant pow2 cases for X86 SDIV costs. Found while working on D46276 Future patches can then extend the vectorizers to more fully support non-uniform pow2 cases. Differential Revision: https://reviews.llvm.org/D46637 llvm-svn: 332969	2018-05-22 10:40:09 +00:00
Adhemerval Zanella	f384bc7166	[AArch64] Improve cost of vector division by constant With custom lowering for vector MULLH{S,U}, it is now profitable to vectorize a divide by constant loop for the custom types (v16i8, v8i16, and v4i32). The cost is based on TargetLowering::Build{S,U}DIV which uses a multiply by constant plus adjustment to express a divide by constant. Both {u,s}mull{2} are expressed as Instruction::Mul and shifts by Instruction::AShr. llvm-svn: 331873	2018-05-09 12:48:22 +00:00
Simon Pilgrim	fe5c5277ed	[CostModel][X86] Split off SLM checks A future patch will require this and the diff is much better if we perform the split separately. llvm-svn: 331867	2018-05-09 11:42:34 +00:00
Matthew Simpson	b4096ebe26	[TTI, AArch64] Add transpose shuffle kind This patch adds a new shuffle kind useful for transposing a 2xn matrix. These transpose shuffle masks read corresponding even- or odd-numbered vector elements from two n-dimensional source vectors and write each result into consecutive elements of an n-dimensional destination vector. The transpose shuffle kind is meant to model the TRN1 and TRN2 AArch64 instructions. As such, this patch also considers transpose shuffles in the AArch64 implementation of getShuffleCost. Differential Revision: https://reviews.llvm.org/D45982 llvm-svn: 330941	2018-04-26 13:48:33 +00:00
Simon Pilgrim	0ae4bba911	[CostModel][X86] Add div/rem tests for non-uniform constant divisors llvm-svn: 330852	2018-04-25 18:03:31 +00:00
Matthew Simpson	3fd67df3f8	[AArch64] Add cost model test case for transpose This patch adds a cost model test case for vector shuffles having transpose masks. The given costs are inaccurate and will be updated in a follow-on patch. llvm-svn: 330625	2018-04-23 18:21:29 +00:00
Simon Pilgrim	ab9798765c	[CostModel][X86] Add vector element insert/extract cost tests llvm-svn: 330439	2018-04-20 15:26:59 +00:00
Simon Pilgrim	863ffeb750	[CostModel][X86] Add srem/urem constant cost tests llvm-svn: 330436	2018-04-20 15:01:03 +00:00
Simon Pilgrim	8a15d72550	[CostModel][X86] Add SLM/GLM/BtVer2 compare + division/remainder cost tests llvm-svn: 330435	2018-04-20 14:50:34 +00:00
Simon Pilgrim	cd9ccf8824	[CostModel][X86] Split off BtVer2 cost checks llvm-svn: 330433	2018-04-20 13:50:33 +00:00
Simon Pilgrim	25b7782975	[CostModel][X86] Add GoldmontPlus cost tests Just reuses goldmont costs atm llvm-svn: 330432	2018-04-20 13:42:53 +00:00
Simon Pilgrim	34b397a318	[CostModel][X86] Add some specific cpu targets to the cost models We're mostly testing with generic isa attributes, but PR36550 will require testing of specific target's scheduler models as well. llvm-svn: 330056	2018-04-13 19:30:15 +00:00
Simon Pilgrim	3ede11b58c	[CostModel][X86] Split fma arith costs tests from other fp tests Was proving cumbersome to test with/without fma llvm-svn: 330054	2018-04-13 19:12:32 +00:00
Simon Pilgrim	237730a196	[CostModel][X86] Regenerate latency/codesize cost tests llvm-svn: 330052	2018-04-13 18:56:58 +00:00
Simon Pilgrim	0c07ccc4e3	[CostModel][X86] Regenerate cast conversion cost tests llvm-svn: 330051	2018-04-13 18:56:05 +00:00
Simon Pilgrim	e30db80d30	[CostModel][X86] Regenerate masked intrinsic cost tests llvm-svn: 330050	2018-04-13 18:54:16 +00:00
Simon Pilgrim	8a8ff4f6d4	[CostModel][X86] Regenerate vector reduction cost tests with update_analyze_test_checks.py NOTE: We're only really interested in the extractelement cost (which represents the entire reduction). llvm-svn: 329504	2018-04-07 14:20:10 +00:00
Simon Pilgrim	7bd5ff8b4a	[CostModel][X86] Regenerate vector select cost tests with update_analyze_test_checks.py llvm-svn: 329502	2018-04-07 14:09:54 +00:00
Simon Pilgrim	495b660269	[CostModel][X86] Regenerate vector integer truncation cost tests with update_analyze_test_checks.py llvm-svn: 329500	2018-04-07 14:05:35 +00:00
Simon Pilgrim	84d8498fc5	[CostModel][X86] Regenerate silvermont (and added goldmont) cost tests with update_analyze_test_checks.py llvm-svn: 329499	2018-04-07 14:02:14 +00:00
Simon Pilgrim	80ce1dde44	[CostModel][X86] Fix v32i16/v64i8 SETCC costs on AVX512BW targets llvm-svn: 329498	2018-04-07 13:24:33 +00:00
Simon Pilgrim	a49a1b9ccc	[CostModel][X86] Regenerate vector comparison cost tests with update_analyze_test_checks.py llvm-svn: 329497	2018-04-07 12:47:35 +00:00
Simon Pilgrim	61f704e4bd	[CostModel][X86] Regenerate bit count cost tests with update_analyze_test_checks.py llvm-svn: 329413	2018-04-06 16:14:27 +00:00
Simon Pilgrim	63ae5579e7	[CostModel][X86] Regenerate vector shuffle cost tests with update_analyze_test_checks.py llvm-svn: 329410	2018-04-06 16:00:28 +00:00
Simon Pilgrim	d55ad63bfe	[CostModel][X86] Regenerate bswap/bitreverse cost tests with update_analyze_test_checks.py llvm-svn: 329407	2018-04-06 15:46:26 +00:00
Simon Pilgrim	74402acb00	[CostModel][X86] Regenerate integer extension/truncation cost tests with update_analyze_test_checks.py llvm-svn: 329402	2018-04-06 15:28:26 +00:00
Simon Pilgrim	06fba8b204	[CostModel][X86] Regenerate integer division/remainder tests with update_analyze_test_checks.py llvm-svn: 329401	2018-04-06 15:23:26 +00:00
Simon Pilgrim	60fc843fc6	[CostModel][X86] Regenerate vector shift cost tests with update_analyze_test_checks.py llvm-svn: 329400	2018-04-06 15:14:34 +00:00
Simon Pilgrim	ad768585ff	[CostModel][X86] Regenerate int<->fp cost tests with update_analyze_test_checks.py llvm-svn: 329398	2018-04-06 15:12:36 +00:00
Simon Pilgrim	5334a2c571	[UpdateTestChecks] Add update_analyze_test_checks.py for cost model analysis generation The script allows the auto-generation of checks for cost model tests to speed up their creation and help improve coverage, which will help a lot with PR36550. If the need arises we can add support for other analyze passes as well, but the cost model was the one I needed to get done - at the moment it just warns that any other analysis mode is unsupported. I've regenerated a couple of x86 test files to show the effect. Differential Revision: https://reviews.llvm.org/D45272 llvm-svn: 329390	2018-04-06 12:36:27 +00:00
Simon Pilgrim	d152d55ab2	[X86][CostModel] Use generic SSE levels instead of particular CPUs for shuffle costs llvm-svn: 329168	2018-04-04 11:14:12 +00:00
Craig Topper	a985919d3e	[X86] Update cost model for Goldmont. Add fsqrt costs for Silvermont Add fdiv costs for Goldmont using table 16-17 of the Intel Optimization Manual. Also add overrides for FSQRT for Goldmont and Silvermont. Reviewers: RKSimon Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D44644 llvm-svn: 328451	2018-03-25 15:58:12 +00:00
Matthew Simpson	eacfefd056	[AArch64] Implement getArithmeticReductionCost This patch provides an implementation of getArithmeticReductionCost for AArch64. We can specialize the cost of add reductions since they are computed using the 'addv' instruction. Differential Revision: https://reviews.llvm.org/D44490 llvm-svn: 327702	2018-03-16 11:34:15 +00:00
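
Commit e39fa6cbbb above contrasts "alternate" shuffle masks with the more permissive "select" masks. A minimal sketch of that distinction follows, assuming a mask over two N-element sources where indices 0..N-1 pick from the first source and N..2N-1 from the second, and assuming every index is defined. The helper names isSelectMask and isAlternateMask are hypothetical and chosen for illustration; this is not the code from the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not LLVM's implementation).
// A "select" mask keeps every element in its own lane: lane i holds
// either element i of the first source (index i) or element i of the
// second source (index i + N), where N is the number of lanes.
bool isSelectMask(const std::vector<int> &Mask) {
  const int N = static_cast<int>(Mask.size());
  for (int I = 0; I < N; ++I)
    if (Mask[I] != I && Mask[I] != I + N)
      return false;
  return true;
}

// Hypothetical helper (not LLVM's implementation).
// The stricter "alternate" pattern is a select mask whose lanes
// strictly alternate between the two sources.
bool isAlternateMask(const std::vector<int> &Mask) {
  if (!isSelectMask(Mask))
    return false;
  const int N = static_cast<int>(Mask.size());
  for (int I = 1; I < N; ++I) {
    const bool PrevFromSecond = Mask[I - 1] >= N;
    const bool CurFromSecond = Mask[I] >= N;
    if (PrevFromSecond == CurFromSecond)
      return false;
  }
  return true;
}
```

Under these assumptions, the v4f32 masks <0,5,2,7> and <4,1,6,3> from the commit message satisfy both checks, while <0,1,6,7> and <4,1,2,3> satisfy only the select check.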

346 Commits