llvm-project

Commit Graph

Author	SHA1	Message	Date
Jonas Paulsson	06acb3a236	[SystemZ::TTI] Improve cost for compare of i64 with extended i32 load CGF/CLGF compares an i64 register with a sign/zero extended loaded i32 value in memory. This patch makes such a load considered foldable and so gets a 0 cost. Review: Ulrich Weigand https://reviews.llvm.org/D54944 llvm-svn: 347735	2018-11-28 08:58:27 +00:00
Jonas Paulsson	d6b7aca911	[SystemZ::TTI] Improve costs for i16 add, sub and mul against memory. AH, SH and MH costs are already covered in the cases where LHS is 32 bits and RHS is 16 bits of memory sign-extended to i32. As these instructions are also used when LHS is i16, this patch recognizes that the loads will get folded then as well. Review: Ulrich Weigand https://reviews.llvm.org/D54940 llvm-svn: 347734	2018-11-28 08:31:50 +00:00
Jonas Paulsson	011a503f25	[SystemZ::TTI] Improved cost values for comparison against memory. Single instructions exist for i8 and i16 comparisons of memory against a small immediate. This patch makes sure that if the load in these cases has a single user (the ICmp), it gets a 0 cost (folded), and also that the ICmp gets a cost of 1. Review: Ulrich Weigand https://reviews.llvm.org/D54897 llvm-svn: 347733	2018-11-28 08:08:05 +00:00
Jonas Paulsson	5da8e432b9	[SystemZ::TTI] Return zero cost for scalar load/store connected with a bswap. Since byte-swapping loads and stores are supported, a 'load -> bswap' or 'bswap -> store' sequence should have the cost of one. Review: Ulrich Weigand https://reviews.llvm.org/D54870 llvm-svn: 347732	2018-11-28 07:52:34 +00:00
Craig Topper	078e58da15	[X86] Add test cases to show that we don't properly take -mprefer-vector-width=256 and -min-legal-vector-width=256 into account when costing sext/zext. The check lines marked AVX256 in the zext256/sext256 functions should be closer to the AVX values which would take into account a splitting cost. llvm-svn: 347722	2018-11-28 00:33:34 +00:00
Craig Topper	5e7dcc65bd	[X86] Add exhaustive cost model testing for sext/zext for all vector types we reasonably support. Add cost model tests for truncating to vXi1. Our sext/zext cost modeling was somewhat incomplete. And had no coverage for the fact that avx512bw v32i16/v64i8 types return a scalarization cost. Truncates are a whole different mess because isTruncateFree is returning true for vectors when it shouldn't and that's the fall back for anything not in the tables. llvm-svn: 347719	2018-11-27 22:46:05 +00:00
Craig Topper	e535babe4c	[X86] Add cost model tests for experimental.vector.reduce.* with -x86-experimental-vector-widening-legalization llvm-svn: 347697	2018-11-27 19:44:40 +00:00
Craig Topper	2f9e5c40a6	[X86] Add cost model test for masked load and store with -x86-experimental-vector-widening-legalization llvm-svn: 347696	2018-11-27 19:44:36 +00:00
Craig Topper	69cf86e327	[X86] Add cost model tests for fp_to_int/int_to_fp with -x86-experimental-vector-widening-legalization llvm-svn: 347695	2018-11-27 19:44:34 +00:00
Craig Topper	1a2f9f765e	[X86] Add cost model tests for shifts with -x86-experimental-vector-widening-legalization. llvm-svn: 347694	2018-11-27 19:44:30 +00:00
Fedor Sergeev	8cd9d1b5ce	Revert "[TTI] Reduction costs only need to include a single extract element cost" This reverts commit r346970. It was causing PR39774, a crash in slp-vectorizer on a rather simple loop with just a bunch of 'and's in the body. llvm-svn: 347541	2018-11-26 10:17:27 +00:00
Jonas Paulsson	96782c2c0b	[SystemZTTIImpl] Give correct cost values for vector bswap intrinsics. Implement getIntrinsicInstrCost() and return costs reflecting that bswap can be done with a vperm per vector register. Review: Ulrich Weigand https://reviews.llvm.org/D54789 llvm-svn: 347445	2018-11-22 07:17:29 +00:00
Simon Pilgrim	924f193419	[TTI] Reduction costs only need to include a single extract element cost We were adding the entire scalarization extraction cost for reductions, which returns the total cost of extracting every element of a vector type. For reductions we don't need to do this - we just need to extract the 0'th element after the reduction pattern has completed. Fixes PR37731 Differential Revision: https://reviews.llvm.org/D54585 llvm-svn: 346970	2018-11-15 17:42:53 +00:00
Simon Pilgrim	2b166c5044	[TTI] getOperandInfo - a broadcast shuffle means the result is OK_UniformValue llvm-svn: 346868	2018-11-14 15:04:08 +00:00
Simon Pilgrim	cdb170794b	[CostModel] Add generic expansion funnel shift cost support Add support for the expansion of funnelshift/rotates to getIntrinsicInstrCost. This also required us to move the X86 fshl/fshr costs to the same place as the rotates to avoid expansion and get correct scalarization vs vectorization costs. llvm-svn: 346854	2018-11-14 12:24:50 +00:00
Simon Pilgrim	e827fe09b3	[CostModel][X86] Fix constant vector XOP right shifts We'll constant fold these cases so they are as cheap as vector left shift cases. Noticed while improving funnel shift costs. llvm-svn: 346760	2018-11-13 16:40:10 +00:00
Simon Pilgrim	2fe1076a08	[CostModel][X86] Add more cost tests for funnel shifts Added full uniform/constant coverage for funnel shifts + rotates llvm-svn: 346754	2018-11-13 12:11:15 +00:00
Simon Pilgrim	93c64e5c76	[CostModel][X86] Add funnel shift rotation special case costs When we repeat the 2 shifting operands then this is a bit rotation - annoyingly this has to be done in the other getIntrinsicInstrCost than most intrinsics as we need to check the operands are the same. llvm-svn: 346688	2018-11-12 18:27:54 +00:00
Simon Pilgrim	49e93d2f0e	[CostModel][X86] Add SHLD/SHRD scalar funnel shift costs The costs match the typical reg-reg cases - the RMW case can be a lot slower but we don't model that at this level llvm-svn: 346683	2018-11-12 17:56:59 +00:00
Simon Pilgrim	095ea939c3	[CostModel][X86] Add some initial cost tests for funnel shifts Still need to add full uniform/constant coverage but this is enough to check basic fshl/fshr cost handling llvm-svn: 346670	2018-11-12 16:39:41 +00:00
Simon Pilgrim	f4cd292ba2	[CostModel][X86] SK_ExtractSubvector is cheap if the (legal) subvector is aligned within the source vector llvm-svn: 346664	2018-11-12 15:48:06 +00:00
Jonas Paulsson	5cea85dd59	[SystemZ::TTI] Improve accuracy of costs for vector fp <-> int conversions Improve getCastInstrCost() by respecting the different types of Src and Dst for vector integer <-> fp conversions. This means that extracting from integer becomes more expensive (by the extraction penalty), and the extraction from fp becomes cheaper (no longer has a false extraction penalty). Review: Ulrich Weigand https://reviews.llvm.org/D54423 llvm-svn: 346663	2018-11-12 15:32:27 +00:00
Simon Pilgrim	47d38198eb	[CostModel] Add more realistic SK_InsertSubvector generic costs. Instead of defaulting to a cost = 1, expand to element extract/insert like we do for other shuffles. llvm-svn: 346662	2018-11-12 15:20:24 +00:00
Simon Pilgrim	631f2bf51e	[CostModel] Add more realistic SK_ExtractSubvector generic costs. Instead of defaulting to a cost = 1, expand to element extract/insert like we do for other shuffles. This exposes an issue in LoopVectorize which could call SK_ExtractSubvector with a scalar subvector type. llvm-svn: 346656	2018-11-12 14:25:23 +00:00
Simon Pilgrim	fc8f1d7da7	[CostModel][X86] SK_ExtractSubvector is free if the subvector is at the start of the source vector llvm-svn: 346538	2018-11-09 19:04:27 +00:00
Simon Pilgrim	d0c71609c5	[CostModel] Add SK_ExtractSubvector handling to getInstructionThroughput (PR39368) Add ShuffleVectorInst::isExtractSubvectorMask helper to match shuffle masks. llvm-svn: 346510	2018-11-09 16:28:19 +00:00
Jonas Paulsson	cced2a2775	[SystemZ::TTI] Improve cost handling of uint/sint to fp conversions. Let i8/i16 uint/sint to fp conversions cost 1 if operand is a load. Since the load already does the extension, there is no extra cost (previously returned 2). Review: Ulrich Weigand https://reviews.llvm.org/D54028 llvm-svn: 346009	2018-11-02 17:53:31 +00:00
Jonas Paulsson	6749c24f40	[SystemZ::TTI] Recognize the higher cost of scalar i1 -> fp conversion Scalar i1 to fp conversions are done with a branch sequence, so it should have a higher cost. Review: Ulrich Weigand https://reviews.llvm.org/D53924 llvm-svn: 345818	2018-11-01 09:05:32 +00:00
Jonas Paulsson	f15a53bc81	[SystemZ::TTI] Accurate costs for i1->double vector conversions This factors out a new method getBoolVecToIntConversionCost() containing the code for vector sext/zext of i1, in order to reuse it for i1 to double vector conversions. Review: Ulrich Weigand https://reviews.llvm.org/D53923 llvm-svn: 345817	2018-11-01 09:01:51 +00:00
Simon Pilgrim	44a9a71d2a	[TTI] Fix uses of SK_ExtractSubvector shuffle costs (PR39368) Correct costing of SK_ExtractSubvector requires the SubTy argument to indicate the type/size of the extracted subvector. Unlike the rest of the shuffle kinds this means that the main Ty argument represents the source vector type not the destination! I've done my best to fix a number of vectorizer uses: SLP - the reduction epilogue costs should be using a SK_PermuteSingleSrc shuffle as these all occur at the hardware vector width - we're not extracting (illegal) subvector types. This is causing the cost model diffs as SK_ExtractSubvector costs are poorly handled and tend to just return 1 at the moment. LV - I'm not clear on what the SK_ExtractSubvector should represent for recurrences - I've used a <1 x ?> subvector extraction as that seems to match the VF delta. Differential Revision: https://reviews.llvm.org/D53573 llvm-svn: 345617	2018-10-30 18:10:02 +00:00
Jonas Paulsson	af8e036c29	[SystemZ] Improve isFoldableLoad() for Sub, SDiv and UDiv. Sub, SDiv and UDiv are not commutative, so only the RHS operand can fold a load. This patch adds a check for this. Review: Ulrich Weigand https://reviews.llvm.org/D53791 llvm-svn: 345596	2018-10-30 13:41:03 +00:00
Craig Topper	faf423ca74	[X86] Add -LABEL to some FileCheck checks. NFC llvm-svn: 345407	2018-10-26 17:21:19 +00:00
Jonas Paulsson	b7caa809e1	[SystemZ] Improve getMemoryOpCost() to find foldable loads that are converted. The SystemZ backend can do arithmetic of memory by loading and then extending one of the operands. Similarly, a load + truncate can be folded into an operand. This patch improves the SystemZ TTI cost function to recognize this. Review: Ulrich Weigand https://reviews.llvm.org/D52692 llvm-svn: 345327	2018-10-25 22:28:25 +00:00
Jonas Paulsson	4645711a8d	[SystemZ] Improve handling and cost estimates of vector integer div/rem Enable the DAG optimization that converts vector div/rem with constants into multiply+shifts sequences by expanding them early. This is needed since ISD::SMUL_LOHI is 'Custom' lowered on SystemZ, and will therefore not be available to BuildSDIV after legalization. Better cost values for these instructions based on how they will be implemented (a constant divisor is cheaper). Review: Ulrich Weigand https://reviews.llvm.org/D53196 llvm-svn: 345321	2018-10-25 21:47:22 +00:00
Simon Pilgrim	53e8e145e9	[CostModel][X86] Add realistic vXi64 uitofp vXf64 costs Match codegen improvements from D53649/rL345256 llvm-svn: 345263	2018-10-25 13:06:20 +00:00
Simon Pilgrim	0573b8d8b6	[CostModel][X86] Add realistic i64 uitofp f64 scalar costs llvm-svn: 345261	2018-10-25 12:42:10 +00:00
Simon Pilgrim	ac84005841	[CostModel][X86] Add vXi8 vector division by constants costs. ISD::MULHS/ISD::MULHU lowering of vXi8 types means we expand these in TargetLowering BuildSDIV/BuildUDIV. llvm-svn: 345175	2018-10-24 18:44:12 +00:00
Simon Pilgrim	2cce074e8c	[CostModel][X86] Enable non-uniform vector division by constants costs. Non-uniform division/remainder handling was added back at D49248/D50765 - so share the 'mul+sub' costs that already exist for uniform cases. llvm-svn: 345164	2018-10-24 17:30:29 +00:00
Simon Pilgrim	f04a04c2b6	[TTI][X86] Treat SK_Transpose shuffles as SK_PermuteTwoSrc - there's no difference in lowering. llvm-svn: 345048	2018-10-23 16:45:26 +00:00
Simon Pilgrim	f1d8b7c49e	[CostModel][X86] Add transpose shuffle cost tests llvm-svn: 345045	2018-10-23 16:27:14 +00:00
Simon Pilgrim	9a2ee1ea2d	Add BROADCAST shuffle cost tests. Part of a lot of cleanup necessary before PR39368. llvm-svn: 345025	2018-10-23 13:14:54 +00:00
Simon Pilgrim	051feee5c2	Add BROADCAST shuffle cost tests. Part of a lot of cleanup necessary before PR39368. llvm-svn: 345023	2018-10-23 13:00:22 +00:00
Simon Pilgrim	8c3d87b8cf	[ARM] Regenerate reverse shuffle costs Came about while cleaning up general shuffle costs for PR39368 llvm-svn: 344966	2018-10-22 22:26:00 +00:00
Simon Pilgrim	acd10b9f6c	[CostModel][X86] Add some initial extract/insert subvector shuffle cost tests Just f64/i64 tests initially to demonstrate PR39368 llvm-svn: 344857	2018-10-20 17:38:33 +00:00
Simon Pilgrim	e612ab0c80	[CostModel][X86] Add integer vector reduction cost tests llvm-svn: 344846	2018-10-20 14:29:59 +00:00
Jonas Paulsson	bf66f38705	[SystemZ] Temporarily disable high VFs with integer div/rem. Until mischeduler is clever enough to avoid spilling in a vectorized loop with many (scalar) DLRs it is better to avoid high vectorization factors (8 and above). llvm-svn: 344129	2018-10-10 09:30:29 +00:00
Jonas Paulsson	2c8b33770c	[SystemZ] Take better care when computing needed vector registers in TTI. A new function getNumVectorRegs() is better to use for the number of needed vector registers instead of getNumberOfParts(). This is to make sure that the number of vector registers (and typically operations) required for a vector type is accurate. getNumberOfParts(), which was previously used, works by splitting the vector type until it is legal, which gives incorrect results for types with a non power of two number of elements (rare). A new static function getScalarSizeInBits() also checks for a pointer type and returns 64U for it (since otherwise it gets a value of 0). It is used in a few places where Ty may be a pointer. Review: Ulrich Weigand llvm-svn: 344115	2018-10-10 07:36:27 +00:00
Jonas Paulsson	77df2f2f38	[SystemZ] Adjust cost functions for subtargets that use LI + LOC instead of IPM After recent improvements which make better use of LOC instead of IPM, the TTI cost functions also need to be updated to reflect this. This involves sext, zext and xor of i1. The tests were updated so that for z13 the new costs are expected, while the old costs are still checked for on zEC12. Review: Ulrich Weigand https://reviews.llvm.org/D51339 llvm-svn: 342207	2018-09-14 06:46:55 +00:00
Craig Topper	a72012c206	[X86] Correct the cost of (v4i32 (fptoui (v4f64))) under AVX512F. Summary: This was inheriting the cost from the AVX table, but should be legal under AVX512. Reviewers: RKSimon Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D51267 llvm-svn: 340708	2018-08-26 18:47:44 +00:00
Simon Pilgrim	667a5b541f	[TargetTransformInfo] Add pow2 analysis for scalar constants Add ConstantInt analysis to getOperandInfo so we get more realistic div/rem expansion costs comparable to the vector costs. llvm-svn: 336827	2018-07-11 17:51:27 +00:00
Simon Pilgrim	dc113dc7ed	[CostModel][X86] Add SREM/UREM general and constant costs (PR38056) We penalize general SDIV/UDIV costs but don't do the same for SREM/UREM. This patch makes general vector SREM/UREM x20 as costly as scalar, the same approach as we do for SDIV/UDIV. The patch also extends the existing SDIV/UDIV constant costs for SREM/UREM - at the moment this means the additional cost of a MUL+SUB (see D48975). Differential Revision: https://reviews.llvm.org/D48980 llvm-svn: 336486	2018-07-07 16:53:30 +00:00
Simon Pilgrim	8c3765dc6b	[CostModel][X86] Add UDIV/UREM by pow2 costs Normally InstCombine would have simplified these to SRL/AND instructions but we may still see these during SLP vectorization etc. llvm-svn: 336371	2018-07-05 16:56:28 +00:00
Simon Pilgrim	ac193d4b5c	[CostModel][X86] Add cost tests for fp rounding intrinsics Add cost tests for fp ceil, floor, nearbyint, rint and trunc. llvm-svn: 336122	2018-07-02 17:07:01 +00:00
Adhemerval Zanella	cadcfed7aa	[AArch64] Add custom lowering for v4i8 trunc store This patch adds a custom trunc store lowering for v4i8 vector types. Since there is no v.4b register, the v4i8 is promoted to v4i16 (v.4h) and the default action for v4i8 is to extract each element and issue 4 byte stores. A better strategy would be to extend the promoted v4i16 to v8i16 (with undef elements) and extract and store the word lane which represents the v4i8 subvector. The construction: define void @foo(<4 x i16> %x, i8* nocapture %p) { %0 = trunc <4 x i16> %x to <4 x i8> %1 = bitcast i8* %p to <4 x i8>* store <4 x i8> %0, <4 x i8>* %1, align 4, !tbaa !2 ret void } can be optimized from: umov w8, v0.h[3] umov w9, v0.h[2] umov w10, v0.h[1] umov w11, v0.h[0] strb w8, [x0, #3] strb w9, [x0, #2] strb w10, [x0, #1] strb w11, [x0] ret to: xtn v0.8b, v0.8h str s0, [x0] ret The patch also adjusts the memory cost for autovectorization, so the C code: void foo (const int *src, int width, unsigned char *dst) { for (int i = 0; i < width; i++) *dst++ = *src++; } can be vectorized to: .LBB0_4: // %vector.body // =>This Inner Loop Header: Depth=1 ldr q0, [x0], #16 subs x12, x12, #4 // =4 xtn v0.4h, v0.4s xtn v0.8b, v0.8h st1 { v0.s }[0], [x2], #4 b.ne .LBB0_4 instead of byte operations. llvm-svn: 335735	2018-06-27 13:58:46 +00:00
Simon Pilgrim	9c8f9374b5	[CostModel][AArch64] Add some initial costs for SK_Select and SK_PermuteSingleSrc AArch64 was only setting costs for SK_Transpose, which meant that many of the simpler shuffles (e.g. SK_Select and SK_PermuteSingleSrc for larger vector elements) were being severely overestimated by the default shuffle expansion. This patch adds costs to help improve SLP performance and avoid a regression in reductions introduced by D48174. I'm not very knowledgeable about AArch64 shuffle lowering so I've kept the extra costs to a minimum - someone who knows this code can add extra costs which should improve vectorization a lot more. Differential Revision: https://reviews.llvm.org/D48172 llvm-svn: 335329	2018-06-22 09:45:31 +00:00
Simon Pilgrim	2a9cde026c	[X86][AVX] Reduce v4f64/v4i64 shuffle costs (PR37882) These were being over cautious for costs for one/two op general shuffles - VSHUFPD doesn't have to replicate the same shuffle in both lanes like VSHUFPS does. llvm-svn: 335216	2018-06-21 11:37:13 +00:00
Simon Pilgrim	607a1e2196	[CostModel][AArch64] Add cost tests for ALTERNATE/SELECT style shuffle masks Precursor to fixing a regression with SLP vectorizer for supporting SELECT shuffles (vs the current ALTERNATE) llvm-svn: 334714	2018-06-14 14:20:20 +00:00
Simon Pilgrim	32702cc86a	[CostModel] Recognise REVERSE shuffle mask if the elements come from the second src llvm-svn: 334698	2018-06-14 09:35:00 +00:00
Simon Pilgrim	9fd634db22	[CostModel][X86] Test showing failure to recognise REVERSE shuffle mask if the elements come from the second src llvm-svn: 334623	2018-06-13 17:12:11 +00:00
Simon Pilgrim	54a138a0c5	[CostModel] Recognise BROADCAST shuffle mask if the elements come from the second src llvm-svn: 334620	2018-06-13 16:52:02 +00:00
Simon Pilgrim	5af0b99ea4	[CostModel][X86] Test showing failure to recognise BROADCAST shuffle mask if the elements come from the second src llvm-svn: 334616	2018-06-13 16:33:42 +00:00
Simon Pilgrim	e39fa6cbbb	[CostModel] Replace ShuffleKind::SK_Alternate with ShuffleKind::SK_Select (PR33744) As discussed on PR33744, this patch relaxes ShuffleKind::SK_Alternate which requires shuffle masks to only match an alternating pattern from its 2 sources: e.g. v4f32: <0,5,2,7> or <4,1,6,3> This seems far too restrictive as most SIMD hardware will implement it using a general blend/bit-select instruction, so this patch replaces it with SK_Select, permitting elements from either source as long as they are inline: e.g. v4f32: <0,5,2,7>, <4,1,6,3>, <0,1,6,7>, <4,1,2,3> etc. This initial patch just updates the name and cost model shuffle mask analysis, later patch reviews will update SLP to better utilise this - it still limits itself to SK_Alternate style patterns. Differential Revision: https://reviews.llvm.org/D47985 llvm-svn: 334513	2018-06-12 16:12:29 +00:00
Simon Pilgrim	0783921987	[CostModel] Treat Identity shuffle masks as zero cost As discussed on D47985, identity shuffle masks should probably be free. I've limited this to the case where the input and output types all match - but we could probably accept all cases. Differential Revision: https://reviews.llvm.org/D47986 llvm-svn: 334506	2018-06-12 14:47:13 +00:00
Simon Pilgrim	cfd96329f0	[CostModel][X86] Add extra Identity shuffle mask cost tests (D47986) llvm-svn: 334486	2018-06-12 09:18:13 +00:00
Simon Pilgrim	5297506625	[CostModel][X86] Add 'select' style shuffle costs tests (PR33744) llvm-svn: 334351	2018-06-09 16:08:25 +00:00
Simon Pilgrim	f2f043acbb	[X86][SSE] Use multiplication scale factors for v8i16 SHL on pre-AVX2 targets. Similar to v4i32 SHL, convert v8i16 shift amounts to scale factors instead to improve performance and reduce instruction count. We were already doing this for constant shifts, this adds variable shift support. Reduces the serial nature of the codegen, which relies on chains of pblendvb/pand+pandn+por shifts. This is a step towards adding support for vXi16 vector rotates. Differential Revision: https://reviews.llvm.org/D47546 llvm-svn: 334023	2018-06-05 15:17:39 +00:00
Simon Pilgrim	4162d77744	[TTI] Add uniform/non-uniform constant Pow2 detection to TargetTransformInfo::getInstructionThroughput This enables us to detect more fast path sdiv cases under cost analysis. This patch also enables us to handle non-uniform-constant pow2 cases for X86 SDIV costs. Found while working on D46276 Future patches can then extend the vectorizers to more fully support non-uniform pow2 cases. Differential Revision: https://reviews.llvm.org/D46637 llvm-svn: 332969	2018-05-22 10:40:09 +00:00
Adhemerval Zanella	f384bc7166	[AArch64] Improve cost of vector division by constant With custom lowering for vector MULLH{S,U}, it is now profitable to vectorize a divide by constant loop for the custom types (v16i8, v8i16, and v4i32). The cost is based on TargetLowering::Build{S,U}DIV which uses a multiply by constant plus adjustment to express a divide by constant. Both {u,s}mull{2} are expressed as Instruction::Mul and shifts by Instruction::AShr. llvm-svn: 331873	2018-05-09 12:48:22 +00:00
Simon Pilgrim	fe5c5277ed	[CostModel][X86] Split off SLM checks A future patch will require this and the diff is much better if we perform the split separately. llvm-svn: 331867	2018-05-09 11:42:34 +00:00
Matthew Simpson	b4096ebe26	[TTI, AArch64] Add transpose shuffle kind This patch adds a new shuffle kind useful for transposing a 2xn matrix. These transpose shuffle masks read corresponding even- or odd-numbered vector elements from two n-dimensional source vectors and write each result into consecutive elements of an n-dimensional destination vector. The transpose shuffle kind is meant to model the TRN1 and TRN2 AArch64 instructions. As such, this patch also considers transpose shuffles in the AArch64 implementation of getShuffleCost. Differential Revision: https://reviews.llvm.org/D45982 llvm-svn: 330941	2018-04-26 13:48:33 +00:00
Simon Pilgrim	0ae4bba911	[CostModel][X86] Add div/rem tests for non-uniform constant divisors llvm-svn: 330852	2018-04-25 18:03:31 +00:00
Matthew Simpson	3fd67df3f8	[AArch64] Add cost model test case for transpose This patch adds a cost model test case for vector shuffles having transpose masks. The given costs are inaccurate and will be updated in a follow-on patch. llvm-svn: 330625	2018-04-23 18:21:29 +00:00
Simon Pilgrim	ab9798765c	[CostModel][X86] Add vector element insert/extract cost tests llvm-svn: 330439	2018-04-20 15:26:59 +00:00
Simon Pilgrim	863ffeb750	[CostModel][X86] Add srem/urem constant cost tests llvm-svn: 330436	2018-04-20 15:01:03 +00:00
Simon Pilgrim	8a15d72550	[CostModel][X86] Add SLM/GLM/BtVer2 compare + division/remainder cost tests llvm-svn: 330435	2018-04-20 14:50:34 +00:00
Simon Pilgrim	cd9ccf8824	[CostModel][X86] Split off BtVer2 cost checks llvm-svn: 330433	2018-04-20 13:50:33 +00:00
Simon Pilgrim	25b7782975	[CostModel][X86] Add GoldmontPlus cost tests Just reuses goldmont costs atm llvm-svn: 330432	2018-04-20 13:42:53 +00:00
Simon Pilgrim	34b397a318	[CostModel][X86] Add some specific cpu targets to the cost models We're mostly testing with generic isa attributes, but PR36550 will require testing of specific target's scheduler models as well. llvm-svn: 330056	2018-04-13 19:30:15 +00:00
Simon Pilgrim	3ede11b58c	[CostModel][X86] Split fma arith costs tests from other fp tests Was proving cumbersome to test with/without fma llvm-svn: 330054	2018-04-13 19:12:32 +00:00
Simon Pilgrim	237730a196	[CostModel][X86] Regenerate latency/codesize cost tests llvm-svn: 330052	2018-04-13 18:56:58 +00:00
Simon Pilgrim	0c07ccc4e3	[CostModel][X86] Regenerate cast conversion cost tests llvm-svn: 330051	2018-04-13 18:56:05 +00:00
Simon Pilgrim	e30db80d30	[CostModel][X86] Regenerate masked intrinsic cost tests llvm-svn: 330050	2018-04-13 18:54:16 +00:00
Simon Pilgrim	8a8ff4f6d4	[CostModel][X86] Regenerate vector reduction cost tests with update_analyze_test_checks.py NOTE: We're only really interested in the extractelement cost (which represents the entire reduction). llvm-svn: 329504	2018-04-07 14:20:10 +00:00
Simon Pilgrim	7bd5ff8b4a	[CostModel][X86] Regenerate vector select cost tests with update_analyze_test_checks.py llvm-svn: 329502	2018-04-07 14:09:54 +00:00
Simon Pilgrim	495b660269	[CostModel][X86] Regenerate vector integer truncation cost tests with update_analyze_test_checks.py llvm-svn: 329500	2018-04-07 14:05:35 +00:00
Simon Pilgrim	84d8498fc5	[CostModel][X86] Regenerate silvermont (and added goldmont) cost tests with update_analyze_test_checks.py llvm-svn: 329499	2018-04-07 14:02:14 +00:00
Simon Pilgrim	80ce1dde44	[CostModel][X86] Fix v32i16/v64i8 SETCC costs on AVX512BW targets llvm-svn: 329498	2018-04-07 13:24:33 +00:00
Simon Pilgrim	a49a1b9ccc	[CostModel][X86] Regenerate vector comparison cost tests with update_analyze_test_checks.py llvm-svn: 329497	2018-04-07 12:47:35 +00:00
Simon Pilgrim	61f704e4bd	[CostModel][X86] Regenerate bit count cost tests with update_analyze_test_checks.py llvm-svn: 329413	2018-04-06 16:14:27 +00:00
Simon Pilgrim	63ae5579e7	[CostModel][X86] Regenerate vector shuffle cost tests with update_analyze_test_checks.py llvm-svn: 329410	2018-04-06 16:00:28 +00:00
Simon Pilgrim	d55ad63bfe	[CostModel][X86] Regenerate bswap/bitreverse cost tests with update_analyze_test_checks.py llvm-svn: 329407	2018-04-06 15:46:26 +00:00
Simon Pilgrim	74402acb00	[CostModel][X86] Regenerate integer extension/truncation cost tests with update_analyze_test_checks.py llvm-svn: 329402	2018-04-06 15:28:26 +00:00
Simon Pilgrim	06fba8b204	[CostModel][X86] Regenerate integer division/remainder tests with update_analyze_test_checks.py llvm-svn: 329401	2018-04-06 15:23:26 +00:00
Simon Pilgrim	60fc843fc6	[CostModel][X86] Regenerate vector shift cost tests with update_analyze_test_checks.py llvm-svn: 329400	2018-04-06 15:14:34 +00:00
Simon Pilgrim	ad768585ff	[CostModel][X86] Regenerate int<->fp cost tests with update_analyze_test_checks.py llvm-svn: 329398	2018-04-06 15:12:36 +00:00
Simon Pilgrim	5334a2c571	[UpdateTestChecks] Add update_analyze_test_checks.py for cost model analysis generation The script allows the auto-generation of checks for cost model tests to speed up their creation and help improve coverage, which will help a lot with PR36550. If the need arises we can add support for other analyze passes as well, but the cost models was the one I needed to get done - at the moment it just warns that any other analysis mode is unsupported. I've regenerated a couple of x86 test files to show the effect. Differential Revision: https://reviews.llvm.org/D45272 llvm-svn: 329390	2018-04-06 12:36:27 +00:00
Simon Pilgrim	d152d55ab2	[X86][CostModel] Use generic SSE levels instead of particular CPUs for shuffle costs llvm-svn: 329168	2018-04-04 11:14:12 +00:00
Craig Topper	a985919d3e	[X86] Update cost model for Goldmont. Add fsqrt costs for Silvermont Add fdiv costs for Goldmont using table 16-17 of the Intel Optimization Manual. Also add overrides for FSQRT for Goldmont and Silvermont. Reviewers: RKSimon Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D44644 llvm-svn: 328451	2018-03-25 15:58:12 +00:00
Matthew Simpson	eacfefd056	[AArch64] Implement getArithmeticReductionCost This patch provides an implementation of getArithmeticReductionCost for AArch64. We can specialize the cost of add reductions since they are computed using the 'addv' instruction. Differential Revision: https://reviews.llvm.org/D44490 llvm-svn: 327702	2018-03-16 11:34:15 +00:00
Matthew Simpson	883e96c9d4	[TTI, AArch64] Allow the cost model analysis to test vector reduce intrinsics This patch considers the experimental vector reduce intrinsics in the default implementation of getIntrinsicInstrCost. The cost of these intrinsics is computed with getArithmeticReductionCost and getMinMaxReductionCost. This patch also adds a test case for AArch64 that indicates the costs we currently compute for vector reduce intrinsics. These costs are inaccurate and will be updated in a follow-on patch. Differential Revision: https://reviews.llvm.org/D44489 llvm-svn: 327698	2018-03-16 10:00:30 +00:00
Evandro Menezes	f9bd871d32	[AArch64] Adjust the cost of integer vector division Since there is no instruction for integer vector division, factor in the cost of singling out each element to be used with the scalar division instruction. Differential revision: https://reviews.llvm.org/D43974 llvm-svn: 326955	2018-03-07 22:35:32 +00:00
Simon Pilgrim	74ff6ff437	[X86] Add silvermont fp arithmetic cost model tests Add silvermont to existing high coverage tests instead of repeating in slm-arith-costs.ll llvm-svn: 326747	2018-03-05 22:13:22 +00:00
Simon Pilgrim	9929f90740	[X86][SSE] Reduce FADD/FSUB/FMUL costs on later targets (PR36280) Agner's tables indicate that for SSE42+ targets (Core2 and later) we can reduce the FADD/FSUB/FMUL costs down to 1, which should fix the Himeno benchmark. Note: the AVX512 FDIV costs look rather dodgy, but this isn't part of this patch. Differential Revision: https://reviews.llvm.org/D43733 llvm-svn: 326133	2018-02-26 22:10:17 +00:00
Sanjay Patel	e6143904b9	revert r325515: [TTI CostModel] change default cost of FP ops to 1 (PR36280) There are too many perf regressions resulting from this, so we need to investigate (and add tests for) targets like ARM and AArch64 before trying to reinstate. llvm-svn: 325658	2018-02-21 01:42:52 +00:00
Sanjay Patel	3e8a76abfd	[TTI CostModel] change default cost of FP ops to 1 (PR36280) This change was mentioned at least as far back as: https://bugs.llvm.org/show_bug.cgi?id=26837#c26 ...and I found a real program that is harmed by this: Himeno running on AMD Jaguar gets 6% slower with SLP vectorization: https://bugs.llvm.org/show_bug.cgi?id=36280 ...but the change here appears to solve that bug only accidentally. The div/rem costs for x86 look very wrong in some cases, but that's already true, so we can fix those in follow-up patches. There's also evidence that more cost model changes are needed to solve SLP problems as shown in D42981, but that's an independent problem (though the solution may be adjusted after this change is made). Differential Revision: https://reviews.llvm.org/D43079 llvm-svn: 325515	2018-02-19 16:11:44 +00:00
Simon Pilgrim	cb9a02f60e	[X86][SSE] Increase PMULLD costs to better match hardware Until Skylake, most hardware could only issue a PMULLD op every other cycle llvm-svn: 324823	2018-02-10 19:27:10 +00:00
Yaxun Liu	2a22c5deff	[AMDGPU] Switch to the new addr space mapping by default This requires corresponding clang change. Differential Revision: https://reviews.llvm.org/D40955 llvm-svn: 324101	2018-02-02 16:07:16 +00:00
Craig Topper	d58c165545	[X86] Make v2i1 and v4i1 legal types without VLX Summary: There are few oddities that occur due to v1i1, v8i1, v16i1 being legal without v2i1 and v4i1 being legal when we don't have VLX. Particularly during legalization of v2i32/v4i32/v2i64/v4i64 masked gather/scatter/load/store. We end up promoting the mask argument to these during type legalization and then have to widen the promoted type to v8iX/v16iX and truncate it to get the element size back down to v8i1/v16i1 to use a 512-bit operation. Since need to fill the upper bits of the mask we have to fill with 0s at the promoted type. It would be better if we could just have the v2i1/v4i1 types as legal so they don't undergo any promotion. Then we can just widen with 0s directly in a k register. There are no real v4i1/v2i1 instructions anyway. Everything is done on a larger register anyway. This also fixes an issue that we couldn't implement a masked vextractf32x4 from zmm to xmm properly. We now have to support widening more compares to 512-bit to get a mask result out so new tablegen patterns got added. I had to hack the legalizer for widening the operand of a setcc a bit so it didn't try create a setcc returning v4i32, extract from it, then try to promote it using a sign extend to v2i1. Now we create the setcc with v4i1 if the original setcc's result type is v2i1. Then extract that and don't sign extend it at all. There's definitely room for improvement with some follow up patches. Reviewers: RKSimon, zvi, guyblank Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D41560 llvm-svn: 321967	2018-01-07 18:20:37 +00:00
Craig Topper	7034d401f8	[X86] Use mattr instead of mcpu in some of the cost model tests. Based on the names of the check lines, features seem more appropriate than cpu. Spotted while prototyping my patch to make 512-bit vectors illegal on SKX sometimes. llvm-svn: 320959	2017-12-18 07:21:58 +00:00
Craig Topper	fbf7b3bf3e	[X86] Promote fp_to_sint v16f32->v16i16/v16i8 to avoid scalarization. llvm-svn: 319266	2017-11-29 00:32:09 +00:00
Mohammed Agabaria	115f68ea3e	[LV][X86] Support of AVX2 Gathers code generation and update the LV with this This patch depends on: https://reviews.llvm.org/D35348 Support of pattern selection of masked gathers of AVX2 (X86\AVX2 code gen) Update LoopVectorize to generate gathers for AVX2 processors. Reviewers: delena, zvi, RKSimon, craig.topper, aaboud, igorb Reviewed By: delena, RKSimon Differential Revision: https://reviews.llvm.org/D35772 llvm-svn: 318641	2017-11-20 08:18:12 +00:00
Mohammed Agabaria	6e6d5326a1	[TTI][X86] update costs of interleaved load\store of i64\double This patch contains more accurate cost of interleaved load\store of stride 2 for the types int64\double on AVX2. Reviewers: delena, RKSimon, craig.topper, dorit Reviewed By: dorit Differential Revision: https://reviews.llvm.org/D40008 llvm-svn: 318385	2017-11-16 09:38:32 +00:00
Mohammed Agabaria	6691758364	[LV][X86] update the cost of interleaving mem. access of floats Recommit: This patch contains update of the costs of interleaved loads of v8f32 of stride 3 and 8. fixed the location of the lit test it works with make check-all. Differential Revision: https://reviews.llvm.org/D39403 llvm-svn: 317471	2017-11-06 10:56:20 +00:00
Mohammed Agabaria	acd69dbc7c	[REVERT][LV][X86] update the cost of interleaving mem. access of floats reverted my changes will be committed later after fixing the failure This patch contains update of the costs of interleaved loads of v8f32 of stride 3 and 8. Differential Revision: https://reviews.llvm.org/D39403 llvm-svn: 317433	2017-11-05 09:36:54 +00:00
Mohammed Agabaria	f74c767de6	[LV][X86] update the cost of interleaving mem. access of floats This patch contains update of the costs of interleaved loads of v8f32 of stride 3 and 8. Differential Revision: https://reviews.llvm.org/D39403 llvm-svn: 317432	2017-11-05 09:06:23 +00:00
Michael Zuckerman	49293264cc	[AVX512][AVX2]Cost calculation for interleave load/store patterns {v8i8,v16i8,v32i8,v64i8} This patch adds accurate instructions cost. The formula presents two cases(stride 3 and stride 4) and calculates the cost according to the VF and stride. Reviewers: 1. delena 2. Farhana 3. zvi 4. dorit 5. Ayal Differential Revision: https://reviews.llvm.org/D38762 Change-Id: If4cfbd4ac0e63694e8144cb78c7fa34850647ff7 llvm-svn: 316072	2017-10-18 11:41:55 +00:00
Daniel Jasper	3344a21236	Revert r314923: "Recommit : Use the basic cost if a GEP is not used as addressing mode" Significantly reduces performance (~30%) of gipfeli (https://github.com/google/gipfeli) I have not yet managed to reproduce this regression with the open-source version of the benchmark on github, but will work with others to get a reproducer to you later today. llvm-svn: 315680	2017-10-13 14:04:21 +00:00
Justin Lebar	f84f7c7467	Convert an APInt to int64_t properly in TTI::getGEPCost(). Summary: If the pointer width is 32 bits and the calculated GEP offset is negative, we call APInt::getLimitedValue(), which does a zero-extension of the offset. That's wrong -- we should do an sext. Fixes a bug introduced in rL314362 and found by Evgeny Astigeevich. Reviewers: efriedma Subscribers: sanjoy, javed.absar, llvm-commits, eastig Differential Revision: https://reviews.llvm.org/D38557 llvm-svn: 314935	2017-10-04 20:47:33 +00:00
Guozhi Wei	eb301875b8	[TargetTransformInfo] Check if function pointer is valid before calling isLoweredToCall Function isLoweredToCall can only accept non-null function pointer, but a function pointer can be null for indirect function call. So check it before calling isLoweredToCall from getInstructionLatency. Differential Revision: https://reviews.llvm.org/D38204 llvm-svn: 314927	2017-10-04 20:14:08 +00:00
Jun Bum Lim	d40e03c2d8	Recommit : Use the basic cost if a GEP is not used as addressing mode Recommitting r314517 with the fix for handling ConstantExpr. Original commit message: Currently, getGEPCost() returns TCC_FREE whenever a GEP is a legal addressing mode in the target. However, since it doesn't check its actual users, it will return FREE even in cases where the GEP cannot be folded away as a part of actual addressing mode. For example, if an user of the GEP is a call instruction taking the GEP as a parameter, then the GEP may not be folded in isel. llvm-svn: 314923	2017-10-04 18:33:52 +00:00
Craig Topper	360c816cbb	[X86] Add AVX512 check lines to the cost model truncate test. llvm-svn: 314758	2017-10-03 03:47:34 +00:00
Alex Shlyapnikov	e76aa3b0b2	Revert "Use the basic cost if a GEP is not used as addressing mode" This reverts commit r314517. This commit crashes sanitizer bots, for example: http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux/builds/4167 Stack snippet: ... /mnt/b/sanitizer-buildbot1/sanitizer-x86_64-linux/build/llvm/include/llvm/Support/Casting.h:255:0 llvm::TargetTransformInfoImplCRTPBase<llvm::X86TTIImpl>::getGEPCost(llvm::GEPOperator const, llvm::ArrayRef<llvm::Value const>) /mnt/b/sanitizer-buildbot1/sanitizer-x86_64-linux/build/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h:742:0 llvm::TargetTransformInfoImplCRTPBase<llvm::X86TTIImpl>::getUserCost(llvm::User const, llvm::ArrayRef<llvm::Value const>) /mnt/b/sanitizer-buildbot1/sanitizer-x86_64-linux/build/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h:782:0 /mnt/b/sanitizer-buildbot1/sanitizer-x86_64-linux/build/llvm/lib/Analysis/TargetTransformInfo.cpp:116:0 /mnt/b/sanitizer-buildbot1/sanitizer-x86_64-linux/build/llvm/include/llvm/ADT/SmallVector.h:116:0 /mnt/b/sanitizer-buildbot1/sanitizer-x86_64-linux/build/llvm/include/llvm/ADT/SmallVector.h:343:0 /mnt/b/sanitizer-buildbot1/sanitizer-x86_64-linux/build/llvm/include/llvm/ADT/SmallVector.h:864:0 /mnt/b/sanitizer-buildbot1/sanitizer-x86_64-linux/build/llvm/include/llvm/Analysis/TargetTransformInfo.h:285:0 ... llvm-svn: 314560	2017-09-29 22:04:45 +00:00
Jun Bum Lim	0e16a59e83	Use the basic cost if a GEP is not used as addressing mode Summary: Currently, getGEPCost() returns TCC_FREE whenever a GEP is a legal addressing mode in the target. However, since it doesn't check its actual users, it will return FREE even in cases where the GEP cannot be folded away as a part of actual addressing mode. For example, if an user of the GEP is a call instruction taking the GEP as a parameter, then the GEP may not be folded in isel. Reviewers: hfinkel, efriedma, mcrosier, jingyue, haicheng Reviewed By: hfinkel Subscribers: javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D38085 llvm-svn: 314517	2017-09-29 14:50:16 +00:00
Justin Lebar	8ea84426c9	Check for overflows when calculating the offset in GetGEPCost. Summary: This avoids C++ UB if the GEP is weird and the calculation overflows int64_t, and it's also observable in the cost model's results. Such GEPs are almost surely not valid pointers, but LLVM nonetheless generates them sometimes. Reviewers: sanjoy Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D38337 llvm-svn: 314362	2017-09-27 23:16:56 +00:00
Guozhi Wei	bce228ca42	[TargetTransformInfo] Handle intrinsic call in getInstructionLatency() Usually an intrinsic is a simple target instruction, it should have a small latency. A real function call has much larger latency. So handle the intrinsic call in function getInstructionLatency(). Differential Revision: https://reviews.llvm.org/D38104 llvm-svn: 314003	2017-09-22 18:25:53 +00:00
Guozhi Wei	3d1305f6da	[TargetTransformInfo] Static alloca has 0 cost Static alloca usually doesn't generate any machine instructions, so it has 0 cost. Differential Revision: https://reviews.llvm.org/D37879 llvm-svn: 313410	2017-09-15 22:28:12 +00:00
Guozhi Wei	21f8fad909	[TargetTransformInfo] Detect 0 latency instructions For instructions that unlikely generate machine instructions, they should also have 0 latency. Differential Revision: https://reviews.llvm.org/D37833 llvm-svn: 313288	2017-09-14 19:20:02 +00:00
Guozhi Wei	62d6414465	[TargetTransformInfo] Add a new public interface getInstructionCost Current TargetTransformInfo can support throughput cost model and code size model, but sometimes we also need instruction latency cost model in different optimizations. Hal suggested we need a single public interface to query the different cost of an instruction. So I proposed following interface: enum TargetCostKind { TCK_RecipThroughput, ///< Reciprocal throughput. TCK_Latency, ///< The latency of instruction. TCK_CodeSize ///< Instruction code size. }; int getInstructionCost(const Instruction *I, enum TargetCostKind kind) const; All clients should mainly use this function to query the cost of an instruction, parameter <kind> specifies the desired cost model. This patch also provides a simple default implementation of getInstructionLatency. The default getInstructionLatency provides latency numbers for only small number of instruction classes, those latency numbers are only reasonable for modern OOO processors. It can be extended in following ways: Add more detail into this function. Add getXXXLatency function and call it from here. Implement target specific getInstructionLatency function. Differential Revision: https://reviews.llvm.org/D37170 llvm-svn: 312832	2017-09-08 22:29:17 +00:00
Zvi Rackover	25799d93f0	X86: Improve AVX512 fptoui lowering Summary: Add patterns for fptoui <16 x float> to <16 x i8> fptoui <16 x float> to <16 x i16> Reviewers: igorb, delena, craig.topper Reviewed By: craig.topper Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D37505 llvm-svn: 312704	2017-09-07 07:40:34 +00:00
Matt Arsenault	376f1bd73c	AMDGPU: Don't assert in TTI with fp32 denorms enabled Also refine for f16 and rcp cases. llvm-svn: 312213	2017-08-31 05:47:00 +00:00
Simon Pilgrim	c63f93a197	[CostModel][X86][XOP] Improve costs for XOP shuffles VPPERM/VPERMIL2PD/VPERMIL2PS all provide more effective 2-input shuffles than regular AVX instructions llvm-svn: 311005	2017-08-16 13:50:20 +00:00
Simon Pilgrim	b59c2d9d73	[CostModel][X86] Add SSE2 two-src shuffle costs llvm-svn: 310654	2017-08-10 19:32:35 +00:00
Simon Pilgrim	7354531b82	[CostModel][X86] Add avx1 two-src shuffle costs llvm-svn: 310650	2017-08-10 19:02:51 +00:00
Simon Pilgrim	ac2e50a4ca	[CostModel][X86] Add avx2 two-src shuffle costs llvm-svn: 310645	2017-08-10 18:29:34 +00:00
Simon Pilgrim	2f529412e1	[CostModel][X86] Extend two src shuffle cost tests Cover most 128/256/512/1024-bit cases for vXf64/vXi64, vXf32/vXi32, vXi16 + vXi8 llvm-svn: 310641	2017-08-10 18:02:45 +00:00
Simon Pilgrim	fe67612eba	[CostModel][X86] Add avx512vbmi broadcast/reverse/single-src shuffle cost tests llvm-svn: 310633	2017-08-10 17:33:25 +00:00
Simon Pilgrim	702e5fa391	[CostModel][X86] Improve single src shuffle costs Add missing SK_PermuteSingleSrc costs for AVX2 targets and earlier, also added some of the simpler SK_PermuteTwoSrc costs to support splitting of SK_PermuteSingleSrc shuffles llvm-svn: 310632	2017-08-10 17:27:20 +00:00
Simon Pilgrim	419215abb7	[CostModel][X86] Added v2f64/v2i64 single src shuffle model tests Fixed label checks for all prefixes llvm-svn: 310606	2017-08-10 15:25:08 +00:00
Ulrich Weigand	33435c4c9c	[SystemZ] Add support for IBM z14 processor (2/3) This adds support for the new 32-bit vector float instructions of z14. This includes: - Enabling the instructions for the assembler/disassembler. - CodeGen for the instructions, including new LLVM intrinsics. - Scheduler description support for the instructions. - Update to the vector cost function calculations. In general, CodeGen support for the new v4f32 instructions closely matches support for the existing v2f64 instructions. llvm-svn: 308195	2017-07-17 17:42:48 +00:00
Mohammed Agabaria	eb09a810e6	[X86][CM] update add\sub costs of vectors of 64 in X86\SLM arch this patch updates the cost of addq\subq (add\subtract of vectors of 64bits) based on the performance numbers of SLM arch. Differential Revision: https://reviews.llvm.org/D33983 llvm-svn: 306974	2017-07-02 12:16:15 +00:00
Dorit Nuzman	e0e0f1ddb0	[AVX2] [TTI CostModel] Add cost of interleaved loads/stores for AVX2 The cost of an interleaved access was only implemented for AVX512. For other X86 targets an overly conservative Base cost was returned, resulting in avoiding vectorization where it is actually profitable to vectorize. This patch starts to add costs for AVX2 for most prominent cases of interleaved accesses (stride 3,4 chars, for now). Note1: Improvements of up to ~4x were observed in some of EEMBC's rgb workloads; There is also a known issue of 15-30% degradations on some of these workloads, associated with an interleaved access followed by type promotion/widening; the resulting shuffle sequence is currently inefficient and will be improved by a series of patches that extend the X86InterleavedAccess pass (such as D34601 and more to follow). Note 2: The costs in this patch do not reflect port pressure penalties which can be very dominant in the case of interleaved accesses since most of the shuffle operations are restricted to a single port. Further tuning, that may incorporate these considerations, will be done on top of the upcoming improved shuffle sequences (that is, along with the abovementioned work to extend X86InterleavedAccess pass). Differential Revision: https://reviews.llvm.org/D34023 llvm-svn: 306238	2017-06-25 08:26:25 +00:00
Simon Pilgrim	68204b83a7	[CostModel][X86] Add scalar arithmetic cost tests llvm-svn: 305810	2017-06-20 17:10:27 +00:00
Simon Pilgrim	36c17935e4	[CostModel][X86] Declare costs variables based on type The alphabetical progression isn't that useful llvm-svn: 305808	2017-06-20 17:04:46 +00:00
Matthew Simpson	6349380fa4	Revert r291254: [AArch64] Reduce vector insert/extract cost for Falkor The default vector insert/extract cost is more profitable on Falkor than the reduced cost. llvm-svn: 303771	2017-05-24 16:48:39 +00:00
Simon Pilgrim	c74e7f0a42	Fix line-endings. llvm-svn: 303448	2017-05-19 19:47:29 +00:00
Simon Pilgrim	6bba6068be	[X86][AVX512] Add 512-bit vector ctpop costs + tests llvm-svn: 303342	2017-05-18 10:42:34 +00:00
Simon Pilgrim	23ef26728a	[X86][AVX512] Add 512-bit vector ctlz costs + tests llvm-svn: 303300	2017-05-17 21:02:18 +00:00
Simon Pilgrim	d0365967c4	[X86][AVX512] Add 512-bit vector cttz costs + tests llvm-svn: 303293	2017-05-17 20:22:54 +00:00
Simon Pilgrim	91b46c99be	[X86] Split ctpop/ctlz/cttz cost tests This will make things a lot easier to test all the permutations of avx512 llvm-svn: 303290	2017-05-17 19:57:20 +00:00
Simon Pilgrim	a9a92a1a6a	[X86][AVX512] Add 512-bit vector bitreverse costs + tests llvm-svn: 303283	2017-05-17 19:20:20 +00:00

495 Commits