llvm-project

Commit Graph

Author	SHA1	Message	Date
David Green	75631438e3	[AArch64] Costmodel tests for llvm.vscale intrinsics. NFC These show that the cost of a @llvm.vscale is indeed 1, not 10.	2022-05-26 10:16:21 +01:00
Simon Pilgrim	6c80267d0f	[CostModel][X86] getScalarizationOverhead - improve extraction costs for > 128-bit vectors We were using the default getScalarizationOverhead expansion for extraction costs, which adds up all the individual element extraction costs. This is fine for 128-bit vectors, but for 256/512-bit vectors each element extraction also has to account for extracting the upper 128-bit subvector before it can handle the element. For scalarization costs we only need to extract each demanded subvector once. Differential Revision: https://reviews.llvm.org/D125527	2022-05-24 15:18:08 +01:00
David Green	b4dd9fc370	[ARM] Cost modelling for MVE vector fptoi_sat Building on top of D125665, this adds MVE costs for fptosi.sat and fptoui.sat, providing MVE is available and the types are legal. Differential Revision: https://reviews.llvm.org/D125666	2022-05-20 11:00:34 +01:00
David Green	80aab0312a	[ARM] Cost modelling for scalar fptoi_sat Similar to D124357, this adds some cost modelling for fptoi_sat for Arm targets. Where VFP2 is available (and FP64/FP16 for the relevant types), the operations are legal as the Arm instructions naturally saturate. Otherwise they will need an extra smin/smax clamp, similar to AArch64. Differential Revision: https://reviews.llvm.org/D125665	2022-05-19 19:53:21 +01:00
David Green	4c6a070a2c	[AArch64] Teach perfect shuffles tables about D-lane movs Similar to D123386, this adds D-Movs to the AArch64 perfect shuffle tables, lowering the costs a little more. This is a rough improvement in general, especially if you ignore mov v0.16b, v2.16b type moves that are often artefacts of the calling convention. The D register movs are encoded as (0x4 | LaneIdx), and to generate a D register move we are required to bitcast into a higher type, but it is otherwise very similar to the S-lane movs already supported. Differential Revision: https://reviews.llvm.org/D125477	2022-05-17 18:16:45 +01:00
Simon Pilgrim	47be07074a	[CostModel][X86] Auto generate partial interleaved load LV costs using UTC_ARGS --filter control	2022-05-12 17:46:41 +01:00
Simon Pilgrim	14e83ada16	[CostModel][X86] Auto generate masked load/store LV costs using UTC_ARGS --filter control Also fix a sse42 -> sse4.2 typo so that we actually test costs for sse4.2	2022-05-12 17:40:40 +01:00
Simon Pilgrim	a5c45c4dc1	[CostModel][X86] Auto generate gather/scatter LV costs using UTC_ARGS --filter control Also fix a sse42 -> sse4.2 typo so that we actually test costs for sse4.2	2022-05-12 17:39:06 +01:00
Craig Topper	39e63bd2d8	[IR][CostModel] A scalable vector shuffle can't be an identity or reverse shuffle. Even if the minimum number of elements is 1 and the length doesn't change, we don't know what vscale is so we can't classify it as identity mask. Instead it is a zero element splat. For reverse, we shouldn't classify it as a reverse unless there are at least 2 elements in the mask. This applies to both fixed and scalable vectors. For fixed vectors, a single element would be an identity shuffle. For scalable vector it's a zero elt splat. Reviewed By: sdesmalen, liaolucy Differential Revision: https://reviews.llvm.org/D124655	2022-05-09 21:37:25 -07:00
David Green	dccc69a38d	[AArch64] Add extra reverse costs. This adds some extra costs for reverse shuffles under AArch64, filling in the i16/f16/i8 gaps in the cost model. Differential Revision: https://reviews.llvm.org/D124786	2022-05-06 18:23:36 +01:00
Simon Pilgrim	3d107ce2b2	[CostModel][X86] Relax fcmp costs on SSE41 targets or later Only pre-SSE41 targets double-pump the fp comparison ops	2022-05-06 13:29:40 +01:00
Simon Pilgrim	cbfa857346	[CostModel][X86] Adjust 128-bit select costs to account for slow BLENDV op Based off the script from D103695 - Jaguar, Bulldozer, Silvermont (et al) and Haswell all have slow BLENDV ops, so adjust the worst case cost values	2022-05-06 13:07:34 +01:00
Simon Pilgrim	d21bf51494	[CostModel][X86] Adjust pre-SSE41 fp scalar select costs to account for vector ops Based off the script from D103695, we now mainly use BLENDV or OR(AND,ANDN) to select scalar float/double ops	2022-05-06 11:41:55 +01:00
Simon Pilgrim	f0e8c1d6d9	[CostModel][X86] Adjust 256-bit select costs to account for slow BLENDV op Based off the script from D103695, on AVX1, Jaguar/Bulldozer both have low throughput for ymm select patterns (BLENDV + OR(AND,ANDN)), and even on AVX2 Haswell still struggles with BLENDV ops	2022-05-06 11:27:37 +01:00
Simon Pilgrim	4236a10717	[CostModel][X86] Add more complete float/double select cost test coverage We were only testing basic vector types	2022-05-06 10:45:36 +01:00
Peter Waller	75f9e83ace	[AArch64] Add -aarch64-insert-extract-base-cost The new flag -aarch64-insert-extract-base-cost can be used to set the value of AArch64Subtarget::getVectorInsertExtractBaseCost(), for the purposes of experimentation. Differential Revision: https://reviews.llvm.org/D124835	2022-05-05 10:35:45 +00:00
David Green	2dcb2d8562	[AArch64] Cost modelling for fptoi_sat This builds on top of the target-independent cost model added in D124269 to add aarch64 specific costs for fptoui_sat and fptosi_sat intrinsics. For many common types they will be legal instructions as the AArch64 instructions will saturate naturally. For unsupported pairs of integer and floating point types, an additional min/max clamp is needed. Differential Revision: https://reviews.llvm.org/D124357	2022-05-02 11:36:05 +01:00
Simon Pilgrim	86bb7df6e6	[CostModel][X86] getScalarizationOverhead - handle vXi1 extracts with MOVMSK (pre-AVX512) We can quickly extract multiple elements of a bool vector using MOVMSK ops - since we don't know what generated the vXi1, I've been optimistic and assumed we can use PMOVMSKB to extract the maximum number of bools with a single op. The MOVMSK pattern isn't great for extract+insert round trips as vXi1 type legalization can interfere with this a lot - so this relies on us remaining good at using getScalarizationOverhead properly (and tagging both Insert and Extract modes) for those round trip cases. The AVX512 KMOV codegen for bool extraction is a bit of a mess so for now I've not included that - the per-element cost is a lot more accurate for current codegen.	2022-05-02 09:58:39 +01:00
David Green	986de8f50b	[AArch64] Add more comprehensive reverse shuffle costmodel tests. NFC	2022-05-02 09:16:57 +01:00
Simon Pilgrim	d5198cf92f	[CostModel][X86] Check for 'null op' truncations If the legalized src/dst types are the same, assume the "truncation" is free. This fixes some edge cases such as mul lo/hi ops and bool vectors which will get legalized back to legal vector widths	2022-05-01 12:03:40 +01:00
Simon Pilgrim	c2964746e3	[CostModel][X86] Reduce cost of vector selects on SSE2/AVX1 targets Based off the script from D103695, we were exaggerating the cost of the OR(AND(X,M),AND(Y,~M)) expansion using instruction count instead of effective throughput	2022-05-01 09:32:14 +01:00
Alexey Bataev	371412e065	[COST]Fix crash for non-power-2 vector shuffle mask. Need to normalize the mask to avoid possible crashes during attempts to estimate the cost of very long shuffles with a non-power-2 number of elements in masks.	2022-04-29 07:28:07 -07:00
LiaoChunyu	03a3654203	[RISCV] Add cost model for SK_Broadcast Add cost model for broadcast shuffle in RISCVTTIImpl::getShuffleCost with scalable vector. The cost model might not be the best. For scalable vectors, BasicTTIImpl::getShuffleCost returns an invalid cost, so this patch relies on the existing cost model in BasicTTIImpl. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D124101	2022-04-29 13:28:02 +08:00
Alexey Bataev	75e1cf4a6a	[COST]Improve cost model for shuffles in SLP. Introduced masks where they are not added and improved target dependent cost models to avoid returning of the incorrect cost results after adding masks. Differential Revision: https://reviews.llvm.org/D100486	2022-04-28 10:04:41 -07:00
Alexey Bataev	ac23cf738a	[COST][NFC]Add a test for non-power-2 shuffles, NFC.	2022-04-28 09:08:28 -07:00
Alexey Bataev	9861ca0c23	Revert "[COST]Improve cost model for shuffles in SLP." This reverts commit `29a470e380` to fix a crash reported in https://reviews.llvm.org/D100486#3479989.	2022-04-28 08:11:56 -07:00
David Green	05b0a49832	[AArch64] Add a fp128 shuffle test. NFC These legalize to scalar types, so it's useful to have a test case that covers them.	2022-04-28 14:28:45 +01:00
Alexey Bataev	29a470e380	[COST]Improve cost model for shuffles in SLP. Introduced masks where they are not added and improved target dependent cost models to avoid returning of the incorrect cost results after adding masks. Differential Revision: https://reviews.llvm.org/D100486	2022-04-27 10:56:26 -07:00
Vasileios Porpodas	c7bb5ac5ca	[NFC] Renamed /test/Analysis/CostModel/X86/splat-load.ll test and added more checks. Renamed test/Analysis/CostModel/X86/splat-load.ll to shuffle-load.ll to align it with AArch64's similar test. Also added a complete list of checks for all vector combinations up to 512-bits. Differential Revision: https://reviews.llvm.org/D124528	2022-04-27 09:47:43 -07:00
David Green	8e2a0e61f5	[AArch64] Break up larger shuffle-masks into legal sizes in getShuffleCost Given a larger-than-legal shuffle mask, the final codegen will split into multiple sub-vectors. This attempts to model that in AArch64TTIImpl::getShuffleCost, splitting masks up according to the size of the legalized vectors. If the sub-masks have at most 2 input sources we can call getShuffleCost on them and sum the costs, to get a more accurate final cost for the entire shuffle. The call to improveShuffleKindFromMask helps to improve the shuffle kind for the sub-mask cost call. Differential Revision: https://reviews.llvm.org/D123414	2022-04-27 13:51:50 +01:00
David Green	d42f222f9d	[AArch64] Add some larger shuffle cost tests. NFC	2022-04-27 13:30:50 +01:00
David Green	d6327050e0	[AArch64] Use PerfectShuffle costs in AArch64TTIImpl::getShuffleCost Given a shuffle with 4 elements size 16 or 32, we can use the costs directly from the PerfectShuffle tables to get a slightly more accurate cost for the resulting shuffle. Differential Revision: https://reviews.llvm.org/D123409	2022-04-27 12:09:01 +01:00
David Green	4a8c13a6f4	[CostModel] Add basic fptoi_sat costs This adds some basic fptosi_sat and fptoui_sat target independent cost modelling. The fptosi_sat is modelled as a fmin/fmax to saturate the value, followed by a fp convert. The signed values then have an additional fcmp+select for handling NaN correctly. The AArch64/Arm costs may be more incorrect, as the instructions exist natively. This can be fixed with target specific cost updates. Differential Revision: https://reviews.llvm.org/D124269	2022-04-27 09:30:00 +01:00
Vasileios Porpodas	fa8a9fea47	Recommit "[SLP][TTI] Refactoring of `getShuffleCost` `Args` to work like `getArithmeticInstrCost`" This reverts commit `6a9bbd9f20`. Code review: https://reviews.llvm.org/D124202	2022-04-26 14:02:40 -07:00
Vasileios Porpodas	6a9bbd9f20	Revert "[SLP][TTI] Refactoring of `getShuffleCost` `Args` to work like `getArithmeticInstrCost`" This reverts commit `55ce296d6f`.	2022-04-26 11:25:26 -07:00
Vasileios Porpodas	55ce296d6f	[SLP][TTI] Refactoring of `getShuffleCost` `Args` to work like `getArithmeticInstrCost` Before this patch `Args` was used to pass a broadcast's arguments by SLP. This patch changes this. `Args` is now used for passing the operands of the shuffle. Differential Revision: https://reviews.llvm.org/D124202	2022-04-26 11:11:29 -07:00
Vasileios Porpodas	957ada4164	[AArch64][NFC] Deleted llvm/test/Analysis/CostModel/AArch64/splat-load.ll test This test is no longer necessary as it is a subset of: llvm/test/Analysis/CostModel/AArch64/shuffle-load.ll Differential Revision: https://reviews.llvm.org/D124456	2022-04-26 10:22:11 -07:00
David Green	1159984802	[CostModel] Add fptoi_sat costmodel tests. NFC	2022-04-25 18:44:35 +01:00
David Green	091c2f953d	[AArch64] Add some splat of load cost model tests. NFC They do not work yet, but we can hopefully adjust the cost for them to get them to be recognized	2022-04-22 09:38:06 +01:00
Vasileios Porpodas	e83ad23daf	[TTI] Pre-commit cost model tests splat-loads.	2022-04-21 14:45:51 -07:00
Roman Lebedev	be5c15c7ae	[NFC][Costmodel][LV][X86] Refresh one or two interleaved load/store tests	2022-04-15 17:43:18 +03:00
LiaoChunyu	505fce5a9e	[RISCV] Add basic code modeling for llvm.experimental.stepvector intrinsic The llvm.experimental.stepvector intrinsic on scalable vectors will crash due to an invalid cost when the code is run through loopunroll. Reviewed By: kito-cheng Differential Revision: https://reviews.llvm.org/D122782	2022-04-11 10:19:23 +08:00
David Green	fa784f6382	[AArch64] Insert subvector costs An insert subvector under aarch64 can often be done as a single lane mov operation. For example a v4i8 inserted into a v16i8 is a s-reg mov, so long as the index is a multiple of 4. This teaches the cost model that, using code copied over from the X86 backend. Some of the costs (v16i16_4_0) are still high because they get matched as a SK_Select, not an SK_InsertSubvector. D120879 has some codegen tests for inserting subvectors, which were added as llvm/test/CodeGen/AArch64/insert-subvector.ll. Differential Revision: https://reviews.llvm.org/D120880	2022-04-07 19:27:41 +01:00
David Green	750bf3582a	[AArch64] Increase cost of v2i64 multiplies The cost of a v2i64 multiply was special cased in D92208 as scalarized into 4*extract + 2*insert + 2*mul. Scalarizing to/from GPRs is expensive though, and the cost wasn't high enough to prevent vectorizing in places where it can be detrimental for performance. This increases it so that the cost of copying to/from GPRs is increased to 2 each, with the total cost increasing to 14. So long as umull/smull are handled correctly (as in D123006) this seems to lead to better vectorization factors and better performance. Differential Revision: https://reviews.llvm.org/D123007	2022-04-04 17:42:20 +01:00
David Green	2abaa027d9	[AArch64] Teach the costmodel about widening muls A vector mul(sext, sext) or mul(zext, zext) will be code generated as a single smull or umull instruction. This most notably affects v2i64 multiplies, which are otherwise not legal and need to be expanded. The oneuse check has also been slightly changed, as it is already checked from the use of isWideningInstruction in getCastInstrCost. Differential Revision: https://reviews.llvm.org/D123006	2022-04-04 12:45:04 +01:00
David Green	2e2f38a1ac	[AArch64] Add widening arithmetic cost tests. NFC	2022-04-04 12:19:45 +01:00
Simon Pilgrim	d663166acb	[CostModel][X86] Reduce cost of v2i64 icmp base cost on SSE2 targets Based off the script from D103695, we were exaggerating the cost of the v2i64 comparison expansion using instruction count instead of effective throughput	2022-03-30 09:11:55 +01:00
Arthur Eubanks	d051c566cd	[test] Remove the last couple uses of -analyze in llvm/test	2022-03-23 11:31:12 -07:00
David Green	c56dd20f69	[AArch64] Add extra insert subvector cost model tests. NFC	2022-03-22 12:20:19 +00:00
Yeting Kuo	ecd7a0132a	[RISCV] Add basic cost model for vector casting To model the cost of vector casts, the patch treats most vector casts like their scalar forms and gives the vector forms of free scalar casts a cost of 1. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D121771	2022-03-22 14:17:08 +08:00

1157 Commits