Fold %x umin_seq %y to %x if %x ule %y. This also subsumes the
special handling for constant operands: if %y is constant, this
folds to umin via implied poison reasoning, and if %x is constant,
then either %x is non-zero and it folds to umin, or %x is known
to be zero, in which case it is ule anything.
Fold %x umin_seq %y to %x umin %y if %x cannot be zero; the two
only differ in semantics for %x == 0.
More generally, %x *_seq %y folds to %x * %y if %x cannot be the
saturation value (though currently we only have umin_seq).
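Both folds follow from the definition of the sequential min. A minimal
C++ model of the semantics, with poison represented as an empty optional
(helper names are illustrative; umin_seq is a SCEV expression, not an IR
instruction):

    #include <algorithm>
    #include <cstdint>
    #include <optional>

    using Val = std::optional<uint64_t>; // nullopt models poison

    // %x umin %y: poison if either operand is poison.
    Val umin(Val X, Val Y) {
      if (!X || !Y)
        return std::nullopt;
      return std::min(*X, *Y);
    }

    // %x umin_seq %y: %y is only "evaluated" when %x is non-zero, so a
    // poison %y cannot leak through when %x == 0 (the saturation value).
    Val umin_seq(Val X, Val Y) {
      if (!X)
        return std::nullopt;
      if (*X == 0)
        return 0;
      return umin(X, Y);
    }

If %x cannot be zero, the two functions agree on every input, which is
exactly the second fold.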
This adds some extra costs for reverse shuffles under AArch64, filling
in the i16/f16/i8 gaps in the cost model.
Differential Revision: https://reviews.llvm.org/D124786
Based off the script from D103695: on AVX1, Jaguar and Bulldozer both have low throughput for ymm select patterns (BLENDV and OR(AND,ANDN)), and even on AVX2, Haswell still struggles with BLENDV ops.
The new flag -aarch64-insert-extract-base-cost can be used to
set the value of AArch64Subtarget::getVectorInsertExtractBaseCost(),
for the purposes of experimentation.
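The usual shape of such a flag (a sketch under assumed names and wiring;
only the flag name comes from the patch):

    #include "llvm/Support/CommandLine.h"
    using namespace llvm;

    // Hidden experimentation flag; default handling below is an assumption.
    static cl::opt<unsigned> InsertExtractBaseCost(
        "aarch64-insert-extract-base-cost", cl::Hidden,
        cl::desc("Base cost of vector insert/extract element"));

    unsigned getVectorInsertExtractBaseCostImpl(unsigned TunedDefault) {
      // Use the subtarget's tuned value unless the flag was explicitly set.
      if (InsertExtractBaseCost.getNumOccurrences() > 0)
        return InsertExtractBaseCost;
      return TunedDefault;
    }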
Differential Revision: https://reviews.llvm.org/D124835
Similar to how we convert logical and/or to bitwise and/or, we should
also convert umin_seq to umin based on implied poison reasoning. In
%x umin_seq %y, if %y being poison implies %x being poison, then we
don't need the sequential evaluation: Having %y contribute towards
the result will never make the result more poisonous. An important
corollary of this is that if %y is never poison, we also don't need
the sequential evaluation.
This avoids some of the regressions in D124910.
Differential Revision: https://reviews.llvm.org/D124921
Added a motivating test case for D123400 where the loopnest has a
suboptimal loop order, j-i-k. After D123400 we ensure that the order
of the loop cache analysis output is i-j-k, despite the suboptimal
order in the original loopnest.
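A loop nest of roughly this shape illustrates the scenario (hypothetical;
the actual test case is part of the patch):

    // j-i-k order is suboptimal for a row-major A[i][j][k] access; the
    // analysis should nevertheless print the loops in the locality-friendly
    // i-j-k order (assumes N <= 64 to stay within the fixed bounds).
    void jik(int N, double A[64][64][64]) {
      for (int j = 0; j < N; ++j)
        for (int i = 0; i < N; ++i)
          for (int k = 0; k < N; ++k)
            A[i][j][k] += 1.0;
    }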
Reviewed By: bmahjour, #loopoptwg
Differential Revision: https://reviews.llvm.org/D122776
This builds on top of the target-independent cost model added in D124269
to add AArch64-specific costs for fptoui_sat and fptosi_sat intrinsics.
For many common types these will be legal instructions, as the AArch64
instructions saturate naturally. For unsupported pairs of integer
and floating point types, an additional min/max clamp is needed.
Differential Revision: https://reviews.llvm.org/D124357
We can quickly extract multiple elements of a bool vector using MOVMSK ops - since we don't know what generated the vXi1, I've been optimistic and assumed we can use PMOVMSKB to extract the maximum number of bools with a single op.
The MOVMSK pattern isn't great for extract+insert round trips as vXi1 type legalization can interfere with this a lot - so this relies on us remaining good at using getScalarizationOverhead properly (and tagging both Insert and Extract modes) for those round trip cases.
The AVX512 KMOV codegen for bool extraction is a bit of a mess so for now I've not included that - the per-element cost is a lot more accurate for current codegen.
The print output of loop cache analysis sometimes has a non-deterministic
order, so we have been using `CHECK-DAG` in its lit tests. This patch
changes the sorting of LoopCosts to llvm::stable_sort(), comparing loops
by their cost numbers. When loop cost numbers are equal, llvm::stable_sort()
now produces a deterministic loop order.
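The change amounts to something like this (a sketch; the element type and
the sort direction are assumptions):

    #include "llvm/ADT/STLExtras.h"
    #include "llvm/ADT/SmallVector.h"

    // LoopCosts holds (loop, cost-number) pairs. llvm::stable_sort keeps
    // the relative order of equal-cost entries, so ties no longer produce
    // a non-deterministic printed order.
    template <typename LoopCostT>
    void sortLoopCosts(llvm::SmallVectorImpl<LoopCostT> &LoopCosts) {
      llvm::stable_sort(LoopCosts,
                        [](const LoopCostT &A, const LoopCostT &B) {
                          return A.second < B.second;
                        });
    }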
Reviewed By: Meinersbur, fhahn, #loopoptwg
Differential Revision: https://reviews.llvm.org/D124725
If the legalized src/dst types are the same, assume the "truncation" is free.
This fixes some edge cases, such as mul lo/hi ops and bool vectors, which get legalized back to legal vector widths.
Based off the script from D103695: we were exaggerating the cost of the OR(AND(X,M),AND(Y,~M)) expansion by using instruction count instead of effective throughput.
Currently loop cache cost (LCC) cannot analyze fixed-size arrays,
since it cannot delinearize them. This patch adds the capability
to delinearize fixed-size arrays to LCC. Most of the code is ported
from DependenceAnalysis.cpp, and some refactoring will be done in a
follow-up patch.
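As an illustration of what the delinearization recovers (hypothetical
example, not code from the patch):

    enum { D1 = 64, D2 = 128 };
    double A[D1][D2];

    double load(int i, int j) {
      // A[i][j] lowers to *(&A[0][0] + (i * D2 + j)); with the constant
      // dimension sizes known from the array type, the flattened subscript
      // i*D2 + j can be split back into the per-dimension subscripts
      // {i, j} that the cache cost model reasons about.
      return A[i][j];
    }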
Reviewed By: #loopoptwg, Meinersbur
Differential Revision: https://reviews.llvm.org/D122857
Need to normalize the mask to avoid possible crashes during attempts
to estimate the cost of very long shuffles with a non-power-of-2 number
of elements in the masks.
The legacy LoopUnswitch pass is only used in the legacy pass manager
pipeline, which is deprecated.
The NewPM replacement is SimpleLoopUnswitch and I think it is time to
remove the legacy LoopUnswitch code.
Fixes #31000.
Reviewed By: aeubanks, Meinersbur, asbirlea
Differential Revision: https://reviews.llvm.org/D124376
Add a cost model for broadcast shuffles of scalable vectors in
RISCVTTIImpl::getShuffleCost. The cost model might not be the best.
For scalable vectors, BasicTTIImpl::getShuffleCost returns an invalid
cost, so this patch cannot simply rely on the existing cost model in
BasicTTIImpl.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D124101
Introduced shuffle masks where they were not previously provided, and
improved target-dependent cost models to avoid returning incorrect cost
results after the masks were added.
Differential Revision: https://reviews.llvm.org/D100486
Renamed test/Analysis/CostModel/X86/splat-load.ll to shuffle-load.ll
to align it with AArch64's similar test.
Also added a complete list of checks for all vector combinations up to 512 bits.
Differential Revision: https://reviews.llvm.org/D124528
Given a larger-than-legal shuffle mask, the final codegen will split
into multiple sub-vectors. This attempts to model that in
AArch64TTIImpl::getShuffleCost, splitting masks up according to the size
of the legalized vectors. If the sub-masks have at most 2 input sources
we can call getShuffleCost on them and sum the costs, to get a more
accurate final cost for the entire shuffle. The call to
improveShuffleKindFromMask helps to improve the shuffle kind for the
sub-mask cost call.
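A sketch of the splitting step (illustrative only; the real code also
normalizes each sub-mask's input sources before the cost call):

    #include <algorithm>
    #include <vector>

    // Chop a wide shuffle mask into legal-width pieces; each piece can
    // then be costed on its own when it draws from at most two inputs.
    std::vector<std::vector<int>> splitMask(const std::vector<int> &Mask,
                                            size_t LegalNumElts) {
      std::vector<std::vector<int>> Pieces;
      for (size_t I = 0; I < Mask.size(); I += LegalNumElts) {
        size_t E = std::min(Mask.size(), I + LegalNumElts);
        Pieces.emplace_back(Mask.begin() + I, Mask.begin() + E);
      }
      return Pieces;
    }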
Differential Revision: https://reviews.llvm.org/D123414
Given a shuffle with 4 elements of size 16 or 32, we can use the costs
directly from the PerfectShuffle tables to get a slightly more accurate
cost for the resulting shuffle.
Differential Revision: https://reviews.llvm.org/D123409
This adds some basic fptosi_sat and fptoui_sat target independent cost
modelling. The saturation is modelled as an fmin/fmax clamp on the
value, followed by an fp convert. The signed versions then have an
additional fcmp+select for handling NaN correctly.
The AArch64/Arm costs may be less accurate, as the instructions exist
natively. This can be fixed with target specific cost updates.
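A rough scalar model of the sequence being costed (a sketch; not a
bit-exact implementation of llvm.fptosi.sat):

    #include <algorithm>
    #include <cmath>
    #include <cstdint>

    int32_t fptosi_sat_model(float X) {
      if (std::isnan(X))                        // the extra fcmp + select
        return 0;
      const float Lo = -2147483648.0f;          // exactly INT32_MIN
      const float Hi = 2147483520.0f;           // largest float <= INT32_MAX
      float C = std::min(std::max(X, Lo), Hi);  // the fmin/fmax clamp
      return (int32_t)C;                        // the fp convert
    }

The upper clamp is slightly conservative because INT32_MAX is not exactly
representable as a float; the cost model only needs the shape of the
sequence, not bit-exactness.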
Differential Revision: https://reviews.llvm.org/D124269
Before this patch, `Args` was used by SLP to pass a broadcast's
arguments. This patch changes that: `Args` is now used for passing the
operands of the shuffle.
Differential Revision: https://reviews.llvm.org/D124202
This test is no longer necessary as it is a subset of:
llvm/test/Analysis/CostModel/AArch64/shuffle-load.ll
Differential Revision: https://reviews.llvm.org/D124456
When constructing canonical relationships between two regions, the first instruction of a basic block from the first region is used to find the corresponding basic block from the second region. However, debug instructions are not included in similarity matching, and therefore do not have a canonical numbering. This patch makes sure to ignore the debug instructions when finding the first instruction in a basic block.
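The fix boils down to a helper of this shape (a sketch using LLVM's DbgInfoIntrinsic check; not necessarily the exact code):

    #include "llvm/IR/BasicBlock.h"
    #include "llvm/IR/Instruction.h"
    #include "llvm/IR/IntrinsicInst.h"
    using namespace llvm;

    // Return the first instruction that participates in similarity
    // matching, skipping debug intrinsics, which have no canonical number.
    static Instruction *getFirstCanonicalInst(BasicBlock &BB) {
      for (Instruction &I : BB)
        if (!isa<DbgInfoIntrinsic>(I))
          return &I;
      return nullptr; // block contains only debug instructions
    }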
Reviewer: paquette
Differential Revision: https://reviews.llvm.org/D123903