If the elt size is unknown because the element type is a pointer, a comparison
against 0 will cause an assert. Make sure the elt size is large enough
before comparing, and for the moment just return the scalar cost.
I'm assuming the standard size integer instructions for this end up as something like:
mulq %rsi
seto %al
And the 'mul' generally has reciprocal throughput of 1 on typical implementations
(higher latency, but that's not handled here).
The default costs may end up much higher than that, and that's what we see in the test diffs.
Vector types are left as a 'TODO'.
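As a rough illustration (my example, not from the patch), this is the kind of
source-level check whose 64-bit form can lower to that mul/seto pair:

  #include <cstdint>

  // 64-bit multiply-with-overflow: the product goes to *out and the return
  // value reports whether the full result did not fit in 64 bits.
  bool mul_overflows(uint64_t a, uint64_t b, uint64_t *out) {
    return __builtin_mul_overflow(a, b, out);
  }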
Differential Revision: https://reviews.llvm.org/D90431
On some targets, like AArch64, vector selects can be efficiently lowered
if the vector condition is a compare with a supported predicate.
This patch adds a new argument to getCmpSelInstrCost, to indicate the
predicate of the feeding select condition. Note that it is not
sufficient to use the context instruction when querying the cost of a
vector select starting from a scalar one, because the condition of the
vector select could be composed of compares with different predicates.
This change greatly improves modeling the costs of certain
compare/select patterns on AArch64.
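For illustration only (the parameter order here is assumed from this
description rather than copied from the patch), a caller could thread the
predicate through roughly like this:

  #include "llvm/Analysis/TargetTransformInfo.h"
  #include "llvm/IR/Instructions.h"

  // Cost a vector select whose condition is known to be fed by a compare with
  // a specific predicate, so the target can report the cheaper lowering.
  static int costVectorSelect(const llvm::TargetTransformInfo &TTI,
                              llvm::Type *VecTy, llvm::Type *CondTy,
                              llvm::CmpInst::Predicate Pred) {
    return TTI.getCmpSelInstrCost(llvm::Instruction::Select, VecTy, CondTy,
                                  Pred,
                                  llvm::TargetTransformInfo::TCK_RecipThroughput);
  }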
I am also planning on putting up patches to make use of the new argument in
SLPVectorizer & LV.
Reviewed By: dmgreen, RKSimon
Differential Revision: https://reviews.llvm.org/D90070
Completing the series of FIXME removals for special-case intrinsics:
50dfa19cc7f2c25c7079c963bde01501ea93d85d
This one looks quite different than the others. The size/blended
cost is still potentially very far off from the throughput cost,
but this is hopefully not worse on the whole. It looks like the
underlying costs for the expanded shift/logic have their own
cost-kind limitations. Also, we are not asking the target if
it has a legal funnel shift op, so we just assume that the
intrinsic gets expanded.
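For reference (my example, not from the patch), a rotate is the most common
source of these funnel-shift intrinsics:

  #include <cstdint>

  // A 32-bit rotate-left; clang lowers this builtin to llvm.fshl, so when the
  // target has no legal funnel shift / rotate, the expanded shift+logic cost
  // described above is what applies.
  uint32_t rotl32(uint32_t x, uint32_t n) {
    return __builtin_rotateleft32(x, n);
  }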
vnot (xor -1) should be equivalent to the AArch64-specific AArch64ISD::NOT
node, but allows more folding thanks to all the target-independent
optimizations. Specifically this allows select(icmp ne, x, y) to
become "cmeq; bsl y, x", as opposed to needing to convert the predicate
with "cmeq; mvn; bsl x, y".
Unfortunately there is a regression in a cmtst test, but the code it
selected from was already non-canonical, with instcombine preferring to
use an eq predicate instead. Plus the more common case of icmp ne is
improved.
Differential Revision: https://reviews.llvm.org/D90126
This was originally part of:
f2c25c7079
but that was reverted because there was an underlying bug in
processing the vector type of these intrinsics. That was
fixed with:
74ffc823ed
This is similar in spirit to 01ea93d85d (memcpy) except that
here the underlying caller assumptions were created for vectorizer
use (throughput) rather than other passes.
That meant targets could have an enormous throughput cost with no
corresponding size, latency, or blended cost increase.
Paraphrasing from the previous commits:
This may not make sense for some callers, but at least now the
costs will be consistently wrong instead of mysteriously wrong.
Targets should provide better overrides if the current modeling
is not accurate.
In each 128-bit lane, if at least one index is demanded but not all indices
are demanded, and this lane is not the first 128-bit lane of the legalized
vector, then this lane needs an extracti128; if at least one index in a
128-bit lane is demanded, that lane needs an inserti128.
The following cases help build a better understanding. Assume we insert
several elements into a v8i32 vector with AVX2:
Case #1: inserting into index 1 needs vpinsrd + inserti128.
Case #2: inserting into index 5 needs extracti128 + vpinsrd + inserti128
(see the snippet below).
Case #3: inserting into indices 4,5,6,7 needs 4*vpinsrd + inserti128.
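For concreteness (an illustrative snippet of mine, not from the patch),
Case #2 corresponds to source like:

  // Insert a scalar into lane 5 of an 8 x i32 AVX2 vector; the element lives
  // in the upper 128-bit half, hence extracti128 + vpinsrd + inserti128.
  typedef int v8si __attribute__((vector_size(32)));
  v8si insert_lane5(v8si v, int s) {
    v[5] = s;
    return v;
  }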
Reviewed By: pengfei, RKSimon
Differential Revision: https://reviews.llvm.org/D89767
The warning would fire when calling getGEPCost for analyzing the cost of
a GEP instruction. This would result in the use of the now deprecated
implicit cast of TypeSize to uint64_t through the overloaded operator.
This patch fixes the issue by using getKnownMinSize instead of the
implicit cast. This is possible because the code is already
scalable-vector aware. The semantic behaviour of the code is unchanged
by this patch.
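As a sketch of the pattern (illustrative only; the exact call sites are in the
patch):

  #include <cstdint>
  #include "llvm/IR/DataLayout.h"
  #include "llvm/IR/Type.h"

  // Ask for the known minimum allocation size explicitly instead of relying
  // on the deprecated implicit TypeSize -> uint64_t conversion, which is not
  // meaningful for scalable vector types.
  static uint64_t knownMinAllocSize(const llvm::DataLayout &DL,
                                    llvm::Type *Ty) {
    return DL.getTypeAllocSize(Ty).getKnownMinSize();
  }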
Reviewed By: sdesmalen, fpetrogalli
Differential Revision: https://reviews.llvm.org/D89872
This allows using annotations in many more contexts than is currently
possible, especially when annotating with templates or constexpr.
Reviewed By: aaron.ballman
Differential Revision: https://reviews.llvm.org/D88645
This is a modified 2nd try of 22d10b8ab4
(reverted by 1c8371692d because it managed
to expose an existing crashing bug that should be fixed by
74ffc823).
Original commit message:
This is similar in spirit to 01ea93d85d (memcpy) except that
here the underlying caller assumptions were created for vectorizer
use (throughput) rather than other passes.
That meant targets could have an enormous throughput cost with no
corresponding size, latency, or blended cost increase.
The ARM costs show a small difference between throughput and
size because there's an underlying difference in cmp/sel
costs that is also predicated on cost-kind.
Paraphrasing from the previous commits:
This may not make sense for some callers, but at least now the
costs will be consistently wrong instead of mysteriously wrong.
Targets should provide better overrides if the current modeling
is not accurate.
I'm not sure if/how this ever worked, but it must not be tested
currently because the basic tests added here were crashing as
noted in the post-review comments for 1c83716 (which reverted
another cost-model fix in 22d10b8ab4).
This is similar in spirit to 01ea93d85d (memcpy) except that
here the underlying caller assumptions were created for vectorizer
use (throughput) rather than other passes.
That meant targets could have an enormous throughput cost with no
corresponding size, latency, or blended cost increase.
The ARM costs show a small difference between throughput and
size because there's an underlying difference in cmp/sel
costs that is also predicated on cost-kind.
Paraphrasing from the previous commits:
This may not make sense for some callers, but at least now the
costs will be consistently wrong instead of mysteriously wrong.
Targets should provide better overrides if the current modeling
is not accurate.
1. Throughput and codesize cost estimations were separated and updated.
2. Updated fdiv cost estimation for different cases.
3. Added scalarization processing for types that are treated as !isSimple() to
improve codesize estimation in getArithmeticInstrCost(). The code was borrowed
from the TCK_RecipThroughput path of the base implementation (see the sketch
below).
Next step is to unify the scalarization part in the base class, which currently
works for the TCK_RecipThroughput path only.
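As a rough sketch of what that scalarization costing amounts to (an
illustrative formula, not the exact in-tree code):

  // A vector op on a type the target can't handle directly is priced as doing
  // each element separately, plus the cost of extracting the scalar operands
  // and inserting the scalar results back into vectors.
  unsigned scalarizationCost(unsigned NumElts, unsigned ScalarOpCost,
                             unsigned ExtractCost, unsigned InsertCost) {
    return NumElts * (ScalarOpCost + ExtractCost + InsertCost);
  }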
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D89973
This patch adds a specialized implementation of getIntrinsicInstrCost
and adds initial cost modeling for min/max vector intrinsics.
AArch64 NEON supports umin/smin/umax/smax for the vector types
<8 x i8>, <16 x i8>, <4 x i16>, <8 x i16>, <2 x i32> and <4 x i32>.
Notably, it does not support vectors with i64 elements.
This change by itself should have very little impact on codegen, but in
follow-up patches I plan to teach the vectorizers to consider using
those intrinsics on platforms where it is profitable, e.g. because there
is no general 'select'-like instruction.
The current cost returned should be better for throughput, latency and size.
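For illustration (my own example; the intrinsics themselves are produced by
the compiler), the element-wise pattern that can map to a single NEON
instruction looks like:

  // Element-wise unsigned minimum over eight 16-bit lanes; NEON has a single
  // umin instruction for this, and llvm.umin is the intrinsic form the
  // vectorizers would eventually emit.
  typedef unsigned short v8u16 __attribute__((vector_size(16)));
  typedef short v8i16 __attribute__((vector_size(16)));
  v8u16 umin8x16(v8u16 a, v8u16 b) {
    v8i16 mask = a < b;                        // all-ones where a < b
    return (a & (v8u16)mask) | (b & ~(v8u16)mask);
  }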
Reviewed By: dmgreen
Differential Revision: https://reviews.llvm.org/D89953
This is similar in spirit to 01ea93d85d (memcpy) except that
here the underlying caller assumptions were created for vectorizer
use (throughput) rather than other passes.
That meant ARM could have an enormous throughput cost with no
corresponding size, latency, or blended cost increase. X86 has
the same throughput restriction as the basic implementation, so
it is still unchanged.
Paraphrasing from the previous commit:
This may not make sense for some callers, but at least now the
costs will be consistently wrong instead of mysteriously wrong.
Targets should provide better overrides if the current modeling
is not accurate.
The default base implementation returns TCC_Expensive (currently
set to '4'), so that explains the test diff. This probably does
not make sense for most callers, but at least now the costs will
be consistently wrong instead of mysteriously wrong.
The ARM target has an override that tries to model codegen expansion,
and that should likely be adapted for general usage.
This probably does not affect anything because the vectorizers are
the primary users of the throughput cost, but memcpy is not listed
as a trivially vectorizable intrinsic.
This adds some basic costs for MVE reductions - currently just costing
simple legal vector add reductions as a single MVE instruction. More complex
costing can be added in the future when the framework more readily
allows it.
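As an illustrative example (mine, not from the patch), the sort of loop whose
vectorized form ends up as a single add reduction:

  #include <cstdint>

  // Sum four i32 lanes; once vectorized this becomes an add-reduction
  // intrinsic (vector.reduce.add), which MVE can implement with a single
  // across-vector add (VADDV).
  int32_t sum4(const int32_t *p) {
    int32_t s = 0;
    for (int i = 0; i < 4; ++i)
      s += p[i];
    return s;
  }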
Differential Revision: https://reviews.llvm.org/D88980
This adds a very basic cost for active_lane_mask under MVE - making the
assumption that they will be free and then apologizing for that in a
comment.
In reality they may either be free (by being nicely folded into a tail
predicated loop), cost the same as a VCTP, or be expanded into vdups,
adds and cmps. It is difficult to tell the difference from a single
getIntrinsicInstrCost call, so this makes the assumption that the vectorizer
is adding them, and only adds them where it makes sense.
We may need to change this in the future to better model predicate costs
in the vectorizer, especially at -Os or for non-tail-predicated loops. The
vectorizer currently does not query the cost of these instructions, but
that will change in the future and a zero cost there probably makes the
most sense at the moment.
Differential Revision: https://reviews.llvm.org/D88989
The cost modeling for intrinsics is a patchwork based on different
expectations from the callers, so it's a mess. I'm hoping to untangle
this to allow canonicalization to the new min/max intrinsics in IR.
The general goal is to remove the cost-kind restriction here in the
basic implementation class. Ie, if some intrinsic has throughput cost
of 104, assume that it has the same size, latency, and blended costs.
Effectively, an intrinsic with cost N is composed of N simple
instructions. If that's not correct, the target should provide a more
accurate override.
The x86-64 SSE2 subtarget cost diffs require explanation:
1. The scalar ctlz/cttz assume "BSR+XOR+CMOV" or
"TEST+BSF+CMOV/BRANCH" expansions, so they are not cheap (see the sketch
after this list).
2. The 128-bit SSE vector width versions assume cost of 18 or 26
(no explanation provided in the tables, but this corresponds to a
bunch of shift/logic/compare).
3. The 512-bit vectors in the test file are scaled up by a factor of
4 from the legal vector width costs.
4. The plain latency cost-kind is not affected in this patch because
that calc is diverted before we get to getIntrinsicInstrCost().
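For reference (my example, not part of the patch), the scalar case in point 1
corresponds to code like:

  // A fully-defined 32-bit count-leading-zeros; BSR leaves its result
  // undefined for a zero input, so the expansion needs the extra CMOV or
  // branch that the cost above accounts for.
  unsigned clz32(unsigned x) {
    return x ? (unsigned)__builtin_clz(x) : 32u;
  }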
Differential Revision: https://reviews.llvm.org/D89461
This is bigger/uglier than before, but it should allow fixing
all of the broken paths more easily. Test coverage added with
rGfab028b and other commits.
This is not NFC - the scalable vector test would crash
without this patch.
Testing for the various cost model "TargetCostKind" is limited,
and testing for scalable vectors is limited. The motivating
example of an intrinsic is not included here yet because that
just crashes.
This is my first LLVM patch, so please tell me if there are any process issues.
The main observation for this patch is that we can lower UMIN/UMAX with v8i16 by using unsigned saturated subtractions in a clever way. Previously this operation was lowered by flipping the sign bit of both inputs and of the output, which turns the unsigned minimum/maximum into a signed one.
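Spelling out the arithmetic (my own notes; I'm assuming the standard identity
with usubsat(a,b) = max(a-b, 0), which SSE2 provides for v8i16 as psubusw):

  #include <cstdint>

  // With d = usubsat(a, b):
  //   umax(a, b) == d + b
  //   umin(a, b) == a - d
  // so v8i16 UMIN/UMAX can be built from psubusw plus one paddw/psubw,
  // without flipping sign bits.
  uint16_t umax16(uint16_t a, uint16_t b) {
    uint16_t d = a > b ? (uint16_t)(a - b) : (uint16_t)0;  // usubsat(a, b)
    return (uint16_t)(d + b);
  }
  uint16_t umin16(uint16_t a, uint16_t b) {
    uint16_t d = a > b ? (uint16_t)(a - b) : (uint16_t)0;  // usubsat(a, b)
    return (uint16_t)(a - d);
  }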
We could use this trick in reverse for lowering SMIN/SMAX with v16i8 instead. In terms of latency/throughput this just needs one large move instruction. It's just that the sign-bit flipping has an increased chance of being optimized further. This is particularly apparent in the "reduce" test cases. However, due to the slight regression in the single-use case, this patch no longer proposes this.
Unfortunately this argument also applies in reverse to the new lowering of UMIN/UMAX with v8i16, which regresses the "horizontal-reduce-umax", "horizontal-reduce-umin", "vector-reduce-umin" and "vector-reduce-umax" test cases a bit with this patch. Maybe some extra casework would be possible to avoid this. However, independent of that, I believe that the benefits in the common case of just 1 to 3 chained min/max instructions outweigh the downsides in that specific case.
Patch By: @TomHender (Tom Hender) ActuallyaDeviloper
Differential Revision: https://reviews.llvm.org/D87236
Better use isZero() and isIntN() in SystemZTargetTransformInfo rather than
calling getZExtValue() since the immediate operand may be wider than 64 bits,
which is not allowed with getZExtValue().
Fixes https://bugs.llvm.org/show_bug.cgi?id=47600
Review: Simon Pilgrim
Other types can be handled in future patches but their uniform / non-uniform costs are more similar and don't appear to cause many vectorization issues.
When optimising for size, make the cost of i1 logical operations
relatively expensive so that optimisations don't try to combine
predicates.
Differential Revision: https://reviews.llvm.org/D86525
MVE gather/scatter code generation is looking a lot better than it used
to, but still has some issues. We currently model the instructions as 1
cycle per element, which is a bit low for some cases. Increasing the
cost by the MVECostFactor brings them in line with our other instruction
costs. This will have the effect of only generating them when the extra
benefit is more likely to overcome some of the issues, notably
running out of registers and vectorizing loops that could otherwise be
SLP vectorized.
In the short-term whilst we look at other ways of dealing with those
more directly, we can increase the costs of gathers to make them more
likely to be beneficial when created.
Differential Revision: https://reviews.llvm.org/D86444
In getCastInstrCost when the instruction is a truncate we were relying
upon the implicit TypeSize -> uint64_t cast when asking if a given type
has the same size as a legal integer. I've changed the code to only
ask the question if the type is fixed length.
I have also changed InstCombinerImpl::SimplifyDemandedUseBits to bail
out for now if the type is a scalable vector.
I've added the following new tests:
Analysis/CostModel/AArch64/sve-trunc.ll
Transforms/InstCombine/AArch64/sve-trunc.ll
for both of these fixes.
Differential revision: https://reviews.llvm.org/D86432
The test referenced a relative path to a file, but the path was not correct
relative to the project the test is in.
Differential Revision: https://reviews.llvm.org/D86368
Modify the ARM getCmpSelInstrCost implementation for the code size
costs of selects. Now consider the legalization cost and increase
the cost of i1 because those values wouldn't live in a general purpose
register. We also make selects +1 more expensive to account for the IT
instruction.
Differential Revision: https://reviews.llvm.org/D82091