llvm-project

Commit Graph

Author	SHA1	Message	Date
Florian Hahn	408c4408fa	Revert "[TTI] Add VecPred argument to getCmpSelInstrCost." This reverts commit `73f01e3df5`. This appears to break http://lab.llvm.org:8011/#/builders/85/builds/383.	2020-10-30 21:26:14 +00:00
Arthur Eubanks	10f2a0d662	Use uint64_t for branch weights instead of uint32_t CallInst::updateProfWeight() creates branch_weights with i64 instead of i32. To be more consistent everywhere and remove lots of casts from uint64_t to uint32_t, use i64 for branch_weights. Reviewed By: davidxl Differential Revision: https://reviews.llvm.org/D88609	2020-10-30 10:03:46 -07:00
Sanjay Patel	251dd7c0f9	[x86] add cost overrides for mul with overflow I'm assuming the standard size integer instructions for this end up as something like: mulq %rsi seto %al And the 'mul' generally has reciprocal throughput of 1 on typical implementations (higher latency, but that's not handled here). The default costs may end up much higher than that, and that's what we see in the test diffs. Vector types are left as a 'TODO'. Differential Revision: https://reviews.llvm.org/D90431	2020-10-30 12:38:16 -04:00
Florian Hahn	73f01e3df5	[TTI] Add VecPred argument to getCmpSelInstrCost. On some targets, like AArch64, vector selects can be efficiently lowered if the vector condition is a compare with a supported predicate. This patch adds a new argument to getCmpSelInstrCost, to indicate the predicate of the feeding select condition. Note that it is not sufficient to use the context instruction when querying the cost of a vector select starting from a scalar one, because the condition of the vector select could be composed of compares with different predicates. This change greatly improves modeling the costs of certain compare/select patterns on AArch64. I am also planning on putting up patches to make use of the new argument in SLPVectorizer & LV. Reviewed By: dmgreen, RKSimon Differential Revision: https://reviews.llvm.org/D90070	2020-10-30 13:49:08 +00:00
Roman Lebedev	b4916918e5	[SCEV] SCEVPtrToIntExpr simplifications If we've got an SCEVPtrToIntExpr(op), where op is not an SCEVUnknown, we want to sink the SCEVPtrToIntExpr into an operand, so that the operation is performed on integers, and eventually we end up with just an `SCEVPtrToIntExpr(SCEVUnknown)`. Reviewed By: mkazantsev Differential Revision: https://reviews.llvm.org/D89692	2020-10-30 11:13:35 +03:00
Roman Lebedev	81fc53a36a	[SCEV] Introduce SCEVPtrToIntExpr (PR46786) And use it to model LLVM IR's `ptrtoint` cast. This is essentially an alternative to D88806, but with no chance for all the problems it caused due to having the cast as implicit there. (see rG7ee6c402474a2f5fd21c403e7529f97f6362fdb3) As we've established by now, there are at least two reasons why we want this: * It will allow SCEV to actually model the `ptrtoint` casts and their operands, instead of treating them as `SCEVUnknown` * It should help with initial problem of PR46786 - this should eventually allow us to not lose pointer-ness of an expression in more cases As discussed in PR46786 (https://bugs.llvm.org/show_bug.cgi?id=46786), in principle, we could just extend `SCEVUnknown` with a `is ptrtoint` cast, because `ScalarEvolution::getPtrToIntExpr()` should sink the cast as far down into the expression as possible, so in the end we should always end up with `SCEVPtrToIntExpr` of `SCEVUnknown`. But I think that it isn't the best solution, because it doesn't really matter from memory consumption side - there probably won't be that many `SCEVPtrToIntExpr`s for it to matter, and it allows for much better discoverability. Reviewed By: mkazantsev Differential Revision: https://reviews.llvm.org/D89456	2020-10-30 11:13:35 +03:00
Dávid Bolvanský	7a2abf5aca	[InferAttrs] Add nocapture/writeonly to string/mem libcalls One step closer to fixing PR47644. Differential Revision: https://reviews.llvm.org/D89645	2020-10-29 20:06:43 +01:00
Sanjay Patel	d5a75e7738	[x86] add test for umul intrinsic costs; NFC	2020-10-29 12:12:52 -04:00
Sanjay Patel	7c395f31a6	[CostModel][x86] remove cost-kind predicate for intrinsic costs We model cost as number of instructions / uops, so it does not make sense to treat size/blended costs any differently than throughput.	2020-10-28 14:33:37 -04:00
Sanjay Patel	9df32c9044	[CostModel] remove cost-kind predicate for funnel shift costs Completing the series of FIXME removals for special-case intrinsics: `50dfa19cc7` `f2c25c7079` `c963bde015` `01ea93d85d` This one looks quite different than the others. The size/blended cost is still potentially very far off from the throughput cost, but this is hopefully not worse on the whole. It looks like the underlying costs for the expanded shift/logic have their own cost-kind limitations. Also, we are not asking the target if it has a legal funnel shift op, so we just assume that the intrinsic gets expanded.	2020-10-28 14:02:34 -04:00
Max Kazantsev	5ef84688fb	Re-enable "[SCEV] Prove implications of different type via truncation" When we need to prove implication of expressions of different type width, the default strategy is to widen everything to wider type and prove in this type. This does not interact well with AddRecs with negative steps and unsigned predicates: such AddRec will likely not have a `nuw` flag, and its `zext` to wider type will not be an AddRec. In contrast, `trunc` of an AddRec in some cases can easily be proved to be an `AddRec` too. This patch introduces an alternative way of handling implications of different type widths. If we can prove that wider type values actually fit in the narrow type, we truncate them and prove the implication in narrow type. The return was due to revert of underlying patch that this one depends on. Unit test temporarily disabled because the required logic in SCEV is switched off due to compile time reasons. Differential Revision: https://reviews.llvm.org/D89548	2020-10-28 16:02:14 +07:00
David Green	066737fdbc	[AArch64] Remove AArch64ISD::NOT, use vnot instead vnot (xor -1) should be equivalent to the AArch64 specific AArch64ISD::NOT node, but allow more folding thanks to all the target independent optimizations. Specifically this allows select(icmp ne, x, y) to become "cmeq; bsl y, x" as opposed to needing to convert the predicate with "cmeq; mvn; bsl x, y" Unfortunately there is a regression in a cmtst test, but the code it selected from was already non-canonical, with instcombine preferring to use an eq predicate instead. Plus the more common case of icmp ne is improved. Differential Revision: https://reviews.llvm.org/D90126	2020-10-28 08:15:37 +00:00
Max Kazantsev	624fc63a05	[SCEV] Re-enable "Use nw flag and symbolic iteration count to sharpen ranges of AddRecs", attempt 3 We can sharpen the range of an AddRec if we know that it does not self-wrap and know the symbolic iteration count in the loop. If we can evaluate the value of AddRec on the last iteration and prove that at least one of its intermediate values lies between start and end, then no-wrap flag allows us to conclude that all of them also lie between start and end. So the estimate of range can be improved to union of ranges of start and end. Switched off by default, can be turned on by flag. Differential Revision: https://reviews.llvm.org/D89381 Reviewed By: lebedev.ri, nikic	2020-10-28 12:39:41 +07:00
Sanjay Patel	50dfa19cc7	[CostModel] remove cost-kind predicate for FP add/mul vector reduction costs This was originally part of: `f2c25c7079` but that was reverted because there was an underlying bug in processing the vector type of these intrinsics. That was fixed with: `74ffc823ed` This is similar in spirit to `01ea93d85d` (memcpy) except that here the underlying caller assumptions were created for vectorizer use (throughput) rather than other passes. That meant targets could have an enormous throughput cost with no corresponding size, latency, or blended cost increase. Paraphrasing from the previous commits: This may not make sense for some callers, but at least now the costs will be consistently wrong instead of mysteriously wrong. Targets should provide better overrides if the current modeling is not accurate.	2020-10-27 18:00:20 -04:00
Sanjay Patel	138fda5dd2	[CostModel] add tests for FP reductions; NFC	2020-10-27 18:00:20 -04:00
Nico Weber	2a4e704c92	Revert "Use uint64_t for branch weights instead of uint32_t" This reverts commit `e5766f25c6`. Makes clang assert when building Chromium, see https://crbug.com/1142813 for a repro.	2020-10-27 09:26:21 -04:00
Shimin Cui	22e4346e05	[ValueTracking] Add tracking of the alignment assume bundle This patch is to add the support of the value tracking of the alignment assume bundle. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D88669	2020-10-27 12:16:45 +00:00
Arthur Eubanks	e5766f25c6	Use uint64_t for branch weights instead of uint32_t CallInst::updateProfWeight() creates branch_weights with i64 instead of i32. To be more consistent everywhere and remove lots of casts from uint64_t to uint32_t, use i64 for branch_weights. Reviewed By: davidxl Differential Revision: https://reviews.llvm.org/D88609	2020-10-26 20:24:04 -07:00
Bing1 Yu	2c08f1b4b6	[CostModel][X86] teach TTI to calculate cost of chain of vector inserts/extracts more precisely and correctly In each 128-lane, if at least one index is demanded, not all indices are demanded, and this 128-lane is not the first 128-lane of the legalized vector, then this 128-lane needs an extracti128. If at least one index in a 128-lane is demanded, this 128-lane needs an inserti128. The following cases illustrate the rule. Assume we insert several elements into a v8i32 vector in avx2: Case#1: inserting into index 1 needs vpinsrd + inserti128. Case#2: inserting into index 5 needs extracti128 + vpinsrd + inserti128. Case#3: inserting into indices 4,5,6,7 needs 4*vpinsrd + inserti128. Reviewed By: pengfei, RKSimon Differential Revision: https://reviews.llvm.org/D89767	2020-10-27 11:21:13 +08:00
Joe Ellis	bf60bb26ec	[SVE] Fix TypeSize warning in llvm::getGEPInductionOperand We do not need to use the implicit cast here. We can instead rely on a comparison between two TypeSize objects. This algorithm will work fine with scalable vectors. Reviewed By: DavidTruby Differential Revision: https://reviews.llvm.org/D90146	2020-10-26 17:40:32 +00:00
Joe Ellis	0383a1a8c2	[SVE][AArch64] Fix TypeSize warning in GEP cost analysis The warning would fire when calling getGEPCost for analyzing the cost of a GEP instruction. This would result in the use of the now deprecated implicit cast of TypeSize to uint64_t through the overloaded operator. This patch fixes the issue by using getKnownMinSize instead of the implicit cast. This is possible because the code is already scalable-vector aware. The semantic behaviour of the code is unchanged by this patch. Reviewed By: sdesmalen, fpetrogalli Differential Revision: https://reviews.llvm.org/D89872	2020-10-26 17:40:19 +00:00
Tyker	d3205bbca3	[Annotation] Allows annotation to carry some additional constant arguments. This allows using annotations in many more contexts than is currently possible, especially when annotating with templates or constexpr. Reviewed By: aaron.ballman Differential Revision: https://reviews.llvm.org/D88645	2020-10-26 10:50:05 +01:00
Sanjay Patel	f2c25c7079	[CostModel] remove cost-kind predicate for some vector reduction costs This is a modified 2nd try of `22d10b8ab4` (reverted by `1c8371692d` because it managed to expose an existing crashing bug that should be fixed by `74ffc823`). Original commit message: This is similar in spirit to `01ea93d85d` (memcpy) except that here the underlying caller assumptions were created for vectorizer use (throughput) rather than other passes. That meant targets could have an enormous throughput cost with no corresponding size, latency, or blended cost increase. The ARM costs show a small difference between throughput and size because there's an underlying difference in cmp/sel costs that is also predicated on cost-kind. Paraphrasing from the previous commits: This may not make sense for some callers, but at least now the costs will be consistently wrong instead of mysteriously wrong. Targets should provide better overrides if the current modeling is not accurate.	2020-10-25 15:17:52 -04:00
Sanjay Patel	74ffc823ed	[CostModel] fix operand/type accounting for fadd/fmul reductions I'm not sure if/how this ever worked, but it must not be tested currently because the basic tests added here were crashing as noted in the post-review comments for `1c83716` (which reverted another cost-model fix in `22d10b8ab4`).	2020-10-25 15:01:19 -04:00
Nikita Popov	ebeef022aa	[SCEV] Strengthen nowrap flags after constant folding for mul exprs Same change as `0dda633317`, but for mul expressions. We want to first fold any constant operands and then strengthen the nowrap flags, as we can compute more precise flags at that point.	2020-10-25 19:43:58 +01:00
Nikita Popov	1ff313f098	[SCEV] Always constant fold mul expression operands Establish parity with the handling of add expressions, by always constant folding mul expression operands before checking the depth limit (this is a non-recursive simplification). The code was already unconditionally constant folding the case where all operands were constants, but was not folding multiple constant operands together if there were also non-constant operands. This requires picking out a different demonstration for depth-based folding differences in the limit-depth.ll test.	2020-10-25 18:50:06 +01:00
Nikita Popov	0dda633317	[SCEV] Strengthen nowrap flags after constant folding We should first try to constant fold the add expression and only strengthen nowrap flags afterwards. This allows us to determine stronger flags if e.g. only two operands are left after constant folding (and thus "guaranteed no wrap region" code applies) or the resulting operands are non-negative and thus nsw->nuw strengthening applies.	2020-10-25 18:00:22 +01:00
Martin Storsjö	1c8371692d	Revert "[CostModel] remove cost-kind predicate for vector reduction costs" This reverts commit `22d10b8ab4`. This broke compilation e.g. like this: $ cat synth.c a; float b; c() { for (;;) { float d = -b a++; d -= --b * a++; d -= --b * a; d -= --b * a; e(d); } } $ clang -target x86_64-linux-gnu -c -O2 -ffast-math synth.c clang: ../include/llvm/Support/Casting.h:104: static bool llvm::isa_impl_cl<To, const From>::doit(const From*) [with To = llvm::PointerType; From = llvm::Type]: Assertion `Val && "isa<> used on a null pointer"' failed.	2020-10-25 08:47:54 +02:00
Sanjay Patel	22d10b8ab4	[CostModel] remove cost-kind predicate for vector reduction costs This is similar in spirit to `01ea93d85d` (memcpy) except that here the underlying caller assumptions were created for vectorizer use (throughput) rather than other passes. That meant targets could have an enormous throughput cost with no corresponding size, latency, or blended cost increase. The ARM costs show a small difference between throughput and size because there's an underlying difference in cmp/sel costs that is also predicated on cost-kind. Paraphrasing from the previous commits: This may not make sense for some callers, but at least now the costs will be consistently wrong instead of mysteriously wrong. Targets should provide better overrides if the current modeling is not accurate.	2020-10-24 13:20:17 -04:00
dfukalov	9068c20965	[AMDGPU][CostModel] Refine cost model for half- and quarter-rate instructions. 1. Throughput and codesize cost estimations were separated and updated. 2. Updated fdiv cost estimation for different cases. 3. Added scalarization processing for types that are treated as !isSimple() to improve codesize estimation in getArithmeticInstrCost() and getArithmeticInstrCost(). The code was borrowed from the TCK_RecipThroughput path of the base implementation. The next step is to unify the scalarization part in the base class, which currently works for the TCK_RecipThroughput path only. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D89973	2020-10-24 19:53:08 +03:00
Florian Hahn	089c1ccd6d	[AArch64] Add vector compare/select cost-model tests.	2020-10-23 20:43:04 +01:00
Florian Hahn	0fcc6f7a76	[AArch64] Implement getIntrinsicInstrCost, handle min/max intrinsics. This patch adds a specialized implementation of getIntrinsicInstrCost and adds initial cost-modeling for min/max vector intrinsics. AArch64 NEON supports umin/smin/umax/smax for vectors <8 x i8>, <16 x i8>, <4 x i16>, <8 x i16>, <2 x i32> and <4 x i32>. Notably, it does not support vectors with i64 elements. This change by itself should have very little impact on codegen, but in follow-up patches I plan to teach the vectorizers to consider using those intrinsics on platforms where it is profitable, e.g. because there is no general 'select'-like instruction. The current cost returned should be better for throughput, latency and size. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D89953	2020-10-23 11:32:42 +01:00
Nikita Popov	1882568fcb	[BasicAA] Only add visited phi blocks temporarily Visited phi blocks only need to be added for the duration of the recursive alias queries, they should not leak into following code. Once again, while this also improves analysis precision, this is mainly intended to clarify the applicability scope of VisitedPhiBBs.	2020-10-22 22:26:29 +02:00
Nikita Popov	2b372570ee	[BasicAA] Don't track visited blocks for phi-phi alias query We only need the VisitedPhiBBs to disambiguate comparisons of values from two different loop iterations. If we're comparing two phis from the same basic block in lock-step, the compared values will always be on the same iteration. While this also increases precision, this is mainly intended to clarify the scope of VisitedPhiBBs.	2020-10-22 22:12:21 +02:00
Nikita Popov	17690ee79a	[BasicAA] Add additional phi tests (NFC)	2020-10-22 21:53:19 +02:00
Florian Hahn	c1705e0ba4	[AArch64] Add min/max cost-model tests for v2i32.	2020-10-22 16:04:13 +01:00
Florian Hahn	d6efc87518	[AArch64] Add min/max cost-model tests for v4i16.	2020-10-22 15:47:50 +01:00
Florian Hahn	fbb6375db0	[AArch64] Add cost model tests for min/max intrinsics.	2020-10-22 13:28:04 +01:00
Arthur Eubanks	55c4ff9860	[test] Fix tests using -analyze that fail under NPM Many of these tests don't use the output of -analyze.	2020-10-21 21:54:30 -07:00
Arthur Eubanks	7d6c3e509a	[test] Fix quadradic-exit-value.ll under NPM	2020-10-21 13:33:01 -07:00
Arthur Eubanks	1d1217c4ea	[test] Fix no-wrap-symbolic-becount.ll under NPM	2020-10-21 13:15:15 -07:00
Sanjay Patel	c963bde015	[CostModel] remove cost-kind predicate for scatter/gather cost This is similar in spirit to `01ea93d85d` (memcpy) except that here the underlying caller assumptions were created for vectorizer use (throughput) rather than other passes. That meant ARM could have an enormous throughput cost with no corresponding size, latency, or blended cost increase. X86 has the same throughput restriction as the basic implementation, so it is still unchanged. Paraphrasing from the previous commit: This may not make sense for some callers, but at least now the costs will be consistently wrong instead of mysteriously wrong. Targets should provide better overrides if the current modeling is not accurate.	2020-10-21 14:26:05 -04:00
Sanjay Patel	729610a51a	[ARM] add cost-kind tests for intrinsics; NFC This is a copy of the x86 file to provide better coverage; x86 may have strange overrides that mask changes in the generic model.	2020-10-21 14:26:04 -04:00
Sanjay Patel	01ea93d85d	[CostModel] remove cost-kind predicate for memcpy cost The default implementation base returns TCC_Expensive (currently set to '4'), so that explains the test diff. This probably does not make sense for most callers, but at least now the costs will be consistently wrong instead of mysteriously wrong. The ARM target has an override that tries to model codegen expansion, and that should likely be adapted for general usage. This probably does not affect anything because the vectorizers are the primary users of the throughput cost, but memcpy is not listed as a trivially vectorizable intrinsic.	2020-10-21 08:50:44 -04:00
Max Kazantsev	bed02fa8b0	Revert "[SCEV] Prove implications of different type via truncation" This reverts commit `80852a4f2f`. Test is now broken because underlying required patch was also reverted SUDDENLY.	2020-10-21 13:03:46 +07:00
Max Kazantsev	80852a4f2f	[SCEV] Prove implications of different type via truncation When we need to prove implication of expressions of different type width, the default strategy is to widen everything to wider type and prove in this type. This does not interact well with AddRecs with negative steps and unsigned predicates: such AddRec will likely not have a `nuw` flag, and its `zext` to wider type will not be an AddRec. In contrast, `trunc` of an AddRec in some cases can easily be proved to be an `AddRec` too. This patch introduces an alternative way of handling implications of different type widths. If we can prove that wider type values actually fit in the narrow type, we truncate them and prove the implication in narrow type. Differential Revision: https://reviews.llvm.org/D89548 Reviewed By: fhahn	2020-10-21 12:53:22 +07:00
Fangrui Song	d9f91a3d14	Revert D89381 "[SCEV] Recommit "Use nw flag and symbolic iteration count to sharpen ranges of AddRecs", attempt 2" This reverts commit `a10a64e7e3`. It broke polly/test/ScopInfo/NonAffine/non-affine-loop-condition-dependent-access_3.ll The difference suggests that this may be a serious issue.	2020-10-20 21:03:58 -07:00
Roman Lebedev	d1946469d6	[NFC][SCEV] Improve/rework test coverage for ptrtoint handling	2020-10-20 14:17:56 +03:00
sstefan1	fbfb1c7909	[IR] Make nosync, nofree and willreturn default for intrinsics. D70365 allows us to make attributes default. This is a follow up to actually make nosync, nofree and willreturn default. The approach we chose, for now, is to opt-in to default attributes to avoid introducing problems to target specific intrinsics. Intrinsics with default attributes can be created using `DefaultAttrsIntrinsic` class.	2020-10-20 11:57:19 +02:00
Max Kazantsev	a10a64e7e3	[SCEV] Recommit "Use nw flag and symbolic iteration count to sharpen ranges of AddRecs", attempt 2 Fixed wrapping range case & proof methods reduced to constant range checks to save compile time. Differential Revision: https://reviews.llvm.org/D89381	2020-10-20 11:32:36 +07:00
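
The per-lane rule described in the message of commit `2c08f1b4b6` above can be sketched roughly as follows. This is only an illustration of the rule as the commit message states it, not the code from the patch: the function name `lane_ops`, its parameters, and the returned tuple are hypothetical, and it assumes every demanded index is within range of the vector.

```python
def lane_ops(num_elts, elt_bits, demanded):
    """Count (extracti128, element inserts, inserti128) for a set of demanded indices.

    Hypothetical helper that follows the rule quoted in the commit message:
    - a 128-bit lane with at least one demanded index needs one inserti128;
    - such a lane also needs one extracti128 if it is only partially demanded
      and it is not the first lane of the legalized vector.
    """
    elts_per_lane = 128 // elt_bits
    demanded = set(demanded)
    extracts = 0
    lane_inserts = 0
    for lane_start in range(0, num_elts, elts_per_lane):
        lane_end = min(lane_start + elts_per_lane, num_elts)
        lane = set(range(lane_start, lane_end))
        hit = lane & demanded
        if not hit:
            continue  # nothing demanded in this lane, so no lane-level work
        lane_inserts += 1
        if hit != lane and lane_start != 0:
            extracts += 1
    # One element insert (e.g. vpinsrd for i32) per demanded index.
    return extracts, len(demanded), lane_inserts


# The three v8i32 cases listed in the commit message:
print(lane_ops(8, 32, [1]))           # Case#1
print(lane_ops(8, 32, [5]))           # Case#2
print(lane_ops(8, 32, [4, 5, 6, 7]))  # Case#3
```

For the v8i32 cases in the message, the tuples correspond to vpinsrd + inserti128, extracti128 + vpinsrd + inserti128, and 4*vpinsrd + inserti128 respectively.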


2374 Commits