llvm-project

Commit Graph

Author	SHA1	Message	Date
Nikita Popov	9c91614959	[CVP] Guard against poison in common phi value transform (PR50399) The common phi value transform replaces constants with values that have the same value as the constant on a given edge. However, LVI generally only provides information that is correct up to poison, so this can end up replacing a well-defined value with poison. D69442 addressed an instance of this problem by clearing poison flags on the generating instruction, which was sufficient at the time. rGa917fb89dc28 made LVI's edge value analysis slightly more powerful, and clearing poison flags is no longer sufficient. This patch changes the transform to instead explicitly guard against a poison value instead. This should be satisfied for most cases due to a prior branch on poison. Fixes https://bugs.llvm.org/show_bug.cgi?id=50399. Differential Revision: https://reviews.llvm.org/D102966	2021-05-25 20:47:17 +02:00
Nikita Popov	e42636d3c1	[CVP] Add additional test for phi common val transform (NFC)	2021-05-24 17:28:38 +02:00
Nikita Popov	069174a634	[CVP] Add test for PR50399 (NFC)	2021-05-22 11:21:34 +02:00
Nikita Popov	db9d00c5e7	[LVI] Handle mask not equal zero conditions If V & Mask != 0, we know that at least one of the bits in Mask must be set, so the value must be >= the lowest bit in Mask.	2021-05-01 23:08:49 +02:00
Nikita Popov	7aafd104bf	[CVP] Add tests for mask not equal zero guard (NFC)	2021-05-01 23:08:48 +02:00
Roman Lebedev	9829f5e6b1	[CVP] @llvm.[us]{min,max}() intrinsics handling If we can tell that either one of the arguments is taken, bypass the intrinsic. Notably, we are indeed fine with non-strict predicate: * UL: https://alive2.llvm.org/ce/z/69qVW9 https://alive2.llvm.org/ce/z/kNFTKf https://alive2.llvm.org/ce/z/AvaPw2 https://alive2.llvm.org/ce/z/oxo53i * UG: https://alive2.llvm.org/ce/z/wxHeGH https://alive2.llvm.org/ce/z/Lf76qx * SL: https://alive2.llvm.org/ce/z/hkeTGS https://alive2.llvm.org/ce/z/eR_b-W * SG: https://alive2.llvm.org/ce/z/wEqRm7 https://alive2.llvm.org/ce/z/FpAsVr Much like with all other comparison handling in CVP, while we could sort-of handle two Value's, at least for plain ICmpInst it does not appear to be worthwhile. This only fires 78 times on test-suite + dt + rs, but we don't canonicalize to these yet. (only SCEV produces them)	2021-04-11 00:33:47 +03:00
Roman Lebedev	d610f05104	[NFC][CVP] Add tests for @llvm.[us]{min,max}() intrinsics	2021-04-11 00:33:47 +03:00
Roman Lebedev	c329a47d9e	[CVP] @llvm.abs() handling Iff we know the signedness domain of the argument, we can either skip @llvm.abs, or do negation directly. Notably, INT_MIN can belong to either domain: * X u<= INT_MIN --> X is always fine https://alive2.llvm.org/ce/z/QB8j-C https://alive2.llvm.org/ce/z/7sFKpS * X s<= 0 --> -X is always fine https://alive2.llvm.org/ce/z/QbGSyq https://alive2.llvm.org/ce/z/APsN84 If all else fails, try to infer NSW flag: https://alive2.llvm.org/ce/z/qCJfYm	2021-04-10 16:47:31 +03:00
Roman Lebedev	b6aaa8836e	[NFC][CVP] Add `@llvm.abs` test cases	2021-04-10 16:47:31 +03:00
Nikita Popov	fd73e4d4b2	[CVP] Add more tests for select with overdefined operand (NFC) Also check the case where one operand isn't constant, which isn't handled right now, because the SPF code requires both operands to be ranges. Move the tests to directly check ranges rather than go through an and, to make it more obvious that this has no relation to bitmasks.	2021-04-04 13:54:06 +02:00
Nikita Popov	72e0846ef8	[LVI] Don't bail on overdefined value in select Even if one of the operands is overdefined, we may still produce a non-overdefined result, e.g. due to a min/max operation. This matches our handling elsewhere, e.g. for binary operators. The slot poisoning comment refers to a much older LVI cache implementation.	2021-04-04 11:11:01 +02:00
Nikita Popov	3ac2541b5c	[CVP] Add test for and of min (NFC) The and currently doesn't get optimized away because %a is overdefined.	2021-04-04 11:11:01 +02:00
Nikita Popov	4a3e006830	[LVI] Use range metadata on intrinsics If we don't know how to handle an intrinsic, we should still make use of normal call range metadata.	2021-04-02 16:45:31 +02:00
Nikita Popov	93135091b1	[CVP] Add test for !range on intrinsic (NFC)	2021-04-02 16:45:30 +02:00
Nikita Popov	2b494f85f1	[CVP] Remove -cvp-dont-add-nowrap-flags option This option was originally added to work around a bug in LFTR. The bug has long since been fixed.	2021-03-07 18:19:31 +01:00
Nikita Popov	a917fb89dc	[LVI] Simplify and generalize handling of clamp patterns Instead of handling a number of special cases for selects, handle this generally when inferring ranges from conditions. We already infer ranges from `x + C pred C2` to `x`, so doing the same for `x pred C2` to `x + C` is straightforward.	2021-03-06 10:42:41 +01:00
Nikita Popov	906deaa0d9	[CVP] Add additional tests for clamp patterns (NFC) These are the same as the existing tests, but using different predicates that are not handled by the current code.	2021-03-06 10:42:40 +01:00
Nikita Popov	019ae8220f	[CVP] Fix tests for clamp patterns (NFC) These tests didn't test the pattern they were supposed to, because %a instead of %add was used in the select (which turned this into a normal min/max). Noticed this when commenting out the clamp handling code did not result in any test failures...	2021-03-06 10:24:44 +01:00
Nikita Popov	14e540febc	[LVI] Handle unions of conditions LVI previously handled "if (L && R)" conditions, but not "if (L || R)" conditions. The latter case can still produce useful information if L and R both constrain the same variable. This adds support for handling the "if (L || R)" case as well. The only difference is that we take the union instead of the intersection of the lattice values.	2021-01-01 16:46:21 +01:00
Nikita Popov	13b1c9abaf	[CVP] Add tests for union of conditions (NFC) We currently handle intersected conditions, but not unioned conditions.	2021-01-01 16:46:21 +01:00
Nikita Popov	0af42d3dc7	[PatternMatch][LVI] Handle select-form and/or in LVI Following the discussion in D93065, this adds m_LogicalAnd() and m_LogicalOr() matchers, that match A && B and A || B logical operations, either as bitwise operations or select expressions. As an example usage, LVI is adapted to use these matchers for its condition reasoning. The plan here is to switch other parts of LLVM that reason about and/or of conditions to also support the select forms, and then merge D93065 (or a variant thereof) to disable the poison-unsafe select to and/or transform. Differential Revision: https://reviews.llvm.org/D93827	2020-12-27 17:39:02 +01:00
Nikita Popov	5bc5c016c4	[CVP] Add tests for select form of and/or (NFC) This tests their handling inside LVI. See D93065 for wider context.	2020-12-26 21:48:24 +01:00
Nikita Popov	22dba707b0	[AC] Handle (X+C1)<C2 assumes (PR48408) InstCombine canonicalizes X>C && X<C' style comparisons into (X+C1)<C2. This type of expression is recognized by some analyses like LVI, but currently not when used inside assumptions, because AssumptionCache does not track affected values for it.	2020-12-13 21:00:32 +01:00
Nikita Popov	afbb6d97b5	[CVP] Simplify and generalize switch handling CVP currently handles switches by checking an equality predicate on all edges from predecessor blocks. Of course, this can only work if the value being switched over is defined in a different block. Replace this implementation with a call to getPredicateAt(), which also does the predecessor edge predicate check (if not defined in the same block), but can also do quite a bit more: It can reason about phi-nodes by checking edge predicates for incoming values, it can reason about assumes, and it can reason about block values. As such, this makes the implementation both simpler and more powerful. The compile-time impact on CTMark is in the noise.	2020-12-12 21:12:27 +01:00
Nikita Popov	ff523aa441	[CVP] Add additional switch tests (NFC) These cover cases handled by getPredicateAt(), but not by the current implementation: * Assumes based on context instruction. * Value from phi node in same block (using per-pred reasoning). * Value from non-phi node in same block (using block-val reasoning).	2020-12-12 20:58:00 +01:00
Arthur Eubanks	5c31b8b94f	Revert "Use uint64_t for branch weights instead of uint32_t" This reverts commit `10f2a0d662`. More uint64_t overflows.	2020-10-31 00:25:32 -07:00
Arthur Eubanks	10f2a0d662	Use uint64_t for branch weights instead of uint32_t CallInst::updateProfWeight() creates branch_weights with i64 instead of i32. To be more consistent everywhere and remove lots of casts from uint64_t to uint32_t, use i64 for branch_weights. Reviewed By: davidxl Differential Revision: https://reviews.llvm.org/D88609	2020-10-30 10:03:46 -07:00
Nico Weber	2a4e704c92	Revert "Use uint64_t for branch weights instead of uint32_t" This reverts commit `e5766f25c6`. Makes clang assert when building Chromium, see https://crbug.com/1142813 for a repro.	2020-10-27 09:26:21 -04:00
Arthur Eubanks	e5766f25c6	Use uint64_t for branch weights instead of uint32_t CallInst::updateProfWeight() creates branch_weights with i64 instead of i32. To be more consistent everywhere and remove lots of casts from uint64_t to uint32_t, use i64 for branch_weights. Reviewed By: davidxl Differential Revision: https://reviews.llvm.org/D88609	2020-10-26 20:24:04 -07:00
Philip Reames	e46d74b589	[CVP] Allow two transforms in one invocation For a call site which had both constant deopt operands and nonnull arguments, we were missing the opportunity to recognize the latter by bailing early. This is somewhat of a speculative fix. Months ago, I'd had a private report of performance and compile time regressions from the deopt operand folding. I never received a test case. However, the only possibility I see was that after that change CVP missed the nonnull fold, and we end up with a pass ordering/missed simplification issue. So, since it's a real issue, fix it and hope.	2020-09-28 15:11:42 -07:00
Nikita Popov	01bde7310b	[CVP] Remove unnecessary block splits in tests (NFC) These are no longer necessary since D69686.	2020-09-27 20:55:28 +02:00
Nikita Popov	fe79061be2	[LVI][CVP] Use block value when simplifying icmps Add a flag to getPredicateAt() that allows making use of the block value. This allows us to take into account range information from the current block, rather than only information that is threaded over edges, making the icmp simplification in CVP a lot more powerful. I'm not changing getPredicateAt() to use the block value unconditionally to avoid any impact on the JumpThreading pass, which is somewhat picky about LVI query order. Most test changes here are just icmps that now get dropped (while previously only a result used in a return was replaced). The three tests in icmp.ll show some representative improvements. Some of the folds this enables have been covered by IPSCCP in the meantime, but LVI can reason about some cases which are hard to support in IPSCCP, such as in test_br_cmp_with_offset. The compile-time cost of doing this is fairly minimal, with a ~0.05% CTMark regression for ReleaseThinLTO: https://llvm-compile-time-tracker.com/compare.php?from=709d03f8af4da4204849a70f01798e7cebba2e32&to=6236fd503761f43c99f4537121e057a01056f185&stat=instructions This is because the block values will typically already be queried and cached by other CVP optimizations anyway. Differential Revision: https://reviews.llvm.org/D69686	2020-09-27 20:25:16 +02:00
Nikita Popov	4f6e11948c	[CVP] Make srem test more robust (NFC) D69686 will be able to determine that the icmp is always false. As this is not the purpose of the test, use a different modulus that doesn't trivialize the condition.	2020-09-27 18:57:07 +02:00
Nikita Popov	c8abf1c12d	[CVP] Pass context instruction when narrowing div/rem This fold was the only place not passing the context instruction. The tests worked around that fact by introducing a basic block split, which is now no longer necessary.	2020-09-27 17:51:30 +02:00
Roman Lebedev	b289dc5306	[CVP] Narrow SDiv/SRem to the smallest power-of-2 that's sufficient to contain its operands This is practically identical to what we already do for UDiv/URem: https://rise4fun.com/Alive/04K Name: narrow udiv Pre: C0 u<= 255 && C1 u<= 255 %r = udiv i16 C0, C1 => %t0 = trunc i16 C0 to i8 %t1 = trunc i16 C1 to i8 %t2 = udiv i8 %t0, %t1 %r = zext i8 %t2 to i16 Name: narrow exact udiv Pre: C0 u<= 255 && C1 u<= 255 %r = udiv exact i16 C0, C1 => %t0 = trunc i16 C0 to i8 %t1 = trunc i16 C1 to i8 %t2 = udiv exact i8 %t0, %t1 %r = zext i8 %t2 to i16 Name: narrow urem Pre: C0 u<= 255 && C1 u<= 255 %r = urem i16 C0, C1 => %t0 = trunc i16 C0 to i8 %t1 = trunc i16 C1 to i8 %t2 = urem i8 %t0, %t1 %r = zext i8 %t2 to i16 ... only here we need to look for 'min signed bits', not 'active bits', and there's an UB to be aware of: https://rise4fun.com/Alive/KG86 https://rise4fun.com/Alive/LwR Name: narrow sdiv Pre: C0 <= 127 && C1 <= 127 && C0 >= -128 && C1 >= -128 %r = sdiv i16 C0, C1 => %t0 = trunc i16 C0 to i9 %t1 = trunc i16 C1 to i9 %t2 = sdiv i9 %t0, %t1 %r = sext i9 %t2 to i16 Name: narrow exact sdiv Pre: C0 <= 127 && C1 <= 127 && C0 >= -128 && C1 >= -128 %r = sdiv exact i16 C0, C1 => %t0 = trunc i16 C0 to i9 %t1 = trunc i16 C1 to i9 %t2 = sdiv exact i9 %t0, %t1 %r = sext i9 %t2 to i16 Name: narrow srem Pre: C0 <= 127 && C1 <= 127 && C0 >= -128 && C1 >= -128 %r = srem i16 C0, C1 => %t0 = trunc i16 C0 to i9 %t1 = trunc i16 C1 to i9 %t2 = srem i9 %t0, %t1 %r = sext i9 %t2 to i16 Name: narrow sdiv Pre: C0 <= 127 && C1 <= 127 && C0 >= -128 && C1 >= -128 && !(C0 == -128 && C1 == -1) %r = sdiv i16 C0, C1 => %t0 = trunc i16 C0 to i8 %t1 = trunc i16 C1 to i8 %t2 = sdiv i8 %t0, %t1 %r = sext i8 %t2 to i16 Name: narrow exact sdiv Pre: C0 <= 127 && C1 <= 127 && C0 >= -128 && C1 >= -128 && !(C0 == -128 && C1 == -1) %r = sdiv exact i16 C0, C1 => %t0 = trunc i16 C0 to i8 %t1 = trunc i16 C1 to i8 %t2 = sdiv exact i8 %t0, %t1 %r = sext i8 %t2 to i16 Name: narrow srem Pre: C0 <= 127 && C1 <= 127 && C0 >= -128 && C1 >= -128 && !(C0 == -128 && C1 == -1) %r = srem i16 C0, C1 => %t0 = trunc i16 C0 to i8 %t1 = trunc i16 C1 to i8 %t2 = srem i8 %t0, %t1 %r = sext i8 %t2 to i16 The ConstantRangeTest.losslessSignedTruncationSignext test sanity-checks the logic, that we can losslessly truncate ConstantRange to `getMinSignedBits()` and signext it back, and it will be identical to the original CR. On vanilla llvm test-suite + RawSpeed, this fires 1262 times, while the same fold for UDiv/URem only fires 384 times. Sic! Additionally, this causes +606.18% (+1079) extra cases of aggressive-instcombine.NumDAGsReduced, and +473.14% (+1145) of aggressive-instcombine.NumInstrsReduced folds.	2020-09-22 21:37:30 +03:00
Roman Lebedev	cb10d5d714	[NFC][CVP] Add tests for SDiv/SRem narrowing	2020-09-22 21:37:30 +03:00
Roman Lebedev	4eeeb356fc	[CVP] Enhance SRem -> URem fold to work not just on non-negative operands This is a continuation of `8d487668d0`, the logic is pretty much identical for SRem: Name: pos pos Pre: C0 >= 0 && C1 >= 0 %r = srem i8 C0, C1 => %r = urem i8 C0, C1 Name: pos neg Pre: C0 >= 0 && C1 <= 0 %r = srem i8 C0, C1 => %r = urem i8 C0, -C1 Name: neg pos Pre: C0 <= 0 && C1 >= 0 %r = srem i8 C0, C1 => %t0 = urem i8 -C0, C1 %r = sub i8 0, %t0 Name: neg neg Pre: C0 <= 0 && C1 <= 0 %r = srem i8 C0, C1 => %t0 = urem i8 -C0, -C1 %r = sub i8 0, %t0 https://rise4fun.com/Alive/Vd6 Now, this new logic does not result in any new catches as of vanilla llvm test-suite + RawSpeed, but it should be virtually compile-time free, and it may be important to be consistent in their handling, because if we had a pair of sdiv-srem, and only converted one of them, -divrempairs will no longer see them as a pair, and thus not "merge" them.	2020-09-22 21:37:28 +03:00
Roman Lebedev	36ea18b064	[NFC][CVP] Add tests for srem with potentially different signedness domains	2020-09-22 21:37:28 +03:00
Nikita Popov	1a27238098	[CVP] Additional tests for comparison with offset (NFC) Both icmps have an additional offset here. We would fold this if the second one didn't.	2020-09-20 22:10:34 +02:00
Nikita Popov	445db89b53	[LVI] Get value range from mask comparison InstCombine likes to canonicalize comparisons of the form X == C || X == C+1 into (X & -2) == C'. Make sure LVI can still recover the value range from this. Can of course also be useful for proper mask comparisons. For the sake of clarity, the implementation goes through KnownBits to compute the range.	2020-09-20 21:13:57 +02:00
Nikita Popov	91af6a78d0	[CVP] Add tests for mask comparisons (NFC)	2020-09-20 21:13:57 +02:00
Nikita Popov	cb392c870d	[CVP] Regenerate test checks (NFC)	2020-08-30 16:23:59 +02:00
Nikita Popov	6d88f6efd4	Reapply [LVI] Normalize pointer behavior This got reverted because a dependency was reverted. It has since been reapplied, so reapply this as well. ----- Related to D69686. As noted there, LVI currently behaves differently for integer and pointer values: For integers, the block value is always valid inside the basic block, while for pointers it is only valid at the end of the basic block. I believe the integer behavior is the correct one, and CVP relies on it via its getConstantRange() uses. The reason for the special pointer behavior is that LVI checks whether a pointer is dereferenced in a given basic block and marks it as non-null in that case. Of course, this information is valid only after the dereferencing instruction, or in conservative approximation, at the end of the block. This patch changes the treatment of dereferenceability: Instead of including it inside the block value, we treat it as something similar to an assume (it essentially is a non-nullness assume) and incorporate this information in intersectAssumeOrGuardBlockValueConstantRange() if the context instruction is the terminator of the basic block. This happens either when determining an edge-value internally in LVI, or when a terminator was explicitly passed to getValueAt(). The latter case makes this more powerful than the previous implementation as a side-effect, and this does actually seem beneficial in practice. Of course, we do not want to recompute dereferenceability on each intersectAssume call, so we need a new cache for this. The dereferenceability analysis requires walking the entire basic block and computing underlying objects of all memory operands. This was previously done separately for each queried pointer value. In the new implementation (both because this makes the caching simpler, and because it is faster), I instead only walk the full BB once and cache all the dereferenced pointers. So the traversal is now performed only once per BB, instead of once per queried pointer value. I think the overall model now makes more sense than before, and there will be no more pitfalls due to differing integer/pointer behavior. Differential Revision: https://reviews.llvm.org/D69914	2020-08-29 21:17:03 +02:00
Nikita Popov	9ebeac6788	[ConstantRange][CVP] Make use of abs poison flag Pass the abs poison flag to the underlying ConstantRange implementation, allowing CVP to simplify based on it. Importantly, this recognizes that abs with poison flag is actually non-negative...	2020-07-30 23:06:10 +02:00
Nikita Popov	d8a98a9c35	[ConstantRange][CVP] Compute min/max/abs intrinsic ranges Wire up ConstantRange::intrinsic() to the existing primitives for min, max and abs. The poison flag on abs is not yet taken into account.	2020-07-30 22:21:34 +02:00
Nikita Popov	95d1e668ed	[CVP] Add tests for min/max/abs intrinsic comparisons (NFC)	2020-07-30 22:17:03 +02:00
Roman Lebedev	8d487668d0	[CVP] Soften SDiv into a UDiv as long as we know domains of both of the operands. Yes, if operands are non-positive this comes at the extra cost of two extra negations. But a. division is already just ridiculously costly, two more subtractions can't hurt much :) and b. we have better/more analyses/folds for an unsigned division, we could end up narrowing its bitwidth, converting it to lshr, etc. This is essentially a take two on `0fdcca07ad`, which didn't fix the potential regression I was seeing, because ValueTracking's computeKnownBits() doesn't make use of dominating conditions in its analysis. While I could teach it that, this seems like the more general fix. This big hammer actually does catch said potential regression. Over vanilla test-suite + RawSpeed + darktable (10M IR instrs, 1M IR BB, 1M X86 ASM instrs), this fires/converts 5 more (+2%) SDiv's, the total instruction count at the end of middle-end pipeline is only +6, so out of +10 extra negations, ~half are folded away, and asm instr count is only +1, so practically speaking all extra negations are folded away and are therefore free. Sadly, all these new UDiv's remained, none folded away. But there are two fewer basic blocks. https://rise4fun.com/Alive/VS6 Name: v0 Pre: C0 >= 0 && C1 >= 0 %r = sdiv i8 C0, C1 => %r = udiv i8 C0, C1 Name: v1 Pre: C0 <= 0 && C1 >= 0 %r = sdiv i8 C0, C1 => %t0 = udiv i8 -C0, C1 %r = sub i8 0, %t0 Name: v2 Pre: C0 >= 0 && C1 <= 0 %r = sdiv i8 C0, C1 => %t0 = udiv i8 C0, -C1 %r = sub i8 0, %t0 Name: v3 Pre: C0 <= 0 && C1 <= 0 %r = sdiv i8 C0, C1 => %r = udiv i8 -C0, -C1	2020-07-18 17:59:56 +03:00
Roman Lebedev	7b16fd8a25	[NFC][CVP] Add tests for possible sdiv->udiv where operands are not non-negative Currently that fold requires both operands to be non-negative, but the only real requirement for the fold is that we must know the domains of the operands.	2020-07-18 17:59:31 +03:00
Nikita Popov	91836fd7f3	[LVI][CVP] Handle (x | y) < C style conditions InstCombine may convert conditions like (x < C) && (y < C) into (x | y) < C (for some C). This patch teaches LVI to recognize that in this case, it can infer either x < C or y < C along the edge. This fixes the issue reported at https://github.com/rust-lang/rust/issues/73827. Differential Revision: https://reviews.llvm.org/D82715	2020-07-01 20:43:24 +02:00
Nikita Popov	0f6afd946d	[CVP] Use different number in test (NFC) To make it clear that this is not intended to be specific to mask / bit tests.	2020-07-01 18:43:59 +02:00
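
Note on 8d487668d0: the four sdiv -> udiv cases listed in that commit message (v0 to v3) can be checked exhaustively for i8. The following is a minimal Python sketch, not code from the LLVM tree; the helper names (to_unsigned, sdiv, sdiv_via_udiv, check_all) are hypothetical and chosen only for this illustration. It models i8 values as Python integers, skips division by zero and the overflowing case -128 / -1 (both undefined for sdiv), and compares bit patterns.

```python
BITS = 8
MASK = (1 << BITS) - 1  # 0xFF, used to wrap arithmetic to i8


def to_unsigned(x):
    """Return the i8 bit pattern of a signed value as an unsigned integer."""
    return x & MASK


def sdiv(a, b):
    """Signed division rounding toward zero (the rounding LLVM's sdiv uses)."""
    q = abs(a) // abs(b)
    return q if (a < 0) == (b < 0) else -q


def sdiv_via_udiv(a, b):
    """Compute sdiv using only an unsigned division plus i8 negations.

    Negative operands are negated first (wrapping to i8); the quotient is
    negated again when exactly one operand was negative.
    """
    ua = to_unsigned(-a) if a < 0 else to_unsigned(a)
    ub = to_unsigned(-b) if b < 0 else to_unsigned(b)
    q = ua // ub                      # models: udiv i8
    if (a < 0) != (b < 0):
        q = (0 - q) & MASK            # models: sub i8 0, %t0
    return q


def check_all():
    """Return every (a, b) pair where the rewrite disagrees with sdiv."""
    mismatches = []
    for a in range(-128, 128):
        for b in range(-128, 128):
            if b == 0 or (a == -128 and b == -1):
                continue  # undefined behaviour for sdiv, nothing to compare
            if to_unsigned(sdiv(a, b)) != sdiv_via_udiv(a, b):
                mismatches.append((a, b))
    return mismatches


if __name__ == "__main__":
    print(check_all())
```

The sketch treats zero as non-negative; since the commit's preconditions overlap at zero, either choice gives the same result.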


215 Commits
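
Note on db9d00c5e7: the commit message states that if V & Mask != 0, then V must be >= the lowest set bit in Mask (as an unsigned comparison). The following is a minimal Python sketch that checks this bound exhaustively for small bit widths; it is an illustration only, not LVI code, and the function names (lowest_set_bit, find_counterexample) are hypothetical.

```python
def lowest_set_bit(mask):
    """Return the value of the lowest set bit of a positive integer mask."""
    return mask & -mask


def find_counterexample(bits=8):
    """Search all (v, mask) pairs of the given width for a violation.

    A violation would be a value v with (v & mask) != 0 that is still
    smaller than the lowest set bit of mask. Returns the first such pair,
    or None if the bound holds everywhere.
    """
    for mask in range(1, 1 << bits):
        bound = lowest_set_bit(mask)
        for v in range(1 << bits):
            if (v & mask) != 0 and v < bound:
                return (v, mask)
    return None


if __name__ == "__main__":
    print(find_counterexample())
```

The bound holds because any bit shared between V and Mask is at least as large as the lowest bit of Mask, and V is at least as large as any single bit it has set.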