llvm-project

Commit Graph

Author	SHA1	Message	Date
Nikita Popov	9c91614959	[CVP] Guard against poison in common phi value transform (PR50399) The common phi value transform replaces constants with values that have the same value as the constant on a given edge. However, LVI generally only provides information that is correct up to poison, so this can end up replacing a well-defined value with poison. D69442 addressed an instance of this problem by clearing poison flags on the generating instruction, which was sufficient at the time. rGa917fb89dc28 made LVI's edge value analysis slightly more powerful, and clearing poison flags is no longer sufficient. This patch changes the transform to explicitly guard against a poison value instead. This should be satisfied for most cases due to a prior branch on poison. Fixes https://bugs.llvm.org/show_bug.cgi?id=50399. Differential Revision: https://reviews.llvm.org/D102966	2021-05-25 20:47:17 +02:00
Arthur Eubanks	6b9524a05b	[NewPM] Don't mark AA analyses as preserved Currently all AA analyses marked as preserved are stateless, not taking into account their dependent analyses. So there's no need to mark them as preserved, they won't be invalidated unless their analyses are. SCEVAAResults was the one exception to this, it was treated like a typical analysis result. Make it like the others and don't invalidate unless SCEV is invalidated. Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D102032	2021-05-18 13:49:03 -07:00
Evgeniy Brevnov	35e95c6817	[CVP] processCallSite returns wrong status Recently processMinMaxIntrinsic has been added and we started to observe a number of analyses getting invalidated after CVP. The problem is that CVP conservatively returns 'true' even if there were no modifications to the IR. I found one more place besides processMinMaxIntrinsic which has the same problem. I think processMinMaxIntrinsic and similar functions should have a boolean return status to prevent similar issues from reappearing in the future. Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D100538	2021-04-19 12:13:22 +07:00
Roman Lebedev	9829f5e6b1	[CVP] @llvm.[us]{min,max}() intrinsics handling If we can tell that either one of the arguments is taken, bypass the intrinsic. Notably, we are indeed fine with non-strict predicate: * UL: https://alive2.llvm.org/ce/z/69qVW9 https://alive2.llvm.org/ce/z/kNFTKf https://alive2.llvm.org/ce/z/AvaPw2 https://alive2.llvm.org/ce/z/oxo53i * UG: https://alive2.llvm.org/ce/z/wxHeGH https://alive2.llvm.org/ce/z/Lf76qx * SL: https://alive2.llvm.org/ce/z/hkeTGS https://alive2.llvm.org/ce/z/eR_b-W * SG: https://alive2.llvm.org/ce/z/wEqRm7 https://alive2.llvm.org/ce/z/FpAsVr Much like with all other comparison handling in CVP, while we could sort-of handle two Value's, at least for plain ICmpInst it does not appear to be worthwhile. This only fires 78 times on test-suite + dt + rs, but we don't canonicalize to these yet. (only SCEV produces them)	2021-04-11 00:33:47 +03:00
Roman Lebedev	a407738def	[NFC][CVP] Add statistic for function pointer argument non-null-ness deduction	2021-04-10 21:23:20 +03:00
Roman Lebedev	fe7b3ad8d5	[CVP] LVI: Use in-block values when checking value signedness domain This has a huge positive impact on all the folds that use these helpers, as it can be seen on vanilla test-suite + rawspeed + darktable: correlated-value-propagation.NumSRems +75.68% (+ 28) correlated-value-propagation.NumAShrs +63.87% (+198) correlated-value-propagation.NumSDivs +49.42% (+127) correlated-value-propagation.NumSExt + 8.85% (+593) correlated-value-propagation.NumUDivURemsNarrowed + 8.65% (+34) ... while having pretty minimal compile-time impact: https://llvm-compile-time-tracker.com/compare.php?from=e8c7f43e2c2c6f3581ec1c6489ec21ad9f98958a&to=4cd197711e58ee1b2faeee0c35eea54540185569&stat=instructions	2021-04-10 21:10:59 +03:00
Roman Lebedev	257eda0794	[NFC][LVI] getPredicateAt(): drop default value for UseBlockValue The default is likely wrong. Out of all the callers, only a single one needs to pass in false (JumpThreading); everything else either already passes true, or should pass true. Until the default is flipped, at least make it harder to unintentionally add new callers with UseBlockValue=false.	2021-04-10 20:46:01 +03:00
Roman Lebedev	c329a47d9e	[CVP] @llvm.abs() handling Iff we know the signedness domain of the argument, we can either skip @llvm.abs, or do negation directly. Notably, INT_MIN can belong to either domain: * X u<= INT_MIN --> X is always fine https://alive2.llvm.org/ce/z/QB8j-C https://alive2.llvm.org/ce/z/7sFKpS * X s<= 0 --> -X is always fine https://alive2.llvm.org/ce/z/QbGSyq https://alive2.llvm.org/ce/z/APsN84 If all else fails, try to infer NSW flag: https://alive2.llvm.org/ce/z/qCJfYm	2021-04-10 16:47:31 +03:00
Nikita Popov	2b494f85f1	[CVP] Remove -cvp-dont-add-nowrap-flags option This option was originally added to work around a bug in LFTR. The bug has long since been fixed.	2021-03-07 18:19:31 +01:00
Nikita Popov	afbb6d97b5	[CVP] Simplify and generalize switch handling CVP currently handles switches by checking an equality predicate on all edges from predecessor blocks. Of course, this can only work if the value being switched over is defined in a different block. Replace this implementation with a call to getPredicateAt(), which also does the predecessor edge predicate check (if not defined in the same block), but can also do quite a bit more: It can reason about phi-nodes by checking edge predicates for incoming values, it can reason about assumes, and it can reason about block values. As such, this makes the implementation both simpler and more powerful. The compile-time impact on CTMark is in the noise.	2020-12-12 21:12:27 +01:00
Philip Reames	e46d74b589	[CVP] Allow two transforms in one invocation For a call site which had both constant deopt operands and nonnull arguments, we were missing the opportunity to recognize the latter by bailing early. This is somewhat of a speculative fix. Months ago, I'd had a private report of performance and compile time regressions from the deopt operand folding. I never received a test case. However, the only possibility I see was that after that change CVP missed the nonnull fold, and we end up with a pass ordering/missed simplification issue. So, since it's a real issue, fix it and hope.	2020-09-28 15:11:42 -07:00
Nikita Popov	fe79061be2	[LVI][CVP] Use block value when simplifying icmps Add a flag to getPredicateAt() that allows making use of the block value. This allows us to take into account range information from the current block, rather than only information that is threaded over edges, making the icmp simplification in CVP a lot more powerful. I'm not changing getPredicateAt() to use the block value unconditionally to avoid any impact on the JumpThreading pass, which is somewhat picky about LVI query order. Most test changes here are just icmps that now get dropped (while previously only a result used in a return was replaced). The three tests in icmp.ll show some representative improvements. Some of the folds this enables have been covered by IPSCCP in the meantime, but LVI can reason about some cases which are hard to support in IPSCCP, such as in test_br_cmp_with_offset. The compile-time cost of doing this is fairly minimal, with a ~0.05% CTMark regression for ReleaseThinLTO: https://llvm-compile-time-tracker.com/compare.php?from=709d03f8af4da4204849a70f01798e7cebba2e32&to=6236fd503761f43c99f4537121e057a01056f185&stat=instructions This is because the block values will typically already be queried and cached by other CVP optimizations anyway. Differential Revision: https://reviews.llvm.org/D69686	2020-09-27 20:25:16 +02:00
Nikita Popov	9b959b59df	[LVI] Require context instruction in external API (NFCI) Require CxtI in getConstant() and getConstantRange() APIs. Accordingly drop the BB parameter, as it is implied by CxtI->getParent(). This makes sure we don't forget to pass the context instruction, and makes the API contract clearer (also clean up the comments to that effect -- the value holds at the context instruction, not the end of the block).	2020-09-27 18:07:24 +02:00
Nikita Popov	c8abf1c12d	[CVP] Pass context instruction when narrowing div/rem This fold was the only place not passing the context instruction. The tests worked around that fact by introducing a basic block split, which is now no longer necessary.	2020-09-27 17:51:30 +02:00
Martin Storsjö	b90132399a	[CVP] Remove a redundant trailing semicolon, fixing GCC warnings. NFC.	2020-09-23 09:03:01 +03:00
Roman Lebedev	b289dc5306	[CVP] Narrow SDiv/SRem to the smallest power-of-2 that's sufficient to contain its operands This is practically identical to what we already do for UDiv/URem: https://rise4fun.com/Alive/04K Name: narrow udiv Pre: C0 u<= 255 && C1 u<= 255 %r = udiv i16 C0, C1 => %t0 = trunc i16 C0 to i8 %t1 = trunc i16 C1 to i8 %t2 = udiv i8 %t0, %t1 %r = zext i8 %t2 to i16 Name: narrow exact udiv Pre: C0 u<= 255 && C1 u<= 255 %r = udiv exact i16 C0, C1 => %t0 = trunc i16 C0 to i8 %t1 = trunc i16 C1 to i8 %t2 = udiv exact i8 %t0, %t1 %r = zext i8 %t2 to i16 Name: narrow urem Pre: C0 u<= 255 && C1 u<= 255 %r = urem i16 C0, C1 => %t0 = trunc i16 C0 to i8 %t1 = trunc i16 C1 to i8 %t2 = urem i8 %t0, %t1 %r = zext i8 %t2 to i16 ... only here we need to look for 'min signed bits', not 'active bits', and there's an UB to be aware of: https://rise4fun.com/Alive/KG86 https://rise4fun.com/Alive/LwR Name: narrow sdiv Pre: C0 <= 127 && C1 <= 127 && C0 >= -128 && C1 >= -128 %r = sdiv i16 C0, C1 => %t0 = trunc i16 C0 to i9 %t1 = trunc i16 C1 to i9 %t2 = sdiv i9 %t0, %t1 %r = sext i9 %t2 to i16 Name: narrow exact sdiv Pre: C0 <= 127 && C1 <= 127 && C0 >= -128 && C1 >= -128 %r = sdiv exact i16 C0, C1 => %t0 = trunc i16 C0 to i9 %t1 = trunc i16 C1 to i9 %t2 = sdiv exact i9 %t0, %t1 %r = sext i9 %t2 to i16 Name: narrow srem Pre: C0 <= 127 && C1 <= 127 && C0 >= -128 && C1 >= -128 %r = srem i16 C0, C1 => %t0 = trunc i16 C0 to i9 %t1 = trunc i16 C1 to i9 %t2 = srem i9 %t0, %t1 %r = sext i9 %t2 to i16 Name: narrow sdiv Pre: C0 <= 127 && C1 <= 127 && C0 >= -128 && C1 >= -128 && !(C0 == -128 && C1 == -1) %r = sdiv i16 C0, C1 => %t0 = trunc i16 C0 to i8 %t1 = trunc i16 C1 to i8 %t2 = sdiv i8 %t0, %t1 %r = sext i8 %t2 to i16 Name: narrow exact sdiv Pre: C0 <= 127 && C1 <= 127 && C0 >= -128 && C1 >= -128 && !(C0 == -128 && C1 == -1) %r = sdiv exact i16 C0, C1 => %t0 = trunc i16 C0 to i8 %t1 = trunc i16 C1 to i8 %t2 = sdiv exact i8 %t0, %t1 %r = sext i8 %t2 to i16 Name: narrow srem Pre: C0 <= 127 && C1 <= 127 && C0 >= -128 && C1 >= -128 && !(C0 == -128 && C1 == -1) %r = srem i16 C0, C1 => %t0 = trunc i16 C0 to i8 %t1 = trunc i16 C1 to i8 %t2 = srem i8 %t0, %t1 %r = sext i8 %t2 to i16 The ConstantRangeTest.losslessSignedTruncationSignext test sanity-checks the logic, that we can losslessly truncate ConstantRange to `getMinSignedBits()` and signext it back, and it will be identical to the original CR. On vanilla llvm test-suite + RawSpeed, this fires 1262 times, while the same fold for UDiv/URem only fires 384 times. Sic! Additionally, this causes +606.18% (+1079) extra cases of aggressive-instcombine.NumDAGsReduced, and +473.14% (+1145) of aggressive-instcombine.NumInstrsReduced folds.	2020-09-22 21:37:30 +03:00
Roman Lebedev	4977eadee5	[NFC][CVP] Give a better name STATISTIC() counting udiv i16 -> udiv i8 xforms	2020-09-22 21:37:30 +03:00
Roman Lebedev	ba5afe5588	[NFC][CVP] processUDivOrURem(): refactor to use ConstantRange::getActiveBits() As an exhaustive test shows, this logic is fully identical to the old implementation, with exception of the case where both of the operands had empty ranges: ``` TEST_F(ConstantRangeTest, CVP_UDiv) { unsigned Bits = 4; EnumerateConstantRanges(Bits, [&](const ConstantRange &CR0) { if(CR0.isEmptySet()) return; EnumerateConstantRanges(Bits, [&](const ConstantRange &CR1) { if(CR1.isEmptySet()) return; unsigned MaxActiveBits = 0; for (const ConstantRange &CR : {CR0, CR1}) MaxActiveBits = std::max(MaxActiveBits, CR.getActiveBits()); ConstantRange OperandRange(Bits, /*isFullSet=*/false); for (const ConstantRange &CR : {CR0, CR1}) OperandRange = OperandRange.unionWith(CR); unsigned NewWidth = OperandRange.getUnsignedMax().getActiveBits(); EXPECT_EQ(MaxActiveBits, NewWidth) << CR0 << " " << CR1; }); }); } ```	2020-09-22 21:37:29 +03:00
Roman Lebedev	4eeeb356fc	[CVP] Enhance SRem -> URem fold to work not just on non-negative operands This is a continuation of `8d487668d0`, the logic is pretty much identical for SRem: Name: pos pos Pre: C0 >= 0 && C1 >= 0 %r = srem i8 C0, C1 => %r = urem i8 C0, C1 Name: pos neg Pre: C0 >= 0 && C1 <= 0 %r = srem i8 C0, C1 => %r = urem i8 C0, -C1 Name: neg pos Pre: C0 <= 0 && C1 >= 0 %r = srem i8 C0, C1 => %t0 = urem i8 -C0, C1 %r = sub i8 0, %t0 Name: neg neg Pre: C0 <= 0 && C1 <= 0 %r = srem i8 C0, C1 => %t0 = urem i8 -C0, -C1 %r = sub i8 0, %t0 https://rise4fun.com/Alive/Vd6 Now, this new logic does not result in any new catches as of vanilla llvm test-suite + RawSpeed, but it should be virtually compile-time free, and it may be important to be consistent in their handling, because if we had a pair of sdiv-srem, and only converted one of them, -divrempairs will no longer see them as a pair, and thus not "merge" them.	2020-09-22 21:37:28 +03:00
Nikita Popov	25af353b0e	[NewPM][LVI] Abandon LVI after CVP As mentioned on D70376, LVI can currently cause performance issues when running under NewPM. The problem is that, unlike the legacy pass manager, NewPM will not immediately discard the LVI analysis if the following pass does not need it. This is a problem, because LVI has a high memory requirement, and mass invalidation of LVI values is very inefficient. LVI should only be alive during passes that actively interact with it. This patch addresses the issue by explicitly abandoning LVI after CVP, which gets us back to the LegacyPM behavior. Differential Revision: https://reviews.llvm.org/D84959	2020-08-01 23:47:46 +02:00
Roman Lebedev	9dceb32f30	[NFC][CVP] processSDiv(): pacify gcc compilers	2020-07-18 19:41:43 +03:00
Roman Lebedev	8d487668d0	[CVP] Soften SDiv into a UDiv as long as we know domains of both of the operands. Yes, if operands are non-positive this comes at the extra cost of two extra negations. But a. division is already just ridiculously costly, two more subtractions can't hurt much :) and b. we have better/more analyses/folds for an unsigned division, we could end up narrowing its bitwidth, converting it to lshr, etc. This is essentially a take two on `0fdcca07ad`, which didn't fix the potential regression i was seeing, because ValueTracking's computeKnownBits() doesn't make use of dominating conditions in its analysis. While i could teach it that, this seems like the more general fix. This big hammer actually does catch said potential regression. Over vanilla test-suite + RawSpeed + darktable (10M IR instrs, 1M IR BB, 1M X86 ASM instrs), this fires/converts 5 more (+2%) SDiv's, the total instruction count at the end of middle-end pipeline is only +6, so out of +10 extra negations, ~half are folded away, and asm instr count is only +1, so practically speaking all extra negations are folded away and are therefore free. Sadly, all these new UDiv's remained, none folded away. But there are two fewer basic blocks. https://rise4fun.com/Alive/VS6 Name: v0 Pre: C0 >= 0 && C1 >= 0 %r = sdiv i8 C0, C1 => %r = udiv i8 C0, C1 Name: v1 Pre: C0 <= 0 && C1 >= 0 %r = sdiv i8 C0, C1 => %t0 = udiv i8 -C0, C1 %r = sub i8 0, %t0 Name: v2 Pre: C0 >= 0 && C1 <= 0 %r = sdiv i8 C0, C1 => %t0 = udiv i8 C0, -C1 %r = sub i8 0, %t0 Name: v3 Pre: C0 <= 0 && C1 <= 0 %r = sdiv i8 C0, C1 => %r = udiv i8 -C0, -C1	2020-07-18 17:59:56 +03:00
Roman Lebedev	45b7388824	[NFC][CVP] Rename predicates - s/positive/non negative/ to better note that zero is ok	2020-07-18 17:59:32 +03:00
Roman Lebedev	2cde6984d8	[NFC][CVP] Refactor isPositive() out of hasPositiveOperands()	2020-07-18 17:59:32 +03:00
Mircea Trofin	ceb7f308b8	[llvm][NFC][CallSite] Removed CallSite from few implementation details Reviewers: dblaikie, craig.topper Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D78724	2020-04-23 10:36:36 -07:00
Florian Hahn	b37543750c	[ValueLattice] Distinguish between constant ranges with/without undef. This patch updates ValueLattice to distinguish between ranges that are guaranteed to not include undef and ranges that may include undef. A constant range guaranteed to not contain undef can be used to simplify instructions to arbitrary values. A constant range that may contain undef can only be used to simplify to a constant. If the value can be undef, it might take a value outside the range. For example, consider the snippet below define i32 @f(i32 %a, i1 %c) { br i1 %c, label %true, label %false true: %a.255 = and i32 %a, 255 br label %exit false: br label %exit exit: %p = phi i32 [ %a.255, %true ], [ undef, %false ] %f.1 = icmp eq i32 %p, 300 call void @use(i1 %f.1) %res = and i32 %p, 255 ret i32 %res } In the exit block, %p would be a constant range [0, 256) including undef as %p could be undef. We can use the range information to replace %f.1 with false because we remove the compare, effectively forcing the use of the constant to be != 300. We cannot replace %res with %p however, because if %a would be undef %cond may be true but the second use might not be < 256. Currently LazyValueInfo uses the new behavior just when simplifying AND instructions and does not distinguish between constant ranges with and without undef otherwise. I think we should address the remaining issues in LVI incrementally. Reviewers: efriedma, reames, aqjune, jdoerfert, sstefan1 Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D76931	2020-03-31 12:50:20 +01:00
Enna1	03bc311a16	[CorrelatedValuePropagation] Remove redundant if statement in processSelect() This statement if (ReplaceWith == S) ReplaceWith = UndefValue::get(S->getType()); is introduced in https://reviews.llvm.org/rG35609d97ae89b8e13f40f4e6b9b056954f8baa83 to fix a case where unreachable code can cause select instruction simplification to fail. In https://reviews.llvm.org/rGd10480657527ffb44ea213460fb3676a6b1300aa, we begin to perform a depth-first walk of basic blocks. This means we will not visit unreachable blocks. So we do not need this special check any more. Differential Revision: https://reviews.llvm.org/D76753	2020-03-28 18:01:17 +01:00
Nikita Popov	9d9633fb70	[CVP] Simplify cmp of local phi node CVP currently does not simplify cmps with instructions in the same block, because LVI getPredicateAt() currently does not provide much useful information for that case (D69686 would change that, but is stuck.) However, if the instruction is a Phi node, then LVI can compute the result of the predicate by threading it into the predecessor blocks, which allows it to simplify some conditions that nothing else can handle. Relevant code: `6d6a4590c5/llvm/lib/Analysis/LazyValueInfo.cpp (L1904-L1927)` Differential Revision: https://reviews.llvm.org/D72169	2020-02-26 20:36:41 +01:00
Reid Kleckner	05da2fe521	Sink all InitializePasses.h includes This file lists every pass in LLVM, and is included by Pass.h, which is very popular. Every time we add, remove, or rename a pass in LLVM, it caused lots of recompilation. I found this fact by looking at this table, which is sorted by the number of times a file was changed over the last 100,000 git commits multiplied by the number of object files that depend on it in the current checkout: recompiles touches affected_files header 342380 95 3604 llvm/include/llvm/ADT/STLExtras.h 314730 234 1345 llvm/include/llvm/InitializePasses.h 307036 118 2602 llvm/include/llvm/ADT/APInt.h 213049 59 3611 llvm/include/llvm/Support/MathExtras.h 170422 47 3626 llvm/include/llvm/Support/Compiler.h 162225 45 3605 llvm/include/llvm/ADT/Optional.h 158319 63 2513 llvm/include/llvm/ADT/Triple.h 140322 39 3598 llvm/include/llvm/ADT/StringRef.h 137647 59 2333 llvm/include/llvm/Support/Error.h 131619 73 1803 llvm/include/llvm/Support/FileSystem.h Before this change, touching InitializePasses.h would cause 1345 files to recompile. After this change, touching it only causes 550 compiles in an incremental rebuild. Reviewers: bkramer, asbirlea, bollu, jdoerfert Differential Revision: https://reviews.llvm.org/D70211	2019-11-13 16:34:37 -08:00
Sanjay Patel	f2e93d10fe	[CVP] prevent propagating poison when substituting edge values into a phi (PR43802) This phi simplification transform was added with: D45448 However as shown in PR43802: https://bugs.llvm.org/show_bug.cgi?id=43802 ...we must be careful not to propagate poison when we do the substitution. There might be some more complicated analysis possible to retain the overflow flag, but it should always be safe and easy to drop flags (we have similar behavior in instcombine and other passes). Differential Revision: https://reviews.llvm.org/D69442	2019-10-28 08:58:28 -04:00
Roman Lebedev	7cd7f4a83b	[CVP] No-wrap deduction for `shl` Summary: This is the last `OverflowingBinaryOperator` for which we don't deduce flags. D69217 taught `ConstantRange::makeGuaranteedNoWrapRegion()` about it. The effect is better than of the `mul` patch (D69203): \| statistic \| old \| new \| delta \| % change \| \| correlated-value-propagation.NumAddNUW \| 7145 \| 7144 \| -1 \| -0.0140% \| \| correlated-value-propagation.NumAddNW \| 12126 \| 12125 \| -1 \| -0.0082% \| \| correlated-value-propagation.NumAnd \| 443 \| 446 \| 3 \| 0.6772% \| \| correlated-value-propagation.NumNSW \| 5986 \| 7158 \| 1172 \| 19.5790% \| \| correlated-value-propagation.NumNUW \| 10512 \| 13304 \| 2792 \| 26.5601% \| \| correlated-value-propagation.NumNW \| 16498 \| 20462 \| 3964 \| 24.0272% \| \| correlated-value-propagation.NumShlNSW \| 0 \| 1172 \| 1172 \| \| \| correlated-value-propagation.NumShlNUW \| 0 \| 2793 \| 2793 \| \| \| correlated-value-propagation.NumShlNW \| 0 \| 3965 \| 3965 \| \| \| instcount.NumAShrInst \| 13824 \| 13790 \| -34 \| -0.2459% \| \| instcount.NumAddInst \| 277584 \| 277586 \| 2 \| 0.0007% \| \| instcount.NumAndInst \| 66061 \| 66056 \| -5 \| -0.0076% \| \| instcount.NumBrInst \| 709153 \| 709147 \| -6 \| -0.0008% \| \| instcount.NumICmpInst \| 483709 \| 483708 \| -1 \| -0.0002% \| \| instcount.NumSExtInst \| 79497 \| 79496 \| -1 \| -0.0013% \| \| instcount.NumShlInst \| 40691 \| 40654 \| -37 \| -0.0909% \| \| instcount.NumSubInst \| 61997 \| 61996 \| -1 \| -0.0016% \| \| instcount.NumZExtInst \| 68208 \| 68211 \| 3 \| 0.0044% \| \| instcount.TotalBlocks \| 843916 \| 843910 \| -6 \| -0.0007% \| \| instcount.TotalInsts \| 7387528 \| 7387448 \| -80 \| -0.0011% \| Reviewers: nikic, reames, sanjoy, timshen Reviewed By: nikic Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69277 llvm-svn: 375455	2019-10-21 21:31:19 +00:00
Roman Lebedev	2927716277	[CVP] Deduce no-wrap on `mul` Summary: `ConstantRange::makeGuaranteedNoWrapRegion()` knows how to deal with `mul` since rL335646, there is exhaustive test coverage. This is already used by CVP's `processOverflowIntrinsic()`, and by SCEV's `StrengthenNoWrapFlags()` That being said, currently, this doesn't help much in the end: \| statistic \| old \| new \| delta \| percentage \| \| correlated-value-propagation.NumMulNSW \| 4 \| 275 \| 271 \| 6775.00% \| \| correlated-value-propagation.NumMulNUW \| 4 \| 1323 \| 1319 \| 32975.00% \| \| correlated-value-propagation.NumMulNW \| 8 \| 1598 \| 1590 \| 19875.00% \| \| correlated-value-propagation.NumNSW \| 5715 \| 5986 \| 271 \| 4.74% \| \| correlated-value-propagation.NumNUW \| 9193 \| 10512 \| 1319 \| 14.35% \| \| correlated-value-propagation.NumNW \| 14908 \| 16498 \| 1590 \| 10.67% \| \| instcount.NumAddInst \| 275871 \| 275869 \| -2 \| 0.00% \| \| instcount.NumBrInst \| 708234 \| 708232 \| -2 \| 0.00% \| \| instcount.NumMulInst \| 43812 \| 43810 \| -2 \| 0.00% \| \| instcount.NumPHIInst \| 316786 \| 316784 \| -2 \| 0.00% \| \| instcount.NumTruncInst \| 62165 \| 62167 \| 2 \| 0.00% \| \| instcount.NumUDivInst \| 2528 \| 2526 \| -2 \| -0.08% \| \| instcount.TotalBlocks \| 842995 \| 842993 \| -2 \| 0.00% \| \| instcount.TotalInsts \| 7376486 \| 7376478 \| -8 \| 0.00% \| (^ test-suite plain, tests still pass) Reviewers: nikic, reames, luqmana, sanjoy, timshen Reviewed By: reames Subscribers: hiraditya, javed.absar, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69203 llvm-svn: 375396	2019-10-21 08:21:44 +00:00
Roman Lebedev	e695f4c851	[CVP] setDeducedOverflowingFlags(): actually inc per-opcode stats This is really embarrassing. Those are pointers, so that offsets the pointers, not the statistics pointed-by the pointer... llvm-svn: 375290	2019-10-18 21:19:26 +00:00
Roman Lebedev	284b6d7f4d	[CVP] After proving that @llvm.with.overflow()/@llvm.sat() don't overflow, also try to prove other no-wrap Summary: CVP, unlike InstCombine, does not run till exhaustion. It only does a single pass. When dealing with those special binops, if we prove that they can safely be demoted into their usual binop form, we do set the no-wrap we deduced. But when dealing with usual binops, we try to deduce both no-wraps. So if we convert e.g. @llvm.uadd.with.overflow() to `add nuw`, we won't attempt to check whether it can be `add nuw nsw`. This patch proposes to call `processBinOp()` on newly-created binop, which is identical to what we do for div/rem already. Reviewers: nikic, spatel, reames Reviewed By: nikic Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69183 llvm-svn: 375273	2019-10-18 19:32:47 +00:00
Roman Lebedev	fa0ac2558e	[NFC][CVP] Count all the no-wraps we proved Summary: It looks like this is the only missing statistic in the CVP pass. Since we prove NSW and NUW separately i'd think we should count them separately too. Reviewers: nikic, spatel, reames Reviewed By: spatel Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68740 llvm-svn: 375230	2019-10-18 13:20:16 +00:00
Philip Reames	2d5820cd72	[CVP] Remove a masking operation if range information implies it's a noop This is really a known bits style transformation, but known bits isn't context sensitive. The particular case which comes up happens to involve a range which allows range based reasoning to eliminate the mask pattern, so handle that case specifically in CVP. InstCombine likes to generate the mask-by-low-bits pattern when widening an arithmetic expression which includes a zext in the middle. Differential Revision: https://reviews.llvm.org/D68811 llvm-svn: 374506	2019-10-11 03:48:56 +00:00
Roman Lebedev	354ba6985c	[CVP] Replace SExt with ZExt if the input is known-non-negative Summary: zero-extension is far more friendly for further analysis. While this doesn't directly help with the shift-by-signext problem, this is not unrelated. This has the following effect on test-suite (numbers collected after the finish of middle-end module pass manager): \| Statistic \| old \| new \| delta \| percent change \| \| correlated-value-propagation.NumSExt \| 0 \| 6026 \| 6026 \| +100.00% \| \| instcount.NumAddInst \| 272860 \| 271283 \| -1577 \| -0.58% \| \| instcount.NumAllocaInst \| 27227 \| 27226 \| -1 \| 0.00% \| \| instcount.NumAndInst \| 63502 \| 63320 \| -182 \| -0.29% \| \| instcount.NumAShrInst \| 13498 \| 13407 \| -91 \| -0.67% \| \| instcount.NumAtomicCmpXchgInst \| 1159 \| 1159 \| 0 \| 0.00% \| \| instcount.NumAtomicRMWInst \| 5036 \| 5036 \| 0 \| 0.00% \| \| instcount.NumBitCastInst \| 672482 \| 672353 \| -129 \| -0.02% \| \| instcount.NumBrInst \| 702768 \| 702195 \| -573 \| -0.08% \| \| instcount.NumCallInst \| 518285 \| 518205 \| -80 \| -0.02% \| \| instcount.NumExtractElementInst \| 18481 \| 18482 \| 1 \| 0.01% \| \| instcount.NumExtractValueInst \| 18290 \| 18288 \| -2 \| -0.01% \| \| instcount.NumFAddInst \| 139035 \| 138963 \| -72 \| -0.05% \| \| instcount.NumFCmpInst \| 10358 \| 10348 \| -10 \| -0.10% \| \| instcount.NumFDivInst \| 30310 \| 30302 \| -8 \| -0.03% \| \| instcount.NumFenceInst \| 387 \| 387 \| 0 \| 0.00% \| \| instcount.NumFMulInst \| 93873 \| 93806 \| -67 \| -0.07% \| \| instcount.NumFPExtInst \| 7148 \| 7144 \| -4 \| -0.06% \| \| instcount.NumFPToSIInst \| 2823 \| 2838 \| 15 \| 0.53% \| \| instcount.NumFPToUIInst \| 1251 \| 1251 \| 0 \| 0.00% \| \| instcount.NumFPTruncInst \| 2195 \| 2191 \| -4 \| -0.18% \| \| instcount.NumFSubInst \| 92109 \| 92103 \| -6 \| -0.01% \| \| instcount.NumGetElementPtrInst \| 1221423 \| 1219157 \| -2266 \| -0.19% \| \| instcount.NumICmpInst \| 479140 \| 478929 \| -211 \| -0.04% \| \| instcount.NumIndirectBrInst \| 2 \| 2 \| 0 \| 0.00% \| \| instcount.NumInsertElementInst \| 66089 \| 66094 \| 5 \| 0.01% \| \| instcount.NumInsertValueInst \| 2032 \| 2030 \| -2 \| -0.10% \| \| instcount.NumIntToPtrInst \| 19641 \| 19641 \| 0 \| 0.00% \| \| instcount.NumInvokeInst \| 21789 \| 21788 \| -1 \| 0.00% \| \| instcount.NumLandingPadInst \| 12051 \| 12051 \| 0 \| 0.00% \| \| instcount.NumLoadInst \| 880079 \| 878673 \| -1406 \| -0.16% \| \| instcount.NumLShrInst \| 25919 \| 25921 \| 2 \| 0.01% \| \| instcount.NumMulInst \| 42416 \| 42417 \| 1 \| 0.00% \| \| instcount.NumOrInst \| 100826 \| 100576 \| -250 \| -0.25% \| \| instcount.NumPHIInst \| 315118 \| 314092 \| -1026 \| -0.33% \| \| instcount.NumPtrToIntInst \| 15933 \| 15939 \| 6 \| 0.04% \| \| instcount.NumResumeInst \| 2156 \| 2156 \| 0 \| 0.00% \| \| instcount.NumRetInst \| 84485 \| 84484 \| -1 \| 0.00% \| \| instcount.NumSDivInst \| 8599 \| 8597 \| -2 \| -0.02% \| \| instcount.NumSelectInst \| 45577 \| 45913 \| 336 \| 0.74% \| \| instcount.NumSExtInst \| 84026 \| 78278 \| -5748 \| -6.84% \| \| instcount.NumShlInst \| 39796 \| 39726 \| -70 \| -0.18% \| \| instcount.NumShuffleVectorInst \| 100272 \| 100292 \| 20 \| 0.02% \| \| instcount.NumSIToFPInst \| 29131 \| 29113 \| -18 \| -0.06% \| \| instcount.NumSRemInst \| 1543 \| 1543 \| 0 \| 0.00% \| \| instcount.NumStoreInst \| 805394 \| 804351 \| -1043 \| -0.13% \| \| instcount.NumSubInst \| 61337 \| 61414 \| 77 \| 0.13% \| \| instcount.NumSwitchInst \| 8527 \| 8524 \| -3 \| -0.04% \| \| instcount.NumTruncInst \| 60523 \| 60484 \| -39 \| -0.06% \| \| instcount.NumUDivInst \| 2381 \| 2381 \| 0 \| 0.00% \| \| instcount.NumUIToFPInst \| 5549 \| 5549 \| 0 \| 0.00% \| \| instcount.NumUnreachableInst \| 9855 \| 9855 \| 0 \| 0.00% \| \| instcount.NumURemInst \| 1305 \| 1305 \| 0 \| 0.00% \| \| instcount.NumXorInst \| 10230 \| 10081 \| -149 \| -1.46% \| \| instcount.NumZExtInst \| 60353 \| 66840 \| 6487 \| 10.75% \| \| instcount.TotalBlocks \| 829582 \| 829004 \| -578 \| -0.07% \| \| instcount.TotalFuncs \| 83818 \| 83817 \| -1 \| 0.00% \| \| instcount.TotalInsts \| 7316574 \| 7308483 \| -8091 \| -0.11% \| TLDR: we produce 0.11% fewer instructions, 6.84% fewer `sext`, and 10.75% more `zext`. To be noted, clearly, not all new `zext`'s are produced by this fold. (And now i guess it might have been interesting to measure this for D68103 :S) Reviewers: nikic, spatel, reames, dberlin Reviewed By: nikic Subscribers: hiraditya, jfb, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68654 llvm-svn: 374112	2019-10-08 20:29:48 +00:00
Nikita Popov	b9e668f2e7	[CVP] Generate simpler code for elided with.overflow intrinsics Use a { iN undef, i1 false } struct as the base, and only insert the first operand, instead of using { iN undef, i1 undef } as the base and inserting both. This is the same as what we do in InstCombine. Differential Revision: https://reviews.llvm.org/D67034 llvm-svn: 370573	2019-08-31 09:58:37 +00:00
David Bolvansky	d2904ccf88	Let CorrelatedValuePropagation preserve LazyValueInfo Summary: This patch makes CorrelatedValuePropagation preserve LazyValueInfo by adding LazyValueInfo::eraseValue & calling it whenever an instruction is erased. Passes `make check` , test-suite, and SPECrate 2017. Patch by aqjune (Juneyoung Lee) Reviewers: reames, mzolotukhin Reviewed By: reames Subscribers: xbolva00, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59349 llvm-svn: 366942	2019-07-24 20:27:32 +00:00
Rui Ueyama	49a3ad21d6	Fix parameter name comments using clang-tidy. NFC. This patch applies clang-tidy's bugprone-argument-comment tool to LLVM, clang and lld source trees. Here is how I created this patch: $ git clone https://github.com/llvm/llvm-project.git $ cd llvm-project $ mkdir build $ cd build $ cmake -GNinja -DCMAKE_BUILD_TYPE=Debug \ -DLLVM_ENABLE_PROJECTS='clang;lld;clang-tools-extra' \ -DCMAKE_EXPORT_COMPILE_COMMANDS=On -DLLVM_ENABLE_LLD=On \ -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ ../llvm $ ninja $ parallel clang-tidy -checks='-,bugprone-argument-comment' \ -config='{CheckOptions: [{key: StrictMode, value: 1}]}' -fix \ ::: ../llvm/lib//.{cpp,h} ../clang/lib/*/.{cpp,h} ../lld/*/.{cpp,h} llvm-svn: 366177	2019-07-16 04:46:31 +00:00
Nikita Popov	f1ffc4305d	[CVP] Reenable nowrap flag inference Inference of nowrap flags in CVP has been disabled, because it triggered a bug in LFTR (https://bugs.llvm.org/show_bug.cgi?id=31181). This issue has been fixed in D60935, so we should be able to reenable nowrap flag inference now. Differential Revision: https://reviews.llvm.org/D62776 llvm-svn: 364228	2019-06-24 20:13:13 +00:00
Yevgeny Rouban	a3e16719c4	Resubmit "[CorrelatedValuePropagation] Fix prof branch_weights metadata handling for SwitchInst" This reverts commit `5b32f60ec3`. The fix is in commit `4f9e68148b`. This patch fixes the CorrelatedValuePropagation pass to keep prof branch_weights metadata of SwitchInst consistent. It makes use of SwitchInstProfUpdateWrapper. New tests are added. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D62126 llvm-svn: 362583	2019-06-05 05:46:40 +00:00
Nikita Popov	7bafae55c0	Reapply [CVP] Simplify non-overflowing saturating add/sub If we can determine that a saturating add/sub will not overflow based on range analysis, convert it into a simple binary operation. This is a sibling transform to the existing with.overflow handling. Reapplying this with an additional check that the saturating intrinsic has integer type, as LVI currently does not support vector types. Differential Revision: https://reviews.llvm.org/D62703 llvm-svn: 362263	2019-05-31 20:48:26 +00:00
Nikita Popov	23a02f6a5f	[CVP] Fix assertion failure on vector with.overflow Noticed on D62703. LVI only handles plain integers, not vectors of integers. This was previously not an issue, because vector support for with.overflow is only a relatively recent addition. llvm-svn: 362261	2019-05-31 20:42:07 +00:00
Nikita Popov	ccb63e0bfe	Revert "[CVP] Simplify non-overflowing saturating add/sub" This reverts commit `1e692d1777`. Causes assertion failure in builtins-wasm.c clang test. llvm-svn: 362254	2019-05-31 19:04:47 +00:00
Nikita Popov	1e692d1777	[CVP] Simplify non-overflowing saturating add/sub If we can determine that a saturating add/sub will not overflow based on range analysis, convert it into a simple binary operation. This is a sibling transform to the existing with.overflow handling. Differential Revision: https://reviews.llvm.org/D62703 llvm-svn: 362242	2019-05-31 16:46:05 +00:00
Nikita Popov	e906f2a370	[CVP] Generalize willNotOverflow(); NFC Change argument from WithOverflowInst to BinaryOpIntrinsic, so this function can also be used for saturating math intrinsics. llvm-svn: 362152	2019-05-30 21:03:10 +00:00
Nikita Popov	5b32f60ec3	Revert "[CorrelatedValuePropagation] Fix prof branch_weights metadata handling for SwitchInst" This reverts commit `53f2f32865`. As reported on D62126, this causes assertion failures if the switch has incorrect branch_weights metadata, which may happen as a result of other transforms not handling it correctly yet. llvm-svn: 361881	2019-05-28 21:28:24 +00:00
Yevgeny Rouban	53f2f32865	[CorrelatedValuePropagation] Fix prof branch_weights metadata handling for SwitchInst This patch fixes the CorrelatedValuePropagation pass to keep prof branch_weights metadata of SwitchInst consistent. It makes use of SwitchInstProfUpdateWrapper. New tests are added. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D62126 llvm-svn: 361808	2019-05-28 11:33:50 +00:00
Nikita Popov	8b1fa07639	[CVP] Remove unnecessary checks for empty GNWR; NFC The guaranteed no-wrap region is never empty; it always contains at least zero, so these optimizations don't ever apply. To make this more obviously true, replace the conservative return in makeGNWR with an assertion. llvm-svn: 361698	2019-05-25 14:11:55 +00:00

170 Commits