llvm-project

Commit Graph

Author	SHA1	Message	Date
Sanjay Patel	0845ac7331	[InstCombine] add another test for gep inbounds; NFC llvm-svn: 374190	2019-10-09 17:52:26 +00:00
Roman Lebedev	7cdeac43e5	[InstCombine] Fold conditional sign-extend of high-bit-extract into high-bit-extract-with-signext (PR42389) This can come up in Bit Stream abstractions. The pattern looks big/scary, but it can't be simplified any further. It only is so simple because a number of my preparatory folds had happened already (shift amount reassociation / shift amount reassociation in bit test, sign bit test detection). Highlights: * There are two main flavors: https://rise4fun.com/Alive/zWi The difference is add vs. sub, and left-shift of -1 vs. 1 * Since we only change the shift opcode, we can preserve the exact-ness: https://rise4fun.com/Alive/4u4 * There can be truncation after high-bit-extraction: https://rise4fun.com/Alive/slHc1 (the main pattern i'm after!) Which means that we need to ignore zext of shift amounts and of NBits. * The sign-extending magic can be extended itself (in add pattern via sext, in sub pattern via zext. not the other way around!) https://rise4fun.com/Alive/NhG (or those sext/zext can be sinked into `select`!) Which again means we should pay attention when matching NBits. * We can have both truncation of extraction and widening of magic: https://rise4fun.com/Alive/XTw In other words, i don't believe we need to have any checks on bitwidths of any of these constructs. This is worsened in general by the fact that we may have `sext` instead of `zext` for shift amounts, and we don't yet canonicalize to `zext`, although we should. I have not done anything about that here. Also, we really should have something to weed out `sub` like these, by folding them into `add` variant. https://bugs.llvm.org/show_bug.cgi?id=42389 llvm-svn: 373964	2019-10-07 20:53:27 +00:00
Roman Lebedev	3da71714cb	[InstCombine][NFC] Tests for "conditional sign-extend of high-bit-extract" pattern (PR42389) https://bugs.llvm.org/show_bug.cgi?id=42389 llvm-svn: 373963	2019-10-07 20:53:16 +00:00
Roman Lebedev	c3b394ffba	[InstCombine] dropRedundantMaskingOfLeftShiftInput(): propagate undef shift amounts Summary: When we do `ConstantExpr::getZExt()`, that "extends" `undef` to `0`, which means that for patterns a/b we'd assume that we must not produce any bits for that channel, while in reality we simply didn't care about that channel - i.e. we don't need to mask it. Reviewers: spatel Reviewed By: spatel Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68239 llvm-svn: 373960	2019-10-07 20:52:52 +00:00
Sanjay Patel	aab8b3ab9c	[InstCombine] fold fneg disguised as select+fmul (PR43497) Extends rL373230 and solves the motivating bug (although in a narrow way): https://bugs.llvm.org/show_bug.cgi?id=43497 llvm-svn: 373851	2019-10-06 14:15:48 +00:00
Sanjay Patel	61c22a83de	[InstCombine] add fast-math-flags for better test coverage; NFC llvm-svn: 373848	2019-10-06 13:19:05 +00:00
Sanjay Patel	c38881a6b7	[InstCombine] don't assume 'inbounds' for bitcast pointer to GEP transform (PR43501) https://bugs.llvm.org/show_bug.cgi?id=43501 We can't declare a GEP 'inbounds' in general. But we may salvage that information if we have known dereferenceable bytes on the source pointer. Differential Revision: https://reviews.llvm.org/D68244 llvm-svn: 373847	2019-10-06 13:08:08 +00:00
Roman Lebedev	fb5af8b9b9	[InstCombine] Fold 'icmp eq/ne (?trunc (lshr/ashr %x, bitwidth(x)-1)), 0' -> 'icmp sge/slt %x, 0' We do indeed already get it right in some cases, but only transitively, with one-use restrictions. Since we only need to produce a single comparison, it makes sense to match the pattern directly: https://rise4fun.com/Alive/kPg llvm-svn: 373802	2019-10-04 22:16:22 +00:00
Roman Lebedev	f304d4d185	[InstCombine] Right-shift shift amount reassociation with truncation (PR43564, PR42391) Initially (D65380) i believed that if we have rightshift-trunc-rightshift, we can't do any folding. But as it usually happens, i was wrong. https://rise4fun.com/Alive/GEw https://rise4fun.com/Alive/gN2O In https://bugs.llvm.org/show_bug.cgi?id=43564 we happen to have this very sequence, of two right shifts separated by trunc. And "just" so that happens, we apparently can fold the pattern if the total shift amount is either 0, or it's equal to the bitwidth of the innermost widest shift - i.e. if we are left with only the original sign bit. Which is exactly what is wanted there. llvm-svn: 373801	2019-10-04 22:16:11 +00:00
Roman Lebedev	ae738641d5	[NFC][InstCombine] Autogenerate shift.ll test llvm-svn: 373800	2019-10-04 22:15:57 +00:00
Roman Lebedev	007452532b	[NFC][InstCombine] Autogenerate icmp-shr-lt-gt.ll test llvm-svn: 373799	2019-10-04 22:15:49 +00:00
Roman Lebedev	3c56cc920f	[NFC][InstCombine] Tests for bit test via highest sign-bit extract (w/ trunc) (PR43564) https://rise4fun.com/Alive/x5IS llvm-svn: 373798	2019-10-04 22:15:41 +00:00
Roman Lebedev	6a954748c8	[NFC][InstCombine] Tests for right-shift shift amount reassociation (w/ trunc) (PR43564, PR42391) https://rise4fun.com/Alive/GEw llvm-svn: 373797	2019-10-04 22:15:32 +00:00
Sanjay Patel	6e312388b6	[InstCombine] add tests for fneg disguised as fmul; NFC llvm-svn: 373788	2019-10-04 20:54:14 +00:00
Kevin P. Neal	68b8052121	[FPEnv] Strict FP tests should use the requisite function attributes. A set of function attributes is required in any function that uses constrained floating point intrinsics. None of our tests use these attributes. This patch fixes this. These tests have been tested against the IR verifier changes in D68233. Reviewed by: andrew.w.kaylor, cameron.mcinally, uweigand Approved by: andrew.w.kaylor Differential Revision: https://reviews.llvm.org/D67925 llvm-svn: 373761	2019-10-04 17:03:46 +00:00
Roman Lebedev	c780645736	[NFC][InstCombine] Some tests for sub-of-negatible pattern As we have previously estabilished, `sub` is an outcast, and should be considered non-canonical iff it can be converted to `add`. It can be converted to `add` if it's second operand can be negated. So far we mostly only do that for constants and negation itself, but we should be more through. llvm-svn: 373597	2019-10-03 13:36:00 +00:00
Roman Lebedev	ae3315af07	[InstCombine] Bypass high bit extract before variable sign-extension (PR43523) https://rise4fun.com/Alive/8BY - valid for lshr+trunc+variable sext https://rise4fun.com/Alive/7jk - the variable sext can be redundant https://rise4fun.com/Alive/Qslu - 'exact'-ness of first shift can be preserver https://rise4fun.com/Alive/IF63 - without trunc we could view this as more general "drop redundant mask before right-shift", but let's handle it here for now https://rise4fun.com/Alive/iip - likewise, without trunc, variable sext can be redundant. There's more patterns for sure - e.g. we can have 'lshr' as the final shift, but that might be best handled by some more generic transform, e.g. "drop redundant masking before right-shift" (PR42456) I'm singling-out this sext patch because you can only extract high bits with `shr` (unlike abstract bit masking), and i know* this fold is wanted by existing code. I don't believe there is much to review here, so i'm gonna opt into post-review mode here. https://bugs.llvm.org/show_bug.cgi?id=43523 llvm-svn: 373542	2019-10-02 23:02:12 +00:00
Roman Lebedev	29339149c3	[NFC][InstCombine] Add tests for 'variable sext of variable high bit extract' pattern (PR43523) https://bugs.llvm.org/show_bug.cgi?id=43523 llvm-svn: 373541	2019-10-02 23:01:58 +00:00
David Bolvansky	6b45029676	[InstCombine] Transform bcopy to memmove bcopy is still widely used mainly for network apps. Sadly, LLVM has no optimizations for bcopy, but there are some for memmove. Since bcopy == memmove, it is profitable to transform bcopy to memmove and use current optimizations for memmove for free here. llvm-svn: 373537	2019-10-02 22:49:20 +00:00
Florian Hahn	f2ffa7a1c0	[InstCombine] Precommit tests for D68265 llvm-svn: 373458	2019-10-02 12:32:37 +00:00
Roman Lebedev	053014f8f9	[InstCombine] Deal with -(trunc(X >>u 63)) -> trunc(X >>s 63) Identical to it's trunc-less variant, just pretent-to hoist trunc, and everything else still holds: https://rise4fun.com/Alive/JRU llvm-svn: 373364	2019-10-01 17:50:20 +00:00
Roman Lebedev	65144149d0	[InstCombine] Preserve 'exact' in -(X >>u 31) -> (X >>s 31) fold https://rise4fun.com/Alive/yR4 llvm-svn: 373363	2019-10-01 17:50:09 +00:00
Roman Lebedev	f273fc793a	[NFC][InstCombine] (Better) tests for sign-bit-smearing pattern https://rise4fun.com/Alive/JRU https://rise4fun.com/Alive/yR4 <- we can preserve 'exact' llvm-svn: 373362	2019-10-01 17:49:58 +00:00
David Bolvansky	4037582d6b	Revert [InstCombine] sprintf(dest, "%s", str) -> memccpy(dest, str, 0, MAX) Seems to be slower than memcpy + strlen. llvm-svn: 373335	2019-10-01 13:19:04 +00:00
David Bolvansky	8fc6a1bf56	[InstCombine] sprintf(dest, "%s", str) -> memccpy(dest, str, 0, MAX) llvm-svn: 373333	2019-10-01 13:03:10 +00:00
Evandro Menezes	110b1138ba	[InstCombine] Expand the simplification of log() Expand the simplification of special cases of `log()` to include `log2()` and `log10()` as well as intrinsics and more types. Differential revision: https://reviews.llvm.org/D67199 llvm-svn: 373261	2019-09-30 20:52:21 +00:00
Roman Lebedev	0205be8f12	[NFC][InstCombine] Redundant-left-shift-input-masking: add some more undef tests llvm-svn: 373248	2019-09-30 19:15:51 +00:00
Sanjay Patel	712b7c2463	[InstCombine] fold negate disguised as select+mul Name: negate if true %sel = select i1 %cond, i32 -1, i32 1 %r = mul i32 %sel, %x => %m = sub i32 0, %x %r = select i1 %cond, i32 %m, i32 %x Name: negate if false %sel = select i1 %cond, i32 1, i32 -1 %r = mul i32 %sel, %x => %m = sub i32 0, %x %r = select i1 %cond, i32 %x, i32 %m https://rise4fun.com/Alive/Nlh llvm-svn: 373230	2019-09-30 17:02:26 +00:00
Sanjay Patel	8913882fa2	[InstCombine] add tests for negate disguised as mul; NFC llvm-svn: 373222	2019-09-30 15:43:27 +00:00
Roman Lebedev	269f1bea0d	[InstCombine] Simplify shift-by-sext to shift-by-zext Summary: This is valid for any `sext` bitwidth pair: ``` Processing /tmp/opt.ll.. ---------------------------------------- %signed = sext %y %r = shl %x, %signed ret %r => %unsigned = zext %y %r = shl %x, %unsigned ret %r %signed = sext %y Done: 2016 Optimization is correct! ``` (This isn't so for funnel shifts, there it's illegal for e.g. i6->i7.) Main motivation is the C++ semantics: ``` int shl(int a, char b) { return a << b; } ``` ends as ``` %3 = sext i8 %1 to i32 %4 = shl i32 %0, %3 ``` https://godbolt.org/z/0jgqUq which is, as this shows, too pessimistic. There is another problem here - we can only do the fold if sext is one-use. But we can trivially have cases where several shifts have the same sext shift amount. This should be resolved, later. Reviewers: spatel, nikic, RKSimon Reviewed By: spatel Subscribers: efriedma, hiraditya, nlopes, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68103 llvm-svn: 373106	2019-09-27 18:12:15 +00:00
Roman Lebedev	0956480459	[NFC][InstCombine] Revisit shift-by-signext tests llvm-svn: 373055	2019-09-27 09:09:15 +00:00
Roman Lebedev	86b40b0bbf	[InstCombine][NFC] Add tests for shift-by-signext llvm-svn: 373013	2019-09-26 20:49:30 +00:00
Roman Lebedev	d1ef2e48fb	[InstCombine][NFC] Regenerate load-cmp.ll test llvm-svn: 373012	2019-09-26 20:49:21 +00:00
David Bolvansky	f1a5a93157	[NFC] Precommit tests for D68089 llvm-svn: 373006	2019-09-26 19:01:18 +00:00
Craig Topper	46721bb7f5	[InstCombine] Use m_Zero instead of isNullValue() when checking if a GEP index is all zeroes to prevent an infinite loop. The test case here previously infinite looped. Only one element from the GEP is used so SimplifyDemandedVectorElts would replace the other lanes in each index with undef leading to the first index being <0, undef, undef, undef>. But there's a GEP transform that tries to replace an index into a 0 sized type with a zero index. But the zero index check only works on ConstantInt 0 or ConstantAggregateZero so it would turn the index back to zeroinitializer. Resulting in a loop. The fix is to use m_Zero() to allow a vector of zeroes and undefs. Differential Revision: https://reviews.llvm.org/D67977 llvm-svn: 373000	2019-09-26 17:20:50 +00:00
Bjorn Pettersson	163c54d288	[InstCombine] Don't assume CmpInst has been visited in getFlippedStrictnessPredicateAndConstant Summary: Removing an assumption (assert) that the CmpInst already has been simplified in getFlippedStrictnessPredicateAndConstant. Solution is to simply bail out instead of hitting the assertion. Instead we assume that any profitable rewrite will happen in the next iteration of InstCombine. The reason why we can't assume that the CmpInst already has been simplified is that the worklist does not guarantee such an ordering. Solves https://bugs.llvm.org/show_bug.cgi?id=43376 Reviewers: spatel, lebedev.ri Reviewed By: lebedev.ri Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68022 llvm-svn: 372972	2019-09-26 12:16:01 +00:00
Roman Lebedev	a2fa03af3a	[InstCombine] foldUnsignedUnderflowCheck(): one last pattern with 'sub' (PR43251) https://rise4fun.com/Alive/0j9 llvm-svn: 372930	2019-09-25 22:59:59 +00:00
Roman Lebedev	ca524621d1	[NFC][InstCombine] Tests for 'base u<= offset && (base - offset) != 0' pattern (PR43251) llvm-svn: 372929	2019-09-25 22:59:48 +00:00
Roman Lebedev	914a3d1cf2	[InstSimplify] Handle more 'A </>/>=/<= B &&/\|\| (A - B) !=/== 0' patterns (PR43251) https://rise4fun.com/Alive/sl9s https://rise4fun.com/Alive/2plN https://bugs.llvm.org/show_bug.cgi?id=43251 llvm-svn: 372928	2019-09-25 22:59:41 +00:00
Florian Hahn	d663efe23a	[InstSimplify] Match 1.0 and 0.0 for both operands in SimplifyFMAMul Because we do not constant fold multiplications in SimplifyFMAMul, we match 1.0 and 0.0 for both operands, as multiplying by them is guaranteed to produce an exact result (if it is allowed to do so). Note that it is not enough to just swap the operands to ensure a constant is on the RHS, as we want to also cover the case with 2 constants. Reviewers: lebedev.ri, spatel, reames, scanon Reviewed By: lebedev.ri, reames Differential Revision: https://reviews.llvm.org/D67553 llvm-svn: 372915	2019-09-25 19:33:26 +00:00
Roman Lebedev	23646952e2	[InstCombine] Fold (A - B) u>=/u< A --> B u>/u<= A iff B != 0 https://rise4fun.com/Alive/KtL This also shows that the fold added in D67412 / r372257 was too specific, and the new fold allows those test cases to be handled more generically, therefore i delete now-dead code. This is yet again motivated by D67122 "[UBSan][clang][compiler-rt] Applying non-zero offset to nullptr is undefined behaviour" llvm-svn: 372912	2019-09-25 19:06:40 +00:00
Roman Lebedev	dfda7d2d90	[NFC][InstCombine] Add tests for (X - Y) < X --> Y <= X iff Y != 0 https://rise4fun.com/Alive/KtL This should go to InstCombiner::foldICmpBinO(), next to "Convert sub-with-unsigned-overflow comparisons into a comparison of args." llvm-svn: 372911	2019-09-25 19:06:26 +00:00
Florian Hahn	f3ab99dcf8	[InstCombine] Limit FMul constant folding for fma simplifications. As @reames pointed out post-commit, rL371518 adds additional rounding in some cases, when doing constant folding of the multiplication. This breaks a guarantee llvm.fma makes and must be avoided. This patch reapplies rL371518, but splits off the simplifications not requiring rounding from SimplifFMulInst as SimplifyFMAFMul. Reviewers: spatel, lebedev.ri, reames, scanon Reviewed By: reames Differential Revision: https://reviews.llvm.org/D67434 llvm-svn: 372899	2019-09-25 17:03:20 +00:00
Philip Reames	d9629b88ff	[GCRelocate] Add a peephole to canonicalize base pointer relocation If we generate the gc.relocate, and then later prove two arguments to the statepoint are equivalent, we should canonicalize the gc.relocate to the form we would have produced if this had been known before rewriting. llvm-svn: 372771	2019-09-24 17:24:16 +00:00
Roman Lebedev	45fd1e9d50	[InstCombine] (a+b) < a && (a+b) != 0 -> (0-b) < a iff a/b != 0 (PR43259) Summary: This is again motivated by D67122 sanitizer check enhancement. That patch seemingly worsens `-fsanitize=pointer-overflow` overhead from 25% to 50%, which strongly implies missing folds. For ``` #include <cassert> char* test(char& base, signed long offset) { __builtin_assume(offset < 0); return &base + offset; } ``` We produce https://godbolt.org/z/r40U47 and again those two icmp's can be merged: ``` Name: 0 Pre: C != 0 %adjusted = add i8 %base, C %not_null = icmp ne i8 %adjusted, 0 %no_underflow = icmp ult i8 %adjusted, %base %r = and i1 %not_null, %no_underflow => %neg_offset = sub i8 0, C %r = icmp ugt i8 %base, %neg_offset ``` https://rise4fun.com/Alive/ALap https://rise4fun.com/Alive/slnN There are 3 other variants of this pattern, i believe they all will go into InstSimplify. https://bugs.llvm.org/show_bug.cgi?id=43259 Reviewers: spatel, xbolva00, nikic Reviewed By: spatel Subscribers: efriedma, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67849 llvm-svn: 372768	2019-09-24 16:10:50 +00:00
Roman Lebedev	5b881f356c	[InstCombine] (a+b) <= a && (a+b) != 0 -> (0-b) < a (PR43259) Summary: This is again motivated by D67122 sanitizer check enhancement. That patch seemingly worsens `-fsanitize=pointer-overflow` overhead from 25% to 50%, which strongly implies missing folds. This pattern isn't exactly what we get there (strict vs. non-strict predicate), but this pattern does not require known-bits analysis, so it is best to handle it first. ``` Name: 0 %adjusted = add i8 %base, %offset %not_null = icmp ne i8 %adjusted, 0 %no_underflow = icmp ule i8 %adjusted, %base %r = and i1 %not_null, %no_underflow => %neg_offset = sub i8 0, %offset %r = icmp ugt i8 %base, %neg_offset ``` https://rise4fun.com/Alive/knp There are 3 other variants of this pattern, they all will go into InstSimplify: https://rise4fun.com/Alive/bIDZ https://bugs.llvm.org/show_bug.cgi?id=43259 Reviewers: spatel, xbolva00, nikic Reviewed By: spatel Subscribers: hiraditya, majnemer, vsk, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67846 llvm-svn: 372767	2019-09-24 16:10:38 +00:00
Huihui Zhang	a4dd98f2e9	[InstCombine] Fold a shifty implementation of clamp-to-allones. Summary: Fold or(ashr(subNSW(Y, X), ScalarSizeInBits(Y)-1), X) into X s> Y ? -1 : X https://rise4fun.com/Alive/d8Ab clamp255 is a common operator in image processing, can be implemented in a shifty way "(255 - X) >> 31 \| X & 255". Fold shift into select enables more optimization, e.g., vmin generation for ARM target. Reviewers: lebedev.ri, efriedma, spatel, kparzysz, bcahoon Reviewed By: lebedev.ri Subscribers: kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67800 llvm-svn: 372678	2019-09-24 00:30:09 +00:00
Huihui Zhang	8952199715	[InstCombine] Fold a shifty implementation of clamp-to-zero. Summary: Fold and(ashr(subNSW(Y, X), ScalarSizeInBits(Y)-1), X) into X s> Y ? X : 0 https://rise4fun.com/Alive/lFH Fold shift into select enables more optimization, e.g., vmax generation for ARM target. Reviewers: lebedev.ri, efriedma, spatel, kparzysz, bcahoon Reviewed By: lebedev.ri Subscribers: xbolva00, andreadb, craig.topper, RKSimon, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67799 llvm-svn: 372676	2019-09-24 00:15:03 +00:00
Huihui Zhang	5b5f1c8efd	[NFC][InstCombine] Add tests for shifty implementation of clamping. Summary: Clamp negative to zero and clamp positive to allOnes are common operation in image saturation. Add tests for shifty implementation of clamping, as prepare work for folding: and(ashr(subNSW(Y, X), ScalarSizeInBits(Y)-1), X) --> X s> 0 ? X : 0; or(ashr(subNSW(Y, X), ScalarSizeInBits(Y)-1), X) --> X s> Y ? allOnes : X. Reviewers: lebedev.ri, efriedma, spatel, kparzysz, bcahoon Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67798 llvm-svn: 372671	2019-09-23 23:48:32 +00:00
David Bolvansky	48db0272d6	[InstCombine] Annotate strndup calls with dereferenceable_or_null "Implementations are free to malloc() a buffer containing either (size + 1) bytes or (strnlen(s, size) + 1) bytes. Applications should not assume that strndup() will allocate (size + 1) bytes when strlen(s) is smaller than size." llvm-svn: 372647	2019-09-23 19:55:45 +00:00
David Bolvansky	8d52016155	[SLC] Convert some strndup calls to strdup calls Summary: Motivation: - If we can fold it to strdup, we should (strndup does more things than strdup). - Annotation mechanism. (Works for strdup well). strdup and strndup are part of C 20 (currently posix fns), so we should optimize them. Reviewers: efriedma, jdoerfert Reviewed By: jdoerfert Subscribers: lebedev.ri, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67679 llvm-svn: 372636	2019-09-23 18:20:01 +00:00
Roman Lebedev	0a51e1f66d	[InstCombine] dropRedundantMaskingOfLeftShiftInput(): pat. c/d/e with mask (PR42563) Summary: If we have a pattern `(x & (-1 >> maskNbits)) << shiftNbits`, we already know (have a fold) that will drop the `& (-1 >> maskNbits)` mask iff `(shiftNbits-maskNbits) s>= 0` (i.e. `shiftNbits u>= maskNbits`). So even if `(shiftNbits-maskNbits) s< 0`, we can still fold, we will just need to apply a constant mask afterwards: ``` Name: c, normal+mask %t0 = lshr i32 -1, C1 %t1 = and i32 %t0, %x %r = shl i32 %t1, C2 => %n0 = shl i32 %x, C2 %n1 = i32 ((-(C2-C1))+32) %n2 = zext i32 %n1 to i64 %n3 = lshr i64 -1, %n2 %n4 = trunc i64 %n3 to i32 %r = and i32 %n0, %n4 ``` https://rise4fun.com/Alive/gslRa Naturally, old `%masked` will have to be one-use. This is not valid for pattern f - where "masking" is done via `ashr`. https://bugs.llvm.org/show_bug.cgi?id=42563 Reviewers: spatel, nikic, xbolva00 Reviewed By: spatel Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67725 llvm-svn: 372630	2019-09-23 17:04:28 +00:00
Roman Lebedev	b4a1d8a84c	[InstCombine] dropRedundantMaskingOfLeftShiftInput(): pat. a/b with mask (PR42563) Summary: And this is finally the interesting part of that fold! If we have a pattern `(x & (~(-1 << maskNbits))) << shiftNbits`, we already know (have a fold) that will drop the `& (~(-1 << maskNbits))` mask iff `(maskNbits+shiftNbits) u>= bitwidth(x)`. But that is actually ignorant, there's more general fold here: In this pattern, `(maskNbits+shiftNbits)` actually correlates with the number of low bits that will remain in the final value. So even if `(maskNbits+shiftNbits) u< bitwidth(x)`, we can still fold, we will just need to apply a constant mask afterwards: ``` Name: a, normal+mask %onebit = shl i32 -1, C1 %mask = xor i32 %onebit, -1 %masked = and i32 %mask, %x %r = shl i32 %masked, C2 => %n0 = shl i32 %x, C2 %n1 = add i32 C1, C2 %n2 = zext i32 %n1 to i64 %n3 = shl i64 -1, %n2 %n4 = xor i64 %n3, -1 %n5 = trunc i64 %n4 to i32 %r = and i32 %n0, %n5 ``` https://rise4fun.com/Alive/F5R Naturally, old `%masked` will have to be one-use. Similar fold exists for patterns c,d,e, will post patch later. https://bugs.llvm.org/show_bug.cgi?id=42563 Reviewers: spatel, nikic, xbolva00 Reviewed By: spatel Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67677 llvm-svn: 372629	2019-09-23 17:04:14 +00:00
Sanjay Patel	eb8d39e113	[InstCombine] allow icmp+binop folds before min/max bailout (PR43310) This has the potential to uncover missed analysis/folds as shown in the min/max code comment/test, but fewer restrictions on icmp folds should be better in general to solve cases like: https://bugs.llvm.org/show_bug.cgi?id=43310 llvm-svn: 372510	2019-09-22 14:31:53 +00:00
Sanjay Patel	d2a524288d	[InstCombine] add tests for icmp fold hindered by min/max; NFC llvm-svn: 372509	2019-09-22 14:23:22 +00:00
Roman Lebedev	081eebc58f	[NFC][InstCombine] Fixup newly-added tests llvm-svn: 372413	2019-09-20 17:43:46 +00:00
Roman Lebedev	d21087af95	[InstCombine] Tests for (a+b)<=a && (a+b)!=0 fold (PR43259) https://rise4fun.com/Alive/knp https://rise4fun.com/Alive/ALap llvm-svn: 372402	2019-09-20 15:06:47 +00:00
Roman Lebedev	7a67ed5795	[InstCombine] Simplify @llvm.usub.with.overflow+non-zero check (PR43251) Summary: This is again motivated by D67122 sanitizer check enhancement. That patch seemingly worsens `-fsanitize=pointer-overflow` overhead from 25% to 50%, which strongly implies missing folds. In this particular case, given ``` char* test(char& base, unsigned long offset) { return &base - offset; } ``` it will end up producing something like https://godbolt.org/z/luGEju which after optimizations reduces down to roughly ``` declare void @use64(i64) define i1 @test(i8* dereferenceable(1) %base, i64 %offset) { %base_int = ptrtoint i8* %base to i64 %adjusted = sub i64 %base_int, %offset call void @use64(i64 %adjusted) %not_null = icmp ne i64 %adjusted, 0 %no_underflow = icmp ule i64 %adjusted, %base_int %no_underflow_and_not_null = and i1 %not_null, %no_underflow ret i1 %no_underflow_and_not_null } ``` Without D67122 there was no `%not_null`, and in this particular case we can "get rid of it", by merging two checks: Here we are checking: `Base u>= Offset && (Base u- Offset) != 0`, but that is simply `Base u> Offset` Alive proofs: https://rise4fun.com/Alive/QOs The `@llvm.usub.with.overflow` pattern itself is not handled here because this is the main pattern, that we currently consider canonical. https://bugs.llvm.org/show_bug.cgi?id=43251 Reviewers: spatel, nikic, xbolva00, majnemer Reviewed By: xbolva00, majnemer Subscribers: vsk, majnemer, xbolva00, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67356 llvm-svn: 372341	2019-09-19 17:25:19 +00:00
Roman Lebedev	b646dd92c2	[InstCombine] foldUnsignedUnderflowCheck(): handle last few cases (PR43251) Summary: I don't have a direct motivational case for this, but it would be good to have this for completeness/symmetry. This pattern is basically the motivational pattern from https://bugs.llvm.org/show_bug.cgi?id=43251 but with different predicate that requires that the offset is non-zero. The completeness bit comes from the fact that a similar pattern (offset != zero) will be needed for https://bugs.llvm.org/show_bug.cgi?id=43259, so it'd seem to be good to not overlook very similar patterns.. Proofs: https://rise4fun.com/Alive/21b Also, there is something odd with `isKnownNonZero()`, if the non-zero knowledge was specified as an assumption, it didn't pick it up (PR43267) With this, i see no other missing folds for https://bugs.llvm.org/show_bug.cgi?id=43251 Reviewers: spatel, nikic, xbolva00 Reviewed By: spatel Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67412 llvm-svn: 372257	2019-09-18 20:10:07 +00:00
Roman Lebedev	8b719a3b8a	[NFC][InstCombine] More tests for PR42563 "Dropping pointless masking before left shift" For patterns c/d/e we too can deal with the pattern even if we can't just drop the mask, we can just apply it afterwars: https://rise4fun.com/Alive/gslRa llvm-svn: 372244	2019-09-18 18:38:32 +00:00
Sanjay Patel	d46bf63fbb	[SimplifyLibCalls] fix crash with empty function name (PR43347) ...and improve some variable names while here. https://bugs.llvm.org/show_bug.cgi?id=43347 llvm-svn: 372227	2019-09-18 14:33:40 +00:00
Roman Lebedev	bed6e08e23	[NFC][InstCombine] More tests for "Dropping pointless masking before left shift" (PR42563) While we already fold that pattern if the sum of shift amounts is not smaller than bitwidth, there's painfully obvious generalization: https://rise4fun.com/Alive/F5R I.e. the "sub of shift amounts" tells us how many bits will be left in the output. If it's less than bitwidth, we simply need to apply a mask, which is constant. llvm-svn: 372170	2019-09-17 19:32:11 +00:00
David Bolvansky	0c0de794f1	Reland "[SLC] Preserve attrs for strncpy(x, "", y) -> memset(align 1 x, '\0', y)" llvm-svn: 372142	2019-09-17 17:12:24 +00:00
Krasimir Georgiev	bdff164e0e	Revert "[SLC] Preserve attrs for strncpy(x, "", y) -> memset(align 1 x, '\0', y)" Summary: This reverts commit r372101. Causes ASAN build bot failures: http://lab.llvm.org:8011/builders/sanitizer-ppc64be-linux/builds/14176 From http://lab.llvm.org:8011/builders/sanitizer-ppc64be-linux/builds/14176/steps/64-bit%20check-asan/logs/stdio: ``` [ RUN ] AddressSanitizer.StrNCatOOBTest /home/buildbots/ppc64be-sanitizer/sanitizer-ppc64be/build/llvm-project/compiler-rt/lib/asan/tests/asan_str_test.cpp:462: Failure Death test: strncat(to - 1, from, 0) Result: failed to die. ``` Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67658 llvm-svn: 372125	2019-09-17 14:15:23 +00:00
David Bolvansky	ded48e93e6	[SLC] Preserve attrs for strncpy(x, "", y) -> memset(align 1 x, '\0', y) llvm-svn: 372101	2019-09-17 10:25:38 +00:00
David Bolvansky	be2487a2ba	[InstCombine] Annotate strdup with deref_or_null llvm-svn: 372098	2019-09-17 10:12:48 +00:00
David Bolvansky	e80fcf0340	[SimplifyLibCalls] Mark known arguments with nonnull Reviewers: efriedma, jdoerfert Reviewed By: jdoerfert Subscribers: ychen, rsmith, joerg, aaron.ballman, lebedev.ri, uenoku, jdoerfert, hfinkel, javed.absar, spatel, dmgreen, llvm-commits Differential Revision: https://reviews.llvm.org/D53342 llvm-svn: 372091	2019-09-17 09:32:52 +00:00
Sanjay Patel	3961a143e1	[InstCombine] remove unneeded one-use checks for icmp fold Related folds were added in: rL125734 ...the code comment about register pressure is discussed in more detail in: https://bugs.llvm.org/show_bug.cgi?id=2698 But 10 years later, perf testing bzip2 with this change now shows a slight (0.2% average) improvement on Haswell although that's probably within test noise. Given that this is IR canonicalization, we shouldn't be worried about register pressure though; the backend should be able to adjust for that as needed. This is part of solving PR43310 the theoretically right way: https://bugs.llvm.org/show_bug.cgi?id=43310 ...ie, if we don't cripple basic transforms, then we won't need to add special-case code to detect larger patterns. rL371940 and rL371981 are related patches in this series. llvm-svn: 372007	2019-09-16 16:15:25 +00:00
Sanjay Patel	4d9d0f9cf5	[InstCombine] move tests for icmp+add; NFC llvm-svn: 372004	2019-09-16 15:33:40 +00:00
Sanjay Patel	f201b1c918	[InstCombine] add/move tests for icmp with add operand; NFC llvm-svn: 371988	2019-09-16 14:05:19 +00:00
Sanjay Patel	c5cd808156	[InstCombine] remove unneeded one-use checks for icmp fold This fold and several others were added in: rL125734 <https://reviews.llvm.org/rL125734> ...with no explanation for the one-use checks other than the code comments about register pressure. Given that this is IR canonicalization, we shouldn't be worried about register pressure though; the backend should be able to adjust for that as needed. This is part of solving PR43310 the theoretically right way: https://bugs.llvm.org/show_bug.cgi?id=43310 ...ie, if we don't cripple basic transforms, then we won't need to add special-case code to detect larger patterns. rL371940 is a related patch in this series. llvm-svn: 371981	2019-09-16 12:54:34 +00:00
Sanjay Patel	14ce3fde04	[InstCombine] add icmp tests with extra uses; NFC llvm-svn: 371979	2019-09-16 12:19:18 +00:00
Sanjay Patel	3daf168fa9	[InstCombine] remove unneeded one-use checks for icmp fold This fold and several others were added in: rL125734 ...with no explanation for the one-use checks other than the code comments about register pressure. Given that this is IR canonicalization, we shouldn't be worried about register pressure though; the backend should be able to adjust for that as needed. There are similar checks as noted with the TODO comments. I'm hoping to remove those restrictions too, but if any of these does cause a regression, it should be easier to correct by making small, individual commits. This is part of solving PR43310 the theoretically right way: https://bugs.llvm.org/show_bug.cgi?id=43310 ...ie, if we don't cripple basic transforms, then we won't need to add special-case code to detect larger patterns. llvm-svn: 371940	2019-09-15 20:56:34 +00:00
Sanjay Patel	c77ad16f8e	[InstCombine] add icmp tests with extra uses; NFC llvm-svn: 371939	2019-09-15 20:13:27 +00:00
Evandro Menezes	08df6e64d5	[ConstantFolding] Expand folding of some library functions Expanding the folding of `nearbyint()`, `rint()` and `trunc()` to library functions, in addition to the current support for intrinsics. Differential revision: https://reviews.llvm.org/D67468 llvm-svn: 371774	2019-09-12 21:23:22 +00:00
Sanjay Patel	458c2759b1	[InstCombine] add tests for fptrunc; NFC llvm-svn: 371750	2019-09-12 18:00:11 +00:00
Sanjay Patel	62ad62fb98	[InstCombine] reduce test noise and regenerate CHECK lines; NFC llvm-svn: 371746	2019-09-12 17:07:01 +00:00
Roman Lebedev	80a8a85758	[InstCombine][InstSimplify] Move constant-folding tests in result-of-usub-is-non-zero-and-no-overflow.ll llvm-svn: 371737	2019-09-12 14:12:31 +00:00
Roman Lebedev	b3e0937f0a	[NFC][InstCombine][InstSimplify] Add test for "add-of-negative is non-zero and no overflow" (PR43259) https://rise4fun.com/Alive/ska https://rise4fun.com/Alive/9iX https://bugs.llvm.org/show_bug.cgi?id=43259 llvm-svn: 371736	2019-09-12 14:12:20 +00:00
Florian Hahn	51de22c8ee	Revert [InstCombine] Use SimplifyFMulInst to simplify multiply in fma. This introduces additional rounding error in some cases. See D67434. This reverts r371518 (git commit `18a1f0818b`) llvm-svn: 371634	2019-09-11 16:17:03 +00:00
Sanjay Patel	80bea345d1	[InstCombine] fold sign-bit compares of srem (srem X, pow2C) sgt/slt 0 can be reduced using bit hacks by masking off the sign bit and the module (low) bits: https://rise4fun.com/Alive/jSO A '2' divisor allows slightly more folding: https://rise4fun.com/Alive/tDBM Any chance to remove an 'srem' use is probably worthwhile, but this is limited to the one-use improvement case because doing more may expose other missing folds. That means it does nothing for PR21929 yet: https://bugs.llvm.org/show_bug.cgi?id=21929 Differential Revision: https://reviews.llvm.org/D67334 llvm-svn: 371610	2019-09-11 12:04:26 +00:00
David Bolvansky	af5ba2873f	[NFC] Updated objsize-64.ll test llvm-svn: 371604	2019-09-11 10:51:26 +00:00
David Bolvansky	57ebb50a0a	[NFC] Fixed test llvm-svn: 371603	2019-09-11 10:42:30 +00:00
David Bolvansky	4dae283cd3	[InstCombine] Fixed handling of isOpNewLike (PR11748) llvm-svn: 371602	2019-09-11 10:37:03 +00:00
Tim Renouf	c26b3940c3	[TLI][AMDGPU] AMDPAL does not have library functions Configure TLI to say that r600/amdgpu does not have any library functions, such that InstCombine does not do anything like turn sin/cos into the library function @tan with sufficient fast math flags. Differential Revision: https://reviews.llvm.org/D67406 Change-Id: I02f907d3e64832117ea9800e9f9285282856e5df llvm-svn: 371592	2019-09-11 07:26:39 +00:00
Alexey Lapshin	6b1c6c1287	[Debuginfo][Instcombiner] Do not clone dbg.declare. TryToSinkInstruction() has a bug: While updating debug info for sunk instruction, it could clone dbg.declare intrinsic. That is wrong. There could be only one dbg.declare. The fix is to not clone dbg.declare intrinsic and to update it`s arguments, to not to point to sunk instruction. Differential Revision: https://reviews.llvm.org/D67217 llvm-svn: 371587	2019-09-11 06:07:16 +00:00
Roman Lebedev	16f5605382	[NFC][InstCombine] rewrite test added in r371537 to use non-null pointer instead I only want to ensure that %offset is non-zero there, it doesn't matter how that info is conveyed. As filed in PR43267, the assumption way does not work. llvm-svn: 371550	2019-09-10 19:30:17 +00:00
Roman Lebedev	880657c97c	[NFC][InstCombine][InstSimplify] PR43251 - and some patterns with offset != 0 https://rise4fun.com/Alive/21b llvm-svn: 371537	2019-09-10 17:13:59 +00:00
Roman Lebedev	7dfd0fb7f1	[NFC][InstCombine] PR43251 - valid for other predicates too llvm-svn: 371519	2019-09-10 13:29:40 +00:00
Florian Hahn	18a1f0818b	[InstCombine] Use SimplifyFMulInst to simplify multiply in fma. This allows us to fold fma's that multiply with 0.0. Also, the multiply by 1.0 case is handled there as well. The fneg/fabs cases are not handled by SimplifyFMulInst, so we need to keep them. Reviewers: spatel, anemet, lebedev.ri Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D67351 llvm-svn: 371518	2019-09-10 13:10:28 +00:00
Florian Hahn	8886d0134e	[InstCombine] Precommit tests for D67351. llvm-svn: 371517	2019-09-10 13:05:34 +00:00
Roman Lebedev	59608c0049	[NFC][InstCombine] Fixup test i added in rL371352. llvm-svn: 371401	2019-09-09 14:27:39 +00:00
Roman Lebedev	139a9d6c0e	[InstCombine][NFC] Some tests for usub overflow+nonzero check improvement (PR43251) https://rise4fun.com/Alive/kHq https://bugs.llvm.org/show_bug.cgi?id=43251 llvm-svn: 371352	2019-09-08 21:30:34 +00:00
Sanjay Patel	354a46444c	[InstCombine] add tests for icmp with srem operand; NFC llvm-svn: 371348	2019-09-08 19:48:47 +00:00
Sanjay Patel	aff5bee35f	[InstCombine] fold extract+insert into identity shuffle This is similar to the existing fold for splats added with: rL365379 If we can adjust the shuffle mask to include another element in an identity mask (if it changes vector length, that's an extract/insert subvector operation in the backend), then that can eliminate extractelement/insertelement pairs in IR. All targets are expected to lower shuffles with identity masks efficiently. llvm-svn: 371340	2019-09-08 19:03:01 +00:00
JF Bastien	52614dfc7f	[InstCombine] pow(x, +/- 0.0) -> 1.0 Summary: This isn't an important optimization at all... We're already doing: pow(x, 0.0) -> 1.0 My patch merely teaches instcombine that -0.0 does the same. However, doing this fixes an AMAZING bug! Compile this program: extern "C" double pow(double, double); double boom(double base) { return pow(base, -0.0); } With: clang++ ~/Desktop/fast-math.cpp -ffast-math -O2 -S And clang will crash with a signal. Wow, fast math is so fast it ICEs the compiler! Arguably, the generated math is infinitely fast. What's actually happening is that we recurse infinitely in getPow. In debug we hit its assertion: assert(Exp != 0 && "Incorrect exponent 0 not handled"); We avoid this entire mess if we instead recognize that an exponent of positive and negative zero yield 1.0. A separate commit, r371221, fixed the same problem. This only contains the added tests. <rdar://problem/54598300> Reviewers: scanon Subscribers: hiraditya, jkorous, dexonsmith, ributzka, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67248 llvm-svn: 371224	2019-09-06 16:26:59 +00:00
Sanjay Patel	4f0e429acc	[SimplifyLibCalls] handle pow(x,-0.0) before it can assert (PR43233) https://bugs.llvm.org/show_bug.cgi?id=43233 llvm-svn: 371221	2019-09-06 16:10:18 +00:00
Matt Arsenault	524a9d5774	InstCombine: Fix crash on icmp of gep with addrspacecasted null llvm-svn: 371146	2019-09-05 23:39:21 +00:00
Roman Lebedev	071ce66729	[NFC][InstCombine] Overhaul 'unsigned add overflow' tests, ensure that all 3 patterns have full test coverage llvm-svn: 371108	2019-09-05 19:13:15 +00:00
Roman Lebedev	8360c42e25	[InstCombine] foldICmpBinOp(): consider inverted check in 'unsigned sub overflow' check A follow-up for r329011. This may be changed to produce @llvm.sub.with.overflow in a later patch, but for now just make things more consistent overall. A few observations stem from this: * There does not seem to be a similar one-instruction fold for uadd-overflow * I'm not sure we'll want to canonicalize `B u> A` as `usub.with.overflow`, so since the `icmp` here no longer refers to `sub`, reconstructing `usub.with.overflow` will be problematic, and will likely require standalone pass (similar to DivRemPairs). https://rise4fun.com/Alive/Zqs Name: (A - B) u> A --> B u> A %t0 = sub i8 %A, %B %r = icmp ugt i8 %t0, %A => %r = icmp ugt i8 %B, %A Name: (A - B) u<= A --> B u<= A %t0 = sub i8 %A, %B %r = icmp ule i8 %t0, %A => %r = icmp ule i8 %B, %A Name: C u< (C - D) --> C u< D %t0 = sub i8 %C, %D %r = icmp ult i8 %C, %t0 => %r = icmp ult i8 %C, %D Name: C u>= (C - D) --> C u>= D %t0 = sub i8 %C, %D %r = icmp uge i8 %C, %t0 => %r = icmp uge i8 %C, %D llvm-svn: 371101	2019-09-05 17:41:02 +00:00
Roman Lebedev	ecb7ea1ae7	[InstCombine] foldICmpBinOp(): consider inverted check in 'unsigned add overflow' check A follow-up for r342004. This will be changed to produce @llvm.add.with.overflow in a later patch, but for now just make things more consistent overall. https://rise4fun.com/Alive/qxE Name: (Op1 + X) u< Op1 --> ~Op1 u< X %t0 = add i8 %Op1, %X %r = icmp ult i8 %t0, %Op1 => %n = xor i8 %Op1, -1 %r = icmp ult i8 %n, %X Name: (Op1 + X) u>= Op1 --> ~Op1 u>= X %t0 = add i8 %Op1, %X %r = icmp uge i8 %t0, %Op1 => %n = xor i8 %Op1, -1 %r = icmp uge i8 %n, %X ;------------------------------------------------------------------------------- Name: Op0 u> (Op0 + X) --> X u> ~Op0 %t0 = add i8 %Op0, %X %r = icmp ugt i8 %Op0, %t0 => %n = xor i8 %Op0, -1 %r = icmp ugt i8 %X, %n Name: Op0 u<= (Op0 + X) --> X u<= ~Op0 %t0 = add i8 %Op0, %X %r = icmp ule i8 %Op0, %t0 => %n = xor i8 %Op0, -1 %r = icmp ule i8 %X, %n llvm-svn: 371100	2019-09-05 17:40:49 +00:00
Roman Lebedev	1d9e0dcc9d	[InstCombine][NFC] Tests for 'unsigned sub overflow' check ---------------------------------------- Name: unsigned sub, overflow, v0 %sub = sub i8 %x, %y %ov = icmp ugt i8 %sub, %x => %agg = usub_overflow i8 %x, %y %sub = extractvalue {i8, i1} %agg, 0 %ov = extractvalue {i8, i1} %agg, 1 Done: 1 Optimization is correct! ---------------------------------------- Name: unsigned sub, no overflow, v0 %sub = sub i8 %x, %y %ov = icmp ule i8 %sub, %x => %agg = usub_overflow i8 %x, %y %sub = extractvalue {i8, i1} %agg, 0 %not.ov = extractvalue {i8, i1} %agg, 1 %ov = xor %not.ov, -1 Done: 1 Optimization is correct! llvm-svn: 371099	2019-09-05 17:40:37 +00:00
Roman Lebedev	745046c23f	[InstCombine][NFC] Tests for 'unsigned add overflow' check ---------------------------------------- Name: unsigned add, overflow, v0 %add = add i8 %x, %y %ov = icmp ult i8 %add, %x => %agg = uadd_overflow i8 %x, %y %add = extractvalue {i8, i1} %agg, 0 %ov = extractvalue {i8, i1} %agg, 1 Done: 1 Optimization is correct! ---------------------------------------- Name: unsigned add, overflow, v1 %add = add i8 %x, %y %ov = icmp ult i8 %add, %y => %agg = uadd_overflow i8 %x, %y %add = extractvalue {i8, i1} %agg, 0 %ov = extractvalue {i8, i1} %agg, 1 Done: 1 Optimization is correct! ---------------------------------------- Name: unsigned add, no overflow, v0 %add = add i8 %x, %y %ov = icmp uge i8 %add, %x => %agg = uadd_overflow i8 %x, %y %add = extractvalue {i8, i1} %agg, 0 %not.ov = extractvalue {i8, i1} %agg, 1 %ov = xor %not.ov, -1 Done: 1 Optimization is correct! ---------------------------------------- Name: unsigned add, no overflow, v1 %add = add i8 %x, %y %ov = icmp uge i8 %add, %y => %agg = uadd_overflow i8 %x, %y %add = extractvalue {i8, i1} %agg, 0 %not.ov = extractvalue {i8, i1} %agg, 1 %ov = xor %not.ov, -1 Done: 1 Optimization is correct! llvm-svn: 371098	2019-09-05 17:40:28 +00:00
Evandro Menezes	bf78e39cbb	[InstCombine] Add more test cases (NFC) Add more test cases simplifying `log()`. llvm-svn: 370966	2019-09-04 20:01:09 +00:00
David Bolvansky	420cbb6190	[InstCombine] sub(xor(x, y), or(x, y)) -> neg(and(x, y)) Summary: ``` Name: sub(xor(x, y), or(x, y)) -> neg(and(x, y)) %or = or i32 %y, %x %xor = xor i32 %x, %y %sub = sub i32 %xor, %or => %sub1 = and i32 %x, %y %sub = sub i32 0, %sub1 Optimization: sub(xor(x, y), or(x, y)) -> neg(and(x, y)) Done: 1 Optimization is correct! ``` https://rise4fun.com/Alive/8OI Reviewers: lebedev.ri Reviewed By: lebedev.ri Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67188 llvm-svn: 370945	2019-09-04 18:03:21 +00:00
David Bolvansky	f6233d90f0	[NFC] Added tests for new fold llvm-svn: 370941	2019-09-04 17:37:06 +00:00
David Bolvansky	2ceb00db76	[NFC] Adjust test filename llvm-svn: 370939	2019-09-04 17:33:53 +00:00
David Bolvansky	0e07248704	[InstCombine] Fold sub (and A, B) (or A, B)) to neg (xor A, B) Summary: ``` Name: sub(and(x, y), or(x, y)) -> neg(xor(x, y)) %or = or i32 %y, %x %and = and i32 %x, %y %sub = sub i32 %and, %or => %sub1 = xor i32 %x, %y %sub = sub i32 0, %sub1 Optimization: sub(and(x, y), or(x, y)) -> neg(xor(x, y)) Done: 1 Optimization is correct! ``` https://rise4fun.com/Alive/VI6 Found by @lebedev.ri. Also author of the proof. Reviewers: lebedev.ri, spatel Reviewed By: lebedev.ri Subscribers: llvm-commits, lebedev.ri Tags: #llvm Differential Revision: https://reviews.llvm.org/D67155 llvm-svn: 370934	2019-09-04 17:30:53 +00:00
Alexey Lapshin	cbf1f3b771	[Debuginfo][SROA] Need to handle dbg.value in SROA pass. SROA pass processes debug info incorrecly if applied twice. Specifically, after SROA works first time, instcombine converts dbg.declare intrinsics into dbg.value. Inlining creates new opportunities for SROA, so it is called again. This time it does not handle correctly previously inserted dbg.value intrinsics. Differential Revision: https://reviews.llvm.org/D64595 llvm-svn: 370906	2019-09-04 14:19:49 +00:00
Sanjay Patel	791949afe5	[InstCombine] add tests for insert/extract with identity shuffles; NFC llvm-svn: 370901	2019-09-04 13:38:49 +00:00
David Bolvansky	b9e9478244	[NFC] Added a negative test for new fold llvm-svn: 370890	2019-09-04 12:46:25 +00:00
David Bolvansky	13dadedc29	[NFC] Fixed test llvm-svn: 370888	2019-09-04 12:43:14 +00:00
David Bolvansky	3747c48d64	[NFC] Adjust tests for new fold llvm-svn: 370886	2019-09-04 12:22:28 +00:00
David Bolvansky	163b05b45d	[NFC] Added tests for new fold llvm-svn: 370885	2019-09-04 12:18:53 +00:00
David Bolvansky	358b80b340	[InstCombine] Fold sub (or A, B) (and A, B) to (xor A, B) Summary: ``` Name: sub or and to xor %or = or i32 %y, %x %and = and i32 %x, %y %sub = sub i32 %or, %and => %sub = xor i32 %x, %y Optimization: sub or and to xor Done: 1 Optimization is correct! ``` https://rise4fun.com/Alive/eJu Reviewers: spatel, lebedev.ri Reviewed By: lebedev.ri Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67153 llvm-svn: 370883	2019-09-04 12:00:33 +00:00
David Bolvansky	54f3a651f3	[NFC] Added a new test for D67153 llvm-svn: 370881	2019-09-04 11:44:00 +00:00
David Bolvansky	75d734475a	[NFC] Added tests for 'SUB of OR and AND to XOR' fold llvm-svn: 370878	2019-09-04 11:17:08 +00:00
Sanjay Patel	561c39994b	[InstCombine] recognize bswap disguised as shufflevector bitcast <N x i8> (shuf X, undef, <N, N-1,...0>) to i{N8} --> bswap (bitcast X to i{N8}) In PR43146: https://bugs.llvm.org/show_bug.cgi?id=43146 ...we have a more complicated case where SLP is making a mess of bswap. This patch won't do anything for that currently, but we need to improve bswap recognition in instcombine, SLP, and/or a standalone pass to avoid that problem. This is limited using the data-layout so we don't try to do this transform with actual vector types. The backend does not appear to have folds to convert in either direction, so we don't want to mess up something that is actually better lowered as a shuffle. On x86, we're trading something like this: vmovd %edi, %xmm0 vpshufb LCPI0_0(%rip), %xmm0, %xmm0 ## xmm0 = xmm0[3,2,1,0,u,u,u,u,u,u,u,u,u,u,u,u] vmovd %xmm0, %eax For: movl %edi, %eax bswapl %eax Differential Revision: https://reviews.llvm.org/D66965 llvm-svn: 370659	2019-09-02 13:33:20 +00:00
David Bolvansky	ff0ad3c43d	[InstCombine] mempcpy(d,s,n) to memcpy(d,s,n) + n Summary: Back-end currently expands mempcpy, but middle-end should work with memcpy instead of mempcpy to enable more memcpy-optimization. GCC backend emits mempcpy, so LLVM backend could form it too, if we know mempcpy libcall is better than memcpy + n. https://godbolt.org/z/dOCG96 Reviewers: efriedma, spatel, craig.topper, RKSimon, jdoerfert Reviewed By: efriedma Subscribers: hjl.tools, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65737 llvm-svn: 370593	2019-08-31 18:19:05 +00:00
Piotr Sobczak	67b979466a	[InstCombine][AMDGPU] Simplify tbuffer loads Summary: Add missing tbuffer loads intrinsics in SimplifyDemandedVectorElts. Reviewers: arsenm, nhaehnle Reviewed By: arsenm Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66926 llvm-svn: 370475	2019-08-30 14:20:04 +00:00
Sanjay Patel	33541fafde	[InstCombine] add possible bswap as widening shuffle test; NFC Goes with the proposal in D66965. llvm-svn: 370407	2019-08-29 20:57:50 +00:00
Sanjay Patel	63411910a2	[InstCombine] add tests for bswap disguised as shuffle; NFC Somewhat motivating case In PR43146: https://bugs.llvm.org/show_bug.cgi?id=43146 But that's a lot more complicated. llvm-svn: 370381	2019-08-29 16:48:00 +00:00
Roman Lebedev	473a063a5e	[InstCombine] Fold '((%x * %y) u/ %x) != %y' to '@llvm.umul.with.overflow' + overflow bit extraction Summary: `((%x * %y) u/ %x) != %y` is one of (3?) common ways to check that some unsigned multiplication (will not) overflow. Currently, we don't catch it. We could: ``` $ /repositories/alive2/build-Clang-unknown/alive -root-only ~/llvm-patch1.ll Processing /home/lebedevri/llvm-patch1.ll.. ---------------------------------------- Name: no overflow %o0 = mul i4 %y, %x %o1 = udiv i4 %o0, %x %r = icmp ne i4 %o1, %y ret i1 %r => %n0 = umul_overflow i4 %x, %y %o0 = extractvalue {i4, i1} %n0, 0 %o1 = udiv %o0, %x %r = extractvalue {i4, i1} %n0, 1 ret %r Done: 1 Optimization is correct! ---------------------------------------- Name: no overflow %o0 = mul i4 %y, %x %o1 = udiv i4 %o0, %x %r = icmp eq i4 %o1, %y ret i1 %r => %n0 = umul_overflow i4 %x, %y %o0 = extractvalue {i4, i1} %n0, 0 %o1 = udiv %o0, %x %n1 = extractvalue {i4, i1} %n0, 1 %r = xor %n1, -1 ret i1 %r Done: 1 Optimization is correct! ``` Reviewers: nikic, spatel, efriedma, xbolva00, RKSimon Reviewed By: nikic Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65144 llvm-svn: 370348	2019-08-29 12:47:20 +00:00
Roman Lebedev	fb38b7aab3	[InstCombine] Fold '(-1 u/ %x) u< %y' to '@llvm.umul.with.overflow' + overflow bit extraction Summary: `(-1 u/ %x) u< %y` is one of (3?) common ways to check that some unsigned multiplication (will not) overflow. Currently, we don't catch it. We could: ``` ---------------------------------------- Name: no overflow %o0 = udiv i4 -1, %x %r = icmp ult i4 %o0, %y => %o0 = udiv i4 -1, %x %n0 = umul_overflow i4 %x, %y %r = extractvalue {i4, i1} %n0, 1 Done: 1 Optimization is correct! ---------------------------------------- Name: no overflow, swapped %o0 = udiv i4 -1, %x %r = icmp ugt i4 %y, %o0 => %o0 = udiv i4 -1, %x %n0 = umul_overflow i4 %x, %y %r = extractvalue {i4, i1} %n0, 1 Done: 1 Optimization is correct! ---------------------------------------- Name: overflow %o0 = udiv i4 -1, %x %r = icmp uge i4 %o0, %y => %o0 = udiv i4 -1, %x %n0 = umul_overflow i4 %x, %y %n1 = extractvalue {i4, i1} %n0, 1 %r = xor %n1, -1 Done: 1 Optimization is correct! ---------------------------------------- Name: overflow %o0 = udiv i4 -1, %x %r = icmp ule i4 %y, %o0 => %o0 = udiv i4 -1, %x %n0 = umul_overflow i4 %x, %y %n1 = extractvalue {i4, i1} %n0, 1 %r = xor %n1, -1 Done: 1 Optimization is correct! ``` As it can be observed from tests, while simply forming the `@llvm.umul.with.overflow` is easy, if we were looking for the inverted answer, then more work needs to be done to cleanup the now-pointless control-flow that was guarding against division-by-zero. This is being addressed in follow-up patches. Reviewers: nikic, spatel, efriedma, xbolva00, RKSimon Reviewed By: nikic, xbolva00 Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65143 llvm-svn: 370347	2019-08-29 12:47:08 +00:00
Roman Lebedev	f13b0e3ed8	[InstCombine] Shift amount reassociation in bittest: trunc-of-lshr (PR42399) Summary: Finally, the fold i was looking forward to :) The legality check is muddy, i doubt i've groked the full generalization, but it handles all the cases i care about, and can come up with: https://rise4fun.com/Alive/26j I.e. we can perform the fold if any of the following is true: * The shift amount is either zero or one less than widest bitwidth * Either of the values being shifted has at most lowest bit set * The value that is being shifted by `shl` (which is not truncated) should have no less leading zeros than the total shift amount; * The value that is being shifted by `lshr` (which is truncated) should have no less leading zeros than the widest bit width minus total shift amount minus one I strongly suspect there is some better generalization, but i'm not aware of it as of right now. For now i also avoided using actual `computeKnownBits()`, but restricted it to constants. Reviewers: spatel, nikic, xbolva00 Reviewed By: spatel Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66383 llvm-svn: 370324	2019-08-29 10:26:23 +00:00
David Bolvansky	420327269e	[NFC] Added more tests for D66651 llvm-svn: 370222	2019-08-28 16:00:15 +00:00
Craig Topper	f79d8a064c	[InstCombine] Disable recursion in foldGEPICmp for vector pointer GEPs Due to missing vector support in this function, recursion can generate worse code in some cases. llvm-svn: 370221	2019-08-28 15:40:34 +00:00
David Bolvansky	207c653965	[NFC] Unbreak tests llvm-svn: 370170	2019-08-28 08:42:40 +00:00
David Bolvansky	a0a8dd225d	[NFC] Updated test llvm-svn: 370169	2019-08-28 08:40:45 +00:00
David Bolvansky	05bda8b4e5	Annotate return values of allocation functions with dereferenceable_or_null Summary: Example define dso_local noalias i8* @_Z6maixxnv() local_unnamed_addr #0 { entry: %call = tail call noalias dereferenceable_or_null(64) i8* @malloc(i64 64) #6 ret i8* %call } Reviewers: jdoerfert Reviewed By: jdoerfert Subscribers: aaron.ballman, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66651 llvm-svn: 370168	2019-08-28 08:28:20 +00:00
Craig Topper	5bbb604bb5	[InstCombine] Disable some portions of foldGEPICmp for GEPs that return a vector of pointers. Fix other portions. llvm-svn: 370114	2019-08-27 21:38:56 +00:00
Craig Topper	33585ddf14	[Analysis] Improve EmitGEPOffset handling of vector GEPs with scalar indices. This patch splats the scalar index if necessary before using it in any integer casts or other arithmetic. llvm-svn: 370112	2019-08-27 21:31:44 +00:00
David Bolvansky	aec6884e88	[NFC] Added tests for D66651 llvm-svn: 370046	2019-08-27 11:41:03 +00:00
David Bolvansky	0c2692108c	[InstCombine] Fold select with ctlz to cttz Summary: Handle pattern [0]: int ctz(unsigned int a) { int c = __clz(a & -a); return a ? 31 - c : c; } In reality, the compiler can generate much better code for cttz, so fold away this pattern. https://godbolt.org/z/c5kPtV [0] https://community.arm.com/community-help/f/discussions/2114/count-trailing-zeros Reviewers: spatel, nikic, lebedev.ri, dmgreen, hfinkel Reviewed By: hfinkel Subscribers: hfinkel, javed.absar, kristof.beyls, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66308 llvm-svn: 370037	2019-08-27 10:22:40 +00:00
Vitaly Buka	aeca56964f	msan, codegen, instcombine: Keep more lifetime markers used for msan Reviewers: eugenis Subscribers: hiraditya, cfe-commits, #sanitizers, llvm-commits Tags: #clang, #sanitizers, #llvm Differential Revision: https://reviews.llvm.org/D66695 llvm-svn: 369979	2019-08-26 22:15:50 +00:00
Philip Reames	b92c971099	[InstCombine] icmp eq/ne (gep inbounds P, Idx..), null -> icmp eq/ne P, null for vectors Extend the transform introduced in https://reviews.llvm.org/D66608 to work for vector geps as well. Differential Revision: https://reviews.llvm.org/D66671 llvm-svn: 369949	2019-08-26 19:11:49 +00:00
Roman Lebedev	de19f749e0	[InstCombine] matchThreeWayIntCompare(): commutativity awareness Summary: `matchThreeWayIntCompare()` looks for ``` select i1 (a == b), i32 Equal, i32 (select i1 (a < b), i32 Less, i32 Greater) ``` but both of these selects/compares can be in it's commuted form, so out of 8 variants, only the two most basic ones is handled. This fixes regression being introduced in D66232. Reviewers: spatel, nikic, efriedma, xbolva00 Reviewed By: spatel Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66607 llvm-svn: 369841	2019-08-24 06:49:36 +00:00
Roman Lebedev	2c75fe7f2a	[InstCombine] Try to reuse constant from select in leading comparison Summary: If we have e.g.: ``` %t = icmp ult i32 %x, 65536 %r = select i1 %t, i32 %y, i32 65535 ``` the constants `65535` and `65536` are suspiciously close. We could perform a transformation to deduplicate them: ``` Name: ult %t = icmp ult i32 %x, 65536 %r = select i1 %t, i32 %y, i32 65535 => %t.inv = icmp ugt i32 %x, 65535 %r = select i1 %t.inv, i32 65535, i32 %y ``` https://rise4fun.com/Alive/avb While this may seem esoteric, this should certainly be good for vectors (less constant pool usage) and for opt-for-size - need to have only one constant. But the real fun part here is that it allows further transformation, in particular it finishes cleaning up the `clamp` folding, see e.g. `canonicalize-clamp-with-select-of-constant-threshold-pattern.ll`. We start with e.g. ``` %dont_need_to_clamp_positive = icmp sle i32 %X, 32767 %dont_need_to_clamp_negative = icmp sge i32 %X, -32768 %clamp_limit = select i1 %dont_need_to_clamp_positive, i32 -32768, i32 32767 %dont_need_to_clamp = and i1 %dont_need_to_clamp_positive, %dont_need_to_clamp_negative %R = select i1 %dont_need_to_clamp, i32 %X, i32 %clamp_limit ``` without this patch we currently produce ``` %1 = icmp slt i32 %X, 32768 %2 = icmp sgt i32 %X, -32768 %3 = select i1 %2, i32 %X, i32 -32768 %R = select i1 %1, i32 %3, i32 32767 ``` which isn't really a `clamp` - both comparisons are performed on the original value, this patch changes it into ``` %1.inv = icmp sgt i32 %X, 32767 %2 = icmp sgt i32 %X, -32768 %3 = select i1 %2, i32 %X, i32 -32768 %R = select i1 %1.inv, i32 32767, i32 %3 ``` and then the magic happens! Some further transform finishes polishing it and we finally get: ``` %t1 = icmp sgt i32 %X, -32768 %t2 = select i1 %t1, i32 %X, i32 -32768 %t3 = icmp slt i32 %t2, 32767 %R = select i1 %t3, i32 %t2, i32 32767 ``` which is beautiful and just what we want. Proofs for `getFlippedStrictnessPredicateAndConstant()` for de-canonicalization: https://rise4fun.com/Alive/THl Proofs for the fold itself: https://rise4fun.com/Alive/THl Reviewers: spatel, dmgreen, nikic, xbolva00 Reviewed By: spatel Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66232 llvm-svn: 369840	2019-08-24 06:49:25 +00:00
Roman Lebedev	b3eccc7f0b	[InstCombine][NFC] reuse-constant-from-select-in-icmp.ll - revisit tests llvm-svn: 369839	2019-08-24 06:49:11 +00:00
Vitaly Buka	d60271a1ad	NFC: Rename lifetime-asan.ll -> lifetime-sanitizer.ll llvm-svn: 369831	2019-08-24 01:44:39 +00:00
Philip Reames	9cb059fdcc	Fix a bug in just submitted rL369789 Started implementing the vector case and realized the scalar case hadn't handled the GEP producing a different type than the base correctly. It's entertaining seeing what slips through review when we're focused on the 'hard' parts. :( Also adding an extra vector test as it happened to be in workspace and wasn't worth separating. llvm-svn: 369795	2019-08-23 18:27:57 +00:00
Philip Reames	5b02cfa0b3	[InstCombine] icmp eq/ne (gep inbounds P, Idx..), null -> icmp eq/ne P, null This generalizes the isGEPKnownNonNull rule from ValueTracking to apply when we do not know if the base is non-null, and thus need to replace one condition with another. The core notion is that since an inbounds GEP can only form null if the base pointer is null and the offset is zero. However, if the offset is non-zero, the the "inbounds" marker makes the result poison. Thus, we're free to ignore the case where the offset is non-zero. Similarly, there's no case under which a non-null base can result in a null result without generating poison. Differential Revision: https://reviews.llvm.org/D66608 llvm-svn: 369789	2019-08-23 17:58:58 +00:00
Roman Lebedev	dddc0fd9cb	[NFC][InstCombine] Fixup few new tests in unrecognized_three-way-comparison.ll llvm-svn: 369701	2019-08-22 20:34:56 +00:00
Peter Collingbourne	2452d7030b	IR. Change strip* family of functions to not look through aliases. I noticed another instance of the issue where references to aliases were being replaced with aliasees, this time in InstCombine. In the instance that I saw it turned out to be only a QoI issue (a symbol ended up being missing from the symbol table due to the last reference to the alias being removed, preventing HWASAN from symbolizing a global reference), but it could easily have manifested as incorrect behaviour. Since this is the third such issue encountered (previously: D65118, D65314) it seems to be time to address this common error/QoI issue once and for all and make the strip* family of functions not look through aliases. Includes a test for the specific issue that I saw, but no doubt there are other similar bugs fixed here. As with D65118 this has been tested to make sure that the optimization isn't load bearing. I built Clang, Chromium for Linux, Android and Windows as well as the test-suite and there were no size regressions. Differential Revision: https://reviews.llvm.org/D66606 llvm-svn: 369697	2019-08-22 19:56:14 +00:00
Roman Lebedev	1aeb27af22	[NFC][InstCombine] New tests: unrecognized_three-way-comparison.ll is ignorant about commutative variants part 2 llvm-svn: 369696	2019-08-22 19:53:23 +00:00
Roman Lebedev	41f89c3484	[NFC][InstCombine] New tests: unrecognized_three-way-comparison.ll is ignorant about commutative variants D66232 "exposes" the problem. llvm-svn: 369667	2019-08-22 16:46:16 +00:00
Philip Reames	3c4614ff10	Add a couple of extra test noticed in post-commit discussion of rL369541 llvm-svn: 369546	2019-08-21 16:57:53 +00:00
Philip Reames	764b0fd5a3	[instcombine] icmp eq/ne (sub C, Y), C -> icmp eq/ne Y, 0 Noticed while looking at pr43028. llvm-svn: 369541	2019-08-21 15:51:57 +00:00
Sanjay Patel	e728259278	[InstCombine] narrow icmp with extended operands of different widths An intermediate extend is used to widen the narrow operand to the width of the other (wider) operand. At that point, we have the same logic as the existing transform that was restricted to folds of equal width zext/sext. This mostly solves PR42700: https://bugs.llvm.org/show_bug.cgi?id=42700 llvm-svn: 369519	2019-08-21 11:56:08 +00:00
Sanjay Patel	d5035727ad	[InstCombine] add more extra use tests for icmp with extends; NFC llvm-svn: 369447	2019-08-20 21:23:28 +00:00
Sanjay Patel	48e81e8e10	[InstCombine] add tests for mismatched cast ops for icmp; NFC Motivating case is shown in PR42700: https://bugs.llvm.org/show_bug.cgi?id=42700 llvm-svn: 369439	2019-08-20 20:51:50 +00:00
Sanjay Patel	f99d254aae	[InstCombine] simplify min/max of min/max with same operands (PR35607) This is the original integer variant requested in: https://bugs.llvm.org/show_bug.cgi?id=35607 As noted in the TODO and several similar TODOs around this block, we could do this in instsimplify, but then it would cost more because we would be trying to match min/max via ValueTracking in 2 different places. There are 4 commuted variants for each of smin/smax/umin/umax that are not matched here. There are also icmp predicate variants that are not included in the affected test file because they are already handled by instsimplify by folding the final icmp to true/false. https://rise4fun.com/Alive/3KVc Name: smax(smax, smin) %c1 = icmp slt i32 %x, %y %c2 = icmp slt i32 %y, %x %min = select i1 %c1, i32 %x, i32 %y %max = select i1 %c2, i32 %x, i32 %y %c3 = icmp sgt i32 %max, %min %r = select i1 %c3, i32 %max, i32 %min => %r = %max Name: smin(smax, smin) %c1 = icmp slt i32 %x, %y %c2 = icmp slt i32 %y, %x %min = select i1 %c1, i32 %x, i32 %y %max = select i1 %c2, i32 %x, i32 %y %c3 = icmp sgt i32 %max, %min %r = select i1 %c3, i32 %min, i32 %max => %r = %min Name: umax(umax, umin) %c1 = icmp ult i32 %x, %y %c2 = icmp ult i32 %y, %x %min = select i1 %c1, i32 %x, i32 %y %max = select i1 %c2, i32 %x, i32 %y %c3 = icmp ult i32 %min, %max %r = select i1 %c3, i32 %max, i32 %min => %r = %max Name: umin(umax, umin) %c1 = icmp ult i32 %x, %y %c2 = icmp ult i32 %y, %x %min = select i1 %c1, i32 %x, i32 %y %max = select i1 %c2, i32 %x, i32 %y %c3 = icmp ult i32 %min, %max %r = select i1 %c3, i32 %min, i32 %max => %r = %min llvm-svn: 369386	2019-08-20 13:39:17 +00:00
Sanjay Patel	eb2211b352	[InstCombine] add tests for min/max with min/max of same operands; NFC llvm-svn: 369376	2019-08-20 12:49:03 +00:00
Roman Lebedev	e8f666f48d	[NFC][InstCombine] Some tests for 'shift amount reassoc in bit test - trunc-of-lshr' (PR42399) Finally, the fold i was looking forward to :) The legality check is muddy, i doubt i've groked the full generalization, but it handles all the cases i care about, and can come up with: https://rise4fun.com/Alive/26j https://bugs.llvm.org/show_bug.cgi?id=42399 llvm-svn: 369197	2019-08-17 21:35:33 +00:00
Sanjay Patel	a53ad0e157	Revert r367891 - "[InstCombine] combine mul+shl separated by zext" This reverts commit `5dbb90bfe1`. As noted in the post-commit thread for r367891, this can create a multiply that is lowered to a libcall that may not exist. We need to improve the backend decomposition for integer multiply before trying to re-land this (if it's still worthwhile after doing the backend work). llvm-svn: 369174	2019-08-16 23:36:28 +00:00
Roman Lebedev	515ad8fe4a	[InstCombine][NFC] reuse-constant-from-select-in-icmp.ll - check branch_weights too llvm-svn: 369166	2019-08-16 23:06:37 +00:00
Roman Lebedev	97176bd2bc	[InstCombine][NFC] Revisit tests in reuse-constant-from-select-in-icmp.ll llvm-svn: 369163	2019-08-16 22:40:06 +00:00
Sanjay Patel	39eb2324f7	[InstCombine] canonicalize a scalar-select-of-vectors to vector select This pattern may arise more frequently with an enhancement to SLP vectorization suggested in PR42755: https://bugs.llvm.org/show_bug.cgi?id=42755 ...but we should handle this pattern to make things easier for the backend either way. For all in-tree targets that I looked at, codegen for typical vector sizes looks better when we change to a vector select, so this is safe to do without a cost model (in other words, as a target-independent canonicalization). For example, if the condition of the select is a scalar, we end up with something like this on x86: vpcmpgtd %xmm0, %xmm1, %xmm0 vpextrb $12, %xmm0, %eax testb $1, %al jne LBB0_2 ## %bb.1: vmovaps %xmm3, %xmm2 LBB0_2: vmovaps %xmm2, %xmm0 Rather than the splat-condition variant: vpcmpgtd %xmm0, %xmm1, %xmm0 vpshufd $255, %xmm0, %xmm0 ## xmm0 = xmm0[3,3,3,3] vblendvps %xmm0, %xmm2, %xmm3, %xmm0 Differential Revision: https://reviews.llvm.org/D66095 llvm-svn: 369140	2019-08-16 18:51:30 +00:00
Evandro Menezes	05e9c2ac2e	[InstCombine] Simplify pow(2.0, itofp(y)) to ldexp(1.0, y) Simplify `pow(2.0, itofp(y))` to `ldexp(1.0, y)`. Differential revision: https://reviews.llvm.org/D65979 llvm-svn: 369120	2019-08-16 15:33:41 +00:00
Roman Lebedev	16244fccfe	[InstCombine] Shift amount reassociation in bittest: trunc-of-shl (PR42399) Summary: This is continuation of D63829 / https://bugs.llvm.org/show_bug.cgi?id=42399 I thought naive pattern would solve my issue, but nope, it involved truncation, thus more folds needed.. This isn't really the fold i'm interested in, i need trunc-of-lshr, but i'we decided to start with `shl` because it's simpler. In this case, no extra legality checks are needed: https://rise4fun.com/Alive/CAb We should be careful about not increasing instruction count, since we need to produce `zext` because `and` is done in wider type. Reviewers: spatel, nikic, xbolva00 Reviewed By: spatel Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66057 llvm-svn: 369117	2019-08-16 15:10:41 +00:00
Florian Hahn	75be1a9e58	[ValueTracking] Fix recurrence detection to check both PHI operands. Summary: Currently we fail to compute known bits for recurrences where the first incoming value is the start value of the recurrence. Instead of exiting the loop when the first incoming value is not the step of the recurrence, continue to check the second incoming value. The original code uses a loop to handle both cases, but incorrectly exits instead of continuing. Reviewers: lebedev.ri, spatel, nikic Reviewed By: lebedev.ri Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66216 llvm-svn: 369088	2019-08-16 09:15:02 +00:00
David Bolvansky	00782a4b68	[NFC] Added tests for 'select with ctlz to cttz' fold llvm-svn: 369032	2019-08-15 18:23:37 +00:00
Florian Hahn	1bd898989c	[InstCombine] Precommit test case for D66216 llvm-svn: 368978	2019-08-15 08:42:12 +00:00
Roman Lebedev	04ddff4cbc	[InstCombine][NFC] Tests for 'try to reuse constant from select in comparison' https://rise4fun.com/Alive/THl llvm-svn: 368886	2019-08-14 17:27:50 +00:00
David Bolvansky	f94460d4b6	[SLC] Dereferenceable annonation - handle valid null pointers Reviewers: jdoerfert, reames Reviewed By: jdoerfert Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66161 llvm-svn: 368884	2019-08-14 17:15:20 +00:00
David Bolvansky	0e0fbae1a4	[BuildLibCalls] Noalias annotation Summary: I think this is better solution than annotating callsites in IC/SLC. Reviewers: jdoerfert Reviewed By: jdoerfert Subscribers: MaskRay, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66217 llvm-svn: 368875	2019-08-14 16:50:06 +00:00
Roman Lebedev	2faafc6e4f	[InstCombine][NFC] Autogenerate checks in adjust-for-minmax.ll Being affected by WIP patch. llvm-svn: 368807	2019-08-14 08:12:20 +00:00
David Bolvansky	038d604f4f	[SimplifyLibCalls] Add noalias from known callsites Summary: Should be fine for memcpy, strcpy, strncpy. Reviewers: jdoerfert, efriedma Reviewed By: jdoerfert Subscribers: uenoku, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66135 llvm-svn: 368724	2019-08-13 17:18:46 +00:00
Nikita Popov	2a4f26b4c2	[ValueTracking] Improve reverse assumption inference Use isGuaranteedToTransferExecutionToSuccessor() instead of isSafeToSpeculativelyExecute() when seeing whether we can propagate the information in an assume backwards in isValidAssumeForContext(). The latter is more general - it also allows arbitrary loads/stores - and is also the condition we want: if our assume is guaranteed to execute, its condition not holding would be UB. Original patch by arielb1. Differential Revision: https://reviews.llvm.org/D37215 llvm-svn: 368723	2019-08-13 17:15:42 +00:00
David Bolvansky	dde10cd7a9	[NFC] Revisited/updated tests llvm-svn: 368722	2019-08-13 17:07:02 +00:00
David Bolvansky	90a30fdcc3	[SLC] Improve dereferenceable bytes annotation llvm-svn: 368715	2019-08-13 16:44:16 +00:00
Roman Lebedev	73f702ff19	[InstCombine] Non-canonical clamp-like pattern handling Summary: Given a pattern like: ``` %old_cmp1 = icmp slt i32 %x, C2 %old_replacement = select i1 %old_cmp1, i32 %target_low, i32 %target_high %old_x_offseted = add i32 %x, C1 %old_cmp0 = icmp ult i32 %old_x_offseted, C0 %r = select i1 %old_cmp0, i32 %x, i32 %old_replacement ``` it can be rewritten as more canonical pattern: ``` %new_cmp1 = icmp slt i32 %x, -C1 %new_cmp2 = icmp sge i32 %x, C0-C1 %new_clamped_low = select i1 %new_cmp1, i32 %target_low, i32 %x %r = select i1 %new_cmp2, i32 %target_high, i32 %new_clamped_low ``` Iff `-C1 s<= C2 s<= C0-C1` Also, `ULT` predicate can also be `UGE`; or `UGT` iff `C0 != -1` (+invert result) Also, `SLT` predicate can also be `SGE`; or `SGT` iff `C2 != INT_MAX` (+invert result) If `C1 == 0`, then all 3 instructions must be one-use; else at most either `%old_cmp1` or `%old_x_offseted` can have extra uses. NOTE: if we could reuse `%old_cmp1` as one of the comparisons we'll have to build, this could be less limiting. So there are two icmp's, each one with 3 predicate variants, so there are 9 fold variants: \| \| ULT \| UGE \| UGT \| \| SLT \| https://rise4fun.com/Alive/yIJ \| https://rise4fun.com/Alive/5BfN \| https://rise4fun.com/Alive/INH \| \| SGE \| https://rise4fun.com/Alive/hd8 \| https://rise4fun.com/Alive/Abk \| https://rise4fun.com/Alive/PlzS \| \| SGT \| https://rise4fun.com/Alive/VYG \| https://rise4fun.com/Alive/oMY \| https://rise4fun.com/Alive/KrzC \| {F9730206} This fold was brought up in https://reviews.llvm.org/D65148#1603922 by @dmgreen, and is needed to unblock that patch. This patch requires D65530. Reviewers: spatel, nikic, xbolva00, dmgreen Reviewed By: spatel Subscribers: hiraditya, llvm-commits, dmgreen Tags: #llvm Differential Revision: https://reviews.llvm.org/D65765 llvm-svn: 368687	2019-08-13 12:49:28 +00:00
Roman Lebedev	2635c324da	[InstCombine] foldXorOfICmps(): don't give up on non-single-use ICmp's if all users are freely invertible Summary: This is rather unconventional.. As the comment there says, we don't have much folds for xor-of-icmps, we try to turn them into an and-of-icmps, for which we have plenty of folds. But if the ICmp we need to invert is not single-use - we give up. As discussed in https://reviews.llvm.org/D65148#1603922, we may have a non-canonical CLAMP pattern, with bit match and select-of-threshold that we'll potentially clamp. As it can be seen in `canonicalize-clamp-with-select-of-constant-threshold-pattern.ll`, out of all 8 variations of the pattern, only two are not canonicalized into the variant with and+icmp instead of bit math. The reason is because the ICmp we need to invert is not single-use - we give up. We indeed can't perform this fold at will, the general rule is that we should not increase instruction count in InstCombine, But we wouldn't end up increasing instruction count if we can adapt every other user to the inverted value. This way the `not` we create will get folded, and in the end the instruction count did not increase. For that, of course, we need to look at the users of a Value, which is again rather unconventional for InstCombine :S Thus i'm proposing to be a little bit more insistive in `foldXorOfICmps()`. The alternatives would be to not create that `not`, but add duplicate code to manually invert all users; or to add some even less general combine to handle some more specific pattern[s]. Reviewers: spatel, nikic, RKSimon, craig.topper Reviewed By: spatel Subscribers: hiraditya, jdoerfert, dmgreen, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65530 llvm-svn: 368685	2019-08-13 12:49:06 +00:00
David Bolvansky	39130314fe	[SimplifyLibCalls] Add dereferenceable bytes from known callsites Summary: int mm(char a, char b) { return memcmp(a,b,16); } Currently: define dso_local i32 @mm(i8* nocapture readonly %a, i8* nocapture readonly %b) local_unnamed_addr #1 { entry: %call = tail call i32 @memcmp(i8* %a, i8* %b, i64 16) ret i32 %call } After patch: define dso_local i32 @mm(i8* nocapture readonly %a, i8* nocapture readonly %b) local_unnamed_addr #1 { entry: %call = tail call i32 @memcmp(i8* dereferenceable(16) %a, i8* dereferenceable(16) %b, i64 16) ret i32 %call } Reviewers: jdoerfert, efriedma Reviewed By: jdoerfert Subscribers: javed.absar, spatel, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66079 llvm-svn: 368657	2019-08-13 09:11:49 +00:00
Roman Lebedev	09eb71ced3	[NFC][InstCombine] Non-canonical clamp pattern: non-canonical predicate tests We can't handle 'uge' case because we can't ever get it, there needs to be extra use on that compare or else it will be canonicalized, but because of extra use we can't handle it. 'sge' case we can have. llvm-svn: 368656	2019-08-13 08:14:13 +00:00
Sanjay Patel	24a9e86849	[InstCombine] add tests for scalar-select-of-vectors; NFC llvm-svn: 368583	2019-08-12 15:21:11 +00:00
David Bolvansky	20d37fab82	[InstCombine] x /c fabs(x) -> copysign(1.0, x) Summary: x / fabs(x) -> copysign(1.0, x) fabs(x) / x -> copysign(1.0, x) Reviewers: spatel, foad, RKSimon, efriedma Reviewed By: spatel Subscribers: lebedev.ri, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65898 llvm-svn: 368570	2019-08-12 13:43:35 +00:00
Roman Lebedev	ccdad6ef48	[InstCombine] foldShiftIntoShiftInAnotherHandOfAndInICmp(): avoid constantexpr pitfail (PR42962) Instead of matching value and then blindly casting to BinaryOperator just to get the opcode, just match instruction and do no cast. Fixes https://bugs.llvm.org/show_bug.cgi?id=42962 llvm-svn: 368554	2019-08-12 11:28:02 +00:00
Roman Lebedev	404e978f27	[NFC][InstCombine] Tests for shift amount reassociation in bittest with truncated shl (PR42399) trunc-of-shl: https://rise4fun.com/Alive/zGx https://rise4fun.com/Alive/sl0L I.e. no extra legality check needed. https://bugs.llvm.org/show_bug.cgi?id=42399 llvm-svn: 368520	2019-08-10 19:29:03 +00:00
Roman Lebedev	a8d20b4467	[InstCombine] Shift amount reassociation in bittest: relax one-use check when shifting constant If one of the values being shifted is a constant, since the new shift amount is known-constant, the new shift will end up being constant-folded so, we don't need that one-use restriction then. llvm-svn: 368519	2019-08-10 19:28:54 +00:00
Roman Lebedev	64fe806c4e	[InstCombine] Shift amount reassociation in bittest: drop pointless one-use restriction That one-use restriction is not needed for correctness - we have already ensured that one of the shifts will go away, so we know we won't increase the instruction count. So there is no need for that restriction. llvm-svn: 368518	2019-08-10 19:28:44 +00:00
Roman Lebedev	45e9990c02	[NFC][InstCombine] Tests for shift amount reassociation in bittest with shift of const llvm-svn: 368517	2019-08-10 19:28:12 +00:00
David Bolvansky	f6a5699392	[NFC] Added tests for D65898 llvm-svn: 368447	2019-08-09 15:52:26 +00:00
David Bolvansky	2689ed0f9d	[InstCombine][NFC] Added comments about constants in tests for pow->exp2 fold llvm-svn: 368360	2019-08-08 22:37:51 +00:00
David Bolvansky	ae154d00b4	[NFC] Fixed newly added tests llvm-svn: 368201	2019-08-07 19:36:46 +00:00
David Bolvansky	f8183d64de	[NFC] Added tests for x/fabs(X) fold llvm-svn: 368200	2019-08-07 19:35:25 +00:00
Jay Foad	7d4ab7751d	[InstCombine] Add a TODO comment llvm-svn: 368176	2019-08-07 15:18:34 +00:00
Jay Foad	8e8b295835	[InstCombine] Propagate fast math flags through selects Summary: In SimplifySelectsFeedingBinaryOp, propagate fast math flags from the outer op into both arms of the new select, to take advantage of simplifications that require fast math flags. Reviewers: mcberg2017, majnemer, spatel, arsenm, xbolva00 Subscribers: wdng, javed.absar, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65658 llvm-svn: 368175	2019-08-07 15:16:28 +00:00
Roman Lebedev	9bece444dd	[InstCombine] Recommit: Shift amount reassociation: shl-trunc-shl pattern This was initially committed in r368059 but got reverted in r368084 because there was a faulty logic in how the shift amounts type mismatch was being handled (it simply wasn't). I've added an explicit bailout before we SimplifyAddInst() - i don't think it's designed in general to handle differently-typed values, even though the actual problem only comes from ConstantExpr's. I have also changed the common type deduction, to not just blindly look past zext, but try to do that so that in the end types match. Differential Revision: https://reviews.llvm.org/D65380 llvm-svn: 368141	2019-08-07 09:41:50 +00:00
Reid Kleckner	e4bd38478b	Revert [InstCombine] Shift amount reassociation: shl-trunc-shl pattern This reverts r368059 (git commit `0f95710976`) This caused Clang to assert while self-hosting and compiling SystemZInstrInfo.cpp. Reduction is running. llvm-svn: 368084	2019-08-06 20:32:07 +00:00
Roman Lebedev	0f95710976	[InstCombine] Shift amount reassociation: shl-trunc-shl pattern Summary: Currently `reassociateShiftAmtsOfTwoSameDirectionShifts()` only handles two shifts one after another. If the shifts are `shl`, we still can easily perform the fold, with no extra legality checks: https://rise4fun.com/Alive/OQbM If we have right-shift however, we won't be able to make it any simpler than it already is. After this the only thing missing here is constant-folding: (`NewShAmt >= bitwidth(X)`) * If it's a logical shift, then constant-fold to `0` (not `undef`) * If it's a `ashr`, then a splat of original signbit https://rise4fun.com/Alive/E1K https://rise4fun.com/Alive/i0V Reviewers: spatel, nikic, xbolva00 Reviewed By: spatel Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65380 llvm-svn: 368059	2019-08-06 17:03:40 +00:00
Sanjay Patel	efc24d9d6f	[InstCombine] add tests for binop with FMF with select operands; NFC Baseline coverage for D65658. llvm-svn: 368028	2019-08-06 13:19:13 +00:00
Roman Lebedev	76b772f9ce	[InstCombine][NFC] Tests for non-canonical clamp-like pattern As discussed in https://reviews.llvm.org/D65148#1607019 The canonical fold is: https://rise4fun.com/Alive/FKe llvm-svn: 367897	2019-08-05 18:01:22 +00:00
Sanjay Patel	5dbb90bfe1	[InstCombine] combine mul+shl separated by zext This appears to slightly help patterns similar to what's shown in PR42874: https://bugs.llvm.org/show_bug.cgi?id=42874 ...but not in the way requested. That fix will require some later IR and/or backend pass to decompose multiply/shifts into something more optimal per target. Those transforms already exist in some basic forms, but probably need enhancing to catch more cases. https://rise4fun.com/Alive/Qzv2 llvm-svn: 367891	2019-08-05 16:59:58 +00:00
Sanjay Patel	4b9d66cf41	[InstCombine] add tests for shl+mul; NFC llvm-svn: 367883	2019-08-05 16:17:07 +00:00
Sanjay Patel	1a29823b9c	[InstCombine] add extra use constraint for shl-zext fold As the test shows, we can end up with more instructions than we started with if we don't include the extra-use check. llvm-svn: 367880	2019-08-05 16:04:07 +00:00
Sanjay Patel	d1c5d13470	[InstCombine] add test for shl-zext with extra use; NFC llvm-svn: 367876	2019-08-05 15:25:07 +00:00
David Bolvansky	e834e306cb	[InstCombine] Added mempcpy tests [NFC] llvm-svn: 367825	2019-08-05 09:58:32 +00:00
Sanjay Patel	9ce5f41851	[InstCombine] fold cmp+select using select operand equivalence As discussed in PR42696: https://bugs.llvm.org/show_bug.cgi?id=42696 ...but won't help that case yet. We have an odd situation where a select operand equivalence fold was implemented in InstSimplify when it could have been done more generally in InstCombine if we allow dropping of {nsw,nuw,exact} from a binop operand. Here's an example: https://rise4fun.com/Alive/Xplr %cmp = icmp eq i32 %x, 2147483647 %add = add nsw i32 %x, 1 %sel = select i1 %cmp, i32 -2147483648, i32 %add => %sel = add i32 %x, 1 I've left the InstSimplify code in place for now, but my guess is that we'd prefer to remove that as a follow-up to save on code duplication and compile-time. Differential Revision: https://reviews.llvm.org/D65576 llvm-svn: 367695	2019-08-02 17:39:32 +00:00
Sanjay Patel	66ce04f261	[InstCombine] add tests with 'ne' predicates; NFC More coverage for the proposal in D65576. llvm-svn: 367579	2019-08-01 16:04:12 +00:00
Sanjay Patel	350b389c90	[InstCombine] add test with swapped select operands; NFC More coverage for the proposal in D65576. llvm-svn: 367577	2019-08-01 15:32:10 +00:00
Sanjay Patel	435cdecdf7	[InstCombine] canonicalize fneg before fmul/fdiv Reverse the canonicalization of fneg relative to fmul/fdiv. That makes it easier to implement the transforms (and possibly other fneg transforms) in 1 place because we can always start the pattern match from fneg (either the legacy binop or the new unop). There's a secondary practical benefit seen in PR21914 and PR42681: https://bugs.llvm.org/show_bug.cgi?id=21914 https://bugs.llvm.org/show_bug.cgi?id=42681 ...hoisting fneg rather than sinking seems to play nicer with LICM in IR (although this change may expose analysis holes in the other direction). 1. The instcombine test changes show the expected neutral IR diffs from reversing the order. 2. The reassociation tests show that we were missing an optimization opportunity to fold away fneg-of-fneg. My reading of IEEE-754 says that all of these transforms are allowed (regardless of binop/unop fneg version) because: "For all other operations [besides copy/abs/negate/copysign], this standard does not specify the sign bit of a NaN result." In all of these transforms, we always have some other binop (fadd/fsub/fmul/fdiv), so we are free to flip the sign bit of a potential intermediate NaN operand. (If that interpretation is wrong, then we must already have a bug in the existing transforms?) 3. The clang tests shouldn't exist as-is, but that's effectively a revert of rL367149 (the test broke with an extension of the pre-existing fneg canonicalization in rL367146). Differential Revision: https://reviews.llvm.org/D65399 llvm-svn: 367447	2019-07-31 16:53:22 +00:00
Roman Lebedev	8d76284599	[NFC][InstCombine] Add xor-or-icmp tests with icmp having extra uses Currently InstCombiner::foldXorOfICmps() bailouts if the ICMP it wants to invert has extra uses. As it can be seen in the tests in previous commit, this is super unfortunate, this is the single pattern that is left non-canonicalized. We could analyze if we can also invert all the uses if said ICMP at the same time, thus not bailing out there. I'm not seeing any nicer alternative. llvm-svn: 367439	2019-07-31 15:20:33 +00:00
Roman Lebedev	67688af5f0	[NFC][InstCombine] Add baseline tests with non-canonical CLAMP pattern As disscussed in https://reviews.llvm.org/D65148#1603922 these would all need to be canonicalized to traditional clamp pattern. llvm-svn: 367438	2019-07-31 15:20:21 +00:00
Roman Lebedev	be612ea471	[InstCombine] Fold "x ?% y ==/!= 0" to "x & (y-1) ==/!= 0" iff y is power-of-two Summary: I have stumbled into this by accident while preparing to extend backend `x s% C ==/!= 0` handling. While we did happen to handle this fold in most of the cases, the folding is indirect - we fold `x u% y` to `x & (y-1)` (iff `y` is power-of-two), or first turn `x s% -y` to `x u% y`; that does handle most of the cases. But we can't turn `x s% INT_MIN` to `x u% -INT_MIN`, and thus we end up being stuck with `(x s% INT_MIN) == 0`. There is no such restriction for the more general fold: https://rise4fun.com/Alive/IIeS To be noted, the fold does not enforce that `y` is a constant, so it may indeed increase instruction count. This is consistent with what `x u% y`->`x & (y-1)` already does. I think it makes sense, it's at most one (simple) extra instruction, while `rem`ainder is really much more un-simple (and likely very costly). Reviewers: spatel, RKSimon, nikic, xbolva00, craig.topper Reviewed By: RKSimon Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65046 llvm-svn: 367322	2019-07-30 15:28:22 +00:00
Cameron McInally	b32a6592eb	[NFC][FPEnv] Pre-commit tests for canonicalize negated operand of fdiv. llvm-svn: 367233	2019-07-29 16:09:56 +00:00
Sanjay Patel	e9ee7b47d4	[InstCombine] fold fadd+fneg with fdiv/fmul betweena The backend already does this via isNegatibleForFree(), but we may want to alter the fneg IR canonicalizations that currently exist, so we need to try harder to fold fneg in IR to avoid regressions. llvm-svn: 367227	2019-07-29 13:50:25 +00:00
Sanjay Patel	74c35bd6b0	[InstCombine] add tests for fadd with negated operand; NFC llvm-svn: 367222	2019-07-29 12:49:36 +00:00
Roman Lebedev	6ff633ddc4	[NFC][InstCombine] Revisit tests in shift-amount-reassociation-with-truncation-shl.ll llvm-svn: 367196	2019-07-28 21:31:58 +00:00
Sanjay Patel	99c57c6daf	[InstCombine] fold fsub+fneg with fdiv/fmul between The backend already does this via isNegatibleForFree(), but we may want to alter the fneg IR canonicalizations that currently exist, so we need to try harder to fold fneg in IR to avoid regressions. llvm-svn: 367194	2019-07-28 17:10:06 +00:00
Roman Lebedev	d5bc4b09f1	[NFC][InstCombine] Shift amount reassociation: can have trunc between shl's https://rise4fun.com/Alive/OQbM Not so simple for lshr/ashr, so those maybe later. https://bugs.llvm.org/show_bug.cgi?id=42391 llvm-svn: 367189	2019-07-28 13:13:46 +00:00
Sanjay Patel	d20a0fe203	[InstCombine] add tests for fsub with negated operand; NFC llvm-svn: 367156	2019-07-26 21:12:22 +00:00
Sanjay Patel	a9ab31558c	[InstCombine] canonicalize negated operand of fdiv This is a transform that we use with fmul, so use it for fdiv too for consistency. llvm-svn: 367146	2019-07-26 19:56:59 +00:00
Sanjay Patel	487e957775	[InstCombine] add tests for fdiv with negated operand; NFC llvm-svn: 367145	2019-07-26 19:44:53 +00:00
Sanjay Patel	c229cfeb7a	[InstCombine] remove flop from lerp patterns (Y * (1.0 - Z)) + (X * Z) --> Y - (Y * Z) + (X * Z) --> Y + Z * (X - Y) This is part of solving: https://bugs.llvm.org/show_bug.cgi?id=42716 Factoring eliminates an instruction, so that should be a good canonicalization. The potential conversion to FMA would be handled by the backend based on target capabilities. Differential Revision: https://reviews.llvm.org/D65305 llvm-svn: 367101	2019-07-26 11:19:18 +00:00
Sanjay Patel	8f15d40555	[InstCombine] add tests for lerp patterns (PR42716); NFC llvm-svn: 367069	2019-07-25 22:25:21 +00:00
Vlad Tsyrklevich	5d5a58317c	Revert "[InstCombine] try to narrow a truncated load" This reverts commit `bc4a63fd3c`, this is a speculative revert to fix a number of sanitizer bots (like sanitizer-x86_64-linux-bootstrap-ubsan) that have started to see stage2 compiler crashes, presumably due to a miscompile. llvm-svn: 367029	2019-07-25 15:37:57 +00:00
Sanjay Patel	bc4a63fd3c	[InstCombine] try to narrow a truncated load trunc (load X) --> load (bitcast X to narrow type) We have this transform in DAGCombiner::ReduceLoadWidth(), but the truncated load pattern can interfere with other instcombine transforms, so I'd like to allow the fold sooner. Example: https://bugs.llvm.org/show_bug.cgi?id=16739 ...in that report, we have bitcasts bracketing these ops, so those could get eliminated too. We've generally ruled out widening of loads early in IR ( LoadCombine - http://lists.llvm.org/pipermail/llvm-dev/2016-September/105291.html ), but that reasoning may not apply to narrowing if we can preserve information such as the dereferenceable range. Differential Revision: https://reviews.llvm.org/D64432 llvm-svn: 367011	2019-07-25 12:14:27 +00:00
Craig Topper	e9abc8177a	[InstCombine] Teach foldOrOfICmps to allow icmp eq MIN_INT/MAX to be part of a range comparision. Similar for foldAndOfICmps We can treat icmp eq X, MIN_UINT as icmp ule X, MIN_UINT and allow it to merge with icmp ugt X, C. Similar for the other constants. We can do simliar for icmp ne X, (U)INT_MIN/MAX in foldAndOfICmps. And we already handled UINT_MIN there. Fixes PR42691. Differential Revision: https://reviews.llvm.org/D65017 llvm-svn: 366945	2019-07-24 20:57:29 +00:00
David Bolvansky	db913d9618	[InstCombine] Adjusted pow-exp tests for Windows [NFC] Summary: https://bugs.llvm.org/show_bug.cgi?id=42740 Reviewers: efriedma, hans Reviewed By: hans Subscribers: spatel, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65220 llvm-svn: 366925	2019-07-24 17:01:20 +00:00
Matt Arsenault	0b7f226311	AMDGPU: Fix test after r366913 llvm-svn: 366916	2019-07-24 16:05:55 +00:00
Sanjay Patel	3624074426	[InstCombine] add tests for load narrowing; NFC Baseline results for D64432. llvm-svn: 366901	2019-07-24 12:44:21 +00:00
Roman Lebedev	402bf28ecc	[NFC][InstCombine] Fixup commutative/negative tests with icmp preds in @llvm.umul.with.overflow tests llvm-svn: 366802	2019-07-23 12:42:57 +00:00
Hideto Ueno	2d654df763	[AMDGPU][NFC] Simplify test file for amdgcn intrinsics Summary: Remove unchecked attribute in the call site and use FileCheck String Substitution for `convergent` check. Reviewers: nhaehnle Reviewed By: nhaehnle Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64901 llvm-svn: 366781	2019-07-23 06:48:47 +00:00
Roman Lebedev	77d37037f0	[InstCombine][NFC] Tests for canonicalization of unsigned multiply overflow check llvm-svn: 366748	2019-07-22 22:08:45 +00:00
Craig Topper	ee5dc7e7ad	[InstCombine] Add foldAndOfICmps test cases inspired by PR42691. icmp ne %x, INT_MIN can be treated similarly to icmp sgt %x, INT_MIN. icmp ne %x, INT_MAX can be treated similarly to icmp slt %x, INT_MAX. icmp ne %x, UINT_MAX can be treated similarly to icmp ult %x, UINT_MAX. We already treat icmp ne %x, 0 similarly to icmp ugt %x, 0 llvm-svn: 366662	2019-07-22 02:43:43 +00:00
Roman Lebedev	8a431874e9	[NFC][InstCombine] Add a few extra srem-by-power-of-two tests - extra uses llvm-svn: 366652	2019-07-21 09:05:49 +00:00
Roman Lebedev	a2dd672c5f	[NFC][InstCombine] Autogenerate a few tests llvm-svn: 366643	2019-07-20 21:34:00 +00:00
Roman Lebedev	056640f8b3	[NFC][InstCombine] Add srem-by-signbit tests - still can fold to bittest https://rise4fun.com/Alive/IIeS llvm-svn: 366642	2019-07-20 21:33:50 +00:00
Craig Topper	3a3c58f045	[InstCombine] Fix copy/paste mistake in the test cases I added for PR42691. NFC llvm-svn: 366614	2019-07-19 21:09:21 +00:00
Craig Topper	18230ecf7e	[InstCombine] Add test cases for PR42691. NFC llvm-svn: 366611	2019-07-19 20:48:52 +00:00
Roman Lebedev	9998585c47	[NFC][InstCombine] Tests for 'rem' formation from sub-of-mul-by-'div' (PR42673) https://rise4fun.com/Alive/8Rp https://bugs.llvm.org/show_bug.cgi?id=42673 llvm-svn: 366565	2019-07-19 11:29:18 +00:00
Roman Lebedev	882bf2a844	[NFC][InstCombine] Redundant masking before left-shift: tests with assume If the legality check is `(shiftNbits-maskNbits) s>= 0`, then we can simplify it to `shiftNbits u>= maskNbits`, which is easier to check for. However, currently switching the `dropRedundantMaskingOfLeftShiftInput()` to `SimplifyICmpInst()` does not catch these cases and regresses currently-handled cases, so i'll leave it as is for now. https://rise4fun.com/Alive/25P llvm-svn: 366564	2019-07-19 11:29:04 +00:00
Roman Lebedev	f2eb403144	[InstCombine] Dropping redundant masking before left-shift [5/5] (PR42563) Summary: If we have some pattern that leaves only some low bits set, and then performs left-shift of those bits, if none of the bits that are left after the final shift are modified by the mask, we can omit the mask. There are many variants to this pattern: f. `((x << MaskShAmt) a>> MaskShAmt) << ShiftShAmt` All these patterns can be simplified to just: `x << ShiftShAmt` iff: f. `(ShiftShAmt-MaskShAmt) s>= 0` (i.e. `ShiftShAmt u>= MaskShAmt`) Normally, the inner pattern is sign-extend, but for our purposes it's no different to other patterns: alive proofs: f: https://rise4fun.com/Alive/7U3 For now let's start with patterns where both shift amounts are variable, with trivial constant "offset" between them, since i believe this is both simplest to handle and i think this is most common. But again, there are likely other variants where we could use ValueTracking/ConstantRange to handle more cases. https://bugs.llvm.org/show_bug.cgi?id=42563 Differential Revision: https://reviews.llvm.org/D64524 llvm-svn: 366540	2019-07-19 08:26:58 +00:00
Roman Lebedev	441c9d6ca8	[InstCombine] Dropping redundant masking before left-shift [4/5] (PR42563) Summary: If we have some pattern that leaves only some low bits set, and then performs left-shift of those bits, if none of the bits that are left after the final shift are modified by the mask, we can omit the mask. There are many variants to this pattern: e. `((x << MaskShAmt) l>> MaskShAmt) << ShiftShAmt` All these patterns can be simplified to just: `x << ShiftShAmt` iff: e. `(ShiftShAmt-MaskShAmt) s>= 0` (i.e. `ShiftShAmt u>= MaskShAmt`) alive proofs: e: https://rise4fun.com/Alive/0FT For now let's start with patterns where both shift amounts are variable, with trivial constant "offset" between them, since i believe this is both simplest to handle and i think this is most common. But again, there are likely other variants where we could use ValueTracking/ConstantRange to handle more cases. https://bugs.llvm.org/show_bug.cgi?id=42563 Differential Revision: https://reviews.llvm.org/D64521 llvm-svn: 366539	2019-07-19 08:26:47 +00:00
Roman Lebedev	3c212ce305	[InstCombine] Dropping redundant masking before left-shift [3/5] (PR42563) Summary: If we have some pattern that leaves only some low bits set, and then performs left-shift of those bits, if none of the bits that are left after the final shift are modified by the mask, we can omit the mask. There are many variants to this pattern: d. `(x & ((-1 << MaskShAmt) >> MaskShAmt)) << ShiftShAmt` All these patterns can be simplified to just: `x << ShiftShAmt` iff: d. `(ShiftShAmt-MaskShAmt) s>= 0` (i.e. `ShiftShAmt u>= MaskShAmt`) alive proofs: d: https://rise4fun.com/Alive/I5Y For now let's start with patterns where both shift amounts are variable, with trivial constant "offset" between them, since i believe this is both simplest to handle and i think this is most common. But again, there are likely other variants where we could use ValueTracking/ConstantRange to handle more cases. https://bugs.llvm.org/show_bug.cgi?id=42563 Differential Revision: https://reviews.llvm.org/D64519 llvm-svn: 366538	2019-07-19 08:26:37 +00:00
Roman Lebedev	2ebe57386d	[InstCombine] Dropping redundant masking before left-shift [2/5] (PR42563) Summary: If we have some pattern that leaves only some low bits set, and then performs left-shift of those bits, if none of the bits that are left after the final shift are modified by the mask, we can omit the mask. There are many variants to this pattern: c. `(x & (-1 >> MaskShAmt)) << ShiftShAmt` All these patterns can be simplified to just: `x << ShiftShAmt` iff: c. `(ShiftShAmt-MaskShAmt) s>= 0` (i.e. `ShiftShAmt u>= MaskShAmt`) alive proofs: c: https://rise4fun.com/Alive/RgJh For now let's start with patterns where both shift amounts are variable, with trivial constant "offset" between them, since i believe this is both simplest to handle and i think this is most common. But again, there are likely other variants where we could use ValueTracking/ConstantRange to handle more cases. https://bugs.llvm.org/show_bug.cgi?id=42563 Differential Revision: https://reviews.llvm.org/D64517 llvm-svn: 366537	2019-07-19 08:26:25 +00:00
Roman Lebedev	4422a1657c	[InstCombine] Dropping redundant masking before left-shift [1/5] (PR42563) Summary: If we have some pattern that leaves only some low bits set, and then performs left-shift of those bits, if none of the bits that are left after the final shift are modified by the mask, we can omit the mask. There are many variants to this pattern: b. `(x & (~(-1 << maskNbits))) << shiftNbits` All these patterns can be simplified to just: `x << ShiftShAmt` iff: b. `(MaskShAmt+ShiftShAmt) u>= bitwidth(x)` alive proof: b: https://rise4fun.com/Alive/y8M For now let's start with patterns where both shift amounts are variable, with trivial constant "offset" between them, since i believe this is both simplest to handle and i think this is most common. But again, there are likely other variants where we could use ValueTracking/ConstantRange to handle more cases. https://bugs.llvm.org/show_bug.cgi?id=42563 Differential Revision: https://reviews.llvm.org/D64514 llvm-svn: 366536	2019-07-19 08:26:13 +00:00
Roman Lebedev	a5f0824eb5	[InstCombine] Dropping redundant masking before left-shift [0/5] (PR42563) Summary: If we have some pattern that leaves only some low bits set, and then performs left-shift of those bits, if none of the bits that are left after the final shift are modified by the mask, we can omit the mask. There are many variants to this pattern: a. `(x & ((1 << MaskShAmt) - 1)) << ShiftShAmt` All these patterns can be simplified to just: `x << ShiftShAmt` iff: a. `(MaskShAmt+ShiftShAmt) u>= bitwidth(x)` alive proof: a: https://rise4fun.com/Alive/wi9 Indeed, not all of these patterns are canonical. But since this fold will only produce a single instruction i'm really interested in handling even uncanonical patterns, since i have this general kind of pattern in hotpaths, and it is not totally outlandish for bit-twiddling code. For now let's start with patterns where both shift amounts are variable, with trivial constant "offset" between them, since i believe this is both simplest to handle and i think this is most common. But again, there are likely other variants where we could use ValueTracking/ConstantRange to handle more cases. https://bugs.llvm.org/show_bug.cgi?id=42563 Reviewers: spatel, nikic, huihuiz, xbolva00 Reviewed By: xbolva00 Subscribers: efriedma, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64512 llvm-svn: 366535	2019-07-19 08:25:43 +00:00
Nikita Popov	57190b3974	[InstCombine] Add assume context test; NFC Baseline test for D37215. llvm-svn: 366021	2019-07-14 15:55:32 +00:00
Sanjay Patel	22cc1030f6	Revert "[InstCombine] add tests for umin/umax via usub.sat; NFC" This reverts commit rL365999 / `0f6148df23`. The tests already exist in this file, and the hoped-for transform (mentioned in D62871) is invalid because of undef as discussed in D63060. llvm-svn: 366000	2019-07-13 13:16:46 +00:00
Sanjay Patel	0f6148df23	[InstCombine] add tests for umin/umax via usub.sat; NFC llvm-svn: 365999	2019-07-13 12:54:48 +00:00
David Bolvansky	af1b3185f5	[InstCombine] Fold select (icmp sgt x, -1), lshr (X, Y), ashr (X, Y) to ashr (X, Y)) Summary: (select (icmp sgt x, -1), lshr (X, Y), ashr (X, Y)) -> ashr (X, Y)) (select (icmp slt x, 1), ashr (X, Y), lshr (X, Y)) -> ashr (X, Y)) Fixes PR41173 Alive proof by @lebedev.ri (thanks) Name: PR41173 %cmp = icmp slt i32 %x, 1 %shr = lshr i32 %x, %y %shr1 = ashr i32 %x, %y %retval.0 = select i1 %cmp, i32 %shr1, i32 %shr => %retval.0 = ashr i32 %x, %y Optimization: PR41173 Done: 1 Optimization is correct! Reviewers: lebedev.ri, spatel Reviewed By: lebedev.ri Subscribers: nikic, craig.topper, llvm-commits, lebedev.ri Tags: #llvm Differential Revision: https://reviews.llvm.org/D64285 llvm-svn: 365893	2019-07-12 11:31:16 +00:00
Huihui Zhang	7b4a59db1e	[InstCombine][NFCI] Add more test coverage to onehot_merge.ll Prep work for upcoming patch D64275. llvm-svn: 365828	2019-07-11 21:28:25 +00:00
David Bolvansky	5dca95bc4e	[NFC] Revisited tests for D64285 llvm-svn: 365815	2019-07-11 19:39:20 +00:00
Sanjay Patel	3487791fea	[InstCombine] don't move FP negation out of a constant expression -(X * ConstExpr) becomes X * (-ConstExpr), so don't reverse that and infinite loop. llvm-svn: 365774	2019-07-11 13:44:29 +00:00
David Bolvansky	e195a91d2d	[NFC] Updated tests for D64285 llvm-svn: 365765	2019-07-11 12:51:33 +00:00
David Bolvansky	e23be09e66	[InstCombine] Reorder recently added/improved pow transformations Changed cases are now faster with exp2. llvm-svn: 365758	2019-07-11 10:55:04 +00:00
Huihui Zhang	51f5079191	[InstCombine][NFCI] Add test coverage to onehot_merge.ll Prep work for upcoming patch D64275. llvm-svn: 365729	2019-07-11 04:56:37 +00:00
Johannes Doerfert	3ed286a388	Replace three "strip & accumulate" implementations with a single one This patch replaces the three almost identical "strip & accumulate" implementations for constant pointer offsets with a single one, combining the respective functionalities. The old interfaces are kept for now. Differential Revision: https://reviews.llvm.org/D64468 llvm-svn: 365723	2019-07-11 01:14:48 +00:00

... 3 4 5 6 7 ...

4729 Commits