Commit Graph

4729 Commits

Author SHA1 Message Date
Sanjay Patel 0845ac7331 [InstCombine] add another test for gep inbounds; NFC
llvm-svn: 374190
2019-10-09 17:52:26 +00:00
Roman Lebedev 7cdeac43e5 [InstCombine] Fold conditional sign-extend of high-bit-extract into high-bit-extract-with-signext (PR42389)
This can come up in Bit Stream abstractions.

The pattern looks big/scary, but it can't be simplified any further.
It only is so simple because a number of my preparatory folds had
happened already (shift amount reassociation / shift amount
reassociation in bit test, sign bit test detection).

Highlights:
* There are two main flavors: https://rise4fun.com/Alive/zWi
  The difference is add vs. sub, and left-shift of -1 vs. 1
* Since we only change the shift opcode,
  we can preserve the exact-ness: https://rise4fun.com/Alive/4u4
* There can be truncation after high-bit-extraction:
  https://rise4fun.com/Alive/slHc1   (the main pattern i'm after!)
  Which means that we need to ignore zext of shift amounts and of NBits.
* The sign-extending magic can be extended itself (in add pattern
  via sext, in sub pattern via zext. not the other way around!)
  https://rise4fun.com/Alive/NhG
  (or those sext/zext can be sinked into `select`!)
  Which again means we should pay attention when matching NBits.
* We can have both truncation of extraction and widening of magic:
  https://rise4fun.com/Alive/XTw
  In other words, i don't believe we need to have any checks on
  bitwidths of any of these constructs.

This is worsened in general by the fact that we may have `sext` instead
of `zext` for shift amounts, and we don't yet canonicalize to `zext`,
although we should. I have not done anything about that here.

Also, we really should have something to weed out `sub` like these,
by folding them into `add` variant.

https://bugs.llvm.org/show_bug.cgi?id=42389

llvm-svn: 373964
2019-10-07 20:53:27 +00:00
Roman Lebedev 3da71714cb [InstCombine][NFC] Tests for "conditional sign-extend of high-bit-extract" pattern (PR42389)
https://bugs.llvm.org/show_bug.cgi?id=42389

llvm-svn: 373963
2019-10-07 20:53:16 +00:00
Roman Lebedev c3b394ffba [InstCombine] dropRedundantMaskingOfLeftShiftInput(): propagate undef shift amounts
Summary:
When we do `ConstantExpr::getZExt()`, that "extends" `undef` to `0`,
which means that for patterns a/b we'd assume that we must not produce
any bits for that channel, while in reality we simply didn't care
about that channel - i.e. we don't need to mask it.

Reviewers: spatel

Reviewed By: spatel

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68239

llvm-svn: 373960
2019-10-07 20:52:52 +00:00
Sanjay Patel aab8b3ab9c [InstCombine] fold fneg disguised as select+fmul (PR43497)
Extends rL373230 and solves the motivating bug (although in a narrow way):
https://bugs.llvm.org/show_bug.cgi?id=43497

llvm-svn: 373851
2019-10-06 14:15:48 +00:00
Sanjay Patel 61c22a83de [InstCombine] add fast-math-flags for better test coverage; NFC
llvm-svn: 373848
2019-10-06 13:19:05 +00:00
Sanjay Patel c38881a6b7 [InstCombine] don't assume 'inbounds' for bitcast pointer to GEP transform (PR43501)
https://bugs.llvm.org/show_bug.cgi?id=43501
We can't declare a GEP 'inbounds' in general. But we may salvage that information if
we have known dereferenceable bytes on the source pointer.

Differential Revision: https://reviews.llvm.org/D68244

llvm-svn: 373847
2019-10-06 13:08:08 +00:00
Roman Lebedev fb5af8b9b9 [InstCombine] Fold 'icmp eq/ne (?trunc (lshr/ashr %x, bitwidth(x)-1)), 0' -> 'icmp sge/slt %x, 0'
We do indeed already get it right in some cases, but only transitively,
with one-use restrictions. Since we only need to produce a single
comparison, it makes sense to match the pattern directly:
  https://rise4fun.com/Alive/kPg

llvm-svn: 373802
2019-10-04 22:16:22 +00:00
Roman Lebedev f304d4d185 [InstCombine] Right-shift shift amount reassociation with truncation (PR43564, PR42391)
Initially (D65380) i believed that if we have rightshift-trunc-rightshift,
we can't do any folding. But as it usually happens, i was wrong.

https://rise4fun.com/Alive/GEw
https://rise4fun.com/Alive/gN2O

In https://bugs.llvm.org/show_bug.cgi?id=43564 we happen to have
this very sequence, of two right shifts separated by trunc.
And "just" so that happens, we apparently can fold the pattern
if the total shift amount is either 0, or it's equal to the bitwidth
of the innermost widest shift - i.e. if we are left with only the
original sign bit. Which is exactly what is wanted there.

llvm-svn: 373801
2019-10-04 22:16:11 +00:00
Roman Lebedev ae738641d5 [NFC][InstCombine] Autogenerate shift.ll test
llvm-svn: 373800
2019-10-04 22:15:57 +00:00
Roman Lebedev 007452532b [NFC][InstCombine] Autogenerate icmp-shr-lt-gt.ll test
llvm-svn: 373799
2019-10-04 22:15:49 +00:00
Roman Lebedev 3c56cc920f [NFC][InstCombine] Tests for bit test via highest sign-bit extract (w/ trunc) (PR43564)
https://rise4fun.com/Alive/x5IS

llvm-svn: 373798
2019-10-04 22:15:41 +00:00
Roman Lebedev 6a954748c8 [NFC][InstCombine] Tests for right-shift shift amount reassociation (w/ trunc) (PR43564, PR42391)
https://rise4fun.com/Alive/GEw

llvm-svn: 373797
2019-10-04 22:15:32 +00:00
Sanjay Patel 6e312388b6 [InstCombine] add tests for fneg disguised as fmul; NFC
llvm-svn: 373788
2019-10-04 20:54:14 +00:00
Kevin P. Neal 68b8052121 [FPEnv] Strict FP tests should use the requisite function attributes.
A set of function attributes is required in any function that uses constrained
floating point intrinsics. None of our tests use these attributes.

This patch fixes this.

These tests have been tested against the IR verifier changes in D68233.

Reviewed by:	andrew.w.kaylor, cameron.mcinally, uweigand
Approved by:	andrew.w.kaylor
Differential Revision:	https://reviews.llvm.org/D67925

llvm-svn: 373761
2019-10-04 17:03:46 +00:00
Roman Lebedev c780645736 [NFC][InstCombine] Some tests for sub-of-negatible pattern
As we have previously estabilished, `sub` is an outcast,
and should be considered non-canonical iff it can be converted to `add`.

It can be converted to `add` if it's second operand can be negated.
So far we mostly only do that for constants and negation itself,
but we should be more through.

llvm-svn: 373597
2019-10-03 13:36:00 +00:00
Roman Lebedev ae3315af07 [InstCombine] Bypass high bit extract before variable sign-extension (PR43523)
https://rise4fun.com/Alive/8BY - valid for lshr+trunc+variable sext
https://rise4fun.com/Alive/7jk - the variable sext can be redundant

https://rise4fun.com/Alive/Qslu - 'exact'-ness of first shift can be preserver

https://rise4fun.com/Alive/IF63 - without trunc we could view this as
                                  more general "drop redundant mask before right-shift",
                                  but let's handle it here for now
https://rise4fun.com/Alive/iip - likewise, without trunc, variable sext can be redundant.

There's more patterns for sure - e.g. we can have 'lshr' as the final shift,
but that might be best handled by some more generic transform, e.g.
"drop redundant masking before right-shift" (PR42456)

I'm singling-out this sext patch because you can only extract
high bits with `*shr` (unlike abstract bit masking),
and i *know* this fold is wanted by existing code.

I don't believe there is much to review here,
so i'm gonna opt into post-review mode here.

https://bugs.llvm.org/show_bug.cgi?id=43523

llvm-svn: 373542
2019-10-02 23:02:12 +00:00
Roman Lebedev 29339149c3 [NFC][InstCombine] Add tests for 'variable sext of variable high bit extract' pattern (PR43523)
https://bugs.llvm.org/show_bug.cgi?id=43523

llvm-svn: 373541
2019-10-02 23:01:58 +00:00
David Bolvansky 6b45029676 [InstCombine] Transform bcopy to memmove
bcopy is still widely used mainly for network apps. Sadly, LLVM has no optimizations for bcopy, but there are some for memmove. 
Since bcopy == memmove, it is profitable to transform bcopy to memmove and use current optimizations for memmove for free here.

llvm-svn: 373537
2019-10-02 22:49:20 +00:00
Florian Hahn f2ffa7a1c0 [InstCombine] Precommit tests for D68265
llvm-svn: 373458
2019-10-02 12:32:37 +00:00
Roman Lebedev 053014f8f9 [InstCombine] Deal with -(trunc(X >>u 63)) -> trunc(X >>s 63)
Identical to it's trunc-less variant, just pretent-to hoist
trunc, and everything else still holds:
https://rise4fun.com/Alive/JRU

llvm-svn: 373364
2019-10-01 17:50:20 +00:00
Roman Lebedev 65144149d0 [InstCombine] Preserve 'exact' in -(X >>u 31) -> (X >>s 31) fold
https://rise4fun.com/Alive/yR4

llvm-svn: 373363
2019-10-01 17:50:09 +00:00
Roman Lebedev f273fc793a [NFC][InstCombine] (Better) tests for sign-bit-smearing pattern
https://rise4fun.com/Alive/JRU
https://rise4fun.com/Alive/yR4 <- we can preserve 'exact'

llvm-svn: 373362
2019-10-01 17:49:58 +00:00
David Bolvansky 4037582d6b Revert [InstCombine] sprintf(dest, "%s", str) -> memccpy(dest, str, 0, MAX)
Seems to be slower than memcpy + strlen.

llvm-svn: 373335
2019-10-01 13:19:04 +00:00
David Bolvansky 8fc6a1bf56 [InstCombine] sprintf(dest, "%s", str) -> memccpy(dest, str, 0, MAX)
llvm-svn: 373333
2019-10-01 13:03:10 +00:00
Evandro Menezes 110b1138ba [InstCombine] Expand the simplification of log()
Expand the simplification of special cases of `log()` to include `log2()`
and `log10()` as well as intrinsics and more types.

Differential revision: https://reviews.llvm.org/D67199

llvm-svn: 373261
2019-09-30 20:52:21 +00:00
Roman Lebedev 0205be8f12 [NFC][InstCombine] Redundant-left-shift-input-masking: add some more undef tests
llvm-svn: 373248
2019-09-30 19:15:51 +00:00
Sanjay Patel 712b7c2463 [InstCombine] fold negate disguised as select+mul
Name: negate if true
  %sel = select i1 %cond, i32 -1, i32 1
  %r = mul i32 %sel, %x
  =>
  %m = sub i32 0, %x
  %r = select i1 %cond, i32 %m, i32 %x

  Name: negate if false
  %sel = select i1 %cond, i32 1, i32 -1
  %r = mul i32 %sel, %x
  =>
  %m = sub i32 0, %x
  %r = select i1 %cond, i32 %x, i32 %m

https://rise4fun.com/Alive/Nlh

llvm-svn: 373230
2019-09-30 17:02:26 +00:00
Sanjay Patel 8913882fa2 [InstCombine] add tests for negate disguised as mul; NFC
llvm-svn: 373222
2019-09-30 15:43:27 +00:00
Roman Lebedev 269f1bea0d [InstCombine] Simplify shift-by-sext to shift-by-zext
Summary:
This is valid for any `sext` bitwidth pair:
```
Processing /tmp/opt.ll..

----------------------------------------
  %signed = sext %y
  %r = shl %x, %signed
  ret %r
=>
  %unsigned = zext %y
  %r = shl %x, %unsigned
  ret %r
  %signed = sext %y

Done: 2016
Optimization is correct!
```

(This isn't so for funnel shifts, there it's illegal for e.g. i6->i7.)

Main motivation is the C++ semantics:
```
int shl(int a, char b) {
    return a << b;
}
```
ends as
```
  %3 = sext i8 %1 to i32
  %4 = shl i32 %0, %3
```
https://godbolt.org/z/0jgqUq
which is, as this shows, too pessimistic.

There is another problem here - we can only do the fold
if sext is one-use. But we can trivially have cases
where several shifts have the same sext shift amount.
This should be resolved, later.

Reviewers: spatel, nikic, RKSimon

Reviewed By: spatel

Subscribers: efriedma, hiraditya, nlopes, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68103

llvm-svn: 373106
2019-09-27 18:12:15 +00:00
Roman Lebedev 0956480459 [NFC][InstCombine] Revisit shift-by-signext tests
llvm-svn: 373055
2019-09-27 09:09:15 +00:00
Roman Lebedev 86b40b0bbf [InstCombine][NFC] Add tests for shift-by-signext
llvm-svn: 373013
2019-09-26 20:49:30 +00:00
Roman Lebedev d1ef2e48fb [InstCombine][NFC] Regenerate load-cmp.ll test
llvm-svn: 373012
2019-09-26 20:49:21 +00:00
David Bolvansky f1a5a93157 [NFC] Precommit tests for D68089
llvm-svn: 373006
2019-09-26 19:01:18 +00:00
Craig Topper 46721bb7f5 [InstCombine] Use m_Zero instead of isNullValue() when checking if a GEP index is all zeroes to prevent an infinite loop.
The test case here previously infinite looped. Only one element from the GEP is used so SimplifyDemandedVectorElts would replace the other lanes in each index with undef leading to the first index being <0, undef, undef, undef>. But there's a GEP transform that tries to replace an index into a 0 sized type with a zero index. But the zero index check only works on ConstantInt 0 or ConstantAggregateZero so it would turn the index back to zeroinitializer. Resulting in a loop.

The fix is to use m_Zero() to allow a vector of zeroes and undefs.

Differential Revision: https://reviews.llvm.org/D67977

llvm-svn: 373000
2019-09-26 17:20:50 +00:00
Bjorn Pettersson 163c54d288 [InstCombine] Don't assume CmpInst has been visited in getFlippedStrictnessPredicateAndConstant
Summary:
Removing an assumption (assert) that the CmpInst already has been
simplified in getFlippedStrictnessPredicateAndConstant. Solution is
to simply bail out instead of hitting the assertion. Instead we
assume that any profitable rewrite will happen in the next iteration
of InstCombine.

The reason why we can't assume that the CmpInst already has been
simplified is that the worklist does not guarantee such an ordering.

Solves https://bugs.llvm.org/show_bug.cgi?id=43376

Reviewers: spatel, lebedev.ri

Reviewed By: lebedev.ri

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68022

llvm-svn: 372972
2019-09-26 12:16:01 +00:00
Roman Lebedev a2fa03af3a [InstCombine] foldUnsignedUnderflowCheck(): one last pattern with 'sub' (PR43251)
https://rise4fun.com/Alive/0j9

llvm-svn: 372930
2019-09-25 22:59:59 +00:00
Roman Lebedev ca524621d1 [NFC][InstCombine] Tests for 'base u<= offset && (base - offset) != 0' pattern (PR43251)
llvm-svn: 372929
2019-09-25 22:59:48 +00:00
Roman Lebedev 914a3d1cf2 [InstSimplify] Handle more 'A </>/>=/<= B &&/|| (A - B) !=/== 0' patterns (PR43251)
https://rise4fun.com/Alive/sl9s
https://rise4fun.com/Alive/2plN

https://bugs.llvm.org/show_bug.cgi?id=43251

llvm-svn: 372928
2019-09-25 22:59:41 +00:00
Florian Hahn d663efe23a [InstSimplify] Match 1.0 and 0.0 for both operands in SimplifyFMAMul
Because we do not constant fold multiplications in SimplifyFMAMul,
we match 1.0 and 0.0 for both operands, as multiplying by them
is guaranteed to produce an exact result (if it is allowed to do so).

Note that it is not enough to just swap the operands to ensure a
constant is on the RHS, as we want to also cover the case with
2 constants.

Reviewers: lebedev.ri, spatel, reames, scanon

Reviewed By: lebedev.ri, reames

Differential Revision: https://reviews.llvm.org/D67553

llvm-svn: 372915
2019-09-25 19:33:26 +00:00
Roman Lebedev 23646952e2 [InstCombine] Fold (A - B) u>=/u< A --> B u>/u<= A iff B != 0
https://rise4fun.com/Alive/KtL

This also shows that the fold added in D67412 / r372257
was too specific, and the new fold allows those test cases
to be handled more generically, therefore i delete now-dead code.

This is yet again motivated by
D67122 "[UBSan][clang][compiler-rt] Applying non-zero offset to nullptr is undefined behaviour"

llvm-svn: 372912
2019-09-25 19:06:40 +00:00
Roman Lebedev dfda7d2d90 [NFC][InstCombine] Add tests for (X - Y) < X --> Y <= X iff Y != 0
https://rise4fun.com/Alive/KtL
This should go to InstCombiner::foldICmpBinO(), next to
"Convert sub-with-unsigned-overflow comparisons into a comparison of args."

llvm-svn: 372911
2019-09-25 19:06:26 +00:00
Florian Hahn f3ab99dcf8 [InstCombine] Limit FMul constant folding for fma simplifications.
As @reames pointed out post-commit, rL371518 adds additional rounding
in some cases, when doing constant folding of the multiplication.
This breaks a guarantee llvm.fma makes and must be avoided.

This patch reapplies rL371518, but splits off the simplifications not
requiring rounding from SimplifFMulInst as SimplifyFMAFMul.

Reviewers: spatel, lebedev.ri, reames, scanon

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D67434

llvm-svn: 372899
2019-09-25 17:03:20 +00:00
Philip Reames d9629b88ff [GCRelocate] Add a peephole to canonicalize base pointer relocation
If we generate the gc.relocate, and then later prove two arguments to the statepoint are equivalent, we should canonicalize the gc.relocate to the form we would have produced if this had been known before rewriting.

llvm-svn: 372771
2019-09-24 17:24:16 +00:00
Roman Lebedev 45fd1e9d50 [InstCombine] (a+b) < a && (a+b) != 0 -> (0-b) < a iff a/b != 0 (PR43259)
Summary:
This is again motivated by D67122 sanitizer check enhancement.
That patch seemingly worsens `-fsanitize=pointer-overflow`
overhead from 25% to 50%, which strongly implies missing folds.

For
```
#include <cassert>
char* test(char& base, signed long offset) {
  __builtin_assume(offset < 0);
  return &base + offset;
}
```
We produce

https://godbolt.org/z/r40U47

and again those two icmp's can be merged:
```
Name: 0
Pre: C != 0
  %adjusted = add i8 %base, C
  %not_null = icmp ne i8 %adjusted, 0
  %no_underflow = icmp ult i8 %adjusted, %base
  %r = and i1 %not_null, %no_underflow
=>
  %neg_offset = sub i8 0, C
  %r = icmp ugt i8 %base, %neg_offset
```
https://rise4fun.com/Alive/ALap
https://rise4fun.com/Alive/slnN

There are 3 other variants of this pattern,
i believe they all will go into InstSimplify.

https://bugs.llvm.org/show_bug.cgi?id=43259

Reviewers: spatel, xbolva00, nikic

Reviewed By: spatel

Subscribers: efriedma, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67849

llvm-svn: 372768
2019-09-24 16:10:50 +00:00
Roman Lebedev 5b881f356c [InstCombine] (a+b) <= a && (a+b) != 0 -> (0-b) < a (PR43259)
Summary:
This is again motivated by D67122 sanitizer check enhancement.
That patch seemingly worsens `-fsanitize=pointer-overflow`
overhead from 25% to 50%, which strongly implies missing folds.

This pattern isn't exactly what we get there
(strict vs. non-strict predicate), but this pattern does not
require known-bits analysis, so it is best to handle it first.

```
Name: 0
  %adjusted = add i8 %base, %offset
  %not_null = icmp ne i8 %adjusted, 0
  %no_underflow = icmp ule i8 %adjusted, %base
  %r = and i1 %not_null, %no_underflow
=>
  %neg_offset = sub i8 0, %offset
  %r = icmp ugt i8 %base, %neg_offset
```
https://rise4fun.com/Alive/knp

There are 3 other variants of this pattern,
they all will go into InstSimplify:
https://rise4fun.com/Alive/bIDZ

https://bugs.llvm.org/show_bug.cgi?id=43259

Reviewers: spatel, xbolva00, nikic

Reviewed By: spatel

Subscribers: hiraditya, majnemer, vsk, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67846

llvm-svn: 372767
2019-09-24 16:10:38 +00:00
Huihui Zhang a4dd98f2e9 [InstCombine] Fold a shifty implementation of clamp-to-allones.
Summary:
Fold
or(ashr(subNSW(Y, X), ScalarSizeInBits(Y)-1), X)
into
X s> Y ? -1 : X

https://rise4fun.com/Alive/d8Ab

clamp255 is a common operator in image processing, can be implemented
in a shifty way "(255 - X) >> 31 | X & 255". Fold shift into select
enables more optimization, e.g., vmin generation for ARM target.

Reviewers: lebedev.ri, efriedma, spatel, kparzysz, bcahoon

Reviewed By: lebedev.ri

Subscribers: kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67800

llvm-svn: 372678
2019-09-24 00:30:09 +00:00
Huihui Zhang 8952199715 [InstCombine] Fold a shifty implementation of clamp-to-zero.
Summary:
Fold
and(ashr(subNSW(Y, X), ScalarSizeInBits(Y)-1), X)
into
X s> Y ? X : 0

https://rise4fun.com/Alive/lFH

Fold shift into select enables more optimization,
e.g., vmax generation for ARM target.

Reviewers: lebedev.ri, efriedma, spatel, kparzysz, bcahoon

Reviewed By: lebedev.ri

Subscribers: xbolva00, andreadb, craig.topper, RKSimon, kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67799

llvm-svn: 372676
2019-09-24 00:15:03 +00:00
Huihui Zhang 5b5f1c8efd [NFC][InstCombine] Add tests for shifty implementation of clamping.
Summary:
Clamp negative to zero and clamp positive to allOnes are common
operation in image saturation.

Add tests for shifty implementation of clamping, as prepare work for
folding:

and(ashr(subNSW(Y, X), ScalarSizeInBits(Y)-1), X) --> X s> 0 ? X : 0;

or(ashr(subNSW(Y, X), ScalarSizeInBits(Y)-1), X) --> X s> Y ? allOnes : X.

Reviewers: lebedev.ri, efriedma, spatel, kparzysz, bcahoon

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67798

llvm-svn: 372671
2019-09-23 23:48:32 +00:00
David Bolvansky 48db0272d6 [InstCombine] Annotate strndup calls with dereferenceable_or_null
"Implementations are free to malloc() a buffer containing either (size + 1) bytes or (strnlen(s, size) + 1) bytes. Applications should not assume that strndup() will allocate (size + 1) bytes when strlen(s) is smaller than size."

llvm-svn: 372647
2019-09-23 19:55:45 +00:00
David Bolvansky 8d52016155 [SLC] Convert some strndup calls to strdup calls
Summary:
Motivation:
- If we can fold it to strdup, we should (strndup does more things than strdup).
- Annotation mechanism. (Works for strdup well).

strdup and strndup are part of C 20 (currently posix fns), so we should optimize them.

Reviewers: efriedma, jdoerfert

Reviewed By: jdoerfert

Subscribers: lebedev.ri, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67679

llvm-svn: 372636
2019-09-23 18:20:01 +00:00
Roman Lebedev 0a51e1f66d [InstCombine] dropRedundantMaskingOfLeftShiftInput(): pat. c/d/e with mask (PR42563)
Summary:
If we have a pattern `(x & (-1 >> maskNbits)) << shiftNbits`,
we already know (have a fold) that will drop the `& (-1 >> maskNbits)`
mask iff `(shiftNbits-maskNbits) s>= 0` (i.e. `shiftNbits u>= maskNbits`).

So even if `(shiftNbits-maskNbits) s< 0`, we can still
fold, we will just need to apply a **constant** mask afterwards:
```
Name: c, normal+mask
  %t0 = lshr i32 -1, C1
  %t1 = and i32 %t0, %x
  %r = shl i32 %t1, C2
=>
  %n0 = shl i32 %x, C2
  %n1 = i32 ((-(C2-C1))+32)
  %n2 = zext i32 %n1 to i64
  %n3 = lshr i64 -1, %n2
  %n4 = trunc i64 %n3 to i32
  %r = and i32 %n0, %n4
```
https://rise4fun.com/Alive/gslRa

Naturally, old `%masked` will have to be one-use.
This is not valid for pattern f - where "masking" is done via `ashr`.

https://bugs.llvm.org/show_bug.cgi?id=42563

Reviewers: spatel, nikic, xbolva00

Reviewed By: spatel

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67725

llvm-svn: 372630
2019-09-23 17:04:28 +00:00
Roman Lebedev b4a1d8a84c [InstCombine] dropRedundantMaskingOfLeftShiftInput(): pat. a/b with mask (PR42563)
Summary:
And this is **finally** the interesting part of that fold!

If we have a pattern `(x & (~(-1 << maskNbits))) << shiftNbits`,
we already know (have a fold) that will drop the `& (~(-1 << maskNbits))`
mask iff `(maskNbits+shiftNbits) u>= bitwidth(x)`.
But that is actually ignorant, there's more general fold here:

In this pattern, `(maskNbits+shiftNbits)` actually correlates
with the number of low bits that will remain in the final value.
So even if `(maskNbits+shiftNbits) u< bitwidth(x)`, we can still
fold, we will just need to apply a **constant** mask afterwards:
```
Name: a, normal+mask
  %onebit = shl i32 -1, C1
  %mask = xor i32 %onebit, -1
  %masked = and i32 %mask, %x
  %r = shl i32 %masked, C2
=>
  %n0 = shl i32 %x, C2
  %n1 = add i32 C1, C2
  %n2 = zext i32 %n1 to i64
  %n3 = shl i64 -1, %n2
  %n4 = xor i64 %n3, -1
  %n5 = trunc i64 %n4 to i32
  %r = and i32 %n0, %n5
```
https://rise4fun.com/Alive/F5R

Naturally, old `%masked` will have to be one-use.
Similar fold exists for patterns c,d,e, will post patch later.

https://bugs.llvm.org/show_bug.cgi?id=42563

Reviewers: spatel, nikic, xbolva00

Reviewed By: spatel

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67677

llvm-svn: 372629
2019-09-23 17:04:14 +00:00
Sanjay Patel eb8d39e113 [InstCombine] allow icmp+binop folds before min/max bailout (PR43310)
This has the potential to uncover missed analysis/folds as shown in the
min/max code comment/test, but fewer restrictions on icmp folds should
be better in general to solve cases like:
https://bugs.llvm.org/show_bug.cgi?id=43310

llvm-svn: 372510
2019-09-22 14:31:53 +00:00
Sanjay Patel d2a524288d [InstCombine] add tests for icmp fold hindered by min/max; NFC
llvm-svn: 372509
2019-09-22 14:23:22 +00:00
Roman Lebedev 081eebc58f [NFC][InstCombine] Fixup newly-added tests
llvm-svn: 372413
2019-09-20 17:43:46 +00:00
Roman Lebedev d21087af95 [InstCombine] Tests for (a+b)<=a && (a+b)!=0 fold (PR43259)
https://rise4fun.com/Alive/knp
https://rise4fun.com/Alive/ALap

llvm-svn: 372402
2019-09-20 15:06:47 +00:00
Roman Lebedev 7a67ed5795 [InstCombine] Simplify @llvm.usub.with.overflow+non-zero check (PR43251)
Summary:
This is again motivated by D67122 sanitizer check enhancement.
That patch seemingly worsens `-fsanitize=pointer-overflow`
overhead from 25% to 50%, which strongly implies missing folds.

In this particular case, given
```
char* test(char& base, unsigned long offset) {
  return &base - offset;
}
```
it will end up producing something like
https://godbolt.org/z/luGEju
which after optimizations reduces down to roughly
```
declare void @use64(i64)
define i1 @test(i8* dereferenceable(1) %base, i64 %offset) {
  %base_int = ptrtoint i8* %base to i64
  %adjusted = sub i64 %base_int, %offset
  call void @use64(i64 %adjusted)
  %not_null = icmp ne i64 %adjusted, 0
  %no_underflow = icmp ule i64 %adjusted, %base_int
  %no_underflow_and_not_null = and i1 %not_null, %no_underflow
  ret i1 %no_underflow_and_not_null
}
```
Without D67122 there was no `%not_null`,
and in this particular case we can "get rid of it", by merging two checks:
Here we are checking: `Base u>= Offset && (Base u- Offset) != 0`, but that is simply `Base u> Offset`

Alive proofs:
https://rise4fun.com/Alive/QOs

The `@llvm.usub.with.overflow` pattern itself is not handled here
because this is the main pattern, that we currently consider canonical.

https://bugs.llvm.org/show_bug.cgi?id=43251

Reviewers: spatel, nikic, xbolva00, majnemer

Reviewed By: xbolva00, majnemer

Subscribers: vsk, majnemer, xbolva00, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67356

llvm-svn: 372341
2019-09-19 17:25:19 +00:00
Roman Lebedev b646dd92c2 [InstCombine] foldUnsignedUnderflowCheck(): handle last few cases (PR43251)
Summary:
I don't have a direct motivational case for this,
but it would be good to have this for completeness/symmetry.

This pattern is basically the motivational pattern from
https://bugs.llvm.org/show_bug.cgi?id=43251
but with different predicate that requires that the offset is non-zero.

The completeness bit comes from the fact that a similar pattern (offset != zero)
will be needed for https://bugs.llvm.org/show_bug.cgi?id=43259,
so it'd seem to be good to not overlook very similar patterns..

Proofs: https://rise4fun.com/Alive/21b

Also, there is something odd with `isKnownNonZero()`, if the non-zero
knowledge was specified as an assumption, it didn't pick it up (PR43267)

With this, i see no other missing folds for
https://bugs.llvm.org/show_bug.cgi?id=43251

Reviewers: spatel, nikic, xbolva00

Reviewed By: spatel

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67412

llvm-svn: 372257
2019-09-18 20:10:07 +00:00
Roman Lebedev 8b719a3b8a [NFC][InstCombine] More tests for PR42563 "Dropping pointless masking before left shift"
For patterns c/d/e we too can deal with the pattern even if we can't
just drop the mask, we can just apply it afterwars:
   https://rise4fun.com/Alive/gslRa

llvm-svn: 372244
2019-09-18 18:38:32 +00:00
Sanjay Patel d46bf63fbb [SimplifyLibCalls] fix crash with empty function name (PR43347)
...and improve some variable names while here.

https://bugs.llvm.org/show_bug.cgi?id=43347

llvm-svn: 372227
2019-09-18 14:33:40 +00:00
Roman Lebedev bed6e08e23 [NFC][InstCombine] More tests for "Dropping pointless masking before left shift" (PR42563)
While we already fold that pattern if the sum of shift amounts is not
smaller than bitwidth, there's painfully obvious generalization:
  https://rise4fun.com/Alive/F5R
I.e. the "sub of shift amounts" tells us how many bits will be left
in the output. If it's less than bitwidth, we simply need to
apply a mask, which is constant.

llvm-svn: 372170
2019-09-17 19:32:11 +00:00
David Bolvansky 0c0de794f1 Reland "[SLC] Preserve attrs for strncpy(x, "", y) -> memset(align 1 x, '\0', y)"
llvm-svn: 372142
2019-09-17 17:12:24 +00:00
Krasimir Georgiev bdff164e0e Revert "[SLC] Preserve attrs for strncpy(x, "", y) -> memset(align 1 x, '\0', y)"
Summary:
This reverts commit r372101.

Causes ASAN build bot failures:

http://lab.llvm.org:8011/builders/sanitizer-ppc64be-linux/builds/14176
From http://lab.llvm.org:8011/builders/sanitizer-ppc64be-linux/builds/14176/steps/64-bit%20check-asan/logs/stdio:

```
[ RUN      ] AddressSanitizer.StrNCatOOBTest
/home/buildbots/ppc64be-sanitizer/sanitizer-ppc64be/build/llvm-project/compiler-rt/lib/asan/tests/asan_str_test.cpp:462: Failure
Death test: strncat(to - 1, from, 0)
    Result: failed to die.
```

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67658

llvm-svn: 372125
2019-09-17 14:15:23 +00:00
David Bolvansky ded48e93e6 [SLC] Preserve attrs for strncpy(x, "", y) -> memset(align 1 x, '\0', y)
llvm-svn: 372101
2019-09-17 10:25:38 +00:00
David Bolvansky be2487a2ba [InstCombine] Annotate strdup with deref_or_null
llvm-svn: 372098
2019-09-17 10:12:48 +00:00
David Bolvansky e80fcf0340 [SimplifyLibCalls] Mark known arguments with nonnull
Reviewers: efriedma, jdoerfert

Reviewed By: jdoerfert

Subscribers: ychen, rsmith, joerg, aaron.ballman, lebedev.ri, uenoku, jdoerfert, hfinkel, javed.absar, spatel, dmgreen, llvm-commits

Differential Revision: https://reviews.llvm.org/D53342

llvm-svn: 372091
2019-09-17 09:32:52 +00:00
Sanjay Patel 3961a143e1 [InstCombine] remove unneeded one-use checks for icmp fold
Related folds were added in:
rL125734
...the code comment about register pressure is discussed in
more detail in:
https://bugs.llvm.org/show_bug.cgi?id=2698

But 10 years later, perf testing bzip2 with this change now
shows a slight (0.2% average) improvement on Haswell although
that's probably within test noise.

Given that this is IR canonicalization, we shouldn't be worried
about register pressure though; the backend should be able to
adjust for that as needed.

This is part of solving PR43310 the theoretically right way:
https://bugs.llvm.org/show_bug.cgi?id=43310
...ie, if we don't cripple basic transforms, then we won't
need to add special-case code to detect larger patterns.

rL371940 and rL371981 are related patches in this series.

llvm-svn: 372007
2019-09-16 16:15:25 +00:00
Sanjay Patel 4d9d0f9cf5 [InstCombine] move tests for icmp+add; NFC
llvm-svn: 372004
2019-09-16 15:33:40 +00:00
Sanjay Patel f201b1c918 [InstCombine] add/move tests for icmp with add operand; NFC
llvm-svn: 371988
2019-09-16 14:05:19 +00:00
Sanjay Patel c5cd808156 [InstCombine] remove unneeded one-use checks for icmp fold
This fold and several others were added in:
rL125734 <https://reviews.llvm.org/rL125734>
...with no explanation for the one-use checks other than the code
comments about register pressure.

Given that this is IR canonicalization, we shouldn't be worried
about register pressure though; the backend should be able to
adjust for that as needed.

This is part of solving PR43310 the theoretically right way:
https://bugs.llvm.org/show_bug.cgi?id=43310
...ie, if we don't cripple basic transforms, then we won't
need to add special-case code to detect larger patterns.

rL371940 is a related patch in this series.

llvm-svn: 371981
2019-09-16 12:54:34 +00:00
Sanjay Patel 14ce3fde04 [InstCombine] add icmp tests with extra uses; NFC
llvm-svn: 371979
2019-09-16 12:19:18 +00:00
Sanjay Patel 3daf168fa9 [InstCombine] remove unneeded one-use checks for icmp fold
This fold and several others were added in:
rL125734
...with no explanation for the one-use checks other than the code
comments about register pressure.

Given that this is IR canonicalization, we shouldn't be worried
about register pressure though; the backend should be able to
adjust for that as needed.

There are similar checks as noted with the TODO comments. I'm
hoping to remove those restrictions too, but if any of these
does cause a regression, it should be easier to correct by making
small, individual commits.

This is part of solving PR43310 the theoretically right way:
https://bugs.llvm.org/show_bug.cgi?id=43310
...ie, if we don't cripple basic transforms, then we won't
need to add special-case code to detect larger patterns.

llvm-svn: 371940
2019-09-15 20:56:34 +00:00
Sanjay Patel c77ad16f8e [InstCombine] add icmp tests with extra uses; NFC
llvm-svn: 371939
2019-09-15 20:13:27 +00:00
Evandro Menezes 08df6e64d5 [ConstantFolding] Expand folding of some library functions
Expanding the folding of `nearbyint()`, `rint()` and `trunc()` to library
functions, in addition to the current support for intrinsics.

Differential revision: https://reviews.llvm.org/D67468

llvm-svn: 371774
2019-09-12 21:23:22 +00:00
Sanjay Patel 458c2759b1 [InstCombine] add tests for fptrunc; NFC
llvm-svn: 371750
2019-09-12 18:00:11 +00:00
Sanjay Patel 62ad62fb98 [InstCombine] reduce test noise and regenerate CHECK lines; NFC
llvm-svn: 371746
2019-09-12 17:07:01 +00:00
Roman Lebedev 80a8a85758 [InstCombine][InstSimplify] Move constant-folding tests in result-of-usub-is-non-zero-and-no-overflow.ll
llvm-svn: 371737
2019-09-12 14:12:31 +00:00
Roman Lebedev b3e0937f0a [NFC][InstCombine][InstSimplify] Add test for "add-of-negative is non-zero and no overflow" (PR43259)
https://rise4fun.com/Alive/ska
https://rise4fun.com/Alive/9iX

https://bugs.llvm.org/show_bug.cgi?id=43259

llvm-svn: 371736
2019-09-12 14:12:20 +00:00
Florian Hahn 51de22c8ee Revert [InstCombine] Use SimplifyFMulInst to simplify multiply in fma.
This introduces additional rounding error in some cases. See D67434.

This reverts r371518 (git commit 18a1f0818b)

llvm-svn: 371634
2019-09-11 16:17:03 +00:00
Sanjay Patel 80bea345d1 [InstCombine] fold sign-bit compares of srem
(srem X, pow2C) sgt/slt 0 can be reduced using bit hacks by masking
off the sign bit and the module (low) bits:
https://rise4fun.com/Alive/jSO
A '2' divisor allows slightly more folding:
https://rise4fun.com/Alive/tDBM

Any chance to remove an 'srem' use is probably worthwhile, but this is limited
to the one-use improvement case because doing more may expose other missing
folds. That means it does nothing for PR21929 yet:
https://bugs.llvm.org/show_bug.cgi?id=21929

Differential Revision: https://reviews.llvm.org/D67334

llvm-svn: 371610
2019-09-11 12:04:26 +00:00
David Bolvansky af5ba2873f [NFC] Updated objsize-64.ll test
llvm-svn: 371604
2019-09-11 10:51:26 +00:00
David Bolvansky 57ebb50a0a [NFC] Fixed test
llvm-svn: 371603
2019-09-11 10:42:30 +00:00
David Bolvansky 4dae283cd3 [InstCombine] Fixed handling of isOpNewLike (PR11748)
llvm-svn: 371602
2019-09-11 10:37:03 +00:00
Tim Renouf c26b3940c3 [TLI][AMDGPU] AMDPAL does not have library functions
Configure TLI to say that r600/amdgpu does not have any library
functions, such that InstCombine does not do anything like turn sin/cos
into the library function @tan with sufficient fast math flags.

Differential Revision: https://reviews.llvm.org/D67406

Change-Id: I02f907d3e64832117ea9800e9f9285282856e5df
llvm-svn: 371592
2019-09-11 07:26:39 +00:00
Alexey Lapshin 6b1c6c1287 [Debuginfo][Instcombiner] Do not clone dbg.declare.
TryToSinkInstruction() has a bug: While updating debug info for
sunk instruction, it could clone dbg.declare intrinsic.
That is wrong. There could be only one dbg.declare.
The fix is to not clone dbg.declare intrinsic and to update
it`s arguments, to not to point to sunk instruction.

Differential Revision: https://reviews.llvm.org/D67217

llvm-svn: 371587
2019-09-11 06:07:16 +00:00
Roman Lebedev 16f5605382 [NFC][InstCombine] rewrite test added in r371537 to use non-null pointer instead
I only want to ensure that %offset is non-zero there,
it doesn't matter how that info is conveyed.
As filed in PR43267, the assumption way does not work.

llvm-svn: 371550
2019-09-10 19:30:17 +00:00
Roman Lebedev 880657c97c [NFC][InstCombine][InstSimplify] PR43251 - and some patterns with offset != 0
https://rise4fun.com/Alive/21b

llvm-svn: 371537
2019-09-10 17:13:59 +00:00
Roman Lebedev 7dfd0fb7f1 [NFC][InstCombine] PR43251 - valid for other predicates too
llvm-svn: 371519
2019-09-10 13:29:40 +00:00
Florian Hahn 18a1f0818b [InstCombine] Use SimplifyFMulInst to simplify multiply in fma.
This allows us to fold fma's that multiply with 0.0. Also, the
multiply by 1.0 case is handled there as well. The fneg/fabs cases
are not handled by SimplifyFMulInst, so we need to keep them.

Reviewers: spatel, anemet, lebedev.ri

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D67351

llvm-svn: 371518
2019-09-10 13:10:28 +00:00
Florian Hahn 8886d0134e [InstCombine] Precommit tests for D67351.
llvm-svn: 371517
2019-09-10 13:05:34 +00:00
Roman Lebedev 59608c0049 [NFC][InstCombine] Fixup test i added in rL371352.
llvm-svn: 371401
2019-09-09 14:27:39 +00:00
Roman Lebedev 139a9d6c0e [InstCombine][NFC] Some tests for usub overflow+nonzero check improvement (PR43251)
https://rise4fun.com/Alive/kHq

https://bugs.llvm.org/show_bug.cgi?id=43251

llvm-svn: 371352
2019-09-08 21:30:34 +00:00
Sanjay Patel 354a46444c [InstCombine] add tests for icmp with srem operand; NFC
llvm-svn: 371348
2019-09-08 19:48:47 +00:00
Sanjay Patel aff5bee35f [InstCombine] fold extract+insert into identity shuffle
This is similar to the existing fold for splats added with:
rL365379

If we can adjust the shuffle mask to include another element
in an identity mask (if it changes vector length, that's an
extract/insert subvector operation in the backend), then that
can eliminate extractelement/insertelement pairs in IR.

All targets are expected to lower shuffles with identity masks
efficiently.

llvm-svn: 371340
2019-09-08 19:03:01 +00:00
JF Bastien 52614dfc7f [InstCombine] pow(x, +/- 0.0) -> 1.0
Summary:
This isn't an important optimization at all... We're already doing:
  pow(x, 0.0) -> 1.0
My patch merely teaches instcombine that -0.0 does the same.

However, doing this fixes an AMAZING bug! Compile this program:

  extern "C" double pow(double, double);
  double boom(double base) {
    return pow(base, -0.0);
  }

With:
  clang++ ~/Desktop/fast-math.cpp -ffast-math -O2 -S

And clang will crash with a signal. Wow, fast math is so fast it ICEs the
compiler! Arguably, the generated math is infinitely fast.

What's actually happening is that we recurse infinitely in getPow. In debug we
hit its assertion:
  assert(Exp != 0 && "Incorrect exponent 0 not handled");

We avoid this entire mess if we instead recognize that an exponent of positive
and negative zero yield 1.0.

A separate commit, r371221, fixed the same problem. This only contains the added
tests.

<rdar://problem/54598300>

Reviewers: scanon

Subscribers: hiraditya, jkorous, dexonsmith, ributzka, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67248

llvm-svn: 371224
2019-09-06 16:26:59 +00:00
Sanjay Patel 4f0e429acc [SimplifyLibCalls] handle pow(x,-0.0) before it can assert (PR43233)
https://bugs.llvm.org/show_bug.cgi?id=43233

llvm-svn: 371221
2019-09-06 16:10:18 +00:00
Matt Arsenault 524a9d5774 InstCombine: Fix crash on icmp of gep with addrspacecasted null
llvm-svn: 371146
2019-09-05 23:39:21 +00:00
Roman Lebedev 071ce66729 [NFC][InstCombine] Overhaul 'unsigned add overflow' tests, ensure that all 3 patterns have full test coverage
llvm-svn: 371108
2019-09-05 19:13:15 +00:00
Roman Lebedev 8360c42e25 [InstCombine] foldICmpBinOp(): consider inverted check in 'unsigned sub overflow' check
A follow-up for r329011.
This may be changed to produce @llvm.sub.with.overflow in a later patch,
but for now just make things more consistent overall.

A few observations stem from this:
* There does not seem to be a similar one-instruction fold for uadd-overflow
* I'm not sure we'll want to canonicalize `B u> A` as `usub.with.overflow`,
  so since the `icmp` here no longer refers to `sub`,
  reconstructing `usub.with.overflow` will be problematic,
  and will likely require standalone pass (similar to DivRemPairs).

https://rise4fun.com/Alive/Zqs

Name: (A - B) u> A --> B u> A
  %t0 = sub i8 %A, %B
  %r = icmp ugt i8 %t0, %A
=>
  %r = icmp ugt i8 %B, %A

Name: (A - B) u<= A --> B u<= A
  %t0 = sub i8 %A, %B
  %r = icmp ule i8 %t0, %A
=>
  %r = icmp ule i8 %B, %A

Name: C u< (C - D) --> C u< D
  %t0 = sub i8 %C, %D
  %r = icmp ult i8 %C, %t0
=>
  %r = icmp ult i8 %C, %D

Name: C u>= (C - D) --> C u>= D
  %t0 = sub i8 %C, %D
  %r = icmp uge i8 %C, %t0
=>
  %r = icmp uge i8 %C, %D

llvm-svn: 371101
2019-09-05 17:41:02 +00:00
Roman Lebedev ecb7ea1ae7 [InstCombine] foldICmpBinOp(): consider inverted check in 'unsigned add overflow' check
A follow-up for r342004.
This will be changed to produce @llvm.add.with.overflow in a later patch,
but for now just make things more consistent overall.

https://rise4fun.com/Alive/qxE

Name: (Op1 + X) u< Op1 --> ~Op1 u< X
  %t0 = add i8 %Op1, %X
  %r = icmp ult i8 %t0, %Op1
=>
  %n = xor i8 %Op1, -1
  %r = icmp ult i8 %n, %X

Name: (Op1 + X) u>= Op1 --> ~Op1 u>= X
  %t0 = add i8 %Op1, %X
  %r = icmp uge i8 %t0, %Op1
=>
  %n = xor i8 %Op1, -1
  %r = icmp uge i8 %n, %X

;-------------------------------------------------------------------------------

Name: Op0 u> (Op0 + X) --> X u> ~Op0
  %t0 = add i8 %Op0, %X
  %r = icmp ugt i8 %Op0, %t0
=>
  %n = xor i8 %Op0, -1
  %r = icmp ugt i8 %X, %n

Name: Op0 u<= (Op0 + X) --> X u<= ~Op0
  %t0 = add i8 %Op0, %X
  %r = icmp ule i8 %Op0, %t0
=>
  %n = xor i8 %Op0, -1
  %r = icmp ule i8 %X, %n

llvm-svn: 371100
2019-09-05 17:40:49 +00:00
Roman Lebedev 1d9e0dcc9d [InstCombine][NFC] Tests for 'unsigned sub overflow' check
----------------------------------------
Name: unsigned sub, overflow, v0
  %sub = sub i8 %x, %y
  %ov = icmp ugt i8 %sub, %x
=>
  %agg = usub_overflow i8 %x, %y
  %sub = extractvalue {i8, i1} %agg, 0
  %ov = extractvalue {i8, i1} %agg, 1

Done: 1
Optimization is correct!

----------------------------------------
Name: unsigned sub, no overflow, v0
  %sub = sub i8 %x, %y
  %ov = icmp ule i8 %sub, %x
=>
  %agg = usub_overflow i8 %x, %y
  %sub = extractvalue {i8, i1} %agg, 0
  %not.ov = extractvalue {i8, i1} %agg, 1
  %ov = xor %not.ov, -1

Done: 1
Optimization is correct!

llvm-svn: 371099
2019-09-05 17:40:37 +00:00
Roman Lebedev 745046c23f [InstCombine][NFC] Tests for 'unsigned add overflow' check
----------------------------------------
Name: unsigned add, overflow, v0
  %add = add i8 %x, %y
  %ov = icmp ult i8 %add, %x
=>
  %agg = uadd_overflow i8 %x, %y
  %add = extractvalue {i8, i1} %agg, 0
  %ov = extractvalue {i8, i1} %agg, 1

Done: 1
Optimization is correct!

----------------------------------------
Name: unsigned add, overflow, v1
  %add = add i8 %x, %y
  %ov = icmp ult i8 %add, %y
=>
  %agg = uadd_overflow i8 %x, %y
  %add = extractvalue {i8, i1} %agg, 0
  %ov = extractvalue {i8, i1} %agg, 1

Done: 1
Optimization is correct!

----------------------------------------
Name: unsigned add, no overflow, v0
  %add = add i8 %x, %y
  %ov = icmp uge i8 %add, %x
=>
  %agg = uadd_overflow i8 %x, %y
  %add = extractvalue {i8, i1} %agg, 0
  %not.ov = extractvalue {i8, i1} %agg, 1
  %ov = xor %not.ov, -1

Done: 1
Optimization is correct!

----------------------------------------
Name: unsigned add, no overflow, v1
  %add = add i8 %x, %y
  %ov = icmp uge i8 %add, %y
=>
  %agg = uadd_overflow i8 %x, %y
  %add = extractvalue {i8, i1} %agg, 0
  %not.ov = extractvalue {i8, i1} %agg, 1
  %ov = xor %not.ov, -1

Done: 1
Optimization is correct!

llvm-svn: 371098
2019-09-05 17:40:28 +00:00
Evandro Menezes bf78e39cbb [InstCombine] Add more test cases (NFC)
Add more test cases simplifying `log()`.

llvm-svn: 370966
2019-09-04 20:01:09 +00:00
David Bolvansky 420cbb6190 [InstCombine] sub(xor(x, y), or(x, y)) -> neg(and(x, y))
Summary:
```
Name: sub(xor(x, y), or(x, y)) -> neg(and(x, y))
%or = or i32 %y, %x
%xor = xor i32 %x, %y
%sub = sub i32 %xor, %or
  =>
%sub1 = and i32 %x, %y
%sub = sub i32 0, %sub1

Optimization: sub(xor(x, y), or(x, y)) -> neg(and(x, y))
Done: 1
Optimization is correct!
```

https://rise4fun.com/Alive/8OI

Reviewers: lebedev.ri

Reviewed By: lebedev.ri

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67188

llvm-svn: 370945
2019-09-04 18:03:21 +00:00
David Bolvansky f6233d90f0 [NFC] Added tests for new fold
llvm-svn: 370941
2019-09-04 17:37:06 +00:00
David Bolvansky 2ceb00db76 [NFC] Adjust test filename
llvm-svn: 370939
2019-09-04 17:33:53 +00:00
David Bolvansky 0e07248704 [InstCombine] Fold sub (and A, B) (or A, B)) to neg (xor A, B)
Summary:
```
Name: sub(and(x, y), or(x, y)) -> neg(xor(x, y))
%or = or i32 %y, %x
%and = and i32 %x, %y
%sub = sub i32 %and, %or
  =>
%sub1 = xor i32 %x, %y
%sub = sub i32 0, %sub1

Optimization: sub(and(x, y), or(x, y)) -> neg(xor(x, y))
Done: 1
Optimization is correct!
```

https://rise4fun.com/Alive/VI6

Found by @lebedev.ri. Also author of the proof.

Reviewers: lebedev.ri, spatel

Reviewed By: lebedev.ri

Subscribers: llvm-commits, lebedev.ri

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67155

llvm-svn: 370934
2019-09-04 17:30:53 +00:00
Alexey Lapshin cbf1f3b771 [Debuginfo][SROA] Need to handle dbg.value in SROA pass.
SROA pass processes debug info incorrecly if applied twice.
Specifically, after SROA works first time, instcombine converts dbg.declare
intrinsics into dbg.value. Inlining creates new opportunities for SROA,
so it is called again. This time it does not handle correctly previously
inserted dbg.value intrinsics.

Differential Revision: https://reviews.llvm.org/D64595

llvm-svn: 370906
2019-09-04 14:19:49 +00:00
Sanjay Patel 791949afe5 [InstCombine] add tests for insert/extract with identity shuffles; NFC
llvm-svn: 370901
2019-09-04 13:38:49 +00:00
David Bolvansky b9e9478244 [NFC] Added a negative test for new fold
llvm-svn: 370890
2019-09-04 12:46:25 +00:00
David Bolvansky 13dadedc29 [NFC] Fixed test
llvm-svn: 370888
2019-09-04 12:43:14 +00:00
David Bolvansky 3747c48d64 [NFC] Adjust tests for new fold
llvm-svn: 370886
2019-09-04 12:22:28 +00:00
David Bolvansky 163b05b45d [NFC] Added tests for new fold
llvm-svn: 370885
2019-09-04 12:18:53 +00:00
David Bolvansky 358b80b340 [InstCombine] Fold sub (or A, B) (and A, B) to (xor A, B)
Summary:
```
Name: sub or and to xor
%or = or i32 %y, %x
%and = and i32 %x, %y
%sub = sub i32 %or, %and
  =>
%sub = xor i32 %x, %y

Optimization: sub or and to xor
Done: 1
Optimization is correct!
```
https://rise4fun.com/Alive/eJu

Reviewers: spatel, lebedev.ri

Reviewed By: lebedev.ri

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67153

llvm-svn: 370883
2019-09-04 12:00:33 +00:00
David Bolvansky 54f3a651f3 [NFC] Added a new test for D67153
llvm-svn: 370881
2019-09-04 11:44:00 +00:00
David Bolvansky 75d734475a [NFC] Added tests for 'SUB of OR and AND to XOR' fold
llvm-svn: 370878
2019-09-04 11:17:08 +00:00
Sanjay Patel 561c39994b [InstCombine] recognize bswap disguised as shufflevector
bitcast <N x i8> (shuf X, undef, <N, N-1,...0>) to i{N*8} --> bswap (bitcast X to i{N*8})

In PR43146:
https://bugs.llvm.org/show_bug.cgi?id=43146
...we have a more complicated case where SLP is making a mess of bswap. This patch won't
do anything for that currently, but we need to improve bswap recognition in instcombine,
SLP, and/or a standalone pass to avoid that problem.

This is limited using the data-layout so we don't try to do this transform with actual
vector types. The backend does not appear to have folds to convert in either direction,
so we don't want to mess up something that is actually better lowered as a shuffle.

On x86, we're trading something like this:

  vmovd	%edi, %xmm0
  vpshufb	LCPI0_0(%rip), %xmm0, %xmm0 ## xmm0 = xmm0[3,2,1,0,u,u,u,u,u,u,u,u,u,u,u,u]
  vmovd	%xmm0, %eax

For:

  movl	%edi, %eax
  bswapl	%eax

Differential Revision: https://reviews.llvm.org/D66965

llvm-svn: 370659
2019-09-02 13:33:20 +00:00
David Bolvansky ff0ad3c43d [InstCombine] mempcpy(d,s,n) to memcpy(d,s,n) + n
Summary:
Back-end currently expands mempcpy, but middle-end should work with memcpy instead of mempcpy to enable more memcpy-optimization.

GCC backend emits mempcpy, so LLVM backend could form it too, if we know mempcpy libcall is better than memcpy + n.
https://godbolt.org/z/dOCG96

Reviewers: efriedma, spatel, craig.topper, RKSimon, jdoerfert

Reviewed By: efriedma

Subscribers: hjl.tools, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65737

llvm-svn: 370593
2019-08-31 18:19:05 +00:00
Piotr Sobczak 67b979466a [InstCombine][AMDGPU] Simplify tbuffer loads
Summary: Add missing tbuffer loads intrinsics in SimplifyDemandedVectorElts.

Reviewers: arsenm, nhaehnle

Reviewed By: arsenm

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66926

llvm-svn: 370475
2019-08-30 14:20:04 +00:00
Sanjay Patel 33541fafde [InstCombine] add possible bswap as widening shuffle test; NFC
Goes with the proposal in D66965.

llvm-svn: 370407
2019-08-29 20:57:50 +00:00
Sanjay Patel 63411910a2 [InstCombine] add tests for bswap disguised as shuffle; NFC
Somewhat motivating case In PR43146:
https://bugs.llvm.org/show_bug.cgi?id=43146

But that's a lot more complicated.

llvm-svn: 370381
2019-08-29 16:48:00 +00:00
Roman Lebedev 473a063a5e [InstCombine] Fold '((%x * %y) u/ %x) != %y' to '@llvm.umul.with.overflow' + overflow bit extraction
Summary:
`((%x * %y) u/ %x) != %y` is one of (3?) common ways to check that
some unsigned multiplication (will not) overflow.
Currently, we don't catch it. We could:
```
$ /repositories/alive2/build-Clang-unknown/alive -root-only ~/llvm-patch1.ll
Processing /home/lebedevri/llvm-patch1.ll..

----------------------------------------
Name: no overflow
  %o0 = mul i4 %y, %x
  %o1 = udiv i4 %o0, %x
  %r = icmp ne i4 %o1, %y
  ret i1 %r
=>
  %n0 = umul_overflow i4 %x, %y
  %o0 = extractvalue {i4, i1} %n0, 0
  %o1 = udiv %o0, %x
  %r = extractvalue {i4, i1} %n0, 1
  ret %r

Done: 1
Optimization is correct!

----------------------------------------
Name: no overflow
  %o0 = mul i4 %y, %x
  %o1 = udiv i4 %o0, %x
  %r = icmp eq i4 %o1, %y
  ret i1 %r
=>
  %n0 = umul_overflow i4 %x, %y
  %o0 = extractvalue {i4, i1} %n0, 0
  %o1 = udiv %o0, %x
  %n1 = extractvalue {i4, i1} %n0, 1
  %r = xor %n1, -1
  ret i1 %r

Done: 1
Optimization is correct!

```

Reviewers: nikic, spatel, efriedma, xbolva00, RKSimon

Reviewed By: nikic

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65144

llvm-svn: 370348
2019-08-29 12:47:20 +00:00
Roman Lebedev fb38b7aab3 [InstCombine] Fold '(-1 u/ %x) u< %y' to '@llvm.umul.with.overflow' + overflow bit extraction
Summary:
`(-1 u/ %x) u< %y` is one of (3?) common ways to check that
some unsigned multiplication (will not) overflow.
Currently, we don't catch it. We could:
```
----------------------------------------
Name: no overflow
  %o0 = udiv i4 -1, %x
  %r = icmp ult i4 %o0, %y
=>
  %o0 = udiv i4 -1, %x
  %n0 = umul_overflow i4 %x, %y
  %r = extractvalue {i4, i1} %n0, 1

Done: 1
Optimization is correct!

----------------------------------------
Name: no overflow, swapped
  %o0 = udiv i4 -1, %x
  %r = icmp ugt i4 %y, %o0
=>
  %o0 = udiv i4 -1, %x
  %n0 = umul_overflow i4 %x, %y
  %r = extractvalue {i4, i1} %n0, 1

Done: 1
Optimization is correct!

----------------------------------------
Name: overflow
  %o0 = udiv i4 -1, %x
  %r = icmp uge i4 %o0, %y
=>
  %o0 = udiv i4 -1, %x
  %n0 = umul_overflow i4 %x, %y
  %n1 = extractvalue {i4, i1} %n0, 1
  %r = xor %n1, -1

Done: 1
Optimization is correct!

----------------------------------------
Name: overflow
  %o0 = udiv i4 -1, %x
  %r = icmp ule i4 %y, %o0
=>
  %o0 = udiv i4 -1, %x
  %n0 = umul_overflow i4 %x, %y
  %n1 = extractvalue {i4, i1} %n0, 1
  %r = xor %n1, -1

Done: 1
Optimization is correct!
```

As it can be observed from tests, while simply forming the `@llvm.umul.with.overflow`
is easy, if we were looking for the inverted answer, then more work needs to be done
to cleanup the now-pointless control-flow that was guarding against division-by-zero.
This is being addressed in follow-up patches.

Reviewers: nikic, spatel, efriedma, xbolva00, RKSimon

Reviewed By: nikic, xbolva00

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65143

llvm-svn: 370347
2019-08-29 12:47:08 +00:00
Roman Lebedev f13b0e3ed8 [InstCombine] Shift amount reassociation in bittest: trunc-of-lshr (PR42399)
Summary:
Finally, the fold i was looking forward to :)

The legality check is muddy, i doubt  i've groked the full generalization,
but it handles all the cases i care about, and can come up with:
https://rise4fun.com/Alive/26j

I.e. we can perform the fold if **any** of the following is true:
* The shift amount is either zero or one less than widest bitwidth
* Either of the values being shifted has at most lowest bit set
* The value that is being shifted by `shl` (which is not truncated) should have no less leading zeros than the total shift amount;
* The value that is being shifted by `lshr` (which **is** truncated) should have no less leading zeros than the widest bit width minus total shift amount minus one

I strongly suspect there is some better generalization, but i'm not aware of it as of right now.
For now i also avoided using actual `computeKnownBits()`, but restricted it to constants.

Reviewers: spatel, nikic, xbolva00

Reviewed By: spatel

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66383

llvm-svn: 370324
2019-08-29 10:26:23 +00:00
David Bolvansky 420327269e [NFC] Added more tests for D66651
llvm-svn: 370222
2019-08-28 16:00:15 +00:00
Craig Topper f79d8a064c [InstCombine] Disable recursion in foldGEPICmp for vector pointer GEPs
Due to missing vector support in this function, recursion can
generate worse code in some cases.

llvm-svn: 370221
2019-08-28 15:40:34 +00:00
David Bolvansky 207c653965 [NFC] Unbreak tests
llvm-svn: 370170
2019-08-28 08:42:40 +00:00
David Bolvansky a0a8dd225d [NFC] Updated test
llvm-svn: 370169
2019-08-28 08:40:45 +00:00
David Bolvansky 05bda8b4e5 Annotate return values of allocation functions with dereferenceable_or_null
Summary:
Example
define dso_local noalias i8* @_Z6maixxnv() local_unnamed_addr #0 {
entry:
  %call = tail call noalias dereferenceable_or_null(64) i8* @malloc(i64 64) #6
  ret i8* %call
}


Reviewers: jdoerfert

Reviewed By: jdoerfert

Subscribers: aaron.ballman, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66651

llvm-svn: 370168
2019-08-28 08:28:20 +00:00
Craig Topper 5bbb604bb5 [InstCombine] Disable some portions of foldGEPICmp for GEPs that return a vector of pointers. Fix other portions.
llvm-svn: 370114
2019-08-27 21:38:56 +00:00
Craig Topper 33585ddf14 [Analysis] Improve EmitGEPOffset handling of vector GEPs with scalar indices.
This patch splats the scalar index if necessary before using it
in any integer casts or other arithmetic.

llvm-svn: 370112
2019-08-27 21:31:44 +00:00
David Bolvansky aec6884e88 [NFC] Added tests for D66651
llvm-svn: 370046
2019-08-27 11:41:03 +00:00
David Bolvansky 0c2692108c [InstCombine] Fold select with ctlz to cttz
Summary:
Handle pattern [0]:

int ctz(unsigned int a)
{
  int c = __clz(a & -a);
  return a ? 31 - c : c;
}

In reality, the compiler can generate much better code for cttz, so fold away this pattern.

https://godbolt.org/z/c5kPtV

 [0] https://community.arm.com/community-help/f/discussions/2114/count-trailing-zeros

Reviewers: spatel, nikic, lebedev.ri, dmgreen, hfinkel

Reviewed By: hfinkel

Subscribers: hfinkel, javed.absar, kristof.beyls, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66308

llvm-svn: 370037
2019-08-27 10:22:40 +00:00
Vitaly Buka aeca56964f msan, codegen, instcombine: Keep more lifetime markers used for msan
Reviewers: eugenis

Subscribers: hiraditya, cfe-commits, #sanitizers, llvm-commits

Tags: #clang, #sanitizers, #llvm

Differential Revision: https://reviews.llvm.org/D66695

llvm-svn: 369979
2019-08-26 22:15:50 +00:00
Philip Reames b92c971099 [InstCombine] icmp eq/ne (gep inbounds P, Idx..), null -> icmp eq/ne P, null for vectors
Extend the transform introduced in https://reviews.llvm.org/D66608 to work for vector geps as well.

Differential Revision: https://reviews.llvm.org/D66671

llvm-svn: 369949
2019-08-26 19:11:49 +00:00
Roman Lebedev de19f749e0 [InstCombine] matchThreeWayIntCompare(): commutativity awareness
Summary:
`matchThreeWayIntCompare()` looks for
```
   select i1 (a == b),
          i32 Equal,
          i32 (select i1 (a < b), i32 Less, i32 Greater)
```
but both of these selects/compares can be in it's commuted form,
so out of 8 variants, only the two most basic ones is handled.
This fixes regression being introduced in D66232.

Reviewers: spatel, nikic, efriedma, xbolva00

Reviewed By: spatel

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66607

llvm-svn: 369841
2019-08-24 06:49:36 +00:00
Roman Lebedev 2c75fe7f2a [InstCombine] Try to reuse constant from select in leading comparison
Summary:
If we have e.g.:
```
  %t = icmp ult i32 %x, 65536
  %r = select i1 %t, i32 %y, i32 65535
```
the constants `65535` and `65536` are suspiciously close.
We could perform a transformation to deduplicate them:
```
Name: ult
%t = icmp ult i32 %x, 65536
%r = select i1 %t, i32 %y, i32 65535
  =>
%t.inv = icmp ugt i32 %x, 65535
%r = select i1 %t.inv, i32 65535, i32 %y
```
https://rise4fun.com/Alive/avb

While this may seem esoteric, this should certainly be good for vectors
(less constant pool usage) and for opt-for-size - need to have only one constant.

But the real fun part here is that it allows further transformation,
in particular it finishes cleaning up the `clamp` folding,
see e.g. `canonicalize-clamp-with-select-of-constant-threshold-pattern.ll`.
We start with e.g.
```
  %dont_need_to_clamp_positive = icmp sle i32 %X, 32767
  %dont_need_to_clamp_negative = icmp sge i32 %X, -32768
  %clamp_limit = select i1 %dont_need_to_clamp_positive, i32 -32768, i32 32767
  %dont_need_to_clamp = and i1 %dont_need_to_clamp_positive, %dont_need_to_clamp_negative
  %R = select i1 %dont_need_to_clamp, i32 %X, i32 %clamp_limit
```
without this patch we currently produce
```
  %1 = icmp slt i32 %X, 32768
  %2 = icmp sgt i32 %X, -32768
  %3 = select i1 %2, i32 %X, i32 -32768
  %R = select i1 %1, i32 %3, i32 32767
```
which isn't really a `clamp` - both comparisons are performed on the original value,
this patch changes it into
```
  %1.inv = icmp sgt i32 %X, 32767
  %2 = icmp sgt i32 %X, -32768
  %3 = select i1 %2, i32 %X, i32 -32768
  %R = select i1 %1.inv, i32 32767, i32 %3
```
and then the magic happens! Some further transform finishes polishing it and we finally get:
```
  %t1 = icmp sgt i32 %X, -32768
  %t2 = select i1 %t1, i32 %X, i32 -32768
  %t3 = icmp slt i32 %t2, 32767
  %R = select i1 %t3, i32 %t2, i32 32767
```
which is beautiful and just what we want.

Proofs for `getFlippedStrictnessPredicateAndConstant()` for de-canonicalization:
https://rise4fun.com/Alive/THl
Proofs for the fold itself: https://rise4fun.com/Alive/THl

Reviewers: spatel, dmgreen, nikic, xbolva00

Reviewed By: spatel

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66232

llvm-svn: 369840
2019-08-24 06:49:25 +00:00
Roman Lebedev b3eccc7f0b [InstCombine][NFC] reuse-constant-from-select-in-icmp.ll - revisit tests
llvm-svn: 369839
2019-08-24 06:49:11 +00:00
Vitaly Buka d60271a1ad NFC: Rename lifetime-asan.ll -> lifetime-sanitizer.ll
llvm-svn: 369831
2019-08-24 01:44:39 +00:00
Philip Reames 9cb059fdcc Fix a bug in just submitted rL369789
Started implementing the vector case and realized the scalar case hadn't handled the GEP producing a different type than the base correctly.  It's entertaining seeing what slips through review when we're focused on the 'hard' parts.  :(

Also adding an extra vector test as it happened to be in workspace and wasn't worth separating.

llvm-svn: 369795
2019-08-23 18:27:57 +00:00
Philip Reames 5b02cfa0b3 [InstCombine] icmp eq/ne (gep inbounds P, Idx..), null -> icmp eq/ne P, null
This generalizes the isGEPKnownNonNull rule from ValueTracking to apply when we do not know if the base is non-null, and thus need to replace one condition with another.

The core notion is that since an inbounds GEP can only form null if the base pointer is null and the offset is zero. However, if the offset is non-zero, the the "inbounds" marker makes the result poison. Thus, we're free to ignore the case where the offset is non-zero. Similarly, there's no case under which a non-null base can result in a null result without generating poison.

Differential Revision: https://reviews.llvm.org/D66608

llvm-svn: 369789
2019-08-23 17:58:58 +00:00
Roman Lebedev dddc0fd9cb [NFC][InstCombine] Fixup few new tests in unrecognized_three-way-comparison.ll
llvm-svn: 369701
2019-08-22 20:34:56 +00:00
Peter Collingbourne 2452d7030b IR. Change strip* family of functions to not look through aliases.
I noticed another instance of the issue where references to aliases were
being replaced with aliasees, this time in InstCombine. In the instance that
I saw it turned out to be only a QoI issue (a symbol ended up being missing
from the symbol table due to the last reference to the alias being removed,
preventing HWASAN from symbolizing a global reference), but it could easily
have manifested as incorrect behaviour.

Since this is the third such issue encountered (previously: D65118, D65314)
it seems to be time to address this common error/QoI issue once and for all
and make the strip* family of functions not look through aliases.

Includes a test for the specific issue that I saw, but no doubt there are
other similar bugs fixed here.

As with D65118 this has been tested to make sure that the optimization isn't
load bearing. I built Clang, Chromium for Linux, Android and Windows as well
as the test-suite and there were no size regressions.

Differential Revision: https://reviews.llvm.org/D66606

llvm-svn: 369697
2019-08-22 19:56:14 +00:00
Roman Lebedev 1aeb27af22 [NFC][InstCombine] New tests: unrecognized_three-way-comparison.ll is ignorant about commutative variants part 2
llvm-svn: 369696
2019-08-22 19:53:23 +00:00
Roman Lebedev 41f89c3484 [NFC][InstCombine] New tests: unrecognized_three-way-comparison.ll is ignorant about commutative variants
D66232 "exposes" the problem.

llvm-svn: 369667
2019-08-22 16:46:16 +00:00
Philip Reames 3c4614ff10 Add a couple of extra test noticed in post-commit discussion of rL369541
llvm-svn: 369546
2019-08-21 16:57:53 +00:00
Philip Reames 764b0fd5a3 [instcombine] icmp eq/ne (sub C, Y), C -> icmp eq/ne Y, 0
Noticed while looking at pr43028.  

llvm-svn: 369541
2019-08-21 15:51:57 +00:00
Sanjay Patel e728259278 [InstCombine] narrow icmp with extended operands of different widths
An intermediate extend is used to widen the narrow operand to the width of
the other (wider) operand. At that point, we have the same logic as the
existing transform that was restricted to folds of equal width zext/sext.

This mostly solves PR42700:
https://bugs.llvm.org/show_bug.cgi?id=42700

llvm-svn: 369519
2019-08-21 11:56:08 +00:00
Sanjay Patel d5035727ad [InstCombine] add more extra use tests for icmp with extends; NFC
llvm-svn: 369447
2019-08-20 21:23:28 +00:00
Sanjay Patel 48e81e8e10 [InstCombine] add tests for mismatched cast ops for icmp; NFC
Motivating case is shown in PR42700:
https://bugs.llvm.org/show_bug.cgi?id=42700

llvm-svn: 369439
2019-08-20 20:51:50 +00:00
Sanjay Patel f99d254aae [InstCombine] simplify min/max of min/max with same operands (PR35607)
This is the original integer variant requested in:
https://bugs.llvm.org/show_bug.cgi?id=35607

As noted in the TODO and several similar TODOs around this block,
we could do this in instsimplify, but then it would cost more
because we would be trying to match min/max via ValueTracking
in 2 different places.

There are 4 commuted variants for each of smin/smax/umin/umax
that are not matched here. There are also icmp predicate variants
that are not included in the affected test file because they are
already handled by instsimplify by folding the final icmp to
true/false.

https://rise4fun.com/Alive/3KVc

  Name: smax(smax, smin)
  %c1 = icmp slt i32 %x, %y
  %c2 = icmp slt i32 %y, %x
  %min = select i1 %c1, i32 %x, i32 %y
  %max = select i1 %c2, i32 %x, i32 %y
  %c3 = icmp sgt i32 %max, %min
  %r = select i1 %c3, i32 %max, i32 %min
  =>
  %r = %max

  Name: smin(smax, smin)
  %c1 = icmp slt i32 %x, %y
  %c2 = icmp slt i32 %y, %x
  %min = select i1 %c1, i32 %x, i32 %y
  %max = select i1 %c2, i32 %x, i32 %y
  %c3 = icmp sgt i32 %max, %min
  %r = select i1 %c3, i32 %min, i32 %max
  =>
  %r = %min

  Name: umax(umax, umin)
  %c1 = icmp ult i32 %x, %y
  %c2 = icmp ult i32 %y, %x
  %min = select i1 %c1, i32 %x, i32 %y
  %max = select i1 %c2, i32 %x, i32 %y
  %c3 = icmp ult i32 %min, %max
  %r = select i1 %c3, i32 %max, i32 %min
  =>
  %r = %max

  Name: umin(umax, umin)
  %c1 = icmp ult i32 %x, %y
  %c2 = icmp ult i32 %y, %x
  %min = select i1 %c1, i32 %x, i32 %y
  %max = select i1 %c2, i32 %x, i32 %y
  %c3 = icmp ult i32 %min, %max
  %r = select i1 %c3, i32 %min, i32 %max
  =>
  %r = %min

llvm-svn: 369386
2019-08-20 13:39:17 +00:00
Sanjay Patel eb2211b352 [InstCombine] add tests for min/max with min/max of same operands; NFC
llvm-svn: 369376
2019-08-20 12:49:03 +00:00
Roman Lebedev e8f666f48d [NFC][InstCombine] Some tests for 'shift amount reassoc in bit test - trunc-of-lshr' (PR42399)
Finally, the fold i was looking forward to :)

The legality check is muddy, i doubt  i've groked the full generalization,
but it handles all the cases i care about, and can come up with:
https://rise4fun.com/Alive/26j

https://bugs.llvm.org/show_bug.cgi?id=42399

llvm-svn: 369197
2019-08-17 21:35:33 +00:00
Sanjay Patel a53ad0e157 Revert r367891 - "[InstCombine] combine mul+shl separated by zext"
This reverts commit 5dbb90bfe1.

As noted in the post-commit thread for r367891, this can create
a multiply that is lowered to a libcall that may not exist.

We need to improve the backend decomposition for integer multiply
before trying to re-land this (if it's still worthwhile after
doing the backend work).

llvm-svn: 369174
2019-08-16 23:36:28 +00:00
Roman Lebedev 515ad8fe4a [InstCombine][NFC] reuse-constant-from-select-in-icmp.ll - check branch_weights too
llvm-svn: 369166
2019-08-16 23:06:37 +00:00
Roman Lebedev 97176bd2bc [InstCombine][NFC] Revisit tests in reuse-constant-from-select-in-icmp.ll
llvm-svn: 369163
2019-08-16 22:40:06 +00:00
Sanjay Patel 39eb2324f7 [InstCombine] canonicalize a scalar-select-of-vectors to vector select
This pattern may arise more frequently with an enhancement to SLP vectorization suggested in PR42755:
https://bugs.llvm.org/show_bug.cgi?id=42755
...but we should handle this pattern to make things easier for the backend either way.

For all in-tree targets that I looked at, codegen for typical vector sizes looks better when we change
to a vector select, so this is safe to do without a cost model (in other words, as a target-independent
canonicalization).

For example, if the condition of the select is a scalar, we end up with something like this on x86:

	vpcmpgtd	%xmm0, %xmm1, %xmm0
	vpextrb	$12, %xmm0, %eax
	testb	$1, %al
	jne	LBB0_2
  ## %bb.1:
	vmovaps	%xmm3, %xmm2
  LBB0_2:
	vmovaps	%xmm2, %xmm0

Rather than the splat-condition variant:

	vpcmpgtd	%xmm0, %xmm1, %xmm0
	vpshufd	$255, %xmm0, %xmm0      ## xmm0 = xmm0[3,3,3,3]
	vblendvps	%xmm0, %xmm2, %xmm3, %xmm0

Differential Revision: https://reviews.llvm.org/D66095

llvm-svn: 369140
2019-08-16 18:51:30 +00:00
Evandro Menezes 05e9c2ac2e [InstCombine] Simplify pow(2.0, itofp(y)) to ldexp(1.0, y)
Simplify `pow(2.0, itofp(y))` to `ldexp(1.0, y)`.

Differential revision: https://reviews.llvm.org/D65979

llvm-svn: 369120
2019-08-16 15:33:41 +00:00
Roman Lebedev 16244fccfe [InstCombine] Shift amount reassociation in bittest: trunc-of-shl (PR42399)
Summary:
This is continuation of D63829 / https://bugs.llvm.org/show_bug.cgi?id=42399

I thought naive pattern would solve my issue, but nope, it involved truncation,
thus more folds needed.. This isn't really the fold i'm interested in,
i need trunc-of-lshr, but i'we decided to start with `shl` because it's simpler.

In this case, no extra legality checks are needed:
https://rise4fun.com/Alive/CAb

We should be careful about not increasing instruction count,
since we need to produce `zext` because `and` is done in wider type.

Reviewers: spatel, nikic, xbolva00

Reviewed By: spatel

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66057

llvm-svn: 369117
2019-08-16 15:10:41 +00:00
Florian Hahn 75be1a9e58 [ValueTracking] Fix recurrence detection to check both PHI operands.
Summary:
Currently we fail to compute known bits for recurrences where the
first incoming value is the start value of the recurrence.

Instead of exiting the loop when the first incoming value is not
the step of the recurrence, continue to check the second incoming
value.

The original code uses a loop to handle both cases, but incorrectly
exits instead of continuing.

Reviewers: lebedev.ri, spatel, nikic

Reviewed By: lebedev.ri

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66216

llvm-svn: 369088
2019-08-16 09:15:02 +00:00
David Bolvansky 00782a4b68 [NFC] Added tests for 'select with ctlz to cttz' fold
llvm-svn: 369032
2019-08-15 18:23:37 +00:00
Florian Hahn 1bd898989c [InstCombine] Precommit test case for D66216
llvm-svn: 368978
2019-08-15 08:42:12 +00:00
Roman Lebedev 04ddff4cbc [InstCombine][NFC] Tests for 'try to reuse constant from select in comparison'
https://rise4fun.com/Alive/THl

llvm-svn: 368886
2019-08-14 17:27:50 +00:00
David Bolvansky f94460d4b6 [SLC] Dereferenceable annonation - handle valid null pointers
Reviewers: jdoerfert, reames

Reviewed By: jdoerfert

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66161

llvm-svn: 368884
2019-08-14 17:15:20 +00:00
David Bolvansky 0e0fbae1a4 [BuildLibCalls] Noalias annotation
Summary: I think this is better solution than annotating callsites in IC/SLC.

Reviewers: jdoerfert

Reviewed By: jdoerfert

Subscribers: MaskRay, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66217

llvm-svn: 368875
2019-08-14 16:50:06 +00:00
Roman Lebedev 2faafc6e4f [InstCombine][NFC] Autogenerate checks in adjust-for-minmax.ll
Being affected by WIP patch.

llvm-svn: 368807
2019-08-14 08:12:20 +00:00
David Bolvansky 038d604f4f [SimplifyLibCalls] Add noalias from known callsites
Summary:
Should be fine for memcpy, strcpy, strncpy.


Reviewers: jdoerfert, efriedma

Reviewed By: jdoerfert

Subscribers: uenoku, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66135

llvm-svn: 368724
2019-08-13 17:18:46 +00:00
Nikita Popov 2a4f26b4c2 [ValueTracking] Improve reverse assumption inference
Use isGuaranteedToTransferExecutionToSuccessor() instead of
isSafeToSpeculativelyExecute() when seeing whether we can propagate
the information in an assume backwards in isValidAssumeForContext().
The latter is more general - it also allows arbitrary loads/stores -
and is also the condition we want: if our assume is guaranteed to
execute, its condition not holding would be UB.

Original patch by arielb1.

Differential Revision: https://reviews.llvm.org/D37215

llvm-svn: 368723
2019-08-13 17:15:42 +00:00
David Bolvansky dde10cd7a9 [NFC] Revisited/updated tests
llvm-svn: 368722
2019-08-13 17:07:02 +00:00
David Bolvansky 90a30fdcc3 [SLC] Improve dereferenceable bytes annotation
llvm-svn: 368715
2019-08-13 16:44:16 +00:00
Roman Lebedev 73f702ff19 [InstCombine] Non-canonical clamp-like pattern handling
Summary:
Given a pattern like:
```
%old_cmp1 = icmp slt i32 %x, C2
%old_replacement = select i1 %old_cmp1, i32 %target_low, i32 %target_high
%old_x_offseted = add i32 %x, C1
%old_cmp0 = icmp ult i32 %old_x_offseted, C0
%r = select i1 %old_cmp0, i32 %x, i32 %old_replacement
```
it can be rewritten as more canonical pattern:
```
%new_cmp1 = icmp slt i32 %x, -C1
%new_cmp2 = icmp sge i32 %x, C0-C1
%new_clamped_low = select i1 %new_cmp1, i32 %target_low, i32 %x
%r = select i1 %new_cmp2, i32 %target_high, i32 %new_clamped_low
```
Iff `-C1 s<= C2 s<= C0-C1`
Also, `ULT` predicate can also be `UGE`; or `UGT` iff `C0 != -1` (+invert result)
Also, `SLT` predicate can also be `SGE`; or `SGT` iff `C2 != INT_MAX` (+invert result)

If `C1 == 0`, then all 3 instructions must be one-use; else at most either `%old_cmp1` or `%old_x_offseted` can have extra uses.
NOTE: if we could reuse `%old_cmp1` as one of the comparisons we'll have to build, this could be less limiting.

So there are two icmp's, each one with 3 predicate variants, so there are 9 fold variants:

|     | ULT                            | UGE                             | UGT                             |
| SLT | https://rise4fun.com/Alive/yIJ | https://rise4fun.com/Alive/5BfN | https://rise4fun.com/Alive/INH  |
| SGE | https://rise4fun.com/Alive/hd8 | https://rise4fun.com/Alive/Abk  | https://rise4fun.com/Alive/PlzS |
| SGT | https://rise4fun.com/Alive/VYG | https://rise4fun.com/Alive/oMY  | https://rise4fun.com/Alive/KrzC |
{F9730206}

This fold was brought up in https://reviews.llvm.org/D65148#1603922 by @dmgreen, and is needed to unblock that patch.
This patch requires D65530.

Reviewers: spatel, nikic, xbolva00, dmgreen

Reviewed By: spatel

Subscribers: hiraditya, llvm-commits, dmgreen

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65765

llvm-svn: 368687
2019-08-13 12:49:28 +00:00
Roman Lebedev 2635c324da [InstCombine] foldXorOfICmps(): don't give up on non-single-use ICmp's if all users are freely invertible
Summary:
This is rather unconventional..

As the comment there says, we don't have much folds for xor-of-icmps,
we try to turn them into an and-of-icmps, for which we have plenty of folds.
But if the ICmp we need to invert is not single-use - we give up.

As discussed in https://reviews.llvm.org/D65148#1603922,
we may have a non-canonical CLAMP pattern, with bit match and
select-of-threshold that we'll potentially clamp.
As it can be seen in `canonicalize-clamp-with-select-of-constant-threshold-pattern.ll`,
out of all 8 variations of the pattern, only two are **not** canonicalized into
the variant with and+icmp instead of bit math.
The reason is because the ICmp we need to invert is not single-use - we give up.

We indeed can't perform this fold at will, the general rule is that
we should not increase instruction count in InstCombine,

But we wouldn't end up increasing instruction count if we can adapt every other
user to the inverted value. This way the `not` we create **will** get folded,
and in the end the instruction count did not increase.

For that, of course, we need to look at the users of a Value,
which is again rather unconventional for InstCombine :S

Thus i'm proposing to be a little bit more insistive in `foldXorOfICmps()`.
The alternatives would be to not create that `not`, but add duplicate code to
manually invert all users; or to add some even less general combine to handle
some more specific pattern[s].

Reviewers: spatel, nikic, RKSimon, craig.topper

Reviewed By: spatel

Subscribers: hiraditya, jdoerfert, dmgreen, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65530

llvm-svn: 368685
2019-08-13 12:49:06 +00:00
David Bolvansky 39130314fe [SimplifyLibCalls] Add dereferenceable bytes from known callsites
Summary:
int mm(char *a, char *b) {
    return memcmp(a,b,16);
}

Currently:
define dso_local i32 @mm(i8* nocapture readonly %a, i8* nocapture readonly %b) local_unnamed_addr #1 {
entry:
  %call = tail call i32 @memcmp(i8* %a, i8* %b, i64 16)
  ret i32 %call
}

After patch:
define dso_local i32 @mm(i8* nocapture readonly %a, i8* nocapture readonly %b) local_unnamed_addr #1 {
entry:
  %call = tail call i32 @memcmp(i8* dereferenceable(16)  %a, i8* dereferenceable(16)  %b, i64 16)
  ret i32 %call
}




Reviewers: jdoerfert, efriedma

Reviewed By: jdoerfert

Subscribers: javed.absar, spatel, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66079

llvm-svn: 368657
2019-08-13 09:11:49 +00:00
Roman Lebedev 09eb71ced3 [NFC][InstCombine] Non-canonical clamp pattern: non-canonical predicate tests
We can't handle 'uge' case because we can't ever get it,
there needs to be extra use on that compare or else it will be
canonicalized, but because of extra use we can't handle it.

'sge' case we can have.

llvm-svn: 368656
2019-08-13 08:14:13 +00:00
Sanjay Patel 24a9e86849 [InstCombine] add tests for scalar-select-of-vectors; NFC
llvm-svn: 368583
2019-08-12 15:21:11 +00:00
David Bolvansky 20d37fab82 [InstCombine] x /c fabs(x) -> copysign(1.0, x)
Summary:
x / fabs(x) -> copysign(1.0, x)
fabs(x) / x -> copysign(1.0, x)

Reviewers: spatel, foad, RKSimon, efriedma

Reviewed By: spatel

Subscribers: lebedev.ri, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65898

llvm-svn: 368570
2019-08-12 13:43:35 +00:00
Roman Lebedev ccdad6ef48 [InstCombine] foldShiftIntoShiftInAnotherHandOfAndInICmp(): avoid constantexpr pitfail (PR42962)
Instead of matching value and then blindly casting to BinaryOperator
just to get the opcode, just match instruction and do no cast.

Fixes https://bugs.llvm.org/show_bug.cgi?id=42962

llvm-svn: 368554
2019-08-12 11:28:02 +00:00
Roman Lebedev 404e978f27 [NFC][InstCombine] Tests for shift amount reassociation in bittest with truncated shl (PR42399)
trunc-of-shl:
  https://rise4fun.com/Alive/zGx
  https://rise4fun.com/Alive/sl0L
I.e. no extra legality check needed.

https://bugs.llvm.org/show_bug.cgi?id=42399

llvm-svn: 368520
2019-08-10 19:29:03 +00:00
Roman Lebedev a8d20b4467 [InstCombine] Shift amount reassociation in bittest: relax one-use check when shifting constant
If one of the values being shifted is a constant, since the new shift
amount is known-constant, the new shift will end up being constant-folded
so, we don't need that one-use restriction then.

llvm-svn: 368519
2019-08-10 19:28:54 +00:00
Roman Lebedev 64fe806c4e [InstCombine] Shift amount reassociation in bittest: drop pointless one-use restriction
That one-use restriction is not needed for correctness - we have already
ensured that one of the shifts will go away, so we know we won't increase
the instruction count. So there is no need for that restriction.

llvm-svn: 368518
2019-08-10 19:28:44 +00:00
Roman Lebedev 45e9990c02 [NFC][InstCombine] Tests for shift amount reassociation in bittest with shift of const
llvm-svn: 368517
2019-08-10 19:28:12 +00:00
David Bolvansky f6a5699392 [NFC] Added tests for D65898
llvm-svn: 368447
2019-08-09 15:52:26 +00:00
David Bolvansky 2689ed0f9d [InstCombine][NFC] Added comments about constants in tests for pow->exp2 fold
llvm-svn: 368360
2019-08-08 22:37:51 +00:00
David Bolvansky ae154d00b4 [NFC] Fixed newly added tests
llvm-svn: 368201
2019-08-07 19:36:46 +00:00
David Bolvansky f8183d64de [NFC] Added tests for x/fabs(X) fold
llvm-svn: 368200
2019-08-07 19:35:25 +00:00
Jay Foad 7d4ab7751d [InstCombine] Add a TODO comment
llvm-svn: 368176
2019-08-07 15:18:34 +00:00
Jay Foad 8e8b295835 [InstCombine] Propagate fast math flags through selects
Summary:
In SimplifySelectsFeedingBinaryOp, propagate fast math flags from the
outer op into both arms of the new select, to take advantage of
simplifications that require fast math flags.

Reviewers: mcberg2017, majnemer, spatel, arsenm, xbolva00

Subscribers: wdng, javed.absar, kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65658

llvm-svn: 368175
2019-08-07 15:16:28 +00:00
Roman Lebedev 9bece444dd [InstCombine] Recommit: Shift amount reassociation: shl-trunc-shl pattern
This was initially committed in r368059 but got reverted in r368084
because there was a faulty logic in how the shift amounts type mismatch
was being handled (it simply wasn't).

I've added an explicit bailout before we SimplifyAddInst() - i don't think
it's designed in general to handle differently-typed values, even though
the actual problem only comes from ConstantExpr's.

I have also changed the common type deduction, to not just blindly
look past zext, but try to do that so that in the end types match.

Differential Revision: https://reviews.llvm.org/D65380

llvm-svn: 368141
2019-08-07 09:41:50 +00:00
Reid Kleckner e4bd38478b Revert [InstCombine] Shift amount reassociation: shl-trunc-shl pattern
This reverts r368059 (git commit 0f95710976)

This caused Clang to assert while self-hosting and compiling
SystemZInstrInfo.cpp. Reduction is running.

llvm-svn: 368084
2019-08-06 20:32:07 +00:00
Roman Lebedev 0f95710976 [InstCombine] Shift amount reassociation: shl-trunc-shl pattern
Summary:
Currently `reassociateShiftAmtsOfTwoSameDirectionShifts()` only handles
two shifts one after another. If the shifts are `shl`, we still can
easily perform the fold, with no extra legality checks:
https://rise4fun.com/Alive/OQbM

If we have right-shift however, we won't be able to make it
any simpler than it already is.

After this the only thing missing here is constant-folding: (`NewShAmt >= bitwidth(X)`)
* If it's a logical shift, then constant-fold to `0` (not `undef`)
* If it's a `ashr`, then a splat of original signbit
https://rise4fun.com/Alive/E1K
https://rise4fun.com/Alive/i0V

Reviewers: spatel, nikic, xbolva00

Reviewed By: spatel

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65380

llvm-svn: 368059
2019-08-06 17:03:40 +00:00
Sanjay Patel efc24d9d6f [InstCombine] add tests for binop with FMF with select operands; NFC
Baseline coverage for D65658.

llvm-svn: 368028
2019-08-06 13:19:13 +00:00
Roman Lebedev 76b772f9ce [InstCombine][NFC] Tests for non-canonical clamp-like pattern
As discussed in https://reviews.llvm.org/D65148#1607019

The canonical fold is: https://rise4fun.com/Alive/FKe

llvm-svn: 367897
2019-08-05 18:01:22 +00:00
Sanjay Patel 5dbb90bfe1 [InstCombine] combine mul+shl separated by zext
This appears to slightly help patterns similar to what's
shown in PR42874:
https://bugs.llvm.org/show_bug.cgi?id=42874
...but not in the way requested.

That fix will require some later IR and/or backend pass to
decompose multiply/shifts into something more optimal per
target. Those transforms already exist in some basic forms,
but probably need enhancing to catch more cases.

https://rise4fun.com/Alive/Qzv2

llvm-svn: 367891
2019-08-05 16:59:58 +00:00
Sanjay Patel 4b9d66cf41 [InstCombine] add tests for shl+mul; NFC
llvm-svn: 367883
2019-08-05 16:17:07 +00:00
Sanjay Patel 1a29823b9c [InstCombine] add extra use constraint for shl-zext fold
As the test shows, we can end up with more instructions than
we started with if we don't include the extra-use check.

llvm-svn: 367880
2019-08-05 16:04:07 +00:00
Sanjay Patel d1c5d13470 [InstCombine] add test for shl-zext with extra use; NFC
llvm-svn: 367876
2019-08-05 15:25:07 +00:00
David Bolvansky e834e306cb [InstCombine] Added mempcpy tests [NFC]
llvm-svn: 367825
2019-08-05 09:58:32 +00:00
Sanjay Patel 9ce5f41851 [InstCombine] fold cmp+select using select operand equivalence
As discussed in PR42696:
https://bugs.llvm.org/show_bug.cgi?id=42696
...but won't help that case yet.

We have an odd situation where a select operand equivalence fold was
implemented in InstSimplify when it could have been done more generally
in InstCombine if we allow dropping of {nsw,nuw,exact} from a binop operand.

Here's an example:
https://rise4fun.com/Alive/Xplr

  %cmp = icmp eq i32 %x, 2147483647
  %add = add nsw i32 %x, 1
  %sel = select i1 %cmp, i32 -2147483648, i32 %add
  =>
  %sel = add i32 %x, 1

I've left the InstSimplify code in place for now, but my guess is that we'd
prefer to remove that as a follow-up to save on code duplication and
compile-time.

Differential Revision: https://reviews.llvm.org/D65576

llvm-svn: 367695
2019-08-02 17:39:32 +00:00
Sanjay Patel 66ce04f261 [InstCombine] add tests with 'ne' predicates; NFC
More coverage for the proposal in D65576.

llvm-svn: 367579
2019-08-01 16:04:12 +00:00
Sanjay Patel 350b389c90 [InstCombine] add test with swapped select operands; NFC
More coverage for the proposal in D65576.

llvm-svn: 367577
2019-08-01 15:32:10 +00:00
Sanjay Patel 435cdecdf7 [InstCombine] canonicalize fneg before fmul/fdiv
Reverse the canonicalization of fneg relative to fmul/fdiv. That makes it
easier to implement the transforms (and possibly other fneg transforms) in
1 place because we can always start the pattern match from fneg (either the
legacy binop or the new unop).

There's a secondary practical benefit seen in PR21914 and PR42681:
https://bugs.llvm.org/show_bug.cgi?id=21914
https://bugs.llvm.org/show_bug.cgi?id=42681
...hoisting fneg rather than sinking seems to play nicer with LICM in IR
(although this change may expose analysis holes in the other direction).

1. The instcombine test changes show the expected neutral IR diffs from
   reversing the order.

2. The reassociation tests show that we were missing an optimization
   opportunity to fold away fneg-of-fneg. My reading of IEEE-754 says
   that all of these transforms are allowed (regardless of binop/unop
   fneg version) because:

   "For all other operations [besides copy/abs/negate/copysign], this
   standard does not specify the sign bit of a NaN result."
   In all of these transforms, we always have some other binop
   (fadd/fsub/fmul/fdiv), so we are free to flip the sign bit of a
   potential intermediate NaN operand.
   (If that interpretation is wrong, then we must already have a bug in
   the existing transforms?)

3. The clang tests shouldn't exist as-is, but that's effectively a
   revert of rL367149 (the test broke with an extension of the
   pre-existing fneg canonicalization in rL367146).

Differential Revision: https://reviews.llvm.org/D65399

llvm-svn: 367447
2019-07-31 16:53:22 +00:00
Roman Lebedev 8d76284599 [NFC][InstCombine] Add xor-or-icmp tests with icmp having extra uses
Currently InstCombiner::foldXorOfICmps() bailouts if the
ICMP it wants to invert has extra uses. As it can be seen
in the tests in previous commit, this is super unfortunate,
this is the single pattern that is left non-canonicalized.

We could analyze if we can also invert all the uses if said ICMP
at the same time, thus not bailing out there.
I'm not seeing any nicer alternative.

llvm-svn: 367439
2019-07-31 15:20:33 +00:00
Roman Lebedev 67688af5f0 [NFC][InstCombine] Add baseline tests with non-canonical CLAMP pattern
As disscussed in https://reviews.llvm.org/D65148#1603922
these would all need to be canonicalized to traditional clamp pattern.

llvm-svn: 367438
2019-07-31 15:20:21 +00:00
Roman Lebedev be612ea471 [InstCombine] Fold "x ?% y ==/!= 0" to "x & (y-1) ==/!= 0" iff y is power-of-two
Summary:
I have stumbled into this by accident while preparing to extend backend `x s% C ==/!= 0` handling.

While we did happen to handle this fold in most of the cases,
the folding is indirect - we fold `x u% y` to `x & (y-1)` (iff `y` is power-of-two),
or first turn `x s% -y` to `x u% y`; that does handle most of the cases.
But we can't turn `x s% INT_MIN` to `x u% -INT_MIN`,
and thus we end up being stuck with `(x s% INT_MIN) == 0`.

There is no such restriction for the more general fold:
https://rise4fun.com/Alive/IIeS

To be noted, the fold does not enforce that `y` is a constant,
so it may indeed increase instruction count.
This is consistent with what `x u% y`->`x & (y-1)` already does.
I think it makes sense, it's at most one (simple) extra instruction,
while `rem`ainder is really much more un-simple (and likely **very** costly).

Reviewers: spatel, RKSimon, nikic, xbolva00, craig.topper

Reviewed By: RKSimon

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65046

llvm-svn: 367322
2019-07-30 15:28:22 +00:00
Cameron McInally b32a6592eb [NFC][FPEnv] Pre-commit tests for canonicalize negated operand of fdiv.
llvm-svn: 367233
2019-07-29 16:09:56 +00:00
Sanjay Patel e9ee7b47d4 [InstCombine] fold fadd+fneg with fdiv/fmul betweena
The backend already does this via isNegatibleForFree(),
but we may want to alter the fneg IR canonicalizations
that currently exist, so we need to try harder to fold
fneg in IR to avoid regressions.

llvm-svn: 367227
2019-07-29 13:50:25 +00:00
Sanjay Patel 74c35bd6b0 [InstCombine] add tests for fadd with negated operand; NFC
llvm-svn: 367222
2019-07-29 12:49:36 +00:00
Roman Lebedev 6ff633ddc4 [NFC][InstCombine] Revisit tests in shift-amount-reassociation-with-truncation-shl.ll
llvm-svn: 367196
2019-07-28 21:31:58 +00:00
Sanjay Patel 99c57c6daf [InstCombine] fold fsub+fneg with fdiv/fmul between
The backend already does this via isNegatibleForFree(),
but we may want to alter the fneg IR canonicalizations
that currently exist, so we need to try harder to fold
fneg in IR to avoid regressions.

llvm-svn: 367194
2019-07-28 17:10:06 +00:00
Roman Lebedev d5bc4b09f1 [NFC][InstCombine] Shift amount reassociation: can have trunc between shl's
https://rise4fun.com/Alive/OQbM
Not so simple for lshr/ashr, so those maybe later.

https://bugs.llvm.org/show_bug.cgi?id=42391

llvm-svn: 367189
2019-07-28 13:13:46 +00:00
Sanjay Patel d20a0fe203 [InstCombine] add tests for fsub with negated operand; NFC
llvm-svn: 367156
2019-07-26 21:12:22 +00:00
Sanjay Patel a9ab31558c [InstCombine] canonicalize negated operand of fdiv
This is a transform that we use with fmul, so use
it for fdiv too for consistency.

llvm-svn: 367146
2019-07-26 19:56:59 +00:00
Sanjay Patel 487e957775 [InstCombine] add tests for fdiv with negated operand; NFC
llvm-svn: 367145
2019-07-26 19:44:53 +00:00
Sanjay Patel c229cfeb7a [InstCombine] remove flop from lerp patterns
(Y * (1.0 - Z)) + (X * Z) -->
Y - (Y * Z) + (X * Z) -->
Y + Z * (X - Y)

This is part of solving:
https://bugs.llvm.org/show_bug.cgi?id=42716

Factoring eliminates an instruction, so that should be a good canonicalization.
The potential conversion to FMA would be handled by the backend based on target
capabilities.

Differential Revision: https://reviews.llvm.org/D65305

llvm-svn: 367101
2019-07-26 11:19:18 +00:00
Sanjay Patel 8f15d40555 [InstCombine] add tests for lerp patterns (PR42716); NFC
llvm-svn: 367069
2019-07-25 22:25:21 +00:00
Vlad Tsyrklevich 5d5a58317c Revert "[InstCombine] try to narrow a truncated load"
This reverts commit bc4a63fd3c, this is a
speculative revert to fix a number of sanitizer bots (like
sanitizer-x86_64-linux-bootstrap-ubsan) that have started to see stage2
compiler crashes, presumably due to a miscompile.

llvm-svn: 367029
2019-07-25 15:37:57 +00:00
Sanjay Patel bc4a63fd3c [InstCombine] try to narrow a truncated load
trunc (load X) --> load (bitcast X to narrow type)

We have this transform in DAGCombiner::ReduceLoadWidth(), but the truncated
load pattern can interfere with other instcombine transforms, so I'd like to
allow the fold sooner.

Example:
https://bugs.llvm.org/show_bug.cgi?id=16739
...in that report, we have bitcasts bracketing these ops, so those could get
eliminated too.

We've generally ruled out widening of loads early in IR ( LoadCombine -
http://lists.llvm.org/pipermail/llvm-dev/2016-September/105291.html ), but
that reasoning may not apply to narrowing if we can preserve information
such as the dereferenceable range.

Differential Revision: https://reviews.llvm.org/D64432

llvm-svn: 367011
2019-07-25 12:14:27 +00:00
Craig Topper e9abc8177a [InstCombine] Teach foldOrOfICmps to allow icmp eq MIN_INT/MAX to be part of a range comparision. Similar for foldAndOfICmps
We can treat icmp eq X, MIN_UINT as icmp ule X, MIN_UINT and allow
it to merge with icmp ugt X, C. Similar for the other constants.

We can do simliar for icmp ne X, (U)INT_MIN/MAX in foldAndOfICmps. And we already handled UINT_MIN there.

Fixes PR42691.

Differential Revision: https://reviews.llvm.org/D65017

llvm-svn: 366945
2019-07-24 20:57:29 +00:00
David Bolvansky db913d9618 [InstCombine] Adjusted pow-exp tests for Windows [NFC]
Summary: https://bugs.llvm.org/show_bug.cgi?id=42740

Reviewers: efriedma, hans

Reviewed By: hans

Subscribers: spatel, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65220

llvm-svn: 366925
2019-07-24 17:01:20 +00:00
Matt Arsenault 0b7f226311 AMDGPU: Fix test after r366913
llvm-svn: 366916
2019-07-24 16:05:55 +00:00
Sanjay Patel 3624074426 [InstCombine] add tests for load narrowing; NFC
Baseline results for D64432.

llvm-svn: 366901
2019-07-24 12:44:21 +00:00
Roman Lebedev 402bf28ecc [NFC][InstCombine] Fixup commutative/negative tests with icmp preds in @llvm.umul.with.overflow tests
llvm-svn: 366802
2019-07-23 12:42:57 +00:00
Hideto Ueno 2d654df763 [AMDGPU][NFC] Simplify test file for amdgcn intrinsics
Summary: Remove unchecked attribute in the call site and use FileCheck String Substitution for `convergent` check.

Reviewers: nhaehnle

Reviewed By: nhaehnle

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64901

llvm-svn: 366781
2019-07-23 06:48:47 +00:00
Roman Lebedev 77d37037f0 [InstCombine][NFC] Tests for canonicalization of unsigned multiply overflow check
llvm-svn: 366748
2019-07-22 22:08:45 +00:00
Craig Topper ee5dc7e7ad [InstCombine] Add foldAndOfICmps test cases inspired by PR42691.
icmp ne %x, INT_MIN can be treated similarly to icmp sgt %x, INT_MIN.
icmp ne %x, INT_MAX can be treated similarly to icmp slt %x, INT_MAX.
icmp ne %x, UINT_MAX can be treated similarly to icmp ult %x, UINT_MAX.

We already treat icmp ne %x, 0 similarly to icmp ugt %x, 0

llvm-svn: 366662
2019-07-22 02:43:43 +00:00
Roman Lebedev 8a431874e9 [NFC][InstCombine] Add a few extra srem-by-power-of-two tests - extra uses
llvm-svn: 366652
2019-07-21 09:05:49 +00:00
Roman Lebedev a2dd672c5f [NFC][InstCombine] Autogenerate a few tests
llvm-svn: 366643
2019-07-20 21:34:00 +00:00
Roman Lebedev 056640f8b3 [NFC][InstCombine] Add srem-by-signbit tests - still can fold to bittest
https://rise4fun.com/Alive/IIeS

llvm-svn: 366642
2019-07-20 21:33:50 +00:00
Craig Topper 3a3c58f045 [InstCombine] Fix copy/paste mistake in the test cases I added for PR42691. NFC
llvm-svn: 366614
2019-07-19 21:09:21 +00:00
Craig Topper 18230ecf7e [InstCombine] Add test cases for PR42691. NFC
llvm-svn: 366611
2019-07-19 20:48:52 +00:00
Roman Lebedev 9998585c47 [NFC][InstCombine] Tests for 'rem' formation from sub-of-mul-by-'div' (PR42673)
https://rise4fun.com/Alive/8Rp
https://bugs.llvm.org/show_bug.cgi?id=42673

llvm-svn: 366565
2019-07-19 11:29:18 +00:00
Roman Lebedev 882bf2a844 [NFC][InstCombine] Redundant masking before left-shift: tests with assume
If the legality check is `(shiftNbits-maskNbits) s>= 0`,
then we can simplify it to `shiftNbits u>= maskNbits`,
which is easier to check for.

However, currently switching the `dropRedundantMaskingOfLeftShiftInput()`
to `SimplifyICmpInst()` does not catch these cases and regresses
currently-handled cases, so i'll leave it as is for now.

https://rise4fun.com/Alive/25P

llvm-svn: 366564
2019-07-19 11:29:04 +00:00
Roman Lebedev f2eb403144 [InstCombine] Dropping redundant masking before left-shift [5/5] (PR42563)
Summary:
If we have some pattern that leaves only some low bits set, and then performs
left-shift of those bits, if none of the bits that are left after the final
shift are modified by the mask, we can omit the mask.

There are many variants to this pattern:
f. `((x << MaskShAmt) a>> MaskShAmt) << ShiftShAmt`
All these patterns can be simplified to just:
`x << ShiftShAmt`
iff:
f. `(ShiftShAmt-MaskShAmt) s>= 0` (i.e. `ShiftShAmt u>= MaskShAmt`)

Normally, the inner pattern is sign-extend,
but for our purposes it's no different to other patterns:

alive proofs:
f: https://rise4fun.com/Alive/7U3

For now let's start with patterns where both shift amounts are variable,
with trivial constant "offset" between them, since i believe this is
both simplest to handle and i think this is most common.
But again, there are likely other variants where we could use
ValueTracking/ConstantRange to handle more cases.

https://bugs.llvm.org/show_bug.cgi?id=42563

Differential Revision: https://reviews.llvm.org/D64524

llvm-svn: 366540
2019-07-19 08:26:58 +00:00
Roman Lebedev 441c9d6ca8 [InstCombine] Dropping redundant masking before left-shift [4/5] (PR42563)
Summary:
If we have some pattern that leaves only some low bits set, and then performs
left-shift of those bits, if none of the bits that are left after the final
shift are modified by the mask, we can omit the mask.

There are many variants to this pattern:
e. `((x << MaskShAmt) l>> MaskShAmt) << ShiftShAmt`
All these patterns can be simplified to just:
`x << ShiftShAmt`
iff:
e. `(ShiftShAmt-MaskShAmt) s>= 0` (i.e. `ShiftShAmt u>= MaskShAmt`)

alive proofs:
e: https://rise4fun.com/Alive/0FT

For now let's start with patterns where both shift amounts are variable,
with trivial constant "offset" between them, since i believe this is
both simplest to handle and i think this is most common.
But again, there are likely other variants where we could use
ValueTracking/ConstantRange to handle more cases.

https://bugs.llvm.org/show_bug.cgi?id=42563

Differential Revision: https://reviews.llvm.org/D64521

llvm-svn: 366539
2019-07-19 08:26:47 +00:00
Roman Lebedev 3c212ce305 [InstCombine] Dropping redundant masking before left-shift [3/5] (PR42563)
Summary:
If we have some pattern that leaves only some low bits set, and then performs
left-shift of those bits, if none of the bits that are left after the final
shift are modified by the mask, we can omit the mask.

There are many variants to this pattern:
d. `(x & ((-1 << MaskShAmt) >> MaskShAmt)) << ShiftShAmt`
All these patterns can be simplified to just:
`x << ShiftShAmt`
iff:
d. `(ShiftShAmt-MaskShAmt) s>= 0` (i.e. `ShiftShAmt u>= MaskShAmt`)

alive proofs:
d: https://rise4fun.com/Alive/I5Y

For now let's start with patterns where both shift amounts are variable,
with trivial constant "offset" between them, since i believe this is
both simplest to handle and i think this is most common.
But again, there are likely other variants where we could use
ValueTracking/ConstantRange to handle more cases.

https://bugs.llvm.org/show_bug.cgi?id=42563

Differential Revision: https://reviews.llvm.org/D64519

llvm-svn: 366538
2019-07-19 08:26:37 +00:00
Roman Lebedev 2ebe57386d [InstCombine] Dropping redundant masking before left-shift [2/5] (PR42563)
Summary:
If we have some pattern that leaves only some low bits set, and then performs
left-shift of those bits, if none of the bits that are left after the final
shift are modified by the mask, we can omit the mask.

There are many variants to this pattern:
c. `(x & (-1 >> MaskShAmt)) << ShiftShAmt`
All these patterns can be simplified to just:
`x << ShiftShAmt`
iff:
c. `(ShiftShAmt-MaskShAmt) s>= 0` (i.e. `ShiftShAmt u>= MaskShAmt`)

alive proofs:
c: https://rise4fun.com/Alive/RgJh

For now let's start with patterns where both shift amounts are variable,
with trivial constant "offset" between them, since i believe this is
both simplest to handle and i think this is most common.
But again, there are likely other variants where we could use
ValueTracking/ConstantRange to handle more cases.

https://bugs.llvm.org/show_bug.cgi?id=42563

Differential Revision: https://reviews.llvm.org/D64517

llvm-svn: 366537
2019-07-19 08:26:25 +00:00
Roman Lebedev 4422a1657c [InstCombine] Dropping redundant masking before left-shift [1/5] (PR42563)
Summary:
If we have some pattern that leaves only some low bits set, and then performs
left-shift of those bits, if none of the bits that are left after the final
shift are modified by the mask, we can omit the mask.

There are many variants to this pattern:
b. `(x & (~(-1 << maskNbits))) << shiftNbits`
All these patterns can be simplified to just:
`x << ShiftShAmt`
iff:
b. `(MaskShAmt+ShiftShAmt) u>= bitwidth(x)`

alive proof:
b: https://rise4fun.com/Alive/y8M

For now let's start with patterns where both shift amounts are variable,
with trivial constant "offset" between them, since i believe this is
both simplest to handle and i think this is most common.
But again, there are likely other variants where we could use
ValueTracking/ConstantRange to handle more cases.

https://bugs.llvm.org/show_bug.cgi?id=42563

Differential Revision: https://reviews.llvm.org/D64514

llvm-svn: 366536
2019-07-19 08:26:13 +00:00
Roman Lebedev a5f0824eb5 [InstCombine] Dropping redundant masking before left-shift [0/5] (PR42563)
Summary:
If we have some pattern that leaves only some low bits set, and then performs
left-shift of those bits, if none of the bits that are left after the final
shift are modified by the mask, we can omit the mask.

There are many variants to this pattern:
a. `(x & ((1 << MaskShAmt) - 1)) << ShiftShAmt`
All these patterns can be simplified to just:
`x << ShiftShAmt`
iff:
a. `(MaskShAmt+ShiftShAmt) u>= bitwidth(x)`

alive proof:
a: https://rise4fun.com/Alive/wi9

Indeed, not all of these patterns are canonical.
But since this fold will only produce a single instruction
i'm really interested in handling even uncanonical patterns,
since i have this general kind of pattern in hotpaths,
and it is not totally outlandish for bit-twiddling code.

For now let's start with patterns where both shift amounts are variable,
with trivial constant "offset" between them, since i believe this is
both simplest to handle and i think this is most common.
But again, there are likely other variants where we could use
ValueTracking/ConstantRange to handle more cases.

https://bugs.llvm.org/show_bug.cgi?id=42563

Reviewers: spatel, nikic, huihuiz, xbolva00

Reviewed By: xbolva00

Subscribers: efriedma, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64512

llvm-svn: 366535
2019-07-19 08:25:43 +00:00
Nikita Popov 57190b3974 [InstCombine] Add assume context test; NFC
Baseline test for D37215.

llvm-svn: 366021
2019-07-14 15:55:32 +00:00
Sanjay Patel 22cc1030f6 Revert "[InstCombine] add tests for umin/umax via usub.sat; NFC"
This reverts commit rL365999 / 0f6148df23.
The tests already exist in this file, and the hoped-for transform
(mentioned in D62871) is invalid because of undef as discussed in
D63060.

llvm-svn: 366000
2019-07-13 13:16:46 +00:00
Sanjay Patel 0f6148df23 [InstCombine] add tests for umin/umax via usub.sat; NFC
llvm-svn: 365999
2019-07-13 12:54:48 +00:00
David Bolvansky af1b3185f5 [InstCombine] Fold select (icmp sgt x, -1), lshr (X, Y), ashr (X, Y) to ashr (X, Y))
Summary:
(select (icmp sgt x, -1), lshr (X, Y), ashr (X, Y)) -> ashr (X, Y))
(select (icmp slt x, 1), ashr (X, Y), lshr (X, Y)) -> ashr (X, Y))

Fixes PR41173

Alive proof by @lebedev.ri (thanks)
Name: PR41173
  %cmp = icmp slt i32 %x, 1
  %shr = lshr i32 %x, %y
  %shr1 = ashr i32 %x, %y
  %retval.0 = select i1 %cmp, i32 %shr1, i32 %shr
  =>
  %retval.0 = ashr i32 %x, %y

Optimization: PR41173
Done: 1
Optimization is correct!

Reviewers: lebedev.ri, spatel

Reviewed By: lebedev.ri

Subscribers: nikic, craig.topper, llvm-commits, lebedev.ri

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64285

llvm-svn: 365893
2019-07-12 11:31:16 +00:00
Huihui Zhang 7b4a59db1e [InstCombine][NFCI] Add more test coverage to onehot_merge.ll
Prep work for upcoming patch D64275.

llvm-svn: 365828
2019-07-11 21:28:25 +00:00
David Bolvansky 5dca95bc4e [NFC] Revisited tests for D64285
llvm-svn: 365815
2019-07-11 19:39:20 +00:00
Sanjay Patel 3487791fea [InstCombine] don't move FP negation out of a constant expression
-(X * ConstExpr) becomes X * (-ConstExpr), so don't reverse that
and infinite loop.

llvm-svn: 365774
2019-07-11 13:44:29 +00:00
David Bolvansky e195a91d2d [NFC] Updated tests for D64285
llvm-svn: 365765
2019-07-11 12:51:33 +00:00
David Bolvansky e23be09e66 [InstCombine] Reorder recently added/improved pow transformations
Changed cases are now faster with exp2.

llvm-svn: 365758
2019-07-11 10:55:04 +00:00
Huihui Zhang 51f5079191 [InstCombine][NFCI] Add test coverage to onehot_merge.ll
Prep work for upcoming patch D64275.

llvm-svn: 365729
2019-07-11 04:56:37 +00:00
Johannes Doerfert 3ed286a388 Replace three "strip & accumulate" implementations with a single one
This patch replaces the three almost identical "strip & accumulate"
implementations for constant pointer offsets with a single one,
combining the respective functionalities. The old interfaces are kept
for now.

Differential Revision: https://reviews.llvm.org/D64468

llvm-svn: 365723
2019-07-11 01:14:48 +00:00