Commit Graph

3436 Commits

Author SHA1 Message Date
Sanjay Patel 2e68e4d60e [InstCombine] make fold for icmp with sext more efficient; NFC
We were creating 2 instructions and relying on a subsequent fold
to invert a not(icmp). Create the final icmp directly instead.

llvm-svn: 369411
2019-08-20 17:03:22 +00:00
Sanjay Patel a90ee0eeb6 [InstCombine] improve readability for icmp with cast folds; NFC
1. Update function name and stale code comments.
2. Use variable names that are less ambiguous.
3. Move operand checks into the function as early exits.

llvm-svn: 369390
2019-08-20 14:56:44 +00:00
Sanjay Patel f99d254aae [InstCombine] simplify min/max of min/max with same operands (PR35607)
This is the original integer variant requested in:
https://bugs.llvm.org/show_bug.cgi?id=35607

As noted in the TODO and several similar TODOs around this block,
we could do this in instsimplify, but then it would cost more
because we would be trying to match min/max via ValueTracking
in 2 different places.

There are 4 commuted variants for each of smin/smax/umin/umax
that are not matched here. There are also icmp predicate variants
that are not included in the affected test file because they are
already handled by instsimplify by folding the final icmp to
true/false.

https://rise4fun.com/Alive/3KVc

  Name: smax(smax, smin)
  %c1 = icmp slt i32 %x, %y
  %c2 = icmp slt i32 %y, %x
  %min = select i1 %c1, i32 %x, i32 %y
  %max = select i1 %c2, i32 %x, i32 %y
  %c3 = icmp sgt i32 %max, %min
  %r = select i1 %c3, i32 %max, i32 %min
  =>
  %r = %max

  Name: smin(smax, smin)
  %c1 = icmp slt i32 %x, %y
  %c2 = icmp slt i32 %y, %x
  %min = select i1 %c1, i32 %x, i32 %y
  %max = select i1 %c2, i32 %x, i32 %y
  %c3 = icmp sgt i32 %max, %min
  %r = select i1 %c3, i32 %min, i32 %max
  =>
  %r = %min

  Name: umax(umax, umin)
  %c1 = icmp ult i32 %x, %y
  %c2 = icmp ult i32 %y, %x
  %min = select i1 %c1, i32 %x, i32 %y
  %max = select i1 %c2, i32 %x, i32 %y
  %c3 = icmp ult i32 %min, %max
  %r = select i1 %c3, i32 %max, i32 %min
  =>
  %r = %max

  Name: umin(umax, umin)
  %c1 = icmp ult i32 %x, %y
  %c2 = icmp ult i32 %y, %x
  %min = select i1 %c1, i32 %x, i32 %y
  %max = select i1 %c2, i32 %x, i32 %y
  %c3 = icmp ult i32 %min, %max
  %r = select i1 %c3, i32 %min, i32 %max
  =>
  %r = %min

llvm-svn: 369386
2019-08-20 13:39:17 +00:00
Roman Lebedev 9b957d3321 [InstCombine] Cherry-pick NFC cleanups of foldShiftIntoShiftInAnotherHandOfAndInICmp() from D66383
llvm-svn: 369207
2019-08-18 12:26:33 +00:00
Sanjay Patel a53ad0e157 Revert r367891 - "[InstCombine] combine mul+shl separated by zext"
This reverts commit 5dbb90bfe1.

As noted in the post-commit thread for r367891, this can create
a multiply that is lowered to a libcall that may not exist.

We need to improve the backend decomposition for integer multiply
before trying to re-land this (if it's still worthwhile after
doing the backend work).

llvm-svn: 369174
2019-08-16 23:36:28 +00:00
Sanjay Patel 39eb2324f7 [InstCombine] canonicalize a scalar-select-of-vectors to vector select
This pattern may arise more frequently with an enhancement to SLP vectorization suggested in PR42755:
https://bugs.llvm.org/show_bug.cgi?id=42755
...but we should handle this pattern to make things easier for the backend either way.

For all in-tree targets that I looked at, codegen for typical vector sizes looks better when we change
to a vector select, so this is safe to do without a cost model (in other words, as a target-independent
canonicalization).

For example, if the condition of the select is a scalar, we end up with something like this on x86:

	vpcmpgtd	%xmm0, %xmm1, %xmm0
	vpextrb	$12, %xmm0, %eax
	testb	$1, %al
	jne	LBB0_2
  ## %bb.1:
	vmovaps	%xmm3, %xmm2
  LBB0_2:
	vmovaps	%xmm2, %xmm0

Rather than the splat-condition variant:

	vpcmpgtd	%xmm0, %xmm1, %xmm0
	vpshufd	$255, %xmm0, %xmm0      ## xmm0 = xmm0[3,3,3,3]
	vblendvps	%xmm0, %xmm2, %xmm3, %xmm0

Differential Revision: https://reviews.llvm.org/D66095

llvm-svn: 369140
2019-08-16 18:51:30 +00:00
Roman Lebedev 16244fccfe [InstCombine] Shift amount reassociation in bittest: trunc-of-shl (PR42399)
Summary:
This is continuation of D63829 / https://bugs.llvm.org/show_bug.cgi?id=42399

I thought naive pattern would solve my issue, but nope, it involved truncation,
thus more folds needed.. This isn't really the fold i'm interested in,
i need trunc-of-lshr, but i'we decided to start with `shl` because it's simpler.

In this case, no extra legality checks are needed:
https://rise4fun.com/Alive/CAb

We should be careful about not increasing instruction count,
since we need to produce `zext` because `and` is done in wider type.

Reviewers: spatel, nikic, xbolva00

Reviewed By: spatel

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66057

llvm-svn: 369117
2019-08-16 15:10:41 +00:00
Roman Lebedev 32f1e1a01d [InstCombine] Refactor getFlippedStrictnessPredicateAndConstant() out of canonicalizeCmpWithConstant(), NFCI
I'd like to use it elsewhere, hopefully without reinventing the wheel.
No functional change intended so far.

llvm-svn: 368820
2019-08-14 09:57:20 +00:00
Roman Lebedev 73f702ff19 [InstCombine] Non-canonical clamp-like pattern handling
Summary:
Given a pattern like:
```
%old_cmp1 = icmp slt i32 %x, C2
%old_replacement = select i1 %old_cmp1, i32 %target_low, i32 %target_high
%old_x_offseted = add i32 %x, C1
%old_cmp0 = icmp ult i32 %old_x_offseted, C0
%r = select i1 %old_cmp0, i32 %x, i32 %old_replacement
```
it can be rewritten as more canonical pattern:
```
%new_cmp1 = icmp slt i32 %x, -C1
%new_cmp2 = icmp sge i32 %x, C0-C1
%new_clamped_low = select i1 %new_cmp1, i32 %target_low, i32 %x
%r = select i1 %new_cmp2, i32 %target_high, i32 %new_clamped_low
```
Iff `-C1 s<= C2 s<= C0-C1`
Also, `ULT` predicate can also be `UGE`; or `UGT` iff `C0 != -1` (+invert result)
Also, `SLT` predicate can also be `SGE`; or `SGT` iff `C2 != INT_MAX` (+invert result)

If `C1 == 0`, then all 3 instructions must be one-use; else at most either `%old_cmp1` or `%old_x_offseted` can have extra uses.
NOTE: if we could reuse `%old_cmp1` as one of the comparisons we'll have to build, this could be less limiting.

So there are two icmp's, each one with 3 predicate variants, so there are 9 fold variants:

|     | ULT                            | UGE                             | UGT                             |
| SLT | https://rise4fun.com/Alive/yIJ | https://rise4fun.com/Alive/5BfN | https://rise4fun.com/Alive/INH  |
| SGE | https://rise4fun.com/Alive/hd8 | https://rise4fun.com/Alive/Abk  | https://rise4fun.com/Alive/PlzS |
| SGT | https://rise4fun.com/Alive/VYG | https://rise4fun.com/Alive/oMY  | https://rise4fun.com/Alive/KrzC |
{F9730206}

This fold was brought up in https://reviews.llvm.org/D65148#1603922 by @dmgreen, and is needed to unblock that patch.
This patch requires D65530.

Reviewers: spatel, nikic, xbolva00, dmgreen

Reviewed By: spatel

Subscribers: hiraditya, llvm-commits, dmgreen

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65765

llvm-svn: 368687
2019-08-13 12:49:28 +00:00
Roman Lebedev 0410489a34 [InstCombine][NFC] Rename IsFreeToInvert() -> isFreeToInvert() for consistency
As per https://reviews.llvm.org/D65530#inline-592325

llvm-svn: 368686
2019-08-13 12:49:16 +00:00
Roman Lebedev 2635c324da [InstCombine] foldXorOfICmps(): don't give up on non-single-use ICmp's if all users are freely invertible
Summary:
This is rather unconventional..

As the comment there says, we don't have much folds for xor-of-icmps,
we try to turn them into an and-of-icmps, for which we have plenty of folds.
But if the ICmp we need to invert is not single-use - we give up.

As discussed in https://reviews.llvm.org/D65148#1603922,
we may have a non-canonical CLAMP pattern, with bit match and
select-of-threshold that we'll potentially clamp.
As it can be seen in `canonicalize-clamp-with-select-of-constant-threshold-pattern.ll`,
out of all 8 variations of the pattern, only two are **not** canonicalized into
the variant with and+icmp instead of bit math.
The reason is because the ICmp we need to invert is not single-use - we give up.

We indeed can't perform this fold at will, the general rule is that
we should not increase instruction count in InstCombine,

But we wouldn't end up increasing instruction count if we can adapt every other
user to the inverted value. This way the `not` we create **will** get folded,
and in the end the instruction count did not increase.

For that, of course, we need to look at the users of a Value,
which is again rather unconventional for InstCombine :S

Thus i'm proposing to be a little bit more insistive in `foldXorOfICmps()`.
The alternatives would be to not create that `not`, but add duplicate code to
manually invert all users; or to add some even less general combine to handle
some more specific pattern[s].

Reviewers: spatel, nikic, RKSimon, craig.topper

Reviewed By: spatel

Subscribers: hiraditya, jdoerfert, dmgreen, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65530

llvm-svn: 368685
2019-08-13 12:49:06 +00:00
David Bolvansky 20d37fab82 [InstCombine] x /c fabs(x) -> copysign(1.0, x)
Summary:
x / fabs(x) -> copysign(1.0, x)
fabs(x) / x -> copysign(1.0, x)

Reviewers: spatel, foad, RKSimon, efriedma

Reviewed By: spatel

Subscribers: lebedev.ri, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65898

llvm-svn: 368570
2019-08-12 13:43:35 +00:00
Roman Lebedev ccdad6ef48 [InstCombine] foldShiftIntoShiftInAnotherHandOfAndInICmp(): avoid constantexpr pitfail (PR42962)
Instead of matching value and then blindly casting to BinaryOperator
just to get the opcode, just match instruction and do no cast.

Fixes https://bugs.llvm.org/show_bug.cgi?id=42962

llvm-svn: 368554
2019-08-12 11:28:02 +00:00
Roman Lebedev 96474d17c6 [InstCombine][NFC] Use SimplifyAddInst() instead of SimplifyBinOp(Instruction::BinaryOps::Add, )
llvm-svn: 368521
2019-08-10 19:29:10 +00:00
Roman Lebedev a8d20b4467 [InstCombine] Shift amount reassociation in bittest: relax one-use check when shifting constant
If one of the values being shifted is a constant, since the new shift
amount is known-constant, the new shift will end up being constant-folded
so, we don't need that one-use restriction then.

llvm-svn: 368519
2019-08-10 19:28:54 +00:00
Roman Lebedev 64fe806c4e [InstCombine] Shift amount reassociation in bittest: drop pointless one-use restriction
That one-use restriction is not needed for correctness - we have already
ensured that one of the shifts will go away, so we know we won't increase
the instruction count. So there is no need for that restriction.

llvm-svn: 368518
2019-08-10 19:28:44 +00:00
Evandro Menezes c6c00cdf2e [Transforms] Rename hasUnaryFloatFn() and getUnaryFloatFn() (NFC)
Rename `hasUnaryFloatFn()` to `hasFloatFn()` and `getUnaryFloatFn()` to `getFloatFnName()`.

llvm-svn: 368449
2019-08-09 16:04:18 +00:00
Jay Foad 8e8b295835 [InstCombine] Propagate fast math flags through selects
Summary:
In SimplifySelectsFeedingBinaryOp, propagate fast math flags from the
outer op into both arms of the new select, to take advantage of
simplifications that require fast math flags.

Reviewers: mcberg2017, majnemer, spatel, arsenm, xbolva00

Subscribers: wdng, javed.absar, kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65658

llvm-svn: 368175
2019-08-07 15:16:28 +00:00
Roman Lebedev 9bece444dd [InstCombine] Recommit: Shift amount reassociation: shl-trunc-shl pattern
This was initially committed in r368059 but got reverted in r368084
because there was a faulty logic in how the shift amounts type mismatch
was being handled (it simply wasn't).

I've added an explicit bailout before we SimplifyAddInst() - i don't think
it's designed in general to handle differently-typed values, even though
the actual problem only comes from ConstantExpr's.

I have also changed the common type deduction, to not just blindly
look past zext, but try to do that so that in the end types match.

Differential Revision: https://reviews.llvm.org/D65380

llvm-svn: 368141
2019-08-07 09:41:50 +00:00
Reid Kleckner e4bd38478b Revert [InstCombine] Shift amount reassociation: shl-trunc-shl pattern
This reverts r368059 (git commit 0f95710976)

This caused Clang to assert while self-hosting and compiling
SystemZInstrInfo.cpp. Reduction is running.

llvm-svn: 368084
2019-08-06 20:32:07 +00:00
Roman Lebedev 0f95710976 [InstCombine] Shift amount reassociation: shl-trunc-shl pattern
Summary:
Currently `reassociateShiftAmtsOfTwoSameDirectionShifts()` only handles
two shifts one after another. If the shifts are `shl`, we still can
easily perform the fold, with no extra legality checks:
https://rise4fun.com/Alive/OQbM

If we have right-shift however, we won't be able to make it
any simpler than it already is.

After this the only thing missing here is constant-folding: (`NewShAmt >= bitwidth(X)`)
* If it's a logical shift, then constant-fold to `0` (not `undef`)
* If it's a `ashr`, then a splat of original signbit
https://rise4fun.com/Alive/E1K
https://rise4fun.com/Alive/i0V

Reviewers: spatel, nikic, xbolva00

Reviewed By: spatel

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65380

llvm-svn: 368059
2019-08-06 17:03:40 +00:00
Sanjay Patel 5dbb90bfe1 [InstCombine] combine mul+shl separated by zext
This appears to slightly help patterns similar to what's
shown in PR42874:
https://bugs.llvm.org/show_bug.cgi?id=42874
...but not in the way requested.

That fix will require some later IR and/or backend pass to
decompose multiply/shifts into something more optimal per
target. Those transforms already exist in some basic forms,
but probably need enhancing to catch more cases.

https://rise4fun.com/Alive/Qzv2

llvm-svn: 367891
2019-08-05 16:59:58 +00:00
Sanjay Patel 1a29823b9c [InstCombine] add extra use constraint for shl-zext fold
As the test shows, we can end up with more instructions than
we started with if we don't include the extra-use check.

llvm-svn: 367880
2019-08-05 16:04:07 +00:00
Sanjay Patel 9ce5f41851 [InstCombine] fold cmp+select using select operand equivalence
As discussed in PR42696:
https://bugs.llvm.org/show_bug.cgi?id=42696
...but won't help that case yet.

We have an odd situation where a select operand equivalence fold was
implemented in InstSimplify when it could have been done more generally
in InstCombine if we allow dropping of {nsw,nuw,exact} from a binop operand.

Here's an example:
https://rise4fun.com/Alive/Xplr

  %cmp = icmp eq i32 %x, 2147483647
  %add = add nsw i32 %x, 1
  %sel = select i1 %cmp, i32 -2147483648, i32 %add
  =>
  %sel = add i32 %x, 1

I've left the InstSimplify code in place for now, but my guess is that we'd
prefer to remove that as a follow-up to save on code duplication and
compile-time.

Differential Revision: https://reviews.llvm.org/D65576

llvm-svn: 367695
2019-08-02 17:39:32 +00:00
Roman Lebedev 0efeaa8162 [IR] SelectInst: add swapValues() utility
Summary:
Sometimes we need to swap true-val and false-val of a `SelectInst`.
Having a function for that is nicer than hand-writing it each time.

Reviewers: spatel, RKSimon, craig.topper, jdoerfert

Reviewed By: jdoerfert

Subscribers: jdoerfert, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65520

llvm-svn: 367547
2019-08-01 12:31:35 +00:00
Sanjay Patel 435cdecdf7 [InstCombine] canonicalize fneg before fmul/fdiv
Reverse the canonicalization of fneg relative to fmul/fdiv. That makes it
easier to implement the transforms (and possibly other fneg transforms) in
1 place because we can always start the pattern match from fneg (either the
legacy binop or the new unop).

There's a secondary practical benefit seen in PR21914 and PR42681:
https://bugs.llvm.org/show_bug.cgi?id=21914
https://bugs.llvm.org/show_bug.cgi?id=42681
...hoisting fneg rather than sinking seems to play nicer with LICM in IR
(although this change may expose analysis holes in the other direction).

1. The instcombine test changes show the expected neutral IR diffs from
   reversing the order.

2. The reassociation tests show that we were missing an optimization
   opportunity to fold away fneg-of-fneg. My reading of IEEE-754 says
   that all of these transforms are allowed (regardless of binop/unop
   fneg version) because:

   "For all other operations [besides copy/abs/negate/copysign], this
   standard does not specify the sign bit of a NaN result."
   In all of these transforms, we always have some other binop
   (fadd/fsub/fmul/fdiv), so we are free to flip the sign bit of a
   potential intermediate NaN operand.
   (If that interpretation is wrong, then we must already have a bug in
   the existing transforms?)

3. The clang tests shouldn't exist as-is, but that's effectively a
   revert of rL367149 (the test broke with an extension of the
   pre-existing fneg canonicalization in rL367146).

Differential Revision: https://reviews.llvm.org/D65399

llvm-svn: 367447
2019-07-31 16:53:22 +00:00
Roman Lebedev be612ea471 [InstCombine] Fold "x ?% y ==/!= 0" to "x & (y-1) ==/!= 0" iff y is power-of-two
Summary:
I have stumbled into this by accident while preparing to extend backend `x s% C ==/!= 0` handling.

While we did happen to handle this fold in most of the cases,
the folding is indirect - we fold `x u% y` to `x & (y-1)` (iff `y` is power-of-two),
or first turn `x s% -y` to `x u% y`; that does handle most of the cases.
But we can't turn `x s% INT_MIN` to `x u% -INT_MIN`,
and thus we end up being stuck with `(x s% INT_MIN) == 0`.

There is no such restriction for the more general fold:
https://rise4fun.com/Alive/IIeS

To be noted, the fold does not enforce that `y` is a constant,
so it may indeed increase instruction count.
This is consistent with what `x u% y`->`x & (y-1)` already does.
I think it makes sense, it's at most one (simple) extra instruction,
while `rem`ainder is really much more un-simple (and likely **very** costly).

Reviewers: spatel, RKSimon, nikic, xbolva00, craig.topper

Reviewed By: RKSimon

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65046

llvm-svn: 367322
2019-07-30 15:28:22 +00:00
Sanjay Patel e9ee7b47d4 [InstCombine] fold fadd+fneg with fdiv/fmul betweena
The backend already does this via isNegatibleForFree(),
but we may want to alter the fneg IR canonicalizations
that currently exist, so we need to try harder to fold
fneg in IR to avoid regressions.

llvm-svn: 367227
2019-07-29 13:50:25 +00:00
Sanjay Patel 5483f4225e [InstCombine] reduce code for fadd with fneg operand; NFC
llvm-svn: 367224
2019-07-29 13:20:46 +00:00
Sanjay Patel 99c57c6daf [InstCombine] fold fsub+fneg with fdiv/fmul between
The backend already does this via isNegatibleForFree(),
but we may want to alter the fneg IR canonicalizations
that currently exist, so we need to try harder to fold
fneg in IR to avoid regressions.

llvm-svn: 367194
2019-07-28 17:10:06 +00:00
Sanjay Patel a9ab31558c [InstCombine] canonicalize negated operand of fdiv
This is a transform that we use with fmul, so use
it for fdiv too for consistency.

llvm-svn: 367146
2019-07-26 19:56:59 +00:00
Sanjay Patel c229cfeb7a [InstCombine] remove flop from lerp patterns
(Y * (1.0 - Z)) + (X * Z) -->
Y - (Y * Z) + (X * Z) -->
Y + Z * (X - Y)

This is part of solving:
https://bugs.llvm.org/show_bug.cgi?id=42716

Factoring eliminates an instruction, so that should be a good canonicalization.
The potential conversion to FMA would be handled by the backend based on target
capabilities.

Differential Revision: https://reviews.llvm.org/D65305

llvm-svn: 367101
2019-07-26 11:19:18 +00:00
Vlad Tsyrklevich 5d5a58317c Revert "[InstCombine] try to narrow a truncated load"
This reverts commit bc4a63fd3c, this is a
speculative revert to fix a number of sanitizer bots (like
sanitizer-x86_64-linux-bootstrap-ubsan) that have started to see stage2
compiler crashes, presumably due to a miscompile.

llvm-svn: 367029
2019-07-25 15:37:57 +00:00
Sanjay Patel bc4a63fd3c [InstCombine] try to narrow a truncated load
trunc (load X) --> load (bitcast X to narrow type)

We have this transform in DAGCombiner::ReduceLoadWidth(), but the truncated
load pattern can interfere with other instcombine transforms, so I'd like to
allow the fold sooner.

Example:
https://bugs.llvm.org/show_bug.cgi?id=16739
...in that report, we have bitcasts bracketing these ops, so those could get
eliminated too.

We've generally ruled out widening of loads early in IR ( LoadCombine -
http://lists.llvm.org/pipermail/llvm-dev/2016-September/105291.html ), but
that reasoning may not apply to narrowing if we can preserve information
such as the dereferenceable range.

Differential Revision: https://reviews.llvm.org/D64432

llvm-svn: 367011
2019-07-25 12:14:27 +00:00
Sanjay Patel 86e9f9dc26 [Transforms] move copying of load metadata to helper function; NFC
There's another proposed load combine that can make use of this code
in D64432.

llvm-svn: 366949
2019-07-24 22:11:11 +00:00
Craig Topper e9abc8177a [InstCombine] Teach foldOrOfICmps to allow icmp eq MIN_INT/MAX to be part of a range comparision. Similar for foldAndOfICmps
We can treat icmp eq X, MIN_UINT as icmp ule X, MIN_UINT and allow
it to merge with icmp ugt X, C. Similar for the other constants.

We can do simliar for icmp ne X, (U)INT_MIN/MAX in foldAndOfICmps. And we already handled UINT_MIN there.

Fixes PR42691.

Differential Revision: https://reviews.llvm.org/D65017

llvm-svn: 366945
2019-07-24 20:57:29 +00:00
Roman Lebedev 3a94765bfc [NFC][PatternMatch] Refactor code into a proper "matcher for any integral constant"
Having it as a proper matcher is better for reusability elsewhere
(in a follow-up patch.)

llvm-svn: 366752
2019-07-22 22:09:24 +00:00
Craig Topper e6cd20ba53 [InstCombine] Update comment I missed in r366649. NFC
llvm-svn: 366658
2019-07-21 16:15:03 +00:00
Craig Topper 1d149d08d3 [InstCombine] Remove insertRangeTest code that handles the equality case.
For equality, the function called getTrue/getFalse with the VT
of the comparison input. But getTrue/getFalse need the boolean VT.
So if this code ever executed, it would assert.

I believe these cases are removed by InstSimplify so we don't get here.

So this patch just fixes up an assert to exclude the equality
possibility and removes the broken code.

llvm-svn: 366649
2019-07-21 06:43:38 +00:00
Craig Topper 8fabdfe9fc [InstCombine] Don't use AddOne/SubOne to see if two APInts are 1 apart. Use APInt operations instead. NFCI
AddOne/SubOne create new Constant objects. That seems heavy for
comparing ConstantInts which wrap APInts. Just do the math on
on the APInts and compare them.

llvm-svn: 366648
2019-07-21 05:26:05 +00:00
Roman Lebedev f2eb403144 [InstCombine] Dropping redundant masking before left-shift [5/5] (PR42563)
Summary:
If we have some pattern that leaves only some low bits set, and then performs
left-shift of those bits, if none of the bits that are left after the final
shift are modified by the mask, we can omit the mask.

There are many variants to this pattern:
f. `((x << MaskShAmt) a>> MaskShAmt) << ShiftShAmt`
All these patterns can be simplified to just:
`x << ShiftShAmt`
iff:
f. `(ShiftShAmt-MaskShAmt) s>= 0` (i.e. `ShiftShAmt u>= MaskShAmt`)

Normally, the inner pattern is sign-extend,
but for our purposes it's no different to other patterns:

alive proofs:
f: https://rise4fun.com/Alive/7U3

For now let's start with patterns where both shift amounts are variable,
with trivial constant "offset" between them, since i believe this is
both simplest to handle and i think this is most common.
But again, there are likely other variants where we could use
ValueTracking/ConstantRange to handle more cases.

https://bugs.llvm.org/show_bug.cgi?id=42563

Differential Revision: https://reviews.llvm.org/D64524

llvm-svn: 366540
2019-07-19 08:26:58 +00:00
Roman Lebedev 441c9d6ca8 [InstCombine] Dropping redundant masking before left-shift [4/5] (PR42563)
Summary:
If we have some pattern that leaves only some low bits set, and then performs
left-shift of those bits, if none of the bits that are left after the final
shift are modified by the mask, we can omit the mask.

There are many variants to this pattern:
e. `((x << MaskShAmt) l>> MaskShAmt) << ShiftShAmt`
All these patterns can be simplified to just:
`x << ShiftShAmt`
iff:
e. `(ShiftShAmt-MaskShAmt) s>= 0` (i.e. `ShiftShAmt u>= MaskShAmt`)

alive proofs:
e: https://rise4fun.com/Alive/0FT

For now let's start with patterns where both shift amounts are variable,
with trivial constant "offset" between them, since i believe this is
both simplest to handle and i think this is most common.
But again, there are likely other variants where we could use
ValueTracking/ConstantRange to handle more cases.

https://bugs.llvm.org/show_bug.cgi?id=42563

Differential Revision: https://reviews.llvm.org/D64521

llvm-svn: 366539
2019-07-19 08:26:47 +00:00
Roman Lebedev 3c212ce305 [InstCombine] Dropping redundant masking before left-shift [3/5] (PR42563)
Summary:
If we have some pattern that leaves only some low bits set, and then performs
left-shift of those bits, if none of the bits that are left after the final
shift are modified by the mask, we can omit the mask.

There are many variants to this pattern:
d. `(x & ((-1 << MaskShAmt) >> MaskShAmt)) << ShiftShAmt`
All these patterns can be simplified to just:
`x << ShiftShAmt`
iff:
d. `(ShiftShAmt-MaskShAmt) s>= 0` (i.e. `ShiftShAmt u>= MaskShAmt`)

alive proofs:
d: https://rise4fun.com/Alive/I5Y

For now let's start with patterns where both shift amounts are variable,
with trivial constant "offset" between them, since i believe this is
both simplest to handle and i think this is most common.
But again, there are likely other variants where we could use
ValueTracking/ConstantRange to handle more cases.

https://bugs.llvm.org/show_bug.cgi?id=42563

Differential Revision: https://reviews.llvm.org/D64519

llvm-svn: 366538
2019-07-19 08:26:37 +00:00
Roman Lebedev 2ebe57386d [InstCombine] Dropping redundant masking before left-shift [2/5] (PR42563)
Summary:
If we have some pattern that leaves only some low bits set, and then performs
left-shift of those bits, if none of the bits that are left after the final
shift are modified by the mask, we can omit the mask.

There are many variants to this pattern:
c. `(x & (-1 >> MaskShAmt)) << ShiftShAmt`
All these patterns can be simplified to just:
`x << ShiftShAmt`
iff:
c. `(ShiftShAmt-MaskShAmt) s>= 0` (i.e. `ShiftShAmt u>= MaskShAmt`)

alive proofs:
c: https://rise4fun.com/Alive/RgJh

For now let's start with patterns where both shift amounts are variable,
with trivial constant "offset" between them, since i believe this is
both simplest to handle and i think this is most common.
But again, there are likely other variants where we could use
ValueTracking/ConstantRange to handle more cases.

https://bugs.llvm.org/show_bug.cgi?id=42563

Differential Revision: https://reviews.llvm.org/D64517

llvm-svn: 366537
2019-07-19 08:26:25 +00:00
Roman Lebedev 4422a1657c [InstCombine] Dropping redundant masking before left-shift [1/5] (PR42563)
Summary:
If we have some pattern that leaves only some low bits set, and then performs
left-shift of those bits, if none of the bits that are left after the final
shift are modified by the mask, we can omit the mask.

There are many variants to this pattern:
b. `(x & (~(-1 << maskNbits))) << shiftNbits`
All these patterns can be simplified to just:
`x << ShiftShAmt`
iff:
b. `(MaskShAmt+ShiftShAmt) u>= bitwidth(x)`

alive proof:
b: https://rise4fun.com/Alive/y8M

For now let's start with patterns where both shift amounts are variable,
with trivial constant "offset" between them, since i believe this is
both simplest to handle and i think this is most common.
But again, there are likely other variants where we could use
ValueTracking/ConstantRange to handle more cases.

https://bugs.llvm.org/show_bug.cgi?id=42563

Differential Revision: https://reviews.llvm.org/D64514

llvm-svn: 366536
2019-07-19 08:26:13 +00:00
Roman Lebedev a5f0824eb5 [InstCombine] Dropping redundant masking before left-shift [0/5] (PR42563)
Summary:
If we have some pattern that leaves only some low bits set, and then performs
left-shift of those bits, if none of the bits that are left after the final
shift are modified by the mask, we can omit the mask.

There are many variants to this pattern:
a. `(x & ((1 << MaskShAmt) - 1)) << ShiftShAmt`
All these patterns can be simplified to just:
`x << ShiftShAmt`
iff:
a. `(MaskShAmt+ShiftShAmt) u>= bitwidth(x)`

alive proof:
a: https://rise4fun.com/Alive/wi9

Indeed, not all of these patterns are canonical.
But since this fold will only produce a single instruction
i'm really interested in handling even uncanonical patterns,
since i have this general kind of pattern in hotpaths,
and it is not totally outlandish for bit-twiddling code.

For now let's start with patterns where both shift amounts are variable,
with trivial constant "offset" between them, since i believe this is
both simplest to handle and i think this is most common.
But again, there are likely other variants where we could use
ValueTracking/ConstantRange to handle more cases.

https://bugs.llvm.org/show_bug.cgi?id=42563

Reviewers: spatel, nikic, huihuiz, xbolva00

Reviewed By: xbolva00

Subscribers: efriedma, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64512

llvm-svn: 366535
2019-07-19 08:25:43 +00:00
Rui Ueyama 49a3ad21d6 Fix parameter name comments using clang-tidy. NFC.
This patch applies clang-tidy's bugprone-argument-comment tool
to LLVM, clang and lld source trees. Here is how I created this
patch:

$ git clone https://github.com/llvm/llvm-project.git
$ cd llvm-project
$ mkdir build
$ cd build
$ cmake -GNinja -DCMAKE_BUILD_TYPE=Debug \
    -DLLVM_ENABLE_PROJECTS='clang;lld;clang-tools-extra' \
    -DCMAKE_EXPORT_COMPILE_COMMANDS=On -DLLVM_ENABLE_LLD=On \
    -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ ../llvm
$ ninja
$ parallel clang-tidy -checks='-*,bugprone-argument-comment' \
    -config='{CheckOptions: [{key: StrictMode, value: 1}]}' -fix \
    ::: ../llvm/lib/**/*.{cpp,h} ../clang/lib/**/*.{cpp,h} ../lld/**/*.{cpp,h}

llvm-svn: 366177
2019-07-16 04:46:31 +00:00
David Bolvansky 9f0d718c66 [InstCombine] Disable fold from D64285 for non-integer types
llvm-svn: 365959
2019-07-12 21:14:21 +00:00
David Bolvansky af1b3185f5 [InstCombine] Fold select (icmp sgt x, -1), lshr (X, Y), ashr (X, Y) to ashr (X, Y))
Summary:
(select (icmp sgt x, -1), lshr (X, Y), ashr (X, Y)) -> ashr (X, Y))
(select (icmp slt x, 1), ashr (X, Y), lshr (X, Y)) -> ashr (X, Y))

Fixes PR41173

Alive proof by @lebedev.ri (thanks)
Name: PR41173
  %cmp = icmp slt i32 %x, 1
  %shr = lshr i32 %x, %y
  %shr1 = ashr i32 %x, %y
  %retval.0 = select i1 %cmp, i32 %shr1, i32 %shr
  =>
  %retval.0 = ashr i32 %x, %y

Optimization: PR41173
Done: 1
Optimization is correct!

Reviewers: lebedev.ri, spatel

Reviewed By: lebedev.ri

Subscribers: nikic, craig.topper, llvm-commits, lebedev.ri

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64285

llvm-svn: 365893
2019-07-12 11:31:16 +00:00
Sanjay Patel 3487791fea [InstCombine] don't move FP negation out of a constant expression
-(X * ConstExpr) becomes X * (-ConstExpr), so don't reverse that
and infinite loop.

llvm-svn: 365774
2019-07-11 13:44:29 +00:00
Roman Lebedev c5f92bd67b [PatternMatch] Generalize m_SpecificInt_ULT() to take ICmpInst::Predicate
As discussed in the original review, this may be useful,
so let's just do it.

llvm-svn: 365652
2019-07-10 16:07:35 +00:00
Tim Northover 60afa49abe OpaquePtr: add Type parameter to Loads analysis API.
This makes the functions in Loads.h require a type to be specified
independently of the pointer Value so that when pointers have no structure
other than address-space, it can still do its job.

Most callers had an obvious memory operation handy to provide this type, but a
SROA and ArgumentPromotion were doing more complicated analysis. They get
updated to merge the properties of the various instructions they were
considering.

llvm-svn: 365468
2019-07-09 11:35:35 +00:00
Sanjay Patel 3dee113ebc [InstCombine] fold insertelement into splat of same scalar
Forming the canonical splat shuffle improves analysis and
may allow follow-on transforms (although some possibilities
are missing as shown in the test diffs).

The backend generically turns these patterns into build_vector,
so there should be no codegen regressions. All targets are
expected to be able to lower splats efficiently.

llvm-svn: 365379
2019-07-08 19:48:52 +00:00
Sanjay Patel 0b59103a73 [InstCombine] canonicalize insert+splat to/from element 0 of vector
We recognize a splat from element 0 in (VectorUtils) llvm::getSplatValue()
and also in ShuffleVectorInst::isZeroEltSplatMask(), so this converts
to that form for better matching.

The backend generically turns these patterns into build_vector,
so there should be no codegen difference.

llvm-svn: 365342
2019-07-08 16:26:48 +00:00
Sanjay Patel 75b5edf6a1 [InstCombine] allow undef elements when forming splat from chain of insertelements
We allow forming a splat (broadcast) shuffle, but we were conservatively limiting
that to cases where all elements of the vector are specified. It should be safe
from a codegen perspective to allow undefined lanes of the vector because the
expansion of a splat shuffle would become the chain of inserts again.

Forming splat shuffles can reduce IR and help enable further IR transforms.
Motivating bugs:
https://bugs.llvm.org/show_bug.cgi?id=42174
https://bugs.llvm.org/show_bug.cgi?id=16739

Differential Revision: https://reviews.llvm.org/D63848

llvm-svn: 365147
2019-07-04 16:45:34 +00:00
Roman Lebedev 9f0c83902d [InstCombine] Y - ~X --> X + Y + 1 fold (PR42457)
Summary:
I *think* we'd want this new variant, because we obviously
have better handling for `add` as compared to `sub`/`not`.

https://rise4fun.com/Alive/WMn

Fixes [[ https://bugs.llvm.org/show_bug.cgi?id=42457 | PR42457 ]]

Reviewers: spatel, nikic, huihuiz, efriedma

Reviewed By: spatel

Subscribers: RKSimon, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63992

llvm-svn: 365011
2019-07-03 09:41:50 +00:00
Roman Lebedev 0bde7c6527 [InstCombine] Shift amount reassociation: fixup constantexpr handling (PR42484)
I was actually wondering if there was some nicer way than m_Value()+cast,
but apparently what i was really "subconsciously" thinking about
was correctness issue.

hasNoUnsignedWrap()/hasNoUnsignedWrap() exist for Instruction,
not for BinaryOperator, so let's just use m_Instruction(),
thus both avoiding a cast, and a crash.

Fixes https://bugs.llvm.org/show_bug.cgi?id=42484,
      https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=15587

llvm-svn: 364915
2019-07-02 12:54:48 +00:00
Sanjay Patel ddc1b40f26 [InstCombine] reduce more checks for power-of-2-or-zero using ctpop
Extends the transform from:
rL364341
...to include another (more common?) pattern that tests whether a
value is a power-of-2 (including or excluding zero).

llvm-svn: 364856
2019-07-01 22:00:00 +00:00
Roman Lebedev 04d3d3bbff [InstCombine] (Y + ~X) + 1 --> Y - X fold (PR42459)
Summary:
To be noted, this pattern is not unhandled by instcombine per-se,
it is somehow does end up being folded when one runs opt -O3,
but not if it's just -instcombine. Regardless, that fold is
indirect, depends on some other folds, and is thus blind
when there are extra uses.

This does address the regression being exposed in D63992.

https://godbolt.org/z/7DGltU
https://rise4fun.com/Alive/EPO0

Fixes [[ https://bugs.llvm.org/show_bug.cgi?id=42459 | PR42459 ]]

Reviewers: spatel, nikic, huihuiz

Reviewed By: spatel

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63993

llvm-svn: 364792
2019-07-01 15:55:24 +00:00
Roman Lebedev 72b8d41ce8 [InstCombine] Shift amount reassociation in bittest (PR42399)
Summary:
Given pattern:
`icmp eq/ne (and ((x shift Q), (y oppositeshift K))), 0`
we should move shifts to the same hand of 'and', i.e. rewrite as
`icmp eq/ne (and (x shift (Q+K)), y), 0`  iff `(Q+K) u< bitwidth(x)`

It might be tempting to not restrict this to situations where we know
we'd fold two shifts together, but i'm not sure what rules should there be
to avoid endless combine loops.

We pick the same shift that was originally used to shift the variable we picked to shift:
https://rise4fun.com/Alive/6x1v

Should fix [[ https://bugs.llvm.org/show_bug.cgi?id=42399 | PR42399]].

Reviewers: spatel, nikic, RKSimon

Reviewed By: spatel

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63829

llvm-svn: 364791
2019-07-01 15:55:15 +00:00
Roman Lebedev f55818e3a7 [InstCombine] Omit 'urem' where possible
This was added in D63390 / rL364286 to backend,
but it makes sense to also handle it in middle-end.
https://rise4fun.com/Alive/Zsln

llvm-svn: 364738
2019-07-01 09:41:43 +00:00
Sanjay Patel 706b48251f [InstCombine] canonicalize fcmp+select to minnum/maxnum intrinsics
This is the opposite direction of D62158 (we have to choose 1 form or the other).
Now that we have FMF on the select, this becomes more palatable. And the benefits
of having a single IR instruction for this operation (less chances of missing folds
based on extra uses, etc) overcome my previous comments about the potential advantage
of larger pattern matching/analysis.

Differential Revision: https://reviews.llvm.org/D62414

llvm-svn: 364721
2019-06-30 13:40:31 +00:00
Roman Lebedev e3a94ba4a9 [InstCombine] Shift amount reassociation (PR42391)
Summary:
Given pattern:
`(x shiftopcode Q) shiftopcode K`
we should rewrite it as
`x shiftopcode (Q+K)`  iff `(Q+K) u< bitwidth(x)`
This is valid for any shift, but they must be identical.

* https://rise4fun.com/Alive/9E2
* exact on both lshr => exact https://rise4fun.com/Alive/plHk
* exact on both ashr => exact https://rise4fun.com/Alive/QDAA
* nuw on both shl => nuw https://rise4fun.com/Alive/5Uk
* nsw on both shl => nsw https://rise4fun.com/Alive/0plg

Should fix [[ https://bugs.llvm.org/show_bug.cgi?id=42391 | PR42391]].

Reviewers: spatel, nikic, RKSimon

Reviewed By: nikic

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63812

llvm-svn: 364712
2019-06-29 11:51:50 +00:00
Sanjay Patel 71ad22707c [InstCombine] simplify code for inserts -> splat; NFC
llvm-svn: 364441
2019-06-26 15:52:59 +00:00
Huihui Zhang b90cb57b63 [InstCombine] Simplify icmp ult/uge (shl %x, C2), C1 iff C1 is power of two -> icmp eq/ne (and %x, (lshr -C1, C2)), 0.
Simplify 'shl' inequality test into 'and' equality test.

This pattern happens in the middle-end while simplifying bitfield access,
Exposed in https://reviews.llvm.org/D63505

https://rise4fun.com/Alive/6uz

Reviewers: lebedev.ri, efriedma

Reviewed By: lebedev.ri

Subscribers: spatel, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63675

llvm-svn: 364348
2019-06-25 20:44:52 +00:00
Sanjay Patel fcfa056ceb [InstCombine] reduce checks for power-of-2-or-zero using ctpop
This follows up the transform from rL363956 to use the ctpop intrinsic when checking for power-of-2-or-zero.

This is matching the isPowerOf2() patterns used in PR42314:
https://bugs.llvm.org/show_bug.cgi?id=42314

But there's at least 1 instcombine follow-up needed to match the alternate form:

(v & (v - 1)) == 0;

We should have all of the backend expansions handled with:
rL364319
(x86-specific changes still needed for optimal code based on subtarget)

And the larger patterns to exclude zero as a power-of-2 are joining with this change after:
rL364153 ( D63660 )
rL364246

Differential Revision: https://reviews.llvm.org/D63777

llvm-svn: 364341
2019-06-25 18:51:44 +00:00
Huihui Zhang 4626613ffe [InstCombine] Fold icmp eq/ne (and %x, C), 0 iff (-C) is power of two -> %x u</u>= (-C) earlier.
Summary:
To generate simplified IR, make sure fold
  (X & ~C) ==/!= 0 --> X u</u>= C+1

is scheduled before fold
  ((X << Y) & C) == 0 -> (X & (C >> Y)) == 0.

https://rise4fun.com/Alive/7ZN

Reviewers: lebedev.ri, efriedma, spatel, craig.topper

Reviewed By: lebedev.ri

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63505

llvm-svn: 364255
2019-06-25 00:09:10 +00:00
Sanjay Patel 2675b0c8ab [InstCombine] squash is-not-power-of-2 using ctpop
This is the Demorgan'd 'not' of the pattern handled in:
D63660 / rL364153

This is another intermediate IR step towards solving PR42314:
https://bugs.llvm.org/show_bug.cgi?id=42314

We can test if a value is not a power-of-2 using ctpop(X) > 1,
so combining that with an is-zero check of the input is the
same as testing if not exactly 1 bit is set:

(X == 0) || (ctpop(X) u> 1) --> ctpop(X) != 1

llvm-svn: 364246
2019-06-24 22:35:26 +00:00
Matt Arsenault 8025842599 InstCombine: Preserve nuw when reassociating nuw ops [3/3]
Alive says this is OK.

llvm-svn: 364235
2019-06-24 21:37:03 +00:00
Matt Arsenault 5d82ecd5d9 InstCombine: Preserve nuw when reassociating nuw ops [2/3]
Alive says this is OK.

llvm-svn: 364234
2019-06-24 21:37:02 +00:00
Matt Arsenault 5a89ba7343 InstCombine: Preserve nuw when reassociating nuw ops [1/3]
Alive says this is OK.

llvm-svn: 364233
2019-06-24 21:36:59 +00:00
Sanjay Patel 89efefb170 [InstCombine] reduce funnel-shift i16 X, X, 8 to bswap X
Prefer the more exact intrinsic to remove a use of the input value
and possibly make further transforms easier (we will still need
to match patterns with funnel-shift of wider types as pieces of
bswap, especially if we want to canonicalize to funnel-shift with
constant shift amount). Discussed in D46760.

llvm-svn: 364187
2019-06-24 15:20:49 +00:00
Simon Pilgrim b617b0808d [InstCombine] SliceUpIllegalIntegerPHI - bail on out of range shifts
trunc(lshr) handling - if the shift is out of range (undefined) then bail like we do for non-constant shifts.

Fixes OSS Fuzz #15217

llvm-svn: 364181
2019-06-24 13:13:36 +00:00
Sanjay Patel 13a5ae58fc [InstCombine] squash is-power-of-2 that uses ctpop
This is another intermediate IR step towards solving PR42314:
https://bugs.llvm.org/show_bug.cgi?id=42314

We can test if a value is power-of-2-or-0 using ctpop(X) < 2,
so combining that with a non-zero check of the input is the
same as testing if exactly 1 bit is set:

(X != 0) && (ctpop(X) u< 2) --> ctpop(X) == 1

Differential Revision: https://reviews.llvm.org/D63660

llvm-svn: 364153
2019-06-23 14:22:37 +00:00
David Bolvansky dbcdad51ff [InstCombine] (1 << (C - x)) -> ((1 << C) >> x) if C is bitwidth - 1
Summary:
```
%a = sub i32 31, %x
%r = shl i32 1, %a
  =>
%d = shl i32 1, 31
%r = lshr i32 %d, %x

Done: 1
Optimization is correct!
```

https://rise4fun.com/Alive/btZm

Reviewers: spatel, lebedev.ri, nikic

Reviewed By: lebedev.ri

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63652

llvm-svn: 364073
2019-06-21 16:25:32 +00:00
David Bolvansky 4b28478389 [InstCombine] cttz(abs(x)) -> cttz(x)
Summary: Signedness does not change number of trailing zeros.

Reviewers: spatel, lebedev.ri, nikic

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D63546

llvm-svn: 364064
2019-06-21 15:26:22 +00:00
Sanjay Patel 273d97e6bf [InstCombine] fix typo in comment; NFC
llvm-svn: 363974
2019-06-20 20:23:32 +00:00
Sanjay Patel 63311bfb83 [InstCombine] canonicalize check for power-of-2
The form that compares against 0 is better because:
1. It removes a use of the input value.
2. It's the more standard form for this pattern: https://graphics.stanford.edu/~seander/bithacks.html#DetermineIfPowerOf2
3. It results in equal or better codegen (tested with x86, AArch64, ARM, PowerPC, MIPS).

This is a root cause for PR42314, but probably doesn't completely answer the codegen request:
https://bugs.llvm.org/show_bug.cgi?id=42314

Alive proof:
https://rise4fun.com/Alive/9kG

  Name: is power-of-2
  %neg = sub i32 0, %x
  %a = and i32 %neg, %x
  %r = icmp eq i32 %a, %x
  =>
  %dec = add i32 %x, -1
  %a2 = and i32 %dec, %x
  %r = icmp eq i32 %a2, 0

  Name: is not power-of-2
  %neg = sub i32 0, %x
  %a = and i32 %neg, %x
  %r = icmp ne i32 %a, %x
  =>
  %dec = add i32 %x, -1
  %a2 = and i32 %dec, %x
  %r = icmp ne i32 %a2, 0

llvm-svn: 363956
2019-06-20 17:41:15 +00:00
David Bolvansky 01511192b2 [InstCombine] cttz(-x) -> cttz(x)
Summary: Signedness does not change number of trailing zeros.

Reviewers: spatel, lebedev.ri, nikic

Reviewed By: spatel

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63534

llvm-svn: 363951
2019-06-20 17:04:14 +00:00
Huihui Zhang 670778c762 [InstCombine] Fold icmp eq/ne (and %x, signbit), 0 -> %x s>=/s< 0 earlier
Summary:
To generate simplified IR, make sure fold
```
  (X & signbit) ==/!= 0) -> X s>=/s< 0;
```
is scheduled before fold
```
  ((X << Y) & C) == 0 -> (X & (C >> Y)) == 0.
```

https://rise4fun.com/Alive/fbdh

Reviewers: lebedev.ri, efriedma, spatel, craig.topper

Reviewed By: lebedev.ri

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63026

llvm-svn: 363845
2019-06-19 17:31:39 +00:00
Matt Arsenault 6d741f29ec AMDGPU: Fold readlane/readfirstlane calls
llvm-svn: 363587
2019-06-17 17:52:35 +00:00
Matt Arsenault 492d71cc99 AMDGPU: Fold readlane intrinsics of constants
I'm not 100% sure about this, since I'm worried about IR transforms
that might end up introducing divergence downstream once replaced with
a constant, but I haven't come up with an example yet.

llvm-svn: 363406
2019-06-14 14:51:26 +00:00
Stanislav Mekhanoshin 68a2fef9ae [AMDGPU] gfx1010 wave32 icmp/fcmp intrinsic changes for wave32
Differential Revision: https://reviews.llvm.org/D63301

llvm-svn: 363339
2019-06-13 23:47:36 +00:00
Sander de Smalen 7957fc6547 [IntrinsicEmitter] Extend argument overloading with forward references.
Extend the mechanism to overload intrinsic arguments by using either
backward or forward references to the overloadable arguments.

In for example:

  def int_something : Intrinsic<[LLVMPointerToElt<0>],
                                [llvm_anyvector_ty], []>;

LLVMPointerToElt<0> is a forward reference to the overloadable operand
of type 'llvm_anyvector_ty' and would allow intrinsics such as:

  declare i32* @llvm.something.v4i32(<4 x i32>);
  declare i64* @llvm.something.v2i64(<2 x i64>);

where the result pointer type is deduced from the element type of the
first argument.

If the returned pointer is not a pointer to the element type, LLVM will
give an error:

  Intrinsic has incorrect return type!
  i64* (<4 x i32>)* @llvm.something.v4i32

Reviewers: RKSimon, arsenm, rnk, greened

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D62995

llvm-svn: 363233
2019-06-13 08:19:33 +00:00
Cameron McInally 08200d6d26 [InstCombine] Handle -(X-Y) --> (Y-X) for unary fneg when NSZ
Differential Revision: https://reviews.llvm.org/D62612

llvm-svn: 363082
2019-06-11 16:21:21 +00:00
Cameron McInally 796de11331 [InstCombine] Update fptrunc (fneg x)) -> (fneg (fptrunc x) for unary FNeg
Differential Revision: https://reviews.llvm.org/D62629

llvm-svn: 363080
2019-06-11 15:45:41 +00:00
Sanjay Patel 9650c95b7e [InstCombine] allow unordered preds when canonicalizing to fabs()
We have a known-never-nan value via 'nnan', so an unordered predicate
is the same as its ordered sibling.

Similar to:
rL362937

llvm-svn: 362954
2019-06-10 15:39:00 +00:00
Sanjay Patel 85de9634e6 [InstCombine] fix bug in canonicalization to fabs()
Forgot to translate the predicate clauses in rL362943.

llvm-svn: 362945
2019-06-10 14:57:45 +00:00
Sanjay Patel 8b6d9f60ed [InstCombine] change canonicalization to fabs() to use FMF on fsub
Similar to rL362909:
This isn't the ideal fix (use FMF on the select), but it's still an
improvement until we have better FMF propagation to selects and other
FP math operators.

I don't think there's much risk of regression from this change by
not including the FMF on the fcmp any more. The nsz/nnan FMF
should be the same on the fcmp and the fsub because they have the
same operand.

llvm-svn: 362943
2019-06-10 14:46:36 +00:00
Sanjay Patel 8cd8c5784b [InstCombine] allow unordered preds when canonicalizing to fabs()
PR42179:
https://bugs.llvm.org/show_bug.cgi?id=42179

llvm-svn: 362937
2019-06-10 14:14:51 +00:00
Roman Lebedev d669758d84 [InstCombine] foldICmpWithLowBitMaskedVal(): 'icmp sgt/sle': avoid miscompiles
A precondition 'x != 0' was forgotten by me:
https://rise4fun.com/Alive/JFNP
https://rise4fun.com/Alive/jHvL

These 4 folds with non-constants could be re-enabled,
but for now let's go for the simplest solution.

https://bugs.llvm.org/show_bug.cgi?id=42198

llvm-svn: 362911
2019-06-09 16:30:42 +00:00
Sanjay Patel 87cd16a86e [InstCombine] change canonicalization to fabs() to use FMF on fneg
This isn't the ideal fix (use FMF on the select), but it's still an
improvement until we have better FMF propagation to selects and other
FP math operators.

I don't think there's much risk of regression from this change by
not including the FMF on the fcmp any more. The nsz/nnan FMF
should be the same on the fcmp and the fneg (fsub) because they
have the same operand.

This works around the most glaring FMF logical inconsistency cited
in PR38086:
https://bugs.llvm.org/show_bug.cgi?id=38086

llvm-svn: 362909
2019-06-09 16:22:01 +00:00
Sanjay Patel ac111e526d [InstCombine] simplify code for bitcast of insertelement; NFC
llvm-svn: 362655
2019-06-05 21:26:52 +00:00
Tim Northover 8d7f118ab2 InstCombine: correctly change byval type attribute alongside call args.
When the byval attribute has a type, it must match the pointee type of
any parameter; but InstCombine was not updating the attribute when
folding casts of various kinds away.

llvm-svn: 362643
2019-06-05 20:38:17 +00:00
Roman Lebedev 39390d8317 [InstCombine] 'C-(C2-X) --> X+(C-C2)' constant-fold
It looks this fold was already partially happening, indirectly
via some other folds, but with one-use limitation.
No other fold here has that restriction.

https://rise4fun.com/Alive/ftR

llvm-svn: 362217
2019-05-31 09:47:16 +00:00
Roman Lebedev 886c4ef35a [InstCombine] 'add (sub C1, X), C2 --> sub (add C1, C2), X' constant-fold
https://rise4fun.com/Alive/qJQ

llvm-svn: 362216
2019-05-31 09:47:04 +00:00
Martin Storsjo 0fe645c086 [InstCombine] Avoid use after free in DenseMap, when built with GCC
Previously, this used a statement like this:
    Map[A] = Map[B];

This is equivalent to the following:
    const auto &Src = Map[B];
    auto &Dest = Map[A];
    Dest = Src;

The second statement, "auto &Dest = Map[A];" can insert a new
element into the DenseMap, which can potentially grow and reallocate
the DenseMap's internal storage, which will invalidate the existing
reference to the source. When doing the actual assignment,
the Src reference is dereferenced, accessing memory that was
freed when the DenseMap grew.

This issue hasn't shown up when LLVM was built with Clang, because
the right hand side ended up dereferenced before evaulating the
left hand side. (If the value type is a larger data type, Clang doesn't
do this but behaves like GCC.)

With GCC, a cast to Value* isn't enough to make it dereference the
right hand side reference before invoking operator[] (while that is
enough to make Clang/LLVM do the right thing for larger types), but
storing it in an intermediate variable in a separate statement works.

This fixes PR42065.

Differential Revision: https://reviews.llvm.org/D62624

llvm-svn: 362150
2019-05-30 20:53:21 +00:00
Nikita Popov 5382803b04 [InstCombine] Optimize always overflowing signed saturating add/sub
Based on the overflow direction information added in D62463, we can
now fold always overflowing signed saturating add/sub to signed min/max.

Differential Revision: https://reviews.llvm.org/D62544

llvm-svn: 362006
2019-05-29 18:37:13 +00:00
Nikita Popov c51cdacab9 [InstCombine] Clean up saturing math overflow optimizations; NFC
Reduce duplication and make it easier to handle signed
always-overflows conditions in the future.

llvm-svn: 361863
2019-05-28 18:59:21 +00:00
Nikita Popov 332c100562 [ValueTracking][ConstantRange] Distinguish low/high always overflow
In order to fold an always overflowing signed saturating add/sub,
we need to know in which direction the always overflow occurs.
This patch splits up AlwaysOverflows into AlwaysOverflowsLow and
AlwaysOverflowsHigh to pass through this information (but it is
not used yet).

Differential Revision: https://reviews.llvm.org/D62463

llvm-svn: 361858
2019-05-28 18:08:31 +00:00