Summary:
Sometimes we need to swap true-val and false-val of a `SelectInst`.
Having a function for that is nicer than hand-writing it each time.
Reviewers: spatel, RKSimon, craig.topper, jdoerfert
Reviewed By: jdoerfert
Subscribers: jdoerfert, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65520
llvm-svn: 367547
Reverse the canonicalization of fneg relative to fmul/fdiv. That makes it
easier to implement the transforms (and possibly other fneg transforms) in
1 place because we can always start the pattern match from fneg (either the
legacy binop or the new unop).
There's a secondary practical benefit seen in PR21914 and PR42681:
https://bugs.llvm.org/show_bug.cgi?id=21914https://bugs.llvm.org/show_bug.cgi?id=42681
...hoisting fneg rather than sinking seems to play nicer with LICM in IR
(although this change may expose analysis holes in the other direction).
1. The instcombine test changes show the expected neutral IR diffs from
reversing the order.
2. The reassociation tests show that we were missing an optimization
opportunity to fold away fneg-of-fneg. My reading of IEEE-754 says
that all of these transforms are allowed (regardless of binop/unop
fneg version) because:
"For all other operations [besides copy/abs/negate/copysign], this
standard does not specify the sign bit of a NaN result."
In all of these transforms, we always have some other binop
(fadd/fsub/fmul/fdiv), so we are free to flip the sign bit of a
potential intermediate NaN operand.
(If that interpretation is wrong, then we must already have a bug in
the existing transforms?)
3. The clang tests shouldn't exist as-is, but that's effectively a
revert of rL367149 (the test broke with an extension of the
pre-existing fneg canonicalization in rL367146).
Differential Revision: https://reviews.llvm.org/D65399
llvm-svn: 367447
The backend already does this via isNegatibleForFree(),
but we may want to alter the fneg IR canonicalizations
that currently exist, so we need to try harder to fold
fneg in IR to avoid regressions.
llvm-svn: 367227
The backend already does this via isNegatibleForFree(),
but we may want to alter the fneg IR canonicalizations
that currently exist, so we need to try harder to fold
fneg in IR to avoid regressions.
llvm-svn: 367194
(Y * (1.0 - Z)) + (X * Z) -->
Y - (Y * Z) + (X * Z) -->
Y + Z * (X - Y)
This is part of solving:
https://bugs.llvm.org/show_bug.cgi?id=42716
Factoring eliminates an instruction, so that should be a good canonicalization.
The potential conversion to FMA would be handled by the backend based on target
capabilities.
Differential Revision: https://reviews.llvm.org/D65305
llvm-svn: 367101
Summary:
To be noted, this pattern is not unhandled by instcombine per-se,
it is somehow does end up being folded when one runs opt -O3,
but not if it's just -instcombine. Regardless, that fold is
indirect, depends on some other folds, and is thus blind
when there are extra uses.
This does address the regression being exposed in D63992.
https://godbolt.org/z/7DGltUhttps://rise4fun.com/Alive/EPO0
Fixes [[ https://bugs.llvm.org/show_bug.cgi?id=42459 | PR42459 ]]
Reviewers: spatel, nikic, huihuiz
Reviewed By: spatel
Subscribers: llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D63993
llvm-svn: 364792
It looks this fold was already partially happening, indirectly
via some other folds, but with one-use limitation.
No other fold here has that restriction.
https://rise4fun.com/Alive/ftR
llvm-svn: 362217
Also, break out a helper function, namely foldFNegIntoConstant(...), which performs transforms common between visitFNeg(...) and visitFSub(...).
Differential Revision: https://reviews.llvm.org/D61693
llvm-svn: 361188
(X | C1) + C2 --> (X | C1) ^ C1 iff (C1 == -C2)
I verified the correctness using Alive:
https://rise4fun.com/Alive/YNV
This transform enables the following transform that already exists in
instcombine:
(X | Y) ^ Y --> X & ~Y
As a result, the full expected transform is:
(X | C1) + C2 --> X & ~C1 iff (C1 == -C2)
There already exists the transform in the sub case:
(X | Y) - Y --> X & ~Y
However this does not trigger in the case where Y is constant due to an earlier
transform:
X - (-C) --> X + C
With this new add fold, both the add and sub constant cases are handled.
Patch by Chris Dawson.
Differential Revision: https://reviews.llvm.org/D61517
llvm-svn: 360185
// 0 - (X sdiv C) -> (X sdiv -C) provided the negation doesn't overflow.
This fold has been around for many years and nobody noticed the potential
vector miscompile from overflow until recently...
So it seems unlikely that there's much demand for a vector sdiv optimization
on arbitrary vector constants, so just limit the matching to splat constants
to avoid the possible bug.
Differential Revision: https://reviews.llvm.org/D60426
llvm-svn: 358005
This is the last step towards solving the examples shown in:
https://bugs.llvm.org/show_bug.cgi?id=14613
With this change, x86 should end up with psubus instructions
when those are available.
All known codegen issues with expanding the saturating intrinsics
were resolved with:
D59006 / rL356855
We also have some early evidence in D58872 that using the intrinsics
will lead to better perf. If some target regresses from this, custom
lowering of the intrinsics (as in the above for x86) may be needed.
llvm-svn: 357012
This is part of a transform that may be done in the backend:
D13757
...but it should always be beneficial to fold this sooner in IR
for all targets.
https://rise4fun.com/Alive/vaiW
Name: sext add nsw
%add = add nsw i8 %i, C0
%ext = sext i8 %add to i32
%r = add i32 %ext, C1
=>
%s = sext i8 %i to i32
%r = add i32 %s, sext(C0)+C1
Name: zext add nuw
%add = add nuw i8 %i, C0
%ext = zext i8 %add to i16
%r = add i16 %ext, C1
=>
%s = zext i8 %i to i16
%r = add i16 %s, zext(C0)+C1
llvm-svn: 355118
add A, sext(B) --> sub A, zext(B)
We have to choose 1 of these forms, so I'm opting for the
zext because that's easier for value tracking.
The backend should be prepared for this change after:
D57401
rL353433
This is also a preliminary step towards reducing the amount
of bit hackery that we do in IR to optimize icmp/select.
That should be waiting to happen at a later optimization stage.
The seeming regression in the fuzzer test was discussed in:
D58359
We were only managing that fold in instcombine by luck, and
other passes should be able to deal with that better anyway.
llvm-svn: 354748
to reflect the new license.
We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.
Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.
llvm-svn: 351636
This is an attempt to get out of a local-minimum that instcombine currently
gets stuck in. We essentially combine two optimisations at once, ~a - ~b = b-a
and min(~a, ~b) = ~max(a, b), only doing the transform if the result is at
least neutral. This involves using IsFreeToInvert, which has been expanded a
little to include selects that can be easily inverted.
This is trying to fix PR35875, using the ideas from Sanjay. It is a large
improvement to one of our rgb to cmy kernels.
Differential Revision: https://reviews.llvm.org/D52177
llvm-svn: 343569
Summary:
If the sub doesn't overflow in the original type we can move it above the sext/zext.
This is similar to what we do for add. The overflow checking for sub is currently weaker than add, so the test cases are constructed for what is supported.
Reviewers: spatel
Reviewed By: spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D52075
llvm-svn: 342335
Similar to rL342278:
The test diffs are all cosmetic due to the change in
value naming, but I'm including that to show that the
new code does perform these folds rather than something
else in instcombine.
D52075 should be able to use this code too rather than
duplicating all of the logic.
llvm-svn: 342292
The test diffs are all cosmetic due to the change in
value naming, but I'm including that to show that the
new code does perform these folds rather than something
else in instcombine.
llvm-svn: 342278
This commit fixes a (gcc 7.3.0) [-Wunused-function] warning caused by the
presence of unused method FaddCombine::createFDiv().
The last use of that method was removed at r339519.
llvm-svn: 340014
This accounts for the missing IR fold noted in D50195. We don't need any fast-math to enable the negation transform.
FP negation can always be folded into an fmul/fdiv constant to eliminate the fneg.
I've limited this to one-use to ensure that we are eliminating an instruction rather than replacing fneg by a
potentially expensive fdiv or fmul.
Differential Revision: https://reviews.llvm.org/D50417
llvm-svn: 339248
These are reassociated versions of the same pattern and
similar transforms as in rL338200 and rL338118.
The motivation is identical to those commits:
Patterns with add/sub combos can be improved using
'not' ops. This is better for analysis and may lead
to follow-on transforms because 'xor' and 'add' are
commutative/associative. It can also help codegen.
llvm-svn: 338221
https://rise4fun.com/Alive/jDd
Patterns with add/sub combos can be improved using
'not' ops. This is better for analysis and may lead
to follow-on transforms because 'xor' and 'add' are
commutative/associative. It can also help codegen.
llvm-svn: 338200
This bug was created by rL335258 because we used to always call instsimplify
after trying the associative folds. After that change it became possible
for subsequent folds to encounter unsimplified code (and potentially assert
because of it).
Instead of carrying changed state through instcombine, we can just return
immediately. This allows instsimplify to run, so we can continue assuming
that easy folds have already occurred.
llvm-svn: 336965
Turn canonicalized subtraction back into (-1 - B) and combine it with (A + 1) into (A - B).
This is similar to the folding already done for (B ^ -1) + Const into (-1 + Const) - B.
Differential Revision: https://reviews.llvm.org/D48535
llvm-svn: 335579
This is outwardly NFC from what I can tell, but it should be more efficient
to simplify first (despite the name, SimplifyAssociativeOrCommutative does
not actually simplify as InstSimplify does - it creates/morphs instructions).
This should make it easier to refactor duplicated code that runs for all binops.
llvm-svn: 335258
The bug report:
https://bugs.llvm.org/show_bug.cgi?id=36036
...requests a DAG change for this, but an IR canonicalization
probably handles most cases. If we still want to match this
pattern in the backend, there's a proposal for that too:
D47831
Alive proofs including nsw/nuw cases that were first noted in:
D46988
https://rise4fun.com/Alive/Kmp
This patch is largely copied from the existing code that was
initially added with:
D40984
...but I didn't see much gain from trying to share code.
llvm-svn: 334137
Summary:
This is [[ https://bugs.llvm.org/show_bug.cgi?id=37603 | PR37603 ]].
https://godbolt.org/g/VCMNpShttps://rise4fun.com/Alive/idM
When doing bit manipulations, it is quite common to calculate some bit mask,
and apply it to some value via `and`.
The typical C code looks like:
```
int mask_signed_add(int nbits) {
return (1 << nbits) - 1;
}
```
which is translated into (with `-O3`)
```
define dso_local i32 @mask_signed_add(int)(i32) local_unnamed_addr #0 {
%2 = shl i32 1, %0
%3 = add nsw i32 %2, -1
ret i32 %3
}
```
But there is a second, less readable variant:
```
int mask_signed_xor(int nbits) {
return ~(-(1 << nbits));
}
```
which is translated into (with `-O3`)
```
define dso_local i32 @mask_signed_xor(int)(i32) local_unnamed_addr #0 {
%2 = shl i32 -1, %0
%3 = xor i32 %2, -1
ret i32 %3
}
```
Since we created such a mask, it is quite likely that we will use it in `and` next.
And then we may get rid of `not` op by folding into `andn`.
But now that i have actually looked:
https://godbolt.org/g/VTUDmU
_some_ backend changes will be needed too.
We clearly loose `bzhi` recognition.
Reviewers: spatel, craig.topper, RKSimon
Reviewed By: spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D47428
llvm-svn: 334127
There's a patchwork of existing transforms trying to handle
these cases, but as seen in the changed test, we weren't
catching them all.
llvm-svn: 333845
As noted in the review thread for rL333782, we could have
made a bug harder to hit if we were simplifying instructions
before trying other folds.
The shuffle transform in question isn't ever a simplification;
it's just a canonicalization. So I've renamed that to make that
clearer.
This is NFCI at this point, but I've regenerated the test file
to show the cosmetic value naming difference of using
instcombine's RAUW vs. the builder.
Possible follow-ups:
1. Move reassociation folds after simplifies too.
2. Refactor common code; we shouldn't have so much repetition.
llvm-svn: 333820