The SROA pass processes debug info incorrectly if applied twice.
Specifically, after SROA runs the first time, instcombine converts dbg.declare
intrinsics into dbg.value. Inlining then creates new opportunities for SROA,
so it is run again. This second run does not correctly handle the previously
inserted dbg.value intrinsics.
Differential Revision: https://reviews.llvm.org/D64595
llvm-svn: 370906
bitcast <N x i8> (shuf X, undef, <N-1, N-2,...0>) to i{N*8} --> bswap (bitcast X to i{N*8})
In PR43146:
https://bugs.llvm.org/show_bug.cgi?id=43146
...we have a more complicated case where SLP is making a mess of bswap. This patch won't
do anything for that currently, but we need to improve bswap recognition in instcombine,
SLP, and/or a standalone pass to avoid that problem.
This is limited using the data-layout so we don't try to do this transform with actual
vector types. The backend does not appear to have folds to convert in either direction,
so we don't want to mess up something that is actually better lowered as a shuffle.
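For illustration, here is roughly what the fold does on a 4-byte case (a minimal IR sketch with made-up value names; the real code keys off the data layout as described above):
```
%rev = shufflevector <4 x i8> %x, <4 x i8> undef, <4 x i32> <i32 3, i32 2, i32 1, i32 0>
%r   = bitcast <4 x i8> %rev to i32
=>
%int = bitcast <4 x i8> %x to i32
%r   = call i32 @llvm.bswap.i32(i32 %int)
```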
On x86, we're trading something like this:
vmovd %edi, %xmm0
vpshufb LCPI0_0(%rip), %xmm0, %xmm0 ## xmm0 = xmm0[3,2,1,0,u,u,u,u,u,u,u,u,u,u,u,u]
vmovd %xmm0, %eax
For:
movl %edi, %eax
bswapl %eax
Differential Revision: https://reviews.llvm.org/D66965
llvm-svn: 370659
Summary:
Back-end currently expands mempcpy, but middle-end should work with memcpy instead of mempcpy to enable more memcpy-optimization.
GCC backend emits mempcpy, so LLVM backend could form it too, if we know mempcpy libcall is better than memcpy + n.
https://godbolt.org/z/dOCG96
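A rough sketch of the expansion in the middle end (hypothetical operand names):
```
%ret = call i8* @mempcpy(i8* %dst, i8* %src, i64 %n)
=>
call void @llvm.memcpy.p0i8.p0i8.i64(i8* %dst, i8* %src, i64 %n, i1 false)
%ret = getelementptr inbounds i8, i8* %dst, i64 %n
```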
Reviewers: efriedma, spatel, craig.topper, RKSimon, jdoerfert
Reviewed By: efriedma
Subscribers: hjl.tools, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65737
llvm-svn: 370593
Summary:
Finally, the fold I was looking forward to :)
The legality check is muddy; I doubt I've grokked the full generalization,
but it handles all the cases I care about, and can come up with:
https://rise4fun.com/Alive/26j
I.e. we can perform the fold if **any** of the following is true:
* The shift amount is either zero or one less than widest bitwidth
* Either of the values being shifted has at most lowest bit set
* The value that is being shifted by `shl` (which is not truncated) should have no less leading zeros than the total shift amount;
* The value that is being shifted by `lshr` (which **is** truncated) should have no less leading zeros than the widest bit width minus total shift amount minus one
I strongly suspect there is some better generalization, but I'm not aware of it right now.
For now I have also avoided using the actual `computeKnownBits()` and restricted this to constants.
Reviewers: spatel, nikic, xbolva00
Reviewed By: spatel
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D66383
llvm-svn: 370324
Summary:
`matchThreeWayIntCompare()` looks for
```
select i1 (a == b),
i32 Equal,
i32 (select i1 (a < b), i32 Less, i32 Greater)
```
but both of these selects/compares can be in commuted form,
so out of the 8 variants, only the two most basic ones were handled.
This fixes a regression introduced in D66232.
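For example, one of the commuted variants (illustrative only, in the same pseudo-notation as above) swaps the select arms and inverts the equality predicate:
```
select i1 (a != b),
       i32 (select i1 (a < b), i32 Less, i32 Greater),
       i32 Equal
```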
Reviewers: spatel, nikic, efriedma, xbolva00
Reviewed By: spatel
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D66607
llvm-svn: 369841
Summary:
If we have e.g.:
```
%t = icmp ult i32 %x, 65536
%r = select i1 %t, i32 %y, i32 65535
```
the constants `65535` and `65536` are suspiciously close.
We could perform a transformation to deduplicate them:
```
Name: ult
%t = icmp ult i32 %x, 65536
%r = select i1 %t, i32 %y, i32 65535
=>
%t.inv = icmp ugt i32 %x, 65535
%r = select i1 %t.inv, i32 65535, i32 %y
```
https://rise4fun.com/Alive/avb
While this may seem esoteric, this should certainly be good for vectors
(less constant pool usage) and for opt-for-size - need to have only one constant.
But the real fun part here is that it allows further transformation,
in particular it finishes cleaning up the `clamp` folding,
see e.g. `canonicalize-clamp-with-select-of-constant-threshold-pattern.ll`.
We start with e.g.
```
%dont_need_to_clamp_positive = icmp sle i32 %X, 32767
%dont_need_to_clamp_negative = icmp sge i32 %X, -32768
%clamp_limit = select i1 %dont_need_to_clamp_positive, i32 -32768, i32 32767
%dont_need_to_clamp = and i1 %dont_need_to_clamp_positive, %dont_need_to_clamp_negative
%R = select i1 %dont_need_to_clamp, i32 %X, i32 %clamp_limit
```
without this patch we currently produce
```
%1 = icmp slt i32 %X, 32768
%2 = icmp sgt i32 %X, -32768
%3 = select i1 %2, i32 %X, i32 -32768
%R = select i1 %1, i32 %3, i32 32767
```
which isn't really a `clamp` - both comparisons are performed on the original value.
This patch changes it into
```
%1.inv = icmp sgt i32 %X, 32767
%2 = icmp sgt i32 %X, -32768
%3 = select i1 %2, i32 %X, i32 -32768
%R = select i1 %1.inv, i32 32767, i32 %3
```
and then the magic happens! Some further transform finishes polishing it and we finally get:
```
%t1 = icmp sgt i32 %X, -32768
%t2 = select i1 %t1, i32 %X, i32 -32768
%t3 = icmp slt i32 %t2, 32767
%R = select i1 %t3, i32 %t2, i32 32767
```
which is beautiful and just what we want.
Proofs for `getFlippedStrictnessPredicateAndConstant()` for de-canonicalization:
https://rise4fun.com/Alive/THl
Proofs for the fold itself: https://rise4fun.com/Alive/THl
Reviewers: spatel, dmgreen, nikic, xbolva00
Reviewed By: spatel
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D66232
llvm-svn: 369840
Started implementing the vector case and realized the scalar case hadn't correctly handled the GEP producing a different type than the base. It's entertaining seeing what slips through review when we're focused on the 'hard' parts. :(
Also adding an extra vector test as it happened to be in my workspace and wasn't worth separating.
llvm-svn: 369795
This generalizes the isGEPKnownNonNull rule from ValueTracking to apply when we do not know whether the base is non-null, and thus need to replace one condition with another.
The core notion is that an inbounds GEP can only produce null if the base pointer is null and the offset is zero. If the offset is non-zero, the "inbounds" marker makes the result poison, so we're free to ignore that case. Similarly, there's no case in which a non-null base can produce a null result without generating poison.
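In IR terms, the kind of replacement this enables looks roughly like (hypothetical names):
```
%gep = getelementptr inbounds i8, i8* %base, i64 %off
%c   = icmp eq i8* %gep, null
=>
%c   = icmp eq i8* %base, null
```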
Differential Revision: https://reviews.llvm.org/D66608
llvm-svn: 369789
I noticed another instance of the issue where references to aliases were
being replaced with aliasees, this time in InstCombine. In the instance that
I saw it turned out to be only a QoI issue (a symbol ended up being missing
from the symbol table due to the last reference to the alias being removed,
preventing HWASAN from symbolizing a global reference), but it could easily
have manifested as incorrect behaviour.
Since this is the third such issue encountered (previously: D65118, D65314)
it seems to be time to address this common error/QoI issue once and for all
and make the strip* family of functions not look through aliases.
Includes a test for the specific issue that I saw, but no doubt there are
other similar bugs fixed here.
As with D65118 this has been tested to make sure that the optimization isn't
load bearing. I built Clang, Chromium for Linux, Android and Windows as well
as the test-suite and there were no size regressions.
Differential Revision: https://reviews.llvm.org/D66606
llvm-svn: 369697
An intermediate extend is used to widen the narrow operand to the width of
the other (wider) operand. At that point, we have the same logic as the
existing transform that was restricted to folds of equal width zext/sext.
This mostly solves PR42700:
https://bugs.llvm.org/show_bug.cgi?id=42700
llvm-svn: 369519
This reverts commit 5dbb90bfe1.
As noted in the post-commit thread for r367891, this can create
a multiply that is lowered to a libcall that may not exist.
We need to improve the backend decomposition for integer multiply
before trying to re-land this (if it's still worthwhile after
doing the backend work).
llvm-svn: 369174
This pattern may arise more frequently with an enhancement to SLP vectorization suggested in PR42755:
https://bugs.llvm.org/show_bug.cgi?id=42755
...but we should handle this pattern to make things easier for the backend either way.
For all in-tree targets that I looked at, codegen for typical vector sizes looks better when we change
to a vector select, so this is safe to do without a cost model (in other words, as a target-independent
canonicalization).
For example, if the condition of the select is a scalar, we end up with something like this on x86:
vpcmpgtd %xmm0, %xmm1, %xmm0
vpextrb $12, %xmm0, %eax
testb $1, %al
jne LBB0_2
## %bb.1:
vmovaps %xmm3, %xmm2
LBB0_2:
vmovaps %xmm2, %xmm0
Rather than the splat-condition variant:
vpcmpgtd %xmm0, %xmm1, %xmm0
vpshufd $255, %xmm0, %xmm0 ## xmm0 = xmm0[3,3,3,3]
vblendvps %xmm0, %xmm2, %xmm3, %xmm0
Differential Revision: https://reviews.llvm.org/D66095
llvm-svn: 369140
Summary:
This is continuation of D63829 / https://bugs.llvm.org/show_bug.cgi?id=42399
I thought a naive pattern would solve my issue, but nope, it involved truncation,
thus more folds are needed. This isn't really the fold I'm interested in -
I need trunc-of-lshr - but I've decided to start with `shl` because it's simpler.
In this case, no extra legality checks are needed:
https://rise4fun.com/Alive/CAb
We should be careful about not increasing the instruction count,
since we need to produce a `zext` because the `and` is done in a wider type.
Reviewers: spatel, nikic, xbolva00
Reviewed By: spatel
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D66057
llvm-svn: 369117
Summary:
Currently we fail to compute known bits for recurrences where the
first incoming value is the start value of the recurrence.
Instead of exiting the loop when the first incoming value is not
the step of the recurrence, continue to check the second incoming
value.
The original code uses a loop to handle both cases, but incorrectly
exits instead of continuing.
Reviewers: lebedev.ri, spatel, nikic
Reviewed By: lebedev.ri
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D66216
llvm-svn: 369088
Use isGuaranteedToTransferExecutionToSuccessor() instead of
isSafeToSpeculativelyExecute() when seeing whether we can propagate
the information in an assume backwards in isValidAssumeForContext().
The latter is more general - it also allows arbitrary loads/stores -
and is also the condition we want: if our assume is guaranteed to
execute, its condition not holding would be UB.
Original patch by arielb1.
Differential Revision: https://reviews.llvm.org/D37215
llvm-svn: 368723
Summary:
Given a pattern like:
```
%old_cmp1 = icmp slt i32 %x, C2
%old_replacement = select i1 %old_cmp1, i32 %target_low, i32 %target_high
%old_x_offseted = add i32 %x, C1
%old_cmp0 = icmp ult i32 %old_x_offseted, C0
%r = select i1 %old_cmp0, i32 %x, i32 %old_replacement
```
it can be rewritten as more canonical pattern:
```
%new_cmp1 = icmp slt i32 %x, -C1
%new_cmp2 = icmp sge i32 %x, C0-C1
%new_clamped_low = select i1 %new_cmp1, i32 %target_low, i32 %x
%r = select i1 %new_cmp2, i32 %target_high, i32 %new_clamped_low
```
Iff `-C1 s<= C2 s<= C0-C1`
The `ULT` predicate can also be `UGE`; or `UGT` iff `C0 != -1` (+invert result)
Likewise, the `SLT` predicate can also be `SGE`; or `SGT` iff `C2 != INT_MAX` (+invert result)
If `C1 == 0`, then all 3 instructions must be one-use; else at most either `%old_cmp1` or `%old_x_offseted` can have extra uses.
NOTE: if we could reuse `%old_cmp1` as one of the comparisons we'll have to build, this could be less limiting.
There are two icmps, each with 3 predicate variants, so there are 9 fold variants:
| | ULT | UGE | UGT |
| SLT | https://rise4fun.com/Alive/yIJ | https://rise4fun.com/Alive/5BfN | https://rise4fun.com/Alive/INH |
| SGE | https://rise4fun.com/Alive/hd8 | https://rise4fun.com/Alive/Abk | https://rise4fun.com/Alive/PlzS |
| SGT | https://rise4fun.com/Alive/VYG | https://rise4fun.com/Alive/oMY | https://rise4fun.com/Alive/KrzC |
This fold was brought up in https://reviews.llvm.org/D65148#1603922 by @dmgreen, and is needed to unblock that patch.
This patch requires D65530.
Reviewers: spatel, nikic, xbolva00, dmgreen
Reviewed By: spatel
Subscribers: hiraditya, llvm-commits, dmgreen
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65765
llvm-svn: 368687
Summary:
This is rather unconventional..
As the comment there says, we don't have much folds for xor-of-icmps,
we try to turn them into an and-of-icmps, for which we have plenty of folds.
But if the ICmp we need to invert is not single-use - we give up.
As discussed in https://reviews.llvm.org/D65148#1603922,
we may have a non-canonical CLAMP pattern, with bit math and
select-of-threshold that we'll potentially clamp.
As it can be seen in `canonicalize-clamp-with-select-of-constant-threshold-pattern.ll`,
out of all 8 variations of the pattern, only two are **not** canonicalized into
the variant with and+icmp instead of bit math.
The reason is that the ICmp we need to invert is not single-use, so we give up.
We indeed can't perform this fold at will; the general rule is that
we should not increase the instruction count in InstCombine.
But we wouldn't end up increasing the instruction count if we can adapt every other
user to the inverted value. This way the `not` we create **will** get folded,
and in the end the instruction count does not increase.
For that, of course, we need to look at the users of a Value,
which is again rather unconventional for InstCombine :S
Thus I'm proposing to be a little bit more insistent in `foldXorOfICmps()`.
The alternatives would be to not create that `not`, but add duplicate code to
manually invert all users; or to add some even less general combine to handle
some more specific pattern[s].
Reviewers: spatel, nikic, RKSimon, craig.topper
Reviewed By: spatel
Subscribers: hiraditya, jdoerfert, dmgreen, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65530
llvm-svn: 368685
We can't handle the 'uge' case because we can never actually encounter it:
there would need to be an extra use on that compare or else it would be
canonicalized away, but because of that extra use we can't perform the fold.
The 'sge' case we can have.
llvm-svn: 368656
Instead of matching a value and then blindly casting it to BinaryOperator
just to get the opcode, match the instruction and avoid the cast.
Fixes https://bugs.llvm.org/show_bug.cgi?id=42962
llvm-svn: 368554
If one of the values being shifted is a constant, then since the new shift
amount is known-constant, the new shift will end up being constant-folded,
so we don't need the one-use restriction in that case.
llvm-svn: 368519
That one-use restriction is not needed for correctness - we have already
ensured that one of the shifts will go away, so we know we won't increase
the instruction count.
llvm-svn: 368518
Summary:
In SimplifySelectsFeedingBinaryOp, propagate fast math flags from the
outer op into both arms of the new select, to take advantage of
simplifications that require fast math flags.
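A sketch of the kind of case this enables (hypothetical values; the zero arm only simplifies once the nsz-implying 'fast' flags are present on the new fadd):
```
%sel = select i1 %c, float %x, float 0.000000e+00
%r   = fadd fast float %sel, %y
=>
%sum = fadd fast float %x, %y
%r   = select i1 %c, float %sum, float %y
```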
Reviewers: mcberg2017, majnemer, spatel, arsenm, xbolva00
Subscribers: wdng, javed.absar, kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65658
llvm-svn: 368175
This was initially committed in r368059 but got reverted in r368084
because there was faulty logic in how the shift-amount type mismatch
was being handled (it simply wasn't).
I've added an explicit bailout before we call SimplifyAddInst() - I don't think
it's designed in general to handle differently-typed values, even though
the actual problem only comes from ConstantExprs.
I have also changed the common type deduction to not just blindly
look past zext, but to do so in a way that the types end up matching.
Differential Revision: https://reviews.llvm.org/D65380
llvm-svn: 368141
This reverts r368059 (git commit 0f95710976)
This caused Clang to assert while self-hosting and compiling
SystemZInstrInfo.cpp. Reduction is running.
llvm-svn: 368084
Summary:
Currently `reassociateShiftAmtsOfTwoSameDirectionShifts()` only handles
two shifts one after another. If the shifts are `shl`, we still can
easily perform the fold, with no extra legality checks:
https://rise4fun.com/Alive/OQbM
If we have right-shift however, we won't be able to make it
any simpler than it already is.
After this the only thing missing here is constant-folding: (`NewShAmt >= bitwidth(X)`)
* If it's a logical shift, then constant-fold to `0` (not `undef`)
* If it's an `ashr`, then a splat of the original sign bit
https://rise4fun.com/Alive/E1K
https://rise4fun.com/Alive/i0V
Reviewers: spatel, nikic, xbolva00
Reviewed By: spatel
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65380
llvm-svn: 368059
This appears to slightly help patterns similar to what's
shown in PR42874:
https://bugs.llvm.org/show_bug.cgi?id=42874
...but not in the way requested.
That fix will require some later IR and/or backend pass to
decompose multiply/shifts into something more optimal per
target. Those transforms already exist in some basic forms,
but probably need enhancing to catch more cases.
https://rise4fun.com/Alive/Qzv2
llvm-svn: 367891
As discussed in PR42696:
https://bugs.llvm.org/show_bug.cgi?id=42696
...but won't help that case yet.
We have an odd situation where a select operand equivalence fold was
implemented in InstSimplify when it could have been done more generally
in InstCombine if we allow dropping of {nsw,nuw,exact} from a binop operand.
Here's an example:
https://rise4fun.com/Alive/Xplr
%cmp = icmp eq i32 %x, 2147483647
%add = add nsw i32 %x, 1
%sel = select i1 %cmp, i32 -2147483648, i32 %add
=>
%sel = add i32 %x, 1
I've left the InstSimplify code in place for now, but my guess is that we'd
prefer to remove that as a follow-up to save on code duplication and
compile-time.
Differential Revision: https://reviews.llvm.org/D65576
llvm-svn: 367695
Reverse the canonicalization of fneg relative to fmul/fdiv. That makes it
easier to implement the transforms (and possibly other fneg transforms) in
1 place because we can always start the pattern match from fneg (either the
legacy binop or the new unop).
There's a secondary practical benefit seen in PR21914 and PR42681:
https://bugs.llvm.org/show_bug.cgi?id=21914
https://bugs.llvm.org/show_bug.cgi?id=42681
...hoisting fneg rather than sinking seems to play nicer with LICM in IR
(although this change may expose analysis holes in the other direction).
1. The instcombine test changes show the expected neutral IR diffs from
reversing the order.
2. The reassociation tests show that we were missing an optimization
opportunity to fold away fneg-of-fneg. My reading of IEEE-754 says
that all of these transforms are allowed (regardless of binop/unop
fneg version) because:
"For all other operations [besides copy/abs/negate/copysign], this
standard does not specify the sign bit of a NaN result."
In all of these transforms, we always have some other binop
(fadd/fsub/fmul/fdiv), so we are free to flip the sign bit of a
potential intermediate NaN operand.
(If that interpretation is wrong, then we must already have a bug in
the existing transforms?)
3. The clang tests shouldn't exist as-is, but that's effectively a
revert of rL367149 (the test broke with an extension of the
pre-existing fneg canonicalization in rL367146).
Differential Revision: https://reviews.llvm.org/D65399
llvm-svn: 367447
Currently InstCombiner::foldXorOfICmps() bails out if the
ICmp it wants to invert has extra uses. As can be seen
in the tests in the previous commit, this is super unfortunate:
this is the single pattern that is left non-canonicalized.
We could analyze whether we can also invert all the uses of said ICmp
at the same time, thus not bailing out there.
I'm not seeing any nicer alternative.
llvm-svn: 367439
Summary:
I have stumbled into this by accident while preparing to extend backend `x s% C ==/!= 0` handling.
While we did happen to handle this fold in most of the cases,
the folding is indirect - we fold `x u% y` to `x & (y-1)` (iff `y` is power-of-two),
or first turn `x s% -y` to `x u% y`; that does handle most of the cases.
But we can't turn `x s% INT_MIN` to `x u% -INT_MIN`,
and thus we end up being stuck with `(x s% INT_MIN) == 0`.
There is no such restriction for the more general fold:
https://rise4fun.com/Alive/IIeS
To be noted, the fold does not enforce that `y` is a constant,
so it may indeed increase instruction count.
This is consistent with what `x u% y`->`x & (y-1)` already does.
I think it makes sense, it's at most one (simple) extra instruction,
while `rem`ainder is really much more un-simple (and likely **very** costly).
Reviewers: spatel, RKSimon, nikic, xbolva00, craig.topper
Reviewed By: RKSimon
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65046
llvm-svn: 367322
The backend already does this via isNegatibleForFree(),
but we may want to alter the fneg IR canonicalizations
that currently exist, so we need to try harder to fold
fneg in IR to avoid regressions.
llvm-svn: 367227
The backend already does this via isNegatibleForFree(),
but we may want to alter the fneg IR canonicalizations
that currently exist, so we need to try harder to fold
fneg in IR to avoid regressions.
llvm-svn: 367194
(Y * (1.0 - Z)) + (X * Z) -->
Y - (Y * Z) + (X * Z) -->
Y + Z * (X - Y)
This is part of solving:
https://bugs.llvm.org/show_bug.cgi?id=42716
Factoring eliminates an instruction, so that should be a good canonicalization.
The potential conversion to FMA would be handled by the backend based on target
capabilities.
Differential Revision: https://reviews.llvm.org/D65305
llvm-svn: 367101
This reverts commit bc4a63fd3c, this is a
speculative revert to fix a number of sanitizer bots (like
sanitizer-x86_64-linux-bootstrap-ubsan) that have started to see stage2
compiler crashes, presumably due to a miscompile.
llvm-svn: 367029
trunc (load X) --> load (bitcast X to narrow type)
We have this transform in DAGCombiner::ReduceLoadWidth(), but the truncated
load pattern can interfere with other instcombine transforms, so I'd like to
allow the fold sooner.
Example:
https://bugs.llvm.org/show_bug.cgi?id=16739
...in that report, we have bitcasts bracketing these ops, so those could get
eliminated too.
We've generally ruled out widening of loads early in IR ( LoadCombine -
http://lists.llvm.org/pipermail/llvm-dev/2016-September/105291.html ), but
that reasoning may not apply to narrowing if we can preserve information
such as the dereferenceable range.
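Roughly, for a little-endian target (a sketch; the actual fold consults the data layout and has additional use/legality checks):
```
%wide = load i32, i32* %p
%r    = trunc i32 %wide to i8
=>
%narrow_ptr = bitcast i32* %p to i8*
%r          = load i8, i8* %narrow_ptr
```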
Differential Revision: https://reviews.llvm.org/D64432
llvm-svn: 367011
We can treat icmp eq X, MIN_UINT as icmp ule X, MIN_UINT and allow
it to merge with icmp ugt X, C. Similar for the other constants.
We can do similar for icmp ne X, (U)INT_MIN/MAX in foldAndOfICmps. And we already handled UINT_MIN there.
Fixes PR42691.
Differential Revision: https://reviews.llvm.org/D65017
llvm-svn: 366945
icmp ne %x, INT_MIN can be treated similarly to icmp sgt %x, INT_MIN.
icmp ne %x, INT_MAX can be treated similarly to icmp slt %x, INT_MAX.
icmp ne %x, UINT_MAX can be treated similarly to icmp ult %x, UINT_MAX.
We already treat icmp ne %x, 0 similarly to icmp ugt %x, 0
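One concrete shape this enables (an i8 sketch with made-up values):
```
%a = icmp ne i8 %x, -128     ; i8 INT_MIN, i.e. the same as icmp sgt i8 %x, -128
%b = icmp sgt i8 %x, 10
%r = and i1 %a, %b
=>
%r = icmp sgt i8 %x, 10
```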
llvm-svn: 366662
If the legality check is `(shiftNbits-maskNbits) s>= 0`,
then we can simplify it to `shiftNbits u>= maskNbits`,
which is easier to check for.
However, currently switching the `dropRedundantMaskingOfLeftShiftInput()`
to `SimplifyICmpInst()` does not catch these cases and regresses
currently-handled cases, so i'll leave it as is for now.
https://rise4fun.com/Alive/25P
llvm-svn: 366564
Summary:
If we have some pattern that leaves only some low bits set, and then performs
left-shift of those bits, if none of the bits that are left after the final
shift are modified by the mask, we can omit the mask.
There are many variants to this pattern:
f. `((x << MaskShAmt) a>> MaskShAmt) << ShiftShAmt`
All these patterns can be simplified to just:
`x << ShiftShAmt`
iff:
f. `(ShiftShAmt-MaskShAmt) s>= 0` (i.e. `ShiftShAmt u>= MaskShAmt`)
Normally, the inner pattern is sign-extend,
but for our purposes it's no different to other patterns:
alive proofs:
f: https://rise4fun.com/Alive/7U3
For now let's start with patterns where both shift amounts are variable,
with trivial constant "offset" between them, since i believe this is
both simplest to handle and i think this is most common.
But again, there are likely other variants where we could use
ValueTracking/ConstantRange to handle more cases.
https://bugs.llvm.org/show_bug.cgi?id=42563
Differential Revision: https://reviews.llvm.org/D64524
llvm-svn: 366540
Summary:
If we have some pattern that leaves only some low bits set, and then performs
left-shift of those bits, if none of the bits that are left after the final
shift are modified by the mask, we can omit the mask.
There are many variants to this pattern:
e. `((x << MaskShAmt) l>> MaskShAmt) << ShiftShAmt`
All these patterns can be simplified to just:
`x << ShiftShAmt`
iff:
e. `(ShiftShAmt-MaskShAmt) s>= 0` (i.e. `ShiftShAmt u>= MaskShAmt`)
alive proofs:
e: https://rise4fun.com/Alive/0FT
For now let's start with patterns where both shift amounts are variable,
with trivial constant "offset" between them, since i believe this is
both simplest to handle and i think this is most common.
But again, there are likely other variants where we could use
ValueTracking/ConstantRange to handle more cases.
https://bugs.llvm.org/show_bug.cgi?id=42563
Differential Revision: https://reviews.llvm.org/D64521
llvm-svn: 366539
Summary:
If we have some pattern that leaves only some low bits set, and then performs
left-shift of those bits, if none of the bits that are left after the final
shift are modified by the mask, we can omit the mask.
There are many variants to this pattern:
d. `(x & ((-1 << MaskShAmt) >> MaskShAmt)) << ShiftShAmt`
All these patterns can be simplified to just:
`x << ShiftShAmt`
iff:
d. `(ShiftShAmt-MaskShAmt) s>= 0` (i.e. `ShiftShAmt u>= MaskShAmt`)
alive proofs:
d: https://rise4fun.com/Alive/I5Y
For now let's start with patterns where both shift amounts are variable,
with trivial constant "offset" between them, since i believe this is
both simplest to handle and i think this is most common.
But again, there are likely other variants where we could use
ValueTracking/ConstantRange to handle more cases.
https://bugs.llvm.org/show_bug.cgi?id=42563
Differential Revision: https://reviews.llvm.org/D64519
llvm-svn: 366538
Summary:
If we have some pattern that leaves only some low bits set, and then performs
left-shift of those bits, if none of the bits that are left after the final
shift are modified by the mask, we can omit the mask.
There are many variants to this pattern:
c. `(x & (-1 >> MaskShAmt)) << ShiftShAmt`
All these patterns can be simplified to just:
`x << ShiftShAmt`
iff:
c. `(ShiftShAmt-MaskShAmt) s>= 0` (i.e. `ShiftShAmt u>= MaskShAmt`)
alive proofs:
c: https://rise4fun.com/Alive/RgJh
For now let's start with patterns where both shift amounts are variable,
with trivial constant "offset" between them, since i believe this is
both simplest to handle and i think this is most common.
But again, there are likely other variants where we could use
ValueTracking/ConstantRange to handle more cases.
https://bugs.llvm.org/show_bug.cgi?id=42563
Differential Revision: https://reviews.llvm.org/D64517
llvm-svn: 366537
Summary:
If we have some pattern that leaves only some low bits set, and then performs
left-shift of those bits, if none of the bits that are left after the final
shift are modified by the mask, we can omit the mask.
There are many variants to this pattern:
b. `(x & (~(-1 << maskNbits))) << shiftNbits`
All these patterns can be simplified to just:
`x << ShiftShAmt`
iff:
b. `(MaskShAmt+ShiftShAmt) u>= bitwidth(x)`
alive proof:
b: https://rise4fun.com/Alive/y8M
For now let's start with patterns where both shift amounts are variable,
with trivial constant "offset" between them, since i believe this is
both simplest to handle and i think this is most common.
But again, there are likely other variants where we could use
ValueTracking/ConstantRange to handle more cases.
https://bugs.llvm.org/show_bug.cgi?id=42563
Differential Revision: https://reviews.llvm.org/D64514
llvm-svn: 366536
Summary:
If we have some pattern that leaves only some low bits set, and then performs
left-shift of those bits, if none of the bits that are left after the final
shift are modified by the mask, we can omit the mask.
There are many variants to this pattern:
a. `(x & ((1 << MaskShAmt) - 1)) << ShiftShAmt`
All these patterns can be simplified to just:
`x << ShiftShAmt`
iff:
a. `(MaskShAmt+ShiftShAmt) u>= bitwidth(x)`
alive proof:
a: https://rise4fun.com/Alive/wi9
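A concrete i32 instance of variant a. (hypothetical value names):
```
%mask    = shl i32 1, %maskShAmt
%mask.m1 = add i32 %mask, -1
%masked  = and i32 %x, %mask.m1
%r       = shl i32 %masked, %shiftShAmt
=>  ; iff (%maskShAmt + %shiftShAmt) u>= 32
%r       = shl i32 %x, %shiftShAmt
```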
Indeed, not all of these patterns are canonical.
But since this fold will only produce a single instruction,
I'm really interested in handling even non-canonical patterns,
since I have this general kind of pattern in hot paths,
and it is not totally outlandish for bit-twiddling code.
For now let's start with patterns where both shift amounts are variable,
with trivial constant "offset" between them, since i believe this is
both simplest to handle and i think this is most common.
But again, there are likely other variants where we could use
ValueTracking/ConstantRange to handle more cases.
https://bugs.llvm.org/show_bug.cgi?id=42563
Reviewers: spatel, nikic, huihuiz, xbolva00
Reviewed By: xbolva00
Subscribers: efriedma, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D64512
llvm-svn: 366535
This reverts commit rL365999 / 0f6148df23.
The tests already exist in this file, and the hoped-for transform
(mentioned in D62871) is invalid because of undef as discussed in
D63060.
llvm-svn: 366000
This patch replaces the three almost identical "strip & accumulate"
implementations for constant pointer offsets with a single one,
combining the respective functionalities. The old interfaces are kept
for now.
Differential Revision: https://reviews.llvm.org/D64468
llvm-svn: 365723
alive proofs:
a,b: https://rise4fun.com/Alive/4zsf
c,d,e,f: https://rise4fun.com/Alive/RC49
Indeed, not all of these patterns are canonical.
But since this fold will only produce a single instruction,
I'm really interested in handling even non-canonical patterns.
Other than these 6 patterns, I can't think of any other
reasonable variants right now, although I'm sure they exist.
For now let's start with patterns where both shift amounts are variable,
with trivial constant "offset" between them, since i believe this is
both simplest to handle and i think this is most common.
But again, there are likely other variants where we could use
ValueTracking/ConstantRange to handle more cases.
https://bugs.llvm.org/show_bug.cgi?id=42563
llvm-svn: 365641
Summary:
Transform
pow(C,x)
To
exp2(log2(C)*x)
if C > 0, C != inf, C != NaN (and C is not a power of 2, since we already have a fold for that case).
log2(C) is folded by the compiler and exp2 is much faster to compute than pow.
Reviewers: spatel, efriedma, evandro
Reviewed By: evandro
Subscribers: lebedev.ri, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D64099
llvm-svn: 365637
I'm not sure if transforming any of these is valid as
a target-independent fold, but we might as well have
a few tests here to confirm or deny our position.
llvm-svn: 365523
Forming the canonical splat shuffle improves analysis and
may allow follow-on transforms (although some possibilities
are missing as shown in the test diffs).
The backend generically turns these patterns into build_vector,
so there should be no codegen regressions. All targets are
expected to be able to lower splats efficiently.
llvm-svn: 365379
We recognize a splat from element 0 in (VectorUtils) llvm::getSplatValue()
and also in ShuffleVectorInst::isZeroEltSplatMask(), so this converts
to that form for better matching.
The backend generically turns these patterns into build_vector,
so there should be no codegen difference.
llvm-svn: 365342
We allow forming a splat (broadcast) shuffle, but we were conservatively limiting
that to cases where all elements of the vector are specified. It should be safe
from a codegen perspective to allow undefined lanes of the vector because the
expansion of a splat shuffle would become the chain of inserts again.
Forming splat shuffles can reduce IR and help enable further IR transforms.
Motivating bugs:
https://bugs.llvm.org/show_bug.cgi?id=42174
https://bugs.llvm.org/show_bug.cgi?id=16739
Differential Revision: https://reviews.llvm.org/D63848
llvm-svn: 365147
I was actually wondering if there was some nicer way than m_Value()+cast,
but apparently what I was really "subconsciously" thinking about
was a correctness issue.
hasNoUnsignedWrap()/hasNoSignedWrap() exist for Instruction,
not for BinaryOperator, so let's just use m_Instruction(),
thus avoiding both the cast and a crash.
Fixes https://bugs.llvm.org/show_bug.cgi?id=42484,
https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=15587
llvm-svn: 364915
Use both one bit and signbit shifting to check for one bit merge.
Reviewers: lebedev.ri, spatel, efriedma, craig.topper
Reviewed By: lebedev.ri
Subscribers: llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D63903
llvm-svn: 364857
Extends the transform from:
rL364341
...to include another (more common?) pattern that tests whether a
value is a power-of-2 (including or excluding zero).
llvm-svn: 364856
Summary:
To be noted, this pattern is not unhandled by instcombine per se;
it somehow does end up being folded when one runs opt -O3,
but not if it's just -instcombine. Regardless, that fold is
indirect, depends on some other folds, and is thus blind
when there are extra uses.
This addresses the regression exposed in D63992.
https://godbolt.org/z/7DGltU
https://rise4fun.com/Alive/EPO0
Fixes [[ https://bugs.llvm.org/show_bug.cgi?id=42459 | PR42459 ]]
Reviewers: spatel, nikic, huihuiz
Reviewed By: spatel
Subscribers: llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D63993
llvm-svn: 364792
Summary:
Given pattern:
`icmp eq/ne (and ((x shift Q), (y oppositeshift K))), 0`
we should move shifts to the same hand of 'and', i.e. rewrite as
`icmp eq/ne (and (x shift (Q+K)), y), 0` iff `(Q+K) u< bitwidth(x)`
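For example, with an lshr/shl pair this looks like (a sketch with made-up value names):
```
%x.sh = lshr i32 %x, %q
%y.sh = shl i32 %y, %k
%and  = and i32 %x.sh, %y.sh
%r    = icmp eq i32 %and, 0
=>  ; iff (%q + %k) u< 32
%sum   = add i32 %q, %k
%x.sh2 = lshr i32 %x, %sum
%and2  = and i32 %x.sh2, %y
%r     = icmp eq i32 %and2, 0
```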
It might be tempting to not restrict this to situations where we know
we'd fold the two shifts together, but I'm not sure what rules there should be
to avoid endless combine loops.
We pick the same shift that was originally used to shift the variable we picked to shift:
https://rise4fun.com/Alive/6x1v
Should fix [[ https://bugs.llvm.org/show_bug.cgi?id=42399 | PR42399]].
Reviewers: spatel, nikic, RKSimon
Reviewed By: spatel
Subscribers: llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D63829
llvm-svn: 364791
As discussed in https://reviews.llvm.org/D63829
*if* *both* shifts are one-use, we'd most likely want to produce `lshr`,
and not rely on ordering.
Also, there should likely be a *separate* fold to do this reordering.
llvm-svn: 364772
To be noted, this pattern is not unhandled by instcombine per se;
it somehow does end up being folded when one runs opt -O3,
but not if it's just -instcombine. Regardless, that fold is
indirect, depends on some other folds, and is thus blind
when there are extra uses.
https://bugs.llvm.org/show_bug.cgi?id=42459
https://rise4fun.com/Alive/EPO0
llvm-svn: 364749
This is the opposite direction of D62158 (we have to choose 1 form or the other).
Now that we have FMF on the select, this becomes more palatable. And the benefits
of having a single IR instruction for this operation (less chances of missing folds
based on extra uses, etc) overcome my previous comments about the potential advantage
of larger pattern matching/analysis.
Differential Revision: https://reviews.llvm.org/D62414
llvm-svn: 364721
This transform came up in D62414, but we should deal with it first.
We have LLVM intrinsics that correspond exactly to libm calls (unlike
most libm calls, these libm calls never set errno).
This holds without any fast-math-flags, so we should always canonicalize
to those intrinsics directly for better optimization.
Currently, we convert to fcmp+select only when we have FMF (nnan) because
fcmp+select does not preserve the semantics of the call in the general case.
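For example (a sketch):
```
%r = call float @fmaxf(float %x, float %y)
=>
%r = call float @llvm.maxnum.f32(float %x, float %y)
```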
Differential Revision: https://reviews.llvm.org/D63214
llvm-svn: 364714
For fold
(X & (signbit l>> Y)) ==/!= 0 -> (X << Y) >=/< 0
(X & (signbit << Y)) ==/!= 0 -> (X l>> Y) >=/< 0
Test cases of X being constant are positive tests not negative.
Prep work for D62818.
llvm-svn: 364497
I don't think there was anything going wrong here,
but the auto-generating CHECK line script is known
to have problems with 'TMP' because it uses that
to match nameless values.
This is a retry of rL364452.
llvm-svn: 364477
I don't think there was anything going wrong here,
but the auto-generating CHECK line script is known
to have problems with 'TMP' because it uses that
to match nameless values.
llvm-svn: 364452
This follows up the transform from rL363956 to use the ctpop intrinsic when checking for power-of-2-or-zero.
This is matching the isPowerOf2() patterns used in PR42314:
https://bugs.llvm.org/show_bug.cgi?id=42314
But there's at least 1 instcombine follow-up needed to match the alternate form:
(v & (v - 1)) == 0;
We should have all of the backend expansions handled with:
rL364319
(x86-specific changes still needed for optimal code based on subtarget)
And the larger patterns to exclude zero as a power-of-2 are joining with this change after:
rL364153 ( D63660 )
rL364246
Differential Revision: https://reviews.llvm.org/D63777
llvm-svn: 364341
This is the Demorgan'd 'not' of the pattern handled in:
D63660 / rL364153
This is another intermediate IR step towards solving PR42314:
https://bugs.llvm.org/show_bug.cgi?id=42314
We can test if a value is not a power-of-2 using ctpop(X) > 1,
so combining that with an is-zero check of the input is the
same as testing if not exactly 1 bit is set:
(X == 0) || (ctpop(X) u> 1) --> ctpop(X) != 1
llvm-svn: 364246
Prefer the more exact intrinsic to remove a use of the input value
and possibly make further transforms easier (we will still need
to match patterns with funnel-shift of wider types as pieces of
bswap, especially if we want to canonicalize to funnel-shift with
constant shift amount). Discussed in D46760.
llvm-svn: 364187
This is another intermediate IR step towards solving PR42314:
https://bugs.llvm.org/show_bug.cgi?id=42314
We can test if a value is power-of-2-or-0 using ctpop(X) < 2,
so combining that with a non-zero check of the input is the
same as testing if exactly 1 bit is set:
(X != 0) && (ctpop(X) u< 2) --> ctpop(X) == 1
Differential Revision: https://reviews.llvm.org/D63660
llvm-svn: 364153
The form that compares against 0 is better because:
1. It removes a use of the input value.
2. It's the more standard form for this pattern: https://graphics.stanford.edu/~seander/bithacks.html#DetermineIfPowerOf2
3. It results in equal or better codegen (tested with x86, AArch64, ARM, PowerPC, MIPS).
This is a root cause for PR42314, but probably doesn't completely answer the codegen request:
https://bugs.llvm.org/show_bug.cgi?id=42314
Alive proof:
https://rise4fun.com/Alive/9kG
Name: is power-of-2
%neg = sub i32 0, %x
%a = and i32 %neg, %x
%r = icmp eq i32 %a, %x
=>
%dec = add i32 %x, -1
%a2 = and i32 %dec, %x
%r = icmp eq i32 %a2, 0
Name: is not power-of-2
%neg = sub i32 0, %x
%a = and i32 %neg, %x
%r = icmp ne i32 %a, %x
=>
%dec = add i32 %x, -1
%a2 = and i32 %dec, %x
%r = icmp ne i32 %a2, 0
llvm-svn: 363956
Should also be marked writeonly, but I think that would require
splitting the version with 'done' set into a separate intrinsic.
The test change is only from renumbering the attribute group numbers,
which for some reason the generated check lines consider.
llvm-svn: 363560
Fix folds of addo and subo with an undef operand to be:
`@llvm.{u,s}{add,sub}.with.overflow` all fold to `{ undef, false }`,
as per LLVM undef rules.
Same for commuted variants.
Based on the original version of the patch by @nikic.
Fixes [[ https://bugs.llvm.org/show_bug.cgi?id=42209 | PR42209 ]]
Differential Revision: https://reviews.llvm.org/D63065
llvm-svn: 363522
I'm not 100% sure about this, since I'm worried about IR transforms
that might end up introducing divergence downstream once replaced with
a constant, but I haven't come up with an example yet.
llvm-svn: 363406
Similar to rL362909:
This isn't the ideal fix (use FMF on the select), but it's still an
improvement until we have better FMF propagation to selects and other
FP math operators.
I don't think there's much risk of regression from this change by
not including the FMF on the fcmp any more. The nsz/nnan FMF
should be the same on the fcmp and the fsub because they have the
same operand.
llvm-svn: 362943
The `icmp sgt`/`icmp sle` variants are also miscompiles:
https://rise4fun.com/Alive/JFNP
https://rise4fun.com/Alive/jHvL
I forgot the precondition 'x != 0'.
While ensuring test coverage for `-1`, also add test coverage
for `0` mask. Mask `0` is allowed for all the folds,
mask `-1` is allowed for all the folds with unsigned `icmp` pred.
Constant mask `0` is missed though.
https://bugs.llvm.org/show_bug.cgi?id=42198
llvm-svn: 362910
This isn't the ideal fix (use FMF on the select), but it's still an
improvement until we have better FMF propagation to selects and other
FP math operators.
I don't think there's much risk of regression from this change by
not including the FMF on the fcmp any more. The nsz/nnan FMF
should be the same on the fcmp and the fneg (fsub) because they
have the same operand.
This works around the most glaring FMF logical inconsistency cited
in PR38086:
https://bugs.llvm.org/show_bug.cgi?id=38086
llvm-svn: 362909
When the byval attribute has a type, it must match the pointee type of
any parameter; but InstCombine was not updating the attribute when
folding casts of various kinds away.
llvm-svn: 362643
When the object size argument is -1, no checking can be done, so calling the
_chk variant is unnecessary. We already did this for a bunch of these
functions.
rdar://50797197
Differential revision: https://reviews.llvm.org/D62358
llvm-svn: 362272
It looks like this fold was already partially happening, indirectly
via some other folds, but with a one-use limitation.
No other fold here has that restriction.
https://rise4fun.com/Alive/ftR
llvm-svn: 362217
Based on the overflow direction information added in D62463, we can
now fold always overflowing signed saturating add/sub to signed min/max.
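A sketch of the kind of case this covers (made-up values; the known range comes from value tracking on the mask/or):
```
%low7 = and i8 %x, 127
%v    = or i8 %low7, 100     ; %v is known to be in [100, 127]
%r    = call i8 @llvm.sadd.sat.i8(i8 %v, i8 %v)
; the add always overflows on the high side, so %r folds to the signed max, i8 127
```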
Differential Revision: https://reviews.llvm.org/D62544
llvm-svn: 362006
This was part of InstCombine, but it's better placed in
InstSimplify. InstCombine also had an unreachable but weaker
fold for insertelement with undef index, so that is deleted.
llvm-svn: 361559
This is reduced from a fuzzer test:
https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=14890
Usually, demanded elements should be able to simplify shuffle
mask elements that are pointing to undef elements of its source
operands, but that doesn't happen in the test case.
llvm-svn: 361533
We were turning roundss/sd/ps/pd intrinsics with immediates of 1 or 2 into
llvm.floor/ceil. The llvm.ceil/floor intrinsics are supposed to correspond
to the libm functions. For the libm functions we need to disable the
precision exception so the llvm.floor/ceil functions should always map to
encodings 0x9 and 0xA.
We had a mix of isel patterns where some used 0x9 and 0xA and others used
0x1 and 0x2. We need to be consistent and always use 0x9 and 0xA.
Since we have no way in isel of knowing where the llvm.ceil/floor came
from, we can't map X86 specific intrinsics with encodings 1 or 2 to it.
We could map 0x9 and 0xA to llvm.ceil/floor instead, but I'd really like
to see a use case and optimization advantage first.
I've left the backend test cases to show the blend we now emit without
the extra isel patterns. But I've removed the InstCombine tests completely.
llvm-svn: 361425
This is a minimal start to correcting a problem most directly discussed in PR38086:
https://bugs.llvm.org/show_bug.cgi?id=38086
We have been hacking around a limitation for FP select patterns by using the
fast-math-flags on the condition of the select rather than the select itself.
This patch just allows FMF to appear with the 'select' opcode. No changes are
needed to "FPMathOperator" because it already includes select-of-FP because
that definition is based on the (return) value type.
Once we have this ability, we can start correcting and adding IR transforms
to use the FMF on a 'select' instruction. The instcombine and vectorizer test
diffs only show that the IRBuilder change is behaving as expected by applying
an FMF guard value to 'select'.
For reference:
rL241901 - allowed FMF with fcmp
rL255555 - allowed FMF with FP calls
Differential Revision: https://reviews.llvm.org/D61917
llvm-svn: 361401
This should be a valid exception to the general rule of not creating new shuffle masks in IR...
because we already do it. :)
Also, DAG combining/legalization will undo this by widening the shuffle back out if needed.
Explanation for how we already do this: SLP or vector source can create chains of insert/extract
as shown in 1 of the examples from PR16739:
https://godbolt.org/z/NlK7rA
https://bugs.llvm.org/show_bug.cgi?id=16739
And we expect instcombine or DAGCombine to clean that up by creating relatively simple shuffles.
Differential Revision: https://reviews.llvm.org/D62024
llvm-svn: 361338
As discussed in D62024, we want to limit any potential IR
transforms of shuffles to cases where we know the SDAG
conversion would result in equivalent patterns for these
IR variants.
llvm-svn: 361317
This reverts commit r360902. It caused an assertion failure in
lib/IR/DebugInfoMetadata.cpp: Assertion `(OffsetInBits + SizeInBits <=
FragmentSizeInBits) && "new fragment outside of original fragment"'
failed.
PR41931.
llvm-svn: 361246
Also, break out a helper function, namely foldFNegIntoConstant(...), which performs transforms common between visitFNeg(...) and visitFSub(...).
Differential Revision: https://reviews.llvm.org/D61693
llvm-svn: 361188
Summary:
In D61918 i was looking at dropping it in DAGCombiner `visitShiftByConstant()`,
but as @craig.topper pointed out, it was copied from here.
That check claims that the transform is illegal otherwise.
That isn't true:
1. For `ISD::ADD`, we only process `ISD::SHL` outer shift => sign bit does not matter
https://rise4fun.com/Alive/K4A
2. For `ISD::AND`, there is no restriction on constants:
https://rise4fun.com/Alive/Wy3
3. For `ISD::OR`, there is no restriction on constants:
https://rise4fun.com/Alive/GOH
4. For `ISD::XOR`, there is no restriction on constants:
https://rise4fun.com/Alive/ml6
So, why is it there then?
As far as I can tell, it dates all the way back to the original check-in, rL7793.
I think we should just drop it.
Reviewers: spatel, craig.topper, efriedma, majnemer
Reviewed By: spatel
Subscribers: llvm-commits, craig.topper
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D61938
llvm-svn: 361043
Fixes issue: https://bugs.llvm.org/show_bug.cgi?id=40645
Previously, LLVM had no functional way of performing casts inside of a
DIExpression(), which made salvaging cast instructions other than Noop casts
impossible. With the recent addition of DW_OP_LLVM_convert this salvaging is
now possible, and so can be used to fix the attached bug as well as any cases
where SExt instruction results are lost in the debugging metadata. This patch
introduces this fix by expanding the salvage debug info method to cover these
cases using the new operator.
Differential revision: https://reviews.llvm.org/D61184
llvm-svn: 360902
Fixes issue: https://bugs.llvm.org/show_bug.cgi?id=40645
Previously, LLVM had no functional way of performing casts inside of a
DIExpression(), which made salvaging cast instructions other than Noop
casts impossible. With the recent addition of DW_OP_LLVM_convert this
salvaging is now possible, and so can be used to fix the attached bug as
well as any cases where SExt instruction results are lost in the
debugging metadata. This patch introduces this fix by expanding the
salvage debug info method to cover these cases using the new operator.
Differential revision: https://reviews.llvm.org/D61184
llvm-svn: 360772
We have a similar match for patterns ending in a truncate. This
should be ok for all targets because the default expansion would
still likely be better from replacing 2 'and' ops with 1.
Attempt to show the logic equivalence in Alive (which doesn't
currently have funnel-shift in its vocabulary AFAICT):
%shamt = zext i8 %i to i32
%m = and i32 %shamt, 31
%neg = sub i32 0, %shamt
%and4 = and i32 %neg, 31
%shl = shl i32 %v, %m
%shr = lshr i32 %v, %and4
%or = or i32 %shr, %shl
=>
%a = and i8 %i, 31
%shamt2 = zext i8 %a to i32
%neg2 = sub i32 0, %shamt2
%and4 = and i32 %neg2, 31
%shl = shl i32 %v, %shamt2
%shr = lshr i32 %v, %and4
%or = or i32 %shr, %shl
https://rise4fun.com/Alive/V9r
llvm-svn: 360605
(X | C1) + C2 --> (X | C1) ^ C1 iff (C1 == -C2)
I verified the correctness using Alive:
https://rise4fun.com/Alive/YNV
This transform enables the following transform that already exists in
instcombine:
(X | Y) ^ Y --> X & ~Y
As a result, the full expected transform is:
(X | C1) + C2 --> X & ~C1 iff (C1 == -C2)
There already exists the transform in the sub case:
(X | Y) - Y --> X & ~Y
However this does not trigger in the case where Y is constant due to an earlier
transform:
X - (-C) --> X + C
With this new add fold, both the add and sub constant cases are handled.
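A concrete instance with C1 = 255, C2 = -255 (illustrative only):
```
%or = or i32 %x, 255
%r  = add i32 %or, -255
=>
%r  = and i32 %x, -256      ; i.e. X & ~255
```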
Patch by Chris Dawson.
Differential Revision: https://reviews.llvm.org/D61517
llvm-svn: 360185
Fundamentally/generally, we should not have to rely on bailouts/crippling of
folds. In this particular case, I think we always recognize the inverted
predicate min/max pattern, so there should not be any loss of optimization.
Codegen looks better because we are eliminating an fneg.
llvm-svn: 360180
We don't always get this:
Cond ? -X : -Y --> -(Cond ? X : Y)
...even with the legacy IR form of fneg in the case with extra uses,
and we miss matching with the newer 'fneg' instruction because we
are expecting binops through the rest of the path.
Differential Revision: https://reviews.llvm.org/D61604
llvm-svn: 360075
The demanded elts rules introduced for GEPs in https://reviews.llvm.org/rL356293 replaced vector constants with undefs (by design). It turns out that the LangRef disallows such cases when indexing structs. The right fix is probably to relax the langref requirement, and update other passes to expect the result, but for the moment, limit the transform to avoid compiler crashes.
This should fix https://bugs.llvm.org/show_bug.cgi?id=41624.
llvm-svn: 359633
If we have a masked.load from a location we know to be dereferenceable, we can simply issue a speculative unconditional load against that address. The key advantage is that it produces IR which is well understood by the optimizer. The select (cnd, load, passthrough) form produced should be pattern matchable back to hardware predication if profitable.
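A sketch of the rewrite (hypothetical names):
```
%v = call <4 x i32> @llvm.masked.load.v4i32.p0v4i32(<4 x i32>* %ptr, i32 4, <4 x i1> %mask, <4 x i32> %passthru)
=>
; %ptr is known dereferenceable, so the load can be issued unconditionally
%wide = load <4 x i32>, <4 x i32>* %ptr, align 4
%v    = select <4 x i1> %mask, <4 x i32> %wide, <4 x i32> %passthru
```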
Differential Revision: https://reviews.llvm.org/D59703
llvm-svn: 359000
If we have a store to a piece of memory which is known constant, then we know the store must be storing back the same value. As a result, the store (or memset, or memmove) must either be down a dead path, or a noop. In either case, it is valid to simply remove the store.
The motivating case for this involves a memmove to a buffer which is constant down a path which is dynamically dead.
Note that I'm choosing to implement the less aggressive of two possible semantics here. We could simply say that the store *is undefined*, and prune the path. Consensus in the review was that the more aggressive form might be a good follow on change at a later date.
Differential Revision: https://reviews.llvm.org/D60659
llvm-svn: 358919
As it's causing some bot failures (and per request from kbarton).
This reverts commit r358543/ab70da07286e618016e78247e4a24fcb84077fda.
llvm-svn: 358546
If a constant shift amount is used, then only some of the LHS/RHS
operand bits are demanded and we may be able to simplify based on
that. InstCombineSimplifyDemanded already had the necessary support
for that, we just weren't calling it with fshl/fshr as root.
In particular, this allows us to relax some masked funnel shifts
into simple shifts, as shown in the tests.
Patch by Shawn Landden.
Differential Revision: https://reviews.llvm.org/D60660
llvm-svn: 358515
Zexts can be treated like no-op casts when it comes to assessing whether their
removal affects debug info.
Reviewer: aprantl
Differential Revision: https://reviews.llvm.org/D60641
llvm-svn: 358431
Summary:
Enable some of the existing size optimizations for cold code under PGO.
A ~5% code size saving in big internal app under PGO.
The way it gets BFI/PSI is discussed in the RFC thread
http://lists.llvm.org/pipermail/llvm-dev/2019-March/130894.html
Note it doesn't currently touch loop passes.
Reviewers: davidxl, eraman
Reviewed By: eraman
Subscribers: mgorny, javed.absar, smeenai, mehdi_amini, eraman, zzheng, steven_wu, dexonsmith, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59514
llvm-svn: 358422
Three related changes:
1) auto-gen several test files
2) Add the new tests at the bottom of said files
3) Adjust a couple of other test files not to use stores to constants when trying to test constexpr address handling
llvm-svn: 358344
This fixes a miscompile which was introduced in r356510 (https://reviews.llvm.org/D57372).
The problem is that the original patch removed pointer operands where the load results weren't demanded, but without considering the legality of the load itself. If the masked.gather had active, but undemanded, lanes, then we could end up creating a load which loaded from an undef address. The result could be a segfault, or, in theory, an arbitrary read from a random memory location into an unused register.
llvm-svn: 358299
If the ObjectSizeOffsetEvaluator fails to fold the object size call, then it may
litter some unused instructions in the function. When done repeatedly in
InstCombine, this results in an infinite loop. Fix this by tracking the set of
instructions that were inserted, then removing them on failure.
rdar://49172227
Differential revision: https://reviews.llvm.org/D60298
llvm-svn: 358146
Following D60483 and D60497, this adds support for AlwaysOverflows
handling for ssubo. This is the last case we can handle right now.
Differential Revision: https://reviews.llvm.org/D60518
llvm-svn: 358100
ssubo X, C is equivalent to saddo X, -C. Make the transformation in
InstCombine and allow the logic implemented for saddo to fold prior
usages of add nsw or sub nsw with constants.
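For example (a sketch; this is only valid when the constant is not INT_MIN, since its negation would overflow):
```
%r = call { i32, i1 } @llvm.ssub.with.overflow.i32(i32 %x, i32 10)
=>
%r = call { i32, i1 } @llvm.sadd.with.overflow.i32(i32 %x, i32 -10)
```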
Patch by Dan Robertson.
Differential Revision: https://reviews.llvm.org/D60061
llvm-svn: 358099
Check AlwaysOverflow condition for usubo. The implementation is the
same as the existing handling for uaddo and umulo. Handling for saddo
and ssubo will follow (smulo doesn't have the necessary ValueTracking
support).
Differential Revision: https://reviews.llvm.org/D60483
llvm-svn: 358052
The uadd and umul cases are currently handled, the usub, sadd, ssub
and smul cases are not. usub, sadd and ssub already have the
necessary ValueTracking support, smul doesn't.
llvm-svn: 358031
This reverts commit 1383a91689.
sdiv-canonicalize.ll fails after this revision. The fold needs to be
moved outside the branch handling constant operands. However when this
is done there are further test changes, so I'm reverting this in the
meantime.
llvm-svn: 358026
This is the same change as D60420 but for signed sub rather than
signed add: Range information is intersected into the known bits
result, allows to detect more no/always overflow conditions.
Differential Revision: https://reviews.llvm.org/D60469
llvm-svn: 358020
This is D59386 for the signed add case. The computeConstantRange()
result is now intersected into the existing known bits information,
allowing to detect additional no-overflow/always-overflow conditions
(though the latter isn't used yet).
This (finally...) covers the motivating case from D59071.
Differential Revision: https://reviews.llvm.org/D60420
llvm-svn: 358014
Similar to:
rL358005
Forego folding arbitrary vector constants to fix a possible miscompile bug.
We can enhance the transform if we do want to handle the more complicated
vector case.
llvm-svn: 358013
// 0 - (X sdiv C) -> (X sdiv -C) provided the negation doesn't overflow.
This fold has been around for many years and nobody noticed the potential
vector miscompile from overflow until recently...
So it seems unlikely that there's much demand for a vector sdiv optimization
on arbitrary vector constants, so just limit the matching to splat constants
to avoid the possible bug.
Differential Revision: https://reviews.llvm.org/D60426
llvm-svn: 358005
A more general canonicalization between fdiv and fmul would not
handle this case because that would have to be limited by uses
to prevent 2 values from becoming 3 values:
(x/y) * (x/y) --> (x*x) / (y*y)
(But we probably should still have that limited -- but more general --
canonicalization independently of this change.)
llvm-svn: 357943