shuf (inselt ?, C, IndexC), undef, <IndexC, IndexC...> --> <C, C...>
This is another missing shuffle fold pattern uncovered by the
shuffle correctness fix from D70246.
The problem was visible in the post-commit thread example, but
we managed to overcome the limitation for that particular case
with D71220.
This is something like the inverse of the previous fix - there
we didn't demand the inserted scalar, and here we are only
demanding an inserted scalar.
Differential Revision: https://reviews.llvm.org/D71488
Summary:
Same as D60846 and D69571 but with a fix for the problem encountered
after them. Both times it was a missing context adjustment in the
handling of PHI nodes.
The reproducers created from the bugs that caused the old commits to be
reverted are included.
Reviewers: nikic, nlopes, mkazantsev, spatel, dlrobertson, uabelho, hakzsam, hans
Subscribers: hiraditya, bollu, asbirlea, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71181
This is another transform suggested in PR44153:
https://bugs.llvm.org/show_bug.cgi?id=44153
The backend for some targets already manages to get
this if it converts copysign to bitwise logic.
This is correct for any value including NaN/inf.
We don't have this fold directly in the backend either,
but x86 manages to get it after converting things to bitops.
This caused miscompiles of Chromium (https://crbug.com/1023818). The reduced
repro is small enough to fit here:
$ cat /tmp/a.c
unsigned char f(unsigned char *p) {
unsigned char result = 0;
for (int shift = 0; shift < 1; ++shift)
result |= p[0] << (shift * 8);
return result;
}
$ bin/clang -O2 -S -o - /tmp/a.c | grep -A4 f:
f: # @f
.cfi_startproc
# %bb.0: # %entry
xorl %eax, %eax
retq
That's nicely optimized, but I don't think it's the right result :-)
> Same as D60846 but with a fix for the problem encountered there which
> was a missing context adjustment in the handling of PHI nodes.
>
> The test that caused D60846 to be reverted was added in e15ab8f277.
>
> Reviewers: nikic, nlopes, mkazantsev,spatel, dlrobertson, uabelho, hakzsam
>
> Subscribers: hiraditya, bollu, llvm-commits
>
> Tags: #llvm
>
> Differential Revision: https://reviews.llvm.org/D69571
This reverts commit 57dd4b03e4.
The easy code fix won't catch non-canonical mismatched
constant patterns, so adding extra coverage for those in
case we decide that's important (but seems unlikely).
Same as D60846 but with a fix for the problem encountered there which
was a missing context adjustment in the handling of PHI nodes.
The test that caused D60846 to be reverted was added in e15ab8f277.
Reviewers: nikic, nlopes, mkazantsev,spatel, dlrobertson, uabelho, hakzsam
Subscribers: hiraditya, bollu, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D69571
This is intended to be similar to the constant folding results from
D67446
and earlier, but not all operands are constant in these tests, so the
responsibility for folding is left to InstSimplify.
Differential Revision: https://reviews.llvm.org/D67721
llvm-svn: 373455
Summary:
I don't have a direct motivational case for this,
but it would be good to have this for completeness/symmetry.
This pattern is basically the motivational pattern from
https://bugs.llvm.org/show_bug.cgi?id=43251
but with different predicate that requires that the offset is non-zero.
The completeness bit comes from the fact that a similar pattern (offset != zero)
will be needed for https://bugs.llvm.org/show_bug.cgi?id=43259,
so it'd seem to be good to not overlook very similar patterns..
Proofs: https://rise4fun.com/Alive/21b
Also, there is something odd with `isKnownNonZero()`, if the non-zero
knowledge was specified as an assumption, it didn't pick it up (PR43267)
Reviewers: spatel, nikic, xbolva00
Reviewed By: spatel
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D67411
llvm-svn: 371718
This was actually the original intention in D67332,
but i messed up and forgot about it.
This patch was originally part of D67411, but precommitting this.
llvm-svn: 371630
I only want to ensure that %offset is non-zero there,
it doesn't matter how that info is conveyed.
As filed in PR43267, the assumption way does not work.
llvm-svn: 371546
Summary:
This is motivated by D67122 sanitizer check enhancement.
That patch seemingly worsens `-fsanitize=pointer-overflow`
overhead from 25% to 50%, which strongly implies missing folds.
In this particular case, given
```
char* test(char& base, unsigned long offset) {
return &base + offset;
}
```
it will end up producing something like
https://godbolt.org/z/LK5-iH
which after optimizations reduces down to roughly
```
define i1 @t0(i8* nonnull %base, i64 %offset) {
%base_int = ptrtoint i8* %base to i64
%adjusted = add i64 %base_int, %offset
%non_null_after_adjustment = icmp ne i64 %adjusted, 0
%no_overflow_during_adjustment = icmp uge i64 %adjusted, %base_int
%res = and i1 %non_null_after_adjustment, %no_overflow_during_adjustment
ret i1 %res
}
```
Without D67122 there was no `%non_null_after_adjustment`,
and in this particular case we can get rid of the overhead:
Here we add some offset to a non-null pointer,
and check that the result does not overflow and is not a null pointer.
But since the base pointer is already non-null, and we check for overflow,
that overflow check will already catch the null pointer,
so the separate null check is redundant and can be dropped.
Alive proofs:
https://rise4fun.com/Alive/WRzq
There are more patterns of "unsigned-add-with-overflow", they are not handled here,
but this is the main pattern, that we currently consider canonical,
so it makes sense to handle it.
https://bugs.llvm.org/show_bug.cgi?id=43246
Reviewers: spatel, nikic, vsk
Reviewed By: spatel
Subscribers: hiraditya, llvm-commits, reames
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D67332
llvm-svn: 371349
Summary:
Now that with D65143/D65144 we've produce `@llvm.umul.with.overflow`,
and with D65147 we've flattened the CFG, we now can see that
the guard may have been there to prevent division by zero is redundant.
We can simply drop it:
```
----------------------------------------
Name: no overflow or zero
%iszero = icmp eq i4 %y, 0
%umul = smul_overflow i4 %x, %y
%umul.ov = extractvalue {i4, i1} %umul, 1
%umul.ov.not = xor %umul.ov, -1
%retval.0 = or i1 %iszero, %umul.ov.not
ret i1 %retval.0
=>
%iszero = icmp eq i4 %y, 0
%umul = smul_overflow i4 %x, %y
%umul.ov = extractvalue {i4, i1} %umul, 1
%umul.ov.not = xor %umul.ov, -1
%retval.0 = or i1 %iszero, %umul.ov.not
ret i1 %umul.ov.not
Done: 1
Optimization is correct!
```
Note that this is inverted from what we have in a previous patch,
here we are looking for the inverted overflow bit.
And that inversion is kinda problematic - given this particular
pattern we neither hoist that `not` closer to `ret` (then the pattern
would have been identical to the one without inversion,
and would have been handled by the previous patch), neither
do the opposite transform. But regardless, we should handle this too.
I've filled [[ https://bugs.llvm.org/show_bug.cgi?id=42720 | PR42720 ]].
Reviewers: nikic, spatel, xbolva00, RKSimon
Reviewed By: spatel
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65151
llvm-svn: 370351
Summary:
Now that with D65143/D65144 we've produce `@llvm.umul.with.overflow`,
and with D65147 we've flattened the CFG, we now can see that
the guard may have been there to prevent division by zero is redundant.
We can simply drop it:
```
----------------------------------------
Name: no overflow and not zero
%iszero = icmp ne i4 %y, 0
%umul = umul_overflow i4 %x, %y
%umul.ov = extractvalue {i4, i1} %umul, 1
%retval.0 = and i1 %iszero, %umul.ov
ret i1 %retval.0
=>
%iszero = icmp ne i4 %y, 0
%umul = umul_overflow i4 %x, %y
%umul.ov = extractvalue {i4, i1} %umul, 1
%retval.0 = and i1 %iszero, %umul.ov
ret %umul.ov
Done: 1
Optimization is correct!
```
Reviewers: nikic, spatel, xbolva00
Reviewed By: spatel
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65150
llvm-svn: 370350
Summary:
Make sure that we report that changes has been made
by InstSimplify also in situations when only trivially
dead instructions has been removed. If for example a call
is removed the call graph must be updated.
Bug seem to have been introduced by llvm-svn r367173
(commit 02b9e45a7e), since the code in question
was rewritten in that commit.
Reviewers: spatel, chandlerc, foad
Reviewed By: spatel
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65973
llvm-svn: 368401
The matchSelectPattern code can match patterns like (x >= 0) ? x : -x
for absolute value. But it can also match ((x-y) >= 0) ? (x-y) : (y-x).
If the latter form was matched we can only use the nsw flag if its
set on both subtracts.
This match makes sure we're looking at the former case only.
Differential Revision: https://reviews.llvm.org/D65692
llvm-svn: 368195
computeKnownBits will indicate the sign bit of abs is 0 if the
the RHS operand returned by matchSelectPattern has the nsw flag set.
For abs idioms like (X >= 0) ? X : -X, the RHS returns -X. But
we can also match ((X-Y) >= 0 ? X-Y : Y-X as abs. In this case
RHS will be the Y-X operand. According to Alive, the sign bit for
this is only 0 if both the X-Y and Y-X operands have the nsw flag.
But we're only checking the Y-X operand.
llvm-svn: 367747
The test case from:
https://bugs.llvm.org/show_bug.cgi?id=42771
...shows a ~30x slowdown caused by the awkward loop iteration (rL207302) that is
seemingly done just to avoid invalidating the instruction iterator. We can instead
delay instruction deletion until we reach the end of the block (or we could delay
until we reach the end of all blocks).
There's a test diff here for a degenerate case with llvm.assume that is not
meaningful in itself, but serves to verify this change in logic.
This change probably doesn't result in much overall compile-time improvement
because we call '-instsimplify' as a standalone pass only once in the standard
-O2 opt pipeline currently.
Differential Revision: https://reviews.llvm.org/D65336
llvm-svn: 367173
It would be already handled by the non-inverted case if we were hoisting
the `not` in InstCombine, but we don't (granted, we don't sink it
in this case either), so this is a separate case.
llvm-svn: 366801
Summary:
- As the pointer stripping could trace through `addrspacecast` now, need
to sext/trunc the offset to ensure it has the same width as the
pointer after stripping.
Reviewers: jdoerfert
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D64768
llvm-svn: 366162