Summary:
This is again motivated by D67122 sanitizer check enhancement.
That patch seemingly worsens `-fsanitize=pointer-overflow`
overhead from 25% to 50%, which strongly implies missing folds.
In this particular case, given
```
char* test(char& base, unsigned long offset) {
return &base - offset;
}
```
it will end up producing something like
https://godbolt.org/z/luGEju
which after optimizations reduces down to roughly
```
declare void @use64(i64)
define i1 @test(i8* dereferenceable(1) %base, i64 %offset) {
%base_int = ptrtoint i8* %base to i64
%adjusted = sub i64 %base_int, %offset
call void @use64(i64 %adjusted)
%not_null = icmp ne i64 %adjusted, 0
%no_underflow = icmp ule i64 %adjusted, %base_int
%no_underflow_and_not_null = and i1 %not_null, %no_underflow
ret i1 %no_underflow_and_not_null
}
```
Without D67122 there was no `%not_null`,
and in this particular case we can "get rid of it", by merging two checks:
Here we are checking: `Base u>= Offset && (Base u- Offset) != 0`, but that is simply `Base u> Offset`
Alive proofs:
https://rise4fun.com/Alive/QOs
The `@llvm.usub.with.overflow` pattern itself is not handled here
because this is the main pattern, that we currently consider canonical.
https://bugs.llvm.org/show_bug.cgi?id=43251
Reviewers: spatel, nikic, xbolva00, majnemer
Reviewed By: xbolva00, majnemer
Subscribers: vsk, majnemer, xbolva00, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D67356
llvm-svn: 372341
Summary:
I don't have a direct motivational case for this,
but it would be good to have this for completeness/symmetry.
This pattern is basically the motivational pattern from
https://bugs.llvm.org/show_bug.cgi?id=43251
but with different predicate that requires that the offset is non-zero.
The completeness bit comes from the fact that a similar pattern (offset != zero)
will be needed for https://bugs.llvm.org/show_bug.cgi?id=43259,
so it'd seem to be good to not overlook very similar patterns..
Proofs: https://rise4fun.com/Alive/21b
Also, there is something odd with `isKnownNonZero()`, if the non-zero
knowledge was specified as an assumption, it didn't pick it up (PR43267)
With this, i see no other missing folds for
https://bugs.llvm.org/show_bug.cgi?id=43251
Reviewers: spatel, nikic, xbolva00
Reviewed By: spatel
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D67412
llvm-svn: 372257
Related folds were added in:
rL125734
...the code comment about register pressure is discussed in
more detail in:
https://bugs.llvm.org/show_bug.cgi?id=2698
But 10 years later, perf testing bzip2 with this change now
shows a slight (0.2% average) improvement on Haswell although
that's probably within test noise.
Given that this is IR canonicalization, we shouldn't be worried
about register pressure though; the backend should be able to
adjust for that as needed.
This is part of solving PR43310 the theoretically right way:
https://bugs.llvm.org/show_bug.cgi?id=43310
...ie, if we don't cripple basic transforms, then we won't
need to add special-case code to detect larger patterns.
rL371940 and rL371981 are related patches in this series.
llvm-svn: 372007
This fold and several others were added in:
rL125734 <https://reviews.llvm.org/rL125734>
...with no explanation for the one-use checks other than the code
comments about register pressure.
Given that this is IR canonicalization, we shouldn't be worried
about register pressure though; the backend should be able to
adjust for that as needed.
This is part of solving PR43310 the theoretically right way:
https://bugs.llvm.org/show_bug.cgi?id=43310
...ie, if we don't cripple basic transforms, then we won't
need to add special-case code to detect larger patterns.
rL371940 is a related patch in this series.
llvm-svn: 371981
This blob was written before match() existed, so it
could probably be reduced significantly.
But I suspect it isn't well tested, so tests would have
to be added to reduce risk from logic changes.
llvm-svn: 371978
This fold and several others were added in:
rL125734
...with no explanation for the one-use checks other than the code
comments about register pressure.
Given that this is IR canonicalization, we shouldn't be worried
about register pressure though; the backend should be able to
adjust for that as needed.
There are similar checks as noted with the TODO comments. I'm
hoping to remove those restrictions too, but if any of these
does cause a regression, it should be easier to correct by making
small, individual commits.
This is part of solving PR43310 the theoretically right way:
https://bugs.llvm.org/show_bug.cgi?id=43310
...ie, if we don't cripple basic transforms, then we won't
need to add special-case code to detect larger patterns.
llvm-svn: 371940
(srem X, pow2C) sgt/slt 0 can be reduced using bit hacks by masking
off the sign bit and the module (low) bits:
https://rise4fun.com/Alive/jSO
A '2' divisor allows slightly more folding:
https://rise4fun.com/Alive/tDBM
Any chance to remove an 'srem' use is probably worthwhile, but this is limited
to the one-use improvement case because doing more may expose other missing
folds. That means it does nothing for PR21929 yet:
https://bugs.llvm.org/show_bug.cgi?id=21929
Differential Revision: https://reviews.llvm.org/D67334
llvm-svn: 371610
TryToSinkInstruction() has a bug: While updating debug info for
sunk instruction, it could clone dbg.declare intrinsic.
That is wrong. There could be only one dbg.declare.
The fix is to not clone dbg.declare intrinsic and to update
it`s arguments, to not to point to sunk instruction.
Differential Revision: https://reviews.llvm.org/D67217
llvm-svn: 371587
This allows us to fold fma's that multiply with 0.0. Also, the
multiply by 1.0 case is handled there as well. The fneg/fabs cases
are not handled by SimplifyFMulInst, so we need to keep them.
Reviewers: spatel, anemet, lebedev.ri
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D67351
llvm-svn: 371518
This is similar to the existing fold for splats added with:
rL365379
If we can adjust the shuffle mask to include another element
in an identity mask (if it changes vector length, that's an
extract/insert subvector operation in the backend), then that
can eliminate extractelement/insertelement pairs in IR.
All targets are expected to lower shuffles with identity masks
efficiently.
llvm-svn: 371340
Summary:
This is the first change to enable the TLI to be built per-function so
that -fno-builtin* handling can be migrated to use function attributes.
See discussion on D61634 for background. This is an enabler for fixing
handling of these options for LTO, for example.
This change should not affect behavior, as the provided function is not
yet used to build a specifically per-function TLI, but rather enables
that migration.
Most of the changes were very mechanical, e.g. passing a Function to the
legacy analysis pass's getTLI interface, or in Module level cases,
adding a callback. This is similar to the way the per-function TTI
analysis works.
There was one place where we were looking for builtins but not in the
context of a specific function. See FindCXAAtExit in
lib/Transforms/IPO/GlobalOpt.cpp. I'm somewhat concerned my workaround
could provide the wrong behavior in some corner cases. Suggestions
welcome.
Reviewers: chandlerc, hfinkel
Subscribers: arsenm, dschuff, jvesely, nhaehnle, mehdi_amini, javed.absar, sbc100, jgravelle-google, eraman, aheejin, steven_wu, george.burgess.iv, dexonsmith, jfb, asbirlea, gchatelet, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D66428
llvm-svn: 371284
A follow-up for r329011.
This may be changed to produce @llvm.sub.with.overflow in a later patch,
but for now just make things more consistent overall.
A few observations stem from this:
* There does not seem to be a similar one-instruction fold for uadd-overflow
* I'm not sure we'll want to canonicalize `B u> A` as `usub.with.overflow`,
so since the `icmp` here no longer refers to `sub`,
reconstructing `usub.with.overflow` will be problematic,
and will likely require standalone pass (similar to DivRemPairs).
https://rise4fun.com/Alive/Zqs
Name: (A - B) u> A --> B u> A
%t0 = sub i8 %A, %B
%r = icmp ugt i8 %t0, %A
=>
%r = icmp ugt i8 %B, %A
Name: (A - B) u<= A --> B u<= A
%t0 = sub i8 %A, %B
%r = icmp ule i8 %t0, %A
=>
%r = icmp ule i8 %B, %A
Name: C u< (C - D) --> C u< D
%t0 = sub i8 %C, %D
%r = icmp ult i8 %C, %t0
=>
%r = icmp ult i8 %C, %D
Name: C u>= (C - D) --> C u>= D
%t0 = sub i8 %C, %D
%r = icmp uge i8 %C, %t0
=>
%r = icmp uge i8 %C, %D
llvm-svn: 371101
bitcast <N x i8> (shuf X, undef, <N, N-1,...0>) to i{N*8} --> bswap (bitcast X to i{N*8})
In PR43146:
https://bugs.llvm.org/show_bug.cgi?id=43146
...we have a more complicated case where SLP is making a mess of bswap. This patch won't
do anything for that currently, but we need to improve bswap recognition in instcombine,
SLP, and/or a standalone pass to avoid that problem.
This is limited using the data-layout so we don't try to do this transform with actual
vector types. The backend does not appear to have folds to convert in either direction,
so we don't want to mess up something that is actually better lowered as a shuffle.
On x86, we're trading something like this:
vmovd %edi, %xmm0
vpshufb LCPI0_0(%rip), %xmm0, %xmm0 ## xmm0 = xmm0[3,2,1,0,u,u,u,u,u,u,u,u,u,u,u,u]
vmovd %xmm0, %eax
For:
movl %edi, %eax
bswapl %eax
Differential Revision: https://reviews.llvm.org/D66965
llvm-svn: 370659
Summary:
Finally, the fold i was looking forward to :)
The legality check is muddy, i doubt i've groked the full generalization,
but it handles all the cases i care about, and can come up with:
https://rise4fun.com/Alive/26j
I.e. we can perform the fold if **any** of the following is true:
* The shift amount is either zero or one less than widest bitwidth
* Either of the values being shifted has at most lowest bit set
* The value that is being shifted by `shl` (which is not truncated) should have no less leading zeros than the total shift amount;
* The value that is being shifted by `lshr` (which **is** truncated) should have no less leading zeros than the widest bit width minus total shift amount minus one
I strongly suspect there is some better generalization, but i'm not aware of it as of right now.
For now i also avoided using actual `computeKnownBits()`, but restricted it to constants.
Reviewers: spatel, nikic, xbolva00
Reviewed By: spatel
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D66383
llvm-svn: 370324
Always true/false checks were flagged by static analysis;
https://bugs.llvm.org/show_bug.cgi?id=43143
I have not confirmed the logic difference in propagating nsw vs. nuw,
but presumably we would have noticed a bug by now if that was wrong.
llvm-svn: 370248
Promoting it from InstCombine's tryToReuseConstantFromSelectInComparison().
Return true if this constant and a constant 'Y' are element-wise equal.
This is identical to just comparing the pointers, with the exception that
for vectors, if only one of the constants has an `undef` element in some
lane, the constants still match.
llvm-svn: 369842
Summary:
`matchThreeWayIntCompare()` looks for
```
select i1 (a == b),
i32 Equal,
i32 (select i1 (a < b), i32 Less, i32 Greater)
```
but both of these selects/compares can be in it's commuted form,
so out of 8 variants, only the two most basic ones is handled.
This fixes regression being introduced in D66232.
Reviewers: spatel, nikic, efriedma, xbolva00
Reviewed By: spatel
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D66607
llvm-svn: 369841
Summary:
If we have e.g.:
```
%t = icmp ult i32 %x, 65536
%r = select i1 %t, i32 %y, i32 65535
```
the constants `65535` and `65536` are suspiciously close.
We could perform a transformation to deduplicate them:
```
Name: ult
%t = icmp ult i32 %x, 65536
%r = select i1 %t, i32 %y, i32 65535
=>
%t.inv = icmp ugt i32 %x, 65535
%r = select i1 %t.inv, i32 65535, i32 %y
```
https://rise4fun.com/Alive/avb
While this may seem esoteric, this should certainly be good for vectors
(less constant pool usage) and for opt-for-size - need to have only one constant.
But the real fun part here is that it allows further transformation,
in particular it finishes cleaning up the `clamp` folding,
see e.g. `canonicalize-clamp-with-select-of-constant-threshold-pattern.ll`.
We start with e.g.
```
%dont_need_to_clamp_positive = icmp sle i32 %X, 32767
%dont_need_to_clamp_negative = icmp sge i32 %X, -32768
%clamp_limit = select i1 %dont_need_to_clamp_positive, i32 -32768, i32 32767
%dont_need_to_clamp = and i1 %dont_need_to_clamp_positive, %dont_need_to_clamp_negative
%R = select i1 %dont_need_to_clamp, i32 %X, i32 %clamp_limit
```
without this patch we currently produce
```
%1 = icmp slt i32 %X, 32768
%2 = icmp sgt i32 %X, -32768
%3 = select i1 %2, i32 %X, i32 -32768
%R = select i1 %1, i32 %3, i32 32767
```
which isn't really a `clamp` - both comparisons are performed on the original value,
this patch changes it into
```
%1.inv = icmp sgt i32 %X, 32767
%2 = icmp sgt i32 %X, -32768
%3 = select i1 %2, i32 %X, i32 -32768
%R = select i1 %1.inv, i32 32767, i32 %3
```
and then the magic happens! Some further transform finishes polishing it and we finally get:
```
%t1 = icmp sgt i32 %X, -32768
%t2 = select i1 %t1, i32 %X, i32 -32768
%t3 = icmp slt i32 %t2, 32767
%R = select i1 %t3, i32 %t2, i32 32767
```
which is beautiful and just what we want.
Proofs for `getFlippedStrictnessPredicateAndConstant()` for de-canonicalization:
https://rise4fun.com/Alive/THl
Proofs for the fold itself: https://rise4fun.com/Alive/THl
Reviewers: spatel, dmgreen, nikic, xbolva00
Reviewed By: spatel
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D66232
llvm-svn: 369840
Started implementing the vector case and realized the scalar case hadn't handled the GEP producing a different type than the base correctly. It's entertaining seeing what slips through review when we're focused on the 'hard' parts. :(
Also adding an extra vector test as it happened to be in workspace and wasn't worth separating.
llvm-svn: 369795
This generalizes the isGEPKnownNonNull rule from ValueTracking to apply when we do not know if the base is non-null, and thus need to replace one condition with another.
The core notion is that since an inbounds GEP can only form null if the base pointer is null and the offset is zero. However, if the offset is non-zero, the the "inbounds" marker makes the result poison. Thus, we're free to ignore the case where the offset is non-zero. Similarly, there's no case under which a non-null base can result in a null result without generating poison.
Differential Revision: https://reviews.llvm.org/D66608
llvm-svn: 369789
An intermediate extend is used to widen the narrow operand to the width of
the other (wider) operand. At that point, we have the same logic as the
existing transform that was restricted to folds of equal width zext/sext.
This mostly solves PR42700:
https://bugs.llvm.org/show_bug.cgi?id=42700
llvm-svn: 369519