This relands commit cd7f8051ac that was
reverted since lower threshold have successfully found an issue.
Now that the issue is fixed, let's wait until the next one is reported.
This reverts commit caa423eef0.
We can happen to have a situation with many stores eligible for transform,
but due to our visitation order (top to bottom), when we have processed
the first eligible instruction, we would not try to reprocess the previous
instructions that are now also eligible.
So after we've successfully merged a store that was second-to-last instruction
into successor, if the now-second-to-last instruction is also a such store
that is eligible, add it to worklist to be revisited.
Fixes https://bugs.llvm.org/show_bug.cgi?id=46661
Summary:
1000 iteratons is still kinda a lot.
Would it make sense to iteratively lower it, until it becomes `2`,
with some delay inbetween in order to let users actually potentially encounter it?
Reviewers: spatel, nikic, kuhar
Reviewed By: nikic
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D83160
Assume bundle can have more than one entry with the same name,
but at least AlignmentFromAssumptionsPass::extractAlignmentInfo() uses
getOperandBundle("align"), which internally assumes that it isn't the
case, and happily crashes otherwise.
Minimal reduced reproducer: run `opt -alignment-from-assumptions` on
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"
%0 = type { i64, %1*, i8*, i64, %2, i32, %3*, i8* }
%1 = type opaque
%2 = type { i8, i8, i16 }
%3 = type { i32, i32, i32, i32 }
; Function Attrs: nounwind
define i32 @f(%0* noalias nocapture readonly %arg, %0* noalias %arg1) local_unnamed_addr #0 {
bb:
call void @llvm.assume(i1 true) [ "align"(%0* %arg, i64 8), "align"(%0* %arg1, i64 8) ]
ret i32 0
}
; Function Attrs: nounwind willreturn
declare void @llvm.assume(i1) #1
attributes #0 = { nounwind "reciprocal-estimates"="none" }
attributes #1 = { nounwind willreturn }
This is what we'd have with -mllvm -enable-knowledge-retention
This reverts commit c95ffadb24.
As noted in PR46561:
https://bugs.llvm.org/show_bug.cgi?id=46561
...it takes something beyond a minimal IR example to trigger
this bug because it relies on matching non-canonical IR.
There are no tests that show the need for matching this
pattern, so I'm just deleting it to fix the miscompile.
Summary:
The actual transform i was going after was:
https://rise4fun.com/Alive/Tp9H
```
Name: zz
Pre: isPowerOf2(C0) && isPowerOf2(C1) && C1 == C0
%t0 = and i8 %x, C0
%r = icmp eq i8 %t0, C1
=>
%t = icmp eq i8 %t0, 0
%r = xor i1 %t, -1
Name: zz
Pre: isPowerOf2(C0)
%t0 = and i8 %x, C0
%r = icmp ne i8 %t0, 0
=>
%t = icmp eq i8 %t0, 0
%r = xor i1 %t, -1
```
but as it can be seen from the current tests, we already canonicalize most of it,
and we are only missing handling multi-use non-canonical icmp predicates.
If we have both `!=0` and `==0`, even though we can CSE them,
we end up being stuck with them. We should canonicalize to the `==0`.
I believe this is one of the cleanup steps i'll need after `-scalarizer`
if i end up proceeding with my WIP alloca promotion helper pass.
Reviewers: spatel, jdoerfert, nikic
Reviewed By: nikic
Subscribers: zzheng, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D83139
The use of 'tmp' can trigger warnings from the update_test_checks.py
script. That's evidence of a flaw in the script's logic, but we
can always do better than naming variables 'tmp' in LLVM too.
The phi test file should be updated with auto-generated regex CHECK
lines, so it isn't affected by cosmetic diffs, but I don't have
time to do that right now.
Narrowing an input expression of a truncate to a type larger than the
result of the truncate won't allow removing the truncate, but it may
enable further optimizations, e.g. allowing for larger vectorization
factors.
For now this is intentionally limited to integer types only, to avoid
producing new vector ops that might not be suitable for the target.
If we know that the only user is a trunc, we can also be allow more
cases, e.g. also shortening expressions with some additional shifts.
I would appreciate feedback on the best place to do such a narrowing.
This fixes PR43580.
Reviewers: spatel, RKSimon, lebedev.ri, xbolva00
Reviewed By: lebedev.ri
Differential Revision: https://reviews.llvm.org/D82973
The base case only works because we are relying on a
poison-unsafe select transform; if that is fixed, we
would regress on patterns like this.
The extra use tests show that the select transform can't
be applied consistently. So it may be a regression to have
an extra instruction on 1 test, but that result was not
created safely and does not happen reliably.
Currently canEvaluateTruncated can only attempt to truncate shifts if they are scalar/uniform constant amounts that are in range.
This patch replaces the constant extraction code with KnownBits handling, using the KnownBits::getMaxValue to check that the amounts are inrange.
This enables support for nonuniform constant cases, and also variable shift amounts that have been masked somehow. Annoyingly, this still won't work for vectors with (demanded) undefs as KnownBits returns nothing in those cases, but its a definite improvement on what we currently have.
Differential Revision: https://reviews.llvm.org/D83127
As noted on PR46531, we were only performing this transform on uniform vectors as we were using the m_APInt pattern matcher to extract the shift amount.
Differential Revision: https://reviews.llvm.org/D83035
If the addend of the fma is zero, common sense would suggest that we can
convert fma x, y, 0.0 to fmul x, y. This comes up with some user code
that was expecting the first fma in an unrolled loop to simplify to a
fmul.
Floating point often does not follow naive common sense though. Alive
suggests that this should be guarded by nsz (as fadd -0.0, 0.0 = 0.0).
fma x, y, -0.0 is always valid.
Differential Revision: https://reviews.llvm.org/D82778
Summary:
The advice in HowToUpdateDebugInfo.rst is to "... preserve the debug
location of an instruction if the instruction either remains in its
basic block, or if its basic block is folded into a predecessor that
branches unconditionally".
TryToSinkInstruction doesn't seem to satisfy the criteria as it's
sinking an instruction to some successor block. Preserving the debug loc
can make single-stepping appear to go backwards, or make a breakpoint
hit on that location happen "too late" (since single-stepping from that
breakpoint can cause the function to return unexpectedly).
So, drop the debug location.
This was reverted in ee3620643d because it removed source locations
from inlinable calls, breaking a verifier rule. I've added an exception
for calls because the alternative (setting a line 0 location) is not
better. I tested the updated patch by completing a stage2 RelWithDebInfo
build.
Reviewers: aprantl, davide
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D82487
Summary:
The advice in HowToUpdateDebugInfo.rst is to "... preserve the debug
location of an instruction if the instruction either remains in its
basic block, or if its basic block is folded into a predecessor that
branches unconditionally".
TryToSinkInstruction doesn't seem to satisfy the criteria as it's
sinking an instruction to some successor block. Preserving the debug loc
can make single-stepping appear to go backwards, or make a breakpoint
hit on that location happen "too late" (since single-stepping from that
breakpoint can cause the function to return unexpectedly).
So, drop the debug location.
Reviewers: aprantl, davide
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D82487
fabs(X) * fabs(Y) --> fabs(X * Y)
fabs(X) / fabs(Y) --> fabs(X / Y)
If both operands of fmul/fdiv are positive, then the result must be positive.
There's a NAN corner-case that prevents removing the more specific fold just
above this one:
fabs(X) * fabs(X) -> X * X
That fold works even with NAN because the sign-bit result of the multiply is
not specified if X is NAN.
We can't remove that and use the more general fold that is proposed here
because once we convert to this:
fabs (X * X)
...it is not legal to simplify the 'fabs' out of that expression when X is NAN.
That's because fabs() guarantees that the sign-bit is always cleared - even
for NAN values.
So this patch has the potential to lose information, but it seems unlikely if
we do the more specific fold ahead of this one.
Differential Revision: https://reviews.llvm.org/D82277
Summary:
NOTE: There is a mailing list discussion on this: http://lists.llvm.org/pipermail/llvm-dev/2019-December/137632.html
Complemantary to the assumption outliner prototype in D71692, this patch
shows how we could simplify the code emitted for an alignemnt
assumption. The generated code is smaller, less fragile, and it makes it
easier to recognize the additional use as a "assumption use".
As mentioned in D71692 and on the mailing list, we could adopt this
scheme, and similar schemes for other patterns, without adopting the
assumption outlining.
Reviewers: hfinkel, xbolva00, lebedev.ri, nikic, rjmccall, spatel, jdoerfert, sstefan1
Reviewed By: jdoerfert
Subscribers: yamauchi, kuter, fhahn, merge_guards_bot, hiraditya, bollu, rkruppe, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D71739
This patch transforms
```
p = phi [x, y]
s = select cond, z, p
```
with
```
s = phi[x, z]
```
if we can prove that the Phi node takes values basing on select's condition.
Differential Revision: https://reviews.llvm.org/D82072
Reviewed By: nikic
We can sometimes replace a select with a Phi node if all of its values
are available on respective incoming edges.
Differential Revision: https://reviews.llvm.org/D82005
Reviewed By: nikic
This is the integer sibling to D81491.
(a[0] + a[1] + a[2] + a[3]) - (b[0] + b[1] + b[2] +b[3]) -->
(a[0] - b[0]) + (a[1] - b[1]) + (a[2] - b[2]) + (a[3] - b[3])
Removing the "experimental" from these intrinsics is likely
not too far away.
I don't have any testcases showing it happening,
and i haven't succeeded in creating one,
but i'm also not positive it can't ever happen,
and i recall having something that looked like
that in the very beginning of Negator creation.
But since we now already have a negation cache,
we can now detect such cases practically for free.
Let's do so instead of "relying" on stack overflow :D
It is possible that we can try to negate the same value multiple times.
For example, PHI nodes may happen to have multiple incoming values
(all of which must be the same value) for the same incoming basic block.
It may happen that we try to negate such a PHI node, and succeed,
and that might result in having now-different incoming values..
To avoid that, and in general to reduce the amount of duplicated
work we might be doing, let's introduce a cache where
we'll track results of negating each value.
The added test was previously failing -verify after -instcombine.
Fixes https://bugs.llvm.org/show_bug.cgi?id=46362
I originally reverted the patch because it was causing performance
issues, but now I think it's just enabling simplify-cfg to do
something that I don't want instead :)
Sorry for the noise.
This reverts commit 3e39760f8e.
(a[0] + a[1] + a[2] + a[3]) - (b[0] + b[1] + b[2] +b[3]) -->
(a[0] - b[0]) + (a[1] - b[1]) + (a[2] - b[2]) + (a[3] - b[3])
This should be the last step in solving PR43953:
https://bugs.llvm.org/show_bug.cgi?id=43953
We started emitting reduction intrinsics with:
D80867/ rGe50059f6b6b3
So it's a relatively easy pattern match now to re-order those ops.
Also, I have not seen any complaints for the switch to intrinsics
yet, so I'll propose to remove the "experimental" tag from the
intrinsics soon.
Differential Revision: https://reviews.llvm.org/D81491
This is a hacky, but low-risk fix to avoid the infinite loop in PR46271:
https://bugs.llvm.org/show_bug.cgi?id=46271
As discussed there, the problem is that FoldOpIntoSelect() can get into a conflict
with a transform that wants to pull a 'not' op through min/max via
SimplifyDemandedVectorElts(). We need to relax our matching of min/max to include
undefined elements in vector constants to avoid that. Alternatively, we could
improve or cripple the demanded elements analysis, but that could create even
more problems.
The likely better, safer alternative will be to create min/max intrinsics, so
we can remove all of the hacks related to min/max matching in instcombine.
Differential Revision: https://reviews.llvm.org/D81698
Summary:
"X % C == 0" is optimized to "X & C-1 == 0" (where C is a power-of-two)
However, "X % Y" can also be represented as "X - (X / Y) * Y" so if I rewrite the initial expression:
"X - (X / C) * C == 0" it's not currently optimized to "X & C-1 == 0", see godbolt: https://godbolt.org/z/KzuXUj
This is my first contribution to LLVM so I hope I didn't mess things up
Reviewers: lebedev.ri, spatel
Reviewed By: lebedev.ri
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D79369
- Now all SalvageDebugInfo() calls will mark undef if the salvage
attempt fails.
Reviewed by: vsk, Orlando
Differential Revision: https://reviews.llvm.org/D78369