This is fix for the crash caused by ScalarEvolution::getTruncateExpr.
It expects that if it checked the condition that SCEV is not in UniqueSCEVs cache in
the beginning that it will not be there inside this method.
However during recursion and transformation/simplification for sub expression,
it is possible that these modifications will end up with the same SCEV as we started from.
So we must always check whether SCEV is in cache and do not insert item if it is already there.
Reviewers: sanjoy, mkazantsev, craig.topper
Reviewed By: sanjoy
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D41380
llvm-svn: 321472
In this method, we invoke `SimplifyICmpOperands` which takes the `Cond` predicate
by reference and may change it along with `LHS` and `RHS` SCEVs. But then we invoke
`computeShiftCompareExitLimit` with Values from which the SCEVs have been derived,
these Values have not been modified while `Cond` could be.
One of possible outcomes of this is that we may falsely prove that an infinite loop ends
within some finite number of iterations.
In this patch, we save the original `Cond` and pass it along with original operands.
This logic may be removed in future once `computeShiftCompareExitLimit` works
with SCEVs instead of value operands.
Reviewed By: sanjoy
Differential Revision: https://reviews.llvm.org/D40953
llvm-svn: 320142
Given loops `L1` and `L2` with AddRecs `AR1` and `AR2` varying in them respectively.
When identifying loop disposition of `AR2` w.r.t. `L1`, we only say that it is varying if
`L1` contains `L2`. But there is also a possible situation where `L1` and `L2` are
consecutive sibling loops within the parent loop. In this case, `AR2` is also varying
w.r.t. `L1`, but we don't correctly identify it.
It can lead, for exaple, to attempt of incorrect folding. Consider:
AR1 = {a,+,b}<L1>
AR2 = {c,+,d}<L2>
EXAR2 = sext(AR1)
MUL = mul AR1, EXAR2
If we incorrectly assume that `EXAR2` is invariant w.r.t. `L1`, we can end up trying to
construct something like: `{a * {c,+,d}<L2>,+,b * {c,+,d}<L2>}<L1>`, which is incorrect
because `AR2` is not available on entrance of `L1`.
Both situations "`L1` contains `L2`" and "`L1` preceeds sibling loop `L2`" can be handled
with one check: "header of `L1` dominates header of `L2`". This patch replaces the old
insufficient check with this one.
Differential Revision: https://reviews.llvm.org/D39453
llvm-svn: 318819
Summary:
If a compare instruction is same or inverse of the compare in the
branch of the loop latch, then return a constant evolution node.
This shall facilitate computations of loop exit counts in cases
where compare appears in the evolution chain of induction variables.
Will fix PR 34538
Reviewers: sanjoy, hfinkel, junryoungju
Reviewed By: sanjoy, junryoungju
Subscribers: javed.absar, llvm-commits
Differential Revision: https://reviews.llvm.org/D38494
llvm-svn: 318050
Max backedge taken count is always expected to be a constant; and this is
usually true by construction -- it is a SCEV expression with constant inputs.
However, if the max backedge expression ends up being computed to be a udiv with
a constant zero denominator[0], SCEV does not fold the result to a constant
since there is no constant it can fold it to (SCEV has no representation for
"infinity" or "undef").
However, in computeMaxBECountForLT we already know the denominator is positive,
and thus at least 1; and we can use this fact to avoid dividing by zero.
[0]: We can end up with a constant zero denominator if the signed range of the
stride is more precise than the unsigned range.
llvm-svn: 316615
Summary:
If a compare instruction is same or inverse of the compare in the
branch of the loop latch, then return a constant evolution node.
Currently scope of evaluation is limited to SCEV computation for
PHI nodes.
This shall facilitate computations of loop exit counts in cases
where compare appears in the evolution chain of induction variables.
Will fix PR 34538
Reviewers: sanjoy, hfinkel, junryoungju
Reviewed By: junryoungju
Subscribers: javed.absar, llvm-commits
Differential Revision: https://reviews.llvm.org/D38494
llvm-svn: 316054
Summary:
This patch teaches SCEV to calculate the maxBECount when the end bound
of the loop can vary. Note that we cannot calculate the exactBECount.
This will only be done when both conditions are satisfied:
1. the loop termination condition is strictly LT.
2. the IV is proven to not overflow.
This provides more information to users of SCEV and can be used to
improve identification of finite loops.
Reviewers: sanjoy, mkazantsev, silviu.baranga, atrick
Reviewed by: mkazantsev
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D38825
llvm-svn: 315683
In LLVM IR the following code:
%r = urem <ty> %t, %b
is equivalent to
%q = udiv <ty> %t, %b
%s = mul <ty> nuw %q, %b
%r = sub <ty> nuw %t, %q ; (t / b) * b + (t % b) = t
As UDiv, Mul and Sub are already supported by SCEV, URem can be implemented
with minimal effort using that relation:
%r --> (-%b * (%t /u %b)) + %t
We implement two special cases:
- if %b is 1, the result is always 0
- if %b is a power-of-two, we produce a zext/trunc based expression instead
That is, the following code:
%r = urem i32 %t, 65536
Produces:
%r --> (zext i16 (trunc i32 %a to i16) to i32)
Note that while this helps get a tighter bound on the range analysis and the
known-bits analysis, this exposes some normalization shortcoming of SCEVs:
%div = udim i32 %a, 65536
%mul = mul i32 %div, 65536
%rem = urem i32 %a, 65536
%add = add i32 %mul, %rem
Will usually not be reduced.
llvm-svn: 312329
Pushes the sext onto the operands of a Sub if NSW is present.
Also adds support for propagating the nowrap flags of the
llvm.ssub.with.overflow intrinsic during analysis.
Differential Revision: https://reviews.llvm.org/D35256
llvm-svn: 310117
The patch rL309080 was reverted because it did not clean up the cache on "forgetValue"
method call. This patch re-enables this change, adds the missing check and introduces
two new unit tests that make sure that the cache is cleaned properly.
Differential Revision: https://reviews.llvm.org/D36087
llvm-svn: 309925
This reverts commit r309080. The patch needs to clear out the
ScalarEvolution::ExitLimits cache in forgetMemoizedResults.
I've replied on the commit thread for the patch with more details.
llvm-svn: 309357
This patch adds a cache for computeExitLimit to save compilation time. A lot of examples of
tests that take extensive time to compile are attached to the bug 33494.
Differential Revision: https://reviews.llvm.org/D35827
llvm-svn: 309080
When SCEV calculates product of two SCEVAddRecs from the same loop, it
tries to combine them into one big AddRecExpr. If the sizes of the initial
SCEVs were `S1` and `S2`, the size of their product is `S1 + S2 - 1`, and every
operand of the resulting SCEV is combined from operands of initial SCEV and
has much higher complexity than they have.
As result, if we try to calculate something like:
%x1 = {a,+,b}
%x2 = mul i32 %x1, %x1
%x3 = mul i32 %x2, %x1
%x4 = mul i32 %x3, %x2
...
The size of such SCEVs grows as `2^N`, and the arguments
become more and more complex as we go forth. This leads
to long compilation and huge memory consumption.
This patch sets a limit after which we don't try to combine two
`SCEVAddRecExpr`s into one. By default, max allowed size of the
resulting AddRecExpr is set to 16.
Differential Revision: https://reviews.llvm.org/D35664
llvm-svn: 308847
The patch was reverted due to a bug. The bug was that if the IV is the 2nd operand of the icmp
instruction, then the "Pred" variable gets swapped and differs from the instruction's predicate.
In this patch we use the original predicate to do the transformation.
Also added a test case that exercises this situation.
Differentian Revision: https://reviews.llvm.org/D35107
llvm-svn: 307477
It seems that the patch was reverted by mistake. Clang testing showed failure of the
MathExtras.SaturatingMultiply test, however I was unable to reproduce the issue on the
fresh code base and was able to confirm that the transformation introduced by the change
does not happen in the said test. This gives a strong confidence that the actual reason of
the failure of the initial patch was somewhere else, and that problem now seems to be
fixed. Re-submitting the change to confirm that.
llvm-svn: 307244
This patch seems to cause failures of test MathExtras.SaturatingMultiply on
multiple buildbots. Reverting until the reason of that is clarified.
Differential Revision: https://reviews.llvm.org/rL307126
llvm-svn: 307135
-If there is a IndVar which is known to be non-negative, and there is a value which is also non-negative,
then signed and unsigned comparisons between them produce the same result. Both of those can be
seen in the same loop. To allow other optimizations to simplify them, we turn all instructions like
%c = icmp slt i32 %iv, %b
to
%c = icmp ult i32 %iv, %b
if both %iv and %b are known to be non-negative.
Differential Revision: https://reviews.llvm.org/D34979
llvm-svn: 307126
In rL300494 there was an attempt to deal with excessive compile time on
invocations of getSign/ZeroExtExpr using local caching. This approach only
helps if we request the same SCEV multiple times throughout recursion. But
in the bug PR33431 we see a case where we request different values all the time,
so caching does not help and the size of the cache grows enormously.
In this patch we remove the local cache for this methods and add the recursion
depth limit instead, as we do for arithmetics. This gives us a guarantee that the
invocation sequence is limited and reasonably short.
Differential Revision: https://reviews.llvm.org/D34273
llvm-svn: 306785
In LLVM IR the following code:
%r = urem <ty> %t, %b
is equivalent to:
%q = udiv <ty> %t, %b
%s = mul <ty> nuw %q, %b
%r = sub <ty> nuw %t, %q ; (t / b) * b + (t % b) = t
As UDiv, Mul and Sub are already supported by SCEV, URem can be
implemented with minimal effort this way.
Note: While SRem and SDiv are also related this way, SCEV does not
provides SDiv yet.
llvm-svn: 306695
This is a fix for PR33292 that shows a case of extremely long compilation
of a single .c file with clang, with most time spent within SCEV.
We have a mechanism of limiting recursion depth for getAddExpr to avoid
long analysis in SCEV. However, there are calls from getAddExpr to getMulExpr
and back that do not propagate the info about depth. As result of this, a chain
getAddExpr -> ... .> getAddExpr -> getMulExpr -> getAddExpr -> ... -> getAddExpr
can be extremely long, with every segment of getAddExpr's being up to max depth long.
This leads either to long compilation or crash by stack overflow. We face this situation while
analyzing big SCEVs in the test of PR33292.
This patch applies the same limit on max expression depth for getAddExpr and getMulExpr.
Differential Revision: https://reviews.llvm.org/D33984
llvm-svn: 305463
The patch rL303730 was reverted because test lsr-expand-quadratic.ll failed on
many non-X86 configs with this patch. The reason of this is that the patch
makes a correctless fix that changes optimizer's behavior for this test.
Without the change, LSR was making an overconfident simplification basing on a
wrong SCEV. Apparently it did not need the IV analysis to do this. With the
change, it chose a different way to simplify (that wasn't so confident), and
this way required the IV analysis. Now, following the right execution path,
LSR tries to make a transformation relying on IV Users analysis. This analysis
is target-dependent due to this code:
// LSR is not APInt clean, do not touch integers bigger than 64-bits.
// Also avoid creating IVs of non-native types. For example, we don't want a
// 64-bit IV in 32-bit code just because the loop has one 64-bit cast.
uint64_t Width = SE->getTypeSizeInBits(I->getType());
if (Width > 64 || !DL.isLegalInteger(Width))
return false;
To make a proper transformation in this test case, the type i32 needs to be
legal for the specified data layout. When the test runs on some non-X86
configuration (e.g. pure ARM 64), opt gets confused by the specified target
and does not use it, rejecting the specified data layout as well. Instead,
it uses some default layout that does not treat i32 as a legal type
(currently the layout that is used when it is not specified does not have
legal types at all). As result, the transformation we expect to happen does
not happen for this test.
This re-enabling patch does not have any source code changes compared to the
original patch rL303730. The only difference is that the failing test is
moved to X86 directory and now has requirement of running on x86 only to comply
with the specified target triple and data layout.
Differential Revision: https://reviews.llvm.org/D33543
llvm-svn: 303971
When folding arguments of AddExpr or MulExpr with recurrences, we rely on the fact that
the loop of our base recurrency is the bottom-lost in terms of domination. This assumption
may be broken by an expression which is treated as invariant, and which depends on a complex
Phi for which SCEVUnknown was created. If such Phi is a loop Phi, and this loop is lower than
the chosen AddRecExpr's loop, it is invalid to fold our expression with the recurrence.
Another reason why it might be invalid to fold SCEVUnknown into Phi start value is that unlike
other SCEVs, SCEVUnknown are sometimes position-bound. For example, here:
for (...) { // loop
phi = {A,+,B}
}
X = load ...
Folding phi + X into {A+X,+,B}<loop> actually makes no sense, because X does not exist and cannot
exist while we are iterating in loop (this memory can be even not allocated and not filled by this moment).
It is only valid to make such folding if X is defined before the loop. In this case the recurrence {A+X,+,B}<loop>
may be existant.
This patch prohibits folding of SCEVUnknown (and those who use them) into the start value of an AddRecExpr,
if this instruction is dominated by the loop. Merging the dominating unknown values is still valid. Some tests that
relied on the fact that some SCEVUnknown should be folded into AddRec's are changed so that they no longer
expect such behavior.
llvm-svn: 303730
This is a re-application of a r303497 that was reverted in r303498.
I thought it had broken a bot when it had not (the breakage did not
go away with the revert).
This change makes the split between the "exact" backedge taken count
and the "maximum" backedge taken count a bit more obvious. Both of
these are upper bounds on the number of times the loop header
executes (since SCEV does not account for most kinds of abnormal
control flow), but the latter is guaranteed to be a constant.
There were a few places where the max backedge taken count *was* a
non-constant; I've changed those to compute constants instead.
At this point, I'm not sure if the constant max backedge count can be
computed by calling `getUnsignedRange(Exact).getUnsignedMax()` without
losing precision. If it can, we can simplify even further by making
`getMaxBackedgeTakenCount` a thin wrapper around
`getBackedgeTakenCount` and `getUnsignedRange`.
llvm-svn: 303531
This change makes the split between the "exact" backedge taken count
and the "maximum" backedge taken count a bit more obvious. Both of
these are upper bounds on the number of times the loop header
executes (since SCEV does not account for most kinds of abnormal
control flow), but the latter is guaranteed to be a constant.
There were a few places where the max backedge taken count *was* a
non-constant; I've changed those to compute constants instead.
At this point, I'm not sure if the constant max backedge count can be
computed by calling `getUnsignedRange(Exact).getUnsignedMax()` without
losing precision. If it can, we can simplify even further by making
`getMaxBackedgeTakenCount` a thin wrapper around
`getBackedgeTakenCount` and `getUnsignedRange`.
llvm-svn: 303497
The existing sorting order in defined CompareSCEVComplexity sorts AddRecExprs
by loop depth, but does not pay attention to dominance of loops. This can
lead us to the following buggy situation:
for (...) { // loop1
op1 = {A,+,B}
}
for (...) { // loop2
op2 = {A,+,B}
S = add op1, op2
}
In this case there is no guarantee that in operand list of S the op2 comes
before op1 (loop depth is the same, so they will be sorted just
lexicographically), so we can incorrectly treat S as a recurrence of loop1,
which is wrong.
This patch changes the sorting logic so that it places the dominated recs
before the dominating recs. This ensures that when we pick the first recurrency
in the operands order, it will be the bottom-most in terms of domination tree.
The attached test set includes some tests that produce incorrect SCEV
estimations and crashes with oldlogic.
Reviewers: sanjoy, reames, apilipenko, anna
Reviewed By: sanjoy
Subscribers: llvm-commits, mzolotukhin
Differential Revision: https://reviews.llvm.org/D33121
llvm-svn: 303148
Summary:
The existing implementation creates a symbolic SCEV expression every
time we analyze a phi node and then has to remove it, when the analysis
is finished. This is very expensive, and in most of the cases it's also
unnecessary. According to the data I collected, ~60-70% of analyzed phi
nodes (measured on SPEC) have the following form:
PN = phi(Start, OP(Self, Constant))
Handling such cases separately significantly speeds this up.
Reviewers: sanjoy, pete
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D32663
llvm-svn: 302096
Summary:
programUndefinedIfPoison makes more sense, given what the function
does; and I'm about to add a function with a name similar to
isKnownNotFullPoison (so do the rename to avoid confusion).
Reviewers: broune, majnemer, bjarke.roune
Reviewed By: broune
Subscribers: mcrosier, llvm-commits, mzolotukhin
Differential Revision: https://reviews.llvm.org/D30444
llvm-svn: 301776
There have been multiple reports of this causing problems: a
compile-time explosion on the LLVM testsuite, and a stack
overflow for an opencl kernel.
llvm-svn: 300928
Use haveNoCommonBitsSet to figure out whether an "or" instruction
is equivalent to addition. This handles more cases than just
checking for a constant on the RHS.
Differential Revision: https://reviews.llvm.org/D32239
llvm-svn: 300746
The patch rL298481 was reverted due to crash on clang-with-lto-ubuntu build.
The reason of the crash was type mismatch between either a or b and RHS in the following situation:
LHS = sext(a +nsw b) > RHS.
This is quite rare, but still possible situation. Normally we need to cast all {a, b, RHS} to their widest type.
But we try to avoid creation of new SCEV that are not constants to avoid initiating recursive analysis that
can take a lot of time and/or cache a bad value for iterations number. To deal with this, in this patch we
reject this case and will not try to analyze it if the type of sum doesn't match with the type of RHS. In this
situation we don't need to create any non-constant SCEVs.
This patch also adds an assertion to the method IsProvedViaContext so that we could fail on it and not
go further into range analysis etc (because in some situations these analyzes succeed even when the passed
arguments have wrong types, what should not normally happen).
The patch also contains a fix for a problem with too narrow scope of the analysis caused by wrong
usage of predicates in recursive invocations.
The regression test on the said failure: test/Analysis/ScalarEvolution/implied-via-addition.ll
Reviewers: reames, apilipenko, anna, sanjoy
Reviewed By: sanjoy
Subscribers: mzolotukhin, mehdi_amini, llvm-commits
Differential Revision: https://reviews.llvm.org/D31238
llvm-svn: 299205
The patch rL298481 was reverted due to crash on clang-with-lto-ubuntu build.
The reason of the crash was type mismatch between either a or b and RHS in the following situation:
LHS = sext(a +nsw b) > RHS.
This is quite rare, but still possible situation. Normally we need to cast all {a, b, RHS} to their widest type.
But we try to avoid creation of new SCEV that are not constants to avoid initiating recursive analysis that
can take a lot of time and/or cache a bad value for iterations number. To deal with this, in this patch we
reject this case and will not try to analyze it if the type of sum doesn't match with the type of RHS. In this
situation we don't need to create any non-constant SCEVs.
This patch also adds an assertion to the method IsProvedViaContext so that we could fail on it and not
go further into range analysis etc (because in some situations these analyzes succeed even when the passed
arguments have wrong types, what should not normally happen).
The patch also contains a fix for a problem with too narrow scope of the analysis caused by wrong
usage of predicates in recursive invocations.
The regression test on the said failure: test/Analysis/ScalarEvolution/implied-via-addition.ll
llvm-svn: 298690
Given below case:
%y = shl %x, n
%z = ashr %y, m
when n = m, SCEV models it as sext(trunc(x)). This patch tries to handle
the case where n > m by using sext(mul(trunc(x), 2^(n-m)))) as the SCEV
expression.
llvm-svn: 298631
This patch allows SCEV predicate analysis to prove implication of some expression predicates
from context predicates related to arguments of those expressions.
It introduces three new rules:
For addition:
(A >X && B >= 0) || (B >= 0 && A > X) ===> (A + B) > X.
For division:
(A > X) && (0 < B <= X + 1) ===> (A / B > 0).
(A > X) && (-B <= X < 0) ===> (A / B >= 0).
Using these rules, SCEV is able to prove facts like "if X > 1 then X / 2 > 0".
They can also be combined with the same context, to prove more complex expressions like
"if X > 1 then X/2 + 1 > 1".
Diffirential Revision: https://reviews.llvm.org/D30887
Reviewed by: sanjoy
llvm-svn: 298481
If loop bound containing calculations like min(a,b), the Scalar
Evolution API getSmallConstantTripMultiple returns 4294967295 "-1"
as the trip multiple. The problem is that, SCEV use -1 * umax to
represent umin. The multiple constant -1 was returned, and the logic
of guarding against huge trip counts was skipped. Because -1 has 32
active bits.
The fix attempt to factor more general cases. First try to get the
greatest power of two divisor of trip count expression. In case
overflow happens, the trip count expression is still divisible by the
greatest power of two divisor returned. Returns 1 if not divisible by 2.
Patch by Huihui Zhang <huihuiz@codeaurora.org>
Differential Revision: https://reviews.llvm.org/D30840
llvm-svn: 298301
Summary:
This approach has two major advantages over the existing one:
1. We don't need to extend bitwidth in our computations. Extending
bitwidth is a big issue for compile time as we often end up working with
APInts wider than 64bit, which is a slow case for APInt.
2. When we zero extend a wrapped range, we lose some information (we
replace the range with [0, 1 << src bit width)). Thus, avoiding such
extensions better preserves information.
Correctness testing:
I ran 'ninja check' with assertions that the new implementation of
getRangeForAffineAR gives the same results as the old one (this
functionality is not present in this patch). There were several failures
- I inspected them manually and found out that they all are caused by
the fact that we're returning more accurate results now (see bullet (2)
above).
Without such assertions 'ninja check' works just fine, as well as
SPEC2006.
Compile time testing:
CTMark/Os:
- mafft/pairlocalalign -16.98%
- tramp3d-v4/tramp3d-v4 -12.72%
- lencod/lencod -11.51%
- Bullet/bullet -4.36%
- ClamAV/clamscan -3.66%
- 7zip/7zip-benchmark -3.19%
- sqlite3/sqlite3 -2.95%
- SPASS/SPASS -2.74%
- Average -5.81%
Performance testing:
The changes are expected to be neutral for runtime performance.
Reviewers: sanjoy, atrick, pete
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D30477
llvm-svn: 297992
Summary:
Motivation: fix PR31181 without regression (the actual fix is still in
progress). However, the actual content of PR31181 is not relevant
here.
This change makes poison propagation more aggressive in the following
cases:
1. poision * Val == poison, for any Val. In particular, this changes
existing intentional and documented behavior in these two cases:
a. Val is 0
b. Val is 2^k * N
2. poison << Val == poison, for any Val
3. getelementptr is poison if any input is poison
I think all of these are justified (and are axiomatically true in the
new poison / undef model):
1a: we need poison * 0 to be poison to allow transforms like these:
A * (B + C) ==> A * B + A * C
If poison * 0 were 0 then the above transform could not be allowed
since e.g. we could have A = poison, B = 1, C = -1, making the LHS
poison * (1 + -1) = poison * 0 = 0
and the RHS
poison * 1 + poison * -1 = poison + poison = poison
1b: we need e.g. poison * 4 to be poison since we want to allow
A * 4 ==> A + A + A + A
If poison * 4 were a value with all of their bits poison except the
last four; then we'd not be able to do this transform since then if A
were poison the LHS would only be "partially" poison while the RHS
would be "full" poison.
2: Same reasoning as (1b), we'd like have the following kinds
transforms be legal:
A << 1 ==> A + A
Reviewers: majnemer, efriedma
Subscribers: mcrosier, llvm-commits
Differential Revision: https://reviews.llvm.org/D30185
llvm-svn: 295809
Make SolveLinEquationWithOverflow take the start as a SCEV, so we can
solve more cases. With that implemented, get rid of the special case
for powers of two.
The additional functionality probably isn't particularly useful,
but it might help a little for certain cases involving pointer
arithmetic.
Differential Revision: https://reviews.llvm.org/D28884
llvm-svn: 293576
Inlining in getAddExpr() can cause abnormal computational time in some cases.
New parameter -scev-addops-inline-threshold is intruduced with default value 500.
Reviewers: sanjoy
Subscribers: mzolotukhin, llvm-commits
Differential Revision: https://reviews.llvm.org/D28812
llvm-svn: 293176
bots ever since d0k fixed the CHECK lines so that it did something at
all.
It isn't actually testing SCEV directly but LSR, so move it into LSR and
the x86-specific tree of tests that already exists there. Target
dependence is common and unavoidable with the current design of LSR.
llvm-svn: 292774
To avoid regressions, make ScalarEvolution::createSCEV a bit more
clever.
Also get rid of some useless code in ScalarEvolution::howFarToZero
which was hiding this bug.
No new testcase because it's impossible to actually expose this bug:
we don't have any in-tree users of getUDivExactExpr besides the two
functions I just mentioned, and they both dodged the problem. I'll
try to add some interesting users in a followup.
Differential Revision: https://reviews.llvm.org/D28587
llvm-svn: 292449
First, I've moved a test of IVUsers from the LSR tree to a dedicated
IVUsers test directory. I've also simplified its RUN line now that the
new pass manager's loop PM is providing analyses on their own.
No functionality changed, but it makes subsequent changes cleaner.
llvm-svn: 292060
mark it as never invalidated in the new PM.
The old PM already required this to work, and after a discussion with
Hal this seems to really be the only sensible answer. The cache
gracefully degrades as the IR is mutated, and most things which do this
should already be incrementally updating the cache.
This gets rid of a bunch of logic preserving and testing the
invalidation of this analysis.
llvm-svn: 292039
Refines max backedge-taken count if a loop like
"for (int i = 0; i != n; ++i) { /* body */ }" is rotated.
Differential Revision: https://reviews.llvm.org/D28536
llvm-svn: 291704
This is both easier to understand, and produces a tighter bound in certain
cases.
Differential Revision: https://reviews.llvm.org/D28393
llvm-svn: 291701
invalid.
This fixes use-after-free bugs that will arise with any interesting use
of SCEV.
I've added a dedicated test that works diligently to trigger these kinds
of bugs in the new pass manager and also checks for them explicitly as
well as triggering ASan failures when things go squirly.
llvm-svn: 291426
There was an efficiency problem with how we processed @llvm.assume in
ValueTracking (and other places). The AssumptionCache tracked all of the
assumptions in a given function. In order to find assumptions relevant to
computing known bits, etc. we searched every assumption in the function. For
ValueTracking, that means that we did O(#assumes * #values) work in InstCombine
and other passes (with a constant factor that can be quite large because we'd
repeat this search at every level of recursion of the analysis).
Several of us discussed this situation at the last developers' meeting, and
this implements the discussed solution: Make the values that an assume might
affect operands of the assume itself. To avoid exposing this detail to
frontends and passes that need not worry about it, I've used the new
operand-bundle feature to add these extra call "operands" in a way that does
not affect the intrinsic's signature. I think this solution is relatively
clean. InstCombine adds these extra operands based on what ValueTracking, LVI,
etc. will need and then those passes need only search the users of the values
under consideration. This should fix the computational-complexity problem.
At this point, no passes depend on the AssumptionCache, and so I'll remove
that as a follow-up change.
Differential Revision: https://reviews.llvm.org/D27259
llvm-svn: 289755
Summary:
When SCEVRewriteVisitor traverses the SCEV DAG, it may visit the same SCEV
multiple times if this SCEV is referenced by multiple other SCEVs. This has
exponential time complexity in the worst case. Memoizing the results will
avoid re-visiting the same SCEV. Add a map to save the results, and override
the visit function of SCEVVisitor. Now SCEVRewriteVisitor only visit each
SCEV once and thus returns the same result for the same input SCEV.
This patch fixes PR18606, PR18607.
Reviewers: Sanjoy Das, Mehdi Amini, Michael Zolotukhin
Differential Revision: https://reviews.llvm.org/D25810
llvm-svn: 284868
When we have a loop with a known upper bound on the number of iterations, and
furthermore know that either the number of iterations will be either exactly
that upper bound or zero, then we can fully unroll up to that upper bound
keeping only the first loop test to check for the zero iteration case.
Most of the work here is in plumbing this 'max-or-zero' information from the
part of scalar evolution where it's detected through to loop unrolling. I've
also gone for the safe default of 'false' everywhere but howManyLessThans which
could probably be improved.
Differential Revision: https://reviews.llvm.org/D25682
llvm-svn: 284818
This is to avoid inlining too many multiplication operands into a SCEV, which could
take exponential time in the worst case.
Reviewers: Sanjoy Das, Mehdi Amini, Michael Zolotukhin
Differential Revision: https://reviews.llvm.org/D25794
llvm-svn: 284784
In loops that look something like
i = n;
do {
...
} while(i++ < n+k);
where k is a constant, the maximum backedge count is k (in fact the backedge
count will be either 0 or k, depending on whether n+k wraps). More generally
for LHS < RHS if RHS-(LHS of first comparison) is a constant then the loop will
iterate either 0 or that constant number of times.
This allows for more loop unrolling with the recent upper bound loop unrolling
changes, and I'm working on a patch that will let loop unrolling additionally
make use of the loop being executed either 0 or k times (we need to retain the
loop comparison only on the first unrolled iteration).
Differential Revision: https://reviews.llvm.org/D25607
llvm-svn: 284465
Enhance SCEV to compute the trip count for some loops with unknown stride.
Patch by Pankaj Chawla
Differential Revision: https://reviews.llvm.org/D22377
llvm-svn: 281732
value is a pointer.
This patch is to fix PR30213. When expanding an expr based on ValueOffsetPair,
if the value is of pointer type, we can only create a getelementptr instead
of sub expr.
Differential Revision: https://reviews.llvm.org/D24088
llvm-svn: 281439
when unroll runtime iteration loop.
In llvm::UnrollRuntimeLoopRemainder, if the loop to be unrolled is the inner
loop inside a loop nest, the scalar evolution needs to be dropped for its
parent loop which is done by ScalarEvolution::forgetLoop. However, we can
postpone forgetLoop to the end of UnrollRuntimeLoopRemainder so TripCountSC
expansion can still reuse existing value.
Differential Revision: https://reviews.llvm.org/D23572
llvm-svn: 279748
The patch is to fix the bug in PR28705. It was caused by setting wrong return
value for SCEVExpander::findExistingExpansion. The return values of findExistingExpansion
have different meanings when the function is used in different ways so it is easy to make
mistake. The fix creates two new interfaces to replace SCEVExpander::findExistingExpansion,
and specifies where each interface is expected to be used.
Differential Revision: https://reviews.llvm.org/D22942
llvm-svn: 278161
The fix for PR28705 will be committed consecutively.
In D12090, the ExprValueMap was added to reuse existing value during SCEV expansion.
However, const folding and sext/zext distribution can make the reuse still difficult.
A simplified case is: suppose we know S1 expands to V1 in ExprValueMap, and
S1 = S2 + C_a
S3 = S2 + C_b
where C_a and C_b are different SCEVConstants. Then we'd like to expand S3 as
V1 - C_a + C_b instead of expanding S2 literally. It is helpful when S2 is a
complex SCEV expr and S2 has no entry in ExprValueMap, which is usually caused
by the fact that S3 is generated from S1 after const folding.
In order to do that, we represent ExprValueMap as a mapping from SCEV to
ValueOffsetPair. We will save both S1->{V1, 0} and S2->{V1, C_a} into the
ExprValueMap when we create SCEV for V1. When S3 is expanded, it will first
expand S2 to V1 - C_a because of S2->{V1, C_a} in the map, then expand S3 to
V1 - C_a + C_b.
Differential Revision: https://reviews.llvm.org/D21313
llvm-svn: 278160
This change lets us prove things like
"{X,+,10} s< 5000" implies "{X+7,+,10} does not sign overflow"
It does this by replacing replacing getConstantDifference by
computeConstantDifference (which is smarter) in
isImpliedCondOperandsViaRanges.
llvm-svn: 276505
In D12090, the ExprValueMap was added to reuse existing value during SCEV expansion.
However, const folding and sext/zext distribution can make the reuse still difficult.
A simplified case is: suppose we know S1 expands to V1 in ExprValueMap, and
S1 = S2 + C_a
S3 = S2 + C_b
where C_a and C_b are different SCEVConstants. Then we'd like to expand S3 as
V1 - C_a + C_b instead of expanding S2 literally. It is helpful when S2 is a
complex SCEV expr and S2 has no entry in ExprValueMap, which is usually caused
by the fact that S3 is generated from S1 after const folding.
In order to do that, we represent ExprValueMap as a mapping from SCEV to
ValueOffsetPair. We will save both S1->{V1, 0} and S2->{V1, C_a} into the
ExprValueMap when we create SCEV for V1. When S3 is expanded, it will first
expand S2 to V1 - C_a because of S2->{V1, C_a} in the map, then expand S3 to
V1 - C_a + C_b.
Differential Revision: https://reviews.llvm.org/D21313
llvm-svn: 276136
When building SCEVs, if a function is known to return its argument, then we can
build the SCEV using the corresponding argument value.
Differential Revision: http://reviews.llvm.org/D9381
llvm-svn: 275037
The way we elide max expressions when computing trip counts is incorrect
-- it breaks cases like this:
```
static int wrapping_add(int a, int b) {
return (int)((unsigned)a + (unsigned)b);
}
void test() {
volatile int end_buf = 2147483548; // INT_MIN - 100
int end = end_buf;
unsigned counter = 0;
for (int start = wrapping_add(end, 200); start < end; start++)
counter++;
print(counter);
}
```
Note: the `NoWrap` variable that was being tested has little to do with
the values flowing into the max expression; it is a property of the
induction variable.
test/Transforms/LoopUnroll/nsw-tripcount.ll was added to solely test
functionality I'm reverting in this change, so I've deleted the test
fully.
llvm-svn: 273079
We can safely rely on a NoWrap add recurrence causing UB down the road
only if we know the loop does not have a exit expressed in a way that is
opaque to ScalarEvolution (e.g. by a function call that conditionally
calls exit(0)).
I believe with this change PR28012 is fixed.
Note: I had to change some llvm-lit tests in LoopReroll, since it looks
like they were depending on this incorrect behavior.
llvm-svn: 272237
Absence of may-unwind calls is not enough to guarantee that a
UB-generating use of an add-rec poison in the loop latch will actually
cause UB. We also need to guard against calls that terminate the thread
or infinite loop themselves.
This partially addresses PR28012.
llvm-svn: 272181
The worklist algorithm introduced in rL271151 didn't check to see if the
direct users of the post-inc add recurrence propagates poison. This
change fixes the problem and makes the code structure more obvious.
Note for release managers: correctness wise, this bug wasn't a
regression introduced by rL271151 -- the behavior of SCEV around
post-inc add recurrences was strictly improved (in terms of correctness)
in rL271151.
llvm-svn: 272179
Summary:
This change teaches SCEV to see reduce `(extractvalue
0 (op.with.overflow X Y))` into `op X Y` (with a no-wrap tag if
possible).
This was first checked in at r265912 but reverted in r265950 because it
exposed some issues around how SCEV handled post-inc add recurrences.
Those issues have now been fixed.
Reviewers: atrick, regehr
Subscribers: mcrosier, mzolotukhin, llvm-commits
Differential Revision: http://reviews.llvm.org/D18684
llvm-svn: 271152
Fixes PR27315.
The post-inc version of an add recurrence needs to "follow the same
rules" as a normal add or subtract expression. Otherwise we miscompile
programs like
```
int main() {
int a = 0;
unsigned a_u = 0;
volatile long last_value;
do {
a_u += 3;
last_value = (long) ((int) a_u);
if (will_add_overflow(a, 3)) {
// Leave, and don't actually do the increment, so no UB.
printf("last_value = %ld\n", last_value);
exit(0);
}
a += 3;
} while (a != 46);
return 0;
}
```
This patch changes SCEV to put no-wrap flags on post-inc add recurrences
only when the poison from a potential overflow will go ahead to cause
undefined behavior.
To avoid regressing performance too much, I've assumed infinite loops
without side effects is undefined behavior to prove poison<->UB
equivalence in more cases. This isn't ideal, but is not new to LLVM as
a whole, and far better than the situation I'm trying to fix.
llvm-svn: 271151
Summary:
**Description**
This makes `WidenIV::widenIVUse` (IndVarSimplify.cpp) fail to widen narrow IV uses in some cases. The latter affects IndVarSimplify which may not eliminate narrow IV's when there actually exists such a possibility, thereby producing ineffective code.
When `WidenIV::widenIVUse` gets a NarrowUse such as `{(-2 + %inc.lcssa),+,1}<nsw><%for.body3>`, it first tries to get a wide recurrence for it via the `getWideRecurrence` call.
`getWideRecurrence` returns recurrence like this: `{(sext i32 (-2 + %inc.lcssa) to i64),+,1}<nsw><%for.body3>`.
Then a wide use operation is generated by `cloneIVUser`. The generated wide use is evaluated to `{(-2 + (sext i32 %inc.lcssa to i64))<nsw>,+,1}<nsw><%for.body3>`, which is different from the `getWideRecurrence` result. `cloneIVUser` sees the difference and returns nullptr.
This patch also fixes the broken LLVM tests by adding missing <nsw> entries introduced by the correction.
**Minimal reproducer:**
```
int foo(int a, int b, int c);
int baz();
void bar()
{
int arr[20];
int i = 0;
for (i = 0; i < 4; ++i)
arr[i] = baz();
for (; i < 20; ++i)
arr[i] = foo(arr[i - 4], arr[i - 3], arr[i - 2]);
}
```
**Clang command line:**
```
clang++ -mllvm -debug -S -emit-llvm -O3 --target=aarch64-linux-elf test.cpp -o test.ir
```
**Expected result:**
The ` -mllvm -debug` log shows that all the IV's for the second `for` loop have been eliminated.
Reviewers: sanjoy
Subscribers: atrick, asl, aemerson, mzolotukhin, llvm-commits
Differential Revision: http://reviews.llvm.org/D20058
llvm-svn: 270695
... for AddRec's in loops for which SCEV is unable to compute a max
tripcount. This is the NUW variant of r269211 and fixes PR27691.
(Note: PR27691 is not a correct or stability bug, it was created to
track a pending task).
llvm-svn: 269790
SCEVExpander::replaceCongruentIVs assumes the backedge value of an
SCEV-analysable PHI to always be an instruction, when this is not
necessarily true. For now address this by bailing out of the
optimization if the backedge value of the PHI is a non-Instruction.
llvm-svn: 269213
`SCEVExpander::replaceCongruentIVs` bypasses `hoistIVInc` if both the
original and the isomorphic increments are PHI nodes. Doing this can
break SSA if the isomorphic increment is not dominated by the original
increment. Get rid of the bypass, and let `hoistIVInc` do the right
thing.
Fixes PR27232 (compile time crash/hang).
llvm-svn: 269212
... for AddRec's in loops for which SCEV is unable to compute a max
tripcount. This is not a problem for "normal" loops[0] that don't have
guards or assumes, but helps in cases where we have guards or assumes in
the loop that can be used to constrain incoming values over the backedge.
This partially fixes PR27691 (we still don't handle the NUW case).
[0]: for "normal" loops, in the cases where we'd be able to prove
no-wrap via isKnownPredicate, we'd also be able to compute a max
tripcount.
llvm-svn: 269211
We can use calls to @llvm.experimental.guard to prove predicates,
relying on the fact that in all locations domianted by a call to
@llvm.experimental.guard the predicate it is guarding is known to be
true.
llvm-svn: 268997
In the "LoopDispositions:" section:
- Instead of printing out a list, print out a "dictionary" to make it
obvious by inspection which disposition is for which loop. This is
just a cosmetic change.
- Print dispositions for parent _and_ sibling loops. I will use this
to write a test case.
llvm-svn: 268405
There are currently some bugs in tree around SCEV caching an incorrect
loop disposition. Printing out loop dispositions will let us write
whitebox tests as those are fixed.
The dispositions are printed as a list in "inside out" order,
i.e. innermost loop first.
llvm-svn: 268177
Summary:
(... while still not using a PostDomTree)
The way we use isKnownNotFullPoison from SCEV today, the new CFG walking
logic will not trigger for any realistic cases -- it will kick in only
for situations where we could have merged the contiguous basic blocks
anyway[0], since the poison generating instruction dominates all of its
non-PHI uses (which are the only uses we consider right now).
However, having this change in place will allow a later bugfix to break
fewer llvm-lit tests.
[0]: i.e. cases where block A branches to block B and B is A's only
successor and A is B's only predecessor.
Reviewers: broune, bjarke.roune
Subscribers: mcrosier, llvm-commits
Differential Revision: http://reviews.llvm.org/D19212
llvm-svn: 267175
Summary:
This change teaches SCEV to see reduce `(extractvalue
0 (op.with.overflow X Y))` into `op X Y` (with a no-wrap tag if
possible).
Reviewers: atrick, regehr
Subscribers: mcrosier, mzolotukhin, llvm-commits
Differential Revision: http://reviews.llvm.org/D18684
llvm-svn: 265912
This re-commits r265535 which was reverted in r265541 because it
broke the windows bots. The problem was that we had a PointerIntPair
which took a pointer to a struct allocated with new. The problem
was that new doesn't provide sufficient alignment guarantees.
This pattern was already present before r265535 and it just happened
to work. To fix this, we now separate the PointerToIntPair from the
ExitNotTakenInfo struct into a pointer and a bool.
Original commit message:
Summary:
When the backedge taken codition is computed from an icmp, SCEV can
deduce the backedge taken count only if one of the sides of the icmp
is an AddRecExpr. However, due to sign/zero extensions, we sometimes
end up with something that is not an AddRecExpr.
However, we can use SCEV predicates to produce a 'guarded' expression.
This change adds a method to SCEV to get this expression, and the
SCEV predicate associated with it.
In HowManyGreaterThans and HowManyLessThans we will now add a SCEV
predicate associated with the guarded backedge taken count when the
analyzed SCEV expression is not an AddRecExpr. Note that we only do
this as an alternative to returning a 'CouldNotCompute'.
We use new feature in Loop Access Analysis and LoopVectorize to analyze
and transform more loops.
Reviewers: anemet, mzolotukhin, hfinkel, sanjoy
Subscribers: flyingforyou, mcrosier, atrick, mssimpso, sanjoy, mzolotukhin, llvm-commits
Differential Revision: http://reviews.llvm.org/D17201
llvm-svn: 265786