When folding arguments of AddExpr or MulExpr with recurrences, we rely on the fact that
the loop of our base recurrency is the bottom-lost in terms of domination. This assumption
may be broken by an expression which is treated as invariant, and which depends on a complex
Phi for which SCEVUnknown was created. If such Phi is a loop Phi, and this loop is lower than
the chosen AddRecExpr's loop, it is invalid to fold our expression with the recurrence.
Another reason why it might be invalid to fold SCEVUnknown into Phi start value is that unlike
other SCEVs, SCEVUnknown are sometimes position-bound. For example, here:
for (...) { // loop
phi = {A,+,B}
}
X = load ...
Folding phi + X into {A+X,+,B}<loop> actually makes no sense, because X does not exist and cannot
exist while we are iterating in loop (this memory can be even not allocated and not filled by this moment).
It is only valid to make such folding if X is defined before the loop. In this case the recurrence {A+X,+,B}<loop>
may be existant.
This patch prohibits folding of SCEVUnknown (and those who use them) into the start value of an AddRecExpr,
if this instruction is dominated by the loop. Merging the dominating unknown values is still valid. Some tests that
relied on the fact that some SCEVUnknown should be folded into AddRec's are changed so that they no longer
expect such behavior.
llvm-svn: 303730
This is a re-application of a r303497 that was reverted in r303498.
I thought it had broken a bot when it had not (the breakage did not
go away with the revert).
This change makes the split between the "exact" backedge taken count
and the "maximum" backedge taken count a bit more obvious. Both of
these are upper bounds on the number of times the loop header
executes (since SCEV does not account for most kinds of abnormal
control flow), but the latter is guaranteed to be a constant.
There were a few places where the max backedge taken count *was* a
non-constant; I've changed those to compute constants instead.
At this point, I'm not sure if the constant max backedge count can be
computed by calling `getUnsignedRange(Exact).getUnsignedMax()` without
losing precision. If it can, we can simplify even further by making
`getMaxBackedgeTakenCount` a thin wrapper around
`getBackedgeTakenCount` and `getUnsignedRange`.
llvm-svn: 303531
This change makes the split between the "exact" backedge taken count
and the "maximum" backedge taken count a bit more obvious. Both of
these are upper bounds on the number of times the loop header
executes (since SCEV does not account for most kinds of abnormal
control flow), but the latter is guaranteed to be a constant.
There were a few places where the max backedge taken count *was* a
non-constant; I've changed those to compute constants instead.
At this point, I'm not sure if the constant max backedge count can be
computed by calling `getUnsignedRange(Exact).getUnsignedMax()` without
losing precision. If it can, we can simplify even further by making
`getMaxBackedgeTakenCount` a thin wrapper around
`getBackedgeTakenCount` and `getUnsignedRange`.
llvm-svn: 303497
The probability of edge coming to unreachable block should be as low as possible.
The change reduces the probability to minimal value greater than zero.
The bug https://bugs.llvm.org/show_bug.cgi?id=32214 show the example when
the probability of edge coming to unreachable block is greater than for edge
coming to out of the loop and it causes incorrect loop rotation.
Please note that with this change the behavior of unreachable heuristic is a bit different
than others. Specifically, before this change the sum of probabilities
coming to unreachable blocks have the same weight for all branches
(it was just split over all edges of this block coming to unreachable blocks).
With this change it might be slightly different but not to much due to probability of
taken branch to unreachable block is really small.
Reviewers: chandlerc, sanjoy, vsk, congh, junbuml, davidxl, dexonsmith
Reviewed By: chandlerc, dexonsmith
Subscribers: reames, llvm-commits
Differential Revision: https://reviews.llvm.org/D30633
llvm-svn: 303327
The existing sorting order in defined CompareSCEVComplexity sorts AddRecExprs
by loop depth, but does not pay attention to dominance of loops. This can
lead us to the following buggy situation:
for (...) { // loop1
op1 = {A,+,B}
}
for (...) { // loop2
op2 = {A,+,B}
S = add op1, op2
}
In this case there is no guarantee that in operand list of S the op2 comes
before op1 (loop depth is the same, so they will be sorted just
lexicographically), so we can incorrectly treat S as a recurrence of loop1,
which is wrong.
This patch changes the sorting logic so that it places the dominated recs
before the dominating recs. This ensures that when we pick the first recurrency
in the operands order, it will be the bottom-most in terms of domination tree.
The attached test set includes some tests that produce incorrect SCEV
estimations and crashes with oldlogic.
Reviewers: sanjoy, reames, apilipenko, anna
Reviewed By: sanjoy
Subscribers: llvm-commits, mzolotukhin
Differential Revision: https://reviews.llvm.org/D33121
llvm-svn: 303148
Update a few tests to use llvm.masked.load/store instead of arm neon
vector loads and stores, and move the tests that are actually specific
to those arm intrinsics to their own files. This lets us mark the
tests that use target specific intrinsics as requiring those targets.
llvm-svn: 302972
This is a follow up patch for https://reviews.llvm.org/rL300440
to address a comment.
To make implementation to be consistent with other cases we just
ignore the remainder after distribution of remaining probability between
reachable edges.
If we reduced the probability of some edges coming to unreachable
blocks we should distribute the remaining part across other edges
coming to reachable blocks to satisfy the condition that sum of all
probabilities should be equal to one. If this remaining part is not
divided by number of "reachable" edges then we get this remainder.
This remainder probability should be pretty small. Other cases just ignore
if the sum of probabilities is not equal to one so we do the same.
Reviewers: chandlerc, sanjoy, vsk, junbuml, reames
Reviewed By: reames
Subscribers: reames, llvm-commits
Differential Revision: https://reviews.llvm.org/D32124
llvm-svn: 302883
The AArch64 instruction set has a few "widening" instructions (e.g., uaddl,
saddl, uaddw, etc.) that take one or more doubleword operands and produce
quadword results. The operands are automatically sign- or zero-extended as
appropriate. However, in LLVM IR, these extends are explicit. This patch
updates TTI to consider these widening instructions as single operations whose
cost is attached to the arithmetic instruction. It marks extends that are part
of a widening operation "free" and applies a sub-target specified overhead
(zero by default) to the arithmetic instructions.
Differential Revision: https://reviews.llvm.org/D32706
llvm-svn: 302582
Account for subvector extraction/insertion, helps prevent the vectorizers from selecting 256-bit vectors that will have to be split anyhow on AVX1 targets.
llvm-svn: 302378
Summary:
The existing implementation creates a symbolic SCEV expression every
time we analyze a phi node and then has to remove it, when the analysis
is finished. This is very expensive, and in most of the cases it's also
unnecessary. According to the data I collected, ~60-70% of analyzed phi
nodes (measured on SPEC) have the following form:
PN = phi(Start, OP(Self, Constant))
Handling such cases separately significantly speeds this up.
Reviewers: sanjoy, pete
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D32663
llvm-svn: 302096
Fixes PR31789 - When loop-vectorize tries to use these intrinsics for a
non-default address space pointer we fail with a "Calling a function with a
bad singature!" assertion. This patch solves this by adding the 'vector of
pointers' argument as an overloaded type which will determine the address
space.
Differential revision: https://reviews.llvm.org/D31490
llvm-svn: 302018
In cases where an instruction (a call site, say) is RAUW'ed with some
other value (this is possible via the `returned` attribute, for
instance), we want the slot in UnknownInsts to point to the original
Instruction we wanted to track, not the value it got replaced by.
Fixes PR32587.
This relands r301426.
llvm-svn: 301814
Summary:
programUndefinedIfPoison makes more sense, given what the function
does; and I'm about to add a function with a name similar to
isKnownNotFullPoison (so do the rename to avoid confusion).
Reviewers: broune, majnemer, bjarke.roune
Reviewed By: broune
Subscribers: mcrosier, llvm-commits, mzolotukhin
Differential Revision: https://reviews.llvm.org/D30444
llvm-svn: 301776
Commits were:
"Use WeakVH instead of WeakTrackingVH in AliasSetTracker's UnkownInsts"
"Add a new WeakVH value handle; NFC"
"Rename WeakVH to WeakTrackingVH; NFC"
The changes assumed pointers are 8 byte aligned on all architectures.
llvm-svn: 301429
Summary:
In cases where an instruction (a call site, say) is RAUW'ed with some
other value (this is possible via the `returned` attribute, amongst
other things), we want the slot in UnknownInsts to point to the
original Instruction we wanted to track, not the value it got replaced
by.
Fixes PR32587.
Reviewers: davide
Subscribers: mcrosier, llvm-commits
Differential Revision: https://reviews.llvm.org/D32268
llvm-svn: 301426
Summary:
In a previous change I changed SCEV's normalization / denormalization
to work with non-affine add recs. So the bailout in IVUsers can be
removed.
Reviewers: atrick, efriedma
Reviewed By: atrick
Subscribers: davide, mcrosier, llvm-commits
Differential Revision: https://reviews.llvm.org/D32105
llvm-svn: 301298
There have been multiple reports of this causing problems: a
compile-time explosion on the LLVM testsuite, and a stack
overflow for an opencl kernel.
llvm-svn: 300928
Use haveNoCommonBitsSet to figure out whether an "or" instruction
is equivalent to addition. This handles more cases than just
checking for a constant on the RHS.
Differential Revision: https://reviews.llvm.org/D32239
llvm-svn: 300746
Metadata potentially is more precise than any heuristics we use, so
it makes sense to use first metadata info if it is available. However it makes
sense to examine it against other strong heuristics like unreachable one.
If edge coming to unreachable block has higher probability then it is expected
by unreachable heuristic then we use heuristic and remaining probability is
distributed among other reachable blocks equally.
An example where metadata might be more strong then unreachable heuristic is
as follows: it is possible that there are two branches and for the branch A
metadata says that its probability is (0, 2^25). For the branch B
the probability is (1, 2^25).
So the expectation is that first edge of B is hotter than first edge of A
because first edge of A did not executed at least once.
If first edge of A points to the unreachable block then using the unreachable
heuristics we'll set the probability for A to (1, 2^20) and now edge of A
becomes hotter than edge of B.
This is unexpected behavior.
This fixed the biggest part of https://bugs.llvm.org/show_bug.cgi?id=32214
Reviewers: sanjoy, junbuml, vsk, chandlerc
Reviewed By: chandlerc
Subscribers: llvm-commits, reames, davidxl
Differential Revision: https://reviews.llvm.org/D30631
llvm-svn: 300440
Summary:
* Add a bitreverse case in the demanded bits analysis pass.
* Add tests for the bitreverse (and bswap) intrinsic in the
demanded bits pass.
* Add a test case to the BDCE tests: that manipulations to
high-order bits are eliminated once the bits are reversed
and then right-shifted.
Reviewers: mkuper, jmolloy, hfinkel, trentxintong
Reviewed By: jmolloy
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D31857
llvm-svn: 300215
Summary:
Readnone attribute would cause CSE of two barriers with
the same argument, which is invalid by example:
struct Base {
virtual int foo() { return 42; }
};
struct Derived1 : Base {
int foo() override { return 50; }
};
struct Derived2 : Base {
int foo() override { return 100; }
};
void foo() {
Base *x = new Base{};
new (x) Derived1{};
int a = std::launder(x)->foo();
new (x) Derived2{};
int b = std::launder(x)->foo();
}
Here 2 calls of std::launder will produce @llvm.invariant.group.barrier,
which would be merged into one call, causing devirtualization
to devirtualize second call into Derived1::foo() instead of
Derived2::foo()
Reviewers: chandlerc, dberlin, hfinkel
Subscribers: llvm-commits, rsmith, amharc
Differential Revision: https://reviews.llvm.org/D31531
llvm-svn: 300101
getArithmeticInstrCost(), getShuffleCost(), getCastInstrCost(),
getCmpSelInstrCost(), getVectorInstrCost(), getMemoryOpCost(),
getInterleavedMemoryOpCost() implemented.
Interleaved access vectorization enabled.
BasicTTIImpl::getCastInstrCost() improved to check for legal extending loads,
in which case the cost of the z/sext instruction becomes 0.
Review: Ulrich Weigand, Renato Golin.
https://reviews.llvm.org/D29631
llvm-svn: 300052
Collection of PostDominatedByUnreachable and PostDominatedByColdCall have been
split out of heuristics itself. Update of the data happens now for each basic
block (before update for PostDominatedByColdCall might be skipped if
unreachable or matadata heuristic handled this basic block).
This separation allows re-ordering of heuristics without loosing
the post-domination information.
Reviewers: sanjoy, junbuml, vsk, chandlerc, reames
Reviewed By: chandlerc
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D31701
llvm-svn: 300029
Analysis, it has Analysis passes, and once NewGVN is made an Analysis,
this removes the cross dependency from Analysis to Transform/Utils.
NFC.
llvm-svn: 299980
The patch rL298481 was reverted due to crash on clang-with-lto-ubuntu build.
The reason of the crash was type mismatch between either a or b and RHS in the following situation:
LHS = sext(a +nsw b) > RHS.
This is quite rare, but still possible situation. Normally we need to cast all {a, b, RHS} to their widest type.
But we try to avoid creation of new SCEV that are not constants to avoid initiating recursive analysis that
can take a lot of time and/or cache a bad value for iterations number. To deal with this, in this patch we
reject this case and will not try to analyze it if the type of sum doesn't match with the type of RHS. In this
situation we don't need to create any non-constant SCEVs.
This patch also adds an assertion to the method IsProvedViaContext so that we could fail on it and not
go further into range analysis etc (because in some situations these analyzes succeed even when the passed
arguments have wrong types, what should not normally happen).
The patch also contains a fix for a problem with too narrow scope of the analysis caused by wrong
usage of predicates in recursive invocations.
The regression test on the said failure: test/Analysis/ScalarEvolution/implied-via-addition.ll
Reviewers: reames, apilipenko, anna, sanjoy
Reviewed By: sanjoy
Subscribers: mzolotukhin, mehdi_amini, llvm-commits
Differential Revision: https://reviews.llvm.org/D31238
llvm-svn: 299205