This patch introduces guard based loop predication optimization. The new LoopPredication pass tries to convert loop variant range checks to loop invariant by widening checks across loop iterations. For example, it will convert
for (i = 0; i < n; i++) {
guard(i < len);
...
}
to
for (i = 0; i < n; i++) {
guard(n - 1 < len);
...
}
After this transformation the condition of the guard is loop invariant, so loop-unswitch can later unswitch the loop by this condition which basically predicates the loop by the widened condition:
if (n - 1 < len)
for (i = 0; i < n; i++) {
...
}
else
deoptimize
This patch relies on an NFC change to make ScalarEvolution::isMonotonicPredicate public (revision 293062).
Reviewed By: sanjoy
Differential Revision: https://reviews.llvm.org/D29034
llvm-svn: 293064
loops.
We do this by reconstructing the newly added loops after the unroll
completes to avoid threading pass manager details through all the mess
of the unrolling infrastructure.
I've enabled some extra assertions in the LPM to try and catch issues
here and enabled a bunch of unroller tests to try and make sure this is
sane.
Currently, I'm manually running loop-simplify when needed. That should
go away once it is folded into the LPM infrastructure.
Differential Revision: https://reviews.llvm.org/D28848
llvm-svn: 293011
Summary:
GVNHoist performs all the optimizations that MLSM does to loads, in a
more general way, and in a faster time bound (MLSM is N^3 in most
cases, N^4 in a few edge cases).
This disables the load portion.
Note that the way ld_hoist_st_sink.ll is written makes one think that
the loads should be moved to the while.preheader block, but
1. Neither MLSM nor GVNHoist do it (they both move them to identical places).
2. MLSM couldn't possibly do it anyway, as the while.preheader block
is not the head of the diamond, while.body is. (GVNHoist could do it
if it was legal).
3. At a glance, it's not legal anyway because the in-loop load
conflict with the in-loop store, so the loads must stay in-loop.
I am happy to update the test to use update_test_checks so that
checking is tighter, just was going to do it as a followup.
Note that i can find no particular benefit to the store portion on any
real testcase/benchmark i have (even size-wise). If we really still
want it, i am happy to commit to writing a targeted store sinker, just
taking the code from the MemorySSA port of MergedLoadStoreMotion
(which is N^2 worst case, and N most of the time).
We can do what it does in a much better time bound.
We also should be both hoisting and sinking stores, not just sinking
them, anyway, since whether we should hoist or sink to merge depends
basically on luck of the draw of where the blockers are placed.
Nonetheless, i have left it alone for now.
Reviewers: chandlerc, davide
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D29079
llvm-svn: 292971
a lazy-asserting PoisoningVH.
AssertVH is fundamentally incompatible with cache-invalidation of
analysis results. The invaliadtion happens after the AssertingVH has
already fired. Instead, use a PoisoningVH that will assert if the
dangling handle is ever used rather than merely be assigned or
destroyed.
This patch also removes all of the (numerous) doomed attempts to work
around this fundamental incompatibility. It is a pretty significant
simplification IMO.
The most interesting change is in the Inliner where we still do some
clearing because we don't want to rely on the coarse grained
invalidation strategy of the containing pass manager. However, I prefer
the approach that contains this logic to the cleanup phase of the
Inliner, and I think we could enhance the CGSCC analysis management
layer to make this even better in the future if desired.
The rest is straight cleanup.
I've also added a test for one of the harder cases to work around: when
a *module analysis* contains many AssertingVHes pointing at functions.
Differential Revision: https://reviews.llvm.org/D29006
llvm-svn: 292928
Summary:
The LibFunc::Func enum holds enumerators named for libc functions.
Unfortunately, there are real situations, including libc implementations, where
function names are actually macros (musl uses "#define fopen64 fopen", for
example; any other transitively visible macro would have similar effects).
Strictly speaking, a conforming C++ Standard Library should provide any such
macros as functions instead (via <cstdio>). However, there are some "library"
functions which are not part of the standard, and thus not subject to this
rule (fopen64, for example). So, in order to be both portable and consistent,
the enum should not use the bare function names.
The old enum naming used a namespace LibFunc and an enum Func, with bare
enumerators. This patch changes LibFunc to be an enum with enumerators prefixed
with "LibFFunc_". (Unfortunately, a scoped enum is not sufficient to override
macros.)
There are additional changes required in clang.
Reviewers: rsmith
Subscribers: mehdi_amini, mzolotukhin, nemanjai, llvm-commits
Differential Revision: https://reviews.llvm.org/D28476
llvm-svn: 292848
invalidation of deleted functions in GlobalDCE.
This was always testing a bug really triggered in GlobalDCE. Right now
we have analyses with asserting value handles into IR. As long as those
remain, when *deleting* an IR unit, we cannot wait for the normal
invalidation scheme to kick in even though it was designed to work
correctly in the face of these kinds of deletions. Instead, the pass
needs to directly handle invalidating the analysis results pointing at
that IR unit.
I've tought the Inliner about this and this patch teaches GlobalDCE.
This will handle the asserting VH case in the existing test as well as
other issues of the same fundamental variety. I've moved the test into
the GlobalDCE directory and added a comment explaining what is going on.
Note that we cannot simply require LVI here because LVI is too lazy.
llvm-svn: 292773
become unavailable.
The AssumptionCache is now immutable but it still needs to respond to
DomTree invalidation if it ended up caching one.
This lets us remove one of the explicit invalidates of LVI but the
other one continues to avoid hitting a latent bug.
llvm-svn: 292769
Don't call `isTriviallyDeadInstructions()` once we discover that
an instruction is dead. Instead, set DFS number zero (as suggested
by Danny) and forget about it (this also speeds up things as we
won't try to reprocess that block).
Differential Revision: https://reviews.llvm.org/D28930
llvm-svn: 292676
Summary:
This rewrites store expression/leader handling. We no longer use the
value operand as the leader, instead, we store it separately. We also
now store the stored value as part of the expression, and compare it
when comparing stores for equality. This enables us to get rid of a
bunch of our previous hacks and machinations, as the existing
machinery takes care of everything *except* updating the stored value
on classes. The only time we have to update it is if the storecount
goes to 0, and when we do, we destroy it.
Since we no longer use the value operand as the leader, during elimination, we have to use the value operand. Doing this also fixes a bunch of store forwarding cases we were missing.
Any value operand we use is guaranteed to either be updated by previous eliminations, or minimized by future ones.
(IE the fact that we don't use the most dominating value operand when it's not a constant does not affect anything).
Sadly, this change also exposes that we didn't pay attention to the
output of the pr31594.ll test, as it also very clearly exposes the
same store leader bug we are fixing here.
(I added pr31682.ll anyway, but maybe we think that's too large to be useful)
On the plus side, propagate-ir-flags.ll now passes due to the
corrected store forwarding.
This change was 3 stage'd on darwin and linux, with the full test-suite.
Reviewers:
davide
Subscribers:
llvm-commits
llvm-svn: 292648
Like several other loop passes (the vectorizer, etc) this pass doesn't
really fit the model of a loop pass. The critical distinction is that it
isn't intended to be pipelined together with other loop passes. I plan
to add some documentation to the loop pass manager to make this more
clear on that side.
LoopSink is also different because it doesn't really need a lot of the
infrastructure of our loop passes. For example, if there aren't loop
invariant instructions causing a preheader to exist, there is no need to
form a preheader. It also doesn't need LCSSA because this pass is
only involved in sinking invariant instructions from a preheader into
the loop, not reasoning about live-outs.
This allows some nice simplifications to the pass in the new PM where we
can directly walk the loops once without restructuring them.
Differential Revision: https://reviews.llvm.org/D28921
llvm-svn: 292589
Part of the assert has been left active for further debugging.
The other part has been turned into a stat for tracking for the
moment.
llvm-svn: 292583
This can prove that:
extern int f;
int g() {
int x = 0;
for (int i = 0; i < 365; ++i) {
x /= f;
}
return x;
}
always returns zero. Thanks to Sanjoy for confirming this
transformation actually made sense (bugs are mine).
llvm-svn: 292531
Summary:
In case of non-alloca pointers, we check for whether it is a pointer
from malloc-like calls and it is not captured. In such case, we can
promote the pointer, as the caller will have no way to access this pointer
even if there is unwinding in middle of the loop.
Reviewers: hfinkel, sanjoy, reames, eli.friedman
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D28834
llvm-svn: 292510
Summary: Partial unrolling should have separate threshold with full unrolling.
Reviewers: efriedma, mzolotukhin
Reviewed By: efriedma, mzolotukhin
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D28831
llvm-svn: 292293
unique exit block if available rather than rolling it ourselves.
This is a little disappointing because that helper doesn't do anything
clever to short-circuit the (surprisingly expensive) computation of all
exit blocks. What's worse is that the way we compute this is hopelessly,
hilariously inefficient. We're literally computing the same information
two different ways and multiple times each way:
- hasDedicatedExits computes the exit block set and then looks at the
predecessors of each
- getExitingBlocks computes the set of loop blocks which have exiting
successors
- getUniqueExitBlock(s) computes the set of non-loop blocks reached from
loop blocks (sound familiar?)
Anyways, at some point we should clean all of this up in the LoopInfo
API, but for now just simplifying the user I'm about to touch.
llvm-svn: 292282
I hope that for any code, it is changed only with good reason and only
when the author knows what they are doing...
There is of course good reason to comment here about the subtlety of the
process, and I've left that comment in tact.
llvm-svn: 292275
instead of members.
No state was being provided by the object so this seems strictly
simpler.
I've also tried to improve the name and comments for the functions to
more thoroughly document what they are doing.
llvm-svn: 292274
that we know has exactly one element when all we are going to do is get
that one element out of it.
Instead, pass around that one element.
There are more simplifications to come in this code...
llvm-svn: 292273
a function's CFG when that CFG is unchanged.
This allows transformation passes to simply claim they preserve the CFG
and analysis passes to check for the CFG being preserved to remove the
fanout of all analyses being listed in all passes.
I've gone through and removed or cleaned up as many of the comments
reminding us to do this as I could.
Differential Revision: https://reviews.llvm.org/D28627
llvm-svn: 292054
Summary:
This is a testcase where phi node cycling happens, and because we do
not order the leaders by domination or anything similar, the leader
keeps changing.
Using std::set for the members is too expensive, and we actually don't
need them sorted all the time, only at leader changes.
We could keep both a set and a vector, and keep them mostly sorted and
resort as necessary, or use a set and a fibheap, but all of this seems
premature.
After running some statistics, we are able to avoid the vast majority
of sorting by keeping a "next leader" field. Most congruence classes only have
leader changes once or twice during GVN.
Reviewers: davide
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D28594
llvm-svn: 291968
It was always zero. When we move a store from `initial` to its
own congruency class, we end up with a negative store count, which
is obviously wrong.
Also, while here, change StoreCount to be signed so that the assertions
actually fire.
Ack'ed by Daniel Berlin.
llvm-svn: 291725
classes, and updating checking to allow for equivalence through
reachability.
(Sadly, the checking here is not perfect, and can't be made perfect,
so we'll have to disable it after we are satisfied with correctness.
Right now it is just "very unlikely" to happen.)
llvm-svn: 291698
the latter to the Transforms library.
While the loop PM uses an analysis to form the IR units, the current
plan is to have the PM itself establish and enforce both loop simplified
form and LCSSA. This would be a layering violation in the analysis
library.
Fundamentally, the idea behind the loop PM is to *transform* loops in
addition to running passes over them, so it really seemed like the most
natural place to sink this was into the transforms library.
We can't just move *everything* because we also have loop analyses that
rely on a subset of the invariants. So this patch splits the the loop
infrastructure into the analysis management that has to be part of the
analysis library, and the transform-aware pass manager.
This also required splitting the loop analyses' printer passes out to
the transforms library, which makes sense to me as running these will
transform the code into LCSSA in theory.
I haven't split the unittest though because testing one component
without the other seems nearly intractable.
Differential Revision: https://reviews.llvm.org/D28452
llvm-svn: 291662
arguments much like the CGSCC pass manager.
This is a major redesign following the pattern establish for the CGSCC layer to
support updates to the set of loops during the traversal of the loop nest and
to support invalidation of analyses.
An additional significant burden in the loop PM is that so many passes require
access to a large number of function analyses. Manually ensuring these are
cached, available, and preserved has been a long-standing burden in LLVM even
with the help of the automatic scheduling in the old pass manager. And it made
the new pass manager extremely unweildy. With this design, we can package the
common analyses up while in a function pass and make them immediately available
to all the loop passes. While in some cases this is unnecessary, I think the
simplicity afforded is worth it.
This does not (yet) address loop simplified form or LCSSA form, but those are
the next things on my radar and I have a clear plan for them.
While the patch is very large, most of it is either mechanically updating loop
passes to the new API or the new testing for the loop PM. The code for it is
reasonably compact.
I have not yet updated all of the loop passes to correctly leverage the update
mechanisms demonstrated in the unittests. I'll do that in follow-up patches
along with improved FileCheck tests for those passes that ensure things work in
more realistic scenarios. In many cases, there isn't much we can do with these
until the loop simplified form and LCSSA form are in place.
Differential Revision: https://reviews.llvm.org/D28292
llvm-svn: 291651
These are interesting again because the user may not be aware that this
is a common reason preventing LICM.
A const is removed from an instruction pointer declaration in order to
pass it to ORE.
Differential Revision: https://reviews.llvm.org/D27940
llvm-svn: 291649