The optimisation remarks for loop unrolling with an unrolled remainder looks something like:
test.c:7:18: remark: completely unrolled loop with 3 iterations [-Rpass=loop-unroll]
C[i] += A[i*N+j];
^
test.c:6:9: remark: unrolled loop by a factor of 4 with run-time trip count [-Rpass=loop-unroll]
for(int j = 0; j < N; j++)
^
This removes the first of the two messages.
Differential revision: https://reviews.llvm.org/D38725
llvm-svn: 316986
This avoid code duplication and allow us to add the disable unroll metadata elsewhere.
Differential Revision: https://reviews.llvm.org/D38928
llvm-svn: 315850
Add a profitability heuristic to enable runtime unrolling of multi-exit
loop: There can be atmost two unique exit blocks for the loop and the
second exit block should be a deoptimizing block. Also, there can be one
other exiting block other than the latch exiting block. The reason for
the latter is so that we limit the number of branches in the unrolled
code to being at most the unroll factor. Deoptimizing blocks are rarely
taken so these additional number of branches created due to the
unrolling are predictable, since one of their target is the deopt block.
Reviewers: apilipenko, reames, evstupac, mkuper
Subscribers: llvm-commits
Reviewed by: reames
Differential Revision: https://reviews.llvm.org/D35380
llvm-svn: 313363
During runtime unrolling on loops with multiple exits, we update the
exit blocks with the correct phi values from both original and remainder
loop.
In this process, we lookup the VMap for the mapped incoming phi values,
but did not update the VMap if a default entry was generated in the VMap
during the lookup. This default value is generated when constants or
values outside the current loop are looked up.
This patch fixes the assertion failure when null entries are present in
the VMap because of this lookup. Added a testcase that showcases the
problem.
llvm-svn: 313358
Debug information can be, and was, corrupted when the runtime
remainder loop was fully unrolled. This is because a !null node can
be created instead of a unique one describing the loop. In this case,
the original node gets incorrectly updated with the NewLoopID
metadata.
In the case when the remainder loop is going to be quickly fully
unrolled, there isn't the need to add loop metadata for it anyway.
Differential Revision: https://reviews.llvm.org/D37338
llvm-svn: 312471
On some targets, the penalty of executing runtime unrolling checks
and then not the unrolled loop can be significantly detrimental to
performance. This results in the need to be more conservative with
the unroll count, keeping a trip count of 2 reduces the overhead as
well as increasing the chance of the unrolled body being executed. But
being conservative leaves performance gains on the table.
This patch enables the unrolling of the remainder loop introduced by
runtime unrolling. This can help reduce the overhead of misunrolled
loops because the cost of non-taken branches is much less than the
cost of the backedge that would normally be executed in the remainder
loop. This allows larger unroll factors to be used without suffering
performance loses with smaller iteration counts.
Differential Revision: https://reviews.llvm.org/D36309
llvm-svn: 310824
Separated out the profitability from the safety analysis for multiexit
loop unrolling. Currently, this is an NFC because profitability is true
only if the unroll-runtime-multi-exit is set to true (off-by-default).
This is to ease adding the profitability heuristic up for review at
D35380.
llvm-svn: 308753
Summary:
When we runtime unroll with multiple exit blocks, we also need to update the
immediate dominators of the immediate successors of the exit blocks.
Reviewers: reames, mkuper, mzolotukhin, apilipenko
Reviewed by: mzolotukhin
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D35304
llvm-svn: 307909
Refactored the code and separated out a function
`canSafelyUnrollMultiExitLoop` to reduce redundant checks and make it
easier to add profitability heuristics later.
Added tests to runtime unrolling to make sure that unrolling for
multi-exit loops is not done unless the option
-unroll-runtime-multi-exit is true.
llvm-svn: 307843
The loop structure for the outer loop does not contain the epilog
preheader when we try to unroll inner loop with multiple exits and
epilog code is generated. For now, we just bail out in such cases.
Added a test case that shows the problem. Without this bailout, we would
trip on assert saying LCSSA form is incorrect for outer loop.
llvm-svn: 307676
When unrolling under multiple exits which is under off-by-default option,
the assert that checks for VMap entry in loop exit values is too strong.
(assert if VMap entry did not exist, the value should be a
constant). However, values derived from
constants or from values outside loop, does not have a VMap entry too.
Removed the assert and added a testcase showcasing the property for
non-constant values.
llvm-svn: 307542
With the NFC refactoring in rL307417 (git SHA 987dd01), all the logic
is in place to support multiple exit/exiting blocks when prolog
remainder is generated.
This patch removed the assert that multiple exit blocks unrolling is only
supported when epilog remainder is generated.
Also, added test runs and checks with PROLOG prefix in
runtime-loop-multiple-exits.ll test cases.
llvm-svn: 307435
Minor refactoring to use the preexisting loop exit that's already
calculated. We do not need to recompute the loop exit in ConnectProlog.
Apart from avoiding redundant computation, this is required for
supporting multiple loop exits when Prolog remainder loops are generated.
llvm-svn: 307417
Currently, we do not support multiple exiting blocks to the
latch exit block. However, this bailout wasn't triggered when we had a
unique exit block (which is the latch exit), with multiple exiting
blocks to that unique exit.
Moved the bailout so that it's triggered in both cases and added
testcase.
llvm-svn: 307291
Summary:
Runtime unrolling is done for loops with a single exit block and a
single exiting block (and this exiting block should be the latch block).
This patch adds logic to support unrolling in the presence of multiple exit
blocks (which also means multiple exiting blocks).
Currently this is under an off-by-default option and is supported when
epilog code is generated. Support in presence of prolog code will be in
a future patch (we just need to add more tests, and update comments).
This patch is essentially an implementation patch. I have not added any
heuristic (in terms of branches added or code size) to decide when
this should be enabled.
Reviewers: mkuper, sanjoy, reames, evstupac
Reviewed by: reames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D33001
llvm-svn: 306846
Instead of getBackEdgeTakenCount, use getExitCount on the latch exiting block
(which is proven to be the only exiting block in the loop to be unrolled).
llvm-svn: 306410
I did this a long time ago with a janky python script, but now
clang-format has built-in support for this. I fed clang-format every
line with a #include and let it re-sort things according to the precise
LLVM rules for include ordering baked into clang-format these days.
I've reverted a number of files where the results of sorting includes
isn't healthy. Either places where we have legacy code relying on
particular include ordering (where possible, I'll fix these separately)
or where we have particular formatting around #include lines that
I didn't want to disturb in this patch.
This patch is *entirely* mechanical. If you get merge conflicts or
anything, just ignore the changes in this patch and run clang-format
over your #include lines in the files.
Sorry for any noise here, but it is important to keep these things
stable. I was seeing an increasing number of patches with irrelevant
re-ordering of #include lines because clang-format was used. This patch
at least isolates that churn, makes it easy to skip when resolving
conflicts, and gets us to a clean baseline (again).
llvm-svn: 304787
Summary:
Cloning basic blocks in the loop for runtime loop unroller depends on loop being
in rotated form (i.e. loop latch target is the exit block).
Assert that this is true, so that callers of runtime loop unroller pass in
canonical loops.
The single caller of this function has that check recently added:
https://reviews.llvm.org/rL301239
Reviewers: davide
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D32801
llvm-svn: 302058
Summary:
rL293124 added the necessary infrastructure to properly add the cloned
top level loop to LoopInfo, which means we do not have to do it manually
in CloneLoopBlocks.
@mkuper sorry for not pointing this out during my review of D29156, I just
realized that today.
Reviewers: mzolotukhin, chandlerc, mkuper
Reviewed By: mkuper
Subscribers: llvm-commits, mkuper
Differential Revision: https://reviews.llvm.org/D29173
llvm-svn: 293615
Even when we don't create a remainder loop (that is, when we unroll by 2), we
may duplicate nested loops into the remainder. This is complicated by the fact
the remainder may itself be either inserted into an outer loop, or at the top
level. In the latter case, we may need to create new top-level loops.
Differential Revision: https://reviews.llvm.org/D29156
llvm-svn: 293124
Mostly straightforward changes; we just didn't do the computation before.
One sort of interesting change in LoopUnroll.cpp: we weren't handling
dominance for children of the loop latch correctly, but
foldBlockIntoPredecessor hid the problem for complete unrolling.
Currently punting on loop peeling; made some minor changes to isolate
that problem to LoopUnrollPeel.cpp.
Adds a flag -unroll-verify-domtree; it verifies the domtree immediately
after we finish updating it. This is on by default for +Asserts builds.
Differential Revision: https://reviews.llvm.org/D28073
llvm-svn: 292447
Summary:
This fixes Transforms/LoopUnroll/runtime-loop3.ll which failed with
EXTENSIVE_DEBUG, because the cloned basic blocks were not added to the
correct sub-loops in LoopUnrollRuntime.cpp.
Reviewers: dexonsmith, mzolotukhin
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D28482
llvm-svn: 291619
Summary:
When cloning blocks for prologue/epilogue we need to replicate the loop
structure from the original loop. It wasn't a problem for the innermost
loops, but it led to an incorrect loop info when we unrolled a loop with
a child loop - in this case created prologue-loop had a child loop, but
loop info didn't reflect that.
This fixes PR28888.
Reviewers: chandlerc, sanjoy, hfinkel
Subscribers: llvm-commits, silvas
Differential Revision: https://reviews.llvm.org/D24203
llvm-svn: 280901
when unroll runtime iteration loop.
In llvm::UnrollRuntimeLoopRemainder, if the loop to be unrolled is the inner
loop inside a loop nest, the scalar evolution needs to be dropped for its
parent loop which is done by ScalarEvolution::forgetLoop. However, we can
postpone forgetLoop to the end of UnrollRuntimeLoopRemainder so TripCountSC
expansion can still reuse existing value.
Differential Revision: https://reviews.llvm.org/D23572
llvm-svn: 279748
Summary:
It is incorrect to compare TripCount (which is BECount + 1)
with extraiters (or Count) to check if we should enter unrolled
loop or not, because TripCount can potentially overflow
(when BECount is max unsigned integer).
While comparing BECount with (Count - 1) is overflow safe and
therefore correct.
Reviewer: hfinkel
Differential Revision: http://reviews.llvm.org/D19256
From: Evgeny Stupachenko <evstupac@gmail.com>
llvm-svn: 267662
Apparently there isn't test coverage for all of these. I'd appreciate
if someone with could reproduce and send me something to reduce, but for
now I've just looked for users of RemapInstruction and MapValue and
ensured they don't accidentally insert nullptr. Here is one of the
bootstraps that caught:
http://lab.llvm.org:8011/builders/clang-x64-ninja-win7/builds/11494
llvm-svn: 266567
Clarify what this RemapFlag actually means.
- Change the flag name to match its intended behaviour.
- Clearly document that it's not supposed to affect globals.
- Add a host of FIXMEs to indicate how to fix the behaviour to match
the intent of the flag.
RF_IgnoreMissingLocals should only affect the behaviour of
RemapInstruction for function-local operands; namely, for operands of
type Argument, Instruction, and BasicBlock. Currently, it is *only*
passed into RemapInstruction calls (and the transitive MapValue calls
that it makes).
When I split Metadata from Value I didn't understand the flag, and I
used it in a bunch of places for "global" metadata.
This commit doesn't have any functionality change, but prepares to
cleanup MapMetadata and MapValue.
llvm-svn: 265628
Summary:
Extending findExistingExpansion can use existing value in ExprValueMap.
This patch gives 0.3~0.5% performance improvements on
benchmarks(test-suite, spec2000, spec2006, commercial benchmark)
Reviewers: mzolotukhin, sanjoy, zzheng
Differential Revision: http://reviews.llvm.org/D15559
llvm-svn: 260938
A large number of loop utility functions take a `Pass *` and reach
into it to find out which analyses to preserve. There are a number of
problems with this:
- The APIs have access to pretty well any Pass state they want, so
it's hard to tell what they may or may not do.
- Other APIs have copied these and pass around a `Pass *` even though
they don't even use it. Some of these just hand a nullptr to the API
since the callers don't even have a pass available.
- Passes in the new pass manager don't work like the current ones, so
the APIs can't be used as is there.
Instead, we should explicitly thread the analysis results that we
actually care about through these APIs. This is both simpler and more
reusable.
llvm-svn: 255669
Continuing the work from last week to remove implicit ilist iterator
conversions. First related commit was probably r249767, with some more
motivation in r249925. This edition gets LLVMTransformUtils compiling
without the implicit conversions.
No functional change intended.
llvm-svn: 250142
This change makes ScalarEvolution a stand-alone object and just produces
one from a pass as needed. Making this work well requires making the
object movable, using references instead of overwritten pointers in
a number of places, and other refactorings.
I've also wired it up to the new pass manager and added a RUN line to
a test to exercise it under the new pass manager. This includes basic
printing support much like with other analyses.
But there is a big and somewhat scary change here. Prior to this patch
ScalarEvolution was never *actually* invalidated!!! Re-running the pass
just re-wired up the various other analyses and didn't remove any of the
existing entries in the SCEV caches or clear out anything at all. This
might seem OK as everything in SCEV that can uses ValueHandles to track
updates to the values that serve as SCEV keys. However, this still means
that as we ran SCEV over each function in the module, we kept
accumulating more and more SCEVs into the cache. At the end, we would
have a SCEV cache with every value that we ever needed a SCEV for in the
entire module!!! Yowzers. The releaseMemory routine would dump all of
this, but that isn't realy called during normal runs of the pipeline as
far as I can see.
To make matters worse, there *is* actually a key that we don't update
with value handles -- there is a map keyed off of Loop*s. Because
LoopInfo *does* release its memory from run to run, it is entirely
possible to run SCEV over one function, then over another function, and
then lookup a Loop* from the second function but find an entry inserted
for the first function! Ouch.
To make matters still worse, there are plenty of updates that *don't*
trip a value handle. It seems incredibly unlikely that today GVN or
another pass that invalidates SCEV can update values in *just* such
a way that a subsequent run of SCEV will incorrectly find lookups in
a cache, but it is theoretically possible and would be a nightmare to
debug.
With this refactoring, I've fixed all this by actually destroying and
recreating the ScalarEvolution object from run to run. Technically, this
could increase the amount of malloc traffic we see, but then again it is
also technically correct. ;] I don't actually think we're suffering from
tons of malloc traffic from SCEV because if we were, the fact that we
never clear the memory would seem more likely to have come up as an
actual problem before now. So, I've made the simple fix here. If in fact
there are serious issues with too much allocation and deallocation,
I can work on a clever fix that preserves the allocations (while
clearing the data) between each run, but I'd prefer to do that kind of
optimization with a test case / benchmark that shows why we need such
cleverness (and that can test that we actually make it faster). It's
possible that this will make some things faster by making the SCEV
caches have higher locality (due to being significantly smaller) so
until there is a clear benchmark, I think the simple change is best.
Differential Revision: http://reviews.llvm.org/D12063
llvm-svn: 245193
through APIs that are no longer necessary now that the update API has
been removed.
This will make changes to the AA interfaces significantly less
disruptive (I hope). Either way, it seems like a really nice cleanup.
llvm-svn: 242882
We would create a phi node with a zero initialized operand instead of
undef in the case where no value was originally available. This was
problematic for x86_mmx which has no null value.
llvm-svn: 241143
Use IRBuilder::Create(Cond)?Br instead of constructing instructions
manually with BranchInst::Create(). It's consistent with other
uses of IRBuilder in this pass, and has an additional important
benefit:
Using IRBuilder will ensure that new branch instruction will get
the same debug location as original terminator instruction it will
eventually replace.
For now I'm not adding a testcase, as currently original terminator
instruction also lack debug location due to missing debug location
propagation in BasicBlock::splitBasicBlock. That is, the testcase
will accompany the fix for the latter I'm going to mail soon.
llvm-svn: 239550