The SCEV code for constructing GEP expressions currently assumes
that the addition of the base and all the offsets is nsw if the GEP
is inbounds. While the addition of the offsets is indeed nsw, the
addition to the base address is not, as the base address is
interpreted as an unsigned value.
Fix the GEP expression code to not assume nsw for the base+offset
calculation. However, do assume nuw if we know that the offset is
non-negative. With this, we use the same behavior as the
construction of GEP addrecs does. (Modulo the fact that we
disregard SCEV unification, as the pre-existing FIXME points out).
Differential Revision: https://reviews.llvm.org/D90648
This patch adds the ability to peel off iterations of the first loop in loop
fusion. This can allow for both loops to have the same trip count, making it
legal for them to be fused together.
Here is a simple scenario peeling can be used in loop fusion:
for (i = 0; i < 10; ++i)
a[i] = a[i] + 3;
for (j = 1; j < 10; ++j)
b[j] = b[j] + 5;
Here is we can make use of peeling, and then fuse the two loops together. We
can peel off the 0th iteration of the loop i, and then combine loop i and j for
i = 1 to 10.
a[0] = a[0] +3;
for (i = 1; i < 10; ++i) {
a[i] = a[i] + 3;
b[i] = b[i] + 5;
}
Currently peeling with loop fusion is only supported for loops with constant
trip counts and a single exit point. Both unguarded and guarded loops are
supported.
Reviewed By: bmahjour (Bardia Mahjour), MaskRay (Fangrui Song)
Differential Revision: https://reviews.llvm.org/D82927
This reverts commit bb8850d34d.
It broke 3 check-llvm-transforms-loopfusion tests in an ASAN build.
LoopFuse.cpp `for (BasicBlock *Pred : predecessors(BB)) {` may operate on a deleted BB.
Summary:
This patch adds the ability to peel off iterations of the first loop in loop
fusion. This can allow for both loops to have the same trip count, making it
legal for them to be fused together.
Here is a simple scenario peeling can be used in loop fusion:
for (i = 0; i < 10; ++i)
a[i] = a[i] + 3;
for (j = 1; j < 10; ++j)
b[j] = b[j] + 5;
Here is we can make use of peeling, and then fuse the two loops together. We can
peel off the 0th iteration of the loop i, and then combine loop i and j for
i = 1 to 10.
a[0] = a[0] +3;
for (i = 1; i < 10; ++i) {
a[i] = a[i] + 3;
b[i] = b[i] + 5;
}
Currently peeling with loop fusion is only supported for loops with constant
trip counts and a single exit point. Both unguarded and guarded loops are
supported.
Author: sidbav (Sidharth Baveja)
Reviewers: kbarton, Meinersbur, bkramer, Whitney, skatkov, ashlykov, fhahn, bmahjour
Reviewed By: bmahjour
Subscribers: bmahjour, mgorny, hiraditya, zzheng
Tags: LLVM
Differential Revision: https://reviews.llvm.org/D82927
blocks.
Summary: The current LoopFusion forget to update the incoming block of
the phis in second loop guard non loop successor from second loop guard
block to first loop guard block. A test case is provided to better
understand the problem.
Reviewed By: jdoerfert
Subscribers: hiraditya, llvm-commits
Tag: LLVM
Differential Revision: https://reviews.llvm.org/D81421
This patch removes FC0.ExitBlock and FC1GuardBlock from DT and LI
after fusion of guarded loops. They become unreachable and LI
verification failed when they happened to be inside another loop.
Reviewed By: kbarton
Differential Revision: https://reviews.llvm.org/D78679
from FC0.ExitBlock to FC1.ExitBlock when proven safe.
Summary:
Currently LoopFusion give up when the second loop nest guard
block or the first loop nest exit block is not empty. For example:
if (0 < N) {
for (int i = 0; i < N; ++i) {}
x+=1;
}
y+=1;
if (0 < N) {
for (int i = 0; i < N; ++i) {}
}
The above example should be safe to fuse.
This PR moves instructions in FC1 guard block (e.g. y+=1;) to
FC0 guard block, or instructions in FC0 exit block (e.g. x+=1;) to
FC1 exit block, which then LoopFusion is able to fuse them.
Reviewer: kbarton, jdoerfert, Meinersbur, dmgreen, fhahn, hfinkel,
bmahjour, etiotto
Reviewed By: jdoerfert
Subscribers: hiraditya, llvm-commits
Tag: LLVM
Differential Revision: https://reviews.llvm.org/D73641
proven safe.
Summary:
Currently LoopFusion give up when the second loop nest preheader is
not empty. For example:
for (int i = 0; i < 100; ++i) {}
x+=1;
for (int i = 0; i < 100; ++i) {}
The above example should be safe to fuse.
This PR moves instructions in FC1 preheader (e.g. x+=1; ) to
FC0 preheader, which then LoopFusion is able to fuse them.
Reviewer: kbarton, Meinersbur, jdoerfert, dmgreen, fhahn, hfinkel,
bmahjour, etiotto
Reviewed By: jdoerfert
Subscribers: hiraditya, llvm-commits
Tag: LLVM
Differential Revision: https://reviews.llvm.org/D71821
Summary:This PR move instructions from FC0.Latch bottom up to the
beginning of FC1.Latch as long as they are proven safe.
To illustrate why this is beneficial, let's consider the following
example:
Before Fusion:
header1:
br header2
header2:
br header2, latch1
latch1:
br header1, preheader3
preheader3:
br header3
header3:
br header4
header4:
br header4, latch3
latch3:
br header3, exit3
After Fusion (before this PR):
header1:
br header2
header2:
br header2, latch1
latch1:
br header3
header3:
br header4
header4:
br header4, latch3
latch3:
br header1, exit3
Note that preheader3 is removed during fusion before this PR.
Notice that we cannot fuse loop2 with loop4 as there exists block latch1
in between.
This PR move instructions from latch1 to beginning of latch3, and remove
block latch1. LoopFusion is now able to fuse loop nest recursively.
After Fusion (after this PR):
header1:
br header2
header2:
br header3
header3:
br header4
header4:
br header2, latch3
latch3:
br header1, exit3
Reviewer: kbarton, jdoerfert, Meinersbur, dmgreen, fhahn, hfinkel,
bmahjour, etiotto
Reviewed By: kbarton, Meinersbur
Subscribers: hiraditya, llvm-commits
Tag: LLVM
Differential Revision: https://reviews.llvm.org/D71165
Summary:
This patch restricts loop fusion to only consider rotated loops as valid candidates.
This simplifies the analysis and transformation and aligns with other loop optimizations.
Reviewers: jdoerfert, Meinersbur, dmgreen, etiotto, Whitney, fhahn, hfinkel
Reviewed By: Meinersbur
Subscribers: ormris, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71025
Summary:
This patch extends the current capabilities in loop fusion to fuse guarded loops
(as defined in https://reviews.llvm.org/D63885). The patch adds the necessary
safety checks to ensure that it safe to fuse the guarded loops (control flow
equivalent, no intervening code, and same guard conditions). It also provides an
alternative method to perform the actual fusion of guarded loops. The mechanics
to fuse guarded loops are slightly different then fusing non-guarded loops, so I
opted to keep them separate methods. I will be cleaning this up in later
patches, and hope to converge on a single method to fuse both guarded and
non-guarded loops, but for now I think the review will be easier to keep them
separate.
Reviewers: jdoerfert, Meinersbur, dmgreen, etiotto, Whitney
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65464
llvm-svn: 373018
Summary:
This patch extends the use of the OptimizationRemarkEmitter to provide
information about loops that are not fused, and loops that are not eligible for
fusion. In particular, it uses the OptimizationRemarkAnalysis to identify loops
that are not eligible for fusion and the OptimizationRemarkMissed to identify
loops that cannot be fused.
It also reuses the statistics to provide the messages used in the
OptimizationRemarks. This provides common message strings between the
optimization remarks and the statistics.
I would like feedback on this approach, in general. If people are OK with this,
I will flesh out additional remarks in subsequent commits.
Subscribers: hiraditya, jsji, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D63844
llvm-svn: 367327
This patch adds a basic loop fusion pass. It will fuse loops that conform to the
following 4 conditions:
1. Adjacent (no code between them)
2. Control flow equivalent (if one loop executes, the other loop executes)
3. Identical bounds (both loops iterate the same number of iterations)
4. No negative distance dependencies between the loop bodies.
The pass does not make any changes to the IR to create opportunities for fusion.
Instead, it checks if the necessary conditions are met and if so it fuses two
loops together.
The pass has not been added to the pass pipeline yet, and thus is not enabled by
default. It can be run stand alone using the -loop-fusion option.
Differential Revision: https://reviews.llvm.org/D55851
llvm-svn: 358607
As it's causing some bot failures (and per request from kbarton).
This reverts commit r358543/ab70da07286e618016e78247e4a24fcb84077fda.
llvm-svn: 358546
This patch adds a basic loop fusion pass. It will fuse loops that conform to the
following 4 conditions:
1. Adjacent (no code between them)
2. Control flow equivalent (if one loop executes, the other loop executes)
3. Identical bounds (both loops iterate the same number of iterations)
4. No negative distance dependencies between the loop bodies.
The pass does not make any changes to the IR to create opportunities for fusion.
Instead, it checks if the necessary conditions are met and if so it fuses two
loops together.
The pass has not been added to the pass pipeline yet, and thus is not enabled by
default. It can be run stand alone using the -loop-fusion option.
Phabricator: https://reviews.llvm.org/D55851
llvm-svn: 358543