This seems to be interacting badly with ASan somehow, causing false reports of
heap-buffer overflows: PR33514.
> Summary:
> The patch makes instruction count the highest priority for
> LSR solution for X86 (previously registers had highest priority).
>
> Reviewers: qcolombet
>
> Differential Revision: http://reviews.llvm.org/D30562
>
> From: Evgeny Stupachenko <evstupac@gmail.com>
llvm-svn: 305720
Current implementation of SCEVExpander demonstrates a very naive behavior when
it deals with power calculation. For example, a SCEV for x^8 looks like
(x * x * x * x * x * x * x * x)
If we try to expand it, it generates a very straightforward sequence of muls, like:
x2 = mul x, x
x3 = mul x2, x
x4 = mul x3, x
...
x8 = mul x7, x
This is a non-efficient way of doing that. A better way is to generate a sequence of
binary power calculation. In this case the expanded calculation will look like:
x2 = mul x, x
x4 = mul x2, x2
x8 = mul x4, x4
In some cases the code size reduction for such SCEVs is dramatic. If we had a loop:
x = a;
for (int i = 0; i < 3; i++)
x = x * x;
And this loop have been fully unrolled, we have something like:
x = a;
x2 = x * x;
x4 = x2 * x2;
x8 = x4 * x4;
The SCEV for x8 is the same as in example above, and if we for some reason
want to expand it, we will generate naively 7 multiplications instead of 3.
The BinPow expansion algorithm here allows to keep code size reasonable.
This patch teaches SCEV Expander to generate a sequence of BinPow multiplications
if we have repeating arguments in SCEVMulExpressions.
Differential Revision: https://reviews.llvm.org/D34025
llvm-svn: 305663
Summary:
The patch makes instruction count the highest priority for
LSR solution for X86 (previously registers had highest priority).
Reviewers: qcolombet
Differential Revision: http://reviews.llvm.org/D30562
From: Evgeny Stupachenko <evstupac@gmail.com>
llvm-svn: 304824
Summary:
This fixes introduction of an incorrect inttoptr/ptrtoint pair in
the included test case which makes use of non-integral pointers. I
suspect there are more cases like this left, but this takes care of
the one I was seeing at the moment.
Reviewers: sanjoy
Subscribers: mzolotukhin, llvm-commits
Differential Revision: https://reviews.llvm.org/D33129
llvm-svn: 304058
The patch rL303730 was reverted because test lsr-expand-quadratic.ll failed on
many non-X86 configs with this patch. The reason of this is that the patch
makes a correctless fix that changes optimizer's behavior for this test.
Without the change, LSR was making an overconfident simplification basing on a
wrong SCEV. Apparently it did not need the IV analysis to do this. With the
change, it chose a different way to simplify (that wasn't so confident), and
this way required the IV analysis. Now, following the right execution path,
LSR tries to make a transformation relying on IV Users analysis. This analysis
is target-dependent due to this code:
// LSR is not APInt clean, do not touch integers bigger than 64-bits.
// Also avoid creating IVs of non-native types. For example, we don't want a
// 64-bit IV in 32-bit code just because the loop has one 64-bit cast.
uint64_t Width = SE->getTypeSizeInBits(I->getType());
if (Width > 64 || !DL.isLegalInteger(Width))
return false;
To make a proper transformation in this test case, the type i32 needs to be
legal for the specified data layout. When the test runs on some non-X86
configuration (e.g. pure ARM 64), opt gets confused by the specified target
and does not use it, rejecting the specified data layout as well. Instead,
it uses some default layout that does not treat i32 as a legal type
(currently the layout that is used when it is not specified does not have
legal types at all). As result, the transformation we expect to happen does
not happen for this test.
This re-enabling patch does not have any source code changes compared to the
original patch rL303730. The only difference is that the failing test is
moved to X86 directory and now has requirement of running on x86 only to comply
with the specified target triple and data layout.
Differential Revision: https://reviews.llvm.org/D33543
llvm-svn: 303971
When folding arguments of AddExpr or MulExpr with recurrences, we rely on the fact that
the loop of our base recurrency is the bottom-lost in terms of domination. This assumption
may be broken by an expression which is treated as invariant, and which depends on a complex
Phi for which SCEVUnknown was created. If such Phi is a loop Phi, and this loop is lower than
the chosen AddRecExpr's loop, it is invalid to fold our expression with the recurrence.
Another reason why it might be invalid to fold SCEVUnknown into Phi start value is that unlike
other SCEVs, SCEVUnknown are sometimes position-bound. For example, here:
for (...) { // loop
phi = {A,+,B}
}
X = load ...
Folding phi + X into {A+X,+,B}<loop> actually makes no sense, because X does not exist and cannot
exist while we are iterating in loop (this memory can be even not allocated and not filled by this moment).
It is only valid to make such folding if X is defined before the loop. In this case the recurrence {A+X,+,B}<loop>
may be existant.
This patch prohibits folding of SCEVUnknown (and those who use them) into the start value of an AddRecExpr,
if this instruction is dominated by the loop. Merging the dominating unknown values is still valid. Some tests that
relied on the fact that some SCEVUnknown should be folded into AddRec's are changed so that they no longer
expect such behavior.
llvm-svn: 303730
The testcase in PR33077 generates a LSR Use Formula with two SCEVAddRecExprs for the same
loop. Such uncommon formula will become non-canonical after GenerateTruncates adds sign
extension to the ScaledReg of the Formula, and it will break the assertion that every
Formula to be inserted is canonical.
The fix is to call canonicalize for the raw Formula generated by GenerateTruncates
before inserting it.
llvm-svn: 303361
Before, we assumed that any ConstantInt offset was precisely the access width,
so we could use the "[rN]!" form. ISelLowering only ever created that kind, but
further simplification during combining could lead to unexpected constants and
incorrect codegen.
Should fix PR32658.
llvm-svn: 300878
The new codepath has been in the tree for years, and there isn't any
reason to use two codepaths here.
Differential Revision: https://reviews.llvm.org/D30596
llvm-svn: 299723
Currently the default C calling convention functions are treated
the same as compute kernels. Make this explicit so the default
calling convention can be changed to a non-kernel.
Converted with perl -pi -e 's/define void/define amdgpu_kernel void/'
on the relevant test directories (and undoing in one place that actually
wanted a non-kernel).
llvm-svn: 298444
After rL294814, LSR formula can have multiple SCEVAddRecExprs inside of its BaseRegs.
Previous canonicalization will swap the first SCEVAddRecExpr in BaseRegs with ScaledReg.
But now we want to swap the SCEVAddRecExpr Reg related with current loop with ScaledReg.
Otherwise, we may generate code like this: RegA + lsr.iv + RegB, where loop invariant
parts RegA and RegB are not grouped together and cannot be promoted outside of loop.
With this patch, it will ensure lsr.iv to be generated later in the expr:
RegA + RegB + lsr.iv, so that RegA + RegB can be promoted outside of loop.
Differential Revision: https://reviews.llvm.org/D26781
llvm-svn: 295884
The new method introduced under "-lsr-exp-narrow" option (currenlty set to true).
Summary:
The method is based on registers number mathematical expectation and should be
generally closer to optimal solution.
Please see details in comments to
"LSRInstance::NarrowSearchSpaceByDeletingCostlyFormulas()" function
(in lib/Transforms/Scalar/LoopStrengthReduce.cpp).
Reviewers: qcolombet
Differential Revision: http://reviews.llvm.org/D29862
From: Evgeny Stupachenko <evstupac@gmail.com>
llvm-svn: 295704
In rL294814, we allow formula with SCEVAddRecExpr type of Reg from loops
other than current loop. This is good for the case when induction variable
of outerloop being used in expr in innerloop. But it is very bad to allow
such Reg from sibling loop because we may need to add lsr.iv in other sibling
loops when scev expanding those SCEVAddRecExpr type exprs. For the testcase
below, one loop can be inserted with a bunch of lsr.iv because of LSR for
other loops.
// The induction variable j from a loop in the middle will have initial
// value generated from previous sibling loop and exit value used by its
// next sibling loop.
void goo(long i, long j);
long cond;
void foo(long N) {
long i = 0;
long j = 0;
i = 0; do { goo(i, j); i++; j++; } while (cond);
i = 0; do { goo(i, j); i++; j++; } while (cond);
i = 0; do { goo(i, j); i++; j++; } while (cond);
i = 0; do { goo(i, j); i++; j++; } while (cond);
i = 0; do { goo(i, j); i++; j++; } while (cond);
i = 0; do { goo(i, j); i++; j++; } while (cond);
}
The fix is to only allow formula with SCEVAddRecExpr type of Reg from current
loop or its parents.
Differential Revision: https://reviews.llvm.org/D30021
llvm-svn: 295378
Summary:
Function isCompatibleIVType is already used as a guard before the call to
SE.getMinusSCEV(OperExpr, PrevExpr);
in LSRInstance::ChainInstruction. getMinusSCEV requires the expressions
to be of the same type, so we now consider two pointers with different
address spaces to be incompatible, since it is possible that the pointers
in fact have different sizes.
Reviewers: qcolombet, eli.friedman
Reviewed By: qcolombet
Subscribers: nhaehnle, Ka-Ka, llvm-commits, mzolotukhin
Differential Revision: https://reviews.llvm.org/D29885
llvm-svn: 295033
Summary:
The patch adds instructions number generated by a solution
to LSR cost under "-lsr-insns-cost" option.
Reviewers: qcolombet, hfinkel
Differential Revision: http://reviews.llvm.org/D28307
From: Evgeny Stupachenko <evstupac@gmail.com>
llvm-svn: 294821
The recommit includes some changes of testcases. No functional change to the patch.
In RateRegister of existing LSR, if a formula contains a Reg which is a SCEVAddRecExpr,
and this SCEVAddRecExpr's loop is an outerloop, the formula will be marked as Loser
and dropped.
Suppose we have an IR that %for.body is outerloop and %for.body2 is innerloop. LSR only
handle inner loop now so only %for.body2 will be handled.
Using the logic above, formula like
reg(%array) + reg({1,+, %size}<%for.body>) + 1*reg({0,+,1}<%for.body2>) will be dropped
no matter what because reg({1,+, %size}<%for.body>) is a SCEVAddRecExpr type reg related
with outerloop. Only formula like
reg(%array) + 1*reg({{1,+, %size}<%for.body>,+,1}<nuw><nsw><%for.body2>) will be kept
because the SCEVAddRecExpr related with outerloop is folded into the initial value of the
SCEVAddRecExpr related with current loop.
But in some cases, we do need to share the basic induction variable
reg{0 ,+, 1}<%for.body2> among LSR Uses to reduce the final total number of induction
variables used by LSR, so we don't want to drop the formula like
reg(%array) + reg({1,+, %size}<%for.body>) + 1*reg({0,+,1}<%for.body2>) unconditionally.
From the existing comment, it tries to avoid considering multiple level loops at the same time.
However, existing LSR only handles innermost loop, so for any SCEVAddRecExpr with a loop other
than current loop, it is an invariant and will be simple to handle, and the formula doesn't have
to be dropped.
Differential Revision: https://reviews.llvm.org/D26429
llvm-svn: 294814
For targets with different addressing modes in each address space,
if this is dropped querying isLegalAddressingMode later with this
will give a nonsense result, breaking the isLegalUse assertions.
This is a candidate for the 4.0 release branch.
llvm-svn: 293542
bots ever since d0k fixed the CHECK lines so that it did something at
all.
It isn't actually testing SCEV directly but LSR, so move it into LSR and
the x86-specific tree of tests that already exists there. Target
dependence is common and unavoidable with the current design of LSR.
llvm-svn: 292774
First, I've moved a test of IVUsers from the LSR tree to a dedicated
IVUsers test directory. I've also simplified its RUN line now that the
new pass manager's loop PM is providing analyses on their own.
No functionality changed, but it makes subsequent changes cleaner.
llvm-svn: 292060
In RateRegister of existing LSR, if a formula contains a Reg which is a SCEVAddRecExpr,
and this SCEVAddRecExpr's loop is an outerloop, the formula will be marked as Loser
and dropped.
Suppose we have an IR that %for.body is outerloop and %for.body2 is innerloop. LSR only
handle inner loop now so only %for.body2 will be handled.
Using the logic above, formula like
reg(%array) + reg({1,+, %size}<%for.body>) + 1*reg({0,+,1}<%for.body2>) will be dropped
no matter what because reg({1,+, %size}<%for.body>) is a SCEVAddRecExpr type reg related
with outerloop. Only formula like
reg(%array) + 1*reg({{1,+, %size}<%for.body>,+,1}<nuw><nsw><%for.body2>) will be kept
because the SCEVAddRecExpr related with outerloop is folded into the initial value of the
SCEVAddRecExpr related with current loop.
But in some cases, we do need to share the basic induction variable
reg{0 ,+, 1}<%for.body2> among LSR Uses to reduce the final total number of induction
variables used by LSR, so we don't want to drop the formula like
reg(%array) + reg({1,+, %size}<%for.body>) + 1*reg({0,+,1}<%for.body2>) unconditionally.
From the existing comment, it tries to avoid considering multiple level loops at the same time.
However, existing LSR only handles innermost loop, so for any SCEVAddRecExpr with a loop other
than current loop, it is an invariant and will be simple to handle, and the formula doesn't have
to be dropped.
Differential Revision: https://reviews.llvm.org/D26429
llvm-svn: 286999
Scalar Evolution asserts when not all the operands of an Add Recurrence
Expression are loop invariants. Loop Strength Reduction should only
create affine Add Recurrences, so that both the start and the step of
the expression are loop invariants.
Differential Revision: https://reviews.llvm.org/D26185
llvm-svn: 286347
Branch folder removes implicit defs if they are the only non-branching
instructions in a block, and the branches do not use the defined registers.
The problem is that in some cases these implicit defs are required for
the liveness information to be correct.
Differential Revision: https://reviews.llvm.org/D25478
llvm-svn: 284036
If a loop is not rotated (for example when optimizing for size), the latch is not the backedge. If we promote an expression to post-inc form, we not only increase register pressure and add a COPY for that IV expression but for all IVs!
Motivating testcase:
void f(float *a, float *b, float *c, int n) {
while (n-- > 0)
*c++ = *a++ + *b++;
}
It's imperative that the pointer increments be located in the latch block and not the header block; if not, we cannot use post-increment loads and stores and we have to keep both the post-inc and pre-inc values around until the end of the latch which bloats register usage.
llvm-svn: 278658
Summary:
This is an extension of the fix in r271424. That fix dealt with builder
insert points being moved by SCEV expansion, but only for the lifetime
of the expand call. This change modifies the interface so that LSR can
safely call expand multiple times at the same insert point and do the
right thing if one of the expansions decides to move the original insert
point.
This is a fix for PR28719.
Reviewers: sanjoy
Subscribers: llvm-commits, mcrosier, mzolotukhin
Differential Revision: https://reviews.llvm.org/D23342
llvm-svn: 278413
reason about and less error prone.
The core idea is to fully parse the text without trying to identify
passes or structure. This is done with a single state machine. There
were various bugs in the logic around this previously that were repeated
and scattered across the code. Having a single routine makes it much
easier to fix and get correct. For example, this routine doesn't suffer
from PR28577.
Then the actual pass construction is handled using *much* easier to read
code and simple loops, with particular pass manager construction sunk to
live with other pass construction. This is especially nice as the pass
managers *are* in fact passes.
Finally, the "implicit" pass manager synthesis is done much more simply
by forming "pre-parsed" structures rather than having to duplicate tons
of logic.
One of the bugs fixed by this was evident in the tests where we accepted
a pipeline that wasn't really well formed. Another bug is PR28577 for
which I have added a test case.
The code is less efficient than the previous code but I'm really hoping
that's not a priority. ;]
Thanks to Sean for the review!
Differential Revision: https://reviews.llvm.org/D22724
llvm-svn: 277561
Summary:
Fix LSRInstance::HoistInsertPosition() to check the original insert
position block first for a canonical insertion point that is dominated
by all inputs. This leads to SCEV being able to reuse more instructions
since it currently tracks the instructions it creates for reuse by
keeping a table of <Value, insert point> pairs.
Originally reviewed in http://reviews.llvm.org/D18001
Reviewers: atrick
Subscribers: llvm-commits, mzolotukhin, mcrosier
Differential Revision: http://reviews.llvm.org/D18480
llvm-svn: 271929
This was being treated the same as private, which has an immediate
offset. For unknown, it probably means it's for a computation not
actually being used for accessing memory, so it should not have a
nontrivial addressing mode.
llvm-svn: 268002
Presently, CodeGenPrepare deletes all nearly empty (only phi and branch)
basic blocks. This pass can delete loop preheaders which frequently creates
critical edges. A preheader can be a convenient place to spill registers to
the stack. If the entrance to a loop body is a critical edge, then spills
may occur in the loop body rather than immediately before it. This patch
protects loop preheaders from deletion in CodeGenPrepare even if they are
nearly empty.
Since the patch alters the CFG, it affects a large number of test cases.
In most cases, the changes are merely cosmetic (basic blocks have different
names or instruction orders change slightly). I am somewhat concerned about
the test/CodeGen/Mips/brdelayslot.ll test case. If the loop preheader is not
deleted, then the MIPS backend does not take advantage of a branch delay
slot. Consequently, I would like some close review by a MIPS expert.
The patch also partially subsumes D16893 from George Burgess IV. George
correctly notes that CodeGenPrepare does not actually preserve the dominator
tree. I think the dominator tree was usually not valid when CodeGenPrepare
ran, but I am using LoopInfo to mark preheaders, so the dominator tree is
now always valid before CodeGenPrepare.
Author: Tom Jablin (tjablin)
Reviewers: hfinkel george.burgess.iv vkalintiris dsanders kbarton cycheng
http://reviews.llvm.org/D16984
llvm-svn: 265397
We try to hoist the insertion point as high as possible to encourage
sharing. However, we must be careful not to hoist into a catchswitch as
it is both an EHPad and a terminator.
llvm-svn: 264344
Summary:
Fix LSRInstance::HoistInsertPosition() to check the original insert
position block first for a canonical insertion point that is dominated
by all inputs. This leads to SCEV being able to reuse more instructions
since it currently tracks the instructions it creates for reuse by
keeping a table of <Value, insert point> pairs.
Reviewers: atrick
Subscribers: mcrosier, mzolotukhin, llvm-commits
Differential Revision: http://reviews.llvm.org/D18001
llvm-svn: 263644
Current SCEV expansion will expand SCEV as a sequence of operations
and doesn't utilize the value already existed. This will introduce
redundent computation which may not be cleaned up throughly by
following optimizations.
This patch introduces an ExprValueMap which is a map from SCEV to the
set of equal values with the same SCEV. When a SCEV is expanded, the
set of values is checked and reused whenever possible before generating
a sequence of operations.
The original commit triggered regressions in Polly tests. The regressions
exposed two problems which have been fixed in current version.
1. Polly will generate a new function based on the old one. To generate an
instruction for the new function, it builds SCEV for the old instruction,
applies some tranformation on the SCEV generated, then expands the transformed
SCEV and insert the expanded value into new function. Because SCEV expansion
may reuse value cached in ExprValueMap, the value in old function may be
inserted into new function, which is wrong.
In SCEVExpander::expand, there is a logic to check the cached value to
be used should dominate the insertion point. However, for the above
case, the check always passes. That is because the insertion point is
in a new function, which is unreachable from the old function. However
for unreachable node, DominatorTreeBase::dominates thinks it will be
dominated by any other node.
The fix is to simply add a check that the cached value to be used in
expansion should be in the same function as the insertion point instruction.
2. When the SCEV is of scConstant type, expanding it directly is cheaper than
reusing a normal value cached. Although in the cached value set in ExprValueMap,
there is a Constant type value, but it is not easy to find it out -- the cached
Value set is not sorted according to the potential cost. Existing reuse logic
in SCEVExpander::expand simply chooses the first legal element from the cached
value set.
The fix is that when the SCEV is of scConstant type, don't try the reuse
logic. simply expand it.
Differential Revision: http://reviews.llvm.org/D12090
llvm-svn: 259736
Bail out if we have a PHI on an EHPad that gets a value from a
CatchSwitchInst. Because the CatchSwitchInst cannot be split, there is
no good place to stick any instructions.
This fixes PR26373.
llvm-svn: 259702
Current SCEV expansion will expand SCEV as a sequence of operations
and doesn't utilize the value already existed. This will introduce
redundent computation which may not be cleaned up throughly by
following optimizations.
This patch introduces an ExprValueMap which is a map from SCEV to the
set of equal values with the same SCEV. When a SCEV is expanded, the
set of values is checked and reused whenever possible before generating
a sequence of operations.
Differential Revision: http://reviews.llvm.org/D12090
llvm-svn: 259662
It turns out that terminatepad gives little benefit over a cleanuppad
which calls the termination function. This is not sufficient to
implement fully generic filters but MSVC doesn't support them which
makes terminatepad a little over-designed.
Depends on D15478.
Differential Revision: http://reviews.llvm.org/D15479
llvm-svn: 255522
While we have successfully implemented a funclet-oriented EH scheme on
top of LLVM IR, our scheme has some notable deficiencies:
- catchendpad and cleanupendpad are necessary in the current design
but they are difficult to explain to others, even to seasoned LLVM
experts.
- catchendpad and cleanupendpad are optimization barriers. They cannot
be split and force all potentially throwing call-sites to be invokes.
This has a noticable effect on the quality of our code generation.
- catchpad, while similar in some aspects to invoke, is fairly awkward.
It is unsplittable, starts a funclet, and has control flow to other
funclets.
- The nesting relationship between funclets is currently a property of
control flow edges. Because of this, we are forced to carefully
analyze the flow graph to see if there might potentially exist illegal
nesting among funclets. While we have logic to clone funclets when
they are illegally nested, it would be nicer if we had a
representation which forbade them upfront.
Let's clean this up a bit by doing the following:
- Instead, make catchpad more like cleanuppad and landingpad: no control
flow, just a bunch of simple operands; catchpad would be splittable.
- Introduce catchswitch, a control flow instruction designed to model
the constraints of funclet oriented EH.
- Make funclet scoping explicit by having funclet instructions consume
the token produced by the funclet which contains them.
- Remove catchendpad and cleanupendpad. Their presence can be inferred
implicitly using coloring information.
N.B. The state numbering code for the CLR has been updated but the
veracity of it's output cannot be spoken for. An expert should take a
look to make sure the results are reasonable.
Reviewers: rnk, JosephTremoulet, andrew.w.kaylor
Differential Revision: http://reviews.llvm.org/D15139
llvm-svn: 255422