The TargetTransform changes are breaking LTO bootstraps of clang. I am
working with Nadav to figure out the problem, but I am reverting it for now
to get our buildbots working.
This reverts svn commits: 165665 165669 165670 165786 165787 165997
and I have also reverted clang svn 165741
llvm-svn: 166168
This class is used by LSR and a number of places in the codegen.
This is the first step in de-coupling LSR from TLI, and creating
a new interface in between them.
llvm-svn: 165455
This places limits on CollectSubexprs to constrains the number of
reassociation possibilities. It limits the recursion depth and skips
over chains of nested recurrences outside the current loop.
Fixes PR13361. Although underlying SCEV behavior is still potentially bad.
llvm-svn: 160340
All SCEV expressions used by LSR formulae must be safe to
expand. i.e. they may not contain UDiv unless we can prove nonzero
denominator.
Fixes PR11356: LSR hoists UDiv.
llvm-svn: 160205
For non-address users, Base and Scaled registers are not specially
associated to fit an address mode, so SCEVExpander should apply normal
expansion rules. Otherwise we may sink computation into inner loops
that have already been optimized.
llvm-svn: 158537
The required checks are moved to ChainInstruction() itself and the
policy decisions are moved to IVChain::isProfitableInc().
Also cache the ExprBase in IVChain to avoid frequent recomputations.
No functional change intended.
llvm-svn: 155676
This introduces a threshold of 200 IV Users, which is very
conservative but should be sufficient to avoid serious compile time
sink or stack overflow. The llvm test-suite with LTO never exceeds 190
users per loop.
The bug doesn't relate to a specific type of loop. Checking in an
arbitrary giant loop as a unit test would be silly.
Fixes rdar://11262507.
llvm-svn: 154983
LSR can fold three addressing modes into its ICmpZero node:
ICmpZero BaseReg + Offset => ICmp BaseReg, -Offset
ICmpZero -1*ScaleReg + Offset => ICmp ScaleReg, Offset
ICmpZero BaseReg + -1*ScaleReg => ICmp BaseReg, ScaleReg
The first two cases are only used if TLI->isLegalICmpImmediate() likes
the offset.
Make sure the right Offset sign is passed to this method in the second
case. The ARM version is not symmetric.
<rdar://problem/11184260>
llvm-svn: 154079
Only record IVUsers that are dominated by simplified loop
headers. Otherwise SCEVExpander will crash while looking for a
preheader.
I previously tried to work around this in LSR itself, but that was
insufficient. This way, LSR can continue to run if some uses are not
in simple loops, as long as we don't attempt to analyze those users.
Fixes <rdar://problem/11049788> Segmentation fault: 11 in LoopStrengthReduce
llvm-svn: 152892
LSR has gradually been improved to more aggressively reuse existing code, particularly existing phi cycles. This exposed problems with the SCEVExpander's sloppy treatment of its insertion point. I applied some rigor to the insertion point problem that will hopefully avoid an endless bug cycle in this area. Changes:
- Always used properlyDominates to check safe code hoisting.
- The insertion point provided to SCEV is now considered a lower bound. This is usually a block terminator or the use itself. Under no cirumstance may SCEVExpander insert below this point.
- LSR is reponsible for finding a "canonical" insertion point across expansion of different expressions.
- Robust logic to determine whether IV increments are in "expanded" form and/or can be safely hoisted above some insertion point.
Fixes PR11783: SCEVExpander assert.
llvm-svn: 148535
It's becoming clear that LoopSimplify needs to unconditionally create loop preheaders. But that is a bigger fix. For now, continuing to hack LSR.
Fixes rdar://10701050 "Cannot split an edge from an IndirectBrInst" assert.
llvm-svn: 148288
These heuristics are sufficient for enabling IV chains by
default. Performance analysis has been done for i386, x86_64, and
thumbv7. The optimization is rarely important, but can significantly
speed up certain cases by eliminating spill code within the
loop. Unrolled loops are prime candidates for IV chains. In many
cases, the final code could still be improved with more target
specific optimization following LSR. The goal of this feature is for
LSR to make the best choice of induction variables.
Instruction selection may not completely take advantage of this
feature yet. As a result, there could be cases of slight code size
increase.
Code size can be worse on x86 because it doesn't support postincrement
addressing. In fact, when chains are formed, you may see redundant
address plus stride addition in the addressing mode. GenerateIVChains
tries to compensate for the common cases.
On ARM, code size increase can be mitigated by using postincrement
addressing, but downstream codegen currently misses some opportunities.
llvm-svn: 147826
After collecting chains, check if any should be materialized. If so,
hide the chained IV users from the LSR solver. LSR will only solve for
the head of the chain. GenerateIVChains will then materialize the
chained IV users by computing the IV relative to its previous value in
the chain.
In theory, chained IV users could be exposed to LSR's solver. This
would be considerably complicated to implement and I'm not aware of a
case where we need it. In practice it's more important to
intelligently prune the search space of nontrivial loops before
running the solver, otherwise the solver is often forced to prune the
most optimal solutions. Hiding the chained users does this well, so
that LSR is more likely to find the best IV for the chain as a whole.
llvm-svn: 147801
This collects a set of IV uses within the loop whose values can be
computed relative to each other in a sequence. Following checkins will
make use of this information.
llvm-svn: 147797
This will be more important as we extend the LSR pass in ways that don't rely on the formula solver. In particular, we need it for constructing IV chains.
llvm-svn: 147724
LoopSimplify may not run on some outer loops, e.g. because of indirect
branches. SCEVExpander simply cannot handle outer loops with no preheaders.
Fixes rdar://10655343 SCEVExpander segfault.
llvm-svn: 147718
Since we're not rewriting IVs in other loops, there's not much reason
to consider their stride when generating formulae.
This should reduce the number of useless formulas considered by LSR.
llvm-svn: 146302
It's always good to prune early, but formulae that are unsatisfactory
in their own right need to be removed before running any other pruning
heuristics. We easily avoid generating such formulae, but we need them
as an intermediate basis for forming other good formulae.
llvm-svn: 145906
Someone more familiar with LSR should double-check that the extra cast is actually doing the right thing in the overflow cases; I'm not completely confident that's that case.
llvm-svn: 141916
This handles the case in which LSR rewrites an IV user that is a phi and
splits critical edges originating from a switch.
Fixes <rdar://problem/6453893> LSR is not splitting edges "nicely"
llvm-svn: 141059
Rewriting the entire loop nest now requires -enable-lsr-nested.
See PR11035 for some performance data.
A few unit tests specifically test nested LSR, and are now under a flag.
llvm-svn: 140762
The minor bug heuristic was noticed by inspection. I added the
isLoser/isValid helpers because they will become more
important with subsequent checkins.
llvm-svn: 140580
No functionality enabled by default. Use -disable-iv-rewrite.
Extended IVUsers to keep track of the phi that represents the users' IV.
Added the WidenIV transform to replace a narrow IV with a wide IV
by doing a one-for-one replacement of IV users instead of expanding the
SCEV expressions. [sz]exts are removed and truncs are inserted.
llvm-svn: 131744
model constants which can be added to base registers via add-immediate
instructions which don't require an additional register to materialize
the immediate.
llvm-svn: 130743
Now that we have a first-class way to represent unaligned loads, the unaligned
load intrinsics are superfluous.
First part of <rdar://problem/8460511>.
llvm-svn: 129401
properties.
Added the self-wrap flag for SCEV::AddRecExpr.
A slew of temporary FIXMEs indicate the intention of the no-self-wrap flag
without changing behavior in this revision.
llvm-svn: 127590
Natural Loop Information
Loop Pass Manager
Canonicalize natural loops
Scalar Evolution Analysis
Loop Pass Manager
Induction Variable Users
Canonicalize natural loops
Induction Variable Users
Loop Strength Reduction
into this:
Scalar Evolution Analysis
Loop Pass Manager
Canonicalize natural loops
Induction Variable Users
Loop Strength Reduction
This fixes <rdar://problem/8869639>. I also filed PR9184 on doing this sort of
thing automatically, but it seems easier to just change the ordering of the
passes if this is the only case.
llvm-svn: 125254
the active loop. This is generally desirable, and it avoids trouble
in situations such as the testcase in PR9123, though the failure
mode depends on use-list order, so it is infeasible to test.
llvm-svn: 125065
must be called in the pass's constructor. This function uses static dependency declarations to recursively initialize
the pass's dependencies.
Clients that only create passes through the createFooPass() APIs will require no changes. Clients that want to use the
CommandLine options for passes will need to manually call the appropriate initialization functions in PassInitialization.h
before parsing commandline arguments.
I have tested this with all standard configurations of clang and llvm-gcc on Darwin. It is possible that there are problems
with the static dependencies that will only be visible with non-standard options. If you encounter any crash in pass
registration/creation, please send the testcase to me directly.
llvm-svn: 116820
perform initialization without static constructors AND without explicit initialization
by the client. For the moment, passes are required to initialize both their
(potential) dependencies and any passes they preserve. I hope to be able to relax
the latter requirement in the future.
llvm-svn: 116334
formulae which become illegal as a result of the offset updating don't
escape.
This is for rdar://8529692. No testcase yet, because the given cases
hit use-list ordering differences.
llvm-svn: 116093
This doesn't usually matter, because the other heuristics usually
succeed regardless, but it's good to keep the register use
bookkeeping consistent.
llvm-svn: 116005
LSRInstance data structures up to date. This fixes some
pessimizations caused by stale data which will be exposed
in an upcoming change.
llvm-svn: 112440
uninteresting, just put all the operands on one list and make
GenerateReassociations make the decision about what's interesting.
This is simpler, and it avoids an extra ScalarEvolution::getAddExpr call.
llvm-svn: 111133
different widths. In a use with a narrower fixup, formulae
may be wider than the fixup, in which case the high bits
aren't necessarily meaningful, so it isn't safe to reuse
them for uses with wider fixups.
This fixes PR7618, though the testcase is too large for a
reasonable regression test, since it heavily dependes on
hitting LSR's heuristics in a certain way.
llvm-svn: 108455
SCEVUnknown values which are loop-variant, as LSR can't do anything
interesting with these values in any case. This fixes very slow compile
times on loops which have large numbers of such values.
llvm-svn: 106897
use sharing map. The reconcileNewOffset logic already forces a
separate use if the kinds differ, so incorporating the kind in the
key means we can track more sharing opportunities.
More sharing means fewer total uses to track, which means smaller
problem sizes, which means the conservative throttles don't kick
in as often.
llvm-svn: 106396
Changed directly instead of using a return value.
Rename FilterOutUndesirableDedicatedRegisters's Changed variable to
distinguish it from LSRInstance's Changed member.
llvm-svn: 104269
operand on the left, the interesting operand is on the right. This
fixes a bug where LSR was failing to recognize ICmpZero uses,
which led it to be unable to reverse the induction variable in the
attached testcase.
Delete test/CodeGen/X86/stack-color-with-reg-2.ll, because its test
is extremely fragile and hard to meaningfully update.
llvm-svn: 104262
LSRUse's Regs set after all pruning is done, rather than trying
to do it on the fly, which can produce an incomplete result.
This fixes a case where heuristic pruning was stripping all
formulae from a use, which led the solver to enter an infinite
loop.
Also, add a few asserts to diagnose this kind of situation.
llvm-svn: 103328
just ask ScalarEvolution for it on demand. This helps IVUsers be more robust
in the case of expressions changing underneath it. This fixes PR6862.
llvm-svn: 101819
into adjacent loops. Also, ensure that the insert position is
dominated by the loop latch of any loop in the post-inc set which
has a latch.
llvm-svn: 100906
explicitly split into stride-and-offset pairs. Also, add the
ability to track multiple post-increment loops on the same expression.
This refines the concept of "normalizing" SCEV expressions used for
to post-increment uses, and introduces a dedicated utility routine for
normalizing and denormalizing expressions.
This fixes the expansion of expressions which are post-increment users
of more than one loop at a time. More broadly, this takes LSR another
step closer to being able to reason about more than one loop at a time.
llvm-svn: 100699
induction variable value and a loop-variant value, don't force the
insert position to be at the post-increment position, because it may
not be dominated by the loop-variant value. This fixes a
use-before-def problem noticed on PPC.
llvm-svn: 96774
strides in foreign loops. This helps locate reuse opportunities
with existing induction variables in foreign loops and reduces
the need for inserting new ones. This fixes rdar://7657764.
llvm-svn: 96629
with multiplication by constants distributed through, occasionally
those subexpressions can include both x and -x. For now, if this
condition is discovered within LSR, just prune such cases away,
as they won't be profitable. This fixes a "zero allocated in a
base register" assertion failure.
llvm-svn: 96177
bug fixes, and with improved heuristics for analyzing foreign-loop
addrecs.
This change also flattens IVUsers, eliminating the stride-oriented
groupings, which makes it easier to work with.
llvm-svn: 95975
loop-variant components, adds must be inserted after the increment.
Keep track of the increment position for this case, and insert
these adds in the correct location.
llvm-svn: 94110