Chris Lattner
6bc98653c2
Make vector narrowing more effective, implementing
...
Transforms/InstCombine/vec_narrow.ll. This add support for narrowing
extract_element(insertelement) also.
llvm-svn: 26538
2006-03-05 00:22:33 +00:00
Chris Lattner
4c065091d8
Add factoring of multiplications, e.g. turning A*A+A*B into A*(A+B).
...
Testcase here: Transforms/Reassociate/mulfactor.ll
llvm-svn: 26524
2006-03-04 09:31:13 +00:00
Chris Lattner
32c01df299
Canonicalize (X+C1)*C2 -> X*C2+C1*C2
...
This implements Transforms/InstCombine/add.ll:test31
llvm-svn: 26519
2006-03-04 06:04:02 +00:00
Chris Lattner
681ef2f083
Change this to work with renamed intrinsics.
...
llvm-svn: 26484
2006-03-03 01:34:17 +00:00
Chris Lattner
85dda9a2bd
Generalize the REM folding code to handle another case Nick Lewycky
...
pointed out: realize the AND can provide factors and look through Casts.
llvm-svn: 26469
2006-03-02 06:50:58 +00:00
Chris Lattner
c5b6c9a12a
Fix a regression in a patch from a couple of days ago. This fixes
...
Transforms/InstCombine/2006-02-28-Crash.ll
llvm-svn: 26427
2006-02-28 19:47:20 +00:00
Chris Lattner
b70f141893
Implement rem.ll:test[7-9] and PR712
...
llvm-svn: 26415
2006-02-28 05:49:21 +00:00
Chris Lattner
2a7c7b8bab
Simplify some code now that the RHS of a rem can't be 0
...
llvm-svn: 26413
2006-02-28 05:40:55 +00:00
Chris Lattner
0de4a8d7b7
Rearrange some code, fold "rem X, 0", implementing rem.ll:test6
...
llvm-svn: 26411
2006-02-28 05:30:45 +00:00
Chris Lattner
c7bfed0f7b
Merge two almost-identical pieces of code.
...
Make this code more powerful by using ComputeMaskedBits instead of looking
for an AND operand. This lets us fold this:
int %test23(int %a) {
%tmp.1 = and int %a, 1
%tmp.2 = seteq int %tmp.1, 0
%tmp.3 = cast bool %tmp.2 to int ;; xor tmp1, 1
ret int %tmp.3
}
into: xor (and a, 1), 1
llvm-svn: 26396
2006-02-27 02:38:23 +00:00
Chris Lattner
f5c8a0b83f
Fold (A^B) == A -> B == 0
...
and (A-B) == A -> B == 0
llvm-svn: 26394
2006-02-27 01:44:11 +00:00
Chris Lattner
f78df7c14d
Fold (X|C1)^C2 -> X^(C1|C2) when possible. This implements
...
InstCombine/or.ll:test23.
llvm-svn: 26385
2006-02-26 19:57:54 +00:00
Chris Lattner
b580d26e7d
Fix a problem that Nate noticed that boils down to an over conservative check
...
in the code that does "select C, (X+Y), (X-Y) --> (X+(select C, Y, (-Y)))".
We now compile this loop:
LBB1_1: ; no_exit
add r6, r2, r3
subf r3, r2, r3
cmpwi cr0, r2, 0
addi r7, r5, 4
lwz r2, 0(r5)
addi r4, r4, 1
blt cr0, LBB1_4 ; no_exit
LBB1_3: ; no_exit
mr r3, r6
LBB1_4: ; no_exit
cmpwi cr0, r4, 16
mr r5, r7
bne cr0, LBB1_1 ; no_exit
into this instead:
LBB1_1: ; no_exit
srawi r6, r2, 31
add r2, r2, r6
xor r6, r2, r6
addi r7, r5, 4
lwz r2, 0(r5)
addi r4, r4, 1
add r3, r3, r6
cmpwi cr0, r4, 16
mr r5, r7
bne cr0, LBB1_1 ; no_exit
llvm-svn: 26356
2006-02-24 18:05:58 +00:00
Chris Lattner
e5521db5bc
Fix Regression/Transforms/LoopUnswitch/2006-02-22-UnswitchCrash.ll, which
...
caused SPASS to fail building last night.
We can't trivially unswitch a loop if the exit block has phi nodes in it,
because we don't know which predecessor to use.
llvm-svn: 26320
2006-02-22 23:55:00 +00:00
Chris Lattner
8a5a324dac
Add some comments, simplify some code, and fix a bug that caused rewriting
...
to rewrite with the wrong value.
llvm-svn: 26311
2006-02-22 06:37:14 +00:00
Chris Lattner
c2e3a7a4ce
improved support for branch folding, still not enabled.
...
llvm-svn: 26289
2006-02-18 07:57:38 +00:00
Jeff Cohen
0add83e969
Fix bugs identified by VC++.
...
llvm-svn: 26287
2006-02-18 03:20:33 +00:00
Chris Lattner
19fa8ac938
Implement deletion of dead blocks, currently disabled.
...
llvm-svn: 26285
2006-02-18 02:42:34 +00:00
Chris Lattner
cb853de534
a previous patch completely disabled trivial unswitching, this fixees it.
...
Thanks to nate for pointing this out :)
llvm-svn: 26280
2006-02-18 01:32:04 +00:00
Chris Lattner
29f771ba21
initial trivial support for folding branches that have now-constant destinations.
...
llvm-svn: 26279
2006-02-18 01:27:45 +00:00
Chris Lattner
8e44ff50b0
When unswitching a loop, make sure to update loop info with exit blocks in
...
the right loop.
llvm-svn: 26277
2006-02-18 00:55:32 +00:00
Chris Lattner
baddba41c7
Fix loops where the header has an exit, fixing a loop-unswitch crash on crafty
...
llvm-svn: 26258
2006-02-17 06:39:56 +00:00
Chris Lattner
6fd136239b
start of some new simplification code, not thoroughly tested, use at your own
...
risk :)
llvm-svn: 26248
2006-02-17 00:31:07 +00:00
Nate Begeman
8a77efe4f7
Rework the SelectionDAG-based implementations of SimplifyDemandedBits
...
and ComputeMaskedBits to match the new improved versions in instcombine.
Tested against all of multisource/benchmarks on ppc.
llvm-svn: 26238
2006-02-16 21:11:51 +00:00
Chris Lattner
fa335f6083
Change SplitBlock to increment a BasicBlock::iterator, not an Instruction*. Apparently they do different things :)
...
This fixes a testcase that nate reduced from spass.
Also included are a couple minor code changes that don't affect the generated
code at all.
llvm-svn: 26235
2006-02-16 19:36:22 +00:00
Jeff Cohen
55f63f1b53
Fix VC++ warning.
...
llvm-svn: 26228
2006-02-16 04:07:37 +00:00
Chris Lattner
ff42e81028
fix a bug where we unswitched the wrong way
...
llvm-svn: 26225
2006-02-16 01:24:41 +00:00
Chris Lattner
fdff0bb43e
Implement trivial unswitching for switch stmts. This allows us to trivial
...
unswitch this loop on 2 before sweating to unswitch on 1/3.
void test4(int N, int i, int C, int*P, int*Q) {
int j;
for (j = 0; j < N; ++j) {
switch (C) { // general unswitching.
default: P[i+j] = 0; break;
case 1: Q[i+j] = 0; break;
case 3: P[i+j] = Q[i+j]; break;
case 2: break; // TRIVIAL UNSWITCH on C==2
}
}
}
llvm-svn: 26223
2006-02-15 22:52:05 +00:00
Chris Lattner
e5cb76d744
make "trivial" unswitching significantly more general. It can now handle
...
this for example:
for (j = 0; j < N; ++j) { // trivial unswitch
if (C)
P[i+j] = 0;
}
turning it into the obvious code without bothering to duplicate an empty loop.
llvm-svn: 26220
2006-02-15 22:03:36 +00:00
Chris Lattner
65152d80ec
Checking the wrong value. This caused us to emit silly code like
...
Y = seteq bool X, true
instead of just using X :)
llvm-svn: 26215
2006-02-15 19:05:52 +00:00
Chris Lattner
01db04efb0
more refactoring, no functionality change.
...
llvm-svn: 26194
2006-02-15 01:44:42 +00:00
Chris Lattner
b0cbe7106e
pull some code out into a function
...
llvm-svn: 26191
2006-02-15 00:07:43 +00:00
Chris Lattner
0b8ec1a132
Use statistics to keep track of what flavors of loops we are unswitching
...
llvm-svn: 26157
2006-02-14 01:01:41 +00:00
Chris Lattner
8b10ab3002
Implement Instcombine/and.ll:test34
...
llvm-svn: 26155
2006-02-13 23:07:23 +00:00
Chris Lattner
7d8522884b
If any of the sign extended bits are demanded, the input sign bit is demanded
...
for a sign extension.
This fixes InstCombine/2006-02-13-DemandedMiscompile.ll and Ptrdist/bc.
llvm-svn: 26152
2006-02-13 22:41:07 +00:00
Chris Lattner
68e7475777
Be careful not to request or look at bits shifted in from outside the size
...
of the input. This fixes the mediabench/gsm/toast failure last night.
llvm-svn: 26138
2006-02-13 06:09:08 +00:00
Chris Lattner
f5b4ef7f58
remove some more dead special case code
...
llvm-svn: 26135
2006-02-12 08:07:37 +00:00
Chris Lattner
5b2edb1fca
Eliminate special case hacks that are superceded by general purpose hacks
...
llvm-svn: 26134
2006-02-12 08:02:11 +00:00
Chris Lattner
ee0f280743
Three changes:
...
1. Teach GetConstantInType to handle boolean constants.
2. Teach instcombine to fold (compare X, CST) when X has known 0/1 bits.
Testcase here: set.ll:test22
3. Improve the "(X >> c1) & C2 == 0" folding code to allow a noop cast
between the shift and and. More aggressive bitfolding for other reasons
was turning signed shr's into unsigned shr's, leaving the noop cast in
the way.
llvm-svn: 26131
2006-02-12 02:07:56 +00:00
Chris Lattner
0157e7f55b
Port the recent innovations in ComputeMaskedBits to SimplifyDemandedBits.
...
This allows us to simplify on conditions where bits are not known, but they
are not demanded either! This also fixes a couple of bugs in
ComputeMaskedBits that were exposed during this work.
In the future, swaths of instcombine should be removed, as this code
subsumes a bunch of ad-hockery.
llvm-svn: 26122
2006-02-11 09:31:47 +00:00
Chris Lattner
fbadd7e1ee
implement unswitching of loops with switch stmts and selects in them
...
llvm-svn: 26114
2006-02-11 00:43:37 +00:00
Chris Lattner
f1b151684d
Update PHI nodes in successors of exit blocks.
...
llvm-svn: 26113
2006-02-10 23:26:14 +00:00
Chris Lattner
fe4151efe7
Reform the unswitching code in terms of edge splitting, not block splitting.
...
llvm-svn: 26112
2006-02-10 23:16:39 +00:00
Chris Lattner
ec6b40a093
Fix a case where UnswitchTrivialCondition broke critical edges with
...
phi's in the successors
llvm-svn: 26108
2006-02-10 19:08:15 +00:00
Chris Lattner
6e263155a6
add some notes, move some code around. Implement unswitching of loops
...
with branches on partially invariant computations.
llvm-svn: 26104
2006-02-10 02:30:37 +00:00
Chris Lattner
4935417a84
Move code around to be more logical, no functionality change.
...
llvm-svn: 26103
2006-02-10 02:01:22 +00:00
Chris Lattner
3fc3148b85
When unswitching a trivial loop, do admit we are doing it! :)
...
llvm-svn: 26102
2006-02-10 01:36:35 +00:00
Chris Lattner
ed7a67b0de
Implement unconditional unswitching of 'trivial' loops, those loops that contain
...
branches in their entry block that control whether or not the loop is a noop or not.
llvm-svn: 26101
2006-02-10 01:24:09 +00:00
Chris Lattner
4f0e66df6a
Simplify control flow a bit, note that unswitch preserves canonical loop form
...
llvm-svn: 26098
2006-02-09 22:15:42 +00:00
Chris Lattner
8976219850
Make the threshold a parameter
...
llvm-svn: 26093
2006-02-09 20:15:48 +00:00
Chris Lattner
2826e0511b
Simplify the loop-unswitch pass, by not even trying to unswitch loops with
...
uses of loop values outside the loop. We need loop-closed SSA form to do
this right, or to use SSA rewriting if we really care.
llvm-svn: 26089
2006-02-09 19:14:52 +00:00
Chris Lattner
24cd2fa269
Fix 80-column violations
...
llvm-svn: 26088
2006-02-09 07:41:14 +00:00
Chris Lattner
4534dd59a3
Enhance MVIZ in three ways:
...
1. Teach it new tricks: in particular how to propagate through signed shr and sexts.
2. Teach it to return a bitset of known-1 and known-0 bits, instead of just zero.
3. Teach instcombine (AND X, C) to fold when we know all C bits of X.
This implements Regression/Transforms/InstCombine/bittest.ll, and allows
future things to be simplified.
llvm-svn: 26087
2006-02-09 07:38:58 +00:00
Chris Lattner
ab2dc4d70d
Simplify some code, reducing calls to MaskedValueIsZero. Implement a minor
...
optimization where we reduce the number of bits in AND masks when possible.
llvm-svn: 26056
2006-02-08 07:34:50 +00:00
Chris Lattner
5997cf9381
Use EraseInstFromFunction in a few cases to put the uses of the removed
...
instruction onto the worklist (in case they are now dead).
Add a really trivial local DSE implementation to help out bitfield code.
We now fold this:
struct S {
unsigned char a : 1, b : 1, c : 1, d : 2, e : 3;
S();
};
S::S() : a(0), b(0), c(1), d(0), e(6) {}
to this:
void %_ZN1SC1Ev(%struct.S* %this) {
entry:
%tmp.1 = getelementptr %struct.S* %this, int 0, uint 0
store ubyte 38, ubyte* %tmp.1
ret void
}
much earlier (in gccas instead of only in gccld after DSE runs).
llvm-svn: 26050
2006-02-08 03:25:32 +00:00
Chris Lattner
06a0ed1ee0
Implement some more interesting select sccp cases. This implements:
...
test/Regression/Transforms/SCCP/select.ll
llvm-svn: 26049
2006-02-08 02:38:11 +00:00
Chris Lattner
ddba3289b5
Fix a problem in my patch yesterday, causing a miscompilation of 176.gcc
...
llvm-svn: 26045
2006-02-08 01:20:23 +00:00
Chris Lattner
44314827d6
Fix Transforms/InstCombine/2006-02-07-SextZextCrash.ll
...
llvm-svn: 26040
2006-02-07 19:07:40 +00:00
Chris Lattner
92a6865321
Generalize MaskedValueIsZero into a ComputeMaskedNonZeroBits function, which
...
is just as efficient as MVIZ and is also more general.
Fix a few minor bugs introduced in recent patches
llvm-svn: 26036
2006-02-07 08:05:22 +00:00
Chris Lattner
c3ebf40031
Make MaskedValueIsZero take a uint64_t instead of a ConstantIntegral as a
...
mask. This allows the code to be simpler and more efficient.
Also, generalize some of the cases in MVIZ a bit, making it slightly more aggressive.
llvm-svn: 26035
2006-02-07 07:27:52 +00:00
Chris Lattner
77defbae0a
Use Type::getIntegralTypeMask() to simplify some code
...
llvm-svn: 26034
2006-02-07 07:00:41 +00:00
Chris Lattner
2590e511d8
Implement the beginnings of a facility for simplifying expressions based on
...
'demanded bits', inspired by Nate's work in the dag combiner. This isn't
complete, but needs to unrelated instcombiner changes to continue.
llvm-svn: 26033
2006-02-07 06:56:34 +00:00
Chris Lattner
2e90b732fa
Turn A % (C << N), where C is 2^k, into A & ((C << N)-1) [urem only].
...
Turn A / (C1 << N), where C1 is "1<<C2" into A >> (N+C2) [udiv only].
Tested with: rem.ll:test5, div.ll:test10
llvm-svn: 26003
2006-02-05 07:54:04 +00:00
Chris Lattner
d30c4991a1
Use SCEVExpander::InsertCastOfTo instead of our own code. This reduces
...
#LLVM LOC, and auto-cse's cast instructions.
llvm-svn: 25974
2006-02-04 09:52:43 +00:00
Chris Lattner
2959f0003e
Fix two significant bugs in LSR:
...
1. When rewriting code in outer loops, sometimes we would insert code into
inner loops that is invariant in that loop.
2. Notice that 4*(2+x) is 8+4*x and use that to simplify expressions.
This is a performance neutral change.
llvm-svn: 25964
2006-02-04 07:36:50 +00:00
Jeff Cohen
15a8c15a1f
Improve compatibility with VC2005, patch by Morten Ofstad!
...
llvm-svn: 25661
2006-01-26 20:41:32 +00:00
Chris Lattner
c0f633a598
Fix Regression/Transforms/ScalarRepl/2006-01-24-IllegalUnionPromoteCrash.ll
...
llvm-svn: 25587
2006-01-24 19:36:27 +00:00
Chris Lattner
c597b8a55e
Make iostream #inclusion explicit
...
llvm-svn: 25514
2006-01-22 23:32:06 +00:00
Chris Lattner
e154abf9b3
Implement casts.ll:test26: a cast from float -> double -> integer, doesn't
...
need the float->double part.
llvm-svn: 25452
2006-01-19 07:40:22 +00:00
Robert Bocchino
6dce25019d
Lowerpacked and SCCP support for the insertelement operation.
...
llvm-svn: 25406
2006-01-17 20:06:55 +00:00
Chris Lattner
307b7ea15f
fix a crash due to missing parens
...
llvm-svn: 25363
2006-01-16 19:47:21 +00:00
Chris Lattner
0de2c7d3d8
This pass has never worked correctly. Remove.
...
llvm-svn: 25349
2006-01-16 01:06:00 +00:00
Chris Lattner
ef530c24c1
FunctionPass's cannot do IPO things.
...
llvm-svn: 25315
2006-01-14 19:30:35 +00:00
Robert Bocchino
a83529678e
Added instcombine support for extractelement.
...
llvm-svn: 25299
2006-01-13 22:48:06 +00:00
Chris Lattner
503221f5c5
Do a simple instcombine xforms to delete llvm.stackrestore cases.
...
llvm-svn: 25294
2006-01-13 21:28:09 +00:00
Chris Lattner
c66b223b28
Simplify this a tiny bit by using the new IntrinsicInst functionality.
...
llvm-svn: 25292
2006-01-13 20:11:04 +00:00
Chris Lattner
cb36710ff9
Switch these to using ETForest instead of DominatorSet to compute itself.
...
Patch written by Daniel Berlin!
llvm-svn: 25202
2006-01-11 05:10:20 +00:00
Chris Lattner
48e4a2ebd8
Switch this to using ETForest instead of DominatorSet to compute itself.
...
Patch written by Daniel Berlin!
llvm-svn: 25201
2006-01-11 05:09:40 +00:00
Robert Bocchino
bd518d153b
Added lower packed support for the extractelement operation.
...
llvm-svn: 25180
2006-01-10 19:05:05 +00:00
Chris Lattner
9cbfbc21bb
fix some 176.gcc miscompilation from my previous patch.
...
llvm-svn: 25137
2006-01-07 01:32:28 +00:00
Chris Lattner
330628a6d8
silence some bogus gcc warnings on fenris
...
llvm-svn: 25130
2006-01-06 17:59:59 +00:00
Chris Lattner
eb372a0276
Enhance the shift-shift folding code to allow a no-op cast to occur in between
...
the shifts.
This allows us to fold this (which is the 'integer add a constant' sequence
from cozmic's scheme compmiler):
int %x(uint %anf-temporary776) {
%anf-temporary777 = shr uint %anf-temporary776, ubyte 1
%anf-temporary800 = cast uint %anf-temporary777 to int
%anf-temporary804 = shl int %anf-temporary800, ubyte 1
%anf-temporary805 = add int %anf-temporary804, -2
%anf-temporary806 = or int %anf-temporary805, 1
ret int %anf-temporary806
}
into this:
int %x(uint %anf-temporary776) {
%anf-temporary776 = cast uint %anf-temporary776 to int
%anf-temporary776.mask1 = add int %anf-temporary776, -2
%anf-temporary805 = or int %anf-temporary776.mask1, 1
ret int %anf-temporary805
}
note that instcombine already knew how to eliminate the AND that the two
shifts fold into. This is tested by InstCombine/shift.ll:test26
-Chris
llvm-svn: 25128
2006-01-06 07:52:12 +00:00
Chris Lattner
b330939d90
Simplify the code a bit more
...
llvm-svn: 25126
2006-01-06 07:22:22 +00:00
Chris Lattner
145539343f
Extract a bunch of code out of visitShiftInst into FoldShiftByConstant. No
...
functionality changes.
llvm-svn: 25125
2006-01-06 07:12:35 +00:00
Duraid Madina
7a3ad6cae2
getting there...
...
llvm-svn: 25021
2005-12-26 13:48:44 +00:00
Chris Lattner
8c9e14620f
Fix Transforms/ScalarRepl/2005-12-14-UnionPromoteCrash.ll, a crash on undefined
...
behavior in 126.gcc on big-endian systems.
llvm-svn: 24708
2005-12-14 17:23:59 +00:00
Chris Lattner
3b0a62d8a5
Implement a little hack for parity with GCC on crafty. This speeds up
...
186.crafty by about 16% (from 15.109s to 13.045s) on my system.
This turns allocas with unions/casts into scalars. For example crafty has
something like this:
union doub {
unsigned short i[4];
long long d;
};
int f(long long a) {
return ((union doub){.d=a}).i[1];
}
Instead of generating loads and stores to an alloca, we now promote the
whole thing to a scalar long value.
This implements: Transforms/ScalarRepl/AggregatePromote.ll
llvm-svn: 24667
2005-12-12 07:19:13 +00:00
Chris Lattner
077200737c
getRawValue zero extens for unsigned values, use getsextvalue so that we
...
know that small negative values fit into the immediate field of addressing
modes.
llvm-svn: 24608
2005-12-05 18:23:57 +00:00
Chris Lattner
dc4ffef633
Fix a bug where we didn't realize that vaarg reads memory. This fixes
...
Transforms/DeadStoreElimination/2005-11-30-vaarg.ll
llvm-svn: 24545
2005-11-30 19:38:22 +00:00
Andrew Lenharth
5fc3794e71
since reg2mem requires it, might as well mention that it preserves it
...
llvm-svn: 24491
2005-11-25 16:04:54 +00:00
Andrew Lenharth
061029dee2
Reg2Mem is something a pass may depend on, so allow that
...
llvm-svn: 24488
2005-11-22 22:14:23 +00:00
Andrew Lenharth
71b09bbb07
turns out, demotion and invokes and critical edges don't mix
...
llvm-svn: 24487
2005-11-22 21:45:19 +00:00
Chris Lattner
9c37f23645
Fix a crash building 176.gcc due to my recent patch, which only fixed
...
half the problem.
llvm-svn: 24414
2005-11-18 18:30:47 +00:00
Chris Lattner
bca0be812d
This was checking the wrong GEP expression. Fixing this fixes a gccas crash
...
compiling mysql reported by Ted Kremenek.
llvm-svn: 24402
2005-11-17 19:35:42 +00:00
Andrew Lenharth
d9c13b1336
the pain isn't gone unless the phinodes are spilled too
...
llvm-svn: 24288
2005-11-10 19:39:09 +00:00
Andrew Lenharth
8e66c0c8a9
this works with backedges to the existing entry block alot better
...
llvm-svn: 24270
2005-11-10 17:35:34 +00:00
Andrew Lenharth
4130a4f061
The pass everyone has been waiting for!
...
Reg2Mem
for fun you can opt -reg2mem -mem2reg
llvm-svn: 24267
2005-11-10 01:58:38 +00:00
Nate Begeman
848622f87f
Add support alignment of allocation instructions.
...
Add support for specifying alignment and size of setjmp jmpbufs.
No targets currently do anything with this information, nor is it presrved
in the bytecode representation. That's coming up next.
llvm-svn: 24196
2005-11-05 09:21:28 +00:00
Chris Lattner
16b29e9562
Implement Transforms/TailCallElim/return-undef.ll, a trivial case
...
that has been sitting in my inbox since May 18. :)
llvm-svn: 24194
2005-11-05 08:21:11 +00:00
Chris Lattner
dd0c174082
Turn sdiv into udiv if both operands have a clear sign bit. This occurs
...
a few times in crafty:
OLD: %tmp.36 = div int %tmp.35, 8 ; <int> [#uses=1]
NEW: %tmp.36 = div uint %tmp.35, 8 ; <uint> [#uses=0]
OLD: %tmp.19 = div int %tmp.18, 8 ; <int> [#uses=1]
NEW: %tmp.19 = div uint %tmp.18, 8 ; <uint> [#uses=0]
OLD: %tmp.117 = div int %tmp.116, 8 ; <int> [#uses=1]
NEW: %tmp.117 = div uint %tmp.116, 8 ; <uint> [#uses=0]
OLD: %tmp.92 = div int %tmp.91, 8 ; <int> [#uses=1]
NEW: %tmp.92 = div uint %tmp.91, 8 ; <uint> [#uses=0]
Which all turn into shrs.
llvm-svn: 24190
2005-11-05 07:40:31 +00:00
Chris Lattner
e9ff0eaf5b
Turn srem -> urem when neither input has their sign bit set. This triggers
...
8 times in vortex, allowing the srems to be turned into shrs:
OLD: %tmp.104 = rem int %tmp.5.i37, 16 ; <int> [#uses=1]
NEW: %tmp.104 = rem uint %tmp.5.i37, 16 ; <uint> [#uses=0]
OLD: %tmp.98 = rem int %tmp.5.i24, 16 ; <int> [#uses=1]
NEW: %tmp.98 = rem uint %tmp.5.i24, 16 ; <uint> [#uses=0]
OLD: %tmp.91 = rem int %tmp.5.i19, 8 ; <int> [#uses=1]
NEW: %tmp.91 = rem uint %tmp.5.i19, 8 ; <uint> [#uses=0]
OLD: %tmp.88 = rem int %tmp.5.i14, 8 ; <int> [#uses=1]
NEW: %tmp.88 = rem uint %tmp.5.i14, 8 ; <uint> [#uses=0]
OLD: %tmp.85 = rem int %tmp.5.i9, 1024 ; <int> [#uses=2]
NEW: %tmp.85 = rem uint %tmp.5.i9, 1024 ; <uint> [#uses=0]
OLD: %tmp.82 = rem int %tmp.5.i, 512 ; <int> [#uses=2]
NEW: %tmp.82 = rem uint %tmp.5.i1, 512 ; <uint> [#uses=0]
OLD: %tmp.48.i = rem int %tmp.5.i.i161, 4 ; <int> [#uses=1]
NEW: %tmp.48.i = rem uint %tmp.5.i.i161, 4 ; <uint> [#uses=0]
OLD: %tmp.20.i2 = rem int %tmp.5.i.i, 4 ; <int> [#uses=1]
NEW: %tmp.20.i2 = rem uint %tmp.5.i.i, 4 ; <uint> [#uses=0]
it also occurs 9 times in gcc, but with odd constant divisors (1009 and 61)
so the payoff isn't as great.
llvm-svn: 24189
2005-11-05 07:28:37 +00:00
Andrew Lenharth
662295587d
make this 64 bit clean, fixed test30 of /Regression/Transforms/InstCombine/add.ll
...
llvm-svn: 24158
2005-11-02 18:35:40 +00:00
Chris Lattner
09efd4e5b6
Limit the search depth of MaskedValueIsZero to 6 instructions, to avoid
...
bad cases. This fixes Markus's second testcase in PR639, and should
seal it for good.
llvm-svn: 24123
2005-10-31 18:35:52 +00:00
Chris Lattner
27d351f159
This pass is now obsolete since all targets have moved to the SelectionDAG
...
infrastructure and the simple isels have been removed.
llvm-svn: 24090
2005-10-29 05:33:46 +00:00
Chris Lattner
8f663e8bbc
Pull some code out into a function, give it the ability to see through +.
...
This allows us to turn code like malloc(4*x+4) -> malloc int, (x+1)
llvm-svn: 24081
2005-10-29 04:36:15 +00:00
Chris Lattner
8270c33606
Remove a special case, allowing the general case to handle it. No functionality
...
change.
llvm-svn: 24076
2005-10-29 03:19:53 +00:00
Chris Lattner
b9d3ca5c3c
Fix a bit of backwards logic that broke exptree and smg2000
...
llvm-svn: 24056
2005-10-28 16:27:35 +00:00
Chris Lattner
c4f67e67d2
Do not sink any instruction with side effects, including vaarg. This fixes
...
PR640
llvm-svn: 24046
2005-10-27 17:13:11 +00:00
Chris Lattner
c6372cca78
Fix typo
...
llvm-svn: 24033
2005-10-27 06:26:26 +00:00
Chris Lattner
0fe7551bc0
Teach instcombine to promote stuff like (cast (malloc sbyte, 8*X) to int*)
...
into: malloc int, (2*X)
llvm-svn: 24032
2005-10-27 06:24:46 +00:00
Chris Lattner
b3ecf96900
Promote cases like cast (malloc sbyte, 100) to int* into
...
(malloc [25 x int]) directly without having to convert to
(malloc [100 x sbyte]) first.
llvm-svn: 24031
2005-10-27 06:12:00 +00:00
Chris Lattner
bb17180a23
Minor change to this file to support obscure cases with constant array amounts
...
llvm-svn: 24030
2005-10-27 05:53:56 +00:00
Chris Lattner
38a1b00a0f
fold nested and's early to avoid inefficiencies in MaskedValueIsZero. This
...
fixes a very slow compile in PR639.
llvm-svn: 24011
2005-10-26 17:18:16 +00:00
Jeff Cohen
2b8cbf319c
Update Visual Studio projects to reflect moved file.
...
llvm-svn: 23998
2005-10-26 05:36:51 +00:00
Chris Lattner
46705b2f2d
Handle allocations that, even after removing dead uses, still have more than
...
one use (but one is a cast). This handles the very common case of:
X = alloc [n x byte]
Y = cast X to somethingbetter
seteq X, null
In order to avoid infinite looping when there are multiple casts, we only
allow this if the xform is strictly increasing the alignment of the
allocation.
llvm-svn: 23961
2005-10-24 06:35:18 +00:00
Chris Lattner
355ecc09f8
Fix a bug where we would 'promote' an allocation from one type to another
...
where the second has less alignment required. If we had explicit alignment
support in the IR, we could handle this case, but we can't until we do.
llvm-svn: 23960
2005-10-24 06:26:18 +00:00
Chris Lattner
ac87beb03a
Before promoting a malloc type, remove dead uses. This makes instcombine
...
more effective at promoting these allocations, catching them earlier in the
compile process.
llvm-svn: 23959
2005-10-24 06:22:12 +00:00
Chris Lattner
216be91817
Pull some code out into a function, no functionality change
...
llvm-svn: 23958
2005-10-24 06:03:58 +00:00
Chris Lattner
bde3845548
DONT_BUILD_RELINKED is gone and implied by BUILD_ARCHIVE now
...
llvm-svn: 23940
2005-10-24 02:26:13 +00:00
Chris Lattner
8c087e962c
Only build .a file versions of these libraries, instead of .a and .o versions.
...
This should speed up build times.
llvm-svn: 23933
2005-10-24 01:59:48 +00:00
Chris Lattner
bd77fac034
Make sure that anything using the ADCE pass pulls in the UnifyFunctionExitNodes
...
code
llvm-svn: 23931
2005-10-24 01:40:23 +00:00
Jeff Cohen
11e26b52b2
When a function takes a variable number of pointer arguments, with a zero
...
pointer marking the end of the list, the zero *must* be cast to the pointer
type. An un-cast zero is a 32-bit int, and at least on x86_64, gcc will
not extend the zero to 64 bits, thus allowing the upper 32 bits to be
random junk.
The new END_WITH_NULL macro may be used to annotate a such a function
so that GCC (version 4 or newer) will detect the use of un-casted zero
at compile time.
llvm-svn: 23888
2005-10-23 04:37:20 +00:00
Chris Lattner
5df0e36e98
My previous patch was too conservative. Reject FP and void types, but do
...
allow pointer types.
llvm-svn: 23859
2005-10-21 05:45:41 +00:00
Chris Lattner
0c0b38bb4c
Do NOT touch FP ops with LSR. This fixes a testcase Nate sent me from an
...
inner loop like this:
LBB_RateConvertMono8AltiVec_2: ; no_exit
lis r2, ha16(.CPI_RateConvertMono8AltiVec_0)
lfs f3, lo16(.CPI_RateConvertMono8AltiVec_0)(r2)
fmr f3, f3
fadd f0, f2, f0
fadd f3, f0, f3
fcmpu cr0, f3, f1
bge cr0, LBB_RateConvertMono8AltiVec_2 ; no_exit
to an inner loop like this:
LBB_RateConvertMono8AltiVec_1: ; no_exit
fsub f2, f2, f1
fcmpu cr0, f2, f1
fmr f0, f2
bge cr0, LBB_RateConvertMono8AltiVec_1 ; no_exit
Doh! good catch!
llvm-svn: 23838
2005-10-20 04:47:10 +00:00
Chris Lattner
da1b152c43
Make this work for FP constantexprs
...
llvm-svn: 23773
2005-10-17 20:18:38 +00:00
Chris Lattner
7fde91e365
Oops, X+0.0 isn't foldable, but X+-0.0 is.
...
llvm-svn: 23772
2005-10-17 17:56:38 +00:00
Chris Lattner
32979336a7
relax this a bit, as we only support the default rounding mode
...
llvm-svn: 23771
2005-10-17 17:49:32 +00:00
Chris Lattner
192cd18f53
Fix (hopefully the last) issue where LSR is nondeterminstic. When pulling
...
out CSE's of base expressions it could build a result whose order was
nondet.
llvm-svn: 23698
2005-10-11 18:41:04 +00:00
Chris Lattner
5c9d63da31
Fix another problem where LSR was being nondeterminstic. Also remove elements
...
from the end of a vector instead of the beginning
llvm-svn: 23697
2005-10-11 18:30:57 +00:00
Chris Lattner
b7a3894e7c
Fix another lsr-is-nondeterministic case
...
llvm-svn: 23695
2005-10-11 18:17:57 +00:00
Chris Lattner
03b9eb506c
Make MaskedValueIsZero a bit more aggressive
...
llvm-svn: 23677
2005-10-09 22:08:50 +00:00
Chris Lattner
62010c450f
Fix funky xcode indentation
...
llvm-svn: 23674
2005-10-09 06:36:35 +00:00
Chris Lattner
eb4be8b942
Hrm, you didn't see this.
...
llvm-svn: 23673
2005-10-09 06:24:02 +00:00
Chris Lattner
4ea0a3eaac
Fix a source of non-determinism in the backend: the order of processing
...
IV strides dependend on the pointer order of the strides in memory.
Non-determinism is bad.
llvm-svn: 23672
2005-10-09 06:20:55 +00:00
Jeff Cohen
572910c9a2
Remove useless variable.
...
llvm-svn: 23656
2005-10-07 05:28:29 +00:00
Chris Lattner
f07a587c79
Make IVUseShouldUsePostIncValue more aggressive when the use is a PHI. In
...
particular, it should realize that phi's use their values in the pred block
not the phi block itself. This change turns our em3d loop from this:
_test:
cmpwi cr0, r4, 0
bgt cr0, LBB_test_2 ; entry.no_exit_crit_edge
LBB_test_1: ; entry.loopexit_crit_edge
li r2, 0
b LBB_test_6 ; loopexit
LBB_test_2: ; entry.no_exit_crit_edge
li r6, 0
LBB_test_3: ; no_exit
or r2, r6, r6
lwz r6, 0(r3)
cmpw cr0, r6, r5
beq cr0, LBB_test_6 ; loopexit
LBB_test_4: ; endif
addi r3, r3, 4
addi r6, r2, 1
cmpw cr0, r6, r4
blt cr0, LBB_test_3 ; no_exit
LBB_test_5: ; endif.loopexit.loopexit_crit_edge
addi r3, r2, 1
blr
LBB_test_6: ; loopexit
or r3, r2, r2
blr
into:
_test:
cmpwi cr0, r4, 0
bgt cr0, LBB_test_2 ; entry.no_exit_crit_edge
LBB_test_1: ; entry.loopexit_crit_edge
li r2, 0
b LBB_test_5 ; loopexit
LBB_test_2: ; entry.no_exit_crit_edge
li r6, 0
LBB_test_3: ; no_exit
lwz r2, 0(r3)
cmpw cr0, r2, r5
or r2, r6, r6
beq cr0, LBB_test_5 ; loopexit
LBB_test_4: ; endif
addi r3, r3, 4
addi r6, r6, 1
cmpw cr0, r6, r4
or r2, r6, r6
blt cr0, LBB_test_3 ; no_exit
LBB_test_5: ; loopexit
or r3, r2, r2
blr
Unfortunately, this is actually worse code, because the register coallescer
is getting confused somehow. If it were doing its job right, it could turn the
code into this:
_test:
cmpwi cr0, r4, 0
bgt cr0, LBB_test_2 ; entry.no_exit_crit_edge
LBB_test_1: ; entry.loopexit_crit_edge
li r6, 0
b LBB_test_5 ; loopexit
LBB_test_2: ; entry.no_exit_crit_edge
li r6, 0
LBB_test_3: ; no_exit
lwz r2, 0(r3)
cmpw cr0, r2, r5
beq cr0, LBB_test_5 ; loopexit
LBB_test_4: ; endif
addi r3, r3, 4
addi r6, r6, 1
cmpw cr0, r6, r4
blt cr0, LBB_test_3 ; no_exit
LBB_test_5: ; loopexit
or r3, r6, r6
blr
... which I'll work on next. :)
llvm-svn: 23604
2005-10-03 02:50:05 +00:00
Chris Lattner
e4ed42a426
Refactor some code into a function
...
llvm-svn: 23603
2005-10-03 01:04:44 +00:00
Chris Lattner
360928dbed
This break is bogus and I have no idea why it was there. Basically it prevents
...
memoizing code when IV's are used by phinodes outside of loops. In a simple
example, we were getting this code before (note that r6 and r7 are isomorphic
IV's):
li r6, 0
or r7, r6, r6
LBB_test_3: ; no_exit
lwz r2, 0(r3)
cmpw cr0, r2, r5
or r2, r7, r7
beq cr0, LBB_test_5 ; loopexit
LBB_test_4: ; endif
addi r2, r7, 1
addi r7, r7, 1
addi r3, r3, 4
addi r6, r6, 1
cmpw cr0, r6, r4
blt cr0, LBB_test_3 ; no_exit
Now we get:
li r6, 0
LBB_test_3: ; no_exit
or r2, r6, r6
lwz r6, 0(r3)
cmpw cr0, r6, r5
beq cr0, LBB_test_6 ; loopexit
LBB_test_4: ; endif
addi r3, r3, 4
addi r6, r2, 1
cmpw cr0, r6, r4
blt cr0, LBB_test_3 ; no_exit
this was noticed in em3d.
llvm-svn: 23602
2005-10-03 00:37:33 +00:00
Chris Lattner
8fcce170cf
when checking if we should move a split edge block outside of a loop,
...
check the presplit pred, not the post-split pred. This was causing us
to make the wrong decision in some cases, leaving the critical edge block
in the loop.
llvm-svn: 23601
2005-10-03 00:31:52 +00:00
Jeff Cohen
f8a5e5ae6e
Fix VC++ warnings.
...
llvm-svn: 23579
2005-10-01 03:57:14 +00:00
Chris Lattner
a554c9470b
Insert stores after phi nodes in the normal dest. This fixes
...
LowerInvoke/2005-08-03-InvokeWithPHI.ll
llvm-svn: 23525
2005-09-29 17:44:20 +00:00
Chris Lattner
3b63bb375c
add a note about a way to improve this code further, that I won't be getting
...
to right now.
llvm-svn: 23485
2005-09-27 22:44:59 +00:00
Chris Lattner
e285f5ed8f
Avoid spilling stack slots... to stack slots.
...
llvm-svn: 23478
2005-09-27 21:33:12 +00:00
Chris Lattner
87eb249300
Completely rewrite 'correct' eh support. This changes how setjmp insertion
...
is performed so it is only at most once per function that contains an invoke
instead of once per invoke in the function. This patch has the following perks:
1. It fixes PR631, which complains about slowness.
2. If fixes PR240, which complains about non-volatile vars being live across
setjmp/longjmps.
3. It improves (but does not fix) the jmpbuf alignment issue on itanium by not
forcing the jmpbufs to always be 8-bytes off the alignment of the structure.
4. It speeds up 253.perlbmk from 338s to 13.70s (a 25x improvement!), making us
now about 4% faster than GCC.
Further improvements are also possible.
llvm-svn: 23477
2005-09-27 21:18:17 +00:00
Chris Lattner
92233d2175
Make the pass name simpler
...
llvm-svn: 23476
2005-09-27 21:10:32 +00:00
Chris Lattner
02ae21e1e0
Eliminate GetGEPGlobalInitializer in favor of the more powerful
...
ConstantFoldLoadThroughGEPConstantExpr function in the utils lib.
llvm-svn: 23446
2005-09-26 05:28:52 +00:00
Chris Lattner
0b011ec8e2
Factor the GetGEPGlobalInitializer out of this pass and into Transforms/Utils
...
as ConstantFoldLoadThroughGEPConstantExpr.
llvm-svn: 23445
2005-09-26 05:28:06 +00:00
Chris Lattner
0b3557f54a
Move MaskedValueIsZero up.
...
Match a bunch of idioms for sign extensions, implementing InstCombine/signext.ll
llvm-svn: 23428
2005-09-24 23:43:33 +00:00
Chris Lattner
b4b2530a1a
Refactor this code a bit and make it more general. This now compiles:
...
struct S { unsigned int i : 6, j : 11, k : 15; } b;
void plus2 (unsigned int x) { b.j += x; }
To:
_plus2:
lis r2, ha16(L_b$non_lazy_ptr)
lwz r2, lo16(L_b$non_lazy_ptr)(r2)
lwz r4, 0(r2)
slwi r3, r3, 6
add r3, r4, r3
rlwimi r3, r4, 0, 26, 14
stw r3, 0(r2)
blr
instead of:
_plus2:
lis r2, ha16(L_b$non_lazy_ptr)
lwz r2, lo16(L_b$non_lazy_ptr)(r2)
lwz r4, 0(r2)
rlwinm r5, r4, 26, 21, 31
add r3, r5, r3
rlwimi r4, r3, 6, 15, 25
stw r4, 0(r2)
blr
by eliminating an 'and'.
I'm pretty sure this is as small as we can go :)
llvm-svn: 23386
2005-09-18 07:22:02 +00:00
Chris Lattner
797dee7705
Compile
...
struct S { unsigned int i : 6, j : 11, k : 15; } b;
void plus2 (unsigned int x) {
b.j += x;
}
to:
plus2:
mov %EAX, DWORD PTR [b]
mov %ECX, %EAX
and %ECX, 131008
mov %EDX, DWORD PTR [%ESP + 4]
shl %EDX, 6
add %EDX, %ECX
and %EDX, 131008
and %EAX, -131009
or %EDX, %EAX
mov DWORD PTR [b], %EDX
ret
instead of:
plus2:
mov %EAX, DWORD PTR [b]
mov %ECX, %EAX
shr %ECX, 6
and %ECX, 2047
add %ECX, DWORD PTR [%ESP + 4]
shl %ECX, 6
and %ECX, 131008
and %EAX, -131009
or %ECX, %EAX
mov DWORD PTR [b], %ECX
ret
llvm-svn: 23385
2005-09-18 06:30:59 +00:00
Chris Lattner
01f56c68e9
Generalize this transform, using MaskedValueIsZero, allowing us to compile:
...
struct S { unsigned int i : 6, j : 11, k : 15; } b;
void plus3 (unsigned int x) { b.k += x; }
To:
plus3:
mov %EAX, DWORD PTR [%ESP + 4]
shl %EAX, 17
add DWORD PTR [b], %EAX
ret
instead of:
plus3:
mov %EAX, DWORD PTR [%ESP + 4]
shl %EAX, 17
mov %ECX, DWORD PTR [b]
add %EAX, %ECX
and %EAX, -131072
and %ECX, 131071
or %ECX, %EAX
mov DWORD PTR [b], %ECX
ret
llvm-svn: 23384
2005-09-18 06:02:59 +00:00
Chris Lattner
4ebc8ab4e0
fix typeo
...
llvm-svn: 23383
2005-09-18 05:25:20 +00:00
Chris Lattner
e5b23a6d67
Remove unintentionally committed code
...
llvm-svn: 23382
2005-09-18 05:12:51 +00:00
Chris Lattner
27cb9dbd35
implement shift.ll:test25. This compiles:
...
struct S { unsigned int i : 6, j : 11, k : 15; } b;
void plus3 (unsigned int x) {
b.k += x;
}
to:
_plus3:
lis r2, ha16(L_b$non_lazy_ptr)
lwz r2, lo16(L_b$non_lazy_ptr)(r2)
lwz r3, 0(r2)
rlwinm r4, r3, 0, 0, 14
add r4, r4, r3
rlwimi r4, r3, 0, 15, 31
stw r4, 0(r2)
blr
instead of:
_plus3:
lis r2, ha16(L_b$non_lazy_ptr)
lwz r2, lo16(L_b$non_lazy_ptr)(r2)
lwz r4, 0(r2)
srwi r5, r4, 17
add r3, r5, r3
slwi r3, r3, 17
rlwimi r3, r4, 0, 15, 31
stw r3, 0(r2)
blr
llvm-svn: 23381
2005-09-18 05:12:10 +00:00
Chris Lattner
af517574ce
Implement add.ll:test29. Codegening:
...
struct S { unsigned int i : 6, j : 11, k : 15; } b;
void plus1 (unsigned int x) {
b.i += x;
}
as:
_plus1:
lis r2, ha16(L_b$non_lazy_ptr)
lwz r2, lo16(L_b$non_lazy_ptr)(r2)
lwz r4, 0(r2)
add r3, r4, r3
rlwimi r3, r4, 0, 0, 25
stw r3, 0(r2)
blr
instead of:
_plus1:
lis r2, ha16(L_b$non_lazy_ptr)
lwz r2, lo16(L_b$non_lazy_ptr)(r2)
lwz r4, 0(r2)
rlwinm r5, r4, 0, 26, 31
add r3, r5, r3
rlwimi r3, r4, 0, 0, 25
stw r3, 0(r2)
blr
llvm-svn: 23379
2005-09-18 04:24:45 +00:00
Chris Lattner
027eaf01cf
remove debug output
...
llvm-svn: 23377
2005-09-18 03:50:25 +00:00
Chris Lattner
1521298993
Implement or.ll:test21. This teaches instcombine to be able to turn this:
...
struct {
unsigned int bit0:1;
unsigned int ubyte:31;
} sdata;
void foo() {
sdata.ubyte++;
}
into this:
foo:
add DWORD PTR [sdata], 2
ret
instead of this:
foo:
mov %EAX, DWORD PTR [sdata]
mov %ECX, %EAX
add %ECX, 2
and %ECX, -2
and %EAX, 1
or %EAX, %ECX
mov DWORD PTR [sdata], %EAX
ret
llvm-svn: 23376
2005-09-18 03:42:07 +00:00
Chris Lattner
a393e4d4b3
Fix the regression last night compiling povray
...
llvm-svn: 23348
2005-09-14 17:32:56 +00:00
Chris Lattner
2a8932960d
Add a simple xform to simplify array accesses with casts in the way.
...
This is useful for 178.galgel where resolution of dope vectors (by the
optimizer) causes the scales to become apparent.
llvm-svn: 23328
2005-09-13 18:36:04 +00:00
Chris Lattner
fd018c8dfe
Fix an issue where LSR would miss rewriting a use of an IV expression by a PHI node that is not the original PHI.
...
This fixes up a dot-product loop in galgel, speeding it up from 18.47s to
16.13s.
llvm-svn: 23327
2005-09-13 02:09:55 +00:00
Chris Lattner
567b81f0d2
Add a helper function, allowing us to simplify some code a bit, changing
...
indentation, no functionality change
llvm-svn: 23325
2005-09-13 00:40:14 +00:00
Chris Lattner
219175c84d
Implement a simple xform to turn code like this:
...
if () { store A -> P; } else { store B -> P; }
into a PHI node with one store, in the most trival case. This implements
load.ll:test10.
llvm-svn: 23324
2005-09-12 23:23:25 +00:00
Chris Lattner
e0bfdf1485
Another load-peephole optimization: do gcse when two loads are next to
...
each other. This implements InstCombine/load.ll:test9
llvm-svn: 23322
2005-09-12 22:21:03 +00:00
Chris Lattner
b990f7d8ed
Implement a trivial form of store->load forwarding where the store and the
...
load are exactly consequtive. This is picked up by other passes, but this
triggers thousands of times in fortran programs that use static locals
(and is thus a compile-time speedup).
llvm-svn: 23320
2005-09-12 22:00:15 +00:00
Chris Lattner
8048b85e8f
Fix a regression from last night, which caused this pass to create invalid
...
code for IV uses outside of loops that are not dominated by the latch block.
We should only convert these uses to use the post-inc value if they ARE
dominated by the latch block.
Also use a new LoopInfo method to simplify some code.
This fixes Transforms/LoopStrengthReduce/2005-09-12-UsesOutOutsideOfLoop.ll
llvm-svn: 23318
2005-09-12 17:11:27 +00:00
Chris Lattner
a67648396a
_test:
...
li r2, 0
LBB_test_1: ; no_exit.2
li r5, 0
stw r5, 0(r3)
addi r2, r2, 1
addi r3, r3, 4
cmpwi cr0, r2, 701
blt cr0, LBB_test_1 ; no_exit.2
LBB_test_2: ; loopexit.2.loopexit
addi r2, r2, 1
stw r2, 0(r4)
blr
[zion ~/llvm]$ cat > ~/xx
Uses of IV's outside of the loop should use hte post-incremented version
of the IV, not the preincremented version. This helps many loops (e.g. in sixtrack)
which used to generate code like this (this is the code from the
dont-hoist-simple-loop-constants.ll testcase):
_test:
li r2, 0 **** IV starts at 0
LBB_test_1: ; no_exit.2
or r5, r2, r2 **** Copy for loop exit
li r2, 0
stw r2, 0(r3)
addi r3, r3, 4
addi r2, r5, 1
addi r6, r5, 2 **** IV+2
cmpwi cr0, r6, 701
blt cr0, LBB_test_1 ; no_exit.2
LBB_test_2: ; loopexit.2.loopexit
addi r2, r5, 2 **** IV+2
stw r2, 0(r4)
blr
And now generated code like this:
_test:
li r2, 1 *** IV starts at 1
LBB_test_1: ; no_exit.2
li r5, 0
stw r5, 0(r3)
addi r2, r2, 1
addi r3, r3, 4
cmpwi cr0, r2, 701 *** IV.postinc + 0
blt cr0, LBB_test_1
LBB_test_2: ; loopexit.2.loopexit
stw r2, 0(r4) *** IV.postinc + 0
blr
llvm-svn: 23313
2005-09-12 06:04:47 +00:00
Chris Lattner
530fe6ab30
implement Transforms/LoopStrengthReduce/dont-hoist-simple-loop-constants.ll.
...
We used to emit this code for it:
_test:
li r2, 1 ;; Value tying up a register for the whole loop
li r5, 0
LBB_test_1: ; no_exit.2
or r6, r5, r5
li r5, 0
stw r5, 0(r3)
addi r5, r6, 1
addi r3, r3, 4
add r7, r2, r5 ;; should be addi r7, r5, 1
cmpwi cr0, r7, 701
blt cr0, LBB_test_1 ; no_exit.2
LBB_test_2: ; loopexit.2.loopexit
addi r2, r6, 2
stw r2, 0(r4)
blr
now we emit this:
_test:
li r2, 0
LBB_test_1: ; no_exit.2
or r5, r2, r2
li r2, 0
stw r2, 0(r3)
addi r3, r3, 4
addi r2, r5, 1
addi r6, r5, 2 ;; whoa, fold those adds!
cmpwi cr0, r6, 701
blt cr0, LBB_test_1 ; no_exit.2
LBB_test_2: ; loopexit.2.loopexit
addi r2, r5, 2
stw r2, 0(r4)
blr
more improvement coming.
llvm-svn: 23306
2005-09-10 01:18:45 +00:00
Chris Lattner
b5e381a8cf
Fix a problem that Dan Berlin noticed, where reassociation would not succeed
...
in building maximal expressions before simplifying them. In particular, i
cases like this:
X-(A+B+X)
the code would consider A+B+X to be a maximal expression (not understanding
that the single use '-' would be turned into a + later), simplify it (a noop)
then later get simplified again.
Each of these simplify steps is where the cost of reassociation comes from,
so this patch should speed up the already fast pass a bit.
Thanks to Dan for noticing this!
llvm-svn: 23214
2005-09-02 07:07:58 +00:00
Chris Lattner
9fe263aa75
Avoid creating garbage instructions, just move the old add instruction
...
to where we need it when converting -(A+B+C) -> -A + -B + -C.
llvm-svn: 23213
2005-09-02 06:38:04 +00:00
Chris Lattner
d1325da091
add some assertions and fix problems where reassociate could access the
...
Ops vector out of range
llvm-svn: 23211
2005-09-02 05:23:22 +00:00
Chris Lattner
8ca5b2a6d2
Fix Regression/Transforms/Reassociate/2005-08-24-Crash.ll
...
llvm-svn: 23019
2005-08-24 17:55:32 +00:00
Chris Lattner
ea7dfd53d6
Fix Transforms/LoopStrengthReduce/2005-08-17-OutOfLoopVariant.ll, a crash
...
on 177.mesa
llvm-svn: 22843
2005-08-17 21:22:41 +00:00
Chris Lattner
2bf7cb5213
Use a new helper to split critical edges, making the code simpler.
...
Do not claim to not change the CFG. We do change the cfg to split critical
edges. This isn't causing us a problem now, but could likely do so in the
future.
llvm-svn: 22824
2005-08-17 06:35:16 +00:00
Chris Lattner
5cf983ee0f
Fix a bad case in gzip where we put lots of things in registers across the
...
loop, because a IV-dependent value was used outside of the loop and didn't
have immediate-folding capability
llvm-svn: 22798
2005-08-16 00:38:11 +00:00
Chris Lattner
47d3ec3525
Ooops, don't forget to clear this. The real inner loop is now:
...
.LBB_foo_3: ; no_exit.1
lfd f2, 0(r9)
lfd f3, 8(r9)
fmul f4, f1, f2
fmadd f4, f0, f3, f4
stfd f4, 8(r9)
fmul f3, f1, f3
fmsub f2, f0, f2, f3
stfd f2, 0(r9)
addi r9, r9, 16
addi r8, r8, 1
cmpw cr0, r8, r4
ble .LBB_foo_3 ; no_exit.1
llvm-svn: 22782
2005-08-13 07:42:01 +00:00
Chris Lattner
5949d49032
Recursively scan scev expressions for common subexpressions. This allows us
...
to handle nested loops much better, for example, by being able to tell that
these two expressions:
{( 8 + ( 16 * ( 1 + %Tmp11 + %Tmp12)) + %c_),+,( 16 * %Tmp 12)}<loopentry.1>
{(( 16 * ( 1 + %Tmp11 + %Tmp12)) + %c_),+,( 16 * %Tmp12)}<loopentry.1>
Have the following common part that can be shared:
{(( 16 * ( 1 + %Tmp11 + %Tmp12)) + %c_),+,( 16 * %Tmp12)}<loopentry.1>
This allows us to codegen an important inner loop in 168.wupwise as:
.LBB_foo_4: ; no_exit.1
lfd f2, 16(r9)
fmul f3, f0, f2
fmul f2, f1, f2
fadd f4, f3, f2
stfd f4, 8(r9)
fsub f2, f3, f2
stfd f2, 16(r9)
addi r8, r8, 1
addi r9, r9, 16
cmpw cr0, r8, r4
ble .LBB_foo_4 ; no_exit.1
instead of:
.LBB_foo_3: ; no_exit.1
lfdx f2, r6, r9
add r10, r6, r9
lfd f3, 8(r10)
fmul f4, f1, f2
fmadd f4, f0, f3, f4
stfd f4, 8(r10)
fmul f3, f1, f3
fmsub f2, f0, f2, f3
stfdx f2, r6, r9
addi r9, r9, 16
addi r8, r8, 1
cmpw cr0, r8, r4
ble .LBB_foo_3 ; no_exit.1
llvm-svn: 22781
2005-08-13 07:27:18 +00:00
Chris Lattner
79396539d3
remove dead code. The exit block list is computed on demand, thus does not
...
need to be updated. This code is a relic from when it did.
llvm-svn: 22775
2005-08-13 01:30:36 +00:00
Chris Lattner
8447b49526
When splitting critical edges, make sure not to leave the new block in the
...
middle of the loop. This turns a critical loop in gzip into this:
.LBB_test_1: ; loopentry
or r27, r28, r28
add r28, r3, r27
lhz r28, 3(r28)
add r26, r4, r27
lhz r26, 3(r26)
cmpw cr0, r28, r26
bne .LBB_test_8 ; loopentry.loopexit_crit_edge
.LBB_test_2: ; shortcirc_next.0
add r28, r3, r27
lhz r28, 5(r28)
add r26, r4, r27
lhz r26, 5(r26)
cmpw cr0, r28, r26
bne .LBB_test_7 ; shortcirc_next.0.loopexit_crit_edge
.LBB_test_3: ; shortcirc_next.1
add r28, r3, r27
lhz r28, 7(r28)
add r26, r4, r27
lhz r26, 7(r26)
cmpw cr0, r28, r26
bne .LBB_test_6 ; shortcirc_next.1.loopexit_crit_edge
.LBB_test_4: ; shortcirc_next.2
add r28, r3, r27
lhz r26, 9(r28)
add r28, r4, r27
lhz r25, 9(r28)
addi r28, r27, 8
cmpw cr7, r26, r25
mfcr r26, 1
rlwinm r26, r26, 31, 31, 31
add r25, r8, r27
cmpw cr7, r25, r7
mfcr r25, 1
rlwinm r25, r25, 29, 31, 31
and. r26, r26, r25
bne .LBB_test_1 ; loopentry
instead of this:
.LBB_test_1: ; loopentry
or r27, r28, r28
add r28, r3, r27
lhz r28, 3(r28)
add r26, r4, r27
lhz r26, 3(r26)
cmpw cr0, r28, r26
beq .LBB_test_3 ; shortcirc_next.0
.LBB_test_2: ; loopentry.loopexit_crit_edge
add r2, r30, r27
add r8, r29, r27
b .LBB_test_9 ; loopexit
.LBB_test_3: ; shortcirc_next.0
add r28, r3, r27
lhz r28, 5(r28)
add r26, r4, r27
lhz r26, 5(r26)
cmpw cr0, r28, r26
beq .LBB_test_5 ; shortcirc_next.1
.LBB_test_4: ; shortcirc_next.0.loopexit_crit_edge
add r2, r11, r27
add r8, r12, r27
b .LBB_test_9 ; loopexit
.LBB_test_5: ; shortcirc_next.1
add r28, r3, r27
lhz r28, 7(r28)
add r26, r4, r27
lhz r26, 7(r26)
cmpw cr0, r28, r26
beq .LBB_test_7 ; shortcirc_next.2
.LBB_test_6: ; shortcirc_next.1.loopexit_crit_edge
add r2, r9, r27
add r8, r10, r27
b .LBB_test_9 ; loopexit
.LBB_test_7: ; shortcirc_next.2
add r28, r3, r27
lhz r26, 9(r28)
add r28, r4, r27
lhz r25, 9(r28)
addi r28, r27, 8
cmpw cr7, r26, r25
mfcr r26, 1
rlwinm r26, r26, 31, 31, 31
add r25, r8, r27
cmpw cr7, r25, r7
mfcr r25, 1
rlwinm r25, r25, 29, 31, 31
and. r26, r26, r25
bne .LBB_test_1 ; loopentry
Next up, improve the code for the loop.
llvm-svn: 22769
2005-08-12 22:22:17 +00:00
Chris Lattner
4fec86d348
Fix a FIXME: if we are inserting code for a PHI argument, split the critical
...
edge so that the code is not always executed for both operands. This
prevents LSR from inserting code into loops whose exit blocks contain
PHI uses of IV expressions (which are outside of loops). On gzip, for
example, we turn this ugly code:
.LBB_test_1: ; loopentry
add r27, r3, r28
lhz r27, 3(r27)
add r26, r4, r28
lhz r26, 3(r26)
add r25, r30, r28 ;; Only live if exiting the loop
add r24, r29, r28 ;; Only live if exiting the loop
cmpw cr0, r27, r26
bne .LBB_test_5 ; loopexit
into this:
.LBB_test_1: ; loopentry
or r27, r28, r28
add r28, r3, r27
lhz r28, 3(r28)
add r26, r4, r27
lhz r26, 3(r26)
cmpw cr0, r28, r26
beq .LBB_test_3 ; shortcirc_next.0
.LBB_test_2: ; loopentry.loopexit_crit_edge
add r2, r30, r27
add r8, r29, r27
b .LBB_test_9 ; loopexit
.LBB_test_2: ; shortcirc_next.0
...
blt .LBB_test_1
into this:
.LBB_test_1: ; loopentry
or r27, r28, r28
add r28, r3, r27
lhz r28, 3(r28)
add r26, r4, r27
lhz r26, 3(r26)
cmpw cr0, r28, r26
beq .LBB_test_3 ; shortcirc_next.0
.LBB_test_2: ; loopentry.loopexit_crit_edge
add r2, r30, r27
add r8, r29, r27
b .LBB_t_3: ; shortcirc_next.0
.LBB_test_3: ; shortcirc_next.0
...
blt .LBB_test_1
Next step: get the block out of the loop so that the loop is all
fall-throughs again.
llvm-svn: 22766
2005-08-12 22:06:11 +00:00
Chris Lattner
62df798919
remove some trickiness that broke yacr2 and some other programs last night
...
llvm-svn: 22751
2005-08-10 17:15:20 +00:00
Chris Lattner
f83ce5faee
Make loop-simplify produce better loops by turning PHI nodes like X = phi [X, Y]
...
into just Y. This often occurs when it seperates loops that have collapsed loop
headers. This implements LoopSimplify/phi-node-simplify.ll
llvm-svn: 22746
2005-08-10 02:07:32 +00:00
Chris Lattner
677d85784a
Allow indvar simplify to canonicalize ANY affine IV, not just affine IVs with
...
constant stride. This implements Transforms/IndVarsSimplify/variable-stride-ivs.ll
llvm-svn: 22744
2005-08-10 01:12:06 +00:00
Chris Lattner
edff91a49a
Teach LSR to strength reduce IVs that have a loop-invariant but non-constant stride.
...
For code like this:
void foo(float *a, float *b, int n, int stride_a, int stride_b) {
int i;
for (i=0; i<n; i++)
a[i*stride_a] = b[i*stride_b];
}
we now emit:
.LBB_foo2_2: ; no_exit
lfs f0, 0(r4)
stfs f0, 0(r3)
addi r7, r7, 1
add r4, r2, r4
add r3, r6, r3
cmpw cr0, r7, r5
blt .LBB_foo2_2 ; no_exit
instead of:
.LBB_foo_2: ; no_exit
mullw r8, r2, r7 ;; multiply!
slwi r8, r8, 2
lfsx f0, r4, r8
mullw r8, r2, r6 ;; multiply!
slwi r8, r8, 2
stfsx f0, r3, r8
addi r2, r2, 1
cmpw cr0, r2, r5
blt .LBB_foo_2 ; no_exit
loops with variable strides occur pretty often. For example, in SPECFP2K
there are 317 variable strides in 177.mesa, 3 in 179.art, 14 in 188.ammp,
56 in 168.wupwise, 36 in 172.mgrid.
Now we can allow indvars to turn functions written like this:
void foo2(float *a, float *b, int n, int stride_a, int stride_b) {
int i, ai = 0, bi = 0;
for (i=0; i<n; i++)
{
a[ai] = b[bi];
ai += stride_a;
bi += stride_b;
}
}
into code like the above for better analysis. With this patch, they generate
identical code.
llvm-svn: 22740
2005-08-10 00:45:21 +00:00
Chris Lattner
dde7dc525e
Fix Regression/Transforms/LoopStrengthReduce/phi_node_update_multiple_preds.ll
...
by being more careful about updating PHI nodes
llvm-svn: 22739
2005-08-10 00:35:32 +00:00
Chris Lattner
c6c4d99a21
Fix some 80 column violations.
...
Once we compute the evolution for a GEP, tell SE about it. This allows users
of the GEP to know it, if the users are not direct. This allows us to compile
this testcase:
void fbSolidFillmmx(int w, unsigned char *d) {
while (w >= 64) {
*(unsigned long long *) (d + 0) = 0;
*(unsigned long long *) (d + 8) = 0;
*(unsigned long long *) (d + 16) = 0;
*(unsigned long long *) (d + 24) = 0;
*(unsigned long long *) (d + 32) = 0;
*(unsigned long long *) (d + 40) = 0;
*(unsigned long long *) (d + 48) = 0;
*(unsigned long long *) (d + 56) = 0;
w -= 64;
d += 64;
}
}
into:
.LBB_fbSolidFillmmx_2: ; no_exit
li r2, 0
stw r2, 0(r4)
stw r2, 4(r4)
stw r2, 8(r4)
stw r2, 12(r4)
stw r2, 16(r4)
stw r2, 20(r4)
stw r2, 24(r4)
stw r2, 28(r4)
stw r2, 32(r4)
stw r2, 36(r4)
stw r2, 40(r4)
stw r2, 44(r4)
stw r2, 48(r4)
stw r2, 52(r4)
stw r2, 56(r4)
stw r2, 60(r4)
addi r4, r4, 64
addi r3, r3, -64
cmpwi cr0, r3, 63
bgt .LBB_fbSolidFillmmx_2 ; no_exit
instead of:
.LBB_fbSolidFillmmx_2: ; no_exit
li r11, 0
stw r11, 0(r4)
stw r11, 4(r4)
stwx r11, r10, r4
add r12, r10, r4
stw r11, 4(r12)
stwx r11, r9, r4
add r12, r9, r4
stw r11, 4(r12)
stwx r11, r8, r4
add r12, r8, r4
stw r11, 4(r12)
stwx r11, r7, r4
add r12, r7, r4
stw r11, 4(r12)
stwx r11, r6, r4
add r12, r6, r4
stw r11, 4(r12)
stwx r11, r5, r4
add r12, r5, r4
stw r11, 4(r12)
stwx r11, r2, r4
add r12, r2, r4
stw r11, 4(r12)
addi r4, r4, 64
addi r3, r3, -64
cmpwi cr0, r3, 63
bgt .LBB_fbSolidFillmmx_2 ; no_exit
llvm-svn: 22737
2005-08-09 23:39:36 +00:00
Chris Lattner
02742710f3
SCEVAddExpr::get() of an empty list is invalid.
...
llvm-svn: 22724
2005-08-09 01:13:47 +00:00
Chris Lattner
a091ff1764
Implement: LoopStrengthReduce/share_ivs.ll
...
Two changes:
* Only insert one PHI node for each stride. Other values are live in
values. This cannot introduce higher register pressure than the
previous approach, and can take advantage of reg+reg addressing modes.
* Factor common base values out of uses before moving values from the
base to the immediate fields. This improves codegen by starting the
stride-specific PHI node out at a common place for each IV use.
As an example, we used to generate this for a loop in swim:
.LBB_main_no_exit_2E_6_2E_i_no_exit_2E_7_2E_i_2: ; no_exit.7.i
lfd f0, 0(r8)
stfd f0, 0(r3)
lfd f0, 0(r6)
stfd f0, 0(r7)
lfd f0, 0(r2)
stfd f0, 0(r5)
addi r9, r9, 1
addi r2, r2, 8
addi r5, r5, 8
addi r6, r6, 8
addi r7, r7, 8
addi r8, r8, 8
addi r3, r3, 8
cmpw cr0, r9, r4
bgt .LBB_main_no_exit_2E_6_2E_i_no_exit_2E_7_2E_i_1
now we emit:
.LBB_main_no_exit_2E_6_2E_i_no_exit_2E_7_2E_i_2: ; no_exit.7.i
lfdx f0, r8, r2
stfdx f0, r9, r2
lfdx f0, r5, r2
stfdx f0, r7, r2
lfdx f0, r3, r2
stfdx f0, r6, r2
addi r10, r10, 1
addi r2, r2, 8
cmpw cr0, r10, r4
bgt .LBB_main_no_exit_2E_6_2E_i_no_exit_2E_7_2E_i_1
As another more dramatic example, we used to emit this:
.LBB_main_L_90_no_exit_2E_0_2E_i16_no_exit_2E_1_2E_i19_2: ; no_exit.1.i19
lfd f0, 8(r21)
lfd f4, 8(r3)
lfd f5, 8(r27)
lfd f6, 8(r22)
lfd f7, 8(r5)
lfd f8, 8(r6)
lfd f9, 8(r30)
lfd f10, 8(r11)
lfd f11, 8(r12)
fsub f10, f10, f11
fadd f5, f4, f5
fmul f5, f5, f1
fadd f6, f6, f7
fadd f6, f6, f8
fadd f6, f6, f9
fmadd f0, f5, f6, f0
fnmsub f0, f10, f2, f0
stfd f0, 8(r4)
lfd f0, 8(r25)
lfd f5, 8(r26)
lfd f6, 8(r23)
lfd f9, 8(r28)
lfd f10, 8(r10)
lfd f12, 8(r9)
lfd f13, 8(r29)
fsub f11, f13, f11
fadd f4, f4, f5
fmul f4, f4, f1
fadd f5, f6, f9
fadd f5, f5, f10
fadd f5, f5, f12
fnmsub f0, f4, f5, f0
fnmsub f0, f11, f3, f0
stfd f0, 8(r24)
lfd f0, 8(r8)
fsub f4, f7, f8
fsub f5, f12, f10
fnmsub f0, f5, f2, f0
fnmsub f0, f4, f3, f0
stfd f0, 8(r2)
addi r20, r20, 1
addi r2, r2, 8
addi r8, r8, 8
addi r10, r10, 8
addi r12, r12, 8
addi r6, r6, 8
addi r29, r29, 8
addi r28, r28, 8
addi r26, r26, 8
addi r25, r25, 8
addi r24, r24, 8
addi r5, r5, 8
addi r23, r23, 8
addi r22, r22, 8
addi r3, r3, 8
addi r9, r9, 8
addi r11, r11, 8
addi r30, r30, 8
addi r27, r27, 8
addi r21, r21, 8
addi r4, r4, 8
cmpw cr0, r20, r7
bgt .LBB_main_L_90_no_exit_2E_0_2E_i16_no_exit_2E_1_2E_i19_1
we now emit:
.LBB_main_L_90_no_exit_2E_0_2E_i16_no_exit_2E_1_2E_i19_2: ; no_exit.1.i19
lfdx f0, r21, r20
lfdx f4, r3, r20
lfdx f5, r27, r20
lfdx f6, r22, r20
lfdx f7, r5, r20
lfdx f8, r6, r20
lfdx f9, r30, r20
lfdx f10, r11, r20
lfdx f11, r12, r20
fsub f10, f10, f11
fadd f5, f4, f5
fmul f5, f5, f1
fadd f6, f6, f7
fadd f6, f6, f8
fadd f6, f6, f9
fmadd f0, f5, f6, f0
fnmsub f0, f10, f2, f0
stfdx f0, r4, r20
lfdx f0, r25, r20
lfdx f5, r26, r20
lfdx f6, r23, r20
lfdx f9, r28, r20
lfdx f10, r10, r20
lfdx f12, r9, r20
lfdx f13, r29, r20
fsub f11, f13, f11
fadd f4, f4, f5
fmul f4, f4, f1
fadd f5, f6, f9
fadd f5, f5, f10
fadd f5, f5, f12
fnmsub f0, f4, f5, f0
fnmsub f0, f11, f3, f0
stfdx f0, r24, r20
lfdx f0, r8, r20
fsub f4, f7, f8
fsub f5, f12, f10
fnmsub f0, f5, f2, f0
fnmsub f0, f4, f3, f0
stfdx f0, r2, r20
addi r19, r19, 1
addi r20, r20, 8
cmpw cr0, r19, r7
bgt .LBB_main_L_90_no_exit_2E_0_2E_i16_no_exit_2E_1_2E_i19_1
llvm-svn: 22722
2005-08-09 00:18:09 +00:00
Chris Lattner
37c24cc98c
Suck the base value out of the UsersToProcess vector into the BasedUser
...
class to simplify the code. Fuse two loops.
llvm-svn: 22721
2005-08-08 22:56:21 +00:00
Chris Lattner
37ed895bf1
Split MoveLoopVariantsToImediateField out from MoveImmediateValues. The
...
first is a correctness thing, and the later is an optzn thing. This also
is needed to support a future change.
llvm-svn: 22720
2005-08-08 22:32:34 +00:00
Chris Lattner
9f269e40c9
Use the new 'moveBefore' method to simplify some code. Really, which is
...
easier to understand? :)
llvm-svn: 22706
2005-08-08 19:11:57 +00:00
Chris Lattner
14203e85b2
Not all constants are legal immediates in load/store instructions.
...
llvm-svn: 22704
2005-08-08 06:25:50 +00:00
Chris Lattner
c70bbc0c41
Implement LoopStrengthReduce/share_code_in_preheader.ll by having one
...
rewriter for all code inserted into the preheader, which is never flushed.
llvm-svn: 22702
2005-08-08 05:47:49 +00:00
Chris Lattner
9bfa6f8784
Implement a simple optimization for the termination condition of the loop.
...
The termination condition actually wants to use the post-incremented value
of the loop, not a new indvar with an unusual base.
On PPC, for example, this allows us to compile
LoopStrengthReduce/exit_compare_live_range.ll to:
_foo:
li r2, 0
.LBB_foo_1: ; no_exit
li r5, 0
stw r5, 0(r3)
addi r2, r2, 1
cmpw cr0, r2, r4
bne .LBB_foo_1 ; no_exit
blr
instead of:
_foo:
li r2, 1 ;; IV starts at 1, not 0
.LBB_foo_1: ; no_exit
li r5, 0
stw r5, 0(r3)
addi r5, r2, 1
cmpw cr0, r2, r4
or r2, r5, r5 ;; Reg-reg copy, extra live range
bne .LBB_foo_1 ; no_exit
blr
This implements LoopStrengthReduce/exit_compare_live_range.ll
llvm-svn: 22699
2005-08-08 05:28:22 +00:00
Chris Lattner
2c14cf7b74
Add some simple folds that occur in bitfield cases. Fix a minor bug in
...
isHighOnes, where it would consider 0 to have high ones.
llvm-svn: 22693
2005-08-07 07:03:10 +00:00
Chris Lattner
134ebd0801
Fix typoCVS: ----------------------------------------------------------------------
...
llvm-svn: 22692
2005-08-07 07:00:52 +00:00
Chris Lattner
f4dd8c445c
* Use the new PHINode::hasConstantValue method to simplify some code
...
* Teach this code to move allocas out of the loop when tail call eliminating
a call marked 'tail'. This implements TailCallElim/move_alloca_for_tail_call.ll
* Do not perform this transformation if a call is marked 'tail' and if there
are allocas that we cannot move out of the loop in #2 . Doing so would increase
the stack usage of the function. This implements fixes
PR615 and TailCallElim/dont-tce-tail-marked-call.ll.
llvm-svn: 22690
2005-08-07 04:27:41 +00:00
Chris Lattner
11e7a5eda7
Make sure to clean CastedPointers after casts are potentially deleted.
...
This fixes LSR crashes on 301.apsi, 191.fma3d, and 189.lucas
llvm-svn: 22673
2005-08-05 01:30:11 +00:00
Chris Lattner
9f9c260b8c
now that hasConstantValue defaults to only returning values that dominate
...
the PHI node, this ugly code can vanish.
llvm-svn: 22672
2005-08-05 01:04:30 +00:00
Chris Lattner
257efb2ad3
This code can handle non-dominating instructions
...
llvm-svn: 22667
2005-08-05 00:57:45 +00:00
Nate Begeman
b392321cae
Fix a fixme in CondPropagate.cpp by moving a PhiNode optimization into
...
BasicBlock's removePredecessor routine. This requires shuffling around
the definition and implementation of hasContantValue from Utils.h,cpp into
Instructions.h,cpp
llvm-svn: 22664
2005-08-04 23:24:19 +00:00