Chris Lattner
da1b152c43
Make this work for FP constantexprs
...
llvm-svn: 23773
2005-10-17 20:18:38 +00:00
Chris Lattner
7fde91e365
Oops, X+0.0 isn't foldable, but X+-0.0 is.
...
llvm-svn: 23772
2005-10-17 17:56:38 +00:00
Chris Lattner
32979336a7
relax this a bit, as we only support the default rounding mode
...
llvm-svn: 23771
2005-10-17 17:49:32 +00:00
Chris Lattner
192cd18f53
Fix (hopefully the last) issue where LSR is nondeterminstic. When pulling
...
out CSE's of base expressions it could build a result whose order was
nondet.
llvm-svn: 23698
2005-10-11 18:41:04 +00:00
Chris Lattner
5c9d63da31
Fix another problem where LSR was being nondeterminstic. Also remove elements
...
from the end of a vector instead of the beginning
llvm-svn: 23697
2005-10-11 18:30:57 +00:00
Chris Lattner
b7a3894e7c
Fix another lsr-is-nondeterministic case
...
llvm-svn: 23695
2005-10-11 18:17:57 +00:00
Chris Lattner
03b9eb506c
Make MaskedValueIsZero a bit more aggressive
...
llvm-svn: 23677
2005-10-09 22:08:50 +00:00
Chris Lattner
62010c450f
Fix funky xcode indentation
...
llvm-svn: 23674
2005-10-09 06:36:35 +00:00
Chris Lattner
eb4be8b942
Hrm, you didn't see this.
...
llvm-svn: 23673
2005-10-09 06:24:02 +00:00
Chris Lattner
4ea0a3eaac
Fix a source of non-determinism in the backend: the order of processing
...
IV strides dependend on the pointer order of the strides in memory.
Non-determinism is bad.
llvm-svn: 23672
2005-10-09 06:20:55 +00:00
Jeff Cohen
572910c9a2
Remove useless variable.
...
llvm-svn: 23656
2005-10-07 05:28:29 +00:00
Chris Lattner
20b0754c41
Fix DemoteRegToStack on an invoke. This fixes PR634.
...
llvm-svn: 23618
2005-10-04 00:44:01 +00:00
Chris Lattner
4c3b2b536c
Clean up the code a bit. Use isInstructionTriviallyDead to be more aggressive
...
and more correct than use_empty(). This fixes PR635 and
SimplifyCFG/2005-10-02-InvokeSimplify.ll
llvm-svn: 23616
2005-10-03 23:43:43 +00:00
Chris Lattner
f07a587c79
Make IVUseShouldUsePostIncValue more aggressive when the use is a PHI. In
...
particular, it should realize that phi's use their values in the pred block
not the phi block itself. This change turns our em3d loop from this:
_test:
cmpwi cr0, r4, 0
bgt cr0, LBB_test_2 ; entry.no_exit_crit_edge
LBB_test_1: ; entry.loopexit_crit_edge
li r2, 0
b LBB_test_6 ; loopexit
LBB_test_2: ; entry.no_exit_crit_edge
li r6, 0
LBB_test_3: ; no_exit
or r2, r6, r6
lwz r6, 0(r3)
cmpw cr0, r6, r5
beq cr0, LBB_test_6 ; loopexit
LBB_test_4: ; endif
addi r3, r3, 4
addi r6, r2, 1
cmpw cr0, r6, r4
blt cr0, LBB_test_3 ; no_exit
LBB_test_5: ; endif.loopexit.loopexit_crit_edge
addi r3, r2, 1
blr
LBB_test_6: ; loopexit
or r3, r2, r2
blr
into:
_test:
cmpwi cr0, r4, 0
bgt cr0, LBB_test_2 ; entry.no_exit_crit_edge
LBB_test_1: ; entry.loopexit_crit_edge
li r2, 0
b LBB_test_5 ; loopexit
LBB_test_2: ; entry.no_exit_crit_edge
li r6, 0
LBB_test_3: ; no_exit
lwz r2, 0(r3)
cmpw cr0, r2, r5
or r2, r6, r6
beq cr0, LBB_test_5 ; loopexit
LBB_test_4: ; endif
addi r3, r3, 4
addi r6, r6, 1
cmpw cr0, r6, r4
or r2, r6, r6
blt cr0, LBB_test_3 ; no_exit
LBB_test_5: ; loopexit
or r3, r2, r2
blr
Unfortunately, this is actually worse code, because the register coallescer
is getting confused somehow. If it were doing its job right, it could turn the
code into this:
_test:
cmpwi cr0, r4, 0
bgt cr0, LBB_test_2 ; entry.no_exit_crit_edge
LBB_test_1: ; entry.loopexit_crit_edge
li r6, 0
b LBB_test_5 ; loopexit
LBB_test_2: ; entry.no_exit_crit_edge
li r6, 0
LBB_test_3: ; no_exit
lwz r2, 0(r3)
cmpw cr0, r2, r5
beq cr0, LBB_test_5 ; loopexit
LBB_test_4: ; endif
addi r3, r3, 4
addi r6, r6, 1
cmpw cr0, r6, r4
blt cr0, LBB_test_3 ; no_exit
LBB_test_5: ; loopexit
or r3, r6, r6
blr
... which I'll work on next. :)
llvm-svn: 23604
2005-10-03 02:50:05 +00:00
Chris Lattner
e4ed42a426
Refactor some code into a function
...
llvm-svn: 23603
2005-10-03 01:04:44 +00:00
Chris Lattner
360928dbed
This break is bogus and I have no idea why it was there. Basically it prevents
...
memoizing code when IV's are used by phinodes outside of loops. In a simple
example, we were getting this code before (note that r6 and r7 are isomorphic
IV's):
li r6, 0
or r7, r6, r6
LBB_test_3: ; no_exit
lwz r2, 0(r3)
cmpw cr0, r2, r5
or r2, r7, r7
beq cr0, LBB_test_5 ; loopexit
LBB_test_4: ; endif
addi r2, r7, 1
addi r7, r7, 1
addi r3, r3, 4
addi r6, r6, 1
cmpw cr0, r6, r4
blt cr0, LBB_test_3 ; no_exit
Now we get:
li r6, 0
LBB_test_3: ; no_exit
or r2, r6, r6
lwz r6, 0(r3)
cmpw cr0, r6, r5
beq cr0, LBB_test_6 ; loopexit
LBB_test_4: ; endif
addi r3, r3, 4
addi r6, r2, 1
cmpw cr0, r6, r4
blt cr0, LBB_test_3 ; no_exit
this was noticed in em3d.
llvm-svn: 23602
2005-10-03 00:37:33 +00:00
Chris Lattner
8fcce170cf
when checking if we should move a split edge block outside of a loop,
...
check the presplit pred, not the post-split pred. This was causing us
to make the wrong decision in some cases, leaving the critical edge block
in the loop.
llvm-svn: 23601
2005-10-03 00:31:52 +00:00
Jeff Cohen
f8a5e5ae6e
Fix VC++ warnings.
...
llvm-svn: 23579
2005-10-01 03:57:14 +00:00
Chris Lattner
a554c9470b
Insert stores after phi nodes in the normal dest. This fixes
...
LowerInvoke/2005-08-03-InvokeWithPHI.ll
llvm-svn: 23525
2005-09-29 17:44:20 +00:00
Chris Lattner
87ef943a4c
Fold isascii into a simple comparison. This speeds up 197.parser by 7.4%,
...
bringing the LLC time down to the CBE time.
llvm-svn: 23521
2005-09-29 06:17:27 +00:00
Chris Lattner
5f6035feb0
remove a bunch of unneeded stuff, or self evident comments
...
llvm-svn: 23519
2005-09-29 06:16:11 +00:00
Chris Lattner
c244e7c178
Implement a couple of memcmp folds from the todo list
...
llvm-svn: 23517
2005-09-29 04:54:20 +00:00
Chris Lattner
ea7214b23d
Constant fold llvm.sqrt
...
llvm-svn: 23487
2005-09-28 01:34:32 +00:00
Chris Lattner
3b63bb375c
add a note about a way to improve this code further, that I won't be getting
...
to right now.
llvm-svn: 23485
2005-09-27 22:44:59 +00:00
Chris Lattner
eb953f0ef8
Fix a regression in my previous patch, fixing GlobalOpt/2005-09-27-Crash.ll
...
and PR632.
llvm-svn: 23484
2005-09-27 22:28:11 +00:00
Chris Lattner
e285f5ed8f
Avoid spilling stack slots... to stack slots.
...
llvm-svn: 23478
2005-09-27 21:33:12 +00:00
Chris Lattner
87eb249300
Completely rewrite 'correct' eh support. This changes how setjmp insertion
...
is performed so it is only at most once per function that contains an invoke
instead of once per invoke in the function. This patch has the following perks:
1. It fixes PR631, which complains about slowness.
2. If fixes PR240, which complains about non-volatile vars being live across
setjmp/longjmps.
3. It improves (but does not fix) the jmpbuf alignment issue on itanium by not
forcing the jmpbufs to always be 8-bytes off the alignment of the structure.
4. It speeds up 253.perlbmk from 338s to 13.70s (a 25x improvement!), making us
now about 4% faster than GCC.
Further improvements are also possible.
llvm-svn: 23477
2005-09-27 21:18:17 +00:00
Chris Lattner
92233d2175
Make the pass name simpler
...
llvm-svn: 23476
2005-09-27 21:10:32 +00:00
Chris Lattner
16cd356fb2
allow demotion to volatile values, add support for invoke
...
llvm-svn: 23473
2005-09-27 19:39:00 +00:00
Chris Lattner
3d27e7f27f
Add support for external calls that we know how to constant fold. This implements
...
ctor-list-opt.ll:CTOR8
llvm-svn: 23465
2005-09-27 05:02:43 +00:00
Chris Lattner
29b2780c8a
Fix a bug where we would evaluate stores into linkonce objects which could be
...
potentially replaced at link-time.
llvm-svn: 23463
2005-09-27 04:50:03 +00:00
Chris Lattner
65a3a0918f
Implement support for static constructors with calls in them. This is useful
...
because gccas runs globalopt before inlining.
This implements ctor-list-opt.ll:CTOR7
llvm-svn: 23462
2005-09-27 04:45:34 +00:00
Chris Lattner
da1889b778
Refactor this code a bit, no functionality changes.
...
llvm-svn: 23460
2005-09-27 04:27:01 +00:00
Chris Lattner
f2f89af69a
Remove some dead code. ctor evaluation subsumes empty ctor elim
...
llvm-svn: 23453
2005-09-26 20:38:20 +00:00
Chris Lattner
6bf2cd5735
Add support for alloca, implementing ctor-list-opt.ll:CTOR6
...
llvm-svn: 23452
2005-09-26 17:07:09 +00:00
Chris Lattner
46d9ff081d
Add a debug printout, fix a crash on kc++
...
llvm-svn: 23450
2005-09-26 07:34:35 +00:00
Chris Lattner
46af55e0e4
Implement loads/stores through GEP's of globals. This implements
...
ctor-list-opt.ll:CTOR5.
llvm-svn: 23449
2005-09-26 06:52:44 +00:00
Chris Lattner
61ff32cd70
Replace TraverseGEPInitializer with ConstantFoldLoadThroughGEPConstantExpr
...
llvm-svn: 23447
2005-09-26 05:34:07 +00:00
Chris Lattner
02ae21e1e0
Eliminate GetGEPGlobalInitializer in favor of the more powerful
...
ConstantFoldLoadThroughGEPConstantExpr function in the utils lib.
llvm-svn: 23446
2005-09-26 05:28:52 +00:00
Chris Lattner
0b011ec8e2
Factor the GetGEPGlobalInitializer out of this pass and into Transforms/Utils
...
as ConstantFoldLoadThroughGEPConstantExpr.
llvm-svn: 23445
2005-09-26 05:28:06 +00:00
Chris Lattner
c13c7b9376
Move the ConstantFoldLoadThroughGEPConstantExpr function out of the InstCombine
...
pass.
llvm-svn: 23444
2005-09-26 05:27:10 +00:00
Chris Lattner
b009663e27
add a comment
...
llvm-svn: 23442
2005-09-26 05:16:34 +00:00
Chris Lattner
4b05c322d5
Add support for getelementptr, load, and correctly reject volatile stores.
...
llvm-svn: 23441
2005-09-26 05:15:37 +00:00
Chris Lattner
3e9ea5ffec
Add support for br/brcond/switch and phi
...
llvm-svn: 23439
2005-09-26 04:57:38 +00:00
Chris Lattner
99e23fa74c
Add a simple interpreter to this code, allowing us to statically evaluate
...
global ctors that are simple enough. This implements ctor-list-opt.ll:CTOR2.
llvm-svn: 23437
2005-09-26 04:44:35 +00:00
Chris Lattner
696beefabb
factor some code into a InstallGlobalCtors method, add comments. No functionality change.
...
llvm-svn: 23435
2005-09-26 02:31:18 +00:00
Chris Lattner
838bdc1836
Make the global opt optimizer work on modules with a null terminator, by
...
accepting the null even with a non-65535 init prio
llvm-svn: 23434
2005-09-26 02:19:27 +00:00
Chris Lattner
41b6a5a693
Factor this code out into a few methods.
...
Implement the start of global ctor optimization. It is currently smart
enough to remove the global ctor for cases like this:
struct foo {
foo() {}
} x;
... saving a bit of startup time for the program.
llvm-svn: 23433
2005-09-26 01:43:45 +00:00
Chris Lattner
f487768062
Fix some logic I broke that caused a regression on
...
SimplifyLibCalls/2005-05-20-sprintf-crash.ll
llvm-svn: 23430
2005-09-25 07:06:48 +00:00
Chris Lattner
0b3557f54a
Move MaskedValueIsZero up.
...
Match a bunch of idioms for sign extensions, implementing InstCombine/signext.ll
llvm-svn: 23428
2005-09-24 23:43:33 +00:00
Chris Lattner
175463a165
Simplify this code a bit by relying on recursive simplification. Support
...
sprintf("%s", P)'s that have uses.
s/hasNUses(0)/use_empty()/
llvm-svn: 23425
2005-09-24 22:17:06 +00:00
Chris Lattner
499e33646e
remove some debugging code
...
llvm-svn: 23411
2005-09-23 18:49:09 +00:00
Chris Lattner
c59a371d45
Fold two consequtive branches that share a common destination between them.
...
This implements SimplifyCFG/branch-fold.ll, and is useful on ?:/min/max heavy
code
llvm-svn: 23410
2005-09-23 18:47:20 +00:00
Chris Lattner
3a978bf66d
simplify some logic further
...
llvm-svn: 23408
2005-09-23 07:23:18 +00:00
Chris Lattner
cc14ebc17b
pull a bunch of logic out of SimplifyCFG into a helper fn
...
llvm-svn: 23407
2005-09-23 06:39:30 +00:00
Chris Lattner
6c70106053
Start threading across blocks with code in them, so long as the code does
...
not define a value that is used outside of it's block. This catches many
more simplifications, e.g. 854 in 176.gcc, 137 in vpr, etc.
This implements branch-phi-thread.ll:test3.ll
llvm-svn: 23397
2005-09-20 01:48:40 +00:00
Chris Lattner
f0bd8d0107
Implement merging of blocks with the same condition if the block has multiple
...
predecessors. This implements branch-phi-thread.ll::test1
llvm-svn: 23395
2005-09-20 00:43:16 +00:00
Chris Lattner
049cb4482f
Reject a case we don't handle yet
...
llvm-svn: 23393
2005-09-19 23:57:04 +00:00
Chris Lattner
a160924d57
remove debugging code :-/
...
llvm-svn: 23392
2005-09-19 23:50:15 +00:00
Chris Lattner
748f903046
Implement SimplifyCFG/branch-phi-thread.ll, the most trivial case of threading
...
control across branches with determined outcomes. More generality to follow.
This triggers a couple thousand times in specint.
llvm-svn: 23391
2005-09-19 23:49:37 +00:00
Chris Lattner
b4b2530a1a
Refactor this code a bit and make it more general. This now compiles:
...
struct S { unsigned int i : 6, j : 11, k : 15; } b;
void plus2 (unsigned int x) { b.j += x; }
To:
_plus2:
lis r2, ha16(L_b$non_lazy_ptr)
lwz r2, lo16(L_b$non_lazy_ptr)(r2)
lwz r4, 0(r2)
slwi r3, r3, 6
add r3, r4, r3
rlwimi r3, r4, 0, 26, 14
stw r3, 0(r2)
blr
instead of:
_plus2:
lis r2, ha16(L_b$non_lazy_ptr)
lwz r2, lo16(L_b$non_lazy_ptr)(r2)
lwz r4, 0(r2)
rlwinm r5, r4, 26, 21, 31
add r3, r5, r3
rlwimi r4, r3, 6, 15, 25
stw r4, 0(r2)
blr
by eliminating an 'and'.
I'm pretty sure this is as small as we can go :)
llvm-svn: 23386
2005-09-18 07:22:02 +00:00
Chris Lattner
797dee7705
Compile
...
struct S { unsigned int i : 6, j : 11, k : 15; } b;
void plus2 (unsigned int x) {
b.j += x;
}
to:
plus2:
mov %EAX, DWORD PTR [b]
mov %ECX, %EAX
and %ECX, 131008
mov %EDX, DWORD PTR [%ESP + 4]
shl %EDX, 6
add %EDX, %ECX
and %EDX, 131008
and %EAX, -131009
or %EDX, %EAX
mov DWORD PTR [b], %EDX
ret
instead of:
plus2:
mov %EAX, DWORD PTR [b]
mov %ECX, %EAX
shr %ECX, 6
and %ECX, 2047
add %ECX, DWORD PTR [%ESP + 4]
shl %ECX, 6
and %ECX, 131008
and %EAX, -131009
or %ECX, %EAX
mov DWORD PTR [b], %ECX
ret
llvm-svn: 23385
2005-09-18 06:30:59 +00:00
Chris Lattner
01f56c68e9
Generalize this transform, using MaskedValueIsZero, allowing us to compile:
...
struct S { unsigned int i : 6, j : 11, k : 15; } b;
void plus3 (unsigned int x) { b.k += x; }
To:
plus3:
mov %EAX, DWORD PTR [%ESP + 4]
shl %EAX, 17
add DWORD PTR [b], %EAX
ret
instead of:
plus3:
mov %EAX, DWORD PTR [%ESP + 4]
shl %EAX, 17
mov %ECX, DWORD PTR [b]
add %EAX, %ECX
and %EAX, -131072
and %ECX, 131071
or %ECX, %EAX
mov DWORD PTR [b], %ECX
ret
llvm-svn: 23384
2005-09-18 06:02:59 +00:00
Chris Lattner
4ebc8ab4e0
fix typeo
...
llvm-svn: 23383
2005-09-18 05:25:20 +00:00
Chris Lattner
e5b23a6d67
Remove unintentionally committed code
...
llvm-svn: 23382
2005-09-18 05:12:51 +00:00
Chris Lattner
27cb9dbd35
implement shift.ll:test25. This compiles:
...
struct S { unsigned int i : 6, j : 11, k : 15; } b;
void plus3 (unsigned int x) {
b.k += x;
}
to:
_plus3:
lis r2, ha16(L_b$non_lazy_ptr)
lwz r2, lo16(L_b$non_lazy_ptr)(r2)
lwz r3, 0(r2)
rlwinm r4, r3, 0, 0, 14
add r4, r4, r3
rlwimi r4, r3, 0, 15, 31
stw r4, 0(r2)
blr
instead of:
_plus3:
lis r2, ha16(L_b$non_lazy_ptr)
lwz r2, lo16(L_b$non_lazy_ptr)(r2)
lwz r4, 0(r2)
srwi r5, r4, 17
add r3, r5, r3
slwi r3, r3, 17
rlwimi r3, r4, 0, 15, 31
stw r3, 0(r2)
blr
llvm-svn: 23381
2005-09-18 05:12:10 +00:00
Chris Lattner
af517574ce
Implement add.ll:test29. Codegening:
...
struct S { unsigned int i : 6, j : 11, k : 15; } b;
void plus1 (unsigned int x) {
b.i += x;
}
as:
_plus1:
lis r2, ha16(L_b$non_lazy_ptr)
lwz r2, lo16(L_b$non_lazy_ptr)(r2)
lwz r4, 0(r2)
add r3, r4, r3
rlwimi r3, r4, 0, 0, 25
stw r3, 0(r2)
blr
instead of:
_plus1:
lis r2, ha16(L_b$non_lazy_ptr)
lwz r2, lo16(L_b$non_lazy_ptr)(r2)
lwz r4, 0(r2)
rlwinm r5, r4, 0, 26, 31
add r3, r5, r3
rlwimi r3, r4, 0, 0, 25
stw r3, 0(r2)
blr
llvm-svn: 23379
2005-09-18 04:24:45 +00:00
Chris Lattner
027eaf01cf
remove debug output
...
llvm-svn: 23377
2005-09-18 03:50:25 +00:00
Chris Lattner
1521298993
Implement or.ll:test21. This teaches instcombine to be able to turn this:
...
struct {
unsigned int bit0:1;
unsigned int ubyte:31;
} sdata;
void foo() {
sdata.ubyte++;
}
into this:
foo:
add DWORD PTR [sdata], 2
ret
instead of this:
foo:
mov %EAX, DWORD PTR [sdata]
mov %ECX, %EAX
add %ECX, 2
and %ECX, -2
and %EAX, 1
or %EAX, %ECX
mov DWORD PTR [sdata], %EAX
ret
llvm-svn: 23376
2005-09-18 03:42:07 +00:00
Chris Lattner
a393e4d4b3
Fix the regression last night compiling povray
...
llvm-svn: 23348
2005-09-14 17:32:56 +00:00
Chris Lattner
2a8932960d
Add a simple xform to simplify array accesses with casts in the way.
...
This is useful for 178.galgel where resolution of dope vectors (by the
optimizer) causes the scales to become apparent.
llvm-svn: 23328
2005-09-13 18:36:04 +00:00
Chris Lattner
fd018c8dfe
Fix an issue where LSR would miss rewriting a use of an IV expression by a PHI node that is not the original PHI.
...
This fixes up a dot-product loop in galgel, speeding it up from 18.47s to
16.13s.
llvm-svn: 23327
2005-09-13 02:09:55 +00:00
Chris Lattner
567b81f0d2
Add a helper function, allowing us to simplify some code a bit, changing
...
indentation, no functionality change
llvm-svn: 23325
2005-09-13 00:40:14 +00:00
Chris Lattner
219175c84d
Implement a simple xform to turn code like this:
...
if () { store A -> P; } else { store B -> P; }
into a PHI node with one store, in the most trival case. This implements
load.ll:test10.
llvm-svn: 23324
2005-09-12 23:23:25 +00:00
Chris Lattner
e0bfdf1485
Another load-peephole optimization: do gcse when two loads are next to
...
each other. This implements InstCombine/load.ll:test9
llvm-svn: 23322
2005-09-12 22:21:03 +00:00
Chris Lattner
b990f7d8ed
Implement a trivial form of store->load forwarding where the store and the
...
load are exactly consequtive. This is picked up by other passes, but this
triggers thousands of times in fortran programs that use static locals
(and is thus a compile-time speedup).
llvm-svn: 23320
2005-09-12 22:00:15 +00:00
Chris Lattner
8048b85e8f
Fix a regression from last night, which caused this pass to create invalid
...
code for IV uses outside of loops that are not dominated by the latch block.
We should only convert these uses to use the post-inc value if they ARE
dominated by the latch block.
Also use a new LoopInfo method to simplify some code.
This fixes Transforms/LoopStrengthReduce/2005-09-12-UsesOutOutsideOfLoop.ll
llvm-svn: 23318
2005-09-12 17:11:27 +00:00
Chris Lattner
a67648396a
_test:
...
li r2, 0
LBB_test_1: ; no_exit.2
li r5, 0
stw r5, 0(r3)
addi r2, r2, 1
addi r3, r3, 4
cmpwi cr0, r2, 701
blt cr0, LBB_test_1 ; no_exit.2
LBB_test_2: ; loopexit.2.loopexit
addi r2, r2, 1
stw r2, 0(r4)
blr
[zion ~/llvm]$ cat > ~/xx
Uses of IV's outside of the loop should use hte post-incremented version
of the IV, not the preincremented version. This helps many loops (e.g. in sixtrack)
which used to generate code like this (this is the code from the
dont-hoist-simple-loop-constants.ll testcase):
_test:
li r2, 0 **** IV starts at 0
LBB_test_1: ; no_exit.2
or r5, r2, r2 **** Copy for loop exit
li r2, 0
stw r2, 0(r3)
addi r3, r3, 4
addi r2, r5, 1
addi r6, r5, 2 **** IV+2
cmpwi cr0, r6, 701
blt cr0, LBB_test_1 ; no_exit.2
LBB_test_2: ; loopexit.2.loopexit
addi r2, r5, 2 **** IV+2
stw r2, 0(r4)
blr
And now generated code like this:
_test:
li r2, 1 *** IV starts at 1
LBB_test_1: ; no_exit.2
li r5, 0
stw r5, 0(r3)
addi r2, r2, 1
addi r3, r3, 4
cmpwi cr0, r2, 701 *** IV.postinc + 0
blt cr0, LBB_test_1
LBB_test_2: ; loopexit.2.loopexit
stw r2, 0(r4) *** IV.postinc + 0
blr
llvm-svn: 23313
2005-09-12 06:04:47 +00:00
Chris Lattner
530fe6ab30
implement Transforms/LoopStrengthReduce/dont-hoist-simple-loop-constants.ll.
...
We used to emit this code for it:
_test:
li r2, 1 ;; Value tying up a register for the whole loop
li r5, 0
LBB_test_1: ; no_exit.2
or r6, r5, r5
li r5, 0
stw r5, 0(r3)
addi r5, r6, 1
addi r3, r3, 4
add r7, r2, r5 ;; should be addi r7, r5, 1
cmpwi cr0, r7, 701
blt cr0, LBB_test_1 ; no_exit.2
LBB_test_2: ; loopexit.2.loopexit
addi r2, r6, 2
stw r2, 0(r4)
blr
now we emit this:
_test:
li r2, 0
LBB_test_1: ; no_exit.2
or r5, r2, r2
li r2, 0
stw r2, 0(r3)
addi r3, r3, 4
addi r2, r5, 1
addi r6, r5, 2 ;; whoa, fold those adds!
cmpwi cr0, r6, 701
blt cr0, LBB_test_1 ; no_exit.2
LBB_test_2: ; loopexit.2.loopexit
addi r2, r5, 2
stw r2, 0(r4)
blr
more improvement coming.
llvm-svn: 23306
2005-09-10 01:18:45 +00:00
Chris Lattner
b5e381a8cf
Fix a problem that Dan Berlin noticed, where reassociation would not succeed
...
in building maximal expressions before simplifying them. In particular, i
cases like this:
X-(A+B+X)
the code would consider A+B+X to be a maximal expression (not understanding
that the single use '-' would be turned into a + later), simplify it (a noop)
then later get simplified again.
Each of these simplify steps is where the cost of reassociation comes from,
so this patch should speed up the already fast pass a bit.
Thanks to Dan for noticing this!
llvm-svn: 23214
2005-09-02 07:07:58 +00:00
Chris Lattner
9fe263aa75
Avoid creating garbage instructions, just move the old add instruction
...
to where we need it when converting -(A+B+C) -> -A + -B + -C.
llvm-svn: 23213
2005-09-02 06:38:04 +00:00
Chris Lattner
d1325da091
add some assertions and fix problems where reassociate could access the
...
Ops vector out of range
llvm-svn: 23211
2005-09-02 05:23:22 +00:00
Chris Lattner
8ca5b2a6d2
Fix Regression/Transforms/Reassociate/2005-08-24-Crash.ll
...
llvm-svn: 23019
2005-08-24 17:55:32 +00:00
Chris Lattner
4201cd1bbc
Transform floor((double)FLT) -> (double)floorf(FLT), implementing
...
Regression/Transforms/SimplifyLibCalls/floor.ll. This triggers 19 times in
177.mesa.
llvm-svn: 23017
2005-08-24 17:22:17 +00:00
Chris Lattner
ea7dfd53d6
Fix Transforms/LoopStrengthReduce/2005-08-17-OutOfLoopVariant.ll, a crash
...
on 177.mesa
llvm-svn: 22843
2005-08-17 21:22:41 +00:00
Chris Lattner
2bf7cb5213
Use a new helper to split critical edges, making the code simpler.
...
Do not claim to not change the CFG. We do change the cfg to split critical
edges. This isn't causing us a problem now, but could likely do so in the
future.
llvm-svn: 22824
2005-08-17 06:35:16 +00:00
Chris Lattner
5cf983ee0f
Fix a bad case in gzip where we put lots of things in registers across the
...
loop, because a IV-dependent value was used outside of the loop and didn't
have immediate-folding capability
llvm-svn: 22798
2005-08-16 00:38:11 +00:00
Chris Lattner
47d3ec3525
Ooops, don't forget to clear this. The real inner loop is now:
...
.LBB_foo_3: ; no_exit.1
lfd f2, 0(r9)
lfd f3, 8(r9)
fmul f4, f1, f2
fmadd f4, f0, f3, f4
stfd f4, 8(r9)
fmul f3, f1, f3
fmsub f2, f0, f2, f3
stfd f2, 0(r9)
addi r9, r9, 16
addi r8, r8, 1
cmpw cr0, r8, r4
ble .LBB_foo_3 ; no_exit.1
llvm-svn: 22782
2005-08-13 07:42:01 +00:00
Chris Lattner
5949d49032
Recursively scan scev expressions for common subexpressions. This allows us
...
to handle nested loops much better, for example, by being able to tell that
these two expressions:
{( 8 + ( 16 * ( 1 + %Tmp11 + %Tmp12)) + %c_),+,( 16 * %Tmp 12)}<loopentry.1>
{(( 16 * ( 1 + %Tmp11 + %Tmp12)) + %c_),+,( 16 * %Tmp12)}<loopentry.1>
Have the following common part that can be shared:
{(( 16 * ( 1 + %Tmp11 + %Tmp12)) + %c_),+,( 16 * %Tmp12)}<loopentry.1>
This allows us to codegen an important inner loop in 168.wupwise as:
.LBB_foo_4: ; no_exit.1
lfd f2, 16(r9)
fmul f3, f0, f2
fmul f2, f1, f2
fadd f4, f3, f2
stfd f4, 8(r9)
fsub f2, f3, f2
stfd f2, 16(r9)
addi r8, r8, 1
addi r9, r9, 16
cmpw cr0, r8, r4
ble .LBB_foo_4 ; no_exit.1
instead of:
.LBB_foo_3: ; no_exit.1
lfdx f2, r6, r9
add r10, r6, r9
lfd f3, 8(r10)
fmul f4, f1, f2
fmadd f4, f0, f3, f4
stfd f4, 8(r10)
fmul f3, f1, f3
fmsub f2, f0, f2, f3
stfdx f2, r6, r9
addi r9, r9, 16
addi r8, r8, 1
cmpw cr0, r8, r4
ble .LBB_foo_3 ; no_exit.1
llvm-svn: 22781
2005-08-13 07:27:18 +00:00
Chris Lattner
89c1dfc733
Teach SplitCriticalEdge to update LoopInfo if it is alive. This fixes
...
a problem in LoopStrengthReduction, where it would split critical edges
then confused itself with outdated loop information.
llvm-svn: 22776
2005-08-13 01:38:43 +00:00
Chris Lattner
79396539d3
remove dead code. The exit block list is computed on demand, thus does not
...
need to be updated. This code is a relic from when it did.
llvm-svn: 22775
2005-08-13 01:30:36 +00:00
Chris Lattner
8447b49526
When splitting critical edges, make sure not to leave the new block in the
...
middle of the loop. This turns a critical loop in gzip into this:
.LBB_test_1: ; loopentry
or r27, r28, r28
add r28, r3, r27
lhz r28, 3(r28)
add r26, r4, r27
lhz r26, 3(r26)
cmpw cr0, r28, r26
bne .LBB_test_8 ; loopentry.loopexit_crit_edge
.LBB_test_2: ; shortcirc_next.0
add r28, r3, r27
lhz r28, 5(r28)
add r26, r4, r27
lhz r26, 5(r26)
cmpw cr0, r28, r26
bne .LBB_test_7 ; shortcirc_next.0.loopexit_crit_edge
.LBB_test_3: ; shortcirc_next.1
add r28, r3, r27
lhz r28, 7(r28)
add r26, r4, r27
lhz r26, 7(r26)
cmpw cr0, r28, r26
bne .LBB_test_6 ; shortcirc_next.1.loopexit_crit_edge
.LBB_test_4: ; shortcirc_next.2
add r28, r3, r27
lhz r26, 9(r28)
add r28, r4, r27
lhz r25, 9(r28)
addi r28, r27, 8
cmpw cr7, r26, r25
mfcr r26, 1
rlwinm r26, r26, 31, 31, 31
add r25, r8, r27
cmpw cr7, r25, r7
mfcr r25, 1
rlwinm r25, r25, 29, 31, 31
and. r26, r26, r25
bne .LBB_test_1 ; loopentry
instead of this:
.LBB_test_1: ; loopentry
or r27, r28, r28
add r28, r3, r27
lhz r28, 3(r28)
add r26, r4, r27
lhz r26, 3(r26)
cmpw cr0, r28, r26
beq .LBB_test_3 ; shortcirc_next.0
.LBB_test_2: ; loopentry.loopexit_crit_edge
add r2, r30, r27
add r8, r29, r27
b .LBB_test_9 ; loopexit
.LBB_test_3: ; shortcirc_next.0
add r28, r3, r27
lhz r28, 5(r28)
add r26, r4, r27
lhz r26, 5(r26)
cmpw cr0, r28, r26
beq .LBB_test_5 ; shortcirc_next.1
.LBB_test_4: ; shortcirc_next.0.loopexit_crit_edge
add r2, r11, r27
add r8, r12, r27
b .LBB_test_9 ; loopexit
.LBB_test_5: ; shortcirc_next.1
add r28, r3, r27
lhz r28, 7(r28)
add r26, r4, r27
lhz r26, 7(r26)
cmpw cr0, r28, r26
beq .LBB_test_7 ; shortcirc_next.2
.LBB_test_6: ; shortcirc_next.1.loopexit_crit_edge
add r2, r9, r27
add r8, r10, r27
b .LBB_test_9 ; loopexit
.LBB_test_7: ; shortcirc_next.2
add r28, r3, r27
lhz r26, 9(r28)
add r28, r4, r27
lhz r25, 9(r28)
addi r28, r27, 8
cmpw cr7, r26, r25
mfcr r26, 1
rlwinm r26, r26, 31, 31, 31
add r25, r8, r27
cmpw cr7, r25, r7
mfcr r25, 1
rlwinm r25, r25, 29, 31, 31
and. r26, r26, r25
bne .LBB_test_1 ; loopentry
Next up, improve the code for the loop.
llvm-svn: 22769
2005-08-12 22:22:17 +00:00
Chris Lattner
4fec86d348
Fix a FIXME: if we are inserting code for a PHI argument, split the critical
...
edge so that the code is not always executed for both operands. This
prevents LSR from inserting code into loops whose exit blocks contain
PHI uses of IV expressions (which are outside of loops). On gzip, for
example, we turn this ugly code:
.LBB_test_1: ; loopentry
add r27, r3, r28
lhz r27, 3(r27)
add r26, r4, r28
lhz r26, 3(r26)
add r25, r30, r28 ;; Only live if exiting the loop
add r24, r29, r28 ;; Only live if exiting the loop
cmpw cr0, r27, r26
bne .LBB_test_5 ; loopexit
into this:
.LBB_test_1: ; loopentry
or r27, r28, r28
add r28, r3, r27
lhz r28, 3(r28)
add r26, r4, r27
lhz r26, 3(r26)
cmpw cr0, r28, r26
beq .LBB_test_3 ; shortcirc_next.0
.LBB_test_2: ; loopentry.loopexit_crit_edge
add r2, r30, r27
add r8, r29, r27
b .LBB_test_9 ; loopexit
.LBB_test_2: ; shortcirc_next.0
...
blt .LBB_test_1
into this:
.LBB_test_1: ; loopentry
or r27, r28, r28
add r28, r3, r27
lhz r28, 3(r28)
add r26, r4, r27
lhz r26, 3(r26)
cmpw cr0, r28, r26
beq .LBB_test_3 ; shortcirc_next.0
.LBB_test_2: ; loopentry.loopexit_crit_edge
add r2, r30, r27
add r8, r29, r27
b .LBB_t_3: ; shortcirc_next.0
.LBB_test_3: ; shortcirc_next.0
...
blt .LBB_test_1
Next step: get the block out of the loop so that the loop is all
fall-throughs again.
llvm-svn: 22766
2005-08-12 22:06:11 +00:00
Chris Lattner
b7ebe65c56
Change break critical edges to not remove, then insert, PHI node entries.
...
Instead, just update the BB in-place. This is both faster, and it prevents
split-critical-edges from shuffling the PHI argument list unneccesarily.
llvm-svn: 22765
2005-08-12 21:58:07 +00:00
Chris Lattner
62df798919
remove some trickiness that broke yacr2 and some other programs last night
...
llvm-svn: 22751
2005-08-10 17:15:20 +00:00
Chris Lattner
f83ce5faee
Make loop-simplify produce better loops by turning PHI nodes like X = phi [X, Y]
...
into just Y. This often occurs when it seperates loops that have collapsed loop
headers. This implements LoopSimplify/phi-node-simplify.ll
llvm-svn: 22746
2005-08-10 02:07:32 +00:00
Chris Lattner
677d85784a
Allow indvar simplify to canonicalize ANY affine IV, not just affine IVs with
...
constant stride. This implements Transforms/IndVarsSimplify/variable-stride-ivs.ll
llvm-svn: 22744
2005-08-10 01:12:06 +00:00
Chris Lattner
edff91a49a
Teach LSR to strength reduce IVs that have a loop-invariant but non-constant stride.
...
For code like this:
void foo(float *a, float *b, int n, int stride_a, int stride_b) {
int i;
for (i=0; i<n; i++)
a[i*stride_a] = b[i*stride_b];
}
we now emit:
.LBB_foo2_2: ; no_exit
lfs f0, 0(r4)
stfs f0, 0(r3)
addi r7, r7, 1
add r4, r2, r4
add r3, r6, r3
cmpw cr0, r7, r5
blt .LBB_foo2_2 ; no_exit
instead of:
.LBB_foo_2: ; no_exit
mullw r8, r2, r7 ;; multiply!
slwi r8, r8, 2
lfsx f0, r4, r8
mullw r8, r2, r6 ;; multiply!
slwi r8, r8, 2
stfsx f0, r3, r8
addi r2, r2, 1
cmpw cr0, r2, r5
blt .LBB_foo_2 ; no_exit
loops with variable strides occur pretty often. For example, in SPECFP2K
there are 317 variable strides in 177.mesa, 3 in 179.art, 14 in 188.ammp,
56 in 168.wupwise, 36 in 172.mgrid.
Now we can allow indvars to turn functions written like this:
void foo2(float *a, float *b, int n, int stride_a, int stride_b) {
int i, ai = 0, bi = 0;
for (i=0; i<n; i++)
{
a[ai] = b[bi];
ai += stride_a;
bi += stride_b;
}
}
into code like the above for better analysis. With this patch, they generate
identical code.
llvm-svn: 22740
2005-08-10 00:45:21 +00:00
Chris Lattner
dde7dc525e
Fix Regression/Transforms/LoopStrengthReduce/phi_node_update_multiple_preds.ll
...
by being more careful about updating PHI nodes
llvm-svn: 22739
2005-08-10 00:35:32 +00:00
Chris Lattner
c6c4d99a21
Fix some 80 column violations.
...
Once we compute the evolution for a GEP, tell SE about it. This allows users
of the GEP to know it, if the users are not direct. This allows us to compile
this testcase:
void fbSolidFillmmx(int w, unsigned char *d) {
while (w >= 64) {
*(unsigned long long *) (d + 0) = 0;
*(unsigned long long *) (d + 8) = 0;
*(unsigned long long *) (d + 16) = 0;
*(unsigned long long *) (d + 24) = 0;
*(unsigned long long *) (d + 32) = 0;
*(unsigned long long *) (d + 40) = 0;
*(unsigned long long *) (d + 48) = 0;
*(unsigned long long *) (d + 56) = 0;
w -= 64;
d += 64;
}
}
into:
.LBB_fbSolidFillmmx_2: ; no_exit
li r2, 0
stw r2, 0(r4)
stw r2, 4(r4)
stw r2, 8(r4)
stw r2, 12(r4)
stw r2, 16(r4)
stw r2, 20(r4)
stw r2, 24(r4)
stw r2, 28(r4)
stw r2, 32(r4)
stw r2, 36(r4)
stw r2, 40(r4)
stw r2, 44(r4)
stw r2, 48(r4)
stw r2, 52(r4)
stw r2, 56(r4)
stw r2, 60(r4)
addi r4, r4, 64
addi r3, r3, -64
cmpwi cr0, r3, 63
bgt .LBB_fbSolidFillmmx_2 ; no_exit
instead of:
.LBB_fbSolidFillmmx_2: ; no_exit
li r11, 0
stw r11, 0(r4)
stw r11, 4(r4)
stwx r11, r10, r4
add r12, r10, r4
stw r11, 4(r12)
stwx r11, r9, r4
add r12, r9, r4
stw r11, 4(r12)
stwx r11, r8, r4
add r12, r8, r4
stw r11, 4(r12)
stwx r11, r7, r4
add r12, r7, r4
stw r11, 4(r12)
stwx r11, r6, r4
add r12, r6, r4
stw r11, 4(r12)
stwx r11, r5, r4
add r12, r5, r4
stw r11, 4(r12)
stwx r11, r2, r4
add r12, r2, r4
stw r11, 4(r12)
addi r4, r4, 64
addi r3, r3, -64
cmpwi cr0, r3, 63
bgt .LBB_fbSolidFillmmx_2 ; no_exit
llvm-svn: 22737
2005-08-09 23:39:36 +00:00