llvm-project

Commit Graph

Author	SHA1	Message	Date
Chris Lattner	da1b152c43	Make this work for FP constantexprs llvm-svn: 23773	2005-10-17 20:18:38 +00:00
Chris Lattner	7fde91e365	Oops, X+0.0 isn't foldable, but X+-0.0 is. llvm-svn: 23772	2005-10-17 17:56:38 +00:00
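Why only the -0.0 form folds: with IEEE 754 doubles in the default round-to-nearest mode, `-0.0 + 0.0` is `+0.0`. So `X+0.0` turns a negative-zero `X` into positive zero, while `X + -0.0` returns every non-NaN `X` unchanged, sign included. Here is a small C check of that claim, assuming an IEEE 754 `double`. The helper name is hypothetical.

```c
#include <math.h>
#include <stdbool.h>

/* True when x + addend equals x in both value and sign.
   Assumes IEEE 754 doubles, default rounding, and a non-NaN x. */
static bool add_is_identity(double x, double addend) {
    double r = x + addend;
    return r == x && (signbit(r) != 0) == (signbit(x) != 0);
}
```

`add_is_identity(-0.0, 0.0)` is false, because the sum is `+0.0`. `add_is_identity(x, -0.0)` holds for every non-NaN `x`, which is why only `X + -0.0` can be replaced by `X`.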
Chris Lattner	32979336a7	relax this a bit, as we only support the default rounding mode llvm-svn: 23771	2005-10-17 17:49:32 +00:00
Chris Lattner	192cd18f53	Fix (hopefully the last) issue where LSR is nondeterministic. When pulling out CSE's of base expressions it could build a result whose order was nondet. llvm-svn: 23698	2005-10-11 18:41:04 +00:00
Chris Lattner	5c9d63da31	Fix another problem where LSR was being nondeterministic. Also remove elements from the end of a vector instead of the beginning llvm-svn: 23697	2005-10-11 18:30:57 +00:00
Chris Lattner	b7a3894e7c	Fix another lsr-is-nondeterministic case llvm-svn: 23695	2005-10-11 18:17:57 +00:00
Chris Lattner	03b9eb506c	Make MaskedValueIsZero a bit more aggressive llvm-svn: 23677	2005-10-09 22:08:50 +00:00
Chris Lattner	62010c450f	Fix funky xcode indentation llvm-svn: 23674	2005-10-09 06:36:35 +00:00
Chris Lattner	eb4be8b942	Hrm, you didn't see this. llvm-svn: 23673	2005-10-09 06:24:02 +00:00
Chris Lattner	4ea0a3eaac	Fix a source of non-determinism in the backend: the order of processing IV strides depended on the pointer order of the strides in memory. Non-determinism is bad. llvm-svn: 23672	2005-10-09 06:20:55 +00:00
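The general remedy for pointer-order nondeterminism is to record keys in the order they are first seen and always iterate in that order, never in address order. The minimal sketch below assumes strides are plain integers held in a small side list. `StrideOrder` and `stride_index` are hypothetical names and do not describe LSR's actual data structures.

```c
#include <stddef.h>

#define MAX_STRIDES 16

/* Hypothetical side list: strides in first-seen order, so any later
   walk over them does not depend on where objects live in memory. */
typedef struct {
    long strides[MAX_STRIDES];
    size_t count;
} StrideOrder;

/* Return the stride's position, appending it the first time it is seen.
   The caller must keep count below MAX_STRIDES. */
static size_t stride_index(StrideOrder *order, long stride) {
    for (size_t i = 0; i < order->count; ++i)
        if (order->strides[i] == stride)
            return i;
    order->strides[order->count] = stride;
    return order->count++;
}
```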
Jeff Cohen	572910c9a2	Remove useless variable. llvm-svn: 23656	2005-10-07 05:28:29 +00:00
Chris Lattner	20b0754c41	Fix DemoteRegToStack on an invoke. This fixes PR634. llvm-svn: 23618	2005-10-04 00:44:01 +00:00
Chris Lattner	4c3b2b536c	Clean up the code a bit. Use isInstructionTriviallyDead to be more aggressive and more correct than use_empty(). This fixes PR635 and SimplifyCFG/2005-10-02-InvokeSimplify.ll llvm-svn: 23616	2005-10-03 23:43:43 +00:00
Chris Lattner	f07a587c79	Make IVUseShouldUsePostIncValue more aggressive when the use is a PHI. In particular, it should realize that phi's use their values in the pred block not the phi block itself. This change turns our em3d loop from this: _test: cmpwi cr0, r4, 0 bgt cr0, LBB_test_2 ; entry.no_exit_crit_edge LBB_test_1: ; entry.loopexit_crit_edge li r2, 0 b LBB_test_6 ; loopexit LBB_test_2: ; entry.no_exit_crit_edge li r6, 0 LBB_test_3: ; no_exit or r2, r6, r6 lwz r6, 0(r3) cmpw cr0, r6, r5 beq cr0, LBB_test_6 ; loopexit LBB_test_4: ; endif addi r3, r3, 4 addi r6, r2, 1 cmpw cr0, r6, r4 blt cr0, LBB_test_3 ; no_exit LBB_test_5: ; endif.loopexit.loopexit_crit_edge addi r3, r2, 1 blr LBB_test_6: ; loopexit or r3, r2, r2 blr into: _test: cmpwi cr0, r4, 0 bgt cr0, LBB_test_2 ; entry.no_exit_crit_edge LBB_test_1: ; entry.loopexit_crit_edge li r2, 0 b LBB_test_5 ; loopexit LBB_test_2: ; entry.no_exit_crit_edge li r6, 0 LBB_test_3: ; no_exit lwz r2, 0(r3) cmpw cr0, r2, r5 or r2, r6, r6 beq cr0, LBB_test_5 ; loopexit LBB_test_4: ; endif addi r3, r3, 4 addi r6, r6, 1 cmpw cr0, r6, r4 or r2, r6, r6 blt cr0, LBB_test_3 ; no_exit LBB_test_5: ; loopexit or r3, r2, r2 blr Unfortunately, this is actually worse code, because the register coalescer is getting confused somehow. If it were doing its job right, it could turn the code into this: _test: cmpwi cr0, r4, 0 bgt cr0, LBB_test_2 ; entry.no_exit_crit_edge LBB_test_1: ; entry.loopexit_crit_edge li r6, 0 b LBB_test_5 ; loopexit LBB_test_2: ; entry.no_exit_crit_edge li r6, 0 LBB_test_3: ; no_exit lwz r2, 0(r3) cmpw cr0, r2, r5 beq cr0, LBB_test_5 ; loopexit LBB_test_4: ; endif addi r3, r3, 4 addi r6, r6, 1 cmpw cr0, r6, r4 blt cr0, LBB_test_3 ; no_exit LBB_test_5: ; loopexit or r3, r6, r6 blr ... which I'll work on next. :) llvm-svn: 23604	2005-10-03 02:50:05 +00:00
Chris Lattner	e4ed42a426	Refactor some code into a function llvm-svn: 23603	2005-10-03 01:04:44 +00:00
Chris Lattner	360928dbed	This break is bogus and I have no idea why it was there. Basically it prevents memoizing code when IV's are used by phinodes outside of loops. In a simple example, we were getting this code before (note that r6 and r7 are isomorphic IV's): li r6, 0 or r7, r6, r6 LBB_test_3: ; no_exit lwz r2, 0(r3) cmpw cr0, r2, r5 or r2, r7, r7 beq cr0, LBB_test_5 ; loopexit LBB_test_4: ; endif addi r2, r7, 1 addi r7, r7, 1 addi r3, r3, 4 addi r6, r6, 1 cmpw cr0, r6, r4 blt cr0, LBB_test_3 ; no_exit Now we get: li r6, 0 LBB_test_3: ; no_exit or r2, r6, r6 lwz r6, 0(r3) cmpw cr0, r6, r5 beq cr0, LBB_test_6 ; loopexit LBB_test_4: ; endif addi r3, r3, 4 addi r6, r2, 1 cmpw cr0, r6, r4 blt cr0, LBB_test_3 ; no_exit this was noticed in em3d. llvm-svn: 23602	2005-10-03 00:37:33 +00:00
Chris Lattner	8fcce170cf	when checking if we should move a split edge block outside of a loop, check the presplit pred, not the post-split pred. This was causing us to make the wrong decision in some cases, leaving the critical edge block in the loop. llvm-svn: 23601	2005-10-03 00:31:52 +00:00
Jeff Cohen	f8a5e5ae6e	Fix VC++ warnings. llvm-svn: 23579	2005-10-01 03:57:14 +00:00
Chris Lattner	a554c9470b	Insert stores after phi nodes in the normal dest. This fixes LowerInvoke/2005-08-03-InvokeWithPHI.ll llvm-svn: 23525	2005-09-29 17:44:20 +00:00
Chris Lattner	87ef943a4c	Fold isascii into a simple comparison. This speeds up 197.parser by 7.4%, bringing the LLC time down to the CBE time. llvm-svn: 23521	2005-09-29 06:17:27 +00:00
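The traditional `isascii(c)` is true exactly when `c` lies in 0..127. A single unsigned comparison is enough to express that, because negative values wrap to very large unsigned values. The following is a sketch of the folded form, assuming a 32-bit `int`. The function name is hypothetical.

```c
#include <stdbool.h>

/* Folded isascii: one unsigned compare covers both c < 0 and c > 127. */
static bool isascii_folded(int c) {
    return (unsigned)c < 128u;
}
```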
Chris Lattner	5f6035feb0	remove a bunch of unneeded stuff, or self evident comments llvm-svn: 23519	2005-09-29 06:16:11 +00:00
Chris Lattner	c244e7c178	Implement a couple of memcmp folds from the todo list llvm-svn: 23517	2005-09-29 04:54:20 +00:00
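The commit does not say which memcmp folds were implemented. Purely as an illustration, assume the two constant-length cases that such todo lists commonly include: a length of 0 always compares equal, and a length of 1 reduces to subtracting the first bytes, read as `unsigned char` as memcmp requires. Both function names are hypothetical.

```c
/* Hypothetical fold results for a known constant length. */

/* memcmp(a, b, 0) -> 0 */
static int memcmp_len0(const void *a, const void *b) {
    (void)a;
    (void)b;
    return 0;
}

/* memcmp(a, b, 1) -> difference of the first bytes as unsigned char,
   which has the sign memcmp is required to return. */
static int memcmp_len1(const void *a, const void *b) {
    return *(const unsigned char *)a - *(const unsigned char *)b;
}
```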
Chris Lattner	ea7214b23d	Constant fold llvm.sqrt llvm-svn: 23487	2005-09-28 01:34:32 +00:00
Chris Lattner	3b63bb375c	add a note about a way to improve this code further, that I won't be getting to right now. llvm-svn: 23485	2005-09-27 22:44:59 +00:00
Chris Lattner	eb953f0ef8	Fix a regression in my previous patch, fixing GlobalOpt/2005-09-27-Crash.ll and PR632. llvm-svn: 23484	2005-09-27 22:28:11 +00:00
Chris Lattner	e285f5ed8f	Avoid spilling stack slots... to stack slots. llvm-svn: 23478	2005-09-27 21:33:12 +00:00
Chris Lattner	87eb249300	Completely rewrite 'correct' eh support. This changes how setjmp insertion is performed so it is only at most once per function that contains an invoke instead of once per invoke in the function. This patch has the following perks: 1. It fixes PR631, which complains about slowness. 2. It fixes PR240, which complains about non-volatile vars being live across setjmp/longjmps. 3. It improves (but does not fix) the jmpbuf alignment issue on itanium by not forcing the jmpbufs to always be 8-bytes off the alignment of the structure. 4. It speeds up 253.perlbmk from 338s to 13.70s (a 25x improvement!), making us now about 4% faster than GCC. Further improvements are also possible. llvm-svn: 23477	2005-09-27 21:18:17 +00:00
Chris Lattner	92233d2175	Make the pass name simpler llvm-svn: 23476	2005-09-27 21:10:32 +00:00
Chris Lattner	16cd356fb2	allow demotion to volatile values, add support for invoke llvm-svn: 23473	2005-09-27 19:39:00 +00:00
Chris Lattner	3d27e7f27f	Add support for external calls that we know how to constant fold. This implements ctor-list-opt.ll:CTOR8 llvm-svn: 23465	2005-09-27 05:02:43 +00:00
Chris Lattner	29b2780c8a	Fix a bug where we would evaluate stores into linkonce objects which could be potentially replaced at link-time. llvm-svn: 23463	2005-09-27 04:50:03 +00:00
Chris Lattner	65a3a0918f	Implement support for static constructors with calls in them. This is useful because gccas runs globalopt before inlining. This implements ctor-list-opt.ll:CTOR7 llvm-svn: 23462	2005-09-27 04:45:34 +00:00
Chris Lattner	da1889b778	Refactor this code a bit, no functionality changes. llvm-svn: 23460	2005-09-27 04:27:01 +00:00
Chris Lattner	f2f89af69a	Remove some dead code. ctor evaluation subsumes empty ctor elim llvm-svn: 23453	2005-09-26 20:38:20 +00:00
Chris Lattner	6bf2cd5735	Add support for alloca, implementing ctor-list-opt.ll:CTOR6 llvm-svn: 23452	2005-09-26 17:07:09 +00:00
Chris Lattner	46d9ff081d	Add a debug printout, fix a crash on kc++ llvm-svn: 23450	2005-09-26 07:34:35 +00:00
Chris Lattner	46af55e0e4	Implement loads/stores through GEP's of globals. This implements ctor-list-opt.ll:CTOR5. llvm-svn: 23449	2005-09-26 06:52:44 +00:00
Chris Lattner	61ff32cd70	Replace TraverseGEPInitializer with ConstantFoldLoadThroughGEPConstantExpr llvm-svn: 23447	2005-09-26 05:34:07 +00:00
Chris Lattner	02ae21e1e0	Eliminate GetGEPGlobalInitializer in favor of the more powerful ConstantFoldLoadThroughGEPConstantExpr function in the utils lib. llvm-svn: 23446	2005-09-26 05:28:52 +00:00
Chris Lattner	0b011ec8e2	Factor the GetGEPGlobalInitializer out of this pass and into Transforms/Utils as ConstantFoldLoadThroughGEPConstantExpr. llvm-svn: 23445	2005-09-26 05:28:06 +00:00
Chris Lattner	c13c7b9376	Move the ConstantFoldLoadThroughGEPConstantExpr function out of the InstCombine pass. llvm-svn: 23444	2005-09-26 05:27:10 +00:00
Chris Lattner	b009663e27	add a comment llvm-svn: 23442	2005-09-26 05:16:34 +00:00
Chris Lattner	4b05c322d5	Add support for getelementptr, load, and correctly reject volatile stores. llvm-svn: 23441	2005-09-26 05:15:37 +00:00
Chris Lattner	3e9ea5ffec	Add support for br/brcond/switch and phi llvm-svn: 23439	2005-09-26 04:57:38 +00:00
Chris Lattner	99e23fa74c	Add a simple interpreter to this code, allowing us to statically evaluate global ctors that are simple enough. This implements ctor-list-opt.ll:CTOR2. llvm-svn: 23437	2005-09-26 04:44:35 +00:00
Chris Lattner	696beefabb	factor some code into a InstallGlobalCtors method, add comments. No functionality change. llvm-svn: 23435	2005-09-26 02:31:18 +00:00
Chris Lattner	838bdc1836	Make the global opt optimizer work on modules with a null terminator, by accepting the null even with a non-65535 init prio llvm-svn: 23434	2005-09-26 02:19:27 +00:00
Chris Lattner	41b6a5a693	Factor this code out into a few methods. Implement the start of global ctor optimization. It is currently smart enough to remove the global ctor for cases like this: struct foo { foo() {} } x; ... saving a bit of startup time for the program. llvm-svn: 23433	2005-09-26 01:43:45 +00:00
Chris Lattner	f487768062	Fix some logic I broke that caused a regression on SimplifyLibCalls/2005-05-20-sprintf-crash.ll llvm-svn: 23430	2005-09-25 07:06:48 +00:00
Chris Lattner	0b3557f54a	Move MaskedValueIsZero up. Match a bunch of idioms for sign extensions, implementing InstCombine/signext.ll llvm-svn: 23428	2005-09-24 23:43:33 +00:00
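One hand-written sign-extension idiom of the kind such patterns target masks the low byte, flips its sign bit with an xor, and subtracts the bias. The result equals the low 8 bits sign-extended, so a peephole pass can replace the whole sequence with a single sign extension. This is a sketch with hypothetical names. Which exact idioms `signext.ll` covers is not shown here.

```c
#include <stdint.h>

/* The idiom: mask, flip bit 7, subtract 0x80. */
static int32_t sext8_idiom(int32_t x) {
    return ((x & 0xFF) ^ 0x80) - 0x80;
}

/* Reference: sign-extend the low 8 bits without relying on
   implementation-defined narrowing conversions. */
static int32_t sext8_reference(int32_t x) {
    int32_t low = x & 0xFF;
    return low >= 0x80 ? low - 0x100 : low;
}
```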
Chris Lattner	175463a165	Simplify this code a bit by relying on recursive simplification. Support sprintf("%s", P)'s that have uses. s/hasNUses(0)/use_empty()/ llvm-svn: 23425	2005-09-24 22:17:06 +00:00
Chris Lattner	499e33646e	remove some debugging code llvm-svn: 23411	2005-09-23 18:49:09 +00:00
Chris Lattner	c59a371d45	Fold two consecutive branches that share a common destination between them. This implements SimplifyCFG/branch-fold.ll, and is useful on ?:/min/max heavy code llvm-svn: 23410	2005-09-23 18:47:20 +00:00
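At the source level, folding two consecutive branches with a common destination looks like the C sketch below. The real pass rewrites IR basic blocks. Here `goto` stands in for the branches, and the function names are hypothetical. The combined condition uses `|` rather than `||` so that only one test and one branch remain.

```c
#include <stdbool.h>

/* Before: two conditional branches in a row share the target "taken". */
static int two_branches(bool a, bool b) {
    if (a) goto taken;
    if (b) goto taken;
    return 0;
taken:
    return 1;
}

/* After: the conditions are or'ed together and tested once. */
static int one_branch(bool a, bool b) {
    if (a | b)
        return 1;
    return 0;
}
```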
Chris Lattner	3a978bf66d	simplify some logic further llvm-svn: 23408	2005-09-23 07:23:18 +00:00
Chris Lattner	cc14ebc17b	pull a bunch of logic out of SimplifyCFG into a helper fn llvm-svn: 23407	2005-09-23 06:39:30 +00:00
Chris Lattner	6c70106053	Start threading across blocks with code in them, so long as the code does not define a value that is used outside of its block. This catches many more simplifications, e.g. 854 in 176.gcc, 137 in vpr, etc. This implements branch-phi-thread.ll:test3.ll llvm-svn: 23397	2005-09-20 01:48:40 +00:00
Chris Lattner	f0bd8d0107	Implement merging of blocks with the same condition if the block has multiple predecessors. This implements branch-phi-thread.ll::test1 llvm-svn: 23395	2005-09-20 00:43:16 +00:00
Chris Lattner	049cb4482f	Reject a case we don't handle yet llvm-svn: 23393	2005-09-19 23:57:04 +00:00
Chris Lattner	a160924d57	remove debugging code :-/ llvm-svn: 23392	2005-09-19 23:50:15 +00:00
Chris Lattner	748f903046	Implement SimplifyCFG/branch-phi-thread.ll, the most trivial case of threading control across branches with determined outcomes. More generality to follow. This triggers a couple thousand times in specint. llvm-svn: 23391	2005-09-19 23:49:37 +00:00
Chris Lattner	b4b2530a1a	Refactor this code a bit and make it more general. This now compiles: struct S { unsigned int i : 6, j : 11, k : 15; } b; void plus2 (unsigned int x) { b.j += x; } To: _plus2: lis r2, ha16(L_b$non_lazy_ptr) lwz r2, lo16(L_b$non_lazy_ptr)(r2) lwz r4, 0(r2) slwi r3, r3, 6 add r3, r4, r3 rlwimi r3, r4, 0, 26, 14 stw r3, 0(r2) blr instead of: _plus2: lis r2, ha16(L_b$non_lazy_ptr) lwz r2, lo16(L_b$non_lazy_ptr)(r2) lwz r4, 0(r2) rlwinm r5, r4, 26, 21, 31 add r3, r5, r3 rlwimi r4, r3, 6, 15, 25 stw r4, 0(r2) blr by eliminating an 'and'. I'm pretty sure this is as small as we can go :) llvm-svn: 23386	2005-09-18 07:22:02 +00:00
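For the middle field `j`, the add can be done on the whole word with `x` pre-shifted into place, keeping only the `j` bits of the sum. The low 6 bits of `x << 6` are zero, so nothing carries into `j` from below. This C sketch uses explicit masks and assumes that `i`, `j` and `k` occupy bits 0-5, 6-16 and 17-31, consistent with the 131008 (0x1FFC0) mask in the x86 code. C leaves bitfield layout implementation-defined. The names are hypothetical.

```c
#include <stdint.h>

#define J_SHIFT 6
#define J_MASK  0x0001FFC0u /* bits 6..16: the 11-bit field j */

/* Naive: extract j, add, then reinsert it. */
static uint32_t plus2_naive(uint32_t word, uint32_t x) {
    uint32_t j = (word >> J_SHIFT) & 0x7FFu;
    j += x;
    return (word & ~J_MASK) | ((j << J_SHIFT) & J_MASK);
}

/* Transformed: add into the whole word, then take j's bits from the sum
   and every other bit from the original word (the rlwimi above). */
static uint32_t plus2_folded(uint32_t word, uint32_t x) {
    uint32_t sum = word + (x << J_SHIFT);
    return (sum & J_MASK) | (word & ~J_MASK);
}
```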
Chris Lattner	797dee7705	Compile struct S { unsigned int i : 6, j : 11, k : 15; } b; void plus2 (unsigned int x) { b.j += x; } to: plus2: mov %EAX, DWORD PTR [b] mov %ECX, %EAX and %ECX, 131008 mov %EDX, DWORD PTR [%ESP + 4] shl %EDX, 6 add %EDX, %ECX and %EDX, 131008 and %EAX, -131009 or %EDX, %EAX mov DWORD PTR [b], %EDX ret instead of: plus2: mov %EAX, DWORD PTR [b] mov %ECX, %EAX shr %ECX, 6 and %ECX, 2047 add %ECX, DWORD PTR [%ESP + 4] shl %ECX, 6 and %ECX, 131008 and %EAX, -131009 or %ECX, %EAX mov DWORD PTR [b], %ECX ret llvm-svn: 23385	2005-09-18 06:30:59 +00:00
Chris Lattner	01f56c68e9	Generalize this transform, using MaskedValueIsZero, allowing us to compile: struct S { unsigned int i : 6, j : 11, k : 15; } b; void plus3 (unsigned int x) { b.k += x; } To: plus3: mov %EAX, DWORD PTR [%ESP + 4] shl %EAX, 17 add DWORD PTR [b], %EAX ret instead of: plus3: mov %EAX, DWORD PTR [%ESP + 4] shl %EAX, 17 mov %ECX, DWORD PTR [b] add %EAX, %ECX and %EAX, -131072 and %ECX, 131071 or %ECX, %EAX mov DWORD PTR [b], %ECX ret llvm-svn: 23384	2005-09-18 06:02:59 +00:00
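For the top field `k`, no masking is needed at all. The low 17 bits of `x << 17` are known to be zero, which is the fact MaskedValueIsZero establishes, so the add leaves `i` and `j` untouched. Any carry out of `k` simply falls off the top of the word. The sketch below uses the same assumed layout (`k` in bits 17-31) and hypothetical names.

```c
#include <stdint.h>

#define K_SHIFT 17 /* k occupies bits 17..31, the top of the word */

/* Naive: extract k, add, then reinsert it above the low 17 bits. */
static uint32_t plus3_naive(uint32_t word, uint32_t x) {
    uint32_t k = word >> K_SHIFT;
    k += x;
    return (word & ((1u << K_SHIFT) - 1u)) | (k << K_SHIFT);
}

/* Transformed: a single add of the shifted operand (add DWORD PTR [b], EAX). */
static uint32_t plus3_folded(uint32_t word, uint32_t x) {
    return word + (x << K_SHIFT);
}
```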
Chris Lattner	4ebc8ab4e0	fix typo llvm-svn: 23383	2005-09-18 05:25:20 +00:00
Chris Lattner	e5b23a6d67	Remove unintentionally committed code llvm-svn: 23382	2005-09-18 05:12:51 +00:00
Chris Lattner	27cb9dbd35	implement shift.ll:test25. This compiles: struct S { unsigned int i : 6, j : 11, k : 15; } b; void plus3 (unsigned int x) { b.k += x; } to: _plus3: lis r2, ha16(L_b$non_lazy_ptr) lwz r2, lo16(L_b$non_lazy_ptr)(r2) lwz r3, 0(r2) rlwinm r4, r3, 0, 0, 14 add r4, r4, r3 rlwimi r4, r3, 0, 15, 31 stw r4, 0(r2) blr instead of: _plus3: lis r2, ha16(L_b$non_lazy_ptr) lwz r2, lo16(L_b$non_lazy_ptr)(r2) lwz r4, 0(r2) srwi r5, r4, 17 add r3, r5, r3 slwi r3, r3, 17 rlwimi r3, r4, 0, 15, 31 stw r3, 0(r2) blr llvm-svn: 23381	2005-09-18 05:12:10 +00:00
Chris Lattner	af517574ce	Implement add.ll:test29. Codegening: struct S { unsigned int i : 6, j : 11, k : 15; } b; void plus1 (unsigned int x) { b.i += x; } as: _plus1: lis r2, ha16(L_b$non_lazy_ptr) lwz r2, lo16(L_b$non_lazy_ptr)(r2) lwz r4, 0(r2) add r3, r4, r3 rlwimi r3, r4, 0, 0, 25 stw r3, 0(r2) blr instead of: _plus1: lis r2, ha16(L_b$non_lazy_ptr) lwz r2, lo16(L_b$non_lazy_ptr)(r2) lwz r4, 0(r2) rlwinm r5, r4, 0, 26, 31 add r3, r5, r3 rlwimi r3, r4, 0, 0, 25 stw r3, 0(r2) blr llvm-svn: 23379	2005-09-18 04:24:45 +00:00
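For the low field `i`, the mask applied before the add (the `rlwinm` in the old code) is redundant. The low 6 bits of a sum depend only on the low 6 bits of its operands, and the upper bits of the sum are discarded anyway. The sketch below assumes `i` occupies bits 0-5 and uses hypothetical names.

```c
#include <stdint.h>

#define I_MASK 0x3Fu /* bits 0..5: the 6-bit field i */

/* Naive: mask word down to i before adding. */
static uint32_t plus1_naive(uint32_t word, uint32_t x) {
    uint32_t sum = (word & I_MASK) + x;
    return (sum & I_MASK) | (word & ~I_MASK);
}

/* Transformed: add the full word; only the low 6 bits of the sum are kept. */
static uint32_t plus1_folded(uint32_t word, uint32_t x) {
    uint32_t sum = word + x;
    return (sum & I_MASK) | (word & ~I_MASK);
}
```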
Chris Lattner	027eaf01cf	remove debug output llvm-svn: 23377	2005-09-18 03:50:25 +00:00
Chris Lattner	1521298993	Implement or.ll:test21. This teaches instcombine to be able to turn this: struct { unsigned int bit0:1; unsigned int ubyte:31; } sdata; void foo() { sdata.ubyte++; } into this: foo: add DWORD PTR [sdata], 2 ret instead of this: foo: mov %EAX, DWORD PTR [sdata] mov %ECX, %EAX add %ECX, 2 and %ECX, -2 and %EAX, 1 or %EAX, %ECX mov DWORD PTR [sdata], %EAX ret llvm-svn: 23376	2005-09-18 03:42:07 +00:00
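The `sdata.ubyte++` case combines both observations. If `bit0` sits in bit 0 and `ubyte` in bits 1-31, which is assumed here with LSB-first allocation as on x86, then adding 1 to `ubyte` is the same as adding 2 to the whole word. Bit 0 is unaffected, and the carry out of the top field falls off. The names are hypothetical.

```c
#include <stdint.h>

/* Naive: extract ubyte, increment it, reinsert it above bit0. */
static uint32_t ubyte_inc_naive(uint32_t word) {
    uint32_t ubyte = word >> 1;
    ubyte += 1;
    return (word & 1u) | (ubyte << 1);
}

/* Transformed: add DWORD PTR [sdata], 2 */
static uint32_t ubyte_inc_folded(uint32_t word) {
    return word + 2u;
}
```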
Chris Lattner	a393e4d4b3	Fix the regression last night compiling povray llvm-svn: 23348	2005-09-14 17:32:56 +00:00
Chris Lattner	2a8932960d	Add a simple xform to simplify array accesses with casts in the way. This is useful for 178.galgel where resolution of dope vectors (by the optimizer) causes the scales to become apparent. llvm-svn: 23328	2005-09-13 18:36:04 +00:00
Chris Lattner	fd018c8dfe	Fix an issue where LSR would miss rewriting a use of an IV expression by a PHI node that is not the original PHI. This fixes up a dot-product loop in galgel, speeding it up from 18.47s to 16.13s. llvm-svn: 23327	2005-09-13 02:09:55 +00:00
Chris Lattner	567b81f0d2	Add a helper function, allowing us to simplify some code a bit, changing indentation, no functionality change llvm-svn: 23325	2005-09-13 00:40:14 +00:00
Chris Lattner	219175c84d	Implement a simple xform to turn code like this: if () { store A -> P; } else { store B -> P; } into a PHI node with one store, in the most trivial case. This implements load.ll:test10. llvm-svn: 23324	2005-09-12 23:23:25 +00:00
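In C terms, the transform merges two stores to the same pointer, one in each arm of an if/else, into a value selection followed by a single store. The selection is what the PHI node at the join point represents. This is a source-level sketch with hypothetical names, not the IR rewrite itself.

```c
/* Before: each arm stores to *p. */
static void store_in_arms(int cond, int a, int b, int *p) {
    if (cond)
        *p = a;
    else
        *p = b;
}

/* After: pick the value (the PHI), then store once. */
static void store_once(int cond, int a, int b, int *p) {
    int v = cond ? a : b;
    *p = v;
}
```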
Chris Lattner	e0bfdf1485	Another load-peephole optimization: do gcse when two loads are next to each other. This implements InstCombine/load.ll:test9 llvm-svn: 23322	2005-09-12 22:21:03 +00:00
Chris Lattner	b990f7d8ed	Implement a trivial form of store->load forwarding where the store and the load are exactly consecutive. This is picked up by other passes, but this triggers thousands of times in fortran programs that use static locals (and is thus a compile-time speedup). llvm-svn: 23320	2005-09-12 22:00:15 +00:00
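The trivial form of store-to-load forwarding handles a load that immediately follows a store to the same address. Such a load must read the value just stored, so it can become a plain copy. This toy works on a made-up three-opcode instruction list in which addresses and values are register numbers. It treats "same address" as "same pointer register", and it ignores volatility and types, which a real pass must check. Every name here is hypothetical.

```c
#include <stddef.h>

typedef enum { OP_STORE, OP_LOAD, OP_COPY } Opcode;

typedef struct {
    Opcode op;
    int ptr; /* address register (STORE, LOAD) */
    int val; /* stored register (STORE) or copy source (COPY) */
    int dst; /* destination register (LOAD, COPY) */
} Inst;

/* Rewrite "store v -> p; load p" so that the load becomes "copy v".
   Returns the number of loads rewritten. */
static size_t forward_adjacent_stores(Inst *code, size_t n) {
    size_t rewrites = 0;
    for (size_t i = 1; i < n; ++i) {
        const Inst *prev = &code[i - 1];
        Inst *cur = &code[i];
        if (prev->op == OP_STORE && cur->op == OP_LOAD && prev->ptr == cur->ptr) {
            cur->op = OP_COPY;
            cur->val = prev->val;
            ++rewrites;
        }
    }
    return rewrites;
}
```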
Chris Lattner	8048b85e8f	Fix a regression from last night, which caused this pass to create invalid code for IV uses outside of loops that are not dominated by the latch block. We should only convert these uses to use the post-inc value if they ARE dominated by the latch block. Also use a new LoopInfo method to simplify some code. This fixes Transforms/LoopStrengthReduce/2005-09-12-UsesOutOutsideOfLoop.ll llvm-svn: 23318	2005-09-12 17:11:27 +00:00
Chris Lattner	a67648396a	_test: li r2, 0 LBB_test_1: ; no_exit.2 li r5, 0 stw r5, 0(r3) addi r2, r2, 1 addi r3, r3, 4 cmpwi cr0, r2, 701 blt cr0, LBB_test_1 ; no_exit.2 LBB_test_2: ; loopexit.2.loopexit addi r2, r2, 1 stw r2, 0(r4) blr [zion ~/llvm]$ cat > ~/xx Uses of IV's outside of the loop should use the post-incremented version of the IV, not the preincremented version. This helps many loops (e.g. in sixtrack) which used to generate code like this (this is the code from the dont-hoist-simple-loop-constants.ll testcase): _test: li r2, 0 ** IV starts at 0 LBB_test_1: ; no_exit.2 or r5, r2, r2 Copy for loop exit li r2, 0 stw r2, 0(r3) addi r3, r3, 4 addi r2, r5, 1 addi r6, r5, 2 IV+2 cmpwi cr0, r6, 701 blt cr0, LBB_test_1 ; no_exit.2 LBB_test_2: ; loopexit.2.loopexit addi r2, r5, 2 IV+2 stw r2, 0(r4) blr And now generates code like this: _test: li r2, 1 * IV starts at 1 LBB_test_1: ; no_exit.2 li r5, 0 stw r5, 0(r3) addi r2, r2, 1 addi r3, r3, 4 cmpwi cr0, r2, 701 * IV.postinc + 0 blt cr0, LBB_test_1 LBB_test_2: ; loopexit.2.loopexit stw r2, 0(r4) * IV.postinc + 0 blr llvm-svn: 23313	2005-09-12 06:04:47 +00:00
Chris Lattner	530fe6ab30	implement Transforms/LoopStrengthReduce/dont-hoist-simple-loop-constants.ll. We used to emit this code for it: _test: li r2, 1 ;; Value tying up a register for the whole loop li r5, 0 LBB_test_1: ; no_exit.2 or r6, r5, r5 li r5, 0 stw r5, 0(r3) addi r5, r6, 1 addi r3, r3, 4 add r7, r2, r5 ;; should be addi r7, r5, 1 cmpwi cr0, r7, 701 blt cr0, LBB_test_1 ; no_exit.2 LBB_test_2: ; loopexit.2.loopexit addi r2, r6, 2 stw r2, 0(r4) blr now we emit this: _test: li r2, 0 LBB_test_1: ; no_exit.2 or r5, r2, r2 li r2, 0 stw r2, 0(r3) addi r3, r3, 4 addi r2, r5, 1 addi r6, r5, 2 ;; whoa, fold those adds! cmpwi cr0, r6, 701 blt cr0, LBB_test_1 ; no_exit.2 LBB_test_2: ; loopexit.2.loopexit addi r2, r5, 2 stw r2, 0(r4) blr more improvement coming. llvm-svn: 23306	2005-09-10 01:18:45 +00:00
Chris Lattner	b5e381a8cf	Fix a problem that Dan Berlin noticed, where reassociation would not succeed in building maximal expressions before simplifying them. In particular, in cases like this: X-(A+B+X) the code would consider A+B+X to be a maximal expression (not understanding that the single use '-' would be turned into a + later), simplify it (a noop) then later get simplified again. Each of these simplify steps is where the cost of reassociation comes from, so this patch should speed up the already fast pass a bit. Thanks to Dan for noticing this! llvm-svn: 23214	2005-09-02 07:07:58 +00:00
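The point about maximal expressions can be seen in a toy linearizer. If the whole tree `X-(A+B+X)` is flattened at once, pushing the sign of the subtraction down into its right operand, the two `X` terms cancel and only `-A-B` remains. Simplifying the inner `A+B+X` on its own finds nothing to cancel. The `Expr` type and `linearize` are hypothetical and do not reflect the Reassociate pass's data structures.

```c
#include <stddef.h>

#define NUM_VARS 4 /* leaves are variables 0..NUM_VARS-1 */

typedef struct Expr {
    char op; /* 'v' for a variable leaf, otherwise '+' or '-' */
    int var; /* variable index when op == 'v' */
    const struct Expr *lhs, *rhs;
} Expr;

/* Accumulate one signed coefficient per variable over the whole tree;
   a '-' node negates the sign used for its right operand. */
static void linearize(const Expr *e, int sign, int coeff[NUM_VARS]) {
    if (e->op == 'v') {
        coeff[e->var] += sign;
        return;
    }
    linearize(e->lhs, sign, coeff);
    linearize(e->rhs, e->op == '-' ? -sign : sign, coeff);
}
```

With `X`, `A`, `B` as variables 0, 1, 2, linearizing `X-(A+B+X)` with sign +1 yields coefficients 0, -1, -1.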
Chris Lattner	9fe263aa75	Avoid creating garbage instructions, just move the old add instruction to where we need it when converting -(A+B+C) -> -A + -B + -C. llvm-svn: 23213	2005-09-02 06:38:04 +00:00
Chris Lattner	d1325da091	add some assertions and fix problems where reassociate could access the Ops vector out of range llvm-svn: 23211	2005-09-02 05:23:22 +00:00
Chris Lattner	8ca5b2a6d2	Fix Regression/Transforms/Reassociate/2005-08-24-Crash.ll llvm-svn: 23019	2005-08-24 17:55:32 +00:00
Chris Lattner	4201cd1bbc	Transform floor((double)FLT) -> (double)floorf(FLT), implementing Regression/Transforms/SimplifyLibCalls/floor.ll. This triggers 19 times in 177.mesa. llvm-svn: 23017	2005-08-24 17:22:17 +00:00
Chris Lattner	ea7dfd53d6	Fix Transforms/LoopStrengthReduce/2005-08-17-OutOfLoopVariant.ll, a crash on 177.mesa llvm-svn: 22843	2005-08-17 21:22:41 +00:00
Chris Lattner	2bf7cb5213	Use a new helper to split critical edges, making the code simpler. Do not claim to not change the CFG. We do change the cfg to split critical edges. This isn't causing us a problem now, but could likely do so in the future. llvm-svn: 22824	2005-08-17 06:35:16 +00:00
Chris Lattner	5cf983ee0f	Fix a bad case in gzip where we put lots of things in registers across the loop, because an IV-dependent value was used outside of the loop and didn't have immediate-folding capability llvm-svn: 22798	2005-08-16 00:38:11 +00:00
Chris Lattner	47d3ec3525	Ooops, don't forget to clear this. The real inner loop is now: .LBB_foo_3: ; no_exit.1 lfd f2, 0(r9) lfd f3, 8(r9) fmul f4, f1, f2 fmadd f4, f0, f3, f4 stfd f4, 8(r9) fmul f3, f1, f3 fmsub f2, f0, f2, f3 stfd f2, 0(r9) addi r9, r9, 16 addi r8, r8, 1 cmpw cr0, r8, r4 ble .LBB_foo_3 ; no_exit.1 llvm-svn: 22782	2005-08-13 07:42:01 +00:00
Chris Lattner	5949d49032	Recursively scan scev expressions for common subexpressions. This allows us to handle nested loops much better, for example, by being able to tell that these two expressions: {( 8 + ( 16 * ( 1 + %Tmp11 + %Tmp12)) + %c_),+,( 16 * %Tmp 12)}<loopentry.1> {(( 16 * ( 1 + %Tmp11 + %Tmp12)) + %c_),+,( 16 * %Tmp12)}<loopentry.1> Have the following common part that can be shared: {(( 16 * ( 1 + %Tmp11 + %Tmp12)) + %c_),+,( 16 * %Tmp12)}<loopentry.1> This allows us to codegen an important inner loop in 168.wupwise as: .LBB_foo_4: ; no_exit.1 lfd f2, 16(r9) fmul f3, f0, f2 fmul f2, f1, f2 fadd f4, f3, f2 stfd f4, 8(r9) fsub f2, f3, f2 stfd f2, 16(r9) addi r8, r8, 1 addi r9, r9, 16 cmpw cr0, r8, r4 ble .LBB_foo_4 ; no_exit.1 instead of: .LBB_foo_3: ; no_exit.1 lfdx f2, r6, r9 add r10, r6, r9 lfd f3, 8(r10) fmul f4, f1, f2 fmadd f4, f0, f3, f4 stfd f4, 8(r10) fmul f3, f1, f3 fmsub f2, f0, f2, f3 stfdx f2, r6, r9 addi r9, r9, 16 addi r8, r8, 1 cmpw cr0, r8, r4 ble .LBB_foo_3 ; no_exit.1 llvm-svn: 22781	2005-08-13 07:27:18 +00:00
Chris Lattner	89c1dfc733	Teach SplitCriticalEdge to update LoopInfo if it is alive. This fixes a problem in LoopStrengthReduction, where it would split critical edges then confuse itself with outdated loop information. llvm-svn: 22776	2005-08-13 01:38:43 +00:00
Chris Lattner	79396539d3	remove dead code. The exit block list is computed on demand, thus does not need to be updated. This code is a relic from when it did. llvm-svn: 22775	2005-08-13 01:30:36 +00:00
Chris Lattner	8447b49526	When splitting critical edges, make sure not to leave the new block in the middle of the loop. This turns a critical loop in gzip into this: .LBB_test_1: ; loopentry or r27, r28, r28 add r28, r3, r27 lhz r28, 3(r28) add r26, r4, r27 lhz r26, 3(r26) cmpw cr0, r28, r26 bne .LBB_test_8 ; loopentry.loopexit_crit_edge .LBB_test_2: ; shortcirc_next.0 add r28, r3, r27 lhz r28, 5(r28) add r26, r4, r27 lhz r26, 5(r26) cmpw cr0, r28, r26 bne .LBB_test_7 ; shortcirc_next.0.loopexit_crit_edge .LBB_test_3: ; shortcirc_next.1 add r28, r3, r27 lhz r28, 7(r28) add r26, r4, r27 lhz r26, 7(r26) cmpw cr0, r28, r26 bne .LBB_test_6 ; shortcirc_next.1.loopexit_crit_edge .LBB_test_4: ; shortcirc_next.2 add r28, r3, r27 lhz r26, 9(r28) add r28, r4, r27 lhz r25, 9(r28) addi r28, r27, 8 cmpw cr7, r26, r25 mfcr r26, 1 rlwinm r26, r26, 31, 31, 31 add r25, r8, r27 cmpw cr7, r25, r7 mfcr r25, 1 rlwinm r25, r25, 29, 31, 31 and. r26, r26, r25 bne .LBB_test_1 ; loopentry instead of this: .LBB_test_1: ; loopentry or r27, r28, r28 add r28, r3, r27 lhz r28, 3(r28) add r26, r4, r27 lhz r26, 3(r26) cmpw cr0, r28, r26 beq .LBB_test_3 ; shortcirc_next.0 .LBB_test_2: ; loopentry.loopexit_crit_edge add r2, r30, r27 add r8, r29, r27 b .LBB_test_9 ; loopexit .LBB_test_3: ; shortcirc_next.0 add r28, r3, r27 lhz r28, 5(r28) add r26, r4, r27 lhz r26, 5(r26) cmpw cr0, r28, r26 beq .LBB_test_5 ; shortcirc_next.1 .LBB_test_4: ; shortcirc_next.0.loopexit_crit_edge add r2, r11, r27 add r8, r12, r27 b .LBB_test_9 ; loopexit .LBB_test_5: ; shortcirc_next.1 add r28, r3, r27 lhz r28, 7(r28) add r26, r4, r27 lhz r26, 7(r26) cmpw cr0, r28, r26 beq .LBB_test_7 ; shortcirc_next.2 .LBB_test_6: ; shortcirc_next.1.loopexit_crit_edge add r2, r9, r27 add r8, r10, r27 b .LBB_test_9 ; loopexit .LBB_test_7: ; shortcirc_next.2 add r28, r3, r27 lhz r26, 9(r28) add r28, r4, r27 lhz r25, 9(r28) addi r28, r27, 8 cmpw cr7, r26, r25 mfcr r26, 1 rlwinm r26, r26, 31, 31, 31 add r25, r8, r27 cmpw cr7, r25, r7 mfcr r25, 1 rlwinm r25, r25, 29, 31, 31 and. r26, r26, r25 bne .LBB_test_1 ; loopentry Next up, improve the code for the loop. llvm-svn: 22769	2005-08-12 22:22:17 +00:00
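The `*_crit_edge` blocks in the listings above are blocks inserted on critical edges. A CFG edge u→v is critical when u has more than one successor and v has more than one predecessor. Code needed only along that edge cannot be placed in u or v without also running on other paths, so the edge gets its own new block. The sketch below shows the test on a small adjacency-matrix CFG. The representation and names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_BLOCKS 8

typedef struct {
    size_t num_blocks;
    bool edge[MAX_BLOCKS][MAX_BLOCKS]; /* edge[u][v]: branch from u to v */
} CFG;

static size_t num_succs(const CFG *g, size_t u) {
    size_t n = 0;
    for (size_t v = 0; v < g->num_blocks; ++v)
        n += g->edge[u][v];
    return n;
}

static size_t num_preds(const CFG *g, size_t v) {
    size_t n = 0;
    for (size_t u = 0; u < g->num_blocks; ++u)
        n += g->edge[u][v];
    return n;
}

/* u->v is critical if u branches to several places and v is reached from several. */
static bool is_critical_edge(const CFG *g, size_t u, size_t v) {
    return g->edge[u][v] && num_succs(g, u) > 1 && num_preds(g, v) > 1;
}
```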
Chris Lattner	4fec86d348	Fix a FIXME: if we are inserting code for a PHI argument, split the critical edge so that the code is not always executed for both operands. This prevents LSR from inserting code into loops whose exit blocks contain PHI uses of IV expressions (which are outside of loops). On gzip, for example, we turn this ugly code: .LBB_test_1: ; loopentry add r27, r3, r28 lhz r27, 3(r27) add r26, r4, r28 lhz r26, 3(r26) add r25, r30, r28 ;; Only live if exiting the loop add r24, r29, r28 ;; Only live if exiting the loop cmpw cr0, r27, r26 bne .LBB_test_5 ; loopexit into this: .LBB_test_1: ; loopentry or r27, r28, r28 add r28, r3, r27 lhz r28, 3(r28) add r26, r4, r27 lhz r26, 3(r26) cmpw cr0, r28, r26 beq .LBB_test_3 ; shortcirc_next.0 .LBB_test_2: ; loopentry.loopexit_crit_edge add r2, r30, r27 add r8, r29, r27 b .LBB_test_9 ; loopexit .LBB_test_2: ; shortcirc_next.0 ... blt .LBB_test_1 into this: .LBB_test_1: ; loopentry or r27, r28, r28 add r28, r3, r27 lhz r28, 3(r28) add r26, r4, r27 lhz r26, 3(r26) cmpw cr0, r28, r26 beq .LBB_test_3 ; shortcirc_next.0 .LBB_test_2: ; loopentry.loopexit_crit_edge add r2, r30, r27 add r8, r29, r27 b .LBB_t_3: ; shortcirc_next.0 .LBB_test_3: ; shortcirc_next.0 ... blt .LBB_test_1 Next step: get the block out of the loop so that the loop is all fall-throughs again. llvm-svn: 22766	2005-08-12 22:06:11 +00:00
Chris Lattner	b7ebe65c56	Change break critical edges to not remove, then insert, PHI node entries. Instead, just update the BB in-place. This is both faster, and it prevents split-critical-edges from shuffling the PHI argument list unnecessarily. llvm-svn: 22765	2005-08-12 21:58:07 +00:00
Chris Lattner	62df798919	remove some trickiness that broke yacr2 and some other programs last night llvm-svn: 22751	2005-08-10 17:15:20 +00:00
Chris Lattner	f83ce5faee	Make loop-simplify produce better loops by turning PHI nodes like X = phi [X, Y] into just Y. This often occurs when it separates loops that have collapsed loop headers. This implements LoopSimplify/phi-node-simplify.ll llvm-svn: 22746	2005-08-10 02:07:32 +00:00
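The `X = phi [X, Y]` rule generalizes as follows. A PHI whose incoming values are all either the PHI itself or one single other value `Y` always yields `Y`, because the self edges just carry its own value around the loop. The sketch below models values as non-negative integer ids. The function name is hypothetical.

```c
#include <stddef.h>

/* Returns the one value the PHI is equivalent to, or -1 if it merges
   two or more distinct values (or has no incoming value besides itself). */
static int phi_simplified_value(int phi_id, const int *incoming, size_t n) {
    int unique = -1;
    for (size_t i = 0; i < n; ++i) {
        if (incoming[i] == phi_id)
            continue; /* self-reference: contributes no new value */
        if (unique == -1)
            unique = incoming[i];
        else if (incoming[i] != unique)
            return -1;
    }
    return unique;
}
```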
Chris Lattner	677d85784a	Allow indvar simplify to canonicalize ANY affine IV, not just affine IVs with constant stride. This implements Transforms/IndVarsSimplify/variable-stride-ivs.ll llvm-svn: 22744	2005-08-10 01:12:06 +00:00
Chris Lattner	edff91a49a	Teach LSR to strength reduce IVs that have a loop-invariant but non-constant stride. For code like this: void foo(float *a, float *b, int n, int stride_a, int stride_b) { int i; for (i=0; i<n; i++) a[i*stride_a] = b[i*stride_b]; } we now emit: .LBB_foo2_2: ; no_exit lfs f0, 0(r4) stfs f0, 0(r3) addi r7, r7, 1 add r4, r2, r4 add r3, r6, r3 cmpw cr0, r7, r5 blt .LBB_foo2_2 ; no_exit instead of: .LBB_foo_2: ; no_exit mullw r8, r2, r7 ;; multiply! slwi r8, r8, 2 lfsx f0, r4, r8 mullw r8, r2, r6 ;; multiply! slwi r8, r8, 2 stfsx f0, r3, r8 addi r2, r2, 1 cmpw cr0, r2, r5 blt .LBB_foo_2 ; no_exit Loops with variable strides occur pretty often. For example, in SPECFP2K there are 317 variable strides in 177.mesa, 3 in 179.art, 14 in 188.ammp, 56 in 168.wupwise, 36 in 172.mgrid. Now we can allow indvars to turn functions written like this: void foo2(float *a, float *b, int n, int stride_a, int stride_b) { int i, ai = 0, bi = 0; for (i=0; i<n; i++) { a[ai] = b[bi]; ai += stride_a; bi += stride_b; } } into code like the above for better analysis. With this patch, they generate identical code. llvm-svn: 22740	2005-08-10 00:45:21 +00:00
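Strength reduction with a loop-invariant but non-constant stride replaces `i*stride` by a running sum that grows by `stride` each iteration. This is the same shape as the `foo2` function in the message and the two `add` instructions in the new loop. The C sketch below puts the two forms side by side. The function names are hypothetical, and the caller must keep all indices in bounds.

```c
/* Before: a multiply per access, as in foo. */
static void copy_strided_mul(float *a, const float *b, int n,
                             int stride_a, int stride_b) {
    for (int i = 0; i < n; i++)
        a[i * stride_a] = b[i * stride_b];
}

/* After: each index is a running sum bumped by its stride, as in foo2. */
static void copy_strided_add(float *a, const float *b, int n,
                             int stride_a, int stride_b) {
    int ai = 0, bi = 0;
    for (int i = 0; i < n; i++) {
        a[ai] = b[bi];
        ai += stride_a;
        bi += stride_b;
    }
}
```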
Chris Lattner	dde7dc525e	Fix Regression/Transforms/LoopStrengthReduce/phi_node_update_multiple_preds.ll by being more careful about updating PHI nodes llvm-svn: 22739	2005-08-10 00:35:32 +00:00
Chris Lattner	c6c4d99a21	Fix some 80 column violations. Once we compute the evolution for a GEP, tell SE about it. This allows users of the GEP to know it, if the users are not direct. This allows us to compile this testcase: void fbSolidFillmmx(int w, unsigned char *d) { while (w >= 64) { *(unsigned long long *) (d + 0) = 0; *(unsigned long long *) (d + 8) = 0; *(unsigned long long *) (d + 16) = 0; *(unsigned long long *) (d + 24) = 0; *(unsigned long long *) (d + 32) = 0; *(unsigned long long *) (d + 40) = 0; *(unsigned long long *) (d + 48) = 0; *(unsigned long long *) (d + 56) = 0; w -= 64; d += 64; } } into: .LBB_fbSolidFillmmx_2: ; no_exit li r2, 0 stw r2, 0(r4) stw r2, 4(r4) stw r2, 8(r4) stw r2, 12(r4) stw r2, 16(r4) stw r2, 20(r4) stw r2, 24(r4) stw r2, 28(r4) stw r2, 32(r4) stw r2, 36(r4) stw r2, 40(r4) stw r2, 44(r4) stw r2, 48(r4) stw r2, 52(r4) stw r2, 56(r4) stw r2, 60(r4) addi r4, r4, 64 addi r3, r3, -64 cmpwi cr0, r3, 63 bgt .LBB_fbSolidFillmmx_2 ; no_exit instead of: .LBB_fbSolidFillmmx_2: ; no_exit li r11, 0 stw r11, 0(r4) stw r11, 4(r4) stwx r11, r10, r4 add r12, r10, r4 stw r11, 4(r12) stwx r11, r9, r4 add r12, r9, r4 stw r11, 4(r12) stwx r11, r8, r4 add r12, r8, r4 stw r11, 4(r12) stwx r11, r7, r4 add r12, r7, r4 stw r11, 4(r12) stwx r11, r6, r4 add r12, r6, r4 stw r11, 4(r12) stwx r11, r5, r4 add r12, r5, r4 stw r11, 4(r12) stwx r11, r2, r4 add r12, r2, r4 stw r11, 4(r12) addi r4, r4, 64 addi r3, r3, -64 cmpwi cr0, r3, 63 bgt .LBB_fbSolidFillmmx_2 ; no_exit llvm-svn: 22737	2005-08-09 23:39:36 +00:00


2184 Commits