Commit Graph

82 Commits

Author SHA1 Message Date
Chris Lattner f365f5f0c1 Fix spello
llvm-svn: 27052
2006-03-24 07:14:34 +00:00
Chris Lattner 7d80b4f366 silence a bogus gcc warning
llvm-svn: 26953
2006-03-22 17:27:24 +00:00
Evan Cheng c28282bd87 - Fixed a bogus if condition.
- Added more debugging info.
- Allow reuse of IV of negative stride. e.g. -4 stride == 2 * iv of -2 stride.
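
As a rough C-level sketch of the idea (the function and variable names are
illustrative only, not from the patch): the IV stepping by -4 stays equal to
2 * the IV stepping by -2 on every iteration, so its uses can be driven off
the smaller-stride IV instead of keeping a second counter alive (array sizes
elided for brevity).

void fold_negative_strides(char *a, char *b, int n) {
  long i2 = 2L * (n - 1);   /* IV with stride -2 */
  long i4 = 4L * (n - 1);   /* IV with stride -4; always equal to 2 * i2 */
  for (int k = 0; k < n; ++k) {
    a[i2] = b[i4];          /* b[i4] can be rewritten as b[2 * i2] */
    i2 -= 2;
    i4 -= 4;
  }
}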

llvm-svn: 26841
2006-03-18 08:03:12 +00:00
Evan Cheng f09f0ebd48 Sort StrideOrder so we can process the smallest strides first. This allows
for more IV reuses.
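
For instance (a hand-written illustration, not code from the patch), a loop
with address strides 1, 2, and 4 benefits from visiting the stride-1 IV
first: the stride-2 and stride-4 uses can then both be rewritten in terms of
it, whereas visiting the largest stride first would find no smaller IV to
reuse.

void three_strides(char *a, char *b, char *c, int n) {
  for (int i = 0; i < n; ++i) {
    a[i]     = 0;   /* address stride 1 */
    b[2 * i] = 0;   /* address stride 2 == 2 * the stride-1 IV */
    c[4 * i] = 0;   /* address stride 4 == 4 * the stride-1 IV */
  }
}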

llvm-svn: 26837
2006-03-18 00:44:49 +00:00
Evan Cheng 4520698820 Allow users of an IV / stride to be rewritten with an expression that is a multiple
of a smaller stride, even if they have a common loop-invariant expression part.

llvm-svn: 26828
2006-03-17 19:52:23 +00:00
Evan Cheng 3df447d354 For each loop, keep track of all the IV expressions inserted, indexed by
stride. For a set of uses of an IV whose stride is a multiple
of another stride, do not insert a new IV expression. Rather, reuse the
previous IV and rewrite the uses as uses of that IV expression multiplied by
the factor.

e.g.
x = 0 ...; x ++
y = 0 ...; y += 4
then use of y can be rewritten as use of 4*x for x86.
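
A rough source-level picture of that rewrite (names are illustrative only):
the stride-4 counter disappears and its uses become a scaled form of the
stride-1 counter, which x86 can fold into a base + 4*index addressing mode.

/* Before: two counters. */
void before(char *p, int n) {
  int x = 0, y = 0;
  while (x < n) { p[y] = 0; x += 1; y += 4; }
}

/* After: y's uses rewritten as 4*x; only one IV remains. */
void after(char *p, int n) {
  int x = 0;
  while (x < n) { p[4 * x] = 0; x += 1; }
}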

llvm-svn: 26803
2006-03-16 21:53:05 +00:00
Evan Cheng c567c4efbb Added target lowering hooks which LSR consults to make more intelligent
transformation decisions.

llvm-svn: 26738
2006-03-13 23:14:23 +00:00
Chris Lattner d30c4991a1 Use SCEVExpander::InsertCastOfTo instead of our own code. This reduces
the amount of LLVM code and automatically CSEs cast instructions.

llvm-svn: 25974
2006-02-04 09:52:43 +00:00
Chris Lattner 2959f0003e Fix two significant bugs in LSR:
1. When rewriting code in outer loops, sometimes we would insert code into
   inner loops that is invariant in that loop.
2. Notice that 4*(2+x) is 8+4*x and use that to simplify expressions.

This is a performance neutral change.
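
Spelling out point 2 with a 4-byte element type (an illustrative example,
not from the patch): a[x + 2] addresses base + 4*(2 + x); rewriting that as
(base + 8) + 4*x separates the loop-invariant constant 8 from the scaled IV,
which is the form the simplification above recognizes.

void fill(int *a, int n) {
  for (int x = 0; x < n; ++x)
    a[x + 2] = 0;   /* address: base + 8 + 4*x after distributing the 4 */
}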

llvm-svn: 25964
2006-02-04 07:36:50 +00:00
Chris Lattner c597b8a55e Make iostream #inclusion explicit
llvm-svn: 25514
2006-01-22 23:32:06 +00:00
Chris Lattner cb36710ff9 Switch these to using ETForest instead of DominatorSet to compute itself.
Patch written by Daniel Berlin!

llvm-svn: 25202
2006-01-11 05:10:20 +00:00
Chris Lattner 077200737c getRawValue zero extends unsigned values; use getSExtValue so that we
know that small negative values fit into the immediate field of addressing
modes.
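
A standalone C illustration (not the LLVM API) of why this matters for a
small negative offset such as -4:

#include <stdio.h>

int main(void) {
  int offset = -4;
  unsigned long long raw  = (unsigned int)offset;  /* zero-extended: 4294967292 */
  long long          sext = offset;                /* sign-extended: -4 */
  /* 4294967292 never fits a signed 16-bit displacement field; -4 does. */
  printf("%llu %lld\n", raw, sext);
  return 0;
}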

llvm-svn: 24608
2005-12-05 18:23:57 +00:00
Chris Lattner 5df0e36e98 My previous patch was too conservative. Reject FP and void types, but do
allow pointer types.

llvm-svn: 23859
2005-10-21 05:45:41 +00:00
Chris Lattner 0c0b38bb4c Do NOT touch FP ops with LSR. This fixes a testcase Nate sent me from an
inner loop like this:

LBB_RateConvertMono8AltiVec_2:  ; no_exit
        lis r2, ha16(.CPI_RateConvertMono8AltiVec_0)
        lfs f3, lo16(.CPI_RateConvertMono8AltiVec_0)(r2)
        fmr f3, f3
        fadd f0, f2, f0
        fadd f3, f0, f3
        fcmpu cr0, f3, f1
        bge cr0, LBB_RateConvertMono8AltiVec_2  ; no_exit

to an inner loop like this:

LBB_RateConvertMono8AltiVec_1:  ; no_exit
        fsub f2, f2, f1
        fcmpu cr0, f2, f1
        fmr f0, f2
        bge cr0, LBB_RateConvertMono8AltiVec_1  ; no_exit

Doh! good catch!

llvm-svn: 23838
2005-10-20 04:47:10 +00:00
Chris Lattner 192cd18f53 Fix (hopefully the last) issue where LSR is nondeterministic. When pulling
out CSEs of base expressions it could build a result whose order was
nondeterministic.

llvm-svn: 23698
2005-10-11 18:41:04 +00:00
Chris Lattner 5c9d63da31 Fix another problem where LSR was being nondeterministic. Also remove elements
from the end of a vector instead of the beginning.

llvm-svn: 23697
2005-10-11 18:30:57 +00:00
Chris Lattner b7a3894e7c Fix another lsr-is-nondeterministic case
llvm-svn: 23695
2005-10-11 18:17:57 +00:00
Chris Lattner eb4be8b942 Hrm, you didn't see this.
llvm-svn: 23673
2005-10-09 06:24:02 +00:00
Chris Lattner 4ea0a3eaac Fix a source of non-determinism in the backend: the order of processing
IV strides depended on the pointer order of the strides in memory.
Non-determinism is bad.
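
The general shape of this kind of fix, as a generic illustration in C (not
the actual pass code): never let processing order depend on where objects
happen to live in memory; impose an explicit, value-based order instead.

#include <stdlib.h>

static int by_value(const void *a, const void *b) {
  long x = *(const long *)a, y = *(const long *)b;
  return (x > y) - (x < y);
}

/* Hypothetical helper: sorting by the strides' numeric values gives the
   same order on every run, unlike walking a pointer-keyed container. */
void process_strides(long *strides, size_t n) {
  qsort(strides, n, sizeof *strides, by_value);
  for (size_t i = 0; i < n; ++i) {
    /* ... strength-reduce the uses of strides[i] ... */
  }
}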

llvm-svn: 23672
2005-10-09 06:20:55 +00:00
Chris Lattner f07a587c79 Make IVUseShouldUsePostIncValue more aggressive when the use is a PHI. In
particular, it should realize that PHIs use their values in the predecessor
block, not the PHI block itself.  This change turns our em3d loop from this:

_test:
        cmpwi cr0, r4, 0
        bgt cr0, LBB_test_2     ; entry.no_exit_crit_edge
LBB_test_1:     ; entry.loopexit_crit_edge
        li r2, 0
        b LBB_test_6    ; loopexit
LBB_test_2:     ; entry.no_exit_crit_edge
        li r6, 0
LBB_test_3:     ; no_exit
        or r2, r6, r6
        lwz r6, 0(r3)
        cmpw cr0, r6, r5
        beq cr0, LBB_test_6     ; loopexit
LBB_test_4:     ; endif
        addi r3, r3, 4
        addi r6, r2, 1
        cmpw cr0, r6, r4
        blt cr0, LBB_test_3     ; no_exit
LBB_test_5:     ; endif.loopexit.loopexit_crit_edge
        addi r3, r2, 1
        blr
LBB_test_6:     ; loopexit
        or r3, r2, r2
        blr

into:

_test:
        cmpwi cr0, r4, 0
        bgt cr0, LBB_test_2     ; entry.no_exit_crit_edge
LBB_test_1:     ; entry.loopexit_crit_edge
        li r2, 0
        b LBB_test_5    ; loopexit
LBB_test_2:     ; entry.no_exit_crit_edge
        li r6, 0
LBB_test_3:     ; no_exit
        lwz r2, 0(r3)
        cmpw cr0, r2, r5
        or r2, r6, r6
        beq cr0, LBB_test_5     ; loopexit
LBB_test_4:     ; endif
        addi r3, r3, 4
        addi r6, r6, 1
        cmpw cr0, r6, r4
        or r2, r6, r6
        blt cr0, LBB_test_3     ; no_exit
LBB_test_5:     ; loopexit
        or r3, r2, r2
        blr


Unfortunately, this is actually worse code, because the register coalescer
is getting confused somehow.  If it were doing its job right, it could turn the
code into this:

_test:
        cmpwi cr0, r4, 0
        bgt cr0, LBB_test_2     ; entry.no_exit_crit_edge
LBB_test_1:     ; entry.loopexit_crit_edge
        li r6, 0
        b LBB_test_5    ; loopexit
LBB_test_2:     ; entry.no_exit_crit_edge
        li r6, 0
LBB_test_3:     ; no_exit
        lwz r2, 0(r3)
        cmpw cr0, r2, r5
        beq cr0, LBB_test_5     ; loopexit
LBB_test_4:     ; endif
        addi r3, r3, 4
        addi r6, r6, 1
        cmpw cr0, r6, r4
        blt cr0, LBB_test_3     ; no_exit
LBB_test_5:     ; loopexit
        or r3, r6, r6
        blr

... which I'll work on next. :)

llvm-svn: 23604
2005-10-03 02:50:05 +00:00
Chris Lattner e4ed42a426 Refactor some code into a function
llvm-svn: 23603
2005-10-03 01:04:44 +00:00
Chris Lattner 360928dbed This break is bogus and I have no idea why it was there. Basically it prevents
memoizing code when IVs are used by PHI nodes outside of loops.  In a simple
example, we were getting this code before (note that r6 and r7 are isomorphic
IVs):

        li r6, 0
        or r7, r6, r6
LBB_test_3:     ; no_exit
        lwz r2, 0(r3)
        cmpw cr0, r2, r5
        or r2, r7, r7
        beq cr0, LBB_test_5     ; loopexit
LBB_test_4:     ; endif
        addi r2, r7, 1
        addi r7, r7, 1
        addi r3, r3, 4
        addi r6, r6, 1
        cmpw cr0, r6, r4
        blt cr0, LBB_test_3     ; no_exit

Now we get:

        li r6, 0
LBB_test_3:     ; no_exit
        or r2, r6, r6
        lwz r6, 0(r3)
        cmpw cr0, r6, r5
        beq cr0, LBB_test_6     ; loopexit
LBB_test_4:     ; endif
        addi r3, r3, 4
        addi r6, r2, 1
        cmpw cr0, r6, r4
        blt cr0, LBB_test_3     ; no_exit

this was noticed in em3d.

llvm-svn: 23602
2005-10-03 00:37:33 +00:00
Chris Lattner 8fcce170cf When checking whether we should move a split-edge block outside of a loop,
check the pre-split predecessor, not the post-split predecessor.  This was
causing us to make the wrong decision in some cases, leaving the critical
edge block in the loop.

llvm-svn: 23601
2005-10-03 00:31:52 +00:00
Chris Lattner 92233d2175 Make the pass name simpler
llvm-svn: 23476
2005-09-27 21:10:32 +00:00
Chris Lattner fd018c8dfe Fix an issue where LSR would miss rewriting a use of an IV expression by a PHI node that is not the original PHI.
This fixes up a dot-product loop in galgel, speeding it up from 18.47s to
16.13s.

llvm-svn: 23327
2005-09-13 02:09:55 +00:00
Chris Lattner 8048b85e8f Fix a regression from last night, which caused this pass to create invalid
code for IV uses outside of loops that are not dominated by the latch block.
We should only convert these uses to use the post-inc value if they ARE
dominated by the latch block.

Also use a new LoopInfo method to simplify some code.

This fixes Transforms/LoopStrengthReduce/2005-09-12-UsesOutOutsideOfLoop.ll

llvm-svn: 23318
2005-09-12 17:11:27 +00:00
Chris Lattner a67648396a Uses of IVs outside of the loop should use the post-incremented version
of the IV, not the preincremented version.  This helps many loops (e.g. in sixtrack)
which used to generate code like this (this is the code from the
dont-hoist-simple-loop-constants.ll testcase):

_test:
        li r2, 0                 **** IV starts at 0
LBB_test_1:     ; no_exit.2
        or r5, r2, r2            **** Copy for loop exit
        li r2, 0
        stw r2, 0(r3)
        addi r3, r3, 4
        addi r2, r5, 1
        addi r6, r5, 2           **** IV+2
        cmpwi cr0, r6, 701
        blt cr0, LBB_test_1     ; no_exit.2
LBB_test_2:     ; loopexit.2.loopexit
        addi r2, r5, 2       ****  IV+2
        stw r2, 0(r4)
        blr

And now generated code like this:

_test:
        li r2, 1               *** IV starts at 1
LBB_test_1:     ; no_exit.2
        li r5, 0
        stw r5, 0(r3)
        addi r2, r2, 1
        addi r3, r3, 4
        cmpwi cr0, r2, 701     *** IV.postinc + 0
        blt cr0, LBB_test_1
LBB_test_2:     ; loopexit.2.loopexit
        stw r2, 0(r4)          *** IV.postinc + 0
        blr

llvm-svn: 23313
2005-09-12 06:04:47 +00:00
Chris Lattner 530fe6ab30 implement Transforms/LoopStrengthReduce/dont-hoist-simple-loop-constants.ll.
We used to emit this code for it:

_test:
        li r2, 1     ;; Value tying up a register for the whole loop
        li r5, 0
LBB_test_1:     ; no_exit.2
        or r6, r5, r5
        li r5, 0
        stw r5, 0(r3)
        addi r5, r6, 1
        addi r3, r3, 4
        add r7, r2, r5  ;; should be addi r7, r5, 1
        cmpwi cr0, r7, 701
        blt cr0, LBB_test_1     ; no_exit.2
LBB_test_2:     ; loopexit.2.loopexit
        addi r2, r6, 2
        stw r2, 0(r4)
        blr

now we emit this:

_test:
        li r2, 0
LBB_test_1:     ; no_exit.2
        or r5, r2, r2
        li r2, 0
        stw r2, 0(r3)
        addi r3, r3, 4
        addi r2, r5, 1
        addi r6, r5, 2   ;; whoa, fold those adds!
        cmpwi cr0, r6, 701
        blt cr0, LBB_test_1     ; no_exit.2
LBB_test_2:     ; loopexit.2.loopexit
        addi r2, r5, 2
        stw r2, 0(r4)
        blr

more improvement coming.

llvm-svn: 23306
2005-09-10 01:18:45 +00:00
Chris Lattner ea7dfd53d6 Fix Transforms/LoopStrengthReduce/2005-08-17-OutOfLoopVariant.ll, a crash
on 177.mesa

llvm-svn: 22843
2005-08-17 21:22:41 +00:00
Chris Lattner 2bf7cb5213 Use a new helper to split critical edges, making the code simpler.
Do not claim to not change the CFG.  We do change the CFG to split critical
edges.  This isn't causing us a problem now, but could likely do so in the
future.

llvm-svn: 22824
2005-08-17 06:35:16 +00:00
Chris Lattner 5cf983ee0f Fix a bad case in gzip where we put lots of things in registers across the
loop, because an IV-dependent value was used outside of the loop and didn't
have immediate-folding capability.

llvm-svn: 22798
2005-08-16 00:38:11 +00:00
Chris Lattner 47d3ec3525 Ooops, don't forget to clear this. The real inner loop is now:
.LBB_foo_3:     ; no_exit.1
        lfd f2, 0(r9)
        lfd f3, 8(r9)
        fmul f4, f1, f2
        fmadd f4, f0, f3, f4
        stfd f4, 8(r9)
        fmul f3, f1, f3
        fmsub f2, f0, f2, f3
        stfd f2, 0(r9)
        addi r9, r9, 16
        addi r8, r8, 1
        cmpw cr0, r8, r4
        ble .LBB_foo_3  ; no_exit.1

llvm-svn: 22782
2005-08-13 07:42:01 +00:00
Chris Lattner 5949d49032 Recursively scan scev expressions for common subexpressions. This allows us
to handle nested loops much better, for example, by being able to tell that
these two expressions:

{( 8 + ( 16 * ( 1 +  %Tmp11 +  %Tmp12)) +  %c_),+,( 16 *  %Tmp12)}<loopentry.1>

{(( 16 * ( 1 +  %Tmp11 +  %Tmp12)) +  %c_),+,( 16 *  %Tmp12)}<loopentry.1>

Have the following common part that can be shared:
{(( 16 * ( 1 +  %Tmp11 +  %Tmp12)) +  %c_),+,( 16 *  %Tmp12)}<loopentry.1>

This allows us to codegen an important inner loop in 168.wupwise as:

.LBB_foo_4:     ; no_exit.1
        lfd f2, 16(r9)
        fmul f3, f0, f2
        fmul f2, f1, f2
        fadd f4, f3, f2
        stfd f4, 8(r9)
        fsub f2, f3, f2
        stfd f2, 16(r9)
        addi r8, r8, 1
        addi r9, r9, 16
        cmpw cr0, r8, r4
        ble .LBB_foo_4  ; no_exit.1

instead of:

.LBB_foo_3:     ; no_exit.1
        lfdx f2, r6, r9
        add r10, r6, r9
        lfd f3, 8(r10)
        fmul f4, f1, f2
        fmadd f4, f0, f3, f4
        stfd f4, 8(r10)
        fmul f3, f1, f3
        fmsub f2, f0, f2, f3
        stfdx f2, r6, r9
        addi r9, r9, 16
        addi r8, r8, 1
        cmpw cr0, r8, r4
        ble .LBB_foo_3  ; no_exit.1

llvm-svn: 22781
2005-08-13 07:27:18 +00:00
Chris Lattner 8447b49526 When splitting critical edges, make sure not to leave the new block in the
middle of the loop.  This turns a critical loop in gzip into this:

.LBB_test_1:    ; loopentry
        or r27, r28, r28
        add r28, r3, r27
        lhz r28, 3(r28)
        add r26, r4, r27
        lhz r26, 3(r26)
        cmpw cr0, r28, r26
        bne .LBB_test_8 ; loopentry.loopexit_crit_edge
.LBB_test_2:    ; shortcirc_next.0
        add r28, r3, r27
        lhz r28, 5(r28)
        add r26, r4, r27
        lhz r26, 5(r26)
        cmpw cr0, r28, r26
        bne .LBB_test_7 ; shortcirc_next.0.loopexit_crit_edge
.LBB_test_3:    ; shortcirc_next.1
        add r28, r3, r27
        lhz r28, 7(r28)
        add r26, r4, r27
        lhz r26, 7(r26)
        cmpw cr0, r28, r26
        bne .LBB_test_6 ; shortcirc_next.1.loopexit_crit_edge
.LBB_test_4:    ; shortcirc_next.2
        add r28, r3, r27
        lhz r26, 9(r28)
        add r28, r4, r27
        lhz r25, 9(r28)
        addi r28, r27, 8
        cmpw cr7, r26, r25
        mfcr r26, 1
        rlwinm r26, r26, 31, 31, 31
        add r25, r8, r27
        cmpw cr7, r25, r7
        mfcr r25, 1
        rlwinm r25, r25, 29, 31, 31
        and. r26, r26, r25
        bne .LBB_test_1 ; loopentry

instead of this:

.LBB_test_1:    ; loopentry
        or r27, r28, r28
        add r28, r3, r27
        lhz r28, 3(r28)
        add r26, r4, r27
        lhz r26, 3(r26)
        cmpw cr0, r28, r26
        beq .LBB_test_3 ; shortcirc_next.0
.LBB_test_2:    ; loopentry.loopexit_crit_edge
        add r2, r30, r27
        add r8, r29, r27
        b .LBB_test_9   ; loopexit
.LBB_test_3:    ; shortcirc_next.0
        add r28, r3, r27
        lhz r28, 5(r28)
        add r26, r4, r27
        lhz r26, 5(r26)
        cmpw cr0, r28, r26
        beq .LBB_test_5 ; shortcirc_next.1
.LBB_test_4:    ; shortcirc_next.0.loopexit_crit_edge
        add r2, r11, r27
        add r8, r12, r27
        b .LBB_test_9   ; loopexit
.LBB_test_5:    ; shortcirc_next.1
        add r28, r3, r27
        lhz r28, 7(r28)
        add r26, r4, r27
        lhz r26, 7(r26)
        cmpw cr0, r28, r26
        beq .LBB_test_7 ; shortcirc_next.2
.LBB_test_6:    ; shortcirc_next.1.loopexit_crit_edge
        add r2, r9, r27
        add r8, r10, r27
        b .LBB_test_9   ; loopexit
.LBB_test_7:    ; shortcirc_next.2
        add r28, r3, r27
        lhz r26, 9(r28)
        add r28, r4, r27
        lhz r25, 9(r28)
        addi r28, r27, 8
        cmpw cr7, r26, r25
        mfcr r26, 1
        rlwinm r26, r26, 31, 31, 31
        add r25, r8, r27
        cmpw cr7, r25, r7
        mfcr r25, 1
        rlwinm r25, r25, 29, 31, 31
        and. r26, r26, r25
        bne .LBB_test_1 ; loopentry

Next up, improve the code for the loop.

llvm-svn: 22769
2005-08-12 22:22:17 +00:00
Chris Lattner 4fec86d348 Fix a FIXME: if we are inserting code for a PHI argument, split the critical
edge so that the code is not always executed for both operands.  This
prevents LSR from inserting code into loops whose exit blocks contain
PHI uses of IV expressions (which are outside of loops).  On gzip, for
example, we turn this ugly code:

.LBB_test_1:    ; loopentry
        add r27, r3, r28
        lhz r27, 3(r27)
        add r26, r4, r28
        lhz r26, 3(r26)
        add r25, r30, r28    ;; Only live if exiting the loop
        add r24, r29, r28    ;; Only live if exiting the loop
        cmpw cr0, r27, r26
        bne .LBB_test_5 ; loopexit

into this:

.LBB_test_1:    ; loopentry
        or r27, r28, r28
        add r28, r3, r27
        lhz r28, 3(r28)
        add r26, r4, r27
        lhz r26, 3(r26)
        cmpw cr0, r28, r26
        beq .LBB_test_3 ; shortcirc_next.0
.LBB_test_2:    ; loopentry.loopexit_crit_edge
        add r2, r30, r27
        add r8, r29, r27
        b .LBB_test_9   ; loopexit
.LBB_test_2:    ; shortcirc_next.0
        ...
        blt .LBB_test_1


into this:

.LBB_test_1:    ; loopentry
        or r27, r28, r28
        add r28, r3, r27
        lhz r28, 3(r28)
        add r26, r4, r27
        lhz r26, 3(r26)
        cmpw cr0, r28, r26
        beq .LBB_test_3 ; shortcirc_next.0
.LBB_test_2:    ; loopentry.loopexit_crit_edge
        add r2, r30, r27
        add r8, r29, r27
        b .LBB_test_3   ; shortcirc_next.0
.LBB_test_3:    ; shortcirc_next.0
        ...
        blt .LBB_test_1


Next step: get the block out of the loop so that the loop is all
fall-throughs again.

llvm-svn: 22766
2005-08-12 22:06:11 +00:00
Chris Lattner edff91a49a Teach LSR to strength reduce IVs that have a loop-invariant but non-constant stride.
For code like this:

void foo(float *a, float *b, int n, int stride_a, int stride_b) {
  int i;
  for (i=0; i<n; i++)
      a[i*stride_a] = b[i*stride_b];
}

we now emit:

.LBB_foo2_2:    ; no_exit
        lfs f0, 0(r4)
        stfs f0, 0(r3)
        addi r7, r7, 1
        add r4, r2, r4
        add r3, r6, r3
        cmpw cr0, r7, r5
        blt .LBB_foo2_2 ; no_exit

instead of:

.LBB_foo_2:     ; no_exit
        mullw r8, r2, r7     ;; multiply!
        slwi r8, r8, 2
        lfsx f0, r4, r8
        mullw r8, r2, r6     ;; multiply!
        slwi r8, r8, 2
        stfsx f0, r3, r8
        addi r2, r2, 1
        cmpw cr0, r2, r5
        blt .LBB_foo_2  ; no_exit

Loops with variable strides occur pretty often.  For example, in SPECFP2K
there are 317 variable strides in 177.mesa, 3 in 179.art, 14 in 188.ammp,
56 in 168.wupwise, 36 in 172.mgrid.

Now we can allow indvars to turn functions written like this:

void foo2(float *a, float *b, int n, int stride_a, int stride_b) {
  int i, ai = 0, bi = 0;
  for (i=0; i<n; i++)
    {
      a[ai] = b[bi];
      ai += stride_a;
      bi += stride_b;
    }
}

into code like the above for better analysis.  With this patch, they generate
identical code.

llvm-svn: 22740
2005-08-10 00:45:21 +00:00
Chris Lattner dde7dc525e Fix Regression/Transforms/LoopStrengthReduce/phi_node_update_multiple_preds.ll
by being more careful about updating PHI nodes

llvm-svn: 22739
2005-08-10 00:35:32 +00:00
Chris Lattner c6c4d99a21 Fix some 80 column violations.
Once we compute the evolution for a GEP, tell SE about it.  This allows users
of the GEP to know it, if the users are not direct.  This allows us to compile
this testcase:

void fbSolidFillmmx(int w, unsigned char *d) {
    while (w >= 64) {
        *(unsigned long long *) (d +  0) = 0;
        *(unsigned long long *) (d +  8) = 0;
        *(unsigned long long *) (d + 16) = 0;
        *(unsigned long long *) (d + 24) = 0;
        *(unsigned long long *) (d + 32) = 0;
        *(unsigned long long *) (d + 40) = 0;
        *(unsigned long long *) (d + 48) = 0;
        *(unsigned long long *) (d + 56) = 0;
        w -= 64;
        d += 64;
    }
}

into:

.LBB_fbSolidFillmmx_2:  ; no_exit
        li r2, 0
        stw r2, 0(r4)
        stw r2, 4(r4)
        stw r2, 8(r4)
        stw r2, 12(r4)
        stw r2, 16(r4)
        stw r2, 20(r4)
        stw r2, 24(r4)
        stw r2, 28(r4)
        stw r2, 32(r4)
        stw r2, 36(r4)
        stw r2, 40(r4)
        stw r2, 44(r4)
        stw r2, 48(r4)
        stw r2, 52(r4)
        stw r2, 56(r4)
        stw r2, 60(r4)
        addi r4, r4, 64
        addi r3, r3, -64
        cmpwi cr0, r3, 63
        bgt .LBB_fbSolidFillmmx_2       ; no_exit

instead of:

.LBB_fbSolidFillmmx_2:  ; no_exit
        li r11, 0
        stw r11, 0(r4)
        stw r11, 4(r4)
        stwx r11, r10, r4
        add r12, r10, r4
        stw r11, 4(r12)
        stwx r11, r9, r4
        add r12, r9, r4
        stw r11, 4(r12)
        stwx r11, r8, r4
        add r12, r8, r4
        stw r11, 4(r12)
        stwx r11, r7, r4
        add r12, r7, r4
        stw r11, 4(r12)
        stwx r11, r6, r4
        add r12, r6, r4
        stw r11, 4(r12)
        stwx r11, r5, r4
        add r12, r5, r4
        stw r11, 4(r12)
        stwx r11, r2, r4
        add r12, r2, r4
        stw r11, 4(r12)
        addi r4, r4, 64
        addi r3, r3, -64
        cmpwi cr0, r3, 63
        bgt .LBB_fbSolidFillmmx_2       ; no_exit

llvm-svn: 22737
2005-08-09 23:39:36 +00:00
Chris Lattner 02742710f3 SCEVAddExpr::get() of an empty list is invalid.
llvm-svn: 22724
2005-08-09 01:13:47 +00:00
Chris Lattner a091ff1764 Implement: LoopStrengthReduce/share_ivs.ll
Two changes:
  * Only insert one PHI node for each stride.  Other values are live-in
    values.  This cannot introduce higher register pressure than the
    previous approach, and can take advantage of reg+reg addressing modes.
  * Factor common base values out of uses before moving values from the
    base to the immediate fields.  This improves codegen by starting the
    stride-specific PHI node out at a common place for each IV use.

As an example, we used to generate this for a loop in swim:

.LBB_main_no_exit_2E_6_2E_i_no_exit_2E_7_2E_i_2:        ; no_exit.7.i
        lfd f0, 0(r8)
        stfd f0, 0(r3)
        lfd f0, 0(r6)
        stfd f0, 0(r7)
        lfd f0, 0(r2)
        stfd f0, 0(r5)
        addi r9, r9, 1
        addi r2, r2, 8
        addi r5, r5, 8
        addi r6, r6, 8
        addi r7, r7, 8
        addi r8, r8, 8
        addi r3, r3, 8
        cmpw cr0, r9, r4
        bgt .LBB_main_no_exit_2E_6_2E_i_no_exit_2E_7_2E_i_1

now we emit:

.LBB_main_no_exit_2E_6_2E_i_no_exit_2E_7_2E_i_2:        ; no_exit.7.i
        lfdx f0, r8, r2
        stfdx f0, r9, r2
        lfdx f0, r5, r2
        stfdx f0, r7, r2
        lfdx f0, r3, r2
        stfdx f0, r6, r2
        addi r10, r10, 1
        addi r2, r2, 8
        cmpw cr0, r10, r4
        bgt .LBB_main_no_exit_2E_6_2E_i_no_exit_2E_7_2E_i_1

As another more dramatic example, we used to emit this:

.LBB_main_L_90_no_exit_2E_0_2E_i16_no_exit_2E_1_2E_i19_2:       ; no_exit.1.i19
        lfd f0, 8(r21)
        lfd f4, 8(r3)
        lfd f5, 8(r27)
        lfd f6, 8(r22)
        lfd f7, 8(r5)
        lfd f8, 8(r6)
        lfd f9, 8(r30)
        lfd f10, 8(r11)
        lfd f11, 8(r12)
        fsub f10, f10, f11
        fadd f5, f4, f5
        fmul f5, f5, f1
        fadd f6, f6, f7
        fadd f6, f6, f8
        fadd f6, f6, f9
        fmadd f0, f5, f6, f0
        fnmsub f0, f10, f2, f0
        stfd f0, 8(r4)
        lfd f0, 8(r25)
        lfd f5, 8(r26)
        lfd f6, 8(r23)
        lfd f9, 8(r28)
        lfd f10, 8(r10)
        lfd f12, 8(r9)
        lfd f13, 8(r29)
        fsub f11, f13, f11
        fadd f4, f4, f5
        fmul f4, f4, f1
        fadd f5, f6, f9
        fadd f5, f5, f10
        fadd f5, f5, f12
        fnmsub f0, f4, f5, f0
        fnmsub f0, f11, f3, f0
        stfd f0, 8(r24)
        lfd f0, 8(r8)
        fsub f4, f7, f8
        fsub f5, f12, f10
        fnmsub f0, f5, f2, f0
        fnmsub f0, f4, f3, f0
        stfd f0, 8(r2)
        addi r20, r20, 1
        addi r2, r2, 8
        addi r8, r8, 8
        addi r10, r10, 8
        addi r12, r12, 8
        addi r6, r6, 8
        addi r29, r29, 8
        addi r28, r28, 8
        addi r26, r26, 8
        addi r25, r25, 8
        addi r24, r24, 8
        addi r5, r5, 8
        addi r23, r23, 8
        addi r22, r22, 8
        addi r3, r3, 8
        addi r9, r9, 8
        addi r11, r11, 8
        addi r30, r30, 8
        addi r27, r27, 8
        addi r21, r21, 8
        addi r4, r4, 8
        cmpw cr0, r20, r7
        bgt .LBB_main_L_90_no_exit_2E_0_2E_i16_no_exit_2E_1_2E_i19_1

we now emit:

.LBB_main_L_90_no_exit_2E_0_2E_i16_no_exit_2E_1_2E_i19_2:       ; no_exit.1.i19
        lfdx f0, r21, r20
        lfdx f4, r3, r20
        lfdx f5, r27, r20
        lfdx f6, r22, r20
        lfdx f7, r5, r20
        lfdx f8, r6, r20
        lfdx f9, r30, r20
        lfdx f10, r11, r20
        lfdx f11, r12, r20
        fsub f10, f10, f11
        fadd f5, f4, f5
        fmul f5, f5, f1
        fadd f6, f6, f7
        fadd f6, f6, f8
        fadd f6, f6, f9
        fmadd f0, f5, f6, f0
        fnmsub f0, f10, f2, f0
        stfdx f0, r4, r20
        lfdx f0, r25, r20
        lfdx f5, r26, r20
        lfdx f6, r23, r20
        lfdx f9, r28, r20
        lfdx f10, r10, r20
        lfdx f12, r9, r20
        lfdx f13, r29, r20
        fsub f11, f13, f11
        fadd f4, f4, f5
        fmul f4, f4, f1
        fadd f5, f6, f9
        fadd f5, f5, f10
        fadd f5, f5, f12
        fnmsub f0, f4, f5, f0
        fnmsub f0, f11, f3, f0
        stfdx f0, r24, r20
        lfdx f0, r8, r20
        fsub f4, f7, f8
        fsub f5, f12, f10
        fnmsub f0, f5, f2, f0
        fnmsub f0, f4, f3, f0
        stfdx f0, r2, r20
        addi r19, r19, 1
        addi r20, r20, 8
        cmpw cr0, r19, r7
        bgt .LBB_main_L_90_no_exit_2E_0_2E_i16_no_exit_2E_1_2E_i19_1

llvm-svn: 22722
2005-08-09 00:18:09 +00:00
Chris Lattner 37c24cc98c Suck the base value out of the UsersToProcess vector into the BasedUser
class to simplify the code.  Fuse two loops.

llvm-svn: 22721
2005-08-08 22:56:21 +00:00
Chris Lattner 37ed895bf1 Split MoveLoopVariantsToImediateField out from MoveImmediateValues. The
first is a correctness thing, and the latter is an optimization thing.  This also
is needed to support a future change.

llvm-svn: 22720
2005-08-08 22:32:34 +00:00
Chris Lattner 14203e85b2 Not all constants are legal immediates in load/store instructions.
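
For example (illustrative C, not from the commit): on a target whose
load/store displacement field is a signed 16-bit value, such as PowerPC
D-form instructions, a small offset can be folded into the instruction, but
a large one has to be materialized in a register.

int pick(int *a, int i) {
  /* 8 * 4 = 32 bytes fits a 16-bit displacement;
     100000 * 4 = 400000 bytes does not, so it needs its own register. */
  return a[i + 8] + a[i + 100000];
}
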
llvm-svn: 22704
2005-08-08 06:25:50 +00:00
Chris Lattner c70bbc0c41 Implement LoopStrengthReduce/share_code_in_preheader.ll by having one
rewriter for all code inserted into the preheader, which is never flushed.

llvm-svn: 22702
2005-08-08 05:47:49 +00:00
Chris Lattner 9bfa6f8784 Implement a simple optimization for the termination condition of the loop.
The termination condition actually wants to use the post-incremented value
of the loop, not a new indvar with an unusual base.

On PPC, for example, this allows us to compile
LoopStrengthReduce/exit_compare_live_range.ll to:

_foo:
        li r2, 0
.LBB_foo_1:     ; no_exit
        li r5, 0
        stw r5, 0(r3)
        addi r2, r2, 1
        cmpw cr0, r2, r4
        bne .LBB_foo_1  ; no_exit
        blr

instead of:

_foo:
        li r2, 1                ;; IV starts at 1, not 0
.LBB_foo_1:     ; no_exit
        li r5, 0
        stw r5, 0(r3)
        addi r5, r2, 1
        cmpw cr0, r2, r4
        or r2, r5, r5           ;; Reg-reg copy, extra live range
        bne .LBB_foo_1  ; no_exit
        blr

This implements LoopStrengthReduce/exit_compare_live_range.ll

llvm-svn: 22699
2005-08-08 05:28:22 +00:00
Chris Lattner 11e7a5eda7 Make sure to clean CastedPointers after casts are potentially deleted.
This fixes LSR crashes on 301.apsi, 191.fma3d, and 189.lucas

llvm-svn: 22673
2005-08-05 01:30:11 +00:00
Chris Lattner 45f8b6e7aa Modify how immediates are removed from base expressions to deal with the fact
that the symbolic evaluator is not always able to use subtraction to remove
expressions.  This makes the code faster, and fixes the last crash on 178.galgel.
Finally, add a statistic to see how many phi nodes are inserted.

On 178.galgel, we get the follow stats:

2562 loop-reduce  - Number of PHIs inserted
3927 loop-reduce  - Number of GEPs strength reduced

llvm-svn: 22662
2005-08-04 22:34:05 +00:00
Chris Lattner a6d7c355bc * Refactor some code into a new BasedUser::RewriteInstructionToUseNewBase
method.
* Fix a crash on 178.galgel, where we would insert expressions before PHI
  nodes instead of into the PHI node predecessor blocks.

llvm-svn: 22657
2005-08-04 20:03:32 +00:00
Chris Lattner 0f7c0fa2a7 Fix a case that caused this to crash on 178.galgel
llvm-svn: 22653
2005-08-04 19:26:19 +00:00
Chris Lattner acc42c4df1 Teach LSR about loop-variant expressions, such as loops like this:
for (i = 0; i < N; ++i)
    A[i][foo()] = 0;

Here we still want to strength-reduce the A[i] part, even though foo() is
loop-variant.

This also simplifies some of the 'CanReduce' logic.

This implements Transforms/LoopStrengthReduce/ops_after_indvar.ll

llvm-svn: 22652
2005-08-04 19:08:16 +00:00