Commit Graph

105 Commits

Author SHA1 Message Date
Reid Spencer bf96e02a54 For PR1097:
Enable complex addressing modes on 64-bit platforms involving two induction
variables by keeping a size and scale in 64-bits not 32.
Patch by Dan Gohman.

llvm-svn: 33011
2007-01-08 16:17:51 +00:00
Chris Lattner 3fe98ae10a no need to worry about int vs uint any more.
llvm-svn: 32946
2007-01-06 01:37:35 +00:00
Reid Spencer c635f47d9a For PR950:
This patch replaces signed integer types with signless ones:
1. [US]Byte -> Int8
2. [U]Short -> Int16
3. [U]Int   -> Int32
4. [U]Long  -> Int64.
5. Removal of isSigned, isUnsigned, getSignedVersion, getUnsignedVersion
   and other methods related to signedness. In a few places this warranted
   identifying the signedness information from other sources.

llvm-svn: 32785
2006-12-31 05:48:39 +00:00
Reid Spencer 266e42b312 For PR950:
This patch removes the SetCC instructions and replaces them with the ICmp
and FCmp instructions. The SetCondInst instruction has been removed and
been replaced with ICmpInst and FCmpInst.

llvm-svn: 32751
2006-12-23 06:05:41 +00:00
Chris Lattner 79a42ac941 Switch over Transforms/Scalar to use the STATISTIC macro. For each statistic
converted, we lose a static initializer.  This also allows GCC to emit warnings
about unused statistics.

llvm-svn: 32690
2006-12-19 21:40:18 +00:00
Reid Spencer df1f19a8ef Change the interface to SCEVExpander::InsertCastOfTo to take a cast opcode
so the decision of which opcode to use is pushed upward to the caller.
Adjust the callers to pass the expected opcode.

llvm-svn: 32535
2006-12-13 08:06:42 +00:00
Reid Spencer b341b0861d Change inferred getCast into specific getCast. Passes all tests.
llvm-svn: 32469
2006-12-12 05:05:00 +00:00
Bill Wendling f3baad3ee1 Changed llvm_ostream et all to OStream. llvm_cerr, llvm_cout, llvm_null, are
now cerr, cout, and NullStream resp.

llvm-svn: 32298
2006-12-07 01:30:32 +00:00
Chris Lattner 700b873130 Detemplatize the Statistic class. The only type it is instantiated with
is 'unsigned'.

llvm-svn: 32279
2006-12-06 17:46:33 +00:00
Reid Spencer 6c38f0bb07 For PR950:
The long awaited CAST patch. This introduces 12 new instructions into LLVM
to replace the cast instruction. Corresponding changes throughout LLVM are
provided. This passes llvm-test, llvm/test, and SPEC CPUINT2000 with the
exception of 175.vpr which fails only on a slight floating point output
difference.

llvm-svn: 31931
2006-11-27 01:05:10 +00:00
Bill Wendling 5dbf43c983 Removed #include <iostream> and replaced with llvm_* streams.
llvm-svn: 31923
2006-11-26 09:46:52 +00:00
Chris Lattner 21eba2da26 If an indvar with a variable stride is used by the exit condition, go ahead
and handle it like constant stride vars.  This fixes some bad codegen in
variable stride cases.  For example, it compiles this:

void foo(int k, int i) {
  for (k=i+i; k <= 8192; k+=i)
    flags2[k] = 0;
}

to:

LBB1_1: #bb.preheader
        movl %eax, %ecx
        addl %ecx, %ecx
        movl L_flags2$non_lazy_ptr, %edx
LBB1_2: #bb
        movb $0, (%edx,%ecx)
        addl %eax, %ecx
        cmpl $8192, %ecx
        jle LBB1_2      #bb
LBB1_5: #return
        ret

or (if the array is local and we are in dynamic-nonpic or static mode):

LBB3_2: #bb
        movb $0, _flags2(%ecx)
        addl %eax, %ecx
        cmpl $8192, %ecx
        jle LBB3_2      #bb

and:

        lis r2, ha16(L_flags2$non_lazy_ptr)
        lwz r2, lo16(L_flags2$non_lazy_ptr)(r2)
        slwi r3, r4, 1
LBB1_2: ;bb
        li r5, 0
        add r6, r4, r3
        stbx r5, r2, r3
        cmpwi cr0, r6, 8192
        bgt cr0, LBB1_5 ;return

instead of:

        leal (%eax,%eax,2), %ecx
        movl %eax, %edx
        addl %edx, %edx
        addl L_flags2$non_lazy_ptr, %edx
        xorl %esi, %esi
LBB1_2: #bb
        movb $0, (%edx,%esi)
        movl %eax, %edi
        addl %esi, %edi
        addl %ecx, %esi
        cmpl $8192, %esi
        jg LBB1_5       #return

and:

        lis r2, ha16(L_flags2$non_lazy_ptr)
        lwz r2, lo16(L_flags2$non_lazy_ptr)(r2)
        mulli r3, r4, 3
        slwi r5, r4, 1
        li r6, 0
        add r2, r2, r5
LBB1_2: ;bb
        li r5, 0
        add r7, r3, r6
        stbx r5, r2, r6
        add r6, r4, r6
        cmpwi cr0, r7, 8192
        ble cr0, LBB1_2 ;bb

This speeds up Benchmarks/Shootout/sieve from 8.533s to 6.464s and
implements LoopStrengthReduce/var_stride_used_by_compare.ll

llvm-svn: 31809
2006-11-17 06:17:33 +00:00
Reid Spencer de46e48420 For PR786:
Turn on -Wunused and -Wno-unused-parameter. Clean up most of the resulting
fall out by removing unused variables. Remaining warnings have to do with
unused functions (I didn't want to delete code without review) and unused
variables in generated code. Maintainers should clean up the remaining
issues when they see them. All changes pass DejaGnu tests and Olden.

llvm-svn: 31380
2006-11-02 20:25:50 +00:00
Chris Lattner a6eb7e0803 break edges more intelligently
llvm-svn: 31257
2006-10-28 06:45:33 +00:00
Chris Lattner 5191c65485 prepare for a change I'm about to make
llvm-svn: 31248
2006-10-28 00:59:20 +00:00
Reid Spencer e0fc4dfc22 For PR950:
This patch implements the first increment for the Signless Types feature.
All changes pertain to removing the ConstantSInt and ConstantUInt classes
in favor of just using ConstantInt.

llvm-svn: 31063
2006-10-20 07:07:24 +00:00
Chris Lattner c2d3d3112e eliminate RegisterOpt. It does the same thing as RegisterPass.
llvm-svn: 29925
2006-08-27 22:42:52 +00:00
Chris Lattner 3d27be1333 s|llvm/Support/Visibility.h|llvm/Support/Compiler.h|
llvm-svn: 29911
2006-08-27 12:54:02 +00:00
Chris Lattner 3ff620178b Changes:
1. Update an obsolete comment.
  2. Make the sorting by base an explicit (though still N^2) step, so
     that the code is more clear on what it is doing.
  3. Partition uses so that uses inside the loop are handled before uses
     outside the loop.

Note that none of these changes currently changes the code inserted by LSR,
but they are a stepping stone to getting there.

This code is the result of some crazy pair programming with Nate. :)

llvm-svn: 29493
2006-08-03 06:34:50 +00:00
Evan Cheng e9c68f52e1 Only reuse a previous IV if it would not require a type conversion.
llvm-svn: 29186
2006-07-18 19:07:58 +00:00
Chris Lattner 996795b0dd Use hidden visibility to make symbols in an anonymous namespace get
dropped.  This shrinks libllvmgcc.dylib another 67K

llvm-svn: 28975
2006-06-28 23:17:24 +00:00
Evan Cheng 398f70292c RewriteExpr, either the new PHI node of induction variable or the
post-increment value, should be first cast to the appropriated type (to the
type of the common expr). Otherwise, the rewrite of a use based on (common +
iv) may end up with an incorrect type.

llvm-svn: 28735
2006-06-09 00:12:42 +00:00
Reid Spencer 13a1a7a4a6 Get rid of a signed/unsigned compare warning.
llvm-svn: 27625
2006-04-12 19:28:15 +00:00
Chris Lattner f365f5f0c1 Fix spello
llvm-svn: 27052
2006-03-24 07:14:34 +00:00
Chris Lattner 7d80b4f366 silence a bogus gcc warning
llvm-svn: 26953
2006-03-22 17:27:24 +00:00
Evan Cheng c28282bd87 - Fixed a bogus if condition.
- Added more debugging info.
- Allow reuse of IV of negative stride. e.g. -4 stride == 2 * iv of -2 stride.

llvm-svn: 26841
2006-03-18 08:03:12 +00:00
Evan Cheng f09f0ebd48 Sort StrideOrder so we can process the smallest strides first. This allows
for more IV reuses.

llvm-svn: 26837
2006-03-18 00:44:49 +00:00
Evan Cheng 4520698820 Allow users of iv / stride to be rewritten with expression that is a multiply
of a smaller stride even if they have a common loop invariant expression part.

llvm-svn: 26828
2006-03-17 19:52:23 +00:00
Evan Cheng 3df447d354 For each loop, keep track of all the IV expressions inserted indexed by
stride. For a set of uses of the IV of a stride which is a multiple
of another stride, do not insert a new IV expression. Rather, reuse the
previous IV and rewrite the uses as uses of IV expression multiplied by
the factor.

e.g.
x = 0 ...; x ++
y = 0 ...; y += 4
then use of y can be rewritten as use of 4*x for x86.

llvm-svn: 26803
2006-03-16 21:53:05 +00:00
Evan Cheng c567c4efbb Added target lowering hooks which LSR consults to make more intelligent
transformation decisions.

llvm-svn: 26738
2006-03-13 23:14:23 +00:00
Chris Lattner d30c4991a1 Use SCEVExpander::InsertCastOfTo instead of our own code. This reduces
#LLVM LOC, and auto-cse's cast instructions.

llvm-svn: 25974
2006-02-04 09:52:43 +00:00
Chris Lattner 2959f0003e Fix two significant bugs in LSR:
1. When rewriting code in outer loops, sometimes we would insert code into
   inner loops that is invariant in that loop.
2. Notice that 4*(2+x) is 8+4*x and use that to simplify expressions.

This is a performance neutral change.

llvm-svn: 25964
2006-02-04 07:36:50 +00:00
Chris Lattner c597b8a55e Make iostream #inclusion explicit
llvm-svn: 25514
2006-01-22 23:32:06 +00:00
Chris Lattner cb36710ff9 Switch these to using ETForest instead of DominatorSet to compute itself.
Patch written by Daniel Berlin!

llvm-svn: 25202
2006-01-11 05:10:20 +00:00
Chris Lattner 077200737c getRawValue zero extens for unsigned values, use getsextvalue so that we
know that small negative values fit into the immediate field of addressing
modes.

llvm-svn: 24608
2005-12-05 18:23:57 +00:00
Chris Lattner 5df0e36e98 My previous patch was too conservative. Reject FP and void types, but do
allow pointer types.

llvm-svn: 23859
2005-10-21 05:45:41 +00:00
Chris Lattner 0c0b38bb4c Do NOT touch FP ops with LSR. This fixes a testcase Nate sent me from an
inner loop like this:

LBB_RateConvertMono8AltiVec_2:  ; no_exit
        lis r2, ha16(.CPI_RateConvertMono8AltiVec_0)
        lfs f3, lo16(.CPI_RateConvertMono8AltiVec_0)(r2)
        fmr f3, f3
        fadd f0, f2, f0
        fadd f3, f0, f3
        fcmpu cr0, f3, f1
        bge cr0, LBB_RateConvertMono8AltiVec_2  ; no_exit

to an inner loop like this:

LBB_RateConvertMono8AltiVec_1:  ; no_exit
        fsub f2, f2, f1
        fcmpu cr0, f2, f1
        fmr f0, f2
        bge cr0, LBB_RateConvertMono8AltiVec_1  ; no_exit

Doh! good catch!

llvm-svn: 23838
2005-10-20 04:47:10 +00:00
Chris Lattner 192cd18f53 Fix (hopefully the last) issue where LSR is nondeterminstic. When pulling
out CSE's of base expressions it could build a result whose order was
nondet.

llvm-svn: 23698
2005-10-11 18:41:04 +00:00
Chris Lattner 5c9d63da31 Fix another problem where LSR was being nondeterminstic. Also remove elements
from the end of a vector instead of the beginning

llvm-svn: 23697
2005-10-11 18:30:57 +00:00
Chris Lattner b7a3894e7c Fix another lsr-is-nondeterministic case
llvm-svn: 23695
2005-10-11 18:17:57 +00:00
Chris Lattner eb4be8b942 Hrm, you didn't see this.
llvm-svn: 23673
2005-10-09 06:24:02 +00:00
Chris Lattner 4ea0a3eaac Fix a source of non-determinism in the backend: the order of processing
IV strides dependend on the pointer order of the strides in memory.
Non-determinism is bad.

llvm-svn: 23672
2005-10-09 06:20:55 +00:00
Chris Lattner f07a587c79 Make IVUseShouldUsePostIncValue more aggressive when the use is a PHI. In
particular, it should realize that phi's use their values in the pred block
not the phi block itself.  This change turns our em3d loop from this:

_test:
        cmpwi cr0, r4, 0
        bgt cr0, LBB_test_2     ; entry.no_exit_crit_edge
LBB_test_1:     ; entry.loopexit_crit_edge
        li r2, 0
        b LBB_test_6    ; loopexit
LBB_test_2:     ; entry.no_exit_crit_edge
        li r6, 0
LBB_test_3:     ; no_exit
        or r2, r6, r6
        lwz r6, 0(r3)
        cmpw cr0, r6, r5
        beq cr0, LBB_test_6     ; loopexit
LBB_test_4:     ; endif
        addi r3, r3, 4
        addi r6, r2, 1
        cmpw cr0, r6, r4
        blt cr0, LBB_test_3     ; no_exit
LBB_test_5:     ; endif.loopexit.loopexit_crit_edge
        addi r3, r2, 1
        blr
LBB_test_6:     ; loopexit
        or r3, r2, r2
        blr

into:

_test:
        cmpwi cr0, r4, 0
        bgt cr0, LBB_test_2     ; entry.no_exit_crit_edge
LBB_test_1:     ; entry.loopexit_crit_edge
        li r2, 0
        b LBB_test_5    ; loopexit
LBB_test_2:     ; entry.no_exit_crit_edge
        li r6, 0
LBB_test_3:     ; no_exit
        lwz r2, 0(r3)
        cmpw cr0, r2, r5
        or r2, r6, r6
        beq cr0, LBB_test_5     ; loopexit
LBB_test_4:     ; endif
        addi r3, r3, 4
        addi r6, r6, 1
        cmpw cr0, r6, r4
        or r2, r6, r6
        blt cr0, LBB_test_3     ; no_exit
LBB_test_5:     ; loopexit
        or r3, r2, r2
        blr


Unfortunately, this is actually worse code, because the register coallescer
is getting confused somehow.  If it were doing its job right, it could turn the
code into this:

_test:
        cmpwi cr0, r4, 0
        bgt cr0, LBB_test_2     ; entry.no_exit_crit_edge
LBB_test_1:     ; entry.loopexit_crit_edge
        li r6, 0
        b LBB_test_5    ; loopexit
LBB_test_2:     ; entry.no_exit_crit_edge
        li r6, 0
LBB_test_3:     ; no_exit
        lwz r2, 0(r3)
        cmpw cr0, r2, r5
        beq cr0, LBB_test_5     ; loopexit
LBB_test_4:     ; endif
        addi r3, r3, 4
        addi r6, r6, 1
        cmpw cr0, r6, r4
        blt cr0, LBB_test_3     ; no_exit
LBB_test_5:     ; loopexit
        or r3, r6, r6
        blr

... which I'll work on next. :)

llvm-svn: 23604
2005-10-03 02:50:05 +00:00
Chris Lattner e4ed42a426 Refactor some code into a function
llvm-svn: 23603
2005-10-03 01:04:44 +00:00
Chris Lattner 360928dbed This break is bogus and I have no idea why it was there. Basically it prevents
memoizing code when IV's are used by phinodes outside of loops.  In a simple
example, we were getting this code before (note that r6 and r7 are isomorphic
IV's):

        li r6, 0
        or r7, r6, r6
LBB_test_3:     ; no_exit
        lwz r2, 0(r3)
        cmpw cr0, r2, r5
        or r2, r7, r7
        beq cr0, LBB_test_5     ; loopexit
LBB_test_4:     ; endif
        addi r2, r7, 1
        addi r7, r7, 1
        addi r3, r3, 4
        addi r6, r6, 1
        cmpw cr0, r6, r4
        blt cr0, LBB_test_3     ; no_exit

Now we get:

        li r6, 0
LBB_test_3:     ; no_exit
        or r2, r6, r6
        lwz r6, 0(r3)
        cmpw cr0, r6, r5
        beq cr0, LBB_test_6     ; loopexit
LBB_test_4:     ; endif
        addi r3, r3, 4
        addi r6, r2, 1
        cmpw cr0, r6, r4
        blt cr0, LBB_test_3     ; no_exit

this was noticed in em3d.

llvm-svn: 23602
2005-10-03 00:37:33 +00:00
Chris Lattner 8fcce170cf when checking if we should move a split edge block outside of a loop,
check the presplit pred, not the post-split pred.  This was causing us
to make the wrong decision in some cases, leaving the critical edge block
in the loop.

llvm-svn: 23601
2005-10-03 00:31:52 +00:00
Chris Lattner 92233d2175 Make the pass name simpler
llvm-svn: 23476
2005-09-27 21:10:32 +00:00
Chris Lattner fd018c8dfe Fix an issue where LSR would miss rewriting a use of an IV expression by a PHI node that is not the original PHI.
This fixes up a dot-product loop in galgel, speeding it up from 18.47s to
16.13s.

llvm-svn: 23327
2005-09-13 02:09:55 +00:00
Chris Lattner 8048b85e8f Fix a regression from last night, which caused this pass to create invalid
code for IV uses outside of loops that are not dominated by the latch block.
We should only convert these uses to use the post-inc value if they ARE
dominated by the latch block.

Also use a new LoopInfo method to simplify some code.

This fixes Transforms/LoopStrengthReduce/2005-09-12-UsesOutOutsideOfLoop.ll

llvm-svn: 23318
2005-09-12 17:11:27 +00:00
Chris Lattner a67648396a _test:
li r2, 0
LBB_test_1:     ; no_exit.2
        li r5, 0
        stw r5, 0(r3)
        addi r2, r2, 1
        addi r3, r3, 4
        cmpwi cr0, r2, 701
        blt cr0, LBB_test_1     ; no_exit.2
LBB_test_2:     ; loopexit.2.loopexit
        addi r2, r2, 1
        stw r2, 0(r4)
        blr
[zion ~/llvm]$ cat > ~/xx
Uses of IV's outside of the loop should use hte post-incremented version
of the IV, not the preincremented version.  This helps many loops (e.g. in sixtrack)
which used to generate code like this (this is the code from the
dont-hoist-simple-loop-constants.ll testcase):

_test:
        li r2, 0                 **** IV starts at 0
LBB_test_1:     ; no_exit.2
        or r5, r2, r2            **** Copy for loop exit
        li r2, 0
        stw r2, 0(r3)
        addi r3, r3, 4
        addi r2, r5, 1
        addi r6, r5, 2           **** IV+2
        cmpwi cr0, r6, 701
        blt cr0, LBB_test_1     ; no_exit.2
LBB_test_2:     ; loopexit.2.loopexit
        addi r2, r5, 2       ****  IV+2
        stw r2, 0(r4)
        blr

And now generated code like this:

_test:
        li r2, 1               *** IV starts at 1
LBB_test_1:     ; no_exit.2
        li r5, 0
        stw r5, 0(r3)
        addi r2, r2, 1
        addi r3, r3, 4
        cmpwi cr0, r2, 701     *** IV.postinc + 0
        blt cr0, LBB_test_1
LBB_test_2:     ; loopexit.2.loopexit
        stw r2, 0(r4)          *** IV.postinc + 0
        blr

llvm-svn: 23313
2005-09-12 06:04:47 +00:00