Commit Graph

1467 Commits

Author SHA1 Message Date
Chris Lattner fbb77a408b Implement InstCombine/vec_shuffle.ll:test[12]
llvm-svn: 27571
2006-04-10 22:45:52 +00:00
Chris Lattner 17bd60588c Add supprot for shufflevector
llvm-svn: 27513
2006-04-08 01:19:12 +00:00
Chris Lattner e79d249c29 Lower vperm(x,y, mask) -> shuffle(x,y,mask) if mask is constant. This allows
us to compile oh-so-realistic stuff like this:

 vec_vperm(A, B, (vector unsigned char){14});

to:
        vspltb v0, v0, 14

instead of:

        vspltisb v0, 14
        vperm v0, v2, v1, v0

llvm-svn: 27452
2006-04-06 19:19:17 +00:00
Chris Lattner caba72b6ff vector casts of casts are eliminable. Transform this:
%tmp = cast <4 x uint> %tmp to <4 x int>                ; <<4 x int>> [#uses=1]
        %tmp = cast <4 x int> %tmp to <4 x float>               ; <<4 x float>> [#uses=1]

into:

        %tmp = cast <4 x uint> %tmp to <4 x float>              ; <<4 x float>> [#uses=1]

llvm-svn: 27355
2006-04-02 05:43:13 +00:00
Chris Lattner ebca476b27 Allow transforming this:
%tmp = cast <4 x uint>* %testData to <4 x int>*         ; <<4 x int>*> [#uses=1]
        %tmp = load <4 x int>* %tmp             ; <<4 x int>> [#uses=1]

to this:

        %tmp = load <4 x uint>* %testData               ; <<4 x uint>> [#uses=1]
        %tmp = cast <4 x uint> %tmp to <4 x int>                ; <<4 x int>> [#uses=1]

llvm-svn: 27353
2006-04-02 05:37:12 +00:00
Chris Lattner f42d0aeda1 Turn altivec lvx/stvx intrinsics into loads and stores. This allows the
elimination of one load from this:

int AreSecondAndThirdElementsBothNegative( vector float *in ) {
#define QNaN 0x7FC00000
const vector unsigned int testData = (vector unsigned int)( QNaN, 0, 0, QNaN );
vector float test = vec_ld( 0, (float*) &testData );
return ! vec_any_ge( test, *in );
}

Now generating:

_AreSecondAndThirdElementsBothNegative:
        mfspr r2, 256
        oris r4, r2, 49152
        mtspr 256, r4
        li r4, lo16(LCPI1_0)
        lis r5, ha16(LCPI1_0)
        addi r6, r1, -16
        lvx v0, r5, r4
        stvx v0, 0, r6
        lvx v1, 0, r3
        vcmpgefp. v0, v0, v1
        mfcr r3, 2
        rlwinm r3, r3, 27, 31, 31
        xori r3, r3, 1
        cntlzw r3, r3
        srwi r3, r3, 5
        mtspr 256, r2
        blr

llvm-svn: 27352
2006-04-02 05:30:25 +00:00
Chris Lattner 6cf4914fd4 Fix InstCombine/2006-04-01-InfLoop.ll
llvm-svn: 27330
2006-04-01 22:05:01 +00:00
Chris Lattner dcd0792622 Fold A^(B&A) -> (B&A)^A
Fold (B&A)^A == ~B & A

This implements InstCombine/xor.ll:test2[56]

llvm-svn: 27328
2006-04-01 08:03:55 +00:00
Chris Lattner 8d1d8d364c If we can look through vector operations to find the scalar version of an
extract_element'd value, do so.

llvm-svn: 27323
2006-03-31 23:01:56 +00:00
Chris Lattner 92346c315e extractelement(undef,x) -> undef
llvm-svn: 27300
2006-03-31 18:25:14 +00:00
Chris Lattner 612fa8e6f3 Fix Transforms/InstCombine/2006-03-30-ExtractElement.ll
llvm-svn: 27261
2006-03-30 22:02:40 +00:00
Chris Lattner d70d9f5b24 Don't crash on packed logical ops
llvm-svn: 27125
2006-03-25 21:58:26 +00:00
Chris Lattner f365f5f0c1 Fix spello
llvm-svn: 27052
2006-03-24 07:14:34 +00:00
Chris Lattner 5821a6a17a add the actual cost to the debug info
llvm-svn: 27051
2006-03-24 07:14:00 +00:00
Jim Laskey 83f99115db Can't combine anymore - we don't have a chain through llvm.dbg intrinsics.
llvm-svn: 26992
2006-03-23 18:10:42 +00:00
Chris Lattner 7d80b4f366 silence a bogus gcc warning
llvm-svn: 26953
2006-03-22 17:27:24 +00:00
Chris Lattner d783c76c18 Teach cee to propagate through switch statements. This implements
Transforms/CorrelatedExprs/switch.ll

Patch contributed by Eric Kidd!

llvm-svn: 26872
2006-03-19 19:37:24 +00:00
Evan Cheng c28282bd87 - Fixed a bogus if condition.
- Added more debugging info.
- Allow reuse of IV of negative stride. e.g. -4 stride == 2 * iv of -2 stride.

llvm-svn: 26841
2006-03-18 08:03:12 +00:00
Evan Cheng f09f0ebd48 Sort StrideOrder so we can process the smallest strides first. This allows
for more IV reuses.

llvm-svn: 26837
2006-03-18 00:44:49 +00:00
Evan Cheng 4520698820 Allow users of iv / stride to be rewritten with expression that is a multiply
of a smaller stride even if they have a common loop invariant expression part.

llvm-svn: 26828
2006-03-17 19:52:23 +00:00
Evan Cheng 3df447d354 For each loop, keep track of all the IV expressions inserted indexed by
stride. For a set of uses of the IV of a stride which is a multiple
of another stride, do not insert a new IV expression. Rather, reuse the
previous IV and rewrite the uses as uses of IV expression multiplied by
the factor.

e.g.
x = 0 ...; x ++
y = 0 ...; y += 4
then use of y can be rewritten as use of 4*x for x86.

llvm-svn: 26803
2006-03-16 21:53:05 +00:00
Chris Lattner c5f866bb4a Implement a FIXME, recusively reassociating
A*A*B + A*A*C   -->   A*(A*B+A*C)   -->   A*(A*(B+C))

This implements Reassociate/mul-factor3.ll

llvm-svn: 26757
2006-03-14 16:04:29 +00:00
Chris Lattner 2fc319d444 extract some code into a method, no functionality change
llvm-svn: 26755
2006-03-14 07:11:11 +00:00
Chris Lattner d6bde46d85 Promote shifts by a constant to multiplies so that we can reassociate
(x<<1)+(y<<1) -> (X+Y)<<1.  This implements
Transforms/Reassociate/shift-factor.ll

llvm-svn: 26753
2006-03-14 06:55:18 +00:00
Evan Cheng c567c4efbb Added target lowering hooks which LSR consults to make more intelligent
transformation decisions.

llvm-svn: 26738
2006-03-13 23:14:23 +00:00
Chris Lattner fc34f8bb48 Fix a miscompilation of 188.ammp with the new CFE. 188.ammp is accessing
arrays out of range in a horrible way, but we shouldn't break it anyway.
Details in the comments.

llvm-svn: 26606
2006-03-08 01:05:29 +00:00
Chris Lattner 53ef5a032c Teach the alignment handling code to look through constant expr casts and GEPs
llvm-svn: 26580
2006-03-07 01:28:57 +00:00
Chris Lattner 82f2ef20b6 Teach instcombine to increase the alignment of memset/memcpy/memmove when
the pointer is known to come from either a global variable, alloca or
malloc.  This allows us to compile this:

  P = malloc(28);
  memset(P, 0, 28);

into explicit stores on PPC instead of a memset call.

llvm-svn: 26577
2006-03-06 20:18:44 +00:00
Chris Lattner 6bc98653c2 Make vector narrowing more effective, implementing
Transforms/InstCombine/vec_narrow.ll.  This add support for narrowing
extract_element(insertelement) also.

llvm-svn: 26538
2006-03-05 00:22:33 +00:00
Chris Lattner 4c065091d8 Add factoring of multiplications, e.g. turning A*A+A*B into A*(A+B).
Testcase here: Transforms/Reassociate/mulfactor.ll

llvm-svn: 26524
2006-03-04 09:31:13 +00:00
Chris Lattner 32c01df299 Canonicalize (X+C1)*C2 -> X*C2+C1*C2
This implements Transforms/InstCombine/add.ll:test31

llvm-svn: 26519
2006-03-04 06:04:02 +00:00
Chris Lattner 681ef2f083 Change this to work with renamed intrinsics.
llvm-svn: 26484
2006-03-03 01:34:17 +00:00
Chris Lattner 85dda9a2bd Generalize the REM folding code to handle another case Nick Lewycky
pointed out: realize the AND can provide factors and look through Casts.

llvm-svn: 26469
2006-03-02 06:50:58 +00:00
Chris Lattner c5b6c9a12a Fix a regression in a patch from a couple of days ago. This fixes
Transforms/InstCombine/2006-02-28-Crash.ll

llvm-svn: 26427
2006-02-28 19:47:20 +00:00
Chris Lattner b70f141893 Implement rem.ll:test[7-9] and PR712
llvm-svn: 26415
2006-02-28 05:49:21 +00:00
Chris Lattner 2a7c7b8bab Simplify some code now that the RHS of a rem can't be 0
llvm-svn: 26413
2006-02-28 05:40:55 +00:00
Chris Lattner 0de4a8d7b7 Rearrange some code, fold "rem X, 0", implementing rem.ll:test6
llvm-svn: 26411
2006-02-28 05:30:45 +00:00
Chris Lattner c7bfed0f7b Merge two almost-identical pieces of code.
Make this code more powerful by using ComputeMaskedBits instead of looking
for an AND operand.  This lets us fold this:

int %test23(int %a) {
        %tmp.1 = and int %a, 1
        %tmp.2 = seteq int %tmp.1, 0
        %tmp.3 = cast bool %tmp.2 to int  ;; xor tmp1, 1
        ret int %tmp.3
}

into: xor (and a, 1), 1
llvm-svn: 26396
2006-02-27 02:38:23 +00:00
Chris Lattner f5c8a0b83f Fold (A^B) == A -> B == 0
and  (A-B) == A  ->  B == 0

llvm-svn: 26394
2006-02-27 01:44:11 +00:00
Chris Lattner f78df7c14d Fold (X|C1)^C2 -> X^(C1|C2) when possible. This implements
InstCombine/or.ll:test23.

llvm-svn: 26385
2006-02-26 19:57:54 +00:00
Chris Lattner b580d26e7d Fix a problem that Nate noticed that boils down to an over conservative check
in the code that does "select C, (X+Y), (X-Y) --> (X+(select C, Y, (-Y)))".
We now compile this loop:

LBB1_1: ; no_exit
        add r6, r2, r3
        subf r3, r2, r3
        cmpwi cr0, r2, 0
        addi r7, r5, 4
        lwz r2, 0(r5)
        addi r4, r4, 1
        blt cr0, LBB1_4 ; no_exit
LBB1_3: ; no_exit
        mr r3, r6
LBB1_4: ; no_exit
        cmpwi cr0, r4, 16
        mr r5, r7
        bne cr0, LBB1_1 ; no_exit

into this instead:

LBB1_1: ; no_exit
        srawi r6, r2, 31
        add r2, r2, r6
        xor r6, r2, r6
        addi r7, r5, 4
        lwz r2, 0(r5)
        addi r4, r4, 1
        add r3, r3, r6
        cmpwi cr0, r4, 16
        mr r5, r7
        bne cr0, LBB1_1 ; no_exit

llvm-svn: 26356
2006-02-24 18:05:58 +00:00
Chris Lattner e5521db5bc Fix Regression/Transforms/LoopUnswitch/2006-02-22-UnswitchCrash.ll, which
caused SPASS to fail building last night.

We can't trivially unswitch a loop if the exit block has phi nodes in it,
because we don't know which predecessor to use.

llvm-svn: 26320
2006-02-22 23:55:00 +00:00
Chris Lattner 8a5a324dac Add some comments, simplify some code, and fix a bug that caused rewriting
to rewrite with the wrong value.

llvm-svn: 26311
2006-02-22 06:37:14 +00:00
Chris Lattner c2e3a7a4ce improved support for branch folding, still not enabled.
llvm-svn: 26289
2006-02-18 07:57:38 +00:00
Jeff Cohen 0add83e969 Fix bugs identified by VC++.
llvm-svn: 26287
2006-02-18 03:20:33 +00:00
Chris Lattner 19fa8ac938 Implement deletion of dead blocks, currently disabled.
llvm-svn: 26285
2006-02-18 02:42:34 +00:00
Chris Lattner cb853de534 a previous patch completely disabled trivial unswitching, this fixees it.
Thanks to nate for pointing this out :)

llvm-svn: 26280
2006-02-18 01:32:04 +00:00
Chris Lattner 29f771ba21 initial trivial support for folding branches that have now-constant destinations.
llvm-svn: 26279
2006-02-18 01:27:45 +00:00
Chris Lattner 8e44ff50b0 When unswitching a loop, make sure to update loop info with exit blocks in
the right loop.

llvm-svn: 26277
2006-02-18 00:55:32 +00:00
Chris Lattner baddba41c7 Fix loops where the header has an exit, fixing a loop-unswitch crash on crafty
llvm-svn: 26258
2006-02-17 06:39:56 +00:00
Chris Lattner 6fd136239b start of some new simplification code, not thoroughly tested, use at your own
risk :)

llvm-svn: 26248
2006-02-17 00:31:07 +00:00
Nate Begeman 8a77efe4f7 Rework the SelectionDAG-based implementations of SimplifyDemandedBits
and ComputeMaskedBits to match the new improved versions in instcombine.
Tested against all of multisource/benchmarks on ppc.

llvm-svn: 26238
2006-02-16 21:11:51 +00:00
Chris Lattner fa335f6083 Change SplitBlock to increment a BasicBlock::iterator, not an Instruction*. Apparently they do different things :)
This fixes a testcase that nate reduced from spass.

Also included are a couple minor code changes that don't affect the generated
code at all.

llvm-svn: 26235
2006-02-16 19:36:22 +00:00
Jeff Cohen 55f63f1b53 Fix VC++ warning.
llvm-svn: 26228
2006-02-16 04:07:37 +00:00
Chris Lattner ff42e81028 fix a bug where we unswitched the wrong way
llvm-svn: 26225
2006-02-16 01:24:41 +00:00
Chris Lattner fdff0bb43e Implement trivial unswitching for switch stmts. This allows us to trivial
unswitch this loop on 2 before sweating to unswitch on 1/3.

void test4(int N, int i, int C, int*P, int*Q) {
  int j;
  for (j = 0; j < N; ++j) {
    switch (C) {                // general unswitching.
    default: P[i+j] = 0; break;
    case 1: Q[i+j] = 0; break;
    case 3: P[i+j] = Q[i+j]; break;
    case 2: break;              //  TRIVIAL UNSWITCH on C==2
    }
  }
}

llvm-svn: 26223
2006-02-15 22:52:05 +00:00
Chris Lattner e5cb76d744 make "trivial" unswitching significantly more general. It can now handle
this for example:

  for (j = 0; j < N; ++j) {     // trivial unswitch
    if (C)
      P[i+j] = 0;
  }

turning it into the obvious code without bothering to duplicate an empty loop.

llvm-svn: 26220
2006-02-15 22:03:36 +00:00
Chris Lattner 65152d80ec Checking the wrong value. This caused us to emit silly code like
Y = seteq bool X, true
instead of just using X :)

llvm-svn: 26215
2006-02-15 19:05:52 +00:00
Chris Lattner 01db04efb0 more refactoring, no functionality change.
llvm-svn: 26194
2006-02-15 01:44:42 +00:00
Chris Lattner b0cbe7106e pull some code out into a function
llvm-svn: 26191
2006-02-15 00:07:43 +00:00
Chris Lattner 0b8ec1a132 Use statistics to keep track of what flavors of loops we are unswitching
llvm-svn: 26157
2006-02-14 01:01:41 +00:00
Chris Lattner 8b10ab3002 Implement Instcombine/and.ll:test34
llvm-svn: 26155
2006-02-13 23:07:23 +00:00
Chris Lattner 7d8522884b If any of the sign extended bits are demanded, the input sign bit is demanded
for a sign extension.

This fixes InstCombine/2006-02-13-DemandedMiscompile.ll and Ptrdist/bc.

llvm-svn: 26152
2006-02-13 22:41:07 +00:00
Chris Lattner 68e7475777 Be careful not to request or look at bits shifted in from outside the size
of the input.  This fixes the mediabench/gsm/toast failure last night.

llvm-svn: 26138
2006-02-13 06:09:08 +00:00
Chris Lattner f5b4ef7f58 remove some more dead special case code
llvm-svn: 26135
2006-02-12 08:07:37 +00:00
Chris Lattner 5b2edb1fca Eliminate special case hacks that are superceded by general purpose hacks
llvm-svn: 26134
2006-02-12 08:02:11 +00:00
Chris Lattner ee0f280743 Three changes:
1. Teach GetConstantInType to handle boolean constants.
2. Teach instcombine to fold (compare X, CST) when X has known 0/1 bits.
   Testcase here: set.ll:test22
3. Improve the "(X >> c1) & C2 == 0" folding code to allow a noop cast
   between the shift and and.  More aggressive bitfolding for other reasons
   was turning signed shr's into unsigned shr's, leaving the noop cast in
   the way.

llvm-svn: 26131
2006-02-12 02:07:56 +00:00
Chris Lattner 0157e7f55b Port the recent innovations in ComputeMaskedBits to SimplifyDemandedBits.
This allows us to simplify on conditions where bits are not known, but they
are not demanded either!  This also fixes a couple of bugs in
ComputeMaskedBits that were exposed during this work.

In the future, swaths of instcombine should be removed, as this code
subsumes a bunch of ad-hockery.

llvm-svn: 26122
2006-02-11 09:31:47 +00:00
Chris Lattner fbadd7e1ee implement unswitching of loops with switch stmts and selects in them
llvm-svn: 26114
2006-02-11 00:43:37 +00:00
Chris Lattner f1b151684d Update PHI nodes in successors of exit blocks.
llvm-svn: 26113
2006-02-10 23:26:14 +00:00
Chris Lattner fe4151efe7 Reform the unswitching code in terms of edge splitting, not block splitting.
llvm-svn: 26112
2006-02-10 23:16:39 +00:00
Chris Lattner ec6b40a093 Fix a case where UnswitchTrivialCondition broke critical edges with
phi's in the successors

llvm-svn: 26108
2006-02-10 19:08:15 +00:00
Chris Lattner 6e263155a6 add some notes, move some code around. Implement unswitching of loops
with branches on partially invariant computations.

llvm-svn: 26104
2006-02-10 02:30:37 +00:00
Chris Lattner 4935417a84 Move code around to be more logical, no functionality change.
llvm-svn: 26103
2006-02-10 02:01:22 +00:00
Chris Lattner 3fc3148b85 When unswitching a trivial loop, do admit we are doing it! :)
llvm-svn: 26102
2006-02-10 01:36:35 +00:00
Chris Lattner ed7a67b0de Implement unconditional unswitching of 'trivial' loops, those loops that contain
branches in their entry block that control whether or not the loop is a noop or not.

llvm-svn: 26101
2006-02-10 01:24:09 +00:00
Chris Lattner 4f0e66df6a Simplify control flow a bit, note that unswitch preserves canonical loop form
llvm-svn: 26098
2006-02-09 22:15:42 +00:00
Chris Lattner 8976219850 Make the threshold a parameter
llvm-svn: 26093
2006-02-09 20:15:48 +00:00
Chris Lattner 2826e0511b Simplify the loop-unswitch pass, by not even trying to unswitch loops with
uses of loop values outside the loop.  We need loop-closed SSA form to do
this right, or to use SSA rewriting if we really care.

llvm-svn: 26089
2006-02-09 19:14:52 +00:00
Chris Lattner 24cd2fa269 Fix 80-column violations
llvm-svn: 26088
2006-02-09 07:41:14 +00:00
Chris Lattner 4534dd59a3 Enhance MVIZ in three ways:
1. Teach it new tricks: in particular how to propagate through signed shr and sexts.
2. Teach it to return a bitset of known-1 and known-0 bits, instead of just zero.
3. Teach instcombine (AND X, C) to fold when we know all C bits of X.

This implements Regression/Transforms/InstCombine/bittest.ll, and allows
future things to be simplified.

llvm-svn: 26087
2006-02-09 07:38:58 +00:00
Chris Lattner ab2dc4d70d Simplify some code, reducing calls to MaskedValueIsZero. Implement a minor
optimization where we reduce the number of bits in AND masks when possible.

llvm-svn: 26056
2006-02-08 07:34:50 +00:00
Chris Lattner 5997cf9381 Use EraseInstFromFunction in a few cases to put the uses of the removed
instruction onto the worklist (in case they are now dead).

Add a really trivial local DSE implementation to help out bitfield code.
We now fold this:

struct S {
    unsigned char a : 1, b : 1, c : 1, d : 2, e : 3;
    S();
};

S::S() : a(0), b(0), c(1), d(0), e(6) {}

to this:

void %_ZN1SC1Ev(%struct.S* %this) {
entry:
        %tmp.1 = getelementptr %struct.S* %this, int 0, uint 0
        store ubyte 38, ubyte* %tmp.1
        ret void
}

much earlier (in gccas instead of only in gccld after DSE runs).

llvm-svn: 26050
2006-02-08 03:25:32 +00:00
Chris Lattner 06a0ed1ee0 Implement some more interesting select sccp cases. This implements:
test/Regression/Transforms/SCCP/select.ll

llvm-svn: 26049
2006-02-08 02:38:11 +00:00
Chris Lattner ddba3289b5 Fix a problem in my patch yesterday, causing a miscompilation of 176.gcc
llvm-svn: 26045
2006-02-08 01:20:23 +00:00
Chris Lattner 44314827d6 Fix Transforms/InstCombine/2006-02-07-SextZextCrash.ll
llvm-svn: 26040
2006-02-07 19:07:40 +00:00
Chris Lattner 92a6865321 Generalize MaskedValueIsZero into a ComputeMaskedNonZeroBits function, which
is just as efficient as MVIZ and is also more general.

Fix a few minor bugs introduced in recent patches

llvm-svn: 26036
2006-02-07 08:05:22 +00:00
Chris Lattner c3ebf40031 Make MaskedValueIsZero take a uint64_t instead of a ConstantIntegral as a
mask.  This allows the code to be simpler and more efficient.

Also, generalize some of the cases in MVIZ a bit, making it slightly more aggressive.

llvm-svn: 26035
2006-02-07 07:27:52 +00:00
Chris Lattner 77defbae0a Use Type::getIntegralTypeMask() to simplify some code
llvm-svn: 26034
2006-02-07 07:00:41 +00:00
Chris Lattner 2590e511d8 Implement the beginnings of a facility for simplifying expressions based on
'demanded bits', inspired by Nate's work in the dag combiner.  This isn't
complete, but needs to unrelated instcombiner changes to continue.

llvm-svn: 26033
2006-02-07 06:56:34 +00:00
Chris Lattner 2e90b732fa Turn A % (C << N), where C is 2^k, into A & ((C << N)-1) [urem only].
Turn A / (C1 << N), where C1 is "1<<C2" into A >> (N+C2) [udiv only].

Tested with: rem.ll:test5, div.ll:test10

llvm-svn: 26003
2006-02-05 07:54:04 +00:00
Chris Lattner d30c4991a1 Use SCEVExpander::InsertCastOfTo instead of our own code. This reduces
#LLVM LOC, and auto-cse's cast instructions.

llvm-svn: 25974
2006-02-04 09:52:43 +00:00
Chris Lattner 2959f0003e Fix two significant bugs in LSR:
1. When rewriting code in outer loops, sometimes we would insert code into
   inner loops that is invariant in that loop.
2. Notice that 4*(2+x) is 8+4*x and use that to simplify expressions.

This is a performance neutral change.

llvm-svn: 25964
2006-02-04 07:36:50 +00:00
Jeff Cohen 15a8c15a1f Improve compatibility with VC2005, patch by Morten Ofstad!
llvm-svn: 25661
2006-01-26 20:41:32 +00:00
Chris Lattner c0f633a598 Fix Regression/Transforms/ScalarRepl/2006-01-24-IllegalUnionPromoteCrash.ll
llvm-svn: 25587
2006-01-24 19:36:27 +00:00
Chris Lattner c597b8a55e Make iostream #inclusion explicit
llvm-svn: 25514
2006-01-22 23:32:06 +00:00
Chris Lattner e154abf9b3 Implement casts.ll:test26: a cast from float -> double -> integer, doesn't
need the float->double part.

llvm-svn: 25452
2006-01-19 07:40:22 +00:00
Robert Bocchino 6dce25019d Lowerpacked and SCCP support for the insertelement operation.
llvm-svn: 25406
2006-01-17 20:06:55 +00:00
Chris Lattner 307b7ea15f fix a crash due to missing parens
llvm-svn: 25363
2006-01-16 19:47:21 +00:00
Chris Lattner 0de2c7d3d8 This pass has never worked correctly. Remove.
llvm-svn: 25349
2006-01-16 01:06:00 +00:00
Chris Lattner ef530c24c1 FunctionPass's cannot do IPO things.
llvm-svn: 25315
2006-01-14 19:30:35 +00:00
Robert Bocchino a83529678e Added instcombine support for extractelement.
llvm-svn: 25299
2006-01-13 22:48:06 +00:00
Chris Lattner 503221f5c5 Do a simple instcombine xforms to delete llvm.stackrestore cases.
llvm-svn: 25294
2006-01-13 21:28:09 +00:00
Chris Lattner c66b223b28 Simplify this a tiny bit by using the new IntrinsicInst functionality.
llvm-svn: 25292
2006-01-13 20:11:04 +00:00
Chris Lattner cb36710ff9 Switch these to using ETForest instead of DominatorSet to compute itself.
Patch written by Daniel Berlin!

llvm-svn: 25202
2006-01-11 05:10:20 +00:00
Chris Lattner 48e4a2ebd8 Switch this to using ETForest instead of DominatorSet to compute itself.
Patch written by Daniel Berlin!

llvm-svn: 25201
2006-01-11 05:09:40 +00:00
Robert Bocchino bd518d153b Added lower packed support for the extractelement operation.
llvm-svn: 25180
2006-01-10 19:05:05 +00:00
Chris Lattner 9cbfbc21bb fix some 176.gcc miscompilation from my previous patch.
llvm-svn: 25137
2006-01-07 01:32:28 +00:00
Chris Lattner 330628a6d8 silence some bogus gcc warnings on fenris
llvm-svn: 25130
2006-01-06 17:59:59 +00:00
Chris Lattner eb372a0276 Enhance the shift-shift folding code to allow a no-op cast to occur in between
the shifts.

This allows us to fold this (which is the 'integer add a constant' sequence
from cozmic's scheme compmiler):

int %x(uint %anf-temporary776) {
        %anf-temporary777 = shr uint %anf-temporary776, ubyte 1
        %anf-temporary800 = cast uint %anf-temporary777 to int
        %anf-temporary804 = shl int %anf-temporary800, ubyte 1
        %anf-temporary805 = add int %anf-temporary804, -2
        %anf-temporary806 = or int %anf-temporary805, 1
        ret int %anf-temporary806
}

into this:

int %x(uint %anf-temporary776) {
        %anf-temporary776 = cast uint %anf-temporary776 to int
        %anf-temporary776.mask1 = add int %anf-temporary776, -2
        %anf-temporary805 = or int %anf-temporary776.mask1, 1
        ret int %anf-temporary805
}

note that instcombine already knew how to eliminate the AND that the two
shifts fold into.  This is tested by InstCombine/shift.ll:test26

-Chris

llvm-svn: 25128
2006-01-06 07:52:12 +00:00
Chris Lattner b330939d90 Simplify the code a bit more
llvm-svn: 25126
2006-01-06 07:22:22 +00:00
Chris Lattner 145539343f Extract a bunch of code out of visitShiftInst into FoldShiftByConstant. No
functionality changes.

llvm-svn: 25125
2006-01-06 07:12:35 +00:00
Duraid Madina 7a3ad6cae2 getting there...
llvm-svn: 25021
2005-12-26 13:48:44 +00:00
Chris Lattner 8c9e14620f Fix Transforms/ScalarRepl/2005-12-14-UnionPromoteCrash.ll, a crash on undefined
behavior in 126.gcc on big-endian systems.

llvm-svn: 24708
2005-12-14 17:23:59 +00:00
Chris Lattner 3b0a62d8a5 Implement a little hack for parity with GCC on crafty. This speeds up
186.crafty by about 16% (from 15.109s to 13.045s) on my system.

This turns allocas with unions/casts into scalars.  For example crafty has
something like this:

    union doub {
      unsigned short i[4];
      long long d;
    };
int f(long long a) {
  return ((union doub){.d=a}).i[1];
}

Instead of generating loads and stores to an alloca, we now promote the
whole thing to a scalar long value.

This implements: Transforms/ScalarRepl/AggregatePromote.ll

llvm-svn: 24667
2005-12-12 07:19:13 +00:00
Chris Lattner 077200737c getRawValue zero extens for unsigned values, use getsextvalue so that we
know that small negative values fit into the immediate field of addressing
modes.

llvm-svn: 24608
2005-12-05 18:23:57 +00:00
Chris Lattner dc4ffef633 Fix a bug where we didn't realize that vaarg reads memory. This fixes
Transforms/DeadStoreElimination/2005-11-30-vaarg.ll

llvm-svn: 24545
2005-11-30 19:38:22 +00:00
Andrew Lenharth 5fc3794e71 since reg2mem requires it, might as well mention that it preserves it
llvm-svn: 24491
2005-11-25 16:04:54 +00:00
Andrew Lenharth 061029dee2 Reg2Mem is something a pass may depend on, so allow that
llvm-svn: 24488
2005-11-22 22:14:23 +00:00
Andrew Lenharth 71b09bbb07 turns out, demotion and invokes and critical edges don't mix
llvm-svn: 24487
2005-11-22 21:45:19 +00:00
Chris Lattner 9c37f23645 Fix a crash building 176.gcc due to my recent patch, which only fixed
half the problem.

llvm-svn: 24414
2005-11-18 18:30:47 +00:00
Chris Lattner bca0be812d This was checking the wrong GEP expression. Fixing this fixes a gccas crash
compiling mysql reported by Ted Kremenek.

llvm-svn: 24402
2005-11-17 19:35:42 +00:00
Andrew Lenharth d9c13b1336 the pain isn't gone unless the phinodes are spilled too
llvm-svn: 24288
2005-11-10 19:39:09 +00:00
Andrew Lenharth 8e66c0c8a9 this works with backedges to the existing entry block alot better
llvm-svn: 24270
2005-11-10 17:35:34 +00:00
Andrew Lenharth 4130a4f061 The pass everyone has been waiting for!
Reg2Mem

for fun you can opt -reg2mem -mem2reg

llvm-svn: 24267
2005-11-10 01:58:38 +00:00
Nate Begeman 848622f87f Add support alignment of allocation instructions.
Add support for specifying alignment and size of setjmp jmpbufs.

No targets currently do anything with this information, nor is it presrved
in the bytecode representation.  That's coming up next.

llvm-svn: 24196
2005-11-05 09:21:28 +00:00
Chris Lattner 16b29e9562 Implement Transforms/TailCallElim/return-undef.ll, a trivial case
that has been sitting in my inbox since May 18. :)

llvm-svn: 24194
2005-11-05 08:21:11 +00:00
Chris Lattner dd0c174082 Turn sdiv into udiv if both operands have a clear sign bit. This occurs
a few times in crafty:

OLD:    %tmp.36 = div int %tmp.35, 8            ; <int> [#uses=1]
NEW:    %tmp.36 = div uint %tmp.35, 8           ; <uint> [#uses=0]
OLD:    %tmp.19 = div int %tmp.18, 8            ; <int> [#uses=1]
NEW:    %tmp.19 = div uint %tmp.18, 8           ; <uint> [#uses=0]
OLD:    %tmp.117 = div int %tmp.116, 8          ; <int> [#uses=1]
NEW:    %tmp.117 = div uint %tmp.116, 8         ; <uint> [#uses=0]
OLD:    %tmp.92 = div int %tmp.91, 8            ; <int> [#uses=1]
NEW:    %tmp.92 = div uint %tmp.91, 8           ; <uint> [#uses=0]

Which all turn into shrs.

llvm-svn: 24190
2005-11-05 07:40:31 +00:00
Chris Lattner e9ff0eaf5b Turn srem -> urem when neither input has their sign bit set. This triggers
8 times in vortex, allowing the srems to be turned into shrs:

OLD:    %tmp.104 = rem int %tmp.5.i37, 16               ; <int> [#uses=1]
NEW:    %tmp.104 = rem uint %tmp.5.i37, 16              ; <uint> [#uses=0]
OLD:    %tmp.98 = rem int %tmp.5.i24, 16                ; <int> [#uses=1]
NEW:    %tmp.98 = rem uint %tmp.5.i24, 16               ; <uint> [#uses=0]
OLD:    %tmp.91 = rem int %tmp.5.i19, 8         ; <int> [#uses=1]
NEW:    %tmp.91 = rem uint %tmp.5.i19, 8                ; <uint> [#uses=0]
OLD:    %tmp.88 = rem int %tmp.5.i14, 8         ; <int> [#uses=1]
NEW:    %tmp.88 = rem uint %tmp.5.i14, 8                ; <uint> [#uses=0]
OLD:    %tmp.85 = rem int %tmp.5.i9, 1024               ; <int> [#uses=2]
NEW:    %tmp.85 = rem uint %tmp.5.i9, 1024              ; <uint> [#uses=0]
OLD:    %tmp.82 = rem int %tmp.5.i, 512         ; <int> [#uses=2]
NEW:    %tmp.82 = rem uint %tmp.5.i1, 512               ; <uint> [#uses=0]
OLD:    %tmp.48.i = rem int %tmp.5.i.i161, 4            ; <int> [#uses=1]
NEW:    %tmp.48.i = rem uint %tmp.5.i.i161, 4           ; <uint> [#uses=0]
OLD:    %tmp.20.i2 = rem int %tmp.5.i.i, 4              ; <int> [#uses=1]
NEW:    %tmp.20.i2 = rem uint %tmp.5.i.i, 4             ; <uint> [#uses=0]

it also occurs 9 times in gcc, but with odd constant divisors (1009 and 61)
so the payoff isn't as great.

llvm-svn: 24189
2005-11-05 07:28:37 +00:00
Andrew Lenharth 662295587d make this 64 bit clean, fixed test30 of /Regression/Transforms/InstCombine/add.ll
llvm-svn: 24158
2005-11-02 18:35:40 +00:00
Chris Lattner 09efd4e5b6 Limit the search depth of MaskedValueIsZero to 6 instructions, to avoid
bad cases.  This fixes Markus's second testcase in PR639, and should
seal it for good.

llvm-svn: 24123
2005-10-31 18:35:52 +00:00
Chris Lattner 27d351f159 This pass is now obsolete since all targets have moved to the SelectionDAG
infrastructure and the simple isels have been removed.

llvm-svn: 24090
2005-10-29 05:33:46 +00:00
Chris Lattner 8f663e8bbc Pull some code out into a function, give it the ability to see through +.
This allows us to turn code like malloc(4*x+4) -> malloc int, (x+1)

llvm-svn: 24081
2005-10-29 04:36:15 +00:00
Chris Lattner 8270c33606 Remove a special case, allowing the general case to handle it. No functionality
change.

llvm-svn: 24076
2005-10-29 03:19:53 +00:00
Chris Lattner b9d3ca5c3c Fix a bit of backwards logic that broke exptree and smg2000
llvm-svn: 24056
2005-10-28 16:27:35 +00:00
Chris Lattner c4f67e67d2 Do not sink any instruction with side effects, including vaarg. This fixes
PR640

llvm-svn: 24046
2005-10-27 17:13:11 +00:00
Chris Lattner c6372cca78 Fix typo
llvm-svn: 24033
2005-10-27 06:26:26 +00:00
Chris Lattner 0fe7551bc0 Teach instcombine to promote stuff like (cast (malloc sbyte, 8*X) to int*)
into: malloc int, (2*X)

llvm-svn: 24032
2005-10-27 06:24:46 +00:00
Chris Lattner b3ecf96900 Promote cases like cast (malloc sbyte, 100) to int* into
(malloc [25 x int]) directly without having to convert to
(malloc [100 x sbyte]) first.

llvm-svn: 24031
2005-10-27 06:12:00 +00:00
Chris Lattner bb17180a23 Minor change to this file to support obscure cases with constant array amounts
llvm-svn: 24030
2005-10-27 05:53:56 +00:00
Chris Lattner 38a1b00a0f fold nested and's early to avoid inefficiencies in MaskedValueIsZero. This
fixes a very slow compile in PR639.

llvm-svn: 24011
2005-10-26 17:18:16 +00:00
Jeff Cohen 2b8cbf319c Update Visual Studio projects to reflect moved file.
llvm-svn: 23998
2005-10-26 05:36:51 +00:00
Chris Lattner 46705b2f2d Handle allocations that, even after removing dead uses, still have more than
one use (but one is a cast).  This handles the very common case of:

 X = alloc [n x byte]
 Y = cast X to somethingbetter
 seteq X, null

In order to avoid infinite looping when there are multiple casts, we only
allow this if the xform is strictly increasing the alignment of the
allocation.

llvm-svn: 23961
2005-10-24 06:35:18 +00:00
Chris Lattner 355ecc09f8 Fix a bug where we would 'promote' an allocation from one type to another
where the second has less alignment required.  If we had explicit alignment
support in the IR, we could handle this case, but we can't until we do.

llvm-svn: 23960
2005-10-24 06:26:18 +00:00
Chris Lattner ac87beb03a Before promoting a malloc type, remove dead uses. This makes instcombine
more effective at promoting these allocations, catching them earlier in the
compile process.

llvm-svn: 23959
2005-10-24 06:22:12 +00:00
Chris Lattner 216be91817 Pull some code out into a function, no functionality change
llvm-svn: 23958
2005-10-24 06:03:58 +00:00
Chris Lattner bde3845548 DONT_BUILD_RELINKED is gone and implied by BUILD_ARCHIVE now
llvm-svn: 23940
2005-10-24 02:26:13 +00:00
Chris Lattner 8c087e962c Only build .a file versions of these libraries, instead of .a and .o versions.
This should speed up build times.

llvm-svn: 23933
2005-10-24 01:59:48 +00:00
Chris Lattner bd77fac034 Make sure that anything using the ADCE pass pulls in the UnifyFunctionExitNodes
code

llvm-svn: 23931
2005-10-24 01:40:23 +00:00
Jeff Cohen 11e26b52b2 When a function takes a variable number of pointer arguments, with a zero
pointer marking the end of the list, the zero *must* be cast to the pointer
type.  An un-cast zero is a 32-bit int, and at least on x86_64, gcc will
not extend the zero to 64 bits, thus allowing the upper 32 bits to be
random junk.

The new END_WITH_NULL macro may be used to annotate a such a function
so that GCC (version 4 or newer) will detect the use of un-casted zero
at compile time.

llvm-svn: 23888
2005-10-23 04:37:20 +00:00
Chris Lattner 5df0e36e98 My previous patch was too conservative. Reject FP and void types, but do
allow pointer types.

llvm-svn: 23859
2005-10-21 05:45:41 +00:00
Chris Lattner 0c0b38bb4c Do NOT touch FP ops with LSR. This fixes a testcase Nate sent me from an
inner loop like this:

LBB_RateConvertMono8AltiVec_2:  ; no_exit
        lis r2, ha16(.CPI_RateConvertMono8AltiVec_0)
        lfs f3, lo16(.CPI_RateConvertMono8AltiVec_0)(r2)
        fmr f3, f3
        fadd f0, f2, f0
        fadd f3, f0, f3
        fcmpu cr0, f3, f1
        bge cr0, LBB_RateConvertMono8AltiVec_2  ; no_exit

to an inner loop like this:

LBB_RateConvertMono8AltiVec_1:  ; no_exit
        fsub f2, f2, f1
        fcmpu cr0, f2, f1
        fmr f0, f2
        bge cr0, LBB_RateConvertMono8AltiVec_1  ; no_exit

Doh! good catch!

llvm-svn: 23838
2005-10-20 04:47:10 +00:00
Chris Lattner da1b152c43 Make this work for FP constantexprs
llvm-svn: 23773
2005-10-17 20:18:38 +00:00
Chris Lattner 7fde91e365 Oops, X+0.0 isn't foldable, but X+-0.0 is.
llvm-svn: 23772
2005-10-17 17:56:38 +00:00
Chris Lattner 32979336a7 relax this a bit, as we only support the default rounding mode
llvm-svn: 23771
2005-10-17 17:49:32 +00:00
Chris Lattner 192cd18f53 Fix (hopefully the last) issue where LSR is nondeterminstic. When pulling
out CSE's of base expressions it could build a result whose order was
nondet.

llvm-svn: 23698
2005-10-11 18:41:04 +00:00
Chris Lattner 5c9d63da31 Fix another problem where LSR was being nondeterminstic. Also remove elements
from the end of a vector instead of the beginning

llvm-svn: 23697
2005-10-11 18:30:57 +00:00
Chris Lattner b7a3894e7c Fix another lsr-is-nondeterministic case
llvm-svn: 23695
2005-10-11 18:17:57 +00:00
Chris Lattner 03b9eb506c Make MaskedValueIsZero a bit more aggressive
llvm-svn: 23677
2005-10-09 22:08:50 +00:00
Chris Lattner 62010c450f Fix funky xcode indentation
llvm-svn: 23674
2005-10-09 06:36:35 +00:00
Chris Lattner eb4be8b942 Hrm, you didn't see this.
llvm-svn: 23673
2005-10-09 06:24:02 +00:00
Chris Lattner 4ea0a3eaac Fix a source of non-determinism in the backend: the order of processing
IV strides dependend on the pointer order of the strides in memory.
Non-determinism is bad.

llvm-svn: 23672
2005-10-09 06:20:55 +00:00
Jeff Cohen 572910c9a2 Remove useless variable.
llvm-svn: 23656
2005-10-07 05:28:29 +00:00
Chris Lattner f07a587c79 Make IVUseShouldUsePostIncValue more aggressive when the use is a PHI. In
particular, it should realize that phi's use their values in the pred block
not the phi block itself.  This change turns our em3d loop from this:

_test:
        cmpwi cr0, r4, 0
        bgt cr0, LBB_test_2     ; entry.no_exit_crit_edge
LBB_test_1:     ; entry.loopexit_crit_edge
        li r2, 0
        b LBB_test_6    ; loopexit
LBB_test_2:     ; entry.no_exit_crit_edge
        li r6, 0
LBB_test_3:     ; no_exit
        or r2, r6, r6
        lwz r6, 0(r3)
        cmpw cr0, r6, r5
        beq cr0, LBB_test_6     ; loopexit
LBB_test_4:     ; endif
        addi r3, r3, 4
        addi r6, r2, 1
        cmpw cr0, r6, r4
        blt cr0, LBB_test_3     ; no_exit
LBB_test_5:     ; endif.loopexit.loopexit_crit_edge
        addi r3, r2, 1
        blr
LBB_test_6:     ; loopexit
        or r3, r2, r2
        blr

into:

_test:
        cmpwi cr0, r4, 0
        bgt cr0, LBB_test_2     ; entry.no_exit_crit_edge
LBB_test_1:     ; entry.loopexit_crit_edge
        li r2, 0
        b LBB_test_5    ; loopexit
LBB_test_2:     ; entry.no_exit_crit_edge
        li r6, 0
LBB_test_3:     ; no_exit
        lwz r2, 0(r3)
        cmpw cr0, r2, r5
        or r2, r6, r6
        beq cr0, LBB_test_5     ; loopexit
LBB_test_4:     ; endif
        addi r3, r3, 4
        addi r6, r6, 1
        cmpw cr0, r6, r4
        or r2, r6, r6
        blt cr0, LBB_test_3     ; no_exit
LBB_test_5:     ; loopexit
        or r3, r2, r2
        blr


Unfortunately, this is actually worse code, because the register coallescer
is getting confused somehow.  If it were doing its job right, it could turn the
code into this:

_test:
        cmpwi cr0, r4, 0
        bgt cr0, LBB_test_2     ; entry.no_exit_crit_edge
LBB_test_1:     ; entry.loopexit_crit_edge
        li r6, 0
        b LBB_test_5    ; loopexit
LBB_test_2:     ; entry.no_exit_crit_edge
        li r6, 0
LBB_test_3:     ; no_exit
        lwz r2, 0(r3)
        cmpw cr0, r2, r5
        beq cr0, LBB_test_5     ; loopexit
LBB_test_4:     ; endif
        addi r3, r3, 4
        addi r6, r6, 1
        cmpw cr0, r6, r4
        blt cr0, LBB_test_3     ; no_exit
LBB_test_5:     ; loopexit
        or r3, r6, r6
        blr

... which I'll work on next. :)

llvm-svn: 23604
2005-10-03 02:50:05 +00:00
Chris Lattner e4ed42a426 Refactor some code into a function
llvm-svn: 23603
2005-10-03 01:04:44 +00:00
Chris Lattner 360928dbed This break is bogus and I have no idea why it was there. Basically it prevents
memoizing code when IV's are used by phinodes outside of loops.  In a simple
example, we were getting this code before (note that r6 and r7 are isomorphic
IV's):

        li r6, 0
        or r7, r6, r6
LBB_test_3:     ; no_exit
        lwz r2, 0(r3)
        cmpw cr0, r2, r5
        or r2, r7, r7
        beq cr0, LBB_test_5     ; loopexit
LBB_test_4:     ; endif
        addi r2, r7, 1
        addi r7, r7, 1
        addi r3, r3, 4
        addi r6, r6, 1
        cmpw cr0, r6, r4
        blt cr0, LBB_test_3     ; no_exit

Now we get:

        li r6, 0
LBB_test_3:     ; no_exit
        or r2, r6, r6
        lwz r6, 0(r3)
        cmpw cr0, r6, r5
        beq cr0, LBB_test_6     ; loopexit
LBB_test_4:     ; endif
        addi r3, r3, 4
        addi r6, r2, 1
        cmpw cr0, r6, r4
        blt cr0, LBB_test_3     ; no_exit

this was noticed in em3d.

llvm-svn: 23602
2005-10-03 00:37:33 +00:00
Chris Lattner 8fcce170cf when checking if we should move a split edge block outside of a loop,
check the presplit pred, not the post-split pred.  This was causing us
to make the wrong decision in some cases, leaving the critical edge block
in the loop.

llvm-svn: 23601
2005-10-03 00:31:52 +00:00
Jeff Cohen f8a5e5ae6e Fix VC++ warnings.
llvm-svn: 23579
2005-10-01 03:57:14 +00:00
Chris Lattner a554c9470b Insert stores after phi nodes in the normal dest. This fixes
LowerInvoke/2005-08-03-InvokeWithPHI.ll

llvm-svn: 23525
2005-09-29 17:44:20 +00:00
Chris Lattner 3b63bb375c add a note about a way to improve this code further, that I won't be getting
to right now.

llvm-svn: 23485
2005-09-27 22:44:59 +00:00
Chris Lattner e285f5ed8f Avoid spilling stack slots... to stack slots.
llvm-svn: 23478
2005-09-27 21:33:12 +00:00
Chris Lattner 87eb249300 Completely rewrite 'correct' eh support. This changes how setjmp insertion
is performed so it is only at most once per function that contains an invoke
instead of once per invoke in the function.  This patch has the following perks:

1. It fixes PR631, which complains about slowness.
2. If fixes PR240, which complains about non-volatile vars being live across
   setjmp/longjmps.
3. It improves (but does not fix) the jmpbuf alignment issue on itanium by not
   forcing the jmpbufs to always be 8-bytes off the alignment of the structure.
4. It speeds up 253.perlbmk from 338s to 13.70s (a 25x improvement!), making us
   now about 4% faster than GCC.

Further improvements are also possible.

llvm-svn: 23477
2005-09-27 21:18:17 +00:00
Chris Lattner 92233d2175 Make the pass name simpler
llvm-svn: 23476
2005-09-27 21:10:32 +00:00
Chris Lattner 02ae21e1e0 Eliminate GetGEPGlobalInitializer in favor of the more powerful
ConstantFoldLoadThroughGEPConstantExpr function in the utils lib.

llvm-svn: 23446
2005-09-26 05:28:52 +00:00
Chris Lattner 0b011ec8e2 Factor the GetGEPGlobalInitializer out of this pass and into Transforms/Utils
as ConstantFoldLoadThroughGEPConstantExpr.

llvm-svn: 23445
2005-09-26 05:28:06 +00:00
Chris Lattner 0b3557f54a Move MaskedValueIsZero up.
Match a bunch of idioms for sign extensions, implementing InstCombine/signext.ll

llvm-svn: 23428
2005-09-24 23:43:33 +00:00
Chris Lattner b4b2530a1a Refactor this code a bit and make it more general. This now compiles:
struct S { unsigned int i : 6, j : 11, k : 15; } b;
void plus2 (unsigned int x) { b.j += x; }

To:

_plus2:
        lis r2, ha16(L_b$non_lazy_ptr)
        lwz r2, lo16(L_b$non_lazy_ptr)(r2)
        lwz r4, 0(r2)
        slwi r3, r3, 6
        add r3, r4, r3
        rlwimi r3, r4, 0, 26, 14
        stw r3, 0(r2)
        blr


instead of:

_plus2:
        lis r2, ha16(L_b$non_lazy_ptr)
        lwz r2, lo16(L_b$non_lazy_ptr)(r2)
        lwz r4, 0(r2)
        rlwinm r5, r4, 26, 21, 31
        add r3, r5, r3
        rlwimi r4, r3, 6, 15, 25
        stw r4, 0(r2)
        blr

by eliminating an 'and'.

I'm pretty sure this is as small as we can go :)

llvm-svn: 23386
2005-09-18 07:22:02 +00:00
Chris Lattner 797dee7705 Compile
struct S { unsigned int i : 6, j : 11, k : 15; } b;
void plus2 (unsigned int x) {
  b.j += x;
}

to:

plus2:
        mov %EAX, DWORD PTR [b]
        mov %ECX, %EAX
        and %ECX, 131008
        mov %EDX, DWORD PTR [%ESP + 4]
        shl %EDX, 6
        add %EDX, %ECX
        and %EDX, 131008
        and %EAX, -131009
        or %EDX, %EAX
        mov DWORD PTR [b], %EDX
        ret

instead of:

plus2:
        mov %EAX, DWORD PTR [b]
        mov %ECX, %EAX
        shr %ECX, 6
        and %ECX, 2047
        add %ECX, DWORD PTR [%ESP + 4]
        shl %ECX, 6
        and %ECX, 131008
        and %EAX, -131009
        or %ECX, %EAX
        mov DWORD PTR [b], %ECX
        ret

llvm-svn: 23385
2005-09-18 06:30:59 +00:00
Chris Lattner 01f56c68e9 Generalize this transform, using MaskedValueIsZero, allowing us to compile:
struct S { unsigned int i : 6, j : 11, k : 15; } b;
void plus3 (unsigned int x) { b.k += x; }

To:

plus3:
        mov %EAX, DWORD PTR [%ESP + 4]
        shl %EAX, 17
        add DWORD PTR [b], %EAX
        ret

instead of:

plus3:
        mov %EAX, DWORD PTR [%ESP + 4]
        shl %EAX, 17
        mov %ECX, DWORD PTR [b]
        add %EAX, %ECX
        and %EAX, -131072
        and %ECX, 131071
        or %ECX, %EAX
        mov DWORD PTR [b], %ECX
        ret

llvm-svn: 23384
2005-09-18 06:02:59 +00:00
Chris Lattner 4ebc8ab4e0 fix typeo
llvm-svn: 23383
2005-09-18 05:25:20 +00:00
Chris Lattner e5b23a6d67 Remove unintentionally committed code
llvm-svn: 23382
2005-09-18 05:12:51 +00:00
Chris Lattner 27cb9dbd35 implement shift.ll:test25. This compiles:
struct S { unsigned int i : 6, j : 11, k : 15; } b;
void plus3 (unsigned int x) {
  b.k += x;
}

to:

_plus3:
        lis r2, ha16(L_b$non_lazy_ptr)
        lwz r2, lo16(L_b$non_lazy_ptr)(r2)
        lwz r3, 0(r2)
        rlwinm r4, r3, 0, 0, 14
        add r4, r4, r3
        rlwimi r4, r3, 0, 15, 31
        stw r4, 0(r2)
        blr

instead of:

_plus3:
        lis r2, ha16(L_b$non_lazy_ptr)
        lwz r2, lo16(L_b$non_lazy_ptr)(r2)
        lwz r4, 0(r2)
        srwi r5, r4, 17
        add r3, r5, r3
        slwi r3, r3, 17
        rlwimi r3, r4, 0, 15, 31
        stw r3, 0(r2)
        blr

llvm-svn: 23381
2005-09-18 05:12:10 +00:00
Chris Lattner af517574ce Implement add.ll:test29. Codegening:
struct S { unsigned int i : 6, j : 11, k : 15; } b;
void plus1 (unsigned int x) {
  b.i += x;
}

as:
_plus1:
        lis r2, ha16(L_b$non_lazy_ptr)
        lwz r2, lo16(L_b$non_lazy_ptr)(r2)
        lwz r4, 0(r2)
        add r3, r4, r3
        rlwimi r3, r4, 0, 0, 25
        stw r3, 0(r2)
        blr

instead of:

_plus1:
        lis r2, ha16(L_b$non_lazy_ptr)
        lwz r2, lo16(L_b$non_lazy_ptr)(r2)
        lwz r4, 0(r2)
        rlwinm r5, r4, 0, 26, 31
        add r3, r5, r3
        rlwimi r3, r4, 0, 0, 25
        stw r3, 0(r2)
        blr

llvm-svn: 23379
2005-09-18 04:24:45 +00:00
Chris Lattner 027eaf01cf remove debug output
llvm-svn: 23377
2005-09-18 03:50:25 +00:00
Chris Lattner 1521298993 Implement or.ll:test21. This teaches instcombine to be able to turn this:
struct {
   unsigned int bit0:1;
   unsigned int ubyte:31;
} sdata;

void foo() {
  sdata.ubyte++;
}

into this:

foo:
        add DWORD PTR [sdata], 2
        ret

instead of this:

foo:
        mov %EAX, DWORD PTR [sdata]
        mov %ECX, %EAX
        add %ECX, 2
        and %ECX, -2
        and %EAX, 1
        or %EAX, %ECX
        mov DWORD PTR [sdata], %EAX
        ret

llvm-svn: 23376
2005-09-18 03:42:07 +00:00
Chris Lattner a393e4d4b3 Fix the regression last night compiling povray
llvm-svn: 23348
2005-09-14 17:32:56 +00:00
Chris Lattner 2a8932960d Add a simple xform to simplify array accesses with casts in the way.
This is useful for 178.galgel where resolution of dope vectors (by the
optimizer) causes the scales to become apparent.

llvm-svn: 23328
2005-09-13 18:36:04 +00:00
Chris Lattner fd018c8dfe Fix an issue where LSR would miss rewriting a use of an IV expression by a PHI node that is not the original PHI.
This fixes up a dot-product loop in galgel, speeding it up from 18.47s to
16.13s.

llvm-svn: 23327
2005-09-13 02:09:55 +00:00
Chris Lattner 567b81f0d2 Add a helper function, allowing us to simplify some code a bit, changing
indentation, no functionality change

llvm-svn: 23325
2005-09-13 00:40:14 +00:00
Chris Lattner 219175c84d Implement a simple xform to turn code like this:
if () { store A -> P; } else { store B -> P; }

into a PHI node with one store, in the most trival case.  This implements
load.ll:test10.

llvm-svn: 23324
2005-09-12 23:23:25 +00:00
Chris Lattner e0bfdf1485 Another load-peephole optimization: do gcse when two loads are next to
each other.  This implements InstCombine/load.ll:test9

llvm-svn: 23322
2005-09-12 22:21:03 +00:00
Chris Lattner b990f7d8ed Implement a trivial form of store->load forwarding where the store and the
load are exactly consequtive.  This is picked up by other passes, but this
triggers thousands of times in fortran programs that use static locals
(and is thus a compile-time speedup).

llvm-svn: 23320
2005-09-12 22:00:15 +00:00
Chris Lattner 8048b85e8f Fix a regression from last night, which caused this pass to create invalid
code for IV uses outside of loops that are not dominated by the latch block.
We should only convert these uses to use the post-inc value if they ARE
dominated by the latch block.

Also use a new LoopInfo method to simplify some code.

This fixes Transforms/LoopStrengthReduce/2005-09-12-UsesOutOutsideOfLoop.ll

llvm-svn: 23318
2005-09-12 17:11:27 +00:00
Chris Lattner a67648396a _test:
li r2, 0
LBB_test_1:     ; no_exit.2
        li r5, 0
        stw r5, 0(r3)
        addi r2, r2, 1
        addi r3, r3, 4
        cmpwi cr0, r2, 701
        blt cr0, LBB_test_1     ; no_exit.2
LBB_test_2:     ; loopexit.2.loopexit
        addi r2, r2, 1
        stw r2, 0(r4)
        blr
[zion ~/llvm]$ cat > ~/xx
Uses of IV's outside of the loop should use hte post-incremented version
of the IV, not the preincremented version.  This helps many loops (e.g. in sixtrack)
which used to generate code like this (this is the code from the
dont-hoist-simple-loop-constants.ll testcase):

_test:
        li r2, 0                 **** IV starts at 0
LBB_test_1:     ; no_exit.2
        or r5, r2, r2            **** Copy for loop exit
        li r2, 0
        stw r2, 0(r3)
        addi r3, r3, 4
        addi r2, r5, 1
        addi r6, r5, 2           **** IV+2
        cmpwi cr0, r6, 701
        blt cr0, LBB_test_1     ; no_exit.2
LBB_test_2:     ; loopexit.2.loopexit
        addi r2, r5, 2       ****  IV+2
        stw r2, 0(r4)
        blr

And now generated code like this:

_test:
        li r2, 1               *** IV starts at 1
LBB_test_1:     ; no_exit.2
        li r5, 0
        stw r5, 0(r3)
        addi r2, r2, 1
        addi r3, r3, 4
        cmpwi cr0, r2, 701     *** IV.postinc + 0
        blt cr0, LBB_test_1
LBB_test_2:     ; loopexit.2.loopexit
        stw r2, 0(r4)          *** IV.postinc + 0
        blr

llvm-svn: 23313
2005-09-12 06:04:47 +00:00
Chris Lattner 530fe6ab30 implement Transforms/LoopStrengthReduce/dont-hoist-simple-loop-constants.ll.
We used to emit this code for it:

_test:
        li r2, 1     ;; Value tying up a register for the whole loop
        li r5, 0
LBB_test_1:     ; no_exit.2
        or r6, r5, r5
        li r5, 0
        stw r5, 0(r3)
        addi r5, r6, 1
        addi r3, r3, 4
        add r7, r2, r5  ;; should be addi r7, r5, 1
        cmpwi cr0, r7, 701
        blt cr0, LBB_test_1     ; no_exit.2
LBB_test_2:     ; loopexit.2.loopexit
        addi r2, r6, 2
        stw r2, 0(r4)
        blr

now we emit this:

_test:
        li r2, 0
LBB_test_1:     ; no_exit.2
        or r5, r2, r2
        li r2, 0
        stw r2, 0(r3)
        addi r3, r3, 4
        addi r2, r5, 1
        addi r6, r5, 2   ;; whoa, fold those adds!
        cmpwi cr0, r6, 701
        blt cr0, LBB_test_1     ; no_exit.2
LBB_test_2:     ; loopexit.2.loopexit
        addi r2, r5, 2
        stw r2, 0(r4)
        blr

more improvement coming.

llvm-svn: 23306
2005-09-10 01:18:45 +00:00
Chris Lattner b5e381a8cf Fix a problem that Dan Berlin noticed, where reassociation would not succeed
in building maximal expressions before simplifying them.  In particular, i
cases like this:

X-(A+B+X)

the code would consider A+B+X to be a maximal expression (not understanding
that the single use '-' would be turned into a + later), simplify it (a noop)
then later get simplified again.

Each of these simplify steps is where the cost of reassociation comes from,
so this patch should speed up the already fast pass a bit.

Thanks to Dan for noticing this!

llvm-svn: 23214
2005-09-02 07:07:58 +00:00
Chris Lattner 9fe263aa75 Avoid creating garbage instructions, just move the old add instruction
to where we need it when converting -(A+B+C) -> -A + -B + -C.

llvm-svn: 23213
2005-09-02 06:38:04 +00:00
Chris Lattner d1325da091 add some assertions and fix problems where reassociate could access the
Ops vector out of range

llvm-svn: 23211
2005-09-02 05:23:22 +00:00
Chris Lattner 8ca5b2a6d2 Fix Regression/Transforms/Reassociate/2005-08-24-Crash.ll
llvm-svn: 23019
2005-08-24 17:55:32 +00:00
Chris Lattner ea7dfd53d6 Fix Transforms/LoopStrengthReduce/2005-08-17-OutOfLoopVariant.ll, a crash
on 177.mesa

llvm-svn: 22843
2005-08-17 21:22:41 +00:00
Chris Lattner 2bf7cb5213 Use a new helper to split critical edges, making the code simpler.
Do not claim to not change the CFG.  We do change the cfg to split critical
edges.  This isn't causing us a problem now, but could likely do so in the
future.

llvm-svn: 22824
2005-08-17 06:35:16 +00:00
Chris Lattner 5cf983ee0f Fix a bad case in gzip where we put lots of things in registers across the
loop, because a IV-dependent value was used outside of the loop and didn't
have immediate-folding capability

llvm-svn: 22798
2005-08-16 00:38:11 +00:00
Chris Lattner 47d3ec3525 Ooops, don't forget to clear this. The real inner loop is now:
.LBB_foo_3:     ; no_exit.1
        lfd f2, 0(r9)
        lfd f3, 8(r9)
        fmul f4, f1, f2
        fmadd f4, f0, f3, f4
        stfd f4, 8(r9)
        fmul f3, f1, f3
        fmsub f2, f0, f2, f3
        stfd f2, 0(r9)
        addi r9, r9, 16
        addi r8, r8, 1
        cmpw cr0, r8, r4
        ble .LBB_foo_3  ; no_exit.1

llvm-svn: 22782
2005-08-13 07:42:01 +00:00
Chris Lattner 5949d49032 Recursively scan scev expressions for common subexpressions. This allows us
to handle nested loops much better, for example, by being able to tell that
these two expressions:

{( 8 + ( 16 * ( 1 +  %Tmp11 +  %Tmp12)) +  %c_),+,( 16 *  %Tmp 12)}<loopentry.1>

{(( 16 * ( 1 +  %Tmp11 +  %Tmp12)) +  %c_),+,( 16 *  %Tmp12)}<loopentry.1>

Have the following common part that can be shared:
{(( 16 * ( 1 +  %Tmp11 +  %Tmp12)) +  %c_),+,( 16 *  %Tmp12)}<loopentry.1>

This allows us to codegen an important inner loop in 168.wupwise as:

.LBB_foo_4:     ; no_exit.1
        lfd f2, 16(r9)
        fmul f3, f0, f2
        fmul f2, f1, f2
        fadd f4, f3, f2
        stfd f4, 8(r9)
        fsub f2, f3, f2
        stfd f2, 16(r9)
        addi r8, r8, 1
        addi r9, r9, 16
        cmpw cr0, r8, r4
        ble .LBB_foo_4  ; no_exit.1

instead of:

.LBB_foo_3:     ; no_exit.1
        lfdx f2, r6, r9
        add r10, r6, r9
        lfd f3, 8(r10)
        fmul f4, f1, f2
        fmadd f4, f0, f3, f4
        stfd f4, 8(r10)
        fmul f3, f1, f3
        fmsub f2, f0, f2, f3
        stfdx f2, r6, r9
        addi r9, r9, 16
        addi r8, r8, 1
        cmpw cr0, r8, r4
        ble .LBB_foo_3  ; no_exit.1

llvm-svn: 22781
2005-08-13 07:27:18 +00:00
Chris Lattner 79396539d3 remove dead code. The exit block list is computed on demand, thus does not
need to be updated.  This code is a relic from when it did.

llvm-svn: 22775
2005-08-13 01:30:36 +00:00
Chris Lattner 8447b49526 When splitting critical edges, make sure not to leave the new block in the
middle of the loop.  This turns a critical loop in gzip into this:

.LBB_test_1:    ; loopentry
        or r27, r28, r28
        add r28, r3, r27
        lhz r28, 3(r28)
        add r26, r4, r27
        lhz r26, 3(r26)
        cmpw cr0, r28, r26
        bne .LBB_test_8 ; loopentry.loopexit_crit_edge
.LBB_test_2:    ; shortcirc_next.0
        add r28, r3, r27
        lhz r28, 5(r28)
        add r26, r4, r27
        lhz r26, 5(r26)
        cmpw cr0, r28, r26
        bne .LBB_test_7 ; shortcirc_next.0.loopexit_crit_edge
.LBB_test_3:    ; shortcirc_next.1
        add r28, r3, r27
        lhz r28, 7(r28)
        add r26, r4, r27
        lhz r26, 7(r26)
        cmpw cr0, r28, r26
        bne .LBB_test_6 ; shortcirc_next.1.loopexit_crit_edge
.LBB_test_4:    ; shortcirc_next.2
        add r28, r3, r27
        lhz r26, 9(r28)
        add r28, r4, r27
        lhz r25, 9(r28)
        addi r28, r27, 8
        cmpw cr7, r26, r25
        mfcr r26, 1
        rlwinm r26, r26, 31, 31, 31
        add r25, r8, r27
        cmpw cr7, r25, r7
        mfcr r25, 1
        rlwinm r25, r25, 29, 31, 31
        and. r26, r26, r25
        bne .LBB_test_1 ; loopentry

instead of this:

.LBB_test_1:    ; loopentry
        or r27, r28, r28
        add r28, r3, r27
        lhz r28, 3(r28)
        add r26, r4, r27
        lhz r26, 3(r26)
        cmpw cr0, r28, r26
        beq .LBB_test_3 ; shortcirc_next.0
.LBB_test_2:    ; loopentry.loopexit_crit_edge
        add r2, r30, r27
        add r8, r29, r27
        b .LBB_test_9   ; loopexit
.LBB_test_3:    ; shortcirc_next.0
        add r28, r3, r27
        lhz r28, 5(r28)
        add r26, r4, r27
        lhz r26, 5(r26)
        cmpw cr0, r28, r26
        beq .LBB_test_5 ; shortcirc_next.1
.LBB_test_4:    ; shortcirc_next.0.loopexit_crit_edge
        add r2, r11, r27
        add r8, r12, r27
        b .LBB_test_9   ; loopexit
.LBB_test_5:    ; shortcirc_next.1
        add r28, r3, r27
        lhz r28, 7(r28)
        add r26, r4, r27
        lhz r26, 7(r26)
        cmpw cr0, r28, r26
        beq .LBB_test_7 ; shortcirc_next.2
.LBB_test_6:    ; shortcirc_next.1.loopexit_crit_edge
        add r2, r9, r27
        add r8, r10, r27
        b .LBB_test_9   ; loopexit
.LBB_test_7:    ; shortcirc_next.2
        add r28, r3, r27
        lhz r26, 9(r28)
        add r28, r4, r27
        lhz r25, 9(r28)
        addi r28, r27, 8
        cmpw cr7, r26, r25
        mfcr r26, 1
        rlwinm r26, r26, 31, 31, 31
        add r25, r8, r27
        cmpw cr7, r25, r7
        mfcr r25, 1
        rlwinm r25, r25, 29, 31, 31
        and. r26, r26, r25
        bne .LBB_test_1 ; loopentry

Next up, improve the code for the loop.

llvm-svn: 22769
2005-08-12 22:22:17 +00:00
Chris Lattner 4fec86d348 Fix a FIXME: if we are inserting code for a PHI argument, split the critical
edge so that the code is not always executed for both operands.  This
prevents LSR from inserting code into loops whose exit blocks contain
PHI uses of IV expressions (which are outside of loops).  On gzip, for
example, we turn this ugly code:

.LBB_test_1:    ; loopentry
        add r27, r3, r28
        lhz r27, 3(r27)
        add r26, r4, r28
        lhz r26, 3(r26)
        add r25, r30, r28    ;; Only live if exiting the loop
        add r24, r29, r28    ;; Only live if exiting the loop
        cmpw cr0, r27, r26
        bne .LBB_test_5 ; loopexit

into this:

.LBB_test_1:    ; loopentry
        or r27, r28, r28
        add r28, r3, r27
        lhz r28, 3(r28)
        add r26, r4, r27
        lhz r26, 3(r26)
        cmpw cr0, r28, r26
        beq .LBB_test_3 ; shortcirc_next.0
.LBB_test_2:    ; loopentry.loopexit_crit_edge
        add r2, r30, r27
        add r8, r29, r27
        b .LBB_test_9   ; loopexit
.LBB_test_2:    ; shortcirc_next.0
        ...
        blt .LBB_test_1


into this:

.LBB_test_1:    ; loopentry
        or r27, r28, r28
        add r28, r3, r27
        lhz r28, 3(r28)
        add r26, r4, r27
        lhz r26, 3(r26)
        cmpw cr0, r28, r26
        beq .LBB_test_3 ; shortcirc_next.0
.LBB_test_2:    ; loopentry.loopexit_crit_edge
        add r2, r30, r27
        add r8, r29, r27
        b .LBB_t_3:    ; shortcirc_next.0
.LBB_test_3:    ; shortcirc_next.0
        ...
        blt .LBB_test_1


Next step: get the block out of the loop so that the loop is all
fall-throughs again.

llvm-svn: 22766
2005-08-12 22:06:11 +00:00
Chris Lattner 62df798919 remove some trickiness that broke yacr2 and some other programs last night
llvm-svn: 22751
2005-08-10 17:15:20 +00:00
Chris Lattner f83ce5faee Make loop-simplify produce better loops by turning PHI nodes like X = phi [X, Y]
into just Y.  This often occurs when it seperates loops that have collapsed loop
headers.  This implements LoopSimplify/phi-node-simplify.ll

llvm-svn: 22746
2005-08-10 02:07:32 +00:00
Chris Lattner 677d85784a Allow indvar simplify to canonicalize ANY affine IV, not just affine IVs with
constant stride.  This implements Transforms/IndVarsSimplify/variable-stride-ivs.ll

llvm-svn: 22744
2005-08-10 01:12:06 +00:00
Chris Lattner edff91a49a Teach LSR to strength reduce IVs that have a loop-invariant but non-constant stride.
For code like this:

void foo(float *a, float *b, int n, int stride_a, int stride_b) {
  int i;
  for (i=0; i<n; i++)
      a[i*stride_a] = b[i*stride_b];
}

we now emit:

.LBB_foo2_2:    ; no_exit
        lfs f0, 0(r4)
        stfs f0, 0(r3)
        addi r7, r7, 1
        add r4, r2, r4
        add r3, r6, r3
        cmpw cr0, r7, r5
        blt .LBB_foo2_2 ; no_exit

instead of:

.LBB_foo_2:     ; no_exit
        mullw r8, r2, r7     ;; multiply!
        slwi r8, r8, 2
        lfsx f0, r4, r8
        mullw r8, r2, r6     ;; multiply!
        slwi r8, r8, 2
        stfsx f0, r3, r8
        addi r2, r2, 1
        cmpw cr0, r2, r5
        blt .LBB_foo_2  ; no_exit

loops with variable strides occur pretty often.  For example, in SPECFP2K
there are 317 variable strides in 177.mesa, 3 in 179.art, 14 in 188.ammp,
56 in 168.wupwise, 36 in 172.mgrid.

Now we can allow indvars to turn functions written like this:

void foo2(float *a, float *b, int n, int stride_a, int stride_b) {
  int i, ai = 0, bi = 0;
  for (i=0; i<n; i++)
    {
      a[ai] = b[bi];
      ai += stride_a;
      bi += stride_b;
    }
}

into code like the above for better analysis.  With this patch, they generate
identical code.

llvm-svn: 22740
2005-08-10 00:45:21 +00:00
Chris Lattner dde7dc525e Fix Regression/Transforms/LoopStrengthReduce/phi_node_update_multiple_preds.ll
by being more careful about updating PHI nodes

llvm-svn: 22739
2005-08-10 00:35:32 +00:00
Chris Lattner c6c4d99a21 Fix some 80 column violations.
Once we compute the evolution for a GEP, tell SE about it.  This allows users
of the GEP to know it, if the users are not direct.  This allows us to compile
this testcase:

void fbSolidFillmmx(int w, unsigned char *d) {
    while (w >= 64) {
        *(unsigned long long *) (d +  0) = 0;
        *(unsigned long long *) (d +  8) = 0;
        *(unsigned long long *) (d + 16) = 0;
        *(unsigned long long *) (d + 24) = 0;
        *(unsigned long long *) (d + 32) = 0;
        *(unsigned long long *) (d + 40) = 0;
        *(unsigned long long *) (d + 48) = 0;
        *(unsigned long long *) (d + 56) = 0;
        w -= 64;
        d += 64;
    }
}

into:

.LBB_fbSolidFillmmx_2:  ; no_exit
        li r2, 0
        stw r2, 0(r4)
        stw r2, 4(r4)
        stw r2, 8(r4)
        stw r2, 12(r4)
        stw r2, 16(r4)
        stw r2, 20(r4)
        stw r2, 24(r4)
        stw r2, 28(r4)
        stw r2, 32(r4)
        stw r2, 36(r4)
        stw r2, 40(r4)
        stw r2, 44(r4)
        stw r2, 48(r4)
        stw r2, 52(r4)
        stw r2, 56(r4)
        stw r2, 60(r4)
        addi r4, r4, 64
        addi r3, r3, -64
        cmpwi cr0, r3, 63
        bgt .LBB_fbSolidFillmmx_2       ; no_exit

instead of:

.LBB_fbSolidFillmmx_2:  ; no_exit
        li r11, 0
        stw r11, 0(r4)
        stw r11, 4(r4)
        stwx r11, r10, r4
        add r12, r10, r4
        stw r11, 4(r12)
        stwx r11, r9, r4
        add r12, r9, r4
        stw r11, 4(r12)
        stwx r11, r8, r4
        add r12, r8, r4
        stw r11, 4(r12)
        stwx r11, r7, r4
        add r12, r7, r4
        stw r11, 4(r12)
        stwx r11, r6, r4
        add r12, r6, r4
        stw r11, 4(r12)
        stwx r11, r5, r4
        add r12, r5, r4
        stw r11, 4(r12)
        stwx r11, r2, r4
        add r12, r2, r4
        stw r11, 4(r12)
        addi r4, r4, 64
        addi r3, r3, -64
        cmpwi cr0, r3, 63
        bgt .LBB_fbSolidFillmmx_2       ; no_exit

llvm-svn: 22737
2005-08-09 23:39:36 +00:00
Chris Lattner 02742710f3 SCEVAddExpr::get() of an empty list is invalid.
llvm-svn: 22724
2005-08-09 01:13:47 +00:00
Chris Lattner a091ff1764 Implement: LoopStrengthReduce/share_ivs.ll
Two changes:
  * Only insert one PHI node for each stride.  Other values are live in
    values.  This cannot introduce higher register pressure than the
    previous approach, and can take advantage of reg+reg addressing modes.
  * Factor common base values out of uses before moving values from the
    base to the immediate fields.  This improves codegen by starting the
    stride-specific PHI node out at a common place for each IV use.

As an example, we used to generate this for a loop in swim:

.LBB_main_no_exit_2E_6_2E_i_no_exit_2E_7_2E_i_2:        ; no_exit.7.i
        lfd f0, 0(r8)
        stfd f0, 0(r3)
        lfd f0, 0(r6)
        stfd f0, 0(r7)
        lfd f0, 0(r2)
        stfd f0, 0(r5)
        addi r9, r9, 1
        addi r2, r2, 8
        addi r5, r5, 8
        addi r6, r6, 8
        addi r7, r7, 8
        addi r8, r8, 8
        addi r3, r3, 8
        cmpw cr0, r9, r4
        bgt .LBB_main_no_exit_2E_6_2E_i_no_exit_2E_7_2E_i_1

now we emit:

.LBB_main_no_exit_2E_6_2E_i_no_exit_2E_7_2E_i_2:        ; no_exit.7.i
        lfdx f0, r8, r2
        stfdx f0, r9, r2
        lfdx f0, r5, r2
        stfdx f0, r7, r2
        lfdx f0, r3, r2
        stfdx f0, r6, r2
        addi r10, r10, 1
        addi r2, r2, 8
        cmpw cr0, r10, r4
        bgt .LBB_main_no_exit_2E_6_2E_i_no_exit_2E_7_2E_i_1

As another more dramatic example, we used to emit this:

.LBB_main_L_90_no_exit_2E_0_2E_i16_no_exit_2E_1_2E_i19_2:       ; no_exit.1.i19
        lfd f0, 8(r21)
        lfd f4, 8(r3)
        lfd f5, 8(r27)
        lfd f6, 8(r22)
        lfd f7, 8(r5)
        lfd f8, 8(r6)
        lfd f9, 8(r30)
        lfd f10, 8(r11)
        lfd f11, 8(r12)
        fsub f10, f10, f11
        fadd f5, f4, f5
        fmul f5, f5, f1
        fadd f6, f6, f7
        fadd f6, f6, f8
        fadd f6, f6, f9
        fmadd f0, f5, f6, f0
        fnmsub f0, f10, f2, f0
        stfd f0, 8(r4)
        lfd f0, 8(r25)
        lfd f5, 8(r26)
        lfd f6, 8(r23)
        lfd f9, 8(r28)
        lfd f10, 8(r10)
        lfd f12, 8(r9)
        lfd f13, 8(r29)
        fsub f11, f13, f11
        fadd f4, f4, f5
        fmul f4, f4, f1
        fadd f5, f6, f9
        fadd f5, f5, f10
        fadd f5, f5, f12
        fnmsub f0, f4, f5, f0
        fnmsub f0, f11, f3, f0
        stfd f0, 8(r24)
        lfd f0, 8(r8)
        fsub f4, f7, f8
        fsub f5, f12, f10
        fnmsub f0, f5, f2, f0
        fnmsub f0, f4, f3, f0
        stfd f0, 8(r2)
        addi r20, r20, 1
        addi r2, r2, 8
        addi r8, r8, 8
        addi r10, r10, 8
        addi r12, r12, 8
        addi r6, r6, 8
        addi r29, r29, 8
        addi r28, r28, 8
        addi r26, r26, 8
        addi r25, r25, 8
        addi r24, r24, 8
        addi r5, r5, 8
        addi r23, r23, 8
        addi r22, r22, 8
        addi r3, r3, 8
        addi r9, r9, 8
        addi r11, r11, 8
        addi r30, r30, 8
        addi r27, r27, 8
        addi r21, r21, 8
        addi r4, r4, 8
        cmpw cr0, r20, r7
        bgt .LBB_main_L_90_no_exit_2E_0_2E_i16_no_exit_2E_1_2E_i19_1

we now emit:

.LBB_main_L_90_no_exit_2E_0_2E_i16_no_exit_2E_1_2E_i19_2:       ; no_exit.1.i19
        lfdx f0, r21, r20
        lfdx f4, r3, r20
        lfdx f5, r27, r20
        lfdx f6, r22, r20
        lfdx f7, r5, r20
        lfdx f8, r6, r20
        lfdx f9, r30, r20
        lfdx f10, r11, r20
        lfdx f11, r12, r20
        fsub f10, f10, f11
        fadd f5, f4, f5
        fmul f5, f5, f1
        fadd f6, f6, f7
        fadd f6, f6, f8
        fadd f6, f6, f9
        fmadd f0, f5, f6, f0
        fnmsub f0, f10, f2, f0
        stfdx f0, r4, r20
        lfdx f0, r25, r20
        lfdx f5, r26, r20
        lfdx f6, r23, r20
        lfdx f9, r28, r20
        lfdx f10, r10, r20
        lfdx f12, r9, r20
        lfdx f13, r29, r20
        fsub f11, f13, f11
        fadd f4, f4, f5
        fmul f4, f4, f1
        fadd f5, f6, f9
        fadd f5, f5, f10
        fadd f5, f5, f12
        fnmsub f0, f4, f5, f0
        fnmsub f0, f11, f3, f0
        stfdx f0, r24, r20
        lfdx f0, r8, r20
        fsub f4, f7, f8
        fsub f5, f12, f10
        fnmsub f0, f5, f2, f0
        fnmsub f0, f4, f3, f0
        stfdx f0, r2, r20
        addi r19, r19, 1
        addi r20, r20, 8
        cmpw cr0, r19, r7
        bgt .LBB_main_L_90_no_exit_2E_0_2E_i16_no_exit_2E_1_2E_i19_1

llvm-svn: 22722
2005-08-09 00:18:09 +00:00
Chris Lattner 37c24cc98c Suck the base value out of the UsersToProcess vector into the BasedUser
class to simplify the code.  Fuse two loops.

llvm-svn: 22721
2005-08-08 22:56:21 +00:00
Chris Lattner 37ed895bf1 Split MoveLoopVariantsToImediateField out from MoveImmediateValues. The
first is a correctness thing, and the later is an optzn thing.  This also
is needed to support a future change.

llvm-svn: 22720
2005-08-08 22:32:34 +00:00
Chris Lattner 9f269e40c9 Use the new 'moveBefore' method to simplify some code. Really, which is
easier to understand?  :)

llvm-svn: 22706
2005-08-08 19:11:57 +00:00
Chris Lattner 14203e85b2 Not all constants are legal immediates in load/store instructions.
llvm-svn: 22704
2005-08-08 06:25:50 +00:00
Chris Lattner c70bbc0c41 Implement LoopStrengthReduce/share_code_in_preheader.ll by having one
rewriter for all code inserted into the preheader, which is never flushed.

llvm-svn: 22702
2005-08-08 05:47:49 +00:00
Chris Lattner 9bfa6f8784 Implement a simple optimization for the termination condition of the loop.
The termination condition actually wants to use the post-incremented value
of the loop, not a new indvar with an unusual base.

On PPC, for example, this allows us to compile
LoopStrengthReduce/exit_compare_live_range.ll to:

_foo:
        li r2, 0
.LBB_foo_1:     ; no_exit
        li r5, 0
        stw r5, 0(r3)
        addi r2, r2, 1
        cmpw cr0, r2, r4
        bne .LBB_foo_1  ; no_exit
        blr

instead of:

_foo:
        li r2, 1                ;; IV starts at 1, not 0
.LBB_foo_1:     ; no_exit
        li r5, 0
        stw r5, 0(r3)
        addi r5, r2, 1
        cmpw cr0, r2, r4
        or r2, r5, r5           ;; Reg-reg copy, extra live range
        bne .LBB_foo_1  ; no_exit
        blr

This implements LoopStrengthReduce/exit_compare_live_range.ll

llvm-svn: 22699
2005-08-08 05:28:22 +00:00
Chris Lattner 2c14cf7b74 Add some simple folds that occur in bitfield cases. Fix a minor bug in
isHighOnes, where it would consider 0 to have high ones.

llvm-svn: 22693
2005-08-07 07:03:10 +00:00
Chris Lattner 134ebd0801 Fix typoCVS: ----------------------------------------------------------------------
llvm-svn: 22692
2005-08-07 07:00:52 +00:00
Chris Lattner f4dd8c445c * Use the new PHINode::hasConstantValue method to simplify some code
* Teach this code to move allocas out of the loop when tail call eliminating
  a call marked 'tail'.  This implements TailCallElim/move_alloca_for_tail_call.ll
* Do not perform this transformation if a call is marked 'tail' and if there
  are allocas that we cannot move out of the loop in #2.  Doing so would increase
  the stack usage of the function.  This implements fixes
  PR615 and TailCallElim/dont-tce-tail-marked-call.ll.

llvm-svn: 22690
2005-08-07 04:27:41 +00:00
Chris Lattner 11e7a5eda7 Make sure to clean CastedPointers after casts are potentially deleted.
This fixes LSR crashes on 301.apsi, 191.fma3d, and 189.lucas

llvm-svn: 22673
2005-08-05 01:30:11 +00:00
Chris Lattner 9f9c260b8c now that hasConstantValue defaults to only returning values that dominate
the PHI node, this ugly code can vanish.

llvm-svn: 22672
2005-08-05 01:04:30 +00:00
Chris Lattner 257efb2ad3 This code can handle non-dominating instructions
llvm-svn: 22667
2005-08-05 00:57:45 +00:00
Nate Begeman b392321cae Fix a fixme in CondPropagate.cpp by moving a PhiNode optimization into
BasicBlock's removePredecessor routine.  This requires shuffling around
the definition and implementation of hasContantValue from Utils.h,cpp into
Instructions.h,cpp

llvm-svn: 22664
2005-08-04 23:24:19 +00:00
Chris Lattner 45f8b6e7aa Modify how immediates are removed from base expressions to deal with the fact
that the symbolic evaluator is not always able to use subtraction to remove
expressions.  This makes the code faster, and fixes the last crash on 178.galgel.
Finally, add a statistic to see how many phi nodes are inserted.

On 178.galgel, we get the follow stats:

2562 loop-reduce  - Number of PHIs inserted
3927 loop-reduce  - Number of GEPs strength reduced

llvm-svn: 22662
2005-08-04 22:34:05 +00:00
Chris Lattner a6d7c355bc * Refactor some code into a new BasedUser::RewriteInstructionToUseNewBase
method.
* Fix a crash on 178.galgel, where we would insert expressions before PHI
  nodes instead of into the PHI node predecessor blocks.

llvm-svn: 22657
2005-08-04 20:03:32 +00:00
Chris Lattner 0f7c0fa2a7 Fix a case that caused this to crash on 178.galgel
llvm-svn: 22653
2005-08-04 19:26:19 +00:00
Chris Lattner acc42c4df1 Teach LSR about loop-variant expressions, such as loops like this:
for (i = 0; i < N; ++i)
    A[i][foo()] = 0;

here we still want to strength reduce the A[i] part, even though foo() is
l-v.

This also simplifies some of the 'CanReduce' logic.

This implements Transforms/LoopStrengthReduce/ops_after_indvar.ll

llvm-svn: 22652
2005-08-04 19:08:16 +00:00
Nate Begeman 456044b724 Remove some more dead code.
llvm-svn: 22650
2005-08-04 18:13:56 +00:00
Chris Lattner eaf24725b2 Refactor this code substantially with the following improvements:
1. We only analyze instructions once, guaranteed
  2. AnalyzeGetElementPtrUsers has been ripped apart and replaced with
     something much simpler.

The next step is to handle expressions that are not all indvar+loop-invariant
values (e.g. handling indvar+loopvariant).

llvm-svn: 22649
2005-08-04 17:40:30 +00:00
Chris Lattner 6f286b760f refactor some code
llvm-svn: 22643
2005-08-04 01:19:13 +00:00
Chris Lattner 6510749050 invert to if's to make the logic simpler
llvm-svn: 22641
2005-08-04 00:40:47 +00:00
Chris Lattner a0102fbc4f When processing outer loops and we find uses of an IV in inner loops, make
sure to handle the use, just don't recurse into it.

This permits us to generate this code for a simple nested loop case:

.LBB_foo_0:     ; entry
        stwu r1, -48(r1)
        stw r29, 44(r1)
        stw r30, 40(r1)
        mflr r11
        stw r11, 56(r1)
        lis r2, ha16(L_A$non_lazy_ptr)
        lwz r30, lo16(L_A$non_lazy_ptr)(r2)
        li r29, 1
.LBB_foo_1:     ; no_exit.0
        bl L_bar$stub
        li r2, 1
        or r3, r30, r30
.LBB_foo_2:     ; no_exit.1
        lfd f0, 8(r3)
        stfd f0, 0(r3)
        addi r4, r2, 1
        addi r3, r3, 8
        cmpwi cr0, r2, 100
        or r2, r4, r4
        bne .LBB_foo_2  ; no_exit.1
.LBB_foo_3:     ; loopexit.1
        addi r30, r30, 800
        addi r2, r29, 1
        cmpwi cr0, r29, 100
        or r29, r2, r2
        bne .LBB_foo_1  ; no_exit.0
.LBB_foo_4:     ; return
        lwz r11, 56(r1)
        mtlr r11
        lwz r30, 40(r1)
        lwz r29, 44(r1)
        lwz r1, 0(r1)
        blr

instead of this:

_foo:
.LBB_foo_0:     ; entry
        stwu r1, -48(r1)
        stw r28, 44(r1)                   ;; uses an extra register.
        stw r29, 40(r1)
        stw r30, 36(r1)
        mflr r11
        stw r11, 56(r1)
        li r30, 1
        li r29, 0
        or r28, r29, r29
.LBB_foo_1:     ; no_exit.0
        bl L_bar$stub
        mulli r2, r28, 800           ;; unstrength-reduced multiply
        lis r3, ha16(L_A$non_lazy_ptr)   ;; loop invariant address computation
        lwz r3, lo16(L_A$non_lazy_ptr)(r3)
        add r2, r2, r3
        mulli r4, r29, 800           ;; unstrength-reduced multiply
        addi r3, r3, 8
        add r3, r4, r3
        li r4, 1
.LBB_foo_2:     ; no_exit.1
        lfd f0, 0(r3)
        stfd f0, 0(r2)
        addi r5, r4, 1
        addi r2, r2, 8                 ;; multiple stride 8 IV's
        addi r3, r3, 8
        cmpwi cr0, r4, 100
        or r4, r5, r5
        bne .LBB_foo_2  ; no_exit.1
.LBB_foo_3:     ; loopexit.1
        addi r28, r28, 1               ;;; Many IV's with stride 1
        addi r29, r29, 1
        addi r2, r30, 1
        cmpwi cr0, r30, 100
        or r30, r2, r2
        bne .LBB_foo_1  ; no_exit.0
.LBB_foo_4:     ; return
        lwz r11, 56(r1)
        mtlr r11
        lwz r30, 36(r1)
        lwz r29, 40(r1)
        lwz r28, 44(r1)
        lwz r1, 0(r1)
        blr

llvm-svn: 22640
2005-08-04 00:14:11 +00:00
Chris Lattner fc62470466 Teach loop-reduce to see into nested loops, to pull out immediate values
pushed down by SCEV.

In a nested loop case, this allows us to emit this:

        lis r3, ha16(L_A$non_lazy_ptr)
        lwz r3, lo16(L_A$non_lazy_ptr)(r3)
        add r2, r2, r3
        li r3, 1
.LBB_foo_2:     ; no_exit.1
        lfd f0, 8(r2)        ;; Uses offset of 8 instead of 0
        stfd f0, 0(r2)
        addi r4, r3, 1
        addi r2, r2, 8
        cmpwi cr0, r3, 100
        or r3, r4, r4
        bne .LBB_foo_2  ; no_exit.1

instead of this:

        lis r3, ha16(L_A$non_lazy_ptr)
        lwz r3, lo16(L_A$non_lazy_ptr)(r3)
        add r2, r2, r3
        addi r3, r3, 8
        li r4, 1
.LBB_foo_2:     ; no_exit.1
        lfd f0, 0(r3)
        stfd f0, 0(r2)
        addi r5, r4, 1
        addi r2, r2, 8
        addi r3, r3, 8
        cmpwi cr0, r4, 100
        or r4, r5, r5
        bne .LBB_foo_2  ; no_exit.1

llvm-svn: 22639
2005-08-03 23:44:42 +00:00
Chris Lattner bb78c97e24 improve debug output
llvm-svn: 22638
2005-08-03 23:30:08 +00:00
Chris Lattner db23c74e5e Move from Stage 0 to Stage 1.
Only emit one PHI node for IV uses with identical bases and strides (after
moving foldable immediates to the load/store instruction).

This implements LoopStrengthReduce/dont_insert_redundant_ops.ll, allowing
us to generate this PPC code for test1:

        or r30, r3, r3
.LBB_test1_1:   ; Loop
        li r2, 0
        stw r2, 0(r30)
        stw r2, 4(r30)
        bl L_pred$stub
        addi r30, r30, 8
        cmplwi cr0, r3, 0
        bne .LBB_test1_1        ; Loop

instead of this code:

        or r30, r3, r3
        or r29, r3, r3
.LBB_test1_1:   ; Loop
        li r2, 0
        stw r2, 0(r29)
        stw r2, 4(r30)
        bl L_pred$stub
        addi r30, r30, 8        ;; Two iv's with step of 8
        addi r29, r29, 8
        cmplwi cr0, r3, 0
        bne .LBB_test1_1        ; Loop

llvm-svn: 22635
2005-08-03 22:51:21 +00:00
Chris Lattner 430d0022df Rename IVUse to IVUsersOfOneStride, use a struct instead of a pair to
unify some parallel vectors and get field names more descriptive than
"first" and "second".  This isn't lisp afterall :)

llvm-svn: 22633
2005-08-03 22:21:05 +00:00
Chris Lattner 84e9baa925 Fix a nasty dangling pointer issue. The ScalarEvolution pass would keep a
map from instruction* to SCEVHandles.  When we delete instructions, we have
to tell it about it.  We would run into nasty cases where new instructions
were reallocated at old instruction addresses and get the old map values.
Bad bad bad :(

llvm-svn: 22632
2005-08-03 21:36:09 +00:00
Chris Lattner 3de05cc930 The correct fix for PR612, which also fixes
Transforms/LowerInvoke/2005-08-03-InvokeWithPHIUse.ll

llvm-svn: 22628
2005-08-03 18:51:44 +00:00
Chris Lattner f8a81a9886 When inserting code, make sure not to insert it before PHI nodes. This
fixes PR612 and Transforms/LowerInvoke/2005-08-03-InvokeWithPHI.ll

llvm-svn: 22626
2005-08-03 18:34:29 +00:00
Chris Lattner 22d00a8e90 Update to use the new MathExtras.h support for log2 computation.
Patch contributed by Jim Laskey!

llvm-svn: 22592
2005-08-02 19:16:58 +00:00
Chris Lattner 351b891cbc Like the comment says, do not insert cast instructions before phi nodes
llvm-svn: 22586
2005-08-02 03:31:14 +00:00
Chris Lattner 75a44e154e add a comment, make a check more lenient
llvm-svn: 22581
2005-08-02 02:52:02 +00:00
Chris Lattner dcce49e006 Simplify for loop, clear a per-loop map after processing each loop
llvm-svn: 22580
2005-08-02 02:44:31 +00:00
Chris Lattner 9ef1294210 Add a comment
Make LSR ignore GEP's that have loop variant base values, as we currently
cannot codegen them

llvm-svn: 22576
2005-08-02 01:32:29 +00:00
Chris Lattner 564900e5e5 Fix an iterator invalidation problem
llvm-svn: 22575
2005-08-02 00:41:11 +00:00
Jeff Cohen 546fd5944e Keep tabs and trailing spaces out.
llvm-svn: 22565
2005-07-30 18:33:25 +00:00
Jeff Cohen c500991055 Fix VC++ build problems.
llvm-svn: 22564
2005-07-30 18:22:27 +00:00
Nate Begeman 17a0e2afea Ack, typo
llvm-svn: 22560
2005-07-30 00:21:31 +00:00
Nate Begeman e68bcd1946 Commit a new LoopStrengthReduce pass that can use scalar evolutions and
target data to decide which loop induction variables to strength reduce
and how to do so.  This work is mostly by Chris Lattner, with tweaks by
me to get it working on some of MultiSource.

llvm-svn: 22558
2005-07-30 00:15:07 +00:00
Nate Begeman 2bca4d9b7b Break SCEVExpander out of IndVarSimplify into its own .h/.cpp file so that
other passes may use it.

llvm-svn: 22557
2005-07-30 00:12:19 +00:00
Jeff Cohen 5f4ef3c5a8 Eliminate all remaining tabs and trailing spaces.
llvm-svn: 22523
2005-07-27 06:12:32 +00:00
Chris Lattner 18aa4d8196 Do not let MaskedValueIsZero consider undef to be zero, for reasons
explained in the comment.

This fixes UnitTests/2003-09-18-BitFieldTest on darwin

llvm-svn: 22483
2005-07-20 18:49:28 +00:00
Chris Lattner 247aef884c When transforming &A[i] < &A[j] -> i < j, make sure to perform the comparison
as a signed compare.  This patch may fix PR597, but is correct in any case.

llvm-svn: 22465
2005-07-18 23:07:33 +00:00
Chris Lattner 4ed40f7c6f Fix a problem that instcombine would hit when dealing with unreachable code.
Because the instcombine has to scan the entire function when it starts up
to begin with, we might as well do it in DFO so we can nuke unreachable code.

This fixes: Transforms/InstCombine/2005-07-07-DeadPHILoop.ll

llvm-svn: 22348
2005-07-07 20:40:38 +00:00
Andrew Lenharth cf52eb2b99 prevent va_arg from being hoisted from a loop
llvm-svn: 22265
2005-06-20 13:36:33 +00:00
Andrew Lenharth 9144ec4764 core changes for varargs
llvm-svn: 22254
2005-06-18 18:34:52 +00:00
Reid Spencer 4fdd96c4e0 Clean up some uninitialized variables and missing return statements that
GCC 4.0.0 compiler (sometimes incorrectly) warns about under release build.

llvm-svn: 22249
2005-06-18 17:37:34 +00:00
Chris Lattner 2ceb6ee576 This is not true: (X != 13 | X < 15) -> X < 15
It is actually always true.  This fixes PR586 and
Transforms/InstCombine/2005-06-16-SetCCOrSetCCMiscompile.ll

llvm-svn: 22236
2005-06-17 03:59:17 +00:00
Chris Lattner 73bcba5f61 Don't crash when dealing with INTMIN. This fixes PR585 and
Transforms/InstCombine/2005-06-16-RangeCrash.ll

llvm-svn: 22234
2005-06-17 02:05:55 +00:00
Chris Lattner c53cb9d3ff avoid constructing out of range shift amounts.
llvm-svn: 22230
2005-06-17 01:29:28 +00:00
Chris Lattner 89dc4f16f5 Fix PR583 and testcase Transforms/InstCombine/2005-06-15-DivSelectCrash.ll
llvm-svn: 22227
2005-06-16 04:55:52 +00:00
Chris Lattner 252a845e30 Fix PR571, removing code that does just the WRONG thing :)
llvm-svn: 22225
2005-06-16 03:00:08 +00:00
Chris Lattner 104002bee3 Fix a bug in my previous patch. Do not get the shift amount type (which
is always ubyte, get the type being shifted).  This unbreaks espresso

llvm-svn: 22224
2005-06-16 01:52:07 +00:00
Chris Lattner df81539278 Fix PR582. The rewriter can move casts around, which invalidated the
BB iterator.  This fixes Transforms/IndVarsSimplify/2005-06-15-InstMoveCrash.ll

llvm-svn: 22221
2005-06-15 21:29:31 +00:00
Chris Lattner 19b57f55aa Fix PR577 and testcase InstCombine/2005-06-15-ShiftSetCCCrash.ll.
Do not perform undefined out of range shifts.

llvm-svn: 22217
2005-06-15 20:53:31 +00:00
Reid Spencer a299d6f701 Put the hack back in that removes features, causes regressions to fail, but
allows test programs to succeed. Actual fix for this is forthcoming.

llvm-svn: 22213
2005-06-15 18:25:30 +00:00
Reid Spencer 6d231e55fa Unbreak several InstCombine regression checks introduced by a hack to
fix the bzip2 test. A better hack is needed.

llvm-svn: 22209
2005-06-13 06:41:26 +00:00
Chris Lattner 1609a541cd Fix a 64-bit problem, passing (int)0 through ... instead of (void*)0
llvm-svn: 22206
2005-06-09 03:32:54 +00:00
Andrew Lenharth ffe65458e7 hack to fix bzip2 (bug 571)
llvm-svn: 22192
2005-06-04 12:43:56 +00:00
Chris Lattner 05c703ea85 preserve calling conventions when hacking on code
llvm-svn: 22024
2005-05-14 12:25:32 +00:00
Chris Lattner 61d9d81770 calling a function with the wrong CC is undefined, turn it into an unreachable
instruction.  This is useful for catching optimizers that don't preserve
calling conventions

llvm-svn: 21928
2005-05-13 07:09:09 +00:00
Chris Lattner ca968393ab When lowering invokes to calls, amke sure to preserve the calling conv. This
fixes Ptrdist/anagram with x86 llcbeta

llvm-svn: 21925
2005-05-13 06:27:02 +00:00
Chris Lattner ae186e012c Prefer int 0 instead of long 0 for GEP arguments.
llvm-svn: 21924
2005-05-13 06:10:12 +00:00
Chris Lattner 31c667e234 Fix Reassociate/shifttest.ll
llvm-svn: 21839
2005-05-10 03:39:25 +00:00
Chris Lattner bfc796f622 If a function contains no allocas, all of the calls in it are trivially
suitable for tail calls.

llvm-svn: 21836
2005-05-09 23:51:13 +00:00
Chris Lattner b62f5082c5 implement and.ll:test33
llvm-svn: 21809
2005-05-09 04:58:36 +00:00
Chris Lattner df3332660f Implement Reassociate/mul-neg-add.ll
llvm-svn: 21788
2005-05-08 21:41:35 +00:00
Chris Lattner c4f8e2b0ed Bail out earlier
llvm-svn: 21786
2005-05-08 21:33:47 +00:00
Chris Lattner 877b114037 Teach reassociate that 0-X === X*-1
llvm-svn: 21785
2005-05-08 21:28:52 +00:00
Chris Lattner 9f284e0a3c Fix PR557 and basictest[34].ll.
This makes reassociate realize that loads should be treated as unmovable, and
gives distinct ranks to distinct values defined in the same basic block, allowing
reassociate to do its thing.

llvm-svn: 21783
2005-05-08 20:57:04 +00:00
Chris Lattner 9187f3905e Add debugging information
llvm-svn: 21781
2005-05-08 20:09:57 +00:00
Chris Lattner 08582be283 eliminate gotos
llvm-svn: 21780
2005-05-08 19:48:43 +00:00
Chris Lattner 5847e5e10c Improve reassociation handling of inverses, implementing inverses.ll.
llvm-svn: 21778
2005-05-08 18:59:37 +00:00
Chris Lattner 4922118dc4 clean up and modernize this pass.
llvm-svn: 21776
2005-05-08 18:45:26 +00:00
Chris Lattner b18dbbfff5 Strength reduce SAR into SHR if there is no way sign bits could be shifted
in.  This tends to get cases like this:

  X = cast ubyte to int
  Y = shr int X, ...

Tested by: shift.ll:test24

llvm-svn: 21775
2005-05-08 17:34:56 +00:00
Chris Lattner e1850b86b6 Refactor some code
llvm-svn: 21772
2005-05-08 00:19:31 +00:00
Chris Lattner 6e2086d7e4 Handle some simple cases where we can see that values get annihilated.
llvm-svn: 21771
2005-05-08 00:08:33 +00:00
Chris Lattner 4294cec0f1 Fix a miscompilation of crafty by clobbering the "A" variable.
llvm-svn: 21770
2005-05-07 23:49:08 +00:00
Chris Lattner 1e5065052a Rewrite the guts of the reassociate pass to be more efficient and logical. Instead
of trying to do local reassociation tweaks at each level, only process an expression
tree once (at its root).  This does not improve the reassociation pass in any real way.

llvm-svn: 21768
2005-05-07 21:59:39 +00:00
Chris Lattner cea579932d Convert shifts to muls to assist reassociation. This implements
Reassociate/shifttest.ll

llvm-svn: 21761
2005-05-07 04:24:13 +00:00
Chris Lattner f43e974abd Simplify the code and rearrange it. No major functionality changes here.
llvm-svn: 21759
2005-05-07 04:08:02 +00:00
Chris Lattner 6aacb0f9da Preserve tail marker
llvm-svn: 21737
2005-05-06 06:48:21 +00:00
Chris Lattner ef298a3b8a Teach instcombine propagate zeroness through shl instructions, implementing
and.ll:test31

llvm-svn: 21717
2005-05-06 04:53:20 +00:00
Chris Lattner 873804168e Implement shift.ll:test23. If we are shifting right then immediately truncating
the result, turn signed shift rights into unsigned shift rights if possible.

This leads to later simplification and happens *often* in 176.gcc.  For example,
this testcase:

struct xxx { unsigned int code : 8; };
enum codes { A, B, C, D, E, F };
int foo(struct xxx *P) {
  if ((enum codes)P->code == A)
     bar();
}

used to be compiled to:

int %foo(%struct.xxx* %P) {
        %tmp.1 = getelementptr %struct.xxx* %P, int 0, uint 0           ; <uint*> [#uses=1]
        %tmp.2 = load uint* %tmp.1              ; <uint> [#uses=1]
        %tmp.3 = cast uint %tmp.2 to int                ; <int> [#uses=1]
        %tmp.4 = shl int %tmp.3, ubyte 24               ; <int> [#uses=1]
        %tmp.5 = shr int %tmp.4, ubyte 24               ; <int> [#uses=1]
        %tmp.6 = cast int %tmp.5 to sbyte               ; <sbyte> [#uses=1]
        %tmp.8 = seteq sbyte %tmp.6, 0          ; <bool> [#uses=1]
        br bool %tmp.8, label %then, label %UnifiedReturnBlock

Now it is compiled to:

        %tmp.1 = getelementptr %struct.xxx* %P, int 0, uint 0           ; <uint*> [#uses=1]
        %tmp.2 = load uint* %tmp.1              ; <uint> [#uses=1]
        %tmp.2 = cast uint %tmp.2 to sbyte              ; <sbyte> [#uses=1]
        %tmp.8 = seteq sbyte %tmp.2, 0          ; <bool> [#uses=1]
        br bool %tmp.8, label %then, label %UnifiedReturnBlock

which is the difference between this:

foo:
        subl $4, %esp
        movl 8(%esp), %eax
        movl (%eax), %eax
        shll $24, %eax
        sarl $24, %eax
        testb %al, %al
        jne .LBBfoo_2

and this:

foo:
        subl $4, %esp
        movl 8(%esp), %eax
        movl (%eax), %eax
        testb %al, %al
        jne .LBBfoo_2

This occurs 3243 times total in the External tests, 215x in povray,
6x in each f2c'd program, 1451x in 176.gcc, 7x in crafty, 20x in perl,
25x in gap, 3x in m88ksim, 25x in ijpeg.

Maybe this will cause a little jump on gcc tommorow :)

llvm-svn: 21715
2005-05-06 04:18:52 +00:00
Chris Lattner 7208616ec0 Implement xor.ll:test22
llvm-svn: 21713
2005-05-06 02:07:39 +00:00