Commit Graph

10777 Commits

Author SHA1 Message Date
Nate Begeman e5394d453d Fix last night's X86 regressions by putting code for SSE in the if(SSE)
block.  nur.

llvm-svn: 22788
2005-08-14 18:37:02 +00:00
Andrew Lenharth ed07233868 only build .a on alpha
llvm-svn: 22787
2005-08-14 15:14:34 +00:00
Nate Begeman 4d959f6627 Fix FP_TO_UINT with Scalar SSE2 now that the legalizer can handle it. We
now generate the relatively good code sequences:
unsigned short foo(float a) { return a; }
_foo:
        movss 4(%esp), %xmm0
        cvttss2si %xmm0, %eax
        movzwl %ax, %eax
        ret

and
unsigned bar(float a) { return a; }
_bar:
        movss .CPI_bar_0, %xmm0
        movss 4(%esp), %xmm1
        movapd %xmm1, %xmm2
        subss %xmm0, %xmm2
        cvttss2si %xmm2, %eax
        xorl $-2147483648, %eax
        cvttss2si %xmm1, %ecx
        ucomiss %xmm0, %xmm1
        cmovb %ecx, %eax
        ret

llvm-svn: 22786
2005-08-14 04:36:51 +00:00
Nate Begeman 36853ee1fd Teach the legalizer how to legalize FP_TO_UINT.
Teach the legalizer to promote FP_TO_UINT to FP_TO_SINT if the wider
  FP_TO_UINT is also illegal.  This allows us on PPC to codegen
  unsigned short foo(float a) { return a; }

as:
_foo:
.LBB_foo_0:     ; entry
        fctiwz f0, f1
        stfd f0, -8(r1)
        lwz r2, -4(r1)
        rlwinm r3, r2, 0, 16, 31
        blr

instead of:
_foo:
.LBB_foo_0:     ; entry
        fctiwz f0, f1
        stfd f0, -8(r1)
        lwz r2, -4(r1)
        lis r3, ha16(.CPI_foo_0)
        lfs f0, lo16(.CPI_foo_0)(r3)
        fcmpu cr0, f1, f0
        blt .LBB_foo_2  ; entry
.LBB_foo_1:     ; entry
        fsubs f0, f1, f0
        fctiwz f0, f0
        stfd f0, -16(r1)
        lwz r2, -12(r1)
        xoris r2, r2, 32768
.LBB_foo_2:     ; entry
        rlwinm r3, r2, 0, 16, 31
        blr

llvm-svn: 22785
2005-08-14 01:20:53 +00:00
Nate Begeman 83f6b98c42 Make FP_TO_UINT Illegal. This allows us to generate significantly better
codegen for FP_TO_UINT by using the legalizer's SELECT variant.

Implement a codegen improvement for SELECT_CC, selecting the false node in
the MBB that feeds the phi node.  This allows us to codegen:
void foo(int *a, int b, int c) { int d = (a < b) ? 5 : 9; *a = d; }
as:
_foo:
        li r2, 5
        cmpw cr0, r4, r3
        bgt .LBB_foo_2  ; entry
.LBB_foo_1:     ; entry
        li r2, 9
.LBB_foo_2:     ; entry
        stw r2, 0(r3)
        blr

insted of:
_foo:
        li r2, 5
        li r5, 9
        cmpw cr0, r4, r3
        bgt .LBB_foo_2  ; entry
.LBB_foo_1:     ; entry
        or r2, r5, r5
.LBB_foo_2:     ; entry
        stw r2, 0(r3)
        blr

llvm-svn: 22784
2005-08-14 01:17:16 +00:00
Andrew Lenharth 107a0a7690 Testing a variable before it is defined doesn't work so well. It is a fairly small thing, so just let everyone build the .a file
llvm-svn: 22783
2005-08-13 14:58:23 +00:00
Chris Lattner 47d3ec3525 Ooops, don't forget to clear this. The real inner loop is now:
.LBB_foo_3:     ; no_exit.1
        lfd f2, 0(r9)
        lfd f3, 8(r9)
        fmul f4, f1, f2
        fmadd f4, f0, f3, f4
        stfd f4, 8(r9)
        fmul f3, f1, f3
        fmsub f2, f0, f2, f3
        stfd f2, 0(r9)
        addi r9, r9, 16
        addi r8, r8, 1
        cmpw cr0, r8, r4
        ble .LBB_foo_3  ; no_exit.1

llvm-svn: 22782
2005-08-13 07:42:01 +00:00
Chris Lattner 5949d49032 Recursively scan scev expressions for common subexpressions. This allows us
to handle nested loops much better, for example, by being able to tell that
these two expressions:

{( 8 + ( 16 * ( 1 +  %Tmp11 +  %Tmp12)) +  %c_),+,( 16 *  %Tmp 12)}<loopentry.1>

{(( 16 * ( 1 +  %Tmp11 +  %Tmp12)) +  %c_),+,( 16 *  %Tmp12)}<loopentry.1>

Have the following common part that can be shared:
{(( 16 * ( 1 +  %Tmp11 +  %Tmp12)) +  %c_),+,( 16 *  %Tmp12)}<loopentry.1>

This allows us to codegen an important inner loop in 168.wupwise as:

.LBB_foo_4:     ; no_exit.1
        lfd f2, 16(r9)
        fmul f3, f0, f2
        fmul f2, f1, f2
        fadd f4, f3, f2
        stfd f4, 8(r9)
        fsub f2, f3, f2
        stfd f2, 16(r9)
        addi r8, r8, 1
        addi r9, r9, 16
        cmpw cr0, r8, r4
        ble .LBB_foo_4  ; no_exit.1

instead of:

.LBB_foo_3:     ; no_exit.1
        lfdx f2, r6, r9
        add r10, r6, r9
        lfd f3, 8(r10)
        fmul f4, f1, f2
        fmadd f4, f0, f3, f4
        stfd f4, 8(r10)
        fmul f3, f1, f3
        fmsub f2, f0, f2, f3
        stfdx f2, r6, r9
        addi r9, r9, 16
        addi r8, r8, 1
        cmpw cr0, r8, r4
        ble .LBB_foo_3  ; no_exit.1

llvm-svn: 22781
2005-08-13 07:27:18 +00:00
Nate Begeman dc3154ec66 Remove an unncessary argument to SimplifySelectCC and add an additional
assert when creating a select_cc node.

llvm-svn: 22780
2005-08-13 06:14:17 +00:00
Nate Begeman b6651e81a0 Fix the fabs regression on x86 by abstracting the select_cc optimization
out into SimplifySelectCC.  This allows both ISD::SELECT and ISD::SELECT_CC
to use the same set of simplifying folds.

llvm-svn: 22779
2005-08-13 06:00:21 +00:00
Nate Begeman a22bf778c9 Remove support for 64b PPC, it's been broken for a long time. It'll be
back once a DAG->DAG ISel exists.

llvm-svn: 22778
2005-08-13 05:59:16 +00:00
Andrew Lenharth 6b62b479fa Fix oversized GOT problem with gcc-4 on alpha
llvm-svn: 22777
2005-08-13 05:09:50 +00:00
Chris Lattner 89c1dfc733 Teach SplitCriticalEdge to update LoopInfo if it is alive. This fixes
a problem in LoopStrengthReduction, where it would split critical edges
then confused itself with outdated loop information.

llvm-svn: 22776
2005-08-13 01:38:43 +00:00
Chris Lattner 79396539d3 remove dead code. The exit block list is computed on demand, thus does not
need to be updated.  This code is a relic from when it did.

llvm-svn: 22775
2005-08-13 01:30:36 +00:00
Chris Lattner 21381e8424 implement a couple of simple shift foldings.
e.g.  (X & 7) >> 3   -> 0

llvm-svn: 22774
2005-08-12 23:54:58 +00:00
Jim Laskey 35960708b7 Fix for 2005-08-12-rlwimi-crash.ll. Make allowance for masks being shifted to
zero.

llvm-svn: 22773
2005-08-12 23:52:46 +00:00
Jim Laskey a568700618 1. This changes handles the cases of (~x)&y and x&(~y) yielding ANDC, and
(~x)|y and x|(~y) yielding ORC.

llvm-svn: 22771
2005-08-12 23:38:02 +00:00
Chris Lattner 8447b49526 When splitting critical edges, make sure not to leave the new block in the
middle of the loop.  This turns a critical loop in gzip into this:

.LBB_test_1:    ; loopentry
        or r27, r28, r28
        add r28, r3, r27
        lhz r28, 3(r28)
        add r26, r4, r27
        lhz r26, 3(r26)
        cmpw cr0, r28, r26
        bne .LBB_test_8 ; loopentry.loopexit_crit_edge
.LBB_test_2:    ; shortcirc_next.0
        add r28, r3, r27
        lhz r28, 5(r28)
        add r26, r4, r27
        lhz r26, 5(r26)
        cmpw cr0, r28, r26
        bne .LBB_test_7 ; shortcirc_next.0.loopexit_crit_edge
.LBB_test_3:    ; shortcirc_next.1
        add r28, r3, r27
        lhz r28, 7(r28)
        add r26, r4, r27
        lhz r26, 7(r26)
        cmpw cr0, r28, r26
        bne .LBB_test_6 ; shortcirc_next.1.loopexit_crit_edge
.LBB_test_4:    ; shortcirc_next.2
        add r28, r3, r27
        lhz r26, 9(r28)
        add r28, r4, r27
        lhz r25, 9(r28)
        addi r28, r27, 8
        cmpw cr7, r26, r25
        mfcr r26, 1
        rlwinm r26, r26, 31, 31, 31
        add r25, r8, r27
        cmpw cr7, r25, r7
        mfcr r25, 1
        rlwinm r25, r25, 29, 31, 31
        and. r26, r26, r25
        bne .LBB_test_1 ; loopentry

instead of this:

.LBB_test_1:    ; loopentry
        or r27, r28, r28
        add r28, r3, r27
        lhz r28, 3(r28)
        add r26, r4, r27
        lhz r26, 3(r26)
        cmpw cr0, r28, r26
        beq .LBB_test_3 ; shortcirc_next.0
.LBB_test_2:    ; loopentry.loopexit_crit_edge
        add r2, r30, r27
        add r8, r29, r27
        b .LBB_test_9   ; loopexit
.LBB_test_3:    ; shortcirc_next.0
        add r28, r3, r27
        lhz r28, 5(r28)
        add r26, r4, r27
        lhz r26, 5(r26)
        cmpw cr0, r28, r26
        beq .LBB_test_5 ; shortcirc_next.1
.LBB_test_4:    ; shortcirc_next.0.loopexit_crit_edge
        add r2, r11, r27
        add r8, r12, r27
        b .LBB_test_9   ; loopexit
.LBB_test_5:    ; shortcirc_next.1
        add r28, r3, r27
        lhz r28, 7(r28)
        add r26, r4, r27
        lhz r26, 7(r26)
        cmpw cr0, r28, r26
        beq .LBB_test_7 ; shortcirc_next.2
.LBB_test_6:    ; shortcirc_next.1.loopexit_crit_edge
        add r2, r9, r27
        add r8, r10, r27
        b .LBB_test_9   ; loopexit
.LBB_test_7:    ; shortcirc_next.2
        add r28, r3, r27
        lhz r26, 9(r28)
        add r28, r4, r27
        lhz r25, 9(r28)
        addi r28, r27, 8
        cmpw cr7, r26, r25
        mfcr r26, 1
        rlwinm r26, r26, 31, 31, 31
        add r25, r8, r27
        cmpw cr7, r25, r7
        mfcr r25, 1
        rlwinm r25, r25, 29, 31, 31
        and. r26, r26, r25
        bne .LBB_test_1 ; loopentry

Next up, improve the code for the loop.

llvm-svn: 22769
2005-08-12 22:22:17 +00:00
Chris Lattner e09bbc800c Add a helper method
llvm-svn: 22768
2005-08-12 22:14:06 +00:00
Chris Lattner 4fec86d348 Fix a FIXME: if we are inserting code for a PHI argument, split the critical
edge so that the code is not always executed for both operands.  This
prevents LSR from inserting code into loops whose exit blocks contain
PHI uses of IV expressions (which are outside of loops).  On gzip, for
example, we turn this ugly code:

.LBB_test_1:    ; loopentry
        add r27, r3, r28
        lhz r27, 3(r27)
        add r26, r4, r28
        lhz r26, 3(r26)
        add r25, r30, r28    ;; Only live if exiting the loop
        add r24, r29, r28    ;; Only live if exiting the loop
        cmpw cr0, r27, r26
        bne .LBB_test_5 ; loopexit

into this:

.LBB_test_1:    ; loopentry
        or r27, r28, r28
        add r28, r3, r27
        lhz r28, 3(r28)
        add r26, r4, r27
        lhz r26, 3(r26)
        cmpw cr0, r28, r26
        beq .LBB_test_3 ; shortcirc_next.0
.LBB_test_2:    ; loopentry.loopexit_crit_edge
        add r2, r30, r27
        add r8, r29, r27
        b .LBB_test_9   ; loopexit
.LBB_test_2:    ; shortcirc_next.0
        ...
        blt .LBB_test_1


into this:

.LBB_test_1:    ; loopentry
        or r27, r28, r28
        add r28, r3, r27
        lhz r28, 3(r28)
        add r26, r4, r27
        lhz r26, 3(r26)
        cmpw cr0, r28, r26
        beq .LBB_test_3 ; shortcirc_next.0
.LBB_test_2:    ; loopentry.loopexit_crit_edge
        add r2, r30, r27
        add r8, r29, r27
        b .LBB_t_3:    ; shortcirc_next.0
.LBB_test_3:    ; shortcirc_next.0
        ...
        blt .LBB_test_1


Next step: get the block out of the loop so that the loop is all
fall-throughs again.

llvm-svn: 22766
2005-08-12 22:06:11 +00:00
Chris Lattner b7ebe65c56 Change break critical edges to not remove, then insert, PHI node entries.
Instead, just update the BB in-place.  This is both faster, and it prevents
split-critical-edges from shuffling the PHI argument list unneccesarily.

llvm-svn: 22765
2005-08-12 21:58:07 +00:00
Andrew Lenharth 8c6701be6e match gcc's use of tabs, makes diffs easier
llvm-svn: 22764
2005-08-12 16:14:08 +00:00
Andrew Lenharth ca94102d3e .section cleanup, patch from Nicholas Riley
llvm-svn: 22763
2005-08-12 16:13:43 +00:00
Jim Laskey a50f770a2c 1. Added the function isOpcWithIntImmediate to simplify testing of operand with
specified opcode and an integer constant right operand.

2. Modified ISD::SHL, ISD::SRL, ISD::SRA to use rlwinm when applied after a mask.

llvm-svn: 22761
2005-08-11 21:59:23 +00:00
Chris Lattner d418d752f4 Tidied up the use of dyn_cast<ConstantSDNode> by using isIntImmediate more.
Patch by Jim Laskey.

llvm-svn: 22760
2005-08-11 17:56:50 +00:00
Chris Lattner c5e1312baa Use a more efficient method of creating integer and float virtual registers
(avoids an extra level of indirection in MakeReg).

  defined MakeIntReg using RegMap->createVirtualRegister(PPC32::GPRCRegisterClass)
  defined MakeFPReg using RegMap->createVirtualRegister(PPC32::FPRCRegisterClass)

  s/MakeReg(MVT::i32)/MakeIntReg/
  s/MakeReg(MVT::f64)/MakeFPReg/

Patch by Jim Laskey!

llvm-svn: 22759
2005-08-11 17:15:31 +00:00
Nate Begeman 5c7656fd53 Add a select_cc optimization for recognizing abs(int). This speeds up an
integer MPEG encoding loop by a factor of two.

llvm-svn: 22758
2005-08-11 02:18:13 +00:00
Nate Begeman 180b08897f Some SELECT_CC cleanups:
1. move assertions for node creation to getNode()
2. legalize the values returned in ExpandOp immediately
3. Move select_cc optimizations from SELECT's getNode() to SELECT_CC's,
   allowing them to be cleaned up significantly.

This paves the way to pick up additional optimizations on SELECT_CC, such
as sum-of-absolute-differences.

llvm-svn: 22757
2005-08-11 01:12:20 +00:00
Nate Begeman 5646b181e8 Make SELECT illegal on PPC32, switch to using SELECT_CC, which more closely
reflects what the hardware is capable of.  This significantly simplifies
the CC handling logic throughout the ISel.

llvm-svn: 22756
2005-08-10 20:52:09 +00:00
Nate Begeman e5b86d7442 Add new node, SELECT_CC. This node is for targets that don't natively
implement SELECT.

llvm-svn: 22755
2005-08-10 20:51:12 +00:00
Chris Lattner 3428b95634 Changes for PPC32ISelPattern.cpp
1. Clean up how SelectIntImmediateExpr handles use counts.
2. "Subtract from" was not clearing hi 16 bits.

Patch by Jim Laskey

llvm-svn: 22754
2005-08-10 18:11:33 +00:00
Chris Lattner 21c0fd9e8f Fix an oversight that may be causing PR617.
llvm-svn: 22753
2005-08-10 17:37:53 +00:00
Chris Lattner 62df798919 remove some trickiness that broke yacr2 and some other programs last night
llvm-svn: 22751
2005-08-10 17:15:20 +00:00
Chris Lattner aeedcc7fc2 Changed the XOR case to use the isOprNot predicate.
Patch by Jim Laskey!

llvm-svn: 22750
2005-08-10 16:35:46 +00:00
Chris Lattner 67d0753773 1. Refactored handling of integer immediate values for add, or, xor and sub.
New routine: ISel::SelectIntImmediateExpr
  2. Now checking use counts of large constants.  If use count is > 2 then drop
  thru so that the constant gets loaded into a register.
  Source:

int %test1(int %a) {
entry:
       %tmp.1 = add int %a,      123456789      ; <int> [#uses=1]
       %tmp.2 = or  int %tmp.1,  123456789      ; <int> [#uses=1]
       %tmp.3 = xor int %tmp.2,  123456789      ; <int> [#uses=1]
       %tmp.4 = sub int %tmp.3, -123456789      ; <int> [#uses=1]
       ret int %tmp.4
}

Did Emit:

       .machine ppc970


       .text
       .align  2
       .globl  _test1
_test1:
.LBB_test1_0:   ; entry
       addi r2, r3, -13035
       addis r2, r2, 1884
       ori r2, r2, 52501
       oris r2, r2, 1883
       xori r2, r2, 52501
       xoris r2, r2, 1883
       addi r2, r2, 52501
       addis r3, r2, 1883
       blr


Now Emits:

       .machine ppc970


       .text
       .align  2
       .globl  _test1
_test1:
.LBB_test1_0:   ; entry
       lis r2, 1883
       ori r2, r2, 52501
       add r3, r3, r2
       or r3, r3, r2
       xor r3, r3, r2
       add r3, r3, r2
       blr

Patch by Jim Laskey!

llvm-svn: 22749
2005-08-10 16:34:52 +00:00
Duraid Madina 1c2f9fdf71 sorry!! this is temporary; for some reason the nasty constmul code seems to
be an infinite loop when using g++-4.0.1*, this kills the ia64 nightly
tester. A proper fix shall be forthcoming!!! thanks for not killing me. :)

llvm-svn: 22748
2005-08-10 12:38:57 +00:00
Chris Lattner 5f56d71cd7 Fix a bug compiling: select (i32 < i32), f32, f32
llvm-svn: 22747
2005-08-10 03:40:09 +00:00
Chris Lattner f83ce5faee Make loop-simplify produce better loops by turning PHI nodes like X = phi [X, Y]
into just Y.  This often occurs when it seperates loops that have collapsed loop
headers.  This implements LoopSimplify/phi-node-simplify.ll

llvm-svn: 22746
2005-08-10 02:07:32 +00:00
Chris Lattner 677d85784a Allow indvar simplify to canonicalize ANY affine IV, not just affine IVs with
constant stride.  This implements Transforms/IndVarsSimplify/variable-stride-ivs.ll

llvm-svn: 22744
2005-08-10 01:12:06 +00:00
Chris Lattner 35c0e2ee33 Fix an obvious oops
llvm-svn: 22742
2005-08-10 00:59:40 +00:00
Chris Lattner edff91a49a Teach LSR to strength reduce IVs that have a loop-invariant but non-constant stride.
For code like this:

void foo(float *a, float *b, int n, int stride_a, int stride_b) {
  int i;
  for (i=0; i<n; i++)
      a[i*stride_a] = b[i*stride_b];
}

we now emit:

.LBB_foo2_2:    ; no_exit
        lfs f0, 0(r4)
        stfs f0, 0(r3)
        addi r7, r7, 1
        add r4, r2, r4
        add r3, r6, r3
        cmpw cr0, r7, r5
        blt .LBB_foo2_2 ; no_exit

instead of:

.LBB_foo_2:     ; no_exit
        mullw r8, r2, r7     ;; multiply!
        slwi r8, r8, 2
        lfsx f0, r4, r8
        mullw r8, r2, r6     ;; multiply!
        slwi r8, r8, 2
        stfsx f0, r3, r8
        addi r2, r2, 1
        cmpw cr0, r2, r5
        blt .LBB_foo_2  ; no_exit

loops with variable strides occur pretty often.  For example, in SPECFP2K
there are 317 variable strides in 177.mesa, 3 in 179.art, 14 in 188.ammp,
56 in 168.wupwise, 36 in 172.mgrid.

Now we can allow indvars to turn functions written like this:

void foo2(float *a, float *b, int n, int stride_a, int stride_b) {
  int i, ai = 0, bi = 0;
  for (i=0; i<n; i++)
    {
      a[ai] = b[bi];
      ai += stride_a;
      bi += stride_b;
    }
}

into code like the above for better analysis.  With this patch, they generate
identical code.

llvm-svn: 22740
2005-08-10 00:45:21 +00:00
Chris Lattner dde7dc525e Fix Regression/Transforms/LoopStrengthReduce/phi_node_update_multiple_preds.ll
by being more careful about updating PHI nodes

llvm-svn: 22739
2005-08-10 00:35:32 +00:00
Chris Lattner c6c4d99a21 Fix some 80 column violations.
Once we compute the evolution for a GEP, tell SE about it.  This allows users
of the GEP to know it, if the users are not direct.  This allows us to compile
this testcase:

void fbSolidFillmmx(int w, unsigned char *d) {
    while (w >= 64) {
        *(unsigned long long *) (d +  0) = 0;
        *(unsigned long long *) (d +  8) = 0;
        *(unsigned long long *) (d + 16) = 0;
        *(unsigned long long *) (d + 24) = 0;
        *(unsigned long long *) (d + 32) = 0;
        *(unsigned long long *) (d + 40) = 0;
        *(unsigned long long *) (d + 48) = 0;
        *(unsigned long long *) (d + 56) = 0;
        w -= 64;
        d += 64;
    }
}

into:

.LBB_fbSolidFillmmx_2:  ; no_exit
        li r2, 0
        stw r2, 0(r4)
        stw r2, 4(r4)
        stw r2, 8(r4)
        stw r2, 12(r4)
        stw r2, 16(r4)
        stw r2, 20(r4)
        stw r2, 24(r4)
        stw r2, 28(r4)
        stw r2, 32(r4)
        stw r2, 36(r4)
        stw r2, 40(r4)
        stw r2, 44(r4)
        stw r2, 48(r4)
        stw r2, 52(r4)
        stw r2, 56(r4)
        stw r2, 60(r4)
        addi r4, r4, 64
        addi r3, r3, -64
        cmpwi cr0, r3, 63
        bgt .LBB_fbSolidFillmmx_2       ; no_exit

instead of:

.LBB_fbSolidFillmmx_2:  ; no_exit
        li r11, 0
        stw r11, 0(r4)
        stw r11, 4(r4)
        stwx r11, r10, r4
        add r12, r10, r4
        stw r11, 4(r12)
        stwx r11, r9, r4
        add r12, r9, r4
        stw r11, 4(r12)
        stwx r11, r8, r4
        add r12, r8, r4
        stw r11, 4(r12)
        stwx r11, r7, r4
        add r12, r7, r4
        stw r11, 4(r12)
        stwx r11, r6, r4
        add r12, r6, r4
        stw r11, 4(r12)
        stwx r11, r5, r4
        add r12, r5, r4
        stw r11, 4(r12)
        stwx r11, r2, r4
        add r12, r2, r4
        stw r11, 4(r12)
        addi r4, r4, 64
        addi r3, r3, -64
        cmpwi cr0, r3, 63
        bgt .LBB_fbSolidFillmmx_2       ; no_exit

llvm-svn: 22737
2005-08-09 23:39:36 +00:00
Chris Lattner b310ac4a86 implement two helper methods
llvm-svn: 22736
2005-08-09 23:36:33 +00:00
Chris Lattner 679f5b0b40 Fix spelling, fix some broken canonicalizations by my last patch
llvm-svn: 22734
2005-08-09 23:09:05 +00:00
Chris Lattner 54ee86aca7 add a optimization note
llvm-svn: 22732
2005-08-09 22:30:57 +00:00
Chris Lattner 14e060f743 add cc nodes to the AllNodes list so they show up in Graphviz output
llvm-svn: 22731
2005-08-09 20:40:02 +00:00
Chris Lattner 6ec7745e80 Update the targets to the new SETCC/CondCodeSDNode interfaces.
llvm-svn: 22729
2005-08-09 20:21:10 +00:00
Chris Lattner d47675ed24 Eliminate the SetCCSDNode in favor of a CondCodeSDNode class. This pulls the
CC out of the SetCC operation, making SETCC a standard ternary operation and
CC's a standard DAG leaf.  This will make it possible for other node to use
CC's as operands in the future...

llvm-svn: 22728
2005-08-09 20:20:18 +00:00
Chris Lattner 2035c4f7f8 Minor cleanup patch, no functionality changes. Written by Jim Laskey.
llvm-svn: 22727
2005-08-09 18:29:55 +00:00
Chris Lattner 4c62c647c2 Fix CodeGen/Generic/div-neg-power-2.ll, a regression from last night.
llvm-svn: 22726
2005-08-09 18:08:41 +00:00
Chris Lattner 02742710f3 SCEVAddExpr::get() of an empty list is invalid.
llvm-svn: 22724
2005-08-09 01:13:47 +00:00
Chris Lattner a091ff1764 Implement: LoopStrengthReduce/share_ivs.ll
Two changes:
  * Only insert one PHI node for each stride.  Other values are live in
    values.  This cannot introduce higher register pressure than the
    previous approach, and can take advantage of reg+reg addressing modes.
  * Factor common base values out of uses before moving values from the
    base to the immediate fields.  This improves codegen by starting the
    stride-specific PHI node out at a common place for each IV use.

As an example, we used to generate this for a loop in swim:

.LBB_main_no_exit_2E_6_2E_i_no_exit_2E_7_2E_i_2:        ; no_exit.7.i
        lfd f0, 0(r8)
        stfd f0, 0(r3)
        lfd f0, 0(r6)
        stfd f0, 0(r7)
        lfd f0, 0(r2)
        stfd f0, 0(r5)
        addi r9, r9, 1
        addi r2, r2, 8
        addi r5, r5, 8
        addi r6, r6, 8
        addi r7, r7, 8
        addi r8, r8, 8
        addi r3, r3, 8
        cmpw cr0, r9, r4
        bgt .LBB_main_no_exit_2E_6_2E_i_no_exit_2E_7_2E_i_1

now we emit:

.LBB_main_no_exit_2E_6_2E_i_no_exit_2E_7_2E_i_2:        ; no_exit.7.i
        lfdx f0, r8, r2
        stfdx f0, r9, r2
        lfdx f0, r5, r2
        stfdx f0, r7, r2
        lfdx f0, r3, r2
        stfdx f0, r6, r2
        addi r10, r10, 1
        addi r2, r2, 8
        cmpw cr0, r10, r4
        bgt .LBB_main_no_exit_2E_6_2E_i_no_exit_2E_7_2E_i_1

As another more dramatic example, we used to emit this:

.LBB_main_L_90_no_exit_2E_0_2E_i16_no_exit_2E_1_2E_i19_2:       ; no_exit.1.i19
        lfd f0, 8(r21)
        lfd f4, 8(r3)
        lfd f5, 8(r27)
        lfd f6, 8(r22)
        lfd f7, 8(r5)
        lfd f8, 8(r6)
        lfd f9, 8(r30)
        lfd f10, 8(r11)
        lfd f11, 8(r12)
        fsub f10, f10, f11
        fadd f5, f4, f5
        fmul f5, f5, f1
        fadd f6, f6, f7
        fadd f6, f6, f8
        fadd f6, f6, f9
        fmadd f0, f5, f6, f0
        fnmsub f0, f10, f2, f0
        stfd f0, 8(r4)
        lfd f0, 8(r25)
        lfd f5, 8(r26)
        lfd f6, 8(r23)
        lfd f9, 8(r28)
        lfd f10, 8(r10)
        lfd f12, 8(r9)
        lfd f13, 8(r29)
        fsub f11, f13, f11
        fadd f4, f4, f5
        fmul f4, f4, f1
        fadd f5, f6, f9
        fadd f5, f5, f10
        fadd f5, f5, f12
        fnmsub f0, f4, f5, f0
        fnmsub f0, f11, f3, f0
        stfd f0, 8(r24)
        lfd f0, 8(r8)
        fsub f4, f7, f8
        fsub f5, f12, f10
        fnmsub f0, f5, f2, f0
        fnmsub f0, f4, f3, f0
        stfd f0, 8(r2)
        addi r20, r20, 1
        addi r2, r2, 8
        addi r8, r8, 8
        addi r10, r10, 8
        addi r12, r12, 8
        addi r6, r6, 8
        addi r29, r29, 8
        addi r28, r28, 8
        addi r26, r26, 8
        addi r25, r25, 8
        addi r24, r24, 8
        addi r5, r5, 8
        addi r23, r23, 8
        addi r22, r22, 8
        addi r3, r3, 8
        addi r9, r9, 8
        addi r11, r11, 8
        addi r30, r30, 8
        addi r27, r27, 8
        addi r21, r21, 8
        addi r4, r4, 8
        cmpw cr0, r20, r7
        bgt .LBB_main_L_90_no_exit_2E_0_2E_i16_no_exit_2E_1_2E_i19_1

we now emit:

.LBB_main_L_90_no_exit_2E_0_2E_i16_no_exit_2E_1_2E_i19_2:       ; no_exit.1.i19
        lfdx f0, r21, r20
        lfdx f4, r3, r20
        lfdx f5, r27, r20
        lfdx f6, r22, r20
        lfdx f7, r5, r20
        lfdx f8, r6, r20
        lfdx f9, r30, r20
        lfdx f10, r11, r20
        lfdx f11, r12, r20
        fsub f10, f10, f11
        fadd f5, f4, f5
        fmul f5, f5, f1
        fadd f6, f6, f7
        fadd f6, f6, f8
        fadd f6, f6, f9
        fmadd f0, f5, f6, f0
        fnmsub f0, f10, f2, f0
        stfdx f0, r4, r20
        lfdx f0, r25, r20
        lfdx f5, r26, r20
        lfdx f6, r23, r20
        lfdx f9, r28, r20
        lfdx f10, r10, r20
        lfdx f12, r9, r20
        lfdx f13, r29, r20
        fsub f11, f13, f11
        fadd f4, f4, f5
        fmul f4, f4, f1
        fadd f5, f6, f9
        fadd f5, f5, f10
        fadd f5, f5, f12
        fnmsub f0, f4, f5, f0
        fnmsub f0, f11, f3, f0
        stfdx f0, r24, r20
        lfdx f0, r8, r20
        fsub f4, f7, f8
        fsub f5, f12, f10
        fnmsub f0, f5, f2, f0
        fnmsub f0, f4, f3, f0
        stfdx f0, r2, r20
        addi r19, r19, 1
        addi r20, r20, 8
        cmpw cr0, r19, r7
        bgt .LBB_main_L_90_no_exit_2E_0_2E_i16_no_exit_2E_1_2E_i19_1

llvm-svn: 22722
2005-08-09 00:18:09 +00:00
Chris Lattner 37c24cc98c Suck the base value out of the UsersToProcess vector into the BasedUser
class to simplify the code.  Fuse two loops.

llvm-svn: 22721
2005-08-08 22:56:21 +00:00
Chris Lattner 37ed895bf1 Split MoveLoopVariantsToImediateField out from MoveImmediateValues. The
first is a correctness thing, and the later is an optzn thing.  This also
is needed to support a future change.

llvm-svn: 22720
2005-08-08 22:32:34 +00:00
Nate Begeman c92787e1f5 Factor out some common code, and be smarter about when to emit load hi/lo
code sequences.

llvm-svn: 22719
2005-08-08 22:22:56 +00:00
Chris Lattner d09a9a788b Allow tools with "consume after" options (like lli) to take more positional
opts than they take directly.  Thanks to John C for pointing this problem
out to me!

llvm-svn: 22717
2005-08-08 21:57:27 +00:00
Chris Lattner 64068eb7da Remove getImmediateForOpcode, which is now dead.
Patch by Jim Laskey.

llvm-svn: 22716
2005-08-08 21:34:13 +00:00
Chris Lattner 25388199a2 Add new immediate handling support for mul/div.
Patch by Jim Laskey!

llvm-svn: 22715
2005-08-08 21:33:23 +00:00
Chris Lattner 8e9dc31928 Add support for OR/XOR/SUB immediates that are handled with the new immediate
way.  This allows ORI/ORIS pairs, for example.

llvm-svn: 22714
2005-08-08 21:30:29 +00:00
Chris Lattner fd0fe76ba6 Modify the ISD::AND opcode case to use new immediate constant predicates.
Includes wider support for rotate and mask cases.

Patch by Jim Laskey.

I've requested that Jim add new regression tests the newly handled cases.

llvm-svn: 22712
2005-08-08 21:24:57 +00:00
Chris Lattner 81e0e3e933 Modify the ISD::ADD opcode case to use new immediate constant predicates.
Includes support for 32-bit constants using addi/addis.

Patch by Jim Laskey.

llvm-svn: 22711
2005-08-08 21:21:03 +00:00
Chris Lattner 4c54dae243 Modify existing support functions to use new immediate constant predicates.
Patch by Jim Laskey

llvm-svn: 22710
2005-08-08 21:12:35 +00:00
Chris Lattner 3cc070cc36 Add support predicates for future immediate constant changes.
Patch by Jim Laskey

llvm-svn: 22709
2005-08-08 21:10:27 +00:00
Chris Lattner f2267ed5c5 Move IsRunOfOnes to a more logical place and rename to a proper predicate form
(lowercase isXXX).

Patch by Jim Laskey.

llvm-svn: 22708
2005-08-08 21:08:09 +00:00
Nate Begeman 9a838678b0 Fix JIT encoding of ppc mfocrf instruction; the operands were reversed
llvm-svn: 22707
2005-08-08 20:04:52 +00:00
Chris Lattner 9f269e40c9 Use the new 'moveBefore' method to simplify some code. Really, which is
easier to understand?  :)

llvm-svn: 22706
2005-08-08 19:11:57 +00:00
Chris Lattner d380d8412d Reject command lines that have too many positional arguments passed (e.g.,
'opt x y').  This fixes PR493.

Patch contributed by Owen Anderson!

llvm-svn: 22705
2005-08-08 17:25:38 +00:00
Chris Lattner 14203e85b2 Not all constants are legal immediates in load/store instructions.
llvm-svn: 22704
2005-08-08 06:25:50 +00:00
Chris Lattner c70bbc0c41 Implement LoopStrengthReduce/share_code_in_preheader.ll by having one
rewriter for all code inserted into the preheader, which is never flushed.

llvm-svn: 22702
2005-08-08 05:47:49 +00:00
Chris Lattner 9bfa6f8784 Implement a simple optimization for the termination condition of the loop.
The termination condition actually wants to use the post-incremented value
of the loop, not a new indvar with an unusual base.

On PPC, for example, this allows us to compile
LoopStrengthReduce/exit_compare_live_range.ll to:

_foo:
        li r2, 0
.LBB_foo_1:     ; no_exit
        li r5, 0
        stw r5, 0(r3)
        addi r2, r2, 1
        cmpw cr0, r2, r4
        bne .LBB_foo_1  ; no_exit
        blr

instead of:

_foo:
        li r2, 1                ;; IV starts at 1, not 0
.LBB_foo_1:     ; no_exit
        li r5, 0
        stw r5, 0(r3)
        addi r5, r2, 1
        cmpw cr0, r2, r4
        or r2, r5, r5           ;; Reg-reg copy, extra live range
        bne .LBB_foo_1  ; no_exit
        blr

This implements LoopStrengthReduce/exit_compare_live_range.ll

llvm-svn: 22699
2005-08-08 05:28:22 +00:00
Chris Lattner 24a0a43cb0 add new helper function
llvm-svn: 22698
2005-08-08 05:21:50 +00:00
Chris Lattner 88e2d2ee6b Handle 64-bit constant exprs on 64-bit targets.
llvm-svn: 22696
2005-08-08 04:26:32 +00:00
Chris Lattner 579b20b747 All stats are "Number of ..."
llvm-svn: 22694
2005-08-07 20:02:04 +00:00
Chris Lattner 2c14cf7b74 Add some simple folds that occur in bitfield cases. Fix a minor bug in
isHighOnes, where it would consider 0 to have high ones.

llvm-svn: 22693
2005-08-07 07:03:10 +00:00
Chris Lattner 134ebd0801 Fix typoCVS: ----------------------------------------------------------------------
llvm-svn: 22692
2005-08-07 07:00:52 +00:00
Chris Lattner 0c26a0b902 add a small simplification that can be exposed after promotion/expansion
llvm-svn: 22691
2005-08-07 05:00:44 +00:00
Chris Lattner f4dd8c445c * Use the new PHINode::hasConstantValue method to simplify some code
* Teach this code to move allocas out of the loop when tail call eliminating
  a call marked 'tail'.  This implements TailCallElim/move_alloca_for_tail_call.ll
* Do not perform this transformation if a call is marked 'tail' and if there
  are allocas that we cannot move out of the loop in #2.  Doing so would increase
  the stack usage of the function.  This implements fixes
  PR615 and TailCallElim/dont-tce-tail-marked-call.ll.

llvm-svn: 22690
2005-08-07 04:27:41 +00:00
Chris Lattner 983a415b6a Consolidate the GPOpt stuff to all use the Subtarget, instead of still
depending on the command line option.  Now the command line option just
sets the subtarget as appropriate.  G5 opts will now default to on on
G5-enabled nightly testers among other machines.

llvm-svn: 22688
2005-08-05 22:05:03 +00:00
Chris Lattner 158acab986 adjust to change in getSubtarget() api
llvm-svn: 22687
2005-08-05 21:54:27 +00:00
Chris Lattner 431b8d80bd Enable gp optimizations by default when available, even when a target triple
is available, since the target triple doesn't specify whether to use gpopts
or not.

llvm-svn: 22685
2005-08-05 21:25:13 +00:00
Chris Lattner 11fc319b5d add a note
llvm-svn: 22681
2005-08-05 19:18:32 +00:00
Chris Lattner 96ad31321a Change FindEarliestCallSeqEnd (used by libcall insertion) to use a set to
avoid revisiting nodes more than once.  This eliminates a source of
potentially exponential behavior.  For a small function in 191.fma3d
(hexah_stress_divergence_), this speeds up isel from taking > 20mins to
taking 0.07s.

llvm-svn: 22680
2005-08-05 18:10:27 +00:00
Chris Lattner 1095dc94a9 Fix a use-of-dangling-pointer bug, from the introduction of SrcValue's.
llvm-svn: 22679
2005-08-05 16:55:31 +00:00
Chris Lattner cabdc34563 Fix a latent bug in the libcall inserter that was exposed by Nate's patch
yesterday.  This fixes whetstone and a bunch of programs in the External tests.

llvm-svn: 22678
2005-08-05 16:23:57 +00:00
Chris Lattner 8c636bf8b2 don't crash when running the PPC backend on non-ppc hosts without specifying
a subtarget.

llvm-svn: 22677
2005-08-05 16:17:22 +00:00
Chris Lattner 6e709c1318 PHINode::hasConstantValue should never return the PHI itself, even if the
PHI is its only operand.

llvm-svn: 22676
2005-08-05 15:37:31 +00:00
Chris Lattner 1749aaa5e6 Fix an iterator invalidation problem when we decide a phi has a constant value
llvm-svn: 22675
2005-08-05 15:34:10 +00:00
Chris Lattner 11e7a5eda7 Make sure to clean CastedPointers after casts are potentially deleted.
This fixes LSR crashes on 301.apsi, 191.fma3d, and 189.lucas

llvm-svn: 22673
2005-08-05 01:30:11 +00:00
Chris Lattner 9f9c260b8c now that hasConstantValue defaults to only returning values that dominate
the PHI node, this ugly code can vanish.

llvm-svn: 22672
2005-08-05 01:04:30 +00:00
Chris Lattner 37774affb1 Invoke instructions do not dominate all successors
llvm-svn: 22671
2005-08-05 01:03:27 +00:00
Chris Lattner 6f58350daf Now that hasConstantValue is more careful w.r.t. returning values that only
dominate the PHI node, this code can go away.  This also makes passes more
aggressive, e.g. implementing Transforms/CondProp/phisimplify2.ll

llvm-svn: 22670
2005-08-05 01:02:04 +00:00
Chris Lattner bcd8d2c6e5 Use the bool argument to hasConstantValue to decide whether the client is
prepared to deal with return values that do not dominate the PHI.  If we
cannot prove that the result dominates the PHI node, do not return it if
the client can't cope.

llvm-svn: 22669
2005-08-05 01:00:58 +00:00
Chris Lattner 257efb2ad3 This code can handle non-dominating instructions
llvm-svn: 22667
2005-08-05 00:57:45 +00:00
Chris Lattner 1d8b24878f Mark hasConstantValue as a const method
llvm-svn: 22666
2005-08-05 00:49:06 +00:00
Nate Begeman 0a94dec78a Add an extra parameter that Chris requested
llvm-svn: 22665
2005-08-04 23:50:43 +00:00
Nate Begeman b392321cae Fix a fixme in CondPropagate.cpp by moving a PhiNode optimization into
BasicBlock's removePredecessor routine.  This requires shuffling around
the definition and implementation of hasContantValue from Utils.h,cpp into
Instructions.h,cpp

llvm-svn: 22664
2005-08-04 23:24:19 +00:00
Chris Lattner 45f8b6e7aa Modify how immediates are removed from base expressions to deal with the fact
that the symbolic evaluator is not always able to use subtraction to remove
expressions.  This makes the code faster, and fixes the last crash on 178.galgel.
Finally, add a statistic to see how many phi nodes are inserted.

On 178.galgel, we get the follow stats:

2562 loop-reduce  - Number of PHIs inserted
3927 loop-reduce  - Number of GEPs strength reduced

llvm-svn: 22662
2005-08-04 22:34:05 +00:00
Nate Begeman 77558da546 Fix a fixme in LegalizeDAG
llvm-svn: 22661
2005-08-04 21:43:28 +00:00
Nate Begeman e3cbe1027d Hack to naturally align doubles in the constant pool. Remove this once we
know what The Right Thing To Do is.

llvm-svn: 22660
2005-08-04 21:04:09 +00:00
Nate Begeman 295ea90634 Use the new subtarget support to automatically choose the correct ABI
and asm printer for PowerPC if one is not specified.

llvm-svn: 22659
2005-08-04 20:49:48 +00:00
Chris Lattner a6d7c355bc * Refactor some code into a new BasedUser::RewriteInstructionToUseNewBase
method.
* Fix a crash on 178.galgel, where we would insert expressions before PHI
  nodes instead of into the PHI node predecessor blocks.

llvm-svn: 22657
2005-08-04 20:03:32 +00:00
Chris Lattner 0f7c0fa2a7 Fix a case that caused this to crash on 178.galgel
llvm-svn: 22653
2005-08-04 19:26:19 +00:00
Chris Lattner acc42c4df1 Teach LSR about loop-variant expressions, such as loops like this:
for (i = 0; i < N; ++i)
    A[i][foo()] = 0;

here we still want to strength reduce the A[i] part, even though foo() is
l-v.

This also simplifies some of the 'CanReduce' logic.

This implements Transforms/LoopStrengthReduce/ops_after_indvar.ll

llvm-svn: 22652
2005-08-04 19:08:16 +00:00
Nate Begeman 456044b724 Remove some more dead code.
llvm-svn: 22650
2005-08-04 18:13:56 +00:00
Chris Lattner eaf24725b2 Refactor this code substantially with the following improvements:
1. We only analyze instructions once, guaranteed
  2. AnalyzeGetElementPtrUsers has been ripped apart and replaced with
     something much simpler.

The next step is to handle expressions that are not all indvar+loop-invariant
values (e.g. handling indvar+loopvariant).

llvm-svn: 22649
2005-08-04 17:40:30 +00:00
Andrew Lenharth 5adb830b30 No, IDEFs shouldn't be JITed
llvm-svn: 22648
2005-08-04 15:32:36 +00:00
Misha Brukman a54e201edf * Unbreak release build
* Add comments to #endif pragmas for readability

llvm-svn: 22647
2005-08-04 14:22:41 +00:00
Misha Brukman 41acd5e08d * Unbreak optimized build (noticed by Eric van Riet Paap)
* Comment #endif clauses for readability

llvm-svn: 22646
2005-08-04 14:16:48 +00:00
Nate Begeman 3bcfcd9474 Add Subtarget support to PowerPC. Next up, using it.
llvm-svn: 22644
2005-08-04 07:12:09 +00:00
Chris Lattner 6f286b760f refactor some code
llvm-svn: 22643
2005-08-04 01:19:13 +00:00
Chris Lattner 6510749050 invert to if's to make the logic simpler
llvm-svn: 22641
2005-08-04 00:40:47 +00:00
Chris Lattner a0102fbc4f When processing outer loops and we find uses of an IV in inner loops, make
sure to handle the use, just don't recurse into it.

This permits us to generate this code for a simple nested loop case:

.LBB_foo_0:     ; entry
        stwu r1, -48(r1)
        stw r29, 44(r1)
        stw r30, 40(r1)
        mflr r11
        stw r11, 56(r1)
        lis r2, ha16(L_A$non_lazy_ptr)
        lwz r30, lo16(L_A$non_lazy_ptr)(r2)
        li r29, 1
.LBB_foo_1:     ; no_exit.0
        bl L_bar$stub
        li r2, 1
        or r3, r30, r30
.LBB_foo_2:     ; no_exit.1
        lfd f0, 8(r3)
        stfd f0, 0(r3)
        addi r4, r2, 1
        addi r3, r3, 8
        cmpwi cr0, r2, 100
        or r2, r4, r4
        bne .LBB_foo_2  ; no_exit.1
.LBB_foo_3:     ; loopexit.1
        addi r30, r30, 800
        addi r2, r29, 1
        cmpwi cr0, r29, 100
        or r29, r2, r2
        bne .LBB_foo_1  ; no_exit.0
.LBB_foo_4:     ; return
        lwz r11, 56(r1)
        mtlr r11
        lwz r30, 40(r1)
        lwz r29, 44(r1)
        lwz r1, 0(r1)
        blr

instead of this:

_foo:
.LBB_foo_0:     ; entry
        stwu r1, -48(r1)
        stw r28, 44(r1)                   ;; uses an extra register.
        stw r29, 40(r1)
        stw r30, 36(r1)
        mflr r11
        stw r11, 56(r1)
        li r30, 1
        li r29, 0
        or r28, r29, r29
.LBB_foo_1:     ; no_exit.0
        bl L_bar$stub
        mulli r2, r28, 800           ;; unstrength-reduced multiply
        lis r3, ha16(L_A$non_lazy_ptr)   ;; loop invariant address computation
        lwz r3, lo16(L_A$non_lazy_ptr)(r3)
        add r2, r2, r3
        mulli r4, r29, 800           ;; unstrength-reduced multiply
        addi r3, r3, 8
        add r3, r4, r3
        li r4, 1
.LBB_foo_2:     ; no_exit.1
        lfd f0, 0(r3)
        stfd f0, 0(r2)
        addi r5, r4, 1
        addi r2, r2, 8                 ;; multiple stride 8 IV's
        addi r3, r3, 8
        cmpwi cr0, r4, 100
        or r4, r5, r5
        bne .LBB_foo_2  ; no_exit.1
.LBB_foo_3:     ; loopexit.1
        addi r28, r28, 1               ;;; Many IV's with stride 1
        addi r29, r29, 1
        addi r2, r30, 1
        cmpwi cr0, r30, 100
        or r30, r2, r2
        bne .LBB_foo_1  ; no_exit.0
.LBB_foo_4:     ; return
        lwz r11, 56(r1)
        mtlr r11
        lwz r30, 36(r1)
        lwz r29, 40(r1)
        lwz r28, 44(r1)
        lwz r1, 0(r1)
        blr

llvm-svn: 22640
2005-08-04 00:14:11 +00:00
Chris Lattner fc62470466 Teach loop-reduce to see into nested loops, to pull out immediate values
pushed down by SCEV.

In a nested loop case, this allows us to emit this:

        lis r3, ha16(L_A$non_lazy_ptr)
        lwz r3, lo16(L_A$non_lazy_ptr)(r3)
        add r2, r2, r3
        li r3, 1
.LBB_foo_2:     ; no_exit.1
        lfd f0, 8(r2)        ;; Uses offset of 8 instead of 0
        stfd f0, 0(r2)
        addi r4, r3, 1
        addi r2, r2, 8
        cmpwi cr0, r3, 100
        or r3, r4, r4
        bne .LBB_foo_2  ; no_exit.1

instead of this:

        lis r3, ha16(L_A$non_lazy_ptr)
        lwz r3, lo16(L_A$non_lazy_ptr)(r3)
        add r2, r2, r3
        addi r3, r3, 8
        li r4, 1
.LBB_foo_2:     ; no_exit.1
        lfd f0, 0(r3)
        stfd f0, 0(r2)
        addi r5, r4, 1
        addi r2, r2, 8
        addi r3, r3, 8
        cmpwi cr0, r4, 100
        or r4, r5, r5
        bne .LBB_foo_2  ; no_exit.1

llvm-svn: 22639
2005-08-03 23:44:42 +00:00
Chris Lattner bb78c97e24 improve debug output
llvm-svn: 22638
2005-08-03 23:30:08 +00:00
Nate Begeman 8d394eb703 Scalar SSE: load +0.0 -> xorps/xorpd
Scalar SSE: a < b ? c : 0.0 -> cmpss, andps
Scalar SSE: float -> i16 needs to be promoted

llvm-svn: 22637
2005-08-03 23:26:28 +00:00
Chris Lattner db23c74e5e Move from Stage 0 to Stage 1.
Only emit one PHI node for IV uses with identical bases and strides (after
moving foldable immediates to the load/store instruction).

This implements LoopStrengthReduce/dont_insert_redundant_ops.ll, allowing
us to generate this PPC code for test1:

        or r30, r3, r3
.LBB_test1_1:   ; Loop
        li r2, 0
        stw r2, 0(r30)
        stw r2, 4(r30)
        bl L_pred$stub
        addi r30, r30, 8
        cmplwi cr0, r3, 0
        bne .LBB_test1_1        ; Loop

instead of this code:

        or r30, r3, r3
        or r29, r3, r3
.LBB_test1_1:   ; Loop
        li r2, 0
        stw r2, 0(r29)
        stw r2, 4(r30)
        bl L_pred$stub
        addi r30, r30, 8        ;; Two iv's with step of 8
        addi r29, r29, 8
        cmplwi cr0, r3, 0
        bne .LBB_test1_1        ; Loop

llvm-svn: 22635
2005-08-03 22:51:21 +00:00
Andrew Lenharth 3a18a39587 Alpha ABI specifies stack is always 16 byte alligned, and gcc does it, so I will too
llvm-svn: 22634
2005-08-03 22:33:21 +00:00
Chris Lattner 430d0022df Rename IVUse to IVUsersOfOneStride, use a struct instead of a pair to
unify some parallel vectors and get field names more descriptive than
"first" and "second".  This isn't lisp afterall :)

llvm-svn: 22633
2005-08-03 22:21:05 +00:00
Chris Lattner 84e9baa925 Fix a nasty dangling pointer issue. The ScalarEvolution pass would keep a
map from instruction* to SCEVHandles.  When we delete instructions, we have
to tell it about it.  We would run into nasty cases where new instructions
were reallocated at old instruction addresses and get the old map values.
Bad bad bad :(

llvm-svn: 22632
2005-08-03 21:36:09 +00:00
Chris Lattner 8191442548 Fix PR611, codegen'ing SREM of FP operands to fmod or fmodf instead of
the sequence used for integer ops

llvm-svn: 22629
2005-08-03 20:31:37 +00:00
Chris Lattner 3de05cc930 The correct fix for PR612, which also fixes
Transforms/LowerInvoke/2005-08-03-InvokeWithPHIUse.ll

llvm-svn: 22628
2005-08-03 18:51:44 +00:00
Chris Lattner f8a81a9886 When inserting code, make sure not to insert it before PHI nodes. This
fixes PR612 and Transforms/LowerInvoke/2005-08-03-InvokeWithPHI.ll

llvm-svn: 22626
2005-08-03 18:34:29 +00:00
Chris Lattner d683bdd0f8 Fix Transforms/SimplifyCFG/2005-08-03-PHIFactorCrash.ll, a problem that
occurred while bugpointing another testcase

llvm-svn: 22621
2005-08-03 17:59:45 +00:00
Chris Lattner 590642eb91 add support for Graphviz when viewing CFGs
llvm-svn: 22620
2005-08-03 17:55:05 +00:00
Misha Brukman fce3285891 Fix grammar: apostrophe-s ('s) is possessive, not plural; also iff vs. if.
llvm-svn: 22619
2005-08-03 17:29:52 +00:00
Chris Lattner 8dc82b7909 minor capitalization thing, patch by Jim Laskey
llvm-svn: 22617
2005-08-03 16:52:22 +00:00
Chris Lattner 2dbf1960ff Finally, add the required constraint checks to fix Transforms/SimplifyCFG/2005-08-01-PHIUpdateFail.ll
the right way

llvm-svn: 22615
2005-08-03 00:59:12 +00:00
Chris Lattner 908036942c Simplify some code, add the correct pred checks
llvm-svn: 22613
2005-08-03 00:38:27 +00:00
Chris Lattner 982b75c061 Refactor code out of PropagatePredecessorsForPHIs, turning it into a pure function with no side-effects
llvm-svn: 22612
2005-08-03 00:29:26 +00:00
Chris Lattner 1f047fd513 use splice instead of remove/insert to avoid some symtab operations
llvm-svn: 22611
2005-08-03 00:23:42 +00:00
Chris Lattner 76dc204488 move two functions up in the file, use SafeToMergeTerminators to eliminate
some duplicated code

llvm-svn: 22610
2005-08-03 00:19:45 +00:00
Chris Lattner 733d6704ce Rip some code out of the main SimplifyCFG function into a subfunction and
call it from the only place it is live.  No functionality changes.

llvm-svn: 22609
2005-08-03 00:11:16 +00:00
Chris Lattner ac594de8dc Disable this patch:
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20050801/027345.html

This breaks real programs and only fixes an obscure regression testcase.  A
real fix is in development.

llvm-svn: 22606
2005-08-02 23:31:38 +00:00
Chris Lattner eee90f7eb4 Change a place to use an arbitrary value instead of null, when possible
llvm-svn: 22605
2005-08-02 23:29:23 +00:00
Chris Lattner 80e038c148 one more hunk that got dropped
llvm-svn: 22596
2005-08-02 19:35:29 +00:00
Chris Lattner 989d8bbda9 This hunk accidentally got dropped. Patch by Jim Laskey
llvm-svn: 22595
2005-08-02 19:30:55 +00:00
Chris Lattner 6667bdbaca Update to use the new MathExtras.h support for log2 computation.
Patch contributed by Jim Laskey!

llvm-svn: 22594
2005-08-02 19:26:06 +00:00
Chris Lattner 22d00a8e90 Update to use the new MathExtras.h support for log2 computation.
Patch contributed by Jim Laskey!

llvm-svn: 22592
2005-08-02 19:16:58 +00:00
Chris Lattner c98d48fabe add a pass name to make debugging dumps nicer
llvm-svn: 22588
2005-08-02 19:07:49 +00:00
Misha Brukman c9076e0acf Fix grammar: it's == "it is".
llvm-svn: 22587
2005-08-02 16:04:59 +00:00
Chris Lattner 351b891cbc Like the comment says, do not insert cast instructions before phi nodes
llvm-svn: 22586
2005-08-02 03:31:14 +00:00
Jeff Cohen 9aafa06bef It's dangerous coding on Mondays.
llvm-svn: 22585
2005-08-02 03:26:32 +00:00
Chris Lattner 4fd3e16cbd This code was very close, but not quite right. It did not take into
consideration the case where a reference in an unreachable block could
occur.  This fixes Transforms/SimplifyCFG/2005-08-01-PHIUpdateFail.ll,
something I ran into while bugpoint'ing another pass.

llvm-svn: 22584
2005-08-02 03:24:05 +00:00
Jeff Cohen ba7cc68594 Implement SetInterruptFunction for Windows.
llvm-svn: 22582
2005-08-02 03:04:47 +00:00
Chris Lattner 75a44e154e add a comment, make a check more lenient
llvm-svn: 22581
2005-08-02 02:52:02 +00:00
Chris Lattner dcce49e006 Simplify for loop, clear a per-loop map after processing each loop
llvm-svn: 22580
2005-08-02 02:44:31 +00:00
Chris Lattner 6a5d6ecd09 Implement sys::SetInterruptFunction on Unix, stub it on win32 so that the
build will not fail

llvm-svn: 22578
2005-08-02 02:14:22 +00:00
Chris Lattner 9ef1294210 Add a comment
Make LSR ignore GEP's that have loop variant base values, as we currently
cannot codegen them

llvm-svn: 22576
2005-08-02 01:32:29 +00:00
Chris Lattner 564900e5e5 Fix an iterator invalidation problem
llvm-svn: 22575
2005-08-02 00:41:11 +00:00
Chris Lattner fca31aee4f 200.sixtrack prints FP numbers with a very strange notation that uses D
instead of E for exponentials (e.g. 1.234D-43).  Add support for this
notation.

llvm-svn: 22574
2005-08-02 00:11:53 +00:00
Andrew Lenharth ae97fff758 update function codes to reflect /su flags that have been added since this was written
llvm-svn: 22571
2005-08-01 20:06:01 +00:00
Chris Lattner 4398daf069 Fix casts from long to sbyte on ppc
llvm-svn: 22570
2005-08-01 18:16:37 +00:00
Andrew Lenharth 33bbf15147 use llabs not abs
llvm-svn: 22569
2005-08-01 17:47:28 +00:00
Andrew Lenharth 27f1e26db7 one cannot allocate a global, until one is done initializing the global pointers
llvm-svn: 22568
2005-08-01 17:35:40 +00:00
Chris Lattner e17c5d0e59 ConstantInt::get only works for arguments < 128.
SimplifyLibCalls probably has to be audited to make sure it does not make
this mistake elsewhere.  Also, if this code knows that the type will be
unsigned, obviously one arm of this is dead.

Reid, can you take a look into this further?

llvm-svn: 22566
2005-08-01 16:52:50 +00:00
Jeff Cohen 546fd5944e Keep tabs and trailing spaces out.
llvm-svn: 22565
2005-07-30 18:33:25 +00:00
Jeff Cohen c500991055 Fix VC++ build problems.
llvm-svn: 22564
2005-07-30 18:22:27 +00:00
Chris Lattner 941d84a34d fix float->long conversions on x86
llvm-svn: 22563
2005-07-30 01:40:57 +00:00
Chris Lattner 4913457573 fix a typeo
llvm-svn: 22561
2005-07-30 00:43:00 +00:00
Nate Begeman 17a0e2afea Ack, typo
llvm-svn: 22560
2005-07-30 00:21:31 +00:00
Chris Lattner aeef51b6b7 Change the fp to integer code to not perform 2-byte stores followed by
1 byte loads and other operations.  This is bad for store-forwarding on
common CPUs.  We now do this:

fnstcw WORD PTR [%ESP]
mov %AX, WORD PTR [%ESP]

instead of:

fnstcw WORD PTR [%ESP]
mov %AL, BYTE PTR [%ESP + 1]

llvm-svn: 22559
2005-07-30 00:17:52 +00:00
Nate Begeman e68bcd1946 Commit a new LoopStrengthReduce pass that can use scalar evolutions and
target data to decide which loop induction variables to strength reduce
and how to do so.  This work is mostly by Chris Lattner, with tweaks by
me to get it working on some of MultiSource.

llvm-svn: 22558
2005-07-30 00:15:07 +00:00
Nate Begeman 2bca4d9b7b Break SCEVExpander out of IndVarSimplify into its own .h/.cpp file so that
other passes may use it.

llvm-svn: 22557
2005-07-30 00:12:19 +00:00
Chris Lattner 4738d1b5cd Use a custom expander for all FP to int conversions, as the X86 only has
FP-to-int-in-memory: this exposes the load from the stored slot to the
selection dag, allowing it to be folded into other operaions.

llvm-svn: 22556
2005-07-30 00:05:54 +00:00
Chris Lattner f59b2daddb Allow targets to have custom expanders for FP_TO_*INT conversions where
both the src and dest values are legal

llvm-svn: 22555
2005-07-30 00:04:12 +00:00
Andrew Lenharth 0940218cca support near allocations for the JIT
llvm-svn: 22554
2005-07-29 23:40:16 +00:00
Andrew Lenharth 2f9c52e194 turn off GOT on archs that didn't use it (not that it appeard to harm them much with it on)
llvm-svn: 22553
2005-07-29 23:32:02 +00:00
Chris Lattner bc85c32c73 Implement a FIXME: move a bunch of cruft for handling FP_TO_*INT operations
that the X86 does not support to the legalizer.  This allows it to be better
optimized, etc, and will help with SSE support.

llvm-svn: 22551
2005-07-29 01:00:29 +00:00
Chris Lattner 6dc60e859b Don't forget to diddle with the control word when performing an FISTP64.
llvm-svn: 22550
2005-07-29 00:54:34 +00:00
Chris Lattner 67756e2e22 Use a custom expander to compile this:
long %test4(double %X) {
        %tmp.1 = cast double %X to long         ; <long> [#uses=1]
        ret long %tmp.1
}

to this:

_test4:
        sub %ESP, 12
        fld QWORD PTR [%ESP + 16]
        fistp QWORD PTR [%ESP]
        mov %EDX, DWORD PTR [%ESP + 4]
        mov %EAX, DWORD PTR [%ESP]
        add %ESP, 12
        ret

instead of this:

_test4:
        sub %ESP, 28
        fld QWORD PTR [%ESP + 32]
        fstp QWORD PTR [%ESP]
        call ___fixdfdi
        add %ESP, 28
        ret

llvm-svn: 22549
2005-07-29 00:40:01 +00:00
Chris Lattner fe68d75aad Allow targets to define custom expanders for FP_TO_*INT
llvm-svn: 22548
2005-07-29 00:33:32 +00:00
Chris Lattner 44fe26ff07 allow a target to request that unknown FP_TO_*INT conversion be promoted to
a larger integer destination.

llvm-svn: 22547
2005-07-29 00:11:56 +00:00
Chris Lattner f99f8f9081 instead of having all conversions be handled by one case value, and then have
subcases inside, break things out earlier.

llvm-svn: 22546
2005-07-28 23:31:12 +00:00
Andrew Lenharth 1ec48e8683 support bsr, and more .td simplification
llvm-svn: 22543
2005-07-28 18:14:47 +00:00
Andrew Lenharth 3faa82219a new is not a valid default anywhere, so make this pure virtual
llvm-svn: 22542
2005-07-28 18:13:59 +00:00
Reid Spencer 29f83e0997 Fix a problem in getDirectoryContents where sub-directory names were
appended to a path string that didn't end in a slash, yielding invalid
path names.

Path contribute by Nicholas Riley.

llvm-svn: 22539
2005-07-28 16:25:57 +00:00
Andrew Lenharth b57b0baac0 get lazy JITing working. Some of shootout runs now
llvm-svn: 22538
2005-07-28 12:45:20 +00:00
Andrew Lenharth 3444cf5128 Like constants, globals on some platforms are GOT relative. This means they have to be allocated
near the GOT, which new doesn't do.  So break out the allocate into a new function.

Also move GOT index handling into JITResolver.  This lets it update the mapping when a Lazy
function is JITed.  It doesn't managed the table, just the mapping.  Note that this is
still non-ideal, as any function that takes a function address should also take a GOT
index, but that is a lot of changes.  The relocation resolve process updates any GOT entry
it sees is out of date.

llvm-svn: 22537
2005-07-28 12:44:13 +00:00
Chris Lattner a8195603e4 Eliminate an extra copy from R1 that Nate noticed on function calls that
have to write arguments to the stack

llvm-svn: 22536
2005-07-28 05:23:43 +00:00
Chris Lattner 7c38bd14ef Specify the correct number of operands
llvm-svn: 22535
2005-07-28 04:42:11 +00:00
Nate Begeman c5b6bd9a8e Fold constant adds into loads and stores to frame indices.
For the following code:
double %ext(int %A.0__, long %A.1__) {
        %A_addr = alloca %typedef.DComplex              ; <%typedef.DComplex*> [#uses=2]
        %tmp.1 = cast %typedef.DComplex* %A_addr to int*                ; <int*> [#uses=1]
        store int %A.0__, int* %tmp.1
        %tmp.2 = getelementptr %typedef.DComplex* %A_addr, int 0, uint 1                ; <double*> [#uses=2]
        %tmp.3 = cast double* %tmp.2 to long*           ; <long*> [#uses=1]
        store long %A.1__, long* %tmp.3
        %tmp.5 = load double* %tmp.2            ; <double> [#uses=1]
        ret double %tmp.5
}

We now generate:
_ext:
.LBB_ext_0:     ;
        stw r3, -12(r1)
        stw r4, -8(r1)
        stw r5, -4(r1)
        lfd f1, -8(r1)
        blr

Instead of:
_ext:
.LBB_ext_0:     ;
        stw r3, -12(r1)
        addi r2, r1, -12
        stw r4, 4(r2)
        stw r5, 8(r2)
        lfd f1, 4(r2)
        blr

This also fires hundreds of times on MultiSource.

llvm-svn: 22533
2005-07-28 03:02:05 +00:00
Nate Begeman 594a3ba40e Fix some comments
llvm-svn: 22530
2005-07-27 23:11:27 +00:00
Chris Lattner 96cbfbbeaf Fix debug info to not print out recently freed memory.
llvm-svn: 22529
2005-07-27 23:11:25 +00:00
Chris Lattner 9937713252 Print symbolic register names in debug dumps
llvm-svn: 22528
2005-07-27 23:03:38 +00:00
Jeff Cohen 5f4ef3c5a8 Eliminate all remaining tabs and trailing spaces.
llvm-svn: 22523
2005-07-27 06:12:32 +00:00
Nate Begeman 8e2411334c Implement the optimization for the Red Zone on Darwin. This removes the
unnecessary SP manipulation in leaf routines that don't need it.

llvm-svn: 22522
2005-07-27 06:06:29 +00:00
Chris Lattner a5ec193118 fix some warnings when compiled with 32-bit hosts
llvm-svn: 22521
2005-07-27 05:58:01 +00:00
Jeff Cohen 33a030e36c Eliminate tabs and trailing spaces.
llvm-svn: 22520
2005-07-27 05:53:44 +00:00
Chris Lattner 1defb7fe80 add a note about the red zone
llvm-svn: 22518
2005-07-26 19:07:51 +00:00
Chris Lattner 06721f50b0 Wrap some long lines, fix emission of weak global variables
llvm-svn: 22517
2005-07-26 19:03:27 +00:00
Nate Begeman 7b8733571e Update the PPC readme
llvm-svn: 22516
2005-07-26 18:59:06 +00:00
Chris Lattner 31d0ac2414 ConvertibleToGEP always returns 0, remove some old crufty code which
is actually dead because of this!

llvm-svn: 22515
2005-07-26 16:38:28 +00:00
Chris Lattner 54d27a49da fix a warning on 32-bit systems
llvm-svn: 22513
2005-07-25 23:42:58 +00:00
Nate Begeman b9ce12b4bb Fix an optimization put in for accessing static globals. This obviates
the need to build PIC.

llvm-svn: 22512
2005-07-25 21:15:28 +00:00
Andrew Lenharth 54b0b62ff0 fix compile error
llvm-svn: 22508
2005-07-23 07:46:48 +00:00
Chris Lattner 12f5d12b21 PowerPC no-pic code is not quite ready for prime-time
llvm-svn: 22507
2005-07-22 22:58:34 +00:00
Andrew Lenharth 7dec1f3aa6 Handle more imm forms, and load small negative i32 constants without hitting memory (should do the same for arbitrary zero extended small negative constants)
llvm-svn: 22505
2005-07-22 22:24:01 +00:00
Andrew Lenharth c32843ed2b finally found the gcc defined constants
llvm-svn: 22502
2005-07-22 21:00:30 +00:00
Andrew Lenharth 55d045190e Alpha JIT (beta)
llvm-svn: 22500
2005-07-22 20:52:16 +00:00
Andrew Lenharth 02daecc7c6 simpilfy instruction encoding (and make the lines way shorter, aka Misha happification)
llvm-svn: 22499
2005-07-22 20:50:29 +00:00
Andrew Lenharth 111e5e6490 update interface
llvm-svn: 22498
2005-07-22 20:49:37 +00:00
Andrew Lenharth 940b07c7fa the JIT memory manager will construct a GOT if you want it too. Also, it places the constants in the allocated memory, rather than a malloc area
llvm-svn: 22497
2005-07-22 20:48:12 +00:00
Nate Begeman a9443f29b0 Support building non-PIC
Remove the LoadHiAddr pseudo-instruction.
Optimization of stores to and loads from statics.
Force JIT to use new non-PIC codepaths.

llvm-svn: 22494
2005-07-21 20:44:43 +00:00
Chris Lattner 53208ecf34 revert to using 4-byte alignment for doubles, as specified by the ABI
llvm-svn: 22493
2005-07-21 19:17:18 +00:00
Nate Begeman 15527113ab Support assembling fsqrt on darwin. This will be implemented better when
PowerPC gets subtarget support up.

llvm-svn: 22489
2005-07-21 01:25:49 +00:00
Nate Begeman 8465fe8b4b Generate mfocrf when targeting g5. Generate fsqrt/fsqrts when targetin g5.
8-byte align doubles.

llvm-svn: 22486
2005-07-20 22:42:00 +00:00
Chris Lattner 18aa4d8196 Do not let MaskedValueIsZero consider undef to be zero, for reasons
explained in the comment.

This fixes UnitTests/2003-09-18-BitFieldTest on darwin

llvm-svn: 22483
2005-07-20 18:49:28 +00:00
Chris Lattner 05732c86d4 count the number of relocations performed.
llvm-svn: 22480
2005-07-20 16:29:20 +00:00
Nate Begeman 0851f1aaa1 Integrate SelectFPExpr into SelectExpr. This gets PPC32 closer to being
automatically generated from a target description.

llvm-svn: 22470
2005-07-19 16:51:05 +00:00
Nate Begeman 1ac40a1245 Remove unnecessary FP_EXTEND. This causes worse codegen for SSE.
llvm-svn: 22469
2005-07-19 16:50:03 +00:00
Reid Spencer d37d854cb2 For: memory operations -> stores
This is the first incremental patch to implement this feature. It adds no
functionality to LLVM but setup up the information needed from targets in
order to implement the optimization correctly. Each target needs to specify
the maximum number of store operations for conversion of the llvm.memset,
llvm.memcpy, and llvm.memmove intrinsics into a sequence of store operations.
The limit needs to be chosen at the threshold of performance for such an
optimization (generally smallish). The target also needs to specify whether
the target can support unaligned stores for multi-byte store operations.
This helps ensure the optimization doesn't generate code that will trap on
an alignment errors.
More patches to follow.

llvm-svn: 22468
2005-07-19 04:52:44 +00:00
Chris Lattner 247aef884c When transforming &A[i] < &A[j] -> i < j, make sure to perform the comparison
as a signed compare.  This patch may fix PR597, but is correct in any case.

llvm-svn: 22465
2005-07-18 23:07:33 +00:00
Chris Lattner b35912e421 The assertion was wrong: the code only worked for i64. While we're at it,
expand the code to work for all integer datatypes.  This should unbreak
alpha.

llvm-svn: 22464
2005-07-18 04:31:14 +00:00
Chris Lattner a5998ce94f Only get the .bss and .data sections when needed instead of unconditionally.
This allows is to not emit empty sections when .data or .bss is not used.

llvm-svn: 22457
2005-07-16 17:41:06 +00:00
Chris Lattner 363964e53d Refactor getSection() method to make it easier to use.
llvm-svn: 22455
2005-07-16 17:36:04 +00:00
Chris Lattner fd44500427 Major refactor of the ELFWriter code. Instead of building up one big
vector that represents the .o file at once, build up a vector for each
section of the .o file.  This is needed because the .o file writer needs
to be able to switch between sections as it emits them (e.g. switch
between the .text section and the .rel section when emitting code).

This patch has no functionality change.

llvm-svn: 22453
2005-07-16 08:01:13 +00:00
Nate Begeman 7e74c834c1 Teach the legalizer how to promote SINT_TO_FP to a wider SINT_TO_FP that
the target natively supports.  This eliminates some special-case code from
the x86 backend and generates better code as well.

For an i8 to f64 conversion, before & after:

_x87 before:
        subl $2, %esp
        movb 6(%esp), %al
        movsbw %al, %ax
        movw %ax, (%esp)
        filds (%esp)
        addl $2, %esp
        ret

_x87 after:
        subl $2, %esp
        movsbw 6(%esp), %ax
        movw %ax, (%esp)
        filds (%esp)
        addl $2, %esp
        ret

_sse before:
        subl $12, %esp
        movb 16(%esp), %al
        movsbl %al, %eax
        cvtsi2sd %eax, %xmm0
        addl $12, %esp
        ret

_sse after:
        subl $12, %esp
        movsbl 16(%esp), %eax
        cvtsi2sd %eax, %xmm0
        addl $12, %esp
        ret

llvm-svn: 22452
2005-07-16 02:02:34 +00:00
Nate Begeman 8293d0e232 Teach the register allocator that movaps is also a move instruction
llvm-svn: 22451
2005-07-16 02:00:20 +00:00
Nate Begeman 57b9ed522d A couple more darwinisms
llvm-svn: 22450
2005-07-16 01:59:47 +00:00
Chris Lattner 507a27592f Remove all knowledge of UINT_TO_FP from the X86 backend, relying on the
legalizer to eliminate them.  With this comes the expected code quality
improvements, such as, for this:

double foo(unsigned short X) { return X; }

we now generate this:

_foo:
        subl $4, %esp
        movzwl 8(%esp), %eax
        movl %eax, (%esp)
        fildl (%esp)
        addl $4, %esp
        ret

instead of this:

_foo:
        subl $4, %esp
        movw 8(%esp), %ax
        movzwl %ax, %eax   ;; Load not folded into this.
        movl %eax, (%esp)
        fildl (%esp)
        addl $4, %esp
        ret

-Chris

llvm-svn: 22449
2005-07-16 00:28:20 +00:00
Chris Lattner e3e847bfd7 Break the code for expanding UINT_TO_FP operations out into its own
SelectionDAGLegalize::ExpandLegalUINT_TO_FP method.

Add a new method, PromoteLegalUINT_TO_FP, which allows targets to request
that UINT_TO_FP operations be promoted to a larger input type.  This is
useful for targets that have some UINT_TO_FP or SINT_TO_FP operations but
not all of them (like X86).

The same should be done with SINT_TO_FP, but this patch does not do that
yet.

llvm-svn: 22447
2005-07-16 00:19:57 +00:00
Chris Lattner b47f5e6d54 You can't use config options without config.h
llvm-svn: 22446
2005-07-15 22:48:31 +00:00
Nate Begeman a0b5e035ea Get closer to fully working scalar FP in SSE regs. This gets singlesource
working, and Olden/power.

llvm-svn: 22441
2005-07-15 00:38:55 +00:00
Nate Begeman 0f38dc4970 Add support for printing the sse scalar comparison instruction mnemonics.
llvm-svn: 22440
2005-07-14 22:52:25 +00:00
John Criswell 3870f9d31a Fixed PR#596:
Add parenthesis around the value being negated; that way, if the value
begins with a minus sign (e.g. negative integer), we won't generate a
C predecrement operator by mistake.

llvm-svn: 22437
2005-07-14 19:41:16 +00:00
Chris Lattner 46524e2573 Make this use the new autoconf support for finding the executables for
gv and Graphviz.

llvm-svn: 22434
2005-07-14 05:33:13 +00:00
Chris Lattner fcc53ad625 As discussed on IRC, this stuff is just for debugging.
llvm-svn: 22432
2005-07-14 05:17:43 +00:00
Chris Lattner 51ded0e1ee If the Graphviz program is available, use it to visualize dot graphs.
llvm-svn: 22429
2005-07-14 01:10:55 +00:00
Reid Spencer 86df93a216 Don't call pthread_mutexattr_setpshared on FreeBSD because its implementation
of pthreads is missing that call (despite it violating the spec).

llvm-svn: 22423
2005-07-13 03:02:06 +00:00
Jeff Cohen 65806e62b1 Note to self: don't introduce memory leaks.
llvm-svn: 22422
2005-07-13 02:58:04 +00:00
Jeff Cohen 3800a28d46 Win32 support for Mutex class.
llvm-svn: 22420
2005-07-13 02:15:18 +00:00
Chris Lattner f9ddfef872 Fix Alpha/2005-07-12-TwoMallocCalls.ll and PR593.
It is not safe to call LegalizeOp on something that has already been legalized.
Instead, just force another iteration of legalization.

This could affect all platforms but X86, as this codepath is dynamically
dead on X86 (ISD::MEMSET and friends are legal).

llvm-svn: 22419
2005-07-13 02:00:04 +00:00
Chris Lattner ba08a336f0 Fix test/Regression/CodeGen/Generic/2005-07-12-memcpy-i64-length.ll
llvm-svn: 22417
2005-07-13 01:42:45 +00:00
Nate Begeman 8dd96ec769 Check in the last of the darwin-specific code necessary to get shootout
working before modifying the asm printer to use the subtarget info.

llvm-svn: 22408
2005-07-12 18:34:58 +00:00
Nate Begeman 5fc86e8314 Remove some code that moved to the generic asm printer a long time ago.
llvm-svn: 22407
2005-07-12 18:34:15 +00:00
Reid Spencer 79876f52aa For PR540:
This patch completes the changes for making lli thread-safe. Here's the list
of changes:
* The Support/ThreadSupport* files were removed and replaced with the
  MutexGuard.h file since all ThreadSupport* declared was a Mutex Guard.
  The implementation of MutexGuard.h is now based on sys::Mutex which hides
  its implementation and makes it unnecessary to have the -NoSupport.h and
  -PThreads.h versions of ThreadSupport.

* All places in ExecutionEngine that previously referred to "Mutex" now
  refer to sys::Mutex

* All places in ExecutionEngine that previously referred to "MutexLocker"
  now refer to MutexGuard (this is frivolous but I believe the technically
  correct name for such a class is "Guard" not a "Locker").

These changes passed all of llvm-test. All we need now are some test cases
that actually use multiple threads.

llvm-svn: 22404
2005-07-12 15:51:55 +00:00
Reid Spencer f404981bf2 For PR540:
Add a Mutex class for thread synchronization in a platform-independent way.
The current implementation only supports pthreads. Win32 use of Critical
Sections will be added later. The design permits other threading models to
be used if (and only if) pthreads is not available.

llvm-svn: 22403
2005-07-12 15:37:43 +00:00
Chris Lattner 298ac69934 Add support for 64-bit elf files
llvm-svn: 22400
2005-07-12 06:57:52 +00:00
Andrew Lenharth 20b534a4fd Fix povray and minor cleanups
llvm-svn: 22397
2005-07-12 04:20:52 +00:00
Jeff Cohen ddc8b78cda I don't know how this ever compiled with gcc, but VC++ correctly rejects it.
llvm-svn: 22394
2005-07-12 02:59:38 +00:00
Jeff Cohen 33b8232ce0 VC++ demands that the function returns a value
llvm-svn: 22393
2005-07-12 02:53:33 +00:00
Nate Begeman df8946dede Clean up the TargetSubtarget class a bit, removing an unnecessary argument
to the constructor.

llvm-svn: 22392
2005-07-12 02:41:19 +00:00
Chris Lattner 351817b1f9 Minor changes to improve comments and fix the build on _WIN32 systems.
llvm-svn: 22391
2005-07-12 02:36:10 +00:00
Chris Lattner f873f4d504 Add a note
llvm-svn: 22390
2005-07-12 02:35:36 +00:00
Nate Begeman f26625e1de Implement Subtarget support
Implement the X86 Subtarget.

This consolidates the checks for target triple, and setting options based
on target triple into one place.  This allows us to convert the asm printer
and isel over from being littered with "forDarwin", "forCygwin", etc. into
just having the appropriate flags for each subtarget feature controlling
the code for that feature.

This patch also implements indirect external and weak references in the
X86 pattern isel, for darwin.  Next up is to convert over the asm printers
to use this new interface.

llvm-svn: 22389
2005-07-12 01:41:54 +00:00
Nate Begeman 83b492b83c Commit some pending darwin changes before subtarget support.
llvm-svn: 22388
2005-07-12 01:37:28 +00:00
Chris Lattner fd564c6bc9 fix a warning
llvm-svn: 22385
2005-07-11 22:46:18 +00:00
Andrew Lenharth 23167c3be9 Remove glibc specific functions, and mark a couple as C99
llvm-svn: 22384
2005-07-11 20:35:20 +00:00
Andrew Lenharth c51a74cc02 because on alpha:
#   define errno (*__errno_location ())

*shakes head

llvm-svn: 22383
2005-07-11 17:41:12 +00:00