Commit Graph

602 Commits

Author SHA1 Message Date
Duncan Sands 53c954fa86 Output sinl for a long double FSIN node, not sin.
Likewise fix up a bunch of other libcalls.  While
there I remove NEG_F32 and NEG_F64 since they are
not used anywhere.  This fixes 9 Ada ACATS failures.

llvm-svn: 45833
2008-01-10 10:28:30 +00:00
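
A minimal C illustration of the libcall distinction (function name hypothetical): a long double sine must lower to sinl; calling sin would compute at only double precision.

#include <math.h>

/* FSIN on a long double must become a call to sinl, not sin. */
long double f(long double x) {
	return sinl(x);
}
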
Evan Cheng 0f8c7c4a73 Codegen improvement has reduced one spill.
llvm-svn: 45814
2008-01-10 02:54:40 +00:00
Chris Lattner e34d7d0e24 new testcase for PR1845
llvm-svn: 45795
2008-01-10 00:30:38 +00:00
Evan Cheng 0e400d4cb7 Special copy SUnits do not have SDNodes.
llvm-svn: 45787
2008-01-09 23:01:55 +00:00
Evan Cheng a31824a08e Fix sse2.psrl.w and sse2.psrl.q definitions.
llvm-svn: 45772
2008-01-09 02:16:44 +00:00
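
For reference, a hedged C sketch of what those intrinsics do, via the SSE2 builtins that correspond to psrlw and psrlq (the mapping is the usual one; treat it as illustrative):

#include <emmintrin.h>

/* Shift each 16-bit lane of v right by the count in c (psrlw). */
__m128i shr16(__m128i v, __m128i c) { return _mm_srl_epi16(v, c); }

/* Shift each 64-bit lane of v right by the count in c (psrlq). */
__m128i shr64(__m128i v, __m128i c) { return _mm_srl_epi64(v, c); }
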
Chris Lattner 51b01bf8a5 Make load->store deletion a bit smarter. This allows us to compile this:
void test(long long *P) { *P ^= 1; }

into just:

_test:
	movl	4(%esp), %eax
	xorl	$1, (%eax)
	ret

instead of code like this:

_test:
	movl	4(%esp), %ecx
	xorl	$1, (%ecx)
	movl	4(%ecx), %edx
	movl	%edx, 4(%ecx)
	ret

llvm-svn: 45762
2008-01-08 23:08:06 +00:00
Duncan Sands 7b1460cca4 Crashes llc when using Chris's new legalization logic.
llvm-svn: 45758
2008-01-08 21:51:53 +00:00
Chris Lattner b17db3afa8 remove darwin/i386 t-t
llvm-svn: 45743
2008-01-08 06:52:51 +00:00
Chris Lattner 89f36e6b21 Finally implement correct ordered comparisons for PPC, even though
the code generated is not wonderful.  This turns a miscompilation into
a code quality bug (noted in the ppc readme).  This fixes PR642, which
is over 2 years old (!).  Nate, please review this.

llvm-svn: 45742
2008-01-08 06:46:30 +00:00
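
The semantic point, sketched in C (example mine, not from the commit): an ordered comparison must be false whenever either operand is NaN, which is what C relational operators require.

/* a < b is an ordered comparison: false if a or b is NaN. An
   unordered less-than would instead be true in that case. */
int lt(double a, double b) {
	return a < b;
}
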
Nate Begeman d3d49df3f1 Update test to catch recent x86 insert regression and improvements
llvm-svn: 45705
2008-01-07 17:49:23 +00:00
Gordon Henriksen c7e991b7c3 Setting GlobalDirective in TargetAsmInfo by default rather than
providing a misleading facility. It's used once in the MIPS backend
and hardcoded as "\t.globl\t" everywhere else.

llvm-svn: 45676
2008-01-07 02:31:11 +00:00
Gordon Henriksen 6047b6e140 With this patch, the LowerGC transformation becomes the
ShadowStackCollector, which additionally has reduced overhead with
no sacrifice in portability.

Considering a function @fun with 8 loop-local roots,
ShadowStackCollector introduces the following overhead
(x86):

; shadowstack prologue
        movl    L_llvm_gc_root_chain$non_lazy_ptr, %eax
        movl    (%eax), %ecx
        movl    $___gc_fun, 20(%esp)
        movl    $0, 24(%esp)
        movl    $0, 28(%esp)
        movl    $0, 32(%esp)
        movl    $0, 36(%esp)
        movl    $0, 40(%esp)
        movl    $0, 44(%esp)
        movl    $0, 48(%esp)
        movl    $0, 52(%esp)
        movl    %ecx, 16(%esp)
        leal    16(%esp), %ecx
        movl    %ecx, (%eax)

; shadowstack loop overhead
        (none)

; shadowstack epilogue
        movl    48(%esp), %edx
        movl    %edx, (%ecx)

; shadowstack metadata
        .align  3
___gc_fun:                              # __gc_fun
        .long   8
        .space  4

In comparison to LowerGC:

; lowergc prologue
        movl    L_llvm_gc_root_chain$non_lazy_ptr, %eax
        movl    (%eax), %ecx
        movl    %ecx, 48(%esp)
        movl    $8, 52(%esp)
        movl    $0, 60(%esp)
        movl    $0, 56(%esp)
        movl    $0, 68(%esp)
        movl    $0, 64(%esp)
        movl    $0, 76(%esp)
        movl    $0, 72(%esp)
        movl    $0, 84(%esp)
        movl    $0, 80(%esp)
        movl    $0, 92(%esp)
        movl    $0, 88(%esp)
        movl    $0, 100(%esp)
        movl    $0, 96(%esp)
        movl    $0, 108(%esp)
        movl    $0, 104(%esp)
        movl    $0, 116(%esp)
        movl    $0, 112(%esp)

; lowergc loop overhead
        leal    44(%esp), %eax
        movl    %eax, 56(%esp)
        leal    40(%esp), %eax
        movl    %eax, 64(%esp)
        leal    36(%esp), %eax
        movl    %eax, 72(%esp)
        leal    32(%esp), %eax
        movl    %eax, 80(%esp)
        leal    28(%esp), %eax
        movl    %eax, 88(%esp)
        leal    24(%esp), %eax
        movl    %eax, 96(%esp)
        leal    20(%esp), %eax
        movl    %eax, 104(%esp)
        leal    16(%esp), %eax
        movl    %eax, 112(%esp)

; lowergc epilogue
        movl    48(%esp), %edx
        movl    %edx, (%ecx)

; lowergc metadata
        (none)

llvm-svn: 45670
2008-01-07 01:30:53 +00:00
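
For context, a C sketch of the runtime structures the shadow stack prologue and epilogue above manipulate (names follow the conventional shadow-stack scheme; details are illustrative, not from the commit):

/* Per-function metadata, like ___gc_fun above (.long 8 = 8 roots). */
struct FrameMap {
	int NumRoots;	/* roots in this function's frame */
	int NumMeta;	/* extra metadata entries; zero here (.space 4) */
};

/* One chain link; the prologue builds this in the frame and threads
   it onto the global chain, and the epilogue unlinks it. */
struct StackEntry {
	struct StackEntry *Next;	/* caller's entry */
	const struct FrameMap *Map;	/* this function's metadata */
	void *Roots[];			/* the roots, zeroed on entry */
};

extern struct StackEntry *llvm_gc_root_chain;

/* A collector can reach every live root by walking the chain. */
static void visit_roots(void (*visit)(void **slot)) {
	for (struct StackEntry *E = llvm_gc_root_chain; E; E = E->Next)
		for (int i = 0; i < E->Map->NumRoots; ++i)
			visit(&E->Roots[i]);
}
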
Chris Lattner 41e423a6f5 fix this to use a valid triple.
llvm-svn: 45509
2008-01-02 22:21:45 +00:00
Chris Lattner 5d998c5712 verify that aligned common support doesn't break.
llvm-svn: 45495
2008-01-02 19:48:24 +00:00
Duncan Sands 57a60f0466 Fix PR1833 - eh.exception and eh.selector return two
values, which means doing extra legalization work.
It would be easier to get this kind of thing right if
there was some documentation...

llvm-svn: 45472
2007-12-31 18:35:50 +00:00
Chris Lattner d2b8a36f0e One readme entry is done, one is really easy (Evan, want to investigate
eliminating the llvm.x86.sse2.loadl.pd intrinsic?), one shuffle optzn
may be done (if shufps is better than pinsw, Evan, please review), and
we already know about LICM of simple instructions.

llvm-svn: 45407
2007-12-29 19:31:47 +00:00
Chris Lattner 0d90c8f016 upgrade this test
llvm-svn: 45406
2007-12-29 19:24:06 +00:00
Chris Lattner 3b6a82118b Fold comparisons against a constant nan, and optimize ORD/UNORD
comparisons with a constant.  This allows us to compile isnan to:

_foo:
	fcmpu cr7, f1, f1
	mfcr r2
	rlwinm r3, r2, 0, 31, 31
	blr 

instead of:

LCPI1_0:					;  float
	.space	4
_foo:
	lis r2, ha16(LCPI1_0)
	lfs f0, lo16(LCPI1_0)(r2)
	fcmpu cr7, f1, f0
	mfcr r2
	rlwinm r3, r2, 0, 31, 31
	blr 

llvm-svn: 45405
2007-12-29 08:37:08 +00:00
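
A plausible source for _foo above (a reconstruction; the commit does not show it): isnan expressed as an unordered comparison against a constant, which now folds to comparing the value with itself, since only NaN is unordered with anything.

/* x is unordered with 0.0f exactly when x is NaN, so the compare
   against the constant-pool load folds to fcmpu of f1 with itself. */
int foo(float x) {
	return __builtin_isunordered(x, 0.0f);
}
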
Chris Lattner 33de0c6e92 this xform is implemented.
llvm-svn: 45404
2007-12-29 08:19:39 +00:00
Chris Lattner 07ccbfa64a Codegen this as:

_bar:
	pushl	%esi
	subl	$8, %esp
	movl	16(%esp), %esi
	call	L_foo$stub
	fstps	(%esi)
	addl	$8, %esp
	popl	%esi
	#FP_REG_KILL
	ret

instead of:

_bar:
	pushl	%esi
	subl	$8, %esp
	movl	16(%esp), %esi
	call	L_foo$stub
	fstpl	(%esi)
	cvtsd2ss	(%esi), %xmm0
	movss	%xmm0, (%esi)
	addl	$8, %esp
	popl	%esi
	#FP_REG_KILL
	ret

llvm-svn: 45401
2007-12-29 06:57:38 +00:00
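
A likely shape for the source of _bar above (reconstructed from the assembly, so treat it as an assumption): a double-returning call stored through a float pointer, so the truncating store can come straight off the x87 stack as fstps.

double foo(void);

/* The double result is rounded and stored in one fstps, instead of a
   double store, a cvtsd2ss reload, and a movss. */
void bar(float *P) {
	*P = foo();
}
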
Chris Lattner 8013bd339b avoid going through a stack slot to convert from fpstack to xmm reg
if we are just going to store it back anyway.  This improves things 
like:
double foo();
void bar(double *P) { *P = foo(); }

llvm-svn: 45399
2007-12-29 06:41:28 +00:00
Chris Lattner bc13df19a8 one fewer uncond branch with my codegenprepare hack for single-mbb backedges.
llvm-svn: 45360
2007-12-26 17:23:47 +00:00
Gordon Henriksen d89e645c38 Tests for changes made in r45356, where IPO optimizations would drop
collector algorithms.

llvm-svn: 45357
2007-12-26 02:47:37 +00:00
Gordon Henriksen b969c5981b GC poses hazards to the inliner. Consider:
    define void @f() {
            ...
            call i32 @g()
            ...
    }

    define void @g() {
            ...
    }

The hazards are:

  - @f and @g both have GC, but their GCs differ. Inlining is invalid. This
    may never occur.
  - @f has no GC, but @g does. @g's GC must be propagated to @f.

The other scenarios are safe:

  - @f and @g have the same GC.
  - @f and @g have no GC.
  - @g has no GC.

This patch adds inliner checks for the former two scenarios.

llvm-svn: 45351
2007-12-25 03:10:07 +00:00
Gordon Henriksen fb56bde933 Noting and enforcing that GC intrinsics are valid only within a
function with GC.

This will catch the error when the inliner inlines a function with
GC into a caller with no GC.

llvm-svn: 45350
2007-12-25 02:31:26 +00:00
Gordon Henriksen 9157c499fc Adjusting verification of "llvm.gc*" intrinsic prototypes to match
LangRef.

llvm-svn: 45349
2007-12-25 02:02:10 +00:00
Evan Cheng ddc9af11f0 Remove xfail. This is fixed.
llvm-svn: 45254
2007-12-20 02:25:21 +00:00
Scott Michel 5f1470f03a More working CellSPU tests:
- vec_const.ll: Vector constant loads
- immed64.ll: i64, f64 constant loads

llvm-svn: 45242
2007-12-20 00:44:13 +00:00
Scott Michel 5ecac82f71 CellSPU testcase, extract_elt.ll: extract vector element.
llvm-svn: 45219
2007-12-19 21:17:42 +00:00
Scott Michel a246e09aa0 More working CellSPU test cases:
- call.ll: Function call
- ctpop.ll: Count population
- dp_farith.ll: DP arithmetic
- eqv.ll: Equivalence primitives
- fcmp.ll: SP comparisons
- fdiv.ll: SP division
- fneg-fabs.ll: SP negation, absolute value
- int2fp.ll: Integer -> SP conversion
- rotate_ops.ll: Rotation primitives
- select_bits.ll: (a & c) | (b & ~c) bit selection
- shift_ops.ll: Shift primitives
- sp_farith.ll: SP arithmetic

llvm-svn: 45217
2007-12-19 20:50:49 +00:00
Scott Michel 098c113bc8 Two more test cases: or_ops.ll (arithmetic OR operations) and vecinsert.ll
(vector insertions)

llvm-svn: 45216
2007-12-19 20:15:47 +00:00
Scott Michel 9b834469e0 Add new immed16.ll test case, fix CellSPU errata to make test case work.
llvm-svn: 45196
2007-12-19 07:35:06 +00:00
Evan Cheng 483a969ece Fix PR1872: SrcValue and SrcValueOffset should not be used to compute load / store node id.
llvm-svn: 45167
2007-12-18 19:38:14 +00:00
Evan Cheng 91e0fc9cb4 FIX for PR1799: When a load is unfolded from an instruction, check if it is a new node. If not, do not create a new SUnit.
llvm-svn: 45157
2007-12-18 08:42:10 +00:00
Scott Michel 8172f85e2f i32 immediate constant test case for CellSPU
llvm-svn: 45134
2007-12-17 23:45:52 +00:00
Scott Michel c5cccb9e60 - Restore some i8 functionality in CellSPU
- New test case: nand.ll

llvm-svn: 45130
2007-12-17 22:32:34 +00:00
Duncan Sands b5a79d0eaa Make invokes of inline asm legal. Teach codegen
how to lower them (with no attempt made to be
efficient, since they should only occur for
unoptimized code).

llvm-svn: 45108
2007-12-17 18:08:19 +00:00
Evan Cheng 23d2d4dc6c Make better use of instructions that clear high bits; fix various 2-wide shuffle bugs.
llvm-svn: 45058
2007-12-15 03:00:47 +00:00
Scott Michel 0aa7133f82 Start committing working test cases for CellSPU.
llvm-svn: 45050
2007-12-15 00:38:50 +00:00
Evan Cheng 0e6408124e Fix ctlz and cttz. The LLVM definition requires them to return the number of bits of the src type when the value is zero.
llvm-svn: 45029
2007-12-14 08:30:15 +00:00
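
The zero-input semantics in a hedged C sketch (guards mine): the intrinsics must yield the bit width for zero, while the bsr/bsf lowering in the next entry leaves the result undefined on zero, hence the explicit handling.

/* llvm.ctlz.i32(0) and llvm.cttz.i32(0) must return 32. The GCC
   builtins (and bsr/bsf) are undefined for a zero input. */
unsigned ctlz32(unsigned x) { return x ? __builtin_clz(x) : 32u; }
unsigned cttz32(unsigned x) { return x ? __builtin_ctz(x) : 32u; }
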
Evan Cheng e9fbc3f014 Implement ctlz and cttz with bsr and bsf.
llvm-svn: 45024
2007-12-14 02:13:44 +00:00
Evan Cheng 37c36ed79a Be extra careful with extension use optimization. Now turned on by default.
llvm-svn: 44981
2007-12-13 03:32:53 +00:00
Evan Cheng 827d30db19 Fold some and + shift in x86 addressing mode.
llvm-svn: 44970
2007-12-13 00:43:27 +00:00
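
A guess at the kind of pattern this fold targets (illustrative only, not taken from the commit): a masked, shifted index that can be folded into x86's scaled addressing instead of being computed separately.

/* The (x >> 4) & 0xff index can feed the address computation directly
   once the and and shift are folded into the addressing mode. */
int load_entry(const int *table, unsigned x) {
	return table[(x >> 4) & 0xff];
}
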
Evan Cheng 6e68381e02 Implicit def instructions, e.g. X86::IMPLICIT_DEF_GR32, are always re-materializable and they should not be spilled.
llvm-svn: 44960
2007-12-12 23:12:09 +00:00
Dan Gohman 7a7742c2fe Allow vector integer constants to be created with
SelectionDAG::getConstant, in the same way as vector floating-point
constants. This allows the legalize expansion code for @llvm.ctpop and
friends to be usable with vector types.

llvm-svn: 44954
2007-12-12 22:21:26 +00:00
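
What ctpop means element-wise, in a small C model (a scalar loop standing in for the <4 x i32> expansion):

/* @llvm.ctpop on a vector is a per-lane population count; with vector
   integer constants available, legalize can expand it for vector types. */
void ctpop_v4i32(unsigned out[4], const unsigned in[4]) {
	for (int i = 0; i < 4; ++i)
		out[i] = __builtin_popcount(in[i]);
}
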
Evan Cheng 0f42730722 Use shuffles to implement insert_vector_elt for i32, i64, f32, and f64.
llvm-svn: 44929
2007-12-12 07:55:34 +00:00
Evan Cheng 0a1254f634 Add a test case for -optimize-ext-uses.
llvm-svn: 44928
2007-12-12 07:54:08 +00:00
Evan Cheng 2a98956796 Lower a build_vector with all constants into a constpool load unless it can be done with a move to low part.
llvm-svn: 44921
2007-12-12 06:45:40 +00:00
Evan Cheng 4fbf459549 - Improved v8i16 shuffle lowering. It now uses pshuflw and pshufhw as much as
possible before resorting to pextrw and pinsrw.
- Better codegen for v4i32 shuffles masquerading as v8i16 or v16i8 shuffles.
- Improves (i16 extract_vector_element 0) codegen by recognizing
  (i32 extract_vector_element 0) does not require a pextrw.

llvm-svn: 44836
2007-12-11 01:46:18 +00:00
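
The lane-0 observation in C (intrinsic choice illustrative): element 0 can be moved out with a plain movd-style transfer, so the i16 extract can reuse the i32 path and skip pextrw.

#include <emmintrin.h>

/* Lane 0 comes out via movd; truncating the i32 to 16 bits afterwards
   is free, so no pextrw is needed for element 0. */
short lane0_i16(__m128i v) {
	return (short)_mm_cvtsi128_si32(v);
}
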
Christopher Lamb d202e03fe5 Improve branch folding by recognizing that explicit successor relationships impact the value of fall-through choices.
llvm-svn: 44785
2007-12-10 07:24:06 +00:00