Duncan Sands
53c954fa86
Output sinl for a long double FSIN node, not sin.
...
Likewise fix up a bunch of other libcalls. While
there I remove NEG_F32 and NEG_F64 since they are
not used anywhere. This fixes 9 Ada ACATS failures.
llvm-svn: 45833
2008-01-10 10:28:30 +00:00
Evan Cheng
0f8c7c4a73
Codegen improvement has reduced one spill.
...
llvm-svn: 45814
2008-01-10 02:54:40 +00:00
Chris Lattner
e34d7d0e24
new testcase for PR1845
...
llvm-svn: 45795
2008-01-10 00:30:38 +00:00
Evan Cheng
0e400d4cb7
Special copy SUnit's do not have SDNode's.
...
llvm-svn: 45787
2008-01-09 23:01:55 +00:00
Evan Cheng
a31824a08e
Fix sse2.psrl.w and sse2.psrl.q definitions.
...
llvm-svn: 45772
2008-01-09 02:16:44 +00:00
Chris Lattner
51b01bf8a5
Make load->store deletion a bit smarter. This allows us to compile this:
...
void test(long long *P) { *P ^= 1; }
into just:
_test:
movl 4(%esp), %eax
xorl $1, (%eax)
ret
instead of code like this:
_test:
movl 4(%esp), %ecx
xorl $1, (%ecx)
movl 4(%ecx), %edx
movl %edx, 4(%ecx)
ret
llvm-svn: 45762
2008-01-08 23:08:06 +00:00
Duncan Sands
7b1460cca4
Crashes llc when using Chris's new legalization logic.
...
llvm-svn: 45758
2008-01-08 21:51:53 +00:00
Chris Lattner
b17db3afa8
remove darwin/i386 t-t
...
llvm-svn: 45743
2008-01-08 06:52:51 +00:00
Chris Lattner
89f36e6b21
Finally implement correct ordered comparisons for PPC, even though
...
the code generated is not wonderful. This turns a miscompilation into
a code quality bug (noted in the ppc readme). This fixes PR642, which
is over 2 years old (!). Nate, please review this.
llvm-svn: 45742
2008-01-08 06:46:30 +00:00
Nate Begeman
d3d49df3f1
Update test to catch recent x86 insert regression and improvements
...
llvm-svn: 45705
2008-01-07 17:49:23 +00:00
Gordon Henriksen
c7e991b7c3
Setting GlobalDirective in TargetAsmInfo by default rather than
...
providing a misleading facility. It's used once in the MIPS backend
and hardcoded as "\t.globl\t" everywhere else.
llvm-svn: 45676
2008-01-07 02:31:11 +00:00
Gordon Henriksen
6047b6e140
With this patch, the LowerGC transformation becomes the
...
ShadowStackCollector, which additionally has reduced overhead with
no sacrifice in portability.
Considering a function @fun with 8 loop-local roots,
ShadowStackCollector introduces the following overhead
(x86):
; shadowstack prologue
movl L_llvm_gc_root_chain$non_lazy_ptr, %eax
movl (%eax), %ecx
movl $___gc_fun, 20(%esp)
movl $0, 24(%esp)
movl $0, 28(%esp)
movl $0, 32(%esp)
movl $0, 36(%esp)
movl $0, 40(%esp)
movl $0, 44(%esp)
movl $0, 48(%esp)
movl $0, 52(%esp)
movl %ecx, 16(%esp)
leal 16(%esp), %ecx
movl %ecx, (%eax)
; shadowstack loop overhead
(none)
; shadowstack epilogue
movl 48(%esp), %edx
movl %edx, (%ecx)
; shadowstack metadata
.align 3
___gc_fun: # __gc_fun
.long 8
.space 4
In comparison to LowerGC:
; lowergc prologue
movl L_llvm_gc_root_chain$non_lazy_ptr, %eax
movl (%eax), %ecx
movl %ecx, 48(%esp)
movl $8, 52(%esp)
movl $0, 60(%esp)
movl $0, 56(%esp)
movl $0, 68(%esp)
movl $0, 64(%esp)
movl $0, 76(%esp)
movl $0, 72(%esp)
movl $0, 84(%esp)
movl $0, 80(%esp)
movl $0, 92(%esp)
movl $0, 88(%esp)
movl $0, 100(%esp)
movl $0, 96(%esp)
movl $0, 108(%esp)
movl $0, 104(%esp)
movl $0, 116(%esp)
movl $0, 112(%esp)
; lowergc loop overhead
leal 44(%esp), %eax
movl %eax, 56(%esp)
leal 40(%esp), %eax
movl %eax, 64(%esp)
leal 36(%esp), %eax
movl %eax, 72(%esp)
leal 32(%esp), %eax
movl %eax, 80(%esp)
leal 28(%esp), %eax
movl %eax, 88(%esp)
leal 24(%esp), %eax
movl %eax, 96(%esp)
leal 20(%esp), %eax
movl %eax, 104(%esp)
leal 16(%esp), %eax
movl %eax, 112(%esp)
; lowergc epilogue
movl 48(%esp), %edx
movl %edx, (%ecx)
; lowergc metadata
(none)
llvm-svn: 45670
2008-01-07 01:30:53 +00:00
Chris Lattner
41e423a6f5
fix this to use a valid triple.
...
llvm-svn: 45509
2008-01-02 22:21:45 +00:00
Chris Lattner
5d998c5712
verify that aligned common support doesn't break.
...
llvm-svn: 45495
2008-01-02 19:48:24 +00:00
Duncan Sands
57a60f0466
Fix PR1833 - eh.exception and eh.selector return two
...
values, which means doing extra legalization work.
It would be easier to get this kind of thing right if
there was some documentation...
llvm-svn: 45472
2007-12-31 18:35:50 +00:00
Chris Lattner
d2b8a36f0e
One readme entry is done, one is really easy (Evan, want to investigate
...
eliminating the llvm.x86.sse2.loadl.pd intrinsic?), one shuffle optzn
may be done (if shufps is better than pinsw, Evan, please review), and
we already know about LICM of simple instructions.
llvm-svn: 45407
2007-12-29 19:31:47 +00:00
Chris Lattner
0d90c8f016
upgrade this test
...
llvm-svn: 45406
2007-12-29 19:24:06 +00:00
Chris Lattner
3b6a82118b
Fold comparisons against a constant nan, and optimize ORD/UNORD
...
comparisons with a constant. This allows us to compile isnan to:
_foo:
fcmpu cr7, f1, f1
mfcr r2
rlwinm r3, r2, 0, 31, 31
blr
instead of:
LCPI1_0: ; float
.space 4
_foo:
lis r2, ha16(LCPI1_0)
lfs f0, lo16(LCPI1_0)(r2)
fcmpu cr7, f1, f0
mfcr r2
rlwinm r3, r2, 0, 31, 31
blr
llvm-svn: 45405
2007-12-29 08:37:08 +00:00
Chris Lattner
33de0c6e92
this xform is implemented.
...
llvm-svn: 45404
2007-12-29 08:19:39 +00:00
Chris Lattner
07ccbfa64a
Codegen:
...
as:
_bar:
pushl %esi
subl $8, %esp
movl 16(%esp), %esi
call L_foo$stub
fstps (%esi)
addl $8, %esp
popl %esi
#FP_REG_KILL
ret
instead of:
_bar:
pushl %esi
subl $8, %esp
movl 16(%esp), %esi
call L_foo$stub
fstpl (%esi)
cvtsd2ss (%esi), %xmm0
movss %xmm0, (%esi)
addl $8, %esp
popl %esi
#FP_REG_KILL
ret
llvm-svn: 45401
2007-12-29 06:57:38 +00:00
Chris Lattner
8013bd339b
avoid going through a stack slot to convert from fpstack to xmm reg
...
if we are just going to store it back anyway. This improves things
like:
double foo();
void bar(double *P) { *P = foo(); }
llvm-svn: 45399
2007-12-29 06:41:28 +00:00
Chris Lattner
bc13df19a8
one fewer uncond branch with my codegenprepare hack for single-mbb backedges.
...
llvm-svn: 45360
2007-12-26 17:23:47 +00:00
Gordon Henriksen
d89e645c38
Tests for changes made in r45356, where IPO optimizations would drop
...
collector algorithms.
llvm-svn: 45357
2007-12-26 02:47:37 +00:00
Gordon Henriksen
b969c5981b
GC poses hazards to the inliner. Consider:
...
define void @f() {
...
call i32 @g()
...
}
define void @g() {
...
}
The hazards are:
- @f and @g have GC, but they differ GC. Inlining is invalid. This
may never occur.
- @f has no GC, but @g does. g's GC must be propagated to @f.
The other scenarios are safe:
- @f and @g have the same GC.
- @f and @g have no GC.
- @g has no GC.
This patch adds inliner checks for the former two scenarios.
llvm-svn: 45351
2007-12-25 03:10:07 +00:00
Gordon Henriksen
fb56bde933
Noting and enforcing that GC intrinsics are valid only within a
...
function with GC.
This will catch the error when the inliner inlines a function with
GC into a caller with no GC.
llvm-svn: 45350
2007-12-25 02:31:26 +00:00
Gordon Henriksen
9157c499fc
Adjusting verification of "llvm.gc*" intrinsic prototypes to match
...
LangRef.
llvm-svn: 45349
2007-12-25 02:02:10 +00:00
Evan Cheng
ddc9af11f0
Remove xfail. This is fixed.
...
llvm-svn: 45254
2007-12-20 02:25:21 +00:00
Scott Michel
5f1470f03a
More working CellSPU tests:
...
- vec_const.ll: Vector constant loads
- immed64.ll: i64, f64 constant loads
llvm-svn: 45242
2007-12-20 00:44:13 +00:00
Scott Michel
5ecac82f71
CellSPU testcase, extract_elt.ll: extract vector element.
...
llvm-svn: 45219
2007-12-19 21:17:42 +00:00
Scott Michel
a246e09aa0
More working CellSPU test cases:
...
- call.ll: Function call
- ctpop.ll: Count population
- dp_farith.ll: DP arithmetic
- eqv.ll: Equivalence primitives
- fcmp.ll: SP comparisons
- fdiv.ll: SP division
- fneg-fabs.ll: SP negation, aboslute value
- int2fp.ll: Integer -> SP conversion
- rotate_ops.ll: Rotation primitives
- select_bits.ll: (a & c) | (b & ~c) bit selection
- shift_ops.ll: Shift primitives
- sp_farith.ll: SP arithmentic
llvm-svn: 45217
2007-12-19 20:50:49 +00:00
Scott Michel
098c113bc8
Two more test cases: or_ops.ll (arithmetic or operations) and vecinsert.ll
...
(vector insertions)
llvm-svn: 45216
2007-12-19 20:15:47 +00:00
Scott Michel
9b834469e0
Add new immed16.ll test case, fix CellSPU errata to make test case work.
...
llvm-svn: 45196
2007-12-19 07:35:06 +00:00
Evan Cheng
483a969ece
Fix PR1872: SrcValue and SrcValueOffset should not be used to compute load / store node id.
...
llvm-svn: 45167
2007-12-18 19:38:14 +00:00
Evan Cheng
91e0fc9cb4
FIX for PR1799: When a load is unfolded from an instruction, check if it is a new node. If not, do not create a new SUnit.
...
llvm-svn: 45157
2007-12-18 08:42:10 +00:00
Scott Michel
8172f85e2f
i32 immediate constant test case for CellSPU
...
llvm-svn: 45134
2007-12-17 23:45:52 +00:00
Scott Michel
c5cccb9e60
- Restore some i8 functionality in CellSPU
...
- New test case: nand.ll
llvm-svn: 45130
2007-12-17 22:32:34 +00:00
Duncan Sands
b5a79d0eaa
Make invokes of inline asm legal. Teach codegen
...
how to lower them (with no attempt made to be
efficient, since they should only occur for
unoptimized code).
llvm-svn: 45108
2007-12-17 18:08:19 +00:00
Evan Cheng
23d2d4dc6c
Make better use of instructions that clear high bits; fix various 2-wide shuffle bugs.
...
llvm-svn: 45058
2007-12-15 03:00:47 +00:00
Scott Michel
0aa7133f82
Start committing working test cases for CellSPU.
...
llvm-svn: 45050
2007-12-15 00:38:50 +00:00
Evan Cheng
0e6408124e
Fix ctlz and cttz. llvm definition requires them to return number of bits in of the src type when value is zero.
...
llvm-svn: 45029
2007-12-14 08:30:15 +00:00
Evan Cheng
e9fbc3f014
Implement ctlz and cttz with bsr and bsf.
...
llvm-svn: 45024
2007-12-14 02:13:44 +00:00
Evan Cheng
37c36ed79a
Be extra careful with extension use optimation. Now turned on by default.
...
llvm-svn: 44981
2007-12-13 03:32:53 +00:00
Evan Cheng
827d30db19
Fold some and + shift in x86 addressing mode.
...
llvm-svn: 44970
2007-12-13 00:43:27 +00:00
Evan Cheng
6e68381e02
Implicit def instructions, e.g. X86::IMPLICIT_DEF_GR32, are always re-materializable and they should not be spilled.
...
llvm-svn: 44960
2007-12-12 23:12:09 +00:00
Dan Gohman
7a7742c2fe
Allow vector integer constants to be created with
...
SelectionDAG::getConstant, in the same way as vector floating-point
constants. This allows the legalize expansion code for @llvm.ctpop and
friends to be usable with vector types.
llvm-svn: 44954
2007-12-12 22:21:26 +00:00
Evan Cheng
0f42730722
Use shuffles to implement insert_vector_elt for i32, i64, f32, and f64.
...
llvm-svn: 44929
2007-12-12 07:55:34 +00:00
Evan Cheng
0a1254f634
Add a test case for -optimize-ext-uses.
...
llvm-svn: 44928
2007-12-12 07:54:08 +00:00
Evan Cheng
2a98956796
Lower a build_vector with all constants into a constpool load unless it can be done with a move to low part.
...
llvm-svn: 44921
2007-12-12 06:45:40 +00:00
Evan Cheng
4fbf459549
- Improved v8i16 shuffle lowering. It now uses pshuflw and pshufhw as much as
...
possible before resorting to pextrw and pinsrw.
- Better codegen for v4i32 shuffles masquerading as v8i16 or v16i8 shuffles.
- Improves (i16 extract_vector_element 0) codegen by recognizing
(i32 extract_vector_element 0) does not require a pextrw.
llvm-svn: 44836
2007-12-11 01:46:18 +00:00
Christopher Lamb
d202e03fe5
Improve branch folding by recgonizing that explict successor relationships impact the value of fall-through choices.
...
llvm-svn: 44785
2007-12-10 07:24:06 +00:00