Chris Lattner
8be5be817c
Implement an annoying part of the Darwin/X86 ABI: the callee of a function
...
with a struct return argument pops the hidden struct pointer if present, not the caller.
For example, in this testcase:
struct X { int D, E, F, G; };

struct X bar() {
  struct X a;
  a.D = 0;
  a.E = 1;
  a.F = 2;
  a.G = 3;
  return a;
}

void foo(struct X *P) {
  *P = bar();
}
We used to emit:
_foo:
        subl $28, %esp
        movl 32(%esp), %eax
        movl %eax, (%esp)
        call _bar
        addl $28, %esp
        ret

_bar:
        movl 4(%esp), %eax
        movl $0, (%eax)
        movl $1, 4(%eax)
        movl $2, 8(%eax)
        movl $3, 12(%eax)
        ret
This is correct on Linux/X86 but not Darwin/X86. With this patch, we now
emit:
_foo:
        subl $28, %esp
        movl 32(%esp), %eax
        movl %eax, (%esp)
        call _bar
***     addl $24, %esp
        ret

_bar:
        movl 4(%esp), %eax
        movl $0, (%eax)
        movl $1, 4(%eax)
        movl $2, 8(%eax)
        movl $3, 12(%eax)
***     ret $4
For the record, GCC emits (which is functionally equivalent to our new code):
_bar:
        movl 4(%esp), %eax
        movl $3, 12(%eax)
        movl $2, 8(%eax)
        movl $1, 4(%eax)
        movl $0, (%eax)
        ret $4

_foo:
        pushl %esi
        subl $40, %esp
        movl 48(%esp), %esi
        leal 16(%esp), %eax
        movl %eax, (%esp)
        call _bar
        subl $4, %esp
        movl 16(%esp), %eax
        movl %eax, (%esi)
        movl 20(%esp), %eax
        movl %eax, 4(%esi)
        movl 24(%esp), %eax
        movl %eax, 8(%esi)
        movl 28(%esp), %eax
        movl %eax, 12(%esi)
        addl $40, %esp
        popl %esi
        ret
This fixes SingleSource/Benchmarks/CoyoteBench/fftbench with LLC and the
JIT, and fixes the X86-backend portion of PR729. The CBE still needs to
be updated.
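For reference, a minimal C sketch of the hidden struct-return pointer convention (bar_lowered and the explicit out-parameter are illustrative only, not actual compiler output):

  struct X { int D, E, F, G; };

  /* Conceptually, 'struct X bar(void)' is lowered as if it took an
     explicit out-pointer; on Darwin/X86 the callee also pops that
     pointer from the stack (the 'ret $4' above). */
  static void bar_lowered(struct X *sret) {
    sret->D = 0; sret->E = 1; sret->F = 2; sret->G = 3;
  }

  void foo(struct X *P) {
    struct X tmp;
    bar_lowered(&tmp);  /* hidden pointer passed in the first stack slot */
    *P = tmp;
  }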
llvm-svn: 28438
2006-05-23 18:50:38 +00:00
Chris Lattner
01dd6df5f3
CSRet allows varargs
...
llvm-svn: 28409
2006-05-19 21:34:04 +00:00
Evan Cheng
8c6b234ce8
Should pass by reference.
...
llvm-svn: 28357
2006-05-17 19:07:40 +00:00
Chris Lattner
c7df70db57
Implement the custom lowering hook right, returning values for all of the
...
arguments at once.
llvm-svn: 28327
2006-05-16 17:14:26 +00:00
Chris Lattner
7b8b8bbbf9
Fix a bug I introduced yesterday, which broke functions with *no* arguments.
...
llvm-svn: 28326
2006-05-16 17:08:35 +00:00
Evan Cheng
9fee442e63
Rename the X86 integer register classes to be consistent with the FP and vector classes.
...
llvm-svn: 28324
2006-05-16 07:21:53 +00:00
Chris Lattner
3d82699605
Add a chain to FORMAL_ARGUMENTS. This is a minimal port of the X86 backend;
...
it doesn't currently use/maintain the chain properly. Also, make the
X86ISelLowering.cpp file 80-column clean.
llvm-svn: 28320
2006-05-16 06:45:34 +00:00
Chris Lattner
22f95b74ba
Dead variable
...
llvm-svn: 28265
2006-05-12 21:12:22 +00:00
Chris Lattner
6d4a2dc4ad
Teach the X86 backend about non-i32 inline asm register classes.
...
llvm-svn: 28139
2006-05-06 00:29:37 +00:00
Chris Lattner
44a73e9fa5
Teach the code generator to use cvtss2sd as extload f32 -> f64
...
llvm-svn: 28131
2006-05-05 21:35:18 +00:00
Owen Anderson
20a631fde7
Refactor TargetMachine, pushing handling of TargetData into the target-specific subclasses. This has one caller-visible change: getTargetData() now returns a pointer instead of a reference.
...
This fixes PR 759.
llvm-svn: 28074
2006-05-03 01:29:57 +00:00
Evan Cheng
88decded82
Initial caller-side support (for CCC only, not FastCC) for passing 128-bit
...
vectors by value.
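A minimal sketch of what now works on the caller side (function names are illustrative; CCC is the default C calling convention):

  #include <emmintrin.h>

  __m128i consume(__m128i v);  /* extern callee */

  /* passes a 128-bit vector argument to consume() by value */
  __m128i produce(__m128i a, __m128i b) {
    return consume(_mm_add_epi32(a, b));
  }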
llvm-svn: 28015
2006-04-28 21:29:37 +00:00
Evan Cheng
3cd4362ade
Implement a four-wide shuffle with two shufps instructions if no more than two elements come
...
from each vector, e.g.
shuffle(G1, G2, 7, 1, 5, 2)
==>
        movaps _G2, %xmm0
        shufps $151, _G1, %xmm0
        shufps $216, %xmm0, %xmm0
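The same sequence in intrinsic form, as a sketch (the function name and the _MM_SHUFFLE spellings of the $151/$216 immediates are mine):

  #include <xmmintrin.h>

  __m128 shuffle_7152(__m128 G1, __m128 G2) {
    /* shufps $151: t = { G2[3], G2[1], G1[1], G1[2] } */
    __m128 t = _mm_shuffle_ps(G2, G1, _MM_SHUFFLE(2, 1, 1, 3));
    /* shufps $216: reorder t to { G2[3], G1[1], G2[1], G1[2] } */
    return _mm_shuffle_ps(t, t, _MM_SHUFFLE(3, 1, 2, 0));
  }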
llvm-svn: 28011
2006-04-28 07:03:38 +00:00
Evan Cheng
d43c5c6046
TargetLowering::LowerArguments should return a VBIT_CONVERT of the
...
FORMAL_ARGUMENTS SDOperand in the return result vector.
llvm-svn: 28009
2006-04-28 05:25:15 +00:00
Evan Cheng
f4f3f0d25f
Make x86 isel lowering produce tailcall nodes. They are matched to normal calls
...
for now.
Patch contributed by Alexander Friedman.
llvm-svn: 27994
2006-04-27 08:40:39 +00:00
Evan Cheng
89001ad729
Support for passing 128-bit vector arguments via XMM registers.
...
llvm-svn: 27992
2006-04-27 08:31:10 +00:00
Evan Cheng
a0374e1bed
Oops
...
llvm-svn: 27989
2006-04-27 05:44:50 +00:00
Evan Cheng
24eb3f4765
Bug fix: not updating NumIntRegs.
...
llvm-svn: 27988
2006-04-27 05:35:28 +00:00
Evan Cheng
48940d16b2
- Clean up formal argument lowering code. Prepare for vector pass-by-value work.
...
- Fixed vararg support.
llvm-svn: 27985
2006-04-27 01:32:22 +00:00
Evan Cheng
1c39903297
Fix fastcc failures.
...
llvm-svn: 27980
2006-04-26 18:21:31 +00:00
Evan Cheng
e0bcfbe811
Switch over to the FORMAL_ARGUMENTS mechanism to lower call arguments.
...
llvm-svn: 27975
2006-04-26 01:20:17 +00:00
Evan Cheng
a9467aab0a
Separate LowerOperation() into multiple functions, one per opcode.
...
llvm-svn: 27972
2006-04-25 20:13:52 +00:00
Evan Cheng
5c2bfb069e
Special-case handling of two-wide build_vector(0, x).
...
llvm-svn: 27961
2006-04-24 22:58:52 +00:00
Evan Cheng
b0461080e4
A little bit more build_vector enhancement for v8i16 cases.
...
llvm-svn: 27959
2006-04-24 18:01:45 +00:00
Evan Cheng
b4f31dd1a8
MOVL shuffle (i.e. movd or movss / movsd from memory) of undef, V2 == V2
...
llvm-svn: 27953
2006-04-23 06:35:19 +00:00
Nate Begeman
4ca2ea5b43
JumpTable support! What this represents is working asm and JIT support for
...
x86 and ppc for 100% dense switch statements when relocations are non-PIC.
This support will be extended and enhanced in the coming days to support
PIC and less dense forms of jump tables.
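An illustrative example of the kind of switch this targets; with dense cases and non-PIC relocations it now compiles to an indexed jump through a table:

  /* 100% dense: cases 0..3 with no holes */
  int classify(int x) {
    switch (x) {
    case 0: return 10;
    case 1: return 20;
    case 2: return 30;
    case 3: return 40;
    default: return -1;
    }
  }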
llvm-svn: 27947
2006-04-22 18:53:45 +00:00
Evan Cheng
e728efdfce
Don't do all the lowering stuff for 2-wide build_vectors. Also, a minor optimization for shuffles of undef.
...
llvm-svn: 27946
2006-04-22 08:34:05 +00:00
Evan Cheng
16ef94f4e8
Fix a performance regression. Use {p}shuf* when there are only two distinct elements in a build_vector.
...
llvm-svn: 27945
2006-04-22 06:21:46 +00:00
Evan Cheng
14215c36b6
Revamp build_vector lowering to take advantage of movss and movd instructions.
...
movd always clears the top 96 bits, and movss does so when it loads the
value from memory.
The net result is that codegen for 4-wide shuffles is much improved. It is
near optimal if one or more elements is zero, e.g.
__m128i test(int a, int b) {
  return _mm_set_epi32(0, 0, b, a);
}
compiles to
_test:
        movd 8(%esp), %xmm1
        movd 4(%esp), %xmm0
        punpckldq %xmm1, %xmm0
        ret
compare to gcc:
_test:
        subl $12, %esp
        movd 20(%esp), %xmm0
        movd 16(%esp), %xmm1
        punpckldq %xmm0, %xmm1
        movq %xmm1, %xmm0
        movhps LC0, %xmm0
        addl $12, %esp
        ret
or icc:
_test:
        movd 4(%esp), %xmm0        #5.10
        movd 8(%esp), %xmm3        #5.10
        xorl %eax, %eax            #5.10
        movd %eax, %xmm1           #5.10
        punpckldq %xmm1, %xmm0     #5.10
        movd %eax, %xmm2           #5.10
        punpckldq %xmm2, %xmm3     #5.10
        punpckldq %xmm3, %xmm0     #5.10
        ret                        #5.10
There is still room for improvement; for example, the FP variant of the above:
__m128 test(float a, float b) {
  return _mm_set_ps(0.0, 0.0, b, a);
}
_test:
        movss 8(%esp), %xmm1
        movss 4(%esp), %xmm0
        unpcklps %xmm1, %xmm0
        xorps %xmm1, %xmm1
        movlhps %xmm1, %xmm0
        ret
The xorps and movlhps are unnecessary; handling this will require a post-legalizer optimization.
llvm-svn: 27939
2006-04-21 23:03:30 +00:00
Evan Cheng
e8b5180044
Now generating perfect (I think) code for "vector set" with a single non-zero
...
scalar value.
e.g.
_mm_set_epi32(0, a, 0, 0);
==>
        movd 4(%esp), %xmm0
        pshufd $69, %xmm0, %xmm0

_mm_set_epi8(0, 0, 0, 0, 0, a, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0);
==>
        movzbw 4(%esp), %ax
        movzwl %ax, %eax
        pxor %xmm0, %xmm0
        pinsrw $5, %eax, %xmm0
llvm-svn: 27923
2006-04-21 01:05:10 +00:00
Evan Cheng
60f0b8998e
- Added support to turn "vector clear elements", e.g. pand V, <-1, -1, 0, -1>
...
into a vector shuffle.
- VECTOR_SHUFFLE lowering change in preparation for more efficient codegen
of vector shuffle with zero (or any splat) vector.
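A sketch of the first point in intrinsic form (the function name and lane numbering are mine): ANDing with <-1, -1, 0, -1> keeps lanes 0, 1, 3 and zeroes lane 2, which is equivalent to shuffle(V, zero, <0, 1, 6, 3>) pulling lane 2 from an all-zero vector:

  #include <emmintrin.h>

  __m128i clear_lane2(__m128i v) {
    /* the AND form this commit starts from; _mm_set_epi32 takes
       arguments high lane first, so lane 2 gets the zero mask */
    return _mm_and_si128(v, _mm_set_epi32(-1, 0, -1, -1));
  }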
llvm-svn: 27875
2006-04-20 08:58:49 +00:00
Evan Cheng
15c264b753
Handle v2i64 BUILD_VECTOR custom lowering correctly. v2i64 is a legal type,
...
but i64 is not. If possible, change an i64 op to an f64 (e.g. load, constant)
and then cast it back.
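The cast is a pure bit reinterpretation; a C sketch of the idea (as_f64 is illustrative):

  #include <string.h>

  /* bitcast i64 -> f64: the same 64 bits, now movable with legal
     f64 loads/stores on 32-bit x86 */
  double as_f64(long long x) {
    double d;
    memcpy(&d, &x, sizeof d);
    return d;
  }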
llvm-svn: 27849
2006-04-20 00:11:39 +00:00
Evan Cheng
4a1b0d3292
isSplatMask() bug: the first element can be undef.
...
llvm-svn: 27847
2006-04-19 23:28:59 +00:00
Evan Cheng
a3caaee503
- Added support to do an arbitrary 4-wide shuffle with no more than three
...
instructions.
- Fixed a vector_shuffle commute bug.
llvm-svn: 27845
2006-04-19 22:48:17 +00:00
Evan Cheng
7855e4d032
Commute vector_shuffle to match more movlhps, movlp{s|d} cases.
...
llvm-svn: 27840
2006-04-19 20:35:22 +00:00
Evan Cheng
5421206c4b
Use movss for insert_vector_elt(v, s, 0).
...
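In intrinsic form, a sketch of the pattern (assuming s is already in an XMM register):

  #include <xmmintrin.h>

  /* insert_vector_elt(v, s, 0): lane 0 from s, lanes 1-3 from v */
  __m128 insert0(__m128 v, __m128 s) {
    return _mm_move_ss(v, s);
  }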
llvm-svn: 27782
2006-04-17 22:45:49 +00:00
Evan Cheng
6e5e205841
Use two pinsrw to insert an element into a v4i32 / v4f32 vector.
...
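A sketch of the two-pinsrw sequence for the integer case (lane 2 is chosen for illustration; dword lane 2 occupies word lanes 4 and 5):

  #include <emmintrin.h>

  __m128i insert_dword2(__m128i v, int s) {
    v = _mm_insert_epi16(v, s & 0xffff, 4);          /* low 16 bits  */
    v = _mm_insert_epi16(v, (s >> 16) & 0xffff, 5);  /* high 16 bits */
    return v;
  }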
llvm-svn: 27779
2006-04-17 22:04:06 +00:00
Evan Cheng
5022b3426e
Implement v8i16, v16i8 splat using unpckl + pshufd.
...
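A sketch of the sequence in intrinsic form for the v16i8 case, assuming the byte to splat sits in lane 0:

  #include <emmintrin.h>

  __m128i splat_byte(__m128i v /* byte in lane 0 */) {
    v = _mm_unpacklo_epi8(v, v);     /* lanes 0-1 now hold the byte */
    v = _mm_unpacklo_epi16(v, v);    /* dword 0 is four copies      */
    return _mm_shuffle_epi32(v, 0);  /* pshufd: broadcast dword 0   */
  }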
llvm-svn: 27768
2006-04-17 20:43:08 +00:00
Chris Lattner
c070c621ac
implement returns of a vector; testcase here: CodeGen/X86/vec_return.ll
...
llvm-svn: 27767
2006-04-17 20:32:50 +00:00
Evan Cheng
b3b41c4f3d
FP SETOLT, SETOLE, SETUGE, SETUGT conditions were implemented incorrectly
...
llvm-svn: 27755
2006-04-17 07:24:10 +00:00
Evan Cheng
6222cf2a36
Silly bug
...
llvm-svn: 27719
2006-04-15 05:37:34 +00:00
Evan Cheng
65bb720a8b
Do not use movs{h|l}dup for a shuffle with a single non-undef node.
...
llvm-svn: 27718
2006-04-15 03:13:24 +00:00
Evan Cheng
5d247f81c1
Last few SSE3 intrinsics.
...
llvm-svn: 27711
2006-04-14 21:59:03 +00:00
Evan Cheng
e4f97ccf7f
X86 SSE2 supports v8i16 multiplication
...
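That is, a v8i16 multiply can be selected as a single pmullw; a minimal sketch in intrinsic form:

  #include <emmintrin.h>

  __m128i mul_v8i16(__m128i a, __m128i b) {
    return _mm_mullo_epi16(a, b);
  }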
llvm-svn: 27644
2006-04-13 05:10:25 +00:00
Evan Cheng
92232307d0
All "integer" logical ops (pand, por, pxor) are now promoted to v2i64.
...
Clean up and fix various logical ops issues.
llvm-svn: 27633
2006-04-12 21:21:57 +00:00
Evan Cheng
e2157c6e41
Promote v4i32, v8i16, v16i8 load to v2i64 load.
...
llvm-svn: 27612
2006-04-12 17:12:36 +00:00
Evan Cheng
12ba3e23d0
Added support for _mm_move_ss and _mm_move_sd.
...
llvm-svn: 27575
2006-04-11 00:19:04 +00:00
Evan Cheng
617a6a812e
Conditional move of vector types.
...
llvm-svn: 27556
2006-04-10 07:23:14 +00:00
Evan Cheng
ac847268c5
Code clean up.
...
llvm-svn: 27501
2006-04-07 21:53:05 +00:00
Evan Cheng
c995b45f67
- movlp{s|d} and movhp{s|d} support.
...
- Normalize shuffle nodes so that the result vector's lower-half elements come
from the first vector and the rest come from the second vector (except for
the exceptions :-).
- Other minor fixes.
llvm-svn: 27474
2006-04-06 23:23:56 +00:00