Commit Graph

45 Commits

Author SHA1 Message Date
Chris Lattner 44c2b90556 Fix x86-64 byval passing to specify the alignment even when the code
generator will give it something sufficient.  This is important because
the mid-level optimizer doesn't know what alignment is required otherwise.

llvm-svn: 131879
2011-05-22 23:21:23 +00:00
John McCall e0fda7377e The 0.98 revision of the x86-64 ABI clarified a lot of things, some
of which break strict compatibility with previous compilers.  Implement
one of them and then immediately opt out on Darwin.

llvm-svn: 129899
2011-04-21 01:20:55 +00:00
Chris Lattner 69e683fb35 vector of long and ulong are also classified as INTEGER in x86-64 abi,
this fixes rdar://8358475 a failure of the gcc.dg/compat/vector_1 abi
test.

llvm-svn: 112205
2010-08-26 18:13:50 +00:00
Chris Lattner 46830f2fd6 1 x ulonglong needs to be classified as INTEGER, just like 1 x longlong,
this fixes a miscompilation on the included testcase, rdar://8359248

llvm-svn: 112201
2010-08-26 18:03:20 +00:00
Chris Lattner 51e1cc2fe2 tame an assertion, fixing rdar://8357396
llvm-svn: 112174
2010-08-26 06:28:35 +00:00
Chris Lattner 9f8b451876 Finally pass "two floats in a 64-bit unit" as a <2 x float> instead of
as a double in the x86-64 ABI.  This allows us to generate much better
code for certain things, e.g.:

_Complex float f32(_Complex float A, _Complex float B) {
  return A+B;
}

Used to compile into (look at the integer silliness!):

_f32:                                   ## @f32
## BB#0:                                ## %entry
	movd	%xmm1, %rax
	movd	%eax, %xmm1
	movd	%xmm0, %rcx
	movd	%ecx, %xmm0
	addss	%xmm1, %xmm0
	movd	%xmm0, %edx
	shrq	$32, %rax
	movd	%eax, %xmm0
	shrq	$32, %rcx
	movd	%ecx, %xmm1
	addss	%xmm0, %xmm1
	movd	%xmm1, %eax
	shlq	$32, %rax
	addq	%rdx, %rax
	movd	%rax, %xmm0
	ret

Now we get:

_f32:                                   ## @f32
	movdqa	%xmm0, %xmm2
	addss	%xmm1, %xmm2
	pshufd	$16, %xmm2, %xmm2
	pshufd	$1, %xmm1, %xmm1
	pshufd	$1, %xmm0, %xmm0
	addss	%xmm1, %xmm0
	pshufd	$16, %xmm0, %xmm1
	movdqa	%xmm2, %xmm0
	unpcklps	%xmm1, %xmm0
	ret

and compile stuff like:

extern float _Complex ccoshf( float _Complex ) ;
float _Complex ccosf ( float _Complex z ) {
 float _Complex iz;
 (__real__ iz) = -(__imag__ z);
 (__imag__ iz) = (__real__ z);
 return ccoshf(iz);
}

into:

_ccosf:                                 ## @ccosf
## BB#0:                                ## %entry
	pshufd	$1, %xmm0, %xmm1
	xorps	LCPI4_0(%rip), %xmm1
	unpcklps	%xmm0, %xmm1
	movaps	%xmm1, %xmm0
	jmp	_ccoshf                 ## TAILCALL

instead of:

_ccosf:                                 ## @ccosf
## BB#0:                                ## %entry
	movd	%xmm0, %rax
	movq	%rax, %rcx
	shlq	$32, %rcx
	shrq	$32, %rax
	xorl	$-2147483648, %eax      ## imm = 0xFFFFFFFF80000000
	addq	%rcx, %rax
	movd	%rax, %xmm0
	jmp	_ccoshf                 ## TAILCALL


There is still "stuff to be done" here for the struct case,
but this resolves rdar://6379669 - [x86-64 ABI] Pass and return 
_Complex float / double efficiently

llvm-svn: 112111
2010-08-25 23:39:14 +00:00
Chris Lattner 7f4b81af7a fix rdar://8251384, another case where we could access beyond the
end of a struct.  This improves the case when the struct being passed
contains 3 floats, either due to a struct or array of 3 things.  Before
we'd generate this IR for the testcase:

define float @bar(double %X.coerce0, double %X.coerce1) nounwind {
entry:
  %X = alloca %struct.foof, align 8               ; <%struct.foof*> [#uses=2]
  %0 = bitcast %struct.foof* %X to %1*            ; <%1*> [#uses=2]
  %1 = getelementptr %1* %0, i32 0, i32 0         ; <double*> [#uses=1]
  store double %X.coerce0, double* %1
  %2 = getelementptr %1* %0, i32 0, i32 1         ; <double*> [#uses=1]
  store double %X.coerce1, double* %2
  %tmp = getelementptr inbounds %struct.foof* %X, i32 0, i32 2 ; <float*> [#uses=1]
  %tmp1 = load float* %tmp                        ; <float> [#uses=1]
  ret float %tmp1
}

which compiled (with optimization) to:

_bar:                                   ## @bar
## BB#0:                                ## %entry
	movd	%xmm1, %rax
	movd	%eax, %xmm0
	ret

Now we produce:

define float @bar(double %X.coerce0, float %X.coerce1) nounwind {
entry:
  %X = alloca %struct.foof, align 8               ; <%struct.foof*> [#uses=2]
  %0 = bitcast %struct.foof* %X to %0*            ; <%0*> [#uses=2]
  %1 = getelementptr %0* %0, i32 0, i32 0         ; <double*> [#uses=1]
  store double %X.coerce0, double* %1
  %2 = getelementptr %0* %0, i32 0, i32 1         ; <float*> [#uses=1]
  store float %X.coerce1, float* %2
  %tmp = getelementptr inbounds %struct.foof* %X, i32 0, i32 2 ; <float*> [#uses=1]
  %tmp1 = load float* %tmp                        ; <float> [#uses=1]
  ret float %tmp1
}

and:

_bar:                                   ## @bar
## BB#0:                                ## %entry
	movaps	%xmm1, %xmm0
	ret

llvm-svn: 109776
2010-07-29 18:13:09 +00:00
Chris Lattner 3f76342cfc handle a case where we could access off the end of a function
that Eli pointed out, rdar://8249586

llvm-svn: 109762
2010-07-29 17:34:39 +00:00
Chris Lattner 44f9c3b3f1 in release mode, irbuilder doesn't add names to instructions,
this will hopefully fix the osuosl clang-i686-darwin10 builder.

llvm-svn: 109760
2010-07-29 17:14:05 +00:00
Chris Lattner 98076a25ce This is a little bit far, but optimize cases like:
struct a {
  struct c {
    double x;
    int y;
  } x[1];
};

void foo(struct a A) {
}

into:

define void @foo(double %A.coerce0, i32 %A.coerce1) nounwind {
entry:
  %A = alloca %struct.a, align 8                  ; <%struct.a*> [#uses=1]
  %0 = bitcast %struct.a* %A to %struct.c*        ; <%struct.c*> [#uses=2]
  %1 = getelementptr %struct.c* %0, i32 0, i32 0  ; <double*> [#uses=1]
  store double %A.coerce0, double* %1
  %2 = getelementptr %struct.c* %0, i32 0, i32 1  ; <i32*> [#uses=1]
  store i32 %A.coerce1, i32* %2

instead of:

define void @foo(double %A.coerce0, i64 %A.coerce1) nounwind {
entry:
  %A = alloca %struct.a, align 8                  ; <%struct.a*> [#uses=1]
  %0 = bitcast %struct.a* %A to %0*               ; <%0*> [#uses=2]
  %1 = getelementptr %0* %0, i32 0, i32 0         ; <double*> [#uses=1]
  store double %A.coerce0, double* %1
  %2 = getelementptr %0* %0, i32 0, i32 1         ; <i64*> [#uses=1]
  store i64 %A.coerce1, i64* %2

I only do this now because I never want to look at this code again :)
 

llvm-svn: 109738
2010-07-29 07:43:55 +00:00
Chris Lattner c8b7b53a1e implement a todo: pass a eight-byte that consists of a
small integer + padding as that small integer.  On code
like:

struct c { double x; int y; };
void bar(struct c C) { }

This means that we compile to:

define void @bar(double %C.coerce0, i32 %C.coerce1) nounwind {
entry:
  %C = alloca %struct.c, align 8                  ; <%struct.c*> [#uses=2]
  %0 = getelementptr %struct.c* %C, i32 0, i32 0  ; <double*> [#uses=1]
  store double %C.coerce0, double* %0
  %1 = getelementptr %struct.c* %C, i32 0, i32 1  ; <i32*> [#uses=1]
  store i32 %C.coerce1, i32* %1

instead of:

define void @bar(double %C.coerce0, i64 %C.coerce1) nounwind {
entry:
  %C = alloca %struct.c, align 8                  ; <%struct.c*> [#uses=3]
  %0 = bitcast %struct.c* %C to %0*               ; <%0*> [#uses=2]
  %1 = getelementptr %0* %0, i32 0, i32 0         ; <double*> [#uses=1]
  store double %C.coerce0, double* %1
  %2 = getelementptr %0* %0, i32 0, i32 1         ; <i64*> [#uses=1]
  store i64 %C.coerce1, i64* %2

which gives SRoA heartburn.

This implements rdar://5711709, a nice low number :)

llvm-svn: 109737
2010-07-29 07:30:00 +00:00
Chris Lattner fe34c1d53e Kill off the 'coerce' ABI passing form. Now 'direct' and 'extend' always
have a "coerce to" type which often matches the default lowering of Clang
type to LLVM IR type, but the coerce case can be handled by making them
not be the same.

This simplifies things and fixes issues where X86-64 abi lowering would 
return coerce after making preferred types exactly match up.  This caused
us to compile:

typedef float v4f32 __attribute__((__vector_size__(16)));
v4f32 foo(v4f32 X) {
  return X+X;
}

into this code at -O0:

define <4 x float> @foo(<4 x float> %X.coerce) nounwind {
entry:
  %retval = alloca <4 x float>, align 16          ; <<4 x float>*> [#uses=2]
  %coerce = alloca <4 x float>, align 16          ; <<4 x float>*> [#uses=2]
  %X.addr = alloca <4 x float>, align 16          ; <<4 x float>*> [#uses=3]
  store <4 x float> %X.coerce, <4 x float>* %coerce
  %X = load <4 x float>* %coerce                  ; <<4 x float>> [#uses=1]
  store <4 x float> %X, <4 x float>* %X.addr
  %tmp = load <4 x float>* %X.addr                ; <<4 x float>> [#uses=1]
  %tmp1 = load <4 x float>* %X.addr               ; <<4 x float>> [#uses=1]
  %add = fadd <4 x float> %tmp, %tmp1             ; <<4 x float>> [#uses=1]
  store <4 x float> %add, <4 x float>* %retval
  %0 = load <4 x float>* %retval                  ; <<4 x float>> [#uses=1]
  ret <4 x float> %0
}

Now we get:

define <4 x float> @foo(<4 x float> %X) nounwind {
entry:
  %X.addr = alloca <4 x float>, align 16          ; <<4 x float>*> [#uses=3]
  store <4 x float> %X, <4 x float>* %X.addr
  %tmp = load <4 x float>* %X.addr                ; <<4 x float>> [#uses=1]
  %tmp1 = load <4 x float>* %X.addr               ; <<4 x float>> [#uses=1]
  %add = fadd <4 x float> %tmp, %tmp1             ; <<4 x float>> [#uses=1]
  ret <4 x float> %add
}

This implements rdar://8248065

llvm-svn: 109733
2010-07-29 06:26:06 +00:00
Chris Lattner 9fa15c3608 ignore structs that wrap vectors in IR, the abstraction shouldn't add penalty.
Before we'd compile the example into something like:

  %coerce.dive2 = getelementptr %struct.v4f32wrapper* %retval, i32 0, i32 0 ; <<4 x float>*> [#uses=1]
  %1 = bitcast <4 x float>* %coerce.dive2 to <2 x double>* ; <<2 x double>*> [#uses=1]
  %2 = load <2 x double>* %1, align 1             ; <<2 x double>> [#uses=1]
  ret <2 x double> %2

Now we produce:

  %coerce.dive2 = getelementptr %struct.v4f32wrapper* %retval, i32 0, i32 0 ; <<4 x float>*> [#uses=1]
  %0 = load <4 x float>* %coerce.dive2, align 1   ; <<4 x float>> [#uses=1]
  ret <4 x float> %0

llvm-svn: 109732
2010-07-29 05:02:29 +00:00
Chris Lattner 4200fe4e50 move the 'pretty 16-byte vector' inferring code up to be shared
with return values, improving stuff that returns __m128 etc.

llvm-svn: 109731
2010-07-29 04:56:46 +00:00
Chris Lattner 3a44c7e55d now that we have CGT around, we can start using preferred types
for return values too.  Instead of compiling something like:

struct foo {
  int *X;
  float *Y;
};

struct foo test(struct foo *P) { return *P; }

to:

%1 = type { i64, i64 }

define %1 @test(%struct.foo* %P) nounwind {
entry:
  %retval = alloca %struct.foo, align 8           ; <%struct.foo*> [#uses=2]
  %P.addr = alloca %struct.foo*, align 8          ; <%struct.foo**> [#uses=2]
  store %struct.foo* %P, %struct.foo** %P.addr
  %tmp = load %struct.foo** %P.addr               ; <%struct.foo*> [#uses=1]
  %tmp1 = bitcast %struct.foo* %retval to i8*     ; <i8*> [#uses=1]
  %tmp2 = bitcast %struct.foo* %tmp to i8*        ; <i8*> [#uses=1]
  call void @llvm.memcpy.p0i8.p0i8.i64(i8* %tmp1, i8* %tmp2, i64 16, i32 8, i1 false)
  %0 = bitcast %struct.foo* %retval to %1*        ; <%1*> [#uses=1]
  %1 = load %1* %0, align 1                       ; <%1> [#uses=1]
  ret %1 %1
}

We now get the result more type safe, with:

define %struct.foo @test(%struct.foo* %P) nounwind {
entry:
  %retval = alloca %struct.foo, align 8           ; <%struct.foo*> [#uses=2]
  %P.addr = alloca %struct.foo*, align 8          ; <%struct.foo**> [#uses=2]
  store %struct.foo* %P, %struct.foo** %P.addr
  %tmp = load %struct.foo** %P.addr               ; <%struct.foo*> [#uses=1]
  %tmp1 = bitcast %struct.foo* %retval to i8*     ; <i8*> [#uses=1]
  %tmp2 = bitcast %struct.foo* %tmp to i8*        ; <i8*> [#uses=1]
  call void @llvm.memcpy.p0i8.p0i8.i64(i8* %tmp1, i8* %tmp2, i64 16, i32 8, i1 false)
  %0 = load %struct.foo* %retval                  ; <%struct.foo> [#uses=1]
  ret %struct.foo %0
}

That memcpy is completely terrible, but I don't know how to fix it.

llvm-svn: 109729
2010-07-29 04:46:19 +00:00
Chris Lattner f4ba08aeaf pass argument vectors in a type that corresponds to the user type if
possible.  This improves the example to pass <4 x float> instead of
<2 x double> but we still get awful code, and still don't get the
return value right.

llvm-svn: 109700
2010-07-28 23:47:21 +00:00
Chris Lattner 31faff5d58 use Get8ByteTypeAtOffset for the return value path as well so we
don't get errors similar to PR7714 on the return path.

llvm-svn: 109689
2010-07-28 23:06:14 +00:00
Chris Lattner 4c1e484f39 fix PR7714 by not referencing off the end of a struct when passed by value in
x86-64 abi.  This also improves codegen as well.  Some refactoring is needed of
this code.

llvm-svn: 109681
2010-07-28 22:15:08 +00:00
Chris Lattner c401de9998 in the "coerce" case, the ABI handling code ends up making the
alloca for an argument.  Make sure the argument gets the proper
decl alignment, which may be different than the type alignment.

This fixes PR7567

llvm-svn: 107627
2010-07-05 20:21:00 +00:00
Chris Lattner 22a931e3bb Change X86_64ABIInfo to have ASTContext and TargetData ivars to
avoid passing ASTContext down through all the methods it has.

When classifying an argument, or argument piece, as INTEGER, check
to see if we have a pointer at exactly the same offset in the 
preferred type.  If so, use that pointer type instead of i64.  This
allows us to compile A function taking a stringref into something
like this:

define i8* @foo(i64 %D.coerce0, i8* %D.coerce1) nounwind ssp {
entry:
  %D = alloca %struct.DeclGroup, align 8          ; <%struct.DeclGroup*> [#uses=4]
  %0 = getelementptr %struct.DeclGroup* %D, i32 0, i32 0 ; <i64*> [#uses=1]
  store i64 %D.coerce0, i64* %0
  %1 = getelementptr %struct.DeclGroup* %D, i32 0, i32 1 ; <i8**> [#uses=1]
  store i8* %D.coerce1, i8** %1
  %tmp = getelementptr inbounds %struct.DeclGroup* %D, i32 0, i32 0 ; <i64*> [#uses=1]
  %tmp1 = load i64* %tmp                          ; <i64> [#uses=1]
  %tmp2 = getelementptr inbounds %struct.DeclGroup* %D, i32 0, i32 1 ; <i8**> [#uses=1]
  %tmp3 = load i8** %tmp2                         ; <i8*> [#uses=1]
  %add.ptr = getelementptr inbounds i8* %tmp3, i64 %tmp1 ; <i8*> [#uses=1]
  ret i8* %add.ptr
}

instead of this:

define i8* @foo(i64 %D.coerce0, i64 %D.coerce1) nounwind ssp {
entry:
  %D = alloca %struct.DeclGroup, align 8          ; <%struct.DeclGroup*> [#uses=3]
  %0 = insertvalue %0 undef, i64 %D.coerce0, 0    ; <%0> [#uses=1]
  %1 = insertvalue %0 %0, i64 %D.coerce1, 1       ; <%0> [#uses=1]
  %2 = bitcast %struct.DeclGroup* %D to %0*       ; <%0*> [#uses=1]
  store %0 %1, %0* %2, align 1
  %tmp = getelementptr inbounds %struct.DeclGroup* %D, i32 0, i32 0 ; <i64*> [#uses=1]
  %tmp1 = load i64* %tmp                          ; <i64> [#uses=1]
  %tmp2 = getelementptr inbounds %struct.DeclGroup* %D, i32 0, i32 1 ; <i8**> [#uses=1]
  %tmp3 = load i8** %tmp2                         ; <i8*> [#uses=1]
  %add.ptr = getelementptr inbounds i8* %tmp3, i64 %tmp1 ; <i8*> [#uses=1]
  ret i8* %add.ptr
}

This implements rdar://7375902 - [codegen quality] clang x86-64 ABI lowering code punishing StringRef

llvm-svn: 107123
2010-06-29 06:01:59 +00:00
Chris Lattner 9e748e9d6e add IR names to coerced arguments.
llvm-svn: 107105
2010-06-29 00:14:52 +00:00
Chris Lattner 3dd716c3c3 Change CGCall to handle the "coerce" case where the coerce-to type
is a FCA to pass each of the elements as individual scalars.  This
produces code fast isel is less likely to reject and is easier on
the optimizers.

For example, before we would compile:
struct DeclGroup { long NumDecls; char * Y; };
char * foo(DeclGroup D) {
  return D.NumDecls+D.Y;
}

to:
%struct.DeclGroup = type { i64, i64 }

define i64 @_Z3foo9DeclGroup(%struct.DeclGroup) nounwind {
entry:
  %D = alloca %struct.DeclGroup, align 8          ; <%struct.DeclGroup*> [#uses=3]
  store %struct.DeclGroup %0, %struct.DeclGroup* %D, align 1
  %tmp = getelementptr inbounds %struct.DeclGroup* %D, i32 0, i32 0 ; <i64*> [#uses=1]
  %tmp1 = load i64* %tmp                          ; <i64> [#uses=1]
  %tmp2 = getelementptr inbounds %struct.DeclGroup* %D, i32 0, i32 1 ; <i64*> [#uses=1]
  %tmp3 = load i64* %tmp2                         ; <i64> [#uses=1]
  %add = add nsw i64 %tmp1, %tmp3                 ; <i64> [#uses=1]
  ret i64 %add
}

Now we get:

%0 = type { i64, i64 }
%struct.DeclGroup = type { i64, i8* }

define i8* @_Z3foo9DeclGroup(i64, i64) nounwind {
entry:
  %D = alloca %struct.DeclGroup, align 8          ; <%struct.DeclGroup*> [#uses=3]
  %2 = insertvalue %0 undef, i64 %0, 0            ; <%0> [#uses=1]
  %3 = insertvalue %0 %2, i64 %1, 1               ; <%0> [#uses=1]
  %4 = bitcast %struct.DeclGroup* %D to %0*       ; <%0*> [#uses=1]
  store %0 %3, %0* %4, align 1
  %tmp = getelementptr inbounds %struct.DeclGroup* %D, i32 0, i32 0 ; <i64*> [#uses=1]
  %tmp1 = load i64* %tmp                          ; <i64> [#uses=1]
  %tmp2 = getelementptr inbounds %struct.DeclGroup* %D, i32 0, i32 1 ; <i8**> [#uses=1]
  %tmp3 = load i8** %tmp2                         ; <i8*> [#uses=1]
  %add.ptr = getelementptr inbounds i8* %tmp3, i64 %tmp1 ; <i8*> [#uses=1]
  ret i8* %add.ptr
}

Elimination of the FCA inside the function is still-to-come.

llvm-svn: 107099
2010-06-28 23:44:11 +00:00
Chris Lattner a7d81ab7f3 X86-64:
pass/return structs of float/int as float/i32 instead of double/i64
to make the code generated for ABI cleaner.  Passing in the low part
of a double is the same as passing in a float.

For example, we now compile:

struct DeclGroup { float NumDecls; };
float foo(DeclGroup D);
void bar(DeclGroup *D) {
 foo(*D);
}

into:

%struct.DeclGroup = type { float }

define void @_Z3barP9DeclGroup(%struct.DeclGroup* %D) nounwind {
entry:
  %D.addr = alloca %struct.DeclGroup*, align 8    ; <%struct.DeclGroup**> [#uses=2]
  %agg.tmp = alloca %struct.DeclGroup, align 4    ; <%struct.DeclGroup*> [#uses=2]
  store %struct.DeclGroup* %D, %struct.DeclGroup** %D.addr
  %tmp = load %struct.DeclGroup** %D.addr         ; <%struct.DeclGroup*> [#uses=1]
  %tmp1 = bitcast %struct.DeclGroup* %agg.tmp to i8* ; <i8*> [#uses=1]
  %tmp2 = bitcast %struct.DeclGroup* %tmp to i8*  ; <i8*> [#uses=1]
  call void @llvm.memcpy.p0i8.p0i8.i64(i8* %tmp1, i8* %tmp2, i64 4, i32 4, i1 false)
  %coerce.dive = getelementptr %struct.DeclGroup* %agg.tmp, i32 0, i32 0 ; <float*> [#uses=1]
  %0 = load float* %coerce.dive, align 1          ; <float> [#uses=1]
  %call = call float @_Z3foo9DeclGroup(float %0)  ; <float> [#uses=0]
  ret void
}

instead of:

%struct.DeclGroup = type { float }

define void @_Z3barP9DeclGroup(%struct.DeclGroup* %D) nounwind {
entry:
  %D.addr = alloca %struct.DeclGroup*, align 8    ; <%struct.DeclGroup**> [#uses=2]
  %agg.tmp = alloca %struct.DeclGroup, align 4    ; <%struct.DeclGroup*> [#uses=2]
  %tmp3 = alloca double                           ; <double*> [#uses=2]
  store %struct.DeclGroup* %D, %struct.DeclGroup** %D.addr
  %tmp = load %struct.DeclGroup** %D.addr         ; <%struct.DeclGroup*> [#uses=1]
  %tmp1 = bitcast %struct.DeclGroup* %agg.tmp to i8* ; <i8*> [#uses=1]
  %tmp2 = bitcast %struct.DeclGroup* %tmp to i8*  ; <i8*> [#uses=1]
  call void @llvm.memcpy.p0i8.p0i8.i64(i8* %tmp1, i8* %tmp2, i64 4, i32 4, i1 false)
  %coerce.dive = getelementptr %struct.DeclGroup* %agg.tmp, i32 0, i32 0 ; <float*> [#uses=1]
  %0 = bitcast double* %tmp3 to float*            ; <float*> [#uses=1]
  %1 = load float* %coerce.dive                   ; <float> [#uses=1]
  store float %1, float* %0, align 1
  %2 = load double* %tmp3                         ; <double> [#uses=1]
  %call = call float @_Z3foo9DeclGroup(double %2) ; <float> [#uses=0]
  ret void
}

which is this machine code (at -O0):

__Z3barP9DeclGroup:
	subq	$24, %rsp
	movq	%rdi, 16(%rsp)
	movq	16(%rsp), %rdi
	leaq	8(%rsp), %rax
	movl	(%rdi), %ecx
	movl	%ecx, (%rax)
	movss	8(%rsp), %xmm0
	callq	__Z3foo9DeclGroup
	addq	$24, %rsp
	ret

vs this:

__Z3barP9DeclGroup:
	subq	$24, %rsp
	movq	%rdi, 16(%rsp)
	movq	16(%rsp), %rdi
	leaq	8(%rsp), %rax
	movl	(%rdi), %ecx
	movl	%ecx, (%rax)
	movss	8(%rsp), %xmm0
	movss	%xmm0, (%rsp)
	movsd	(%rsp), %xmm0
	callq	__Z3foo9DeclGroup
	addq	$24, %rsp
	ret

At -O3, it is the difference between this now:

__Z3barP9DeclGroup:
	movss	(%rdi), %xmm0
	jmp	__Z3foo9DeclGroup  # TAILCALL

vs this before:

__Z3barP9DeclGroup:
	movl	(%rdi), %eax
	movd	%rax, %xmm0
	jmp	__Z3foo9DeclGroup  # TAILCALL

llvm-svn: 107048
2010-06-28 19:56:59 +00:00
Chris Lattner 055097f024 If coercing something from int or pointer type to int or pointer type
(potentially after unwrapping it from a struct) do it without going through
memory.  We now compile:

struct DeclGroup {
  unsigned NumDecls;
};

int foo(DeclGroup D) {
  return D.NumDecls;
}

into:

%struct.DeclGroup = type { i32 }

define i32 @_Z3foo9DeclGroup(i64) nounwind ssp noredzone {
entry:
  %D = alloca %struct.DeclGroup, align 4          ; <%struct.DeclGroup*> [#uses=2]
  %coerce.dive = getelementptr %struct.DeclGroup* %D, i32 0, i32 0 ; <i32*> [#uses=1]
  %coerce.val.ii = trunc i64 %0 to i32            ; <i32> [#uses=1]
  store i32 %coerce.val.ii, i32* %coerce.dive
  %tmp = getelementptr inbounds %struct.DeclGroup* %D, i32 0, i32 0 ; <i32*> [#uses=1]
  %tmp1 = load i32* %tmp                          ; <i32> [#uses=1]
  ret i32 %tmp1
}

instead of:

%struct.DeclGroup = type { i32 }

define i32 @_Z3foo9DeclGroup(i64) nounwind ssp noredzone {
entry:
  %D = alloca %struct.DeclGroup, align 4          ; <%struct.DeclGroup*> [#uses=2]
  %tmp = alloca i64                               ; <i64*> [#uses=2]
  %coerce.dive = getelementptr %struct.DeclGroup* %D, i32 0, i32 0 ; <i32*> [#uses=1]
  store i64 %0, i64* %tmp
  %1 = bitcast i64* %tmp to i32*                  ; <i32*> [#uses=1]
  %2 = load i32* %1, align 1                      ; <i32> [#uses=1]
  store i32 %2, i32* %coerce.dive
  %tmp1 = getelementptr inbounds %struct.DeclGroup* %D, i32 0, i32 0 ; <i32*> [#uses=1]
  %tmp2 = load i32* %tmp1                         ; <i32> [#uses=1]
  ret i32 %tmp2
}

... which is quite a bit less terrifying.

llvm-svn: 106975
2010-06-27 06:26:04 +00:00
Chris Lattner 895c52ba8b Same patch as the previous on the store side. Before we compiled this:
struct DeclGroup {
  unsigned NumDecls;
};

int foo(DeclGroup D) {
  return D.NumDecls;
}

to:

%struct.DeclGroup = type { i32 }

define i32 @_Z3foo9DeclGroup(i64) nounwind ssp noredzone {
entry:
  %D = alloca %struct.DeclGroup, align 4          ; <%struct.DeclGroup*> [#uses=2]
  %tmp = alloca i64                               ; <i64*> [#uses=2]
  store i64 %0, i64* %tmp
  %1 = bitcast i64* %tmp to %struct.DeclGroup*    ; <%struct.DeclGroup*> [#uses=1]
  %2 = load %struct.DeclGroup* %1, align 1        ; <%struct.DeclGroup> [#uses=1]
  store %struct.DeclGroup %2, %struct.DeclGroup* %D
  %tmp1 = getelementptr inbounds %struct.DeclGroup* %D, i32 0, i32 0 ; <i32*> [#uses=1]
  %tmp2 = load i32* %tmp1                         ; <i32> [#uses=1]
  ret i32 %tmp2
}

which caused fast isel bailouts due to the FCA load/store of %2.  Now
we generate this just blissful code:

%struct.DeclGroup = type { i32 }

define i32 @_Z3foo9DeclGroup(i64) nounwind ssp noredzone {
entry:
  %D = alloca %struct.DeclGroup, align 4          ; <%struct.DeclGroup*> [#uses=2]
  %tmp = alloca i64                               ; <i64*> [#uses=2]
  %coerce.dive = getelementptr %struct.DeclGroup* %D, i32 0, i32 0 ; <i32*> [#uses=1]
  store i64 %0, i64* %tmp
  %1 = bitcast i64* %tmp to i32*                  ; <i32*> [#uses=1]
  %2 = load i32* %1, align 1                      ; <i32> [#uses=1]
  store i32 %2, i32* %coerce.dive
  %tmp1 = getelementptr inbounds %struct.DeclGroup* %D, i32 0, i32 0 ; <i32*> [#uses=1]
  %tmp2 = load i32* %tmp1                         ; <i32> [#uses=1]
  ret i32 %tmp2
}

This avoids fastisel bailing out and is groundwork for future patch.
This reduces bailouts on CGStmt.ll to 911 from 935.

llvm-svn: 106974
2010-06-27 06:04:18 +00:00
Daniel Dunbar 53fac692fa ABI/x86-32 & x86-64: Alignment on 'byval' must be set when when the alignment
exceeds the minimum ABI alignment.

llvm-svn: 102019
2010-04-21 19:49:55 +00:00
Daniel Dunbar 14ec60024c Convert test to FileCheck.
llvm-svn: 102016
2010-04-21 19:10:54 +00:00
Chris Lattner 9cffdf1331 don't slap noalias attribute on stret result arguments.
This mirror's Dan's patch for llvm-gcc in r97989, and
fixes the miscompilation in PR6525.  There is some contention
over whether this is the right thing to do, but it is the
conservative answer and demonstrably fixes a miscompilation.

llvm-svn: 101877
2010-04-20 05:44:43 +00:00
Daniel Dunbar 8fbe78f6fc Update tests to use %clang_cc1 instead of 'clang-cc' or 'clang -cc1'.
- This is designed to make it obvious that %clang_cc1 is a "test variable"
   which is substituted. It is '%clang_cc1' instead of '%clang -cc1' because it
   can be useful to redefine what gets run as 'clang -cc1' (for example, to set
   a default target).

llvm-svn: 91446
2009-12-15 20:14:24 +00:00
Daniel Dunbar 34546ce43d Remove RUN: true lines.
llvm-svn: 86432
2009-11-08 01:47:25 +00:00
Daniel Dunbar 8b57697954 Eliminate &&s in tests.
- 'for i in $(find . -type f); do sed -e 's#\(RUN:.*[^ ]\) *&& *$#\1#g' $i | FileUpdate $i; done', for the curious.

llvm-svn: 86430
2009-11-08 01:45:36 +00:00
Daniel Dunbar 87db734400 Fix a few tests to be -Asserts agnostic.
- Ugh.

llvm-svn: 79860
2009-08-23 19:28:59 +00:00
Daniel Dunbar e5515289fa Update test
llvm-svn: 78877
2009-08-13 01:27:45 +00:00
Mike Stump 5e7869f63e Prep for new warning.
llvm-svn: 76638
2009-07-21 20:52:43 +00:00
Daniel Dunbar 4be99ff767 ABI handling: Fix nasty thinko where IRgen could generate an out-of-bounds read
when generating a coercion for ABI handling purposes.
 - This may only manifest itself when building at -O0, but the practical effect
   is that other arguments may get clobbered.

 - <rdar://problem/6930451> [irgen] ABI coercion clobbers other arguments

llvm-svn: 72932
2009-06-05 07:58:54 +00:00
Daniel Dunbar 1518b64ddc When trying to pass an argument on the stack, assume LLVM will do the right
thing for non-aggregate types.
 - Otherwise we unnecessarily pin values to the stack and currently end up
   triggering a backend bug in one case.

 - This loose cooperation with LLVM to implement the ABI is pretty ugly.

 - <rdar://problem/6918722> [irgen] clang miscompile of many pointer varargs on
   x86-64

llvm-svn: 72419
2009-05-26 16:37:37 +00:00
Daniel Dunbar 499f3f9c5d x86_64 ABI: Account for sret parameters consuming an integer register.
- PR4242.

llvm-svn: 72268
2009-05-22 17:33:44 +00:00
Daniel Dunbar ffdb8439d7 ABI handling: Fix invalid assertion, it is possible for a valid
coercion to be specified which truncates padding bits. It would be
nice to still have the assert, but we don't have any API call for the
unpadding size of a type yet.

llvm-svn: 71695
2009-05-13 18:54:26 +00:00
Daniel Dunbar 203e2e8dd8 x86-64 ABI: clang incorrectly passes union { long double, float } in
register.
 - Merge algorithm was returning MEMORY as it should.

llvm-svn: 71556
2009-05-12 15:22:40 +00:00
Daniel Dunbar b997f3bcc3 x86_64 ABI: Ignore padding bit-fields during classification.
- {return-types,single-args}-{32,64} pass the first 1k ABI tests with
   bit-fields enabled.

llvm-svn: 71272
2009-05-08 22:26:44 +00:00
Daniel Dunbar a45cf5b6b0 Rename clang to clang-cc.
Tests and drivers updated, still need to shuffle dirs.

llvm-svn: 67602
2009-03-24 02:24:46 +00:00
Daniel Dunbar 94911a91d5 x86_64 ABI: Handle long double in union when upper eightbyte results
in a lone X87 class.
 - PR3735.

llvm-svn: 66277
2009-03-06 17:50:25 +00:00
Mike Stump 92000e54ac Add end of line at end.
llvm-svn: 65557
2009-02-26 19:00:14 +00:00
Anders Carlsson 43e8a2b36e Add test for enum types
llvm-svn: 65540
2009-02-26 17:38:19 +00:00
Daniel Dunbar 019ef0bbfe x86_64 ABI: Pass simple types directly when possible. This is
important for both keeping the generated LLVM simple and for ensuring
that integer types are passed/promoted correctly.

llvm-svn: 64529
2009-02-14 02:09:24 +00:00