This is a patch for PR22563 ( http://llvm.org/bugs/show_bug.cgi?id=22563 ).
We were not correctly unwrapping a single 256-bit AVX vector that was defined as an array of 1 inside a struct.
We would generate a <4 x float> param/return value instead of <8 x float> and lose half of the vector.
Differential Revision: http://reviews.llvm.org/D7614
llvm-svn: 229408
We were emitting calls to blocks as if all arguments were
required --- i.e. with signature (A,B,C,D,...) rather than
(A,B,...). This patch fixes that and accounts for the
implicit block-context argument as a required argument.
In addition, this patch changes the function type under which
we call unprototyped functions on platforms like x86-64 that
guarantee compatibility of variadic functions with unprototyped
function types; previously we would always call such functions
under the LLVM type T (...)*, but now we will call them under
the type T (A,B,C,D,...)*. This last change should have no
material effect except for making the type conventions more
explicit; it was a side-effect of the most convenient implementation.
llvm-svn: 169588
the original parameter or return type.
Since we do not accurately represent the data fields of a union, we should not
directly load or store a union type.
As an exmple, if we have i8,i8, i32, i32 as one field type and i32,i32 as
another field type, the first field type will be chosen to represent the union.
If we load with the union's type, the 3rd byte and the 4th byte will be skipped.
rdar://12723368
llvm-svn: 168820
- We do this when it is easy to determine that the backend will pass them on
the stack properly by itself.
Currently LLVM codegen is really bad in some cases with byval, for example, on
the test case here (which is derived from Sema code, which likes to pass
SourceLocations around)::
struct s47 { unsigned a; };
void f47(int,int,int,int,int,int,struct s47);
void test47(int a, struct s47 b) { f47(a, a, a, a, a, a, b); }
we used to emit code like this::
...
movl %esi, -8(%rbp)
movl -8(%rbp), %ecx
movl %ecx, (%rsp)
...
to handle moving the struct onto the stack, which is just appalling.
Now we generate::
movl %esi, (%rsp)
which seems better, no?
llvm-svn: 152462
emit call results into potentially aliased slots. This allows us
to properly mark indirect return slots as noalias, at the cost
of requiring an extra memcpy when assigning an aggregate call
result into a l-value. It also brings us into compliance with
the x86-64 ABI.
llvm-svn: 138599
stuff like this:
typedef struct {
int x, y, z;
} foo_t;
foo_t g;
into:
%"struct.<anonymous>" = type { i32, i32, i32 }
we now get:
%struct.foo_t = type { i32, i32, i32 }
This doesn't change the behavior of the compiler, but makes the IR much easier to read.
llvm-svn: 134969
generator will give it something sufficient. This is important because
the mid-level optimizer doesn't know what alignment is required otherwise.
llvm-svn: 131879
have a "coerce to" type which often matches the default lowering of Clang
type to LLVM IR type, but the coerce case can be handled by making them
not be the same.
This simplifies things and fixes issues where X86-64 abi lowering would
return coerce after making preferred types exactly match up. This caused
us to compile:
typedef float v4f32 __attribute__((__vector_size__(16)));
v4f32 foo(v4f32 X) {
return X+X;
}
into this code at -O0:
define <4 x float> @foo(<4 x float> %X.coerce) nounwind {
entry:
%retval = alloca <4 x float>, align 16 ; <<4 x float>*> [#uses=2]
%coerce = alloca <4 x float>, align 16 ; <<4 x float>*> [#uses=2]
%X.addr = alloca <4 x float>, align 16 ; <<4 x float>*> [#uses=3]
store <4 x float> %X.coerce, <4 x float>* %coerce
%X = load <4 x float>* %coerce ; <<4 x float>> [#uses=1]
store <4 x float> %X, <4 x float>* %X.addr
%tmp = load <4 x float>* %X.addr ; <<4 x float>> [#uses=1]
%tmp1 = load <4 x float>* %X.addr ; <<4 x float>> [#uses=1]
%add = fadd <4 x float> %tmp, %tmp1 ; <<4 x float>> [#uses=1]
store <4 x float> %add, <4 x float>* %retval
%0 = load <4 x float>* %retval ; <<4 x float>> [#uses=1]
ret <4 x float> %0
}
Now we get:
define <4 x float> @foo(<4 x float> %X) nounwind {
entry:
%X.addr = alloca <4 x float>, align 16 ; <<4 x float>*> [#uses=3]
store <4 x float> %X, <4 x float>* %X.addr
%tmp = load <4 x float>* %X.addr ; <<4 x float>> [#uses=1]
%tmp1 = load <4 x float>* %X.addr ; <<4 x float>> [#uses=1]
%add = fadd <4 x float> %tmp, %tmp1 ; <<4 x float>> [#uses=1]
ret <4 x float> %add
}
This implements rdar://8248065
llvm-svn: 109733
Before we'd compile the example into something like:
%coerce.dive2 = getelementptr %struct.v4f32wrapper* %retval, i32 0, i32 0 ; <<4 x float>*> [#uses=1]
%1 = bitcast <4 x float>* %coerce.dive2 to <2 x double>* ; <<2 x double>*> [#uses=1]
%2 = load <2 x double>* %1, align 1 ; <<2 x double>> [#uses=1]
ret <2 x double> %2
Now we produce:
%coerce.dive2 = getelementptr %struct.v4f32wrapper* %retval, i32 0, i32 0 ; <<4 x float>*> [#uses=1]
%0 = load <4 x float>* %coerce.dive2, align 1 ; <<4 x float>> [#uses=1]
ret <4 x float> %0
llvm-svn: 109732
possible. This improves the example to pass <4 x float> instead of
<2 x double> but we still get awful code, and still don't get the
return value right.
llvm-svn: 109700
alloca for an argument. Make sure the argument gets the proper
decl alignment, which may be different than the type alignment.
This fixes PR7567
llvm-svn: 107627