generator will give it something sufficient. This is important because
the mid-level optimizer doesn't know what alignment is required otherwise.
llvm-svn: 131879
(__m128){ p[0], p[1], p[2], p[3] }
which produces really bad code. This could be done in instcombine, but it's
probably better to do it in the front-end instead.
<rdar://problem/9424836>
llvm-svn: 131237
Adjacent bit fields are packed into the same 1-, 2-, or
4-byte allocation unit if the integral types are the same
size. // rdar://8823265.
llvm-svn: 130851
it down. we effectively were compile the testcase into:
void test14(int x) {
switch (x) {
case 11: break;
case 42: test14(97); // fallthrough
default: test14(42); break;
which is not the same thing at all. This fixes a miscompilation of
MallocBench/gs seen on the clang-x86_64-linux-fnt buildbot.
llvm-svn: 129679
are trivial. This exposes opportunities earlier, and allows fastisel
to do good things with these at -O0.
This addresses rdar://9289468 - clang doesn't fold memset_chk at -O0
llvm-svn: 129651
by making the isCheapEnoughToEvaluateUnconditionally predicate handle anything that folds to a constant. In particular, we now fold enums.
llvm-svn: 129649
AAPCS+VFP), similar to fastcall / stdcall / whatevercall seen on x86.
In particular, all library functions should always be AAPCS regardless of floating point ABI used.
llvm-svn: 129534
the array alignment to the array access.
- This is more or less the best we can do without having alignment present in
the type system, but is a long way from truly matching how GCC handles this.
llvm-svn: 128691
platform implies default visibility. To achieve these, refactor our
lookup of explicit visibility so that we search for both an explicit
VisibilityAttr and an appropriate AvailabilityAttr, favoring the
VisibilityAttr if it is present.
llvm-svn: 128336
clobber with the 'y' constraint. Otherwise, we get the wrong return type and an
assert, because it created a '<1 x i64>' vector type instead of the x86_mmx
type.
llvm-svn: 127185
they are known to be exact multiples of the width of the char type. Add a
test case to CodeGen/union.c that would have caught the problem with the
previous attempt. No change in functionality intended.
llvm-svn: 126628
live case of a switch statement when switching on a constant. This is terribly
limited, but enough to handle the trivial example included. Before we would
emit:
define void @test1(i32 %i) nounwind {
entry:
%i.addr = alloca i32, align 4
store i32 %i, i32* %i.addr, align 4
switch i32 1, label %sw.epilog [
i32 1, label %sw.bb
]
sw.bb: ; preds = %entry
%tmp = load i32* %i.addr, align 4
%inc = add nsw i32 %tmp, 1
store i32 %inc, i32* %i.addr, align 4
br label %sw.epilog
sw.epilog: ; preds = %sw.bb, %entry
switch i32 0, label %sw.epilog3 [
i32 1, label %sw.bb1
]
sw.bb1: ; preds = %sw.epilog
%tmp2 = load i32* %i.addr, align 4
%add = add nsw i32 %tmp2, 2
store i32 %add, i32* %i.addr, align 4
br label %sw.epilog3
sw.epilog3: ; preds = %sw.bb1, %sw.epilog
ret void
}
now we emit:
define void @test1(i32 %i) nounwind {
entry:
%i.addr = alloca i32, align 4
store i32 %i, i32* %i.addr, align 4
%tmp = load i32* %i.addr, align 4
%inc = add nsw i32 %tmp, 1
store i32 %inc, i32* %i.addr, align 4
ret void
}
This improves -O0 compile time (less IR to generate and shove through the code
generator) and the clever linux kernel people found a way to fail to build if we
don't do this optimization. This step isn't enough to handle the kernel case
though.
llvm-svn: 126597
When the mismatch is due to a larger input operand that is
a constant, truncate it down to the size of the output. This
allows us to accept some cases in the linux kernel and elsewhere.
Pedantically speaking, we generate different code than GCC, though
I can't imagine how it would matter:
Clang:
movb $-1, %al
frob %al
GCC:
movl $255, %eax
frob %al
llvm-svn: 126148
designators: allowing codegen when the element initializer is a
constant or something else without a side effect. This unblocks
enough to let process.c in the linux kernel build, PR9257.
llvm-svn: 126056
without defining them. This should be an error, but I'm paranoid about
"uses" that end up not actually requiring a definition. I'll revisit later.
Also, teach IR generation to not set internal linkage on variable
declarations, just for safety's sake. Doing so produces an invalid module
if the variable is not ultimately defined.
Also, fix several places in the test suite where we were using internal
functions without definitions.
llvm-svn: 126016
- BlockDeclRefExprs always store VarDecls
- BDREs no longer store copy expressions
- BlockDecls now store a list of captured variables, information about
how they're captured, and a copy expression if necessary
With that in hand, change IR generation to use the captures data in
blocks instead of walking the block independently.
Additionally, optimize block layout by emitting fields in descending
alignment order, with a heuristic for filling in words when alignment
of the end of the block header is insufficient for the most aligned
field.
llvm-svn: 125005
compatible, not having the same type.
Fix rdar://8183908 in which compatible vector types weren't initialized properly leading to a crash.
llvm-svn: 124637
delete the block we began emitting into if it had no predecessors. We never
want to do this, because there are several valid cases during statement
emission where an existing block has no known predecessors but will acquire
some later. The case in my test case doesn't inherently fall into this
category, because we could safely emit the case-range code before the statement
body, but there are examples with labels that can't be fallen into
that would also demonstrate this bug.
rdar://problem/8837067
llvm-svn: 123303
Fix the width and align of bool type on Darwin to be 32bits
while keeping it 8 everywhere else.
Change the definition of va_list to default to SV4 ABI one
and let darwin subtarget override this.
Both changes submitted by Nathan Whitehorn and reviewed
by Rafael Espindola.
llvm-svn: 122956
in asm statements:
register int foo asm("rdi");
asm("..." : ... "r" (foo) ...
We also only accept these variables if the constraint in the asm statement is "r".
This fixes most of PR3933.
llvm-svn: 122643
16-bits in size. Implement this by splitting WChar into two enums, like we have
for char. This fixes a miscompmilation of XULRunner, PR8856.
llvm-svn: 122558
not actually frequently used, because ImpCastExprToType only creates a node
if the types differ. So explicitly create an ICE in the lvalue-to-rvalue
conversion code in DefaultFunctionArrayLvalueConversion() as well as several
other new places, and consistently deal with the consequences throughout the
compiler.
In addition, introduce a new cast kind for loading an ObjCProperty l-value,
and make sure we emit those nodes whenever an ObjCProperty l-value appears
that's not on the LHS of an assignment operator.
This breaks a couple of rewriter tests, which I've x-failed until future
development occurs on the rewriter.
Ted Kremenek kindly contributed the analyzer workarounds in this patch.
llvm-svn: 120890
when an initializer is variable (I handled the constant case in a previous
patch). This has three pieces:
1. Enhance AggValueSlot to have a 'isZeroed' bit to tell CGExprAgg that
the memory being stored into has previously been memset to zero.
2. Teach CGExprAgg to not emit stores of zero to isZeroed memory.
3. Teach CodeGenFunction::EmitAggExpr to scan initializers to determine
whether they are profitable to emit a memset + inividual stores vs
stores for everything.
The heuristic used is that a global has to be more than 16 bytes and
has to be 3/4 zero to be candidate for this xform. The two testcases
are illustrative of the scenarios this catches. We now codegen test9 into:
call void @llvm.memset.p0i8.i64(i8* %0, i8 0, i64 400, i32 4, i1 false)
%.array = getelementptr inbounds [100 x i32]* %Arr, i32 0, i32 0
%tmp = load i32* %X.addr, align 4
store i32 %tmp, i32* %.array
and test10 into:
call void @llvm.memset.p0i8.i64(i8* %0, i8 0, i64 392, i32 8, i1 false)
%tmp = getelementptr inbounds %struct.b* %S, i32 0, i32 0
%tmp1 = getelementptr inbounds %struct.a* %tmp, i32 0, i32 0
%tmp2 = load i32* %X.addr, align 4
store i32 %tmp2, i32* %tmp1, align 4
%tmp5 = getelementptr inbounds %struct.b* %S, i32 0, i32 3
%tmp10 = getelementptr inbounds %struct.a* %tmp5, i32 0, i32 4
%tmp11 = load i32* %X.addr, align 4
store i32 %tmp11, i32* %tmp10, align 4
Previously we produced 99 stores of zero for test9 and also tons for test10.
This xforms should substantially speed up -O0 builds when it kicks in as well
as reducing code size and optimizer heartburn on insane cases. This resolves
PR279.
llvm-svn: 120692
a global is larger than 32 bytes and has fewer than 6 non-zero values in the
initializer. Previously we'd turn something like this:
char test8(int X) {
char str[10000] = "abc";
into a 10K global variable which we then memcpy'd from. Now we generate:
%str = alloca [10000 x i8], align 16
%tmp = getelementptr inbounds [10000 x i8]* %str, i64 0, i64 0
call void @llvm.memset.p0i8.i64(i8* %tmp, i8 0, i64 10000, i32 16, i1 false)
store i8 97, i8* %tmp, align 16
%0 = getelementptr [10000 x i8]* %str, i64 0, i64 1
store i8 98, i8* %0, align 1
%1 = getelementptr [10000 x i8]* %str, i64 0, i64 2
store i8 99, i8* %1, align 2
Which is much smaller in space and also likely faster.
This is part of PR279
llvm-svn: 120645
I mistakenly thought that this was checking for vector name mangling, but
it is not. Since we're no longer wrapping Neon vectors in structs, this
test can just return a vector directly. There are already other tests for
that, so just to make this interesting, change the test to return a struct
of two vectors.
llvm-svn: 119434
assignment to volatiles in C. This in effect reverts some of mjs's
work in and around r72572. Basically, the C++ standard is quite
clear, except that it lies about volatile behavior approximating
C's, whereas the C standard is almost actively misleading.
llvm-svn: 119344
Return the result of a complex assignment with the original values,
not by performing a load from the l-value; this is the correct
semantics in C, although not in C++.
llvm-svn: 119037
- tags with C linkage should ignore visibility=hidden
- functions and variables with explicit visibility attributes should
ignore the linkage of their types
Either of these should be sufficient to fix PR8457.
Also, FileCheck-ize a test case.
llvm-svn: 117351
completes support for C1X anonymous struct/union init features:
* Indexed anonymous member initializers should not be expanded. Doing so makes
little sense and would cause unresolvable semantic ambiguity in valid code
(regression introduced by r69153).
* Subobject initialization of (possibly nested) anonymous members are now
referred to with paths relative to the naming record context, eliminating the
synthesis of incorrect implicit InitListExprs that caused CodeGen to assert.
* Field lookup was missing a null check in IdentifierInfo comparison which
caused lookup for a known (already resolved) field to match the first unnamed
data member it encountered leading to silent miscompilation.
* Subobject paths are no longer built using the general purpose
Sema::BuildAnonymousStructUnionMemberPath(). If any corner cases crop up, we
will now assert earlier in Sema instead of passing invalid InitListExprs
through to CodeGen.
Fixes PR6955, from Alp Toker!
llvm-svn: 116098
- Therefore, we can lower out the NEON wrapper structs and pass the vectors
directly. This makes a huge difference in the cleanliness of the IR after
optimization.
- I will trust, but verify, via future ABITest testing (for APCS-GNU, at
least).
llvm-svn: 114618
LHS and when conditional expression is an array. Since
it will be decayed, saved expression must be saved with
decayed expression. This is necessary to preserve semantics
of this extension (and prevent an IRGen crash which expects
an array to always be decayed). I am sure there will be other
cases in c++ (aggregate conditionals for example) when saving of the
expression must happen after some transformation on conditional
expression has happened.
Doug, please review. Fixes // rdar://8446940
llvm-svn: 114296