Commit Graph

35 Commits

Author SHA1 Message Date
JF Bastien 6508929da9 CodeGen: use non-zero memset when possible for automatic variables
Summary:
Right now automatic variables are either initialized with bzero followed by a few stores, or memcpy'd from a synthesized global. We end up encountering a fair amount of code where memcpy of non-zero byte patterns would be better than memcpy from a global because it touches less memory and generates a smaller binary. The optimizer could reason about this, but it's not really worth it when clang already knows.

This code could definitely be more clever but I'm not sure it's worth it. In particular we could track a histogram of bytes seen and figure out (as we do with bzero) if a memset could be followed by a handful of stores. Similarly, we could tune the heuristics for GlobalSize, but using the same as for bzero seems conservatively OK for now.

<rdar://problem/42563091>

Reviewers: dexonsmith

Subscribers: cfe-commits

Differential Revision: https://reviews.llvm.org/D49771

llvm-svn: 337887
2018-07-25 04:29:03 +00:00
Richard Smith 4c6568869e Fix typo causing assert in self-host.
llvm-svn: 337508
2018-07-19 23:24:41 +00:00
Richard Smith 83497d9ead When we choose to use zeroinitializer for a trailing portion of an array
constant, don't convert the rest into a packed struct.

If an array constant has a large non-zero portion and a large zero
portion, we want to emit the first part as an array and the rest as a
zeroinitializer if possible. This fixes a memory usage regression from
r333141 when compiling PHP.

llvm-svn: 337498
2018-07-19 21:38:56 +00:00
JF Bastien 9aab85a6a0 CodeGen: specify alignment + inbounds for automatic variable initialization
Summary: Automatic variable initialization was generating default-aligned stores (which are deprecated) instead of using the known alignment from the alloca. Further, they didn't specify inbounds.

Subscribers: dexonsmith, cfe-commits

Differential Revision: https://reviews.llvm.org/D49209

llvm-svn: 337041
2018-07-13 20:33:23 +00:00
Richard Smith 3e268632cf Use zeroinitializer for (trailing zero portion of) large array initializers
more reliably.

This re-commits r333044 with a fix for PR37560.

llvm-svn: 333141
2018-05-23 23:41:38 +00:00
Hans Wennborg 156349fa10 Revert r333044 "Use zeroinitializer for (trailing zero portion of) large array initializers"
It caused asserts, see PR37560.

> Use zeroinitializer for (trailing zero portion of) large array initializers
> more reliably.
>
> Clang has two different ways it emits array constants (from InitListExprs and
> from APValues), and both had some ability to emit zeroinitializer, but neither
> was able to catch all cases where we could use zeroinitializer reliably. In
> particular, emitting from an APValue would fail to notice if all the explicit
> array elements happened to be zero. In addition, for large arrays where only an
> initial portion has an explicit initializer, we would emit the complete
> initializer (which could be huge) rather than emitting only the non-zero
> portion. With this change, when the element would have a suffix of more than 8
> zero elements, we emit the array constant as a packed struct of its initial
> portion followed by a zeroinitializer constant for the trailing zero portion.
>
> In passing, I found a bug where SemaInit would sometimes walk the entire array
> when checking an initializer that only covers the first few elements; that's
> fixed here to unblock testing of the rest.
>
> Differential Revision: https://reviews.llvm.org/D47166

llvm-svn: 333067
2018-05-23 08:24:01 +00:00
Richard Smith 9062bbf419 Use zeroinitializer for (trailing zero portion of) large array initializers
more reliably.

Clang has two different ways it emits array constants (from InitListExprs and
from APValues), and both had some ability to emit zeroinitializer, but neither
was able to catch all cases where we could use zeroinitializer reliably. In
particular, emitting from an APValue would fail to notice if all the explicit
array elements happened to be zero. In addition, for large arrays where only an
initial portion has an explicit initializer, we would emit the complete
initializer (which could be huge) rather than emitting only the non-zero
portion. With this change, when the element would have a suffix of more than 8
zero elements, we emit the array constant as a packed struct of its initial
portion followed by a zeroinitializer constant for the trailing zero portion.

In passing, I found a bug where SemaInit would sometimes walk the entire array
when checking an initializer that only covers the first few elements; that's
fixed here to unblock testing of the rest.

Differential Revision: https://reviews.llvm.org/D47166

llvm-svn: 333044
2018-05-23 00:09:29 +00:00
Ivan A. Kosarev e0ef348cb9 [CodeGen] Initialize large arrays by copying from a global
Currently, clang compiles explicit initializers for array
elements into series of store instructions. For large arrays of
built-in types this results in bloated output code and
significant amount of time spent on the instruction selection
phase. This patch fixes the issue by initializing such arrays
with global constants that store the binary image of the
initializer.

Differential Revision: https://reviews.llvm.org/D43181

llvm-svn: 325478
2018-02-19 09:49:11 +00:00
Alexey Bataev 86a489e4f3 Fixed processing of GNU extensions to C99 designated initializers
Clang did not handles correctly inner parts of arrays/structures initializers in GNU extensions to C99 designated initializers. 

llvm-svn: 258668
2016-01-25 05:14:03 +00:00
David Blaikie bdf40a62a7 Test case updates for explicit type parameter to the gep operator
llvm-svn: 232187
2015-03-13 18:21:46 +00:00
David Blaikie 81e96f8192 Update test case to make it easier to automatically port to typeless pointer gep operator changes coming soon
llvm-svn: 232185
2015-03-13 18:21:11 +00:00
Richard Smith 00db2f13f8 PR20473: Don't "deduplicate" string literals with the same value but different
lengths! In passing, simplify string literal deduplication by relying on LLVM
to deduplicate the underlying constant values.

llvm-svn: 214222
2014-07-29 21:20:12 +00:00
Chandler Carruth ff0e3a1e1c Rework the bitfield access IR generation to address PR13619 and
generally support the C++11 memory model requirements for bitfield
accesses by relying more heavily on LLVM's memory model.

The primary change this introduces is to move from a manually aligned
and strided access pattern across the bits of the bitfield to a much
simpler lump access of all bits in the bitfield followed by math to
extract the bits relevant for the particular field.

This simplifies the code significantly, but relies on LLVM to
intelligently lowering these integers.

I have tested LLVM's lowering both synthetically and in benchmarks. The
lowering appears to be functional, and there are no really significant
performance regressions. Different code patterns accessing bitfields
will vary in how this impacts them. The only real regressions I'm seeing
are a few patterns where the LLVM code generation for loads that feed
directly into a mask operation don't take advantage of the x86 ability
to do a smaller load and a cheap zero-extension. This doesn't regress
any benchmark in the nightly test suite on my box past the noise
threshold, but my box is quite noisy. I'll be watching the LNT numbers,
and will look into further improvements to the LLVM lowering as needed.

llvm-svn: 169489
2012-12-06 11:14:44 +00:00
Benjamin Kramer 46cbe77b49 CodeGen: When emitting stores for an initializer, only emit a GEP if we really need the store.
This avoids emitting many dead GEPs for large zero-initialized arrays.

llvm-svn: 162701
2012-08-27 21:35:58 +00:00
Eli Friedman cb3785e450 Fix a stupid mistake in r151133. Reported to me by Joerg Sonnenberger.
llvm-svn: 151407
2012-02-24 23:53:49 +00:00
Chris Lattner 8806e32f16 Rename CGT::VerifyFuncTypeComplete to isFuncTypeConvertible since
it is a predicate, not an action.  Change the return type to be a bool,
not the incomplete member.  Enhace it to detect the recursive compilation
case, allowing us to compile Eli's testcase on llvmdev:

struct T {
 struct T (*p)(void);
} t;

into:

%struct.T = type { {}* }

@t = common global %struct.T zeroinitializer, align 8

llvm-svn: 134853
2011-07-10 00:18:59 +00:00
Chris Lattner b0ed51da10 implement a tiny amount of codegen support for gnu array range
designators: allowing codegen when the element initializer is a
constant or something else without a side effect.  This unblocks
enough to let process.c in the linux kernel build, PR9257.

llvm-svn: 126056
2011-02-19 22:28:58 +00:00
Chris Lattner 27a3631bac Improve codegen for initializer lists to use memset more aggressively
when an initializer is variable (I handled the constant case in a previous
patch).  This has three pieces:

1. Enhance AggValueSlot to have a 'isZeroed' bit to tell CGExprAgg that
   the memory being stored into has previously been memset to zero.
2. Teach CGExprAgg to not emit stores of zero to isZeroed memory.
3. Teach CodeGenFunction::EmitAggExpr to scan initializers to determine
   whether they are profitable to emit a memset + inividual stores vs
   stores for everything.

The heuristic used is that a global has to be more than 16 bytes and
has to be 3/4 zero to be candidate for this xform.  The two testcases
are illustrative of the scenarios this catches.  We now codegen test9 into:

 call void @llvm.memset.p0i8.i64(i8* %0, i8 0, i64 400, i32 4, i1 false)
 %.array = getelementptr inbounds [100 x i32]* %Arr, i32 0, i32 0
 %tmp = load i32* %X.addr, align 4
 store i32 %tmp, i32* %.array

and test10 into:

  call void @llvm.memset.p0i8.i64(i8* %0, i8 0, i64 392, i32 8, i1 false)
  %tmp = getelementptr inbounds %struct.b* %S, i32 0, i32 0
  %tmp1 = getelementptr inbounds %struct.a* %tmp, i32 0, i32 0
  %tmp2 = load i32* %X.addr, align 4
  store i32 %tmp2, i32* %tmp1, align 4
  %tmp5 = getelementptr inbounds %struct.b* %S, i32 0, i32 3
  %tmp10 = getelementptr inbounds %struct.a* %tmp5, i32 0, i32 4
  %tmp11 = load i32* %X.addr, align 4
  store i32 %tmp11, i32* %tmp10, align 4

Previously we produced 99 stores of zero for test9 and also tons for test10.
This xforms should substantially speed up -O0 builds when it kicks in as well
as reducing code size and optimizer heartburn on insane cases.  This resolves
PR279.

llvm-svn: 120692
2010-12-02 07:07:26 +00:00
Chris Lattner e6af88628f Enhance the init generation logic to emit a memset followed by a few stores when
a global is larger than 32 bytes and has fewer than 6 non-zero values in the
initializer.  Previously we'd turn something like this:

char test8(int X) {
  char str[10000] = "abc";

into a 10K global variable which we then memcpy'd from.  Now we generate:

  %str = alloca [10000 x i8], align 16
  %tmp = getelementptr inbounds [10000 x i8]* %str, i64 0, i64 0
  call void @llvm.memset.p0i8.i64(i8* %tmp, i8 0, i64 10000, i32 16, i1 false)
  store i8 97, i8* %tmp, align 16
  %0 = getelementptr [10000 x i8]* %str, i64 0, i64 1
  store i8 98, i8* %0, align 1
  %1 = getelementptr [10000 x i8]* %str, i64 0, i64 2
  store i8 99, i8* %1, align 2

Which is much smaller in space and also likely faster.

This is part of PR279

llvm-svn: 120645
2010-12-02 01:58:41 +00:00
Chris Lattner 001b29ccc1 Allow a string literal to initialize a tail array (PR8217), patch
by Pierre Habouzit!

llvm-svn: 116165
2010-10-10 17:49:49 +00:00
John McCall 11086fcb65 Don't consider casted non-global pointers to be evaluatable.
Fixes rdar://problem/8154689

llvm-svn: 107755
2010-07-07 05:08:32 +00:00
Nuno Lopes 247a138ec6 recommit r101568 to fix PR6766
as a side-effect, remove two FIXMEs now fixed

llvm-svn: 101726
2010-04-18 19:06:43 +00:00
Chris Lattner b714a4b4a0 revert r101568, which miscompiles this testcase, distilled from ldecod:
void exit_picture()
{
  char yuv_types[4][6]= {"4:0:0","4:2:0","4:2:2","4:4:4"};
  foo(yuv_types);
}

llvm-svn: 101623
2010-04-17 06:53:44 +00:00
Nuno Lopes 74b595256a fix PR6766: codegen of var initialized with wide char
llvm-svn: 101568
2010-04-16 23:19:41 +00:00
Chris Lattner e18aaf2c4b add a codegen hack to work around an AST bug, allowing us to compile the
code in PR6537.  This should be reverted when the ast bug is fixed.

llvm-svn: 97981
2010-03-08 21:08:07 +00:00
Daniel Dunbar 8fbe78f6fc Update tests to use %clang_cc1 instead of 'clang-cc' or 'clang -cc1'.
- This is designed to make it obvious that %clang_cc1 is a "test variable"
   which is substituted. It is '%clang_cc1' instead of '%clang -cc1' because it
   can be useful to redefine what gets run as 'clang -cc1' (for example, to set
   a default target).

llvm-svn: 91446
2009-12-15 20:14:24 +00:00
Daniel Dunbar 8b57697954 Eliminate &&s in tests.
- 'for i in $(find . -type f); do sed -e 's#\(RUN:.*[^ ]\) *&& *$#\1#g' $i | FileUpdate $i; done', for the curious.

llvm-svn: 86430
2009-11-08 01:45:36 +00:00
Daniel Dunbar a45cf5b6b0 Rename clang to clang-cc.
Tests and drivers updated, still need to shuffle dirs.

llvm-svn: 67602
2009-03-24 02:24:46 +00:00
Daniel Dunbar f9f039865f Set constant bit on static block vars as well. Patch by Anders Johnson!q
llvm-svn: 64502
2009-02-13 22:58:39 +00:00
Douglas Gregor d42a0fb41b Upgrade the "excess elements in array initializer" warning to an
error, since both C99 and C++ consider it an error. For reference, GCC
makes this a warning while G++ makes it an error.

llvm-svn: 63435
2009-01-30 22:26:29 +00:00
Chris Lattner 033352b036 add a testcase
llvm-svn: 50608
2008-05-04 00:56:25 +00:00
Lauro Ramos Venancio e2162c6549 Simplify aggregate initilizer implementation. Use the CodeGenModule::EmitConstantExpr method when
possible.
Fix mediabench/mpeg2/mpeg2dec test.

llvm-svn: 47336
2008-02-19 19:27:31 +00:00
Lauro Ramos Venancio dec89733a7 Implement multi-dimension array initalizer.
Fix McCat/08-main test.

llvm-svn: 47286
2008-02-18 22:44:02 +00:00
Anders Carlsson 0674a7417f Correctly handle constants that refer to enums.
llvm-svn: 46481
2008-01-29 01:28:48 +00:00
Anders Carlsson 6f2a10e8c9 Correctly handle scalars in braces.
llvm-svn: 46480
2008-01-29 01:15:48 +00:00