a union containing a vector and an array whose elements were smaller than
the vector elements. this means we need to compile the load of the
array elements into an extract element plus a truncate.
llvm-svn: 47752
or getTypeSizeInBits as appropriate in ScalarReplAggregates.
The right change to make was not always obvious, so it would
be good to have an sroa guru review this. While there I noticed
some bugs, and fixed them: (1) arrays of x86 long double have
holes due to alignment padding, but this wasn't being spotted
by HasStructPadding (renamed to HasPadding). The same goes
for arrays of oddly sized ints. Vectors also suffer from this,
in fact the problem for vectors is much worse because basic
vector assumptions seem to be broken by vectors of type with
alignment padding. I didn't try to fix any of these vector
problems. (2) The code for extracting smaller integers from
larger ones (in the "int union" case) was wrong on big-endian
machines for integers with size not a multiple of 8, like i1.
Probably this is impossible to hit via llvm-gcc, but I fixed
it anyway while there and added a testcase. I also got rid of
some trailing whitespace and changed a function name which
had an obvious typo in it.
llvm-svn: 43672
copies from a constant global, then we can change the reads to read from the
global instead of from the alloca. This eliminates the alloca and the memcpy,
and promotes secondary optimizations (because the loads are now loads from
a constant global).
This is important for a common C idiom:
void foo() {
int A[] = {1,2,3,4,5,6,7,8,9...};
... only reads of A ...
}
For some reason, people forget to mark the array static or const.
This triggers on these multisource benchmarks:
JM/ldecode: block_pos, [3 x [4 x [4 x i32]]]
FreeBench/mason: m, [18 x i32], inlined 4 times
MiBench/office-stringsearch: search_strings, [1332 x i8*]
MiBench/office-stringsearch: find_strings, [1333 x i8*]
Prolangs-C++/city: dirs, [9 x i8*], inlined 4 places
and these spec benchmarks:
177.mesa: message, [8 x [32 x i8]]
186.crafty: bias_rl45, [64 x i32]
186.crafty: diag_sq, [64 x i32]
186.crafty: empty, [9 x i8]
186.crafty: xlate, [15 x i8]
186.crafty: status, [13 x i8]
186.crafty: bdinfo, [25 x i8]
445.gobmk: routines, [16 x i8*]
458.sjeng: piece_rep, [14 x i8*]
458.sjeng: t, [13 x i32], inlined 4 places.
464.h264ref: block8x8_idx, [3 x [4 x [4 x i32]]]
464.h264ref: block_pos, [3 x [4 x [4 x i32]]]
464.h264ref: j_off_tab, [12 x i32]
This implements Transforms/ScalarRepl/memcpy-from-global.ll
llvm-svn: 36429
We now tolerate small amounts of undefined behavior, better emulating what
would happen if the transaction actually occurred in memory. This fixes
SingleSource/UnitTests/2007-04-10-BitfieldTest.c on PPC, at least until
Devang gets a chance to fix the CFE from doing undefined things with bitfields :)
llvm-svn: 35875
scalarrepl things down to elements, but mem2reg can't promote elements that
are memset/memcpy'd. Until then, the code is disabled "0 &&".
llvm-svn: 34924
This feature is needed in order to support shifts of more than 255 bits
on large integer types. This changes the syntax for llvm assembly to
make shl, ashr and lshr instructions look like a binary operator:
shl i32 %X, 1
instead of
shl i32 %X, i8 1
Additionally, this should help a few passes perform additional optimizations.
llvm-svn: 33776