on the example in PR4216. This doesn't trigger in the testsuite,
so I'd really appreciate someone scrutinizing the logic for
correctness.
llvm-svn: 92458
when a consequtive sequence of elements all satisfies the
predicate. Like the double compare case, this generates better
code than the magic constant case and generalizes to more than
32/64 element array lookups.
Here are some examples where it triggers. From 403.gcc, most
accesses to the rtx_class array are handled, e.g.:
@rtx_class = constant [153 x i8] c"xxxxxmmmmmmmmxxxxxxxxxxxxmxxxxxxiiixxxxxxxxxxxxxxxxxxxooxooooooxxoooooox3x2c21c2222ccc122222ccccaaaaaa<<<<<<<<<<<<<<<<<<111111111111bbooxxxxxxxxxxcc2211x", align 32 ; <[153 x i8]*> [#uses=547]
%142 = icmp eq i8 %141, 105
@rtx_class = constant [153 x i8] c"xxxxxmmmmmmmmxxxxxxxxxxxxmxxxxxxiiixxxxxxxxxxxxxxxxxxxooxooooooxxoooooox3x2c21c2222ccc122222ccccaaaaaa<<<<<<<<<<<<<<<<<<111111111111bbooxxxxxxxxxxcc2211x", align 32 ; <[153 x i8]*> [#uses=543]
%165 = icmp eq i8 %164, 60
Also, most of the 59-element arrays (mode_class/rid_to_yy, etc)
optimized before are actually range compares. This lets 32-bit
machines optimize them.
400.perlbmk has stuff like this:
400.perlbmk: PL_regkind, even for 32-bit:
@PL_regkind = constant [62 x i8] c"\00\00\02\02\02\06\06\06\06\09\09\0B\0B\0D\0E\0E\0E\11\12\12\14\14\16\16\18\18\1A\1A\1C\1C\1E\1F !!!$$&'((((,-.///88886789:;8$", align 32 ; <[62 x i8]*> [#uses=4]
%811 = icmp ne i8 %810, 33
@PL_utf8skip = constant [256 x i8] c"\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\02\02\02\02\02\02\02\02\02\02\02\02\02\02\02\02\02\02\02\02\02\02\02\02\02\02\02\02\02\02\02\02\03\03\03\03\03\03\03\03\03\03\03\03\03\03\03\03\04\04\04\04\04\04\04\04\05\05\05\05\06\06\07\0D", align 32 ; <[256 x i8]*> [#uses=94]
%12 = icmp ult i8 %10, 2
etc.
llvm-svn: 92426
two elements match or don't match with two comparisons. For
example, the testcase compiles into:
define i1 @test5(i32 %X) {
%1 = icmp eq i32 %X, 2 ; <i1> [#uses=1]
%2 = icmp eq i32 %X, 7 ; <i1> [#uses=1]
%R = or i1 %1, %2 ; <i1> [#uses=1]
ret i1 %R
}
This generalizes the previous xforms when the array is larger than
64 elements (and this case matches) and generates better code for
cases where it overlaps with the magic bitshift case.
This generalizes more cases than you might expect. For example,
400.perlbmk has:
@PL_utf8skip = constant [256 x i8] c"\01\01\01\...
%15 = icmp ult i8 %7, 7
403.gcc has:
@rid_to_yy = internal constant [114 x i16] [i16 259, i16 260, ...
%18 = icmp eq i16 %16, 295
and xalancbmk has a bunch of examples, such as
_ZN11xercesc_2_5L15gCombiningCharsE and _ZN11xercesc_2_5L10gBaseCharsE.
llvm-svn: 92417
arrays with variable indices into a comparison of the index
with a constant. The most common occurrence of this that
I see by far is stuff like:
if ("foobar"[i] == '\0') ...
which we compile into: if (i == 6), saving a load and
materialization of the global address. This also exposes
loop trip count information to later passes in many cases.
This triggers hundreds of times in xalancbmk, which is where I first
noticed it, but it also triggers in many other apps. Here are a few
interesting ones from various apps:
@must_be_connected_without = internal constant [8 x i8*] [i8* getelementptr inbounds ([3 x i8]* @.str64320, i64 0, i64 0), i8* getelementptr inbounds ([3 x i8]* @.str27283, i64 0, i64 0), i8* getelementptr inbounds ([4 x i8]* @.str71327, i64 0, i64 0), i8* getelementptr inbounds ([4 x i8]* @.str72328, i64 0, i64 0), i8* getelementptr inbounds ([3 x i8]* @.str18274, i64 0, i64 0), i8* getelementptr inbounds ([6 x i8]* @.str11267, i64 0, i64 0), i8* getelementptr inbounds ([3 x i8]* @.str32288, i64 0, i64 0), i8* null], align 32 ; <[8 x i8*]*> [#uses=2]
%scevgep.i = getelementptr [8 x i8*]* @must_be_connected_without, i64 0, i64 %indvar.i ; <i8**> [#uses=1]
%17 = load ...
%18 = icmp eq i8* %17, null ; <i1> [#uses=1]
-> icmp eq i64 %indvar.i, 7
@yytable1095 = internal constant [84 x i8] c"\12\01(\05\06\07\08\09\0A\0B\0C\0D\0E1\0F\10\11266\1D: \10\11,-,0\03'\10\11B6\04\17&\18\1945\05\06\07\08\09\0A\0B\0C\0D\0E\1E\0F\10\11*\1A\1B\1C$3+>#%;<IJ=ADFEGH9KL\00\00\00C", align 32 ; <[84 x i8]*> [#uses=2]
%57 = getelementptr inbounds [84 x i8]* @yytable1095, i64 0, i64 %56 ; <i8*> [#uses=1]
%mode.0.in = getelementptr inbounds [9 x i32]* @mb_mode_table, i64 0, i64 %.pn ; <i32*> [#uses=1]
load ...
%64 = icmp eq i8 %58, 4 ; <i1> [#uses=1]
-> icmp eq i64 %.pn, 35 ; <i1> [#uses=0]
@gsm_DLB = internal constant [4 x i16] [i16 6554, i16 16384, i16 26214, i16 32767]
%scevgep.i = getelementptr [4 x i16]* @gsm_DLB, i64 0, i64 %indvar.i ; <i16*> [#uses=1]
%425 = load %scevgep.i
%426 = icmp eq i16 %425, -32768 ; <i1> [#uses=0]
-> false
llvm-svn: 92411
pointer to int casts that confuse later optimizations. See PR3351
for details.
This improves but doesn't complete fix 483.xalancbmk because llvm-gcc
does this xform in GCC's "fold" routine as well. Clang++ will do
better I guess.
llvm-svn: 92408
positive and negative forms of constants together. This
allows us to compile:
int foo(int x, int y) {
return (x-y) + (x-y) + (x-y);
}
into:
_foo: ## @foo
subl %esi, %edi
leal (%rdi,%rdi,2), %eax
ret
instead of (where the 3 and -3 were not factored):
_foo:
imull $-3, 8(%esp), %ecx
imull $3, 4(%esp), %eax
addl %ecx, %eax
ret
this started out as:
movl 12(%ebp), %ecx
imull $3, 8(%ebp), %eax
subl %ecx, %eax
subl %ecx, %eax
subl %ecx, %eax
ret
This comes from PR5359.
llvm-svn: 92381
non-templated IRBuilderBase class. Move that large CreateGlobalString
out of line, eliminating the need to #include GlobalVariable.h in IRBuilder.h
llvm-svn: 92227
SDISel. This optimization was causing simplifylibcalls to
introduce type-unsafe nastiness. This is the first step, I'll be
expanding the memcmp optimizations shortly, covering things that
we really really wouldn't want simplifylibcalls to do.
llvm-svn: 92098
load is needed when we have a small store into a large alloca (at which
point we get a load/insert/store sequence), but when you do a full-sized
store, this load ends up being dead.
This dead load is bad in really large nasty testcases where the load ends
up causing mem2reg to insert large chains of dependent phi nodes which only
ADCE can delete. Instead of doing this, just don't insert the dead load.
This fixes rdar://6864035
llvm-svn: 91917
missing check that an array reference doesn't go past the end of the array,
and remove some redundant checks for in-bound array and vector references
that are no longer needed.
llvm-svn: 91897
by merging all returns in a function into a single one, but simplifycfg
currently likes to duplicate the return (an unfortunate choice!)
llvm-svn: 91890
instead of stored. This reduces memdep memory usage, and also eliminates a bunch of
weakvh's. This speeds up gvn on gcc.c-torture/20001226-1.c from 23.9s to 8.45s (2.8x)
on a different machine than earlier.
llvm-svn: 91885
load to avoid even messing around with SSAUpdate at all. In this case (which
is very common, we can just use the input value directly).
This speeds up GVN time on gcc.c-torture/20001226-1.c from 36.4s to 16.3s,
which still isn't great, but substantially better and this is a simple speedup
that applies to lots of different cases.
llvm-svn: 91851
two-element arrays. After restructuring the SROA code, it was not safe to
do this without adding more checking. It is not clear that this special-case
has really been useful, and removing this simplifies the code quite a bit.
llvm-svn: 91828
implement some optimizations for MIN(MIN()) and MAX(MAX()) and
MIN(MAX()) etc. This substantially improves the code in PR5822 but
doesn't kick in much elsewhere. 2 max's were optimized in
pairlocalalign and one in smg2000.
llvm-svn: 91814
Use the presence of NSW/NUW to fold "icmp (x+cst), x" to a constant in
cases where it would otherwise be undefined behavior.
Surprisingly (to me at least), this triggers hundreds of the times in
a few benchmarks: lencode, ldecode, and 466.h264ref seem to *really*
like this.
llvm-svn: 91812
where instcombine would have to split a critical edge due to a
phi node of an invoke. Since instcombine can't change the CFG,
it has to bail out from doing the transformation.
llvm-svn: 91763
* change FindElementAndOffset to return a uint64_t instead of unsigned, and
to identify the type to be used for that result in a GEP instruction.
* move "isa<ConstantInt>" to be first in conditional.
* replace some dyn_casts with casts.
* add a comment about handling mem intrinsics.
llvm-svn: 91762
contains another loop, or an instruction. The loop form is
substantially more efficient on large loops than the typical
code it replaces.
llvm-svn: 91654
of 91296 that caused trouble -- the Processed list needs to be
preserved for the livetime of the pass, as AddUsersIfInteresting
is called from other passes.
llvm-svn: 91641
problem", this broke llvm-gcc bootstrap for release builds on
x86_64-apple-darwin10.
This reverts commit db22309800b224a9f5f51baf76071d7a93ce59c9.
llvm-svn: 91534
found last time. Instead of trying to modify the IR while iterating over it,
I've change it to keep a list of WeakVH references to dead instructions, and
then delete those instructions later. I also added some special case code to
detect and handle the situation when both operands of a memcpy intrinsic are
referencing the same alloca.
llvm-svn: 91459
isPodLike type trait. This is a generally useful type trait for
more than just DenseMap, and we really care about whether something
acts like a pod, not whether it really is a pod.
llvm-svn: 91421
While scanning through the uses of an alloca, keep track of the current offset
relative to the start of the alloca, and check memory references to see if
the offset & size correspond to a component within the alloca. This has the
nice benefit of unifying much of the code from isSafeUseOfAllocation,
isSafeElementUse, and isSafeUseOfBitCastedAllocation. The code to rewrite
the uses of a promoted alloca, after it is determined to be safe, is
reorganized in the same way.
Also, when rewriting GEP instructions, mark them as "in-bounds" since all the
indices are known to be safe.
llvm-svn: 91184
value size. This only manifested when memdep inprecisely returns clobber,
which is do to a caching issue in the PR5744 testcase. We can 'efficiently
emulate' this by using '-no-aa'
llvm-svn: 91004
phi translation of complex expressions like &A[i+1]. This has the
following benefits:
1. The phi translation logic is all contained in its own class with
a strong interface and verification that it is self consistent.
2. The logic is more correct than before. Previously, if intermediate
expressions got PHI translated, we'd miss the update and scan for
the wrong pointers in predecessor blocks. @phi_trans2 is a testcase
for this.
3. We have a lot less code in memdep.
We can handle phi translation across blocks of things like @phi_trans3,
which is pretty insane :).
This patch should fix the miscompiles of 255.vortex, and I tested it
with a bootstrap of llvm-gcc, llvm-test and dejagnu of course.
llvm-svn: 90926
handle cases like this:
void test(int N, double* G) {
long j;
for (j = 1; j < N - 1; j++)
G[j+1] = G[j] + G[j+1];
}
where G[1] isn't live into the loop.
llvm-svn: 90041
array indexes. The "complex" case of SRoA still handles them, and correctly.
This fixes a weirdness where we'd correctly avoid transforming A[0][42] if
the 42 was too large, but we'd only do it if it was one gep, not two separate
ones.
llvm-svn: 90007
generates store to undef and some generates store to null as the idiom
for undefined behavior. Since simplifycfg zaps both, don't remove the
undefined behavior in instcombine.
llvm-svn: 89971
tests/Transforms/InstCombine/shufflemask-undef.ll. If
anyone cares, the use of 2*e here (and the equivalent
all over the place in instcombine) seems wrong, though
harmless: it should really be twice the length of the
input vector. I think shufflevector used to require
that the mask have the same length as the input, but I
don't think that's true any more. I don't care enough
about vectors to do anything about this...
llvm-svn: 89456
they are lowered to instruction sequences more complex than a simple
load, such that CodeGen cannot rematerialize them, a reload from a
spill slot is likely to be cheaper than the complex sequence.
llvm-svn: 89374
cannot be folded into target cmp instruction.
- Avoid a phase ordering issue where early cmp optimization would prevent the
later count-to-zero optimization.
- Add missing checks which could cause LSR to reuse stride that does not have
users.
- Fix a bug in count-to-zero optimization code which failed to find the pre-inc
iv's phi node.
- Remove, tighten, loosen some incorrect checks disable valid transformations.
- Quite a bit of code clean up.
llvm-svn: 86969
making the new LVI stuff smart enough to subsume some special
cases in the old code. Disable them when LVI is around, the
testcase still passes.
llvm-svn: 86951
start using them in a trivial way when -enable-jump-threading-lvi
is passed. enable-jump-threading-lvi will be my playground for
awhile.
llvm-svn: 86789
debug intrinsics, and an unconditional branch when possible. This
reuses the TryToSimplifyUncondBranchFromEmptyBlock function split
out of simplifycfg.
llvm-svn: 86722
just one level deep. On the testcase we go from getting this:
F1: ; preds = %T2
%F = and i1 true, %cond ; <i1> [#uses=1]
br i1 %F, label %X, label %Y
to a fully threaded:
F1: ; preds = %T2
br label %Y
This changes gets us to the point where we're forming (too many) switch
instructions on doug's strswitch testcase.
llvm-svn: 86646
except that the result may not be a constant. Switch jump threading to
use it so that it gets things like (X & 0) -> 0, which occur when phi preds
are deleted and the remaining phi pred was a zero.
llvm-svn: 86637
This patch forbids implicit conversion of DenseMap::const_iterator to
DenseMap::iterator which was possible because DenseMapIterator inherited
(publicly) from DenseMapConstIterator. Conversion the other way around is now
allowed as one may expect.
The template DenseMapConstIterator is removed and the template parameter
IsConst which specifies whether the iterator is constant is added to
DenseMapIterator.
Actually IsConst parameter is not necessary since the constness can be
determined from KeyT but this is not relevant to the fix and can be addressed
later.
Patch by Victor Zverovich!
llvm-svn: 86636
here:
1) We need to avoid processing sigma nodes as phi nodes for constraint generation.
2) We need to generate constraints for comparisons against constants properly.
This includes our first working ABCD test!
llvm-svn: 86498
graphs being produced. The cause was that we were incorrectly marking sigma instructions as
processed after handling the sigma-specific constraints for them, potentially neglecting to
process them as normal instructions as well.
Unfortunately, the testcase that inspired this still doesn't work because of a bug in the solver,
which is next on the list to debug.
llvm-svn: 86486
when both the source and dest are illegal types, since it would cause
the phi to grow (for example, we shouldn't transform test14b's phi to
a phi on i320). This fixes an infinite loop on i686 bootstrap with
phi slicing turned on, so turn it back on.
llvm-svn: 86483
not turn a PHI in a legal type into a PHI of an illegal type, and
add a new optimization that breaks up insane integer PHI nodes into
small pieces (PR3451).
llvm-svn: 86443
(eliminating some extends) if the new type of the
computation is legal or if both the source and dest
are illegal. This prevents instcombine from changing big
chains of computation into i64 on 32-bit targets for
example.
llvm-svn: 86398
predicates. This allows us to jump thread things like:
_ZN12StringSwitchI5ColorE4CaseILj7EEERS1_RAT__KcRKS0_.exit119:
%tmp1.i24166 = phi i8 [ 1, %bb5.i117 ], [ %tmp1.i24165, %_Z....exit ], [ %tmp1.i24165, %bb4.i114 ]
%toBoolnot.i87 = icmp eq i8 %tmp1.i24166, 0 ; <i1> [#uses=1]
%tmp4.i90 = icmp eq i32 %tmp2.i, 6 ; <i1> [#uses=1]
%or.cond173 = and i1 %toBoolnot.i87, %tmp4.i90 ; <i1> [#uses=1]
br i1 %or.cond173, label %bb4.i96, label %_ZN12...
Where it is "obvious" that when coming from %bb5.i117 that the 'and' is always
false. This triggers a surprisingly high number of times in the testsuite,
and gets us closer to generating good code for doug's strswitch testcase.
This also make a bunch of other code in jump threading redundant, I'll rip
out in the next patch. This survived an enable-checking llvm-gcc bootstrap.
llvm-svn: 86264
to EmitGEPOffset.
Implement some new transforms for optimizing
subtracts of two pointer to ints into the same vector. This happens
for C++ iterator idioms for example, stringmap takes a const char*
that points to the start and end of a string. Once inlined, we want
the pointer difference to turn back into a length.
This is rdar://7362831.
llvm-svn: 86021
functions that don't have local linkage. Basically, we need to be more
careful about propagating argument information to functions whose results
we aren't tracking. This fixes a miscompilation of
LLVMCConfigurationEmitter.cpp when built with an llvm-gcc that has ipsccp
enabled.
llvm-svn: 85923
function to calls of that function, regardless of whether it has local
linkage or has its address taken. Not escaping should only affect
whether we make an aggressive assumption about the arguments to a
function, not whether we can track the result of it.
llvm-svn: 85795
a DenseMap. Doing this required being aware of subtle iterator
invalidation issues, but it provides a big speedup. In a
release-asserts build, this sped up optimizing 403.gcc from
1.34s -> 0.79s (IPSCCP) and 1.11s -> 0.44s (SCCP).
This commit also conflates in a bunch of general cleanups, sorry.
llvm-svn: 85788
phis, it didn't preserve the alignment of the load. This is a missed
optimization of the alignment is high and a miscompilation when the
alignment is low.
llvm-svn: 85736
PHI operands by the predecessor order, sort them by the order used by the
first PHI in the block. This is still suffucient to expose duplicates.
llvm-svn: 85634
Checks on Demand algorithm which looks at arbitrary branches instead of loop
iterations. This is GSoC work by Andre Tavares with only editorial changes
applied!
llvm-svn: 85382