plus some variations of this. According to my auto-simplifier this occurs a lot
but usually in combination with max/min idioms. Because max/min aren't handled
yet this unfortunately doesn't have much effect in the testsuite.
llvm-svn: 125462
It caused a crash in MultiSource/Benchmarks/Bullet.
Opt hit an assertion with "opt -std-compile-opts" because
Constant::getAllOnesValue doesn't know how to handle floats.
This patch added a test to reproduce the problem and a check that the
destination vector is of integer type.
Thank you Benjamin!
llvm-svn: 125459
gep to explicit addressing, we know that none of the intermediate
computation overflows.
This could use review: it seems that the shifts certainly wouldn't
overflow, but could the intermediate adds overflow if there is a
negative index?
Previously the testcase would instcombine to:
define i1 @test(i64 %i) {
%p1.idx.mask = and i64 %i, 4611686018427387903
%cmp = icmp eq i64 %p1.idx.mask, 1000
ret i1 %cmp
}
now we get:
define i1 @test(i64 %i) {
%cmp = icmp eq i64 %i, 1000
ret i1 %cmp
}
llvm-svn: 125271
exact/nsw/nuw shifts and have instcombine infer them when it can prove
that the relevant properties are true for a given shift without them.
Also, a variety of refactoring to use the new patternmatch logic thrown
in for good luck. I believe that this takes care of a bunch of related
code quality issues attached to PR8862.
llvm-svn: 125267
optimizations to be much more aggressive in the face of
exact/nsw/nuw div and shifts. For example, these (which
are the same except the first is 'exact' sdiv:
define i1 @sdiv_icmp4_exact(i64 %X) nounwind {
%A = sdiv exact i64 %X, -5 ; X/-5 == 0 --> x == 0
%B = icmp eq i64 %A, 0
ret i1 %B
}
define i1 @sdiv_icmp4(i64 %X) nounwind {
%A = sdiv i64 %X, -5 ; X/-5 == 0 --> x == 0
%B = icmp eq i64 %A, 0
ret i1 %B
}
compile down to:
define i1 @sdiv_icmp4_exact(i64 %X) nounwind {
%1 = icmp eq i64 %X, 0
ret i1 %1
}
define i1 @sdiv_icmp4(i64 %X) nounwind {
%X.off = add i64 %X, 4
%1 = icmp ult i64 %X.off, 9
ret i1 %1
}
This happens when you do something like:
(ptr1-ptr2) == 42
where the pointers are pointers to non-unit types.
llvm-svn: 125266
could end up removing a different function than we intended because it was
functionally equivalent, then end up with a comparison of a function against
itself in the next round of comparisons (the one in the function set and the
one on the deferred list). To fix this, I introduce a choice in the form of
comparison for ComparableFunctions, either normal or "pointer only" used to
find exact Function*'s in lookups.
Also add some debugging statements.
llvm-svn: 125180
auto-simplifier). This has a big impact on Ada code, but not much else.
Unfortunately the impact is mostly negative! This is due to PR9004 (aka
SCCP failing to resolve conditional branch conditions in the destination
blocks of the branch), in which simple correlated expressions are not
resolved but complicated ones are, so simplifying has a bad effect!
llvm-svn: 124788
overflow (nsw flag), which was disabled because it breaks 254.gap. I have
informed the GAP authors of the mistake in their code, and arranged for the
testsuite to use -fwrapv when compiling this benchmark.
llvm-svn: 124746
This makes the job of the later optzn passes easier, allowing the vast amount of
icmp transforms to chew on it.
We transform 840 switches in gcc.c, leading to a 16k byte shrink of the resulting
binary on i386-linux.
The testcase from README.txt now compiles into
decl %edi
cmpl $3, %edi
sbbl %eax, %eax
andl $1, %eax
ret
llvm-svn: 124724
to do this and more, but would only do it if X/Y had only one use. Spotted as the
most common missed simplification in SPEC by my auto-simplifier, now that it knows
about nuw/nsw/exact flags. This removes a bunch of multiplications from 447.dealII
and 483.xalancbmk. It also removes a lot from tramp3d-v4, which results in much
more inlining.
llvm-svn: 124560
benchmarks, and that it can be simplified to X/Y. (In general you can only
simplify (Z*Y)/Y to Z if the multiplication did not overflow; if Z has the
form "X/Y" then this is the case). This patch implements that transform and
moves some Div logic out of instcombine and into InstructionSimplify.
Unfortunately instcombine gets in the way somewhat, since it likes to change
(X/Y)*Y into X-(X rem Y), so I had to teach instcombine about this too.
Finally, thanks to the NSW/NUW flags, sometimes we know directly that "Z*Y"
does not overflow, because the flag says so, so I added that logic too. This
eliminates a bunch of divisions and subtractions in 447.dealII, and has good
effects on some other benchmarks too. It seems to have quite an effect on
tramp3d-v4 but it's hard to say if it's good or bad because inlining decisions
changed, resulting in massive changes all over.
llvm-svn: 124487
operand being factorized (and erased) could occur several times in Ops,
resulting in freed memory being used when the next occurrence in Ops was
analyzed.
llvm-svn: 124287
optimized code are:
(non-negative number)+(power-of-two) != 0 -> true
and
(x | 1) != 0 -> true
Instcombine knows about the second one of course, but only does it if X|1
has only one use. These fire thousands of times in the testsuite.
llvm-svn: 124183
occurs because instcombine sinks loads and inserts phis. This kicks in
on such apps as 175.vpr, eon, 403.gcc, xalancbmk and a bunch of times in
spec2006 in some app that uses std::deque.
This resolves the last of rdar://7339113.
llvm-svn: 124090
common cases. This triggers a surprising number of times in SPEC2K6
because min/max idioms end up doing this. For example, code from the
STL ends up looking like this to SRoA:
%202 = load i64* %__old_size, align 8, !tbaa !3
%203 = load i64* %__old_size, align 8, !tbaa !3
%204 = load i64* %__n, align 8, !tbaa !3
%205 = icmp ult i64 %203, %204
%storemerge.i = select i1 %205, i64* %__n, i64* %__old_size
%206 = load i64* %storemerge.i, align 8, !tbaa !3
We can now promote both the __n and the __old_size allocas.
This addresses another chunk of rdar://7339113, poor codegen on
stringswitch.
llvm-svn: 124088
that have PHI or select uses of their element pointers. This can often happen
when instcombine sinks two loads into a successor, inserting a phi or select.
With this patch, we can scalarize the alloca, but the pinned elements are not
yet promoted. This is still a win for large aggregates where only one element
is used. This fixes rdar://8904039 and part of rdar://7339113 (poor codegen
on stringswitch).
llvm-svn: 124070
a select. A vector select is pairwise on each element so we'd need a new
condition with the right number of elements to select on. Fixes PR8994.
llvm-svn: 123963
While here, I'd like to complain about how vector is not an aggregate type
according to llvm::Type::isAggregateType(), but they're listed under aggregate
types in the LangRef and zero vectors are stored as ConstantAggregateZero.
llvm-svn: 123956
auto-simplier the transform most missed by early-cse is (zext X) != 0 -> X != 0.
This patch adds this transform and some related logic to InstructionSimplify
and removes some of the logic from instcombine (unfortunately not all because
there are several situations in which instcombine can improve things by making
new instructions, whereas instsimplify is not allowed to do this). At -O2 this
often results in more than 15% more simplifications by early-cse, and results in
hundreds of lines of bitcode being eliminated from the testsuite. I did see some
small negative effects in the testsuite, for example a few additional instructions
in three programs. One program, 483.xalancbmk, got an additional 35 instructions,
which seems to be due to a function getting an additional instruction and then
being inlined all over the place.
llvm-svn: 123911
These were not recommended by my auto-simplifier since they don't fire often enough.
However they do fire from time to time, for example they remove one subtraction from
the final bitcode for 483.xalancbmk.
llvm-svn: 123755
simplification in fully optimized code. It occurs sporadically in the testsuite, and
many times in 403.gcc: the final bitcode has 131 fewer subtractions after this change.
The reason that the multiplies are not eliminated is the same reason that instcombine
did not catch this: they are used by other instructions (instcombine catches this with
a more general transform which in general is only profitable if the operands have only
one use).
llvm-svn: 123754
This fixes the original testcase in PR8927. It also causes a clang
binary built with a patched clang to increase in size by 0.21%.
We can probably get some of the size back by writing a pass that
detects that a global never has its pointer compared and adds
unnamed_addr to it (maybe extend global opt). It is also possible that
there are some other cases clang could add unnamed_addr to.
I will investigate extending globalopt next.
llvm-svn: 123584
then don't try to decimate it into its individual pieces. This will just make a mess of the
IR and is pointless if none of the elements are individually accessed. This was generating
really terrible code for std::bitset (PR8980) because it happens to be lowered by clang
as an {[8 x i8]} structure instead of {i64}.
The testcase now is optimized to:
define i64 @test2(i64 %X) {
br label %L2
L2: ; preds = %0
ret i64 %X
}
before we generated:
define i64 @test2(i64 %X) {
%sroa.store.elt = lshr i64 %X, 56
%1 = trunc i64 %sroa.store.elt to i8
%sroa.store.elt8 = lshr i64 %X, 48
%2 = trunc i64 %sroa.store.elt8 to i8
%sroa.store.elt9 = lshr i64 %X, 40
%3 = trunc i64 %sroa.store.elt9 to i8
%sroa.store.elt10 = lshr i64 %X, 32
%4 = trunc i64 %sroa.store.elt10 to i8
%sroa.store.elt11 = lshr i64 %X, 24
%5 = trunc i64 %sroa.store.elt11 to i8
%sroa.store.elt12 = lshr i64 %X, 16
%6 = trunc i64 %sroa.store.elt12 to i8
%sroa.store.elt13 = lshr i64 %X, 8
%7 = trunc i64 %sroa.store.elt13 to i8
%8 = trunc i64 %X to i8
br label %L2
L2: ; preds = %0
%9 = zext i8 %1 to i64
%10 = shl i64 %9, 56
%11 = zext i8 %2 to i64
%12 = shl i64 %11, 48
%13 = or i64 %12, %10
%14 = zext i8 %3 to i64
%15 = shl i64 %14, 40
%16 = or i64 %15, %13
%17 = zext i8 %4 to i64
%18 = shl i64 %17, 32
%19 = or i64 %18, %16
%20 = zext i8 %5 to i64
%21 = shl i64 %20, 24
%22 = or i64 %21, %19
%23 = zext i8 %6 to i64
%24 = shl i64 %23, 16
%25 = or i64 %24, %22
%26 = zext i8 %7 to i64
%27 = shl i64 %26, 8
%28 = or i64 %27, %25
%29 = zext i8 %8 to i64
%30 = or i64 %29, %28
ret i64 %30
}
In this case, instcombine was able to eliminate the nonsense, but in PR8980 enough
PHIs are in play that instcombine backs off. It's better to not generate this stuff
in the first place.
llvm-svn: 123571
multiple uses. In some cases, all the uses are the same operation,
so instcombine can go ahead and promote the phi. In the testcase
this pushes an add out of the loop.
llvm-svn: 123568
The basic issue is that isel (very reasonably!) expects conditional branches
to be folded, so CGP leaving around a bunch dead computation feeding
conditional branches isn't such a good idea. Just fold branches on constants
into unconditional branches.
llvm-svn: 123526
have objectsize folding recursively simplify away their result when it
folds. It is important to catch this here, because otherwise we won't
eliminate the cross-block values at isel and other times.
llvm-svn: 123524
simplification present in fully optimized code (I think instcombine fails to
transform some of these when "X-Y" has more than one use). Fires here and
there all over the test-suite, for example it eliminates 8 subtractions in
the final IR for 445.gobmk, 2 subs in 447.dealII, 2 in paq8p etc.
llvm-svn: 123442
threading of shifts over selects and phis while there. This fires here and
there in the testsuite, to not much effect. For example when compiling spirit
it fires 5 times, during early-cse, resulting in 6 more cse simplifications,
and 3 more terminators being folded by jump threading, but the final bitcode
doesn't change in any interesting way: other optimizations would have caught
the opportunity anyway, only later.
llvm-svn: 123441
While there, I noticed that the transform "undef >>a X -> undef" was wrong.
For example if X is 2 then the top two bits must be equal, so the result can
not be anything. I fixed this in the constant folder as well. Also, I made
the transform for "X << undef" stronger: it now folds to undef always, even
though X might be zero. This is in accordance with the LangRef, but I must
admit that it is fairly aggressive. Also, I added "i32 X << 32 -> undef"
following the LangRef and the constant folder, likewise fairly aggressive.
llvm-svn: 123417
This is a minor extension of SROA to handle a special case that is
important for some ARM NEON operations. Some of the NEON intrinsics
return multiple values, which are handled as struct types containing
multiple elements of the same vector type. The corresponding return
types declared in the arm_neon.h header have equivalent arrays. We
need SROA to recognize that it can split up those arrays and structs
into separate vectors, even though they are not always accessed with
the same type. SROA already handles loads and stores of an entire
alloca by using insertvalue/extractvalue to access the individual
pieces, and that code works the same regardless of whether the type
is a struct or an array. So, all that needs to be done is to check
for compatible arrays and homogeneous structs.
llvm-svn: 123381
SROA only split up structs and arrays one level at a time, so padding can
only cause trouble if it is located in between the struct or array elements.
llvm-svn: 123380
is "X != 0 -> X" when X is a boolean. This occurs a lot because of the way
llvm-gcc converts gcc's conditional expressions. Add this, and a few other
similar transforms for completeness.
llvm-svn: 123372
point values to their integer representation through the SSE intrinsic
calls. This is the last part of a README.txt entry for which I have real
world examples.
llvm-svn: 123206
larger memsets. Among other things, this fixes rdar://8760394 and
allows us to handle "Example 2" from http://blog.regehr.org/archives/320,
compiling it into a single 4096-byte memset:
_mad_synth_mute: ## @mad_synth_mute
## BB#0: ## %entry
pushq %rax
movl $4096, %esi ## imm = 0x1000
callq ___bzero
popq %rax
ret
llvm-svn: 123089
1. Rip out LoopRotate's domfrontier updating code. It isn't
needed now that LICM doesn't use DF and it is super complex
and gross.
2. Make DomTree updating code a lot simpler and faster. The
old loop over all the blocks was just to find a block??
3. Change the code that inserts the new preheader to just use
SplitCriticalEdge instead of doing an overcomplex
reimplementation of it.
No behavior change, except for the name of the inserted preheader.
llvm-svn: 123072
them into the loop preheader, eliminating silly instructions like
"icmp i32 0, 100" in fixed tripcount loops. This also better exposes the
bigger problem with loop rotate that I'd like to fix: once this has been
folded, the duplicated conditional branch *often* turns into an uncond branch.
Not aggressively handling this is pessimizing later loop optimizations
somethin' fierce by making "dominates all exit blocks" checks fail.
llvm-svn: 123060
X = sext x; x >s c ? X : C+1 --> X = sext x; X <s C+1 ? C+1 : X
X = sext x; x <s c ? X : C-1 --> X = sext x; X >s C-1 ? C-1 : X
X = zext x; x >u c ? X : C+1 --> X = zext x; X <u C+1 ? C+1 : X
X = zext x; x <u c ? X : C-1 --> X = zext x; X >u C-1 ? C-1 : X
X = sext x; x >u c ? X : C+1 --> X = sext x; X <u C+1 ? C+1 : X
X = sext x; x <u c ? X : C-1 --> X = sext x; X >u C-1 ? C-1 : X
Instead of calculating this with mixed types promote all to the
larger type. This enables scalar evolution to analyze this
expression. PR8866
llvm-svn: 123034
ret i64 ptrtoint (i8* getelementptr ([1000 x i8]* @X, i64 1, i64 sub (i64 0, i64 ptrtoint ([1000 x i8]* @X to i64))) to i64)
to "ret i64 1000". This allows us to correctly compute the trip count
on a loop in PR8883, which occurs with std::fill on a char array. This
allows us to transform it into a memset with a constant size.
llvm-svn: 122950
when safe.
The testcase is basically this nested loop:
void foo(char *X) {
for (int i = 0; i != 100; ++i)
for (int j = 0; j != 100; ++j)
X[j+i*100] = 0;
}
which gets turned into a single memset now. clang -O3 doesn't optimize
this yet though due to a phase ordering issue I haven't analyzed yet.
llvm-svn: 122806
sure that the loop we're promoting into a memcpy doesn't mutate the input
of the memcpy. Before we were just checking that the dest of the memcpy
wasn't mod/ref'd by the loop.
llvm-svn: 122712
in the PR, the pass could break LCSSA form when inserting preheaders. It probably
would be easy enough to fix this, but since currently we always go into LCSSA form
after running this pass, doing so is not urgent.
llvm-svn: 122695
header for now for memset/memcpy opportunities. It turns out that loop-rotate
is successfully rotating loops, but *DOESN'T MERGE THE BLOCKS*, turning "for
loops" into 2 basic block loops that loop-idiom was ignoring.
With this fix, we form many *many* more memcpy and memsets than before, including
on the "history" loops in the viterbi benchmark, which look like this:
for (j=0; j<MAX_history; ++j) {
history_new[i][j+1] = history[2*i][j];
}
Transforming these loops into memcpy's speeds up the viterbi benchmark from
11.98s to 3.55s on my machine. Woo.
llvm-svn: 122685
numbering, in which it considers (for example) "%a = add i32 %x, %y" and
"%b = add i32 %x, %y" to be equal because the operands are equal and the
result of the instructions only depends on the values of the operands.
This has almost no effect (it removes 4 instructions from gcc-as-one-file),
and perhaps slows down compilation: I measured a 0.4% slowdown on the large
gcc-as-one-file testcase, but it wasn't statistically significant.
llvm-svn: 122654
the original instruction, half the cases were missed (making it not
wrong but suboptimal). Also correct a typo (A <-> B) in the second
chunk.
llvm-svn: 122414
if both A op B and A op C simplify. This fires fairly often but doesn't
make that much difference. On gcc-as-one-file it removes two "and"s and
turns one branch into a select.
llvm-svn: 122399
I still think that LVI should be handling this, but that capability is some ways off in the future,
and this matters for some significant benchmarks.
llvm-svn: 122378
a couple of existing transforms. This fires surprisingly often, for
example when compiling gcc "(X+(-1))+1->X" fires quite a lot as well
as various "and" simplifications (usually with a phi node operand).
Most of the time this doesn't make a real difference since the same
thing would have been done elsewhere anyway, eg: by instcombine, but
there are a few places where this results in simplifications that we
were not doing before.
llvm-svn: 122326
(they had just been forgotten before). Adding Xor causes "main" in the
existing testcase 2010-11-01-lshr-mask.ll to be hugely more simplified.
llvm-svn: 122245
argument. The generated alloca has to have at least the alignment of the
byval, if not, the client may be making assumptions that the new alloca won't
satisfy.
llvm-svn: 122234
This resolves a README entry and technically resolves PR4916,
but we still get poor code for the testcase in that PR because
GVN isn't CSE'ing uadd with add, filed as PR8817.
Previously we got:
_test7: ## @test7
addq %rsi, %rdi
cmpq %rdi, %rsi
movl $42, %eax
cmovaq %rsi, %rax
ret
Now we get:
_test7: ## @test7
addq %rsi, %rdi
movl $42, %eax
cmovbq %rsi, %rax
ret
llvm-svn: 122182
sadd formed is half the size of the original type. We can
now compile this into a sadd.i8:
unsigned char X(char a, char b) {
int res = a+b;
if ((unsigned )(res+128) > 255U)
abort();
return res;
}
llvm-svn: 122178
checking to see if the high bits of the original add result were dead.
Inserting a smaller add and zexting back to that size is not good enough.
This is likely to be the fix for 8816.
llvm-svn: 122177
which have trapping constant exprs in them due to PHI nodes.
Eliminating them can cause the constant expr to be evalutated
on new paths if the input edges are critical.
llvm-svn: 122164
on the DragonEgg self-host bot. Unfortunately, the testcase is pretty messy and doesn't reduce well due to
interactions with other parts of InstCombine.
llvm-svn: 122072
a null endptr argument, because they may write to errno.
This fixes a seflhost miscompile observed on Linux targets when TBAA
was enabled.
llvm-svn: 122014
dragonegg self-host buildbot. Original commit message:
Add an InstCombine transform to recognize instances of manual overflow-safe addition
(performing the addition in a wider type and explicitly checking for overflow), and
fold them down to intrinsics. This currently only supports signed-addition, but could
be generalized if someone works out the magic constant formulas for other operations.
llvm-svn: 121965
(performing the addition in a wider type and explicitly checking for overflow), and
fold them down to intrinsics. This currently only supports signed-addition, but could
be generalized if someone works out the magic constant formulas for other operations.
Fixes <rdar://problem/8558713>.
llvm-svn: 121905
When it sees a promising select it now tries to figure out whether the condition of the select is known in any of the predecessors and if so it maps the operands appropriately.
llvm-svn: 121859
which is simpler than finding a place to insert in BB.
- Don't perform the 'if condition hoisting' xform on certain
i1 PHIs, as it interferes with switch formation.
This re-fixes "example 7", without breaking the world hopefully.
llvm-svn: 121764
first, it can kick in on blocks whose conditions have been
folded to a constant, even though one of the edges will be
trivially folded.
second, it doesn't clean up the "if diamond" that it just
eliminated away. This is a problem because other simplifycfg
xforms kick in depending on the order of block visitation,
causing pointless work.
llvm-svn: 121762
when simplifying, allowing them to be eagerly turned into switches. This
is the last step required to get "Example 7" from this blog post:
http://blog.regehr.org/archives/320
On X86, we now generate this machine code, which (to my eye) seems better
than the ICC generated code:
_crud: ## @crud
## BB#0: ## %entry
cmpb $33, %dil
jb LBB0_4
## BB#1: ## %switch.early.test
addb $-34, %dil
cmpb $58, %dil
ja LBB0_3
## BB#2: ## %switch.early.test
movzbl %dil, %eax
movabsq $288230376537592865, %rcx ## imm = 0x400000017001421
btq %rax, %rcx
jb LBB0_4
LBB0_3: ## %lor.rhs
xorl %eax, %eax
ret
LBB0_4: ## %lor.end
movl $1, %eax
ret
llvm-svn: 121690
(x & 2^n) ? 2^m+C : C
we can offset both arms by C to get the "(x & 2^n) ? 2^m : 0" form, optimize the
select to a shift and apply the offset afterwards.
llvm-svn: 121609
(if available) as we go so that we get simple constantexprs not insane ones.
This fixes the failure of clang/test/CodeGenCXX/virtual-base-ctor.cpp
that the previous iteration of this patch had.
llvm-svn: 121111
memcpy's like:
memcpy(A, B)
memcpy(A, C)
we cannot delete the first memcpy as dead if A and C might be aliases.
If so, we actually get:
memcpy(A, B)
memcpy(A, A)
which is not correct to transform into:
memcpy(A, A)
This patch was heavily influenced by Jakub Staszak's patch in PR8728, thanks
Jakub!
llvm-svn: 120974
about pairs of AA::Location's instead of looking for MemDep's
"Def" predicate. This is more powerful and general, handling
memset/memcpy/store all uniformly, and implementing PR8701 and
probably obsoleting parts of memcpyoptimizer.
This also fixes an obscure bug with init.trampoline and i8
stores, but I'm not surprised it hasn't been hit yet. Enhancing
init.trampoline to carry the size that it stores would allow
DSE to be much more aggressive about optimizing them.
llvm-svn: 120406
contains "ref".
Enhance DSE to use a modref query instead of a store-specific hack
to generalize the "ignore may-alias stores" optimization to handle
memset and memcpy.
llvm-svn: 120368
fairly systematic way in instcombine. Some of these cases were already dealt
with, in which case I removed the existing code. The case of Add has a bunch of
funky logic which covers some of this plus a few variants (considers shifts to be
a form of multiplication), which I didn't touch. The simplification performed is:
A*B+A*C -> A*(B+C). The improvement is to do this in cases that were not already
handled [such as A*B-A*C -> A*(B-C), which was reported on the mailing list], and
also to do it more often by not checking for "only one use" if "B+C" simplifies.
llvm-svn: 120024
folding improvements: if P points to a type of size zero, turn "gep P, N" into "P".
More generally, if a gep index type has size zero, instcombine could replace the
index with zero, but that is not done here.
llvm-svn: 119942
allowing the memcpy to be eliminated.
Unfortunately, the requirements on byval's without explicit
alignment are really weak and impossible to predict in the
mid-level optimizer, so this doesn't kick in much with current
frontends. The fix is to change clang to set alignment on all
byval arguments.
llvm-svn: 119916
preserves LCSSA form out of ScalarEvolution and into the LoopInfo
class. Use it to check that SimplifyInstruction simplifications
are not breaking LCSSA form. Fixes PR8622.
llvm-svn: 119727
this was a tree of hashtables, and a query recursed into the table for the immediate dominator ad infinitum
if the initial lookup failed. This led to really bad performance on tall, narrow CFGs.
We can instead replace it with what is conceptually a multimap of value numbers to leaders (actually
represented by a hashtable with a list of Value*'s as the value type), and then
determine which leader from that set to use very cheaply thanks to the DFS numberings maintained by
DominatorTree. Because there are typically few duplicates of a given value, this scan tends to be
quite fast. Additionally, we use a custom linked list and BumpPtr allocation to avoid any unnecessary
allocation in representing the value-side of the multimap.
This change brings with it a 15% (!) improvement in the total running time of GVN on 403.gcc, which I
think is pretty good considering that includes all the "real work" being done by MemDep as well.
The one downside to this approach is that we can no longer use GVN to perform simple conditional progation,
but that seems like an acceptable loss since we now have LVI and CorrelatedValuePropagation to pick up
the slack. If you see conditional propagation that's not happening, please file bugs against LVI or CVP.
llvm-svn: 119714
refusing to optimize two memcpy's like this:
copy A <- B
copy C <- A
if it couldn't prove that noalias(B,C). We can eliminate
the copy by producing a memmove instead of memcpy.
llvm-svn: 119694
if it is passed as a byval argument. The byval argument will just be a
read, so it is safe to read from the original global instead. This allows
us to promote away the %agg.tmp alloca in PR8582
llvm-svn: 119686
over a phi node by applying it to each operand may be wrong if the
operation and the phi node are mutually interdependent (the testcase
has a simple example of this). So only do this transform if it would
be correct to perform the operation in each predecessor of the block
containing the phi, i.e. if the other operands all dominate the phi.
This should fix the FFMPEG snow.c regression reported by İsmail Dönmez.
llvm-svn: 119347
offload the work to hasConstantValue rather than do something more
complicated (such handling mutually recursive phis) because (1) it is
not clear it is worth it; and (2) if it is worth it, maybe such logic
would be better placed in hasConstantValue. Adjust some GVN tests
which are now cleaned up much further (eg: all phi nodes are removed).
llvm-svn: 119043
SimplifyAssociativeOrCommutative) "(A op C1) op C2" -> "A op (C1 op C2)",
which previously was only done if C1 and C2 were constants, to occur whenever
"C1 op C2" simplifies (a la InstructionSimplify). Since the simplifying operand
combination can no longer be assumed to be the right-hand terms, consider all of
the possible permutations. When compiling "gcc as one big file", transform 2
(i.e. using right-hand operands) fires about 4000 times but it has to be said
that most of the time the simplifying operands are both constants. Transforms
3, 4 and 5 each fired once. Transform 6, which is an existing transform that
I didn't change, never fired. With this change, the testcase is now optimized
perfectly with one run of instcombine (previously it required instcombine +
reassociate + instcombine, and it may just have been luck that this worked).
llvm-svn: 119002
testing for dereferenceable pointers into a helper function,
isDereferenceablePointer. Teach it how to reason about GEPs
with simple non-zero indices.
Also eliminate ArgumentPromtion's IsAlwaysValidPointer,
which didn't check for weak externals or out of range gep
indices.
llvm-svn: 118840
references. For example, this allows gvn to eliminate the load in
this example:
void foo(int n, int* p, int *q) {
p[0] = 0;
p[1] = 1;
if (n) {
*q = p[0];
}
}
llvm-svn: 118714
nodes can be used in loops, this could result in infinite looping
if there is no recursion limit, so add such a limit. It is also
used for the SelectInst case because in theory there could be an
infinite loop there too if the basic block is unreachable.
llvm-svn: 118694
to optionally look for constant or local (alloca) memory.
Teach BasicAliasAnalysis::pointsToConstantMemory to look through Select
and Phi nodes, and to support looking for local memory.
Remove FunctionAttrs' PointsToLocalOrConstantMemory function, now that
AliasAnalysis knows all the tricks that it knew.
llvm-svn: 118412
of a select instruction, see if doing the compare with the
true and false values of the select gives the same result.
If so, that can be used as the value of the comparison.
llvm-svn: 118378
consider it to be readonly. In fact, don't even consider it to be
readonly if it does a volatile load from an AllocaInst either (it
is debatable as to whether readonly would be correct or not in this
case; play safe for the moment). This fixes PR8279.
llvm-svn: 117783
This code had previously used 2*N, where N is the mask length, to represent
undef. That is not safe because the shufflevector operands may have more
than N elements -- they don't have to match the result type.
llvm-svn: 117721
Allow splats even if they don't match either of the original shuffles,
possibly due to undef entries in the shuffles masks. Radar 8597790.
Also fix some 80-column violations.
llvm-svn: 117719
it isn't unreachable and should not be zapped. The check for the entry block
was missing in one case: a block containing a unwind instruction. While there,
do some small cleanups: "M" is not a great name for a Function* (it would be
more appropriate for a Module*), change it to "Fn"; use Fn in more places.
llvm-svn: 117224
does normal initialization and normal chaining. Change the default
AliasAnalysis implementation to NoAlias.
Update StandardCompileOpts.h and friends to explicitly request
BasicAliasAnalysis.
Update tests to explicitly request -basicaa.
llvm-svn: 116720
logic to use the new APInt methods. Among other things this
implements rdar://8501501 - llvm.smul.with.overflow.i32 should constant fold
which comes from "clang -ftrapv", originally brought to my attention from PR8221.
llvm-svn: 116457
Anyone interested in more general PRE would be better served by implementing it separately, to get real
anticipation calculation, etc.
llvm-svn: 115337
Because of this, we cannot use the Simplify* APIs, as they can assert-fail on unreachable code. Since it's not easy to determine
if a given threading will cause a block to become unreachable, simply defer simplifying simplification to later InstCombine and/or
DCE passes.
llvm-svn: 115082
Usually we wouldn't do this anyway because llvm_fenv_testexcept would return an
exception, but we have seen some cases where neither errno nor fenv detect an
exception on arm-linux.
llvm-svn: 114893
Splitting critical edges at the merge point only addressed part of the issue; it is also possible for non-post-domination
to occur when the path from the load to the merge has branches in it. Unfortunately, full anticipation analysis is
time-consuming, so for now approximate it. This is strictly more conservative than real anticipation, so we will miss
some cases that real PRE would allow, but we also no longer insert loads into paths where they didn't exist before. :-)
This is a very slight net positive on SPEC for me (0.5% on average). Most of the benchmarks are largely unaffected, but
when it pays off it pays off decently: 181.mcf improves by 4.5% on my machine.
llvm-svn: 114785
a Constant into a ConstantRange. Handle this conservatively for now, rather than asserting. The testcase is
more complex that I would like, but the manifestation of the problem is sensitive to iteration orders and the state of the
LVI cache, and I have not been able to reproduce it with manually constructed or simplified cases.
Fixes PR8162.
llvm-svn: 114103
to expose greater opportunities for store narrowing in codegen. This patch fixes a potential
infinite loop in instcombine caused by one of the introduced transforms being overly aggressive.
llvm-svn: 113763
This can result in increased opportunities for store narrowing in code generation. Update a number of
tests for this change. This fixes <rdar://problem/8285027>.
Additionally, because this inverts the order of ors and ands, some patterns for optimizing or-of-and-of-or
no longer fire in instances where they did originally. Add a simple transform which recaptures most of these
opportunities: if we have an or-of-constant-or and have failed to fold away the inner or, commute the order
of the two ors, to give the non-constant or a chance for simplification instead.
llvm-svn: 113679
unrolling threshold to the optimize-for-size threshold. Basically, for loops containing calls, unrolling
can still be profitable as long as the loop is REALLY small.
llvm-svn: 113439
turning (fptrunc (sqrt (fpext x))) -> (sqrtf x) is great, but we have
to delete the original sqrt as well. Not doing so causes us to do
two sqrt's when building with -fmath-errno (the default on linux).
llvm-svn: 113260
in the duplicated block instead of duplicating them.
Duplicating them into the end of the loop and the preheader
means that we got a phi node in the header of the loop,
which prevented LICM from hoisting them. GVN would
usually come around later and merge the duplicated
instructions so we'd get reasonable output... except that
anything dependent on the shoulda-been-hoisted value can't
be hoisted. In PR5319 (which this fixes), a memory value
didn't get promoted.
llvm-svn: 113134
location is being re-stored to the memory location. We would get
a dangling pointer from the SSAUpdate data structure and miss a
use. This fixes PR8068
llvm-svn: 113042