gep to explicit addressing, we know that none of the intermediate
computation overflows.
This could use review: it seems that the shifts certainly wouldn't
overflow, but could the intermediate adds overflow if there is a
negative index?
Previously the testcase would instcombine to:
define i1 @test(i64 %i) {
%p1.idx.mask = and i64 %i, 4611686018427387903
%cmp = icmp eq i64 %p1.idx.mask, 1000
ret i1 %cmp
}
now we get:
define i1 @test(i64 %i) {
%cmp = icmp eq i64 %i, 1000
ret i1 %cmp
}
llvm-svn: 125271
exact/nsw/nuw shifts and have instcombine infer them when it can prove
that the relevant properties are true for a given shift without them.
Also, a variety of refactoring to use the new patternmatch logic thrown
in for good luck. I believe that this takes care of a bunch of related
code quality issues attached to PR8862.
llvm-svn: 125267
optimizations to be much more aggressive in the face of
exact/nsw/nuw div and shifts. For example, these (which
are the same except the first is 'exact' sdiv:
define i1 @sdiv_icmp4_exact(i64 %X) nounwind {
%A = sdiv exact i64 %X, -5 ; X/-5 == 0 --> x == 0
%B = icmp eq i64 %A, 0
ret i1 %B
}
define i1 @sdiv_icmp4(i64 %X) nounwind {
%A = sdiv i64 %X, -5 ; X/-5 == 0 --> x == 0
%B = icmp eq i64 %A, 0
ret i1 %B
}
compile down to:
define i1 @sdiv_icmp4_exact(i64 %X) nounwind {
%1 = icmp eq i64 %X, 0
ret i1 %1
}
define i1 @sdiv_icmp4(i64 %X) nounwind {
%X.off = add i64 %X, 4
%1 = icmp ult i64 %X.off, 9
ret i1 %1
}
This happens when you do something like:
(ptr1-ptr2) == 42
where the pointers are pointers to non-unit types.
llvm-svn: 125266
and generally tidying things up. Only very trivial functionality changes
like now doing (-1 - A) -> (~A) for vectors too.
InstCombineAddSub.cpp | 296 +++++++++++++++++++++-----------------------------
1 file changed, 126 insertions(+), 170 deletions(-)
llvm-svn: 125264
Natural Loop Information
Loop Pass Manager
Canonicalize natural loops
Scalar Evolution Analysis
Loop Pass Manager
Induction Variable Users
Canonicalize natural loops
Induction Variable Users
Loop Strength Reduction
into this:
Scalar Evolution Analysis
Loop Pass Manager
Canonicalize natural loops
Induction Variable Users
Loop Strength Reduction
This fixes <rdar://problem/8869639>. I also filed PR9184 on doing this sort of
thing automatically, but it seems easier to just change the ordering of the
passes if this is the only case.
llvm-svn: 125254
versions of creation functions. Eventually, the "insertion point" versions
of these should just be removed, we do have IRBuilder afterall.
Do a massive rewrite of much of pattern match. It is now shorter and less
redundant and has several other widgets I will be using in other patches.
Among other changes, m_Div is renamed to m_IDiv (since it only matches
integer divides) and m_Shift is gone (it used to match all binops!!) and
we now have m_LogicalShift for the one client to use.
Enhance IRBuilder to have "isExact" arguments to things like CreateUDiv
and reduce redundancy within IRbuilder by having these methods chain to
each other more instead of duplicating code.
llvm-svn: 125194
could end up removing a different function than we intended because it was
functionally equivalent, then end up with a comparison of a function against
itself in the next round of comparisons (the one in the function set and the
one on the deferred list). To fix this, I introduce a choice in the form of
comparison for ComparableFunctions, either normal or "pointer only" used to
find exact Function*'s in lookups.
Also add some debugging statements.
llvm-svn: 125180
the active loop. This is generally desirable, and it avoids trouble
in situations such as the testcase in PR9123, though the failure
mode depends on use-list order, so it is infeasible to test.
llvm-svn: 125065
This makes the job of the later optzn passes easier, allowing the vast amount of
icmp transforms to chew on it.
We transform 840 switches in gcc.c, leading to a 16k byte shrink of the resulting
binary on i386-linux.
The testcase from README.txt now compiles into
decl %edi
cmpl $3, %edi
sbbl %eax, %eax
andl $1, %eax
ret
llvm-svn: 124724
that might have changed been affected by a merge elsewhere will have been
removed from the function set, and it isn't needed for performance because we
call grow() ahead of time to prevent reallocations.
llvm-svn: 124717
Modified patch by Adam Preuss.
This builds on the existing framework for block tracing, edge profiling and optimal edge profiling.
See -help-hidden for new flags.
For documentation, see the technical report "Implementation of Path Profiling..." in llvm.org/pubs.
llvm-svn: 124515
benchmarks, and that it can be simplified to X/Y. (In general you can only
simplify (Z*Y)/Y to Z if the multiplication did not overflow; if Z has the
form "X/Y" then this is the case). This patch implements that transform and
moves some Div logic out of instcombine and into InstructionSimplify.
Unfortunately instcombine gets in the way somewhat, since it likes to change
(X/Y)*Y into X-(X rem Y), so I had to teach instcombine about this too.
Finally, thanks to the NSW/NUW flags, sometimes we know directly that "Z*Y"
does not overflow, because the flag says so, so I added that logic too. This
eliminates a bunch of divisions and subtractions in 447.dealII, and has good
effects on some other benchmarks too. It seems to have quite an effect on
tramp3d-v4 but it's hard to say if it's good or bad because inlining decisions
changed, resulting in massive changes all over.
llvm-svn: 124487
operand being factorized (and erased) could occur several times in Ops,
resulting in freed memory being used when the next occurrence in Ops was
analyzed.
llvm-svn: 124287
merge vector<intptr_t>::push_back() and vector<void*>::push_back() because
Enumerate() doesn't realize that "i64* null" and "i8** null" are equivalent.
llvm-svn: 124285
with BasicAA's DecomposeGEPExpression, which recently began
using a TargetData. This fixes PR8968, though the testcase
is awkward to reduce.
Also, update several off GetUnderlyingObject's users
which happen to have a TargetData handy to pass it in.
llvm-svn: 124134
occurs because instcombine sinks loads and inserts phis. This kicks in
on such apps as 175.vpr, eon, 403.gcc, xalancbmk and a bunch of times in
spec2006 in some app that uses std::deque.
This resolves the last of rdar://7339113.
llvm-svn: 124090
common cases. This triggers a surprising number of times in SPEC2K6
because min/max idioms end up doing this. For example, code from the
STL ends up looking like this to SRoA:
%202 = load i64* %__old_size, align 8, !tbaa !3
%203 = load i64* %__old_size, align 8, !tbaa !3
%204 = load i64* %__n, align 8, !tbaa !3
%205 = icmp ult i64 %203, %204
%storemerge.i = select i1 %205, i64* %__n, i64* %__old_size
%206 = load i64* %storemerge.i, align 8, !tbaa !3
We can now promote both the __n and the __old_size allocas.
This addresses another chunk of rdar://7339113, poor codegen on
stringswitch.
llvm-svn: 124088
clang's -Wuninitialized-experimental warning.
While these don't look like real bugs, clang's
-Wuninitialized-experimental analysis is stricter
than GCC's, and these fixes have the benefit
of being general nice cleanups.
llvm-svn: 124073
that have PHI or select uses of their element pointers. This can often happen
when instcombine sinks two loads into a successor, inserting a phi or select.
With this patch, we can scalarize the alloca, but the pinned elements are not
yet promoted. This is still a win for large aggregates where only one element
is used. This fixes rdar://8904039 and part of rdar://7339113 (poor codegen
on stringswitch).
llvm-svn: 124070
handle the "Transformation preventing inst" printing,
so that -scalarrepl -debug will always print the rejected
instruction. No functionality change.
llvm-svn: 124066
a select. A vector select is pairwise on each element so we'd need a new
condition with the right number of elements to select on. Fixes PR8994.
llvm-svn: 123963
auto-simplier the transform most missed by early-cse is (zext X) != 0 -> X != 0.
This patch adds this transform and some related logic to InstructionSimplify
and removes some of the logic from instcombine (unfortunately not all because
there are several situations in which instcombine can improve things by making
new instructions, whereas instsimplify is not allowed to do this). At -O2 this
often results in more than 15% more simplifications by early-cse, and results in
hundreds of lines of bitcode being eliminated from the testsuite. I did see some
small negative effects in the testsuite, for example a few additional instructions
in three programs. One program, 483.xalancbmk, got an additional 35 instructions,
which seems to be due to a function getting an additional instruction and then
being inlined all over the place.
llvm-svn: 123911
without whatever this was trying to do. When/if someone has the time to do some empirical
evaluations, it might be worth it to figure out what this code was trying to do and see if
it's worth resurrecting/fixing.
llvm-svn: 123684
checks enabled:
1) Use '<' to compare integers in a comparison function rather than '<='.
2) Use the uniqued set DefBlocks rather than Info.DefiningBlocks to initialize
the priority queue.
The speedup of scalarrepl on test-suite + SPEC2000 + SPEC2006 is a bit less, at
just under 16% rather than 17%.
llvm-svn: 123662
eliminating a potentially quadratic data structure, this also gives a 17%
speedup when running -scalarrepl on test-suite + SPEC2000 + SPEC2006. My initial
experiment gave a greater speedup around 25%, but I moved the dominator tree
level computation from dominator tree construction to PromoteMemToReg.
Since this approach to computing IDFs has a much lower overhead than the old
code using precomputed DFs, it is worth looking at using this new code for the
second scalarrepl pass as well.
llvm-svn: 123609