size but different element types, so that it filters out the cases
that CreateShuffleVectorCast doesn't handle. This fixes rdar://9786827.
llvm-svn: 135721
For -disable-iv-rewrite, perform LFTR without generating a new
"canonical" induction variable. Instead find the "best" existing
induction variable for use in the loop exit test and compute the final
value of that IV for use in the new loop exit test. In short,
convert to a simple eq/ne exit test as long as it's cheap to do so.
llvm-svn: 135420
patch brings numerous advantages to LLVM. One way to look at it
is through diffstat:
109 files changed, 3005 insertions(+), 5906 deletions(-)
Removing almost 3K lines of code is a good thing. Other advantages
include:
1. Value::getType() is a simple load that can be CSE'd, not a mutating
union-find operation.
2. Types a uniqued and never move once created, defining away PATypeHolder.
3. Structs can be "named" now, and their name is part of the identity that
uniques them. This means that the compiler doesn't merge them structurally
which makes the IR much less confusing.
4. Now that there is no way to get a cycle in a type graph without a named
struct type, "upreferences" go away.
5. Type refinement is completely gone, which should make LTO much MUCH faster
in some common cases with C++ code.
6. Types are now generally immutable, so we can use "Type *" instead
"const Type *" everywhere.
Downsides of this patch are that it removes some functions from the C API,
so people using those will have to upgrade to (not yet added) new API.
"LLVM 3.0" is the right time to do this.
There are still some cleanups pending after this, this patch is large enough
as-is.
llvm-svn: 134829
The promotion code lost any alignment information, when hoisting loads and
stores out of the loop. This lead to incorrect aligned memory accesses. We now
use the largest alignment we can prove to be correct.
llvm-svn: 134520
alloca that only holds a copy of a global and we're going to replace the users
of the alloca with that global, just nuke the lifetime intrinsics. Part of
PR10121.
llvm-svn: 133905
ops.
This is a rewrite of the IV simplification algorithm used by
-disable-iv-rewrite. To avoid perturbing the default mode, I
temporarily split the driver and created SimplifyIVUsersNoRewrite. The
idea is to avoid doing opcode/pattern matching inside
IndVarSimplify. SCEV already does it. We want to optimize with the
full generality of SCEV, but optimize def-use chains top down on-demand rather
than rewriting the entire expression bottom-up. This was easy to do
for operations that SCEV can prove are identity function. So we're now
eliminating bitmasks and zero extends this way.
A result of this rewrite is that indvars -disable-iv-rewrite no longer
requires IVUsers.
llvm-svn: 133502
In cases such as the attached test, where the case value for a switch
destination is used in a phi node that follows the destination, it
might be better to replace that value with the condition value of the
switch, so that more blocks can be folded away with
TryToSimplifyUncondBranchFromEmptyBlock because there are less
conflicts in the phi node.
llvm-svn: 133344
type's bitwidth matches the (allocated) size of the alloca. This severely
pessimizes vector scalar replacement when the only vector type being used is
something like <3 x float> on x86 or ARM whose allocated size matches a
<4 x float>.
I hope to fix some of the flawed assumptions about allocated size throughout
scalar replacement and reenable this in most cases.
llvm-svn: 133338
for pre-2.9 bitcode files. We keep x86 unaligned loads, movnt, crc32, and the
target indep prefetch change.
As usual, updating the testsuite is a PITA.
llvm-svn: 133337
might overflow. Re-typing the alloca to a larger type (e.g. double)
hoists a shift into the alloca, potentially exposing overflow in the
expression. rdar://problem/9265821
llvm-svn: 132926
pad, separating the exception and selector calls from the new lpad. Teaching
it not to do that, or to properly adjust the CFG afterwards, is out of
scope because it would require the other edges to the landing pad to be split
as well (effectively). Instead, just recover from the most likely cases
during inlining. The best long-term solution is to change the exception
representation and commit to either requiring or not requiring the more
complex edge-splitting logic; this is just a shorter-term hack.
llvm-svn: 132799
assuming that all offsets are legal vector accesses, and thus trying to access
the float member of { <2 x float>, float } as the 3rd element of the first
member.
llvm-svn: 132766
former was using the size of the entire alloca, whereas the latter was correctly using
the allocated size of the immediate type being converted (which may differ from the size
of the alloca). This fixes PR10082.
llvm-svn: 132759
then we don't want to set the destination in the indirect branch to the
destination. This is because the indirect branch needs its destinations to have
had their block addresses taken. This isn't so of the new critical edge that's
split during this process. If it turns out that the destination block has only
one predecessor, and that being a BB with an indirect branch, then it won't be
marked as 'used' and may be removed.
PR10072
llvm-svn: 132638
redundant with partially-aliasing loads.
When computing what portion of a clobbering load value is needed,
it doesn't consider phi-translation which may have occurred
between the clobbing load and the redundant load.
llvm-svn: 132631
which edge to split by pred/succ pair, which means that we can end up splitting
the wrong edge (by case value) in the switch statement entirely. Fixes PR10031!
llvm-svn: 132535
In the given testcase, the "Clobber" was pointing to a load, and GVN was incorrectly assuming that meant that the "Clobber" load overlapped the load being analyzed (when they are actually unrelated).
The included testcase tests both this commit and r132434.
Part two of rdar://9429882. (r132434 was mislabeled.)
llvm-svn: 132442
variable. Noticed by inspection.
Simulate memset in EvaluateFunction where the target of the memset and the
value we're setting are both the null value. Fixes PR10047!
llvm-svn: 132288
transformed by the inliner into a branch to the enclosing landing pad
(when inlined through an invoke). If not so optimized, it is lowered
DWARF EH preparation into a call to _Unwind_Resume (or _Unwind_SjLj_Resume
as appropriate). Its chief advantage is that it takes both the
exception value and the selector value as arguments, meaning that there
is zero effort in recovering these; however, the frontend is required
to pass these down, which is not actually particularly difficult.
Also document the behavior of landing pads a bit better, and make it
clearer that it's okay that personality functions don't always land at
landing pads. This is just a fact of life. Don't write optimizations that
rely on pushing things over an unwind edge.
llvm-svn: 132253
- the selector for the landing pad must provide all available information
about the handlers, filters, and cleanups within that landing pad
- calls to _Unwind_Resume must be converted to branches to the enclosing
lpad so as to avoid re-entering the unwinder when the lpad claimed it
was going to handle the exception in some way
This is quite specific to libUnwind-based unwinding. In an effort to not
interfere too badly with other unwinders, and with existing hacks in frontends,
this only triggers on _Unwind_Resume (not _Unwind_Resume_or_Rethrow) and does
nothing with selectors if it cannot find a selector call for either lpad.
llvm-svn: 132200
Use a proper worklist for use-def traversal without holding onto an
iterator. Now that we process all IV uses, we need complete logic for
resusing existing derived IV defs. See HoistStep.
llvm-svn: 132103
aligned.
Teach memcpyopt to not give up all hope when confonted with an underaligned
memcpy feeding an overaligned byval. If the *source* of the memcpy can be
determined to be adequeately aligned, or if it can be forced to be, we can
eliminate the memcpy.
This addresses PR9794. We now compile the example into:
define i32 @f(%struct.p* nocapture byval align 8 %q) nounwind ssp {
entry:
%call = call i32 @g(%struct.p* byval align 8 %q) nounwind
ret i32 %call
}
in both x86-64 and x86-32 mode. We still don't get a tailcall though,
because tailcalls apparently can't handle byval.
llvm-svn: 131884
result is non-zero. Implement an example optimization (PR9814), which allows us to
transform:
A / ((1 << B) >>u 2)
into:
A >>u (B-2)
which we compile into:
_divu3: ## @divu3
leal -2(%rsi), %ecx
shrl %cl, %edi
movl %edi, %eax
ret
instead of:
_divu3: ## @divu3
movb %sil, %cl
movl $1, %esi
shll %cl, %esi
shrl $2, %esi
movl %edi, %eax
xorl %edx, %edx
divl %esi, %eax
ret
llvm-svn: 131860
failing to form a memset, then having to delete it" but my approximation
isn't safe for self recurrent loops. Instead of doign a hack, just
do it the right way.
llvm-svn: 131858
I also changed -simplifycfg, -jump-threading and -codegenprepare to use this to produce slightly better code without any extra cleanup passes (AFAICT this was the only place in -simplifycfg where now-dead conditions of replaced terminators weren't being cleaned up). The only other user of this function is -sccp, but I didn't read that thoroughly enough to figure out whether it might be holding pointers to instructions that could be deleted by this.
llvm-svn: 131855