In cases such as the attached test, where the case value for a switch
destination is used in a phi node that follows the destination, it
might be better to replace that value with the condition value of the
switch, so that more blocks can be folded away with
TryToSimplifyUncondBranchFromEmptyBlock because there are less
conflicts in the phi node.
llvm-svn: 133344
type's bitwidth matches the (allocated) size of the alloca. This severely
pessimizes vector scalar replacement when the only vector type being used is
something like <3 x float> on x86 or ARM whose allocated size matches a
<4 x float>.
I hope to fix some of the flawed assumptions about allocated size throughout
scalar replacement and reenable this in most cases.
llvm-svn: 133338
spartan right now, but I plan to encode more information in this enum to improve
the correctness and reliability of SRoA. At least this first pass makes it
possible to make VectorTy an actual VectorType.
llvm-svn: 132937
might overflow. Re-typing the alloca to a larger type (e.g. double)
hoists a shift into the alloca, potentially exposing overflow in the
expression. rdar://problem/9265821
llvm-svn: 132926
intrinsics. In fact, we'll optimize a bitcast to that when possible. Detect it
when looking for the lifetime intrinsics.
No test case, noticed by inspection.
llvm-svn: 132906
pad, separating the exception and selector calls from the new lpad. Teaching
it not to do that, or to properly adjust the CFG afterwards, is out of
scope because it would require the other edges to the landing pad to be split
as well (effectively). Instead, just recover from the most likely cases
during inlining. The best long-term solution is to change the exception
representation and commit to either requiring or not requiring the more
complex edge-splitting logic; this is just a shorter-term hack.
llvm-svn: 132799
assuming that all offsets are legal vector accesses, and thus trying to access
the float member of { <2 x float>, float } as the 3rd element of the first
member.
llvm-svn: 132766
former was using the size of the entire alloca, whereas the latter was correctly using
the allocated size of the immediate type being converted (which may differ from the size
of the alloca). This fixes PR10082.
llvm-svn: 132759
then we don't want to set the destination in the indirect branch to the
destination. This is because the indirect branch needs its destinations to have
had their block addresses taken. This isn't so of the new critical edge that's
split during this process. If it turns out that the destination block has only
one predecessor, and that being a BB with an indirect branch, then it won't be
marked as 'used' and may be removed.
PR10072
llvm-svn: 132638
which edge to split by pred/succ pair, which means that we can end up splitting
the wrong edge (by case value) in the switch statement entirely. Fixes PR10031!
llvm-svn: 132535
variable. Noticed by inspection.
Simulate memset in EvaluateFunction where the target of the memset and the
value we're setting are both the null value. Fixes PR10047!
llvm-svn: 132288
transformed by the inliner into a branch to the enclosing landing pad
(when inlined through an invoke). If not so optimized, it is lowered
DWARF EH preparation into a call to _Unwind_Resume (or _Unwind_SjLj_Resume
as appropriate). Its chief advantage is that it takes both the
exception value and the selector value as arguments, meaning that there
is zero effort in recovering these; however, the frontend is required
to pass these down, which is not actually particularly difficult.
Also document the behavior of landing pads a bit better, and make it
clearer that it's okay that personality functions don't always land at
landing pads. This is just a fact of life. Don't write optimizations that
rely on pushing things over an unwind edge.
llvm-svn: 132253
- the selector for the landing pad must provide all available information
about the handlers, filters, and cleanups within that landing pad
- calls to _Unwind_Resume must be converted to branches to the enclosing
lpad so as to avoid re-entering the unwinder when the lpad claimed it
was going to handle the exception in some way
This is quite specific to libUnwind-based unwinding. In an effort to not
interfere too badly with other unwinders, and with existing hacks in frontends,
this only triggers on _Unwind_Resume (not _Unwind_Resume_or_Rethrow) and does
nothing with selectors if it cannot find a selector call for either lpad.
llvm-svn: 132200
This looks like it flagged an actual bug. Devang, please review. I added
the parentheses that change behavior, but make the behavior more closely
match commit log's intent.
llvm-svn: 132165
Use a proper worklist for use-def traversal without holding onto an
iterator. Now that we process all IV uses, we need complete logic for
resusing existing derived IV defs. See HoistStep.
llvm-svn: 132103
case of a switch instruction. Back off this optimization when this would
eliminate all of the predecessors to the latch.
Sorry, I am unable to reduce a reasonably sized test case.
rdar://9486843
llvm-svn: 132022
aligned.
Teach memcpyopt to not give up all hope when confonted with an underaligned
memcpy feeding an overaligned byval. If the *source* of the memcpy can be
determined to be adequeately aligned, or if it can be forced to be, we can
eliminate the memcpy.
This addresses PR9794. We now compile the example into:
define i32 @f(%struct.p* nocapture byval align 8 %q) nounwind ssp {
entry:
%call = call i32 @g(%struct.p* byval align 8 %q) nounwind
ret i32 %call
}
in both x86-64 and x86-32 mode. We still don't get a tailcall though,
because tailcalls apparently can't handle byval.
llvm-svn: 131884
result is non-zero. Implement an example optimization (PR9814), which allows us to
transform:
A / ((1 << B) >>u 2)
into:
A >>u (B-2)
which we compile into:
_divu3: ## @divu3
leal -2(%rsi), %ecx
shrl %cl, %edi
movl %edi, %eax
ret
instead of:
_divu3: ## @divu3
movb %sil, %cl
movl $1, %esi
shll %cl, %esi
shrl $2, %esi
movl %edi, %eax
xorl %edx, %edx
divl %esi, %eax
ret
llvm-svn: 131860
failing to form a memset, then having to delete it" but my approximation
isn't safe for self recurrent loops. Instead of doign a hack, just
do it the right way.
llvm-svn: 131858
I also changed -simplifycfg, -jump-threading and -codegenprepare to use this to produce slightly better code without any extra cleanup passes (AFAICT this was the only place in -simplifycfg where now-dead conditions of replaced terminators weren't being cleaned up). The only other user of this function is -sccp, but I didn't read that thoroughly enough to figure out whether it might be holding pointers to instructions that could be deleted by this.
llvm-svn: 131855
No functionality enabled by default. Use -disable-iv-rewrite.
Extended IVUsers to keep track of the phi that represents the users' IV.
Added the WidenIV transform to replace a narrow IV with a wide IV
by doing a one-for-one replacement of IV users instead of expanding the
SCEV expressions. [sz]exts are removed and truncs are inserted.
llvm-svn: 131744
As an example, the change to InstCombineCalls catches a common case where a call to a bitcast of a function is rewritten.
Chris, does this approach look reasonable?
llvm-svn: 131516
This adds functionality to remove size/zero extension during indvars
without generating a canonical IV and rewriting all IV users. It's
disabled by default so should have no effect on codegen. Work in progress.
llvm-svn: 130829
Only create a canonical IV for backedge taken count if it will
actually be used by LinearFunctionTestReplace. And some related
cleanup, preparing to reduce dependence on canonical IVs.
No significant effect on x86 or arm in the test-suite.
llvm-svn: 130799
model constants which can be added to base registers via add-immediate
instructions which don't require an additional register to materialize
the immediate.
llvm-svn: 130743
This obviously helps a lot if the division would be turned into a libcall
(think i64 udiv on i386), but div is also one of the few remaining instructions
on modern CPUs that become more expensive when the bitwidth gets bigger.
This also helps register pressure on i386 when dividing chars, divb needs
two 8-bit parts of a 16 bit register as input where divl uses two registers.
int foo(unsigned char a) { return a/10; }
int bar(unsigned char a, unsigned char b) { return a/b; }
compiles into (x86_64)
_foo:
imull $205, %edi, %eax
shrl $11, %eax
ret
_bar:
movzbl %dil, %eax
divb %sil, %al
movzbl %al, %eax
ret
llvm-svn: 130615
This shouldn't happen in practice because the icmp would be a constant.
Add a check so we don't miscompile code if something goes wrong.
llvm-svn: 130446
between two reads (threading).
Fix an off-by-one in the indirect counter table that I meant to revert after an
earlier experiment. Whoops!
Implement GCOV_PREFIX. Doesn't handle GCOV_PREFIX_STRIP yet.
Fix an off-by-one in string emission. Extra whoops!
Tolerate DISubprograms that have null Function*'s attached to them. I don't yet
understand what this means, but it happens when you have a global static with
a non-trivial constructor/destructor.
Fix a crash on switch statements with a single successor (default-only).
llvm-svn: 130443
a nice and tidy:
%x1 = load i32* %0, align 4
%1 = icmp eq i32 %x1, 1179403647
br i1 %1, label %if.then, label %if.end
instead of doing lots of loads and branches. May the FreeBSD bootloader
long fit in its allocated space.
llvm-svn: 130416
wider load would allow elimination of subsequent loads, and when the wider
load is still a native integer type. This eliminates a ton of loads on
various benchmarks involving struct fields, though it is somewhat hobbled
by clang not being very aggressive about field alignment.
This is yet another step along the way towards resolving PR6627.
llvm-svn: 130390
Modified LinearFunctionTestReplace to push the condition on the dead
list instead of eagerly deleting it. This can cause unnecessary
IV rewrites, which should have no effect on codegen and will not be an
issue once we stop generating canonical IVs.
llvm-svn: 130340
effective in avoiding recomputation of LCSSA form; the widespread
use of instsimplify (which looks through phi nodes) means it was
not preserving LCSSA form anyway; and instcombine is no longer
scheduled in the middle of the loop passes so this doesn't matter
anymore.
llvm-svn: 130301
when X has multiple uses. This is useful for exposing secondary optimizations,
but the X86 backend isn't ready for this when X has a single use. For example,
this can disable load folding.
This is inching towards resolving PR6627.
llvm-svn: 130238
Add support for switch and indirectbr edges. This works by densely numbering
all blocks which have such terminators, and then separately numbering the
possible successors. The predecessors write down a number, the successor knows
its own number (as a ConstantInt) and sends that and the pointer to the number
the predecessor wrote down to the runtime, who looks up the counter in a
per-function table.
Coverage data should now be functional, but I haven't tested it on anything
other than my 2-file synthetic test program for coverage.
llvm-svn: 130186
return it as a clobber. This allows GVN to do smart things.
Enhance GVN to be smart about the case when a small load is clobbered
by a larger overlapping load. In this case, forward the value. This
allows us to compile stuff like this:
int test(void *P) {
int tmp = *(unsigned int*)P;
return tmp+*((unsigned char*)P+1);
}
into:
_test: ## @test
movl (%rdi), %ecx
movzbl %ch, %eax
addl %ecx, %eax
ret
which has one load. We already handled the case where the smaller
load was from a must-aliased base pointer.
llvm-svn: 130180
necessary since gcov counts transitions between blocks. It can't see if you've
run every line in a straight-line function, so we add an edge for it to notice.
llvm-svn: 129905
Break the arc-profile code out to a function like the notes emission code is,
and reorder the functions in the file.
The only functionality change is that we no longer modify the Module when the
Module has no debug info to use.
llvm-svn: 129631
instruction around, reducing work.
Greatly simplify handling of debug instructions. There is no need to
build up a vector of them and then move them into the one predecessor
if we're processing a block. Instead just rescan the block and *copy*
them into the pred. If a block gets merged into multiple preds, this
will retain more debug info.
llvm-svn: 129502
the same allocation size but different primitive sizes(e.g., <3xi32> and
<4xi32>). When ScalarRepl promotes them, it can't use a bit cast but
should use a shuffle vector instead.
llvm-svn: 129472
will allow multiple context with different loop unroll parameters to run. This is a minor change and no effect
on existing application.
llvm-svn: 129449
Now that we have a first-class way to represent unaligned loads, the unaligned
load intrinsics are superfluous.
First part of <rdar://problem/8460511>.
llvm-svn: 129401
Use debug info in the IR to find the directory/file:line:col. Each time that location changes, bump a counter.
Unlike the existing profiling system, we don't try to look at argv[], and thusly don't require main() to be present in the IR. This matches GCC's technique where you specify the profiling flag when producing each .o file.
The runtime library is minimal, currently just calling printf at program shutdown time. The API is designed to make it possible to emit GCOV data later on.
llvm-svn: 129340
reassociation opportunities are exposed. This fixes a bug where
the nested reassociation expects to be the IR to be consistent,
but it isn't, because the outer reassociation has disconnected
some of the operands. rdar://9167457
llvm-svn: 129324
mean that it has to be ConstantArray of ConstantStruct. We might have
ConstantAggregateZero, at either level, so don't crash on that.
Also, semi-deprecate the sentinal value. The linker isn't aware of sentinals so
we end up with the two lists appended, each with their "sentinals" on them.
Different parts of LLVM treated sentinals differently, so make them all just
ignore the single entry and continue on with the rest of the list.
llvm-svn: 129307
is equivalent to any other relevant value; it isn't true in general.
If it is equivalent, the LoopPromoter will tell the AST the equivalence.
Also, delete the PreheaderLoad if it is unused.
Chris, since you were the last one to make major changes here, can you check
that this is sane?
llvm-svn: 129049
space info. We crash with an assert in this case. This change checks that the
address space of the bitcasted pointer is the same as the gep ptr.
llvm-svn: 128884
after the given instruction; make sure to handle that case correctly.
(It's difficult to trigger; the included testcase involves a dead
block, but I don't think that's a requirement.)
While I'm here, get rid of the unnecessary warning about
SimplifyInstructionsInBlock, since it should work correctly as far as I know.
llvm-svn: 128782
It's possible to craft an input that hits the recursion limits in a way
that SimplifyDemandedBits doesn't simplify the icmp but ComputeMaskedBits
can infer which bits are zero.
No test case as it depends on too many other things. Fixes PR9609.
llvm-svn: 128777
- Localize the check if an icmp has one use to a place where we know we're
introducing something that's likely more expensive than a sext from i1.
- Add an assert to make sure a case that would lead to a miscompilation is
folded away earlier.
- Fix a typo.
llvm-svn: 128744