the cast. If we do, we can end up with
inst1
--------------- < Insertion point
dbg inst
new inst
instead of the desired
inst1
new inst
--------------- < Insertion point
dbg inst
Another option would be for InsertNoopCastOfTo (or its callers) to move the
insertion point and we would end up with
inst1
dbg inst
new inst
--------------- < Insertion point
but that complicates the callers. This fixes PR12018 (and firefox's build).
llvm-svn: 150884
metadata may still unwind, but only in ways that the ARC
optimizer doesn't need to consider. This permits more
aggressive optimization.
llvm-svn: 150829
useful to represent a variable that is const in the source but can't be constant
in the IR because of a non-trivial constructor. If globalopt evaluates the
constructor, and there was an invariant.start with no matching invariant.end
possible, it will mark the global constant afterwards.
llvm-svn: 150794
This folds a simple loop tail into a loop latch. It covers the common (in fortran) case of postincrement loops. It's a "free" way to expose this type of loop to downstream loop optimizations that bail out on non-canonical loops (getLoopLatch is a heavily used check).
llvm-svn: 150439
This allows BBVectorize to check the "unknown instruction" list in the
alias sets. This is important to prevent instruction fusing from reordering
function calls. Resolves PR11920.
llvm-svn: 150250
is that patterns no longer match for vectors of booleans, because you only get
ConstantDataVector when the vector element type is i8, i16, etc, not when it is
i1). Original commit message:
Remove some dead code and tidy things up now that vectors use ConstantDataVector
instead of always using ConstantVector.
llvm-svn: 150246
GlobalOpt runs early in the pipeline (before inlining) and complex class
hierarchies often introduce bitcasts or GEPs which weren't optimized away.
Teach it to ignore side-effect free instructions instead of depending on
other passes to remove them.
llvm-svn: 150174
* Most of the transforms come through intact by having each transformed load or
store copy the ordering and synchronization scope of the original.
* The transform that turns a global only accessed in main() into an alloca
(since main is non-recursive) with a store of the initial value uses an
unordered store, since it's guaranteed to be the first thing to happen in main.
(Threads may have started before main (!) but they can't have the address of a
function local before the point in the entry block we insert our code.)
* The heap-SRoA transforms are disabled in the face of atomic operations. This
can probably be improved; it seems odd to have atomic accesses to an alloca
that doesn't have its address taken.
AnalyzeGlobal keeps track of the strongest ordering found in any use of the
global. This is more information than we need right now, but it's cheap to
compute and likely to be useful.
llvm-svn: 149847
logic by half: isOnlyReachableViaThisEdge was trying to be clever and
handle the case of a branch to a basic block which is contained in a
loop. This costs a domtree lookup and is completely useless due to
GVN's position in the pass pipeline: all loops have preheaders at this
point, which means it is enough for isOnlyReachableViaThisEdge to check
that Dst has only one predecessor. (I checked this theoretical argument
by running over the entire nightly testsuite, and indeed it is so!).
llvm-svn: 149838
By default, boost the chain depth contribution of loads and stores. This will allow a load/store pair to vectorize even when it would not otherwise be long enough to satisfy the chain depth requirement.
llvm-svn: 149761
PHI nodes which were matched, rather than climbing up the
original PHI node's operands to rediscover PHI nodes for
recording, since the PHI nodes found that are not
necessarily part of the matched set.
This fixes rdar://10589171.
llvm-svn: 149654
This is the initial checkin of the basic-block autovectorization pass along with some supporting vectorization infrastructure.
Special thanks to everyone who helped review this code over the last several months (especially Tobias Grosser).
llvm-svn: 149468
Changing arguments from being passed as fixed to varargs is unsafe, as
the ABI may require they be handled differently (stack vs. register, for
example).
Remove two tests which rely on the bitcast being folded into the direct
call, which is exactly the transformation that's unsafe.
llvm-svn: 149457
savings from a pointer argument becoming an alloca. Sometimes callees will even
compare a pointer to null and then branch to an otherwise unreachable block!
Detect these cases and compute the number of saved instructions, instead of
bailing out and reporting no savings.
llvm-svn: 148941
returns false in the event the computation feeding into the pointer is
unreachable, which maybe ought to be true -- but this is at least consistent
with undef->isDereferenceablePointer().) Fixes PR11825!
llvm-svn: 148671
can't handle. Also don't produce non-zero results for things which won't be
transformed by SROA at all just because we saw the loads/stores before we saw
the use of the address.
llvm-svn: 148536
LSR has gradually been improved to more aggressively reuse existing code, particularly existing phi cycles. This exposed problems with the SCEVExpander's sloppy treatment of its insertion point. I applied some rigor to the insertion point problem that will hopefully avoid an endless bug cycle in this area. Changes:
- Always used properlyDominates to check safe code hoisting.
- The insertion point provided to SCEV is now considered a lower bound. This is usually a block terminator or the use itself. Under no cirumstance may SCEVExpander insert below this point.
- LSR is reponsible for finding a "canonical" insertion point across expansion of different expressions.
- Robust logic to determine whether IV increments are in "expanded" form and/or can be safely hoisted above some insertion point.
Fixes PR11783: SCEVExpander assert.
llvm-svn: 148535
It's becoming clear that LoopSimplify needs to unconditionally create loop preheaders. But that is a bigger fix. For now, continuing to hack LSR.
Fixes rdar://10701050 "Cannot split an edge from an IndirectBrInst" assert.
llvm-svn: 148288
the optimizer doesn't eliminate objc_retainBlock calls which are needed
for their side effect of copying blocks onto the heap.
This implements rdar://10361249.
llvm-svn: 148076
1. Size heuristics changed. Now we calculate number of unswitching
branches only once per loop.
2. Some checks was moved from UnswitchIfProfitable to
processCurrentLoop, since it is not changed during processCurrentLoop
iteration. It allows decide to skip some loops at an early stage.
Extended statistics:
- Added total number of instructions analyzed.
llvm-svn: 147935
with other symbols.
An object in the __cfstring section is suppoed to be filled with CFString
objects, which have a pointer to ___CFConstantStringClassReference followed by a
pointer to a __cstring. If we allow the object in the __cstring section to be
merged with another global, then it could end up in any section. Because the
linker is going to remove these symbols in the final executable, we shouldn't
bother to merge them.
<rdar://problem/10564621>
llvm-svn: 147899
These heuristics are sufficient for enabling IV chains by
default. Performance analysis has been done for i386, x86_64, and
thumbv7. The optimization is rarely important, but can significantly
speed up certain cases by eliminating spill code within the
loop. Unrolled loops are prime candidates for IV chains. In many
cases, the final code could still be improved with more target
specific optimization following LSR. The goal of this feature is for
LSR to make the best choice of induction variables.
Instruction selection may not completely take advantage of this
feature yet. As a result, there could be cases of slight code size
increase.
Code size can be worse on x86 because it doesn't support postincrement
addressing. In fact, when chains are formed, you may see redundant
address plus stride addition in the addressing mode. GenerateIVChains
tries to compensate for the common cases.
On ARM, code size increase can be mitigated by using postincrement
addressing, but downstream codegen currently misses some opportunities.
llvm-svn: 147826
After collecting chains, check if any should be materialized. If so,
hide the chained IV users from the LSR solver. LSR will only solve for
the head of the chain. GenerateIVChains will then materialize the
chained IV users by computing the IV relative to its previous value in
the chain.
In theory, chained IV users could be exposed to LSR's solver. This
would be considerably complicated to implement and I'm not aware of a
case where we need it. In practice it's more important to
intelligently prune the search space of nontrivial loops before
running the solver, otherwise the solver is often forced to prune the
most optimal solutions. Hiding the chained users does this well, so
that LSR is more likely to find the best IV for the chain as a whole.
llvm-svn: 147801