Problem: LLVM needs more function attributes than currently available (32 bits).
One such proposed attribute is "address_safety", which shows that a function is being checked for address safety (by AddressSanitizer, SAFECode, etc).
Solution:
- extend the Attributes from 32 bits to 64-bits
- wrap the object into a class so that unsigned is never erroneously used instead
- change "unsigned" to "Attributes" throughout the code, including one place in clang.
- the class has no "operator uint64 ()", but it has "uint64_t Raw() " to support packing/unpacking.
- the class has "safe operator bool()" to support the common idiom: if (Attributes attr = getAttrs()) useAttrs(attr);
- The CTOR from uint64_t is marked explicit, so I had to add a few explicit CTOR calls
- Add the new attribute "address_safety". Doing it in the same commit to check that attributes beyond first 32 bits actually work.
- Some of the functions from the Attribute namespace are worth moving inside the class, but I'd prefer to have it as a separate commit.
Tested:
"make check" on Linux (32-bit and 64-bit) and Mac (10.6)
built/run spec CPU 2006 on Linux with clang -O2.
This change will break clang build in lib/CodeGen/CGCall.cpp.
The following patch will fix it.
llvm-svn: 148553
LSR has gradually been improved to more aggressively reuse existing code, particularly existing phi cycles. This exposed problems with the SCEVExpander's sloppy treatment of its insertion point. I applied some rigor to the insertion point problem that will hopefully avoid an endless bug cycle in this area. Changes:
- Always used properlyDominates to check safe code hoisting.
- The insertion point provided to SCEV is now considered a lower bound. This is usually a block terminator or the use itself. Under no cirumstance may SCEVExpander insert below this point.
- LSR is reponsible for finding a "canonical" insertion point across expansion of different expressions.
- Robust logic to determine whether IV increments are in "expanded" form and/or can be safely hoisted above some insertion point.
Fixes PR11783: SCEVExpander assert.
llvm-svn: 148535
It's becoming clear that LoopSimplify needs to unconditionally create loop preheaders. But that is a bigger fix. For now, continuing to hack LSR.
Fixes rdar://10701050 "Cannot split an edge from an IndirectBrInst" assert.
llvm-svn: 148288
Message for r148132:
LoopUnswitch: All helper data that is collected during loop-unswitch iterations was moved to separated class (LUAnalysisCache).
llvm-svn: 148215
the optimizer doesn't eliminate objc_retainBlock calls which are needed
for their side effect of copying blocks onto the heap.
This implements rdar://10361249.
llvm-svn: 148076
1. Size heuristics changed. Now we calculate number of unswitching
branches only once per loop.
2. Some checks was moved from UnswitchIfProfitable to
processCurrentLoop, since it is not changed during processCurrentLoop
iteration. It allows decide to skip some loops at an early stage.
Extended statistics:
- Added total number of instructions analyzed.
llvm-svn: 147935
with other symbols.
An object in the __cfstring section is suppoed to be filled with CFString
objects, which have a pointer to ___CFConstantStringClassReference followed by a
pointer to a __cstring. If we allow the object in the __cstring section to be
merged with another global, then it could end up in any section. Because the
linker is going to remove these symbols in the final executable, we shouldn't
bother to merge them.
<rdar://problem/10564621>
llvm-svn: 147899
These heuristics are sufficient for enabling IV chains by
default. Performance analysis has been done for i386, x86_64, and
thumbv7. The optimization is rarely important, but can significantly
speed up certain cases by eliminating spill code within the
loop. Unrolled loops are prime candidates for IV chains. In many
cases, the final code could still be improved with more target
specific optimization following LSR. The goal of this feature is for
LSR to make the best choice of induction variables.
Instruction selection may not completely take advantage of this
feature yet. As a result, there could be cases of slight code size
increase.
Code size can be worse on x86 because it doesn't support postincrement
addressing. In fact, when chains are formed, you may see redundant
address plus stride addition in the addressing mode. GenerateIVChains
tries to compensate for the common cases.
On ARM, code size increase can be mitigated by using postincrement
addressing, but downstream codegen currently misses some opportunities.
llvm-svn: 147826
After collecting chains, check if any should be materialized. If so,
hide the chained IV users from the LSR solver. LSR will only solve for
the head of the chain. GenerateIVChains will then materialize the
chained IV users by computing the IV relative to its previous value in
the chain.
In theory, chained IV users could be exposed to LSR's solver. This
would be considerably complicated to implement and I'm not aware of a
case where we need it. In practice it's more important to
intelligently prune the search space of nontrivial loops before
running the solver, otherwise the solver is often forced to prune the
most optimal solutions. Hiding the chained users does this well, so
that LSR is more likely to find the best IV for the chain as a whole.
llvm-svn: 147801
This collects a set of IV uses within the loop whose values can be
computed relative to each other in a sequence. Following checkins will
make use of this information.
llvm-svn: 147797
This will be more important as we extend the LSR pass in ways that don't rely on the formula solver. In particular, we need it for constructing IV chains.
llvm-svn: 147724
LoopSimplify may not run on some outer loops, e.g. because of indirect
branches. SCEVExpander simply cannot handle outer loops with no preheaders.
Fixes rdar://10655343 SCEVExpander segfault.
llvm-svn: 147718
present in the bottom of the CFG triangle, as the transformation isn't
ever valuable if the branch can't be eliminated.
Also, unify some heuristics between SimplifyCFG's multiple
if-converters, for consistency.
This fixes rdar://10627242.
llvm-svn: 147630
code can incorrectly move the load across a store. This never
happens in practice today, but only because the current
heuristics accidentally preclude it.
llvm-svn: 147623
captured. This allows the tracker to look at the specific use, which may be
especially interesting for function calls.
Use this to fix 'nocapture' deduction in FunctionAttrs. The existing one does
not iterate until a fixpoint and does not guarantee that it produces the same
result regardless of iteration order. The new implementation builds up a graph
of how arguments are passed from function to function, and uses a bottom-up walk
on the argument-SCCs to assign nocapture. This gets us nocapture more often, and
does so rather efficiently and independent of iteration order.
llvm-svn: 147327
This has the obvious advantage of being commutable and is always a win on x86 because
const - x wastes a register there. On less weird architectures this may lead to
a regression because other arithmetic doesn't fuse with it anymore. I'll address that
problem in a followup.
llvm-svn: 147254
performance regressions (both execution-time and compile-time) on our
nightly testers.
Original commit message:
Fix for bug #11429: Wrong behaviour for switches. Small improvement for code
size heuristics.
llvm-svn: 147131
into Analysis as a standalone function, since there's no need for
it to be in VMCore. Also, update it to use isKnownNonZero and
other goodies available in Analysis, making it more precise,
enabling more aggressive optimization.
llvm-svn: 146610
This should always be done as a matter of principal. I don't have a
case that exposes the problem. I just noticed this recently while
scanning the code and realized I meant to fix it long ago.
llvm-svn: 146438
subdirectories to traverse into.
- Originally I wanted to avoid this and just autoscan, but this has one key
flaw in that new subdirectories can not automatically trigger a rerun of the
llvm-build tool. This is particularly a pain when switching back and forth
between trees where one has added a subdirectory, as the dependencies will
tend to be wrong. This will also eliminates FIXME implicitly.
llvm-svn: 146436
detected in the forward-CFG DFS. This prevents the reverse-CFG from
visiting blocks inside loops after blocks that dominate them in the
case where loops have multiple exits.
No testcase, because this fixes a bug which in practice only shows
up in a full optimizer run, due to the use-list order.
This fixes rdar://10422791 and others.
llvm-svn: 146408
indicates whether the intrinsic has a defined result for a first
argument equal to zero. This will eventually allow these intrinsics to
accurately model the semantics of GCC's __builtin_ctz and __builtin_clz
and the X86 instructions (prior to AVX) which implement them.
This patch merely sets the stage by extending the signature of these
intrinsics and establishing auto-upgrade logic so that the old spelling
still works both in IR and in bitcode. The upgrade logic preserves the
existing (inefficient) semantics. This patch should not change any
behavior. CodeGen isn't updated because it can use the existing
semantics regardless of the flag's value.
Note that this will be followed by API updates to Clang and DragonEgg.
Reviewed by Nick Lewycky!
llvm-svn: 146357
Since we're not rewriting IVs in other loops, there's not much reason
to consider their stride when generating formulae.
This should reduce the number of useless formulas considered by LSR.
llvm-svn: 146302
Patch by Brendon Cahoon!
This extends the existing LoopUnroll and LoopUnrollPass. Brendon
measured no regressions in the llvm test suite with -unroll-runtime
enabled. This implementation works by using the existing loop
unrolling code to unroll the loop by a power-of-two (default 8). It
generates an if-then-else sequence of code prior to the loop to
execute the extra iterations before entering the unrolled loop.
llvm-svn: 146245
- Walking over pred_begin/pred_end is an expensive operation.
- PHINodes contain a value for each predecessor anyway.
- While it may look like we used to save a few iterations with the set,
be aware that getIncomingValueForBlock does a linear search on
the values of the phi node.
- Another -5% on ARMDisassembler.cpp (Release build). This was the last
entry in the profile that was obviously wasting time.
llvm-svn: 145937
It's always good to prune early, but formulae that are unsatisfactory
in their own right need to be removed before running any other pruning
heuristics. We easily avoid generating such formulae, but we need them
as an intermediate basis for forming other good formulae.
llvm-svn: 145906
- Calling getUser in a loop is much more expensive than iterating over a few instructions.
- Use it instead of the open-coded loop in AddrModeMatcher.
- 5% speedup on ARMDisassembler.cpp Release builds.
llvm-svn: 145810
The callee is usually smaller than the caller, too. This reduces the compile
time of ARMDisassembler.cpp by 32% (Release build). It still takes ages to
compile though.
llvm-svn: 145690
weak variable are compiled by different compilers, such as GCC and LLVM, while
LLVM may increase the alignment to the preferred alignment there is no reason to
think that GCC will use anything more than the ABI alignment. Since it is the
GCC version that might end up in the final program (as the linkage is weak), it
is wrong to increase the alignment of loads from the global up to the preferred
alignment as the alignment might only be the ABI alignment.
Increasing alignment up to the ABI alignment might be OK, but I'm not totally
convinced that it is. It seems better to just leave the alignment of weak
globals alone.
llvm-svn: 145413
gcc, though I thought it was older (my gcc 4.4 has it as a local patch. Whoops!)
This fixes PR10589.
Also add some debugging statements.
Remove GcnoFiles, the mapping from CompilationUnit to raw_ostream. Now that we
start by iterating over each CU and descending into them, there's no need to
maintain a mapping.
llvm-svn: 145208
The right way to check for a binary operation is
cast<BinaryOperator>. The original check: cast<Instruction> &&
numOperands() == 2 would match phi "instructions", leading to an
infinite loop in extreme corner case: a useless phi with operands
[self, constant] that prior optimization passes failed to remove,
being used in the loop by another useless phi, in turn being used by an
lshr or udiv.
Fixes PR11350: runaway iteration assertion.
llvm-svn: 144935
lead to it trying to re-mark a value marked as a constant with a different value. It also appears to trigger very rarely.
Fixes PR11357.
llvm-svn: 144352
Size of data being pointed to wasn't always being checked so some small writes were killing big writes
Fixes <rdar://problem/10426753>
llvm-svn: 144312
Currently checks alignment and killing stores on a power of 2 boundary as this is likely
to trim the size of the earlier store without breaking large vector stores into scalar ones.
Fixes <rdar://problem/10140300>
llvm-svn: 144239
Only currently done if the later store is writing to a power of 2 address or
has the same alignment as the earlier store as then its likely to not break up
large stores into smaller ones
Fixes <rdar://problem/10140300>
llvm-svn: 143630
We've been hitting asserts in this code due to the many supported
combintions of modes (iv-rewrite/no-iv-rewrite) and IV types. This
second rewrite of the code attempts to deal with these cases systematically.
llvm-svn: 143546
instructions.
This doesn't introduce any optimizations we weren't doing before (except
potentially due to pass ordering issues), now passes will eliminate them sooner
as part of their own cleanups.
llvm-svn: 142787
element types, even though the element extraction code does. It is surprising
that this bug has been here for so long. Fixes <rdar://problem/10318778>.
llvm-svn: 142740
combining of the landingpad instruction. The ObjC personality function acts
almost identically to the C++ personality function. In particular, it uses
"null" as a "catch-all" value.
llvm-svn: 142256
Some code want to check that *any* call within a function has the 'returns
twice' attribute, not just that the current function has one.
llvm-svn: 142221
profile metadata at the same time. Use it to preserve metadata attached
to a branch when re-writing it in InstCombine.
Add metadata to the canonicalize_branch InstCombine test, and check that
it is tranformed correctly.
Reviewed by Nick Lewycky!
llvm-svn: 142168
I rewrote the algorithm a while back so it doesn't require map lookup,
but neglected to change the data structure. This was caught by
llvm-gcc self host, not because there's anything special about
llvm-gcc, but because it is the only test for nondeterminism we
currently have. Unit tests don't work well for everything; we should
always try to have a nondeterminism stress test running.
Fixes PR11133: llvm-gcc self host .o mismatch after enable-iv-rewrite=false
llvm-svn: 142036
Someone more familiar with LSR should double-check that the extra cast is actually doing the right thing in the overflow cases; I'm not completely confident that's that case.
llvm-svn: 141916
would have never worked, since the element type of a vector type is never a
vector type. Also fix the conditional to be more direct in checking whether
EltTy is a vector type.
llvm-svn: 141713
IVs.
Indvars previously chose randomly between congruent IVs. Now it will
bias the decision toward IVs that SCEVExpander likes to create. This
was not done to fix any problem, it's just a welcome side effect of
factoring code.
llvm-svn: 141633
promoting allocas to preferred alignments that exceed the natural
alignment. This avoids some potentially expensive dynamic stack realignments.
The natural stack alignment is set in target data strings via the "S<size>"
option. Size is in bits and must be a multiple of 8. The natural stack alignment
defaults to "unspecified" (represented by a zero value), and the "unspecified"
value does not prevent any alignment promotions. Target maintainers that care
about avoiding promotions should explicitly add the "S<size>" option to their
target data strings.
llvm-svn: 141599
switch (n) {
case 27:
do_something(x);
...
}
the call do_something(x) will be replaced with do_something(27). In
gcc-as-one-big-file this results in the removal of about 500 lines of
bitcode (about 0.02%), so has about 1/10 of the effect of propagating
branch conditions.
llvm-svn: 141360
While I'm here, fix the related issue with strncmp, add some actual tests for strcmp and strncmp, and start using StringRef::compare for constant folding instead of using strcmp/strncmp so that the optimized IR isn't dependent on the host's implementation of strcmp.
llvm-svn: 141227
Just pull the instruction name, but don't change the order of anything
else. That keeps --debug happy and non-crashing, but doesn't change
how the worklist gets built.
llvm-svn: 141210
When updating the worklist for InstCombine, the Add/AddUsersToWorklist
functions may access the instruction(s) being added, for debug output for
example. If the instructions aren't yet added to the basic block, this
can result in a crash. Finish the instruction transformation before
adjusting the worklist instead.
rdar://10238555
llvm-svn: 141203
branch "br i1 %x, label %if_true, label %if_false" then it replaces
"%x" with "true" in places only reachable via the %if_true arm, and
with "false" in places only reachable via the %if_false arm. Except
that actually it doesn't: if value numbering shows that %y is equal
to %x then, yes, %y will be turned into true/false in this way, but
any occurrences of %x itself are not transformed. Fix this. What's
more, it's often the case that %x is an equality comparison such as
"%x = icmp eq %A, 0", in which case every occurrence of %A that is
only reachable via the %if_true arm can be replaced with 0. Implement
this and a few other variations on this theme. This reduces the number
of lines of LLVM IR in "GCC as one big file" by 0.2%. It has a bigger
impact on Ada code, typically reducing the number of lines of bitcode
by around 0.4% by removing repeated compiler generated checks. Passes
the LLVM nightly testsuite and the Ada ACATS testsuite.
llvm-svn: 141177
it's OK for the false/true destination to have multiple
predecessors as long as the extra ones are dominated by
the branch destination.
llvm-svn: 141176
This handles the case in which LSR rewrites an IV user that is a phi and
splits critical edges originating from a switch.
Fixes <rdar://problem/6453893> LSR is not splitting edges "nicely"
llvm-svn: 141059
We want heuristics to be based on accurate data, but more importantly
we don't want llvm to behave randomly. A benign trunc inserted by an
upstream pass should not cause a wild swings in optimization
level. See PR11034. It's a general problem with threshold-based
heuristics, but we can make it less bad.
llvm-svn: 140919
catch or repeated filter clauses. Teach instcombine a bunch
of tricks for simplifying landingpad clauses. Currently the
code only recognizes the GNU C++ and Ada personality functions,
but that doesn't stop it doing a bunch of "generic" transforms
which are hopefully fine for any real-world personality function.
If these "generic" transforms turn out not to be generic, they
can always be conditioned on the personality function. Probably
someone should add the ObjC++ personality function. I didn't as
I don't know anything about it.
llvm-svn: 140852
Rewriting the entire loop nest now requires -enable-lsr-nested.
See PR11035 for some performance data.
A few unit tests specifically test nested LSR, and are now under a flag.
llvm-svn: 140762
The minor bug heuristic was noticed by inspection. I added the
isLoser/isValid helpers because they will become more
important with subsequent checkins.
llvm-svn: 140580
The landing pad must accompany the invoke when it's extracted. However, if it
does, then the loop isn't properly extracted. I.e., the resulting extraction has
a loop in it. The extracted function is then extracted, etc. resulting in an
infinite loop.
llvm-svn: 140193
extract its associated landing pad block as well. However, that landing pad
block may have more than one predecessor. So split the landing pad block so that
individual landing pads have only one predecessor.
This type of transformation may produce a false positive with bugpoint.
llvm-svn: 140173
extract the landing pad block. Otherwise, there will be a situation where the
invoke's unwind edge lands on a non-landing pad.
We also forbid the user from extracting the landing pad block by itself. Again,
this is not a valid transformation.
llvm-svn: 140083
No tests; these changes aren't really interesting in the sense that the logic is the same for volatile and atomic.
I believe this completes all of the changes necessary for the optimizer to handle loads and stores correctly. I'm going to try and come up with some additional testing, though.
llvm-svn: 139533
better.
Don't immediately give up when an add operation can't be trivially
sign/zero-extended within a loop. If it has NSW/NUW flags, generate a
new expression with sign extended (non-recurrent) operand. As before,
if SCEV says that all sign extends are loop invariant, then we can
widen the operation.
llvm-svn: 139453
init.trampoline and adjust.trampoline intrinsics, into two intrinsics
like in GCC. While having one combined intrinsic is tempting, it is
not natural because typically the trampoline initialization needs to
be done in one function, and the result of adjust trampoline is needed
in a different (nested) function. To get around this llvm-gcc hacks the
nested function lowering code to insert an additional parent variable
holding the adjust.trampoline result that can be accessed from the child
function. Dragonegg doesn't have the luxury of tweaking GCC code, so it
stored the result of adjust.trampoline in the memory GCC set aside for
the trampoline itself (this is always available in the child function),
and set up some new memory (using an alloca) to hold the trampoline.
Unfortunately this breaks Go which allocates trampoline memory on the
heap and wants to use it even after the parent has exited (!). Rather
than doing even more hacks to get Go working, it seemed best to just use
two intrinsics like in GCC. Patch mostly by Sanjoy Das.
llvm-svn: 139140
This changes loop unrolling to use the same mechanism for trip count
computation as indvars. This is a stronger check that tends to unroll
more loops. A very common side-effect is that many single iteration
loops will be removed sooner. The real goal was simply to remove
dependence on canonical IVs.
x86 is break even.
ARM performance changes to expect (+ is good):
External/SPEC/CFP2000/183.equake/183.equake +13%
SingleSource/Benchmarks/Dhrystone/fldry +21%
MultiSource/Applications/spiff/spiff +3%
SingleSource/Benchmarks/Stanford/Puzzle -14%
The Puzzle regression is actually an improvement in loop optimization
that defeats GVN: rdar://problem/10065079.
llvm-svn: 139009
The landingpad instruction is required in the landing pad block. Because we're
not deleting terminating instructions, the invoke may still jump to here (see
Transforms/SCCP/2004-11-16-DeadInvoke.ll). Remove all uses of the landingpad
instruction, but keep it around until code-gen can remove the basic block.
llvm-svn: 138890
In theory this could be extended to other instructions, eg. division by zero, but it's likely that it will "miscompile" some code because people depend on div by zero not trapping. NULL pointer dereference usually leads to a crash so we should be on the safe side.
This shrinks the size of a Release clang by 16k on x86_64.
llvm-svn: 138618
We have to be careful when splitting the landing pad block, because the
landingpad instruction is required to remain as the first non-PHI of an invoke's
unwind edge. To retain this, we split the block into two blocks, moving the
predecessors within the loop to one block and the remaining predecessors to the
other. The landingpad instruction is cloned into the new blocks.
llvm-svn: 138015
SplitLandingPadPredecessors is similar to SplitBlockPredecessors in that it
splits the current block and attaches a set of predecessors to the new basic
block. However, it differs from SplitBlockPredecessors in that it's specifically
designed to handle landing pad blocks.
Two new basic blocks are created: one that is has the vector of predecessors as
its predecessors and one that has the remaining predecessors as its
predecessors. Those two new blocks then receive a cloned copy of the landingpad
instruction from the original block. The landingpad instructions are joined in a
PHI, etc. Like SplitBlockPredecessors, it updates the LLVM IR, AliasAnalysis,
DominatorTree, DominanceFrontier, LoopInfo, and LCCSA analyses.
llvm-svn: 138014
PRE needs the landing pads to have their critical edges split. Doing this for a
landing pad is non-trivial. Abandon the attempt to perform PRE when we come
across a landing pad. (Reviewed by Owen!)
llvm-svn: 137876
One way to exit the loop is through an unwind edge. However, that may involve
splitting the critical edge of the landing pad, which is non-trivial. Prevent
the transformation from rewriting the landing pad exit loop block.
llvm-svn: 137871
making random bad assumptions about instructions which are not explicitly listed.
Includes fix for rdar://9956541, a version of "undef ^ undef should return
0 because it's easier than arguing with users".
llvm-svn: 137777
This commit includes a mention of the landingpad instruction, but it's not
changing the behavior around it. I think the current behavior is correct,
though. Bill, can you double-check that?
llvm-svn: 137691
This builds off of the current scheme, but instead of llvm.eh.exception and
llvm.eh.selector, it uses the landingpad instruction. And instead of
llvm.eh.resume, it uses the resume instruction.
Because of the invariants in the landing pad instruction, a lot of code that's
currently needed to find the appropriate intrinsic calls for an invoke
instruction won't be needed once we go to the new EH scheme. The "FIXME"s tell
us what to remove after we switch.
llvm-svn: 137576
This implements the 'landingpad' instruction. It's used to indicate that a basic
block is a landing pad. There are several restrictions on its use (see
LangRef.html for more detail). These restrictions allow the exception handling
code to gather the information it needs in a much more sane way.
This patch has the definition, implementation, C interface, parsing, and bitcode
support in it.
llvm-svn: 137501
the retains and releases all use the same SSA pointer value.
Also, don't let CFG hazards disrupt nested retain+release pair
optimizations.
llvm-svn: 137399
SCEV unrolling can unroll loops with arbitrary induction variables. It
is a prerequisite for -disable-iv-rewrite performance. It is also
easily handles loops of arbitrary structure including multiple exits
and is generally more robust.
This is under a temporary option to avoid affecting default
behavior for the next couple of weeks. It is needed so that I can
checkin unit tests for updateUnloop.
llvm-svn: 137384
based on ScalarEvolution without changing the induction variable phis.
This utility is the main tool of IndVarSimplifyPass, but the pass also
restructures induction variables in strange ways that are sensitive to
pass ordering. This provides a way for other loop passes to simplify
new uses of induction variables created during transformation. The
utility may be used by any pass that preserves ScalarEvolution. Soon
LoopUnroll will use it.
The net effect in this checkin is to cleanup the IndVarSimplify pass
by factoring out the SimplifyIndVar algorithm into a standalone utility.
llvm-svn: 137197
These are not individual bug fixes. I had to rewrite a good chunk of
the unroller to make it sane. I think it was getting lucky on trivial
completely unrolled loops with no early exits. I included some fairly
simple unit tests for partial unrolling. I didn't do much stress
testing, so it may not be perfect, but should be usable now.
llvm-svn: 137190
The 'unwind' instruction was acting essentially as a placeholder, because it
would be replaced at the end of this function by a branch to the "unwind
handler". The 'unwind' instruction is going away, so use 'unreachable' instead,
which serves the same purpose as a placeholder.
llvm-svn: 137098
recurrence, the initial values low bits can sometimes be ignored.
To take advantage of this, added FoldIVUser to IndVarSimplify to fold
an IV operand into a udiv/lshr if the operator doesn't affect the
result.
-indvars -disable-iv-rewrite now transforms
i = phi i4
i1 = i0 + 1
idx = i1 >> (2 or more)
i4 = i + 4
into
i = phi i4
idx = i0 >> ...
i4 = i + 4
llvm-svn: 137013
inlined variable, based on the discussion in PR10542.
This explodes the runtime of several passes down the pipeline due to
a large number of "copies" remaining live across a large function. This
only shows up with both debug and opt, but when it does it creates
a many-minute compile when self-hosting LLVM+Clang. There are several
other cases that show these types of regressions.
All of this is tracked in PR10542, and progress is being made on fixing
the issue. Once its addressed, the re-instated, but until then this
restores the performance for self-hosting and other opt+debug builds.
Devang, let me know if this causes any trouble, or impedes fixing it in
any way, and thanks for working on this!
llvm-svn: 136953
- use SmallVectorImpl& for the function argument.
- ignore the operands on the GEP, even if they aren't constant! Much as we
pretend the malloc succeeds, we pretend that malloc + whatever-you-GEP'd-by
is not null. It's magic!
llvm-svn: 136757
Don't replace a gep/bitcast with 'undef' because that will form a "free(undef)"
which in turn means "unreachable". What we wanted was a no-op. Instead, analyze
the whole tree and look for all the instructions we need to delete first, then
delete them second, not relying on the use_list to stay consistent.
llvm-svn: 136752
This adds the 'resume' instruction class, IR parsing, and bitcode reading and
writing. The 'resume' instruction resumes propagation of an existing (in-flight)
exception whose unwinding was interrupted with a 'landingpad' instruction (to be
added later).
llvm-svn: 136589