nondeterminism being bad) could cause some trivial missed optimizations (dead
phi nodes being left around for later passes to clean up).
With this, llvm-gcc4 now bootstraps and correctly compares. I don't know
why I never tried to do it before... :)
llvm-svn: 27984
This patch is an incremental step towards supporting a flat symbol table.
It de-overloads the intrinsic functions by providing type-specific intrinsics
and arranging for automatically upgrading from the old overloaded name to
the new non-overloaded name. Specifically:
llvm.isunordered -> llvm.isunordered.f32, llvm.isunordered.f64
llvm.sqrt -> llvm.sqrt.f32, llvm.sqrt.f64
llvm.ctpop -> llvm.ctpop.i8, llvm.ctpop.i16, llvm.ctpop.i32, llvm.ctpop.i64
llvm.ctlz -> llvm.ctlz.i8, llvm.ctlz.i16, llvm.ctlz.i32, llvm.ctlz.i64
llvm.cttz -> llvm.cttz.i8, llvm.cttz.i16, llvm.cttz.i32, llvm.cttz.i64
New code should not use the overloaded intrinsic names. Warnings will be
emitted if they are used.
llvm-svn: 25366
it doesn't contain any calls. This is a fairly common case for C++ code,
so it will probably speed up the inliner marginally in these cases.
llvm-svn: 25284
function was not an alloca, we wouldn't check the entry block for any allocas,
leading to increased stack space in some cases. In practice, allocas are almost
always at the top of the block, so this was never noticed.
llvm-svn: 25280
has a single def. In this case, look for uses that are dominated by the def
and attempt to rewrite them to directly use the stored value.
This speeds up mem2reg on these values and reduces the number of phi nodes
inserted. This should address PR665.
llvm-svn: 24411
into the LLVMAnalysis library.
This allows LLVMTranform and LLVMTransformUtils to be archives and linked
with LLVMAnalysis.a, which provides any missing definitions.
llvm-svn: 24036
SparcV9 JIT.
2. Make LLVMTransformUtils a relinked object file and always link it before
LLVMAnalysis.a. These two libraries have circular dependencies on each
other which creates problem when building the SparcV9 JIT. This change
fixes the dependency on all platforms problems with a minimum of fuss.
llvm-svn: 24023
not define a value that is used outside of it's block. This catches many
more simplifications, e.g. 854 in 176.gcc, 137 in vpr, etc.
This implements branch-phi-thread.ll:test3.ll
llvm-svn: 23397
Instead, just update the BB in-place. This is both faster, and it prevents
split-critical-edges from shuffling the PHI argument list unneccesarily.
llvm-svn: 22765
BasicBlock's removePredecessor routine. This requires shuffling around
the definition and implementation of hasContantValue from Utils.h,cpp into
Instructions.h,cpp
llvm-svn: 22664
consideration the case where a reference in an unreachable block could
occur. This fixes Transforms/SimplifyCFG/2005-08-01-PHIUpdateFail.ll,
something I ran into while bugpoint'ing another pass.
llvm-svn: 22584
The optimization for locally used allocas was not safe for allocas that
were read before they were written. This change disables that optimization
in that case.
llvm-svn: 22318
sinh, cosh, etc.
* Make the name comparisons for the fp libcalls a little more efficient by
switching on the first character of the name before doing comparisons.
llvm-svn: 21611
This does a simple form of "jump threading", which eliminates CFG edges that
are provably dead. This triggers 90 times in the external tests, and
eliminating CFG edges is always always a good thing! :)
llvm-svn: 20300
SimplifyCFG is one of those passes that we use for final cleanup: it should
not rely on other passes to clean up its garbage. This fixes the "why are
trivially dead setcc's in the output of gccas" problem.
llvm-svn: 19212
if (x) {
code
...
} else {
code
...
}
Turn it into:
code
if (x) {
...
} else {
...
}
This reduces code size and in some common cases allows us to completely
eliminate the conditional. This turns several if/then/else blocks in loops
into straightline code in 179.art, turning the loops into single basic blocks
(good for modsched even!).
Maybe now brg will leave me alone ;-)
llvm-svn: 18366
unneccesary. This allows us to delete several hundred phi nodes of the
form PHI(x,x,x,undef) from 253.perlbmk and probably other programs as well.
This implements Mem2Reg/UndefValuesMerge.ll
llvm-svn: 17098
whose addresses where used by trivial phi nodes and select instructions. This
is now performed by the instcombine pass, which is more powerful, is much
simpler, and is faster. This allows the deletion of a bunch of code, two
FIXME's and two gotos.
llvm-svn: 16406
Move include/Config and include/Support into include/llvm/Config,
include/llvm/ADT and include/llvm/Support. From here on out, all LLVM
public header files must be under include/llvm/.
llvm-svn: 16137
block (common in a switch), make sure to remove extra edges in successor
blocks. This fixes CodeExtractor/2004-08-12-BlockExtractPHI.ll and should
be pulled into LLVM 1.3 (though the regression test need not be, as that
would require pulling in the LoopExtract.cpp changes).
llvm-svn: 15717
since May 1st. In this code, the pred iterator was being invalidated sometimes
causing the wrong entries to be added to PHI nodes.
The fix for this is to defererence and safe the *PI value before we hack on
branch instructions, which changes use/def chains, which SOMETIMES invalidates
the iterator.
llvm-svn: 14278
non-deterministic things like the ordering of blocks in the dominance
frontier of a BB. Unfortunately, I don't know of a better way to solve
this problem than to explicitly sort the BB's in function-order before
processing them. This is guaranteed to slow the pass down a bit, but
is absolutely necessary to get usable diffs between two different tools
executing the mem2reg or scalarrepl pass.
Before this, bazillions of spurious diff failures occurred all over the
place due to the different order of processing PHIs:
- %tmp.111 = getelementptr %struct.Connector_struct* %upcon.0.0, uint 0, uint 0
+ %tmp.111 = getelementptr %struct.Connector_struct* %upcon.0.1, uint 0, uint 0
Now, the diffs match.
llvm-svn: 14244
nondeterministic results that depend on where these objects land in memory.
Instead, sort by the value of the constant, which is stable.
Before this patch, the -simplifycfg pass run from two different compilers
could cause different code to be generated, though it was semantically the
same:
@@ -12258,8 +12258,8 @@
%s_addr.1 = phi sbyte* [ %s, %entry ], [ %inc.0, %no_exit ] ; <sbyte*> [#uses=5]
%tmp.1 = load sbyte* %s_addr.1 ; <sbyte> [#uses=1]
switch sbyte %tmp.1, label %no_exit [
- sbyte 0, label %loopexit
sbyte 46, label %loopexit
+ sbyte 0, label %loopexit
]
We need to stomp all of this stuff out.
llvm-svn: 14243
is write an autoconf macro that checks whether __isnan or isnan actually works
**using the C++ compiler after #include <cmath>**, instead of doing it the easy
way with AC_CHECK_FUNCS().
llvm-svn: 14171
Add support for acos/asin/atan. 188.ammp contains three calls to acos with
constant arguments. Constant folding it allows elimination of those 3 calls
and three FP divisions of the results.
llvm-svn: 13821
CloneTrace, and because it is primarily an operation on ValueMaps. It
is now a global (non-static) function which can be pulled in using
ValueMapper.h.
llvm-svn: 13600
PHI node entries from multiple outside-the-region blocks. This also fixes
extraction of the entry block in a function. Yaay.
This has successfully block extracted all (but one) block from the score_move
function in obsequi (out of 33). Hrm, I wonder which block the bug is in. :)
llvm-svn: 13489
* Add a stub for the severSplitPHINodes which will allow us to bbextract
bb's with PHI nodes in them soon.
* Remove unused arguments from findInputsOutputs
* Dramatically simplify the code in findInputsOutputs. In particular,
nothing really cares whether or not a PHI node is using something.
* Move moveCodeToFunction to after emitCallAndSwitchStatement as that's the
order they get called.
* Fix a bug where we would code extract a region that included a call to
vastart. Like 'alloca', calls to vastart must stay in the function that
they are defined in.
* Add some comments.
llvm-svn: 13482
from the extracted region. If the return has 0 or 1 exit blocks, the new
function returns void. If it has 2 exits, it returns bool, otherwise it
returns a ushort as before.
This allows us to use a conditional branch instruction when there are two
exit blocks, as often happens during block extraction.
llvm-svn: 13481
1. Get rid of the silly abort block. When doing bb extraction, we get one
abort block for every block extracted, which is kinda annoying.
2. If the switch ends up having a single destination, turn it into an
unconditional branch.
I would like to add support for conditional branches, but to do this we will
want to have the function return a bool instead of a ushort.
llvm-svn: 13478
Basically we were using SimplifyCFG as a huge sledgehammer for a simple
optimization. Because simplifycfg does so many things, we can't use it
for this purpose.
llvm-svn: 12977
1. Names were not put on the new arguments created (ok, this just helps sanity :)
2. Fix outgoing pointer values
3. Do not insert stores for values that had not been computed
4. Fix some wierd problems with the outset calculation
This fixes CodeExtractor/2004-03-14-DominanceProblem.ll, making the extractor
work on at least one simple case!
llvm-svn: 12484
Simplify the input/output finder. All elements of a basic block are
instructions. Any used arguments are also inputs. An instruction can only
be used by another instruction.
llvm-svn: 12405
* Don't insert a branch to the switch instruction after the call, just
make it a single block.
* Insert the new alloca instructions in the entry block of the original
function instead of having them execute dynamically
* Don't make the default edge of the switch instruction go back to the switch.
The loop extractor shouldn't create new loops!
* Give meaningful names to the alloca slots and the reload instructions
* Some minor code simplifications
llvm-svn: 12402
This also implements a two minor improvements:
* Don't insert live-out stores IN the region, insert them on the code path
that exits the region
* If the region is exited to the same block from multiple paths, share the
switch statement entry, live-out store code, and the basic block.
llvm-svn: 12401
a member of the class. While we're at it, turn the collection into a set
instead of a vector to improve efficiency and make queries simpler.
llvm-svn: 12400
This case occurs many times in various benchmarks, especially when combined
with the previous patch. This allows it to get stuff like:
if (X == 4 || X == 3)
if (X == 5 || X == 8)
and
switch (X) {
case 4: case 5: case 6:
if (X == 4 || X == 5)
llvm-svn: 11797
allowed in invoke instructions. Thus, if we are inlining a call to an intrinsic
function into an invoke site, we don't need to turn the call into an invoke!
llvm-svn: 11384
Having a proper 'select' instruction would allow the elimination of a lot
of the special case cruft in this patch, but we don't have one yet.
llvm-svn: 11307
1. Don't scan to the end of alloca instructions in the caller function to
insert inlined allocas, just insert at the top. This saves a lot of
time inlining into functions with a lot of allocas.
2. Use splice to move the alloca instructions over, instead of remove/insert.
This allows us to transfer a block at a time, and eliminates a bunch of
silly symbol table manipulations.
This speeds up the inliner on the testcase in PR209 from 1.73s -> 1.04s (67%)
llvm-svn: 11118
and that basic block ends with a return instruction. In this case, we can just splice
the cloned "body" of the function directly into the source basic block, avoiding a lot
of rearrangement and splitBasicBlock's linear scan over the split block. This speeds up
the inliner on the testcase in PR209 from 2.3s to 1.7s, a 35% reduction.
llvm-svn: 11116
process. The only optimization we did so far is to avoid creating a
PHI node, then immediately destroying it in the common case where the
callee has one return statement. Instead, we just don't create the return
value. This has no noticable performance impact, but paves the way for
future improvements.
llvm-svn: 11108
to add the cloned block to. This allows the block to be added to the function
immediately, and all of the instructions to be immediately added to the function
symbol table, which speeds up the inliner from 3.7 -> 3.38s on the PR209.
llvm-svn: 11107
Added assert() to ensure symbol table is well formed.
Added code to remember the value that was found; resolving types can change
the symbol table and invalidate the value of the iterator.
Added comments to the ResolveTypes() function (mainly for my own benefit).
Please feel free to correct the comments if they are not accurate.
llvm-svn: 9693
PHI node entries for unwind instructions just like for call instructions which
became invokes! This fixes PR57, tested by
Inline/2003-10-26-InlineInvokeExceptionDestPhi.ll
llvm-svn: 9526
break dominance relationships, and is otherwise bad. This fixes bug:
Inline/2003-10-13-AllocaDominanceProblem.ll. This also fixes miscompilation
of 3 176.gcc source files (reload1.c, global.c, flow.c)
llvm-svn: 9109
Running the inliner on 252.eon used to take 48.4763s, now it takes 14.4148s.
In release mode, it went from taking 25.8741s to taking 11.5712s.
This also fixes a FIXME.
llvm-svn: 8890
"minimal" SSA form (in other words, it doesn't insert dead PHIs). This
speeds up the mem2reg pass very significantly because it doesn't have to
do a lot of frivolous work in many common cases.
In the 252.eon function I have been playing with, this doesn't even insert
the 120 PHI nodes that it used to which were trivially dead (in the process
of promoting 356 alloca instructions overall). This speeds up the mem2reg
pass from 1.2459s to 0.1284s. More significantly, the DCE pass used to take
2.4138s to remove the 120 dead PHI nodes that mem2reg constructed, now it
takes 0.0134s (which is the time to scan the function and decide that there
is nothing dead). So overall, on this one function, we speed things up a
total of 3.5179s, which is a 24.8x speedup! :)
This change is tested by the Mem2Reg/2003-10-05-DeadPHIInsertion.ll test,
which now passes.
llvm-svn: 8884
basic block. This is amazingly common in code generated by the C/C++ front-ends.
This change makes it not have to insert ANY phi nodes, whereas before it would insert
a ton of dead ones which DCE would have to clean up.
Thus, this fix improves compile-time performance of these trivial allocas in two ways:
1. It doesn't have to do the walking and book-keeping for renaming
2. It does not insert dead phi nodes for them which would have to
subsequently be cleaned up.
On my favorite testcase from 252.eon, this special case handles 305 out of
356 promoted allocas in the function. It speeds up the mem2reg pass from 7.5256s
to 1.2505s. It inserts 677 fewer dead PHI nodes, which speeds up a subsequent
-dce pass from 18.7524s to 2.4806s.
There are still 120 trivially dead PHI nodes being inserted for variables used
in multiple basic blocks, but they are not handled by this patch.
llvm-svn: 8881
*** Revamp the code which handled unreachable code in the function. Now the
code is much more efficient for high-degree basic blocks, such as those
that occur in the 252.eon SPEC benchmark.
For the interested, the time to promote a SINGLE alloca in _ZN7mrScene4ReadERSi
function used to be > 3.5s. Now it is < .075s. The function has a LOT of
allocas in it, so it appeared to be infinite looping, this should make it much
nicer. :)
llvm-svn: 8863
work-list of value definitions. This allows elimination of the explicit
'iterative' step of the algorithm, and also reuses temporary memory better.
llvm-svn: 8861
* Do not insert a new entry into NewPhiNodes during the rename pass if there are no PHIs in a block.
* Do not compute WriteSets in parallel
llvm-svn: 8858
* Eliminate the KillList instance variable, instead, just delete loads and
stores as they are "renamed", and delete allocas when they are done
* Make the 'visited' set an instance variable to avoid passing it on the stack.
llvm-svn: 8857
... by making sure to update PHI nodes to take into consideration the
extra edges we get if we inline a call instruction through an invoke.
llvm-svn: 8664