* Correct stale documentation in a few places
* Re-order the file to better associate things and reduce line count
* Make the pass thread safe by caching the Function* objects needed by the
optimizers in the pass object instead of globally.
* Provide the SimplifyLibCalls pass object to the optimizer classes so they
can access cached Function* objects and TargetData info
* Make sure the pass resets its cache if the Module passed to runOnModule
changes
* Rename CallOptimizer LibCallOptimization. All the classes are named
*Optimization while the objects are *Optimizer.
* Don't cache Function* in the optimizer objects because they could be used
by multiple PassManager's running in multiple threads
* Add an optimization for strcpy which is similar to strcat
* Add a "TODO" list at the end of the file for ideas on additional libcall
optimizations that could be added (get ideas from other compilers).
Sorry for the huge diff. Its mostly reorganization of code. That won't
happen again as I believe the design and infrastructure for this pass is
now done or close to it.
llvm-svn: 21589
call to them into an 'unreachable' instruction.
This triggers a bunch of times, particularly on gcc:
gzip: 36
gcc: 601
eon: 12
bzip: 38
llvm-svn: 21587
* MemCpyOptimization can only be optimized if the 3rd and 4th arguments are
constants and we weren't checking for that.
* The result of llvm.memcpy (and llvm.memmove) is void* not sbyte*, put in
a cast.
llvm-svn: 21570
* Have the SimplifyLibCalls pass acquire the TargetData and pass it down to
the optimization classes so they can use it to make better choices for
the signatures of functions, etc.
* Rearrange the code a little so the utility functions are closer to their
usage and keep the core of the pass near the top of the files.
* Adjust the StrLen pass to get/use the correct prototype depending on the
TargetData::getIntPtrType() result. The result of strlen is size_t which
could be either uint or ulong depending on the platform.
* Clean up some coding nits (cast vs. dyn_cast, remove redundant items from
a switch, etc.)
* Implement the MemMoveOptimization as a twin of MemCpyOptimization (they
only differ in name).
llvm-svn: 21569
named getConstantStringLength. This is the common part of StrCpy and
StrLen optimizations and probably several others, yet to be written. It
performs all the validity checks for looking at constant arrays that are
supposed to be null-terminated strings and then computes the actual
length of the string.
* Implement the MemCpyOptimization class. This just turns memcpy of 1, 2, 4
and 8 byte data blocks that are properly aligned on those boundaries into
a load and a store. Much more could be done here but alignment
restrictions and lack of knowledge of the target instruction set prevent
use from doing significantly more. That will have to be delegated to the
code generators as they lower llvm.memcpy calls.
llvm-svn: 21562
* Change signatures of OptimizeCall and ValidateCalledFunction so they are
non-const, allowing the optimization object to be modified. This is in
support of caching things used across multiple calls.
* Provide two functions for constructing and caching function types
* Modify the StrCatOptimization to cache Function objects for strlen and
llvm.memcpy so it doesn't regenerate them on each call site. Make sure
these are invalidated each time we start the pass.
* Handle both a GEP Instruction and a GEP ConstantExpr
* Add additional checks to make sure we really are dealing with an arary of
sbyte and that all the element initializers are ConstantInt or
ConstantExpr that reduce to ConstantInt.
* Make sure the GlobalVariable is constant!
* Don't use ConstantArray::getString as it can fail and it doesn't give us
the right thing. We must check for null bytes in the middle of the array.
* Use llvm.memcpy instead of memcpy so we can factor alignment into it.
* Don't use void* types in signatures, replace with sbyte* instead.
llvm-svn: 21555
* Don't use std::string for the function names, const char* will suffice
* Allow each CallOptimizer to validate the function signature before
doing anything
* Repeatedly loop over the functions until an iteration produces
no more optimizations. This allows one optimization to insert a
call that is optimized by another optimization.
* Implement the ConstantArray portion of the StrCatOptimization
* Provide a template for the MemCpyOptimization
* Make ExitInMainOptimization split the block, not delete everything
after the return instruction.
(This covers revision 1.3 and 1.4, as the 1.3 comments were botched)
llvm-svn: 21548
* Fix comments at top of file
* Change algorithm for running the call optimizations from n*n to something
closer to n.
* Use a hash_map to store and lookup the optimizations since there will
eventually (or potentially) be a large number of them. This gets lookup
based on the name of the function to O(1). Each CallOptimizer now has a
std::string member named func_name that tracks the name of the function
that it applies to. It is this string that is entered into the hash_map
for fast comparison against the function names encountered in the module.
* Cleanup some style issues pertaining to iterator invalidation
* Don't pass the Function pointer to the OptimizeCall function because if
the optimization needs it, it can get it from the CallInst passed in.
* Add the skeleton for a new CallOptimizer, StrCatOptimizer which will
eventually replace strcat's of constant strings with direct copies.
llvm-svn: 21526
calls. The pass visits all external functions in the module and determines
if such function calls can be optimized. The optimizations are specific to
the library calls involved. This initial version only optimizes calls to
exit(3) when they occur in main(): it changes them to ret instructions.
llvm-svn: 21522
Completely rework the 'setcc (cast x to larger), y' code. This code has
the advantage of implementing setcc.ll:test19 (being more general than
the previous code) and being correct in all cases.
This allows us to unxfail 2004-11-27-SetCCForCastLargerAndConstant.ll,
and close PR454.
llvm-svn: 21491
Make IPSCCP strip off dead constant exprs that are using functions, making
them appear as though their address is taken. This allows us to propagate
some more pool descriptors, lowering the overhead of pool alloc.
llvm-svn: 21363
This pass forward branches through conditions when it can show that the
conditions is either always true or false for a predecessor. This currently
only handles the most simple cases of this, but is successful at threading
across 2489 branches and 65 switch instructions in 176.gcc, which isn't bad.
llvm-svn: 21306
* Loop invariant code does not dominate the loop header, but rather
the end of the loop preheader.
* The base for a reduced GEP isn't a constant unless all of its
operands (preceding the induction variable) are constant.
* Allow induction variable elimination for the simple case after all.
Also made changes recommended by Chris for properly deleting
instructions.
llvm-svn: 20383
This does a simple form of "jump threading", which eliminates CFG edges that
are provably dead. This triggers 90 times in the external tests, and
eliminating CFG edges is always always a good thing! :)
llvm-svn: 20300
and handle incomplete control dependences correctly. This fixes:
Regression/Transforms/ADCE/dead-phi-edge.ll
-> a missed optimization
Regression/Transforms/ADCE/dead-phi-edge.ll
-> a compiler crash distilled from QT4
llvm-svn: 20227
* Properly compile this:
struct a {};
int test() {
struct a b[2];
if (&b[0] != &b[1])
abort ();
return 0;
}
to 'return 0', not abort().
llvm-svn: 19875
The second folds operations into selects, e.g. (select C, (X+Y), (Y+Z))
-> (Y+(select C, X, Z)
This occurs a few times across spec, e.g.
select add/sub
mesa: 83 0
povray: 5 2
gcc 4 2
parser 0 22
perlbmk 13 30
twolf 0 3
llvm-svn: 19706
Disable the xform for < > cases. It turns out that the following is being
miscompiled:
bool %test(sbyte %S) {
%T = cast sbyte %S to uint
%V = setgt uint %T, 255
ret bool %V
}
llvm-svn: 19628
* We can now fold cast instructions into select instructions that
have at least one constant operand.
* We now optimize expressions more aggressively based on bits that are
known to be zero. These optimizations occur a lot in code that uses
bitfields even in simple ways.
* We now turn more cast-cast sequences into AND instructions. Before we
would only do this if it if all types were unsigned. Now only the
middle type needs to be unsigned (guaranteeing a zero extend).
* We transform sign extensions into zero extensions in several cases.
This corresponds to these test/Regression/Transforms/InstCombine testcases:
2004-11-22-Missed-and-fold.ll
and.ll: test28-29
cast.ll: test21-24
and-or-and.ll
cast-cast-to-and.ll
zeroext-and-reduce.ll
llvm-svn: 19220
SimplifyCFG is one of those passes that we use for final cleanup: it should
not rely on other passes to clean up its garbage. This fixes the "why are
trivially dead setcc's in the output of gccas" problem.
llvm-svn: 19212
do not insert a prototype for malloc of: void* malloc(uint): on 64-bit u
targets this is not correct. Instead of prototype it as void *malloc(...),
and pass the correct intptr_t through the "...".
Finally, fix Regression/CodeGen/SparcV9/2004-12-13-MallocCrash.ll, by not
forming constantexpr casts from pointer to uint.
llvm-svn: 18908
in SPEC, the subsequent optimziations that we are after don't play with
with FP values, so disable this xform for them. Really we just don't want
stuff like:
double G; (always 0 or 412312.312)
= G;
turning into:
bool G_b;
= G_b ? 412312.312 : 0;
We'd rather just do the load.
-Chris
llvm-svn: 18819
down to actually BE a bool. This allows simple value range propagation
stuff work harder, deleting comparisons in bzip2 in some hot loops.
This implements GlobalOpt/integer-bool.ll, which is the essence of the
loop condition distilled into a testcase.
llvm-svn: 18817
1. Actually increment the Statistic for the GV elim optzn
2. When resolving undef branches, only resolve branches in executable blocks,
avoiding marking a bunch of completely dead blocks live. This has a big
impact on the quality of the generated code.
With this patch, we positively rip up vortex, compiling Ut_MoveBytes to a
single memcpy call. In vortex we get this:
12 ipsccp - Number of globals found to be constant
986 ipsccp - Number of arguments constant propagated
1378 ipsccp - Number of basic blocks unreachable
8919 ipsccp - Number of instructions removed
llvm-svn: 18796
This implements SCCP/ipsccp-basic.ll, rips apart Olden/mst (as described in
PR415), and does other nice things.
There is still more to come with this, but it's a start.
llvm-svn: 18752
successor block. This turns cases like this:
x = a op b
if (c) {
use x
}
into:
if (c) {
x = a op b
use x
}
This triggers 3965 times in spec, and is tested by
Regression/Transforms/InstCombine/sink_instruction.ll
This appears to expose a bug in the X86 backend for 177.mesa, which I'm
looking in to.
llvm-svn: 18677
in scary and unknown ways before we promote it. This fixes the miscompilation
of 188.ammp that has been plauging us since a globalopt patch went in.
Thanks a ton to Tanya for helping me diagnose the problem!
llvm-svn: 18418
if (x) {
code
...
} else {
code
...
}
Turn it into:
code
if (x) {
...
} else {
...
}
This reduces code size and in some common cases allows us to completely
eliminate the conditional. This turns several if/then/else blocks in loops
into straightline code in 179.art, turning the loops into single basic blocks
(good for modsched even!).
Maybe now brg will leave me alone ;-)
llvm-svn: 18366
1. Speedup getValueState by having it not consider Arguments. It's better
to just add them before we start SCCP'ing.
2. SCCP can delete the contents of dead blocks. No really, it's ok! This
reduces the size of the IR for subsequent passes, even though
simplifycfg would do the same job. In practice, simplifycfg does not
run until much later than sccp in gccas
llvm-svn: 17820
class. The only changes are minor:
* Do not try to SCCP instructions that return void in the rewrite loop.
This is silly and fool hardy, wasting a map lookup and adding an entry
to the map which is never used.
* If we decide something has an undefined value, rewrite it to undef,
potentially leading to further simplications.
llvm-svn: 17816
value. This allows us to turn more globals into constants and eliminate them.
This patch implements GlobalOpt/load-store-global.llx.
Note that this patch speeds up 255.vortex from:
Output/255.vortex.out-cbe.time:program 7.640000
Output/255.vortex.out-llc.time:program 9.810000
to:
Output/255.vortex.out-cbe.time:program 7.250000
Output/255.vortex.out-llc.time:program 9.490000
Which isn't bad at all!
llvm-svn: 17746
If this happens, detect it early instead of relying on instcombine to notice
it later. This can be a big speedup, because PHI nodes can have many
incoming values.
llvm-svn: 17741
%X = alloca ...
%Y = alloca ...
X == Y
into false. This allows us to simplify some stuff in eon (and probably
many other C++ programs) where operator= was checking for self assignment.
Folding this allows us to SROA several additional structs.
llvm-svn: 17735
constant value. This makes the return value dead and allows for
simplification in the caller.
This implements IPConstantProp/return-constant.ll
This triggers several dozen times throughout SPEC.
llvm-svn: 17730
of the array is just two. This occurs 8 times in gcc, 6 times in crafty, and
12 times in 099.go.
This implements ScalarRepl/sroa_two.ll
llvm-svn: 17727
argument pointers. This is only valid to do if the function already
unconditionally loaded an argument or if the pointer passed in is known
to be valid. Make sure to do the required checks.
This fixed ArgumentPromotion/control-flow.ll and the Burg program.
llvm-svn: 17718
for (X * C1) + (X * C2) (where * can be mul or shl), allowing us to fold:
Y+Y+Y+Y+Y+Y+Y+Y
into
%tmp.8 = shl long %Y, ubyte 3 ; <long> [#uses=1]
instead of
%tmp.4 = shl long %Y, ubyte 2 ; <long> [#uses=1]
%tmp.12 = shl long %Y, ubyte 2 ; <long> [#uses=1]
%tmp.8 = add long %tmp.4, %tmp.12 ; <long> [#uses=1]
This implements add.ll:test25
Also add support for (X*C1)-(X*C2) -> X*(C1-C2), implementing sub.ll:test18
llvm-svn: 17704