loops. This optimization is not turned on by default yet, but may be run
with the opt tool's -loop-reduce flag. There are many FIXMEs listed in the
code that will make it far more applicable to a wide range of code, but you
have to start somewhere :)
This limited version currently triggers on the following tests in the
MultiSource directory:
pcompress2: 7 times
cfrac: 5 times
anagram: 2 times
ks: 6 times
yacr2: 2 times
llvm-svn: 17134
change hacks off 10K of bytecode from perlbmk (.5%) even though the front-end
is not generating them yet and we are not optimizing the resultant code.
This isn't too bad.
llvm-svn: 17111
exercise that I'm not interested in tackling right now. Just punt and treat them
like unwind's.
This 'fixes' test/Regression/Transforms/ADCE/unreachable-function.ll
llvm-svn: 17106
pointer recurrences into expressions from this:
%P_addr.0.i.0 = phi sbyte* [ getelementptr ([8 x sbyte]* %.str_1, int 0, int 0), %entry ], [ %inc.0.i, %no_exit.i ]
%inc.0.i = getelementptr sbyte* %P_addr.0.i.0, int 1 ; <sbyte*> [#uses=2]
into this:
%inc.0.i = getelementptr sbyte* getelementptr ([8 x sbyte]* %.str_1, int 0, int 0), int %inc.0.i.rec
Actually create something nice, like this:
%inc.0.i = getelementptr [8 x sbyte]* %.str_1, int 0, int %inc.0.i.rec
llvm-svn: 16924
an instruction if it can be hoisted to a common dominator of the block.
This implements: test/Regression/Transforms/TailDup/MergeTest.ll
llvm-svn: 16758
* SubOne/AddOne functions always return ConstantInt, declare them as such
* Pull code for handling setcc X, cst, where cst is at the end of the range,
or cc is LE or GE up earlier in visitSetCondInst. This reduces #iterations
in some cases.
* Fold: (div X, C1) op C2 -> range check, implementing div.ll:test6 - test9.
llvm-svn: 16588
This takes something like this:
%A = phi int [ 3, %cond_false.0 ], [ 2, %endif.0.i ], [ 2, %endif.1.i ]
%B = div int %tmp.243, 4
and turns it into:
%A = phi int [ 3/4, %cond_false.0 ], [ 2/4, %endif.0.i ], [ 2/4, %endif.1.i ]
which is later simplified (in this case) into %A = 0.
This triggers thousands of times in spec, for example, 269 times in 176.gcc.
This is tested by InstCombine/add.ll:test23 and set.ll:test18.
llvm-svn: 16582
Instcombine (setcc (truncate X), C1).
This occurs THOUSANDS of times in many benchmarks. Particularlly common
seem to be things like (seteq (cast bool X to int), int 0)
This turns it into (seteq bool %X, false), which then becomes (not %X).
llvm-svn: 16567
This is important for several reasons:
1. Benchmarks have lots of code that looks like this (perlbmk in particular):
%tmp.2.i = setne int %tmp.0.i, 128 ; <bool> [#uses=1]
%tmp.6343 = seteq int %tmp.0.i, 1 ; <bool> [#uses=1]
%tmp.63 = and bool %tmp.2.i, %tmp.6343 ; <bool> [#uses=1]
we now fold away the setne, a clear improvement.
2. In the more important cases, such as (X >= 10) & (X < 20), we now produce
smaller code: (X-10) < 10.
3. Perhaps the nicest effect of this patch is that it really helps out the
code generators. In particular, for a 'range test' like the above,
instead of generating this on X86 (the difference on PPC is even more
pronounced):
cmp %EAX, 50
setge %CL
cmp %EAX, 100
setl %AL
and %CL, %AL
cmp %CL, 0
we now generate this:
add %EAX, -50
cmp %EAX, 50
Furthermore, this causes setcc's to be folded into branches more often.
These combinations trigger dozens of times in the spec benchmarks, particularly
in 176.gcc, 186.crafty, 253.perlbmk, 254.gap, & 099.go.
llvm-svn: 16559
Implement (setcc (shl X, C1), C2) folding.
The second one occurs several dozen times in spec. The first was added
just in case. :)
These are tested by shift.ll:test2[12], and div.ll:test5
llvm-svn: 16549
This latent bug was exposed by recent changes, and is tested as:
llvm/test/Regression/Transforms/InstCombine/2004-09-28-BadShiftAndSetCC.llx
llvm-svn: 16546
triggers often, for example:
6x in povray, 1x in gzip, 279x in gcc, 1x in crafty, 8x in eon, 11x in perlbmk,
362x in gap, 4x in vortex, 14 in m88ksim, 211x in 126.gcc, 1x in compress,
11x in ijpeg, and 4x in 147.vortex.
llvm-svn: 16521
Move include/Config and include/Support into include/llvm/Config,
include/llvm/ADT and include/llvm/Support. From here on out, all LLVM
public header files must be under include/llvm/.
llvm-svn: 16137
assumed that a constant on the RHS of a multiplication was either an
IntConstant or an FPConstant. It checked for an IntConstant and then,
if it did not find one, did a hard cast to an FPConstant. That code
would crash if the RHS were a ConstantExpr that was neither an
IntConstant nor an FPConstant. This version replaces the hard cast
with a dyn_cast. It performs the same way for IntConstants and
FPConstants but does nothing, instead of crashing, for constant
expressions.
The regression test for this change is 2004-07-27-ConstantExprMul.ll.
llvm-svn: 15291
a bug in DSE).
* Delete dead operand uses iteratively instead of recursively, using a
SetVector.
* Defer deletion of dead operand uses until the end of processing, which means
we don't have to bother with updating the AliasSetTracker. This speeds up
DSE substantially.
llvm-svn: 15204
* Test for whether bits are shifted out during the optzn.
If so, the fold is illegal, though it can be handled explicitly for setne/seteq
This fixes the miscompilation of 254.gap last night, which was a latent bug
exposed by other optimizer improvements.
llvm-svn: 15085
actually care about. Someday when the cast instruction is gone, we can do
better here, but this will do for now. This implements
instcombine/cast.ll:test17/18 as well.
llvm-svn: 15018
This eliminates an N*N*logN algorithm from the loop simplify pass, replacing
it with a much simpler and faster alternative. In a debug build, this reduces
gccas time on eon from 85s to 42s.
llvm-svn: 14851
"load (cast foo)". This allows us to compile C++ code like this:
class Bclass {
public: virtual int operator()() { return 666; }
};
class Dclass: public Bclass {
public: virtual int operator()() { return 667; }
} ;
int main(int argc, char** argv) {
Dclass x;
return x();
}
Into this:
int %main(int %argc, sbyte** %argv) {
entry:
call void %__main( )
ret int 667
}
Instead of this:
int %main(int %argc, sbyte** %argv) {
entry:
%x = alloca "struct.std::bad_typeid" ; <"struct.std::bad_typeid"*> [#uses=3]
call void %__main( )
%tmp.1.i.i = getelementptr "struct.std::bad_typeid"* %x, uint 0, uint 0, uint 0 ; <int (...)***> [#uses=1]
store int (...)** getelementptr ([3 x int (...)*]* %vtable for Bclass, int 0, long 2), int (...)*** %tmp.1.i.i
%tmp.3.i = getelementptr "struct.std::bad_typeid"* %x, int 0, uint 0, uint 0 ; <int (...)***> [#uses=1]
store int (...)** getelementptr ([3 x int (...)*]* %vtable for Dclass, int 0, long 2), int (...)*** %tmp.3.i
%tmp.5 = load int ("struct.std::bad_typeid"*)** cast (int (...)** getelementptr ([3 x int (...)*]* %vtable for Dclass, int 0, long 2) to int
("struct.std::bad_typeid"*)**) ; <int ("struct.std::bad_typeid"*)*> [#uses=1]
%tmp.6 = call int %tmp.5( "struct.std::bad_typeid"* %x ) ; <int> [#uses=1]
ret int %tmp.6
ret int 0
}
In order words, we now resolve the virtual function call.
llvm-svn: 14783
Don't touch GEPs for which DecomposeArrayRef is not going to do anything
special (e.g., < 2 indices, or 2 indices and the last one is a constant.)
llvm-svn: 14647
186.crafty, fhourstones and 132.ijpeg.
Bugpoint makes really nasty miscompilations embarassingly easy to find. It
narrowed it down to the instcombiner and this testcase (from fhourstones):
bool %l7153_l4706_htstat_loopentry_2E_4_no_exit_2E_4(int* %i, [32 x int]* %works, int* %tmp.98.out) {
newFuncRoot:
%tmp.96 = load int* %i ; <int> [#uses=1]
%tmp.97 = getelementptr [32 x int]* %works, long 0, int %tmp.96 ; <int*> [#uses=1]
%tmp.98 = load int* %tmp.97 ; <int> [#uses=2]
%tmp.99 = load int* %i ; <int> [#uses=1]
%tmp.100 = and int %tmp.99, 7 ; <int> [#uses=1]
%tmp.101 = seteq int %tmp.100, 7 ; <bool> [#uses=2]
%tmp.102 = cast bool %tmp.101 to int ; <int> [#uses=0]
br bool %tmp.101, label %codeRepl4.exitStub, label %codeRepl3.exitStub
codeRepl4.exitStub: ; preds = %newFuncRoot
store int %tmp.98, int* %tmp.98.out
ret bool true
codeRepl3.exitStub: ; preds = %newFuncRoot
store int %tmp.98, int* %tmp.98.out
ret bool false
}
... which only has one combination performed on it:
$ llvm-as < t.ll | opt -instcombine -debug | llvm-dis
IC: Old = %tmp.101 = seteq int %tmp.100, 7 ; <bool> [#uses=1]
New = setne int %tmp.100, 0 ; <bool>:<badref> [#uses=0]
IC: MOD = br bool %tmp.101, label %codeRepl3.exitStub, label %codeRepl4.exitStub
IC: MOD = %tmp.97 = getelementptr [32 x int]* %works, uint 0, int %tmp.96 ; <int*> [#uses=1]
It doesn't get much better than this. :)
llvm-svn: 14109
collapse this:
bool %le(int %A, int %B) {
%c1 = setgt int %A, %B
%tmp = select bool %c1, int 1, int 0
%c2 = setlt int %A, %B
%result = select bool %c2, int -1, int %tmp
%c3 = setle int %result, 0
ret bool %c3
}
into:
bool %le(int %A, int %B) {
%c3 = setle int %A, %B ; <bool> [#uses=1]
ret bool %c3
}
which is handy, because the Java FE makes these sequences all over the place.
This is tested as: test/Regression/Transforms/InstCombine/JavaCompare.ll
llvm-svn: 14086
This code hadn't been updated after the "structs with more than 256 elements"
related changes to the GEP instruction. Also it was not handling the
ConstantAggregateZero class.
Now it does!
llvm-svn: 13834
into (X & (C2 << C1)) != (C3 << C1), where the shift may be either left or
right and the compare may be any one.
This triggers 1546 times in 176.gcc alone, as it is a common pattern that
occurs for bitfield accesses.
llvm-svn: 13740
in the size calculation.
This is not something you want to see:
Loop Unroll: F[main] Loop %no_exit Loop Size = 2 Trip Count = 2147483648 - UNROLLING!
The problem was that 2*2147483648 == 0.
Now we get:
Loop Unroll: F[main] Loop %no_exit Loop Size = 2 Trip Count = 2147483648 - TOO LARGE: 4294967296>100
Thanks to some anonymous person playing with the demo page that repeatedly
caused zion to go into swapping land. That's one way to ensure you'll get
a quick bugfix. :)
Testcase here: Transforms/LoopUnroll/2004-05-13-DontUnrollTooMuch.ll
llvm-svn: 13564
%tmp.0 = getelementptr [50 x sbyte]* %ar, uint 0, int 5 ; <sbyte*> [#uses=2]
%tmp.7 = getelementptr sbyte* %tmp.0, int 8 ; <sbyte*> [#uses=1]
together. This patch actually allows us to simplify and generalize the code.
llvm-svn: 13415