- Looking at the number of sign bits of the a sext instruction to determine whether new trunc + sext pair should be added when its source is being evaluated in a different type.
llvm-svn: 62263
sequences in SPUDAGToDAGISel.cpp and SPU64InstrInfo.td, killing custom
DAG node types as needed.
- i64 mul is now a legal instruction, but emits an instruction sequence
that stretches tblgen and the imagination, as well as violating laws of
several small countries and most southern US states (just kidding, but
looking at a function with 80+ parameters is really weird and just plain
wrong.)
- Update tests as needed.
llvm-svn: 62254
frame index. eliminateFrameIndex will replace these instructions with
(LDWSP|STWSP|LDAWSP) or (LDW|STW|LDAWF) if a frame pointer is in use.
This fixes PR 3324. Previously we used LDWSP, STWSP, LDAWSP before frame
pointer elimination. However since they were marked as implicitly using
SP they could not be rematerialised.
llvm-svn: 62238
my earlier patch to this file.
The issue there was that all uses of an IV inside a loop
are actually references to Base[IV*2], and there was one
use outside that was the same but LSR didn't see the base
or the scaling because it didn't recurse into uses outside
the loop; thus, it used base+IV*scale mode inside the loop
instead of pulling base out of the loop. This was extra bad
because register pressure later forced both base and IV into
memory. Doing that recursion, at least enough
to figure out addressing modes, is a good idea in general;
the change in AddUsersIfInteresting does this. However,
there were side effects....
It is also possible for recursing outside the loop to
introduce another IV where there was only 1 before (if
the refs inside are not scaled and the ref outside is).
I don't think this is a common case, but it's in the testsuite.
It is right to be very aggressive about getting rid of
such introduced IVs (CheckForIVReuse and the handling of
nonzero RewriteFactor in StrengthReduceStridedIVUsers).
In the testcase in question the new IV produced this way
has both a nonconstant stride and a nonzero base, neither
of which was handled before. And when inserting
new code that feeds into a PHI, it's right to put such
code at the original location rather than in the PHI's
immediate predecessor(s) when the original location is outside
the loop (a case that couldn't happen before)
(RewriteInstructionToUseNewBase); better to avoid making
multiple copies of it in this case.
Also, the mechanism for keeping SCEV's corresponding to GEP's
no longer works, as the GEP might change after its SCEV
is remembered, invalidating the SCEV, and we might get a bad
SCEV value when looking up the GEP again for a later loop.
This also couldn't happen before, as we weren't recursing
into GEP's outside the loop.
Also, when we build an expression that involves a (possibly
non-affine) IV from a different loop as well as an IV from
the one we're interested in (containsAddRecFromDifferentLoop),
don't recurse into that. We can't do much with it and will
get in trouble if we try to create new non-affine IVs or something.
More testcases are coming.
llvm-svn: 62212
vector and extraneous loop over it, 2) not delete globals used by
phis/selects etc which could actually be useful. This fixes PR3321.
Many thanks to Duncan for narrowing this down.
llvm-svn: 62201
to Eli for pointing out that these forms don't ignore the high bits of
their index operands, and as such are not immediately suitable for use
by isel.
llvm-svn: 62194
scheduling dependencies. Add assertion checks to help catch
this.
It appears the Mips target defaults to list-td, and it has a
regression test that uses a physreg dependence. Such code was
liable to be miscompiled, and now evokes an assertion failure.
llvm-svn: 62177
functions that don't already have a (dynamic) alloca.
Dynamic allocas cause inefficient codegen and we shouldn't
propagate this (behavior follows gcc). Two existing tests
assumed such inlining would be done; they are hacked by
adding an alloca in the caller, preserving the point of
the tests.
llvm-svn: 61946
will get its preferred alignment. It has to be careful and cautiously assume
it will just get the ABI alignment. This prevents instcombine from rounding
up the alignment of a load/store without adjusting the alignment of the alloca.
llvm-svn: 61934
check242, which invalidates this test. This test is an x86-32 ABI test
that is trying to be run in a target-independent way, which is not going
to work very well. Just remove the test.
llvm-svn: 61921
loads from allocas that cover the entire aggregate. This handles
some memcpy/byval cases that are produced by llvm-gcc. This triggers
a few times in kc++ (with std::pair<std::_Rb_tree_const_iterator
<kc::impl_abstract_phylum*>,bool>) and once in 176.gcc (with %struct..0anon).
llvm-svn: 61915
was it not very helpful, it was also wrong! The problem
is shown in the testcase: the alloca might be passed to
a nocapture callee which dereferences it and returns the
original pointer. But because it was a nocapture call we
think we don't need to track its uses, but we do.
llvm-svn: 61876
integer to a (transitive) bitcast the alloca and if that integer
has the full size of the alloca, then it clobbers the whole thing.
Handle this by extracting pieces out of the stored integer and
filing them away in the SROA'd elements.
This triggers fairly frequently because the CFE uses integers to
pass small structs by value and the inliner exposes these. For
example, in kimwitu++, I see a bunch of these with i64 stores to
"%struct.std::pair<std::_Rb_tree_const_iterator<kc::impl_abstract_phylum*>,bool>"
In 176.gcc I see a few i32 stores to "%struct..0anon".
In the testcase, this is a difference between compiling test1 to:
_test1:
subl $12, %esp
movl 20(%esp), %eax
movl %eax, 4(%esp)
movl 16(%esp), %eax
movl %eax, (%esp)
movl (%esp), %eax
addl 4(%esp), %eax
addl $12, %esp
ret
vs:
_test1:
movl 8(%esp), %eax
addl 4(%esp), %eax
ret
The second half of this will be to handle loads of the same form.
llvm-svn: 61853
v1024 = EDI // not killed
=
= EDI
One possible solution is for the coalescer to examine the sub-register live intervals in the same manner as the physical register. Another possibility is to examine defs and uses (when needed) of sub-registers. Both solutions are too expensive. For now, look for "short virtual intervals" and scan instructions to look for conflict instead.
This is a small win on x86-64. e.g. It shaves 403.gcc by ~80 instructions.
llvm-svn: 61847
avoid the need for spilling, add a new testcase that tests that the
pcmpeqd used for V_SETALLONES is changed to a constant-pool load as
needed.
llvm-svn: 61831
converted to LEA64_32r in x86's convertToThreeAddress. This
replaces code like this:
movl %esi, %edi
inc %edi
with this:
lea 1(%rsi), %edi
which appears to be beneficial.
llvm-svn: 61830
aggregate types. Don't increment the current index after reaching
the end of a struct, as it will already be pointing at
one-past-the end. This fixes PR3288.
llvm-svn: 61828
- Fix bugs 3194, 3195: i128 load/stores produce correct code (although, we
need to ensure that i128 is 16-byte aligned in real life), and 128 zero-
extends are supported.
- New td file: SPU128InstrInfo.td: this is where all new i128 support should
be put in the future.
- Continue to hammer on i64 operations and test cases; ensure that the only
remaining problem will be i64 mul.
llvm-svn: 61784
AddPseudoTwoAddrDeps. This lets the scheduling infrastructure
avoid recalculating node heights. In very large testcases this
was a major bottleneck. Thanks to Roman Levenstein for finding
this!
As a side effect, fold-pcmpeqd-0.ll is now scheduled better
and it no longer requires spilling on x86-32.
llvm-svn: 61778
In fact this also deletes those with linkonce linkage,
however this is currently dead because for the moment
aliases aren't allowed to have this linkage type.
llvm-svn: 61742
- Fix (brcond (setq ...)) bug, where BRNZ should have been used vice BRZ.
- Kill unused/unnecessary nodes in SPUNodes.td
- Beef out the i64operations.c test harness to use a lot of unaligned
loads, test loops and LLVM loop/basic block optimizations; run the
test harness successfully on real Cell hardware.
llvm-svn: 61664
- Remove custom lowering for BRCOND
- Add remaining functionality for branches in SPUInstrInfo, such as branch
condition reversal and load/store folding. Updated BrCond test to reflect
branch reversal.
llvm-svn: 61597
the argument to be stored to an alloca by tracking uses
of the alloca. This occurs 4 times (out of 7121, 0.05%)
in MultiSource/Applications, so may not be worth it. On
the other hand, it is easy to do and fairly cheap. The
functions it helps are: W_addcom and W_addlit in spiff;
process_args (argv) in d (make_dparser); ercPixConcealIMB
in JM/ldecod.
llvm-svn: 61570
and clean recursive descent parser.
This change has a couple of ramifications:
1. The parser code is about 400 lines shorter (in what we maintain, not
including what is autogenerated).
2. The code should be significantly faster than the old code because we
don't have to work around bison's poor handling of datatypes with
ctors/dtors. This also makes the code much more resistant to memory
leaks.
3. We now get caret diagnostics from the .ll parser, woo.
4. The actual diagnostics emited from the parser are completely different
so a bunch of testcases had to be updated.
5. I now disallow "%ty = type opaque %ty = type i32". There was no good
reason to support this, it was just an accident of the old
implementation. I have no reason to think that anyone is actually using
this.
6. The syntax for sticking a global variable has changed to make it
unambiguous. I don't think anyone is depending on this since only clang
supports this and it is not solid yet, so I'm not worried about anything
breaking.
7. This gets rid of the last use of bison, and along with it the .cvs files.
I'll prune this from the makefiles as a subsequent commit.
There are a few minor cleanups that can be done after this commit (suggestions
welcome!) but this passes dejagnu testing and is ready for its time in the
limelight.
llvm-svn: 61558
functions that don't write can't leak a pointer except through
the return value, so a void readonly function is implicitly nocapture.
Test these, and add a test that verifies that f1 calling f2 with an
otherwise dead pointer gets both of them marked nocapture.
llvm-svn: 61552
promote from i1 all the way up to the canonical SetCC type.
In order to discover an appropriate type to use, pass
MVT::Other to getSetCCResultType. In order to be able to
do this, change getSetCCResultType to take a type as an
argument, not a value (this is also more logical).
llvm-svn: 61542
to work out (in a very simplistic way) which function
arguments (pointer arguments only) are only dereferenced
and so do not escape. Mark such arguments 'nocapture'.
llvm-svn: 61525
instruction sequence and cannot ordinarily be simplified by DAGcombine
into the various target description files or SPUDAGToDAGISel.cpp.
This makes some 64-bit operations legal.
- Eliminate target-dependent ISD enums.
- Update tests.
llvm-svn: 61508
constants, since doing so is irrelevant for aliasing
purposes. While this doesn't increase the total number
of functions marked readonly or readnone in MultiSource/
Applications (3089), it does result in 12 functions being
marked readnone rather than readonly.
Before:
readnone: 820
readonly: 2269
After:
readnone: 832
readonly: 2257
llvm-svn: 61469
DAGcombine's ability to find reasons to remove truncates when they were not
needed. Consequently, the CellSPU backend would produce correct, but _really
slow and horrible_, code.
Replaced with instruction sequences that do the equivalent truncation in
SPUInstrInfo.td.
- Re-examine how unaligned loads and stores work. Generated unaligned
load code has been tested on the CellSPU hardware; see the i32operations.c
and i64operations.c in CodeGen/CellSPU/useful-harnesses. (While they may be
toy test code, it does prove that some real world code does compile
correctly.)
- Fix truncating stores in bug 3193 (note: unpack_df.ll will still make llc
fault because i64 ult is not yet implemented.)
- Added i64 eq and neq for setcc and select/setcc; started new instruction
information file for them in SPU64InstrInfo.td. Additional i64 operations
should be added to this file and not to SPUInstrInfo.td.
llvm-svn: 61447
172 %ECX<def> = MOV32rr %reg1039<kill>
180 INLINEASM <es:subl $5,$1
sbbl $3,$0>, 10, %EAX<def>, 14, %ECX<earlyclobber,def>, 9, %EAX<kill>,
36, <fi#0>, 1, %reg0, 0, 9, %ECX<kill>, 36, <fi#1>, 1, %reg0, 0
188 %EAX<def> = MOV32rr %EAX<kill>
196 %ECX<def> = MOV32rr %ECX<kill>
204 %ECX<def> = MOV32rr %ECX<kill>
212 %EAX<def> = MOV32rr %EAX<kill>
220 %EAX<def> = MOV32rr %EAX
228 %reg1039<def> = MOV32rr %ECX<kill>
The early clobber operand ties ECX input to the ECX def.
The live interval of ECX is represented as this:
%reg20,inf = [46,47:1)[174,230:0) 0@174-(230) 1@46-(47)
The right way to represent this is something like
%reg20,inf = [46,47:2)[174,182:1)[181:230:0) 0@174-(182) 1@181-230 @2@46-(47)
Of course that won't work since that means overlapping live ranges defined by two val#.
The workaround for now is to add a bit to val# which says the val# is redefined by a early clobber def somewhere. This prevents the move at 228 from being optimized away by SimpleRegisterCoalescing::AdjustCopiesBackFrom.
llvm-svn: 61259
- Use SplitBlockPredecessors to factor out common predecessors of the critical edge destination. This is disabled for now due to some regressions.
llvm-svn: 61248
The EH_frame and .eh symbols are now private, except for darwin9 and earlier.
The patch also fixes the definition of PrivateGlobalPrefix on pcc linux.
llvm-svn: 61242
The problematic part of this patch is that we were out of attribute bits,
requiring some fancy bit hacking to make it fit (by shrinking alignment)
without breaking existing users or the file format.
This change will require users to rebuild llvm-gcc to match llvm.
llvm-svn: 61239
nodes. This allows it to do fairly general phi insertion if a
load from a pointer global wants to be SRAd but the load is used
by (recursive) phi nodes. This fixes a pessimization on ppc
introduced by Load PRE.
llvm-svn: 61123
consistently for deleting branches. In addition to being slightly
more readable, this makes SimplifyCFG a bit better
about cleaning up after itself when it makes conditions unused.
llvm-svn: 61100
visited set before they are used. If used, their blocks need to be
added to the visited set so that subsequent queries don't use conflicting
pointer values in the cache result blocks.
llvm-svn: 61080
computation code. Also, avoid adding output-depenency edges when both
defs are dead, which frequently happens with EFLAGS defs.
Compute Depth and Height lazily, and always in terms of edge latency
values. For the schedulers that don't care about latency, edge latencies
are set to 1.
Eliminate Cycle and CycleBound, and LatencyPriorityQueue's Latencies array.
These are all subsumed by the Depth and Height fields.
llvm-svn: 61073
cleans up the generated code a bit. This should have the added benefit of
not randomly renaming functions/globals like my previous patch did. :)
llvm-svn: 61023
memdep keeps track of how PHIs affect the pointer in dep queries, which
allows it to eliminate the load in cases like rle-phi-translate.ll, which
basically end up being:
BB1:
X = load P
br BB3
BB2:
Y = load Q
br BB3
BB3:
R = phi [P] [Q]
load R
turning "load R" into a phi of X/Y. In addition to additional exposed
opportunities, this makes memdep safe in many cases that it wasn't before
(which is required for load PRE) and also makes it substantially more
efficient. For example, consider:
bb1: // has many predecessors.
P = some_operator()
load P
In this example, previously memdep would scan all the predecessors of BB1
to see if they had something that would mustalias P. In some cases (e.g.
test/Transforms/GVN/rle-must-alias.ll) it would actually find them and end
up eliminating something. In many other cases though, it would scan and not
find anything useful. MemDep now stops at a block if the pointer is defined
in that block and cannot be phi translated to predecessors. This causes it
to miss the (rare) cases like rle-must-alias.ll, but makes it faster by not
scanning tons of stuff that is unlikely to be useful. For example, this
speeds up GVN as a whole from 3.928s to 2.448s (60%)!. IMO, scalar GVN
should be enhanced to simplify the rle-must-alias pointer base anyway, which
would allow the loads to be eliminated.
In the future, this should be enhanced to phi translate through geps and
bitcasts as well (as indicated by FIXMEs) making memdep even more powerful.
llvm-svn: 61022
llvm[2]: Linking Release executable opt (without symbols)
...
Undefined symbols:
"llvm::APFloat::IEEEsingle", referenced from:
__ZN4llvm7APFloat10IEEEsingleE$non_lazy_ptr in libLLVMCore.a(Constants.o)
__ZN4llvm7APFloat10IEEEsingleE$non_lazy_ptr in libLLVMCore.a(AsmWriter.o)
__ZN4llvm7APFloat10IEEEsingleE$non_lazy_ptr in libLLVMCore.a(ConstantFold.o)
"llvm::APFloat::IEEEdouble", referenced from:
__ZN4llvm7APFloat10IEEEdoubleE$non_lazy_ptr in libLLVMCore.a(Constants.o)
__ZN4llvm7APFloat10IEEEdoubleE$non_lazy_ptr in libLLVMCore.a(AsmWriter.o)
__ZN4llvm7APFloat10IEEEdoubleE$non_lazy_ptr in libLLVMCore.a(ConstantFold.o)
ld: symbol(s) not found
This is in release mode. To replicate, compile llvm and llvm-gcc in optimized
mode. Then build llvm, in optimized mode, with the newly created compiler.
llvm-svn: 60977
which are identical to the original patterns.
- Change the multiply with overflow so that we distinguish between signed and
unsigned multiplication. Currently, unsigned multiplication with overflow
isn't working!
llvm-svn: 60963
for promoted integer types, eg: i16 on ppc-32, or
i24 on any platform. Complete support for arbitrary
precision integers would require handling expanded
integer types, eg: i128, but I couldn't be bothered.
llvm-svn: 60834
parallel, allowing it to decide that P/Q must alias if A/B
must alias in things like:
P = gep A, 0, i, 1
Q = gep B, 0, i, 1
This allows GVN to delete 62 more instructions out of 403.gcc.
llvm-svn: 60820
overflow/carry from the "arithmetic with overflow" intrinsics. It searches the
machine basic block from bottom to top to find the SETO/SETC instruction that is
its conditional. If an instruction modifies EFLAGS before it reaches the
SETO/SETC instruction, then it defaults to the normal instruction emission.
llvm-svn: 60807
essential problem was that the DAG can contain
random unused nodes which were never analyzed.
When remapping a value of a node being processed,
such a node may become used and need to be analyzed;
however due to operands being transformed during
analysis the node may morph into a different one.
Users of the morphing node need to be updated, and
this wasn't happening. While there I added a bunch
of documentation and sanity checks, so I (or some
other poor soul) won't have to scratch their head
over this stuff so long trying to remember how it
was all supposed to work next time some obscure
problem pops up! The extra sanity checking exposed
a few places where invariants weren't being preserved,
so those are fixed too. Since some of the sanity
checking is expensive, I added a flag to turn it
on. It is also turned on when building with
ENABLE_EXPENSIVE_CHECKS=1.
llvm-svn: 60797
tricks based on readnone/readonly functions.
Teach memdep to look past readonly calls when analyzing
deps for a readonly call. This allows elimination of a
few more calls from 403.gcc:
before:
63 gvn - Number of instructions PRE'd
153986 gvn - Number of instructions deleted
50069 gvn - Number of loads deleted
after:
63 gvn - Number of instructions PRE'd
153991 gvn - Number of instructions deleted
50069 gvn - Number of loads deleted
5 calls isn't much, but this adds plumbing for the next change.
llvm-svn: 60794
Fix the shift amount when unrolling a vector shift into scalar shifts.
Fix problem in getShuffleScalarElt where it assumes that the input of
a bit convert must be a vector.
llvm-svn: 60740
and use it in x86 address mode folding. Also, make
getRegForValue return 0 for illegal types even if it has a
ValueMap for them, because Argument values are put in the
ValueMap. This fixes PR3181.
llvm-svn: 60696
doesn't do its own local caching, and is slightly more aggressive about
free/store dse (see testcase). This eliminates the last external client
of MemDep::getDependenceFrom().
llvm-svn: 60619
loops when they can be subsumed into addressing modes.
Change X86 addressing mode check to realize that
some PIC references need an extra register.
(I believe this is correct for Linux, if not, I'm sure
someone will tell me.)
llvm-svn: 60608
1. GlobalBaseReg may have been spilled.
2. It may not be live at the use.
3. Spiller doesn't know this is happening so it won't prevent GlobalBaseReg from being spilled later (That by itself is a nasty hack. It's needed because we don't insert the reload until later).
llvm-svn: 60595
aren't part of the test suite but are generally useful nonetheless, and can
be expanded later to test the backend against the actual Cell SPU system.
There's basically no other good place to put this code, so put it here for
the time being.
- vecoperations.c: Vector shuffles for all supported vector types, tests
for v16i8 add and multiply.
llvm-svn: 60566
This fixes many bugs. I will add more test cases in a separate check-in.
Some day, the code that manipulates CFG and updates dom. info could use refactoring help.
llvm-svn: 60554
1) have it fold "br undef", which does occur with
surprising frequency as jump threading iterates.
2) teach j-t to delete dead blocks. This removes the successor
edges, reducing the in-edges of other blocks, allowing
recursive simplification.
3) Fold things like:
br COND, BBX, BBY
BBX:
br COND, BBZ, BBW
which also happens because jump threading iterates.
llvm-svn: 60470
foldMemoryOperand how to "fold" them, by converting them into constant-pool
loads. When they aren't folded, they use xorps/cmpeqd, but for example when
register pressure is high, they may now be folded as memory operands, which
reduces register pressure.
Also, mark V_SET0 isAsCheapAsAMove so that two-address-elimination will
remat it instead of copying zeros around (V_SETALLONES was already marked).
llvm-svn: 60461
delegates to the regular x86-32 convention which handles byval, but only
after it handles a few cases, and it's necessary to handle byval before
handling those cases. This fixes PR3122 (and rdar://6400815), llvm-gcc
miscompiling LLVM.
llvm-svn: 60453
1. ppcf128 select is expanded to f64 select's.
2. f64 select operand 0 is an i1 truncate, it's promoted to i32 zero_extend.
3. f64 select is updated. It's changed back to a "NewNode" and being re-analyzed.
4. f64 select operands are being processed. Operand 0 is a "NewNode". It's being expunged out of ReplacedValues map.
5. ExpungeNode tries to remap f64 select and notice it's a "NewNode" and assert.
Duncan, please take a look. Thanks.
llvm-svn: 60443
- Incorporate Tilmann Scheller's ISD::TRUNCATE custom lowering patch
- Update SPU calling convention info, even if it's not used yet (but can be
at some point or another)
- Ensure that any-extended f32 loads are custom lowered, especially when
they're promoted for use in printf.
llvm-svn: 60438
straight-forward implementation. This does not require any extra
alias analysis queries beyond what we already do for non-local loads.
Some programs really really like load PRE. For example, SPASS triggers
this ~1000 times, ~300 times in 255.vortex, and ~1500 times on 403.gcc.
The biggest limitation to the implementation is that it does not split
critical edges. This is a huge killer on many programs and should be
addressed after the initial patch is enabled by default.
The implementation of this should incidentally speed up rejection of
non-local loads because it avoids creating the repl densemap in cases
when it won't be used for fully redundant loads.
This is currently disabled by default.
Before I turn this on, I need to fix a couple of miscompilations in
the testsuite, look at compile time performance numbers, and look at
perf impact. This is pretty close to ready though.
llvm-svn: 60408
- LowerXADDO lowers [SU]ADDO into an ADD with an implicit EFLAGS define. The
EFLAGS are fed into a SETCC node which has the conditional COND_O or COND_C,
depending on the type of ADDO requested.
- LowerBRCOND now recognizes if it's coming from a SETCC node with COND_O or
COND_C set.
llvm-svn: 60388
figuring out the base of the IV. This produces better
code in the example. (Addresses use (IV) instead of
(BASE,IV) - a significant improvement on low-register
machines like x86).
llvm-svn: 60374
- Fix v2[if]64 vector insertion code before IBM files a bug report.
- Ensure that zero (0) offsets relative to $sp don't trip an assert
(add $sp, 0 gets legalized to $sp alone, tripping an assert)
- Shuffle masks passed to SPUISD::SHUFB are now v16i8 or v4i32
llvm-svn: 60358
multiplies.
Some more cleverness would be nice, though. It would be nice if we
could do this transformation on illegal types. Also, we would
prefer a narrower constant when possible so that we can use a narrower
multiply, which can be cheaper.
llvm-svn: 60283
overflowed on negation. This commit checks to make sure that neithe C nor X
overflows. This requires that the RHS of X (a subtract instruction) be a
constant integer.
llvm-svn: 60275
properly updates the reverse dependency map when it installs updated
dependencies for instructions that depend on the removed instruction.
llvm-svn: 60222
1. Make it fold blocks separated by an unconditional branch. This enables
jump threading to see a broader scope.
2. Make jump threading able to eliminate locally redundant loads when they
feed the branch condition of a block. This frequently occurs due to
reg2mem running.
3. Make jump threading able to eliminate *partially redundant* loads when
they feed the branch condition of a block. This is common in code with
lots of loads and stores like C++ code and 255.vortex.
This implements thread-loads.ll and rdar://6402033.
Per the fixme's, several pieces of this should be moved into Transforms/Utils.
llvm-svn: 60148
performance in most cases on the Grawp tester, but does speed some
things up (like shootout/hash by 15%). This also doesn't impact
compile time in a noticable way on the Grawp tester.
It also, of course, gets the testcase it was designed for right :)
llvm-svn: 60120