Commit Graph

7832 Commits

Author SHA1 Message Date
Chris Lattner f6ae904e34 Fix FoldSingleEntryPHINodes to update memdep and AA when it deletes
phi nodes.  It is called from MergeBlockIntoPredecessor which is 
called from GVN, which claims to preserve these.

I'm skeptical that this is the actual problem behind PR8954, but
this is a stab in the right direction.

llvm-svn: 123222
2011-01-11 08:13:40 +00:00
Chris Lattner dfcfcb49fa random cleanups
llvm-svn: 123221
2011-01-11 08:00:40 +00:00
Chris Lattner 63fe78de68 remove a bogus assertion: the latch block of a loop is not
necessarily an uncond branch to the header.  This fixes 
PR8955 (the assertion tripping).

llvm-svn: 123219
2011-01-11 07:47:59 +00:00
Owen Anderson d490c2d2ae Fix a random missed optimization by making InstCombine more aggressive when determining which bits are demanded by
a comparison against a constant.

llvm-svn: 123203
2011-01-11 00:36:45 +00:00
Chandler Carruth cf414cf0a6 Teach instcombine about the rest of the SSE and SSE2 conversion
intrinsics element dependencies. Reviewed by Nick.

llvm-svn: 123161
2011-01-10 07:19:37 +00:00
Chris Lattner 88bc848ab6 another random stab in the dark trying to fix llvm-gcc-i386-linux-selfhost
llvm-svn: 123149
2011-01-10 02:34:11 +00:00
Chris Lattner 4662bd4b13 another (more) aggressive attempt to bring llvm-gcc-i386-linux-selfhost
back to life.

llvm-svn: 123146
2011-01-10 00:47:34 +00:00
Chris Lattner 1017fa6746 temporarily disable memset formation from memsets in an effort to restore buildbot stability.
llvm-svn: 123144
2011-01-09 23:52:48 +00:00
Chris Lattner caf5c0d037 fix a few old bugs (found by inspection) where we would zap instructions
without informing memdep.  This could cause nondeterministic weirdness 
based on where instructions happen to get allocated, and will hopefully
breathe some life into some broken testers.

llvm-svn: 123124
2011-01-09 19:26:10 +00:00
Tobias Grosser cc21c4aa98 Instcombine: Fix pattern where the sext did not dominate the icmp using it
llvm-svn: 123121
2011-01-09 16:00:11 +00:00
Cameron Zwarich a42e5915bf LoopInstSimplify preserves LoopSimplify.
llvm-svn: 123117
2011-01-09 12:35:16 +00:00
Chris Lattner a337f5ec5c reduce indentation. Print <nuw> and <nsw> when dumping SCEV AddRec's
that have the bit set.

llvm-svn: 123104
2011-01-09 02:16:18 +00:00
Chris Lattner 7d6433ae76 fix a latent bug in memcpyoptimizer that my recent patches exposed: it wasn't
updating memdep when fusing stores together.  This fixes the crash optimizing
the bullet benchmark.

llvm-svn: 123091
2011-01-08 22:19:21 +00:00
Chris Lattner ff6ed2ac5f tryMergingIntoMemset can only handle constant length memsets.
llvm-svn: 123090
2011-01-08 22:11:56 +00:00
Chris Lattner 9a1d63ba9f Merge memsets followed by neighboring memsets and other stores into
larger memsets.  Among other things, this fixes rdar://8760394 and
allows us to handle "Example 2" from http://blog.regehr.org/archives/320,
compiling it into a single 4096-byte memset:

_mad_synth_mute:                        ## @mad_synth_mute
## BB#0:                                ## %entry
	pushq	%rax
	movl	$4096, %esi             ## imm = 0x1000
	callq	___bzero
	popq	%rax
	ret

llvm-svn: 123089
2011-01-08 21:19:19 +00:00
Chris Lattner 5120ebf184 fix an issue in IsPointerOffset that prevented us from recognizing that
P and P+1 are relative to the same base pointer.

llvm-svn: 123087
2011-01-08 21:07:56 +00:00
Chris Lattner 4dc1fd938f enhance memcpyopt to merge a store and a subsequent
memset into a single larger memset.

llvm-svn: 123086
2011-01-08 20:54:51 +00:00
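A minimal sketch (hypothetical names and sizes) of the store-plus-memset pattern the entry above targets:

#include <string.h>

void clear_header(char *P) {
  P[0] = 0;              /* single byte store at P    */
  memset(P + 1, 0, 15);  /* neighboring memset at P+1 */
}

With this change memcpyopt can fold the pair into one memset(P, 0, 16).
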
Chris Lattner c638147e9f constify TargetData references.
Split memset formation logic out into its own
"tryMergingIntoMemset" helper function.

llvm-svn: 123081
2011-01-08 20:24:01 +00:00
Chris Lattner 59c82f850d When loop rotation happens, it is *very* common for the duplicated condbr
to be foldable into an uncond branch.  When this happens, we can make a
much simpler CFG for the loop, which is important for nested loop cases
where we want the outer loop to be aggressively optimized.

Handle this case more aggressively.  For example, previously on
phi-duplicate.ll we would get this:


define void @test(i32 %N, double* %G) nounwind ssp {
entry:
  %cmp1 = icmp slt i64 1, 1000
  br i1 %cmp1, label %bb.nph, label %for.end

bb.nph:                                           ; preds = %entry
  br label %for.body

for.body:                                         ; preds = %bb.nph, %for.cond
  %j.02 = phi i64 [ 1, %bb.nph ], [ %inc, %for.cond ]
  %arrayidx = getelementptr inbounds double* %G, i64 %j.02
  %tmp3 = load double* %arrayidx
  %sub = sub i64 %j.02, 1
  %arrayidx6 = getelementptr inbounds double* %G, i64 %sub
  %tmp7 = load double* %arrayidx6
  %add = fadd double %tmp3, %tmp7
  %arrayidx10 = getelementptr inbounds double* %G, i64 %j.02
  store double %add, double* %arrayidx10
  %inc = add nsw i64 %j.02, 1
  br label %for.cond

for.cond:                                         ; preds = %for.body
  %cmp = icmp slt i64 %inc, 1000
  br i1 %cmp, label %for.body, label %for.cond.for.end_crit_edge

for.cond.for.end_crit_edge:                       ; preds = %for.cond
  br label %for.end

for.end:                                          ; preds = %for.cond.for.end_crit_edge, %entry
  ret void
}

Now we get the much nicer:

define void @test(i32 %N, double* %G) nounwind ssp {
entry:
  br label %for.body

for.body:                                         ; preds = %entry, %for.body
  %j.01 = phi i64 [ 1, %entry ], [ %inc, %for.body ]
  %arrayidx = getelementptr inbounds double* %G, i64 %j.01
  %tmp3 = load double* %arrayidx
  %sub = sub i64 %j.01, 1
  %arrayidx6 = getelementptr inbounds double* %G, i64 %sub
  %tmp7 = load double* %arrayidx6
  %add = fadd double %tmp3, %tmp7
  %arrayidx10 = getelementptr inbounds double* %G, i64 %j.01
  store double %add, double* %arrayidx10
  %inc = add nsw i64 %j.01, 1
  %cmp = icmp slt i64 %inc, 1000
  br i1 %cmp, label %for.body, label %for.end

for.end:                                          ; preds = %for.body
  ret void
}

With all of these recent changes, we are now able to compile:

void foo(char *X) {
 for (int i = 0; i != 100; ++i) 
   for (int j = 0; j != 100; ++j)
     X[j+i*100] = 0;
}

into a single memset of 10000 bytes.  This series of changes
should be helpful for other nested loop scenarios as well.

llvm-svn: 123079
2011-01-08 19:59:06 +00:00
Chris Lattner 30f318e5d1 split ssa updating code out to its own helper function. Don't bother
moving the OrigHeader block anymore: we just merge it away anyway so
its code layout doesn't matter.

llvm-svn: 123077
2011-01-08 19:26:33 +00:00
Chris Lattner 2615130e1d Implement a TODO: Enhance loopinfo to merge away the unconditional branch
that it was leaving in loops after rotation (between the original latch
block and the original header).

With this change, it is possible for rotated loops to have just a single
basic block, which is useful.

llvm-svn: 123075
2011-01-08 19:10:28 +00:00
Chris Lattner 930b716e1b various code cleanups, enhance MergeBlockIntoPredecessor to preserve
loop info.

llvm-svn: 123074
2011-01-08 19:08:40 +00:00
Chris Lattner fee37c5fa3 inline preserveCanonicalLoopForm now that it is simple.
llvm-svn: 123073
2011-01-08 18:55:50 +00:00
Chris Lattner 063dca0f6a Three major changes:
1. Rip out LoopRotate's domfrontier updating code.  It isn't
   needed now that LICM doesn't use DF and it is super complex
   and gross.
2. Make DomTree updating code a lot simpler and faster.  The 
   old loop over all the blocks was just to find a block??
3. Change the code that inserts the new preheader to just use
   SplitCriticalEdge instead of doing an overcomplex 
   reimplementation of it.

No behavior change, except for the name of the inserted preheader.

llvm-svn: 123072
2011-01-08 18:52:51 +00:00
Chris Lattner 30d95f9f87 reduce nesting.
llvm-svn: 123071
2011-01-08 18:47:43 +00:00
Chris Lattner 7fab23bc1d LoopRotate requires canonical loop form, so it always has preheaders
and latch blocks.  Reorder entry conditions to make the pass faster 
and more logical.

llvm-svn: 123069
2011-01-08 18:06:22 +00:00
Chris Lattner d62691f4e8 use the LI ivar.
llvm-svn: 123068
2011-01-08 17:49:51 +00:00
Chris Lattner 385f2ec6d8 some cleanups: remove dead arguments and eliminate ivars
that are just passed to one function.

llvm-svn: 123067
2011-01-08 17:48:33 +00:00
Chris Lattner 25ba40a0cc fix an issue duncan pointed out, which could cause loop rotate
to violate LCSSA form

llvm-svn: 123066
2011-01-08 17:38:45 +00:00
Cameron Zwarich b4ab257bcc Fix coding style issues.
llvm-svn: 123065
2011-01-08 17:07:11 +00:00
Cameron Zwarich 84986b298a Make more passes preserve dominators (or state that they preserve dominators if
they already do). This removes two dominator recomputations prior to isel,
which is a 1% improvement in total llc time for 403.gcc.

The only potentially suspect thing is making GCStrategy recompute dominators if
it used a custom lowering strategy.

llvm-svn: 123064
2011-01-08 17:01:52 +00:00
Cameron Zwarich 80bd9af7c5 Contract subloop bodies. However, it is still important to visit the phis at the
top of subloop headers, as the phi uses logically occur outside of the subloop.

llvm-svn: 123062
2011-01-08 15:52:22 +00:00
Frits van Bommel 6a1fb8f235 Fix a bug in r123034 (trying to sext/zext non-integers) and clean up a little.
llvm-svn: 123061
2011-01-08 10:51:36 +00:00
Chris Lattner 8c5defd0b0 Have loop-rotate simplify instructions (yay instsimplify!) as it clones
them into the loop preheader, eliminating silly instructions like
"icmp i32 0, 100" in fixed tripcount loops.  This also better exposes the 
bigger problem with loop rotate that I'd like to fix: once this has been
folded, the duplicated conditional branch *often* turns into an uncond branch.

Not aggressively handling this is pessimizing later loop optimizations 
somethin' fierce by making "dominates all exit blocks" checks fail.

llvm-svn: 123060
2011-01-08 08:24:46 +00:00
Chris Lattner 43f8d16482 Revamp the ValueMapper interfaces in a couple ways:
1. Take a flags argument instead of a bool.  This makes
   it more clear to the reader what it is used for.
2. Add a flag that says that "remapping a value not in the
   map is ok".
3. Reimplement MapValue to share a bunch of code and be a lot
   more efficient.  For lookup failures, don't drop null values
   into the map.
4. Using the new flag a bunch of code can vaporize in LinkModules
   and LoopUnswitch, kill it.

No functionality change.

llvm-svn: 123058
2011-01-08 08:15:20 +00:00
Chris Lattner 2b3f20e6ec two minor changes: switch to the standard ValueToValueMapTy
map from ValueMapper.h (giving us access to its utilities)
and add a fastpath in the loop rotation code, avoiding expensive
ssa updater manipulation for values with nothing to update.

llvm-svn: 123057
2011-01-08 07:21:31 +00:00
Tobias Grosser fc3d7f664b InstCombine: Match min/max hidden by sext/zext
X = sext x; x >s c ? X : C+1 --> X = sext x; X <s C+1 ? C+1 : X
X = sext x; x <s c ? X : C-1 --> X = sext x; X >s C-1 ? C-1 : X
X = zext x; x >u c ? X : C+1 --> X = zext x; X <u C+1 ? C+1 : X
X = zext x; x <u c ? X : C-1 --> X = zext x; X >u C-1 ? C-1 : X
X = sext x; x >u c ? X : C+1 --> X = sext x; X <u C+1 ? C+1 : X
X = sext x; x <u c ? X : C-1 --> X = sext x; X >u C-1 ? C-1 : X

Instead of calculating this with mixed types promote all to the
larger type. This enables scalar evolution to analyze this
expression. PR8866

llvm-svn: 123034
2011-01-07 21:33:14 +00:00
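A hypothetical C instance of the sext-hidden min/max pattern above (constants chosen arbitrarily):

int clamp_low(signed char x) {
  int X = x;              /* X = sext x     */
  return x > 5 ? X : 6;   /* x >s 5 ? X : 6 */
}

After the rewrite the compare is performed on the promoted value, X < 6 ? 6 : X, i.e. a plain max(X, 6) that scalar evolution can reason about.
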
Tobias Grosser 411e6eedff Some whitespace fixes
llvm-svn: 123033
2011-01-07 21:33:13 +00:00
Benjamin Kramer 134cde912a Revert 122959, it needs more thought. Add it back to README.txt with additional notes.
llvm-svn: 123030
2011-01-07 20:42:20 +00:00
Jay Foad 89afb43b1e Remove all uses of the "ugly" method BranchInst::setUnconditionalDest().
llvm-svn: 123025
2011-01-07 20:25:56 +00:00
Benjamin Kramer ae67cc13a9 InstCombine: Turn _chk functions into the "unsafe" variant if length and max length are equal.
This happens when we take the (non-constant) length from a malloc.

llvm-svn: 122961
2011-01-06 14:22:52 +00:00
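A hypothetical sketch of the case described above, where the copy length and the destination size are the same non-constant value taken from a malloc (in real code the fourth argument normally comes from __builtin_object_size):

#include <stdlib.h>
#include <string.h>

char *dup_buffer(const char *src, size_t n) {
  char *dst = malloc(n);
  /* length == "max length", so the fortified call can be relaxed to memcpy */
  __builtin___memcpy_chk(dst, src, n, n);
  return dst;
}
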
Benjamin Kramer 799b011276 InstCombine: If we call llvm.objectsize on a malloc call we can replace it with the size passed to malloc.
llvm-svn: 122959
2011-01-06 13:11:05 +00:00
Benjamin Kramer a76cc117e0 InstCombine: Teach llvm.objectsize folding to look through GEPs.
llvm-svn: 122958
2011-01-06 13:07:49 +00:00
Cameron Zwarich 9ec19ea06a Add the CallInst optimizations that don't involve expanding inline assembly to
OptimizeInst() so that they can be used on a worklist instruction.

llvm-svn: 122945
2011-01-06 02:56:42 +00:00
Cameron Zwarich d28c78eb4f Move the GEP handling in CodeGenPrepare to OptimizeInst().
llvm-svn: 122944
2011-01-06 02:44:52 +00:00
Cameron Zwarich 14ac865ca9 Split the optimizations in CodeGenPrepare that don't manipulate the iterators
into a separate function, so that it can be called from a loop using a worklist
rather than a loop traversing a whole basic block.

llvm-svn: 122943
2011-01-06 02:37:26 +00:00
Jakob Stoklund Olesen 70be93a200 Zap the last two -Wself-assign warnings in llvm.
Simplify RALinScan::DowngradeRegister with TRI::getOverlaps while we are there.

llvm-svn: 122940
2011-01-06 01:33:22 +00:00
Cameron Zwarich ce3b930a98 Stop reallocating SunkAddrs for each basic block. When we move to an instruction
worklist, the key will need to become std::pair<BasicBlock*, Value*>.

llvm-svn: 122932
2011-01-06 00:42:50 +00:00
Cameron Zwarich b62ccb241b Add some more statistics to CodeGenPrepare.
llvm-svn: 122891
2011-01-05 17:47:38 +00:00
Cameron Zwarich ced753fadf Add some stats to CodeGenPrepare to make it easier to speed it up without
regressing code quality.

llvm-svn: 122887
2011-01-05 17:27:27 +00:00
Cameron Zwarich 6a78995369 Use pop_back_val instead of back followed by pop_back.
llvm-svn: 122876
2011-01-05 16:08:47 +00:00
Cameron Zwarich 5a2bb998ac Use a worklist for later iterations just like ordinary instsimplify. The next
step is to only process instructions in subloops if they have been modified by
an earlier simplification.

llvm-svn: 122869
2011-01-05 05:47:47 +00:00
Cameron Zwarich 4c51d122d5 Change LoopInstSimplify back to a LoopPass. It revisits subloops rather than
skipping them, but it should probably use a worklist and only revisit those
instructions in subloops that have actually changed. It should probably also
use a worklist after the first iteration like instsimplify now does. Regardless,
it's only 0.3% of opt -O2 time on 403.gcc if it replaces the instcombine placed
in the middle of the loop passes.

llvm-svn: 122868
2011-01-05 05:15:53 +00:00
Owen Anderson 7b25ff04bd Don't bother value numbering instructions with void types in GVN. In theory this should allow us to insert
fewer things into the value numbering maps, but any speedup is beneath the noise threshold on my machine
on 403.gcc.

llvm-svn: 122844
2011-01-04 22:15:21 +00:00
Owen Anderson e39cb57b09 Complete the NumberTable --> LeaderTable rename.
llvm-svn: 122828
2011-01-04 19:29:46 +00:00
Owen Anderson d7d06d3aaf Fix typo in a comment.
llvm-svn: 122827
2011-01-04 19:25:18 +00:00
Owen Anderson 51489b3b28 Prune #include's.
llvm-svn: 122826
2011-01-04 19:24:57 +00:00
Owen Anderson c7c3bc63f7 Clarify terminology, settling on referring to what was the "number table" as the "leader table", and
rename methods to make it much more clear what they're doing.

llvm-svn: 122823
2011-01-04 19:13:25 +00:00
Owen Anderson 83546f2fe0 When removing a value from GVN's leaders list, don't drop the Next pointer in a corner case.
llvm-svn: 122822
2011-01-04 19:10:54 +00:00
Dale Johannesen a71d2cc88d Improve the accuracy of the inlining heuristic looking for the
case where a static caller is itself inlined everywhere else, and
thus may go away if it doesn't get too big due to inlining other
things into it.  If there are references to the caller other than
calls, it will not be removed; account for this.
This results in same-day completion of the case in PR8853.

llvm-svn: 122821
2011-01-04 19:01:54 +00:00
Owen Anderson 41a1550ef5 Branch instructions don't produce values, so there's no need to generate a value number for them. This
avoids adding them to the various value numbering tables, resulting in a minor (~3%) speedup for GVN
on 403.gcc.

llvm-svn: 122819
2011-01-04 18:54:18 +00:00
Owen Anderson 22c53e277a Remove commented out code.
llvm-svn: 122817
2011-01-04 18:22:08 +00:00
Cameron Zwarich b2a41e9388 Switch to the new style of asterisk placement.
llvm-svn: 122815
2011-01-04 18:19:19 +00:00
Chris Lattner 8643810ede Teach loop-idiom to turn a loop containing a memset into a larger memset
when safe.

The testcase is basically this nested loop:
void foo(char *X) {
  for (int i = 0; i != 100; ++i) 
    for (int j = 0; j != 100; ++j)
      X[j+i*100] = 0;
}

which gets turned into a single memset now.  clang -O3 doesn't optimize
this yet though due to a phase ordering issue I haven't analyzed yet.

llvm-svn: 122806
2011-01-04 07:46:33 +00:00
Chris Lattner a62b01dc37 restructure this a bit. Initialize the WeakVH with "I", the
instruction *after* the store.  The store will always be deleted
if the transformation kicks in, so we'd otherwise do an N^2 scan of every
loop block.  Whoops.

llvm-svn: 122805
2011-01-04 07:27:30 +00:00
Cameron Zwarich f4e13699e7 Avoid finding loop back edges when we are not splitting critical edges in
CodeGenPrepare (which is the default behavior).

llvm-svn: 122801
2011-01-04 04:43:31 +00:00
Cameron Zwarich e924969380 Address most of Duncan's review comments. Also, make LoopInstSimplify a simple
FunctionPass. It probably doesn't have a reason to be a LoopPass, as it will
probably drop the simple fixed point and either use RPO iteration or Duncan's
approach in instsimplify of only revisiting instructions that have changed.

The next step is to preserve LoopSimplify. This looks like it won't be too hard,
although the pass manager doesn't actually seem to respect when non-loop passes
claim to preserve LCSSA or LoopSimplify. This will have to be fixed.

llvm-svn: 122791
2011-01-04 00:12:46 +00:00
Chris Lattner 0ba473c218 use the very-handy getTruncateOrZeroExtend helper function, and
stop setting NSW: signed overflow is possible.  Thanks to Dan
for pointing these out.

llvm-svn: 122790
2011-01-04 00:06:55 +00:00
Owen Anderson 0839d3930a Fix comment.
llvm-svn: 122788
2011-01-03 23:51:56 +00:00
Owen Anderson d62d37225a Use the new addEscapingValue callback to update GlobalsModRef when GVN adds PHIs of GEPs. For the moment,
have GlobalsModRef handle this conservatively by simply removing the value from its maps.

llvm-svn: 122787
2011-01-03 23:51:43 +00:00
Chris Lattner bde6ec1db6 Duncan deftly points out that readnone functions aren't
invalidated by stores, so they can be handled as 'simple'
operations.

llvm-svn: 122785
2011-01-03 23:38:13 +00:00
Owen Anderson 3a33d0cc4a Simplify GVN's value expression structure, allowing the elimination of a lot of
almost-but-not-quite-identical code.  No intended functionality change.

llvm-svn: 122760
2011-01-03 19:00:11 +00:00
Chris Lattner 16ca19ffc5 strength reduce my previous patch a bit. The only instructions
that are allowed to have metadata operands are intrinsic calls,
and the only ones that take metadata currently return void.
Just reject all void instructions, which should not be value
numbered anyway.  To future proof things, add an assert to the
getHashValue impl for calls to check that metadata operands 
aren't present.

llvm-svn: 122759
2011-01-03 18:43:03 +00:00
Chris Lattner 142f1cd251 fix PR8895: metadata operands don't have a strong use of their
nested values, so they can change and drop to null, which can
change the hash and cause havoc.

It turns out that it isn't a good idea to value number stuff
with metadata operands anyway, so... don't.

llvm-svn: 122758
2011-01-03 18:28:15 +00:00
Duncan Sands 697de77339 Speed up instsimplify by about 10-15% by not bothering to retry
InstructionSimplify on instructions that didn't change since the
last time round the loop.

llvm-svn: 122745
2011-01-03 10:50:04 +00:00
Cameron Zwarich 43cecb1200 Switch a worklist in CodeGenPrepare to SmallVector and increase the inline
capacity on the Visited SmallPtrSet. On 403.gcc, this is about a 4.5% speedup of
CodeGenPrepare time (which itself is 10% of time spent in the backend).

This is progress towards PR8889.

llvm-svn: 122741
2011-01-03 06:33:01 +00:00
Chris Lattner 9e5e9ed79a earlycse can do trivial within-a-block dead store
elimination as well.  This deletes 60 stores in 176.gcc
that largely come from bitfield code.

llvm-svn: 122736
2011-01-03 04:17:24 +00:00
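A trivial, hypothetical example of the kind of within-a-block dead store EarlyCSE can now delete:

void store_twice(int *p) {
  *p = 1;   /* dead: overwritten below with no intervening load */
  *p = 2;
}
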
Chris Lattner 4b9a525742 switch the load table to use a recycling bump pointer allocator,
speeding earlycse up by 6%.

llvm-svn: 122733
2011-01-03 03:53:50 +00:00
Chris Lattner e0e32a9ef0 now that loads are in their own table, we can implement
store->load forwarding.  This allows EarlyCSE to zap 600 more
loads from 176.gcc.

llvm-svn: 122732
2011-01-03 03:46:34 +00:00
Chris Lattner 92bb0f9f9d split loads and calls into separate tables. Loads are now just indexed
by their pointer instead of using MemoryValue to wrap it.

llvm-svn: 122731
2011-01-03 03:41:27 +00:00
Chris Lattner 4cb365414f various cleanups, no functionality change.
llvm-svn: 122729
2011-01-03 03:28:23 +00:00
Chris Lattner b9a8efc960 Teach EarlyCSE to do trivial CSE of loads and read-only calls.
On 176.gcc, this catches 13090 loads and calls, and increases the
number of simple instructions CSE'd from 29658 to 36208.

llvm-svn: 122727
2011-01-03 03:18:43 +00:00
Chris Lattner 79d83067ee rename InstValue to SimpleValue, add some comments.
llvm-svn: 122725
2011-01-03 02:20:48 +00:00
Michael J. Spencer edb5bcdde5 CMake: Add missing source file.
llvm-svn: 122724
2011-01-03 02:13:05 +00:00
Chris Lattner d815f69b30 Allocate nodes for the scoped hash table from a recycling bump pointer
allocator.  This speeds up early cse by about 20%

llvm-svn: 122723
2011-01-03 01:42:46 +00:00
Chris Lattner 02a9776b64 reduce redundancy in the hashing code and other misc cleanups.
llvm-svn: 122720
2011-01-03 01:10:08 +00:00
Cameron Zwarich cab9a0abab Add a new loop-instsimplify pass, with the intention of replacing the instance
of instcombine that is currently in the middle of the loop pass pipeline. This
commit only checks in the pass; it will hopefully be enabled by default later.

llvm-svn: 122719
2011-01-03 00:25:16 +00:00
Chris Lattner 0844c76f9a fix some pastos
llvm-svn: 122718
2011-01-02 23:29:58 +00:00
Chris Lattner 8fac5db251 add DEBUG and -stats output to earlycse.
Teach it to CSE the rest of the non-side-effecting instructions.

llvm-svn: 122716
2011-01-02 23:19:45 +00:00
Chris Lattner 18ae5436b1 Enhance earlycse to do CSE of casts, instsimplify and die.
Add a testcase.

llvm-svn: 122715
2011-01-02 23:04:14 +00:00
Chris Lattner bf0aa927cc split dom frontier handling stuff out to its own DominanceFrontier header,
so that Dominators.h is *just* domtree.  Also prune #includes a bit.

llvm-svn: 122714
2011-01-02 22:09:33 +00:00
Chris Lattner 704541bb23 sketch out a new early cse pass. No functionality yet.
llvm-svn: 122713
2011-01-02 21:47:05 +00:00
Chris Lattner 9c69406f2b fix a miscompilation of tramp3d-v4: when forming a memcpy, we have to make
sure that the loop we're promoting into a memcpy doesn't mutate the input
of the memcpy.  Before we were just checking that the dest of the memcpy
wasn't mod/ref'd by the loop.

llvm-svn: 122712
2011-01-02 21:14:18 +00:00
Chris Lattner 5702a43c09 If a loop iterates exactly once (has backedge count = 0) then don't
mess with it.  We'd rather peel/unroll it than convert all of its 
stores into memsets.

llvm-svn: 122711
2011-01-02 20:24:21 +00:00
Nick Lewycky 5361b84184 Also remove functions that use complex constant expressions in terms of
another function.

llvm-svn: 122705
2011-01-02 19:16:44 +00:00
Chris Lattner 8455b6e45e enhance loop idiom recognition to scan *all* unconditionally executed
blocks in a loop, instead of just the header block.  This makes it more
aggressive, able to handle Duncan's Ada examples.

llvm-svn: 122704
2011-01-02 19:01:03 +00:00
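A hypothetical sketch of why scanning beyond the header matters: the zeroing store below lives in the join block after an if, not in the loop header, yet it executes on every iteration and is still a memset candidate.

int zero_and_track_max(int *a, const int *b, int n) {
  int max = 0;
  for (int i = 0; i < n; ++i) {
    if (b[i] > max)   /* conditional part of the body */
      max = b[i];
    a[i] = 0;         /* unconditionally executed store */
  }
  return max;
}
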
Chris Lattner 0cdc6f62a5 make inSubLoop much more efficient.
llvm-svn: 122703
2011-01-02 18:53:08 +00:00
Chris Lattner 27497ece96 rip out isExitBlockDominatedByBlockInLoop, calling DomTree::dominates instead.
isExitBlockDominatedByBlockInLoop is a relic of the days when domtree was 
*just* a tree and didn't have DFS numbers.  Checking DFS numbers is faster
and easier than "limiting the search of the tree".

llvm-svn: 122702
2011-01-02 18:45:39 +00:00
Chris Lattner 0469e01c02 add a list of opportunities for future improvement.
llvm-svn: 122701
2011-01-02 18:32:09 +00:00
Duncan Sands 64f1c0dcda Fix PR8702 by not having LoopSimplify claim to preserve LCSSA form. As described
in the PR, the pass could break LCSSA form when inserting preheaders.  It probably
would be easy enough to fix this, but since currently we always go into LCSSA form
after running this pass, doing so is not urgent.

llvm-svn: 122695
2011-01-02 13:38:21 +00:00
Chris Lattner ddf58010bd Allow loop-idiom to run on multiple BB loops, but still only scan the loop
header for now for memset/memcpy opportunities.  It turns out that loop-rotate
is successfully rotating loops, but *DOESN'T MERGE THE BLOCKS*, turning "for 
loops" into 2 basic block loops that loop-idiom was ignoring.

With this fix, we form many *many* more memcpy and memsets than before, including
on the "history" loops in the viterbi benchmark, which look like this:

        for (j=0; j<MAX_history; ++j) {
          history_new[i][j+1] = history[2*i][j];
        }

Transforming these loops into memcpy's speeds up the viterbi benchmark from
11.98s to 3.55s on my machine.  Woo.

llvm-svn: 122685
2011-01-02 07:58:36 +00:00
Chris Lattner 5b5a043d82 remove debugging code.
llvm-svn: 122683
2011-01-02 07:37:13 +00:00
Chris Lattner 12f91befce add some -stats output.
llvm-svn: 122682
2011-01-02 07:36:44 +00:00
Chris Lattner 679572e584 improve loop rotation to use CodeMetrics to analyze the
size of a loop header instead of its own code size estimator.
This allows it to handle bitcasts etc more precisely.

llvm-svn: 122681
2011-01-02 07:35:53 +00:00
Chris Lattner 85b6d81d41 teach loop idiom recognition to form memcpy's from simple loops.
llvm-svn: 122678
2011-01-02 03:37:56 +00:00
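A hypothetical example of the simple copy-loop shape this recognizes (assuming the two pointers are known not to alias):

void copy_bytes(char *dst, const char *src, unsigned n) {
  for (unsigned i = 0; i != n; ++i)
    dst[i] = src[i];   /* forms memcpy(dst, src, n) */
}
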
Nick Lewycky 4e250c8245 Remove functions from the FnSet when one of their callee's is being merged. This
maintains the guarantee that the DenseSet expects two elements it contains to
not go from inequal to equal under its nose.

As a side-effect, this also lets us switch from iterating to a fixed-point to
actually maintaining a work queue of functions to look at again, and we don't
add thunks to our work queue so we don't need to detect and ignore them.

llvm-svn: 122677
2011-01-02 02:46:33 +00:00
Chris Lattner 1903c42b97 fix a globalopt crash on two Adobe-C++ testcases that the recent
loop idiom pass exposed.

llvm-svn: 122674
2011-01-01 22:31:46 +00:00
Chris Lattner a3514441e0 add a validity check that was missed, fixing a crash on the
new testcase.

llvm-svn: 122662
2011-01-01 20:12:04 +00:00
Chris Lattner 91a4435875 improve validity check to handle constant-trip-count loops more
aggressively.  In practice, this doesn't help anything though,
see the todo.

llvm-svn: 122660
2011-01-01 19:54:22 +00:00
Chris Lattner 8b3baf6d75 implement the "no aliasing accesses in loop" safety check. This pass
should be correct now.

llvm-svn: 122659
2011-01-01 19:39:01 +00:00
Duncan Sands 2c440fa403 Simplify this pass by using a depth-first iterator to ensure that all
operands are visited before the instructions themselves.

llvm-svn: 122647
2010-12-31 17:49:05 +00:00
Duncan Sands 6cc7126ed9 Zap dead instructions harder.
llvm-svn: 122645
2010-12-31 16:17:54 +00:00
Benjamin Kramer 570dd787a6 Make a bunch of symbols internal.
llvm-svn: 122642
2010-12-30 22:34:44 +00:00
Chris Lattner 65a699d4d0 simplify this, isBytewiseValue handles the extra check. We still
check for "multiple of a byte" in size to make it clear that the
>> 3 below is safe.

llvm-svn: 122604
2010-12-28 18:53:48 +00:00
Duncan Sands 5cf10e691b Silence gcc warning about an unused variable when doing a release build.
llvm-svn: 122593
2010-12-28 09:41:15 +00:00
Chris Lattner cb18bfa3d2 fix some issues Frits noticed, add AliasAnalysis as a dependency
llvm-svn: 122585
2010-12-27 18:39:08 +00:00
Benjamin Kramer 84bd73c527 BuildLibCalls: Nuke EmitMemCpy, EmitMemMove and EmitMemSet. They are dead and superseded by IRBuilder.
llvm-svn: 122576
2010-12-27 00:25:32 +00:00
Benjamin Kramer 7cba269dfb SimplifyLibCalls: Use IRBuilder to simplify code.
llvm-svn: 122575
2010-12-27 00:16:46 +00:00
Chris Lattner b9fe685b9a have loop-idiom nuke instructions that feed stores that get removed.
llvm-svn: 122574
2010-12-27 00:03:23 +00:00
Chris Lattner 29e14edc8d implement enough of the memset inference algorithm to recognize and insert
memsets.  This is still missing one important validity check, but this is enough
to compile stuff like this:

void test0(std::vector<char> &X) {
  for (std::vector<char>::iterator I = X.begin(), E = X.end(); I != E; ++I)
    *I = 0;
}

void test1(std::vector<int> &X) {
  for (long i = 0, e = X.size(); i != e; ++i)
    X[i] = 0x01010101;
}

With:
 $ clang t.cpp -S -o - -O2 -emit-llvm | opt -loop-idiom | opt -O3 | llc 

to:

__Z5test0RSt6vectorIcSaIcEE:            ## @_Z5test0RSt6vectorIcSaIcEE
## BB#0:                                ## %entry
	subq	$8, %rsp
	movq	(%rdi), %rax
	movq	8(%rdi), %rsi
	cmpq	%rsi, %rax
	je	LBB0_2
## BB#1:                                ## %bb.nph
	subq	%rax, %rsi
	movq	%rax, %rdi
	callq	___bzero
LBB0_2:                                 ## %for.end
	addq	$8, %rsp
	ret
...
__Z5test1RSt6vectorIiSaIiEE:            ## @_Z5test1RSt6vectorIiSaIiEE
## BB#0:                                ## %entry
	subq	$8, %rsp
	movq	(%rdi), %rax
	movq	8(%rdi), %rdx
	subq	%rax, %rdx
	cmpq	$4, %rdx
	jb	LBB1_2
## BB#1:                                ## %for.body.preheader
	andq	$-4, %rdx
	movl	$1, %esi
	movq	%rax, %rdi
	callq	_memset
LBB1_2:                                 ## %for.end
	addq	$8, %rsp
	ret

llvm-svn: 122573
2010-12-26 23:42:51 +00:00
Chris Lattner 6cf8d6cc6e start using irbuilder to make mem intrinsics in a few passes.
llvm-svn: 122572
2010-12-26 22:57:41 +00:00
Chris Lattner 7c5f9c35d1 sketch more of this out.
llvm-svn: 122567
2010-12-26 20:45:45 +00:00
Chris Lattner 9cb1035f94 move isBytewiseValue out to ValueTracking.h/cpp
llvm-svn: 122565
2010-12-26 20:15:01 +00:00
Chris Lattner 81ae3f299a actually add the file...
llvm-svn: 122563
2010-12-26 19:39:38 +00:00
Chris Lattner 2ef535a4e4 Start of a pass for recognizing memset and memcpy idioms.
No functionality yet.

llvm-svn: 122562
2010-12-26 19:32:44 +00:00
Benjamin Kramer 30342fb1fd Simplify code.
llvm-svn: 122561
2010-12-26 15:23:45 +00:00
Chris Lattner d729d0dcdb don't lose TD info
llvm-svn: 122556
2010-12-25 20:52:04 +00:00
Chris Lattner 20fca48341 switch the inliner alignment enforcement stuff to use the
getOrEnforceKnownAlignment function, which simplifies the code
and makes it stronger.

llvm-svn: 122555
2010-12-25 20:42:38 +00:00
Chris Lattner 6fcd32e7d7 Move getOrEnforceKnownAlignment out of instcombine into Transforms/Utils.
llvm-svn: 122554
2010-12-25 20:37:57 +00:00
Benjamin Kramer b90b2f0635 Fix a thinko pointed out by Frits van Bommel: looking through global variables in isBytewiseValue is not safe.
llvm-svn: 122550
2010-12-24 22:23:59 +00:00
Benjamin Kramer ea9152e551 MemCpyOpt: Turn memcpys from a constant into a memset if possible.
This allows us to compile "int cst[] = {-1, -1, -1};" into
  movl  $-1, 16(%rsp)
  movq  $-1, 8(%rsp)
instead of
  movl  _cst+8(%rip), %eax
  movl  %eax, 16(%rsp)
  movq  _cst(%rip), %rax
  movq  %rax, 8(%rsp)

llvm-svn: 122548
2010-12-24 21:17:12 +00:00
Owen Anderson 226ac14afb When determining if we can fold (x >> C1) << C2, the bits that we need to verify are zero
are not the low bits of x, but the bits that WILL be the low bits after the operation completes.

llvm-svn: 122529
2010-12-23 23:56:24 +00:00
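A worked example (my own, not from the commit) for an 8-bit value:

/* (x >> 3) << 1 can be rewritten as x >> 2 only when bit 2 of x is zero:
   bit 2 is what becomes the low bit of x >> 2 after the operation, while
   the low bit of x itself is irrelevant. */
unsigned char fold_shifts(unsigned char x) {
  return (unsigned char)((x >> 3) << 1);   /* == x >> 2 iff (x & 0x4) == 0 */
}
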
Owen Anderson 5d690d4168 It is possible for SimplifyCFG to cause PHI nodes to become redundant too late in the optimization
pipeline to be caught by instcombine, and it's not feasible to catch them in SimplifyCFG because the
use-lists are in an inconsistent state at the point where it could know that it need to simplify them.
Instead, have CodeGenPrepare look for trivially redundant PHIs as part of its general cleanup effort.

llvm-svn: 122516
2010-12-23 20:57:35 +00:00
Mon P Wang 18b762a946 Preserve the address space when generating bitcasts for MemTransferInst in ConvertToScalarInfo
llvm-svn: 122462
2010-12-23 01:41:32 +00:00
Jeffrey Yasskin 9b43f33620 Change all self assignments X=X to (void)X, so that we can turn on a
new gcc warning that complains on self-assignments and
self-initializations.

llvm-svn: 122458
2010-12-23 00:58:24 +00:00
Benjamin Kramer 8ef5001b27 InstCombine: creating selects from -1 and 0 is fine, they combine into a sext from i1.
llvm-svn: 122453
2010-12-22 23:12:15 +00:00
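A small, hypothetical example of the -1/0 select shape being allowed here (for 32-bit int):

int sign_mask(int x) {
  return x < 0 ? -1 : 0;   /* select of -1/0 == sext of the compare; typically lowers to x >> 31 */
}
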
Duncan Sands fbb9ac3cca Add a generic expansion transform: A op (B op' C) -> (A op B) op' (A op C)
if both A op B and A op C simplify.  This fires fairly often but doesn't
make that much difference.  On gcc-as-one-file it removes two "and"s and
turns one branch into a select.

llvm-svn: 122399
2010-12-22 13:36:08 +00:00
Duncan Sands 3547d2ebd8 Add some statistics, good for understanding how much more powerful
instcombine is compared to instsimplify.

llvm-svn: 122397
2010-12-22 09:40:51 +00:00
Owen Anderson 5ab8d4b5e5 Give GVN back the ability to perform simple conditional propagation on conditional branch values.
I still think that LVI should be handling this, but that capability is some ways off in the future,
and this matters for some significant benchmarks.

llvm-svn: 122378
2010-12-21 23:54:34 +00:00
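A hypothetical example of the propagation being restored: inside the taken branch, the compared value can be replaced by the constant it was tested against.

int f(int x) {
  if (x == 42)
    return x + 1;   /* GVN can substitute 42 for x here and fold this to 43 */
  return 0;
}
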
Owen Anderson 12470778d7 Remove dead code.
llvm-svn: 122371
2010-12-21 22:31:24 +00:00
Benjamin Kramer 43493c089f GVN's Expression is not POD-like (it contains a SmallVector). Simplify code while at it.
llvm-svn: 122362
2010-12-21 21:30:19 +00:00
Duncan Sands 3b8af41a3e Visit instructions deterministically. Use a FIFO so as to approximately
visit instructions before their uses, since InstructionSimplify does a
better job in that case.  All this prompted by Frits van Bommel.

llvm-svn: 122343
2010-12-21 17:08:55 +00:00
Duncan Sands e7cbb64ec0 If an instruction simplifies, try again to simplify any uses of it. This is
not very important since the pass is only used for testing, but it does make
it more realistic.  Suggested by Frits van Bommel.

llvm-svn: 122336
2010-12-21 16:12:03 +00:00
Duncan Sands d0eb6d39f8 Pull a few more simplifications out of instcombine (there are still
plenty left though!), in particular for multiplication.

llvm-svn: 122330
2010-12-21 14:00:22 +00:00
Duncan Sands eaff500c7b Oops, forgot to add the pass itself!
llvm-svn: 122265
2010-12-20 21:07:42 +00:00
Duncan Sands a436cbe4bf Add a new convenience pass for testing InstructionSimplify. Previously
it could only be tested indirectly, via instcombine, gvn or some other
pass that makes use of InstructionSimplify, which means that testcases
had to be carefully contrived to dance around any other transformations
that that pass did.

llvm-svn: 122264
2010-12-20 20:54:37 +00:00
Benjamin Kramer f7957d0463 Add a check missing from my last commit and avoid a potential overflow situation.
llvm-svn: 122258
2010-12-20 20:00:31 +00:00
Benjamin Kramer 2bca3a67b3 Reduce indentation.
llvm-svn: 122249
2010-12-20 16:21:59 +00:00
Benjamin Kramer 68531baea9 Teach InstCombine to merge (icmp ult (X + CA), C1) | (icmp eq X, C2) into (icmp ult (X + CA), C1 + 1) if C2 + CA == C1.
InstCombine creates these so now we compile x == 23 || x == 24 || x == 25 to
  %x.off = add i32 %x, -23
  %1 = icmp ult i32 %x.off, 3
instead of
  %x.off = add i32 %x, -23
  %1 = icmp ult i32 %x.off, 2
  %cmp3 = icmp eq i32 %x, 25
  %ret2 = or i1 %1, %cmp3

llvm-svn: 122248
2010-12-20 16:18:51 +00:00
Chris Lattner 27ca8ebd4b fix PR8807 by making transformConstExprCastCall aware of byval arguments.
llvm-svn: 122238
2010-12-20 08:36:38 +00:00
Chris Lattner 7398965b67 various cleanups for transformConstExprCastCall
llvm-svn: 122237
2010-12-20 08:25:06 +00:00
Chris Lattner 0f11495289 when eliding a byval copy due to inlining a readonly function, we have
to make sure that the reused alloca has sufficient alignment.

llvm-svn: 122236
2010-12-20 08:10:40 +00:00
Chris Lattner 0099744506 pull byval processing out to its own helper function.
llvm-svn: 122235
2010-12-20 07:57:41 +00:00
Chris Lattner 7394680a00 fix PR8769, a miscompilation by inliner when inlining a function with a byval
argument.  The generated alloca has to have at least the alignment of the
byval, if not, the client may be making assumptions that the new alloca won't
satisfy.

llvm-svn: 122234
2010-12-20 07:45:28 +00:00
Mon P Wang 1991c47ec1 Avoid dropping the address space when InstCombine optimizes memset
llvm-svn: 122215
2010-12-20 01:05:30 +00:00
Chris Lattner 4fb9dd4c74 fix an oversight caught by Frits!
llvm-svn: 122204
2010-12-19 23:24:04 +00:00
Chris Lattner b6252a376a tidy up
llvm-svn: 122190
2010-12-19 20:24:28 +00:00
Chris Lattner 3e635d2e99 move a transformation to a more logical place, simplifying it.
llvm-svn: 122183
2010-12-19 19:43:52 +00:00
Chris Lattner 5e0c0c72e9 recognize an unsigned add with overflow idiom into uadd.
This resolves a README entry and technically resolves PR4916,
but we still get poor code for the testcase in that PR because
GVN isn't CSE'ing uadd with add, filed as PR8817.

Previously we got:

_test7:                                 ## @test7
	addq	%rsi, %rdi
	cmpq	%rdi, %rsi
	movl	$42, %eax
	cmovaq	%rsi, %rax
	ret

Now we get:

_test7:                                 ## @test7
	addq	%rsi, %rdi
	movl	$42, %eax
	cmovbq	%rsi, %rax
	ret

llvm-svn: 122182
2010-12-19 19:37:52 +00:00
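A hypothetical source-level version of the idiom being recognized (the commit's own test7 may differ):

unsigned add_or_42(unsigned a, unsigned b) {
  unsigned s = a + b;
  if (s < a)          /* the classic unsigned-overflow check */
    return 42;
  return s;
}

The add plus the "s < a" compare can now be matched as a single llvm.uadd.with.overflow call.
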
Chris Lattner 33dc3f0cfa optimize uadd(x, cst) into a comparison when the normal
result is dead.  This is required for my next patch to not
regress the testsuite.

llvm-svn: 122181
2010-12-19 19:35:32 +00:00
Chris Lattner ce2995ae58 use IC.ReplaceInstUsesWith instead of a raw RAUW so that uses of
the old thing end up on the instcombine worklist.  Not doing this
can cause an extra top-level iteration of instcombine, burning
compile time.

llvm-svn: 122179
2010-12-19 18:38:44 +00:00
Chris Lattner 79874566ce generalize the sadd creation code to not require that the
sadd formed is half the size of the original type. We can
now compile this into a sadd.i8:

unsigned char X(char a, char b) {
  int res = a+b;
  if ((unsigned )(res+128) > 255U)
    abort();
  return res;
}

llvm-svn: 122178
2010-12-19 18:35:09 +00:00
Chris Lattner c56c845377 fix another miscompile in the llvm.sadd formation logic: it wasn't
checking to see if the high bits of the original add result were dead.
Inserting a smaller add and zexting back to that size is not good enough.

This is likely to be the fix for 8816.

llvm-svn: 122177
2010-12-19 18:22:06 +00:00
Chris Lattner f29562db25 fix a bug (possibly 8816) in the sadd forming xform: it isn't
profitable (or safe) to promote code when the add-with-constant
has other uses.

llvm-svn: 122175
2010-12-19 17:59:02 +00:00
Chris Lattner ee61c1d820 rework the code added in r122072 to pull it out to its own
helper function, clean up comments, and reduce indentation.
No functionality change.

llvm-svn: 122174
2010-12-19 17:52:50 +00:00
Chris Lattner 408a684d29 Enhance LICM to promote alias sets whose pointers themselves are stored,
which doesn't affect the memory address being promoted.

llvm-svn: 122172
2010-12-19 05:57:25 +00:00
Chris Lattner 3337a81450 fix PR8602, a bug in an assertion: a volatile store *of* a pointer
does not make the alias set for that pointer volatile, just stores
*to* the pointer.

llvm-svn: 122171
2010-12-19 05:51:54 +00:00
Chris Lattner fb888622c3 revert r122164, I'm going to go with a different approach.
llvm-svn: 122168
2010-12-19 04:23:03 +00:00
Chris Lattner 583ec6fa44 first step to fixing PR8642: don't fold away empty basic blocks
which have trapping constant exprs in them due to PHI nodes.
Eliminating them can cause the constant expr to be evaluated
on new paths if the input edges are critical.

llvm-svn: 122164
2010-12-19 03:02:34 +00:00
Chris Lattner 6b8b4855ff simplify this a bit.
llvm-svn: 122156
2010-12-18 20:22:49 +00:00
Bill Wendling 5e3605552e Whitespace fixes. No functionality change.
llvm-svn: 122110
2010-12-17 23:27:41 +00:00
Nate Begeman 7aa18bf46a Add vector versions of some existing scalar transforms to aid codegen in matching psign & pblend operations to the IR produced by clang/gcc for their C idioms.
llvm-svn: 122105
2010-12-17 23:12:19 +00:00
Owen Anderson 1294ea7d53 Reapply r121905 (automatic synthesis of @llvm.sadd.with.overflow) with a fix for a bug that manifested itself
on the DragonEgg self-host bot.  Unfortunately, the testcase is pretty messy and doesn't reduce well due to
interactions with other parts of InstCombine.

llvm-svn: 122072
2010-12-17 18:08:00 +00:00
Benjamin Kramer e5f49c4ff2 SimplifyCFG: Ranges can be larger than 64 bits. Fixes Release-selfhost build.
llvm-svn: 122054
2010-12-17 10:48:14 +00:00
Chris Lattner d14b0f1db7 improve switch formation to handle small range
comparisons formed by comparisons.  For example,
this:

void foo(unsigned x) {
  if (x == 0 || x == 1 || x == 3 || x == 4 || x == 6) 
    bar();
}

compiles into:

_foo:                                   ## @foo
## BB#0:                                ## %entry
	cmpl	$6, %edi
	ja	LBB0_2
## BB#1:                                ## %entry
	movl	%edi, %eax
	movl	$91, %ecx
	btq	%rax, %rcx
	jb	LBB0_3

instead of:

_foo:                                   ## @foo
## BB#0:                                ## %entry
	cmpl	$2, %edi
	jb	LBB0_4
## BB#1:                                ## %switch.early.test
	cmpl	$6, %edi
	ja	LBB0_3
## BB#2:                                ## %switch.early.test
	movl	%edi, %eax
	movl	$88, %ecx
	btq	%rax, %rcx
	jb	LBB0_4

This catches a bunch of cases in GCC, which look like this:

 %804 = load i32* @which_alternative, align 4, !tbaa !0
 %805 = icmp ult i32 %804, 2
 %806 = icmp eq i32 %804, 3
 %or.cond121 = or i1 %805, %806
 %807 = icmp eq i32 %804, 4
 %or.cond124 = or i1 %or.cond121, %807
 br i1 %or.cond124, label %.thread, label %808

turning this into a range comparison.

llvm-svn: 122045
2010-12-17 06:20:15 +00:00
Dan Gohman 93dc2b808f Revert r64460. strtol and friends cannot be marked readonly, even with
a null endptr argument, because they may write to errno.

This fixes a selfhost miscompile observed on Linux targets when TBAA
was enabled.

llvm-svn: 122014
2010-12-17 01:09:43 +00:00
Frits van Bommel 9bbe849fc3 Fix a bug in the loop in JumpThreading::ProcessThreadableEdges() where it could falsely produce a MultipleDestSentinel value if the first predecessor ended with an 'indirectbr'. If that happened, it caused an unnecessary FindMostPopularDest() call.
This wasn't a correctness problem, but it broke the fast path for single-predecessor blocks.

llvm-svn: 121966
2010-12-16 12:16:00 +00:00
Duncan Sands 8d1ab6f6e1 Speculatively revert commit 121905 since it looks like it might have broken the
dragonegg self-host buildbot.  Original commit message:

Add an InstCombine transform to recognize instances of manual overflow-safe addition
(performing the addition in a wider type and explicitly checking for overflow), and
fold them down to intrinsics.  This currently only supports signed-addition, but could
be generalized if someone works out the magic constant formulas for other operations.

llvm-svn: 121965
2010-12-16 09:40:54 +00:00
Dan Gohman e1a17a3473 Make memcpyopt TBAA-aware.
llvm-svn: 121944
2010-12-16 02:51:19 +00:00
Dan Gohman 4467aa5294 Preserve TBAA tags when doing load PRE.
llvm-svn: 121921
2010-12-15 23:53:55 +00:00
Owen Anderson 1cf8881299 Add an InstCombine transform to recognize instances of manual overflow-safe addition
(performing the addition in a wider type and explicitly checking for overflow), and
fold them down to intrinsics.  This currently only supports signed-addition, but could
be generalized if someone works out the magic constant formulas for other operations.

Fixes <rdar://problem/8558713>.

llvm-svn: 121905
2010-12-15 22:32:38 +00:00
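One plausible C shape of the "wider type plus explicit check" pattern, modeled on the 8-bit example elsewhere in this log but scaled to 32-bit int (an illustration, not the exact testcase):

#include <stdlib.h>

int checked_add(int a, int b) {
  long long res = (long long)a + b;
  if ((unsigned long long)(res + 2147483648LL) > 4294967295ULL)
    abort();               /* res does not fit in 32 bits: signed overflow */
  return (int)res;
}
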
Dan Gohman a4fcd2418d Move Value::getUnderlyingObject to be a standalone
function so that it can live in Analysis instead of
VMCore.

llvm-svn: 121885
2010-12-15 20:02:24 +00:00
Duncan Sands 0a2c416894 Move Sub simplifications and additional Add simplifications out of
instcombine and into InstructionSimplify.

llvm-svn: 121861
2010-12-15 14:07:39 +00:00
Frits van Bommel 3d1803495e Teach jump threading to "look through" a select when the branch direction of a terminator depends on it.
When it sees a promising select it now tries to figure out whether the condition of the select is known in any of the predecessors and if so it maps the operands appropriately.

llvm-svn: 121859
2010-12-15 09:51:20 +00:00
Chris Lattner e893e2601e make qsort predicate more conformant by returning 0 for equal values.
llvm-svn: 121838
2010-12-15 04:52:41 +00:00
Owen Anderson 35609d97ae Fix PR8790, another instance where unreachable code can cause instruction simplification to fail,
this case involve a select that simplifies to itself.

llvm-svn: 121817
2010-12-15 00:55:35 +00:00
Owen Anderson 15c85c916f Cleanup trailing whitespace.
llvm-svn: 121816
2010-12-15 00:52:44 +00:00
Chris Lattner 7499b452c1 - Insert new instructions before DomBlock's terminator,
   which is simpler than finding a place to insert in BB.
 - Don't perform the 'if condition hoisting' xform on certain
   i1 PHIs, as it interferes with switch formation.

This re-fixes "example 7", without breaking the world hopefully.

llvm-svn: 121764
2010-12-14 08:46:09 +00:00
Chris Lattner 335f0e4ad4 fix two significant issues with FoldTwoEntryPHINode:
first, it can kick in on blocks whose conditions have been
folded to a constant, even though one of the edges will be
trivially folded.

second, it doesn't clean up the "if diamond" that it just 
eliminated away.  This is a problem because other simplifycfg
xforms kick in depending on the order of block visitation,
causing pointless work.

llvm-svn: 121762
2010-12-14 08:01:53 +00:00
Chris Lattner dc20a7d38c remove the instsimplify logic I added in r121754. It is apparently
breaking the selfhost builds, though I can't fathom how.

llvm-svn: 121761
2010-12-14 07:53:03 +00:00
Chris Lattner 9ac168d0ab clean up logic, convert std::set to SmallPtrSet, handle the case
when all 2-entry phis are simplified away.

llvm-svn: 121760
2010-12-14 07:41:39 +00:00
Chris Lattner 9fd838d31b tidy up a bit, move DEBUG down to when we commit to doing the transform so we
don't print it unless the xform happens.

llvm-svn: 121758
2010-12-14 07:23:10 +00:00
Chris Lattner b42d293faa use SimplifyInstruction instead of reimplementing part of it.
llvm-svn: 121757
2010-12-14 07:20:29 +00:00
Chris Lattner fb73de482c simplify GetIfCondition by using getSinglePredecessor.
llvm-svn: 121756
2010-12-14 07:15:21 +00:00
Chris Lattner 0f4d67bd88 use AddPredecessorToBlock in 3 places instead of a manual loop.
llvm-svn: 121755
2010-12-14 07:09:42 +00:00
Chris Lattner a07cc6f4fd make FoldTwoEntryPHINode use instsimplify a bit, make
GetIfCondition faster by avoiding pred_iterator.  No
really interesting change.

llvm-svn: 121754
2010-12-14 07:00:00 +00:00
Chris Lattner afd2a8cfbb remove the dead (and terrible) llvm::RemoveSuccessor function.
llvm-svn: 121753
2010-12-14 06:51:55 +00:00
Chris Lattner d7beca3782 improve DEBUG's a bit, switch to eraseFromParent() to simplify
code a bit, switch from constant folding to instsimplify.

llvm-svn: 121751
2010-12-14 06:17:25 +00:00
Chris Lattner 5a9d59d918 reapply my recent change that disables a piece of the switch formation
work, but fixes 400.perlbmk.

llvm-svn: 121749
2010-12-14 05:57:30 +00:00
Owen Anderson 3e5648896e Fix recent buildbot breakage by pulling SimplifyCFG back to its state as of r121694, the most recent state
where I'm confident there were no crashes or miscompilations.  XFAIL the test added since then for now.

llvm-svn: 121733
2010-12-13 23:49:28 +00:00
Chris Lattner a6e5d5694a temporarily disable part of my previous patch, which causes an iterator invalidation issue, causing a crash on some versions of perlbmk.
llvm-svn: 121728
2010-12-13 23:02:19 +00:00
Chris Lattner 2d434e594e add some DEBUG's.
llvm-svn: 121711
2010-12-13 19:55:30 +00:00
Benjamin Kramer 1e155ab7e1 Fix sort predicate. qsort(3)'s predicate semantics differ from std::sort's. Fixes PR 8780.
llvm-svn: 121705
2010-12-13 18:20:38 +00:00
Chris Lattner fb836f8c1a reinstate my patch: the miscompile was caused by an inverted branch in the
'and' case.

llvm-svn: 121695
2010-12-13 08:12:19 +00:00
Chris Lattner 79db357d80 Completely disable the optimization I added in r121680 until
I can track down a miscompile.  This should bring the buildbots
back to life

llvm-svn: 121693
2010-12-13 07:41:29 +00:00
Chris Lattner fbeb55844b Make simplifycfg reprocess newly formed "br (cond1 | cond2)" conditions
when simplifying, allowing them to be eagerly turned into switches.  This
is the last step required to get "Example 7" from this blog post:
http://blog.regehr.org/archives/320

On X86, we now generate this machine code, which (to my eye) seems better
than the ICC generated code:

_crud:                                  ## @crud
## BB#0:                                ## %entry
	cmpb	$33, %dil
	jb	LBB0_4
## BB#1:                                ## %switch.early.test
	addb	$-34, %dil
	cmpb	$58, %dil
	ja	LBB0_3
## BB#2:                                ## %switch.early.test
	movzbl	%dil, %eax
	movabsq	$288230376537592865, %rcx ## imm = 0x400000017001421
	btq	%rax, %rcx
	jb	LBB0_4
LBB0_3:                                 ## %lor.rhs
	xorl	%eax, %eax
	ret
LBB0_4:                                 ## %lor.end
	movl	$1, %eax
	ret

llvm-svn: 121690
2010-12-13 07:00:06 +00:00
Chris Lattner 1d05761df4 make this logic a bit simpler.
llvm-svn: 121689
2010-12-13 06:36:51 +00:00
Chris Lattner 25c3af35d8 split all the guts of SimplifyCFGOpt::run out into one function
per terminator kind.

llvm-svn: 121688
2010-12-13 06:25:44 +00:00
Chris Lattner cb570f87e5 fix a bug in r121680 that upset the various buildbots.
llvm-svn: 121687
2010-12-13 05:34:18 +00:00
Chris Lattner a6db741f3d refactor the speculative execution logic to be factored into the cond branch code instead of
doing a cfg search for every block simplified.

llvm-svn: 121686
2010-12-13 05:26:52 +00:00
Chris Lattner 466f54ffcf simplify a bunch of code.
llvm-svn: 121685
2010-12-13 05:20:28 +00:00
Chris Lattner 6df7bdd810 move HoistThenElseCodeToIf up to a more logical and efficient-to-handle place.
llvm-svn: 121684
2010-12-13 05:15:29 +00:00
Chris Lattner 2e3832d9a0 move 'MergeBlocksIntoPredecessor' call earlier. Use
getSinglePredecessor to simplify code.

llvm-svn: 121683
2010-12-13 05:10:48 +00:00
Chris Lattner a69c443459 factor new code out to a SimplifyBranchOnICmpChain helper function.
llvm-svn: 121681
2010-12-13 05:03:41 +00:00
Chris Lattner a442f24a36 enhance the "change or icmp's into switch" xform to handle one value in an
'or sequence' that it doesn't understand.  This allows us to optimize
something insane like this:

int crud (unsigned char c, unsigned x)
 {
   if(((((((((( (int) c <= 32 ||
                    (int) c == 46) || (int) c == 44)
                  || (int) c == 58) || (int) c == 59) || (int) c == 60)
               || (int) c == 62) || (int) c == 34) || (int) c == 92)
            || (int) c == 39) != 0)
     foo();
 }

into:

define i32 @crud(i8 zeroext %c, i32 %x) nounwind ssp noredzone {
entry:
  %cmp = icmp ult i8 %c, 33
  br i1 %cmp, label %if.then, label %switch.early.test

switch.early.test:                                ; preds = %entry
  switch i8 %c, label %if.end [
    i8 39, label %if.then
    i8 44, label %if.then
    i8 58, label %if.then
    i8 59, label %if.then
    i8 60, label %if.then
    i8 62, label %if.then
    i8 46, label %if.then
    i8 92, label %if.then
    i8 34, label %if.then
  ]

by pulling the < comparison out ahead of the newly formed switch.

llvm-svn: 121680
2010-12-13 04:50:38 +00:00
Chris Lattner 5a177e681e merge two very similar functions into one that has a bool argument.
llvm-svn: 121678
2010-12-13 04:26:26 +00:00
Chris Lattner 9b1af510cb don't bother handling non-canonical icmp's
llvm-svn: 121676
2010-12-13 04:18:32 +00:00
Chris Lattner 395252d93e inline a function, making the result much simpler.
llvm-svn: 121675
2010-12-13 04:15:19 +00:00
Chris Lattner 62cc76e9cc Fix my previous patch to handle a degenerate case that the llvm-gcc
bootstrap buildbot tripped over.

llvm-svn: 121674
2010-12-13 03:43:57 +00:00
Chris Lattner 11dafaa3ec convert some methods to be static functions
llvm-svn: 121673
2010-12-13 03:30:12 +00:00
Chris Lattner 4642d79fb0 zap two more std::sorts.
llvm-svn: 121672
2010-12-13 03:24:30 +00:00
Chris Lattner d9bacc088a fix a fairly serious oversight with switch formation from
or'd conditions.  Previously we'd compile something like this:

int crud (unsigned char c) {
   return c == 62 || c == 34 || c == 92;
}

into:

  switch i8 %c, label %lor.rhs [
    i8 62, label %lor.end
    i8 34, label %lor.end
  ]

lor.rhs:                                          ; preds = %entry
  %cmp8 = icmp eq i8 %c, 92
  br label %lor.end

lor.end:                                          ; preds = %entry, %entry, %lor.rhs
  %0 = phi i1 [ true, %entry ], [ %cmp8, %lor.rhs ], [ true, %entry ]
  %lor.ext = zext i1 %0 to i32
  ret i32 %lor.ext

which failed to merge the compare-with-92 into the switch.  With this patch
we simplify this all the way to:

  switch i8 %c, label %lor.rhs [
    i8 62, label %lor.end
    i8 34, label %lor.end
    i8 92, label %lor.end
  ]

lor.rhs:                                          ; preds = %entry
  br label %lor.end

lor.end:                                          ; preds = %entry, %entry, %entry, %lor.rhs
  %0 = phi i1 [ true, %entry ], [ false, %lor.rhs ], [ true, %entry ], [ true, %entry ]
  %lor.ext = zext i1 %0 to i32
  ret i32 %lor.ext

which is much better for codegen's switch lowering stuff.  This kicks in 33 times
on 176.gcc (for example) cutting 103 instructions off the generated code.

llvm-svn: 121671
2010-12-13 03:18:54 +00:00
Chris Lattner 73a58627c3 simplify code and reduce indentation
llvm-svn: 121670
2010-12-13 02:38:13 +00:00
Chris Lattner 7c8e6047d6 convert an std::sort to array_pod_sort.
llvm-svn: 121669
2010-12-13 02:00:58 +00:00
Chris Lattner 1475987634 move the "br (X == 0 | X == 1), T, F" -> switch optimization to a new
location in simplifycfg.  In the old days, SimplifyCFG was never run on
the entry block, so we had to scan over all preds of the BB passed into
simplifycfg to do this xform, now we can just check blocks ending with
a condbranch.  This avoids a scan over all preds of every simplified 
block, which should be a significant compile-time perf win on functions
with lots of edges.  No functionality change.

llvm-svn: 121668
2010-12-13 01:57:34 +00:00
Chris Lattner 4088e2b8e4 reduce indentation and generally simplify code, no functionality change.
llvm-svn: 121667
2010-12-13 01:47:07 +00:00
Chris Lattner 7cb7867d7a use getFirstNonPHIOrDbg to simplify this code.
llvm-svn: 121664
2010-12-13 01:28:06 +00:00
Benjamin Kramer c4169cebe3 Generalize the and-icmp-select instcombine further by allowing selects of the form
(x & 2^n) ? 2^m+C : C

we can offset both arms by C to get the "(x & 2^n) ? 2^m : 0" form, optimize the
select to a shift and apply the offset afterwards.

llvm-svn: 121609
2010-12-11 10:49:22 +00:00
Benjamin Kramer c8b035d006 Factor the (x & 2^n) ? 2^m : 0 instcombine into its own method and generalize it
to catch cases where n != m with a shift.

llvm-svn: 121608
2010-12-11 09:42:59 +00:00
Chris Lattner bc4457e317 enhance memcpyopt to zap memcpy's that have the same src/dst.
llvm-svn: 121362
2010-12-09 07:45:45 +00:00
Chris Lattner fd51c52ef6 fix PR8753, eliminating a case where we'd infinitely make a
substitution because it doesn't actually change the IR.  Patch by
Jakub Staszak!

llvm-svn: 121361
2010-12-09 07:39:50 +00:00
Dan Gohman a32986e899 Really check that the bits that will become zero are actually already zero
before eliminating the operation that zeros them. This fixes rdar://8739316.

llvm-svn: 121353
2010-12-09 02:52:17 +00:00
Frits van Bommel d2f4b09e10 Remove some dead code from the jump threading pass.
The last uses of these functions were removed in r113852 when LazyValueInfo was permanently enabled and removed the need for them.

llvm-svn: 121133
2010-12-07 13:08:07 +00:00
Jay Foad 583abbc4df PR5207: Change APInt methods trunc(), sext(), zext(), sextOrTrunc() and
zextOrTrunc(), and APSInt methods extend(), extOrTrunc() and new method
trunc(), to be const and to return a new value instead of modifying the
object in place.

llvm-svn: 121120
2010-12-07 08:25:19 +00:00
Chris Lattner 0d71c4f564 reapply r121100 with a tweak to constant fold ConstExprs with TargetData
(if available) as we go so that we get simple constantexprs not insane ones.
This fixes the failure of clang/test/CodeGenCXX/virtual-base-ctor.cpp
that the previous iteration of this patch had.

llvm-svn: 121111
2010-12-07 04:33:29 +00:00
Eric Christopher f10dcfb9fb Temporarily revert r121100 as it's causing clang to fail
CodeGenCXX/virtual-base-ctor.cpp.

llvm-svn: 121102
2010-12-07 02:41:11 +00:00
Chris Lattner 287f4366c1 fix PR8710 - teach global opt that some constantexprs are too complex to
put in a global variable's initializer.

llvm-svn: 121100
2010-12-07 01:59:32 +00:00
Frits van Bommel d9df6eaa9c Implement jump threading of 'indirectbr' by keeping track of whether we're looking for ConstantInt*s or BlockAddress*s.
llvm-svn: 121066
2010-12-06 23:36:56 +00:00
Chris Lattner 7ff0ba41bd replace a linear scan with a symtab lookup, reduce indentation.
No functionality change.

llvm-svn: 121042
2010-12-06 21:53:07 +00:00
Chris Lattner 4dc53e37d9 Use a stronger predicate here, pointed out by Duncan
llvm-svn: 121040
2010-12-06 21:48:10 +00:00
Chris Lattner ca335e38cf add some DEBUG statements.
llvm-svn: 121038
2010-12-06 21:13:51 +00:00
Chris Lattner fb212de06d Fix PR8735, a really terrible problem in the inliner's "alloca merging"
optimization.

Consider:
static void foo() {
  A = alloca
  ...
}

static void bar() {
  B = alloca
  ...
  call foo();
}

void main() {
  bar()
}

The inliner proceeds bottom up, but let's pretend it decides not to inline foo
into bar.  When it gets to main, it inlines bar into main(), and says "hey, I
just inlined an alloca "B" into main, let's remember that".  Then it keeps going
and finds that it now contains a call to foo.  It decides to inline foo into
main, and says "hey, foo has an alloca A, and I have an alloca B from another
inlined call site, let's reuse it".  The problem with this, of course, is that
the lifetimes of A and B are nested, not disjoint.

Unfortunately I can't create a reasonable testcase for this: the one in the
PR is both huge and extremely sensitive, because minor tweaks end up
causing foo to get inlined into bar too early.  We already have tests for the
basic alloca merging optimization and this does not break them.

llvm-svn: 120995
2010-12-06 07:52:42 +00:00
Chris Lattner cd3af96a8f improve comment
llvm-svn: 120994
2010-12-06 07:43:04 +00:00
Chris Lattner 5b6a865f2e improve -debug output and comments a little.
llvm-svn: 120993
2010-12-06 07:38:40 +00:00
Chris Lattner 94fbdf3814 Fix PR8728, a miscompilation I recently introduced. When optimizing
memcpy's like:
  memcpy(A, B)
  memcpy(A, C)

we cannot delete the first memcpy as dead if A and C might be aliases.
If so, we actually get:

  memcpy(A, B)
  memcpy(A, A)

which is not correct to transform into:

  memcpy(A, A)

This patch was heavily influenced by Jakub Staszak's patch in PR8728, thanks
Jakub!

llvm-svn: 120974
2010-12-06 01:48:06 +00:00
Frits van Bommel 76244867cf Refactor jump threading.
Should have no functional change other than the order of two transformations that are mutually-exclusive and the exact formatting of debug output.
Internally, it now stores the ConstantInt*s as Constant*s, and actual undef values instead of nulls.

llvm-svn: 120946
2010-12-05 19:06:41 +00:00
Frits van Bommel 5e75ef4a8e Remove trailing whitespace.
llvm-svn: 120945
2010-12-05 19:02:47 +00:00
Frits van Bommel 8fb69ee805 Teach SimplifyCFG to turn
(indirectbr (select cond, blockaddress(@fn, BlockA),
                            blockaddress(@fn, BlockB)))
into
  (br cond, BlockA, BlockB).

llvm-svn: 120943
2010-12-05 18:29:03 +00:00
Jay Foad 25a5e4ca1f PR5207: Rename overloaded APInt methods set(), clear(), flip() to
setAllBits(), setBit(unsigned), etc.

llvm-svn: 120564
2010-12-01 08:53:58 +00:00
Chris Lattner 1c577b54b0 fix a bozo bug I introduced in r119930, causing a miscompile of
20040709-1.c from the gcc testsuite.  I was using the size of a
pointer instead of the pointee.  This fixes rdar://8713376

llvm-svn: 120519
2010-12-01 01:24:55 +00:00
Chris Lattner 903add84d9 Enhance DSE to handle the variable index case in PR8657.
llvm-svn: 120498
2010-11-30 23:43:23 +00:00
Chris Lattner c0f3379ae0 teach DSE to use GetPointerBaseWithConstantOffset to analyze
may-aliasing stores that partially overlap with different base
pointers.  This implements PR6043 and the non-variable part of
PR8657

llvm-svn: 120485
2010-11-30 23:05:20 +00:00
Chris Lattner e28618de59 move GetPointerBaseWithConstantOffset out of GVN into ValueTracking.h
llvm-svn: 120476
2010-11-30 22:25:26 +00:00
Chris Lattner 50162e3c2a remove a fixed fixme
llvm-svn: 120474
2010-11-30 22:18:11 +00:00
Chris Lattner 6712251f41 Make DeleteDeadInstruction be a static function, move some code around.
llvm-svn: 120471
2010-11-30 21:58:14 +00:00
Chris Lattner 51d67ce2ff switch RemoveAccessedObjects to use AliasAnalysis::Location to simplify
the code.  We now get accurate sizes on Loads, though it surely doesn't
matter in practice.

llvm-svn: 120469
2010-11-30 21:47:58 +00:00
Chris Lattner f80b39986f two improvements to RemoveAccessedObjects:
1. if the underlying pointer passed in can be resolved
   to any argument or alloca, then we don't need to scan.
   Previously we would only avoid the scan if the alloca
   or byval was actually considered dead.
2. The dead store processing code is itself completely
   dead and didn't handle volatile stores right anyway,
   so delete it.  This allows simplifying the interface
   to RemoveAccessedObjects.

llvm-svn: 120467
2010-11-30 21:38:30 +00:00
Chris Lattner 7fe08b67fa remove the "undead" terminology, which is nonstandard and never
made sense to me.  We now have a set of dead stack objects, and
they become live when loaded.  Fix a theoretical problem where
we'd pass in the wrong pointer to the alias query.

llvm-svn: 120465
2010-11-30 21:32:12 +00:00
Chris Lattner 127818d746 move call handling in handleEndBlock up a bit, and simplify it.
If the call might read all the allocas, stop scanning early.
Convert a vector to smallvector, shrink SmallPtrSet to 16 instead
of 64 to avoid crazy linear scans.

llvm-svn: 120463
2010-11-30 21:18:46 +00:00
Dale Johannesen d3a58c8fa1 Avoid exponential growth of a table. It feels like
there should be a better way to do this.  PR 8679.

llvm-svn: 120457
2010-11-30 20:23:21 +00:00
Chris Lattner 60a8b3dab8 various cleanups and code simplification
llvm-svn: 120454
2010-11-30 19:48:15 +00:00
Chris Lattner 51c28a93cc make getPointerSize a static function. Add ivars to DSE for
AA and MD pass info instead of using getAnalysis<> all over.

llvm-svn: 120453
2010-11-30 19:34:42 +00:00
Chris Lattner 77d79fa25f reduce indentation, clean up TD use a bit.
llvm-svn: 120452
2010-11-30 19:28:23 +00:00
Chris Lattner b63ba73b1b enhance isRemovable to refuse to delete volatile mem transfers
now that DSE hacks on them.  This fixes a regression I introduced
by generalizing DSE to hack on transfers.

llvm-svn: 120445
2010-11-30 19:12:10 +00:00
Chris Lattner 58b779e9c2 Rewrite the main DSE loop in terms of reasoning
about pairs of AA::Location's instead of looking for MemDep's
"Def" predicate.  This is more powerful and general, handling
memset/memcpy/store all uniformly, and implementing PR8701 and
probably obsoleting parts of memcpyoptimizer.

This also fixes an obscure bug with init.trampoline and i8
stores, but I'm not surprised it hasn't been hit yet.  Enhancing
init.trampoline to carry the size that it stores would allow
DSE to be much more aggressive about optimizing them.

llvm-svn: 120406
2010-11-30 07:23:21 +00:00
Anders Carlsson e3ea1cba79 Add a puts optimization that converts puts() to putchar('\n').
llvm-svn: 120398
2010-11-30 06:19:18 +00:00
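Presumably this targets the empty-string case, where puts only emits the trailing newline; a hedged sketch of the before/after shape (function names are illustrative):

  #include <stdio.h>

  void before(void) {
    puts("");        /* writes just a newline */
  }

  void after(void) {
    putchar('\n');   /* same output, cheaper libcall */
  }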
Chris Lattner 3590ef817c rename a function and reduce some indentation, no functionality change.
llvm-svn: 120391
2010-11-30 05:30:45 +00:00
Chris Lattner b438ef236c remove the pointless check for MemoryUseIntrinsic in the
'is trivially dead' logic, since these have side effects.  This makes the
(misnamed) MemoryUseIntrinsic class dead, so remove it.

llvm-svn: 120382
2010-11-30 02:03:47 +00:00
Chris Lattner 2227a8a192 rename doesClobberMemory -> hasMemoryWrite to be more specific, and
remove an actively-wrong comment.

llvm-svn: 120378
2010-11-30 01:37:52 +00:00
Chris Lattner 9d179d911d clean up handling of 'free', detangling it from everything else.
It can be seriously improved, but at least now it isn't intertwined
with the other logic.

llvm-svn: 120377
2010-11-30 01:28:33 +00:00
Chris Lattner 9a146372b5 Teach basicaa that memset's modref set is at worst "mod" and never
contains "ref".

Enhance DSE to use a modref query instead of a store-specific hack
to generalize the "ignore may-alias stores" optimization to handle
memset and memcpy.

llvm-svn: 120368
2010-11-30 00:28:45 +00:00
Chris Lattner c3c754f750 my previous patch would cause us to start deleting some volatile
stores; fix that and add a testcase.

llvm-svn: 120363
2010-11-30 00:12:39 +00:00
Chris Lattner d4f1090948 two changes to DSE that shouldn't affect anything:
1. Don't bother trying to optimize:

lifetime.end(ptr)
store(ptr)

as it is undefined, and therefore shouldn't exist.

2. Move the 'storing a loaded pointer' xform up, simplifying
  the may-aliased store code.

llvm-svn: 120359
2010-11-30 00:01:19 +00:00
Chris Lattner b4df1d5a3e prune an llvmcontext include and simplify some code.
llvm-svn: 120347
2010-11-29 23:35:33 +00:00
Chris Lattner 2e8793482c fix PR8677, patch by Jakub Staszak!
llvm-svn: 120325
2010-11-29 21:59:31 +00:00
Frits van Bommel 28218aa8f1 Transform (extractvalue (load P), ...) to (load (gep P, 0, ...)) if the load has no other uses, shrinking the load.
llvm-svn: 120323
2010-11-29 21:56:20 +00:00
Owen Anderson 8ba5f39f70 Second attempt at fixing the performance regressions introduced
by my recent GVN improvement.  Instead of looking through a single layer
of PHI nodes when attempting to sink GEPs, we need to iteratively
look through arbitrary PHI nests.

llvm-svn: 120202
2010-11-27 08:15:55 +00:00
Nick Lewycky b8de00ee07 Treat a call through a function pointer like a load of the pointer when considering
whether the pointer can be replaced with the global variable it is a copy of.
Fixes PR8680.

llvm-svn: 120126
2010-11-24 22:04:20 +00:00
Duncan Sands 0488d564e1 Rename SimplifyDistributed to the more meaningful name SimplifyByFactorizing.
llvm-svn: 120051
2010-11-23 20:42:39 +00:00
Benjamin Kramer 94a622af4c The srem -> urem transform is not safe for any divisor that's not a power of two.
E.g. -5 % 5 is 0 with srem and 1 with urem.

Also addresses Frits van Bommel's comments.

llvm-svn: 120049
2010-11-23 20:33:57 +00:00
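The counterexample from the message, spelled out in C (signed % has srem semantics, unsigned % has urem semantics); a small self-check with illustrative names, assuming 32-bit unsigned:

  #include <assert.h>

  int main(void) {
    int s = -5 % 5;                  /* srem: remainder is 0 */
    unsigned u = (unsigned)-5 % 5u;  /* urem: 4294967291 % 5 == 1 */
    assert(s == 0 && u == 1);
    return 0;
  }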
Duncan Sands 433c1679cf Replace calls to ConstantFoldInstruction with calls to SimplifyInstruction
in two places that are really interested in simplified instructions, not
constants.

llvm-svn: 120044
2010-11-23 20:26:33 +00:00
Duncan Sands bb2cd025a9 Constant folding here is pointless, because InstructionSimplify
(which does constant folding and more) is called a few lines
later.

llvm-svn: 120042
2010-11-23 20:24:21 +00:00
Benjamin Kramer b5afa65b0a InstCombine: Reduce "X shift (A srem B)" to "X shift (A urem B)" iff B is positive.
This allows to transform the rem in "1 << ((int)x % 8);" to an and.

llvm-svn: 120028
2010-11-23 18:52:42 +00:00
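A sketch of the source pattern and the lowered form (illustrative names): a negative shift amount would be undefined, so the remainder can be treated as unsigned, and an unsigned remainder by a power of two is just a mask.

  int before(int x) {
    /* If x % 8 were negative, the shift would be undefined behavior,
       so the optimizer may assume the remainder is nonnegative. */
    return 1 << (x % 8);
  }

  int after(int x) {
    return 1 << (x & 7);   /* unsigned remainder by 8 is a mask with 7 */
  }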
Duncan Sands 60813f96e0 Propagate LeftDistributes and RightDistributes into their only uses.
Stylistic improvement suggested by Frits van Bommel.

llvm-svn: 120026
2010-11-23 15:28:14 +00:00
Duncan Sands 22df741687 Fix typo pointed out by Frits van Bommel and Marius Wachtler.
llvm-svn: 120025
2010-11-23 15:25:34 +00:00
Duncan Sands adc7771f18 Exploit distributive laws (eg: And distributes over Or, Mul over Add, etc) in a
fairly systematic way in instcombine.  Some of these cases were already dealt
with, in which case I removed the existing code.  The case of Add has a bunch of
funky logic which covers some of this plus a few variants (considers shifts to be
a form of multiplication), which I didn't touch.  The simplification performed is:
A*B+A*C -> A*(B+C).  The improvement is to do this in cases that were not already
handled [such as A*B-A*C -> A*(B-C), which was reported on the mailing list], and
also to do it more often by not checking for "only one use" if "B+C" simplifies.

llvm-svn: 120024
2010-11-23 14:23:47 +00:00
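A tiny C illustration of the subtraction variant mentioned above (illustrative names; signed-overflow corner cases aside):

  int before(int a, int b, int c) {
    return a * b - a * c;    /* two multiplies */
  }

  int after(int a, int b, int c) {
    return a * (b - c);      /* factorized: one multiply */
  }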
Chris Lattner e5afa15b77 Duncan's spider sense was right; I completely reversed the condition
on this instcombine xform.  This fixes a miscompilation of 403.gcc.

llvm-svn: 119988
2010-11-23 02:42:04 +00:00
Benjamin Kramer f1ebb63161 InstCombine: Implement X - A*-B -> X + A*B.
llvm-svn: 119984
2010-11-22 20:31:27 +00:00
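Spelled out in C, a minimal before/after sketch of the identity (illustrative names; signed-overflow corner cases aside):

  int before(int x, int a, int b) {
    return x - a * -b;
  }

  int after(int x, int a, int b) {
    return x + a * b;
  }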
Duncan Sands c133c54426 If a GEP index simply advances by multiples of a type of zero size,
then replace the index with zero.

llvm-svn: 119974
2010-11-22 16:32:50 +00:00
Duncan Sands 8a0f486e36 Move the "gep undef" -> "undef" transform from instcombine to
InstructionSimplify.

llvm-svn: 119970
2010-11-22 13:42:49 +00:00
Duncan Sands c6648eb4c3 Don't keep track of inserted phis in PromoteMemoryToRegister: the information
is never used.  Patch by Cameron Zwarich.

llvm-svn: 119963
2010-11-22 09:41:24 +00:00
Chris Lattner fc9aead6fd fix comment
llvm-svn: 119948
2010-11-21 19:05:34 +00:00
Chris Lattner 5957229659 rework some DSE paths to use the newly-public "getPointerDependencyFrom"
method in MemDep instead of inserting an instruction, doing a query,
then removing it.  Neither operation is effectively cached.

llvm-svn: 119930
2010-11-21 08:06:10 +00:00
Chris Lattner e48c31ce33 implement PR8576, deleting dead stores with intervening may-alias stores.
llvm-svn: 119927
2010-11-21 07:34:32 +00:00
Chris Lattner f7e896138e optimize:
void a(int x) { if (((1<<x)&8)==0) b(); }

into "x != 3", which occurs over 100 times in 403.gcc but in no
other program in llvm-test.

llvm-svn: 119922
2010-11-21 06:44:42 +00:00
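The equivalence holds because bit 3 of (1 << x) is set only when x == 3 (out-of-range shift amounts are undefined anyway). A small self-check over the in-range cases, with illustrative names:

  #include <assert.h>

  int main(void) {
    for (int x = 0; x < 32; ++x) {
      int slow = ((1 << x) & 8) == 0;
      int fast = (x != 3);
      assert(slow == fast);
    }
    return 0;
  }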
Chris Lattner 58f9f58716 Implement PR8644: forwarding a memcpy value to a byval,
allowing the memcpy to be eliminated.

Unfortunately, the requirements on byval's without explicit 
alignment are really weak and impossible to predict in the 
mid-level optimizer, so this doesn't kick in much with current
frontends.  The fix is to change clang to set alignment on all
byval arguments.

llvm-svn: 119916
2010-11-21 00:28:59 +00:00
Benjamin Kramer ddd1b7b801 Simplify code. No change in functionality.
llvm-svn: 119908
2010-11-20 18:43:35 +00:00
Owen Anderson ea326db47b Document the new GVN number table structure.
llvm-svn: 119865
2010-11-19 22:48:40 +00:00
Owen Anderson dfb8c3bbfc When folding addressing modes in CodeGenPrepare, attempt to look through PHI nodes
if all the operands of the PHI are equivalent.  This allows CodeGenPrepare to undo
unprofitable PRE transforms.

llvm-svn: 119853
2010-11-19 22:15:03 +00:00
Duncan Sands aef146b890 Factor code for testing whether replacing one value with another
preserves LCSSA form out of ScalarEvolution and into the LoopInfo
class.  Use it to check that SimplifyInstruction simplifications
are not breaking LCSSA form.  Fixes PR8622.

llvm-svn: 119727
2010-11-18 19:59:41 +00:00
Owen Anderson c21c100f3d Completely rework the datastructure GVN uses to represent the value number to leader mapping. Previously,
this was a tree of hashtables, and a query recursed into the table for the immediate dominator ad infinitum
if the initial lookup failed.  This led to really bad performance on tall, narrow CFGs.

We can instead replace it with what is conceptually a multimap of value numbers to leaders (actually
represented by a hashtable with a list of Value*'s as the value type), and then
determine which leader from that set to use very cheaply thanks to the DFS numberings maintained by
DominatorTree.  Because there are typically few duplicates of a given value, this scan tends to be
quite fast.  Additionally, we use a custom linked list and BumpPtr allocation to avoid any unnecessary
allocation in representing the value-side of the multimap.

This change brings with it a 15% (!) improvement in the total running time of GVN on 403.gcc, which I
think is pretty good considering that includes all the "real work" being done by MemDep as well.

The one downside to this approach is that we can no longer use GVN to perform simple conditional propagation,
but that seems like an acceptable loss since we now have LVI and CorrelatedValuePropagation to pick up
the slack.  If you see conditional propagation that's not happening, please file bugs against LVI or CVP.

llvm-svn: 119714
2010-11-18 18:32:40 +00:00
Chris Lattner 1385dff8c0 slightly simplify code and substantially improve comment. Instead of
saying "it would be bad", give an example of what is going on.

llvm-svn: 119695
2010-11-18 08:07:09 +00:00
Chris Lattner 731caac7c6 remove a pointless restriction from memcpyopt. It was
refusing to optimize two memcpy's like this:

copy A <- B
copy C <- A

if it couldn't prove that noalias(B,C).  We can eliminate
the copy by producing a memmove instead of memcpy.

llvm-svn: 119694
2010-11-18 08:00:57 +00:00
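A C sketch of the shape (illustrative names): after the first copy, A holds the bytes of B, so the second copy can read from B directly; because B and C are not known to be disjoint, the forwarded copy has to be a memmove rather than a memcpy.

  #include <string.h>

  void before(void *a, void *b, void *c, size_t n) {
    memcpy(a, b, n);    /* copy A <- B */
    memcpy(c, a, n);    /* copy C <- A */
  }

  void after(void *a, void *b, void *c, size_t n) {
    memcpy(a, b, n);    /* copy A <- B (still needed if A is used later) */
    memmove(c, b, n);   /* forward the source; memmove tolerates B/C overlap */
  }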
Chris Lattner c274a83442 remove another pointless noalias check: M is a memcpy, so the
source and dest are known to not overlap.

llvm-svn: 119692
2010-11-18 07:39:57 +00:00
Chris Lattner 75cfe98534 use AA::isNoAlias instead of open coding it. Remove an extraneous noalias check:
there is no need to check to see if the source and dest of a memcpy are noalias,
behavior is undefined if not.

llvm-svn: 119691
2010-11-18 07:38:43 +00:00
Chris Lattner 1e37bbafbb finish a thought.
llvm-svn: 119690
2010-11-18 07:32:33 +00:00
Chris Lattner 7e9b2ea3bf rearrange some code, splitting memcpy/memcpy optimization
out of processMemCpy into its own function.

llvm-svn: 119687
2010-11-18 07:02:37 +00:00
Chris Lattner ac5701319b allow eliminating an alloca that is just copied from a constant global
if it is passed as a byval argument.  The byval argument will just be a
read, so it is safe to read from the original global instead.  This allows
us to promote away the %agg.tmp alloca in PR8582

llvm-svn: 119686
2010-11-18 06:41:51 +00:00
Chris Lattner f183d5c4be enhance the "alloca is just a memcpy from constant global" optimization
to ignore calls that obviously can't modify the alloca
because they are readonly/readnone.

llvm-svn: 119683
2010-11-18 06:26:49 +00:00
Chris Lattner 7aeae25c78 fix a small oversight in the "eliminate memcpy from constant global"
optimization.  If the alloca that is "memcpy'd from constant" also has
a memcpy from *it*, ignore it: it is a load.  We now optimize the testcase to:

define void @test2() {
  %B = alloca %T
  %a = bitcast %T* @G to i8*
  %b = bitcast %T* %B to i8*
  call void @llvm.memcpy.p0i8.p0i8.i64(i8* %b, i8* %a, i64 124, i32 4, i1 false)
  call void @bar(i8* %b)
  ret void
}

previously we would generate:

define void @test() {
  %B = alloca %T
  %b = bitcast %T* %B to i8*
  %G.0 = getelementptr inbounds %T* @G, i32 0, i32 0
  %tmp3 = load i8* %G.0, align 4
  %G.1 = getelementptr inbounds %T* @G, i32 0, i32 1
  %G.15 = bitcast [123 x i8]* %G.1 to i8*
  %1 = bitcast [123 x i8]* %G.1 to i984*
  %srcval = load i984* %1, align 1
  %B.0 = getelementptr inbounds %T* %B, i32 0, i32 0
  store i8 %tmp3, i8* %B.0, align 4
  %B.1 = getelementptr inbounds %T* %B, i32 0, i32 1
  %B.12 = bitcast [123 x i8]* %B.1 to i8*
  %2 = bitcast [123 x i8]* %B.1 to i984*
  store i984 %srcval, i984* %2, align 1
  call void @bar(i8* %b)
  ret void
}

llvm-svn: 119682
2010-11-18 06:20:47 +00:00
Dan Gohman 20d9ce21ef Move SCEV::dominates and properlyDominates to ScalarEvolution.
llvm-svn: 119570
2010-11-17 21:41:58 +00:00
Dan Gohman afd6db9932 Move SCEV::isLoopInvariant and hasComputableLoopEvolution to be member
functions of ScalarEvolution, in preparation for memoization and
other optimizations.

llvm-svn: 119562
2010-11-17 21:23:15 +00:00
Dan Gohman 1ee6d24072 Reference ScalarEvolution by name rather than directly in LICM,
to avoid an unneeded dependence.

llvm-svn: 119557
2010-11-17 20:50:07 +00:00
Benjamin Kramer 07726c7d52 InstCombine: Add a missing irem identity (X % X -> 0).
llvm-svn: 119538
2010-11-17 19:11:46 +00:00
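In C terms: x % x is 0 whenever it is defined at all (x == 0 would be a division by zero, which is undefined, so that case may be ignored). A minimal sketch with illustrative names:

  int before(int x) {
    return x % x;   /* undefined for x == 0; otherwise always 0 */
  }

  int after(int x) {
    (void)x;
    return 0;
  }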
Duncan Sands c89ac07e7a Move some of those Xor simplifications which don't require creating new
instructions out of InstCombine and into InstructionSimplify.  While
there, introduce an m_AllOnes pattern to simplify matching with integers
and vectors with all bits equal to one.

llvm-svn: 119536
2010-11-17 18:52:15 +00:00
Duncan Sands 9d9a4e2ca2 Have InlineFunction use SimplifyInstruction rather than
hasConstantValue.  I was leery of using SimplifyInstruction
while the IR was still in a half-baked state, which is the
reason for delaying the simplification until the IR is fully
cooked.

llvm-svn: 119494
2010-11-17 11:16:23 +00:00
Duncan Sands ba0b22c785 Have RemovePredecessorAndSimplify use SimplifyInstruction
rather than hasConstantValue.

llvm-svn: 119457
2010-11-17 04:12:05 +00:00
Duncan Sands 72313843d5 Remove dead code in GVN: now that SimplifyInstruction is called
systematically, CollapsePhi will always return null here.  Note
that CollapsePhi did an extra check, isSafeReplacement, which
the SimplifyInstruction logic does not do.  I think that check
was bogus - I guess we will soon find out!  (It was originally
added in commit 41998 without a testcase).

llvm-svn: 119456
2010-11-17 04:05:21 +00:00
Duncan Sands 637049515f Have a few places that want to simplify phi nodes use SimplifyInstruction
rather than calling hasConstantValue.  No intended functionality change.

llvm-svn: 119352
2010-11-16 17:41:24 +00:00
Duncan Sands b99f39b9f6 If dom tree information is available, make it possible to pass
it to get better phi node simplification.

llvm-svn: 119055
2010-11-14 18:36:10 +00:00
Duncan Sands 4581ddc123 Teach InstructionSimplify about phi nodes. I chose to have it simply
offload the work to hasConstantValue rather than do something more
complicated (such as handling mutually recursive phis) because (1) it is
not clear it is worth it; and (2) if it is worth it, maybe such logic
would be better placed in hasConstantValue.  Adjust some GVN tests
which are now cleaned up much further (eg: all phi nodes are removed).

llvm-svn: 119043
2010-11-14 13:30:18 +00:00
Duncan Sands 641baf1646 Generalize the reassociation transform in SimplifyCommutative (now renamed to
SimplifyAssociativeOrCommutative) "(A op C1) op C2" -> "A op (C1 op C2)",
which previously was only done if C1 and C2 were constants, to occur whenever
"C1 op C2" simplifies (a la InstructionSimplify).  Since the simplifying operand
combination can no longer be assumed to be the right-hand terms, consider all of
the possible permutations.  When compiling "gcc as one big file", transform 2
(i.e. using right-hand operands) fires about 4000 times but it has to be said
that most of the time the simplifying operands are both constants.  Transforms
3, 4 and 5 each fired once.  Transform 6, which is an existing transform that
I didn't change, never fired.  With this change, the testcase is now optimized
perfectly with one run of instcombine (previously it required instcombine +
reassociate + instcombine, and it may just have been luck that this worked).

llvm-svn: 119002
2010-11-13 15:10:37 +00:00
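A concrete C instance of the non-constant case this now catches (illustrative names, add form of the transform, INT_MIN corner case for -y aside): the right-hand terms are not constants, but "y + (-y)" simplifies to 0, so the whole expression folds.

  int before(int x, int y) {
    return (x + y) + -y;   /* (A op C1) op C2 with A = x, C1 = y, C2 = -y */
  }

  int after(int x, int y) {
    (void)y;
    return x;              /* C1 op C2 simplifies to 0, and A op 0 == A */
  }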
Duncan Sands 246b71c596 Have GVN simplify instructions as it goes. For example, consider
"%z = %x and %y".  If GVN can prove that %y equals %x, then it turns
this into "%z = %x and %x".  With the new code, %z will be replaced
with %x everywhere (and then deleted).  Previously %z would be value
numbered too, which is a waste of time.  Also, while a clever value
numbering algorithm would give %z the same value number as %x, our
current one doesn't do so (at least I don't think it does).  The new
logic has an essentially equivalent effect to what you would get if
%z was given the same value number as %x, i.e. it should make value
numbering smarter.  While there, get hold of target data once at the
start rather than a gazillion times all over the place.

llvm-svn: 118923
2010-11-12 21:10:24 +00:00
Dan Gohman d4b7fff2e8 Enhance DSE to handle the case where a free call makes more than
one store dead. This is especially noticeable in
SingleSource/Benchmarks/Shootout/objinst.

llvm-svn: 118875
2010-11-12 02:19:17 +00:00
Dan Gohman 65316d6749 Add helper functions for computing the Location of load, store,
and vaarg instructions.

llvm-svn: 118845
2010-11-11 21:50:19 +00:00
Dan Gohman a826a88755 Factor out Instruction::isSafeToSpeculativelyExecute's code for
testing for dereferenceable pointers into a helper function,
isDereferenceablePointer.  Teach it how to reason about GEPs
with simple non-zero indices.

Also eliminate ArgumentPromotion's IsAlwaysValidPointer,
which didn't check for weak externals or out of range gep
indices.

llvm-svn: 118840
2010-11-11 21:23:25 +00:00
Dan Gohman dcdfd8dd24 TBAA-enable ArgumentPromotion.
llvm-svn: 118804
2010-11-11 18:09:32 +00:00
Dan Gohman 0cc4c7516e Make Sink tbaa-aware.
llvm-svn: 118788
2010-11-11 16:21:47 +00:00
Dan Gohman c3b4ea7b7d It's safe to sink some instructions which are not safe to speculatively
execute. Make Sink's predicate more precise.

llvm-svn: 118787
2010-11-11 16:20:28 +00:00
Dan Gohman 0a6021a54d Enhance GVN to do more precise alias queries for non-local memory
references. For example, this allows gvn to eliminate the load in
this example:

  void foo(int n, int* p, int *q) {
    p[0] = 0;
    p[1] = 1;
    if (n) {
      *q = p[0];
    }
  }

llvm-svn: 118714
2010-11-10 20:37:15 +00:00
Dan Gohman d209911642 Use getValueOperand() and getPointerOperand() on load and store
instructions instead of hard-coding operand numbers.

llvm-svn: 118698
2010-11-10 19:03:33 +00:00
Dan Gohman 066c1bb1e9 Add a doesAccessArgPointees helper function, and update code to use
it, and to be consistent.

llvm-svn: 118692
2010-11-10 18:17:28 +00:00
Dan Gohman 2577580967 Factor out the code for testing whether a function accesses
arbitrary memory into a helper function, and adjust some comments.

llvm-svn: 118687
2010-11-10 17:34:04 +00:00
Dale Johannesen 0171dc30ff When checking that the necessary bits are zero in
order to reduce ((x<<30)>>24) to x<<6, check the
correct bits.  PR 8547.

llvm-svn: 118665
2010-11-10 01:30:56 +00:00
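A concrete value showing why the choice of bits matters, assuming logical shifts on a 32-bit unsigned type (illustrative sketch): (x << 30) >> 24 keeps only the low two bits of x, shifted up to bits 6-7, while x << 6 keeps the low 26 bits, so the fold is only valid when the extra bits are known to be zero.

  #include <assert.h>
  #include <stdint.h>

  int main(void) {
    uint32_t x = 4;                      /* bit 2 set: not covered by the fold */
    uint32_t folded   = x << 6;          /* 256 */
    uint32_t original = (x << 30) >> 24; /* bit 2 is shifted out: result is 0 */
    assert(folded != original);          /* so bit 2 must be known zero */
    return 0;
  }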
Dan Gohman 2694e14087 Make ModRefBehavior a lattice. Use this to clean up AliasAnalysis
chaining and simplify FunctionAttrs' GetModRefBehavior logic.

llvm-svn: 118660
2010-11-10 01:02:18 +00:00
Dan Gohman e3467a7687 Teach FunctionAttrs about the VAArg instruction.
llvm-svn: 118627
2010-11-09 20:17:38 +00:00
Dan Gohman 35814e6128 Use the AliasAnalysis interface to determine how a Function accesses
memory. This isn't a real improvement with present day AliasAnalysis
implementations; it's mainly for consistency.

llvm-svn: 118624
2010-11-09 20:13:27 +00:00
Dan Gohman 0f17507478 Teach LICM and AliasSetTracker about AccessesArgumentsReadonly.
llvm-svn: 118618
2010-11-09 19:58:21 +00:00
Dan Gohman de52155685 Teach FunctionAttrs about AccessesArgumentsReadonly.
llvm-svn: 118617
2010-11-09 19:56:27 +00:00
Dan Gohman 470ade12e0 Fix a thinko that Duncan spotted.
llvm-svn: 118430
2010-11-08 19:24:47 +00:00
Dan Gohman 2cd1fd4a82 Make FunctionAttrs TBAA-aware.
llvm-svn: 118417
2010-11-08 17:12:04 +00:00
Dan Gohman 9130bad71f Extend the AliasAnalysis::pointsToConstantMemory interface to allow it
to optionally look for constant or local (alloca) memory.

Teach BasicAliasAnalysis::pointsToConstantMemory to look through Select
and Phi nodes, and to support looking for local memory.

Remove FunctionAttrs' PointsToLocalOrConstantMemory function, now that
AliasAnalysis knows all the tricks that function knew.

llvm-svn: 118412
2010-11-08 16:45:26 +00:00
Dan Gohman 86449d705a Make FunctionAttrs use AliasAnalysis::getModRefBehavior, now that it
knows about intrinsic functions.

llvm-svn: 118410
2010-11-08 16:10:15 +00:00
Duncan Sands 9d1fe4c40d Rename PointsToLocalMemory to PointsToLocalOrConstantMemory to make
the code more self-documenting.

llvm-svn: 118171
2010-11-03 14:45:05 +00:00
Jakob Stoklund Olesen 31a7eb40c1 Let the -inline-threshold command line argument take precedence over the
threshold given to createFunctionInliningPass().

Both opt -O3 and clang would silently ignore the -inline-threshold option.

llvm-svn: 118117
2010-11-02 23:40:26 +00:00
Owen Anderson 6186c96765 When folding away a (shl (shr)) pair, we need to check that the bits that will BECOME the low
bits are zero, not that the current low bits are zero.  Fixes <rdar://problem/8606771>.

llvm-svn: 117953
2010-11-01 21:08:20 +00:00
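A simplified instance with equal shift amounts (illustrative names): folding ((x >> 3) << 3) down to x is only valid when the low three bits of x, i.e. the bits that become the low bits of the result, are known to be zero; checking the low bits of the intermediate shr value would test the wrong bits.

  #include <assert.h>
  #include <stdint.h>

  int main(void) {
    uint32_t a = 0x48;                /* low three bits zero: fold is exact */
    assert(((a >> 3) << 3) == a);

    uint32_t b = 0x4B;                /* low three bits set: fold would be wrong */
    assert(((b >> 3) << 3) == 0x48);  /* the pair clears them; b is not recovered */
    return 0;
  }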
Duncan Sands e659aba516 Now that the MallocInst no longer exists, this workaround for
it claiming not to have side-effects is no longer needed.

llvm-svn: 117789
2010-10-30 16:12:16 +00:00
Duncan Sands b8f3b14dfb If a function does a volatile load from a global constant, do not
consider it to be readonly.  In fact, don't even consider it to be
readonly if it does a volatile load from an AllocaInst either (it
is debatable as to whether readonly would be correct or not in this
case; play safe for the moment).  This fixes PR8279.

llvm-svn: 117783
2010-10-30 12:59:44 +00:00
Bob Wilson 67a6f32c59 Clean up indentation and other whitespace.
llvm-svn: 117728
2010-10-29 22:20:45 +00:00
Bob Wilson 8ecf98b04f Remove trailing whitespace.
llvm-svn: 117727
2010-10-29 22:20:43 +00:00