Commit Graph

4982 Commits

Author SHA1 Message Date
Ted Kremenek 20164dcc68 Unbreak CMake build.
llvm-svn: 126715
2011-02-28 23:56:33 +00:00
Chris Lattner 1ac5e0c5c6 update cmake
llvm-svn: 126694
2011-02-28 22:45:25 +00:00
Dan Gohman 06d70015ce Delete the GEPSplitter experiment.
llvm-svn: 126671
2011-02-28 19:47:47 +00:00
Dan Gohman b8a25f49f3 Delete the SimplifyHalfPowrLibCalls pass, which was unused, and
only existed as the result of a misunderstanding.

llvm-svn: 126669
2011-02-28 19:41:14 +00:00
Chris Lattner eddb33ebd0 wire TargetLibraryInfo into simplify libcalls and use it in a couple of
trivial places.  This pass needs a lot of work.

llvm-svn: 126367
2011-02-24 07:16:14 +00:00
Chris Lattner 2e56e20662 move a massive amount of code out into its own helper function
to reduce nesting.  This needs to be turned into a table.

llvm-svn: 126366
2011-02-24 07:12:12 +00:00
Cameron Zwarich 826308586c Make LoopDeletion work on loops with multiple edges, as long as the incoming
values from all of the loop's exiting blocks are equal. Patch by Andrew Clinton.

llvm-svn: 126253
2011-02-22 22:25:39 +00:00
Chris Lattner 2333ac279f fix a crasher in disabled code (on variable stride loops)
llvm-svn: 126125
2011-02-21 17:02:55 +00:00
Chris Lattner bc661d6686 Add some (disabled code) to print out negative strides.
llvm-svn: 126102
2011-02-21 02:08:54 +00:00
Chris Lattner 72a35fb974 rewrite the memset_pattern pattern generation stuff to accept any 2/4/8/16-byte
constant, including globals.  This makes us generate much more "pretty" pattern
globals as well because it doesn't break it down to an array of bytes all the
time.

This enables us to handle stores of relocatable globals.  This kicks in about
48 times in 254.gap, giving us stuff like this:

@.memset_pattern40 = internal constant [2 x %struct.TypHeader* (%struct.TypHeader*, %struct.TypHeader*)*] [%struct.TypHeader* (%struct.TypHeader*, %struct.TypHeader*)* @IsFalse, %struct.TypHeader* (%struct.TypHeader*, %struct.TypHeader*)* @IsFalse], align 16

...
  call void @memset_pattern16(i8* %scevgep5859, i8* bitcast ([2 x %struct.TypHeader* (%struct.TypHeader*, %struct.TypHeader*)*]* @.memset_pattern40 to i8*), i64 %tmp75) nounwind

llvm-svn: 126044
2011-02-19 19:56:44 +00:00
Chris Lattner 0f4a64011e Implement rdar://9009151, transforming strided loop stores of
unsplatable values into memset_pattern16 when it is available
(recent darwins).  This transforms lots of strided loop stores
of ints for example, like 5 in vpr:

  Formed memset:   call void @memset_pattern16(i8* %4, i8* getelementptr inbounds ([16 x i8]* @.memset_pattern9, i32 0, i32 0), i64 %tmp25)
    from store to: {%3,+,4}<%11> at:   store i32 3, i32* %scevgep, align 4, !tbaa !4

llvm-svn: 126040
2011-02-19 19:31:39 +00:00
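Note on r126040: the transformation can be pictured in C. This is a minimal sketch, not the pass itself. `fill_pattern16` is a hypothetical portable stand-in for Darwin's `memset_pattern16`, and `store_loop`, `store_pattern`, and `forms_agree` are illustrative names that do not appear in the commit.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for memset_pattern16: fills len bytes at dst by
   repeating the 16-byte pattern from its start. */
void fill_pattern16(void *dst, const void *pattern16, size_t len) {
  unsigned char *out = dst;
  const unsigned char *pat = pattern16;
  for (size_t i = 0; i < len; ++i)
    out[i] = pat[i % 16];
}

/* Original form: a strided loop storing a value that a plain byte-wise
   memset cannot produce (the bytes of the int 3 are not all equal). */
void store_loop(int32_t *a, size_t n) {
  for (size_t i = 0; i < n; ++i)
    a[i] = 3;
}

/* Transformed form: one call using a 16-byte pattern of four int32 3s. */
void store_pattern(int32_t *a, size_t n) {
  static const int32_t pattern[4] = {3, 3, 3, 3};
  fill_pattern16(a, pattern, n * sizeof(int32_t));
}

/* Returns 1 when both forms leave identical buffers (n must be <= 64). */
int forms_agree(size_t n) {
  int32_t x[64], y[64];
  if (n > 64)
    return 0;
  memset(x, 0xAA, sizeof x);
  memset(y, 0xAA, sizeof y);
  store_loop(x, n);
  store_pattern(y, n);
  return memcmp(x, y, sizeof x) == 0;
}
```

The pattern is 16 bytes because that is the width `memset_pattern16` consumes; a 4-byte value is simply repeated four times to fill it.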
Chris Lattner e6b261fec5 Make loop-idiom use TargetLibraryInfo to determine whether it is allowed
to hack on memset, memcpy etc.

llvm-svn: 125974
2011-02-18 22:22:15 +00:00
Chris Lattner 1a924e770a prevent jump threading from merging blocks when their address is
taken (and used!).  This prevents merging the blocks (invalidating
the block addresses) in a case like this:

#define _THIS_IP_  ({ __label__ __here; __here: (unsigned long)&&__here; })

void foo() {
  printf("%p\n", _THIS_IP_);
  printf("%p\n", _THIS_IP_);
  printf("%p\n", _THIS_IP_);
}

which fixes PR4151.

llvm-svn: 125829
2011-02-18 04:43:06 +00:00
Chris Lattner 3eb0af94c4 fix PR9215, preventing -reassociate from clearing nsw/nuw when
it swaps the LHS/RHS of a single binop.

llvm-svn: 125700
2011-02-17 01:29:24 +00:00
Duncan Sands 75b5d27b84 Spelling fix: consequtive -> consecutive.
llvm-svn: 125563
2011-02-15 09:23:02 +00:00
Chris Lattner 69229316aa convert ConstantVector::get to use ArrayRef.
llvm-svn: 125537
2011-02-15 00:14:00 +00:00
Devang Patel 3058398655 Do not hoist @llvm.dbg.value. Here, @llvm.dbg.value refers to a value that is modified inside the loop.
llvm-svn: 125529
2011-02-14 23:03:23 +00:00
Chris Lattner 34442e6ebf revert my ConstantVector patch, it seems to have made the llvm-gcc
builders unhappy.

llvm-svn: 125504
2011-02-14 18:15:46 +00:00
Chris Lattner d9f5b88548 Switch ConstantVector::get to use ArrayRef instead of a pointer+size
idiom.  Change various clients to simplify their code.

llvm-svn: 125487
2011-02-14 07:55:32 +00:00
Daniel Dunbar 210ce0feb5 SimplifyLibCalls: Add missing legalize check on various printf to puts and
putchar transforms, their return values are not compatible.

llvm-svn: 125442
2011-02-12 18:19:57 +00:00
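Note on r125442: the incompatibility comes from the C library contracts. `printf` returns the number of characters written, while `puts` only promises a non-negative value on success, so the rewrite is only safe when the result is unused. A small sketch, with illustrative function names:

```c
#include <stdio.h>

/* printf reports how many characters it wrote: 6 for "hello\n". */
int printf_result(void) { return printf("hello\n"); }

/* puts writes the same text but only guarantees a non-negative result,
   so a caller that inspects the return value can tell the two apart. */
int puts_result(void) { return puts("hello"); }
```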
Cameron Zwarich 99de19b3cb Make LoopUnswitch preserve ScalarEvolution by just forgetting everything about
a loop when unswitching it. It only does this in the complex case, because
everything should be fine already in the simple case.

llvm-svn: 125369
2011-02-11 06:08:28 +00:00
Cameron Zwarich 25cb63c791 LoopInstSimplify preserves ScalarEvolution.
llvm-svn: 125368
2011-02-11 06:08:25 +00:00
Cameron Zwarich 97dae4d361 If we can't avoid running loop-simplify twice for now, at least avoid running
iv-users twice.

llvm-svn: 125318
2011-02-10 23:53:14 +00:00
Eric Christopher da6bd45088 Revert this in an attempt to bring the builders back.
llvm-svn: 125257
2011-02-10 01:48:24 +00:00
Cameron Zwarich 58c8670ab2 Turn this pass ordering:
Natural Loop Information
 Loop Pass Manager
   Canonicalize natural loops
 Scalar Evolution Analysis
 Loop Pass Manager
   Induction Variable Users
   Canonicalize natural loops
   Induction Variable Users
   Loop Strength Reduction

into this:

Scalar Evolution Analysis
Loop Pass Manager
  Canonicalize natural loops
  Induction Variable Users
  Loop Strength Reduction

This fixes <rdar://problem/8869639>. I also filed PR9184 on doing this sort of
thing automatically, but it seems easier to just change the ordering of the
passes if this is the only case.

llvm-svn: 125254
2011-02-10 01:07:54 +00:00
Dan Gohman de7f699754 Don't split any loop backedges, including backedges of loops other than
the active loop. This is generally desirable, and it avoids trouble
in situations such as the testcase in PR9123, though the failure
mode depends on use-list order, so it is infeasible to test.

llvm-svn: 125065
2011-02-08 00:55:13 +00:00
Dan Gohman 08d2c98c23 Fix reassociate to clear optional flags, such as nsw.
llvm-svn: 124712
2011-02-02 02:02:34 +00:00
Francois Pichet 326e4a2966 Unbreak the MSVC build.
The DEBUG() call at line 606 demands to see raw_ostream's definition. I have no idea why this seems to only break MSVC.

llvm-svn: 124545
2011-01-29 20:06:16 +00:00
Evan Cheng 73c29178ac Add a test for TCE return duplication.
llvm-svn: 124527
2011-01-29 04:53:35 +00:00
Evan Cheng d983eba7dc Re-apply r124518 with fix. Watch out for invalidated iterator.
llvm-svn: 124526
2011-01-29 04:46:23 +00:00
Evan Cheng 65b8ccf6ac Revert r124518. It broke Linux self-host.
llvm-svn: 124522
2011-01-29 02:43:04 +00:00
Evan Cheng d4eff31476 Re-commit r124462 with fixes. Tail recursion elim will now dup ret into unconditional predecessor to enable TCE on demand.
llvm-svn: 124518
2011-01-29 01:29:26 +00:00
Duncan Sands 69bdb585b2 Fix PR9039, a use-after-free in reassociate. The issue was that the
operand being factorized (and erased) could occur several times in Ops,
resulting in freed memory being used when the next occurrence in Ops was
analyzed.

llvm-svn: 124287
2011-01-26 10:08:38 +00:00
Dan Gohman 0f124e1987 Give GetUnderlyingObject a TargetData, to keep it in sync
with BasicAA's DecomposeGEPExpression, which recently began
using a TargetData. This fixes PR8968, though the testcase
is awkward to reduce.

Also, update several of GetUnderlyingObject's users
which happen to have a TargetData handy to pass it in.

llvm-svn: 124134
2011-01-24 18:53:32 +00:00
Chris Lattner d83e7b0ff6 enhance SRoA to promote allocas that are used by PHI nodes. This often
occurs because instcombine sinks loads and inserts phis.  This kicks in 
on such apps as 175.vpr, eon, 403.gcc, xalancbmk and a bunch of times in
spec2006 in some app that uses std::deque.

This resolves the last of rdar://7339113.

llvm-svn: 124090
2011-01-24 01:07:11 +00:00
Chris Lattner a960725d18 Enhance SRoA to promote allocas that are used by selects in some
common cases.  This triggers a surprising number of times in SPEC2K6
because min/max idioms end up doing this.  For example, code from the
STL ends up looking like this to SRoA:

  %202 = load i64* %__old_size, align 8, !tbaa !3
  %203 = load i64* %__old_size, align 8, !tbaa !3
  %204 = load i64* %__n, align 8, !tbaa !3
  %205 = icmp ult i64 %203, %204
  %storemerge.i = select i1 %205, i64* %__n, i64* %__old_size
  %206 = load i64* %storemerge.i, align 8, !tbaa !3

We can now promote both the __n and the __old_size allocas.

This addresses another chunk of rdar://7339113, poor codegen on
stringswitch.

llvm-svn: 124088
2011-01-23 22:04:55 +00:00
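Note on r124088: a C approximation of the source shape that leads to the IR above. The commit refers to STL code; `max_ptr` and `grow` here are hypothetical names chosen to mimic a max-style helper that returns a pointer (or, in C++, a reference) to the larger operand.

```c
#include <stddef.h>

/* Returns a pointer to the larger operand rather than its value, which is
   what produces a select between two addresses after inlining. */
const size_t *max_ptr(const size_t *a, const size_t *b) {
  return *a < *b ? b : a;
}

size_t grow(size_t old_size, size_t n) {
  /* Both locals have their address taken, so each starts as an alloca.
     The load through the selected pointer is what previously blocked
     promotion of either one. */
  return *max_ptr(&old_size, &n);
}
```

Once the load through the select is rewritten as a select of two loaded values, neither address escapes and both locals can live in registers.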
Chris Lattner 9491dee24e Enhance SRoA to be more aggressive about scalarization of aggregate allocas
that have PHI or select uses of their element pointers.  This can often happen
when instcombine sinks two loads into a successor, inserting a phi or select.

With this patch, we can scalarize the alloca, but the pinned elements are not
yet promoted.  This is still a win for large aggregates where only one element
is used.  This fixes rdar://8904039 and part of rdar://7339113 (poor codegen
on stringswitch).

llvm-svn: 124070
2011-01-23 08:27:54 +00:00
Chris Lattner 8acbb79506 have AllocaInfo store the alloca being inspected, simplifying callers.
No functionality change.

llvm-svn: 124067
2011-01-23 07:29:29 +00:00
Chris Lattner 3e56c29068 Rearrange some code a bit. Change MarkUnsafe to
handle the "Transformation preventing inst" printing, 
so that -scalarrepl -debug will always print the rejected
instruction.  No functionality change.

llvm-svn: 124066
2011-01-23 07:05:44 +00:00
Chris Lattner a587ab7b94 remove an old hack that avoided creating MMX datatypes. The
X86 backend has been fixed.

llvm-svn: 124064
2011-01-23 06:40:33 +00:00
Dan Gohman 19e30d5a7d Actually check memcpy lengths, instead of just commenting about
how they should be checked.

llvm-svn: 123999
2011-01-21 22:07:57 +00:00
Nick Lewycky ae0275e018 SCCP doesn't actually preserve the CFG. It will delete and insert terminator
instructions.

llvm-svn: 123973
2011-01-21 08:38:09 +00:00
Chris Lattner 86d56c651d fix rdar://8878965, a regression I introduced with the recent
llvm.objectsize changes.

llvm-svn: 123771
2011-01-18 20:53:04 +00:00
Cameron Zwarich b703654edc Remove code for updating dominance frontiers and some outdated references to
dominance and post-dominance frontiers.

llvm-svn: 123725
2011-01-18 04:11:31 +00:00
Cameron Zwarich 4694e69540 Remove outdated references to dominance frontiers.
llvm-svn: 123724
2011-01-18 03:53:26 +00:00
Owen Anderson 459e079912 Remove dead code, that I apparently wrote a while back. We seem to be doing well enough
without whatever this was trying to do.  When/if someone has the time to do some empirical
evaluations, it might be worth it to figure out what this code was trying to do and see if
it's worth resurrecting/fixing.

llvm-svn: 123684
2011-01-17 22:39:54 +00:00
Cameron Zwarich b410858a5f Roll r123609 back in with two changes that fix test failures with expensive
checks enabled:

1) Use '<' to compare integers in a comparison function rather than '<='.

2) Use the uniqued set DefBlocks rather than Info.DefiningBlocks to initialize
the priority queue.

The speedup of scalarrepl on test-suite + SPEC2000 + SPEC2006 is a bit less, at
just under 16% rather than 17%.

llvm-svn: 123662
2011-01-17 17:38:41 +00:00
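Note on r123662, change 1: the commit does not spell out why '<=' failed, but ordering-based containers and sorts generally require a strict ordering, in which no element compares as ordered before itself. The sketch below, with illustrative names, shows the property that '<' satisfies and '<=' violates.

```c
#include <stdbool.h>

bool less_strict(unsigned a, unsigned b) { return a < b; }
bool less_or_equal(unsigned a, unsigned b) { return a <= b; }

/* A strict ordering must never report x as ordered before x. */
bool is_irreflexive(bool (*cmp)(unsigned, unsigned), unsigned x) {
  return !cmp(x, x);
}
```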
Cameron Zwarich 67431d7943 Roll out r123609 due to failures on the llvm-x86_64-linux-checks bot.
llvm-svn: 123618
2011-01-17 07:26:51 +00:00
Cameron Zwarich 814cd9233e Eliminate the use of dominance frontiers in PromoteMemToReg. In addition to
eliminating a potentially quadratic data structure, this also gives a 17%
speedup when running -scalarrepl on test-suite + SPEC2000 + SPEC2006. My initial
experiment gave a greater speedup around 25%, but I moved the dominator tree
level computation from dominator tree construction to PromoteMemToReg.

Since this approach to computing IDFs has a much lower overhead than the old
code using precomputed DFs, it is worth looking at using this new code for the
second scalarrepl pass as well.

llvm-svn: 123609
2011-01-17 01:08:59 +00:00
Chris Lattner 7c9f4c9c2b tidy up a comment, as suggested by duncan
llvm-svn: 123590
2011-01-16 17:46:19 +00:00
Chris Lattner ed1fb92cfe simplify a little
llvm-svn: 123573
2011-01-16 07:11:21 +00:00
Chris Lattner 6fab2e9418 if an alloca is only ever accessed as a unit, and is accessed with load/store instructions,
then don't try to decimate it into its individual pieces.  This will just make a mess of the
IR and is pointless if none of the elements are individually accessed.  This was generating
really terrible code for std::bitset (PR8980) because it happens to be lowered by clang
as an {[8 x i8]} structure instead of {i64}.

The testcase now is optimized to:

define i64 @test2(i64 %X) {
  br label %L2

L2:                                               ; preds = %0
  ret i64 %X
}

before we generated:

define i64 @test2(i64 %X) {
  %sroa.store.elt = lshr i64 %X, 56
  %1 = trunc i64 %sroa.store.elt to i8
  %sroa.store.elt8 = lshr i64 %X, 48
  %2 = trunc i64 %sroa.store.elt8 to i8
  %sroa.store.elt9 = lshr i64 %X, 40
  %3 = trunc i64 %sroa.store.elt9 to i8
  %sroa.store.elt10 = lshr i64 %X, 32
  %4 = trunc i64 %sroa.store.elt10 to i8
  %sroa.store.elt11 = lshr i64 %X, 24
  %5 = trunc i64 %sroa.store.elt11 to i8
  %sroa.store.elt12 = lshr i64 %X, 16
  %6 = trunc i64 %sroa.store.elt12 to i8
  %sroa.store.elt13 = lshr i64 %X, 8
  %7 = trunc i64 %sroa.store.elt13 to i8
  %8 = trunc i64 %X to i8
  br label %L2

L2:                                               ; preds = %0
  %9 = zext i8 %1 to i64
  %10 = shl i64 %9, 56
  %11 = zext i8 %2 to i64
  %12 = shl i64 %11, 48
  %13 = or i64 %12, %10
  %14 = zext i8 %3 to i64
  %15 = shl i64 %14, 40
  %16 = or i64 %15, %13
  %17 = zext i8 %4 to i64
  %18 = shl i64 %17, 32
  %19 = or i64 %18, %16
  %20 = zext i8 %5 to i64
  %21 = shl i64 %20, 24
  %22 = or i64 %21, %19
  %23 = zext i8 %6 to i64
  %24 = shl i64 %23, 16
  %25 = or i64 %24, %22
  %26 = zext i8 %7 to i64
  %27 = shl i64 %26, 8
  %28 = or i64 %27, %25
  %29 = zext i8 %8 to i64
  %30 = or i64 %29, %28
  ret i64 %30
}

In this case, instcombine was able to eliminate the nonsense, but in PR8980 enough
PHIs are in play that instcombine backs off.  It's better to not generate this stuff
in the first place.

llvm-svn: 123571
2011-01-16 06:18:28 +00:00
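Note on r123571: the "before" IR above is a byte-by-byte split and reassembly of a 64-bit value. Written in C (with the illustrative name `roundtrip_bytes`), it is visibly an identity function, which is why decomposing the alloca gained nothing.

```c
#include <stdint.h>

uint64_t roundtrip_bytes(uint64_t x) {
  uint8_t elt[8];
  /* Split: one lshr + trunc per element, highest byte first. */
  for (int i = 0; i < 8; ++i)
    elt[i] = (uint8_t)(x >> (56 - 8 * i));
  /* Reassemble: one zext + shl + or per element. */
  uint64_t result = 0;
  for (int i = 0; i < 8; ++i)
    result |= (uint64_t)elt[i] << (56 - 8 * i);
  return result;
}
```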
Chris Lattner 7cd8cf7d24 Use an irbuilder to get some trivial constant folding when doing a store
of a constant.

llvm-svn: 123570
2011-01-16 05:58:24 +00:00
Chris Lattner d55581ded8 enhance FoldOpIntoPhi in instcombine to try harder when a phi has
multiple uses.  In some cases, all the uses are the same operation,
so instcombine can go ahead and promote the phi.  In the testcase
this pushes an add out of the loop.

llvm-svn: 123568
2011-01-16 05:28:59 +00:00
Chris Lattner af26390790 temporarily revert r123526. While working on a follow-on patch I
realized that ConstantFoldTerminator doesn't preserve dominfo.

llvm-svn: 123527
2011-01-15 07:51:19 +00:00
Chris Lattner 8df83c4a24 fix rdar://8785296 - -fcatch-undefined-behavior generates inefficient code
The basic issue is that isel (very reasonably!) expects conditional branches
to be folded, so CGP leaving around a bunch of dead computation feeding
conditional branches isn't such a good idea.  Just fold branches on constants
into unconditional branches.

llvm-svn: 123526
2011-01-15 07:36:13 +00:00
Chris Lattner ee588defc6 simplify code, no functionality change.
llvm-svn: 123525
2011-01-15 07:29:01 +00:00
Chris Lattner 1b93be501d Now that instruction optzns can update the iterator as they go, we can
have objectsize folding recursively simplify away their result when it
folds.  It is important to catch this here, because otherwise we won't
eliminate the cross-block values at isel and other times.

llvm-svn: 123524
2011-01-15 07:25:29 +00:00
Chris Lattner 7a2771440f make the current instruction iterator an ivar, allowing xforms that
potentially invalidate it (like inline asm lowering) to be sunk into
their proper place, cleaning up a ton of code.

llvm-svn: 123523
2011-01-15 07:14:54 +00:00
Chris Lattner b68ec5c339 Generalize LoadAndStorePromoter a bit and switch LICM
to use it.

llvm-svn: 123501
2011-01-15 00:12:35 +00:00
Chris Lattner b498f9aff3 switch SRoA to use LoadAndStorePromoter instead of its own copy of the code.
llvm-svn: 123457
2011-01-14 19:50:47 +00:00
Chris Lattner 9987a6f49b split SROA into two passes: one that uses DomFrontiers (-scalarrepl)
and one that uses SSAUpdater (-scalarrepl-ssa)

llvm-svn: 123436
2011-01-14 08:13:00 +00:00
Chris Lattner 543384efb4 Implement full support for promoting allocas to registers using SSAUpdater
instead of DomTree/DomFrontier.  This may be interesting for reducing compile 
time.  This is currently disabled, but seems to work just fine.

When this is enabled, we eliminate two runs of dominator frontier, one in the
"early per-function" optimizations and one in the "interlaced with inliner"
function passes.

llvm-svn: 123434
2011-01-14 07:50:47 +00:00
Bob Wilson 328e91bbe1 Fix whitespace.
llvm-svn: 123396
2011-01-13 20:59:44 +00:00
Bob Wilson c8056a952e Check for empty structs, and for consistency, zero-element arrays.
llvm-svn: 123383
2011-01-13 18:26:59 +00:00
Bob Wilson 08713d3c5f Extend SROA to handle arrays accessed as homogeneous structs and vice versa.
This is a minor extension of SROA to handle a special case that is
important for some ARM NEON operations.  Some of the NEON intrinsics
return multiple values, which are handled as struct types containing
multiple elements of the same vector type.  The corresponding return
types declared in the arm_neon.h header have equivalent arrays.  We
need SROA to recognize that it can split up those arrays and structs
into separate vectors, even though they are not always accessed with
the same type.  SROA already handles loads and stores of an entire
alloca by using insertvalue/extractvalue to access the individual
pieces, and that code works the same regardless of whether the type
is a struct or an array.  So, all that needs to be done is to check
for compatible arrays and homogeneous structs.

llvm-svn: 123381
2011-01-13 17:45:11 +00:00
Bob Wilson 12eec40c83 Make SROA more aggressive with allocas containing padding.
SROA only splits up structs and arrays one level at a time, so padding can
only cause trouble if it is located in between the struct or array elements.

llvm-svn: 123380
2011-01-13 17:45:08 +00:00
Devang Patel 30f3ebbc1f Use SmallVector instead of SmallPtrSet and avoid non-deterministic behavior.
llvm-svn: 123318
2011-01-12 19:12:45 +00:00
Chris Lattner dd5f60b7a7 revert 123144, reenabling the rest of memset formation.
llvm-svn: 123302
2011-01-12 03:25:15 +00:00
Chris Lattner 654098f411 revert r123146 which disabled code that wasn't the root cause
of the bootstrap miscompare issue.

llvm-svn: 123299
2011-01-12 01:52:23 +00:00
Chris Lattner fa7c29d255 revert r123149, reenabling an improvement to memcpyopt that wasn't
the source of the bootstrap problem.

llvm-svn: 123298
2011-01-12 01:43:46 +00:00
Jakob Stoklund Olesen 12cc296bd4 Remove the PR8954 workaround.
llvm-svn: 123288
2011-01-11 22:56:41 +00:00
Cameron Zwarich cb9c4f85ec Dial back the speculative fix for PR8954 a bit, so that we only recompute dominators
once at the beginning of GVN instead of once per iteration.

llvm-svn: 123278
2011-01-11 22:14:42 +00:00
Cameron Zwarich 51eb403907 Attempt to fix the bootstrap buildbot. Rafael says this works for him on x86-64 Linux.
llvm-svn: 123270
2011-01-11 20:23:34 +00:00
Chris Lattner 193ce7c4d1 update memdep when an instruction is deleted. This code isn't
actually reached in the testcase in PR8954, but it's safe and good
practice.

llvm-svn: 123224
2011-01-11 08:19:16 +00:00
Chris Lattner f6ae904e34 Fix FoldSingleEntryPHINodes to update memdep and AA when it deletes
phi nodes.  It is called from MergeBlockIntoPredecessor which is 
called from GVN, which claims to preserve these.

I'm skeptical that this is the actual problem behind PR8954, but
this is a stab in the right direction.

llvm-svn: 123222
2011-01-11 08:13:40 +00:00
Chris Lattner dfcfcb49fa random cleanups
llvm-svn: 123221
2011-01-11 08:00:40 +00:00
Chris Lattner 63fe78de68 remove a bogus assertion: the latch block of a loop is not
necessarily an uncond branch to the header.  This fixes 
PR8955 (the assertion tripping).

llvm-svn: 123219
2011-01-11 07:47:59 +00:00
Chris Lattner 88bc848ab6 another random stab in the dark trying to fix llvm-gcc-i386-linux-selfhost
llvm-svn: 123149
2011-01-10 02:34:11 +00:00
Chris Lattner 4662bd4b13 another (more) aggressive attempt to bring llvm-gcc-i386-linux-selfhost
back to life.

llvm-svn: 123146
2011-01-10 00:47:34 +00:00
Chris Lattner 1017fa6746 temporarily disable memset formation from memsets in an effort to restore buildbot stability.
llvm-svn: 123144
2011-01-09 23:52:48 +00:00
Chris Lattner caf5c0d037 fix a few old bugs (found by inspection) where we would zap instructions
without informing memdep.  This could cause nondeterministic weirdness 
based on where instructions happen to get allocated, and will hopefully
breathe some life into some broken testers.

llvm-svn: 123124
2011-01-09 19:26:10 +00:00
Cameron Zwarich a42e5915bf LoopInstSimplify preserves LoopSimplify.
llvm-svn: 123117
2011-01-09 12:35:16 +00:00
Chris Lattner a337f5ec5c reduce indentation. Print <nuw> and <nsw> when dumping SCEV AddRec's
that have the bit set.

llvm-svn: 123104
2011-01-09 02:16:18 +00:00
Chris Lattner 7d6433ae76 fix a latent bug in memcpyoptimizer that my recent patches exposed: it wasn't
updating memdep when fusing stores together.  This fixes the crash optimizing
the bullet benchmark.

llvm-svn: 123091
2011-01-08 22:19:21 +00:00
Chris Lattner ff6ed2ac5f tryMergingIntoMemset can only handle constant length memsets.
llvm-svn: 123090
2011-01-08 22:11:56 +00:00
Chris Lattner 9a1d63ba9f Merge memsets followed by neighboring memsets and other stores into
larger memsets.  Among other things, this fixes rdar://8760394 and
allows us to handle "Example 2" from http://blog.regehr.org/archives/320,
compiling it into a single 4096-byte memset:

_mad_synth_mute:                        ## @mad_synth_mute
## BB#0:                                ## %entry
	pushq	%rax
	movl	$4096, %esi             ## imm = 0x1000
	callq	___bzero
	popq	%rax
	ret

llvm-svn: 123089
2011-01-08 21:19:19 +00:00
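Note on r123089: a minimal sketch of the merge, assuming a hypothetical `struct synth` whose two 2048-byte members sit back to back. It is not the code from the linked blog post; the layout and function names are made up to illustrate neighboring memsets collapsing into one 4096-byte memset.

```c
#include <string.h>

/* Hypothetical layout: two adjacent char arrays, so there is no padding. */
struct synth {
  char filter[2048];
  char pcm[2048];
};

/* Two neighboring memsets, as the source might write them. */
void mute_separately(struct synth *s) {
  memset(s->filter, 0, sizeof s->filter);
  memset(s->pcm, 0, sizeof s->pcm);
}

/* The merged form: a single memset covering both ranges. */
void mute_merged(struct synth *s) { memset(s, 0, sizeof *s); }

/* Returns 1 when both forms leave the struct in the same state. */
int merged_matches(void) {
  struct synth a, b;
  memset(&a, 0xFF, sizeof a);
  memset(&b, 0xFF, sizeof b);
  mute_separately(&a);
  mute_merged(&b);
  return memcmp(&a, &b, sizeof a) == 0;
}
```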
Chris Lattner 5120ebf184 fix an issue in IsPointerOffset that prevented us from recognizing that
P and P+1 are relative to the same base pointer.

llvm-svn: 123087
2011-01-08 21:07:56 +00:00
Chris Lattner 4dc1fd938f enhance memcpyopt to merge a store and a subsequent
memset into a single larger memset.

llvm-svn: 123086
2011-01-08 20:54:51 +00:00
Chris Lattner c638147e9f constify TargetData references.
Split memset formation logic out into its own
"tryMergingIntoMemset" helper function.

llvm-svn: 123081
2011-01-08 20:24:01 +00:00
Chris Lattner 59c82f850d When loop rotation happens, it is *very* common for the duplicated condbr
to be foldable into an uncond branch.  When this happens, we can make a
much simpler CFG for the loop, which is important for nested loop cases
where we want the outer loop to be aggressively optimized.

Handle this case more aggressively.  For example, previously on
phi-duplicate.ll we would get this:


define void @test(i32 %N, double* %G) nounwind ssp {
entry:
  %cmp1 = icmp slt i64 1, 1000
  br i1 %cmp1, label %bb.nph, label %for.end

bb.nph:                                           ; preds = %entry
  br label %for.body

for.body:                                         ; preds = %bb.nph, %for.cond
  %j.02 = phi i64 [ 1, %bb.nph ], [ %inc, %for.cond ]
  %arrayidx = getelementptr inbounds double* %G, i64 %j.02
  %tmp3 = load double* %arrayidx
  %sub = sub i64 %j.02, 1
  %arrayidx6 = getelementptr inbounds double* %G, i64 %sub
  %tmp7 = load double* %arrayidx6
  %add = fadd double %tmp3, %tmp7
  %arrayidx10 = getelementptr inbounds double* %G, i64 %j.02
  store double %add, double* %arrayidx10
  %inc = add nsw i64 %j.02, 1
  br label %for.cond

for.cond:                                         ; preds = %for.body
  %cmp = icmp slt i64 %inc, 1000
  br i1 %cmp, label %for.body, label %for.cond.for.end_crit_edge

for.cond.for.end_crit_edge:                       ; preds = %for.cond
  br label %for.end

for.end:                                          ; preds = %for.cond.for.end_crit_edge, %entry
  ret void
}

Now we get the much nicer:

define void @test(i32 %N, double* %G) nounwind ssp {
entry:
  br label %for.body

for.body:                                         ; preds = %entry, %for.body
  %j.01 = phi i64 [ 1, %entry ], [ %inc, %for.body ]
  %arrayidx = getelementptr inbounds double* %G, i64 %j.01
  %tmp3 = load double* %arrayidx
  %sub = sub i64 %j.01, 1
  %arrayidx6 = getelementptr inbounds double* %G, i64 %sub
  %tmp7 = load double* %arrayidx6
  %add = fadd double %tmp3, %tmp7
  %arrayidx10 = getelementptr inbounds double* %G, i64 %j.01
  store double %add, double* %arrayidx10
  %inc = add nsw i64 %j.01, 1
  %cmp = icmp slt i64 %inc, 1000
  br i1 %cmp, label %for.body, label %for.end

for.end:                                          ; preds = %for.body
  ret void
}

With all of these recent changes, we are now able to compile:

void foo(char *X) {
 for (int i = 0; i != 100; ++i) 
   for (int j = 0; j != 100; ++j)
     X[j+i*100] = 0;
}

into a single memset of 10000 bytes.  This series of changes
should also be helpful for other nested loop scenarios.

llvm-svn: 123079
2011-01-08 19:59:06 +00:00
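Note on r123079: loop rotation itself can be sketched at the C level. The functions below are illustrative, modeled on the loop body in the IR above: the top-tested loop becomes a guard plus a bottom-tested loop, and when the trip count is a known constant the guard folds to an unconditional branch, which is the case this commit handles.

```c
/* Before rotation: the exit test sits at the top of the loop. */
void prefix_sum_top_tested(double *g, long n) {
  for (long j = 1; j < n; ++j)
    g[j] = g[j] + g[j - 1];
}

/* After rotation: a duplicated guard test, then a bottom-tested loop.
   If n is a constant such as 1000, the guard is always true and can be
   folded away, leaving a single-block loop. */
void prefix_sum_rotated(double *g, long n) {
  long j = 1;
  if (j < n) {
    do {
      g[j] = g[j] + g[j - 1];
      ++j;
    } while (j < n);
  }
}

/* Returns 1 when both forms compute the same result (n must be <= 16). */
int rotation_preserves_result(long n) {
  double a[16], b[16];
  if (n > 16)
    return 0;
  for (int i = 0; i < 16; ++i)
    a[i] = b[i] = (double)(i + 1);
  prefix_sum_top_tested(a, n);
  prefix_sum_rotated(b, n);
  for (int i = 0; i < 16; ++i)
    if (a[i] != b[i])
      return 0;
  return 1;
}
```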
Chris Lattner 30f318e5d1 split ssa updating code out to its own helper function. Don't bother
moving the OrigHeader block anymore: we just merge it away anyway so
its code layout doesn't matter.

llvm-svn: 123077
2011-01-08 19:26:33 +00:00
Chris Lattner 2615130e1d Implement a TODO: Enhance loopinfo to merge away the unconditional branch
that it was leaving in loops after rotation (between the original latch
block and the original header).

With this change, it is possible for rotated loops to have just a single
basic block, which is useful.

llvm-svn: 123075
2011-01-08 19:10:28 +00:00
Chris Lattner fee37c5fa3 inline preserveCanonicalLoopForm now that it is simple.
llvm-svn: 123073
2011-01-08 18:55:50 +00:00
Chris Lattner 063dca0f6a Three major changes:
1. Rip out LoopRotate's domfrontier updating code.  It isn't
   needed now that LICM doesn't use DF and it is super complex
   and gross.
2. Make DomTree updating code a lot simpler and faster.  The 
   old loop over all the blocks was just to find a block??
3. Change the code that inserts the new preheader to just use
   SplitCriticalEdge instead of doing an overcomplex 
   reimplementation of it.

No behavior change, except for the name of the inserted preheader.

llvm-svn: 123072
2011-01-08 18:52:51 +00:00
Chris Lattner 7fab23bc1d LoopRotate requires canonical loop form, so it always has preheaders
and latch blocks.  Reorder entry conditions to make the pass faster
and more logical.

llvm-svn: 123069
2011-01-08 18:06:22 +00:00
Chris Lattner d62691f4e8 use the LI ivar.
llvm-svn: 123068
2011-01-08 17:49:51 +00:00
Chris Lattner 385f2ec6d8 some cleanups: remove dead arguments and eliminate ivars
that are just passed to one function.

llvm-svn: 123067
2011-01-08 17:48:33 +00:00
Chris Lattner 25ba40a0cc fix an issue duncan pointed out, which could cause loop rotate
to violate LCSSA form

llvm-svn: 123066
2011-01-08 17:38:45 +00:00
Cameron Zwarich b4ab257bcc Fix coding style issues.
llvm-svn: 123065
2011-01-08 17:07:11 +00:00
Cameron Zwarich 84986b298a Make more passes preserve dominators (or state that they preserve dominators if
they already do). This removes two dominator recomputations prior to isel,
which is a 1% improvement in total llc time for 403.gcc.

The only potentially suspect thing is making GCStrategy recompute dominators if
it used a custom lowering strategy.

llvm-svn: 123064
2011-01-08 17:01:52 +00:00
Cameron Zwarich 80bd9af7c5 Contract subloop bodies. However, it is still important to visit the phis at the
top of subloop headers, as the phi uses logically occur outside of the subloop.

llvm-svn: 123062
2011-01-08 15:52:22 +00:00
Chris Lattner 8c5defd0b0 Have loop-rotate simplify instructions (yay instsimplify!) as it clones
them into the loop preheader, eliminating silly instructions like
"icmp i32 0, 100" in fixed tripcount loops.  This also better exposes the 
bigger problem with loop rotate that I'd like to fix: once this has been
folded, the duplicated conditional branch *often* turns into an uncond branch.

Not aggressively handling this is pessimizing later loop optimizations 
somethin' fierce by making "dominates all exit blocks" checks fail.

llvm-svn: 123060
2011-01-08 08:24:46 +00:00
Chris Lattner 43f8d16482 Revamp the ValueMapper interfaces in a couple ways:
1. Take a flags argument instead of a bool.  This makes
   it more clear to the reader what it is used for.
2. Add a flag that says that "remapping a value not in the
   map is ok".
3. Reimplement MapValue to share a bunch of code and be a lot
   more efficient.  For lookup failures, don't drop null values
   into the map.
4. Using the new flag a bunch of code can vaporize in LinkModules
   and LoopUnswitch, kill it.

No functionality change.

llvm-svn: 123058
2011-01-08 08:15:20 +00:00
Chris Lattner 2b3f20e6ec two minor changes: switch to the standard ValueToValueMapTy
map from ValueMapper.h (giving us access to its utilities)
and add a fastpath in the loop rotation code, avoiding expensive
ssa updater manipulation for values with nothing to update.

llvm-svn: 123057
2011-01-08 07:21:31 +00:00
Cameron Zwarich 9ec19ea06a Add the CallInst optimizations that don't involve expanding inline assembly to
OptimizeInst() so that they can be used on a worklist instruction.

llvm-svn: 122945
2011-01-06 02:56:42 +00:00
Cameron Zwarich d28c78eb4f Move the GEP handling in CodeGenPrepare to OptimizeInst().
llvm-svn: 122944
2011-01-06 02:44:52 +00:00
Cameron Zwarich 14ac865ca9 Split the optimizations in CodeGenPrepare that don't manipulate the iterators
into a separate function, so that it can be called from a loop using a worklist
rather than a loop traversing a whole basic block.

llvm-svn: 122943
2011-01-06 02:37:26 +00:00
Jakob Stoklund Olesen 70be93a200 Zap the last two -Wself-assign warnings in llvm.
Simplify RALinScan::DowngradeRegister with TRI::getOverlaps while we are there.

llvm-svn: 122940
2011-01-06 01:33:22 +00:00
Cameron Zwarich ce3b930a98 Stop reallocating SunkAddrs for each basic block. When we move to an instruction
worklist, the key will need to become std::pair<BasicBlock*, Value*>.

llvm-svn: 122932
2011-01-06 00:42:50 +00:00
Cameron Zwarich b62ccb241b Add some more statistics to CodeGenPrepare.
llvm-svn: 122891
2011-01-05 17:47:38 +00:00
Cameron Zwarich ced753fadf Add some stats to CodeGenPrepare to make it easier to speed it up without
regressing code quality.

llvm-svn: 122887
2011-01-05 17:27:27 +00:00
Cameron Zwarich 6a78995369 Use pop_back_val instead of back followed by pop_back.
llvm-svn: 122876
2011-01-05 16:08:47 +00:00
Cameron Zwarich 5a2bb998ac Use a worklist for later iterations just like ordinary instsimplify. The next
step is to only process instructions in subloops if they have been modified by
an earlier simplification.

llvm-svn: 122869
2011-01-05 05:47:47 +00:00
Cameron Zwarich 4c51d122d5 Change LoopInstSimplify back to a LoopPass. It revisits subloops rather than
skipping them, but it should probably use a worklist and only revisit those
instructions in subloops that have actually changed. It should probably also
use a worklist after the first iteration like instsimplify now does. Regardless,
it's only 0.3% of opt -O2 time on 403.gcc if it replaces the instcombine placed
in the middle of the loop passes.

llvm-svn: 122868
2011-01-05 05:15:53 +00:00
Owen Anderson 7b25ff04bd Don't bother value numbering instructions with void types in GVN. In theory this should allow us to insert
fewer things into the value numbering maps, but any speedup is beneath the noise threshold on my machine
on 403.gcc.

llvm-svn: 122844
2011-01-04 22:15:21 +00:00
Owen Anderson e39cb57b09 Complete the NumberTable --> LeaderTable rename.
llvm-svn: 122828
2011-01-04 19:29:46 +00:00
Owen Anderson d7d06d3aaf Fix typo in a comment.
llvm-svn: 122827
2011-01-04 19:25:18 +00:00
Owen Anderson 51489b3b28 Prune #include's.
llvm-svn: 122826
2011-01-04 19:24:57 +00:00
Owen Anderson c7c3bc63f7 Clarify terminology, settling on referring to what was the "number table" as the "leader table", and
rename methods to make it much more clear what they're doing.

llvm-svn: 122823
2011-01-04 19:13:25 +00:00
Owen Anderson 83546f2fe0 When removing a value from GVN's leaders list, don't drop the Next pointer in a corner case.
llvm-svn: 122822
2011-01-04 19:10:54 +00:00
Owen Anderson 41a1550ef5 Branch instructions don't produce values, so there's no need to generate a value number for them. This
avoids adding them to the various value numbering tables, resulting in a minor (~3%) speedup for GVN
on 403.gcc.

llvm-svn: 122819
2011-01-04 18:54:18 +00:00
Owen Anderson 22c53e277a Remove commented out code.
llvm-svn: 122817
2011-01-04 18:22:08 +00:00
Cameron Zwarich b2a41e9388 Switch to the new style of asterisk placement.
llvm-svn: 122815
2011-01-04 18:19:19 +00:00
Chris Lattner 8643810ede Teach loop-idiom to turn a loop containing a memset into a larger memset
when safe.

The testcase is basically this nested loop:
void foo(char *X) {
  for (int i = 0; i != 100; ++i) 
    for (int j = 0; j != 100; ++j)
      X[j+i*100] = 0;
}

which gets turned into a single memset now.  clang -O3 doesn't optimize
this yet though due to a phase ordering issue I haven't analyzed yet.
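
For illustration only (a sketch, not part of the original commit): the loop
nest above stores zero to every byte of a contiguous n*n region, so it should
behave like a single memset. The function names and the `n` parameter are
hypothetical; the testcase above hard-codes 100.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// The loop nest from the testcase, with the bound passed as a parameter
// (the parameter is an assumption added for illustration).
void foo_loops(char *X, int n) {
    assert(X && n >= 0);
    for (int i = 0; i != n; ++i)
        for (int j = 0; j != n; ++j)
            X[j + i * n] = 0;
}

// The single call the nest is equivalent to: one memset over n*n bytes.
void foo_memset(char *X, int n) {
    assert(X && n >= 0);
    std::memset(X, 0, static_cast<std::size_t>(n) * n);
}
```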

llvm-svn: 122806
2011-01-04 07:46:33 +00:00
Chris Lattner a62b01dc37 restructure this a bit. Initialize the WeakVH with "I", the
instruction *after* the store.  The store will always be deleted
if the transformation kicks in, so we'd do an N^2 scan of every
loop block.  Whoops.

llvm-svn: 122805
2011-01-04 07:27:30 +00:00
Cameron Zwarich f4e13699e7 Avoid finding loop back edges when we are not splitting critical edges in
CodeGenPrepare (which is the default behavior).

llvm-svn: 122801
2011-01-04 04:43:31 +00:00
Cameron Zwarich e924969380 Address most of Duncan's review comments. Also, make LoopInstSimplify a simple
FunctionPass. It probably doesn't have a reason to be a LoopPass, as it will
probably drop the simple fixed point and either use RPO iteration or Duncan's
approach in instsimplify of only revisiting instructions that have changed.

The next step is to preserve LoopSimplify. This looks like it won't be too hard,
although the pass manager doesn't actually seem to respect when non-loop passes
claim to preserve LCSSA or LoopSimplify. This will have to be fixed.

llvm-svn: 122791
2011-01-04 00:12:46 +00:00
Chris Lattner 0ba473c218 use the very-handy getTruncateOrZeroExtend helper function, and
stop setting NSW: signed overflow is possible.  Thanks to Dan
for pointing these out.

llvm-svn: 122790
2011-01-04 00:06:55 +00:00
Owen Anderson 0839d3930a Fix comment.
llvm-svn: 122788
2011-01-03 23:51:56 +00:00
Owen Anderson d62d37225a Use the new addEscapingValue callback to update GlobalsModRef when GVN adds PHIs of GEPs. For the moment,
have GlobalsModRef handle this conservatively by simply removing the value from its maps.

llvm-svn: 122787
2011-01-03 23:51:43 +00:00
Chris Lattner bde6ec1db6 Duncan deftly points out that readnone functions aren't
invalidated by stores, so they can be handled as 'simple'
operations.

llvm-svn: 122785
2011-01-03 23:38:13 +00:00
Owen Anderson 3a33d0cc4a Simplify GVN's value expression structure, allowing the elimination of a lot of
almost-but-not-quite-identical code.  No intended functionality change.

llvm-svn: 122760
2011-01-03 19:00:11 +00:00
Chris Lattner 16ca19ffc5 strength reduce my previous patch a bit. The only instructions
that are allowed to have metadata operands are intrinsic calls,
and the only ones that take metadata currently return void.
Just reject all void instructions, which should not be value
numbered anyway.  To future proof things, add an assert to the
getHashValue impl for calls to check that metadata operands 
aren't present.

llvm-svn: 122759
2011-01-03 18:43:03 +00:00
Chris Lattner 142f1cd251 fix PR8895: metadata operands don't have a strong use of their
nested values, so they can change and drop to null, which can
change the hash and cause havoc.

It turns out that it isn't a good idea to value number stuff
with metadata operands anyway, so... don't.

llvm-svn: 122758
2011-01-03 18:28:15 +00:00
Cameron Zwarich 43cecb1200 Switch a worklist in CodeGenPrepare to SmallVector and increase the inline
capacity on the Visited SmallPtrSet. On 403.gcc, this is about a 4.5% speedup of
CodeGenPrepare time (which itself is 10% of time spent in the backend).

This is progress towards PR8889.

llvm-svn: 122741
2011-01-03 06:33:01 +00:00
Chris Lattner 9e5e9ed79a earlycse can do trivial within-a-block dead store
elimination as well.  This deletes 60 stores in 176.gcc
that largely come from bitfield code.

llvm-svn: 122736
2011-01-03 04:17:24 +00:00
Chris Lattner 4b9a525742 switch the load table to use a recycling bump pointer allocator,
speeding earlycse up by 6%.

llvm-svn: 122733
2011-01-03 03:53:50 +00:00
Chris Lattner e0e32a9ef0 now that loads are in their own table, we can implement
store->load forwarding.  This allows EarlyCSE to zap 600 more
loads from 176.gcc.

llvm-svn: 122732
2011-01-03 03:46:34 +00:00
Chris Lattner 92bb0f9f9d split loads and calls into separate tables. Loads are now just indexed
by their pointer instead of using MemoryValue to wrap it.

llvm-svn: 122731
2011-01-03 03:41:27 +00:00
Chris Lattner 4cb365414f various cleanups, no functionality change.
llvm-svn: 122729
2011-01-03 03:28:23 +00:00
Chris Lattner b9a8efc960 Teach EarlyCSE to do trivial CSE of loads and read-only calls.
On 176.gcc, this catches 13090 loads and calls, and increases the
number of simple instructions CSE'd from 29658 to 36208.

llvm-svn: 122727
2011-01-03 03:18:43 +00:00
Chris Lattner 79d83067ee rename InstValue to SimpleValue, add some comments.
llvm-svn: 122725
2011-01-03 02:20:48 +00:00
Michael J. Spencer edb5bcdde5 CMake: Add missing source file.
llvm-svn: 122724
2011-01-03 02:13:05 +00:00
Chris Lattner d815f69b30 Allocate nodes for the scoped hash table from a recycling bump pointer
allocator.  This speeds up early cse by about 20%

llvm-svn: 122723
2011-01-03 01:42:46 +00:00
Chris Lattner 02a9776b64 reduce redundancy in the hashing code and other misc cleanups.
llvm-svn: 122720
2011-01-03 01:10:08 +00:00
Cameron Zwarich cab9a0abab Add a new loop-instsimplify pass, with the intention of replacing the instance
of instcombine that is currently in the middle of the loop pass pipeline. This
commit only checks in the pass; it will hopefully be enabled by default later.

llvm-svn: 122719
2011-01-03 00:25:16 +00:00
Chris Lattner 0844c76f9a fix some pastos
llvm-svn: 122718
2011-01-02 23:29:58 +00:00
Chris Lattner 8fac5db251 add DEBUG and -stats output to earlycse.
Teach it to CSE the rest of the non-side-effecting instructions.

llvm-svn: 122716
2011-01-02 23:19:45 +00:00
Chris Lattner 18ae5436b1 Enhance earlycse to do CSE of casts, instsimplify and die.
Add a testcase.

llvm-svn: 122715
2011-01-02 23:04:14 +00:00
Chris Lattner bf0aa927cc split dom frontier handling stuff out to its own DominanceFrontier header,
so that Dominators.h is *just* domtree.  Also prune #includes a bit.

llvm-svn: 122714
2011-01-02 22:09:33 +00:00
Chris Lattner 704541bb23 sketch out a new early cse pass. No functionality yet.
llvm-svn: 122713
2011-01-02 21:47:05 +00:00
Chris Lattner 9c69406f2b fix a miscompilation of tramp3d-v4: when forming a memcpy, we have to make
sure that the loop we're promoting into a memcpy doesn't mutate the input
of the memcpy.  Before we were just checking that the dest of the memcpy
wasn't mod/ref'd by the loop.

llvm-svn: 122712
2011-01-02 21:14:18 +00:00
Chris Lattner 5702a43c09 If a loop iterates exactly once (has backedge count = 0) then don't
mess with it.  We'd rather peel/unroll it than convert all of its 
stores into memsets.

llvm-svn: 122711
2011-01-02 20:24:21 +00:00
Chris Lattner 8455b6e45e enhance loop idiom recognition to scan *all* unconditionally executed
blocks in a loop, instead of just the header block.  This makes it more
aggressive, able to handle Duncan's Ada examples.

llvm-svn: 122704
2011-01-02 19:01:03 +00:00
Chris Lattner 0cdc6f62a5 make inSubLoop much more efficient.
llvm-svn: 122703
2011-01-02 18:53:08 +00:00
Chris Lattner 27497ece96 rip out isExitBlockDominatedByBlockInLoop, calling DomTree::dominates instead.
isExitBlockDominatedByBlockInLoop is a relic of the days when domtree was 
*just* a tree and didn't have DFS numbers.  Checking DFS numbers is faster
and easier than "limiting the search of the tree".

llvm-svn: 122702
2011-01-02 18:45:39 +00:00
Chris Lattner 0469e01c02 add a list of opportunities for future improvement.
llvm-svn: 122701
2011-01-02 18:32:09 +00:00
Chris Lattner ddf58010bd Allow loop-idiom to run on multiple BB loops, but still only scan the loop
header for now for memset/memcpy opportunities.  It turns out that loop-rotate
is successfully rotating loops, but *DOESN'T MERGE THE BLOCKS*, turning "for 
loops" into 2 basic block loops that loop-idiom was ignoring.

With this fix, we form many *many* more memcpy and memsets than before, including
on the "history" loops in the viterbi benchmark, which look like this:

        for (j=0; j<MAX_history; ++j) {
          history_new[i][j+1] = history[2*i][j];
        }

Transforming these loops into memcpy's speeds up the viterbi benchmark from
11.98s to 3.55s on my machine.  Woo.

llvm-svn: 122685
2011-01-02 07:58:36 +00:00
Chris Lattner 5b5a043d82 remove debugging code.
llvm-svn: 122683
2011-01-02 07:37:13 +00:00
Chris Lattner 12f91befce add some -stats output.
llvm-svn: 122682
2011-01-02 07:36:44 +00:00
Chris Lattner 679572e584 improve loop rotation to use CodeMetrics to analyze the
size of a loop header instead of its own code size estimator.
This allows it to handle bitcasts etc more precisely.

llvm-svn: 122681
2011-01-02 07:35:53 +00:00
Chris Lattner 85b6d81d41 teach loop idiom recognition to form memcpy's from simple loops.
llvm-svn: 122678
2011-01-02 03:37:56 +00:00
Chris Lattner a3514441e0 add a validity check that was missed, fixing a crash on the
new testcase.

llvm-svn: 122662
2011-01-01 20:12:04 +00:00
Chris Lattner 91a4435875 improve validity check to handle constant-trip-count loops more
aggressively.  In practice, this doesn't help anything though,
see the todo.

llvm-svn: 122660
2011-01-01 19:54:22 +00:00
Chris Lattner 8b3baf6d75 implement the "no aliasing accesses in loop" safety check. This pass
should be correct now.

llvm-svn: 122659
2011-01-01 19:39:01 +00:00
Chris Lattner 65a699d4d0 simplify this, isBytewiseValue handles the extra check. We still
check for "multiple of a byte" in size to make it clear that the
>> 3 below is safe.

llvm-svn: 122604
2010-12-28 18:53:48 +00:00
Duncan Sands 5cf10e691b Silence gcc warning about an unused variable when doing a release build.
llvm-svn: 122593
2010-12-28 09:41:15 +00:00
Chris Lattner cb18bfa3d2 fix some issues Frits noticed, add AliasAnalysis as a dependency
llvm-svn: 122585
2010-12-27 18:39:08 +00:00
Benjamin Kramer 7cba269dfb SimplifyLibCalls: Use IRBuilder to simplify code.
llvm-svn: 122575
2010-12-27 00:16:46 +00:00
Chris Lattner b9fe685b9a have loop-idiom nuke instructions that feed stores that get removed.
llvm-svn: 122574
2010-12-27 00:03:23 +00:00
Chris Lattner 29e14edc8d implement enough of the memset inference algorithm to recognize and insert
memsets.  This is still missing one important validity check, but this is enough
to compile stuff like this:

void test0(std::vector<char> &X) {
  for (std::vector<char>::iterator I = X.begin(), E = X.end(); I != E; ++I)
    *I = 0;
}

void test1(std::vector<int> &X) {
  for (long i = 0, e = X.size(); i != e; ++i)
    X[i] = 0x01010101;
}

With:
 $ clang t.cpp -S -o - -O2 -emit-llvm | opt -loop-idiom | opt -O3 | llc 

to:

__Z5test0RSt6vectorIcSaIcEE:            ## @_Z5test0RSt6vectorIcSaIcEE
## BB#0:                                ## %entry
	subq	$8, %rsp
	movq	(%rdi), %rax
	movq	8(%rdi), %rsi
	cmpq	%rsi, %rax
	je	LBB0_2
## BB#1:                                ## %bb.nph
	subq	%rax, %rsi
	movq	%rax, %rdi
	callq	___bzero
LBB0_2:                                 ## %for.end
	addq	$8, %rsp
	ret
...
__Z5test1RSt6vectorIiSaIiEE:            ## @_Z5test1RSt6vectorIiSaIiEE
## BB#0:                                ## %entry
	subq	$8, %rsp
	movq	(%rdi), %rax
	movq	8(%rdi), %rdx
	subq	%rax, %rdx
	cmpq	$4, %rdx
	jb	LBB1_2
## BB#1:                                ## %for.body.preheader
	andq	$-4, %rdx
	movl	$1, %esi
	movq	%rax, %rdi
	callq	_memset
LBB1_2:                                 ## %for.end
	addq	$8, %rsp
	ret

llvm-svn: 122573
2010-12-26 23:42:51 +00:00
Chris Lattner 6cf8d6cc6e start using irbuilder to make mem intrinsics in a few passes.
llvm-svn: 122572
2010-12-26 22:57:41 +00:00
Chris Lattner 7c5f9c35d1 sketch more of this out.
llvm-svn: 122567
2010-12-26 20:45:45 +00:00
Chris Lattner 9cb1035f94 move isBytewiseValue out to ValueTracking.h/cpp
llvm-svn: 122565
2010-12-26 20:15:01 +00:00
Chris Lattner 81ae3f299a actually add the file...
llvm-svn: 122563
2010-12-26 19:39:38 +00:00
Chris Lattner 2ef535a4e4 Start of a pass for recognizing memset and memcpy idioms.
No functionality yet.

llvm-svn: 122562
2010-12-26 19:32:44 +00:00
Benjamin Kramer 30342fb1fd Simplify code.
llvm-svn: 122561
2010-12-26 15:23:45 +00:00
Benjamin Kramer b90b2f0635 Fix a thinko pointed out by Frits van Bommel: looking through global variables in isBytewiseValue is not safe.
llvm-svn: 122550
2010-12-24 22:23:59 +00:00
Benjamin Kramer ea9152e551 MemCpyOpt: Turn memcpys from a constant into a memset if possible.
This allows us to compile "int cst[] = {-1, -1, -1};" into
  movl  $-1, 16(%rsp)
  movq  $-1, 8(%rsp)
instead of
  movl  _cst+8(%rip), %eax
  movl  %eax, 16(%rsp)
  movq  _cst(%rip), %rax
  movq  %rax, 8(%rsp)
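
For illustration only (a sketch, not part of the original commit): a memcpy
from a constant can become a memset when every byte of the constant is the
same, which is the kind of property isBytewiseValue checks on IR values. The
helper below is a hypothetical stand-in operating on raw memory, not LLVM's
implementation.

```cpp
#include <cassert>
#include <cstddef>

// Returns the repeated byte (0..255) if every byte in [data, data + n) is
// identical, otherwise -1. Hypothetical helper for illustration.
int repeated_byte(const void *data, std::size_t n) {
    assert(data && n > 0);
    const unsigned char *p = static_cast<const unsigned char *>(data);
    for (std::size_t i = 1; i != n; ++i)
        if (p[i] != p[0])
            return -1;  // bytes differ: not expressible as a memset
    return p[0];
}
```

With `int cst[] = {-1, -1, -1};` every byte is 0xFF, so the copy can be a
memset; an array such as `{1, 1, 1}` has differing bytes within each int and
does not qualify.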

llvm-svn: 122548
2010-12-24 21:17:12 +00:00
Owen Anderson 5d690d4168 It is possible for SimplifyCFG to cause PHI nodes to become redundant too late in the optimization
pipeline to be caught by instcombine, and it's not feasible to catch them in SimplifyCFG because the
use-lists are in an inconsistent state at the point where it could know that it needs to simplify them.
Instead, have CodeGenPrepare look for trivially redundant PHIs as part of its general cleanup effort.

llvm-svn: 122516
2010-12-23 20:57:35 +00:00
Mon P Wang 18b762a946 Preserve the address space when generating bitcasts for MemTransferInst in ConvertToScalarInfo
llvm-svn: 122462
2010-12-23 01:41:32 +00:00
Jeffrey Yasskin 9b43f33620 Change all self assignments X=X to (void)X, so that we can turn on a
new gcc warning that complains on self-assignments and
self-initializations.

llvm-svn: 122458
2010-12-23 00:58:24 +00:00
Owen Anderson 5ab8d4b5e5 Give GVN back the ability to perform simple conditional propagation on conditional branch values.
I still think that LVI should be handling this, but that capability is some ways off in the future,
and this matters for some significant benchmarks.

llvm-svn: 122378
2010-12-21 23:54:34 +00:00
Owen Anderson 12470778d7 Remove dead code.
llvm-svn: 122371
2010-12-21 22:31:24 +00:00
Benjamin Kramer 43493c089f GVN's Expression is not POD-like (it contains a SmallVector). Simplify code while at it.
llvm-svn: 122362
2010-12-21 21:30:19 +00:00
Chris Lattner b6252a376a tidy up
llvm-svn: 122190
2010-12-19 20:24:28 +00:00
Chris Lattner 408a684d29 Enhance LICM to promote alias sets whose pointers themselves are stored,
which doesn't affect the memory address being promoted.

llvm-svn: 122172
2010-12-19 05:57:25 +00:00
Chris Lattner 3337a81450 fix PR8602, a bug in an assertion: a volatile store *of* a pointer
does not make the alias set for that pointer volatile, just stores
*to* the pointer.

llvm-svn: 122171
2010-12-19 05:51:54 +00:00
Chris Lattner fb888622c3 revert r122164, I'm going to go with a different approach.
llvm-svn: 122168
2010-12-19 04:23:03 +00:00
Chris Lattner 583ec6fa44 first step to fixing PR8642: don't fold away empty basic blocks
which have trapping constant exprs in them due to PHI nodes.
Eliminating them can cause the constant expr to be evaluated
on new paths if the input edges are critical.

llvm-svn: 122164
2010-12-19 03:02:34 +00:00
Dan Gohman 93dc2b808f Revert r64460. strtol and friends cannot be marked readonly, even with
a null endptr argument, because they may write to errno.

This fixes a selfhost miscompile observed on Linux targets when TBAA
was enabled.
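
For illustration only (a sketch, not part of the original commit): even with a
null endptr, strtol has a visible side effect because it reports overflow
through errno. The function name is hypothetical.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdlib>

// Returns true if strtol signalled overflow via errno although endptr is
// null, i.e. the call wrote to memory the caller can observe.
bool strtol_overflow_sets_errno(const char *text) {
    assert(text);
    errno = 0;
    long v = std::strtol(text, nullptr, 10);  // endptr deliberately null
    (void)v;
    return errno == ERANGE;
}
```

A readonly call could be removed or reordered across a later read of errno,
which is why the attribute is unsafe here.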

llvm-svn: 122014
2010-12-17 01:09:43 +00:00
Frits van Bommel 9bbe849fc3 Fix a bug in the loop in JumpThreading::ProcessThreadableEdges() where it could falsely produce a MultipleDestSentinel value if the first predecessor ended with an 'indirectbr'. If that happened, it caused an unnecessary FindMostPopularDest() call.
This wasn't a correctness problem, but it broke the fast path for single-predecessor blocks.

llvm-svn: 121966
2010-12-16 12:16:00 +00:00
Dan Gohman e1a17a3473 Make memcpyopt TBAA-aware.
llvm-svn: 121944
2010-12-16 02:51:19 +00:00
Dan Gohman 4467aa5294 Preserve TBAA tags when doing load PRE.
llvm-svn: 121921
2010-12-15 23:53:55 +00:00
Dan Gohman a4fcd2418d Move Value::getUnderlyingObject to be a standalone
function so that it can live in Analysis instead of
VMCore.

llvm-svn: 121885
2010-12-15 20:02:24 +00:00
Frits van Bommel 3d1803495e Teach jump threading to "look through" a select when the branch direction of a terminator depends on it.
When it sees a promising select it now tries to figure out whether the condition of the select is known in any of the predecessors and if so it maps the operands appropriately.

llvm-svn: 121859
2010-12-15 09:51:20 +00:00
Owen Anderson 35609d97ae Fix PR8790, another instance where unreachable code can cause instruction simplification to fail,
this case involves a select that simplifies to itself.

llvm-svn: 121817
2010-12-15 00:55:35 +00:00
Owen Anderson 15c85c916f Cleanup trailing whitespace.
llvm-svn: 121816
2010-12-15 00:52:44 +00:00
Chris Lattner 73a58627c3 simplify code and reduce indentation
llvm-svn: 121670
2010-12-13 02:38:13 +00:00
Chris Lattner bc4457e317 enhance memcpyopt to zap memcpy's that have the same src/dst.
llvm-svn: 121362
2010-12-09 07:45:45 +00:00
Chris Lattner fd51c52ef6 fix PR8753, eliminating a case where we'd infinitely make a
substitution because it doesn't actually change the IR.  Patch by
Jakub Staszak!

llvm-svn: 121361
2010-12-09 07:39:50 +00:00
Frits van Bommel d2f4b09e10 Remove some dead code from the jump threading pass.
The last uses of these functions were removed in r113852 when LazyValueInfo was permanently enabled and removed the need for them.

llvm-svn: 121133
2010-12-07 13:08:07 +00:00
Jay Foad 583abbc4df PR5207: Change APInt methods trunc(), sext(), zext(), sextOrTrunc() and
zextOrTrunc(), and APSInt methods extend(), extOrTrunc() and new method
trunc(), to be const and to return a new value instead of modifying the
object in place.

llvm-svn: 121120
2010-12-07 08:25:19 +00:00
Frits van Bommel d9df6eaa9c Implement jump threading of 'indirectbr' by keeping track of whether we're looking for ConstantInt*s or BlockAddress*s.
llvm-svn: 121066
2010-12-06 23:36:56 +00:00
Chris Lattner 4dc53e37d9 Use a stronger predicate here, pointed out by Duncan
llvm-svn: 121040
2010-12-06 21:48:10 +00:00
Chris Lattner ca335e38cf add some DEBUG statements.
llvm-svn: 121038
2010-12-06 21:13:51 +00:00
Chris Lattner 94fbdf3814 Fix PR8728, a miscompilation I recently introduced. When optimizing
memcpy's like:
  memcpy(A, B)
  memcpy(A, C)

we cannot delete the first memcpy as dead if A and C might be aliases.
If so, we actually get:

  memcpy(A, B)
  memcpy(A, A)

which is not correct to transform into:

  memcpy(A, A)

This patch was heavily influenced by Jakub Staszak's patch in PR8728, thanks
Jakub!
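
For illustration only (a sketch, not part of the original commit): when C
aliases A, deleting the first copy changes the final contents of A. The
`copy_bytes` helper is a hypothetical stand-in for the memcpy intrinsic; a
plain forward byte copy is used so that the dst == src case is well defined
in C++.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the memcpy intrinsic.
static void copy_bytes(char *dst, const char *src, std::size_t n) {
    assert(dst && src);
    for (std::size_t i = 0; i != n; ++i)
        dst[i] = src[i];
}

// Both copies performed, as in the original program.
char both_copies(char *A, const char *B, const char *C) {
    copy_bytes(A, B, 1);  // memcpy(A, B)
    copy_bytes(A, C, 1);  // memcpy(A, C)
    return A[0];
}

// The first copy deleted as "dead"; only valid if C cannot alias A.
char first_copy_deleted(char *A, const char *B, const char *C) {
    (void)B;
    copy_bytes(A, C, 1);  // memcpy(A, C)
    return A[0];
}
```

If C == A, the first function leaves B's byte in A, while the second leaves
A's old byte untouched.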

llvm-svn: 120974
2010-12-06 01:48:06 +00:00
Frits van Bommel 76244867cf Refactor jump threading.
Should have no functional change other than the order of two transformations that are mutually-exclusive and the exact formatting of debug output.
Internally, it now stores the ConstantInt*s as Constant*s, and actual undef values instead of nulls.

llvm-svn: 120946
2010-12-05 19:06:41 +00:00
Frits van Bommel 5e75ef4a8e Remove trailing whitespace.
llvm-svn: 120945
2010-12-05 19:02:47 +00:00
Chris Lattner 1c577b54b0 fix a bozo bug I introduced in r119930, causing a miscompile of
20040709-1.c from the gcc testsuite.  I was using the size of a
pointer instead of the pointee.  This fixes rdar://8713376

llvm-svn: 120519
2010-12-01 01:24:55 +00:00
Chris Lattner 903add84d9 Enhance DSE to handle the variable index case in PR8657.
llvm-svn: 120498
2010-11-30 23:43:23 +00:00
Chris Lattner c0f3379ae0 teach DSE to use GetPointerBaseWithConstantOffset to analyze
may-aliasing stores that partially overlap with different base
pointers.  This implements PR6043 and the non-variable part of
PR8657

llvm-svn: 120485
2010-11-30 23:05:20 +00:00
Chris Lattner e28618de59 move GetPointerBaseWithConstantOffset out of GVN into ValueTracking.h
llvm-svn: 120476
2010-11-30 22:25:26 +00:00
Chris Lattner 50162e3c2a remove a fixed fixme
llvm-svn: 120474
2010-11-30 22:18:11 +00:00
Chris Lattner 6712251f41 Make DeleteDeadInstruction be a static function, move some code around.
llvm-svn: 120471
2010-11-30 21:58:14 +00:00
Chris Lattner 51d67ce2ff switch RemoveAccessedObjects to use AliasAnalysis::Location to simplify
the code.  We now get accurate sizes on Loads, though it surely doesn't
matter in practice.

llvm-svn: 120469
2010-11-30 21:47:58 +00:00
Chris Lattner f80b39986f two improvements to RemoveAccessedObjects:
1. if the underlying pointer passed in can be resolved
   to any argument or alloca, then we don't need to scan.
   Previously we would only avoid the scan if the alloca
   or byval was actually considered dead.
2. The dead store processing code is itself completely
   dead and didn't handle volatile stores right anyway,
   so delete it.  This allows simplifying the interface
   to RemoveAccessedObjects.

llvm-svn: 120467
2010-11-30 21:38:30 +00:00
Chris Lattner 7fe08b67fa remove the "undead" terminology, which is nonstandard and never
made sense to me.  We now have a set of dead stack objects, and
they become live when loaded.  Fix a theoretical problem where
we'd pass in the wrong pointer to the alias query.

llvm-svn: 120465
2010-11-30 21:32:12 +00:00
Chris Lattner 127818d746 move call handling in handleEndBlock up a bit, and simplify it.
If the call might read all the allocas, stop scanning early.
Convert a vector to smallvector, shrink SmallPtrSet to 16 instead
of 64 to avoid crazy linear scans.

llvm-svn: 120463
2010-11-30 21:18:46 +00:00
Dale Johannesen d3a58c8fa1 Avoid exponential growth of a table. It feels like
there should be a better way to do this.  PR 8679.

llvm-svn: 120457
2010-11-30 20:23:21 +00:00
Chris Lattner 60a8b3dab8 various cleanups and code simplification
llvm-svn: 120454
2010-11-30 19:48:15 +00:00
Chris Lattner 51c28a93cc make getPointerSize a static function. Add ivars to DSE for
AA and MD pass info instead of using getAnalysis<> all over.

llvm-svn: 120453
2010-11-30 19:34:42 +00:00
Chris Lattner 77d79fa25f reduce indentation, clean up TD use a bit.
llvm-svn: 120452
2010-11-30 19:28:23 +00:00
Chris Lattner b63ba73b1b enhance isRemovable to refuse to delete volatile mem transfers
now that DSE hacks on them.  This fixes a regression I introduced,
by generalizing DSE to hack on transfers.

llvm-svn: 120445
2010-11-30 19:12:10 +00:00
Chris Lattner 58b779e9c2 Rewrite the main DSE loop to be written in terms of reasoning
about pairs of AA::Location's instead of looking for MemDep's
"Def" predicate.  This is more powerful and general, handling
memset/memcpy/store all uniformly, and implementing PR8701 and
probably obsoleting parts of memcpyoptimizer.

This also fixes an obscure bug with init.trampoline and i8
stores, but I'm not surprised it hasn't been hit yet.  Enhancing
init.trampoline to carry the size that it stores would allow
DSE to be much more aggressive about optimizing them.

llvm-svn: 120406
2010-11-30 07:23:21 +00:00
Anders Carlsson e3ea1cba79 Add a puts optimization that converts puts("") to putchar('\n').
llvm-svn: 120398
2010-11-30 06:19:18 +00:00
Chris Lattner 3590ef817c rename a function and reduce some indentation, no functionality change.
llvm-svn: 120391
2010-11-30 05:30:45 +00:00
Chris Lattner 2227a8a192 rename doesClobberMemory -> hasMemoryWrite to be more specific, and
remove an actively-wrong comment.

llvm-svn: 120378
2010-11-30 01:37:52 +00:00
Chris Lattner 9d179d911d clean up handling of 'free', detangling it from everything else.
It can be seriously improved, but at least now it isn't intertwined
with the other logic.

llvm-svn: 120377
2010-11-30 01:28:33 +00:00
Chris Lattner 9a146372b5 Teach basicaa that memset's modref set is at worst "mod" and never
contains "ref".

Enhance DSE to use a modref query instead of a store-specific hack
to generalize the "ignore may-alias stores" optimization to handle
memset and memcpy.

llvm-svn: 120368
2010-11-30 00:28:45 +00:00
Chris Lattner c3c754f750 my previous patch would cause us to start deleting some volatile
stores, fix and add a testcase.

llvm-svn: 120363
2010-11-30 00:12:39 +00:00
Chris Lattner d4f1090948 two changes to DSE that shouldn't affect anything:
1. Don't bother trying to optimize:

lifetime.end(ptr)
store(ptr)

as it is undefined, and therefore shouldn't exist.

2. Move the 'storing a loaded pointer' xform up, simplifying
  the may-aliased store code.

llvm-svn: 120359
2010-11-30 00:01:19 +00:00
Chris Lattner b4df1d5a3e prune an llvmcontext include and simplify some code.
llvm-svn: 120347
2010-11-29 23:35:33 +00:00
Chris Lattner 2e8793482c fix PR8677, patch by Jakub Staszak!
llvm-svn: 120325
2010-11-29 21:59:31 +00:00
Owen Anderson 8ba5f39f70 Second attempt at fixing the performance regressions introduced
by my recent GVN improvement.  Looking through a single layer of
PHI nodes when attempting to sink GEPs, we need to iteratively
look through arbitrary PHI nests.

llvm-svn: 120202
2010-11-27 08:15:55 +00:00
Nick Lewycky b8de00ee07 Treat a call of function pointer like a load of the pointer when considering
whether the pointer can be replaced with the global variable it is a copy of.
Fixes PR8680.

llvm-svn: 120126
2010-11-24 22:04:20 +00:00
Duncan Sands 433c1679cf Replace calls to ConstantFoldInstruction with calls to SimplifyInstruction
in two places that are really interested in simplified instructions, not
constants.

llvm-svn: 120044
2010-11-23 20:26:33 +00:00
Duncan Sands bb2cd025a9 Constant folding here is pointless, because InstructionSimplify
(which does constant folding and more) is called a few lines
later.

llvm-svn: 120042
2010-11-23 20:24:21 +00:00
Chris Lattner fc9aead6fd fix comment
llvm-svn: 119948
2010-11-21 19:05:34 +00:00
Chris Lattner 5957229659 rework some DSE paths to use the newly-public "getPointerDependencyFrom"
method in MemDep instead of inserting an instruction, doing a query,
then removing it.  Neither operation is effectively cached.

llvm-svn: 119930
2010-11-21 08:06:10 +00:00
Chris Lattner e48c31ce33 implement PR8576, deleting dead stores with intervening may-alias stores.
llvm-svn: 119927
2010-11-21 07:34:32 +00:00
Chris Lattner 58f9f58716 Implement PR8644: forwarding a memcpy value to a byval,
allowing the memcpy to be eliminated.

Unfortunately, the requirements on byval's without explicit 
alignment are really weak and impossible to predict in the 
mid-level optimizer, so this doesn't kick in much with current
frontends.  The fix is to change clang to set alignment on all
byval arguments.

llvm-svn: 119916
2010-11-21 00:28:59 +00:00
Benjamin Kramer ddd1b7b801 Simplify code. No change in functionality.
llvm-svn: 119908
2010-11-20 18:43:35 +00:00
Owen Anderson ea326db47b Document the new GVN number table structure.
llvm-svn: 119865
2010-11-19 22:48:40 +00:00
Owen Anderson dfb8c3bbfc When folding addressing modes in CodeGenPrepare, attempt to look through PHI nodes
if all the operands of the PHI are equivalent.  This allows CodeGenPrepare to undo
unprofitable PRE transforms.

llvm-svn: 119853
2010-11-19 22:15:03 +00:00
Duncan Sands aef146b890 Factor code for testing whether replacing one value with another
preserves LCSSA form out of ScalarEvolution and into the LoopInfo
class.  Use it to check that SimplifyInstruction simplifications
are not breaking LCSSA form.  Fixes PR8622.

llvm-svn: 119727
2010-11-18 19:59:41 +00:00
Owen Anderson c21c100f3d Completely rework the datastructure GVN uses to represent the value number to leader mapping. Previously,
this was a tree of hashtables, and a query recursed into the table for the immediate dominator ad infinitum
if the initial lookup failed.  This led to really bad performance on tall, narrow CFGs.

We can instead replace it with what is conceptually a multimap of value numbers to leaders (actually
represented by a hashtable with a list of Value*'s as the value type), and then
determine which leader from that set to use very cheaply thanks to the DFS numberings maintained by
DominatorTree.  Because there are typically few duplicates of a given value, this scan tends to be
quite fast.  Additionally, we use a custom linked list and BumpPtr allocation to avoid any unnecessary
allocation in representing the value-side of the multimap.

This change brings with it a 15% (!) improvement in the total running time of GVN on 403.gcc, which I
think is pretty good considering that includes all the "real work" being done by MemDep as well.

The one downside to this approach is that we can no longer use GVN to perform simple conditional propagation,
but that seems like an acceptable loss since we now have LVI and CorrelatedValuePropagation to pick up
the slack.  If you see conditional propagation that's not happening, please file bugs against LVI or CVP.
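
For illustration only (a sketch, not part of the original commit): the
structure described above can be pictured as a hash table from value number to
a short list of candidate leaders, with the choice made by a DFS-interval
dominance test. All type and member names here are hypothetical, and a
std::vector stands in for the custom linked list and BumpPtr allocation.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for a dominator-tree node with DFS in/out numbers.
struct Block {
    unsigned dfsIn, dfsOut;
};

// A dominates B iff B's DFS interval nests inside A's.
inline bool dominates(const Block &a, const Block &b) {
    return a.dfsIn <= b.dfsIn && b.dfsOut <= a.dfsOut;
}

struct Leader {
    std::string value;   // stands in for Value*
    const Block *block;  // block in which the value is available
};

class LeaderTable {
    std::unordered_map<std::uint32_t, std::vector<Leader>> table_;

public:
    void add(std::uint32_t vn, std::string value, const Block *bb) {
        assert(bb);
        table_[vn].push_back({std::move(value), bb});
    }

    // Scans the (typically short) candidate list and returns the first
    // leader whose block dominates `at`, or nullptr if there is none.
    const Leader *find(std::uint32_t vn, const Block &at) const {
        auto it = table_.find(vn);
        if (it == table_.end())
            return nullptr;
        for (const Leader &l : it->second)
            if (dominates(*l.block, at))
                return &l;
        return nullptr;
    }
};
```

A lookup never walks up the dominator tree; it costs one hash probe plus a
scan of the candidates for that value number.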

llvm-svn: 119714
2010-11-18 18:32:40 +00:00
Chris Lattner 1385dff8c0 slightly simplify code and substantially improve comment. Instead of
saying "it would be bad", give an example of what is going on.

llvm-svn: 119695
2010-11-18 08:07:09 +00:00
Chris Lattner 731caac7c6 remove a pointless restriction from memcpyopt. It was
refusing to optimize two memcpy's like this:

copy A <- B
copy C <- A

if it couldn't prove that noalias(B,C).  We can eliminate
the copy by producing a memmove instead of memcpy.
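
For illustration only (a sketch, not part of the original commit): the second
copy can read from B directly, and using memmove keeps the result correct even
when B and C overlap. Function names are hypothetical; the first copy is kept
here because A may still be read later.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Original sequence: two copies through the temporary A.
void copy_via_temp(char *C, char *A, const char *B, std::size_t n) {
    assert(C && A && B);
    std::memcpy(A, B, n);  // copy A <- B
    std::memcpy(C, A, n);  // copy C <- A
}

// Rewritten sequence: the second copy reads B directly. memmove is used
// because nothing proves that B and C do not overlap.
void copy_direct(char *C, char *A, const char *B, std::size_t n) {
    assert(C && A && B);
    std::memcpy(A, B, n);   // copy A <- B
    std::memmove(C, B, n);  // copy C <- B, overlap-safe
}
```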

llvm-svn: 119694
2010-11-18 08:00:57 +00:00
Chris Lattner c274a83442 remove another pointless noalias check: M is a memcpy, so the
source and dest are known to not overlap.

llvm-svn: 119692
2010-11-18 07:39:57 +00:00
Chris Lattner 75cfe98534 use AA::isNoAlias instead of open coding it. Remove an extraneous noalias check:
there is no need to check to see if the source and dest of a memcpy are noalias,
behavior is undefined if not.

llvm-svn: 119691
2010-11-18 07:38:43 +00:00
Chris Lattner 1e37bbafbb finish a thought.
llvm-svn: 119690
2010-11-18 07:32:33 +00:00
Chris Lattner 7e9b2ea3bf rearrange some code, splitting memcpy/memcpy optimization
out of processMemCpy into its own function.

llvm-svn: 119687
2010-11-18 07:02:37 +00:00
Chris Lattner ac5701319b allow eliminating an alloca that is just copied from a constant global
if it is passed as a byval argument.  The byval argument will just be a
read, so it is safe to read from the original global instead.  This allows
us to promote away the %agg.tmp alloca in PR8582

llvm-svn: 119686
2010-11-18 06:41:51 +00:00
Chris Lattner f183d5c4be enhance the "alloca is just a memcpy from constant global"
optimization to ignore calls that obviously can't modify the alloca
because they are readonly/readnone.

llvm-svn: 119683
2010-11-18 06:26:49 +00:00
Chris Lattner 7aeae25c78 fix a small oversight in the "eliminate memcpy from constant global"
optimization.  If the alloca that is "memcpy'd from constant" also has
a memcpy from *it*, ignore it: it is a load.  We now optimize the testcase to:

define void @test2() {
  %B = alloca %T
  %a = bitcast %T* @G to i8*
  %b = bitcast %T* %B to i8*
  call void @llvm.memcpy.p0i8.p0i8.i64(i8* %b, i8* %a, i64 124, i32 4, i1 false)
  call void @bar(i8* %b)
  ret void
}

previously we would generate:

define void @test() {
  %B = alloca %T
  %b = bitcast %T* %B to i8*
  %G.0 = getelementptr inbounds %T* @G, i32 0, i32 0
  %tmp3 = load i8* %G.0, align 4
  %G.1 = getelementptr inbounds %T* @G, i32 0, i32 1
  %G.15 = bitcast [123 x i8]* %G.1 to i8*
  %1 = bitcast [123 x i8]* %G.1 to i984*
  %srcval = load i984* %1, align 1
  %B.0 = getelementptr inbounds %T* %B, i32 0, i32 0
  store i8 %tmp3, i8* %B.0, align 4
  %B.1 = getelementptr inbounds %T* %B, i32 0, i32 1
  %B.12 = bitcast [123 x i8]* %B.1 to i8*
  %2 = bitcast [123 x i8]* %B.1 to i984*
  store i984 %srcval, i984* %2, align 1
  call void @bar(i8* %b)
  ret void
}

llvm-svn: 119682
2010-11-18 06:20:47 +00:00
Dan Gohman 20d9ce21ef Move SCEV::dominates and properlyDominates to ScalarEvolution.
llvm-svn: 119570
2010-11-17 21:41:58 +00:00
Dan Gohman afd6db9932 Move SCEV::isLoopInvariant and hasComputableLoopEvolution to be member
functions of ScalarEvolution, in preparation for memoization and
other optimizations.

llvm-svn: 119562
2010-11-17 21:23:15 +00:00
Dan Gohman 1ee6d24072 Reference ScalarEvolution by name rather than directly in LICM,
to avoid an unneeded dependence.

llvm-svn: 119557
2010-11-17 20:50:07 +00:00
Duncan Sands 72313843d5 Remove dead code in GVN: now that SimplifyInstruction is called
systematically, CollapsePhi will always return null here.  Note
that CollapsePhi did an extra check, isSafeReplacement, which
the SimplifyInstruction logic does not do.  I think that check
was bogus - I guess we will soon find out!  (It was originally
added in commit 41998 without a testcase).

llvm-svn: 119456
2010-11-17 04:05:21 +00:00
Duncan Sands 637049515f Have a few places that want to simplify phi nodes use SimplifyInstruction
rather than calling hasConstantValue.  No intended functionality change.

llvm-svn: 119352
2010-11-16 17:41:24 +00:00
Duncan Sands b99f39b9f6 If dom tree information is available, make it possible to pass
it to get better phi node simplification.

llvm-svn: 119055
2010-11-14 18:36:10 +00:00
Duncan Sands 246b71c596 Have GVN simplify instructions as it goes. For example, consider
"%z = %x and %y".  If GVN can prove that %y equals %x, then it turns
this into "%z = %x and %x".  With the new code, %z will be replaced
with %x everywhere (and then deleted).  Previously %z would be value
numbered too, which is a waste of time.  Also, while a clever value
numbering algorithm would give %z the same value number as %x, our
current one doesn't do so (at least I don't think it does).  The new
logic has an essentially equivalent effect to what you would get if
%z was given the same value number as %x, i.e. it should make value
numbering smarter.  While there, get hold of target data once at the
start rather than a gazillion times all over the place.

llvm-svn: 118923
2010-11-12 21:10:24 +00:00
Dan Gohman d4b7fff2e8 Enhance DSE to handle the case where a free call makes more than
one store dead. This is especially noticeable in
SingleSource/Benchmarks/Shootout/objinst.

llvm-svn: 118875
2010-11-12 02:19:17 +00:00
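
Note on r118875: a small C sketch of the pattern described, assuming a hypothetical struct and function name. Two stores go to the same heap object and neither is read before the object is freed, so the one free call makes both of them dead. Whether a given compiler actually removes them is not claimed here.

```c
#include <assert.h>
#include <stdlib.h>

struct point { int x; int y; };

/* Both stores through p are dead: nothing reads p->x or p->y before
 * free(p), and the memory cannot be legally read afterwards. */
static int sum_with_scratch(int a, int b) {
    struct point *p = malloc(sizeof *p);
    assert(p != NULL);
    p->x = a;   /* dead store, killed by the free below */
    p->y = b;   /* dead store, killed by the same free */
    free(p);
    return a + b;
}
```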
Dan Gohman 65316d6749 Add helper functions for computing the Location of load, store,
and vaarg instructions.

llvm-svn: 118845
2010-11-11 21:50:19 +00:00
Dan Gohman 0cc4c7516e Make Sink tbaa-aware.
llvm-svn: 118788
2010-11-11 16:21:47 +00:00
Dan Gohman c3b4ea7b7d It's safe to sink some instructions which are not safe to speculatively
execute. Make Sink's predicate more precise.

llvm-svn: 118787
2010-11-11 16:20:28 +00:00
Dan Gohman 0a6021a54d Enhance GVN to do more precise alias queries for non-local memory
references. For example, this allows gvn to eliminate the load in
this example:

  void foo(int n, int* p, int *q) {
    p[0] = 0;
    p[1] = 1;
    if (n) {
      *q = p[0];
    }
  }

llvm-svn: 118714
2010-11-10 20:37:15 +00:00
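
Note on r118714: a hand-written C sketch of the effect on the example above. The store to p[1] sits between the store to p[0] and the later load of p[0]; a precise alias query shows that p[1] cannot overlap p[0], so the load can be replaced with the stored value 0. The name foo_forwarded is hypothetical, and the second function is a source-level illustration of the result rather than actual compiler output.

```c
#include <assert.h>

/* The function from the commit message. */
static void foo(int n, int *p, int *q) {
    assert(p && q);
    p[0] = 0;
    p[1] = 1;
    if (n) {
        *q = p[0];   /* non-local load of p[0] */
    }
}

/* Equivalent form once the load is eliminated: the only store between
 * "p[0] = 0" and the load is to p[1], which does not alias p[0]. */
static void foo_forwarded(int n, int *p, int *q) {
    assert(p && q);
    p[0] = 0;
    p[1] = 1;
    if (n) {
        *q = 0;      /* forwarded value of p[0] */
    }
}
```

The two functions agree even when q points into p, because in either case the value written through q is 0.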
Dan Gohman d209911642 Use getValueOperand() and getPointerOperand() on load and store
instructions instead of hard-coding operand numbers.

llvm-svn: 118698
2010-11-10 19:03:33 +00:00
Dan Gohman 0f17507478 Teach LICM and AliasSetTracker about AccessesArgumentsReadonly.
llvm-svn: 118618
2010-11-09 19:58:21 +00:00
Owen Anderson 374e1464ae Give up on doing in-line instruction simplification during correlated value propagation. Instruction simplification
needs to be guaranteed never to be run on an unreachable block.  However, earlier block simplifications may have
changed the CFG to make blocks that were reachable when we began our iteration unreachable by the time we try to
simplify them. (Note that this also means that our depth-first iterators were potentially being invalidated).

This should not have a large impact on code quality, since later runs of instcombine should pick up these simplifications.
Fixes PR8506.

llvm-svn: 117709
2010-10-29 21:05:17 +00:00
John Thompson e8360b7182 Inline asm multiple alternative constraints development phase 2 - improved basic logic, added initial platform support.
llvm-svn: 117667
2010-10-29 17:29:13 +00:00
Dan Gohman f372cf869b Reapply r116831 and r116839, converting AliasAnalysis to use
uint64_t, plus fixes for places I missed before.

llvm-svn: 116875
2010-10-19 22:54:46 +00:00
Dan Gohman b4aa503501 Revert r116831 and r116839, which are breaking selfhost builds.
llvm-svn: 116858
2010-10-19 21:06:16 +00:00
Owen Anderson a4fefc1949 Passes do not need to recursively initialize passes that they preserve, if
they do not also require them.  This allows us to reduce inter-pass linkage
dependencies.

llvm-svn: 116854
2010-10-19 20:08:44 +00:00
Dan Gohman 896ac62346 Oops, check in all the files for converting AliasAnalysis to
use uint64_t.

llvm-svn: 116839
2010-10-19 18:08:27 +00:00
Owen Anderson 6c18d1aac0 Get rid of static constructors for pass registration. Instead, every pass exposes an initializeMyPassFunction(), which
must be called in the pass's constructor.  This function uses static dependency declarations to recursively initialize
the pass's dependencies.

Clients that only create passes through the createFooPass() APIs will require no changes.  Clients that want to use the
CommandLine options for passes will need to manually call the appropriate initialization functions in PassInitialization.h
before parsing commandline arguments.

I have tested this with all standard configurations of clang and llvm-gcc on Darwin.  It is possible that there are problems
with the static dependencies that will only be visible with non-standard options.  If you encounter any crash in pass
registration/creation, please send the testcase to me directly.

llvm-svn: 116820
2010-10-19 17:21:58 +00:00
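
Note on r116820: a language-neutral sketch in C of the scheme described, where each pass exposes an explicit initialization function that first initializes its declared dependencies. Everything here (the registry struct, the pass names, the call counter) is hypothetical and exists only to illustrate recursive, idempotent initialization; it is not the LLVM API.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical registry: one "registered" flag per pass, plus a counter
 * so that repeated initialization can be observed. */
struct registry {
    bool dom_tree;
    bool loop_info;
    bool my_pass;
    int init_calls;
};

static void initialize_dom_tree(struct registry *r) {
    assert(r);
    if (r->dom_tree) return;      /* already registered */
    r->dom_tree = true;
    r->init_calls++;
}

static void initialize_loop_info(struct registry *r) {
    assert(r);
    if (r->loop_info) return;
    initialize_dom_tree(r);       /* declared dependency */
    r->loop_info = true;
    r->init_calls++;
}

/* Called explicitly (for example from the pass's constructor) instead of
 * relying on a static constructor running before main(). */
static void initialize_my_pass(struct registry *r) {
    assert(r);
    if (r->my_pass) return;
    initialize_dom_tree(r);       /* declared dependencies first */
    initialize_loop_info(r);
    r->my_pass = true;
    r->init_calls++;
}
```

Because every function checks its own flag first, a client can call the top-level initializer any number of times and each pass is still registered exactly once.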
Dan Gohman 14fe8cf238 Consistently use AliasAnalysis::UnknownSize instead of hardcoding ~0u.
llvm-svn: 116815
2010-10-19 17:06:23 +00:00
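
Note on r116815: a short C sketch of why a hardcoded ~0u becomes a hazard once sizes are widened to uint64_t. The macro and function names are hypothetical, and the sentinel is assumed to be the all-ones value of the size type. On platforms where unsigned int is 32 bits, ~0u widens to 0x00000000FFFFFFFF, which no longer compares equal to a 64-bit all-ones sentinel.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a named "size unknown" sentinel,
 * assumed here to be all ones in the 64-bit size type. */
#define UNKNOWN_SIZE (~UINT64_C(0))

static int is_unknown_size(uint64_t size) {
    return size == UNKNOWN_SIZE;
}

/* What a leftover hardcoded ~0u turns into when passed as a 64-bit size:
 * the 32-bit all-ones pattern is zero-extended, not sign-extended. */
static uint64_t hardcoded_sentinel(void) {
    assert(sizeof(unsigned) < sizeof(uint64_t));  /* assumption of this sketch */
    return ~0u;
}
```

Using the named constant everywhere keeps the comparison correct regardless of the width of the size type.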
Dan Gohman 71af9db0e8 Make AliasSetTracker TBAA-aware, enabling TBAA-enabled LICM.
llvm-svn: 116743
2010-10-18 20:44:50 +00:00
Benjamin Kramer 1dc34b48dd Eliminate some calls to Value::getNameStr.
llvm-svn: 116670
2010-10-16 11:28:23 +00:00
Owen Anderson 18e4fed3fa Generalize MemCpyOpt's handling of call slot forwarding to function properly when the call slot
forwarding is implemented with a load/store pair rather than a memcpy.

llvm-svn: 116637
2010-10-15 22:52:12 +00:00
Rafael Espindola 229e38f0fe Be more consistent in using ValueToValueMapTy.
llvm-svn: 116387
2010-10-13 01:36:30 +00:00
Owen Anderson 8ac477ffb5 Begin adding static dependence information to passes, which will allow us to
perform initialization without static constructors AND without explicit initialization
by the client.  For the moment, passes are required to initialize both their
(potential) dependencies and any passes they preserve.  I hope to be able to relax
the latter requirement in the future.

llvm-svn: 116334
2010-10-12 19:48:12 +00:00
Dan Gohman 2fd85d7cd2 Filter out illegal formulae after updating offsets, not before, so that
formulae which become illegal as a result of the offset updating don't
escape.

This is for rdar://8529692. No testcase yet, because the given cases
hit use-list ordering differences.

llvm-svn: 116093
2010-10-08 19:33:26 +00:00
Daniel Dunbar d4e9c3b43a Update CMake.
llvm-svn: 116034
2010-10-08 02:30:03 +00:00
Dan Gohman 5947e1626a Delete the FormulaSorter class and inline its one method into its
one user. This code will be restructured soon and FormulaSorter
is getting in the way.

llvm-svn: 116012
2010-10-07 23:52:18 +00:00
Dan Gohman 1b61fd9bff Fix a spello.
llvm-svn: 116011
2010-10-07 23:43:09 +00:00
Dan Gohman 34f37e0d04 Charge a formula for explicit multiplies on scaled registers too,
not just base registers.

llvm-svn: 116010
2010-10-07 23:41:58 +00:00
Dan Gohman 49d638b45a Use size_t for consistency.
llvm-svn: 116009
2010-10-07 23:37:58 +00:00
Dan Gohman 8e72611058 When merging one use into another, transfer the offsets from
the old use to the new one.

llvm-svn: 116008
2010-10-07 23:36:45 +00:00
Dan Gohman a7b68d6d95 Fix LSR to keep the RegUseTracker up to date when combining users.
This doesn't usually matter, because the other heuristics usually
succeed regardless, but it's good to keep the register use
bookkeeping consistent.

llvm-svn: 116005
2010-10-07 23:33:43 +00:00
Devang Patel 57da4caa85 Remove LoopIndexSplit pass. It is neither maintained nor used by anyone.
llvm-svn: 116004
2010-10-07 23:29:37 +00:00
Owen Anderson df7a4f2515 Now with fewer extraneous semicolons!
llvm-svn: 115996
2010-10-07 22:25:06 +00:00
Owen Anderson 4698c5d7f7 Next step on the getting-rid-of-static-ctors train: begin adding per-library
initialization functions that initialize the set of passes implemented in
that library.  Add C bindings for these functions as well.

llvm-svn: 115927
2010-10-07 17:55:47 +00:00
Owen Anderson 13a642da0b Now that the profitable bits of EnableFullLoadPRE have been enabled by default, rip out the remainder.
Anyone interested in more general PRE would be better served by implementing it separately, to get real
anticipation calculation, etc.

llvm-svn: 115337
2010-10-01 20:02:55 +00:00
Eric Christopher 3ad2f3a2f2 Fix the other half of the alignment changing issue by making sure that the
memcpy alignment is the minimum of the incoming alignments.

Fixes PR 8266.

llvm-svn: 115305
2010-10-01 09:02:05 +00:00
Dale Johannesen dd224d2333 Massive rewrite of MMX:
The x86_mmx type is used for MMX intrinsics, parameters and
return values where these use MMX registers, and is also
supported in load, store, and bitcast.

Only the above operations generate MMX instructions, and optimizations
do not operate on or produce MMX intrinsics. 

MMX-sized vectors <2 x i32> etc. are lowered to XMM or split into
smaller pieces.  Optimizations may occur on these forms and the
result cast back to x86_mmx, provided the result feeds into a
previous existing x86_mmx operation.

The point of all this is to prevent optimizations from introducing
MMX operations, which is unsafe due to the EMMS problem.

llvm-svn: 115243
2010-09-30 23:57:10 +00:00
Owen Anderson 3170a25a84 We do want to allow LoadPRE to perform LICM-like transformations: we already consider PHI nodes to be negligible for
code size (making this transform code size neutral), and it allows us to hoist values out of loops, which is always
a good thing.

llvm-svn: 115205
2010-09-30 20:53:04 +00:00
Jakob Stoklund Olesen eb12f49fb7 Try again to disable critical edge splitting in CodeGenPrepare.
The bug that broke i386 linux has been fixed in r115191.

llvm-svn: 115204
2010-09-30 20:51:52 +00:00