Commit Graph

2490 Commits

Author SHA1 Message Date
Chris Lattner 054d2a8525 merge tests into one crash.ll test.
llvm-svn: 123220
2011-01-11 07:50:07 +00:00
Chris Lattner 63fe78de68 remove a bogus assertion: the latch block of a loop is not
neccesarily an uncond branch to the header.  This fixes 
PR8955 (the assertion tripping).

llvm-svn: 123219
2011-01-11 07:47:59 +00:00
Chandler Carruth b1e7f557b7 Teach constant folding to perform conversions from constant floating
point values to their integer representation through the SSE intrinsic
calls. This is the last part of a README.txt entry for which I have real
world examples.

llvm-svn: 123206
2011-01-11 01:07:24 +00:00
Chandler Carruth fdf4969149 FileCheck-ize a test, and move a no-longer calling test case to another
file and make it actually test something...

llvm-svn: 123205
2011-01-11 01:07:20 +00:00
Owen Anderson d490c2d2ae Fix a random missed optimization by making InstCombine more aggressive when determining which bits are demanded by
a comparison against a constant.

llvm-svn: 123203
2011-01-11 00:36:45 +00:00
Chandler Carruth cf414cf0a6 Teach instcombine about the rest of the SSE and SSE2 conversion
intrinsics element dependencies. Reviewed by Nick.

llvm-svn: 123161
2011-01-10 07:19:37 +00:00
Chandler Carruth 7bb282ebb1 Fold two related tests into the newly FileCheck-ized test, migrating
them to FileCheck as well.

llvm-svn: 123154
2011-01-10 02:53:58 +00:00
Chandler Carruth ef7aac5961 Clean up and FileCheck-ize a test.
llvm-svn: 123153
2011-01-10 02:53:54 +00:00
Chris Lattner ec1387cf4b fix typo
llvm-svn: 123148
2011-01-10 02:33:34 +00:00
Chris Lattner 4662bd4b13 another (more) aggressive attempt to bring llvm-gcc-i386-linux-selfhost
back to life.

llvm-svn: 123146
2011-01-10 00:47:34 +00:00
Chris Lattner 1017fa6746 temporarily disable memset formation from memsets in an effort to restore buildbot stability.
llvm-svn: 123144
2011-01-09 23:52:48 +00:00
Tobias Grosser cc21c4aa98 Instcombine: Fix pattern where the sext did not dominate the icmp using it
llvm-svn: 123121
2011-01-09 16:00:11 +00:00
Chris Lattner 9a1d63ba9f Merge memsets followed by neighboring memsets and other stores into
larger memsets.  Among other things, this fixes rdar://8760394 and
allows us to handle "Example 2" from http://blog.regehr.org/archives/320,
compiling it into a single 4096-byte memset:

_mad_synth_mute:                        ## @mad_synth_mute
## BB#0:                                ## %entry
	pushq	%rax
	movl	$4096, %esi             ## imm = 0x1000
	callq	___bzero
	popq	%rax
	ret

llvm-svn: 123089
2011-01-08 21:19:19 +00:00
Chris Lattner 5120ebf184 fix an issue in IsPointerOffset that prevented us from recognizing that
P and P+1 are relative to the same base pointer.

llvm-svn: 123087
2011-01-08 21:07:56 +00:00
Chris Lattner 4dc1fd938f enhance memcpyopt to merge a store and a subsequent
memset into a single larger memset.

llvm-svn: 123086
2011-01-08 20:54:51 +00:00
Chris Lattner 9dbbc49f74 merge two tests and filecheckify
llvm-svn: 123082
2011-01-08 20:27:22 +00:00
Chris Lattner 59c82f850d When loop rotation happens, it is *very* common for the duplicated condbr
to be foldable into an uncond branch.  When this happens, we can make a
much simpler CFG for the loop, which is important for nested loop cases
where we want the outer loop to be aggressively optimized.

Handle this case more aggressively.  For example, previously on
phi-duplicate.ll we would get this:


define void @test(i32 %N, double* %G) nounwind ssp {
entry:
  %cmp1 = icmp slt i64 1, 1000
  br i1 %cmp1, label %bb.nph, label %for.end

bb.nph:                                           ; preds = %entry
  br label %for.body

for.body:                                         ; preds = %bb.nph, %for.cond
  %j.02 = phi i64 [ 1, %bb.nph ], [ %inc, %for.cond ]
  %arrayidx = getelementptr inbounds double* %G, i64 %j.02
  %tmp3 = load double* %arrayidx
  %sub = sub i64 %j.02, 1
  %arrayidx6 = getelementptr inbounds double* %G, i64 %sub
  %tmp7 = load double* %arrayidx6
  %add = fadd double %tmp3, %tmp7
  %arrayidx10 = getelementptr inbounds double* %G, i64 %j.02
  store double %add, double* %arrayidx10
  %inc = add nsw i64 %j.02, 1
  br label %for.cond

for.cond:                                         ; preds = %for.body
  %cmp = icmp slt i64 %inc, 1000
  br i1 %cmp, label %for.body, label %for.cond.for.end_crit_edge

for.cond.for.end_crit_edge:                       ; preds = %for.cond
  br label %for.end

for.end:                                          ; preds = %for.cond.for.end_crit_edge, %entry
  ret void
}

Now we get the much nicer:

define void @test(i32 %N, double* %G) nounwind ssp {
entry:
  br label %for.body

for.body:                                         ; preds = %entry, %for.body
  %j.01 = phi i64 [ 1, %entry ], [ %inc, %for.body ]
  %arrayidx = getelementptr inbounds double* %G, i64 %j.01
  %tmp3 = load double* %arrayidx
  %sub = sub i64 %j.01, 1
  %arrayidx6 = getelementptr inbounds double* %G, i64 %sub
  %tmp7 = load double* %arrayidx6
  %add = fadd double %tmp3, %tmp7
  %arrayidx10 = getelementptr inbounds double* %G, i64 %j.01
  store double %add, double* %arrayidx10
  %inc = add nsw i64 %j.01, 1
  %cmp = icmp slt i64 %inc, 1000
  br i1 %cmp, label %for.body, label %for.end

for.end:                                          ; preds = %for.body
  ret void
}

With all of these recent changes, we are now able to compile:

void foo(char *X) {
 for (int i = 0; i != 100; ++i) 
   for (int j = 0; j != 100; ++j)
     X[j+i*100] = 0;
}

into a single memset of 10000 bytes.  This series of changes
should also be helpful for other nested loop scenarios as well.

llvm-svn: 123079
2011-01-08 19:59:06 +00:00
Chris Lattner 063dca0f6a Three major changes:
1. Rip out LoopRotate's domfrontier updating code.  It isn't
   needed now that LICM doesn't use DF and it is super complex
   and gross.
2. Make DomTree updating code a lot simpler and faster.  The 
   old loop over all the blocks was just to find a block??
3. Change the code that inserts the new preheader to just use
   SplitCriticalEdge instead of doing an overcomplex 
   reimplementation of it.

No behavior change, except for the name of the inserted preheader.

llvm-svn: 123072
2011-01-08 18:52:51 +00:00
Frits van Bommel 6a1fb8f235 Fix a bug in r123034 (trying to sext/zext non-integers) and clean up a little.
llvm-svn: 123061
2011-01-08 10:51:36 +00:00
Chris Lattner 8c5defd0b0 Have loop-rotate simplify instructions (yay instsimplify!) as it clones
them into the loop preheader, eliminating silly instructions like
"icmp i32 0, 100" in fixed tripcount loops.  This also better exposes the 
bigger problem with loop rotate that I'd like to fix: once this has been
folded, the duplicated conditional branch *often* turns into an uncond branch.

Not aggressively handling this is pessimizing later loop optimizations 
somethin' fierce by making "dominates all exit blocks" checks fail.

llvm-svn: 123060
2011-01-08 08:24:46 +00:00
Tobias Grosser fc3d7f664b InstCombine: Match min/max hidden by sext/zext
X = sext x; x >s c ? X : C+1 --> X = sext x; X <s C+1 ? C+1 : X
X = sext x; x <s c ? X : C-1 --> X = sext x; X >s C-1 ? C-1 : X
X = zext x; x >u c ? X : C+1 --> X = zext x; X <u C+1 ? C+1 : X
X = zext x; x <u c ? X : C-1 --> X = zext x; X >u C-1 ? C-1 : X
X = sext x; x >u c ? X : C+1 --> X = sext x; X <u C+1 ? C+1 : X
X = sext x; x <u c ? X : C-1 --> X = sext x; X >u C-1 ? C-1 : X

Instead of calculating this with mixed types promote all to the
larger type. This enables scalar evolution to analyze this
expression. PR8866

llvm-svn: 123034
2011-01-07 21:33:14 +00:00
Benjamin Kramer 134cde912a Revert 122959, it needs more thought. Add it back to README.txt with additional notes.
llvm-svn: 123030
2011-01-07 20:42:20 +00:00
Benjamin Kramer ae67cc13a9 InstCombine: Turn _chk functions into the "unsafe" variant if length and max langth are equal.
This happens when we take the (non-constant) length from a malloc.

llvm-svn: 122961
2011-01-06 14:22:52 +00:00
Benjamin Kramer 799b011276 InstCombine: If we call llvm.objectsize on a malloc call we can replace it with the size passed to malloc.
llvm-svn: 122959
2011-01-06 13:11:05 +00:00
Benjamin Kramer a76cc117e0 InstCombine: Teach llvm.objectsize folding to look through GEPs.
llvm-svn: 122958
2011-01-06 13:07:49 +00:00
Chris Lattner 5858e091a6 implement constant folding support for an exotic constant expr:
ret i64 ptrtoint (i8* getelementptr ([1000 x i8]* @X, i64 1, i64 sub (i64 0, i64 ptrtoint ([1000 x i8]* @X to i64))) to i64)

to "ret i64 1000".  This allows us to correctly compute the trip count
on a loop in PR8883, which occurs with std::fill on a char array.  This
allows us to transform it into a memset with a constant size.

llvm-svn: 122950
2011-01-06 06:19:46 +00:00
Chris Lattner c86e67e110 fix an off-by-one bug that caused a crash analyzing
ashr's with huge shift amounts, PR8896

llvm-svn: 122814
2011-01-04 18:19:15 +00:00
Chris Lattner 8643810ede Teach loop-idiom to turn a loop containing a memset into a larger memset
when safe.

The testcase is basically this nested loop:
void foo(char *X) {
  for (int i = 0; i != 100; ++i) 
    for (int j = 0; j != 100; ++j)
      X[j+i*100] = 0;
}

which gets turned into a single memset now.  clang -O3 doesn't optimize
this yet though due to a phase ordering issue I haven't analyzed yet.

llvm-svn: 122806
2011-01-04 07:46:33 +00:00
Chris Lattner bde6ec1db6 Duncan deftly points out that readnone functions aren't
invalidated by stores, so they can be handled as 'simple'
operations.

llvm-svn: 122785
2011-01-03 23:38:13 +00:00
Chris Lattner 9e5e9ed79a earlycse can do trivial with-a-block dead store
elimination as well.  This deletes 60 stores in 176.gcc
that largely come from bitfield code.

llvm-svn: 122736
2011-01-03 04:17:24 +00:00
Chris Lattner e0e32a9ef0 now that loads are in their own table, we can implement
store->load forwarding.  This allows EarlyCSE to zap 600 more
loads from 176.gcc.

llvm-svn: 122732
2011-01-03 03:46:34 +00:00
Chris Lattner 0446bb23f8 add a testcase for readonly call CSE
llvm-svn: 122730
2011-01-03 03:33:47 +00:00
Chris Lattner b9a8efc960 Teach EarlyCSE to do trivial CSE of loads and read-only calls.
On 176.gcc, this catches 13090 loads and calls, and increases the
number of simple instructions CSE'd from 29658 to 36208.

llvm-svn: 122727
2011-01-03 03:18:43 +00:00
Chris Lattner 8fac5db251 add DEBUG and -stats output to earlycse.
Teach it to CSE the rest of the non-side-effecting instructions.

llvm-svn: 122716
2011-01-02 23:19:45 +00:00
Chris Lattner 18ae5436b1 Enhance earlycse to do CSE of casts, instsimplify and die.
Add a testcase.

llvm-svn: 122715
2011-01-02 23:04:14 +00:00
Chris Lattner 9c69406f2b fix a miscompilation of tramp3d-v4: when forming a memcpy, we have to make
sure that the loop we're promoting into a memcpy doesn't mutate the input
of the memcpy.  Before we were just checking that the dest of the memcpy
wasn't mod/ref'd by the loop.

llvm-svn: 122712
2011-01-02 21:14:18 +00:00
Chris Lattner 5702a43c09 If a loop iterates exactly once (has backedge count = 0) then don't
mess with it.  We'd rather peel/unroll it than convert all of its 
stores into memsets.

llvm-svn: 122711
2011-01-02 20:24:21 +00:00
Chris Lattner 8455b6e45e enhance loop idiom recognition to scan *all* unconditionally executed
blocks in a loop, instead of just the header block.  This makes it more
aggressive, able to handle Duncan's Ada examples.

llvm-svn: 122704
2011-01-02 19:01:03 +00:00
Duncan Sands 64f1c0dcda Fix PR8702 by not having LoopSimplify claim to preserve LCSSA form. As described
in the PR, the pass could break LCSSA form when inserting preheaders.  It probably
would be easy enough to fix this, but since currently we always go into LCSSA form
after running this pass, doing so is not urgent.

llvm-svn: 122695
2011-01-02 13:38:21 +00:00
Chris Lattner ddf58010bd Allow loop-idiom to run on multiple BB loops, but still only scan the loop
header for now for memset/memcpy opportunities.  It turns out that loop-rotate
is successfully rotating loops, but *DOESN'T MERGE THE BLOCKS*, turning "for 
loops" into 2 basic block loops that loop-idiom was ignoring.

With this fix, we form many *many* more memcpy and memsets than before, including
on the "history" loops in the viterbi benchmark, which look like this:

        for (j=0; j<MAX_history; ++j) {
          history_new[i][j+1] = history[2*i][j];
        }

Transforming these loops into memcpy's speeds up the viterbi benchmark from
11.98s to 3.55s on my machine.  Woo.

llvm-svn: 122685
2011-01-02 07:58:36 +00:00
Chris Lattner 85b6d81d41 teach loop idiom recognition to form memcpy's from simple loops.
llvm-svn: 122678
2011-01-02 03:37:56 +00:00
Chris Lattner 1903c42b97 fix a globalopt crash on two Adobe-C++ testcases that the recent
loop idiom pass exposed.

llvm-svn: 122674
2011-01-01 22:31:46 +00:00
Chris Lattner a3514441e0 add a validity check that was missed, fixing a crash on the
new testcase.

llvm-svn: 122662
2011-01-01 20:12:04 +00:00
Duncan Sands 772749aea1 Revert commit 122654 at the request of Chris, who reckons that instsimplify
is the wrong hammer for this nail, and is probably right.

llvm-svn: 122661
2011-01-01 20:08:02 +00:00
Chris Lattner 91a4435875 improve validity check to handle constant-trip-count loops more
aggressively.  In practice, this doesn't help anything though,
see the todo.

llvm-svn: 122660
2011-01-01 19:54:22 +00:00
Chris Lattner 8b3baf6d75 implement the "no aliasing accesses in loop" safety check. This pass
should be correct now.

llvm-svn: 122659
2011-01-01 19:39:01 +00:00
Duncan Sands e3c539581c Fix a README item by having InstructionSimplify do a mild form of value
numbering, in which it considers (for example) "%a = add i32 %x, %y" and
"%b = add i32 %x, %y" to be equal because the operands are equal and the
result of the instructions only depends on the values of the operands.
This has almost no effect (it removes 4 instructions from gcc-as-one-file),
and perhaps slows down compilation: I measured a 0.4% slowdown on the large
gcc-as-one-file testcase, but it wasn't statistically significant.

llvm-svn: 122654
2011-01-01 16:12:09 +00:00
NAKAMURA Takumi a834008959 test/Transforms/ConstProp/logicaltest.ll: FileCheck-ize.
llvm-svn: 122620
2010-12-29 03:58:56 +00:00
Chris Lattner 29e14edc8d implement enough of the memset inference algorithm to recognize and insert
memsets.  This is still missing one important validity check, but this is enough
to compile stuff like this:

void test0(std::vector<char> &X) {
  for (std::vector<char>::iterator I = X.begin(), E = X.end(); I != E; ++I)
    *I = 0;
}

void test1(std::vector<int> &X) {
  for (long i = 0, e = X.size(); i != e; ++i)
    X[i] = 0x01010101;
}

With:
 $ clang t.cpp -S -o - -O2 -emit-llvm | opt -loop-idiom | opt -O3 | llc 

to:

__Z5test0RSt6vectorIcSaIcEE:            ## @_Z5test0RSt6vectorIcSaIcEE
## BB#0:                                ## %entry
	subq	$8, %rsp
	movq	(%rdi), %rax
	movq	8(%rdi), %rsi
	cmpq	%rsi, %rax
	je	LBB0_2
## BB#1:                                ## %bb.nph
	subq	%rax, %rsi
	movq	%rax, %rdi
	callq	___bzero
LBB0_2:                                 ## %for.end
	addq	$8, %rsp
	ret
...
__Z5test1RSt6vectorIiSaIiEE:            ## @_Z5test1RSt6vectorIiSaIiEE
## BB#0:                                ## %entry
	subq	$8, %rsp
	movq	(%rdi), %rax
	movq	8(%rdi), %rdx
	subq	%rax, %rdx
	cmpq	$4, %rdx
	jb	LBB1_2
## BB#1:                                ## %for.body.preheader
	andq	$-4, %rdx
	movl	$1, %esi
	movq	%rax, %rdi
	callq	_memset
LBB1_2:                                 ## %for.end
	addq	$8, %rsp
	ret

llvm-svn: 122573
2010-12-26 23:42:51 +00:00
Chris Lattner 6cf8d6cc6e start using irbuilder to make mem intrinsics in a few passes.
llvm-svn: 122572
2010-12-26 22:57:41 +00:00
Benjamin Kramer ea9152e551 MemCpyOpt: Turn memcpys from a constant into a memset if possible.
This allows us to compile "int cst[] = {-1, -1, -1};" into
  movl  $-1, 16(%rsp)
  movq  $-1, 8(%rsp)
instead of
  movl  _cst+8(%rip), %eax
  movl  %eax, 16(%rsp)
  movq  _cst(%rip), %rax
  movq  %rax, 8(%rsp)

llvm-svn: 122548
2010-12-24 21:17:12 +00:00
Owen Anderson 226ac14afb When determining if we can fold (x >> C1) << C2, the bits that we need to verify are zero
are not the low bits of x, but the bits that WILL be the low bits after the operation completes.

llvm-svn: 122529
2010-12-23 23:56:24 +00:00
Benjamin Kramer 8ef5001b27 InstCombine: creating selects from -1 and 0 is fine, they combine into a sext from i1.
llvm-svn: 122453
2010-12-22 23:12:15 +00:00
Duncan Sands a45cfbd405 When determining whether the new instruction was already present in
the original instruction, half the cases were missed (making it not
wrong but suboptimal).  Also correct a typo (A <-> B) in the second
chunk. 

llvm-svn: 122414
2010-12-22 17:15:25 +00:00
Duncan Sands bab9456f81 Make this test not depend on how the variable is named.
llvm-svn: 122413
2010-12-22 17:08:04 +00:00
Duncan Sands fbb9ac3cca Add a generic expansion transform: A op (B op' C) -> (A op B) op' (A op C)
if both A op B and A op C simplify.  This fires fairly often but doesn't
make that much difference.  On gcc-as-one-file it removes two "and"s and
turns one branch into a select.

llvm-svn: 122399
2010-12-22 13:36:08 +00:00
Owen Anderson 5ab8d4b5e5 Give GVN back the ability to perform simple conditional propagation on conditional branch values.
I still think that LVI should be handling this, but that capability is some ways off in the future,
and this matters for some significant benchmarks.

llvm-svn: 122378
2010-12-21 23:54:34 +00:00
Duncan Sands 76befde93a Add an additional InstructionSimplify factorization test.
llvm-svn: 122333
2010-12-21 15:12:22 +00:00
Duncan Sands fecc642224 While I don't think any later transforms can fire, it seems cleaner to
not assume this (for example in case more transforms get added below
it).  Suggested by Frits van Bommel.

llvm-svn: 122332
2010-12-21 15:03:43 +00:00
Duncan Sands 07c17132d7 Fix typo in comment, spotted by Deewiant.
llvm-svn: 122329
2010-12-21 13:39:20 +00:00
Duncan Sands ee3ec6eb94 Teach InstructionSimplify about distributive laws. These transforms fire
quite often, but don't make much difference in practice presumably because
instcombine also knows them and more.

llvm-svn: 122328
2010-12-21 13:32:22 +00:00
Duncan Sands 6c7a52cf80 Add generic simplification of associative operations, generalizing
a couple of existing transforms.  This fires surprisingly often, for
example when compiling gcc "(X+(-1))+1->X" fires quite a lot as well
as various "and" simplifications (usually with a phi node operand).
Most of the time this doesn't make a real difference since the same
thing would have been done elsewhere anyway, eg: by instcombine, but
there are a few places where this results in simplifications that we
were not doing before.

llvm-svn: 122326
2010-12-21 08:49:00 +00:00
Benjamin Kramer 68531baea9 Teach InstCombine to merge (icmp ult (X + CA), C1) | (icmp eq X, C2) into (icmp ult (X + CA), C1 + 1) if C2 + CA == C1.
InstCombine creates these so now we compile x == 23 || x == 24 || x == 25 to
  %x.off = add i32 %x, -23
  %1 = icmp ult i32 %x.off, 3
instead of
  %x.off = add i32 %x, -23
  %1 = icmp ult i32 %x.off, 2
  %cmp3 = icmp eq i32 %x, 25
  %ret2 = or i1 %1, %cmp3

llvm-svn: 122248
2010-12-20 16:18:51 +00:00
Duncan Sands ed6d6c33dd Have SimplifyBinOp dispatch Xor, Add and Sub to the corresponding methods
(they had just been forgotten before).  Adding Xor causes "main" in the
existing testcase 2010-11-01-lshr-mask.ll to be hugely more simplified.

llvm-svn: 122245
2010-12-20 14:47:04 +00:00
Chris Lattner 27ca8ebd4b fix PR8807 by making transformConstExprCastCall aware of byval arguments.
llvm-svn: 122238
2010-12-20 08:36:38 +00:00
Chris Lattner 0f11495289 when eliding a byval copy due to inlining a readonly function, we have
to make sure that the reused alloca has sufficient alignment.

llvm-svn: 122236
2010-12-20 08:10:40 +00:00
Chris Lattner 0099744506 pull byval processing out to its own helper function.
llvm-svn: 122235
2010-12-20 07:57:41 +00:00
Chris Lattner 7394680a00 fix PR8769, a miscompilation by inliner when inlining a function with a byval
argument.  The generated alloca has to have at least the alignment of the
byval, if not, the client may be making assumptions that the new alloca won't
satisfy.

llvm-svn: 122234
2010-12-20 07:45:28 +00:00
Chris Lattner a9a5c59dd1 merge two tests.
llvm-svn: 122233
2010-12-20 07:39:57 +00:00
Chris Lattner 6f3ddbd5bc filecheckize
llvm-svn: 122232
2010-12-20 07:38:24 +00:00
Mon P Wang 7bcead020d Test case for r122215 when InstCombine optimizes memset
llvm-svn: 122216
2010-12-20 01:06:23 +00:00
Chris Lattner 1e8c032a6e X86 supports i8/i16 overflow ops (except i8 multiplies), we should
generate them.  

Now we compile:

define zeroext i8 @X(i8 signext %a, i8 signext %b) nounwind ssp {
entry:
  %0 = tail call %0 @llvm.sadd.with.overflow.i8(i8 %a, i8 %b)
  %cmp = extractvalue %0 %0, 1
  br i1 %cmp, label %if.then, label %if.end

into:

_X:                                     ## @X
## BB#0:                                ## %entry
	subl	$12, %esp
	movb	16(%esp), %al
	addb	20(%esp), %al
	jo	LBB0_2

Before we were generating:

_X:                                     ## @X
## BB#0:                                ## %entry
	pushl	%ebp
	movl	%esp, %ebp
	subl	$8, %esp
	movb	12(%ebp), %al
	testb	%al, %al
	setge	%cl
	movb	8(%ebp), %dl
	testb	%dl, %dl
	setge	%ah
	cmpb	%cl, %ah
	sete	%cl
	addb	%al, %dl
	testb	%dl, %dl
	setge	%al
	cmpb	%al, %ah
	setne	%al
	andb	%cl, %al
	testb	%al, %al
	jne	LBB0_2

llvm-svn: 122186
2010-12-19 20:03:11 +00:00
Chris Lattner 5e0c0c72e9 recognize an unsigned add with overflow idiom into uadd.
This resolves a README entry and technically resolves PR4916,
but we still get poor code for the testcase in that PR because
GVN isn't CSE'ing uadd with add, filed as PR8817.

Previously we got:

_test7:                                 ## @test7
	addq	%rsi, %rdi
	cmpq	%rdi, %rsi
	movl	$42, %eax
	cmovaq	%rsi, %rax
	ret

Now we get:

_test7:                                 ## @test7
	addq	%rsi, %rdi
	movl	$42, %eax
	cmovbq	%rsi, %rax
	ret

llvm-svn: 122182
2010-12-19 19:37:52 +00:00
Chris Lattner 33dc3f0cfa optimize uadd(x, cst) into a comparison when the normal
result is dead.  This is required for my next patch to not
regress the testsuite.

llvm-svn: 122181
2010-12-19 19:35:32 +00:00
Chris Lattner 79874566ce generalize the sadd creation code to not require that the
sadd formed is half the size of the original type. We can
now compile this into a sadd.i8:

unsigned char X(char a, char b) {
  int res = a+b;
  if ((unsigned )(res+128) > 255U)
    abort();
  return res;
}

llvm-svn: 122178
2010-12-19 18:35:09 +00:00
Chris Lattner c56c845377 fix another miscompile in the llvm.sadd formation logic: it wasn't
checking to see if the high bits of the original add result were dead.
Inserting a smaller add and zexting back to that size is not good enough.

This is likely to be the fix for 8816.

llvm-svn: 122177
2010-12-19 18:22:06 +00:00
Chris Lattner f29562db25 fix a bug (possibly 8816) in the sadd forming xform: it isn't
profitable (or safe) to promote code when the add-with-constant
has other uses.

llvm-svn: 122175
2010-12-19 17:59:02 +00:00
Chris Lattner 408a684d29 Enhance LICM to promote alias sets whose pointers themselves are stored,
which doesn't affect the memory address being promoted.

llvm-svn: 122172
2010-12-19 05:57:25 +00:00
Chris Lattner 3337a81450 fix PR8602, a bug in an assertion: a volatile store *of* a pointer
does not make the alias set for that pointer volatile, just stores
*to* the pointer.

llvm-svn: 122171
2010-12-19 05:51:54 +00:00
Chris Lattner fb888622c3 revert r122164, I'm going to go with a different approach.
llvm-svn: 122168
2010-12-19 04:23:03 +00:00
Chris Lattner 583ec6fa44 first step to fixing PR8642: don't fold away empty basic blocks
which have trapping constant exprs in them due to PHI nodes.
Eliminating them can cause the constant expr to be evalutated
on new paths if the input edges are critical.

llvm-svn: 122164
2010-12-19 03:02:34 +00:00
Chris Lattner 2b43f2df2b move this test into the ARM test so that it is only run when the arm backend
is enabled.

llvm-svn: 122163
2010-12-19 02:58:14 +00:00
Nate Begeman 7aa18bf46a Add vector versions of some existing scalar transforms to aid codegen in matching psign & pblend operations to the IR produced by clang/gcc for their C idioms.
llvm-svn: 122105
2010-12-17 23:12:19 +00:00
Owen Anderson 1294ea7d53 Reapply r121905 (automatic synthesis of @llvm.sadd.with.overflow) with a fix for a bug that manifested itself
on the DragonEgg self-host bot.  Unfortunately, the testcase is pretty messy and doesn't reduce well due to
interactions with other parts of InstCombine.

llvm-svn: 122072
2010-12-17 18:08:00 +00:00
Benjamin Kramer e5f49c4ff2 SimplifyCFG: Ranges can be larger than 64 bits. Fixes Release-selfhost build.
llvm-svn: 122054
2010-12-17 10:48:14 +00:00
Chris Lattner d14b0f1db7 improve switch formation to handle small range
comparisons formed by comparisons.  For example,
this:

void foo(unsigned x) {
  if (x == 0 || x == 1 || x == 3 || x == 4 || x == 6) 
    bar();
}

compiles into:

_foo:                                   ## @foo
## BB#0:                                ## %entry
	cmpl	$6, %edi
	ja	LBB0_2
## BB#1:                                ## %entry
	movl	%edi, %eax
	movl	$91, %ecx
	btq	%rax, %rcx
	jb	LBB0_3

instead of:

_foo:                                   ## @foo
## BB#0:                                ## %entry
	cmpl	$2, %edi
	jb	LBB0_4
## BB#1:                                ## %switch.early.test
	cmpl	$6, %edi
	ja	LBB0_3
## BB#2:                                ## %switch.early.test
	movl	%edi, %eax
	movl	$88, %ecx
	btq	%rax, %rcx
	jb	LBB0_4

This catches a bunch of cases in GCC, which look like this:

 %804 = load i32* @which_alternative, align 4, !tbaa !0
 %805 = icmp ult i32 %804, 2
 %806 = icmp eq i32 %804, 3
 %or.cond121 = or i1 %805, %806
 %807 = icmp eq i32 %804, 4
 %or.cond124 = or i1 %or.cond121, %807
 br i1 %or.cond124, label %.thread, label %808

turning this into a range comparison.

llvm-svn: 122045
2010-12-17 06:20:15 +00:00
Dan Gohman 93dc2b808f Revert r64460. strtol and friends cannot be marked readonly, even with
a null endptr argument, because they may write to errno.

This fixes a seflhost miscompile observed on Linux targets when TBAA
was enabled.

llvm-svn: 122014
2010-12-17 01:09:43 +00:00
Duncan Sands 8d1ab6f6e1 Speculatively revert commit 121905 since it looks like it might have broken the
dragonegg self-host buildbot.  Original commit message:

Add an InstCombine transform to recognize instances of manual overflow-safe addition
(performing the addition in a wider type and explicitly checking for overflow), and
fold them down to intrinsics.  This currently only supports signed-addition, but could
be generalized if someone works out the magic constant formulas for other operations.

llvm-svn: 121965
2010-12-16 09:40:54 +00:00
Dan Gohman 4467aa5294 Preserve TBAA tags when doing load PRE.
llvm-svn: 121921
2010-12-15 23:53:55 +00:00
Owen Anderson 1cf8881299 Add an InstCombine transform to recognize instances of manual overflow-safe addition
(performing the addition in a wider type and explicitly checking for overflow), and
fold them down to intrinsics.  This currently only supports signed-addition, but could
be generalized if someone works out the magic constant formulas for other operations.

Fixes <rdar://problem/8558713>.

llvm-svn: 121905
2010-12-15 22:32:38 +00:00
Frits van Bommel 3d1803495e Teach jump threading to "look through" a select when the branch direction of a terminator depends on it.
When it sees a promising select it now tries to figure out whether the condition of the select is known in any of the predecessors and if so it maps the operands appropriately.

llvm-svn: 121859
2010-12-15 09:51:20 +00:00
Owen Anderson 35609d97ae Fix PR8790, another instance where unreachable code can cause instruction simplification to fail,
this case involve a select that simplifies to itself.

llvm-svn: 121817
2010-12-15 00:55:35 +00:00
Chris Lattner 7499b452c1 - Insert new instructions before DomBlock's terminator,
which is simpler than finding a place to insert in BB.
 - Don't perform the 'if condition hoisting' xform on certain
   i1 PHIs, as it interferes with switch formation.

This re-fixes "example 7", without breaking the world hopefully.

llvm-svn: 121764
2010-12-14 08:46:09 +00:00
Chris Lattner 335f0e4ad4 fix two significant issues with FoldTwoEntryPHINode:
first, it can kick in on blocks whose conditions have been
folded to a constant, even though one of the edges will be
trivially folded.

second, it doesn't clean up the "if diamond" that it just 
eliminated away.  This is a problem because other simplifycfg
xforms kick in depending on the order of block visitation,
causing pointless work.

llvm-svn: 121762
2010-12-14 08:01:53 +00:00
Chris Lattner f130661688 fix yet anohter broken line
llvm-svn: 121750
2010-12-14 06:09:07 +00:00
Chris Lattner 5a9d59d918 reapply my recent change that disables a piece of the switch formation
work, but fixes 400.perlbmk.

llvm-svn: 121749
2010-12-14 05:57:30 +00:00
Owen Anderson 3e5648896e Fix recent buildbot breakage by pulling SimplifyCFG back to its state as of r121694, the most recent state
where I'm confident there were no crashes or miscompilations.  XFAIL the test added since then for now.

llvm-svn: 121733
2010-12-13 23:49:28 +00:00
Chris Lattner a6e5d5694a temporarily disable part of my previous patch, which causes an iterator invalidation issue, causing a crash on some versions of perlbmk.
llvm-svn: 121728
2010-12-13 23:02:19 +00:00
Benjamin Kramer 1e155ab7e1 Fix sort predicate. qsort(3)'s predicate semantics differ from std::sort's. Fixes PR 8780.
llvm-svn: 121705
2010-12-13 18:20:38 +00:00
Chris Lattner fb836f8c1a reinstate my patch: the miscompile was caused by an inverted branch in the
'and' case.

llvm-svn: 121695
2010-12-13 08:12:19 +00:00
Chris Lattner 79db357d80 Completely disable the optimization I added in r121680 until
I can track down a miscompile.  This should bring the buildbots
back to life

llvm-svn: 121693
2010-12-13 07:41:29 +00:00
Chris Lattner fbeb55844b Make simplifycfg reprocess newly formed "br (cond1 | cond2)" conditions
when simplifying, allowing them to be eagerly turned into switches.  This
is the last step required to get "Example 7" from this blog post:
http://blog.regehr.org/archives/320

On X86, we now generate this machine code, which (to my eye) seems better
than the ICC generated code:

_crud:                                  ## @crud
## BB#0:                                ## %entry
	cmpb	$33, %dil
	jb	LBB0_4
## BB#1:                                ## %switch.early.test
	addb	$-34, %dil
	cmpb	$58, %dil
	ja	LBB0_3
## BB#2:                                ## %switch.early.test
	movzbl	%dil, %eax
	movabsq	$288230376537592865, %rcx ## imm = 0x400000017001421
	btq	%rax, %rcx
	jb	LBB0_4
LBB0_3:                                 ## %lor.rhs
	xorl	%eax, %eax
	ret
LBB0_4:                                 ## %lor.end
	movl	$1, %eax
	ret

llvm-svn: 121690
2010-12-13 07:00:06 +00:00
Chris Lattner cb570f87e5 fix a bug in r121680 that upset the various buildbots.
llvm-svn: 121687
2010-12-13 05:34:18 +00:00
Chris Lattner bc9e6d9dbe make these tests a bit less fragile
llvm-svn: 121682
2010-12-13 05:10:30 +00:00
Chris Lattner a442f24a36 enhance the "change or icmp's into switch" xform to handle one value in an
'or sequence' that it doesn't understand.  This allows us to optimize
something insane like this:

int crud (unsigned char c, unsigned x)
 {
   if(((((((((( (int) c <= 32 ||
                    (int) c == 46) || (int) c == 44)
                  || (int) c == 58) || (int) c == 59) || (int) c == 60)
               || (int) c == 62) || (int) c == 34) || (int) c == 92)
            || (int) c == 39) != 0)
     foo();
 }

into:

define i32 @crud(i8 zeroext %c, i32 %x) nounwind ssp noredzone {
entry:
  %cmp = icmp ult i8 %c, 33
  br i1 %cmp, label %if.then, label %switch.early.test

switch.early.test:                                ; preds = %entry
  switch i8 %c, label %if.end [
    i8 39, label %if.then
    i8 44, label %if.then
    i8 58, label %if.then
    i8 59, label %if.then
    i8 60, label %if.then
    i8 62, label %if.then
    i8 46, label %if.then
    i8 92, label %if.then
    i8 34, label %if.then
  ]

by pulling the < comparison out ahead of the newly formed switch.

llvm-svn: 121680
2010-12-13 04:50:38 +00:00
Chris Lattner a737721d14 merge two tests
llvm-svn: 121679
2010-12-13 04:45:56 +00:00
Chris Lattner 62cc76e9cc Fix my previous patch to handle a degenerate case that the llvm-gcc
bootstrap buildbot tripped over.

llvm-svn: 121674
2010-12-13 03:43:57 +00:00
Chris Lattner d9bacc088a fix a fairly serious oversight with switch formation from
or'd conditions.  Previously we'd compile something like this:

int crud (unsigned char c) {
   return c == 62 || c == 34 || c == 92;
}

into:

  switch i8 %c, label %lor.rhs [
    i8 62, label %lor.end
    i8 34, label %lor.end
  ]

lor.rhs:                                          ; preds = %entry
  %cmp8 = icmp eq i8 %c, 92
  br label %lor.end

lor.end:                                          ; preds = %entry, %entry, %lor.rhs
  %0 = phi i1 [ true, %entry ], [ %cmp8, %lor.rhs ], [ true, %entry ]
  %lor.ext = zext i1 %0 to i32
  ret i32 %lor.ext

which failed to merge the compare-with-92 into the switch.  With this patch
we simplify this all the way to:

  switch i8 %c, label %lor.rhs [
    i8 62, label %lor.end
    i8 34, label %lor.end
    i8 92, label %lor.end
  ]

lor.rhs:                                          ; preds = %entry
  br label %lor.end

lor.end:                                          ; preds = %entry, %entry, %entry, %lor.rhs
  %0 = phi i1 [ true, %entry ], [ false, %lor.rhs ], [ true, %entry ], [ true, %entry ]
  %lor.ext = zext i1 %0 to i32
  ret i32 %lor.ext

which is much better for codegen's switch lowering stuff.  This kicks in 33 times
on 176.gcc (for example) cutting 103 instructions off the generated code.

llvm-svn: 121671
2010-12-13 03:18:54 +00:00
Benjamin Kramer c4169cebe3 Generalize the and-icmp-select instcombine further by allowing selects of the form
(x & 2^n) ? 2^m+C : C

we can offset both arms by C to get the "(x & 2^n) ? 2^m : 0" form, optimize the
select to a shift and apply the offset afterwards.

llvm-svn: 121609
2010-12-11 10:49:22 +00:00
Benjamin Kramer c8b035d006 Factor the (x & 2^n) ? 2^m : 0 instcombine into its own method and generalize it
to catch cases where n != m with a shift.

llvm-svn: 121608
2010-12-11 09:42:59 +00:00
Chris Lattner bc4457e317 enhance memcpyopt to zap memcpy's that have the same src/dst.
llvm-svn: 121362
2010-12-09 07:45:45 +00:00
Chris Lattner fd51c52ef6 fix PR8753, eliminating a case where we'd infinitely make a
substitution because it doesn't actually change the IR.  Patch by
Jakub Staszak!

llvm-svn: 121361
2010-12-09 07:39:50 +00:00
Dan Gohman a32986e899 Really check that the bits that will become zero are actually already zero
before eliminating the operation that zeros them. This fixes rdar://8739316.

llvm-svn: 121353
2010-12-09 02:52:17 +00:00
Chris Lattner 0d71c4f564 reapply r121100 with a tweak to constant fold ConstExprs with TargetData
(if available) as we go so that we get simple constantexprs not insane ones.
This fixes the failure of clang/test/CodeGenCXX/virtual-base-ctor.cpp
that the previous iteration of this patch had.

llvm-svn: 121111
2010-12-07 04:33:29 +00:00
Eric Christopher f10dcfb9fb Temporarily revert r121100 as it's causing clang to fail
CodeGenCXX/virtual-base-ctor.cpp.

llvm-svn: 121102
2010-12-07 02:41:11 +00:00
Chris Lattner 287f4366c1 fix PR8710 - teach global opt that some constantexprs are too complex to
put in a global variable's initializer.

llvm-svn: 121100
2010-12-07 01:59:32 +00:00
Frits van Bommel d9df6eaa9c Implement jump threading of 'indirectbr' by keeping track of whether we're looking for ConstantInt*s or BlockAddress*s.
llvm-svn: 121066
2010-12-06 23:36:56 +00:00
Chris Lattner 94fbdf3814 Fix PR8728, a miscompilation I recently introduced. When optimizing
memcpy's like:
  memcpy(A, B)
  memcpy(A, C)

we cannot delete the first memcpy as dead if A and C might be aliases.
If so, we actually get:

  memcpy(A, B)
  memcpy(A, A)

which is not correct to transform into:

  memcpy(A, A)

This patch was heavily influenced by Jakub Staszak's patch in PR8728, thanks
Jakub!

llvm-svn: 120974
2010-12-06 01:48:06 +00:00
Frits van Bommel 8fb69ee805 Teach SimplifyCFG to turn
(indirectbr (select cond, blockaddress(@fn, BlockA),
                            blockaddress(@fn, BlockB)))
into
  (br cond, BlockA, BlockB).

llvm-svn: 120943
2010-12-05 18:29:03 +00:00
Chris Lattner 1c577b54b0 fix a bozo bug I introduced in r119930, causing a miscompile of
20040709-1.c from the gcc testsuite.  I was using the size of a
pointer instead of the pointee.  This fixes rdar://8713376

llvm-svn: 120519
2010-12-01 01:24:55 +00:00
Chris Lattner 903add84d9 Enhance DSE to handle the variable index case in PR8657.
llvm-svn: 120498
2010-11-30 23:43:23 +00:00
Chris Lattner c0f3379ae0 teach DSE to use GetPointerBaseWithConstantOffset to analyze
may-aliasing stores that partially overlap with different base
pointers.  This implements PR6043 and the non-variable part of
PR8657

llvm-svn: 120485
2010-11-30 23:05:20 +00:00
Chris Lattner b63ba73b1b enhance isRemovable to refuse to delete volatile mem transfers
now that DSE hacks on them.  This fixes a regression I introduced,
by generalizing DSE to hack on transfers.

llvm-svn: 120445
2010-11-30 19:12:10 +00:00
Chris Lattner 58b779e9c2 Rewrite the main DSE loop to be written in terms of reasoning
about pairs of AA::Location's instead of looking for MemDep's
"Def" predicate.  This is more powerful and general, handling
memset/memcpy/store all uniformly, and implementing PR8701 and
probably obsoleting parts of memcpyoptimizer.

This also fixes an obscure bug with init.trampoline and i8
stores, but I'm not surprised it hasn't been hit yet.  Enhancing
init.trampoline to carry the size that it stores would allow
DSE to be much more aggressive about optimizing them.

llvm-svn: 120406
2010-11-30 07:23:21 +00:00
Anders Carlsson e3ea1cba79 Add a puts optimization that converts puts() to putchar('\n').
llvm-svn: 120398
2010-11-30 06:19:18 +00:00
Anders Carlsson 77e9892afd Fix a typo.
llvm-svn: 120394
2010-11-30 06:03:55 +00:00
Anders Carlsson 631d06bbce Rename this test to FPuts.ll since it actually tests fputs.
llvm-svn: 120393
2010-11-30 05:59:26 +00:00
Chris Lattner 6c7f64e0bc remove a use of llvm-dis
llvm-svn: 120383
2010-11-30 02:04:15 +00:00
Chris Lattner c2e3445273 merge one more away
llvm-svn: 120375
2010-11-30 01:06:43 +00:00
Chris Lattner 7578d0df51 I already merged partial-overwrite.ll -> PartialStore.ll
Merge context-sensitive.ll -> simple.ll and upgrade it.

llvm-svn: 120374
2010-11-30 01:05:07 +00:00
Chris Lattner 43e3a98675 clean up DSE tests, removing some poorly reduced and useless old test,
merging more into other larger .ll files, filecheckizing along the way.

llvm-svn: 120373
2010-11-30 01:00:34 +00:00
Chris Lattner 90c4947df7 enhance basicaa to return "Mod" for a memcpy call when the
queried location doesn't overlap the source, and add a testcase.

llvm-svn: 120370
2010-11-30 00:43:16 +00:00
Chris Lattner 9a146372b5 Teach basicaa that memset's modref set is at worst "mod" and never
contains "ref".

Enhance DSE to use a modref query instead of a store-specific hack
to generalize the "ignore may-alias stores" optimization to handle
memset and memcpy.

llvm-svn: 120368
2010-11-30 00:28:45 +00:00
Chris Lattner c3c754f750 my previous patch would cause us to start deleting some volatile
stores, fix and add a testcase.

llvm-svn: 120363
2010-11-30 00:12:39 +00:00
Benjamin Kramer e6840ef4b3 Fix some broken CHECK lines.
llvm-svn: 120332
2010-11-29 22:34:55 +00:00
Chris Lattner 2e8793482c fix PR8677, patch by Jakub Staszak!
llvm-svn: 120325
2010-11-29 21:59:31 +00:00
Frits van Bommel 28218aa8f1 Transform (extractvalue (load P), ...) to (load (gep P, 0, ...)) if the load has no other uses, shrinking the load.
llvm-svn: 120323
2010-11-29 21:56:20 +00:00
Frits van Bommel 40a80ac963 Update this test to keep testing the -instcombine transform it's supposed to be testing instead of triggering the improved constant folding for insertvalue and extractvalue.
llvm-svn: 120319
2010-11-29 20:55:40 +00:00
Frits van Bommel a98214de10 Teach ConstantFoldInstruction() how to fold insertvalue and extractvalue.
llvm-svn: 120316
2010-11-29 20:36:52 +00:00
Nick Lewycky b8de00ee07 Treat a call of function pointer like a load of the pointer when considering
whether the pointer can be replaced with the global variable it is a copy of.
Fixes PR8680.

llvm-svn: 120126
2010-11-24 22:04:20 +00:00
Benjamin Kramer 94a622af4c The srem -> urem transform is not safe for any divisor that's not a power of two.
E.g. -5 % 5 is 0 with srem and 1 with urem.

Also addresses Frits van Bommel's comments.

llvm-svn: 120049
2010-11-23 20:33:57 +00:00
Benjamin Kramer b5afa65b0a InstCombine: Reduce "X shift (A srem B)" to "X shift (A urem B)" iff B is positive.
This allows to transform the rem in "1 << ((int)x % 8);" to an and.

llvm-svn: 120028
2010-11-23 18:52:42 +00:00
Duncan Sands adc7771f18 Exploit distributive laws (eg: And distributes over Or, Mul over Add, etc) in a
fairly systematic way in instcombine.  Some of these cases were already dealt
with, in which case I removed the existing code.  The case of Add has a bunch of
funky logic which covers some of this plus a few variants (considers shifts to be
a form of multiplication), which I didn't touch.  The simplification performed is:
A*B+A*C -> A*(B+C).  The improvement is to do this in cases that were not already
handled [such as A*B-A*C -> A*(B-C), which was reported on the mailing list], and
also to do it more often by not checking for "only one use" if "B+C" simplifies.

llvm-svn: 120024
2010-11-23 14:23:47 +00:00
Chris Lattner e5afa15b77 duncan's spider sense was right, I completely reversed the condition
on this instcombine xform.  This fixes a miscompilation of 403.gcc.

llvm-svn: 119988
2010-11-23 02:42:04 +00:00
Benjamin Kramer f1ebb63161 InstCombine: Implement X - A*-B -> X + A*B.
llvm-svn: 119984
2010-11-22 20:31:27 +00:00
Duncan Sands c133c54426 If a GEP index simply advances by multiples of a type of zero size,
then replace the index with zero.

llvm-svn: 119974
2010-11-22 16:32:50 +00:00
Duncan Sands cf4bceba49 Add a rather pointless InstructionSimplify transform, inspired by recent constant
folding improvements: if P points to a type of size zero, turn "gep P, N" into "P".
More generally, if a gep index type has size zero, instcombine could replace the
index with zero, but that is not done here.

llvm-svn: 119942
2010-11-21 13:53:09 +00:00
Chris Lattner e48c31ce33 implement PR8576, deleting dead stores with intervening may-alias stores.
llvm-svn: 119927
2010-11-21 07:34:32 +00:00
Chris Lattner 6e22221b37 file checkize
llvm-svn: 119926
2010-11-21 07:32:40 +00:00
Chris Lattner f7e896138e optimize:
void a(int x) { if (((1<<x)&8)==0) b(); }

into "x != 3", which occurs over 100 times in 403.gcc but in no
other program in llvm-test.

llvm-svn: 119922
2010-11-21 06:44:42 +00:00
Chris Lattner 58f9f58716 Implement PR8644: forwarding a memcpy value to a byval,
allowing the memcpy to be eliminated.

Unfortunately, the requirements on byval's without explicit 
alignment are really weak and impossible to predict in the 
mid-level optimizer, so this doesn't kick in much with current
frontends.  The fix is to change clang to set alignment on all
byval arguments.

llvm-svn: 119916
2010-11-21 00:28:59 +00:00
Owen Anderson c0177aeb71 Add a test for CodeGenPrepare's ability to look through PHI nodes when performing
addressing mode folding, introduced in r119853.

llvm-svn: 119857
2010-11-19 22:34:53 +00:00
Duncan Sands aef146b890 Factor code for testing whether replacing one value with another
preserves LCSSA form out of ScalarEvolution and into the LoopInfo
class.  Use it to check that SimplifyInstruction simplifications
are not breaking LCSSA form.  Fixes PR8622.

llvm-svn: 119727
2010-11-18 19:59:41 +00:00
Owen Anderson c21c100f3d Completely rework the datastructure GVN uses to represent the value number to leader mapping. Previously,
this was a tree of hashtables, and a query recursed into the table for the immediate dominator ad infinitum
if the initial lookup failed.  This led to really bad performance on tall, narrow CFGs.

We can instead replace it with what is conceptually a multimap of value numbers to leaders (actually
represented by a hashtable with a list of Value*'s as the value type), and then
determine which leader from that set to use very cheaply thanks to the DFS numberings maintained by
DominatorTree.  Because there are typically few duplicates of a given value, this scan tends to be
quite fast.  Additionally, we use a custom linked list and BumpPtr allocation to avoid any unnecessary
allocation in representing the value-side of the multimap.

This change brings with it a 15% (!) improvement in the total running time of GVN on 403.gcc, which I
think is pretty good considering that includes all the "real work" being done by MemDep as well.

The one downside to this approach is that we can no longer use GVN to perform simple conditional progation,
but that seems like an acceptable loss since we now have LVI and CorrelatedValuePropagation to pick up
the slack.  If you see conditional propagation that's not happening, please file bugs against LVI or CVP.

llvm-svn: 119714
2010-11-18 18:32:40 +00:00
Dan Gohman 2e1fc849b2 Add support for PHI-translating sext, zext, and trunc instructions,
enabling more PRE. PR8586.

llvm-svn: 119704
2010-11-18 17:05:13 +00:00
Chris Lattner 731caac7c6 remove a pointless restriction from memcpyopt. It was
refusing to optimize two memcpy's like this:

copy A <- B
copy C <- A

if it couldn't prove that noalias(B,C).  We can eliminate
the copy by producing a memmove instead of memcpy.

llvm-svn: 119694
2010-11-18 08:00:57 +00:00
Chris Lattner bbb0f9661d filecheckize, this is still not optimal, see PR8643
llvm-svn: 119693
2010-11-18 07:49:32 +00:00
Chris Lattner ac5701319b allow eliminating an alloca that is just copied from an constant global
if it is passed as a byval argument.  The byval argument will just be a
read, so it is safe to read from the original global instead.  This allows
us to promote away the %agg.tmp alloca in PR8582

llvm-svn: 119686
2010-11-18 06:41:51 +00:00
Chris Lattner f183d5c4be enhance the "alloca is just a memcpy from constant global"
to ignore calls that obviously can't modify the alloca
because they are readonly/readnone.

llvm-svn: 119683
2010-11-18 06:26:49 +00:00
Chris Lattner 7aeae25c78 fix a small oversight in the "eliminate memcpy from constant global"
optimization.  If the alloca that is "memcpy'd from constant" also has
a memcpy from *it*, ignore it: it is a load.  We now optimize the testcase to:

define void @test2() {
  %B = alloca %T
  %a = bitcast %T* @G to i8*
  %b = bitcast %T* %B to i8*
  call void @llvm.memcpy.p0i8.p0i8.i64(i8* %b, i8* %a, i64 124, i32 4, i1 false)
  call void @bar(i8* %b)
  ret void
}

previously we would generate:

define void @test() {
  %B = alloca %T
  %b = bitcast %T* %B to i8*
  %G.0 = getelementptr inbounds %T* @G, i32 0, i32 0
  %tmp3 = load i8* %G.0, align 4
  %G.1 = getelementptr inbounds %T* @G, i32 0, i32 1
  %G.15 = bitcast [123 x i8]* %G.1 to i8*
  %1 = bitcast [123 x i8]* %G.1 to i984*
  %srcval = load i984* %1, align 1
  %B.0 = getelementptr inbounds %T* %B, i32 0, i32 0
  store i8 %tmp3, i8* %B.0, align 4
  %B.1 = getelementptr inbounds %T* %B, i32 0, i32 1
  %B.12 = bitcast [123 x i8]* %B.1 to i8*
  %2 = bitcast [123 x i8]* %B.1 to i984*
  store i984 %srcval, i984* %2, align 1
  call void @bar(i8* %b)
  ret void
}

llvm-svn: 119682
2010-11-18 06:20:47 +00:00
Chris Lattner 9434184142 filecheckize
llvm-svn: 119681
2010-11-18 06:16:43 +00:00
Benjamin Kramer 07726c7d52 InstCombine: Add a missing irem identity (X % X -> 0).
llvm-svn: 119538
2010-11-17 19:11:46 +00:00
Duncan Sands 5ffc298bc7 In which I discover the existence of loops. Threading an operation
over a phi node by applying it to each operand may be wrong if the
operation and the phi node are mutually interdependent (the testcase
has a simple example of this).  So only do this transform if it would
be correct to perform the operation in each predecessor of the block
containing the phi, i.e. if the other operands all dominate the phi.
This should fix the FFMPEG snow.c regression reported by İsmail Dönmez.

llvm-svn: 119347
2010-11-16 12:16:38 +00:00
Duncan Sands f12ba1dfe1 Teach InstructionSimplify the trick of skipping incoming phi
values that are equal to the phi itself.

llvm-svn: 119161
2010-11-15 17:52:45 +00:00
Duncan Sands 01157b027b Move PHI tests to phi.ll, out of select.ll.
llvm-svn: 119153
2010-11-15 16:43:28 +00:00
Duncan Sands 4581ddc123 Teach InstructionSimplify about phi nodes. I chose to have it simply
offload the work to hasConstantValue rather than do something more
complicated (such handling mutually recursive phis) because (1) it is
not clear it is worth it; and (2) if it is worth it, maybe such logic
would be better placed in hasConstantValue.  Adjust some GVN tests
which are now cleaned up much further (eg: all phi nodes are removed).

llvm-svn: 119043
2010-11-14 13:30:18 +00:00
Chris Lattner 8d22ff90a3 rename test.
llvm-svn: 119033
2010-11-14 07:03:49 +00:00
Chris Lattner befbf975e8 filecheckize, remove an old and useless test
llvm-svn: 119032
2010-11-14 07:03:38 +00:00
Chris Lattner 4e44cb8bcb this test is pretty pointless and "propogation" isn't a word (or so Misha claims).
llvm-svn: 119031
2010-11-14 07:02:03 +00:00
Duncan Sands 8c58ba4199 Testcase to go along with commit 118923 ("Have GVN simplify instructions
as it goes").  Before -std-compile-opts only got it down to
  %a = tail call i32 @foo(i32 0) readnone
  %x = tail call i32 @foo(i32 %a) readnone
  %y = tail call i32 @foo(i32 %a) readnone
  %z = icmp eq i32 %x, %y
  ret i1 %z
while now -basicaa -gvn alone reduce it to
  %a = call i32 @foo(i32 0) readnone
  %x = call i32 @foo(i32 %a) readnone
  ret i1 true

llvm-svn: 119009
2010-11-13 21:33:19 +00:00
Duncan Sands 641baf1646 Generalize the reassociation transform in SimplifyCommutative (now renamed to
SimplifyAssociativeOrCommutative) "(A op C1) op C2" -> "A op (C1 op C2)",
which previously was only done if C1 and C2 were constants, to occur whenever
"C1 op C2" simplifies (a la InstructionSimplify).  Since the simplifying operand
combination can no longer be assumed to be the right-hand terms, consider all of
the possible permutations.  When compiling "gcc as one big file", transform 2
(i.e. using right-hand operands) fires about 4000 times but it has to be said
that most of the time the simplifying operands are both constants.  Transforms
3, 4 and 5 each fired once.  Transform 6, which is an existing transform that
I didn't change, never fired.  With this change, the testcase is now optimized
perfectly with one run of instcombine (previously it required instcombine +
reassociate + instcombine, and it may just have been luck that this worked).

llvm-svn: 119002
2010-11-13 15:10:37 +00:00
Dan Gohman d4b7fff2e8 Enhance DSE to handle the case where a free call makes more than
one store dead. This is especially noticeable in
SingleSource/Benchmarks/Shootout/objinst.

llvm-svn: 118875
2010-11-12 02:19:17 +00:00
Dan Gohman 620c38f030 Filecheckize.
llvm-svn: 118874
2010-11-12 02:02:39 +00:00
Dan Gohman a826a88755 Factor out Instruction::isSafeToSpeculativelyExecute's code for
testing for dereferenceable pointers into a helper function,
isDereferenceablePointer.  Teach it how to reason about GEPs
with simple non-zero indices.

Also eliminate ArgumentPromtion's IsAlwaysValidPointer,
which didn't check for weak externals or out of range gep
indices.

llvm-svn: 118840
2010-11-11 21:23:25 +00:00
Dan Gohman 0a6021a54d Enhance GVN to do more precise alias queries for non-local memory
references. For example, this allows gvn to eliminate the load in
this example:

  void foo(int n, int* p, int *q) {
    p[0] = 0;
    p[1] = 1;
    if (n) {
      *q = p[0];
    }
  }

llvm-svn: 118714
2010-11-10 20:37:15 +00:00
Duncan Sands f3b1bf1606 Teach InstructionSimplify how to look through PHI nodes. Since PHI
nodes can be used in loops, this could result in infinite looping
if there is no recursion limit, so add such a limit.  It is also
used for the SelectInst case because in theory there could be an
infinite loop there too if the basic block is unreachable.

llvm-svn: 118694
2010-11-10 18:23:01 +00:00
Dale Johannesen 0171dc30ff When checking that the necessary bits are zero in
order to reduce ((x<<30)>>24) to x<<6, check the
correct bits.  PR 8547.

llvm-svn: 118665
2010-11-10 01:30:56 +00:00
Dan Gohman 2694e14087 Make ModRefBehavior a lattice. Use this to clean up AliasAnalysis
chaining and simplify FunctionAttrs' GetModRefBehavior logic.

llvm-svn: 118660
2010-11-10 01:02:18 +00:00
Duncan Sands 52be6b41d3 Add an additional test for icmp of select folding.
llvm-svn: 118441
2010-11-08 20:56:28 +00:00
Dan Gohman 9130bad71f Extend the AliasAnalysis::pointsToConstantMemory interface to allow it
to optionally look for constant or local (alloca) memory.

Teach BasicAliasAnalysis::pointsToConstantMemory to look through Select
and Phi nodes, and to support looking for local memory.

Remove FunctionAttrs' PointsToLocalOrConstantMemory function, now that
AliasAnalysis knows all the tricks that it knew.

llvm-svn: 118412
2010-11-08 16:45:26 +00:00
Dan Gohman 86449d705a Make FunctionAttrs use AliasAnalysis::getModRefBehavior, now that it
knows about intrinsic functions.

llvm-svn: 118410
2010-11-08 16:10:15 +00:00
Duncan Sands a620bd1fa3 Add simplification of floating point comparisons with the result
of a select instruction, the same as already exists for integer
comparisons.

llvm-svn: 118379
2010-11-07 16:46:25 +00:00
Duncan Sands f532d31198 Fix a README item: when doing a comparison with the result
of a select instruction, see if doing the compare with the
true and false values of the select gives the same result.
If so, that can be used as the value of the comparison.

llvm-svn: 118378
2010-11-07 16:12:23 +00:00
Owen Anderson 6186c96765 When folding away a (shl (shr)) pair, we need to check that the bits that will BECOME the low
bits are zero, not that the current low bits are zero.  Fixes <rdar://problem/8606771>.

llvm-svn: 117953
2010-11-01 21:08:20 +00:00
Duncan Sands b8f3b14dfb If a function does a volatile load from a global constant, do not
consider it to be readonly.  In fact, don't even consider it to be
readonly if it does a volatile load from an AllocaInst either (it
is debatable as to whether readonly would be correct or not in this
case; play safe for the moment).  This fixes PR8279.

llvm-svn: 117783
2010-10-30 12:59:44 +00:00
Bob Wilson 11ee456e23 Change instcombine's getShuffleMask to represent undef with negative values.
This code had previously used 2*N, where N is the mask length, to represent
undef.  That is not safe because the shufflevector operands may have more
than N elements -- they don't have to match the result type.

llvm-svn: 117721
2010-10-29 22:03:05 +00:00
Bob Wilson cb11b48e7a Make instcombine a little more aggressive in combining vector shuffles.
Allow splats even if they don't match either of the original shuffles,
possibly due to undef entries in the shuffles masks.  Radar 8597790.
Also fix some 80-column violations.

llvm-svn: 117719
2010-10-29 22:02:50 +00:00
Owen Anderson 8e7b73743d Update testcase since we're no longer doing the constant forwarding inline with correlated value propagation.
llvm-svn: 117712
2010-10-29 21:18:23 +00:00
NAKAMURA Takumi 959807fa37 test/Transforms/SimplifyLibCalls/floor.ll: Mark as XFAIL:win32 due to lack of nearbyintf on MSVC. [PR8466]
llvm-svn: 117529
2010-10-28 06:46:04 +00:00
Dale Johannesen 16bb87a90e Teach InstCombine not to use Add and Neg on FP. PR 8490.
llvm-svn: 117510
2010-10-27 23:45:18 +00:00
Dan Gohman 2e20dfb0f2 Fix a case where instcombine was stripping metadata (and alignment)
from stores when folding in bitcasts.

llvm-svn: 117265
2010-10-25 16:16:27 +00:00
Duncan Sands 31c803b2ba Fix PR8445: a block with no predecessors may be the entry block, in which case
it isn't unreachable and should not be zapped.  The check for the entry block
was missing in one case: a block containing a unwind instruction.  While there,
do some small cleanups: "M" is not a great name for a Function* (it would be
more appropriate for a Module*), change it to "Fn"; use Fn in more places.

llvm-svn: 117224
2010-10-24 12:23:30 +00:00
Bob Wilson a4e231c880 Teach instcombine to set the alignment arguments for NEON load/store intrinsics.
llvm-svn: 117154
2010-10-22 21:41:48 +00:00
Mikhail Glushenkov 2072db24ed GlobalOpt: EvaluateFunction() must not evaluate stores to weak_odr globals.
Fixes PR8389.

llvm-svn: 116812
2010-10-19 16:47:23 +00:00
Dan Gohman 02538ac4d3 Make BasicAliasAnalysis a normal AliasAnalysis implementation which
does normal initialization and normal chaining. Change the default
AliasAnalysis implementation to NoAlias.

Update StandardCompileOpts.h and friends to explicitly request
BasicAliasAnalysis.

Update tests to explicitly request -basicaa.

llvm-svn: 116720
2010-10-18 18:04:47 +00:00
Owen Anderson 18e4fed3fa Generalize MemCpyOpt's handling of call slot forwarding to function properly when the call slot
forwarding is implemented with a load/store pair rather than a memcpy.

llvm-svn: 116637
2010-10-15 22:52:12 +00:00
Chris Lattner b9681ad442 fix a bug I introduced, no idea how this didn't repro right.
llvm-svn: 116462
2010-10-14 00:30:00 +00:00
Chris Lattner c7bd5740eb hack to unbreak buildbots
llvm-svn: 116461
2010-10-14 00:26:10 +00:00
Chris Lattner 698661c741 add uadd_ov/usub_ov to apint, consolidate constant folding
logic to use the new APInt methods.  Among other things this
implements rdar://8501501 - llvm.smul.with.overflow.i32 should constant fold

which comes from "clang -ftrapv", originally brought to my attention from PR8221.

llvm-svn: 116457
2010-10-14 00:05:07 +00:00
Kenneth Uildriks b8d7efe785 Now using a variant of the existing inlining heuristics to decide whether to create a given specialization of a function in PartialSpecialization. If the total performance bonus across all callsites passing the same constant exceeds the specialization cost, we create the specialization.
llvm-svn: 116158
2010-10-09 22:06:36 +00:00
Devang Patel 57da4caa85 Remove LoopIndexSplit pass. It is neither maintained nor used by anyone.
llvm-svn: 116004
2010-10-07 23:29:37 +00:00
Owen Anderson 13a642da0b Now that the profitable bits of EnableFullLoadPRE have been enabled by default, rip out the remainder.
Anyone interested in more general PRE would be better served by implementing it separately, to get real
anticipation calculation, etc.

llvm-svn: 115337
2010-10-01 20:02:55 +00:00
Chris Lattner c663a67384 fix PR8267 - Instcombine shouldn't optimizer away volatile memcpy's.
llvm-svn: 115296
2010-10-01 05:51:02 +00:00
Chris Lattner 98e82678bd upgrade this test.
llvm-svn: 115295
2010-10-01 05:47:16 +00:00
Owen Anderson 3170a25a84 We do want to allow LoadPRE to perform LICM-like transformations: we already consider PHI nodes to be negligible for
code size (making this transform code size neutral), and it allows us to hoist values out of loops, which is always
a good thing.

llvm-svn: 115205
2010-09-30 20:53:04 +00:00
Benjamin Kramer 2b76c66fd6 Add constant folding for strspn and strcspn to SimplifyLibCalls.
llvm-svn: 115116
2010-09-30 00:58:35 +00:00
Benjamin Kramer 38d22f69fc Add strpbrk folding to SimplifyLibCalls.
llvm-svn: 115111
2010-09-29 23:52:12 +00:00
Benjamin Kramer 8e861d7eee Simplify the loop in StrChrOptimizer. FileCheckize test.
llvm-svn: 115095
2010-09-29 22:29:12 +00:00
Benjamin Kramer 824645abc9 Teach SimplifyLibCalls how to optimize strrchr.
llvm-svn: 115091
2010-09-29 21:50:51 +00:00
Owen Anderson 99c985c37d Fix PR8247: JumpThreading can cause a block to become unreachable while still having predecessor, if it is part of a self-loop.
Because of this, we cannot use the Simplify* APIs, as they can assert-fail on unreachable code.  Since it's not easy to determine
if a given threading will cause a block to become unreachable, simply defer simplifying simplification to later InstCombine and/or
DCE passes.

llvm-svn: 115082
2010-09-29 20:34:41 +00:00
Jakob Stoklund Olesen 1083573796 Don't try to constant fold libm functions with non-finite arguments.
Usually we wouldn't do this anyway because llvm_fenv_testexcept would return an
exception, but we have seen some cases where neither errno nor fenv detect an
exception on arm-linux.

llvm-svn: 114893
2010-09-27 21:29:20 +00:00
Owen Anderson b590a927cd LoadPRE was not properly checking that the load it was PRE'ing post-dominated the block it was being hoisted to.
Splitting critical edges at the merge point only addressed part of the issue; it is also possible for non-post-domination
to occur when the path from the load to the merge has branches in it.  Unfortunately, full anticipation analysis is
time-consuming, so for now approximate it.  This is strictly more conservative than real anticipation, so we will miss
some cases that real PRE would allow, but we also no longer insert loads into paths where they didn't exist before. :-)

This is a very slight net positive on SPEC for me (0.5% on average).  Most of the benchmarks are largely unaffected, but
when it pays off it pays off decently: 181.mcf improves by 4.5% on my machine.

llvm-svn: 114785
2010-09-25 05:26:18 +00:00
Jakob Stoklund Olesen 9d06adcf88 Be more precise when trying to XFAIL this tester: http://google1.osuosl.org:8011/builders/llvm-arm-linux
llvm-svn: 114755
2010-09-24 20:34:49 +00:00
Dan Gohman 49c15c0f9f Attempt to XFAIL this test on arm-linux, which is inexplicably failing.
llvm-svn: 114241
2010-09-18 00:04:37 +00:00
Dan Gohman f3a9c464b4 Fix this test to avoid an "inexact" fold.
llvm-svn: 114202
2010-09-17 20:25:43 +00:00
Dan Gohman 695312637c Fix this test so that folding doesn't depend on a potentially
"inexact" result.

llvm-svn: 114198
2010-09-17 20:15:53 +00:00
Dan Gohman 18fa17cf3d Fix the folding of floating-point math library calls, like sin(infinity),
so that it detects errors on platforms where libm doesn't set errno.
It's still subject to host libm details though.

llvm-svn: 114148
2010-09-17 01:38:06 +00:00
Owen Anderson 20154b3ed4 Add missing RUN line to this test.
llvm-svn: 114106
2010-09-16 18:46:23 +00:00
Owen Anderson 140296f5c0 It is possible, under specific circumstances involving ptrtoint ConstantExpr's, for LVI to end up trying to merge
a Constant into a ConstantRange.  Handle this conservatively for now, rather than asserting.  The testcase is
more complex that I would like, but the manifestation of the problem is sensitive to iteration orders and the state of the
LVI cache, and I have not been able to reproduce it with manually constructed or simplified cases.

Fixes PR8162.

llvm-svn: 114103
2010-09-16 18:28:33 +00:00
Owen Anderson 94532cb297 Fix PR8161, in which an unreachable loop causes recursive instruction simplification to try
to replace an instruction with itself.  Add a predicate to the simplifier to prevent this case.

llvm-svn: 114097
2010-09-16 17:42:36 +00:00
Chris Lattner 67e534505d fix PR8144, a bug where constant merge would merge globals marked
attribute(used).

llvm-svn: 113911
2010-09-15 00:30:11 +00:00
Owen Anderson f6fe577e88 Remove dead option from tests.
llvm-svn: 113855
2010-09-14 21:03:40 +00:00
Chris Lattner f1144f0929 fix PR8102, a case where we'd copyValue from a value that we already
deleted.  Fix this by doing the copyValue's before we delete stuff!

The testcase only repros the problem on my system with valgrind.

llvm-svn: 113820
2010-09-14 00:19:00 +00:00
Owen Anderson fe12f23024 Add a reduced testcase for the infinite loop fixed in r113763.
llvm-svn: 113770
2010-09-13 18:28:40 +00:00
Owen Anderson c237a849e3 Re-apply r113679, which was reverted in r113720, which added a paid of new instcombine transforms
to expose greater opportunities for store narrowing in codegen.  This patch fixes a potential
infinite loop in instcombine caused by one of the introduced transforms being overly aggressive.

llvm-svn: 113763
2010-09-13 17:59:27 +00:00
Eric Christopher 26abd3e0c2 Revert 113679, it was causing an infinite loop in a testcase that I've sent
on to Owen.

llvm-svn: 113720
2010-09-12 06:09:23 +00:00
Owen Anderson 70f4524427 Invert and-of-or into or-of-and when doing so would allow us to clear bits of the and's mask.
This can result in increased opportunities for store narrowing in code generation.  Update a number of
tests for this change.  This fixes <rdar://problem/8285027>.

Additionally, because this inverts the order of ors and ands, some patterns for optimizing or-of-and-of-or
no longer fire in instances where they did originally.  Add a simple transform which recaptures most of these
opportunities: if we have an or-of-constant-or and have failed to fold away the inner or, commute the order 
of the two ors, to give the non-constant or a chance for simplification instead.

llvm-svn: 113679
2010-09-11 05:48:06 +00:00
Benjamin Kramer 8c35fb0739 Teach InstructionSimplify to fold (A & B) & A -> A & B and (A | B) | A -> A | B.
Reassociate does this but it doesn't catch all cases (e.g. if the operands are i1).

llvm-svn: 113651
2010-09-10 22:39:55 +00:00
Owen Anderson 6270515918 Revert r113439, which relaxed the requirement that loops containing calls cannot be unrolled. After some discussion,
there seems to be a better way to achieve the same effect.

llvm-svn: 113528
2010-09-09 20:02:23 +00:00
Owen Anderson 8084dbaf8e Relax the "don't unroll loops containing calls" rule. Instead, when a loop contains a call, lower the
unrolling threshold to the optimize-for-size threshold.  Basically, for loops containing calls, unrolling
can still be profitable as long as the loop is REALLY small.

llvm-svn: 113439
2010-09-08 23:10:07 +00:00
Owen Anderson 3fe002dfb5 Generalize instcombine's support for combining multiple bit checks into a single test. Patch by Dirk Steinke!
llvm-svn: 113423
2010-09-08 22:16:17 +00:00
Chris Lattner 6e27b3e004 Fix a serious performance regression introduced by r108687 on linux:
turning (fptrunc (sqrt (fpext x))) -> (sqrtf x)  is great, but we have
to delete the original sqrt as well.  Not doing so causes us to do 
two sqrt's when building with -fmath-errno (the default on linux).

llvm-svn: 113260
2010-09-07 20:01:38 +00:00
Chris Lattner 29570bd695 rename test.
llvm-svn: 113257
2010-09-07 19:57:06 +00:00
Chris Lattner be9019090e fix PR8067, an over-aggressive assertion in LICM.
llvm-svn: 113146
2010-09-06 05:11:24 +00:00
Chris Lattner b01c24a945 Teach loop rotate to hoist trivially invariant instructions
in the duplicated block instead of duplicating them.  

Duplicating them into the end of the loop and the preheader 
means that we got a phi node in the header of the loop, 
which prevented LICM from hoisting them.  GVN would
usually come around later and merge the duplicated 
instructions so we'd get reasonable output... except that
anything dependent on the shoulda-been-hoisted value can't
be hoisted.  In PR5319 (which this fixes), a memory value
didn't get promoted.

llvm-svn: 113134
2010-09-06 01:10:22 +00:00
Chris Lattner 72d283c826 fix PR8063, a crash in globalopt in the malloc analysis code.
llvm-svn: 113109
2010-09-05 17:20:46 +00:00
Dan Gohman 487e250109 Fix LoopSimplify to notify ScalarEvolution when splitting a loop backedge
into an inner loop, as the new loop iteration may differ substantially.
This fixes PR8078.

llvm-svn: 113057
2010-09-04 02:42:48 +00:00
Chris Lattner 50506787d1 fix a bug in my licm rewrite when a load from the promoted memory
location is being re-stored to the memory location.  We would get
a dangling pointer from the SSAUpdate data structure and miss a 
use.  This fixes PR8068

llvm-svn: 113042
2010-09-04 00:12:30 +00:00
Owen Anderson c91c1a205a Propagate non-local comparisons. Fixes PR1757.
llvm-svn: 113025
2010-09-03 22:47:08 +00:00
Owen Anderson c725462245 Add support for simplifying a load from a computed value to a load from a global when it
is provable that they're equivalent.  This fixes PR4855.

llvm-svn: 112994
2010-09-03 19:08:37 +00:00
Owen Anderson 064cb4c807 Add a test for PR4413, which was apparently fixed at some point in the past.
llvm-svn: 112987
2010-09-03 18:33:08 +00:00
Owen Anderson 50d8c8888c Add PR number to test.
llvm-svn: 112971
2010-09-03 16:58:25 +00:00
Chris Lattner 65fb25a257 more test cleanup
llvm-svn: 112892
2010-09-02 22:38:56 +00:00
Chris Lattner bb451461ec remove some noise from tests.
llvm-svn: 112889
2010-09-02 22:35:33 +00:00
Chris Lattner affc0e42f0 fix more AST updating bugs, correcting miscompilation in PR8041
llvm-svn: 112878
2010-09-02 22:19:10 +00:00
Owen Anderson 67dee4dcac Fix typo. I accidentally edited the wrong file before my last commit.
llvm-svn: 112851
2010-09-02 19:52:06 +00:00
Owen Anderson a8c896b704 Fix a bug in LazyValueInfo that CorrelatedValuePropagation exposed: In the LVI lattice, undef and the full set ConstantRange should not
be treated as equivalent.

llvm-svn: 112843
2010-09-02 18:23:58 +00:00
Duncan Sands 8dda07428a Print the number of uses of a function in the .ll since it can be informative
and there seems to be no reason not to.

llvm-svn: 112812
2010-09-02 08:52:23 +00:00
Chris Lattner 8af45a889d deepen my MMX/SRoA hack to avoid hurting non-x86 codegen.
llvm-svn: 112763
2010-09-01 23:09:27 +00:00
Dan Gohman 0ad7d9c24e Fix loop unswitching's assumption that a code path which either
infinite loops or exits will eventually exit. This fixes PR5373.

llvm-svn: 112745
2010-09-01 21:46:45 +00:00
Bill Wendling 6456efaffd The output of opt -stats must be sent to stderr. Patch by NAKAMURA Takumi!
llvm-svn: 112724
2010-09-01 18:32:56 +00:00
Chris Lattner 34e5361eb5 add a gross hack to work around a problem that Argiris reported
on llvmdev: SRoA is introducing MMX datatypes like <1 x i64>,
which then cause random problems because the X86 backend is
producing mmx stuff without inserting proper emms calls.

In the short term, force off MMX datatypes.  In the long term,
the X86 backend should not select generic vector types to MMX
registers.  This is being worked on, but won't be done in time
for 2.8.  rdar://8380055

llvm-svn: 112696
2010-09-01 05:14:33 +00:00
Chris Lattner b9ed4f252f filecheckize
llvm-svn: 112695
2010-09-01 05:10:14 +00:00
Chris Lattner 030f02021b licm is wasting time hoisting constant foldable operations,
instead of hoisting them, just fold them away.  This occurs in the
testcase for PR8041, for example.

llvm-svn: 112669
2010-08-31 23:00:16 +00:00
Owen Anderson a5e6b3eca4 Merge 2010-08-31-InfiniteRecursion.ll into crash.ll.
llvm-svn: 112635
2010-08-31 20:27:17 +00:00
Owen Anderson 799a08ae48 Add a test for the duplicated-conditional situation illutrated by PR5652.
llvm-svn: 112621
2010-08-31 18:49:12 +00:00
Chris Lattner e2295f1c80 merge two tests.
llvm-svn: 112617
2010-08-31 18:44:03 +00:00
Owen Anderson 3931c85956 Manually reduce this testcase.
llvm-svn: 112615
2010-08-31 18:16:29 +00:00
Chris Lattner fbcd165b59 merge two tests and convert to filecheck.
llvm-svn: 112613
2010-08-31 18:05:08 +00:00
Owen Anderson ada0623725 Add a micro-test for the transforms I added to JumpThreading.
I have not been able to find a way to test each in isolation, for a few reasons:
1) The ability to look-through non-i1 BinaryOperator's requires the ability to look through non-constant
   ICmps in order for it to ever trigger.
2) The ability to do LVI-powered PHI value determination only matters in cases that ProcessBranchOnPHI
   can't handle.  Since it already handles all the cases without other instructions in the def-use chain
   between the PHI and the branch, it requires the ability to look through ICmps and/or BinaryOperators
   as well.

llvm-svn: 112611
2010-08-31 17:59:07 +00:00
Owen Anderson 064b139c8d Rename test directory to reflect new pass name.
llvm-svn: 112592
2010-08-31 07:50:31 +00:00
Owen Anderson 48d58ad64c Rename ValuePropagation to a more descriptive CorrelatedValuePropagation.
llvm-svn: 112591
2010-08-31 07:48:34 +00:00
Owen Anderson 3997a07fb9 More Chris-inspired JumpThreading fixes: use ConstantExpr to correctly constant-fold undef, and be more careful with its return value.
This actually exposed an infinite recursion bug in ComputeValueKnownInPredecessors which theoretically already existed (in JumpThreading's
handling of and/or of i1's), but never manifested before.  This patch adds a tracking set to prevent this case.

llvm-svn: 112589
2010-08-31 07:36:34 +00:00
Owen Anderson 376597c13e Remove r111665, which implemented store-narrowing in InstCombine. Chris discovered a miscompilation in it, and it's not easily
fixable at the optimizer level. I'll investigate reimplementing it in DAGCombine.

llvm-svn: 112575
2010-08-31 04:41:06 +00:00
Owen Anderson 70b17c50e2 Combine these two tests, and make sure there's a newline at the end of the file.
llvm-svn: 112554
2010-08-30 23:37:41 +00:00
Duncan Sands 68c30907cc Correct bogus module triple specifications.
llvm-svn: 112469
2010-08-30 10:48:29 +00:00
Chris Lattner 263f804699 LICM does get dead instructions input to it. Instead of sinking them
out of loops, just delete them.

llvm-svn: 112451
2010-08-29 18:22:25 +00:00
Chris Lattner 504e5100d3 remove the ABCD and SSI passes. They don't have any clients that
I'm aware of, aren't maintained, and LVI will be replacing their value.
nlewycky approved this on irc.

llvm-svn: 112355
2010-08-28 03:51:24 +00:00
Chris Lattner d0214f3efe handle the constant case of vector insertion. For something
like this:

struct S { float A, B, C, D; };

struct S g;
struct S bar() { 
  struct S A = g;
  ++A.B;
  A.A = 42;
  return A;
}

we now generate:

_bar:                                   ## @bar
## BB#0:                                ## %entry
	movq	_g@GOTPCREL(%rip), %rax
	movss	12(%rax), %xmm0
	pshufd	$16, %xmm0, %xmm0
	movss	4(%rax), %xmm2
	movss	8(%rax), %xmm1
	pshufd	$16, %xmm1, %xmm1
	unpcklps	%xmm0, %xmm1
	addss	LCPI1_0(%rip), %xmm2
	pshufd	$16, %xmm2, %xmm2
	movss	LCPI1_1(%rip), %xmm0
	pshufd	$16, %xmm0, %xmm0
	unpcklps	%xmm2, %xmm0
	ret

instead of:

_bar:                                   ## @bar
## BB#0:                                ## %entry
	movq	_g@GOTPCREL(%rip), %rax
	movss	12(%rax), %xmm0
	pshufd	$16, %xmm0, %xmm0
	movss	4(%rax), %xmm2
	movss	8(%rax), %xmm1
	pshufd	$16, %xmm1, %xmm1
	unpcklps	%xmm0, %xmm1
	addss	LCPI1_0(%rip), %xmm2
	movd	%xmm2, %eax
	shlq	$32, %rax
	addq	$1109917696, %rax       ## imm = 0x42280000
	movd	%rax, %xmm0
	ret

llvm-svn: 112345
2010-08-28 01:50:57 +00:00
Chris Lattner dd6601048e optimize bitcasts from large integers to vector into vector
element insertion from the pieces that feed into the vector.
This handles a pattern that occurs frequently due to code
generated for the x86-64 abi.  We now compile something like
this:

struct S { float A, B, C, D; };
struct S g;
struct S bar() { 
  struct S A = g;
  ++A.A;
  ++A.C;
  return A;
}

into all nice vector operations:

_bar:                                   ## @bar
## BB#0:                                ## %entry
	movq	_g@GOTPCREL(%rip), %rax
	movss	LCPI1_0(%rip), %xmm1
	movss	(%rax), %xmm0
	addss	%xmm1, %xmm0
	pshufd	$16, %xmm0, %xmm0
	movss	4(%rax), %xmm2
	movss	12(%rax), %xmm3
	pshufd	$16, %xmm2, %xmm2
	unpcklps	%xmm2, %xmm0
	addss	8(%rax), %xmm1
	pshufd	$16, %xmm1, %xmm1
	pshufd	$16, %xmm3, %xmm2
	unpcklps	%xmm2, %xmm1
	ret

instead of icky integer operations:

_bar:                                   ## @bar
	movq	_g@GOTPCREL(%rip), %rax
	movss	LCPI1_0(%rip), %xmm1
	movss	(%rax), %xmm0
	addss	%xmm1, %xmm0
	movd	%xmm0, %ecx
	movl	4(%rax), %edx
	movl	12(%rax), %esi
	shlq	$32, %rdx
	addq	%rcx, %rdx
	movd	%rdx, %xmm0
	addss	8(%rax), %xmm1
	movd	%xmm1, %eax
	shlq	$32, %rsi
	addq	%rax, %rsi
	movd	%rsi, %xmm1
	ret

This resolves rdar://8360454

llvm-svn: 112343
2010-08-28 01:20:38 +00:00
Owen Anderson cf7f941121 Add a prototype of a new peephole optimizing pass that uses LazyValue info to simplify PHIs and select's.
This pass addresses the missed optimizations from PR2581 and PR4420.

llvm-svn: 112325
2010-08-27 23:31:36 +00:00
Chris Lattner 954e9557e3 tidy up test.
llvm-svn: 112321
2010-08-27 23:15:21 +00:00
Chris Lattner 6c1395f62a Enhance the shift propagator to handle the case when you have:
A = shl x, 42
...
B = lshr ..., 38

which can be transformed into:
A = shl x, 4
...

iff we can prove that the would-be-shifted-in bits
are already zero.  This eliminates two shifts in the testcase
and allows eliminate of the whole i128 chain in the real example.

llvm-svn: 112314
2010-08-27 22:53:44 +00:00
Chris Lattner 18d7fc8fc6 Implement a pretty general logical shift propagation
framework, which is good at ripping through bitfield
operations.  This generalize a bunch of the existing
xforms that instcombine does, such as 
  (x << c) >> c -> and
to handle intermediate logical nodes.  This is useful for
ripping up the "promote to large integer" code produced by
SRoA.

llvm-svn: 112304
2010-08-27 22:24:38 +00:00
Chris Lattner 606b76eba6 merge and filecheckize test
llvm-svn: 112289
2010-08-27 20:44:45 +00:00
Chris Lattner c665156e9f merge two tests
llvm-svn: 112288
2010-08-27 20:42:10 +00:00
Chris Lattner 7398434675 teach the truncation optimization that an entire chain of
computation can be truncated if it is fed by a sext/zext that doesn't
have to be exactly equal to the truncation result type.

llvm-svn: 112285
2010-08-27 20:32:06 +00:00
Chris Lattner 90cd746e63 Add an instcombine to clean up a common pattern produced
by the SRoA "promote to large integer" code, eliminating
some type conversions like this:

   %94 = zext i16 %93 to i32                       ; <i32> [#uses=2]
   %96 = lshr i32 %94, 8                           ; <i32> [#uses=1]
   %101 = trunc i32 %96 to i8                      ; <i8> [#uses=1]

This also unblocks other xforms from happening, now clang is able to compile:

struct S { float A, B, C, D; };
float foo(struct S A) { return A.A + A.B+A.C+A.D; }

into:

_foo:                                   ## @foo
## BB#0:                                ## %entry
	pshufd	$1, %xmm0, %xmm2
	addss	%xmm0, %xmm2
	movdqa	%xmm1, %xmm3
	addss	%xmm2, %xmm3
	pshufd	$1, %xmm1, %xmm0
	addss	%xmm3, %xmm0
	ret

on x86-64, instead of:

_foo:                                   ## @foo
## BB#0:                                ## %entry
	movd	%xmm0, %rax
	shrq	$32, %rax
	movd	%eax, %xmm2
	addss	%xmm0, %xmm2
	movapd	%xmm1, %xmm3
	addss	%xmm2, %xmm3
	movd	%xmm1, %rax
	shrq	$32, %rax
	movd	%eax, %xmm0
	addss	%xmm3, %xmm0
	ret

This seems pretty close to optimal to me, at least without
using horizontal adds.  This also triggers in lots of other
code, including SPEC.

llvm-svn: 112278
2010-08-27 18:31:05 +00:00
Owen Anderson 6ebbd92380 Use LVI to eliminate conditional branches where we've tested a related condition previously. Update tests for this change.
This fixes PR5652.

llvm-svn: 112270
2010-08-27 17:12:29 +00:00
Chris Lattner c188b96bbe filecheckize
llvm-svn: 112235
2010-08-26 22:23:39 +00:00
Chris Lattner 387d6bcdcb rename test.
llvm-svn: 112234
2010-08-26 22:20:47 +00:00
Chris Lattner bfd2228182 optimize "integer extraction out of the middle of a vector" as produced
by SRoA.  This is part of rdar://7892780, but needs another xform to
expose this.

llvm-svn: 112232
2010-08-26 22:14:59 +00:00
Chris Lattner d4ebd6df5a optimize bitcast(trunc(bitcast(x))) where the result is a float and 'x'
is a vector to be a vector element extraction.  This allows clang to
compile:

struct S { float A, B, C, D; };
float foo(struct S A) { return A.A + A.B+A.C+A.D; }

into:

_foo:                                   ## @foo
## BB#0:                                ## %entry
	movd	%xmm0, %rax
	shrq	$32, %rax
	movd	%eax, %xmm2
	addss	%xmm0, %xmm2
	movapd	%xmm1, %xmm3
	addss	%xmm2, %xmm3
	movd	%xmm1, %rax
	shrq	$32, %rax
	movd	%eax, %xmm0
	addss	%xmm3, %xmm0
	ret

instead of:

_foo:                                   ## @foo
## BB#0:                                ## %entry
	movd	%xmm0, %rax
	movd	%eax, %xmm0
	shrq	$32, %rax
	movd	%eax, %xmm2
	addss	%xmm0, %xmm2
	movd	%xmm1, %rax
	movd	%eax, %xmm1
	addss	%xmm2, %xmm1
	shrq	$32, %rax
	movd	%eax, %xmm0
	addss	%xmm1, %xmm0
	ret

... eliminating half of the horribleness.

llvm-svn: 112227
2010-08-26 21:55:42 +00:00
Chris Lattner 3c19d3d5c3 filecheckize
llvm-svn: 112225
2010-08-26 21:51:41 +00:00
Chris Lattner 7717c616bd rename test
llvm-svn: 112224
2010-08-26 21:50:56 +00:00
Owen Anderson bd2ecc7e68 Make JumpThreading smart enough to properly thread StrSwitch when it's compiled with clang++.
llvm-svn: 112198
2010-08-26 17:40:24 +00:00
Devang Patel 01262e129e DIGlobalVariable can be used to encode debug info for globals that are directly folded into a constant by FE.
llvm-svn: 112072
2010-08-25 18:52:02 +00:00
Owen Anderson 4afea9e3c6 In the default address space, any GEP off of null results in a trap value if you try to load it. Thus,
any load in the default address space that completes implies that the base value that it GEP'd from
was not null.

llvm-svn: 112015
2010-08-25 01:16:47 +00:00
Owen Anderson 84c29a096b Re-apply r111568 with a fix for the clang self-host.
llvm-svn: 111665
2010-08-20 18:24:43 +00:00
Owen Anderson 3323651ec7 Previous revert failed to remove this file.
llvm-svn: 111582
2010-08-19 23:45:15 +00:00
Owen Anderson 43057cd56a Revert r111568 to unbreak clang self-host.
llvm-svn: 111571
2010-08-19 23:25:16 +00:00
Owen Anderson bb723b228a When a set of bitmask operations, typically from a bitfield initialization, only modifies the low bytes of a value,
we can narrow the store to only over-write the affected bytes.

llvm-svn: 111568
2010-08-19 22:15:40 +00:00
Kenneth Uildriks d4b6ab9888 Fixed and reactivated a partial specialization test
llvm-svn: 111516
2010-08-19 12:42:38 +00:00
Chris Lattner 3c603024bb Fix PR7755: knowing something about an inval for a pred
from the LHS should disable reconsidering that pred on the
RHS.  However, knowing something about the pred on the RHS
shouldn't disable subsequent additions on the RHS from
happening.

llvm-svn: 111349
2010-08-18 03:14:36 +00:00
Eric Christopher 51edc7b7e1 Temporarily revert r110987 as it's causing some miscompares in
vector heavy code.  I'll re-enable when we've tracked down the problem.

llvm-svn: 111318
2010-08-17 22:55:27 +00:00
Dan Gohman 5047ca0c02 When rotating loops, put the original header at the bottom of the
loop, making the resulting loop significantly less ugly.  Also, zap
its trivial PHI nodes, since it's easy.

llvm-svn: 111255
2010-08-17 17:39:21 +00:00
Dan Gohman 250b754428 Instead, teach SimplifyCFG to trim non-address-taken blocks from
indirectbr destination lists.

llvm-svn: 111122
2010-08-16 14:41:14 +00:00
Dan Gohman aa445c0751 LoopSimplify shouldn't split loop backedges that use indirectbr. PR7867.
llvm-svn: 111061
2010-08-14 00:43:09 +00:00
Dan Gohman 4a63fad976 Teach SimplifyCFG how to simplify indirectbr instructions.
- Eliminate redundant successors.
 - Convert an indirectbr with one successor into a direct branch.

Also, generalize SimplifyCFG to be able to be run on a function entry block.
It knows quite a few simplifications which are applicable to the entry
block, and it only needs a few checks to avoid trouble with the entry block.

llvm-svn: 111060
2010-08-14 00:29:42 +00:00
Nate Begeman 2a0ca3e937 Reapply this transformation now that it is passing the external test which it previously failed.
llvm-svn: 110987
2010-08-13 00:17:53 +00:00
Chris Lattner 363226dfe8 fix PR7876: If ipsccp decides that a function's address is taken
before it rewrites the code, we need to use that in the post-rewrite pass.

llvm-svn: 110962
2010-08-12 22:25:23 +00:00
Eric Christopher ac40d49c70 Temporarily revert 110737 and 110734, they were causing failures
in an external testsuite.

llvm-svn: 110905
2010-08-12 07:01:22 +00:00
Nate Begeman 3ec892c167 Add test for recent instcombine vector shuffle enhancement
llvm-svn: 110737
2010-08-10 21:58:00 +00:00
Eli Friedman f99e7e6643 PR7853: fix a silly mistake introduced in r101899, and add a test to make sure
it doesn't regress again.

llvm-svn: 110597
2010-08-09 20:49:43 +00:00
Dan Gohman c53ee449a5 Move x86-specific tests out of test/Transforms/LoopStrengthReduce and
into test/CodeGen/X86, so that they aren't run when the x86 target is
not enabled.

Fix uglygep.ll to not be x86-specific.

llvm-svn: 110343
2010-08-05 17:04:15 +00:00
Dan Gohman 3619660529 Make instcombine set explicit alignments on load or store
instructions with alignment 0, so that subsequent passes don't
need to bother checking the TargetData ABI size manually.

llvm-svn: 110128
2010-08-03 18:20:32 +00:00
Peter Collingbourne ddaaf40d24 Add an atomic lowering pass
llvm-svn: 110113
2010-08-03 16:19:16 +00:00
Owen Anderson 8f306a779b Re-apply the infamous r108614, with a fix pointed out by Dirk Steinke.
llvm-svn: 110036
2010-08-02 09:32:13 +00:00
Daniel Dunbar 0b636a24c7 Speculatively revert r108614, "Another attempt at getting the clang self-host to
like my instcombine patch.", in an attempt to fix Clang i386 bootstrap.
 - Also PR7719.

llvm-svn: 109953
2010-07-31 19:51:11 +00:00
Owen Anderson bb4c4b59a4 Fix a test with malformed IR. Not sure why this didn't fail before.
llvm-svn: 109422
2010-07-26 18:44:56 +00:00
Dan Gohman cd83870faf Fix SCEVExpander::visitAddRecExpr so that it remembers the induction variable
it inserted rather than using LoopInfo::getCanonicalInductionVariable to
rediscover it, since that doesn't work on non-canonical loops. This fixes
infinite recurrsion on such loops; PR7562.

llvm-svn: 109419
2010-07-26 18:28:14 +00:00
Dan Gohman b0961f2443 Avoid depending on LCSSA implicitly pulling in LoopSimplify.
llvm-svn: 109410
2010-07-26 18:00:43 +00:00
Owen Anderson 3ccd81864f Testcase for r108687.
llvm-svn: 108689
2010-07-19 08:14:26 +00:00
Owen Anderson 7d2818b073 Another attempt at getting the clang self-host to like my instcombine patch.
llvm-svn: 108614
2010-07-17 06:56:35 +00:00
Nick Lewycky 375efe3157 Arrays and vectors with different numbers of elements are not equivalent.
llvm-svn: 108517
2010-07-16 06:31:12 +00:00
Tobias Grosser 3d84c9c793 LoopSimplify does not update domfrontier correctly.
This fixes PR7649.

llvm-svn: 108513
2010-07-16 05:59:45 +00:00
Eric Christopher 15a81cddb4 Also revert 108422, it's causing some test failures.
Working on testcases for Owen.

llvm-svn: 108494
2010-07-16 01:36:12 +00:00
Dan Gohman c6eefe4d4e Fix this test.
llvm-svn: 108491
2010-07-16 01:28:45 +00:00
Dan Gohman fbbdfcaea7 Fix the order that SCEVExpander considers add operands in so that
it doesn't miss an opportunity to form a GEP, regardless of the
relative loop depths of the operands. This fixes rdar://8197217.

llvm-svn: 108475
2010-07-15 23:38:13 +00:00
Owen Anderson 7151dfd48a Reapply r108378, with bugfixes, testcase, and improved comment formatting.
This now passes LIT, nighty test, and llvm-gcc bootstrap on my machine.

llvm-svn: 108422
2010-07-15 15:00:23 +00:00
Chris Lattner 19eff2a9f6 Fix PR7647, handling the case when 'To' ends up being
mutated by recursive simplification.  This also enhances
ReplaceAndSimplifyAllUses to actually do a real RAUW
at the end of it, which updates any value handles
pointing to "From" to start pointing to "To".  This
seems useful for debug info and random other VH users.

llvm-svn: 108415
2010-07-15 06:36:08 +00:00
Chris Lattner ec0e7b1643 revert r108320, I see the failures now...
llvm-svn: 108322
2010-07-14 06:16:35 +00:00
Chris Lattner 658680b2f5 reapply benjamin's instcombine patch, I don't see anything wrong with it and can't repro any problems with a manual self-host.
llvm-svn: 108320
2010-07-14 05:59:13 +00:00
Duncan Sands f88a284579 Handle the case of a tail recursion in which the tail call is followed
by a return that returns a constant, while elsewhere in the function
another return instruction returns a different constant.  This is a
special case of accumulator recursion, so just generalize the existing
logic a bit.

llvm-svn: 108241
2010-07-13 15:41:41 +00:00
Benjamin Kramer 8f36402ac2 Nope, still breaks the release selfhost bots :(
llvm-svn: 108153
2010-07-12 16:38:48 +00:00
Benjamin Kramer 07b695e052 Reapply the "or" half of r108136, which seems to be less problematic.
llvm-svn: 108152
2010-07-12 16:15:48 +00:00
Benjamin Kramer c719e8ae9e Revert r108141 again, sigh.
llvm-svn: 108148
2010-07-12 14:42:04 +00:00
Benjamin Kramer f578c36035 Reapply 108136 with an ugly pasto fixed.
llvm-svn: 108141
2010-07-12 13:44:00 +00:00
Benjamin Kramer 9675e759cf Revert r108136 until I figure out why it broke selfhost.
llvm-svn: 108139
2010-07-12 12:35:49 +00:00
Benjamin Kramer 35473faa50 instcombine: fold (x & y) | (~x & z) and (x & y) ^ (~x & z) into ((y ^ z) & x) ^ z which is one instruction shorter. (PR6773)
before:
  %and = and i32 %y, %x
  %neg = xor i32 %x, -1
  %and4 = and i32 %z, %neg
  %xor = xor i32 %and4, %and

after:
  %xor1 = xor i32 %z, %y
  %and2 = and i32 %xor1, %x
  %xor = xor i32 %and2, %z

llvm-svn: 108136
2010-07-12 11:54:45 +00:00
Chris Lattner 25eea4db66 fix PR7311 by avoiding breaking casts when a bitcast from scalar->vector
is involved.

llvm-svn: 108117
2010-07-12 01:19:22 +00:00
Chris Lattner bbc25ff5cc if jump threading is able to infer interesting values on both
the LHS and RHS of an and/or instruction, don't multiply add
known predecessor values.  This fixes the crash on testcase
from PR7498

llvm-svn: 108114
2010-07-12 00:47:34 +00:00
Chris Lattner fd4a09fc0a fix PR7429, a crash turning a load from a string into a float.
llvm-svn: 108113
2010-07-12 00:22:51 +00:00
Chris Lattner f8feba368c convert to filechecconvert to filecheckk
llvm-svn: 108112
2010-07-12 00:21:10 +00:00
Chris Lattner 9338b0a1e2 merge two tests.
llvm-svn: 108111
2010-07-12 00:19:47 +00:00
Benjamin Kramer 2321e6a4d4 Teach instcombine to transform
(X >s -1) ? C1 : C2 and (X <s  0) ? C2 : C1
into ((X >>s 31) & (C2 - C1)) + C1, avoiding the conditional.

This optimization could be extended to take non-const C1 and C2 but we better
stay conservative to avoid code size bloat for now.

for
int sel(int n) {
     return n >= 0 ? 60 : 100;
}

we now generate
  sarl  $31, %edi
  andl  $40, %edi
  leal  60(%rdi), %eax

instead of
  testl %edi, %edi
  movl  $60, %ecx
  movl  $100, %eax
  cmovnsl %ecx, %eax

llvm-svn: 107866
2010-07-08 11:39:10 +00:00
Chris Lattner efa3c824cc Fix the second half of PR7437: scalarrepl wasn't preserving
address spaces when SRoA'ing memcpy's.

llvm-svn: 107846
2010-07-08 00:27:05 +00:00
Dale Johannesen 744c74c444 Prevent test from hanging waiting for input.
llvm-svn: 107446
2010-07-01 22:57:11 +00:00
Devang Patel 2b434e12cd Debugging infomration is encoded in llvm IR using metadata. This is designed
such a way that debug info for symbols preserved even if symbols are
optimized away by the optimizer. 

Add new special pass to remove debug info for such symbols.

llvm-svn: 107416
2010-07-01 19:49:20 +00:00
Devang Patel db735cbbab Remove all debug info related named mdnodes.
llvm-svn: 107323
2010-06-30 21:29:00 +00:00
Dan Gohman ae36b1ed42 Fix ScalarEvolution's tripcount computation for chains of loops
where each loop's induction variable's start value is the exit
value of a preceding loop.

llvm-svn: 107224
2010-06-29 23:43:06 +00:00
Dan Gohman e697a6f24f Constant fold x == undef to undef.
llvm-svn: 107074
2010-06-28 21:30:07 +00:00
Chris Lattner 93e63a0218 this test is failing nondeterministically and blaming me, just disable
it for now.

llvm-svn: 106960
2010-06-26 22:08:30 +00:00
Benjamin Kramer c1ecfd86a3 Fix test weirdness.
llvm-svn: 106959
2010-06-26 22:06:50 +00:00
Benjamin Kramer 3bbc52ce3e Fix some tests that didn't test anything.
llvm-svn: 106954
2010-06-26 20:05:06 +00:00
Kenneth Uildriks 7228d98b85 Partial specialization test should not depend on the order of specialization operations or the names assigned to the specialized functions
llvm-svn: 106953
2010-06-26 18:47:40 +00:00
Duncan Sands 3a5cb69cb8 Fix PR7328: when turning a tail recursion into a loop, need to preserve
the returned value after the tail call if it differs from other return
values.  The optimal thing to do would be to introduce a phi node for
the return value, but for the moment just fix the miscompile.

llvm-svn: 106947
2010-06-26 12:53:31 +00:00
Dan Gohman f3aea7aecf Disable indvars on loops when LoopSimplify form is not available.
This fixes PR7333.

llvm-svn: 106267
2010-06-18 01:35:11 +00:00
Rafael Espindola 29dda21e96 Remove arm_apcscc from the test files. It is the default and doing this
matches what llvm-gcc and clang now produce.

llvm-svn: 106221
2010-06-17 15:18:27 +00:00
Rafael Espindola a20e2dfe86 Make sure that simplify libcalls does not replace a call with one calling
convention with a new call with a different calling convention.

llvm-svn: 106134
2010-06-16 19:34:01 +00:00
Benjamin Kramer a13bd20396 simplify-libcalls: fold strncmp(x, y, 1) -> memcmp(x, y, 1)
The memcmp will be optimized further and even the pathological case
'strstr(x, "x") == x' generates optimal code now.

llvm-svn: 106097
2010-06-16 10:30:29 +00:00
Benjamin Kramer 1118860e3a simplify-libcalls: fold strstr(a, b) == a -> strncmp(a, b, strlen(b)) == 0
llvm-svn: 106047
2010-06-15 21:34:25 +00:00
Rafael Espindola 5a24a56e1e Remove the arm_aapcscc marker from the tests. It is the default
for the linux targets.

llvm-svn: 106029
2010-06-15 19:04:29 +00:00
Chris Lattner 329ea064ed jump threading can't split a critical edge from an indirectbr. This
fixes PR7356.

llvm-svn: 105950
2010-06-14 19:45:43 +00:00
Benjamin Kramer 6e42d53cb3 Test case for r105914.
llvm-svn: 105915
2010-06-13 16:16:54 +00:00
Kenneth Uildriks 1850444000 Partial specialization was not checking the callsite to make sure it was using the same constants as the specialization, leading to calls to the wrong specialization. Patch by Takumi Nakamura\!
llvm-svn: 105528
2010-06-05 14:50:21 +00:00
Devang Patel 36da24b546 Copy location info for current function argument from dbg.declare if respective store instruction does not have any location info.
llvm-svn: 105490
2010-06-04 22:27:30 +00:00
Duncan Sands 4c904fa797 Fix PR7272: when inlining through a callsite with byval arguments,
the newly created allocas may be used by inlined calls, so these
need to have their tail call flags cleared.  Fixes PR7272.

llvm-svn: 105255
2010-05-31 21:00:26 +00:00
Nick Lewycky aee2632be3 The memcpy intrinsic only takes i8* for %src and %dst, so cast them to that
first. Fixes PR7265.

llvm-svn: 105206
2010-05-31 06:16:35 +00:00
Dale Johannesen 526bd59aaf Add missing space; works for me.
llvm-svn: 104992
2010-05-28 18:45:59 +00:00
Dan Gohman df5d7dcef1 Teach instcombine to promote alloca array sizes.
llvm-svn: 104945
2010-05-28 15:09:00 +00:00
Dan Gohman 71505aa4de Add a testcase for getelementptr index promotion.
llvm-svn: 104944
2010-05-28 15:07:59 +00:00
Devang Patel 7a9dedf0ab Do not drop location info for inlined function args.
llvm-svn: 104884
2010-05-27 20:25:04 +00:00
Duncan Sands f162eace49 Teach instCombine to remove malloc+free if malloc's only uses are comparisons
to null.  Patch by Matti Niemenmaa.

llvm-svn: 104871
2010-05-27 19:09:06 +00:00
Benjamin Kramer 9439084cea Properly promote operands when optimizing a single-character memcmp.
llvm-svn: 104648
2010-05-25 22:53:43 +00:00
Nick Lewycky 23b545ca4b Actually run the test. Thanks Daniel Dunbar!
llvm-svn: 103720
2010-05-13 17:41:06 +00:00
Nick Lewycky 3230f0ac25 Add testcase for r103653.
llvm-svn: 103699
2010-05-13 06:00:14 +00:00
Chris Lattner 84d4618659 make simplifycfg insert an llvm.trap before the 'unreachable' it introduces
when it detects undefined behavior.  llvm.trap generally codegens into some
thing really small (e.g. a 2 byte ud2 instruction on x86) and debugging this
sort of thing is "nontrivial".  For example, we now compile:

void foo() { *(int*)0 = 42; }

into:

_foo:
	pushl	%ebp
	movl	%esp, %ebp
	ud2

Some may even claim that this is a security hole, though that seems dubious
to me.  This addresses rdar://7958343 - Optimizing away null dereference 
potentially allows arbitrary code execution

llvm-svn: 103356
2010-05-08 22:15:59 +00:00
Chris Lattner 02b0df5338 Teach instcombine to transform a bitcast/(zext|trunc)/bitcast sequence
with a vector input and output into a shuffle vector.  This sort of 
sequence happens when the input code stores with one type and reloads
with another type and then SROA promotes to i96 integers, which make
everyone sad.

This fixes rdar://7896024

llvm-svn: 103354
2010-05-08 21:50:26 +00:00
Chris Lattner 5a62d6e578 Fix PR7052, patch by Jakub Staszak!
llvm-svn: 103347
2010-05-08 20:01:44 +00:00
Devang Patel be8ee1a09e Update test to use valid debug info.
llvm-svn: 103287
2010-05-07 20:34:00 +00:00
Dan Gohman 5d5b8b1b8c Add an LLVM IR version of code sinking. This uses the same simple algorithm
as MachineSink, but it isn't constrained by MachineInstr-level details.

llvm-svn: 103257
2010-05-07 15:40:13 +00:00
Duncan Sands 687900ed83 Use llvm.foo as the intrinsic, rather than llvm.dbg.value. Since the
values passed to llvm.dbg.value were not valid for the intrinsic, it
might have caused trouble one day if the verifier ever started checking
for valid debug info.

llvm-svn: 103038
2010-05-04 20:09:25 +00:00
Duncan Sands c2928c6ef5 Fix a variant of PR6112 found by thinking about it: when doing
RAUW of a global variable with a local variable in function F,
if function local metadata M in function G was using the global
then M would become function-local to both F and G, which is not
allowed.  See the testcase for an example.  Fixed by detecting
this situation and zapping the metadata operand when it occurs.

llvm-svn: 103007
2010-05-04 12:43:36 +00:00
Devang Patel 9f5200a122 Check for side effects before splitting loop.
Patch by Jakub Staszak!

llvm-svn: 102928
2010-05-03 18:06:58 +00:00
Chris Lattner b49a622fe9 revert r102831. We already delete dead readonly calls in
other places, killing a valid transformation is not the right
answer.

llvm-svn: 102850
2010-05-01 17:19:38 +00:00
Owen Anderson 550986ea90 Disable the call-deletion transformation introduced in r86975. Without
halting analysis, it is illegal to delete a call to a read-only function.
The correct solution is almost certainly to add a "must halt" attribute and
only allow deletions in its presence.

XFAIL the relevant testcase for now.

llvm-svn: 102831
2010-05-01 08:34:28 +00:00
Chris Lattner 532112b98a fix PR5009 by making CGSCCPM realize that a call was devirtualized
if an indirect call site was removed and a direct one was added, not
just if an indirect call site was modified to be direct.

llvm-svn: 102830
2010-05-01 06:38:43 +00:00
Chris Lattner c3bc80a082 rename test
llvm-svn: 102829
2010-05-01 06:34:13 +00:00
Chris Lattner fc8d9ee6c3 Implement rdar://6295824 and PR6724 with two tiny changes
that can have a big effect :).  The first is to enable the
iterative SCC passmanager juice that kicks in when the
scc passmgr detects that a function pass has devirtualized
a call.  In this case, it will rerun all the passes it 
manages on the SCC, up to the iteration count limit (4). This
is useful because a function pass may devirualize a call, and
we want the inliner to inline it, or pruneeh to infer stuff
about it, etc.

The second patch is to add *all* call sites to the 
DevirtualizedCalls list the inliner uses.  This list is
about to get renamed, but the jist of this is that the 
inliner now reconsiders *all* inlined call sites as candidates
for further inlining.  The intuition is this that in cases 
like this:

f() { g(1); }     g(int x) { h(x); }

We analyze this bottom up, and may decide that it isn't 
profitable to inline H into G.  Next step, we decide that it is
profitable to inline G into F, and do so, which means that F 
now calls H.  Even though the call from G -> H may not have been
profitable to inline, the call from F -> H may be (in this case
because a constant allows folding etc).

In my spot checks, this doesn't have a big impact on code.  For
example, the LLC output for 252.eon grew from 0.02% (from
317252 to 317308) and 176.gcc actually shrunk by .3% (from 1525612
to 1520964 bytes).  252.eon never iterated in the SCC Passmgr,
176.gcc iterated at most 1 time.

llvm-svn: 102823
2010-05-01 01:15:56 +00:00
Chris Lattner e8262675a3 The inliner has traditionally not considered call sites
that appear due to inlining a callee as candidates for
futher inlining, but a recent patch made it do this if
those call sites were indirect and became direct.

Unfortunately, in bizarre cases (see testcase) doing this
can cause us to infinitely inline mutually recursive
functions into callers not in the cycle.  Fix this by
keeping track of the inline history from which callsite
inline candidates got inlined from.

This shouldn't affect any "real world" code, but is required
for a follow on patch that is coming up next.

llvm-svn: 102822
2010-05-01 01:05:10 +00:00
Chris Lattner a9bac86d16 Dan recently disabled recursive inlining within a function, but we
were still inlining self-recursive functions into other functions.

Inlining a recursive function into itself has the potential to
reduce recursion depth by a factor of 2, inlining a recursive
function into something else reduces recursion depth by exactly 
1.  Since inlining a recursive function into something else is a
weird form of loop peeling, turn this off.

The deleted testcase was added by Dale in r62107, since then
we're leaning towards not inlining recursive stuff ever.  In any
case, if we like inlining recursive stuff, it should be done 
within the recursive function itself to get the algorithm 
recursion depth win.

llvm-svn: 102798
2010-04-30 22:37:22 +00:00
Devang Patel 3ca9a9b59c Preserve debug info attached with call instruction while eliminating dead argument.
Radar 7927803

llvm-svn: 102760
2010-04-30 20:23:54 +00:00
Chris Lattner 669064a772 fix this to work with objdir != srcdir
llvm-svn: 102547
2010-04-28 22:34:35 +00:00
Chris Lattner 450e29cb4c fix PR6112 - When globalopt (or any other pass) does RAUW(@G, %G),
metadata references in non-function-local MDNodes should drop to 
null.

llvm-svn: 102519
2010-04-28 20:16:12 +00:00
Chris Lattner 87aa2243e2 fix PR6940: sitofp(undef) folds to 0.0, not undef.
llvm-svn: 102358
2010-04-26 18:21:23 +00:00
Chris Lattner 11d1df442d no longer xfail
llvm-svn: 102220
2010-04-23 22:39:33 +00:00
Chris Lattner 126a58e084 fix some failures my callgraph dump format change broke.
llvm-svn: 102197
2010-04-23 18:38:40 +00:00
Chris Lattner 6f41ef9d31 testcase for the bug that required a patch to be reverted.
llvm-svn: 102195
2010-04-23 18:31:01 +00:00
Chris Lattner d8d898dbd3 disable my previous inliner patch, it appears to be busting self-host.
llvm-svn: 102153
2010-04-23 00:41:03 +00:00
Chris Lattner 2eee5d3467 The inliner was choosing to not consider call sites
that appear in the SCC as a result of inlining as candidates
for inlining.  Change this so that it *does* consider call 
sites that change from being indirect to being direct as a
result of inlining.  This allows it to completely 
"devirtualize" the testcase.

llvm-svn: 102146
2010-04-22 23:37:35 +00:00
Chris Lattner 055cf267db add a DEBUG call so that -debug lists when CGSCCPM iterates.
Fix RefreshCallGraph to use CGN->replaceCallEdge instead of hand
rolling its own loop.  replaceCallEdge properly maintains the
reference counts of the nodes, fixing a crash exposed by the
iterative callgraph stuff.

llvm-svn: 102120
2010-04-22 20:42:33 +00:00
Chris Lattner 6fbe704932 Implement (but don't enable) PR6724 and rdar://6295824. In short,
we have RefreshCallGraph detect when a function pass devirtualizes
a call, and have CGSCCPassMgr iterate (up to a count) when this 
happens.  This allows (in the example) GVN to devirtualize the 
call in foo, then the inliner to inline it away.

This is not currently enabled because I haven't done any analysis
on the (potentially substantial) code size or performance impact of
doing this, and guess what, it exposes callgraph updating bugs in
various passes.  This is progress though, and you can play with it
by passing -max-cg-scc-iterations=5 to opt.

llvm-svn: 101973
2010-04-21 00:47:40 +00:00
Dan Gohman 4398308fa7 Revert r101471. For tight recursive functions which have multiple
recursive callsites, inlining can reduce the number of calls by
exponential factors, as it does in
MultiSource/Benchmarks/Olden/treeadd. More involved heuristics
will be needed.

llvm-svn: 101969
2010-04-21 00:43:30 +00:00
Chris Lattner 5814d9d9da RewriteLoopBodyWithConditionConstant can end up rewriting the
condition we're unswitching on.  In this case, don't try to
simplify the second copy of the loop which may be dead or not,
but is probably a constant now.  This fixes PR6879

llvm-svn: 101870
2010-04-20 05:09:16 +00:00
Chris Lattner e93846762a Fix rdar://7879828 - crash in CallGraph, a self host issue.
Arg promotion was deleting call graph nodes that still had references
from the 'indirect' CGN.  Like the inliner, it should only delete the
function if all references are gone.

llvm-svn: 101845
2010-04-20 00:46:50 +00:00
Dan Gohman e637ff5e9a Remove the Expr member from IVUsers. Instead of remembering the expression,
just ask ScalarEvolution for it on demand. This helps IVUsers be more robust
in the case of expressions changing underneath it. This fixes PR6862.

llvm-svn: 101819
2010-04-19 21:48:58 +00:00
Nick Lewycky fbe8d2803d Fix declarations in a few more tests.
llvm-svn: 101676
2010-04-17 21:29:25 +00:00
Nick Lewycky d4c0f86a5e Fix intrinsic signature in this test.
llvm-svn: 101674
2010-04-17 21:12:55 +00:00
Bob Wilson ca51425d94 Re-commit my previous SSAUpdater changes. The previous version naively tried
to determine where to place PHIs by iteratively comparing reaching definitions
at each block.  That was just plain wrong.  This version now computes the
dominator tree within the subset of the CFG where PHIs may need to be placed,
and then places the PHIs in the iterated dominance frontier of each definition.
The rest of the patch is mostly the same, with a few more performance
improvements added in.

llvm-svn: 101612
2010-04-17 03:08:24 +00:00
Dan Gohman f13f69f296 Disable inlining of recursive calls. It can complicate tailcallelim and
dependent analyses, and increase code size, so doing it profitably would
require more complex heuristics.

llvm-svn: 101471
2010-04-16 16:01:18 +00:00
Dan Gohman 99e5327bfd Refine the detection of seemingly infinitely recursive calls where the
callee is expected to be expanded to something else by codegen, so that
normal infinitely recursive calls are still transformed.

llvm-svn: 101468
2010-04-16 15:57:50 +00:00
Chris Lattner 393e08536d move comment.
llvm-svn: 101433
2010-04-16 01:05:52 +00:00
Chris Lattner 1146d326a7 fix PR6832: we were using the alignment of a pointer when we
wanted the alignment of the pointee.

llvm-svn: 101432
2010-04-16 01:05:38 +00:00
Evan Cheng 9e100384cb Trim tests and convert to FileCheck.
llvm-svn: 101277
2010-04-14 20:22:17 +00:00
Nick Lewycky ca615eb0d6 Revert r101213.
llvm-svn: 101231
2010-04-14 04:51:58 +00:00
Nick Lewycky 8408f33deb Commit testcase for r101213.
llvm-svn: 101214
2010-04-14 03:46:42 +00:00
Dan Gohman 7ef0dc2163 Teach ScalarEvolution to simplify smax and umax when it can prove
that one operand is always greater than another.

llvm-svn: 101142
2010-04-13 16:51:03 +00:00
Dan Gohman 5867a56db8 Teach IndVarSimplify how to eliminate remainder operators where the
numerator is an induction variable. For example, with code like this:

  for (i=0;i<n;++i)
    x[i%n] = 0;

IndVarSimplify will now recognize that i is always less than n inside
the loop, and eliminate the remainder.

llvm-svn: 101113
2010-04-13 01:46:36 +00:00
Dan Gohman 4a645b88ef Suppress LinearFunctionTestReplace when the computed backedge-taken
expression is a UDiv and it doesn't appear that the UDiv came from
the user's source.

ScalarEvolution has recently figured out how to compute a tripcount
expression for the inner loop in
SingleSource/Benchmarks/Shootout/sieve.c, using a udiv. Emitting a
udiv instruction dramatically slows down the enclosing loop.

llvm-svn: 101068
2010-04-12 21:13:43 +00:00
Eric Christopher 1f272f7fd8 Verify function prototypes before trying to optimize functions. We also
need TargetData, just return false if we don't have it.

Update testcases accordingly.

Fixes PR6807.

llvm-svn: 101011
2010-04-12 04:48:00 +00:00
Dan Gohman fa5ad797e3 Re-apply r101000, with a fix: Don't eliminate an icmp which is part of
the loop exit test. This usually doesn't come up for a variety of
reasons, but it isn't impossible, so make IndVarSimplify handle it
conservatively.

llvm-svn: 101008
2010-04-12 02:21:50 +00:00
Dan Gohman c0f1efaf8d Revert 101000, which is breaking self-host builds.
llvm-svn: 101002
2010-04-12 00:17:10 +00:00
Dan Gohman af4ab1b681 Teach IndVarSimplify how to eliminate comparisons involving induction
variables. For example, with code like this:

  for (i=0;i<n;++i)
    if (i<n)
      x[i] = 0;

IndVarSimplify will now recognize that i is always less than n inside
the loop, and eliminate the if.

llvm-svn: 101000
2010-04-11 23:10:12 +00:00
Chris Lattner 9ae28b141f fix PR6743, a case where we'd delete an instruction before using it
in some cases.

llvm-svn: 100937
2010-04-10 18:26:57 +00:00
Chris Lattner b9801ffcb5 fix PR6760, a missing check in heap SRoA.
llvm-svn: 100936
2010-04-10 18:19:22 +00:00
Dan Gohman 607e02b33a When determining a canonical insert position, don't climb deeper
into adjacent loops. Also, ensure that the insert position is
dominated by the loop latch of any loop in the post-inc set which
has a latch.

llvm-svn: 100906
2010-04-09 22:07:05 +00:00
Dan Gohman 3295a6e5bc When emitting code for an add, don't force a SCEVUnknown wrapper around
a hoisted intermediate result if the intermediate result isn't an
Instruction.

llvm-svn: 100884
2010-04-09 19:14:31 +00:00
Dan Gohman ee6451dca1 Fix a bug in IVUsers which was permitting non-affine addrecs to
be sent to LSR, which it isn't prepared to handle.

llvm-svn: 100839
2010-04-09 01:22:56 +00:00
Chris Lattner c6c153be45 fix a SCCP miscompilation that could happen when a
forced constant is changed to a constant, we would end
up adding the instruction to the wrong worklist, 
preventing it from being properly revisited.  This fixes
rdar://7832370

llvm-svn: 100837
2010-04-09 01:14:31 +00:00
Dan Gohman 386e01e879 Print empty structs as {} rather than { }.
llvm-svn: 100787
2010-04-08 18:03:05 +00:00
Chris Lattner 3ae2dd2ba5 add newlines at the end of files.
llvm-svn: 100705
2010-04-07 22:53:17 +00:00
Dan Gohman d006ab90dd Generalize IVUsers to track arbitrary expressions rather than expressions
explicitly split into stride-and-offset pairs. Also, add the
ability to track multiple post-increment loops on the same expression.

This refines the concept of "normalizing" SCEV expressions used for
to post-increment uses, and introduces a dedicated utility routine for
normalizing and denormalizing expressions.

This fixes the expansion of expressions which are post-increment users
of more than one loop at a time. More broadly, this takes LSR another
step closer to being able to reason about more than one loop at a time.

llvm-svn: 100699
2010-04-07 22:27:08 +00:00
Mon P Wang c576ee9040 Reapply address space patch after fixing an issue in MemCopyOptimizer.
Added support for address spaces and added a isVolatile field to memcpy, memmove, and memset,
e.g., llvm.memcpy.i32(i8*, i8*, i32, i32) -> llvm.memcpy.p0i8.p0i8.i32(i8*, i8*, i32, i32, i1)

llvm-svn: 100304
2010-04-04 03:10:48 +00:00
Chris Lattner 40060d33f6 add integer overflow check for the fp induction variable
checker.  Amusingly, we already had tests that we should
have rejects because they would be miscompiled in the
testsuite.

The remaining issue with this is that we don't check that
the branch causes us to exit the loop if it fails, so we
don't actually know if we remain in bounds.

llvm-svn: 100284
2010-04-03 07:18:48 +00:00
Chris Lattner 40ea690f39 fix PR6761, a miscompilation due to the fp->int IV conversion
stuff.  More bugs remain though.

llvm-svn: 100282
2010-04-03 06:30:03 +00:00
Chris Lattner 2508bcf176 convert to filecheck
llvm-svn: 100281
2010-04-03 06:27:56 +00:00
Chris Lattner 8ca668497f rename feature test.
llvm-svn: 100279
2010-04-03 06:24:28 +00:00
Chris Lattner fc9e88ae76 actually just remove this, will move the real feature test here.
llvm-svn: 100278
2010-04-03 06:24:03 +00:00
Chris Lattner 290f42a9c3 rename test since it is a feature test.
llvm-svn: 100277
2010-04-03 06:22:52 +00:00
Chris Lattner c558b49f14 first half of a pass through IndVarSimplify::HandleFloatingPointIV,
this cleans up a bunch of code and also fixes several crashes and
miscompiles.  More to come unfortunately, this optimization
is quite broken.

llvm-svn: 100270
2010-04-03 05:54:59 +00:00
Bob Wilson f1aa4743d9 Revert all my SSAUpdater patches. The PHI placement algorithm is not correct
(what was I thinking?) and there's also a problem with LCSSA.  I'll try again
later with fixes.

--- Reverse-merging r100263 into '.':
U    lib/Transforms/Utils/SSAUpdater.cpp
--- Reverse-merging r100177 into '.':
G    lib/Transforms/Utils/SSAUpdater.cpp
--- Reverse-merging r100148 into '.':
G    lib/Transforms/Utils/SSAUpdater.cpp
--- Reverse-merging r100147 into '.':
U    include/llvm/Transforms/Utils/SSAUpdater.h
G    lib/Transforms/Utils/SSAUpdater.cpp
--- Reverse-merging r100131 into '.':
G    include/llvm/Transforms/Utils/SSAUpdater.h
G    lib/Transforms/Utils/SSAUpdater.cpp
--- Reverse-merging r100130 into '.':
G    lib/Transforms/Utils/SSAUpdater.cpp
--- Reverse-merging r100126 into '.':
G    include/llvm/Transforms/Utils/SSAUpdater.h
G    lib/Transforms/Utils/SSAUpdater.cpp
--- Reverse-merging r100050 into '.':
D    test/Transforms/GVN/2010-03-31-RedundantPHIs.ll
--- Reverse-merging r100047 into '.':
G    include/llvm/Transforms/Utils/SSAUpdater.h
G    lib/Transforms/Utils/SSAUpdater.cpp

llvm-svn: 100264
2010-04-03 03:50:38 +00:00
Mon P Wang 999c1b927b Revert r100191 since it breaks objc in clang
llvm-svn: 100199
2010-04-02 18:43:02 +00:00
Mon P Wang a972ab8564 Reapply address space patch after fixing an issue in MemCopyOptimizer.
Added support for address spaces and added a isVolatile field to memcpy, memmove, and memset,
e.g., llvm.memcpy.i32(i8*, i8*, i32, i32) -> llvm.memcpy.p0i8.p0i8.i32(i8*, i8*, i32, i32, i1)

llvm-svn: 100191
2010-04-02 18:04:15 +00:00
Dan Gohman f7239102fe Manually notify ScalarEvolution before making an operand replacement, since
it can't currently observe such changes automatically.

llvm-svn: 100186
2010-04-02 14:48:31 +00:00
Dan Gohman 4bd755419f Revert the recent alignment changes. They're broken for -Os because,
in particular, they end up aligning strings at 16-byte boundaries, and
there's no way for GlobalOpt to check OptForSize.

llvm-svn: 100172
2010-04-02 03:04:37 +00:00
Dan Gohman c671347fcb Make globalopt refine global variable alignment.
llvm-svn: 100160
2010-04-02 00:14:16 +00:00
Bob Wilson b9fb48bff7 Add a redundant PHI testcase for SSAUpdater to go with svn r100047.
llvm-svn: 100050
2010-03-31 21:38:43 +00:00
Gabor Greif 6882a5eea1 testcase for r99914, provided by baldrick!
llvm-svn: 100043
2010-03-31 20:37:13 +00:00
Bob Wilson 6f7fd28824 Revert Mon Ping's change 99928, since it broke all the llvm-gcc buildbots.
llvm-svn: 99948
2010-03-30 22:27:04 +00:00
Mon P Wang 7460571381 Added support for address spaces and added a isVolatile field to memcpy, memmove, and memset,
e.g., llvm.memcpy.i32(i8*, i8*, i32, i32) -> llvm.memcpy.p0i8.p0i8.i32(i8*, i8*, i32, i32, i1)
A update of langref will occur in a subsequent checkin.

llvm-svn: 99928
2010-03-30 20:55:56 +00:00
Chris Lattner 0563804982 fix PR6642, GVN forwarding from memset to load of the base of the memset.
llvm-svn: 99488
2010-03-25 05:58:19 +00:00
Eric Christopher b1a382d8b9 Reapply r99451 with a fix to move the NoInline check to the cost functions
instead of InlineFunction.

llvm-svn: 99483
2010-03-25 04:49:10 +00:00
Eric Christopher 1d38538fb6 Temporarily revert this, it's causing an issue with an internal project.
llvm-svn: 99451
2010-03-24 23:35:21 +00:00
Chris Lattner 00eeac4179 add some accessors to callsite/callinst/invokeinst to check
for the noinline attribute, and make the inliner refuse to
inline a call site when the call site is marked noinline even
if the callee isn't.  This fixes PR6682.

llvm-svn: 99341
2010-03-23 22:59:07 +00:00
Evan Cheng d9e822345c Teach simplify libcall to transform __strcpy_chk to __memcpy_chk to enable optimizations down stream.
llvm-svn: 99282
2010-03-23 15:48:04 +00:00
Evan Cheng 3f7842232e Fix an incorrect logic causing instcombine to miss some _chk -> non-chk transformations.
llvm-svn: 99263
2010-03-23 06:06:09 +00:00
Evan Cheng 2a65429671 Fix a typo in ValueTracking that's causing instcombine to delete needed shift instructions.
llvm-svn: 98416
2010-03-13 02:20:29 +00:00
Duncan Sands 8c35506fbd When constant folding GEP of GEP, do not crash if an index of
the inner GEP is not a ConstantInt.

llvm-svn: 98359
2010-03-12 17:55:20 +00:00
Dan Gohman 93452cebda Make isLCSSA ignore uses in blocks not reachable from the entry block,
as LCSSA no longer transforms such uses.

llvm-svn: 98033
2010-03-09 01:53:33 +00:00