Commit Graph

10777 Commits

Author SHA1 Message Date
Chris Lattner 50506787d1 fix a bug in my licm rewrite when a load from the promoted memory
location is being re-stored to the memory location.  We would get
a dangling pointer from the SSAUpdate data structure and miss a 
use.  This fixes PR8068

llvm-svn: 113042
2010-09-04 00:12:30 +00:00
Owen Anderson c91c1a205a Propagate non-local comparisons. Fixes PR1757.
llvm-svn: 113025
2010-09-03 22:47:08 +00:00
Dale Johannesen 367afb5a00 Remove the rest of the nonexistent 64-bit AVX instructions.
Bruno, please review.

llvm-svn: 113014
2010-09-03 21:23:00 +00:00
David Greene 2a9de4d828 Generalize getFieldType to work on all TypedInits. Add a couple of testcases from
Amaury Pouly.

llvm-svn: 113010
2010-09-03 21:00:49 +00:00
Owen Anderson c725462245 Add support for simplifying a load from a computed value to a load from a global when it
is provable that they're equivalent.  This fixes PR4855.

llvm-svn: 112994
2010-09-03 19:08:37 +00:00
Jim Grosbach 03f4be86ba Re-apply r112883:
"For ARM stack frames that utilize variable sized objects and have either
large local stack areas or require dynamic stack realignment, allocate a
base register via which to access the local frame. This allows efficient
access to frame indices not accessible via the FP (either due to being out
of range or due to dynamic realignment) or the SP (due to variable sized
object allocation). In particular, this greatly improves efficiency of access
to spill slots in Thumb functions which contain VLAs."

r112986 fixed a latent bug exposed by the above.

llvm-svn: 112989
2010-09-03 18:37:12 +00:00
Owen Anderson 064cb4c807 Add a test for PR4413, which was apparently fixed at some point in the past.
llvm-svn: 112987
2010-09-03 18:33:08 +00:00
Owen Anderson 50d8c8888c Add PR number to test.
llvm-svn: 112971
2010-09-03 16:58:25 +00:00
Daniel Dunbar 2ac3386ef3 Revert "For ARM stack frames that utilize variable sized objects and have either", it is breaking oggenc with Clang for ARMv6.
This reverts commit 8d6e29cfda270be483abf638850311670829ee65.

llvm-svn: 112962
2010-09-03 15:26:42 +00:00
NAKAMURA Takumi 24d039ebe3 test/CodeGen/X86: Add explicit -mtriple=(i686|x86_64)-linux for Win32 host.
llvm-svn: 112947
2010-09-03 03:24:08 +00:00
Bruno Cardoso Lopes d6634a5b2e AVX doesn't support MMX operations, nor their intrinsics.
The AVX versions of PALIGN and PABS* should only exist for
128-bit. Remove the unnecessary stuff.

llvm-svn: 112944
2010-09-03 02:08:45 +00:00
Bob Wilson f65c9ef720 Replace NEON vabdl, vaba, and vabal intrinsics with combinations of the
vabd intrinsic and add and/or zext operations.  In the case of vaba, this
also avoids the need for a DAG combine pattern to combine vabd with add.
Update tests.  Auto-upgrade the old intrinsics.

llvm-svn: 112941
2010-09-03 01:35:08 +00:00
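
A minimal LLVM IR sketch of the expansion described above (the vabd intrinsic name and the vector types are assumptions for illustration, not taken from the actual change): vabal becomes a vabd call whose result is widened with zext and accumulated with an ordinary add.

  declare <8 x i8> @llvm.arm.neon.vabds.v8i8(<8 x i8>, <8 x i8>)

  ; hypothetical shape of the new vabal.s8 expansion: absolute difference,
  ; widen with zext, accumulate with a plain add
  define <8 x i16> @vabal_s8(<8 x i16> %acc, <8 x i8> %a, <8 x i8> %b) {
    %abd   = call <8 x i8> @llvm.arm.neon.vabds.v8i8(<8 x i8> %a, <8 x i8> %b)
    %abd.w = zext <8 x i8> %abd to <8 x i16>
    %sum   = add <8 x i16> %acc, %abd.w
    ret <8 x i16> %sum
  }
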
Chris Lattner 7bf4b82e97 update one more test
llvm-svn: 112910
2010-09-02 23:32:55 +00:00
Chris Lattner 7f2f0930a7 add a new "llvm-dis -show-annotations" option, which causes it to print
#uses comments, with a testcase.

llvm-svn: 112906
2010-09-02 23:21:44 +00:00
Anton Korobeynikov a5a645559c Properly emit __chkstk call instead of __alloca on non-mingw windows targets.
Patch by Cameron Esfahani!

llvm-svn: 112902
2010-09-02 23:03:46 +00:00
Chris Lattner 65fb25a257 more test cleanup
llvm-svn: 112892
2010-09-02 22:38:56 +00:00
Chris Lattner bb451461ec remove some noise from tests.
llvm-svn: 112889
2010-09-02 22:35:33 +00:00
Chris Lattner a18d7ec4fb we are past the point where these tests are useful.
llvm-svn: 112887
2010-09-02 22:32:02 +00:00
Jim Grosbach 7fd9aea67c For ARM stack frames that utilize variable sized objects and have either
large local stack areas or require dynamic stack realignment, allocate a
base register via which to access the local frame. This allows efficient
access to frame indices not accessible via the FP (either due to being out
of range or due to dynamic realignment) or the SP (due to variable sized
object allocation). In particular, this greatly improves efficiency of access
to spill slots in Thumb functions which contain VLAs.

rdar://7352504
rdar://8374540
rdar://8355680

llvm-svn: 112883
2010-09-02 22:29:01 +00:00
Chris Lattner affc0e42f0 fix more AST updating bugs, correcting a miscompilation in PR8041
llvm-svn: 112878
2010-09-02 22:19:10 +00:00
Dan Gohman 3c9b5f394b Don't narrow the load and store in a load+twiddle+store sequence unless
there are clearly no stores between the load and the store. This fixes
the miscompile reported as PR7833.

This breaks the test/CodeGen/X86/narrow_op-2.ll optimization, which is
safe, but awkward to prove safe. Move it to X86's README.txt.

llvm-svn: 112861
2010-09-02 21:18:42 +00:00
Sandeep Patel 0ca17f7e8a Fix an unnecessary XFAIL
llvm-svn: 112853
2010-09-02 20:19:24 +00:00
Owen Anderson 67dee4dcac Fix typo. I accidentally edited the wrong file before my last commit.
llvm-svn: 112851
2010-09-02 19:52:06 +00:00
Benjamin Kramer e39017cb97 Add AsmParser support for the ELF .previous directive. Patch by Roman Divacky.
llvm-svn: 112849
2010-09-02 18:53:37 +00:00
Owen Anderson a8c896b704 Fix a bug in LazyValueInfo that CorrelatedValuePropagation exposed: In the LVI lattice, undef and the full set ConstantRange should not
be treated as equivalent.

llvm-svn: 112843
2010-09-02 18:23:58 +00:00
Jim Grosbach 66c681a644 Now that register allocation properly considers reserved regs, simplify the
ARM register class allocation order functions to take advantage of that.

llvm-svn: 112841
2010-09-02 18:14:29 +00:00
Bob Wilson 75a6408f88 Convert VLD1 and VLD2 instructions to use pseudo-instructions until
after regalloc.

llvm-svn: 112825
2010-09-02 16:00:54 +00:00
Duncan Sands 8dda07428a Print the number of uses of a function in the .ll since it can be informative
and there seems to be no reason not to.

llvm-svn: 112812
2010-09-02 08:52:23 +00:00
NAKAMURA Takumi a224e5563e test/loop-strength-reduce4: Add explicit triplet for Win32 host.
llvm-svn: 112802
2010-09-02 03:45:58 +00:00
NAKAMURA Takumi 54ce546865 test/twoaddr-coalesce: Do not use @main.
Win32 codegen implicitly emits a call to __main into it, causing the test to fail.

llvm-svn: 112801
2010-09-02 03:45:51 +00:00
Bob Wilson 38ab35a911 Remove NEON vmull, vmlal, and vmlsl intrinsics, replacing them with multiply,
add, and subtract operations with zero-extended or sign-extended vectors.
Update tests.  Add auto-upgrade support for the old intrinsics.

llvm-svn: 112773
2010-09-01 23:50:19 +00:00
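
A rough LLVM IR sketch of what the removed vmull intrinsics become under the change above (the lane widths are illustrative): a plain mul of sign-extended vectors, left for the NEON backend to select as vmull.s16.

  ; hypothetical replacement for a signed widening multiply: no intrinsic,
  ; just sext + mul, pattern-matched to vmull.s16 by the backend
  define <4 x i32> @widening_mul(<4 x i16> %a, <4 x i16> %b) {
    %a.w  = sext <4 x i16> %a to <4 x i32>
    %b.w  = sext <4 x i16> %b to <4 x i32>
    %prod = mul <4 x i32> %a.w, %b.w
    ret <4 x i32> %prod
  }
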
Chris Lattner 8af45a889d deepen my MMX/SRoA hack to avoid hurting non-x86 codegen.
llvm-svn: 112763
2010-09-01 23:09:27 +00:00
Bruno Cardoso Lopes fea81b4831 Using target-specific nodes for shuffle nodes makes the mask
check stricter, breaking some cases not checked in the
testsuite, but also exposing some foldings not done before,
as in this example:

  movaps  (%rdi), %xmm0
  movaps  (%rax), %xmm1
  movaps  %xmm0, %xmm2
  movss %xmm1, %xmm2
  shufps  $36, %xmm2, %xmm0

now is generated as:

  movaps  (%rdi), %xmm0
  movaps  %xmm0, %xmm1
  movlps  (%rax), %xmm1
  shufps  $36, %xmm1, %xmm0

llvm-svn: 112753
2010-09-01 22:33:20 +00:00
Jakob Stoklund Olesen 4b6fd48bba Teach RemoveCopyByCommutingDef to check all aliases, not just subregisters.
This caused a miscompilation in WebKit where %RAX had conflicting defs when
RemoveCopyByCommutingDef was commuting a %EAX use.

llvm-svn: 112751
2010-09-01 22:15:35 +00:00
Dale Johannesen 78d95e0089 Apparently only Darwin passes long double misaligned. Compensate.
llvm-svn: 112748
2010-09-01 21:57:20 +00:00
Dan Gohman 0ad7d9c24e Fix loop unswitching's assumption that a code path which either
loops forever or exits will eventually exit. This fixes PR5373.

llvm-svn: 112745
2010-09-01 21:46:45 +00:00
Bill Wendling 6456efaffd The output of opt -stats must be sent to stderr. Patch by NAKAMURA Takumi!
llvm-svn: 112724
2010-09-01 18:32:56 +00:00
Chris Lattner 39eccb4754 temporarily revert r112664; it is causing a decoding conflict, and
the testcases should be merged.

llvm-svn: 112711
2010-09-01 16:00:50 +00:00
Michael J. Spencer d8e5dfccc1 COFF: Update tests to reflect changes in last commit.
llvm-svn: 112704
2010-09-01 14:15:31 +00:00
Dale Johannesen e13c04d6da Attempt to fix buildbot.
llvm-svn: 112697
2010-09-01 05:19:06 +00:00
Chris Lattner 34e5361eb5 add a gross hack to work around a problem that Argiris reported
on llvmdev: SRoA is introducing MMX datatypes like <1 x i64>,
which then cause random problems because the X86 backend is
producing mmx stuff without inserting proper emms calls.

In the short term, force off MMX datatypes.  In the long term,
the X86 backend should not select generic vector types to MMX
registers.  This is being worked on, but won't be done in time
for 2.8.  rdar://8380055

llvm-svn: 112696
2010-09-01 05:14:33 +00:00
Chris Lattner b9ed4f252f filecheckize
llvm-svn: 112695
2010-09-01 05:10:14 +00:00
Dan Gohman 110ed64fbb Revert 112442 and 112440 until the compile time problems introduced
by 112440 are resolved.

llvm-svn: 112692
2010-09-01 01:45:53 +00:00
Dale Johannesen 52bd0dc3bb Testcase for llvm checkin 112674.
llvm-svn: 112675
2010-08-31 23:43:55 +00:00
Chris Lattner 030f02021b licm is wasting time hoisting constant-foldable operations;
instead of hoisting them, just fold them away.  This occurs in the
testcase for PR8041, for example.

llvm-svn: 112669
2010-08-31 23:00:16 +00:00
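
A tiny sketch of the situation the commit above describes (the function and values are made up): the add of two constants inside the loop is constant-foldable, so LICM can simply fold it to 7 rather than hoist it into the preheader.

  define i32 @sum(i32 %n) {
  entry:
    br label %loop
  loop:
    %i   = phi i32 [ 0, %entry ], [ %i.next, %loop ]
    %acc = phi i32 [ 0, %entry ], [ %acc.next, %loop ]
    %k = add i32 3, 4                    ; constant-foldable: just becomes 7
    %acc.next = add i32 %acc, %k
    %i.next = add i32 %i, 1
    %done = icmp eq i32 %i.next, %n
    br i1 %done, label %exit, label %loop
  exit:
    ret i32 %acc.next
  }
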
Bill Wendling 6789f8b6ae We have a chance for an optimization. Consider this code:
int x(int t) {
  if (t & 256)
    return -26;
  return 0;
}

We generate this:

     tst.w   r0, #256
     mvn     r0, #25
     it      eq
     moveq   r0, #0

while gcc generates this:

     ands    r0, r0, #256
     it      ne
     mvnne   r0, #25
     bx      lr

Scandalous really!

During ISel time, we can look for this particular pattern. One where we have a
"MOVCC" that uses the flag off of a CMPZ that itself is comparing an AND
instruction to 0. Something like this (greatly simplified):

  %r0 = ISD::AND ...
  ARMISD::CMPZ %r0, 0         @ sets [CPSR]
  %r0 = ARMISD::MOVCC 0, -26  @ reads [CPSR]

All we have to do is convert the "ISD::AND" into an "ARM::ANDS" that sets [CPSR]
when it's zero. The zero value will already be in the %r0 register and we only
need to change it if the AND wasn't zero. Easy!

llvm-svn: 112664
2010-08-31 22:41:22 +00:00
Devang Patel 86ec8b3a3f Reapply r112623. Included additional check for unused byval argument.
llvm-svn: 112659
2010-08-31 22:22:42 +00:00
Owen Anderson a5e6b3eca4 Merge 2010-08-31-InfiniteRecursion.ll into crash.ll.
llvm-svn: 112635
2010-08-31 20:27:17 +00:00
Devang Patel 529f248eb4 Revert r112623. It is causing self host build failures.
llvm-svn: 112631
2010-08-31 19:41:03 +00:00
Devang Patel 8559932d36 Remember byval argument's frame index during argument lowering and use this info to emit debug info.
Fixes Radar 8367011.

llvm-svn: 112623
2010-08-31 18:50:09 +00:00
Owen Anderson 799a08ae48 Add a test for the duplicated-conditional situation illustrated by PR5652.
llvm-svn: 112621
2010-08-31 18:49:12 +00:00
Chris Lattner e2295f1c80 merge two tests.
llvm-svn: 112617
2010-08-31 18:44:03 +00:00
Owen Anderson 3931c85956 Manually reduce this testcase.
llvm-svn: 112615
2010-08-31 18:16:29 +00:00
Chris Lattner fbcd165b59 merge two tests and convert to filecheck.
llvm-svn: 112613
2010-08-31 18:05:08 +00:00
Owen Anderson ada0623725 Add a micro-test for the transforms I added to JumpThreading.
I have not been able to find a way to test each in isolation, for a few reasons:
1) The ability to look through non-i1 BinaryOperators requires the ability to look through non-constant
   ICmps in order for it to ever trigger.
2) The ability to do LVI-powered PHI value determination only matters in cases that ProcessBranchOnPHI
   can't handle.  Since it already handles all the cases without other instructions in the def-use chain
   between the PHI and the branch, it requires the ability to look through ICmps and/or BinaryOperators
   as well.

llvm-svn: 112611
2010-08-31 17:59:07 +00:00
Jim Grosbach ad9b6de3b6 Update test for 112609
llvm-svn: 112610
2010-08-31 17:58:47 +00:00
Owen Anderson 064b139c8d Rename test directory to reflect new pass name.
llvm-svn: 112592
2010-08-31 07:50:31 +00:00
Owen Anderson 48d58ad64c Rename ValuePropagation to a more descriptive CorrelatedValuePropagation.
llvm-svn: 112591
2010-08-31 07:48:34 +00:00
Owen Anderson 3997a07fb9 More Chris-inspired JumpThreading fixes: use ConstantExpr to correctly constant-fold undef, and be more careful with its return value.
This actually exposed an infinite recursion bug in ComputeValueKnownInPredecessors which theoretically already existed (in JumpThreading's
handling of and/or of i1's), but never manifested before.  This patch adds a tracking set to prevent this case.

llvm-svn: 112589
2010-08-31 07:36:34 +00:00
Owen Anderson 376597c13e Remove r111665, which implemented store-narrowing in InstCombine. Chris discovered a miscompilation in it, and it's not easily
fixable at the optimizer level. I'll investigate reimplementing it in DAGCombine.

llvm-svn: 112575
2010-08-31 04:41:06 +00:00
Anton Korobeynikov 3a1d87a7ba Fix broken test
llvm-svn: 112555
2010-08-30 23:41:49 +00:00
Owen Anderson 70b17c50e2 Combine these two tests, and make sure there's a newline at the end of the file.
llvm-svn: 112554
2010-08-30 23:37:41 +00:00
Bob Wilson 4cd8a126c3 Remove NEON vmovn intrinsic, replacing it with vector truncate operations.
Auto-upgrade the old intrinsic and update tests.

llvm-svn: 112507
2010-08-30 20:02:30 +00:00
Chris Lattner 34bfab0ad5 two changes:
1) nuke ConstDataCoalSection, which is dead.
2) revise my previous patch for rdar://8018335,
  which was completely wrong.  Specifically, it doesn't 
  make sense to mark __TEXT,__const_coal as PURE_INSTRUCTIONS,
  because it is for readonly data.  Templates (it turns out)
  go to const_coal_nt.  The real fix for rdar://8018335 was
  to give ConstTextCoalSection a section kind of ReadOnly 
  instead of Text.

llvm-svn: 112496
2010-08-30 18:12:35 +00:00
Michael J. Spencer 2f997cdedf Partially revert r112480. Caused test failures.
llvm-svn: 112486
2010-08-30 15:34:08 +00:00
NAKAMURA Takumi e53cf6f85d coff-dump.py: Fix PR7996. Now it is compatible with Python 2.4.
llvm-svn: 112485
2010-08-30 15:19:56 +00:00
Michael J. Spencer 7983340465 Fix constant-over-index.ll test on windows.
llvm-svn: 112483
2010-08-30 15:08:02 +00:00
Michael J. Spencer 41c18853c8 Test: Fix LLVMC tests on CMake.
The CMake build didn't define TEST_COMPILE_CXX_CMD. The tests assumed gcc.

llvm-svn: 112480
2010-08-30 14:49:00 +00:00
Duncan Sands 68c30907cc Correct bogus module triple specifications.
llvm-svn: 112469
2010-08-30 10:48:29 +00:00
Chris Lattner 263f804699 LICM does get dead instructions as input. Instead of sinking them
out of loops, just delete them.

llvm-svn: 112451
2010-08-29 18:22:25 +00:00
Dan Gohman 3a08ed7904 Make IVUsers iterative instead of recursive.
This has the side effect of reversing the order of most of
IVUser's results.

llvm-svn: 112442
2010-08-29 16:40:03 +00:00
Dan Gohman 6665550bca Make this test less dependent on register allocation choices.
llvm-svn: 112426
2010-08-29 14:49:42 +00:00
Dan Gohman 883fa863f8 Use exec.
llvm-svn: 112425
2010-08-29 14:49:00 +00:00
Kalle Raiskila 1e616572d9 Fix lowering of INSERT_VECTOR_ELT in SPU.
The IDX was treated as a byte index, not an element index.

llvm-svn: 112422
2010-08-29 12:41:50 +00:00
Bob Wilson d0c054886c Remove NEON vaddl, vaddw, vsubl, and vsubw intrinsics. Instead, use llvm
IR add/sub operations with one or both operands sign- or zero-extended.
Auto-upgrade the old intrinsics.

llvm-svn: 112416
2010-08-29 05:57:34 +00:00
Chris Lattner c2887bc283 merge a bunch of shuffle tests into sse2.ll
llvm-svn: 112398
2010-08-29 03:19:04 +00:00
Chris Lattner b1ff978406 add some nounwind's
llvm-svn: 112396
2010-08-29 03:07:47 +00:00
Chris Lattner 112b6ee3f2 fixme accomplished
llvm-svn: 112386
2010-08-28 20:40:28 +00:00
Chris Lattner 94656b1c8c fix the buildvector->insertp[sd] logic to not always create a redundant
insertp[sd] $0, which is a noop.  Before:

_f32:                                   ## @f32
	pshufd	$1, %xmm1, %xmm2
	pshufd	$1, %xmm0, %xmm3
	addss	%xmm2, %xmm3
	addss	%xmm1, %xmm0
                                        ## kill: XMM0<def> XMM0<kill> XMM0<def>
	insertps	$0, %xmm0, %xmm0
	insertps	$16, %xmm3, %xmm0
	ret

after:

_f32:                                   ## @f32
	movdqa	%xmm0, %xmm2
	addss	%xmm1, %xmm2
	pshufd	$1, %xmm1, %xmm1
	pshufd	$1, %xmm0, %xmm3
	addss	%xmm1, %xmm3
	movdqa	%xmm2, %xmm0
	insertps	$16, %xmm3, %xmm0
	ret

The extra movs are due to a random (poor) scheduling decision.

llvm-svn: 112379
2010-08-28 17:59:08 +00:00
Chris Lattner bcb6090ad0 fix the BuildVector -> unpcklps logic to not do pointless shuffles
when the top elements of a vector are undefined.  This happens all
the time for X86-64 ABI stuff because only the low 2 elements of
a 4 element vector are defined.  For example, on:

_Complex float f32(_Complex float A, _Complex float B) {
  return A+B;
}

We used to produce (with SSE2, SSE4.1+ uses insertps):

_f32:                                   ## @f32
	movdqa	%xmm0, %xmm2
	addss	%xmm1, %xmm2
	pshufd	$16, %xmm2, %xmm2
	pshufd	$1, %xmm1, %xmm1
	pshufd	$1, %xmm0, %xmm0
	addss	%xmm1, %xmm0
	pshufd	$16, %xmm0, %xmm1
	movdqa	%xmm2, %xmm0
	unpcklps	%xmm1, %xmm0
	ret

We now produce:

_f32:                                   ## @f32
	movdqa	%xmm0, %xmm2
	addss	%xmm1, %xmm2
	pshufd	$1, %xmm1, %xmm1
	pshufd	$1, %xmm0, %xmm3
	addss	%xmm1, %xmm3
	movaps	%xmm2, %xmm0
	unpcklps	%xmm3, %xmm0
	ret

This implements rdar://8368414

llvm-svn: 112378
2010-08-28 17:28:30 +00:00
Benjamin Kramer 2e5c14713c Update ocaml test.
llvm-svn: 112364
2010-08-28 10:29:41 +00:00
Chris Lattner 13ee795c42 remove unions from LLVM IR. They are severely buggy and not
being actively maintained, improved, or extended.

llvm-svn: 112356
2010-08-28 04:09:24 +00:00
Chris Lattner 504e5100d3 remove the ABCD and SSI passes. They don't have any clients that
I'm aware of, aren't maintained, and LVI will be replacing their value.
nlewycky approved this on irc.

llvm-svn: 112355
2010-08-28 03:51:24 +00:00
Chris Lattner d0214f3efe handle the constant case of vector insertion. For something
like this:

struct S { float A, B, C, D; };

struct S g;
struct S bar() { 
  struct S A = g;
  ++A.B;
  A.A = 42;
  return A;
}

we now generate:

_bar:                                   ## @bar
## BB#0:                                ## %entry
	movq	_g@GOTPCREL(%rip), %rax
	movss	12(%rax), %xmm0
	pshufd	$16, %xmm0, %xmm0
	movss	4(%rax), %xmm2
	movss	8(%rax), %xmm1
	pshufd	$16, %xmm1, %xmm1
	unpcklps	%xmm0, %xmm1
	addss	LCPI1_0(%rip), %xmm2
	pshufd	$16, %xmm2, %xmm2
	movss	LCPI1_1(%rip), %xmm0
	pshufd	$16, %xmm0, %xmm0
	unpcklps	%xmm2, %xmm0
	ret

instead of:

_bar:                                   ## @bar
## BB#0:                                ## %entry
	movq	_g@GOTPCREL(%rip), %rax
	movss	12(%rax), %xmm0
	pshufd	$16, %xmm0, %xmm0
	movss	4(%rax), %xmm2
	movss	8(%rax), %xmm1
	pshufd	$16, %xmm1, %xmm1
	unpcklps	%xmm0, %xmm1
	addss	LCPI1_0(%rip), %xmm2
	movd	%xmm2, %eax
	shlq	$32, %rax
	addq	$1109917696, %rax       ## imm = 0x42280000
	movd	%rax, %xmm0
	ret

llvm-svn: 112345
2010-08-28 01:50:57 +00:00
Chris Lattner dd6601048e optimize bitcasts from large integers to vector into vector
element insertion from the pieces that feed into the vector.
This handles a pattern that occurs frequently due to code
generated for the x86-64 abi.  We now compile something like
this:

struct S { float A, B, C, D; };
struct S g;
struct S bar() { 
  struct S A = g;
  ++A.A;
  ++A.C;
  return A;
}

into all nice vector operations:

_bar:                                   ## @bar
## BB#0:                                ## %entry
	movq	_g@GOTPCREL(%rip), %rax
	movss	LCPI1_0(%rip), %xmm1
	movss	(%rax), %xmm0
	addss	%xmm1, %xmm0
	pshufd	$16, %xmm0, %xmm0
	movss	4(%rax), %xmm2
	movss	12(%rax), %xmm3
	pshufd	$16, %xmm2, %xmm2
	unpcklps	%xmm2, %xmm0
	addss	8(%rax), %xmm1
	pshufd	$16, %xmm1, %xmm1
	pshufd	$16, %xmm3, %xmm2
	unpcklps	%xmm2, %xmm1
	ret

instead of icky integer operations:

_bar:                                   ## @bar
	movq	_g@GOTPCREL(%rip), %rax
	movss	LCPI1_0(%rip), %xmm1
	movss	(%rax), %xmm0
	addss	%xmm1, %xmm0
	movd	%xmm0, %ecx
	movl	4(%rax), %edx
	movl	12(%rax), %esi
	shlq	$32, %rdx
	addq	%rcx, %rdx
	movd	%rdx, %xmm0
	addss	8(%rax), %xmm1
	movd	%xmm1, %eax
	shlq	$32, %rsi
	addq	%rax, %rsi
	movd	%rsi, %xmm1
	ret

This resolves rdar://8360454

llvm-svn: 112343
2010-08-28 01:20:38 +00:00
Dan Gohman e06905d1f0 Completely disable tail calls when fast-isel is enabled, as fast-isel
doesn't currently support dealing with this.

llvm-svn: 112341
2010-08-28 00:51:03 +00:00
Owen Anderson cf7f941121 Add a prototype of a new peephole optimizing pass that uses LazyValue info to simplify PHIs and selects.
This pass addresses the missed optimizations from PR2581 and PR4420.

llvm-svn: 112325
2010-08-27 23:31:36 +00:00
Bob Wilson 13ce07fa92 Change ARM VFP VLDM/VSTM instructions to use addressing mode #4, just like
all the other LDM/STM instructions.  This fixes asm printer crashes when
compiling with -O0.  I've changed one of the NEON tests (vst3.ll) to run
with -O0 to check this in the future.

Prior to this change VLDM/VSTM used addressing mode #5, but not really.
The offset field was used to hold a count of the number of registers being
loaded or stored, and the AM5 opcode field was expanded to specify the IA
or DB mode, instead of the standard ADD/SUB specifier.  Much of the backend
was not aware of these special cases.  The crashes occurred when rewriting
a frameindex caused the AM5 offset field to be changed so that it did not
have a valid submode.  I don't know exactly what changed to expose this now.
Maybe we've never done much with -O0 and NEON.  Regardless, there's no longer
any reason to keep a count of the VLDM/VSTM registers, so we can use
addressing mode #4 and clean things up in a lot of places.

llvm-svn: 112322
2010-08-27 23:18:17 +00:00
Chris Lattner 954e9557e3 tidy up test.
llvm-svn: 112321
2010-08-27 23:15:21 +00:00
Chris Lattner b8b7d52631 no really, fix the test.
llvm-svn: 112317
2010-08-27 23:05:54 +00:00
Chris Lattner c8908b4cdb fix this test. It's not clear what it's really testing.
llvm-svn: 112316
2010-08-27 23:05:27 +00:00
Chris Lattner 6c1395f62a Enhance the shift propagator to handle the case when you have:
A = shl x, 42
...
B = lshr ..., 38

which can be transformed into:
A = shl x, 4
...

iff we can prove that the would-be-shifted-in bits
are already zero.  This eliminates two shifts in the testcase
and allows elimination of the whole i128 chain in the real example.

llvm-svn: 112314
2010-08-27 22:53:44 +00:00
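
A hypothetical example of the enhanced propagation (names and bit widths are illustrative): because %x comes from a zext of an i22 value, every bit the shl discards and the lshr would shift back in is known zero, so the pair collapses to a single shl by 4.

  define i64 @shift_pair(i22 %v) {
    %x = zext i22 %v to i64
    %a = shl i64 %x, 42        ; the 22 payload bits move to bits 42..63
    %b = lshr i64 %a, 38       ; ...and come back down to bits 4..25
    ret i64 %b                 ; equivalent to shl i64 %x, 4
  }
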
Chris Lattner 18d7fc8fc6 Implement a pretty general logical shift propagation
framework, which is good at ripping through bitfield
operations.  This generalizes a bunch of the existing
xforms that instcombine does, such as 
  (x << c) >> c -> and
to handle intermediate logical nodes.  This is useful for
ripping up the "promote to large integer" code produced by
SRoA.

llvm-svn: 112304
2010-08-27 22:24:38 +00:00
Chris Lattner 606b76eba6 merge and filecheckize test
llvm-svn: 112289
2010-08-27 20:44:45 +00:00
Chris Lattner c665156e9f merge two tests
llvm-svn: 112288
2010-08-27 20:42:10 +00:00
Chris Lattner 7398434675 teach the truncation optimization that an entire chain of
computation can be truncated if it is fed by a sext/zext whose source
type doesn't have to exactly match the truncation result type.

llvm-svn: 112285
2010-08-27 20:32:06 +00:00
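
A small, hypothetical illustration of the relaxed requirement: the chain is fed by a zext from i16 while the truncation result is i8, yet the whole computation can still be evaluated in i8.

  define i8 @narrow(i16 %v, i32 %w) {
    %x = zext i16 %v to i32       ; ext source type (i16) != trunc result type (i8)
    %y = add i32 %x, %w
    %t = trunc i32 %y to i8       ; the add can nevertheless be done in i8
    ret i8 %t
  }
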
Chris Lattner 7413e87b6d get this test passing on linux builders.
llvm-svn: 112280
2010-08-27 18:49:08 +00:00
Chris Lattner 90cd746e63 Add an instcombine to clean up a common pattern produced
by the SRoA "promote to large integer" code, eliminating
some type conversions like this:

   %94 = zext i16 %93 to i32                       ; <i32> [#uses=2]
   %96 = lshr i32 %94, 8                           ; <i32> [#uses=1]
   %101 = trunc i32 %96 to i8                      ; <i8> [#uses=1]

This also unblocks other xforms from happening; now clang is able to compile:

struct S { float A, B, C, D; };
float foo(struct S A) { return A.A + A.B+A.C+A.D; }

into:

_foo:                                   ## @foo
## BB#0:                                ## %entry
	pshufd	$1, %xmm0, %xmm2
	addss	%xmm0, %xmm2
	movdqa	%xmm1, %xmm3
	addss	%xmm2, %xmm3
	pshufd	$1, %xmm1, %xmm0
	addss	%xmm3, %xmm0
	ret

on x86-64, instead of:

_foo:                                   ## @foo
## BB#0:                                ## %entry
	movd	%xmm0, %rax
	shrq	$32, %rax
	movd	%eax, %xmm2
	addss	%xmm0, %xmm2
	movapd	%xmm1, %xmm3
	addss	%xmm2, %xmm3
	movd	%xmm1, %rax
	shrq	$32, %rax
	movd	%eax, %xmm0
	addss	%xmm3, %xmm0
	ret

This seems pretty close to optimal to me, at least without
using horizontal adds.  This also triggers in lots of other
code, including SPEC.

llvm-svn: 112278
2010-08-27 18:31:05 +00:00
Bob Wilson edf722add3 Add alignment arguments to all the NEON load/store intrinsics.
Update all the tests using those intrinsics and add support for
auto-upgrading bitcode files with the old versions of the intrinsics.

llvm-svn: 112271
2010-08-27 17:13:24 +00:00
Owen Anderson 6ebbd92380 Use LVI to eliminate conditional branches where we've tested a related condition previously. Update tests for this change.
This fixes PR5652.

llvm-svn: 112270
2010-08-27 17:12:29 +00:00
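
A sketch of the kind of redundancy this catches (hypothetical IR, not from the added tests): on the %then path %x is already known to be greater than 10, so the second, weaker compare is provably true and its conditional branch can be removed.

  define i32 @related(i32 %x) {
  entry:
    %gt10 = icmp sgt i32 %x, 10
    br i1 %gt10, label %then, label %exit
  then:
    %gt5 = icmp sgt i32 %x, 5     ; implied by %gt10 on this path
    br i1 %gt5, label %taken, label %exit
  taken:
    ret i32 1
  exit:
    ret i32 0
  }
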