Commit Graph

3670 Commits

Author SHA1 Message Date
Dale Johannesen 367afb5a00 Remove the rest of the nonexistent 64-bit AVX instructions.
Bruno, please review.

llvm-svn: 113014
2010-09-03 21:23:00 +00:00
Jim Grosbach 03f4be86ba Re-apply r112883:
"For ARM stack frames that utilize variable sized objects and have either
large local stack areas or require dynamic stack realignment, allocate a
base register via which to access the local frame. This allows efficient
access to frame indices not accessible via the FP (either due to being out
of range or due to dynamic realignment) or the SP (due to variable sized
object allocation). In particular, this greatly improves efficiency of access
to spill slots in Thumb functions which contain VLAs."

r112986 fixed a latent bug exposed by the above.

llvm-svn: 112989
2010-09-03 18:37:12 +00:00
Daniel Dunbar 2ac3386ef3 Revert "For ARM stack frames that utilize variable sized objects and have either", it is breaking oggenc with Clang for ARMv6.
This reverts commit 8d6e29cfda270be483abf638850311670829ee65.

llvm-svn: 112962
2010-09-03 15:26:42 +00:00
NAKAMURA Takumi 24d039ebe3 test/CodeGen/X86: Add explicit -mtriple=(i686|x86_64)-linux for Win32 host.
llvm-svn: 112947
2010-09-03 03:24:08 +00:00
Bruno Cardoso Lopes d6634a5b2e AVX doesn't support mm operations neither its instrinsics.
The AVX versions of PALIGN and PABS* should only exist for
128-bit. Remove the unnecessary stuff.

llvm-svn: 112944
2010-09-03 02:08:45 +00:00
Bob Wilson f65c9ef720 Replace NEON vabdl, vaba, and vabal intrinsics with combinations of the
vabd intrinsic and add and/or zext operations.  In the case of vaba, this
also avoids the need for a DAG combine pattern to combine vabd with add.
Update tests.  Auto-upgrade the old intrinsics.

llvm-svn: 112941
2010-09-03 01:35:08 +00:00
Anton Korobeynikov a5a645559c Properly emit __chkstk call instead of __alloca on non-mingw windows targets.
Patch by Cameron Esfahani!

llvm-svn: 112902
2010-09-02 23:03:46 +00:00
Jim Grosbach 7fd9aea67c For ARM stack frames that utilize variable sized objects and have either
large local stack areas or require dynamic stack realignment, allocate a
base register via which to access the local frame. This allows efficient
access to frame indices not accessible via the FP (either due to being out
of range or due to dynamic realignment) or the SP (due to variable sized
object allocation). In particular, this greatly improves efficiency of access
to spill slots in Thumb functions which contain VLAs.

rdar://7352504
rdar://8374540
rdar://8355680

llvm-svn: 112883
2010-09-02 22:29:01 +00:00
Dan Gohman 3c9b5f394b Don't narrow the load and store in a load+twiddle+store sequence unless
there are clearly no stores between the load and the store. This fixes
this miscompile reported as PR7833.

This breaks the test/CodeGen/X86/narrow_op-2.ll optimization, which is
safe, but awkward to prove safe. Move it to X86's README.txt.

llvm-svn: 112861
2010-09-02 21:18:42 +00:00
Sandeep Patel 0ca17f7e8a Fix an unnecessary XFAIL
llvm-svn: 112853
2010-09-02 20:19:24 +00:00
Jim Grosbach 66c681a644 Now that register allocation properly considers reserved regs, simplify the
ARM register class allocation order functions to take advantage of that.

llvm-svn: 112841
2010-09-02 18:14:29 +00:00
Bob Wilson 75a6408f88 Convert VLD1 and VLD2 instructions to use pseudo-instructions until
after regalloc.

llvm-svn: 112825
2010-09-02 16:00:54 +00:00
NAKAMURA Takumi a224e5563e test/loop-strength-reduce4: Add explicit triplet for Win32 host.
llvm-svn: 112802
2010-09-02 03:45:58 +00:00
NAKAMURA Takumi 54ce546865 test/twoaddr-coalesce: Do not use @main.
Win32 codegen emits implicit invoking __main into, to fail.

llvm-svn: 112801
2010-09-02 03:45:51 +00:00
Bob Wilson 38ab35a911 Remove NEON vmull, vmlal, and vmlsl intrinsics, replacing them with multiply,
add, and subtract operations with zero-extended or sign-extended vectors.
Update tests.  Add auto-upgrade support for the old intrinsics.

llvm-svn: 112773
2010-09-01 23:50:19 +00:00
Bruno Cardoso Lopes fea81b4831 Using target specific nodes for shuffle nodes makes the mask
check more strict, breaking some cases not checked in the
testsuite, but also exposes some foldings not done before,
as this example:

  movaps  (%rdi), %xmm0
  movaps  (%rax), %xmm1
  movaps  %xmm0, %xmm2
  movss %xmm1, %xmm2
  shufps  $36, %xmm2, %xmm0

now is generated as:

  movaps  (%rdi), %xmm0
  movaps  %xmm0, %xmm1
  movlps  (%rax), %xmm1
  shufps  $36, %xmm1, %xmm0

llvm-svn: 112753
2010-09-01 22:33:20 +00:00
Jakob Stoklund Olesen 4b6fd48bba Teach RemoveCopyByCommutingDef to check all aliases, not just subregisters.
This caused a miscompilation in WebKit where %RAX had conflicting defs when
RemoveCopyByCommutingDef was commuting a %EAX use.

llvm-svn: 112751
2010-09-01 22:15:35 +00:00
Chris Lattner 39eccb4754 temporarily revert r112664, it is causing a decoding conflict, and
the testcases should be merged.

llvm-svn: 112711
2010-09-01 16:00:50 +00:00
Dan Gohman 110ed64fbb Revert 112442 and 112440 until the compile time problems introduced
by 112440 are resolved.

llvm-svn: 112692
2010-09-01 01:45:53 +00:00
Bill Wendling 6789f8b6ae We have a chance for an optimization. Consider this code:
int x(int t) {
  if (t & 256)
    return -26;
  return 0;
}

We generate this:

     tst.w   r0, #256
     mvn     r0, #25
     it      eq
     moveq   r0, #0

while gcc generates this:

     ands    r0, r0, #256
     it      ne
     mvnne   r0, #25
     bx      lr

Scandalous really!

During ISel time, we can look for this particular pattern. One where we have a
"MOVCC" that uses the flag off of a CMPZ that itself is comparing an AND
instruction to 0. Something like this (greatly simplified):

  %r0 = ISD::AND ...
  ARMISD::CMPZ %r0, 0         @ sets [CPSR]
  %r0 = ARMISD::MOVCC 0, -26  @ reads [CPSR]

All we have to do is convert the "ISD::AND" into an "ARM::ANDS" that sets [CPSR]
when it's zero. The zero value will all ready be in the %r0 register and we only
need to change it if the AND wasn't zero. Easy!

llvm-svn: 112664
2010-08-31 22:41:22 +00:00
Jim Grosbach ad9b6de3b6 Update test for 112609
llvm-svn: 112610
2010-08-31 17:58:47 +00:00
Anton Korobeynikov 3a1d87a7ba Fix borken test
llvm-svn: 112555
2010-08-30 23:41:49 +00:00
Bob Wilson 4cd8a126c3 Remove NEON vmovn intrinsic, replacing it with vector truncate operations.
Auto-upgrade the old intrinsic and update tests.

llvm-svn: 112507
2010-08-30 20:02:30 +00:00
Chris Lattner 34bfab0ad5 two changes:
1) nuke ConstDataCoalSection, which is dead.
2) revise my previous patch for rdar://8018335,
  which was completely wrong.  Specifically, it doesn't 
  make sense to mark __TEXT,__const_coal as PURE_INSTRUCTIONS,
  because it is for readonly data.  templates (it turns out)
  go to const_coal_nt.  The real fix for rdar://8018335 was
  to give ConstTextCoalSection a section kind of ReadOnly 
  instead of Text.

llvm-svn: 112496
2010-08-30 18:12:35 +00:00
Duncan Sands 68c30907cc Correct bogus module triple specifications.
llvm-svn: 112469
2010-08-30 10:48:29 +00:00
Dan Gohman 3a08ed7904 Make IVUsers iterative instead of recursive.
This has the side effect of reversing the order of most of
IVUser's results.

llvm-svn: 112442
2010-08-29 16:40:03 +00:00
Dan Gohman 6665550bca Make this test less dependent on register allocation choices.
llvm-svn: 112426
2010-08-29 14:49:42 +00:00
Kalle Raiskila 1e616572d9 Fix lowering of INSERT_VECTOR_ELT in SPU.
The IDX was treated as byte index, not element index.

llvm-svn: 112422
2010-08-29 12:41:50 +00:00
Bob Wilson d0c054886c Remove NEON vaddl, vaddw, vsubl, and vsubw intrinsics. Instead, use llvm
IR add/sub operations with one or both operands sign- or zero-extended.
Auto-upgrade the old intrinsics.

llvm-svn: 112416
2010-08-29 05:57:34 +00:00
Chris Lattner c2887bc283 merge a bunch of shuffle tests into sse2.ll
llvm-svn: 112398
2010-08-29 03:19:04 +00:00
Chris Lattner b1ff978406 add some nounwind's
llvm-svn: 112396
2010-08-29 03:07:47 +00:00
Chris Lattner 94656b1c8c fix the buildvector->insertp[sd] logic to not always create a redundant
insertp[sd] $0, which is a noop.  Before:

_f32:                                   ## @f32
	pshufd	$1, %xmm1, %xmm2
	pshufd	$1, %xmm0, %xmm3
	addss	%xmm2, %xmm3
	addss	%xmm1, %xmm0
                                        ## kill: XMM0<def> XMM0<kill> XMM0<def>
	insertps	$0, %xmm0, %xmm0
	insertps	$16, %xmm3, %xmm0
	ret

after:

_f32:                                   ## @f32
	movdqa	%xmm0, %xmm2
	addss	%xmm1, %xmm2
	pshufd	$1, %xmm1, %xmm1
	pshufd	$1, %xmm0, %xmm3
	addss	%xmm1, %xmm3
	movdqa	%xmm2, %xmm0
	insertps	$16, %xmm3, %xmm0
	ret

The extra movs are due to a random (poor) scheduling decision.

llvm-svn: 112379
2010-08-28 17:59:08 +00:00
Chris Lattner bcb6090ad0 fix the BuildVector -> unpcklps logic to not do pointless shuffles
when the top elements of a vector are undefined.  This happens all
the time for X86-64 ABI stuff because only the low 2 elements of
a 4 element vector are defined.  For example, on:

_Complex float f32(_Complex float A, _Complex float B) {
  return A+B;
}

We used to produce (with SSE2, SSE4.1+ uses insertps):

_f32:                                   ## @f32
	movdqa	%xmm0, %xmm2
	addss	%xmm1, %xmm2
	pshufd	$16, %xmm2, %xmm2
	pshufd	$1, %xmm1, %xmm1
	pshufd	$1, %xmm0, %xmm0
	addss	%xmm1, %xmm0
	pshufd	$16, %xmm0, %xmm1
	movdqa	%xmm2, %xmm0
	unpcklps	%xmm1, %xmm0
	ret

We now produce:

_f32:                                   ## @f32
	movdqa	%xmm0, %xmm2
	addss	%xmm1, %xmm2
	pshufd	$1, %xmm1, %xmm1
	pshufd	$1, %xmm0, %xmm3
	addss	%xmm1, %xmm3
	movaps	%xmm2, %xmm0
	unpcklps	%xmm3, %xmm0
	ret

This implements rdar://8368414

llvm-svn: 112378
2010-08-28 17:28:30 +00:00
Dan Gohman e06905d1f0 Completely disable tail calls when fast-isel is enabled, as fast-isel
doesn't currently support dealing with this.

llvm-svn: 112341
2010-08-28 00:51:03 +00:00
Bob Wilson 13ce07fa92 Change ARM VFP VLDM/VSTM instructions to use addressing mode #4, just like
all the other LDM/STM instructions.  This fixes asm printer crashes when
compiling with -O0.  I've changed one of the NEON tests (vst3.ll) to run
with -O0 to check this in the future.

Prior to this change VLDM/VSTM used addressing mode #5, but not really.
The offset field was used to hold a count of the number of registers being
loaded or stored, and the AM5 opcode field was expanded to specify the IA
or DB mode, instead of the standard ADD/SUB specifier.  Much of the backend
was not aware of these special cases.  The crashes occured when rewriting
a frameindex caused the AM5 offset field to be changed so that it did not
have a valid submode.  I don't know exactly what changed to expose this now.
Maybe we've never done much with -O0 and NEON.  Regardless, there's no longer
any reason to keep a count of the VLDM/VSTM registers, so we can use
addressing mode #4 and clean things up in a lot of places.

llvm-svn: 112322
2010-08-27 23:18:17 +00:00
Chris Lattner 7413e87b6d get this test passing on linux builders.
llvm-svn: 112280
2010-08-27 18:49:08 +00:00
Bob Wilson edf722add3 Add alignment arguments to all the NEON load/store intrinsics.
Update all the tests using those intrinsics and add support for
auto-upgrading bitcode files with the old versions of the intrinsics.

llvm-svn: 112271
2010-08-27 17:13:24 +00:00
Daniel Dunbar 1844a71e66 X86: Fix an encoding issue with LOCK_ADD64mr, which could lead to very hard to find miscompiles with the integrated assembler.
llvm-svn: 112250
2010-08-27 01:30:14 +00:00
Chris Lattner af23e9a798 Add a hackaround for PR7993 which is causing failures on x86 builders that lack sse2.
llvm-svn: 112175
2010-08-26 06:57:07 +00:00
Chris Lattner 66afba7aa4 I think enough general codegen bugs are fixed to allow this to work
on random hosts, lets see!

llvm-svn: 112172
2010-08-26 05:52:42 +00:00
Chris Lattner eb2cc0ce0e implement SplitVecOp_CONCAT_VECTORS, fixing the included testcase with SSE1.
llvm-svn: 112171
2010-08-26 05:51:22 +00:00
Chris Lattner 825294b85f Make sure this forces the x86 targets
llvm-svn: 112169
2010-08-26 05:25:05 +00:00
Chris Lattner cc60609cb4 fix sse1 only codegen in x86-64 mode, which is something we
apparently try to support.

llvm-svn: 112168
2010-08-26 05:24:29 +00:00
Jim Grosbach 08da771ec3 Enable pre-RA virtual frame base register allocation. rdar://8277890
llvm-svn: 112127
2010-08-26 00:58:06 +00:00
Bob Wilson 4629f423f8 Revert svn 107892 (with changes to work with trunk). It caused a crash if
a VLD result was not used (Radar 8355607).  It should also fix pr7988, but
I haven't verified that yet.

llvm-svn: 112118
2010-08-26 00:13:36 +00:00
Chris Lattner c7fb446a9d temporarily disable this, which started failing on the llvm-i686-linux
builder.  I will investigate tonight.

llvm-svn: 112113
2010-08-25 23:43:14 +00:00
Chris Lattner 75ff053497 Change handling of illegal vector types to widen when possible instead of
expanding: e.g. <2 x float> -> <4 x float> instead of -> 2 floats.  This
affects two places in the code: handling cross block values and handling
function return and arguments.  Since vectors are already widened by 
legalizetypes, this gives us much better code and unblocks x86-64 abi
and SPU abi work.

For example, this (which is a silly example of a cross-block value):
define <4 x float> @test2(<4 x float> %A) nounwind {
 %B = shufflevector <4 x float> %A, <4 x float> undef, <2 x i32> <i32 0, i32 1>
 %C = fadd <2 x float> %B, %B
  br label %BB
BB:
 %D = fadd <2 x float> %C, %C
 %E = shufflevector <2 x float> %D, <2 x float> undef, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
 ret <4 x float> %E
}

Now compiles into:

_test2:                                 ## @test2
## BB#0:
 addps %xmm0, %xmm0
 addps %xmm0, %xmm0
 ret

previously it compiled into:

_test2:                                 ## @test2
## BB#0:
 addps %xmm0, %xmm0
 pshufd $1, %xmm0, %xmm1
                                        ## kill: XMM0<def> XMM0<kill> XMM0<def>
 insertps $0, %xmm0, %xmm0
 insertps $16, %xmm1, %xmm0
 addps %xmm0, %xmm0
 ret

This implements rdar://8230384

llvm-svn: 112101
2010-08-25 22:49:25 +00:00
Daniel Dunbar a54a1b0edf ARM/Thumb2: Fix a misselect in getARMCmp, when attempting to adjust a signed
comparison that would overflow.
 - The other under/overflow cases can't actually happen because the immediates
   which would trigger them are legal (so we don't enter this code), but
   adjusted the style to make it clear the transform is always valid.

llvm-svn: 112053
2010-08-25 16:58:05 +00:00
Eric Christopher 6b1533a1a9 Add another basic test cribbed from the x86 fast-isel tests.
llvm-svn: 112036
2010-08-25 07:57:29 +00:00
Eric Christopher 37d547aee6 Run this on thumb and arm.
llvm-svn: 112035
2010-08-25 07:53:15 +00:00
Eric Christopher e58c03698e Make this testcase actually executed with fast-isel on arm.
llvm-svn: 112033
2010-08-25 07:47:00 +00:00
Bruno Cardoso Lopes 0bc919fa35 Convert test to use filecheck and make it more specific
llvm-svn: 112016
2010-08-25 01:47:16 +00:00
Dan Gohman c88fda477a Fix X86's isLegalAddressingMode to recognize that static addresses
need not be RIP-relative in small mode.

llvm-svn: 111917
2010-08-24 15:55:12 +00:00
Kalle Raiskila 7e25bc4145 Fix SPU BE to use all the available return registers.
llc used to assert on the added testcase.

llvm-svn: 111911
2010-08-24 11:50:48 +00:00
Chris Lattner 58bd73a5a7 Add a new llvm.x86.int intrinsic, allowing access to the
x86 int and int3 instructions.  Patch by Peter Housel!

llvm-svn: 111831
2010-08-23 19:39:25 +00:00
Dan Gohman 42ef669d81 Fix x86 fast-isel's cmp+branch folding to avoid folding when the
comparison is in a different basic block from the branch. In such
cases, the comparison's operands may not have initialized virtual
registers available.

llvm-svn: 111709
2010-08-21 02:32:36 +00:00
Bob Wilson be745d8c00 Replace some NEON vmovl intrinsic that I missed earlier.
llvm-svn: 111696
2010-08-20 23:22:43 +00:00
Bob Wilson 9a511c07e4 Replace the arm.neon.vmovls and vmovlu intrinsics with vector sign-extend and
zero-extend operations.

llvm-svn: 111614
2010-08-20 04:54:02 +00:00
Evan Cheng 361b9be7c6 It's possible to sink a def if its local uses are PHI's.
llvm-svn: 111537
2010-08-19 18:33:29 +00:00
Dan Gohman 82656fb0e1 When sending stats output to stdout for grepping, don't emit normal
output to standard output also.

llvm-svn: 111435
2010-08-18 22:22:44 +00:00
Dan Gohman 2470818942 When sending stats output to stdout for grepping, don't emit normal
output to standard output also.

llvm-svn: 111401
2010-08-18 20:32:46 +00:00
Kalle Raiskila e60b5161d1 Fix a bug with insertelement on SPU.
The previous algorithm in LowerVECTOR_SHUFFLE 
didn't check all requirements for "monotonic" shuffles.

llvm-svn: 111361
2010-08-18 10:20:29 +00:00
Kalle Raiskila ab49360f59 Remove all traces of v2[i,f]32 on SPU.
The "half vectors" are now widened to full size by the legalizer.
The only exception is in parameter passing, where half vectors are 
expanded. This causes changes to some dejagnu tests.

llvm-svn: 111360
2010-08-18 10:04:39 +00:00
Kalle Raiskila f3984d1ef6 Change SPU C calling convention to match that described in
"SPU Application Binary Interface Specification, v1.9" by
IBM. 
Specifically: use r3-r74 to pass parameters and the return value.

llvm-svn: 111358
2010-08-18 09:50:30 +00:00
Bob Wilson fb7eaff759 Expand ZERO_EXTEND operations for NEON vector types.
Testcase from Nick Lewycky.

llvm-svn: 111341
2010-08-18 01:45:52 +00:00
Dan Gohman ed2b005842 Tweak IVUsers' concept of "interesting" to exclude add recurrences
where the step value is an induction variable from an outer loop, to
avoid trouble trying to re-expand such expressions. This effectively
hides such expressions from indvars and lsr, which prevents them
from getting into trouble.

llvm-svn: 111317
2010-08-17 22:50:37 +00:00
Evan Cheng efdc74ea59 Add nounwind.
llvm-svn: 111312
2010-08-17 22:35:20 +00:00
Dale Johannesen 16f96445c3 Make fast scheduler handle asm clobbers correctly.
PR 7882.  Follows suggestion by Amaury Pouly, thanks.

llvm-svn: 111306
2010-08-17 22:17:24 +00:00
Bob Wilson 942b10f511 Change ARM PKHTB and PKHBT instructions to use a shift_imm operand to avoid
printing "lsl #0".  This fixes the remaining parts of pr7792.  Make
corresponding changes for encoding/decoding these instructions.

llvm-svn: 111251
2010-08-17 17:23:19 +00:00
Bob Wilson 411dfad981 Allow more cases of undef shuffle indices and add tests for them.
llvm-svn: 111226
2010-08-17 05:54:34 +00:00
Evan Cheng f259efde47 PHI elimination should not break back edge. It can cause some significant code placement issues. rdar://8263994
good:
LBB0_2:
  mov     r2, r0
  . . .
  mov     r1, r2
  bne     LBB0_2

bad:
LBB0_2:
  mov     r2, r0
  . . .
@ BB#3:
  mov     r1, r2
  b       LBB0_2

llvm-svn: 111221
2010-08-17 01:20:36 +00:00
Bob Wilson eee4824f74 Add a testcase for svn 111208.
llvm-svn: 111212
2010-08-16 23:44:29 +00:00
Bob Wilson 804f6159f1 Generalize a pattern for PKHTB: an SRL of 16-31 bits will guarantee
that the high halfword is zero.  The shift need not be exactly 16 bits.

llvm-svn: 111196
2010-08-16 22:26:55 +00:00
Bob Wilson 3fd1e0dcda Convert test to FileCheck.
llvm-svn: 111195
2010-08-16 22:21:13 +00:00
Bob Wilson 8f553757c4 Convert a test to use FileCheck.
llvm-svn: 111153
2010-08-16 17:05:27 +00:00
Benjamin Kramer cbc55d9dc0 Test expects SSE, give him SSE.
llvm-svn: 111115
2010-08-15 23:32:03 +00:00
Benjamin Kramer 4566466b7f Restore arch on these test, they fail on arm.
llvm-svn: 111109
2010-08-15 20:42:56 +00:00
Dale Johannesen 339423c460 Mark as XFAIL on darwin 8. PR 7886.
llvm-svn: 111108
2010-08-15 19:40:29 +00:00
Bob Wilson 3c9ed76ba5 Temporarily disable tail calls on ARM to work around some linker problems.
llvm-svn: 111050
2010-08-13 22:43:33 +00:00
Dale Johannesen 8d3c89e765 Revert 110491. While not wrong, it was based on a
misanalysis and is undesirable.

llvm-svn: 111028
2010-08-13 18:43:45 +00:00
Bruno Cardoso Lopes 7f704b31a9 - Teach SSEDomainFix to switch between different levels of AVX instructions. Here we guess that AVX will have domain issues, so just implement them for consistency and in the future we remove if it's unnecessary.
- Make foldMemoryOperandImpl aware of 256-bit zero vectors folding and support the 128-bit counterparts of AVX too.
- Make sure MOV[AU]PS instructions are only selected when SSE1 is enabled, and duplicate the patterns to match AVX.
- Add a testcase for a simple 128-bit zero vector creation.

llvm-svn: 110946
2010-08-12 20:20:53 +00:00
Bruno Cardoso Lopes 7306c86886 Begin to support some vector operations for AVX 256-bit intructions. The long
term goal here is to be able to match enough of vector_shuffle and build_vector
so all avx intrinsics which aren't mapped to their own built-ins but to
shufflevector calls can be codegen'd. This is the first (baby) step, support
building zeroed vectors.

llvm-svn: 110897
2010-08-12 02:06:36 +00:00
Devang Patel 48595bf2bc This is x86 only test.
llvm-svn: 110887
2010-08-12 00:17:38 +00:00
Bruno Cardoso Lopes 1675ee7a02 Add testcases for all AVX 256-bit intrinsics added in the last couple days
llvm-svn: 110854
2010-08-11 21:12:09 +00:00
Bruno Cardoso Lopes 29c8818ad9 Reapply r109881 using a more strict command line for llc.
llvm-svn: 110833
2010-08-11 17:39:23 +00:00
Jim Grosbach a5f923b1a1 fix silly typo
llvm-svn: 110831
2010-08-11 17:32:46 +00:00
Jim Grosbach 2bf8bd1e19 Add a target triple, as the runtime library invocation varies a bit by
platform. It's apparently "bl __muldf3" on linux, for example. Since that's
not what we're checking here, it's more robust to just force a triple. We
just wwant to check that the inline FP instructions are only generated
on cpus that have them."

llvm-svn: 110830
2010-08-11 17:31:12 +00:00
Evan Cheng b0276814d5 Fix test and re-enable it.
llvm-svn: 110829
2010-08-11 17:25:51 +00:00
Dan Gohman 4df4114870 Temporarily disable some failing tests, until they can be
properly investigated.

llvm-svn: 110825
2010-08-11 16:36:07 +00:00
Jim Grosbach 4d5dc3e7e5 cortex m4 has floating point support, but only single precision.
llvm-svn: 110810
2010-08-11 15:44:15 +00:00
Dan Gohman f3d783a6d2 Temporarily disable some failing tests, until they can be
properly investigated.

llvm-svn: 110808
2010-08-11 15:09:00 +00:00
Bill Wendling 6a98131468 Consider this code snippet:
float t1(int argc) {
  return (argc == 1123) ? 1.234f : 2.38213f;
}

We would generate truly awful code on ARM (those with a weak stomach should look
away):

_t1:
  movw   r1, #1123
  movs   r2, #1
  movs   r3, #0
  cmp    r0, r1
  mov.w  r0, #0
  it     eq
  moveq  r0, r2
  movs   r1, #4
  cmp    r0, #0
  it     ne
  movne  r3, r1
  adr    r0, #LCPI1_0
  ldr    r0, [r0, r3]
  bx     lr

The problem was that legalization was creating a cascade of SELECT_CC nodes, for
for the comparison of "argc == 1123" which was fed into a SELECT node for the ?:
statement which was itself converted to a SELECT_CC node. This is because the
ARM back-end doesn't have custom lowering for SELECT nodes, so it used the
default "Expand".

I added a fairly simple "LowerSELECT" to the ARM back-end. It takes care of this
testcase, but can obviously be expanded to include more cases.

Now we generate this, which looks optimal to me:

_t1:
  movw   r1, #1123
  movs   r2, #0
  cmp    r0, r1
  adr    r0, #LCPI0_0
  it     eq
  moveq  r2, #4
  ldr    r0, [r0, r2]
  bx     lr
  .align  2
LCPI0_0:
  .long   1075344593  @ float 2.382130e+00
  .long   1067316150  @ float 1.234000e+00

llvm-svn: 110799
2010-08-11 08:43:16 +00:00
Evan Cheng 5190f09291 Report error if codegen tries to instantiate a ARM target when the cpu does support it. e.g. cortex-m* processors.
llvm-svn: 110798
2010-08-11 07:17:46 +00:00
Evan Cheng 40921a4e62 Add ARM Archv6M and let it implies FeatureDB (having dmb, etc.)
llvm-svn: 110795
2010-08-11 06:51:54 +00:00
Evan Cheng 49e02fc414 Add Cortex-M0 support. It's a ARMv6m device (no ARM mode) with some 32-bit
instructions: dmb, dsb, isb, msr, and mrs.

llvm-svn: 110786
2010-08-11 06:30:38 +00:00
Evan Cheng 6e809de90c - Add subtarget feature -mattr=+db which determine whether an ARM cpu has the
memory and synchronization barrier dmb and dsb instructions.
- Change instruction names to something more sensible (matching name of actual
  instructions).
- Added tests for memory barrier codegen.

llvm-svn: 110785
2010-08-11 06:22:01 +00:00
Bill Wendling 79937dfc5b Update test to match output of optimize compares for ARM.
llvm-svn: 110765
2010-08-11 01:05:02 +00:00
Bill Wendling 871d4e1170 The optimize comparisons pass removes the "cmp" instruction this is checking for.
llvm-svn: 110739
2010-08-10 22:16:05 +00:00
Evan Cheng 3f251fb26e Re-apply r110655 with fixes. Epilogue must restore sp from fp if the function stack frame has a var-sized object.
Also added a test case to check for the added benefit of this patch: it's optimizing away the unnecessary restore of sp from fp for some non-leaf functions.

llvm-svn: 110707
2010-08-10 19:30:19 +00:00
Daniel Dunbar 0dd47bfca3 Revert r110655, "Fix ARM hasFP() semantics. It should return true whenever FP
register is", it breaks a couple test-suite tests.

llvm-svn: 110701
2010-08-10 18:32:02 +00:00
Jakob Stoklund Olesen 5730846c2f Fix test for more architectures. Patch by Tobias Grosser.
llvm-svn: 110685
2010-08-10 16:48:24 +00:00
Tobias Grosser fedeff8015 Fix failing testcase.
Those look like typos to me.

llvm-svn: 110664
2010-08-10 09:54:29 +00:00
Devang Patel b219746c80 Handle TAG_constant for integers.
llvm-svn: 110656
2010-08-10 07:11:13 +00:00
Evan Cheng 8d5d1c1331 Fix ARM hasFP() semantics. It should return true whenever FP register is
reserved, not available for general allocation. This eliminates all the
extra checks for Darwin.

This change also fixes the use of FP to access frame indices in leaf
functions and cleaned up some confusing code in epilogue emission.

llvm-svn: 110655
2010-08-10 06:26:49 +00:00
Kalle Raiskila 999da1f3a0 Have SPU handle halfvec stores aligned by 8 bytes.
llvm-svn: 110576
2010-08-09 16:33:00 +00:00
Dale Johannesen a3bd31a923 Use sdmem and sse_load_f64 (etc.) for the vector
form of CMPSD (etc.)  Matching a 128-bit memory
operand is wrong, the instruction uses only 64 bits
(same as ADDSD etc.)  8193553.

llvm-svn: 110491
2010-08-07 00:33:42 +00:00
Rafael Espindola 027d5bcf89 Fix eabi calling convention when a 64 bit value shadows r3.
Without this what was happening was:

* R3 is not marked as "used"
* ARM backend thinks it has to save it to the stack because of vaarg
* Offset computation correctly ignores it
* Offsets are wrong

llvm-svn: 110446
2010-08-06 15:35:32 +00:00
Eric Christopher e1fb772aa5 Add an option to always emit realignment code for a particular module.
llvm-svn: 110404
2010-08-05 23:57:43 +00:00
Devang Patel cc3f3b341d Move x86 specific tests into test/CodeGen/X86.
llvm-svn: 110372
2010-08-05 20:25:37 +00:00
Dan Gohman c53ee449a5 Move x86-specific tests out of test/Transforms/LoopStrengthReduce and
into test/CodeGen/X86, so that they aren't run when the x86 target is
not enabled.

Fix uglygep.ll to not be x86-specific.

llvm-svn: 110343
2010-08-05 17:04:15 +00:00
Daniel Dunbar e62e664656 tests: CodeGen/X86/GC tests require X86.
llvm-svn: 110338
2010-08-05 15:45:33 +00:00
Bill Wendling ca1cb13646 The lower invoke pass needs to have unreachable code elimination run after it
because it could create such things. This fixes a MingW buildbot test failure.

llvm-svn: 110279
2010-08-04 23:36:02 +00:00
Eli Friedman 39d0f57cab PR7814: Truncates cannot be ignored for signed comparisons.
llvm-svn: 110268
2010-08-04 22:40:58 +00:00
Bill Wendling 26feb849a4 Testcase for r110248.
llvm-svn: 110249
2010-08-04 21:56:30 +00:00
Stuart Hastings cba0d06b7c call-imm.ll test case regex fix. Patch by Dimitry Andric!
llvm-svn: 110199
2010-08-04 15:31:35 +00:00
Kalle Raiskila 8b2f70125f Make SPU backend handle insertelement and
store for "half vectors"

llvm-svn: 110198
2010-08-04 13:59:48 +00:00
Bob Wilson 79daf7e0ae Combine NEON VABD (absolute difference) intrinsics with ADDs to make VABA
(absolute difference with accumulate) intrinsics.  Radar 8228576.

llvm-svn: 110170
2010-08-04 00:12:08 +00:00
Jakob Stoklund Olesen 011ff9bec9 OK, that's it. This test is going away now. But don't worry, I am taking it to a
nice farm in the country where it can play with other tests. And bunnies.

It is not clear what is being tested, and the revision history shows a bunch of
random changes to the expected instruction count. Clearly, we are just fudging
it to pass whenever it fails.

llvm-svn: 110118
2010-08-03 17:21:14 +00:00
Kalle Raiskila 77558b7d13 More SPU v2f32 stuff added: insertelement and shuffle.
llvm-svn: 110038
2010-08-02 11:22:10 +00:00
Kalle Raiskila 68b3886678 Add preliminary v2f32 support for SPU. Like with v2i32, we just
duplicate the instructions and operate on half vectors. 

Also reorder code in SPUInstrInfo.td for better coherency.

llvm-svn: 110037
2010-08-02 10:25:47 +00:00
Kalle Raiskila 622f8eb981 Add preliminary v2i32 support for SPU backend. As there are no
such registers in SPU, this support boils down to "emulating" 
them by duplicating instructions on the general purpose registers. 

This adds the most basic operations on v2i32: passing parameters,
addition, subtraction, multiplication and a few others.

llvm-svn: 110035
2010-08-02 08:54:39 +00:00
Eli Friedman 7595ce05a2 PR7781: Fix incorrect shifting in PPCTargetLowering::LowerBUILD_VECTOR.
llvm-svn: 109998
2010-08-02 00:18:19 +00:00
Eli Friedman 1b2bc1b844 PR7774: Fix undefined shifts in Alpha backend. As a bonus, this actually
improves the generated code in some cases.

llvm-svn: 109985
2010-08-01 21:13:28 +00:00
Bob Wilson 66161f5eb4 Revert new AVX intrinsic tests. They are breaking buildbots and Bruno is
away from a computer now.
--- Reverse-merging r109881 into '.':
D    test/CodeGen/X86/avx-intrinsics-x86.ll
D    test/CodeGen/X86/avx-intrinsics-x86_64.ll

llvm-svn: 109959
2010-07-31 22:36:03 +00:00
Bruno Cardoso Lopes 92941fdb26 A *bunch* of tests for AVX intrinsics
llvm-svn: 109881
2010-07-30 19:57:56 +00:00
Eli Friedman ffe64c06ef Fix for bug reported by Evzen Muller on llvm-commits: make sure to correctly
check the range of the constant when optimizing a comparison between a
constant and a sign_extend_inreg node.

llvm-svn: 109854
2010-07-30 06:44:31 +00:00
Jim Grosbach d343166a0b Many Thumb2 instructions can reference the full ARM register set (i.e.,
have 4 bits per register in the operand encoding), but have undefined
behavior when the operand value is 13 or 15 (SP and PC, respectively).
The trivial coalescer in linear scan sometimes will merge a copy from
SP into a subsequent instruction which uses the copy, and if that
instruction cannot legally reference SP, we get bad code such as:
  mls r0,r9,r0,sp
instead of:
  mov r2, sp
  mls r0, r9, r0, r2

This patch adds a new register class for use by Thumb2 that excludes
the problematic registers (SP and PC) and is used instead of GPR
for those operands which cannot legally reference PC or SP. The
trivial coalescer explicitly requires that the register class
of the destination for the COPY instruction contain the source
register for the COPY to be considered for coalescing. This prevents
errant instructions like that above.

PR7499

llvm-svn: 109842
2010-07-30 02:41:01 +00:00
Dale Johannesen 2bff50546c Implement vector constants which are splat of
integers with mov + vdup.  8003375.  This is
currently disabled by default because LICM will
not hoist a VDUP, so it pessimizes the code if
the construct occurs inside a loop (8248029).

llvm-svn: 109799
2010-07-29 20:10:08 +00:00
Nate Begeman 53afc8f06a Implement a vectorized algorithm for <16 x i8> << <16 x i8>
This is about 4x faster and smaller than the existing scalarization.

llvm-svn: 109566
2010-07-28 00:21:48 +00:00
Nate Begeman 269a6da023 ~40% faster vector shl <4 x i32> on SSE 4.1 Larger improvements for smaller types coming in future patches.
For:

define <2 x i64> @shl(<4 x i32> %r, <4 x i32> %a) nounwind readnone ssp {
entry:
  %shl = shl <4 x i32> %r, %a                     ; <<4 x i32>> [#uses=1]
  %tmp2 = bitcast <4 x i32> %shl to <2 x i64>     ; <<2 x i64>> [#uses=1]
  ret <2 x i64> %tmp2
}

We get:

_shl:                                   ## @shl
	pslld	$23, %xmm1
	paddd	LCPI0_0, %xmm1
	cvttps2dq	%xmm1, %xmm1
	pmulld	%xmm1, %xmm0
	ret

Instead of:

_shl:                                   ## @shl
	pshufd	$3, %xmm0, %xmm2
	movd	%xmm2, %eax
	pshufd	$3, %xmm1, %xmm2
	movd	%xmm2, %ecx
	shll	%cl, %eax
	movd	%eax, %xmm2
	pshufd	$1, %xmm0, %xmm3
	movd	%xmm3, %eax
	pshufd	$1, %xmm1, %xmm3
	movd	%xmm3, %ecx
	shll	%cl, %eax
	movd	%eax, %xmm3
	punpckldq	%xmm2, %xmm3
	movd	%xmm0, %eax
	movd	%xmm1, %ecx
	shll	%cl, %eax
	movd	%eax, %xmm2
	movhlps	%xmm0, %xmm0
	movd	%xmm0, %eax
	movhlps	%xmm1, %xmm1
	movd	%xmm1, %ecx
	shll	%cl, %eax
	movd	%eax, %xmm0
	punpckldq	%xmm0, %xmm2
	movdqa	%xmm2, %xmm0
	punpckldq	%xmm3, %xmm0
	ret

llvm-svn: 109549
2010-07-27 22:37:06 +00:00
Nate Begeman 317b969ac5 Fix a crash in the dag combiner caused by ConstantFoldBIT_CONVERTofBUILD_VECTOR calling itself
recursively and returning a SCALAR_TO_VECTOR node, but assuming the input was always a BUILD_VECTOR.

llvm-svn: 109519
2010-07-27 18:02:18 +00:00
Anton Korobeynikov 6bcea068db Currently EH lowering code expects typeinfo to be global only.
This assumption is not satisfied due to global mergeing.
Workaround the issue by temporary disablinge mergeing of const globals.
Also, ignore LLVM "special" globals. This fixes PR7716

llvm-svn: 109423
2010-07-26 18:45:39 +00:00
Evan Cheng df907f4594 - Allow target to specify when is register pressure "too high". In most cases,
it's too late to start backing off aggressive latency scheduling when most
  of the registers are in use so the threshold should be a bit tighter.
- Correctly handle live out's and extract_subreg etc.
- Enable register pressure aware scheduling by default for hybrid scheduler.
  For ARM, this is almost always a win on # of instructions. It's runtime
  neutral for most of the tests. But for some kernels with high register
  pressure it can be a huge win. e.g. 464.h264ref reduced number of spills by
  54 and sped up by 20%.

llvm-svn: 109279
2010-07-23 22:39:59 +00:00
Dan Gohman 55e244698a Use the proper type for shift counts. This fixes a bootstrap error.
llvm-svn: 109265
2010-07-23 21:08:12 +00:00
Dan Gohman 0818684a70 DAGCombine (shl (anyext x, c)) to (anyext (shl x, c)) if the high bits
are not demanded. This often allows the anyext to be folded away.

llvm-svn: 109242
2010-07-23 18:03:30 +00:00
Eric Christopher 9a77382685 Custom lower the memory barrier instructions and add support
for lowering without sse2.  Add a couple of new testcases.

Fixes a few libgomp tests and latent bugs.  Remove a few todos.

llvm-svn: 109078
2010-07-22 02:48:34 +00:00
Evan Cheng 285903853f More register pressure aware scheduling work.
llvm-svn: 109064
2010-07-21 23:53:58 +00:00
Eric Christopher 84bdfd80df Baby steps towards ARM fast-isel.
llvm-svn: 109047
2010-07-21 22:26:11 +00:00
Rafael Espindola 4277e14dc4 Fix calling convention on ARM if vfp2+ is enabled.
llvm-svn: 109009
2010-07-21 11:38:30 +00:00
Dan Gohman 625fd2292d Fix SCEV denormalization of expressions where the exit value from
one loop is involved in the increment of an addrec for another
loop. This fixes rdar://8168938.

llvm-svn: 108863
2010-07-20 17:06:20 +00:00
Jim Grosbach badf087e45 update tests for smarter BIC usage
llvm-svn: 108846
2010-07-20 16:16:48 +00:00
Duncan Sands 2e839de377 The same problem was being tracked in PR7652.
llvm-svn: 108843
2010-07-20 15:52:32 +00:00
Bruno Cardoso Lopes 160695fecb Fix PR7174, a couple o Mips fixes:
- Fix a typo for PIC check during jmp table lowering
- Also fix the "first jump table basic block is not
considered only reachable by fall through" problem, use this
ad-hoc solution until I come up with something better.

Patch by stetorvs@gmail.com

llvm-svn: 108820
2010-07-20 08:37:04 +00:00
Bruno Cardoso Lopes ea7863647b Fix Mips PR7473. Patch by stetorvs@gmail.com
llvm-svn: 108816
2010-07-20 07:58:51 +00:00
Dan Gohman b5e918dc05 After a custom inserter, in a block which has constant instructions,
update the current basic block in addition to the current insert
position, so that they remain consistent. This fixes rdar://8204072.

llvm-svn: 108765
2010-07-19 22:48:56 +00:00
Owen Anderson 9c271e2835 Remove r108639 now that it is handled by InstCombine instead.
llvm-svn: 108688
2010-07-19 08:10:24 +00:00
Owen Anderson 41670a11a8 Add a testcase for r108639.
llvm-svn: 108640
2010-07-18 08:57:19 +00:00
Jim Grosbach b97e2bbe32 Add combiner patterns to more effectively utilize the BFI (bitfield insert)
instruction for non-constant operands. This includes the case referenced
in the README.txt regarding a bitfield copy.

llvm-svn: 108608
2010-07-17 03:30:54 +00:00
Jim Grosbach 11013eda5a Add basic support to code-gen the ARM/Thumb2 bit-field insert (BFI) instruction
and a combine pattern to use it for setting a bit-field to a constant
value. More to come for non-constant stores.

llvm-svn: 108570
2010-07-16 23:05:05 +00:00
Bill Wendling bf8370ff36 Consider this function:
void foo() { __builtin_unreachable(); }

It will output the following on Darwin X86:

_func1:
Leh_func_begin0:
        pushq %rbp
Ltmp0:
        movq %rsp, %rbp
Ltmp1:
Leh_func_end0:

This prolog adds a new Call Frame Information (CFI) row to the FDE with an
address that is not within the address range of the code it describes -- part is
equal to the end of the function -- and therefore results in an invalid EH
frame. If we emit a nop in this situation, then the CFI row is now within the
address range.

llvm-svn: 108568
2010-07-16 22:51:10 +00:00