Commit Graph

4950 Commits

Author SHA1 Message Date
Craig Topper 6612e35b0d Add support for breaking 256-bit v16i16 and v32i8 VSETCC into two 128-bit ones, avoiding sclarization. Add vex form of pcmpeqq and pcmpgtq. Fixes more cases for PR10712.
llvm-svn: 138321
2011-08-23 04:36:33 +00:00
Bruno Cardoso Lopes 2a3ffb5d97 Introduce a pass to insert vzeroupper instructions to avoid AVX to
SSE transition penalty. The pass is enabled through the "x86-use-vzeroupper"
llc command line option. This is only the first step (very naive and
conservative one) to sketch out the idea, but proper DFA is coming next
to allow smarter decisions. Comments and ideas now and in further commits
will be very appreciated.

llvm-svn: 138317
2011-08-23 01:14:17 +00:00
Bruno Cardoso Lopes 74f090d44c Add support for breaking 256-bit int VETCC into two 128-bit ones,
avoding scalarization of the compare. Reduces code from 59 to 6
instructions. Fix PR10712.

llvm-svn: 138271
2011-08-22 20:31:04 +00:00
Chad Rosier d6641af80c With the fix in r138164: "Add <imp-def> operands to QQ and QQQQ stack loads."
-verify-machineinstrs can be enabled for this test case.

llvm-svn: 138171
2011-08-20 00:34:45 +00:00
Chad Rosier be7625161e VMOVQQQQs pseudo instructions are only created by ARMBaseInstrInfo::copyPhysReg.
Therefore, rather then generate a pseudo instruction, which is later expanded,
generate the necessary instructions in place.

llvm-svn: 138163
2011-08-20 00:17:25 +00:00
Devang Patel 59e27c5f12 Do not use named md nodes to track variables that are completely optimized. This does not scale while doing LTO with debug info. New approach is to include list of variables in the subprogram info directly.
llvm-svn: 138145
2011-08-19 23:28:12 +00:00
Jim Grosbach 8d77bb5f06 Use regex to remove false dependencies on register allocation.
llvm-svn: 138137
2011-08-19 23:10:31 +00:00
Jim Grosbach 066e9ec1e4 Update tests.
llvm-svn: 138116
2011-08-19 22:19:48 +00:00
Jakob Stoklund Olesen 90b6018c8f Add test case for r138018.
llvm-svn: 138033
2011-08-19 04:30:24 +00:00
Akira Hatanaka fb4161ae88 Use subword loads instead of a 4-byte load when the size of a structure (or a
piece of it) that is being passed by value is smaller than a word.

llvm-svn: 138007
2011-08-18 23:39:37 +00:00
Ivan Krasin d7cbd4c518 FastISel: avoid function calls between the materialization of the constant and its use.
llvm-svn: 137993
2011-08-18 22:06:10 +00:00
Jim Grosbach 90103ccc05 Thumb assembly parsing and encoding for LDM instruction.
Fix base register type and canonicallize to the "ldm" spelling rather than
"ldmia." Add diagnostics for incorrect writeback token and out-of-range
registers.

llvm-svn: 137986
2011-08-18 21:50:53 +00:00
Richard Osborne 56f3b70225 Add intrinsics for SETEV, GETED, GETET.
llvm-svn: 137938
2011-08-18 13:00:48 +00:00
Bruno Cardoso Lopes 3c7d6eb64c Cleanup vector logical ops in AVX and add use int versions for simple
v2i64

llvm-svn: 137919
2011-08-18 02:11:34 +00:00
Bruno Cardoso Lopes 1a87fcb9ba Fix PR10688. Add support for spliting 256-bit vector shifts when the
shift amount is variable

llvm-svn: 137885
2011-08-17 22:12:20 +00:00
Jim Grosbach e2a0404a69 Thumb assembly parsing and encoding for ADR.
llvm-svn: 137864
2011-08-17 20:37:40 +00:00
Bruno Cardoso Lopes be5e987379 Introduce matching patterns for vbroadcast AVX instruction. The idea is to
match splats in the form (splat (scalar_to_vector (load ...))) whenever
the load can be folded. All the logic and instruction emission is
working but because of PR8156, there are no ways to match loads, cause
they can never be folded for splats. Thus, the tests are XFAILed, but
I've tested and exercised all the logic using a relaxed version for
checking the foldable loads, as if the bug was already fixed. This
should work out of the box once PR8156 gets fixed since MayFoldLoad will
work as expected.

llvm-svn: 137810
2011-08-17 02:29:19 +00:00
Bruno Cardoso Lopes 3400825b41 Update test to not use the scalar type to splat from a load
llvm-svn: 137809
2011-08-17 02:29:15 +00:00
Bruno Cardoso Lopes ed786a346e Now that we have a canonical way to handle 256-bit splats:
vinsertf128 $1 + vpermilps $0, remove the old code that used to first
do the splat in a 128-bit vector and then insert it into a larger one.
This is better because the handling code gets simpler and also makes a
better room for the upcoming vbroadcast!

llvm-svn: 137807
2011-08-17 02:29:10 +00:00
Akira Hatanaka 5360f88355 Add support for ext and ins.
llvm-svn: 137804
2011-08-17 02:05:42 +00:00
Bruno Cardoso Lopes 2e99f1b3aa Instead of always leaving the work to the generic legalizer when
there is no support for native 256-bit shuffles, be more smart in some
cases, for example, when you can extract specific 128-bit parts and use
regular 128-bit shuffles for them. Example:

For this shuffle:
  shufflevector <4 x i64> %a, <4 x i64> %b, <4 x i32>
                <i32 1, i32 0, i32 7, i32 6>

This was expanded to:
  vextractf128  $1, %ymm1, %xmm2
  vpextrq $0, %xmm2, %rax
  vmovd %rax, %xmm1
  vpextrq $1, %xmm2, %rax
  vmovd %rax, %xmm2
  vpunpcklqdq %xmm1, %xmm2, %xmm1
  vpextrq $0, %xmm0, %rax
  vmovd %rax, %xmm2
  vpextrq $1, %xmm0, %rax
  vmovd %rax, %xmm0
  vpunpcklqdq %xmm2, %xmm0, %xmm0
  vinsertf128 $1, %xmm1, %ymm0, %ymm0
  ret

Now we get:
  vshufpd $1, %xmm0, %xmm0, %xmm0
  vextractf128  $1, %ymm1, %xmm1
  vshufpd $1, %xmm1, %xmm1, %xmm1
  vinsertf128 $1, %xmm1, %ymm0, %ymm0

llvm-svn: 137733
2011-08-16 18:21:54 +00:00
Akira Hatanaka 7d7bec5acf Add test case for r137711.
llvm-svn: 137725
2011-08-16 17:32:01 +00:00
Akira Hatanaka 2263c10946 Fix handling of double precision loads and stores when Mips1 is targeted.
Mips1 does not support double precision loads or stores, therefore two single
precision loads or stores must be used in place of these instructions. This 
patch treats double precision loads and stores as if they are legal
instructions until MCInstLowering, instead of generating the single precision
instructions during instruction selection or Prolog/Epilog code insertion.

Without the changes made in this patch, llc produces code that has the same 
problem described in r137484 or bails out when
MipsInstrInfo::storeRegToStackSlot or loadRegFromStackSlot is called before
register allocation.

llvm-svn: 137711
2011-08-16 03:51:51 +00:00
Bruno Cardoso Lopes cbe7feeab9 Fix PR10656. It's only profitable to use 128-bit inserts and extracts
when AVX mode is one. Otherwise is just more work for the type
legalizer.

llvm-svn: 137661
2011-08-15 21:45:54 +00:00
Eric Christopher 8c5f3f7624 Fix this test to avoid leaving a temporary file behind.
llvm-svn: 137651
2011-08-15 20:55:03 +00:00
Bob Wilson d1de7764be Expand VMOVQQQQ pseudo instructions.
Apparently we never added code to expand these pseudo instructions, and in
over a year, no one has noticed.  Our register allocator must be awesome!

llvm-svn: 137551
2011-08-13 05:14:55 +00:00
Bruno Cardoso Lopes f15dfe5818 The VPERM2F128 is a AVX instruction which permutes between two 256-bit
vectors. It operates on 128-bit elements instead of regular scalar
types. Recognize shuffles that are suitable for VPERM2F128 and teach
the x86 legalizer how to handle them.

llvm-svn: 137519
2011-08-12 21:48:26 +00:00
Akira Hatanaka 2fcc1cfdce Define unaligned load and store.
llvm-svn: 137515
2011-08-12 21:30:06 +00:00
Akira Hatanaka 2f6b944f56 Test case for 137484
llvm-svn: 137486
2011-08-12 18:12:06 +00:00
Akira Hatanaka 79d60d0e94 Enclose directive .cprestore with .set macro and nomacro to silence assembler
warning. 

llvm-svn: 137378
2011-08-11 22:42:31 +00:00
Bruno Cardoso Lopes 8fbf023c9b Add a dag combine to xform 256-bit shuffles into simple vector
inserts and extracts. This simple combine makes us generate only 1
instruction instead of 11 in the v8 case.

llvm-svn: 137362
2011-08-11 21:50:44 +00:00
Bruno Cardoso Lopes 9eb3762e08 Fix the test added by Nadav in r137308. Make it more strict:
1) check for the "v" version of movaps
2) add a couple of CHECK-NOT to guarantee the behavior
3) move to a more appropriate test file

llvm-svn: 137361
2011-08-11 21:50:35 +00:00
Bruno Cardoso Lopes 043c820800 Fix PR10492 by teaching MOVHLPS and MOVLPS mask matching to be more strict.
llvm-svn: 137324
2011-08-11 18:59:13 +00:00
Jim Grosbach 27ad83d8a9 ARM push of a single register encodes as pre-indexed STR.
Per the ARM ARM, a 'push' of a single register encodes as an STR,
not an STM.

llvm-svn: 137318
2011-08-11 18:07:11 +00:00
Jim Grosbach 8ba76c6d5c ARM pop of a single register encodes as post-indexed LDR.
Per the ARM ARM, a 'pop' of a single register encodes as an LDR,
not an LDM.

llvm-svn: 137316
2011-08-11 17:35:48 +00:00
Nadav Rotem 1542d5a00a [AVX] If the data which is going to be saved is already in two XMM registers
(for example, after integer operation), do not pack the registers into a YMM
before saving. Its better to save as two XMM registers.

Before:
                vinsertf128         $1, %xmm3, %ymm0, %ymm3
                vinsertf128         $0, %xmm1, %ymm3, %ymm1
                vmovaps              %ymm1, 416(%rsp)

After:
                vmovaps              %xmm3, 416+16(%rsp)
                vmovaps              %xmm1, 416(%rsp)

llvm-svn: 137308
2011-08-11 16:41:21 +00:00
Chris Lattner 5a2c70cc8f add missing colon, thanks peter.
llvm-svn: 137306
2011-08-11 16:15:10 +00:00
Chris Lattner 96710b4308 fix PR10605 / rdar://9930964 by adding a pretty scary missed check.
It's somewhat surprising anything works without this.  Before we would
compile the testcase into:

test:                                   # @test
	movl	$4, 8(%rdi)
	movl	8(%rdi), %eax
	orl	%esi, %eax
	cmpl	$32, %edx
	movl	%eax, -4(%rsp)          # 4-byte Spill
	je	.LBB0_2

now we produce:

test:                                   # @test
	movl	8(%rdi), %eax
	movl	$4, 8(%rdi)
	orl	%esi, %eax
	cmpl	$32, %edx
	movl	%eax, -4(%rsp)          # 4-byte Spill
	je	.LBB0_2
llvm-svn: 137303
2011-08-11 06:26:54 +00:00
Bruno Cardoso Lopes a2d8bb97b9 Splats for v8i32/v8f32 can be handled by VPERMILPSY. This was causing
infinite recursive calls in legalize. Fix PR10562

llvm-svn: 137296
2011-08-11 02:49:44 +00:00
Bruno Cardoso Lopes 572c9aaf53 Use the splat index to generate the desired shuffle. Otherwise we
could only get undefs and the vector shuffle becomes an undef,
generating wrong code.

llvm-svn: 137295
2011-08-11 02:49:41 +00:00
Eli Friedman 3ae39f8ad1 Fix X86TargetLowering::LowerExternalSymbol so that it actually works in non-trivial cases. This hasn't been an issue before because the function isn't normally called (but apparently is used to generate a tail-call to sin() on ELF x86-32 with PIC and SSE2).
Fixes PR9693.

llvm-svn: 137292
2011-08-11 01:48:05 +00:00
NAKAMURA Takumi 504769fc2f test/CodeGen/X86/opt-shuff-tstore.ll: Add explicit -mtriple=x86_64-linux.
llvm-svn: 137262
2011-08-10 22:52:48 +00:00
Devang Patel 37a62058fe While extending definition range of a debug variable, consult lexical scopes also. There is no point extending debug variable out side its lexical block. This provides 6x compile time speedup in some cases.
llvm-svn: 137250
2011-08-10 21:25:34 +00:00
Nadav Rotem d2b071f562 Fix the test. Add cpu target.
llvm-svn: 137241
2011-08-10 19:49:19 +00:00
Nadav Rotem 410a11fe82 When performing a truncating store, it is sometimes possible to rearrange the
data in-register prior to saving to memory.  When we reorder the data in memory
we prevent the need to save multiple scalars to memory, making a single regular
store.

llvm-svn: 137238
2011-08-10 19:30:14 +00:00
Bruno Cardoso Lopes 3ff111c12d The following X86 pattern is incorrect:
def : Pat<(X86Movss VR128:$src1,
                   (bc_v4i32 (v2i64 (load addr:$src2)))),
          (MOVLPSrm VR128:$src1, addr:$src2)>;
This matches a MOVSS dag with a MOVLPS instruction. However, MOVSS will replace only the low 32 bits of the register, while the MOVLPS instruction will replace the low 64 bits. A testcase is added and illustrates the bug and also modified the one that was already present. Patch by Tanya Lattner.

llvm-svn: 137227
2011-08-10 17:45:17 +00:00
Rafael Espindola 36a3abc671 Add support for the R and Q constraints.
llvm-svn: 137217
2011-08-10 16:26:42 +00:00
Bruno Cardoso Lopes 278ffd7d8e Fix a bug in vpermilps mask checking. Fix PR10560
llvm-svn: 137194
2011-08-10 01:54:17 +00:00
Bruno Cardoso Lopes 72323966c8 Add 256-bit support for v8i32, v4i64 and v4f64 ISD::SELECT. Fix PR10556
llvm-svn: 137179
2011-08-09 23:27:13 +00:00
Bruno Cardoso Lopes fc481959d2 Add v16i16 and v32i8 store patterns
llvm-svn: 137166
2011-08-09 22:39:53 +00:00
Bruno Cardoso Lopes 6963062a99 Use fp unpack instructions to unpack int types. Until we have AVX2, this
is the best we can do for these patterns. This fix PR10554.

llvm-svn: 137161
2011-08-09 22:18:37 +00:00
Eli Friedman 4ef2426b87 Fix a couple ridiculous copy-paste errors. rdar://9914773 .
llvm-svn: 137160
2011-08-09 22:17:39 +00:00
Bill Wendling d7f41b7f66 Revert r137134. It breaks some code as Eli pointed out.
llvm-svn: 137135
2011-08-09 18:56:35 +00:00
Bill Wendling 84ec8f65d1 Print out the variable declaration only if it is a declaration. Otherwise, a
'static' variable will be emitted twice.
PR10081

llvm-svn: 137134
2011-08-09 18:31:50 +00:00
Jakob Stoklund Olesen 53910d6aae Inflate register classes after coalescing.
Coalescing can remove copy-like instructions with sub-register operands
that constrained the register class.  Examples are:

  x86: GR32_ABCD:sub_8bit_hi -> GR32
  arm: DPR_VFP2:ssub0 -> DPR

Recompute the register class of any virtual registers that are used by
less instructions after coalescing.

This affects code generation for the Cortex-A8 where we use NEON
instructions for f32 operations, c.f. fp_convert.ll:

  vadd.f32  d16, d1, d0
  vcvt.s32.f32  d0, d16

The register allocator is now free to use d16 for the temporary, and
that comes first in the allocation order because it doesn't interfere
with any s-registers.

llvm-svn: 137133
2011-08-09 18:19:41 +00:00
Bruno Cardoso Lopes bed48dc8ff Reapply a more appropriate solution than in r137114. AVX supports
v4f64 = sitofp v4i32. This fix PR10559.
Also add support for v4i32 = fptosi v4f64.

llvm-svn: 137128
2011-08-09 17:39:13 +00:00
Bruno Cardoso Lopes 24dd1d4a27 Revert r137114
llvm-svn: 137127
2011-08-09 17:39:01 +00:00
Justin Holewinski db05c2b963 PTX: Add initial support for device function calls
- Calls are supported on SM 2.0+ for function with no return values

llvm-svn: 137125
2011-08-09 17:36:31 +00:00
Bruno Cardoso Lopes ad3453cf2d Handle sitofp between v4f64 <- v4i32. Fix PR10559
llvm-svn: 137114
2011-08-09 05:48:01 +00:00
Bruno Cardoso Lopes 1155b1eafa Add support for avx vector fextend
llvm-svn: 137105
2011-08-09 03:04:29 +00:00
Bruno Cardoso Lopes 337a7fdb13 Rename and tidy up tests
llvm-svn: 137103
2011-08-09 03:04:23 +00:00
Bruno Cardoso Lopes 2fc107365b Add two patterns to match special vmovss and vmovsd cases. Also fix
the patterns already there to be more strict regarding the predicate.
This fixes PR10558

llvm-svn: 137100
2011-08-09 01:43:09 +00:00
Bruno Cardoso Lopes af6a85484c Make LowerVSETCC aware of AVX types and add patterns to match them.
llvm-svn: 137090
2011-08-09 00:46:57 +00:00
Bruno Cardoso Lopes c96953c12a Add support for several vector shifts operations while in AVX mode. Fix PR10581
llvm-svn: 137067
2011-08-08 21:31:08 +00:00
Eli Friedman a27da98921 Fix up the patterns for SXTB, SXTH, UXTB, and UXTH so that they are correctly active without HasT2ExtractPack. PR10611.
llvm-svn: 137061
2011-08-08 19:49:37 +00:00
Jakob Stoklund Olesen 4f0ace5674 Don't clobber pending ST regs when FP regs are killed.
X86FloatingPoint keeps track of pending ST registers for an upcoming
inline asm instruction with fixed stack register constraints.  It does
this by remembering which FP register holds the value that should appear
at a fixed stack position for the inline asm.

When that FP register is killed before the inline asm, make sure to
duplicate it to a scratch register, so the ST register still has a live
FP reference.

This could happen when the same FP register was copied to two ST
registers, or when a spill instruction is inserted between the ST copy
and the inline asm.

This fixes PR10602.

llvm-svn: 137050
2011-08-08 17:15:43 +00:00
Rafael Espindola 9bc32a96be print st_shndx with the correct number of bits.
llvm-svn: 136880
2011-08-04 15:50:13 +00:00
Rafael Espindola 9528995e3f print st_other with the correct number of bits.
llvm-svn: 136877
2011-08-04 15:38:19 +00:00
Rafael Espindola 96df560ce1 print st_type with the correct number of bits.
llvm-svn: 136875
2011-08-04 15:24:00 +00:00
Rafael Espindola 79ef75dc49 Print st_bind with the correct number of bits.
llvm-svn: 136874
2011-08-04 15:10:35 +00:00
Rafael Espindola 1848231ad1 Print r_sym with the correct number of bits.
llvm-svn: 136873
2011-08-04 14:48:27 +00:00
Rafael Espindola 260af5cef6 Print r_type with the correct number of bits.
llvm-svn: 136872
2011-08-04 14:39:30 +00:00
Rafael Espindola 65c559c5fb Change anther counter to decimal.
llvm-svn: 136870
2011-08-04 14:01:03 +00:00
Rafael Espindola cad9e7f094 Don't print a counter in hex.
llvm-svn: 136869
2011-08-04 13:39:15 +00:00
Bill Wendling e234f6ae0c Only access both operands of an INSERT_SUBVECTOR if it is an INSERT_SUBVECTOR.
Fixes PR10527.

llvm-svn: 136853
2011-08-04 00:32:58 +00:00
Benjamin Kramer 3c7e9ee480 Remove underscore that's breaking linux buildbots.
llvm-svn: 136833
2011-08-03 23:13:01 +00:00
Jakub Staszak 15e5b742ad Use MachineBranchProbabilityInfo in If-Conversion instead of its own heuristics.
llvm-svn: 136826
2011-08-03 22:34:43 +00:00
Jakob Stoklund Olesen da618420ee Handle IMPLICIT_DEF instructions in X86FloatingPoint.
This fixes PR10575.

llvm-svn: 136787
2011-08-03 16:33:19 +00:00
Devang Patel dc9cbaaf23 Use byte offset, instead of element number, to access merged global.
llvm-svn: 136759
2011-08-03 01:25:46 +00:00
Rafael Espindola c48e10cd54 Assume .cfi_startproc is the first thing in a function. If the function is
externally visable, create a local symbol to use in the CFE. If not, use the
function label itself.

Fixes PR10420.

llvm-svn: 136716
2011-08-02 20:24:22 +00:00
Bruno Cardoso Lopes 5ada908140 Make this kind of lowering to be supported by 256-bit instructions:
shuffle (scalar_to_vector (load (ptr + 4))), undef, <0, 0, 0, 0>
To:
  shuffle (vload ptr)), undef, <1, 1, 1, 1>
Fix PR10494

llvm-svn: 136691
2011-08-02 16:06:18 +00:00
Bruno Cardoso Lopes a8e3673816 Add v4f64 -> v2f32 fp_round support. Also add a testcase to exercise
the legalizer. This commit together with the two previous ones fixes
PR10495.

llvm-svn: 136654
2011-08-01 21:54:09 +00:00
Bruno Cardoso Lopes 7513939ddd Since vectors with all ones can't be created with a 256-bit instruction,
avoid returning early for v8i32 types, which would only be valid for
vector with all zeros. Also split the handling of zeros and ones into separate
checking logic since they are handled differently. This fixes PR10547

llvm-svn: 136642
2011-08-01 19:51:53 +00:00
Richard Osborne 0cc000ef29 Fix crash with varargs function with no named parameters.
llvm-svn: 136623
2011-08-01 16:45:59 +00:00
Jakob Stoklund Olesen 5670f850c6 Revert "Don't check liveness of unallocatable registers."
The ARM target depends on CPSR liveness being tracked after register
allocation.

llvm-svn: 136548
2011-07-30 00:57:25 +00:00
Jakob Stoklund Olesen 95cc5440e9 Don't check liveness of unallocatable registers.
This includes registers like EFLAGS and ST0-ST7. We don't check for
liveness issues in the verifier and scavenger because registers will
never be allocated from these classes.

While in SSA form, we do care about the liveness of unallocatable
unreserved registers. Liveness of EFLAGS and ST0 neds to be correct for
MachineDCE and MachineSinking.

llvm-svn: 136541
2011-07-29 23:36:21 +00:00
Eric Christopher aa5030066f Add support for the 'Q' constraint.
Fixes rdar://9866494

llvm-svn: 136523
2011-07-29 21:18:58 +00:00
Bruno Cardoso Lopes 65ce5ea3ba Fix two tests that I crashed in the previous commits. The mask elts
on the second half must be reindexed.

llvm-svn: 136454
2011-07-29 02:05:28 +00:00
Bruno Cardoso Lopes 81eb193f2e Match VPERMIL masks more strictly and update the target specific mask
generation to always catch the weird cases.

llvm-svn: 136453
2011-07-29 01:31:15 +00:00
Bruno Cardoso Lopes d23709b18c Add v8i32 and v4i64 vpermil patterns
llvm-svn: 136451
2011-07-29 01:31:07 +00:00
Jakob Stoklund Olesen b28ee4115d Transfer implicit operands in NEONMoveFixPass.
Later passes /are/ using this information when running the register
scavenger.

This fixes the second problem in PR10520.

llvm-svn: 136440
2011-07-29 00:27:35 +00:00
Jakob Stoklund Olesen 9c3badceba Add -verify-arm-pseudo-expand.
This hidden llc option runs the machine code verifier after expanding
ARM pseudo-instructions, but before if-conversion.

The machine code verifier is much better at pointing out liveness errors
that can trip up the register scavenger.

llvm-svn: 136439
2011-07-29 00:27:32 +00:00
Jakob Stoklund Olesen b16081ce8c Handle REG_SEQUENCE with implicitly defined operands.
Code like that would only be produced by bugpoint, but we should still
handle it correctly.

When a register is defined by a REG_SEQUENCE of undefs, the register
itself is undef. Previously, we would create a register with uses but no
defs.

Fixes part of PR10520.

llvm-svn: 136401
2011-07-28 21:38:51 +00:00
Bruno Cardoso Lopes 76bc28bac6 Add patterns to generate copies for extract_subvector instead of
using vextractf128. This will reduce the number of issued instruction
for several avx codes.

llvm-svn: 136323
2011-07-28 01:26:50 +00:00
Bruno Cardoso Lopes eca99c4b5a Add a few patterns to match allzeros without having to use the fp unit.
Take advantage that the 128-bit vpxor zeros the higher part and use it.
This also fixes PR10491

llvm-svn: 136321
2011-07-28 01:26:43 +00:00
Bruno Cardoso Lopes 9e2a301216 Add SINT_TO_FP and FP_TO_SINT support for v8i32 types. Also move
a convert pattern close to the instruction definition.

llvm-svn: 136320
2011-07-28 01:26:39 +00:00
Bruno Cardoso Lopes 27a30a7792 The vpermilps and vpermilpd have different behaviour regarding the
usage of the shuffle bitmask. Both work in 128-bit lanes without
crossing, but in the former the mask of the high part is the same
used by the low part while in the later both lanes have independent
masks. Handle this properly and and add support for vpermilpd.

llvm-svn: 136200
2011-07-27 00:56:34 +00:00
Devang Patel f098ce2757 It is quiet possible that inlined function body is split into multiple chunks of consequtive instructions. But, there is not any way to describe this in .debug_inline accelerator table used by gdb. However, describe non contiguous ranges of inlined function body appropriately using AT_range of DW_TAG_inlined_subroutine debug info entry.
llvm-svn: 136196
2011-07-27 00:34:13 +00:00
Jakob Stoklund Olesen c3bcb02154 Eliminate copies of undefined values during coalescing.
These copies would coalesce easily, but the resulting value would be
defined by a deleted instruction. Now we also remove the undefined value
number from the destination register.

This fixes PR10503.

llvm-svn: 136174
2011-07-26 23:00:24 +00:00
Benjamin Kramer a79c1e0589 Update test.
llvm-svn: 136170
2011-07-26 22:45:39 +00:00