Commit Graph

338 Commits

Author SHA1 Message Date
Sean Callanan 46bb77f2cf Fixed PCMPESTRM128 to have opcode 0x60 instead of 0x62, as specified by the
Intel documentation.

llvm-svn: 79554
2009-08-20 18:24:27 +00:00
Eric Christopher 9fe912de5f Implement sse4.2 string/text processing instructions:
Add patterns and instruction encoding information.
Add custom lowering to deal with hardwired return register of
uncertain type (xmm0).

llvm-svn: 79377
2009-08-18 22:50:32 +00:00
Daniel Dunbar c4f8ea4ccb Add 'isCodeGenOnly' bit to Instruction .td records.
- Used to mark fake instructions which don't correspond to an actual machine
   instruction (or are duplicates of a real instruction). This is to be used for
   "special cases" in the .td files, which should be ignored by things like the
   assembler and disassembler. We still need a good solution to handle pervasive
   duplication, like with the Int_ instructions.

 - Set the bit on fake "mov 0" style instructions, which allows turning an
   assembler matcher warning into a hard error.

 - -2 FIXMEs.

llvm-svn: 78731
2009-08-11 22:17:52 +00:00
Eric Christopher 458c91732c Fix up whitespace, remove commented out code.
llvm-svn: 78600
2009-08-10 21:48:58 +00:00
Daniel Dunbar 17410a4b92 llvm-mc/AsmMatcher: Change assembler parser match classes to their own record
structure.

llvm-svn: 78581
2009-08-10 18:41:10 +00:00
Daniel Dunbar 447c4ab91d Extend comment on ParserMatchClass .td field, and add some missing
classes for X86.

llvm-svn: 78524
2009-08-09 06:00:04 +00:00
Eric Christopher 7dfa9f2e56 Add crc32 instruction and intrinsics. Add a new class of prefix
bytes for F2 0F 38 and propagate. Add a FIXME for a set
of possibilities which correspond to intrinsics already used.

New test.

llvm-svn: 78508
2009-08-08 21:55:08 +00:00
Eric Christopher 45d7185117 Whitespace and 80-col cleanup.
llvm-svn: 77718
2009-07-31 20:07:27 +00:00
Dan Gohman 49a6f16b7c Add a new register class to describe operands that can't be SP,
due to x86 encoding restrictions. This is currently off by default
because it may cause code quality regressions. This is for PR4572.

llvm-svn: 77565
2009-07-30 01:56:29 +00:00
Eric Christopher f7802a33ce Add support for gcc __builtin_ia32_ptest{z,c,nzc} intrinsics. Lower
to ptest instruction plus setcc. Revamp ptest instruction. Add test.

llvm-svn: 77407
2009-07-29 00:28:05 +00:00
Eric Christopher f37ea3ad75 Update insertps handling based on feedback. Move to a v4f32 style
to support vector arguments and scalar arguments correctly. Update
lowering and fix comment to refer to pinsr* instead of insertps.

llvm-svn: 76921
2009-07-24 00:33:09 +00:00
Eric Christopher b1b77ca862 Support insertps via the intrinsic and add a couple of simple
testcases to make sure it's being generated.

llvm-svn: 76843
2009-07-23 02:22:41 +00:00
Eli Friedman 2fc939c809 Fix for PR2484: add an SSE1 pattern for a shuffle we normally prefer to
handle with an SSE2 instruction.

llvm-svn: 73760
2009-06-19 07:00:55 +00:00
Eli Friedman 868bd6ab52 Fix an obvious typo.
llvm-svn: 72987
2009-06-06 05:55:37 +00:00
Bill Wendling 2e09bd3d34 The MONITOR and MWAIT instructions have insufficient information for
decoding. Essentially, they both map to the same column in the "opcode
extensions for one- and two-byte opcodes" table in the x86 manual. The RawFrm
complicates decoding this.

Instead, use opcode 0x01, prefix 0x01, and form MRM1r. Then have the code
emitter special case these, a la [SML]FENCE.

llvm-svn: 72556
2009-05-28 23:40:46 +00:00
Evan Cheng cc3ae1f2ff Fix MOVMSKPDrr encoding.
llvm-svn: 72535
2009-05-28 18:55:28 +00:00
Evan Cheng 60618fe4ac Fix PSIGND encoding bug. Patch by Sean Callanan.
llvm-svn: 72534
2009-05-28 18:48:53 +00:00
Bill Wendling 0feb0e6071 "The instructions MMX_PSADBWrm and MMX_PSADBWrr have opcode 0b11100000 (e0), but
the Intel manual (screenshot) says it should be 0b11110110 (f6).  The existing
encoding causes a disassembly conflict with MMX_PAVGBrm, which really should be
0f e0."

Patch by Sean Callanan!

llvm-svn: 72508
2009-05-28 02:04:00 +00:00
Evan Cheng 4db1631a43 Fix sfence jit encoding. Patch by Sean Callanan.
llvm-svn: 72488
2009-05-27 18:38:01 +00:00
Evan Cheng b41de47856 80 col violations.
llvm-svn: 71582
2009-05-12 20:17:52 +00:00
Nate Begeman 7e6e352735 Fix infinite recursion in the C++ code which handles movddup by making it unnecessary.
llvm-svn: 70425
2009-04-29 22:47:44 +00:00
Nate Begeman 8d6d4b9289 2nd attempt, fixing SSE4.1 issues and implementing feedback from duncan.
PR2957

ISD::VECTOR_SHUFFLE now stores an array of integers representing the shuffle
mask internal to the node, rather than taking a BUILD_VECTOR of ConstantSDNodes
as the shuffle mask.  A value of -1 represents UNDEF.

In addition to eliminating the creation of illegal BUILD_VECTORS just to 
represent shuffle masks, we are better about canonicalizing the shuffle mask,
resulting in substantially better code for some classes of shuffles.

llvm-svn: 70225
2009-04-27 18:41:29 +00:00
Rafael Espindola b93db668b3 Revert 69952. Causes testsuite failures on linux x86-64.
llvm-svn: 69967
2009-04-24 12:40:33 +00:00
Nate Begeman bb881d66f4 PR2957
ISD::VECTOR_SHUFFLE now stores an array of integers representing the shuffle
mask internal to the node, rather than taking a BUILD_VECTOR of ConstantSDNodes
as the shuffle mask.  A value of -1 represents UNDEF.

In addition to eliminating the creation of illegal BUILD_VECTORS just to 
represent shuffle masks, we are better about canonicalizing the shuffle mask,
resulting in substantially better code for some classes of shuffles.

A clean up of x86 shuffle code, and some canonicalizing in DAGCombiner is next.

llvm-svn: 69952
2009-04-24 03:42:54 +00:00
Rafael Espindola 3b2df10c9e Re-apply 68552.
Tested by bootstrapping llvm-gcc and using that to build llvm.

llvm-svn: 68645
2009-04-08 21:14:34 +00:00
Bill Wendling 4aa25b79f9 Temporarily revert r68552. This was causing a failure in the self-hosting LLVM
builds.

--- Reverse-merging (from foreign repository) r68552 into '.':
U    test/CodeGen/X86/tls8.ll
U    test/CodeGen/X86/tls10.ll
U    test/CodeGen/X86/tls2.ll
U    test/CodeGen/X86/tls6.ll
U    lib/Target/X86/X86Instr64bit.td
U    lib/Target/X86/X86InstrSSE.td
U    lib/Target/X86/X86InstrInfo.td
U    lib/Target/X86/X86RegisterInfo.cpp
U    lib/Target/X86/X86ISelLowering.cpp
U    lib/Target/X86/X86CodeEmitter.cpp
U    lib/Target/X86/X86FastISel.cpp
U    lib/Target/X86/X86InstrInfo.h
U    lib/Target/X86/X86ISelDAGToDAG.cpp
U    lib/Target/X86/AsmPrinter/X86ATTAsmPrinter.cpp
U    lib/Target/X86/AsmPrinter/X86IntelAsmPrinter.cpp
U    lib/Target/X86/AsmPrinter/X86ATTAsmPrinter.h
U    lib/Target/X86/AsmPrinter/X86IntelAsmPrinter.h
U    lib/Target/X86/X86ISelLowering.h
U    lib/Target/X86/X86InstrInfo.cpp
U    lib/Target/X86/X86InstrBuilder.h
U    lib/Target/X86/X86RegisterInfo.td

llvm-svn: 68560
2009-04-07 22:35:25 +00:00
Rafael Espindola 1edda06792 Reduce code duplication on the TLS implementation.
This introduces a small regression on the generated code
quality in the case we are just computing addresses, not
loading values.

Will work on it and on X86-64 support.

llvm-svn: 68552
2009-04-07 21:37:46 +00:00
Evan Cheng 40abb7b5d0 ADDS{D|S}rr_Int and MULS{D|S}rr_Int are not commutable. The users of these intrinsics expect the high bits will not be modified.
llvm-svn: 65499
2009-02-26 03:12:02 +00:00
Nate Begeman e684da3e5d Generate better code for v8i16 shuffles on SSE2
Generate better code for v16i8 shuffles on SSE2 (avoids stack)
Generate pshufb for v8i16 and v16i8 shuffles on SSSE3 where it is fewer uops.
Document the shuffle matching logic and add some FIXMEs for later further
  cleanups.
New tests that test the above.

Examples:

New:
_shuf2:
	pextrw	$7, %xmm0, %eax
	punpcklqdq	%xmm1, %xmm0
	pshuflw	$128, %xmm0, %xmm0
	pinsrw	$2, %eax, %xmm0

Old:
_shuf2:
	pextrw	$2, %xmm0, %eax
	pextrw	$7, %xmm0, %ecx
	pinsrw	$2, %ecx, %xmm0
	pinsrw	$3, %eax, %xmm0
	movd	%xmm1, %eax
	pinsrw	$4, %eax, %xmm0
	ret

=========

New:
_shuf4:
	punpcklqdq	%xmm1, %xmm0
	pshufb	LCPI1_0, %xmm0

Old:
_shuf4:
	pextrw	$3, %xmm0, %eax
	movsd	%xmm1, %xmm0
	pextrw	$3, %xmm1, %ecx
	pinsrw	$4, %ecx, %xmm0
	pinsrw	$5, %eax, %xmm0

========

New:
_shuf1:
	pushl	%ebx
	pushl	%edi
	pushl	%esi
	pextrw	$1, %xmm0, %eax
	rolw	$8, %ax
	movd	%xmm0, %ecx
	rolw	$8, %cx
	pextrw	$5, %xmm0, %edx
	pextrw	$4, %xmm0, %esi
	pextrw	$3, %xmm0, %edi
	pextrw	$2, %xmm0, %ebx
	movaps	%xmm0, %xmm1
	pinsrw	$0, %ecx, %xmm1
	pinsrw	$1, %eax, %xmm1
	rolw	$8, %bx
	pinsrw	$2, %ebx, %xmm1
	rolw	$8, %di
	pinsrw	$3, %edi, %xmm1
	rolw	$8, %si
	pinsrw	$4, %esi, %xmm1
	rolw	$8, %dx
	pinsrw	$5, %edx, %xmm1
	pextrw	$7, %xmm0, %eax
	rolw	$8, %ax
	movaps	%xmm1, %xmm0
	pinsrw	$7, %eax, %xmm0
	popl	%esi
	popl	%edi
	popl	%ebx
	ret

Old:
_shuf1:
	subl	$252, %esp
	movaps	%xmm0, (%esp)
	movaps	%xmm0, 16(%esp)
	movaps	%xmm0, 32(%esp)
	movaps	%xmm0, 48(%esp)
	movaps	%xmm0, 64(%esp)
	movaps	%xmm0, 80(%esp)
	movaps	%xmm0, 96(%esp)
	movaps	%xmm0, 224(%esp)
	movaps	%xmm0, 208(%esp)
	movaps	%xmm0, 192(%esp)
	movaps	%xmm0, 176(%esp)
	movaps	%xmm0, 160(%esp)
	movaps	%xmm0, 144(%esp)
	movaps	%xmm0, 128(%esp)
	movaps	%xmm0, 112(%esp)
	movzbl	14(%esp), %eax
	movd	%eax, %xmm1
	movzbl	22(%esp), %eax
	movd	%eax, %xmm2
	punpcklbw	%xmm1, %xmm2
	movzbl	42(%esp), %eax
	movd	%eax, %xmm1
	movzbl	50(%esp), %eax
	movd	%eax, %xmm3
	punpcklbw	%xmm1, %xmm3
	punpcklbw	%xmm2, %xmm3
	movzbl	77(%esp), %eax
	movd	%eax, %xmm1
	movzbl	84(%esp), %eax
	movd	%eax, %xmm2
	punpcklbw	%xmm1, %xmm2
	movzbl	104(%esp), %eax
	movd	%eax, %xmm1
	punpcklbw	%xmm1, %xmm0
	punpcklbw	%xmm2, %xmm0
	movaps	%xmm0, %xmm1
	punpcklbw	%xmm3, %xmm1
	movzbl	127(%esp), %eax
	movd	%eax, %xmm0
	movzbl	135(%esp), %eax
	movd	%eax, %xmm2
	punpcklbw	%xmm0, %xmm2
	movzbl	155(%esp), %eax
	movd	%eax, %xmm0
	movzbl	163(%esp), %eax
	movd	%eax, %xmm3
	punpcklbw	%xmm0, %xmm3
	punpcklbw	%xmm2, %xmm3
	movzbl	188(%esp), %eax
	movd	%eax, %xmm0
	movzbl	197(%esp), %eax
	movd	%eax, %xmm2
	punpcklbw	%xmm0, %xmm2
	movzbl	217(%esp), %eax
	movd	%eax, %xmm4
	movzbl	225(%esp), %eax
	movd	%eax, %xmm0
	punpcklbw	%xmm4, %xmm0
	punpcklbw	%xmm2, %xmm0
	punpcklbw	%xmm3, %xmm0
	punpcklbw	%xmm1, %xmm0
	addl	$252, %esp
	ret

llvm-svn: 65311
2009-02-23 08:49:38 +00:00
Evan Cheng 589a539423 Handle llvm.x86.sse2.maskmov.dqu in 64-bit.
llvm-svn: 64240
2009-02-10 22:06:28 +00:00
Evan Cheng 64fdacc27f A few more isAsCheapAsAMove.
llvm-svn: 63852
2009-02-05 08:42:55 +00:00
Evan Cheng f31f288863 The memory alignment requirement on some of the mov{h|l}p{d|s} patterns are 16-byte. That is overly strict. These instructions read / write f64 memory locations without alignment requirement.
llvm-svn: 63195
2009-01-28 08:35:02 +00:00
Dan Gohman e907a0a527 Whitespace and other minor adjustments to make SSE instructions have
the same formatting as their corresponding SSE2 instructions, for
consistency.

llvm-svn: 61971
2009-01-09 02:27:34 +00:00
Mon P Wang 998fd29ce1 Fixed x86 code generation of multiple for v2i64. It was incorrect for SSE4.1.
llvm-svn: 61211
2008-12-18 21:42:19 +00:00
Dan Gohman 69cc2cbbff Rename isSimpleLoad to canFoldAsLoad, to better reflect its meaning.
llvm-svn: 60487
2008-12-03 18:15:48 +00:00
Dan Gohman cc78cdf275 Mark x86's V_SET0 and V_SETALLONES with isSimpleLoad, and teach X86's
foldMemoryOperand how to "fold" them, by converting them into constant-pool
loads. When they aren't folded, they use xorps/cmpeqd, but for example when
register pressure is high, they may now be folded as memory operands, which
reduces register pressure.

Also, mark V_SET0 isAsCheapAsAMove so that two-address-elimination will
remat it instead of copying zeros around (V_SETALLONES was already marked).

llvm-svn: 60461
2008-12-03 05:21:24 +00:00
Evan Cheng 27c3702267 Fix lfence and mfence encoding. These look like MRM5r and MRM6r instructions except they do not have any operands. The RegModRM byte is encoded with register number 0.
llvm-svn: 57692
2008-10-17 17:14:20 +00:00
Dan Gohman 6bae5268a7 Fix the predicate for memop64 to be a regular load, not just
an unindexed load.

llvm-svn: 57612
2008-10-16 00:03:00 +00:00
Dan Gohman 29ad439782 Now that predicates can be composed, simplify several of
the predicates by extending simple predicates to create
more complex predicates instead of duplicating the logic
for the simple predicates.

This doesn't reduce much redundancy in DAGISelEmitter.cpp's
generated source yet; that will require improvements to
DAGISelEmitter.cpp's instruction sorting, to make it more
effectively group nodes with similar predicates together.

llvm-svn: 57565
2008-10-15 06:50:19 +00:00
Dale Johannesen 05b54c2af4 Fix SSE4.1 roundss, roundsd. While the instructions have
the same pattern as roundpd/roundps, the Intel compiler 
builtins do not:  rounds* has an extra operand.  Fixes
gcc.target/i386/sse4_1-rounds[sd]-[1234].c

llvm-svn: 57370
2008-10-10 23:51:03 +00:00
Anders Carlsson 1699ad9030 Certain patterns involving the "movss" instruction were marked as requiring SSE2, when in reality movss is an SSE1 instruction.
llvm-svn: 57246
2008-10-07 16:14:11 +00:00
Bill Wendling b04e6edba9 "The original bug was a complaint that _mm_srli_si128 mis-compiled when passed
a constant vector ("{0x123, 0x456}" syntax).  The fix is to simplify the
_mm_srli_si128 macro, and  move the "* 8" from the macro into the compiler
back-end.  I can't change the existing __builtins because so many people are
using them :-(."
Patch by Stuart Hastings!

llvm-svn: 56944
2008-10-02 05:56:52 +00:00
Evan Cheng 7d6fa97567 Implement "punpckldq %xmm0, $xmm0" as "pshufd $0x50, %xmm0, %xmm" unless optimizing for code size.
llvm-svn: 56711
2008-09-26 23:41:32 +00:00
Evan Cheng 30f5494efb unpckhps requires sse1, punpckhdq requires sse2.
llvm-svn: 56697
2008-09-26 21:26:30 +00:00
Evan Cheng 74c9ed91b0 With sse3 and when the source is a load or has multiple uses, favors movddup over shuffp*, pshufd, etc. Without sse3 or when the source is from a register, make use of movlhps
llvm-svn: 56620
2008-09-25 20:50:48 +00:00
Evan Cheng 7b5a6afb44 pmovsxbq etc. requires sse4.1.
llvm-svn: 56600
2008-09-25 00:49:51 +00:00
Evan Cheng f8ead16b50 Fix patterns for SSE4.1 move and sign extend instructions. Also add instructions which fold VZEXT_MOVL and VZEXT_LOAD.
llvm-svn: 56594
2008-09-24 23:27:55 +00:00
Dan Gohman effb894453 Rename ConstantSDNode::getValue to getZExtValue, for consistency
with ConstantInt. This led to fixing a bug in TargetLowering.cpp
using getValue instead of getAPIntValue.

llvm-svn: 56159
2008-09-12 16:56:44 +00:00
Eli Friedman a9c52c8219 Fix for PR2687: Add patterns to match sint_to_fp and fp_to_sint for <2 x
i32>.  This is a little messy, but it works.

We should really get rid of the intrinsics, though, since they map
perfectly well to standard LLVM instructions.

llvm-svn: 55864
2008-09-05 23:07:03 +00:00
Evan Cheng 97af20f85f FsFLD0S{S|D} and V_SETALLONES are as cheap as moves.
llvm-svn: 55466
2008-08-28 07:52:25 +00:00