Commit Graph

1307 Commits

Author SHA1 Message Date
Adam Nemet 35b80eaef1 [X86] Remove AVX1 vbroadcast intrinsics
The corresponding CFE patch replaces these intrinsics with vector initializers
in avxintrin.h.  This patch removes the LLVM intrinsics from the backend.

We now stop lowering at X86ISD::VBROADCAST custom node rather than lowering
that further to the intrinsics.

The patch only changes VBROADCASTS* and leaves VBROADCAST[FI]128 to continue
to use intrinsics.  As explained in the CFE patch, the reason is that we
currently don't generate as good code for them without the intrinsics.

CodeGen/X86/avx-vbroadcast.ll already provides coverage for this change.  It
checks that for a series of insertelements we generate the appropriate
vbroadcast instruction.

Also verified that there was no assembly change in the test-suite before and
after this patch.

llvm-svn: 209864
2014-05-29 23:35:36 +00:00
Filipe Cabecinhas dc92102766 Added more insertps optimizations
Summary:
When inserting an element that's coming from a vector load or a broadcast
of a vector (or scalar) load, combine the load into the insertps
instruction.
Added PerformINSERTPSCombine for the case where we need to fix the load
(load of a vector + insertps with a non-zero CountS).
Added patterns for the broadcasts.

Also added tests for SSE4.1, AVX, and AVX2.

Reviewers: delena, nadav, craig.topper

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D3581

llvm-svn: 209156
2014-05-19 19:45:57 +00:00
Tim Northover 60091cfeb9 TableGen: use correct MIOperand when printing aliases
Previously, TableGen assumed that every aliased operand consumed precisely 1
MachineInstr slot (this was reasonable because until a couple of days ago,
nothing more complicated was eligible for printing).

This allows a couple more ARM64 aliases to print so we can remove the special
code.

On the X86 side, I've gone for explicit AT&T size specifiers as the default, so
turned off a few of the aliases that would have just started printing.

llvm-svn: 208880
2014-05-15 13:36:01 +00:00
Tim Northover d8d65a69cf TableGen/ARM64: print aliases even if they have syntax variants.
To get at least one use of the change (and some actual tests) in with its
commit, I've enabled the AArch64 & ARM64 NEON mov aliases.

llvm-svn: 208867
2014-05-15 11:16:32 +00:00
Benjamin Kramer 6d2dff61f9 X86: Lower SMUL_LOHI of v4i32 to pmuldq when SSE4.1 is available.
llvm-svn: 207318
2014-04-26 14:12:19 +00:00
Benjamin Kramer c9827ab103 X86: Add patterns for MULHU/MULHS of v8i16 and v16i16.
This gets us pretty code for divs of i16 vectors. Turn the existing
intrinsics into the corresponding nodes.

llvm-svn: 207317
2014-04-26 13:01:03 +00:00
Quentin Colombet 04f7b74c39 [X86] Fix missing/wrong scheduling model found by code inspection.
llvm-svn: 207014
2014-04-23 19:30:26 +00:00
Filipe Cabecinhas 20352216fb Rename X86insrtps to the proper instruction name.
Summary:
The INSERTPS pattern fragment was called insrtps (mising 'e'), which
would make it harder to grep for the patterns related to this instruction.
Renaming it to use the proper instruction name.

Reviewers: nadav

CC: llvm-commits

Differential Revision: http://reviews.llvm.org/D3443

llvm-svn: 206779
2014-04-21 20:07:29 +00:00
Benjamin Kramer e6c821ef4c X86: Pattern match scalar loads + vcvtph2ps into just vcvtph2ps.
vcvtph2ps only reads the lower 64 bits of the address passed to the
intrinsic.

llvm-svn: 206579
2014-04-18 10:45:33 +00:00
Jim Grosbach e4fef71981 Add support for load folding of avx1 logical instructions
AVX supports logical operations using an operand from memory. Unfortunately
because integer operations were not added until AVX2 the AVX1 logical
operation's types were preventing the isel from folding the loads. In a limited
number of cases the peephole optimizer would fold the loads, but most were
missed. This patch adds explicit patterns with appropriate casts in order for
these loads to be folded.

The included test cases run on reduced examples and disable the peephole
optimizer to ensure the folds are being pattern matched.

Patch by Louis Gerbarg <lgg@apple.com>

rdar://16355124

llvm-svn: 205938
2014-04-09 23:39:25 +00:00
Quentin Colombet 9c816f39ad Revert r205599, the commit was not intended to have so many changes
llvm-svn: 205600
2014-04-04 02:02:49 +00:00
Quentin Colombet 7ee4e79dec [RegAllocGreedy][Last Chance Recoloring] Emit diagnostics when last chance
recoloring cut-offs are hit.

This is related to PR18747.

Patch by MAYUR PANDEY <mayur.p@samsung.com>

llvm-svn: 205599
2014-04-04 01:58:57 +00:00
Cameron McInally 45dc489403 Fix AVX2 Gather execution domains.
llvm-svn: 204713
2014-03-25 12:36:38 +00:00
Quentin Colombet 2d5c156b96 [X86][ISelDAG] Add missing fallback patterns for avx2 broadcast instructions.
Those patterns are used when the load cannot be folded into the related broadcast
during the select phase.
This happens when the load gets additional uses that were not anticipated during
the previous lowering phases (constant vector to constant load, then constant
load reused) or when selection DAG is not able to prove that folding the load
will not create a cycle in the DAG.

<rdar://problem/16074331>

llvm-svn: 204631
2014-03-24 17:54:19 +00:00
Quentin Colombet ca49851833 [X86][SchedModel] Add missing scheduling model for SSE related instructions.
The patch defines new or refines existing generic scheduling classes to match
the behavior of the SSE instructions.
It also maps those scheduling classes on the related SSE instructions.

<rdar://problem/15607571>

llvm-svn: 202065
2014-02-24 19:33:51 +00:00
Craig Topper e2347df24d [x86] Switch PAUSE instruction to use XS prefix instead of HasREPPrefix. Remove HasREPPrefix support from disassembler table generator since its now only used by CodeGenOnly instructions.
llvm-svn: 201767
2014-02-20 07:59:43 +00:00
Craig Topper 6872fd3ad9 Add a bunch of OpSize32 tags to 64-bit mode only instructions to match their 32-bit mode counterparts for cases where there is also a OpSize16 instruction.
llvm-svn: 201550
2014-02-18 08:18:29 +00:00
Craig Topper 5ccb61781f Add an x86 prefix encoding for instructions that would decode to a different instruction with 0xf2/f3/66 were in front of them, but don't themselves have a prefix. For now this doesn't change any bbehavior, but plan to use it to fix some bugs in the disassembler.
llvm-svn: 201538
2014-02-18 00:21:49 +00:00
Craig Topper a0869dceea Recommit r201059 and r201060 with hopefully a fix for its original failure.
Original commits messages:

Add MRMXr/MRMXm form to X86 for use by instructions which treat the 'reg' field of modrm byte as a don't care value. Will allow for simplification of disassembler code.

Simplify a bunch of code by removing the need for the x86 disassembler table builder to know about extended opcodes. The modrm forms are sufficient to convey the information.

llvm-svn: 201065
2014-02-10 06:55:41 +00:00
Bob Wilson ebdae7c2ff Revert r201059 and r201060.
r201059 appears to cause a crash in a bootstrapped build of clang. Craig
isn't available to look at it right now, so I'm reverting it while he
investigates.

llvm-svn: 201064
2014-02-10 05:28:30 +00:00
Craig Topper 0d88de8c56 Add MRMXr/MRMXm form to X86 for use by instructions which treat the 'reg' field of modrm byte as a don't care value. Will allow for simplification of disassembler code.
llvm-svn: 201059
2014-02-10 00:50:34 +00:00
Jim Grosbach e9008de652 X86: Resolve a long standing FIXME and properly isel pextr[bw].
Generalize the AArch64 .td nodes for AssertZext and AssertSext. Use
them to match the relevant pextr store instructions.

The test widen_load-2.ll requires a slight change because with the
stores gone, the remaining instructions are scheduled in a different
order.

Add test cases for SSE4 and AVX variants.

Resolves rdar://13414672.

Patch by Adam Nemet <anemet@apple.com>.

llvm-svn: 200957
2014-02-07 00:16:33 +00:00
Tim Northover 546b57b011 X86: deduplicate V[SZ]EXT_MOVL and V[SZ]EXT nodes
I believe VZEXT_MOVL means "zero all vector elements except the first" (and
should have identical input & output types) whereas VZEXT means "zero extend
each element of a vector (discarding higher elements if necessary)".

For example:
    (v4i32 (vzext (v16i8 ...)))

should zero extend the low 4 bytes of the incoming vector to 32-bits,
discarding higher bytes.

However, somewhere in the past, these two concepts had become confused, even
leading to a nonsensical VSEXT_MOVL.

This re-merges the nodes where appropriate (all VSEXT_MOVL -> VSEXT, VZEXT_MOVL
-> VZEXT when it's an actual extension).

rdar://problem/15981990

llvm-svn: 200918
2014-02-06 09:54:51 +00:00
Craig Topper fa6298a162 Merge x86 HasOpSizePrefix/HasOpSize16Prefix into a 2-bit OpSize field with 0 meaning no 0x66 prefix in any mode. Rename Opsize16->OpSize32 and OpSize->OpSize16. The classes now refer to their operand size rather than the mode in which they need a 0x66 prefix. Hopefully can merge REX_W into this as OpSize64.
llvm-svn: 200626
2014-02-02 09:25:09 +00:00
Craig Topper 348cbdacda Remove duplicate patterns
llvm-svn: 200461
2014-01-30 07:19:10 +00:00
Craig Topper c45da1619c Remove some AddedComplexity tags that were forcing priority for AVX over SSE. Use predicates instead.
llvm-svn: 200458
2014-01-30 06:26:25 +00:00
Craig Topper f124c6a5ef Add OpSize16 flags to 32-bit CRC32 instructions so they can be encoded correctly in 16-bit mode.
llvm-svn: 199478
2014-01-17 08:01:20 +00:00
Craig Topper ae11aed9d7 Separate the concept of 16-bit/32-bit operand size controlled by 0x66 prefix and the current mode from the concept of SSE instructions using 0x66 prefix as part of their encoding without being affected by the mode.
This should allow SSE instructions to be encoded correctly in 16-bit mode which r198586 probably broke.

llvm-svn: 199193
2014-01-14 07:41:20 +00:00
Craig Topper 7894e812bb Add the other form of movq xmm,xmm for the disassembler.
llvm-svn: 198551
2014-01-05 07:16:04 +00:00
Craig Topper d9e1669d1c Use patterns to remove some duplicate instructions.
llvm-svn: 198550
2014-01-05 06:55:48 +00:00
Craig Topper 0550ce7ac1 Mark x86 _alt instructions as AsmParserOnly so they will be omitted from disassembler without string matches.
llvm-svn: 198545
2014-01-05 04:55:55 +00:00
Craig Topper 3484fc2161 Add a new x86 specific instruction flag to force some isCodeGenOnly instructions to go through to the disassembler tables without resorting to string matches. Apply flag to all _REV instructions.
llvm-svn: 198543
2014-01-05 04:17:28 +00:00
Craig Topper 9dd48c8ed4 Mark all x86 Int_ and _Int patterns as isCodeGenOnly so the disassembler table builder doesn't need to string match them to exclude them.
llvm-svn: 198323
2014-01-02 17:28:14 +00:00
Eric Christopher c0a5aaeab0 [x86] Rename In32BitMode predicate to Not64BitMode
That's what it actually means, and with 16-bit support it's going to be
a little more relevant since in a few corner cases we may actually want
to distinguish between 16-bit and 32-bit mode (for example the bare 'push'
aliases to pushw/pushl etc.)

Patch by David Woodhouse

llvm-svn: 197768
2013-12-20 02:04:49 +00:00
Elena Demikhovsky 47fc44e52e AVX-512: Added legal type MVT::i1 and VK1 register for it.
Added scalar compare VCMPSS, VCMPSD.
Implemented LowerSELECT for scalar FP operations.
I replaced FSETCCss, FSETCCsd with one node type FSETCCs.
Node extract_vector_elt(v16i1/v8i1, idx) returns an element of type i1.

llvm-svn: 197384
2013-12-16 13:52:35 +00:00
Andrea Di Biagio 9b5c3dcf01 Added new X86 patterns to select SSE scalar fp arithmetic instructions from
a vector packed single/double fp operation followed by a vector insert.

The effect is that the backend coverts the packed fp instruction
followed by a vectro insert into a SSE or AVX scalar fp instruction.

For example, given the following code:
   __m128 foo(__m128 A, __m128 B) {
     __m128 C = A + B;
     return (__m128) {c[0], a[1], a[2], a[3]};
   }

 previously we generated:
   addps %xmm0, %xmm1
   movss %xmm1, %xmm0
 
 we now generate:
   addss %xmm1, %xmm0

llvm-svn: 197145
2013-12-12 11:50:47 +00:00
Andrea Di Biagio f7c33c8162 Ensure that the backend no longer emits unnecessary vector insert instructions
immediately after SSE scalar fp instructions like addss or mulss.

Added patterns to select SSE scalar fp arithmetic instructions from a scalar
fp operation followed by a blend.

For example, given the following code:
  __m128 foo(__m128 A, __m128 B) {
    A[0] += B[0];
    return A;
  }

previously we generated:
  addss %xmm0, %xmm1
  movss %xmm1, %xmm0

now we generate:
  addss %xmm1, %xmm0

llvm-svn: 196925
2013-12-10 15:22:48 +00:00
Cameron McInally c592e5251c Add an intrinsic for the SSE2 PAUSE instruction.
llvm-svn: 195697
2013-11-26 00:20:43 +00:00
Cameron McInally d1cd0be6f3 Fix assembly operands for the SSE2 cvtsd2ss instruction.
llvm-svn: 195129
2013-11-19 14:36:00 +00:00
Craig Topper be79768b6a Lift alignment restrictions on load folding for a significant portion of AVX instructions.
llvm-svn: 194048
2013-11-05 06:31:43 +00:00
Michael Liao b638d05ecb Fix PR17764
- When selecting BLEND from vselect, the operands need swapping as due to the
  difference between vselect and SSE/AVX's BLEND insn

llvm-svn: 193900
2013-11-02 00:10:02 +00:00
Benjamin Kramer 0ccab2d66c X86: Custom lower sext v16i8 to v16i16, and the corresponding truncate.
Also update the cost model.

llvm-svn: 193270
2013-10-23 21:06:07 +00:00
Benjamin Kramer da8446b833 X86: Custom lower zext v16i8 to v16i16.
On sandy bridge (PR17654) we now get
	vpxor	%xmm1, %xmm1, %xmm1
	vpunpckhbw	%xmm1, %xmm0, %xmm2
	vpunpcklbw	%xmm1, %xmm0, %xmm0
	vinsertf128	$1, %xmm2, %ymm0, %ymm0

On haswell it's a simple
	vpmovzxbw	%xmm0, %ymm0

There is a maze of duplicated and dead transforms and patterns in this
area. Remove the dead custom lowering of zext v8i16 to v8i32, that's
already handled by LowerAVXExtend.

llvm-svn: 193262
2013-10-23 19:19:04 +00:00
Craig Topper f7290f7194 Replace (V)MOVZDI2PDIrr/rm instructions with patterns that select (V)MOVDI2PDIrr/rm.
llvm-svn: 193146
2013-10-22 04:35:20 +00:00
Lang Hames 2783993fca X86 vector element shift-by-immediate instructions take i8 immediates. Make
the instruction defenitions and ISEL reflect this.

Prior to this patch these instructions took an i32i8imm, and the high bits were
dropped during encoding. This led to incorrect behavior for shifts by
immediates higher than 255. This patch fixes that issue by detecting large
immediate shifts and returning constant zero (for logical shifts) or capping
the shift amount at an encodable value (for arithmetic shifts).

Fixes <rdar://problem/14968098>

llvm-svn: 193096
2013-10-21 17:51:24 +00:00
Craig Topper ef9e993eaa Remove x86_sse42_crc32_64_8 intrinsic. It has no functional difference from x86_sse42_crc32_32_8 and was not mapped to a clang builtin. I'm not even sure why this form of the instruction is even called out explicitly in the docs. Also add AutoUpgrade support to convert it into the other intrinsic with appropriate trunc and zext.
llvm-svn: 192672
2013-10-15 05:20:47 +00:00
Craig Topper d7abdb6f12 Create classes to reduce the size of the tablegen entries for the CRC32 instructions.
llvm-svn: 192568
2013-10-14 05:19:58 +00:00
Craig Topper a422b09ae3 Allow pinsrw/pinsrb/pextrb/pextrw/movmskps/movmskpd/pmovmskb/extractps instructions to parse either GR32 or GR64 without resorting to duplicating instructions.
llvm-svn: 192567
2013-10-14 04:55:01 +00:00
Craig Topper 4432208884 Add disassembler support for SSE4.1 register/register form of PEXTRW. There is a shorter encoding that was part of SSE2, but a memory form was added in SSE4.1. This is the register form of that encoding.
llvm-svn: 192566
2013-10-14 01:42:32 +00:00
Craig Topper 7158745e55 Mark MOVMSKPS/MOVMSKPD/VPINSRWrr64i as AsmParserOnly to remove them from the disassembler tables. Add PINSRWrr64i to complement the AVX version.
llvm-svn: 192565
2013-10-14 01:21:22 +00:00
Craig Topper c4a5a3f65d Don't use 64-bit versions of MOVMSKPD in CodeGen. The instructions only produce a 1-bit result so we can just use SUBREG_TO_REG to extend the 32-bit versions.
llvm-svn: 192562
2013-10-14 00:24:33 +00:00
Craig Topper aab53e7785 Mark some more instructions as CodeGenOnly. Remove filters from the disassembler.
llvm-svn: 192522
2013-10-12 04:46:18 +00:00
Craig Topper 5fb5bd3373 Allow non-AVX form of pmovmskb to take a GR64 operand.
llvm-svn: 192341
2013-10-10 05:33:31 +00:00
Craig Topper 3ada6deaac Remove duplicate instructions.
llvm-svn: 192340
2013-10-10 05:01:22 +00:00
Elena Demikhovsky a3a714082b AVX-512: Added VRCP28 and VRSQRT28 instructions and intrinsics.
llvm-svn: 192283
2013-10-09 08:16:14 +00:00
Craig Topper 49d3319857 Mark some instructions as CodeGenOnly since they aren't needed by the assembler or disassembler. Disassembler already filtered them, but asm parser still had them in its tables.
llvm-svn: 192271
2013-10-09 03:56:16 +00:00
Craig Topper bc749db947 Add in64BitMode/in32BitMode to the MMX/SSE2/AVX maskmovq/dq instructions. This way the asm parser will pick the right one based on the mode. Instruction selection already did the right thing based on the pointer size.
llvm-svn: 192266
2013-10-09 02:18:34 +00:00
Craig Topper 72c8cd7bc3 Remove some instructions that existed to provide aliases to the assembler. Can be done with InstAlias instead. Unfortunately, this was causing printer to use 'vmovq' or 'vmovd' based on what was parsed. To cleanup the inconsistencies convert all 'vmovd' with 64-bit registers to 'vmovq', but provide an alias so that 'vmovd' will still parse.
llvm-svn: 192171
2013-10-08 05:53:50 +00:00
Craig Topper 07ad1b23bb Remove some instructions that seem to only exist to trick the filtering checks in the disassembler table creation. Just fix up the filter to let the real instruction through instead.
llvm-svn: 192090
2013-10-07 07:19:47 +00:00
Craig Topper 68d2546ec6 Remove FsMOVAPSrr and friends. They have no patterns and are no longer selected anywhere.
llvm-svn: 192089
2013-10-07 06:10:45 +00:00
Craig Topper a0e0735e6a Teach X86 asm parser that VMOVAPSrr and other VEX-encoded register to register moves should be switched from using the MRMSrcReg form to the MRMDestReg form if the source register is a 64-bit extended register and the destination register is not.
This allows the instruction to be encoded using the 2-byte VEX form instead of the 3-byte VEX form. The GNU assembler has similar behavior and instruction selection already does this.

llvm-svn: 192088
2013-10-07 05:42:48 +00:00
Craig Topper 8f14de8f32 Switch HasAVX to UseAVX in one spot to ensure that AVX512 form of VINSERTPS is used in AVX512 mode.
llvm-svn: 191489
2013-09-27 07:16:24 +00:00
Craig Topper c6a1aac735 Removal some duplicate patterns.
llvm-svn: 191488
2013-09-27 07:11:17 +00:00
Yunzhong Gao 4467f33e3c Fixing Intel format of the vshufpd instruction.
Phabricator code review is located at: http://llvm-reviews.chandlerc.com/D1759

llvm-svn: 191481
2013-09-27 01:44:23 +00:00
Craig Topper 9a3915a74f Lift alignment restrictions on load/store folding of VEXTRACTI128/VINSERTI128.
llvm-svn: 191073
2013-09-20 05:37:49 +00:00
Craig Topper 98064b9f4d Lift alignment restrictions for load/store folding on VINSERTF128/VEXTRACTF128. Fixes PR17268.
llvm-svn: 190916
2013-09-18 03:55:53 +00:00
Ben Langmuir de39520f79 Add llvm.x86.* intrinsics for Intel SHA Extensions
Add llvm.x86.* intrinsics for all of the Intel SHA Extensions instructions, as
well as tests. Also remove mayLoad and hasSideEffects, which can be inferred
from the instruction patterns.

llvm-svn: 190864
2013-09-17 13:44:39 +00:00
Craig Topper a6d204ec68 Make F16C feature flag imply AVX rather than just checking both at the patterns.
llvm-svn: 190775
2013-09-16 04:29:58 +00:00
Ben Langmuir 8eb45a4ef6 Add the remaining Intel SHA instructions
Also assembly/disassembly tests, and for sha256rnds2, aliases with an explicit
xmm0 dependency.

llvm-svn: 190754
2013-09-14 15:03:21 +00:00
Preston Gurd 3fe264d625 Adds support for Atom Silvermont (SLM) - -march=slm
Implements Instruction scheduler latencies for Silvermont,
using latencies from the Intel Silvermont Optimization Guide.

Auto detects SLM.

Turns on post RA scheduler when generating code for SLM.

llvm-svn: 190717
2013-09-13 19:23:28 +00:00
Ben Langmuir 1650175de6 Partial support for Intel SHA Extensions (sha1rnds4)
Add basic assembly/disassembly support for the first Intel SHA
instruction 'sha1rnds4'. Also includes feature flag, and test cases.

Support for the remaining instructions will follow in a separate patch.

llvm-svn: 190611
2013-09-12 15:51:31 +00:00
Elena Demikhovsky 8952974e29 AVX-512: implemented extractelement with variable index.
Added parsing of mask register and "zeroing" semantic, like {%k1} {z}.

llvm-svn: 190595
2013-09-12 08:55:00 +00:00
Craig Topper adbb9a121f Add neverHasSideEffects=1 on a couple move instructions.
llvm-svn: 190259
2013-09-08 00:50:45 +00:00
Elena Demikhovsky 9a5ed9c3bd AVX-512: added SQRT, VRSQRT14, VCOMISS, VUCOMISS, VRCP14, VPABS
llvm-svn: 189472
2013-08-28 11:21:58 +00:00
Elena Demikhovsky 93eeb47d49 AVX-512: added conversion instructions.
llvm-svn: 189349
2013-08-27 13:54:04 +00:00
Elena Demikhovsky 0a2b6290f1 AVX-512: Added shuffle instructions -
VPSHUFD, VPERMILPS, VMOVDDUP, VMOVLHPS, VMOVHLPS, VSHUFPS, VALIGN
 single and double forms.

llvm-svn: 189215
2013-08-26 12:45:35 +00:00
Elena Demikhovsky 540d582594 AVX-512: Added more patterns for VMOVSS, VMOVSD, VMOVD, VMOVQ
llvm-svn: 188786
2013-08-20 11:00:29 +00:00
Craig Topper fd2b389263 Move AVX and non-AVX replication inside a couple multiclasses to avoid repeating each instruction for both individually.
llvm-svn: 188743
2013-08-20 04:24:14 +00:00
Elena Demikhovsky 3ce8dbbac2 AVX-512: Added VMOVD, VMOVQ, VMOVSS, VMOVSD instructions.
llvm-svn: 188637
2013-08-18 13:08:57 +00:00
Benjamin Kramer 5bc180c14f X86: Turn fp selects into mask operations.
double test(double a, double b, double c, double d) { return a<b ? c : d; }

before:
_test:
	ucomisd	%xmm0, %xmm1
	ja	LBB0_2
	movaps	%xmm3, %xmm2
LBB0_2:
	movaps	%xmm2, %xmm0

after:
_test:
	cmpltsd	%xmm1, %xmm0
	andpd	%xmm0, %xmm2
	andnpd	%xmm3, %xmm0
	orpd	%xmm2, %xmm0

Small speedup on Benchmarks/SmallPT

llvm-svn: 187706
2013-08-04 12:05:16 +00:00
Elena Demikhovsky cd46691728 AVX-512 set: added VEXTRACTPS instruction
llvm-svn: 187705
2013-08-04 10:46:07 +00:00
Elena Demikhovsky 67b05fc0b3 Added INSERT and EXTRACT intructions from AVX-512 ISA.
All insertf*/extractf* functions replaced with insert/extract since we have insertf and inserti forms.
Added lowering for INSERT_VECTOR_ELT / EXTRACT_VECTOR_ELT for 512-bit vectors.
Added lowering for EXTRACT/INSERT subvector for 512-bit vectors.
Added a test.

llvm-svn: 187491
2013-07-31 11:35:14 +00:00
Craig Topper efd67d4612 Changed register names (and pointer keywords) to be lower case when using Intel X86 assembler syntax.
Patch by Richard Mitton.

llvm-svn: 187476
2013-07-31 02:47:52 +00:00
Craig Topper 6030a65039 Remove some errant space charcters in mnemonic strings.
llvm-svn: 186932
2013-07-23 06:45:34 +00:00
Craig Topper 61da939a17 More Intel syntax alias fixes.
llvm-svn: 186814
2013-07-22 09:58:07 +00:00
Craig Topper 03db790dc6 Change %xmm0 to XMM0 in Intel side of asm strings for PBLENDVB.
llvm-svn: 186812
2013-07-22 09:22:49 +00:00
Elena Demikhovsky 89703c06f2 Removed PackedDouble domain from scalar instructions. Added more formats for the scalar stuff.
llvm-svn: 183626
2013-06-09 07:37:10 +00:00
Michael Liao 00b20cc924 [PATCH] Fix VGATHER* operand constraints
Add earlyclobber constaints to prevent input register being allocated as
the output register because, according to Intel spec [1], "If any pair
of the index, mask, or destination registers are the same, this
instruction results a UD fault."

---
[1] http://software.intel.com/sites/default/files/319433-014.pdf

llvm-svn: 183327
2013-06-05 18:12:26 +00:00
Elena Demikhovsky fad029202f Removed SSEPacked domain from all forms (AVX, SSE, signed, unsigned) scalar compare instructions, like COMISS, COMISD.
No functional changes.

llvm-svn: 182371
2013-05-21 12:04:22 +00:00
Preston Gurd 9264c95400 Corrected Atom latencies for SSE SQRT instructions.
llvm-svn: 181346
2013-05-07 19:57:34 +00:00
Rafael Espindola 817c1d92b4 Put VMOVPQIto64rr in the VRPDI class.
Patch by Joshua Magee.

llvm-svn: 180842
2013-05-01 13:00:16 +00:00
Benjamin Kramer aec90531f9 X86: Now that we have a canonical form for vector integer abs, match it into pabs.
llvm-svn: 180600
2013-04-26 12:05:21 +00:00
Jakob Stoklund Olesen e440d476ee Annotate the remaining x86 instructions with SchedRW lists.
Now all x86 instructions that have itinerary classes also have SchedRW
lists. This is required before the new scheduling models can be used.

There are still unannotated instructions remaining, but they don't have
itinerary classes either.

llvm-svn: 178051
2013-03-26 18:24:22 +00:00
Jakob Stoklund Olesen 4d39e81fb8 Remove IIC_DEFAULT from X86Schedule.td
All the instructions tagged with IIC_DEFAULT had nothing in common, and
we already have a NoItineraries class to represent untagged
instructions.

llvm-svn: 177937
2013-03-25 23:12:41 +00:00
Jakob Stoklund Olesen 712f674880 Model prefetches and barriers as loads.
It's not yet clear if these instructions need a more careful model.

llvm-svn: 177599
2013-03-20 23:09:53 +00:00
Jakob Stoklund Olesen 5b535c965e Add a catch-all WriteSystem SchedWrite type.
This is used for all the expensive system instructions.

llvm-svn: 177598
2013-03-20 23:09:50 +00:00
Jakob Stoklund Olesen cd4ebb7639 Annotate the remaining SSE MOV instructions.
llvm-svn: 177592
2013-03-20 22:37:16 +00:00
Jakob Stoklund Olesen c6dc70d865 Annotate SSE horizontal and integer instructions.
llvm-svn: 177591
2013-03-20 22:37:13 +00:00
Jakob Stoklund Olesen 7a8bb72a3a Add some missing SSE annotations.
llvm-svn: 177540
2013-03-20 16:56:39 +00:00
Jakob Stoklund Olesen 3a546156c7 Annotate various null idioms with SchedRW lists.
llvm-svn: 177461
2013-03-19 23:23:31 +00:00
Jakob Stoklund Olesen 24aac1dc92 Annotate SSE float conversions with SchedRW lists.
llvm-svn: 177460
2013-03-19 23:23:29 +00:00
Jakob Stoklund Olesen a5158c8f0a Add SchedRW annotations to most of X86InstrSSE.td.
We hitch a ride with the existing OpndItins class that was used to add
instruction itinerary classes in the many multiclasses in this file.

Use the link provided by the X86FoldableSchedWrite.Folded to find the
right SchedWrite for folded loads.

llvm-svn: 177326
2013-03-18 22:01:35 +00:00
Nadav Rotem adfa5eaf8c Unaligned loads should use the VMOVUPS opcode.
llvm-svn: 177130
2013-03-14 23:49:44 +00:00
Craig Topper 8fb09f0abb Fix inconsistent usage of PALIGN and PALIGNR when referring to the same instruction.
llvm-svn: 173667
2013-01-28 06:48:25 +00:00
Craig Topper c7e6feee42 Combine AVX and SSE forms of MOVSS and MOVSD into the same multiclasses so they get instantiated together.
llvm-svn: 172704
2013-01-17 06:59:42 +00:00
Craig Topper 0d2c29e807 Simplify nested strconcats in X86 td files since strconcat can take more than 2 arguments.
llvm-svn: 172379
2013-01-14 07:46:34 +00:00
Craig Topper 4c69a05d2d Create a single multiclass for SSE and AVX version of MOVL/MOVH. Prevents needing to specify everything twice. No functional change intended
llvm-svn: 172378
2013-01-14 07:26:58 +00:00
Benjamin Kramer bcd14a0f26 X86: Add patterns for X86ISD::VSEXT in registers.
Those can occur when something between the sextload and the store is on the same
chain and blocks isel. Fixes PR14887.

llvm-svn: 172353
2013-01-13 11:37:04 +00:00
Craig Topper bd62d64cbf Remove unnecessary # tokens at the beginning and end of defm names.
llvm-svn: 171694
2013-01-07 05:04:39 +00:00
Craig Topper 4f1c7256f9 Fix suffix handling for parsing and printing of cvtsi2ss, cvtsi2sd, cvtss2si, cvttss2si, cvtsd2si, and cvttsd2si to match gas behavior.
cvtsi2* should parse with an 'l' or 'q' suffix or no suffix at all. No suffix should be treated the same as 'l' suffix. Printing should always print a suffix. Previously we didn't parse or print an 'l' suffix.
cvtt*2si/cvt*2si should parse with an 'l' or 'q' suffix or not suffix at all. No suffix should use the destination register size to choose encoding. Printing should not print a suffix.

Original 'l' suffix issue with cvtsi2* pointed out by Michael Kuperstein.

llvm-svn: 171668
2013-01-06 20:39:29 +00:00
Craig Topper 9791afb182 Merge SSE and AVX instruction definitions for scalar forms of SQRT, RSQRT, and RCP.
llvm-svn: 171356
2013-01-02 08:00:39 +00:00
Craig Topper 4bc5c4e152 Merge SSE and AVX instruction definitions for PSHUFD/PSHUFHW/PSHUFLW.
llvm-svn: 171355
2013-01-02 07:27:49 +00:00
Rafael Espindola db1a84c84a Revert 171351. It broke MC/X86/x86-32-avx.s.
llvm-svn: 171352
2013-01-02 01:35:11 +00:00
Craig Topper 86d0cdb82f Merge SSE and AVX instruction definitions for scalar forms of SQRT, RSQRT, and RCP.
llvm-svn: 171351
2013-01-01 20:53:20 +00:00
Craig Topper 12ed9cd6ae Remove unused argument from a multiclass.
llvm-svn: 171340
2013-01-01 03:42:44 +00:00
Craig Topper 2edafc059d Merge intrinsic instruction definitions for SSE and AVX versions of RCPPS and RSQRTPS.
llvm-svn: 171339
2013-01-01 03:30:21 +00:00
Craig Topper d04dbec6c9 Remove 2 unused multiclasses.
llvm-svn: 171338
2013-01-01 02:02:45 +00:00
Craig Topper 7cc4f322cf Merge AVX/SSE instruction definitions for SQRTPS/PD, RSQRTPS, RCPPS. No funcitonal change intended.
llvm-svn: 171337
2013-01-01 00:11:07 +00:00
Craig Topper c2521cd309 Use packed instead of scalar itineraries for SSE1/2 SQRTPS/PD, RCPPS, and RSQRTPS. VEX-encoded forms already use packed.
llvm-svn: 171336
2012-12-31 23:49:05 +00:00
Craig Topper fe82eb6bcd Remove intrinsic specific instructions for (V)SQRTPS/PD. Instead lower to target-independent ISD nodes and use the existing patterns for those.
llvm-svn: 171237
2012-12-29 18:18:20 +00:00
Craig Topper 6b27251a76 Remove intrinsic specific instructions for SSE/SSE2/AVX floating point max/min instructions. Lower them to target specific nodes and use those patterns instead. This also allows them to be commuted if UnsafeFPMath is enabled.
llvm-svn: 171227
2012-12-29 16:44:25 +00:00
Craig Topper ab2e6842cc Merge basic_sse12_fp_binop_p_int and basic_sse12_fp_binop_p_y_int multiclasses.
llvm-svn: 171171
2012-12-27 22:53:47 +00:00
Craig Topper e2eec3c52b Merge basic_sse12_fp_binop_p and basic_sse12_fp_binop_p_y multiclasses.
llvm-svn: 171166
2012-12-27 18:51:50 +00:00
Craig Topper 757f3fc394 Add hasSideEffects=0 to some forms of ROUND, RCP, and RSQRT.
llvm-svn: 171143
2012-12-27 07:16:08 +00:00
Craig Topper 09ce4b9efe Move single letter 'P' prefix out of multiclass now that tablegen allows defm to start with #NAME. This makes instruction names more searchable again.
llvm-svn: 171141
2012-12-27 06:34:54 +00:00
Craig Topper 1b8c0750ee Mark all the _REV instructions as not having side effects. They aren't really emitted by the backend, but it reduces the number of instructions in the output files with unmodelled side effects to make auditing easier.
llvm-svn: 171118
2012-12-26 21:30:22 +00:00
Craig Topper 18f2675e9b Remove a special conditional setting of neverHasSideEffects if the instruction didn't have a pattern. This was leftover from when tablegen used to complain if things were already inferred from patterns.
llvm-svn: 171117
2012-12-26 21:04:30 +00:00
Craig Topper 24f316e4db Merge still more SSE/AVX instruction definitions.
llvm-svn: 171103
2012-12-26 07:54:43 +00:00
Craig Topper af629e2700 Merge more SSE/AVX instruction definitions.
llvm-svn: 171102
2012-12-26 07:20:35 +00:00
Craig Topper 65fe30450d Fix 80 column violation.
llvm-svn: 171097
2012-12-26 06:15:53 +00:00
Craig Topper f4d0fe8fcd Fix class name in comment.
llvm-svn: 171096
2012-12-26 06:15:09 +00:00
Craig Topper 59747c4dbd Merge SSE/AVX PCMPEQ/PCMPGT instruction definitions.
llvm-svn: 171095
2012-12-26 06:14:15 +00:00
Craig Topper 8a48677586 Remove 'v' from mnemonic to fix asm matching failures.
llvm-svn: 171093
2012-12-26 06:02:15 +00:00
Craig Topper b4ef0fa3a1 Use an additional multiclass to merge the 128/256-bit SSE/AVX instruction definitions for a bunch of SSE2 integer arithmetic instructions.
llvm-svn: 171092
2012-12-26 05:49:15 +00:00
Craig Topper a2594dd5f0 Use an additional multiclass to merge the 128/256-bit SSE/AVX instruction definitions for PAND/POR/PXOR/PANDN
llvm-svn: 171087
2012-12-26 04:36:03 +00:00
Craig Topper 97730a0d6a Merge an AVX/SSE 256-bit and 128-bit multiclass.
llvm-svn: 171086
2012-12-26 03:56:47 +00:00
Craig Topper 8b59746390 Mark VANDNPD/VANDNPDS as not commutable.
llvm-svn: 171085
2012-12-26 03:48:10 +00:00
Benjamin Kramer 4669d18893 X86: Match the SSE/AVX min/max vector ops using a custom node instead of intrinsics
This is very mechanical, no functionality change. Preparation for PR14667.

llvm-svn: 170898
2012-12-21 14:04:55 +00:00
Elena Demikhovsky 14a4af0e66 Optimized load + SIGN_EXTEND patterns in the X86 backend.
llvm-svn: 170506
2012-12-19 07:50:20 +00:00
Benjamin Kramer b16ccde7a4 X86: Add a couple of target-specific dag combines that turn VSELECTS into psubus if possible.
We match the pattern "x >= y ? x-y : 0" into "subus x, y" and two special cases
if y is a constant. DAGCombiner canonicalizes those so we first have to undo the
canonicalization for those cases. The pattern occurs in gzip when the loop
vectorizer is enabled. Part of PR14613.

llvm-svn: 170273
2012-12-15 16:47:44 +00:00
Craig Topper 216bcd522b Remove intrinsic specific instructions for (V)MOVQUmr with patterns pointing to the normal instructions.
llvm-svn: 169482
2012-12-06 07:31:16 +00:00
Craig Topper 922f10aec4 Mark MOVDQ(A/U)rm as ReMaterializable. Mark all MOVDQ(A/U) instructions as neverHasSideEffects.
llvm-svn: 169477
2012-12-06 06:49:16 +00:00
Elena Demikhovsky cd3c1c4a16 Simplified BLEND pattern matching for shuffles.
Generate VPBLENDD for AVX2 and VPBLENDW for v16i16 type on AVX2.

llvm-svn: 169366
2012-12-05 09:24:57 +00:00
Craig Topper 70601ba6f9 Use roundps/pd for llvm.ceil, llvm.trunc, llvm.rint, and llvm.nearbyint of vector types.
llvm-svn: 168141
2012-11-16 06:37:56 +00:00
Craig Topper 9268c94b15 Cleanup pcmp(e/i)str(m/i) instruction definitions and load folding support.
llvm-svn: 167652
2012-11-10 01:23:36 +00:00
Michael Liao ec47090b1e Remove tailing whitespaces
llvm-svn: 167445
2012-11-06 08:06:35 +00:00
Manman Ren 6b223a4f06 X86 SSE: update rsqrtss and rcpss to use two source operands and
the first source operand is tied to the destination operand.

This is to accurately model the corresponding instructions where the upper
bits are unmodified.

rdar://12558838
PR14221

llvm-svn: 167064
2012-10-30 23:53:59 +00:00
Michael Liao ad0b69fe3e Fix PR14204
- Add missing pattern on X86ISD::VZEXT from VR256 to VR256 when AVX2 is enabled.

llvm-svn: 166947
2012-10-29 17:57:12 +00:00
Michael Liao c5af149e70 Add custom conversion from v2u32 to v2f32 in 32-bit mode
- As there's no 64-bit GPRs in 32-bit mode, a custom conversion from v2u32 to
  v2f32 is added to improve the efficiency of the code generated.

llvm-svn: 166545
2012-10-24 04:09:32 +00:00
Michael Liao 1be96bb5ce Enable lowering ZERO_EXTEND/ANY_EXTEND to PMOVZX from SSE4.1
llvm-svn: 166486
2012-10-23 17:34:00 +00:00
Michael Liao e999b865dd Add support for FP_ROUND from v2f64 to v2f32
- Due to the current matching vector elements constraints in
  ISD::FP_ROUND, rounding from v2f64 to v4f32 (after legalization from
  v2f32) is scalarized. Add a customized v2f32 widening to convert it
  into a target-specific X86ISD::VFPROUND to work around this
  constraints.

llvm-svn: 165631
2012-10-10 16:53:28 +00:00
Craig Topper 3f23c1a8b9 Remove code for setting the VEX L-bit as a function of operand size from the code emitters and the disassembler table builder. Fix a couple instructions that were still missing VEX_L.
llvm-svn: 164204
2012-09-19 06:37:45 +00:00
Craig Topper a73be890a1 Add explicit VEX_L tags to all 256-bit instructions. This will allow us to remove code from the code emitters that examined operands to set the L-bit.
llvm-svn: 164202
2012-09-19 06:06:34 +00:00
Nadav Rotem 37521aa89c The PMOVZXWD family of functions had patterns extends narrow vector types to wide vector types.
It had patterns for zext-loading and extending. This commit adds patterns for loading a wide type, performing a bitcast,
and extending. This is an odd pattern, but it is commonly used when writing code with intrinsics.

rdar://11897677

llvm-svn: 163995
2012-09-16 07:39:07 +00:00
Michael Liao 400f7ef871 Enhance PR11334 fix to support extload from v2f32/v4f32
- Fix an remaining issue of PR11674 as well

llvm-svn: 163528
2012-09-10 18:33:51 +00:00
Craig Topper 4ed79bd7d7 Add instruction selection for ffloor of vectors when SSE4.1 or AVX is enabled.
llvm-svn: 163473
2012-09-08 17:42:27 +00:00
Craig Topper f3e4aa8cdd Use iPTR instead of i32 for extract_subvector/insert_subvector index in lowering and patterns. This makes it consistent with the incoming DAG nodes from the DAG builder.
llvm-svn: 163293
2012-09-06 06:09:01 +00:00
Craig Topper daa5ed1e0a Add patterns for converting stores of subvector_extracts of lower 128-bits of a 256-bit vector to VMOVAPSmr/VMOVUPSmr.
llvm-svn: 163292
2012-09-06 05:15:01 +00:00
Craig Topper 81f06df699 Remove some of the patterns added in r163196. Increasing the complexity on insert_subvector into undef accomplishes the same thing.
llvm-svn: 163198
2012-09-05 07:26:35 +00:00
Craig Topper f7c87d6eea Add patterns for integer forms of VINSERTF128/VINSERTI128 folded with loads. Also add patterns to turn subvector inserts with loads to index 0 of an undef into VMOVAPS.
llvm-svn: 163196
2012-09-05 06:58:39 +00:00
Craig Topper 2db2353b21 Convert vextracti128/vextractf128 intrinsics to extract_subvector at DAG build time. Similar was previously done for vinserti128/vinsertf128. Add patterns for folding these extract_subvectors with stores.
llvm-svn: 163192
2012-09-05 05:48:09 +00:00
Craig Topper d6cc4062be Typos
llvm-svn: 163053
2012-09-01 06:33:50 +00:00
Michael Liao 969f3913dd Clean up AddedComplexity further after adding UseSSEx
llvm-svn: 162973
2012-08-31 03:01:35 +00:00
Jim Grosbach e423e865fe X86: Fix encoding of 'movd %xmm0, %rax'
The assembly string for the VMOVPQIto64rr instruction incorrectly lacked the 'v'
prefix, resulting in mis-assembly of the vanilla movd instruction.

llvm-svn: 162963
2012-08-31 00:30:30 +00:00
Michael Liao bbd10792c2 Introduce 'UseSSEx' to force SSE legacy encoding
- Add 'UseSSEx' to force SSE legacy insn not being selected when AVX is
  enabled.

  As the penalty of inter-mixing SSE and AVX instructions, we need
  prevent SSE legacy insn from being generated except explicitly
  specified through some intrinsics. For patterns supported by both
  SSE and AVX, so far, we force AVX insn will be tried first relying on
  AddedComplexity or position in td file. It's error-prone and
  introduces bugs accidentally.

  'UseSSEx' is disabled when AVX is turned on. For SSE insns inherited
  by AVX, we need this predicate to force VEX encoding or SSE legacy
  encoding only.

  For insns not inherited by AVX, we still use the previous predicates,
  i.e. 'HasSSEx'. So far, these insns fall into the following
  categories:
  * SSE insns with MMX operands
  * SSE insns with GPR/MEM operands only (xFENCE, PREFETCH, CLFLUSH,
    CRC, and etc.)
  * SSE4A insns.
  * MMX insns.
  * x87 insns added by SSE.

2 test cases are modified:

 - test/CodeGen/X86/fast-isel-x86-64.ll
   AVX code generation is different from SSE one. 'vcvtsi2sdq' cannot be
   selected by fast-isel due to complicated pattern and fast-isel
   fallback to materialize it from constant pool.

 - test/CodeGen/X86/widen_load-1.ll
   AVX code generation is different from SSE one after fixing SSE/AVX
   inter-mixing. Exec-domain fixing prefers 'vmovapd' instead of
   'vmovaps'.

llvm-svn: 162919
2012-08-30 16:54:46 +00:00
Bill Wendling cc56718038 The commutative flag is already correctly set within the multiclass. If we set
it here, then a 'register-memory' version would wrongly get the commutative
flag.
<rdar://problem/12180135>

llvm-svn: 162741
2012-08-28 07:36:46 +00:00
Craig Topper 72f51c3986 Convert V_SETALLONES/AVX_SETALLONES/AVX2_SETALLONES to Post-RA pseudos.
llvm-svn: 162740
2012-08-28 07:30:47 +00:00
Craig Topper bd509eea4a Merge AVX_SET0PSY/AVX_SET0PDY/AVX2_SET0 into a single post-RA pseudo.
llvm-svn: 162738
2012-08-28 07:05:28 +00:00
Jakob Stoklund Olesen 89d6b29d16 More missing mayLoad flags on AVX multiclasses.
llvm-svn: 162714
2012-08-28 00:02:01 +00:00
Craig Topper 5af2fed5f2 Don't allow vextractf128 to be folded with unaligned stores. We don't fold unaligned loads so shouldn't fold unaligned stores as it can cause an alignment fault to occur.
llvm-svn: 162658
2012-08-27 07:19:59 +00:00
Craig Topper 6d44554cd4 Fold some patterns into instruction definitons so tablegen can infer flags removing the need for an explicit 'neverHasSideEffects = 1'
llvm-svn: 162656
2012-08-27 07:04:50 +00:00
Craig Topper f7828f91ee Add HasAVX1Only predicate and use it for patterns that have an AVX1 instruction and an AVX2 instruction rather than relying on AddedComplexity.
llvm-svn: 162654
2012-08-27 06:08:57 +00:00
Jakob Stoklund Olesen 3d91b43ad2 Add missing mayLoad flags to a large class of AVX *_Int instructions.
llvm-svn: 162622
2012-08-24 23:29:07 +00:00
Jakob Stoklund Olesen d3511235d1 Remove some spurious mayLoad = 0 flags.
They were inserted to silence TableGen's warning about
redundant properties. That warning is now gone.

llvm-svn: 162517
2012-08-24 00:31:20 +00:00
Nadav Rotem 178250ad87 When unsafe math is used, we can use commutative FMAX and FMIN. In some cases
this allows for better code generation.

Added a new DAGCombine transformation to convert FMAX and FMIN to FMANC and
FMINC, which are commutative.

For example:

  movaps  %xmm0, %xmm1
  movsd LC(%rip), %xmm0
  minsd %xmm1, %xmm0

becomes:

  minsd LC(%rip), %xmm0

llvm-svn: 162187
2012-08-19 13:06:16 +00:00
Michael Liao 34107b9177 fix PR11334
- FP_EXTEND only support extending from vectors with matching elements.
  This results in the scalarization of extending to v2f64 from v2f32,
  which will be legalized to v4f32 not matching with v2f64.
- add X86-specific VFPEXT supproting extending from v4f32 to v2f64.
- add BUILD_VECTOR lowering helper to recover back the original
  extending from v4f32 to v2f64.
- test case is enhanced to include different vector width.

llvm-svn: 161894
2012-08-14 21:24:47 +00:00
Craig Topper ab47fe4e16 Implement proper handling for pcmpistri/pcmpestri intrinsics. Requires custom handling in DAGISelToDAG due to limitations in TableGen's implicit def handling. Fixes PR11305.
llvm-svn: 161318
2012-08-06 06:22:36 +00:00
Craig Topper 6d0408d3a5 Remove custom inserter for MWAIT. It doesn't do anything that couldn't be represented in a pattern.
llvm-svn: 161306
2012-08-05 00:36:57 +00:00
Manman Ren 4059145396 X86: mark GATHER instructios as mayLoad
llvm-svn: 161143
2012-08-01 23:28:59 +00:00
Craig Topper 14eac5dda8 Give VCVTTPD2DQ priority over CVTTPD2DQ.
llvm-svn: 160942
2012-07-30 02:20:32 +00:00
Craig Topper f881d385da Fix patterns for CVTTPS2DQ to specify SSE2 instead of SSE1.
llvm-svn: 160941
2012-07-30 02:14:02 +00:00
Craig Topper 415b3586d0 Fix up patterns for VCVTSS2SD. Specifically give it priority over SSE form. Add an OptForSpeed to explicitly pair up with an OptForSize that was already on another pattern.
llvm-svn: 160939
2012-07-30 01:38:57 +00:00
Craig Topper 28402efcb6 Fix load types on intrinsic forms of SS2SD and SD2SS AVX/SSE convert instruction patterns.
llvm-svn: 160938
2012-07-29 23:26:34 +00:00
Craig Topper b6767f3acd Move more SSE/AVX convert instruction patterns into their definitions.
llvm-svn: 160937
2012-07-29 22:30:06 +00:00
Craig Topper fc93281c07 Fold patterns for some of the SSE/AVX convert instructions into their instruction definitions.
llvm-svn: 160922
2012-07-28 18:59:19 +00:00
Craig Topper 024797b9a2 Mark some of the SSE/AVX convert instructions as mayLoad/neverHasSideEffects.
llvm-svn: 160921
2012-07-28 18:36:39 +00:00
Craig Topper 44f9b5343d Make CVTSS2SI instruction definition consistent with CVTSD2SI.
llvm-svn: 160914
2012-07-28 08:28:23 +00:00
Craig Topper 1c1aef07b8 Fix up memory load types for SSE scalar convert intrinsic patterns.
llvm-svn: 160913
2012-07-28 07:59:59 +00:00
Jakob Stoklund Olesen 77cd55b4ee Remove the last mentions of sub_ss and sub_sd from patterns.
I'll remove these two sub-register indexes shortly.

llvm-svn: 160831
2012-07-26 23:03:08 +00:00
Jakob Stoklund Olesen b96d0b4e08 Eliminate sub_ss, sub_sd from broadcast patterns.
The (COPY_TO_REGCLASS GR32:$src, VR128) pattern looks odd, but
copyPhysReg does the right thing with it. (The old pattern would
eventually produce the same cross-class copy).

llvm-svn: 160830
2012-07-26 22:59:06 +00:00
Jakob Stoklund Olesen 206b825f5c Eliminate more sub_ss / sub_sd patterns.
This gets rid of some more INSERT_SUBREG - IMPLICIT_DEF patterns,
simplifying the emitted code a bit.

llvm-svn: 160820
2012-07-26 22:30:18 +00:00
Jakob Stoklund Olesen 75d17b0577 Eliminate some SUBREG_TO_REG patterns with sub_ss and sub_sd.
The SUBREG_TO_REG instruction has magic semantics asserting that the
source value was defined by an instruction that cleared the high half of
the register. Those semantics are never actually exploited for xmm
registers.

llvm-svn: 160818
2012-07-26 22:03:21 +00:00
Jakob Stoklund Olesen ceee4a9d0c Eliminate a batch of uses of sub_ss and sub_sd in the X86 target.
These idempotent sub-register indices don't do anything --- They simply
map XMM registers to themselves.  They no longer affect register classes
either since the SubRegClasses field has been removed from Target.td.

This patch replaces XMM->XMM EXTRACT_SUBREG and INSERT_SUBREG patterns
with COPY_TO_REGCLASS patterns which simply become COPY instructions.

The number of IMPLICIT_DEF instructions before register allocation is
reduced, and that is the cause of the test case changes.

llvm-svn: 160816
2012-07-26 21:40:42 +00:00
Craig Topper c7690ac7ac Make l/q suffixes on AVX forms of scalar convert instructions consistent with their non-AVX forms.
llvm-svn: 160775
2012-07-26 07:48:28 +00:00
Nadav Rotem 4c12245b3a The vbroadcast family of instructions has 'fallback patterns' in case where the
load source operand is used by multiple nodes. The v2i64 broadcast was emulated
by shuffling the two lower i32 elements to the upper two.
We had a bug in the immediate used for the broadcast.
Replacing 0 to 0x44.
0x44 means [01|00|01|00] which corresponds to the correct lane.

Patch by Michael Kuperstein.

llvm-svn: 160430
2012-07-18 08:14:48 +00:00
Craig Topper 01deb5f2df Make x86 asm parser to check for xmm vs ymm for index register in gather instructions. Also fix Intel syntax for gather instructions to use 'DWORD PTR' or 'QWORD PTR' to match gas.
llvm-svn: 160420
2012-07-18 04:11:12 +00:00
Nadav Rotem ee3552f88d Rename VBROADCASTSDrm into VBROADCASTSDYrm to match the naming convention.
Allow the folding of vbroadcastRR to vbroadcastRM, where the memory operand is a spill slot.

PR12782.

Together with Michael Kuperstein <michael.m.kuperstein@intel.com>

llvm-svn: 160230
2012-07-15 12:26:30 +00:00
Craig Topper b3bac4908e Mark VINSERTI128rm as MayLoad=1. Fixes PR13348.
llvm-svn: 160162
2012-07-13 05:46:28 +00:00
Craig Topper f7755df776 Update GATHER instructions to support 2 read-write operands. Patch from myself and Manman Ren.
llvm-svn: 160110
2012-07-12 06:52:41 +00:00
Craig Topper be41e2daa6 Reverse assembler/disassembler operand order for gather instructions.
llvm-svn: 159983
2012-07-10 06:38:33 +00:00
Craig Topper 85c938f44f Remove extra space.
llvm-svn: 159647
2012-07-03 06:48:58 +00:00
Craig Topper f067f9aa51 Change i128mem/i256mem to f128mem/f256mem on some floating point vector instructions.
llvm-svn: 159646
2012-07-03 06:11:06 +00:00
Craig Topper 676dcd8c39 Add aliases for pblendvb, blendvpd, and blendvps instructions with the implicit xmm0 operand specified. Fixes PR13252.
llvm-svn: 159644
2012-07-03 05:49:45 +00:00
Elena Demikhovsky 9af899fa88 Optimization of shuffle node that can fit to the register form of VBROADCAST instruction on AVX2.
llvm-svn: 159504
2012-07-01 06:12:26 +00:00
Manman Ren 98a5bf24a9 X86: add more GATHER intrinsics in LLVM
Corrected type for index of llvm.x86.avx2.gather.d.pd.256
  from 256-bit to 128-bit.
Corrected types for src|dst|mask of llvm.x86.avx2.gather.q.ps.256
  from 256-bit to 128-bit.

Support the following intrinsics:
  llvm.x86.avx2.gather.d.q, llvm.x86.avx2.gather.q.q
  llvm.x86.avx2.gather.d.q.256, llvm.x86.avx2.gather.q.q.256
  llvm.x86.avx2.gather.d.d, llvm.x86.avx2.gather.q.d
  llvm.x86.avx2.gather.d.d.256, llvm.x86.avx2.gather.q.d.256

llvm-svn: 159402
2012-06-29 00:54:20 +00:00
Manman Ren a09820414a X86: add GATHER intrinsics (AVX2) in LLVM
Support the following intrinsics:
llvm.x86.avx2.gather.d.pd, llvm.x86.avx2.gather.q.pd
llvm.x86.avx2.gather.d.pd.256, llvm.x86.avx2.gather.q.pd.256
llvm.x86.avx2.gather.d.ps, llvm.x86.avx2.gather.q.ps
llvm.x86.avx2.gather.d.ps.256, llvm.x86.avx2.gather.q.ps.256

Modified Disassembler to handle VSIB addressing mode.

llvm-svn: 159221
2012-06-26 19:47:59 +00:00
Craig Topper 94bf0f3855 Remove some duplicate instructions that exist only to given different mnemonics for the assembler. Use InstAlias instead.
llvm-svn: 159184
2012-06-26 04:12:49 +00:00
Craig Topper 357de815b4 Add SSE2 predicate to CVTPS2PD instructions. Doesn't matter much because there are no patterns in the instruction.
llvm-svn: 159127
2012-06-25 06:51:42 +00:00
Craig Topper b6eb513c68 Remove codegen only instruction in favor of one that has the same definition. Make some pattern operands more explicit about types.
llvm-svn: 159126
2012-06-25 06:16:00 +00:00
Craig Topper fd5e6e7db1 Remove intrinsic specific instructions for (V)CVTPS2DQ and replace with patterns.
llvm-svn: 159109
2012-06-24 07:07:16 +00:00
Craig Topper b925230fb1 Remove intrinsic specific instructions for (V)CVTPS2DQ and replace with patterns.
llvm-svn: 159108
2012-06-24 06:55:37 +00:00
Craig Topper f48ec7a708 Fix build failures from r159106.
llvm-svn: 159107
2012-06-24 06:08:31 +00:00
Craig Topper bab2b89944 Remove intrinsic specific instructions for CVTPD2PS and replace with just patterns.
llvm-svn: 159106
2012-06-24 05:44:31 +00:00
Craig Topper 3cee08ce7d Remove intrinsic specific instructions for CVTPD2DQ. Replace with patterns.
llvm-svn: 159105
2012-06-24 05:33:24 +00:00
Craig Topper a899cc15f1 Remove intrinsic specific instructions for (V)CVTDQ2PS. Use a Pat instead instead.
llvm-svn: 159090
2012-06-23 22:33:14 +00:00
Craig Topper 7e9415220a Make CVTDQ2PS instruction use SSE2 predicate instead of SSE1. No functional change because there are no patterns in the instructions. Also fix a typo in a comment.
llvm-svn: 159087
2012-06-23 20:52:45 +00:00
Craig Topper 24e3418215 Move CVTPD2DQ to use SSE2 predicate instead of SSE3. Move DQ2PD and PD2DQ to the SSE2 section of the file.
llvm-svn: 159086
2012-06-23 20:15:42 +00:00
Craig Topper 8c03ea79c4 Use correct memory types for (V)CVTDQ2PD instructions.
llvm-svn: 159075
2012-06-23 08:30:27 +00:00
Craig Topper 431f1e7192 Remove intrinsic specific instructions for 128-bit (V)CVTDQ2PD. Replace with intrinsic patterns. Mem forms omitted because the load size is only 64-bits.
llvm-svn: 159070
2012-06-23 04:23:36 +00:00
Craig Topper 21d04fc118 Add predicate check around some patterns.
llvm-svn: 158797
2012-06-20 07:30:23 +00:00
Craig Topper 3b662a6279 Add predicate check around some patterns.
llvm-svn: 158795
2012-06-20 07:01:11 +00:00
Kay Tiong Khoo 390edb0d91 *no need to pollute Intel syntax with bonus mnemonics; operand size is explicitly specified
llvm-svn: 158603
2012-06-16 17:19:49 +00:00
Craig Topper bf2409e8aa Mark several instructions SSE2 instead of SSE3 as they should be.
llvm-svn: 158049
2012-06-06 06:45:27 +00:00
Benjamin Kramer a0396e4583 X86: Rename the CLMUL target feature to PCLMUL.
It was renamed in gcc/gas a while ago and causes all kinds of
confusion because it was named differently in llvm and clang.

llvm-svn: 157745
2012-05-31 14:34:17 +00:00
Craig Topper c1ac05dad5 Add intrinsic for pclmulqdq instruction.
llvm-svn: 157731
2012-05-31 04:37:40 +00:00
Benjamin Kramer ef479ea854 Add intrinsics, code gen, assembler and disassembler support for the SSE4a extrq and insertq instructions.
This required light surgery on the assembler and disassembler
because the instructions use an uncommon encoding. They are
the only two instructions in x86 that use register operands
and two immediates.

llvm-svn: 157634
2012-05-29 19:05:25 +00:00
Craig Topper 7daf897678 Remove 256-bit AVX non-temporal store intrinsics. Similar was previously done for 128-bit.
llvm-svn: 156375
2012-05-08 06:58:15 +00:00
Craig Topper dbb98b4917 Fix some issues in the f16c instructions.
llvm-svn: 156287
2012-05-07 06:00:15 +00:00
Craig Topper d4e1894ec1 Add SSE4A MOVNTSS/MOVNTSD instructions.
llvm-svn: 156281
2012-05-07 05:36:19 +00:00
Nadav Rotem 810734b7f4 AVX: Add additional vbroadcast replacement sequences for integers.
Remove the v2f64 patterns because it does not match any vbroadcast
instruction.

llvm-svn: 155461
2012-04-24 18:09:59 +00:00
Nadav Rotem aa3ff8da00 AVX: We lower VECTOR_SHUFFLE and BUILD_VECTOR nodes into vbroadcast instructions
using the pattern (vbroadcast (i32load src)). In some cases, after we generate
this pattern new users are added to the load node, which prevent the selection
of the blend pattern. This commit provides fallback patterns which perform
in-vector broadcast (using in-vector vbroadcast in AVX2 and pshufd on AVX1).

llvm-svn: 155437
2012-04-24 11:07:03 +00:00
Elena Demikhovsky 8d7e56c409 ZERO_EXTEND/SIGN_EXTEND/TRUNCATE optimization for AVX2
llvm-svn: 155309
2012-04-22 09:39:03 +00:00
Craig Topper 4badeb3f0d Replace vpermd/vpermps intrinic patterns with custom lowering to target specific nodes.
llvm-svn: 154801
2012-04-16 07:13:00 +00:00
Craig Topper c0075aa7ff Flip the arguments when converting vpermd/vpermps intrinsics into instructions. The intrinsic has the mask as the last operand, but the instruction has it as the second.
llvm-svn: 154797
2012-04-16 06:26:15 +00:00
Craig Topper b86fa404d3 Merge vpermps/vpermd and vpermpd/vpermq SD nodes.
llvm-svn: 154782
2012-04-16 00:41:45 +00:00
Craig Topper bfc9a5f7d3 Remove AVX2 vpermq and vpermpd intrinsics. These can now be handled with normal shuffle vectors.
llvm-svn: 154778
2012-04-15 22:43:31 +00:00
Nadav Rotem 42bcd04ee3 Fix PR12529. The Vxx family of instructions are only supported by AVX.
Use non-vex instructions for SSE4.

llvm-svn: 154770
2012-04-15 19:36:44 +00:00
Elena Demikhovsky 779a72b49e Added VPERM optimization for AVX2 shuffles
llvm-svn: 154761
2012-04-15 11:18:59 +00:00
Craig Topper d0271b27cb Fix 128-bit ptest intrinsics to take v2i64 instead of v4f32 since these are integer instructions.
llvm-svn: 154580
2012-04-12 07:23:00 +00:00
Nadav Rotem 9bc178ac5c Reapply 154396 after fixing a test.
Original message:
Modify the code that lowers shuffles to blends from using blendvXX to vblendXX.
blendV uses a register for the selection while Vblend uses an immediate.
On sandybridge they still have the same latency and execute on the same execution ports.

llvm-svn: 154483
2012-04-11 06:40:27 +00:00
Eric Christopher 65ada95b84 Temporarily revert this patch to see if it brings the buildbots back.
llvm-svn: 154425
2012-04-10 19:33:16 +00:00
Nadav Rotem f934f91709 Modify the code that lowers shuffles to blends from using blendvXX to vblendXX.
blendv uses a register for the selection while vblend uses an immediate.
On sandybridge they still have the same latency and execute on the same execution ports.

llvm-svn: 154396
2012-04-10 14:33:13 +00:00
Craig Topper d024cef233 Turn avx2 vinserti128 intrinsic calls into INSERT_SUBVECTOR DAG nodes and remove patterns for selecting the intrinsic. Similar was already done for avx1.
llvm-svn: 154272
2012-04-07 22:32:29 +00:00
Craig Topper aa9aab5ad2 Move vinsertf128 patterns near the instruction definitions. Add AddedComplexity to AVX2 vextracti128 patterns to give them priority over the integer versions of vextractf128 patterns.
llvm-svn: 154268
2012-04-07 21:57:43 +00:00
Craig Topper 7629d63bc4 Add support for AVX enhanced comparison predicates. Patch from Kay Tiong Khoo.
llvm-svn: 153935
2012-04-03 05:20:24 +00:00
Chad Rosier 4106917355 [avx] Add patterns for combining vextractf128 + vmovaps/vmovups/vmobdqu to
vextractf128 with 128-bit mem dest.

Combines

	vextractf128 $0, %ymm0, %xmm0
	vmovaps %xmm0, (%rdi)

to

    vextractf128 $0, %ymm0, (%rdi)

rdar://11082570

llvm-svn: 153139
2012-03-20 21:43:40 +00:00
Chad Rosier 0158ae2e5b [avx] Add the AddedComplexity to the VINSERTI128 avx2 patterns to give
precedence over the VINSERTF128 avx1 patterns.

llvm-svn: 153114
2012-03-20 19:45:07 +00:00
Chad Rosier 93d5427c69 Whitespace.
llvm-svn: 153105
2012-03-20 18:38:33 +00:00
Chad Rosier 5a6011267a [avx] Move the vextractf128 patterns closer to the vextractf128 def. Remove
whitespace from test case.  No functional change intended.

llvm-svn: 153103
2012-03-20 18:24:55 +00:00
Chad Rosier 07a4cb9382 [avx] Adjust the VINSERTF128rm pattern to allow for unaligned loads.
This results in things such as

	vmovups	16(%rdi), %xmm0
	vinsertf128	$1, %xmm0, %ymm0, %ymm0

to be combined to

    vinsertf128	$1, 16(%rdi), %ymm0, %ymm0

rdar://11076953

llvm-svn: 153092
2012-03-20 17:08:51 +00:00