Commit Graph

387 Commits

Author SHA1 Message Date
Craig Topper c1e7b3f6ca [X86] Lower all ISD::MGATHER nodes to X86ISD:MGATHER.
Now we consistently represent the mask result without relying on isel ignoring it.

We now have a more general SDNode and type constraints to represent these nodes in isel patterns. This allows us to present both both vXi1 and XMM/YMM mask types with a single set of constraints.

llvm-svn: 318821
2017-11-22 07:11:03 +00:00
Coby Tayree 3880f2a363 [x86][icelake]VNNI
Introducing Vector Neural Network Instructions, consisting of:
vpdpbusd{s}
vpdpwssd{s}
Differential Revision: https://reviews.llvm.org/D40208

llvm-svn: 318746
2017-11-21 10:04:28 +00:00
Coby Tayree 71e37cc9ff [x86][icelake]vbmi2
introducing vbmi2, consisting of
vpcompress{b,w}
vpexpand{b,w}
vpsh{l,r}d{w,d,q}
vpsh{l,r}dv{w,d,q}
Differential Revision: https://reviews.llvm.org/D40206

llvm-svn: 318745
2017-11-21 09:48:44 +00:00
Craig Topper f26fc823ec [X86] Simplify type constraints for AVX2 masked gather.
We don't need separate 32 and 64 node types. We can use SDTCisInt and SDTCisSameSizeAs to ensure the mask size the result type and is integer.

llvm-svn: 318732
2017-11-21 06:28:15 +00:00
Craig Topper d219220bd9 [X86] Simplify the predicates for avx2 masked gather patterns.
We don't need a dyn_cast and we only need to check the type of the index. The base ptr is guaranteed to be scalar.

llvm-svn: 318730
2017-11-21 06:01:20 +00:00
Mohammed Agabaria 115f68ea3e [LV][X86] Support of AVX2 Gathers code generation and update the LV with this
This patch depends on: https://reviews.llvm.org/D35348

Support of pattern selection of masked gathers of AVX2 (X86\AVX2 code gen)
Update LoopVectorize to generate gathers for AVX2 processors.

Reviewers: delena, zvi, RKSimon, craig.topper, aaboud, igorb

Reviewed By: delena, RKSimon

Differential Revision: https://reviews.llvm.org/D35772

llvm-svn: 318641
2017-11-20 08:18:12 +00:00
Craig Topper 5d404f58dd [X86] Simplify the gather/scatter isel predicates.
We don't need a dyn_cast, the predicate already specified the base node. We only need to check the type of the index, the base ptr is guaranteed to be scalar.

llvm-svn: 318596
2017-11-18 19:05:12 +00:00
Craig Topper 16a91cee6c [X86] Redefine the 128-bit version of VPGATHERQD and VGATHERQPS to use a VK2 mask instead of a VK4 mask.
This allows us to remove extra extend creation during lowering and more accurately reflects the semantics of the instruction.

While there add an extra output VT to X86 masked gather node to better match the isel pattern predicate. Currently we're exploiting the fact that the isel table doesn't count how many output results a node actually has if the result type of any can be inferred from the first result and the type constraints defined in tablegen. I think we might ultimately want to lower all MGATHER/MSCATTER to an X86ISD node with the extra mask result and stop relying on this hole in the isel checking.

llvm-svn: 318278
2017-11-15 07:46:43 +00:00
Craig Topper 0af48f1ad4 [X86] Split VRNDSCALE/VREDUCE/VGETMANT/VRANGE ISD nodes into versions with and without the rounding operand. NFCI
I want to reuse the VRNDSCALE node for the legacy SSE rounding intrinsics so that those intrinsics can use EVEX instructions. All of these nodes share tablegen multiclasses so I split them all so that they all remain similar in their implementations.

llvm-svn: 318007
2017-11-13 02:02:58 +00:00
Craig Topper b42a23ff8f [X86] Add an X86ISD::RANGES opcode to use for the scalar intrinsics.
This fixes a bug where we selected packed instructions for scalar intrinsics.

llvm-svn: 317999
2017-11-12 18:51:09 +00:00
Craig Topper 7a6e294a6c [X86] Make X86ISD::FMADDS3 isel patterns commutable.
This was missed when FMADDS3 was split from X86ISD::FMADDS3_RND.

llvm-svn: 317769
2017-11-09 06:17:05 +00:00
Craig Topper cf8e6d0a76 [X86] Add support for using EVEX instructions for the legacy vcvtph2ps intrinsics.
Looks like there's some missed load folding opportunities for i64 loads.

llvm-svn: 317544
2017-11-07 07:13:03 +00:00
Craig Topper 07dac55d95 [X86] Add scalar FMA ISD nodes without rounding mode. NFC
Next step is to use them for the legacy FMA scalar intrinsics as well. This will enable the legacy intrinsics to use EVEX encoded opcodes and the extended registers.

llvm-svn: 317453
2017-11-06 05:48:25 +00:00
Craig Topper 692c8efe30 [X86] Don't use RCP14 and RSQRT14 for reciprocal estimations or for legacy SSE rcp/rsqrt intrinsics when AVX512 features are enabled.
Summary:
AVX512 added RCP14 and RSQRT instructions which improve accuracy over the legacy RCP and RSQRT instruction, but not enough accuracy to remove the need for a Newton Raphson refinement.

Currently we use these new instructions for the legacy packed SSE instrinics, but not the scalar instrinsics. And we use it for fast math optimization of division and reciprocal sqrt.

I think switching the legacy instrinsics maybe surprising to the user since it changes the answer based on which processor you're using regardless of any fastmath settings. It's also weird that we did something different between scalar and packed.

As far at the reciprocal estimation, I think it creates unnecessary deltas in our output behavior (and prevents EVEX->VEX). A little playing around with gcc and icc and godbolt suggest they don't change which instructions they use here.

This patch adds new X86ISD nodes for the RCP14/RSQRT14 and uses those for the new intrinsics. Leaving the old intrinsics to use the old instructions.

Going forward I think our focus should be on
-Supporting 512-bit vectors, which will have to use the RCP14/RSQRT14.
-Using RSQRT28/RCP28 to remove the Newton Raphson step on processors with AVX512ER
-Supporting double precision.

Reviewers: zvi, DavidKreitzer, RKSimon

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D39583

llvm-svn: 317413
2017-11-04 18:26:41 +00:00
Simon Pilgrim dac6fd4170 [X86][XOP] Merge rotation opcodes with AVX512 equivalents. NFCI.
The XOP rotations act as ROTL with +ve values and ROTR with -ve values, which means that we can treat them all as ROTL with unsigned modulo. We already check that we're only trying to lower as ROTL for XOP rotations.

Differential Revision: https://reviews.llvm.org/D37949

llvm-svn: 314207
2017-09-26 14:12:50 +00:00
Craig Topper 47e14ead54 [X86] Make IFMA instructions during isel so we can fold broadcast loads.
This required changing the ISD opcode for these instructions to have the commutable operands first and the addend last. This way tablegen can autogenerate the additional patterns for us.

llvm-svn: 314083
2017-09-24 19:30:55 +00:00
Craig Topper 092c2f4357 [X86] Move the getInsertVINSERTImmediate and getExtractVEXTRACTImmediate helper functions over to X86ISelDAGToDAG.cpp
Redefine them to call getI8Imm and return that directly.

llvm-svn: 314059
2017-09-23 05:34:07 +00:00
Craig Topper 492282d4e2 [X86] Remove is the isVINSERT*Index/isVEXTRACT*Index predicates from isel.
The only insert_subvector/extract_subvector nodes that make it to isel are guaranteed to match.

llvm-svn: 314058
2017-09-23 05:34:06 +00:00
Craig Topper 75370b9b49 [X86] Convert X86ISD::SELECT to ISD::VSELECT just before instruction selection to avoid duplicate patterns
Similar to what we do for X86ISD::SHRUNKBLEND just turn X86ISD::SELECT into ISD::VSELECT. This allows us to remove the duplicated TRUNC patterns.

Differential Revision: https://reviews.llvm.org/D38022

llvm-svn: 313644
2017-09-19 17:19:45 +00:00
Craig Topper fc52eb37af [X86] Strengthen some of the SD type constraints in X86InstrFragmentsSIMD.td
This effects the vector shift and rotates as well as some of the vector compares.

The changes to the shifts by immediates allows a few hundred bytes to be removed by removing type checks for the size of the immediate containing the shift/rotate amount.

llvm-svn: 313512
2017-09-18 05:50:54 +00:00
Craig Topper af0b992b04 [X86] Mark the FMA nodes as commutable so tablegen will auto generate the patterns.
This uses the capability introduced in r312464 to make SDNode patterns commutable on the first two operands.

This allows us to remove some of the extra FMA patterns that have to put loads and mask operands in different places to cover all cases. This even includes patterns that were missing to support match a load in the first operand with FMA4. Non-broadcast loads with masking for AVX512.

I believe this is causing us to generate some duplicate patterns because tablegen's isomorphism checks don't catch isomorphism between the patterns as written in the td. It only detects isomorphism in the commuted variants it tries to create. The the unmasked 231 and 132 memory forms are isomorphic as written in the td file so we end up keeping both. I think we precommute the 132 pattern to fix this.

We also need a follow up patch to go back to the legacy FMA3 instructions and add patterns to the 231 and 132 forms which we currently don't have.

llvm-svn: 312469
2017-09-04 06:59:50 +00:00
Craig Topper f1417ca625 [X86] Remove X86ISD::FMADD in favor ISD::FMA
There's no reason to have a target specific node with the same semantics as a target independent opcode.

This should simplify D36335 so that it doesn't need to touch X86ISelDAGToDAG.cpp

Differential Revision: https://reviews.llvm.org/D36983

llvm-svn: 311568
2017-08-23 16:28:04 +00:00
Craig Topper 4de6f583da [X86] Merge all of the vecload and alignedload predicates into single predicates.
We can load the memory VT and check for natural alignment. This also adds a new preferNonTemporalLoad helper that checks the correct subtarget feature based on the load size.

This shrinks the isel table by at least 5000 bytes by allowing more reordering and combining to occur.

llvm-svn: 311266
2017-08-19 23:21:22 +00:00
Craig Topper afa69eecbb [X86] Converge alignedstore/alignedstore256/alignedstore512 to a single predicate.
We can read the memoryVT and get its store size directly from the SDNode to check its alignment.

llvm-svn: 311265
2017-08-19 23:21:21 +00:00
Craig Topper 5960848060 [X86] Remove memopmmx pattern fragment
Summary: Just like the FIXME says, there is no alignment requirement for MMX.

Reviewers: RKSimon, zvi, igorb

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D36815

llvm-svn: 311090
2017-08-17 15:25:05 +00:00
Craig Topper bbe3e46bb9 [X86] Remove patterns for PALIGNR with non-vXi8 types.
llvm-svn: 311058
2017-08-17 01:48:00 +00:00
Craig Topper 2dfc7889fd [X86] Remove unused pattern fragment that referenced MVT::i1. NFC
llvm-svn: 310799
2017-08-13 20:04:05 +00:00
Craig Topper 050c9c8f83 [X86] Prevent selecting masked aligned load instructions if the load should be non-temporal
Summary: The aligned load predicates don't  suppress themselves if the load is non-temporal the way the unaligned predicates do. For the most part this isn't a problem because the aligned predicates are mostly used for instructions that only load the the non-temporal loads have priority over those. The exception are masked loads.

Reviewers: RKSimon, zvi

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D35712

llvm-svn: 309079
2017-07-26 04:31:04 +00:00
Simon Pilgrim c338ba48fc [X86][SSE] Remove unused memopfsf32_128/memopfsf64_128 scalar memops
The 'scalar' simd bitops were dropped a while ago

llvm-svn: 306248
2017-06-25 17:04:58 +00:00
Simon Pilgrim bed1fa1ac1 Strip trailing whitespace. NFCI.
llvm-svn: 306247
2017-06-25 16:57:46 +00:00
Elena Demikhovsky 2dac0b4d58 AVX-512: Lowering Masked Gather intrinsic - fixed a bug
Masked gather for vector length 2 is lowered incorrectly for element type i32.
The type <2 x i32> was automatically extended to <2 x i64> and we generated VPGATHERQQ instead of VPGATHERQD.
The type <2 x float> is extended to <4 x float>, so there is no bug for this type, but the sequence may be more optimal.

In this patch I'm fixing <2 x i32>bug and optimizing <2 x float> sequence for GATHERs only. The same fix should be done for Scatters as well.

Differential revision: https://reviews.llvm.org/D34343

llvm-svn: 305987
2017-06-22 06:47:41 +00:00
Simon Pilgrim b079c8b35b [X86][SSE] Change memop fragment to inherit from vec128load with local alignment controls
First possible step towards merging SSE/AVX memory folding pattern fragments.

Also allows us to remove the duplicate non-temporal load logic.

Differential Revision: https://reviews.llvm.org/D33902

llvm-svn: 305184
2017-06-12 10:01:27 +00:00
Simon Pilgrim 807b708d13 [X86][SSE41] Non-temporal loads shouldn't be folded if it can be avoided (PR32743)
Missed SSE41 non-temporal load case in previous commit

Differential Revision: https://reviews.llvm.org/D33728

llvm-svn: 304722
2017-06-05 16:45:32 +00:00
Simon Pilgrim a25bf0b6b9 [X86][SSE] Non-temporal loads shouldn't be folded if it can be avoided (PR32743)
Differential Revision: https://reviews.llvm.org/D33728

llvm-svn: 304717
2017-06-05 15:43:03 +00:00
Guy Blank 548e22a1a7 [X86][AVX512] Make i1 illegal in the CodeGen
This patch defines the i1 type as illegal in the X86 backend for AVX512.
For DAG operations on <N x i1> types (build vector, extract vector element, ...) i8 is used, and should be truncated/extended.
This should produce better scalar code for i1 types since GPRs will be used instead of mask registers.

Differential Revision: https://reviews.llvm.org/D32273

llvm-svn: 303421
2017-05-19 12:35:15 +00:00
Simon Pilgrim cf2da96c82 [SelectionDAG] Add a signed integer absolute ISD node
Reduced version of D26357 - based on the discussion on llvm-dev about canonicalization of UMIN/UMAX/SMIN/SMAX as well as ABS I've reduced that patch to just the ABS ISD node (with x86/sse support) to improve basic combines and lowering.

ARM/AArch64, Hexagon, PowerPC and NVPTX all have similar instructions allowing us to make this a generic opcode and move away from the hard coded tablegen patterns which makes it tricky to match more complex patterns.

At the moment this patch doesn't attempt legalization as we only create an ABS node if its legal/custom.

Differential Revision: https://reviews.llvm.org/D29639

llvm-svn: 297780
2017-03-14 21:26:58 +00:00
Simon Pilgrim 9df7d08cb2 [X86][MMX] Fix folding of shift value loads to cover whole 64-bits
rL230225 made the assumption that only the lower 32-bits of an MMX register load is used as a shift value, when in fact the whole 64-bits are reloaded and treated as a i64 to determine the shift value.

This patch reverts rL230225 to ensure that the whole 64-bits of memory are folded and ensures that the upper 32-bit are zero'd for cases where the shift value has come from a scalar source.

Found during fuzz testing.

Differential Revision: https://reviews.llvm.org/D30833

llvm-svn: 297667
2017-03-13 21:23:29 +00:00
Craig Topper 111b2d6997 [X86] Remove unused SDTypeProfile. NFC
llvm-svn: 297594
2017-03-12 23:05:03 +00:00
Craig Topper 2b92542908 [X86] Lower SSE/AVX cmpps/pd intrinsics directly to X86ISD::CMPP SDNodes.
This allows us to remove a duplicate set of patterns.

llvm-svn: 297593
2017-03-12 23:05:00 +00:00
Craig Topper 8783bbb598 [AVX-512] Separate the fadd/fsub/fmul/fdiv/fmax/fmin with rounding mode ISD opcodes into separate packed and scalar opcodes. This is more consistent with the rest of the ISD opcodes. NFC
llvm-svn: 296094
2017-02-24 07:21:10 +00:00
Craig Topper 56d4022997 [AVX-512] Allow legacy scalar min/max intrinsics to select EVEX instructions when available
This patch introduces new X86ISD::FMAXS and X86ISD::FMINS opcodes. The legacy intrinsics now lower to this node. As do the AVX-512 masked intrinsics when the rounding mode is CUR_DIRECTION.

I've merged a copy of the tablegen multiclass avx512_fp_scalar into avx512_fp_scalar_sae. avx512_fp_scalar still needs to support CUR_DIRECTION appearing as a rounding mode for X86ISD::FADD_ROUND and others.

Differential revision: https://reviews.llvm.org/D30186

llvm-svn: 295810
2017-02-22 06:54:18 +00:00
Igor Breger fda32d266a [X86] Fix EXTRACT_VECTOR_ELT with variable index from v32i16 and v64i8 vector.
Its more profitable to go through memory (1 cycles throughput)
than using VMOVD + VPERMV/PSHUFB sequence ( 2/3 cycles throughput) to implement EXTRACT_VECTOR_ELT with variable index.
IACA tool was used to get performace estimation (https://software.intel.com/en-us/articles/intel-architecture-code-analyzer)
For example for var_shuffle_v16i8_v16i8_xxxxxxxxxxxxxxxx_i8 test from vector-shuffle-variable-128.ll I get 26 cycles vs 79 cycles. 
Removing the VINSERT node, we don't need it any more.

Differential Revision: https://reviews.llvm.org/D29690

llvm-svn: 295660
2017-02-20 14:16:29 +00:00
Craig Topper 06ae5e821c [X86] Tighten up some of the SDNode type constraints.
llvm-svn: 295588
2017-02-19 01:54:47 +00:00
Craig Topper 811756b4dc [X86][XOP] Reduce the size of a multiclass by moving more stuff to parameters instead of doing 128-bit and 256-bit simultaneously.
This requires some instructions to be renamed to move the Y earlier in the instruction name. The new names are more consistent with other instructions.

llvm-svn: 295579
2017-02-18 22:53:43 +00:00
Craig Topper 3b7e823f92 [AVX-512] Don't reuse VSHLI/VSRLI for mask register shifts. VSHLI/VSHRI shift within elements while KSHIFT moves whole elements.
llvm-svn: 293448
2017-01-30 00:06:01 +00:00
Elena Demikhovsky 7c7bf1b432 Added a template for building target specific memory node in DAG.
I added API for creation a target specific memory node in DAG. Today, all memory nodes are common for all targets and their constructors are located in SelectionDAG.cpp.
There are some cases in X86 where we need to create a special node - truncation-with-saturation store, float-to-half-store. 
In the current patch I added truncation-with-saturation nodes and I'm using them for intrinsics. In the future I plan to implement DAG lowering for truncation-with-saturation pattern.

Differential Revision: https://reviews.llvm.org/D27899

llvm-svn: 290250
2016-12-21 10:43:36 +00:00
Craig Topper a55b483bb5 [AVX-512] Correctly preserve the passthru semantics of the FMA scalar intrinsics
Summary:
Scalar intrinsics have specific semantics about the which input's upper bits are passed through to the output. The same input is also supposed to be the input we use for the lower element when the mask bit is 0 in a masked operation. We aren't currently keeping these semantics with instruction selection.

This patch corrects this by introducing new scalar FMA ISD nodes that indicate whether operand 1(one of the multiply inputs) or operand 3(the additon/subtraction input) should pass thru its upper bits.

We use this information to select 213/132 form for the operand 1 version and the 231 form for the operand 3 version.

We also use this information to suppress combining FNEG operations on the passthru input since semantically the passthru bits aren't negated. This is stronger than the earlier check added for a user being SELECTS so we can remove that.

This fixes PR30913.

Reviewers: delena, zvi, v_klochkov

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D27144

llvm-svn: 289190
2016-12-09 06:42:28 +00:00
Craig Topper 6413f8a8f2 [X86] Remove scalar logical op alias instructions. Just use COPY_FROM/TO_REGCLASS and the normal packed instructions instead
Summary:
This patch removes the scalar logical operation alias instructions. We can just use reg class copies and use the normal packed instructions instead. This removes the need for putting these instructions in the execution domain fixing tables as was done recently.

I removed the loadf64_128 and loadf32_128 patterns as DAG combine creates a narrower load for (extractelt (loadv4f32)) before we ever get to isel.

I plan to add similar patterns for AVX512DQ in a future commit to allow use of the larger register class when available.

Reviewers: spatel, delena, zvi, RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D27401

llvm-svn: 288771
2016-12-06 04:58:39 +00:00
Simon Pilgrim a3af79678e [X86] Generalize CVTTPD2DQ/CVTTPD2UDQ and CVTDQ2PD/CVTUDQ2PD opcodes. NFCI
Replace the CVTTPD2DQ/CVTTPD2UDQ and CVTDQ2PD/CVTUDQ2PD opcodes with general versions.

This is an initial step towards similar FP_TO_SINT/FP_TO_UINT and SINT_TO_FP/UINT_TO_FP lowering to AVX512 CVTTPS2QQ/CVTTPS2UQQ and CVTQQ2PS/CVTUQQ2PS with illegal types.

Differential Revision: https://reviews.llvm.org/D27072

llvm-svn: 287870
2016-11-24 12:13:46 +00:00
Ayman Musa 46af8f9c6f [X86][AVX512] Add patterns for all variants of VMOVSS/VMOVSD instructions.
Differential Revision: https://reviews.llvm.org/D26022

llvm-svn: 286758
2016-11-13 14:29:32 +00:00
Craig Topper f334ac19ad [AVX-512] Add lowering to cvttpd2udq/cvttps2udq for fptoui v2f64/2f32 to 2i32
This patch adds support for fptoui to 2i32 from both 2f64 and 2f32, building on Simon's change for the signed version in r284459 and using AVX-512 instructions.

If we don't have VLX support we need to use a 512-bit operation for v2f64->v2i32 and extract the result.

It also recognises that cvttpd2udq zeroes the upper 64-bits of the xmm result.

Differential Revision: https://reviews.llvm.org/D26331

llvm-svn: 286345
2016-11-09 07:48:51 +00:00
Elena Demikhovsky caaceef4b3 Expandload and Compressstore intrinsics
2 new intrinsics covering AVX-512 compress/expand functionality.
This implementation includes syntax, DAG builder, operation lowering and tests.
Does not include: handling of illegal data types, codegen prepare pass and the cost model.

llvm-svn: 285876
2016-11-03 03:23:55 +00:00
Simon Pilgrim 4ddc92b6cd [X86][SSE] Add lowering to cvttpd2dq/cvttps2dq for sitofp v2f64/2f32 to 2i32
As discussed on PR28461 we currently miss the chance to lower "fptosi <2 x double> %arg to <2 x i32>" to cvttpd2dq due to its use of illegal types.

This patch adds support for fptosi to 2i32 from both 2f64 and 2f32.

It also recognises that cvttpd2dq zeroes the upper 64-bits of the xmm result (similar to D23797) - we still don't do this for the cvttpd2dq/cvttps2dq intrinsics - this can be done in a future patch.

Differential Revision: https://reviews.llvm.org/D23808

llvm-svn: 284459
2016-10-18 07:42:15 +00:00
Elena Demikhovsky 5b10aa1f1e DAG: Setting Masked-Expand-Load as a variant of Masked-Load node
Masked-expand-load node represents load operation that loads a variable amount of elements from memory according to amount of "true" bits in the mask and expands the loaded elements according to their position in the mask vector.
Right now, the node is used in intrinsics for VEXPAND* instructions. 
The work is done towards implementation of masked.expandload and masked.compressstore intrinsics.

Differential Revision: https://reviews.llvm.org/D25322

llvm-svn: 283694
2016-10-09 10:48:52 +00:00
Craig Topper 871da8ebea [X86] Remove unused PatFrags. NFC
llvm-svn: 283523
2016-10-07 06:54:39 +00:00
Peter Collingbourne 2261d78cd2 Target: Remove unused patterns and transforms. NFC.
llvm-svn: 283515
2016-10-07 00:30:49 +00:00
Ayman Musa d7a5ed4141 [X86][avx512] Fix bug in masked compress store.
Differential Revision: https://reviews.llvm.org/D23984

llvm-svn: 282381
2016-09-26 06:22:08 +00:00
Craig Topper aeca0460f3 [AVX-512] Split scalar version of X86ISD::SELECT into a separate opcode because isel is not robust with multiple type profiles for the same opcode.
llvm-svn: 282340
2016-09-24 21:42:47 +00:00
Craig Topper 7e664dad60 [AVX-512] Remove the patterns for selecting scalar VCOMI/VUCOMI instructions with SAE as there is no way to create the pattern.
llvm-svn: 282339
2016-09-24 21:42:43 +00:00
Craig Topper a02e394872 [AVX-512] Split X86ISD::VFPROUND and X86ISD::VFPEXT into separate opcodes for each type constraint.
This revealed that scalar intrinsics could create nodes with a rounding mode of FROUND_CUR_DIRECTION, but the patterns didn't check for it. It just worked because isel doesn't check operand count and we had a pattern without the rounding mode argument at all.

llvm-svn: 282231
2016-09-23 06:24:43 +00:00
Craig Topper 3174b6e467 [AVX-512] Add separate ISD opcodes for each form of CVT instructions. Don't reuse non-X86 ISD opcodes with extra X86 specific arguments.
llvm-svn: 282230
2016-09-23 06:24:39 +00:00
Craig Topper d70ec9b25e [AVX-512] Use different ISD opcodes for some of the scalar intrinsic lowering. Isel is not very robust against using the same ISD opcode with different number of operands so its better to separate.
llvm-svn: 282229
2016-09-23 06:24:35 +00:00
Craig Topper 29f1a1f834 [AVX-512] Split the 3 different usages of the X86ISD::FSETCC opcode into 3 different opcodes.
It turns out isel is really not robust against having different type profiles for the same opcode. It turns out that if you put an illegal rounding mode(i.e. not CUR_DIRECTION or NO_EXC) on a comiss intrinsic we would generate the FSETCC form with the rounding mode added, but then pattern match to an instruction with ROUND_CUR_DIRECTION.

We can probably get away with just one FSETCCM opcode that always contains the rounding mode and explicitly put ROUND_CUR_DIRECTION in the pattern, but I'll leave that for future work.

With this change the clang tests for the comiss intrinsics that used an incorrect rounding mode of 3 properly fail isel instead of silently doing the wrong thing. Those clang tests will be fixed in a follow up commit and I also plan to add rounding mode checking to clang.

llvm-svn: 282055
2016-09-21 06:37:54 +00:00
Craig Topper d868870f17 [AVX-512] Don't add an additional rounding mode operand to the avx512 vcvtps2ph intrinsic lowering.
There was no way to control its value so it was always FROUND_CURRENT making it unnecessary. The true rounding mode is encoded in the immediate operand of the instruction.

This also removes the pattern from the rb form of the instructions since there is no way to specify the FROUND_NO_EXC rounding mode it required.

llvm-svn: 282052
2016-09-21 03:58:44 +00:00
Craig Topper e18258dc1c [AVX-512] Don't lower avx512 vcvtps2ph/vcvtph2ps nodes to ISD::FP16_TO_FP/ISD::FP_TO_FP16 with an extra x86 specific rounding mode operand. We should use a target specific ISD opcode.
llvm-svn: 282046
2016-09-21 02:05:22 +00:00
Elena Demikhovsky b906df9fe5 AVX-512: Fix for PR28175 - Scalar code optimization.
Optimized (truncate (assertzext x) to i1) and anyext i1 to i8/16/32.
Optimization of this patterns is a one more step towards i1 optimization on AVX-512.

Differential Revision: https://reviews.llvm.org/D24456

llvm-svn: 281302
2016-09-13 07:57:00 +00:00
Craig Topper 4fa3b50fc3 [AVX-512] Fix masked VPERMI2PS isel when the index comes from a bitcast.
We need to bitcast the index operand to a floating point type so that it matches the result type. If not then the passthru part of the DAG will be a bitcast from the index's original type to the destination type. This makes it very difficult to match. The other option would be to add 5 sets of patterns for every other possible type.

llvm-svn: 280696
2016-09-06 06:56:59 +00:00
Craig Topper e3807febd8 [X86] Remove FsVMOVAPSrm/FsVMOVAPDrm/FsMOVAPSrm/FsMOVAPDrm. Due to their placement in the td file they had lower precedence than (V)MOVSS/SD and could almost never be selected.
The only way to select them was in AVX512 mode because EVEX VMOVSS/SD was below them and the patterns weren't qualified properly for AVX only. So if you happened to have an aligned FR32/FR64 load in AVX512 you could get a VEX encoded VMOVAPS/VMOVAPD.

I tried to search back through history and it seems like these instructions were probably unselectable for at least 5 years, at least to the time the VEX versions were added. But I can't prove they ever were.

llvm-svn: 280644
2016-09-05 02:20:49 +00:00
Craig Topper f8ad647b93 [X86] Strengthen some SDNode type constraints.
llvm-svn: 280463
2016-09-02 04:25:33 +00:00
Craig Topper 850feaf3b7 [AVX-512] Add support for selecting 512-bit VPABSB/VPABSW when BWI is available.
llvm-svn: 279951
2016-08-28 22:20:51 +00:00
Craig Topper 80c8b80919 [X86] Mark some of the X86 SDNodes as commutative.
llvm-svn: 278653
2016-08-15 04:47:30 +00:00
Craig Topper dbc387cfc9 [X86] X86ISD::FANDN is not commutative or associative.
llvm-svn: 278652
2016-08-15 04:47:28 +00:00
Craig Topper 05629d05c7 [X86] Replace CodeGenOnly VPSRAVW/D/Q_Int instructions with patterns since the operand types exactly match the normal VPSRAVW/D/Q instructions.
llvm-svn: 276555
2016-07-24 07:32:45 +00:00
Ahmed Bougacha 0851ecd1b0 [X86] Remove dead ISD opcodes. NFC.
llvm-svn: 273716
2016-06-24 20:37:55 +00:00
Igor Breger e59165ca63 [AVX512] [AVX512/AVX][Intrinsics] Fix Variable Bit Shift Right Arithmetic intrinsic lowering.
Differential Revision: http://reviews.llvm.org/D20897

llvm-svn: 273138
2016-06-20 07:05:43 +00:00
Sanjay Patel 977530a8c9 [x86, SSE] change patterns for CMPP to float types to allow matching with SSE1 (PR28044)
This patch is intended to solve:
https://llvm.org/bugs/show_bug.cgi?id=28044

By changing the definition of X86ISD::CMPP to use float types, we allow it to be created 
and pass legalization for an SSE1-only target where v4i32 is not legal.

The motivational trail for this change includes:
https://llvm.org/bugs/show_bug.cgi?id=28001

and eventually makes this trigger:
http://reviews.llvm.org/D21190

Ie, after this step, we should be free to have Clang generate FP compare IR instead of x86
intrinsics for SSE C packed compare intrinsics. (We can auto-upgrade and remove the LLVM 
sse.cmp intrinsics as a follow-up step.) Once we're generating vector IR instead of x86
intrinsics, a big pile of generic optimizations can trigger.

Differential Revision: http://reviews.llvm.org/D21235

llvm-svn: 272511
2016-06-12 15:03:25 +00:00
Simon Pilgrim e85506b6e0 [X86][XOP] Support for VPERMIL2PD/VPERMIL2PS 2-input shuffle instructions
This patch begins adding support for lowering to the XOP VPERMIL2PD/VPERMIL2PS shuffle instructions - adding the X86ISD::VPERMIL2 opcode and cleaning up the usage.

The internal llvm intrinsics were assuming the shuffle mask operand was the same type as the float/double input operands (I guess to simplify the intrinsic definitions in X86InstrXOP.td to a single value type). These needed changing to integer types (matching the clang builtin and the AMD intrinsics definitions), an auto upgrade path is added to convert old calls.

Mask decoding/target shuffle support will be added in future patches.

Differential Revision: http://reviews.llvm.org/D20049

llvm-svn: 271633
2016-06-03 08:06:03 +00:00
Ahmed Bougacha 63f78b0206 [X86] Define segment MI operands as regs instead of i8imm.
We've been pretending that segments are i8imm since the initial
support (r68645), predating the addition of the SEGMENT_REG class
(r81895).  That happens to works, but is wrong, and inconsistent
with how we print (e.g., X86ATTInstPrinter::printMemReference)
and parse them (e.g., X86Operand::addMemOperands).

This change shouldn't affect any tool users, but is visible to
library users or out-of-tree tablegen backends: this causes
MCOperandInfo for the segment op to have an RC instead of "unknown",
and TII::getRegClass to actually return something.  As the registers
are reserved and no vregs of the class ever created, that shouldn't
change anything.

No test change; no suspicious getRegClass() in X86 and CodeGen.

llvm-svn: 271559
2016-06-02 18:29:15 +00:00
Michael Zuckerman a63a129749 [Clang][AVX512][intrinsics] Fix rcp and sqrt intrinsics.
Differential Revision: http://reviews.llvm.org/D20438

llvm-svn: 270322
2016-05-21 14:44:18 +00:00
Michael Zuckerman 11b55b29d1 [Clang][AVX512][intrinsics] Fix vscalef intrinsics.
Differential Revision: http://reviews.llvm.org/D20324

llvm-svn: 270321
2016-05-21 11:09:53 +00:00
Craig Topper 19e04b6430 [X86] Generalize and combine some similar type constraints and node types. No changes to the isel table size so the separation wasn't buying us anything.
llvm-svn: 270026
2016-05-19 06:13:58 +00:00
Craig Topper 9152f5fcdf [X86] Simplify some type constraints by removing parts that were already implied.
llvm-svn: 270025
2016-05-19 06:13:48 +00:00
Craig Topper 4fcff19ff5 [X86] Remove some type constraint classes and use already existing stricter classes.
llvm-svn: 270013
2016-05-19 02:05:58 +00:00
Craig Topper 7ee092a268 [AVX512] Strengthen type constraints for VFIXUPIMM patterns and combine the type constraints for vector and scalar.
llvm-svn: 270012
2016-05-19 02:05:55 +00:00
Craig Topper 095fc41523 [AVX512] Strengthen type constraints on my rounding mode inputs and some immediate inputs.
llvm-svn: 269886
2016-05-18 06:56:01 +00:00
Craig Topper 74ed087b0b [AVX512] Strengthen type checks on the X86ISD::SELECT node. Saves over 800 bytes in the DAG isel table by removing type checks for the condition operand which is always a vector or scalar of i1 matching the the number of elements in the other operands.
llvm-svn: 269885
2016-05-18 06:55:59 +00:00
Craig Topper a5d0bf5c36 [X86] Strengthen some type contraints for floating point round and extend.
llvm-svn: 268892
2016-05-09 05:34:14 +00:00
Craig Topper a58abd1cc6 [AVX512] Fix up types for arguments of int_x86_avx512_mask_cvtsd2ss_round and int_x86_avx512_mask_cvtss2sd_round. Only the argument being converted should be a different type. The other 2 argument should have the same type as the result.
llvm-svn: 268891
2016-05-09 05:34:12 +00:00
Simon Pilgrim e5e04baf95 [X86][SSE] Dropped X86ISD::FGETSIGNx86 and use MOVMSK instead for FGETSIGN lowering
movmsk.ll tests are unchanged.

llvm-svn: 268237
2016-05-02 14:58:22 +00:00
Simon Pilgrim cd0dfc93eb [X86][SSE] Support for MOVMSK signbit extraction instructions
Add support for lowering with the MOVMSK instruction to extract vector element signbits to a GPR.

This is an early step towards more optimal handling of vector comparison results.

Differential Revision: http://reviews.llvm.org/D18741

llvm-svn: 265266
2016-04-03 18:22:03 +00:00
Simon Pilgrim 572ca71573 [X86][XOP] Support for VPPERM byte shuffle instruction
This patch begins adding support for lowering to the XOP VPPERM instruction - adding the X86ISD::VPPERM opcode.

Differential Revision: http://reviews.llvm.org/D18189

llvm-svn: 264260
2016-03-24 11:52:43 +00:00
Igor Breger 639fde79b0 AVX512: Combine AND + TESTM instructions .
Differential Revision: http://reviews.llvm.org/D17844

llvm-svn: 262621
2016-03-03 14:18:38 +00:00
Ahmed Bougacha f3cccab1e0 [X86] Remove the now-unused X86ISD::PSIGN. NFC.
llvm-svn: 261025
2016-02-16 22:14:12 +00:00
Asaf Badouh ad5c3fc47d [X86][AVX512] add intrinsics of Scalar FP to integer conversion with rounding mode
Differential Revision: http://reviews.llvm.org/D16629

llvm-svn: 260033
2016-02-07 14:59:13 +00:00
Simon Pilgrim 18bcf93efb [X86][AVX] Add support for 64-bit VZEXT_LOAD of 256/512-bit vectors to EltsFromConsecutiveLoads
Follow up to D16217 and D16729

This change uncovered an odd pattern where VZEXT_LOAD v4i64 was being lowered to a load of the lower v2i64 (so the 2nd i64 destination element wasn't being zeroed), I can't find any use/reason for this and have removed the pattern and replaced it so only the 1st i64 element is loaded and the upper bits all zeroed. This matches the description for X86ISD::VZEXT_LOAD

Differential Revision: http://reviews.llvm.org/D16768

llvm-svn: 259635
2016-02-03 09:41:59 +00:00
Sanjay Patel c54600dbb1 fix typos; NFC
llvm-svn: 259438
2016-02-01 23:53:35 +00:00
Asaf Badouh 5a3a0231f4 [X86][AVX512VBMI] add encoding and intrinsics for Multishift
Differential Revision: http://reviews.llvm.org/D16399

llvm-svn: 259363
2016-02-01 15:48:21 +00:00
Asaf Badouh 655822ab7e [X86][IFMA] adding intrinsics and encoding for multiply and add of unsigned 52bit integer
VPMADD52LUQ - Packed Multiply of Unsigned 52-bit Integers and Add the Low 52-bit Products to Qword Accumulators
 VPMADD52HUQ - Packed Multiply of Unsigned 52-bit Unsigned Integers and Add High 52-bit Products to 64-bit Accumulators

Differential Revision: http://reviews.llvm.org/D16407

llvm-svn: 258680
2016-01-25 11:14:24 +00:00
Asaf Badouh d4a0d9a78c [X86][AVX512]fix dag & add intrinsics for fixupimm
cover all width and types (pd/ps/sd/ss) of fixupimm instruction and inrtinsics

Differential Revision: http://reviews.llvm.org/D16313

llvm-svn: 258124
2016-01-19 14:21:39 +00:00
Michael Zuckerman 298a680c80 [AVX512] adding PRORQ , PRORD , PRORLVQ and PRORLVD Intrinsics
Differential Revision: http://reviews.llvm.org/D16052

llvm-svn: 257594
2016-01-13 12:39:33 +00:00