Commit Graph

310 Commits

Author SHA1 Message Date
Craig Topper 050c9c8f83 [X86] Prevent selecting masked aligned load instructions if the load should be non-temporal
Summary: The aligned load predicates don't  suppress themselves if the load is non-temporal the way the unaligned predicates do. For the most part this isn't a problem because the aligned predicates are mostly used for instructions that only load the the non-temporal loads have priority over those. The exception are masked loads.

Reviewers: RKSimon, zvi

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D35712

llvm-svn: 309079
2017-07-26 04:31:04 +00:00
Simon Pilgrim c338ba48fc [X86][SSE] Remove unused memopfsf32_128/memopfsf64_128 scalar memops
The 'scalar' simd bitops were dropped a while ago

llvm-svn: 306248
2017-06-25 17:04:58 +00:00
Simon Pilgrim bed1fa1ac1 Strip trailing whitespace. NFCI.
llvm-svn: 306247
2017-06-25 16:57:46 +00:00
Elena Demikhovsky 2dac0b4d58 AVX-512: Lowering Masked Gather intrinsic - fixed a bug
Masked gather for vector length 2 is lowered incorrectly for element type i32.
The type <2 x i32> was automatically extended to <2 x i64> and we generated VPGATHERQQ instead of VPGATHERQD.
The type <2 x float> is extended to <4 x float>, so there is no bug for this type, but the sequence may be more optimal.

In this patch I'm fixing <2 x i32>bug and optimizing <2 x float> sequence for GATHERs only. The same fix should be done for Scatters as well.

Differential revision: https://reviews.llvm.org/D34343

llvm-svn: 305987
2017-06-22 06:47:41 +00:00
Simon Pilgrim b079c8b35b [X86][SSE] Change memop fragment to inherit from vec128load with local alignment controls
First possible step towards merging SSE/AVX memory folding pattern fragments.

Also allows us to remove the duplicate non-temporal load logic.

Differential Revision: https://reviews.llvm.org/D33902

llvm-svn: 305184
2017-06-12 10:01:27 +00:00
Simon Pilgrim 807b708d13 [X86][SSE41] Non-temporal loads shouldn't be folded if it can be avoided (PR32743)
Missed SSE41 non-temporal load case in previous commit

Differential Revision: https://reviews.llvm.org/D33728

llvm-svn: 304722
2017-06-05 16:45:32 +00:00
Simon Pilgrim a25bf0b6b9 [X86][SSE] Non-temporal loads shouldn't be folded if it can be avoided (PR32743)
Differential Revision: https://reviews.llvm.org/D33728

llvm-svn: 304717
2017-06-05 15:43:03 +00:00
Guy Blank 548e22a1a7 [X86][AVX512] Make i1 illegal in the CodeGen
This patch defines the i1 type as illegal in the X86 backend for AVX512.
For DAG operations on <N x i1> types (build vector, extract vector element, ...) i8 is used, and should be truncated/extended.
This should produce better scalar code for i1 types since GPRs will be used instead of mask registers.

Differential Revision: https://reviews.llvm.org/D32273

llvm-svn: 303421
2017-05-19 12:35:15 +00:00
Simon Pilgrim cf2da96c82 [SelectionDAG] Add a signed integer absolute ISD node
Reduced version of D26357 - based on the discussion on llvm-dev about canonicalization of UMIN/UMAX/SMIN/SMAX as well as ABS I've reduced that patch to just the ABS ISD node (with x86/sse support) to improve basic combines and lowering.

ARM/AArch64, Hexagon, PowerPC and NVPTX all have similar instructions allowing us to make this a generic opcode and move away from the hard coded tablegen patterns which makes it tricky to match more complex patterns.

At the moment this patch doesn't attempt legalization as we only create an ABS node if its legal/custom.

Differential Revision: https://reviews.llvm.org/D29639

llvm-svn: 297780
2017-03-14 21:26:58 +00:00
Simon Pilgrim 9df7d08cb2 [X86][MMX] Fix folding of shift value loads to cover whole 64-bits
rL230225 made the assumption that only the lower 32-bits of an MMX register load is used as a shift value, when in fact the whole 64-bits are reloaded and treated as a i64 to determine the shift value.

This patch reverts rL230225 to ensure that the whole 64-bits of memory are folded and ensures that the upper 32-bit are zero'd for cases where the shift value has come from a scalar source.

Found during fuzz testing.

Differential Revision: https://reviews.llvm.org/D30833

llvm-svn: 297667
2017-03-13 21:23:29 +00:00
Craig Topper 111b2d6997 [X86] Remove unused SDTypeProfile. NFC
llvm-svn: 297594
2017-03-12 23:05:03 +00:00
Craig Topper 2b92542908 [X86] Lower SSE/AVX cmpps/pd intrinsics directly to X86ISD::CMPP SDNodes.
This allows us to remove a duplicate set of patterns.

llvm-svn: 297593
2017-03-12 23:05:00 +00:00
Craig Topper 8783bbb598 [AVX-512] Separate the fadd/fsub/fmul/fdiv/fmax/fmin with rounding mode ISD opcodes into separate packed and scalar opcodes. This is more consistent with the rest of the ISD opcodes. NFC
llvm-svn: 296094
2017-02-24 07:21:10 +00:00
Craig Topper 56d4022997 [AVX-512] Allow legacy scalar min/max intrinsics to select EVEX instructions when available
This patch introduces new X86ISD::FMAXS and X86ISD::FMINS opcodes. The legacy intrinsics now lower to this node. As do the AVX-512 masked intrinsics when the rounding mode is CUR_DIRECTION.

I've merged a copy of the tablegen multiclass avx512_fp_scalar into avx512_fp_scalar_sae. avx512_fp_scalar still needs to support CUR_DIRECTION appearing as a rounding mode for X86ISD::FADD_ROUND and others.

Differential revision: https://reviews.llvm.org/D30186

llvm-svn: 295810
2017-02-22 06:54:18 +00:00
Igor Breger fda32d266a [X86] Fix EXTRACT_VECTOR_ELT with variable index from v32i16 and v64i8 vector.
Its more profitable to go through memory (1 cycles throughput)
than using VMOVD + VPERMV/PSHUFB sequence ( 2/3 cycles throughput) to implement EXTRACT_VECTOR_ELT with variable index.
IACA tool was used to get performace estimation (https://software.intel.com/en-us/articles/intel-architecture-code-analyzer)
For example for var_shuffle_v16i8_v16i8_xxxxxxxxxxxxxxxx_i8 test from vector-shuffle-variable-128.ll I get 26 cycles vs 79 cycles. 
Removing the VINSERT node, we don't need it any more.

Differential Revision: https://reviews.llvm.org/D29690

llvm-svn: 295660
2017-02-20 14:16:29 +00:00
Craig Topper 06ae5e821c [X86] Tighten up some of the SDNode type constraints.
llvm-svn: 295588
2017-02-19 01:54:47 +00:00
Craig Topper 811756b4dc [X86][XOP] Reduce the size of a multiclass by moving more stuff to parameters instead of doing 128-bit and 256-bit simultaneously.
This requires some instructions to be renamed to move the Y earlier in the instruction name. The new names are more consistent with other instructions.

llvm-svn: 295579
2017-02-18 22:53:43 +00:00
Craig Topper 3b7e823f92 [AVX-512] Don't reuse VSHLI/VSRLI for mask register shifts. VSHLI/VSHRI shift within elements while KSHIFT moves whole elements.
llvm-svn: 293448
2017-01-30 00:06:01 +00:00
Elena Demikhovsky 7c7bf1b432 Added a template for building target specific memory node in DAG.
I added API for creation a target specific memory node in DAG. Today, all memory nodes are common for all targets and their constructors are located in SelectionDAG.cpp.
There are some cases in X86 where we need to create a special node - truncation-with-saturation store, float-to-half-store. 
In the current patch I added truncation-with-saturation nodes and I'm using them for intrinsics. In the future I plan to implement DAG lowering for truncation-with-saturation pattern.

Differential Revision: https://reviews.llvm.org/D27899

llvm-svn: 290250
2016-12-21 10:43:36 +00:00
Craig Topper a55b483bb5 [AVX-512] Correctly preserve the passthru semantics of the FMA scalar intrinsics
Summary:
Scalar intrinsics have specific semantics about the which input's upper bits are passed through to the output. The same input is also supposed to be the input we use for the lower element when the mask bit is 0 in a masked operation. We aren't currently keeping these semantics with instruction selection.

This patch corrects this by introducing new scalar FMA ISD nodes that indicate whether operand 1(one of the multiply inputs) or operand 3(the additon/subtraction input) should pass thru its upper bits.

We use this information to select 213/132 form for the operand 1 version and the 231 form for the operand 3 version.

We also use this information to suppress combining FNEG operations on the passthru input since semantically the passthru bits aren't negated. This is stronger than the earlier check added for a user being SELECTS so we can remove that.

This fixes PR30913.

Reviewers: delena, zvi, v_klochkov

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D27144

llvm-svn: 289190
2016-12-09 06:42:28 +00:00
Craig Topper 6413f8a8f2 [X86] Remove scalar logical op alias instructions. Just use COPY_FROM/TO_REGCLASS and the normal packed instructions instead
Summary:
This patch removes the scalar logical operation alias instructions. We can just use reg class copies and use the normal packed instructions instead. This removes the need for putting these instructions in the execution domain fixing tables as was done recently.

I removed the loadf64_128 and loadf32_128 patterns as DAG combine creates a narrower load for (extractelt (loadv4f32)) before we ever get to isel.

I plan to add similar patterns for AVX512DQ in a future commit to allow use of the larger register class when available.

Reviewers: spatel, delena, zvi, RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D27401

llvm-svn: 288771
2016-12-06 04:58:39 +00:00
Simon Pilgrim a3af79678e [X86] Generalize CVTTPD2DQ/CVTTPD2UDQ and CVTDQ2PD/CVTUDQ2PD opcodes. NFCI
Replace the CVTTPD2DQ/CVTTPD2UDQ and CVTDQ2PD/CVTUDQ2PD opcodes with general versions.

This is an initial step towards similar FP_TO_SINT/FP_TO_UINT and SINT_TO_FP/UINT_TO_FP lowering to AVX512 CVTTPS2QQ/CVTTPS2UQQ and CVTQQ2PS/CVTUQQ2PS with illegal types.

Differential Revision: https://reviews.llvm.org/D27072

llvm-svn: 287870
2016-11-24 12:13:46 +00:00
Ayman Musa 46af8f9c6f [X86][AVX512] Add patterns for all variants of VMOVSS/VMOVSD instructions.
Differential Revision: https://reviews.llvm.org/D26022

llvm-svn: 286758
2016-11-13 14:29:32 +00:00
Craig Topper f334ac19ad [AVX-512] Add lowering to cvttpd2udq/cvttps2udq for fptoui v2f64/2f32 to 2i32
This patch adds support for fptoui to 2i32 from both 2f64 and 2f32, building on Simon's change for the signed version in r284459 and using AVX-512 instructions.

If we don't have VLX support we need to use a 512-bit operation for v2f64->v2i32 and extract the result.

It also recognises that cvttpd2udq zeroes the upper 64-bits of the xmm result.

Differential Revision: https://reviews.llvm.org/D26331

llvm-svn: 286345
2016-11-09 07:48:51 +00:00
Elena Demikhovsky caaceef4b3 Expandload and Compressstore intrinsics
2 new intrinsics covering AVX-512 compress/expand functionality.
This implementation includes syntax, DAG builder, operation lowering and tests.
Does not include: handling of illegal data types, codegen prepare pass and the cost model.

llvm-svn: 285876
2016-11-03 03:23:55 +00:00
Simon Pilgrim 4ddc92b6cd [X86][SSE] Add lowering to cvttpd2dq/cvttps2dq for sitofp v2f64/2f32 to 2i32
As discussed on PR28461 we currently miss the chance to lower "fptosi <2 x double> %arg to <2 x i32>" to cvttpd2dq due to its use of illegal types.

This patch adds support for fptosi to 2i32 from both 2f64 and 2f32.

It also recognises that cvttpd2dq zeroes the upper 64-bits of the xmm result (similar to D23797) - we still don't do this for the cvttpd2dq/cvttps2dq intrinsics - this can be done in a future patch.

Differential Revision: https://reviews.llvm.org/D23808

llvm-svn: 284459
2016-10-18 07:42:15 +00:00
Elena Demikhovsky 5b10aa1f1e DAG: Setting Masked-Expand-Load as a variant of Masked-Load node
Masked-expand-load node represents load operation that loads a variable amount of elements from memory according to amount of "true" bits in the mask and expands the loaded elements according to their position in the mask vector.
Right now, the node is used in intrinsics for VEXPAND* instructions. 
The work is done towards implementation of masked.expandload and masked.compressstore intrinsics.

Differential Revision: https://reviews.llvm.org/D25322

llvm-svn: 283694
2016-10-09 10:48:52 +00:00
Craig Topper 871da8ebea [X86] Remove unused PatFrags. NFC
llvm-svn: 283523
2016-10-07 06:54:39 +00:00
Peter Collingbourne 2261d78cd2 Target: Remove unused patterns and transforms. NFC.
llvm-svn: 283515
2016-10-07 00:30:49 +00:00
Ayman Musa d7a5ed4141 [X86][avx512] Fix bug in masked compress store.
Differential Revision: https://reviews.llvm.org/D23984

llvm-svn: 282381
2016-09-26 06:22:08 +00:00
Craig Topper aeca0460f3 [AVX-512] Split scalar version of X86ISD::SELECT into a separate opcode because isel is not robust with multiple type profiles for the same opcode.
llvm-svn: 282340
2016-09-24 21:42:47 +00:00
Craig Topper 7e664dad60 [AVX-512] Remove the patterns for selecting scalar VCOMI/VUCOMI instructions with SAE as there is no way to create the pattern.
llvm-svn: 282339
2016-09-24 21:42:43 +00:00
Craig Topper a02e394872 [AVX-512] Split X86ISD::VFPROUND and X86ISD::VFPEXT into separate opcodes for each type constraint.
This revealed that scalar intrinsics could create nodes with a rounding mode of FROUND_CUR_DIRECTION, but the patterns didn't check for it. It just worked because isel doesn't check operand count and we had a pattern without the rounding mode argument at all.

llvm-svn: 282231
2016-09-23 06:24:43 +00:00
Craig Topper 3174b6e467 [AVX-512] Add separate ISD opcodes for each form of CVT instructions. Don't reuse non-X86 ISD opcodes with extra X86 specific arguments.
llvm-svn: 282230
2016-09-23 06:24:39 +00:00
Craig Topper d70ec9b25e [AVX-512] Use different ISD opcodes for some of the scalar intrinsic lowering. Isel is not very robust against using the same ISD opcode with different number of operands so its better to separate.
llvm-svn: 282229
2016-09-23 06:24:35 +00:00
Craig Topper 29f1a1f834 [AVX-512] Split the 3 different usages of the X86ISD::FSETCC opcode into 3 different opcodes.
It turns out isel is really not robust against having different type profiles for the same opcode. It turns out that if you put an illegal rounding mode(i.e. not CUR_DIRECTION or NO_EXC) on a comiss intrinsic we would generate the FSETCC form with the rounding mode added, but then pattern match to an instruction with ROUND_CUR_DIRECTION.

We can probably get away with just one FSETCCM opcode that always contains the rounding mode and explicitly put ROUND_CUR_DIRECTION in the pattern, but I'll leave that for future work.

With this change the clang tests for the comiss intrinsics that used an incorrect rounding mode of 3 properly fail isel instead of silently doing the wrong thing. Those clang tests will be fixed in a follow up commit and I also plan to add rounding mode checking to clang.

llvm-svn: 282055
2016-09-21 06:37:54 +00:00
Craig Topper d868870f17 [AVX-512] Don't add an additional rounding mode operand to the avx512 vcvtps2ph intrinsic lowering.
There was no way to control its value so it was always FROUND_CURRENT making it unnecessary. The true rounding mode is encoded in the immediate operand of the instruction.

This also removes the pattern from the rb form of the instructions since there is no way to specify the FROUND_NO_EXC rounding mode it required.

llvm-svn: 282052
2016-09-21 03:58:44 +00:00
Craig Topper e18258dc1c [AVX-512] Don't lower avx512 vcvtps2ph/vcvtph2ps nodes to ISD::FP16_TO_FP/ISD::FP_TO_FP16 with an extra x86 specific rounding mode operand. We should use a target specific ISD opcode.
llvm-svn: 282046
2016-09-21 02:05:22 +00:00
Elena Demikhovsky b906df9fe5 AVX-512: Fix for PR28175 - Scalar code optimization.
Optimized (truncate (assertzext x) to i1) and anyext i1 to i8/16/32.
Optimization of this patterns is a one more step towards i1 optimization on AVX-512.

Differential Revision: https://reviews.llvm.org/D24456

llvm-svn: 281302
2016-09-13 07:57:00 +00:00
Craig Topper 4fa3b50fc3 [AVX-512] Fix masked VPERMI2PS isel when the index comes from a bitcast.
We need to bitcast the index operand to a floating point type so that it matches the result type. If not then the passthru part of the DAG will be a bitcast from the index's original type to the destination type. This makes it very difficult to match. The other option would be to add 5 sets of patterns for every other possible type.

llvm-svn: 280696
2016-09-06 06:56:59 +00:00
Craig Topper e3807febd8 [X86] Remove FsVMOVAPSrm/FsVMOVAPDrm/FsMOVAPSrm/FsMOVAPDrm. Due to their placement in the td file they had lower precedence than (V)MOVSS/SD and could almost never be selected.
The only way to select them was in AVX512 mode because EVEX VMOVSS/SD was below them and the patterns weren't qualified properly for AVX only. So if you happened to have an aligned FR32/FR64 load in AVX512 you could get a VEX encoded VMOVAPS/VMOVAPD.

I tried to search back through history and it seems like these instructions were probably unselectable for at least 5 years, at least to the time the VEX versions were added. But I can't prove they ever were.

llvm-svn: 280644
2016-09-05 02:20:49 +00:00
Craig Topper f8ad647b93 [X86] Strengthen some SDNode type constraints.
llvm-svn: 280463
2016-09-02 04:25:33 +00:00
Craig Topper 850feaf3b7 [AVX-512] Add support for selecting 512-bit VPABSB/VPABSW when BWI is available.
llvm-svn: 279951
2016-08-28 22:20:51 +00:00
Craig Topper 80c8b80919 [X86] Mark some of the X86 SDNodes as commutative.
llvm-svn: 278653
2016-08-15 04:47:30 +00:00
Craig Topper dbc387cfc9 [X86] X86ISD::FANDN is not commutative or associative.
llvm-svn: 278652
2016-08-15 04:47:28 +00:00
Craig Topper 05629d05c7 [X86] Replace CodeGenOnly VPSRAVW/D/Q_Int instructions with patterns since the operand types exactly match the normal VPSRAVW/D/Q instructions.
llvm-svn: 276555
2016-07-24 07:32:45 +00:00
Ahmed Bougacha 0851ecd1b0 [X86] Remove dead ISD opcodes. NFC.
llvm-svn: 273716
2016-06-24 20:37:55 +00:00
Igor Breger e59165ca63 [AVX512] [AVX512/AVX][Intrinsics] Fix Variable Bit Shift Right Arithmetic intrinsic lowering.
Differential Revision: http://reviews.llvm.org/D20897

llvm-svn: 273138
2016-06-20 07:05:43 +00:00
Sanjay Patel 977530a8c9 [x86, SSE] change patterns for CMPP to float types to allow matching with SSE1 (PR28044)
This patch is intended to solve:
https://llvm.org/bugs/show_bug.cgi?id=28044

By changing the definition of X86ISD::CMPP to use float types, we allow it to be created 
and pass legalization for an SSE1-only target where v4i32 is not legal.

The motivational trail for this change includes:
https://llvm.org/bugs/show_bug.cgi?id=28001

and eventually makes this trigger:
http://reviews.llvm.org/D21190

Ie, after this step, we should be free to have Clang generate FP compare IR instead of x86
intrinsics for SSE C packed compare intrinsics. (We can auto-upgrade and remove the LLVM 
sse.cmp intrinsics as a follow-up step.) Once we're generating vector IR instead of x86
intrinsics, a big pile of generic optimizations can trigger.

Differential Revision: http://reviews.llvm.org/D21235

llvm-svn: 272511
2016-06-12 15:03:25 +00:00
Simon Pilgrim e85506b6e0 [X86][XOP] Support for VPERMIL2PD/VPERMIL2PS 2-input shuffle instructions
This patch begins adding support for lowering to the XOP VPERMIL2PD/VPERMIL2PS shuffle instructions - adding the X86ISD::VPERMIL2 opcode and cleaning up the usage.

The internal llvm intrinsics were assuming the shuffle mask operand was the same type as the float/double input operands (I guess to simplify the intrinsic definitions in X86InstrXOP.td to a single value type). These needed changing to integer types (matching the clang builtin and the AMD intrinsics definitions), an auto upgrade path is added to convert old calls.

Mask decoding/target shuffle support will be added in future patches.

Differential Revision: http://reviews.llvm.org/D20049

llvm-svn: 271633
2016-06-03 08:06:03 +00:00