210 Commits

Author SHA1 Message Date
Craig Topper f57e17def0 [AVX-512] Remove intrinsics for valignd/q and autoupgrade them to native shuffles.
llvm-svn: 287744
2016-11-23 06:54:55 +00:00
Simon Pilgrim b57dd17142 [X86][AVX512] Autoupgrade lossless i32/u32 to f64 conversion intrinsics with generic IR
Both the (V)CVTDQ2PD (i32 to f64) and (V)CVTUDQ2PD (u32 to f64) conversion instructions are lossless and can be safely represented as generic SINT_TO_FP/UINT_TO_FP calls instead of x86 intrinsics without affecting final codegen.

LLVM counterpart to D26686

Differential Revision: https://reviews.llvm.org/D26736

llvm-svn: 287108
2016-11-16 14:48:32 +00:00
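The losslessness claim in the commit above can be checked with a small sketch. It is an illustration only, not code from the patch: the helper names are hypothetical, and it relies on Python floats being IEEE-754 binary64, whose 53-bit significand holds any 32-bit integer exactly.

```python
# Hypothetical helpers (not from the LLVM patch): check that 32-bit integers
# survive a round trip through a double, which is what makes a generic
# SINT_TO_FP/UINT_TO_FP an exact replacement for the x86 intrinsics.

def i32_to_f64_roundtrips(x: int) -> bool:
    """True if signed 32-bit x converts to double and back unchanged."""
    assert -2**31 <= x <= 2**31 - 1
    return int(float(x)) == x  # Python float is IEEE-754 binary64

def u32_to_f64_roundtrips(x: int) -> bool:
    """True if unsigned 32-bit x converts to double and back unchanged."""
    assert 0 <= x <= 2**32 - 1
    return int(float(x)) == x

I32_EDGES = [-2**31, -1, 0, 1, 2**31 - 1]
U32_EDGES = [0, 1, 2**31, 2**32 - 1]

all_i32_exact = all(i32_to_f64_roundtrips(v) for v in I32_EDGES)
all_u32_exact = all(u32_to_f64_roundtrips(v) for v in U32_EDGES)

# For contrast: a 64-bit value above 2**53 is not always representable.
wide_value_exact = int(float(2**53 + 1)) == 2**53 + 1
```

The contrast case shows why the same argument does not carry over to 64-bit integer sources, where rounding can occur.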
Craig Topper 353e59b6d6 [AVX-512] Remove and autoupgrade masked dword/qword variable shift intrinsics to the new unmasked versions and selects.
llvm-svn: 286786
2016-11-14 01:53:22 +00:00
Ayman Musa 46af8f9c6f [X86][AVX512] Add patterns for all variants of VMOVSS/VMOVSD instructions.
Differential Revision: https://reviews.llvm.org/D26022

llvm-svn: 286758
2016-11-13 14:29:32 +00:00
Craig Topper 43e97649a1 [AVX-512] Add unmasked intrinsics for variable shifts of dwords and qwords.
These will be used to replace the masked intrinsics so that InstCombineCalls can optimize the AVX-512 variable shifts the same way it does for AVX2.

llvm-svn: 286754
2016-11-13 07:26:15 +00:00
Craig Topper 706d897d8a [AVX-512] Move masked shift intrinsics tests to the autoupgrade test file. These missed being moved in r286725.
llvm-svn: 286746
2016-11-13 03:42:27 +00:00
Craig Topper da6a63db1c [AVX-512] Remove the remaining masked shift by immediate or by single value. Autoupgrade them to recently introduced unmasked versions and a select.
After this I'll add the unmasked intrinsics to InstCombineCalls to finish making our handling of these types of shuffles consistent between AVX-512 and the legacy intrinsics.

llvm-svn: 286725
2016-11-12 18:04:46 +00:00
Craig Topper 9d25c5e2fa [AVX-512] Add unmasked version of shift by immediate and shift by single element in XMM.
Summary:
This is the first step towards being able to add the avx512 shift by immediate intrinsics to InstCombineCalls where we already support the sse2 and avx2 intrinsics. We need the unmasked versions so we can avoid having to teach InstCombineCalls that it would need to insert selects sometimes. Instead we'll just add the selects around the new intrinsics in the frontend.

This change should also enable the shift by i32 intrinsics to take a non-constant shift value just like the avx2 and sse intrinsics. This will enable us to fix PR30691 once we update clang.

Next I'll switch clang to use the new builtins. Then we'll come back to the backend and remove/autoupgrade the old intrinsics. Then I'll work on the same series for variable shifts.

Reviewers: RKSimon, zvi, delena

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D26333

llvm-svn: 286711
2016-11-12 05:28:24 +00:00
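The idea of expressing a masked intrinsic as an unmasked operation followed by a select can be sketched as a lane-by-lane model. This is a simplified illustration, not LLVM code: the function names are hypothetical, vectors are plain lists of 32-bit lanes, and the mask is an integer whose bit i controls lane i.

```python
LANE_BITS = 32
LANE_MASK = (1 << LANE_BITS) - 1

def unmasked_shl_d(lanes, count):
    """Unmasked logical left shift of every 32-bit lane by one shared count.
    Counts of 32 or more produce zero, as with the x86 logical shifts."""
    if count >= LANE_BITS:
        return [0] * len(lanes)
    return [(x << count) & LANE_MASK for x in lanes]

def select(mask, on_true, on_false):
    """Per-lane select: lane i comes from on_true when mask bit i is set."""
    return [t if (mask >> i) & 1 else f
            for i, (t, f) in enumerate(zip(on_true, on_false))]

def masked_shl_d(lanes, count, passthru, mask):
    """Masked shift modelled as the unmasked shift wrapped in a select."""
    return select(mask, unmasked_shl_d(lanes, count), passthru)

# Merge-masking keeps passthru lanes; zero-masking uses an all-zero passthru.
merged = masked_shl_d([1, 2, 3, 4], 4, [9, 9, 9, 9], 0b0101)
zeroed = masked_shl_d([1, 2, 3, 4], 4, [0, 0, 0, 0], 0b0101)
```

With this split, an optimizer only has to understand the unmasked shift, and the select is ordinary IR that it already knows how to handle.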
Craig Topper b110e04851 [AVX-512] Remove masked pmovzx/pmovsx builtins and autoupgrade them to selects and native zext/sext.
This mostly reuses earlier autoupgrade support for the sse and avx equivalents. Just needed to add the code to add the select.

llvm-svn: 286092
2016-11-07 02:12:57 +00:00
Craig Topper 812d3d30ae [AVX-512] Add scalar vfmsub/vfnmsub mask3 intrinsics
Summary: Clang's intrinsic header currently tries to negate the third operand of a vfmadd mask3 in order to create vfmsub, but this fails isel. This patch adds scalar vfmsub and vfnmsub mask3 that we can use instead to avoid the negate. This is consistent with the packed instructions.

Reviewers: igorb, delena

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D25933

llvm-svn: 285173
2016-10-26 04:59:58 +00:00
Craig Topper 8ec5c7326d [AVX-512] Remove masked pmin/pmax intrinsics and autoupgrade to native IR.
Clang patch to replace 512-bit vector and 64-bit element versions with native IR will follow.

llvm-svn: 284955
2016-10-24 04:04:16 +00:00
Craig Topper b084c90a18 [X86] Add support for printing shuffle comments for VALIGN instructions.
llvm-svn: 284915
2016-10-22 06:51:56 +00:00
Simon Pilgrim ca3072ac58 [X86][AVX512] Add mask/maskz writemask support to constant pool shuffle decode comments
llvm-svn: 284488
2016-10-18 15:45:37 +00:00
Craig Topper 72b9f9864f [AVX-512] Add test case to check shuffle decoding for masked vpermilps for r284450.
This is harder to do for vpermilpd as shuffle combining turns the constant vector into an immediate since all vpermilpd's inputs with constant vector can also be encoded with the immediate form.

llvm-svn: 284455
2016-10-18 05:44:04 +00:00
Craig Topper 4729fe8bb6 [AVX-512] Correct execution domain for VPERMT2PS and VPERMI2PS.
llvm-svn: 284328
2016-10-16 04:54:31 +00:00
Craig Topper 61403201ea [X86,AVX-512] Use INSERT_SUBREG instead of SUBREG_TO_REG when the input is not the output of an instruction.
SUBREG_TO_REG is supposed to indicate that the super register has been zeroed, but we can't prove that if we don't know where it came from.

llvm-svn: 281885
2016-09-19 02:53:43 +00:00
Elena Demikhovsky 0569d9d588 AVX-512: Fixed a bug in kortest.z intrinsic
Lowering was wrong - X86ISD::SETCC node should return i8 type.

llvm-svn: 281446
2016-09-14 08:06:54 +00:00
Craig Topper 4e2d5a43cf [X86] Remove the VCVTSI2SD32 with rounding intrinsic. It's not used by clang and not needed since 32-bit integer to double is always exact.
llvm-svn: 281442
2016-09-14 06:27:46 +00:00
Craig Topper 4619c9e6a8 [X86] Remove masked shufpd/shufps intrinsics and autoupgrade to native vector shuffles. They were removed from clang previously but accidentally left in the backend.
llvm-svn: 281300
2016-09-13 07:40:53 +00:00
Craig Topper d9ca3d97ef [AVX-512] Simplify X86InstrInfo::copyPhysReg for 128/256-bit vectors with AVX512, but not VLX. We should use the VEX opcodes and trust the register allocator to not use the extended XMM/YMM register space.
Previously we extended the copy to the whole ZMM register. The register allocator shouldn't use XMM16-31 or YMM16-31 in this configuration as the instructions to spill them aren't available.

llvm-svn: 280648
2016-09-05 06:43:06 +00:00
Craig Topper af0d63d2e7 [AVX-512] Remove masked integer add/sub/mull intrinsics and upgrade to native IR.
llvm-svn: 280611
2016-09-04 02:09:53 +00:00
Michael Kuperstein 2ee911e985 Revert r274613 because it breaks the test suite with AVX512
This reverts most of r274613 (AKA r274626) and its follow-ups (r276347, r277289),
due to miscompiles in the test suite. The FastISel change was left in, because
it apparently fixes an unrelated issue.

(Recommit of r279782 which was broken due to a bad merge.)

This fixes 4 out of the 5 test failures in PR29112.

llvm-svn: 279788
2016-08-25 22:48:11 +00:00
Michael Kuperstein 6e271f4ce8 Revert r279782 due to debug buildbot breakage.
llvm-svn: 279785
2016-08-25 22:14:45 +00:00
Michael Kuperstein a6ccc8d365 Revert r274613 because it breaks the test suite with AVX512
This reverts most of r274613 and its follow-ups (r276347, r277289), due to
miscompiles in the test suite. The FastISel change was left in, because it
apparently fixes an unrelated issue.

This fixes 4 out of the 5 test failures in PR29112.

llvm-svn: 279782
2016-08-25 21:55:41 +00:00
Elena Demikhovsky dca03bebd3 AVX-512: Changed lowering of BITCAST between i1 vectors and i8/i16/i32 integer values
Optimized lowering of BITCAST node. The BITCAST node can be replaced with COPY_TO_REG instead of KMOV.
This makes it possible to eliminate two opposite BITCAST operations and avoid redundant "movs".

Differential Revision: https://reviews.llvm.org/D23247

llvm-svn: 277958
2016-08-07 13:05:58 +00:00
Craig Topper 05948fb36c [AVX-512] Correct ExeDomain for many AVX-512 instructions.
llvm-svn: 277416
2016-08-02 05:11:15 +00:00
Craig Topper ddc96cd33d [X86] Regenerate a test to pick up shuffle comments that were added at some point.
llvm-svn: 277326
2016-08-01 07:55:24 +00:00
Craig Topper c7de3a1018 [AVX512] Remove the intrinsic forms of VMOVSS/VMOVSD. We don't need two different forms of 'rr' and 'rm'. This matches SSE/AVX.
I'm not convinced the pattern for the rm_Int was correct anyway. It had a tied source that shouldn't exist for the unmasked version. The load form of MOVSS always zeros the most significant bits. I've left the patterns off the masked load instructions as I'm not sure what the correct pattern should be and we don't have any tests currently. Nor do we implement masked scalar load intrinsics in clang currently.

llvm-svn: 277098
2016-07-29 02:49:08 +00:00
Craig Topper f4151bea72 [AVX512] Add initial support for the Execution Domain fixing pass to change some EVEX instructions.
llvm-svn: 276393
2016-07-22 05:00:52 +00:00
Craig Topper a6e6febe2c [AVX512] Remove masked logic op intrinsics and autoupgrade them to native IR.
llvm-svn: 275155
2016-07-12 05:27:53 +00:00
Craig Topper 70610cf7b6 [X86] Remove and autoupgrade 512-bit non-temporal store intrinsics.
llvm-svn: 274966
2016-07-09 04:38:27 +00:00
Matthias Braun 152e7c8b12 VirtRegMap: Replace some identity copies with KILL instructions.
An identity COPY like this:
   %AL = COPY %AL, %EAX<imp-def>
has no semantic effect, but encodes liveness information: Further users
of %EAX only depend on this instruction even though it does not define
the full register.

Replace the COPY with a KILL instruction in those cases to maintain this
liveness information. (This reverts a small part of r238588 but this
time adds a comment explaining why a KILL instruction is useful).

llvm-svn: 274952
2016-07-09 00:19:07 +00:00
Craig Topper f7bf6de0af [AVX512] Remove and autoupgrade a duplicate set of 512-bit masked shift intrinsics.
I'm not sure if clang ever used these builtin names or not.

llvm-svn: 274827
2016-07-08 06:14:47 +00:00
Michael Kuperstein 3e3652aef2 Recommit r274692 - [X86] Transform setcc + movzbl into xorl + setcc
xorl + setcc is generally the preferred sequence due to the partial register
stall setcc + movzbl suffers from. As a bonus, it also encodes one byte smaller.
This fixes PR28146.

The original commit tried inserting an 8bit-subreg into a GR32 (not GR32_ABCD)
which was not appreciated by fast regalloc on 32-bit.

llvm-svn: 274802
2016-07-07 22:50:23 +00:00
Michael Kuperstein edb38a94f8 Revert r274692 to check whether this is what breaks windows selfhost.
llvm-svn: 274771
2016-07-07 16:55:35 +00:00
Michael Kuperstein 1ef6c59b1d [X86] Transform setcc + movzbl into xorl + setcc
xorl + setcc is generally the preferred sequence due to the partial register
stall setcc + movzbl suffers from. As a bonus, it also encodes one byte smaller.

This fixes PR28146.

Differential Revision: http://reviews.llvm.org/D21774

llvm-svn: 274692
2016-07-06 21:56:18 +00:00
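The equivalence of the two sequences in the commit above can be sketched with a toy register model. This is an illustration under stated assumptions, not code from the patch: the function names are hypothetical, a register is a 32-bit integer, and the condition is passed in as an already-computed boolean, so the model ignores EFLAGS. On real hardware xorl clobbers the flags and therefore has to be placed before the flag-producing instruction.

```python
MASK32 = 0xFFFFFFFF

def setcc(reg, cond):
    """Partial write: store 0 or 1 in the low byte, keep bits 8..31 as they were."""
    return (reg & ~0xFF & MASK32) | (1 if cond else 0)

def movzbl(reg):
    """Zero-extend the low byte to the full 32-bit register."""
    return reg & 0xFF

def xorl_self(reg):
    """xorl %reg, %reg: clear the full register."""
    return 0

def setcc_then_movzbl(stale, cond):
    """Old sequence: partial write first, then clear the upper bits."""
    return movzbl(setcc(stale, cond))

def xorl_then_setcc(stale, cond):
    """New sequence: clear the register first, then do the partial write."""
    return setcc(xorl_self(stale), cond)

STALE_VALUES = [0x00000000, 0xDEADBEEF, 0xFFFFFFFF, 0x000000FF]
sequences_agree = all(
    setcc_then_movzbl(s, c) == xorl_then_setcc(s, c)
    for s in STALE_VALUES
    for c in (False, True)
)
```

Both orderings leave the same value in the register; the difference the commit cares about is performance and encoding size, which this model does not capture.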
Elena Demikhovsky ad0a56f3da Re-commit of 274613.
The previous commit failed to compile.
A minor change in one pattern in lib/Target/X86/X86InstrAVX512.td fixes the failure.

llvm-svn: 274626
2016-07-06 14:15:43 +00:00
Elena Demikhovsky 02ced295aa Reverted 274613 due to compilation failure.
llvm-svn: 274615
2016-07-06 09:11:49 +00:00
Elena Demikhovsky 5a4f2476fd AVX-512: Optimization for patterns with i1 scalar type
The patch removes redundant kmov instructions (not all, we still have a lot of work here) and redundant "and" instructions after "setcc".
I use "AssertZero" marker between X86ISD::SETCC node and "truncate" to eliminate extra "and $1" instruction.
I also changed zext, aext and trunc patterns in the .td file. This allows removing extra "kmov" instructions.

This patch fixes https://llvm.org/bugs/show_bug.cgi?id=28173.

Fast ISEL mode is not supported correctly for AVX-512. ICMP/FCMP scalar instruction should return result in k-reg. It will be fixed in one of the next patches. I redirected handling of "cmp" to the DAG builder mode. (The code looks worse in one specific test case, but without this fix the new patch fails).

Differential revision: http://reviews.llvm.org/D21956

llvm-svn: 274613
2016-07-06 09:01:20 +00:00
Simon Pilgrim 4e96fbf3c1 [X86][AVX512] Autoupgrade the BROADCAST intrinsics
llvm-svn: 274550
2016-07-05 13:58:47 +00:00
Simon Pilgrim 02d435d2f4 [X86][AVX512] Autoupgrade the VPERMPD/VPERMQ intrinsics
llvm-svn: 274506
2016-07-04 14:19:05 +00:00
Simon Pilgrim 9fca300cbe [X86][AVX512] Autoupgrade the VPERMILPD/VPERMILPS intrinsics
llvm-svn: 274498
2016-07-04 12:40:54 +00:00
Simon Pilgrim 68ea80649b [X86][AVX512] Add support for VPERMPD/VPERMQ masked shuffle comments
llvm-svn: 274469
2016-07-03 18:40:24 +00:00
Simon Pilgrim a0d73835b2 [X86][AVX512] Add support for 512-bit shuffle decoding of VPERMPD/VPERMQ
llvm-svn: 274468
2016-07-03 18:27:37 +00:00
Simon Pilgrim 1f59076196 [X86][AVX512] Add support for VPERM/VSHUF masked shuffle comments
llvm-svn: 274462
2016-07-03 13:55:41 +00:00
Simon Pilgrim 68f438a036 [X86][AVX512] Add support for PMOVZX masked shuffle comments
llvm-svn: 274461
2016-07-03 13:33:28 +00:00
Simon Pilgrim 19adee9d84 [X86][AVX512] Autoupgrade the MOVDDUP/MOVSLDUP/MOVSHDUP intrinsics
llvm-svn: 274439
2016-07-02 14:42:35 +00:00
Craig Topper 597aa42fec [AVX512] Remove masked unpack intrinsics and autoupgrade to vectorshuffle and selects.
llvm-svn: 273543
2016-06-23 07:37:33 +00:00
Craig Topper 283418fbb6 [AVX512] Add patterns for any-extending a mask that use the def of KMOVW/KMOVB without going through an EXTRACT_SUBREG and a MOVZX.
llvm-svn: 273253
2016-06-21 07:37:32 +00:00
Craig Topper 0a0fb0fda1 [AVX512] Remove the masked vpcmpeq/vcmpgt intrinsics and autoupgrade them to native icmps.
llvm-svn: 273240
2016-06-21 03:53:24 +00:00