Craig Topper
08a6857c82
[AVX512] Fix a copy/paste mistake I made in a comment.
...
llvm-svn: 270331
2016-05-21 22:50:04 +00:00
Michael Zuckerman
11b55b29d1
[Clang][AVX512][intrinsics] Fix vscalef intrinsics.
...
Differential Revision: http://reviews.llvm.org/D20324
llvm-svn: 270321
2016-05-21 11:09:53 +00:00
Craig Topper
02626c076b
[AVX512] Add patterns for VEXTRACT v16i16->v8i16 and v32i8->v16i8. Disable AVX2 versions of vector extract when AVX512VL is enabled.
...
llvm-svn: 270318
2016-05-21 07:08:56 +00:00
Craig Topper
19e04b6430
[X86] Generalize and combine some similar type constraints and node types. No changes to the isel table size so the separation wasn't buying us anything.
...
llvm-svn: 270026
2016-05-19 06:13:58 +00:00
Craig Topper
74ed087b0b
[AVX512] Strengthen type checks on the X86ISD::SELECT node. Saves over 800 bytes in the DAG isel table by removing type checks for the condition operand, which is always a vector or scalar of i1 matching the number of elements in the other operands.
...
llvm-svn: 269885
2016-05-18 06:55:59 +00:00
Craig Topper
a58abd1cc6
[AVX512] Fix up types for arguments of int_x86_avx512_mask_cvtsd2ss_round and int_x86_avx512_mask_cvtss2sd_round. Only the argument being converted should be a different type. The other 2 arguments should have the same type as the result.
...
llvm-svn: 268891
2016-05-09 05:34:12 +00:00
Craig Topper
707c89c00d
[AVX512] Add non-temporal store patterns for v16i32/v32i16/v64i8.
...
llvm-svn: 268889
2016-05-08 23:43:17 +00:00
Craig Topper
c41320d700
[AVX512] Add missing patterns for non-temporal stores of 128/256-bit vXi8/vXi16/vXi32 when VLX is enabled. The equivalent AVX1/2 patterns are disabled by VLX.
...
This caused regular stores to be emitted instead.
llvm-svn: 268886
2016-05-08 23:08:45 +00:00
Craig Topper
e5ce84a33c
[AVX512] Add VLX 128/256-bit SET0 operations that encode to 128/256-bit EVEX encoded VPXORD so all 32 registers can be used.
...
llvm-svn: 268884
2016-05-08 21:33:53 +00:00
Craig Topper
9d9251b86f
[X86] Remove extra patterns that check for BUILD_VECTOR of all 0s. These are always canonicalized to v4i32/v8i32/v16i32, except with SSE1 only, where only v4f32 is supported.
...
llvm-svn: 268880
2016-05-08 20:10:20 +00:00
Igor Breger
58c07806ae
[AVX512] Add support for commutative MAX/MIN. In general, the VMAX{PS,PD} and VMIN{PS,PD} instructions are not commutative. In the combine pass, VMAX/VMIN are converted to the commutative nodes VMAXC/VMINC only if UnsafeFPMath is used.
...
Differential Revision: http://reviews.llvm.org/D19860
llvm-svn: 268375
2016-05-03 11:51:45 +00:00
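To illustrate why VMAX/VMIN are not commutative in general, the following Python sketch models one lane of the x86 MAX/MIN operations as Intel documents them: when either operand is NaN, or both operands are zeros of either sign, the second source operand is returned. The names `x86_max` and `x86_min` are hypothetical, chosen for illustration only; this is a model of the instruction semantics, not LLVM code.

```python
import math


def x86_max(src1, src2):
    """Model one lane of MAXPS/MAXPD: src1 wins only if strictly greater."""
    # Any comparison with NaN is false, and 0.0 > -0.0 is false,
    # so in both of those cases the second operand is returned.
    return src1 if src1 > src2 else src2


def x86_min(src1, src2):
    """Model one lane of MINPS/MINPD: src1 wins only if strictly smaller."""
    return src1 if src1 < src2 else src2


nan = float("nan")

# Swapping the operands changes the result when a NaN is involved...
assert x86_max(nan, 1.0) == 1.0
assert math.isnan(x86_max(1.0, nan))

# ...and when the operands are zeros of opposite sign.
assert math.copysign(1.0, x86_max(0.0, -0.0)) == -1.0
assert math.copysign(1.0, x86_max(-0.0, 0.0)) == 1.0

# For ordinary values the operation does commute.
assert x86_max(2.0, 3.0) == x86_max(3.0, 2.0) == 3.0
```

Only the NaN and signed-zero cases depend on operand order, which is presumably why the commutative nodes are restricted to UnsafeFPMath.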
Craig Topper
b6da65403a
[AVX512] VPACKUSWB/VPACKSSWB should not be encoded with EVEX.W=1. While there, fix the execution domain for VPACKSSDW/VPACKUSDW.
...
llvm-svn: 268200
2016-05-01 17:38:32 +00:00
Igor Breger
131008fbcb
Change AVX512 broadcastsd/ss patterns' interaction with spilling. The new implementation takes a scalar register and generates a vector without COPY_TO_REGCLASS (which turns it into a VR128 register). The issue is that during register allocation we may spill a scalar value using 128-bit loads and stores, wasting cache bandwidth.
...
Differential Revision: http://reviews.llvm.org/D19579
llvm-svn: 268190
2016-05-01 08:40:00 +00:00
Craig Topper
5acb5a1caf
[AVX512] Add HasVLX to the 128/256-bit versions of VPACKSSDW/USDW/SSWB/USWB and VPMADDUBSW/VPMADDWD.
...
llvm-svn: 268188
2016-05-01 06:24:57 +00:00
Craig Topper
db290664f6
[AVX512] Make sure 128/256-bit DQI versions of VAND/VANDN/VOR/VXOR are also marked as requiring VLX.
...
llvm-svn: 268186
2016-05-01 05:57:06 +00:00
Craig Topper
7ed84d826e
[X86] Remove some redundant selection patterns.
...
llvm-svn: 268180
2016-05-01 04:59:46 +00:00
Craig Topper
c9b1923358
[AVX512] Replace vector_extract with extractelt in some patterns. They mean the same thing but vector_extract is deprecated. NFC
...
llvm-svn: 268179
2016-05-01 04:59:44 +00:00
Craig Topper
99f6b620cc
[AVX512] Add hasSideEffects/mayLoad/mayStore flags to some instructions.
...
llvm-svn: 268174
2016-05-01 01:03:56 +00:00
Elena Demikhovsky
5e426f7356
AVX-512: Load and Extended Load for i1 vectors
...
Implemented load+{sign|zero}_extend for i1 vectors
Fixed failures in i1 vector load.
Covered loading of v2i1, v4i1, v8i1, v16i1, v32i1, v64i1 vectors for KNL and SKX.
Differential Revision: http://reviews.llvm.org/D18737
llvm-svn: 265259
2016-04-03 08:41:12 +00:00
Elena Demikhovsky
95629caaa9
AVX-512: fixed a bug in fp_to_uint pattern on KNL
...
Fixed fp_to_uint instruction selection on KNL.
One pattern was missing for <4 x double> to <4 x i32>
Differential Revision: http://reviews.llvm.org/D18512
llvm-svn: 264701
2016-03-29 06:33:41 +00:00
Igor Breger
999ac754f2
AVX512: Add extract_subvector patterns v8i1->v4i1, v4i1->v2i1.
...
Differential Revision: http://reviews.llvm.org/D17953
llvm-svn: 262929
2016-03-08 15:21:25 +00:00
Igor Breger
f1bd761e00
AVX512: Remove VSHRI kmask patterns from the TD file. It is incorrect to use kshiftw to implement VSHRI on v4i1: bits 15-4 are undef, so the upper bits of the v4i1 result may not be zeroed. v4i1 should first be zero-extended to v16i1 (or any natively supported vector).
...
Differential Revision: http://reviews.llvm.org/D17763
llvm-svn: 262797
2016-03-06 07:46:03 +00:00
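The problem described in the commit above can be sketched with a small Python model of a 16-bit mask register. It assumes a v4i1 value lives in bits 3-0 while bits 15-4 hold arbitrary leftover data. The helper names (`kshiftrw`, `shr_v4i1_naive`, `shr_v4i1_zext`) are hypothetical and only model the behaviour; they are not LLVM or intrinsic APIs.

```python
def kshiftrw(k, n):
    """Model a logical right shift of a 16-bit mask register."""
    return (k & 0xFFFF) >> n


def shr_v4i1_naive(k, n):
    # Shift the whole 16-bit register, then view the low 4 bits as v4i1.
    # Garbage from bits 15-4 is shifted into the result.
    return kshiftrw(k, n) & 0xF


def shr_v4i1_zext(k, n):
    # Zero-extend v4i1 to v16i1 first (clear bits 15-4), then shift.
    return kshiftrw(k & 0xF, n) & 0xF


# Low 4 bits hold the v4i1 value 0b0101; bits 5-4 contain leftover garbage.
reg = 0x0035

assert shr_v4i1_naive(reg, 2) == 0b1101  # upper result bits are polluted
assert shr_v4i1_zext(reg, 2) == 0b0001   # upper result bits are zero
```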
Igor Breger
639fde79b0
AVX512: Combine AND + TESTM instructions.
...
Differential Revision: http://reviews.llvm.org/D17844
llvm-svn: 262621
2016-03-03 14:18:38 +00:00
Craig Topper
c929349912
[X86] Null out some redundant patterns for masked vector register to register moves. These can be accomplished with both aligned and unaligned opcodes.
...
Currently aligned is what is being used so remove the redundant patterns for the unaligned versions. But don't do this for the byte and word vector types since they don't have aligned versions.
llvm-svn: 261985
2016-02-26 06:50:29 +00:00
Igor Breger
45ef10f110
AVX512F: Add GATHER/SCATTER assembler Intel syntax tests for knl/skx/avx. Change memory operand parser handling.
...
Differential Revision: http://reviews.llvm.org/D17564
llvm-svn: 261862
2016-02-25 13:30:17 +00:00
Elena Demikhovsky
e5bbca6ae2
Optimized loading (zextload) of i1 value from memory.
...
This patch is a partial revert of https://llvm.org/svn/llvm-project/llvm/trunk@237793 .
The extra "and" causes performance degradation.
We assume that i1 is stored in zero-extended form; the store operation is responsible for zeroing the upper bits.
Differential Revision: http://reviews.llvm.org/D17541
llvm-svn: 261828
2016-02-25 07:05:12 +00:00
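The invariant stated in the commit above can be sketched as follows, assuming an i1 occupies one byte in memory. `store_i1` and `zextload_i1` are hypothetical names used only to model the idea that the store, not the load, clears the upper bits.

```python
def store_i1(memory, addr, value):
    # The store is responsible for zeroing the upper seven bits of the byte.
    memory[addr] = value & 1


def zextload_i1(memory, addr):
    # No masking "and" on the load side: the byte is already 0 or 1.
    return memory[addr]


mem = bytearray(4)
store_i1(mem, 2, 0xFF)  # only bit 0 of the value is meaningful

assert mem[2] == 1
assert zextload_i1(mem, 2) == 1
```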
Igor Breger
c7ba5699c5
AVX512: Add vpmovzxbw/d/q, vpmovzxw/d/q, vpmovzxbdq lowering patterns that support 256-bit inputs, like the AVX patterns (which are disabled when HasVLX is set; see SS41I_pmovx_avx2_patterns).
...
Differential Revision: http://reviews.llvm.org/D17504
llvm-svn: 261724
2016-02-24 08:15:20 +00:00
Igor Breger
252c2d9680
AVX512F: Add assembler Intel syntax tests for knl, fix minor bugs.
...
Differential Revision: http://reviews.llvm.org/D17498
llvm-svn: 261521
2016-02-22 12:37:41 +00:00
Igor Breger
4511e76e5c
AVX512: Fix scalar mem operands.
...
Differential Revision: http://reviews.llvm.org/D17500
llvm-svn: 261520
2016-02-22 11:48:27 +00:00
Dimitry Andric
db417b6d40
Fix incorrect selection of AVX512 sqrt when OptForSize is on
...
Summary:
When optimizing for size, sqrt calls can be incorrectly selected as
AVX512 VSQRT instructions. This is because X86InstrAVX512.td has a
`Requires<[OptForSize]>` in its `avx512_sqrt_scalar` multiclass
definition. Even if the target does not support AVX512, the class can
apparently still be chosen, leading to an incorrect selection of
`vsqrtss`.
In PR26625, this led to an assertion: Reg >= X86::FP0 && Reg <=
X86::FP6 && "Expected FP register!", because the `vsqrtss` instruction
requires an XMM register, which is not available on i686 CPUs.
Reviewers: grosbach, resistor, joker.eph
Subscribers: spatel, emaste, llvm-commits
Differential Revision: http://reviews.llvm.org/D17414
llvm-svn: 261360
2016-02-19 20:14:11 +00:00
Asaf Badouh
ad5c3fc47d
[X86][AVX512] add intrinsics of Scalar FP to integer conversion with rounding mode
...
Differential Revision: http://reviews.llvm.org/D16629
llvm-svn: 260033
2016-02-07 14:59:13 +00:00
Igor Breger
0aeda37464
AVX512: VPBROADCASTB/W/D/Q from GPR intrinsics implementation.
...
Differential Revision: http://reviews.llvm.org/D16813
llvm-svn: 260024
2016-02-07 08:30:50 +00:00
Simon Pilgrim
7823fd2535
[X86][SSE] Select domain for 32/64-bit partial loads for EltsFromConsecutiveLoads
...
Choose between MOVD/MOVSS and MOVQ/MOVSD depending on the target vector type.
This has a lot fewer test changes than trying to add this to X86InstrInfo::setExecutionDomain.
llvm-svn: 259816
2016-02-04 19:27:51 +00:00
Simon Pilgrim
6788f33cf2
[X86][SSE] Add general 32-bit LOAD + VZEXT_MOVL support to EltsFromConsecutiveLoads
...
This patch adds support for consecutive (load/undef elements) 32-bit loads, followed by trailing undef/zero elements to be combined to a single MOVD load.
Differential Revision: http://reviews.llvm.org/D16729
llvm-svn: 259796
2016-02-04 16:12:56 +00:00
Michael Zuckerman
7d73360479
[AVX512] add vfmadd132ss and vfmadd132sd Intrinsic
...
Differential Revision: http://reviews.llvm.org/D16589
llvm-svn: 259789
2016-02-04 14:41:08 +00:00
Simon Pilgrim
18bcf93efb
[X86][AVX] Add support for 64-bit VZEXT_LOAD of 256/512-bit vectors to EltsFromConsecutiveLoads
...
Follow-up to D16217 and D16729.
This change uncovered an odd pattern where VZEXT_LOAD v4i64 was being lowered to a load of the lower v2i64 (so the 2nd i64 destination element wasn't being zeroed). I can't find any use/reason for this, so I have removed the pattern and replaced it so that only the 1st i64 element is loaded and the upper bits are all zeroed. This matches the description for X86ISD::VZEXT_LOAD.
Differential Revision: http://reviews.llvm.org/D16768
llvm-svn: 259635
2016-02-03 09:41:59 +00:00
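The difference between the removed pattern and the VZEXT_LOAD semantics described in the commit above can be sketched in Python, treating a vector as a list of i64 elements. Both function names are hypothetical, and the assumption that the removed pattern zeroed the upper two elements is mine, not stated in the commit.

```python
def vzext_load(memory, num_elts):
    """Load only the first element and zero every other element."""
    return [memory[0]] + [0] * (num_elts - 1)


def old_v4i64_pattern(memory):
    # Model of the removed pattern: load the lower v2i64 (two elements).
    # Assumption: the upper two elements were zeroed.
    return list(memory[:2]) + [0, 0]


data = [7, 9, 11, 13]

assert vzext_load(data, 4) == [7, 0, 0, 0]
# The 2nd element leaks through in the old lowering.
assert old_v4i64_pattern(data) == [7, 9, 0, 0]
```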
Asaf Badouh
5a3a0231f4
[X86][AVX512VBMI] add encoding and intrinsics for Multishift
...
Differential Revision: http://reviews.llvm.org/D16399
llvm-svn: 259363
2016-02-01 15:48:21 +00:00
Igor Breger
fca0a34398
AVX512: Fix truncate v32i8 to v32i1 lowering implementation.
...
Enable truncation of 128/256-bit packed byte/word vectors with AVX512BW but without AVX512VL by using 512-bit instructions.
Differential Revision: http://reviews.llvm.org/D16531
llvm-svn: 259044
2016-01-28 13:19:25 +00:00
Asaf Badouh
42852d99e7
[X86][AVX512] small fix in ptestm intrinsics
...
Move ptestm{q|d} intrinsics from pattern form (in the td file) to the intrinsics table.
Differential Revision: http://reviews.llvm.org/D16633
llvm-svn: 259029
2016-01-28 08:33:22 +00:00
Igor Breger
d6c187b038
AVX512: Add store mask patterns.
...
Differential Revision: http://reviews.llvm.org/D16596
llvm-svn: 258914
2016-01-27 08:43:25 +00:00
Asaf Badouh
655822ab7e
[X86][IFMA] adding intrinsics and encoding for multiply and add of unsigned 52bit integer
...
VPMADD52LUQ - Packed Multiply of Unsigned 52-bit Integers and Add the Low 52-bit Products to Qword Accumulators
VPMADD52HUQ - Packed Multiply of Unsigned 52-bit Integers and Add the High 52-bit Products to 64-bit Accumulators
Differential Revision: http://reviews.llvm.org/D16407
llvm-svn: 258680
2016-01-25 11:14:24 +00:00
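As a rough illustration of the two IFMA instructions named above, the following Python sketch models a single 64-bit lane: the low 52 bits of each source are multiplied into a 104-bit product, and either the low or the high 52 bits of that product are added to the 64-bit accumulator with wraparound. The function names mirror the mnemonics but are hypothetical helpers, not LLVM or compiler intrinsics.

```python
MASK52 = (1 << 52) - 1
MASK64 = (1 << 64) - 1


def vpmadd52luq(acc, a, b):
    """One lane: add the low 52 bits of the 104-bit product to acc."""
    product = (a & MASK52) * (b & MASK52)
    return (acc + (product & MASK52)) & MASK64


def vpmadd52huq(acc, a, b):
    """One lane: add the high 52 bits of the 104-bit product to acc."""
    product = (a & MASK52) * (b & MASK52)
    return (acc + (product >> 52)) & MASK64


x = 1 << 51  # x * x == 2**102: low 52 bits are 0, high 52 bits are 2**50

assert vpmadd52luq(1, x, x) == 1
assert vpmadd52huq(1, x, x) == 1 + (1 << 50)

# Source bits above bit 51 are ignored.
assert vpmadd52luq(0, (1 << 60) | 3, 5) == 15

# The 64-bit accumulator wraps around.
assert vpmadd52luq(MASK64, 1, 1) == 0
```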
Igor Breger
1e5bafbc82
AVX512: VMOVDQU8/16/32/64 (load) intrinsic implementation.
...
Differential Revision: http://reviews.llvm.org/D16137
llvm-svn: 258657
2016-01-24 08:04:33 +00:00
Igor Breger
7a000f5bb2
AVX512: Masked move intrinsic implementation.
...
Implemented intrinsics for the following instructions (reg move): VMOVDQU8/16, VMOVDQA32/64, VMOVAPS/PD.
Differential Revision: http://reviews.llvm.org/D16316
llvm-svn: 258398
2016-01-21 14:18:11 +00:00
Igor Breger
d3341f5021
AVX512: Store (MOVNTPD, MOVNTPS, MOVNTDQ) using non-temporal hint intrinsic implementation.
...
Differential Revision: http://reviews.llvm.org/D16350
llvm-svn: 258309
2016-01-20 13:11:47 +00:00
Michael Zuckerman
4582bdab12
[AVX512] Adding VPERMT2B and VPERMI2B instructions.
...
Differential Revision: http://reviews.llvm.org/D16297
llvm-svn: 258161
2016-01-19 18:47:02 +00:00
Michael Zuckerman
d9cac592f4
[AVX512] Adding VPERMB instruction
...
Differential Revision: http://reviews.llvm.org/D16294
llvm-svn: 258144
2016-01-19 17:07:43 +00:00
Asaf Badouh
d4a0d9a78c
[X86][AVX512] Fix dag & add intrinsics for fixupimm
...
Cover all widths and types (pd/ps/sd/ss) of the fixupimm instruction and intrinsics.
Differential Revision: http://reviews.llvm.org/D16313
llvm-svn: 258124
2016-01-19 14:21:39 +00:00
Igor Breger
239fda676c
AVX512: Masked store intrinsic implementation.
...
Implemented intrinsics for the following instructions (store): VMOVDQU8/16/32/64, VMOVDQA32/64, VMOVAPS/PD, VMOVUPS/PD.
Differential Revision: http://reviews.llvm.org/D16271
llvm-svn: 258047
2016-01-18 13:52:57 +00:00
Igor Breger
dd6522c653
AVX512: Change v8i1 bitconvert GR8 pattern, remove unnecessary movzbl instruction.
...
Code example, previous implementation:
movzbl %dil, %eax
kmovw %eax, %k0
New code:
kmovw %edi, %k0
Differential Revision: http://reviews.llvm.org/D16287
llvm-svn: 258045
2016-01-18 12:02:45 +00:00
Michael Zuckerman
298a680c80
[AVX512] Adding PRORQ, PRORD, PRORLVQ and PRORLVD intrinsics
...
Differential Revision: http://reviews.llvm.org/D16052
llvm-svn: 257594
2016-01-13 12:39:33 +00:00