llvm-project

Commit Graph

Author	SHA1	Message	Date
Ahmed Bougacha	63f78b0206	[X86] Define segment MI operands as regs instead of i8imm. We've been pretending that segments are i8imm since the initial support (r68645), predating the addition of the SEGMENT_REG class (r81895). That happens to works, but is wrong, and inconsistent with how we print (e.g., X86ATTInstPrinter::printMemReference) and parse them (e.g., X86Operand::addMemOperands). This change shouldn't affect any tool users, but is visible to library users or out-of-tree tablegen backends: this causes MCOperandInfo for the segment op to have an RC instead of "unknown", and TII::getRegClass to actually return something. As the registers are reserved and no vregs of the class ever created, that shouldn't change anything. No test change; no suspicious getRegClass() in X86 and CodeGen. llvm-svn: 271559	2016-06-02 18:29:15 +00:00
Michael Zuckerman	a63a129749	[Clang][AVX512][intrinsics] Fix rcp and sqrt intrinsics. Differential Revision: http://reviews.llvm.org/D20438 llvm-svn: 270322	2016-05-21 14:44:18 +00:00
Michael Zuckerman	11b55b29d1	[Clang][AVX512][intrinsics] Fix vscalef intrinsics. Differential Revision: http://reviews.llvm.org/D20324 llvm-svn: 270321	2016-05-21 11:09:53 +00:00
Craig Topper	19e04b6430	[X86] Generalize and combine some similar type constraints and node types. No changes to the isel table size so the separation wasn't buying us anything. llvm-svn: 270026	2016-05-19 06:13:58 +00:00
Craig Topper	9152f5fcdf	[X86] Simplify some type constraints by removing parts that were already implied. llvm-svn: 270025	2016-05-19 06:13:48 +00:00
Craig Topper	4fcff19ff5	[X86] Remove some type constraint classes and use already existing stricter classes. llvm-svn: 270013	2016-05-19 02:05:58 +00:00
Craig Topper	7ee092a268	[AVX512] Strengthen type constraints for VFIXUPIMM patterns and combine the type constraints for vector and scalar. llvm-svn: 270012	2016-05-19 02:05:55 +00:00
Craig Topper	095fc41523	[AVX512] Strengthen type constraints on my rounding mode inputs and some immediate inputs. llvm-svn: 269886	2016-05-18 06:56:01 +00:00
Craig Topper	74ed087b0b	[AVX512] Strengthen type checks on the X86ISD::SELECT node. Saves over 800 bytes in the DAG isel table by removing type checks for the condition operand which is always a vector or scalar of i1 matching the the number of elements in the other operands. llvm-svn: 269885	2016-05-18 06:55:59 +00:00
Craig Topper	a5d0bf5c36	[X86] Strengthen some type contraints for floating point round and extend. llvm-svn: 268892	2016-05-09 05:34:14 +00:00
Craig Topper	a58abd1cc6	[AVX512] Fix up types for arguments of int_x86_avx512_mask_cvtsd2ss_round and int_x86_avx512_mask_cvtss2sd_round. Only the argument being converted should be a different type. The other 2 argument should have the same type as the result. llvm-svn: 268891	2016-05-09 05:34:12 +00:00
Simon Pilgrim	e5e04baf95	[X86][SSE] Dropped X86ISD::FGETSIGNx86 and use MOVMSK instead for FGETSIGN lowering movmsk.ll tests are unchanged. llvm-svn: 268237	2016-05-02 14:58:22 +00:00
Simon Pilgrim	cd0dfc93eb	[X86][SSE] Support for MOVMSK signbit extraction instructions Add support for lowering with the MOVMSK instruction to extract vector element signbits to a GPR. This is an early step towards more optimal handling of vector comparison results. Differential Revision: http://reviews.llvm.org/D18741 llvm-svn: 265266	2016-04-03 18:22:03 +00:00
Simon Pilgrim	572ca71573	[X86][XOP] Support for VPPERM byte shuffle instruction This patch begins adding support for lowering to the XOP VPPERM instruction - adding the X86ISD::VPPERM opcode. Differential Revision: http://reviews.llvm.org/D18189 llvm-svn: 264260	2016-03-24 11:52:43 +00:00
Igor Breger	639fde79b0	AVX512: Combine AND + TESTM instructions . Differential Revision: http://reviews.llvm.org/D17844 llvm-svn: 262621	2016-03-03 14:18:38 +00:00
Ahmed Bougacha	f3cccab1e0	[X86] Remove the now-unused X86ISD::PSIGN. NFC. llvm-svn: 261025	2016-02-16 22:14:12 +00:00
Asaf Badouh	ad5c3fc47d	[X86][AVX512] add intrinsics of Scalar FP to integer conversion with rounding mode Differential Revision: http://reviews.llvm.org/D16629 llvm-svn: 260033	2016-02-07 14:59:13 +00:00
Simon Pilgrim	18bcf93efb	[X86][AVX] Add support for 64-bit VZEXT_LOAD of 256/512-bit vectors to EltsFromConsecutiveLoads Follow up to D16217 and D16729 This change uncovered an odd pattern where VZEXT_LOAD v4i64 was being lowered to a load of the lower v2i64 (so the 2nd i64 destination element wasn't being zeroed), I can't find any use/reason for this and have removed the pattern and replaced it so only the 1st i64 element is loaded and the upper bits all zeroed. This matches the description for X86ISD::VZEXT_LOAD Differential Revision: http://reviews.llvm.org/D16768 llvm-svn: 259635	2016-02-03 09:41:59 +00:00
Sanjay Patel	c54600dbb1	fix typos; NFC llvm-svn: 259438	2016-02-01 23:53:35 +00:00
Asaf Badouh	5a3a0231f4	[X86][AVX512VBMI] add encoding and intrinsics for Multishift Differential Revision: http://reviews.llvm.org/D16399 llvm-svn: 259363	2016-02-01 15:48:21 +00:00
Asaf Badouh	655822ab7e	[X86][IFMA] adding intrinsics and encoding for multiply and add of unsigned 52bit integer VPMADD52LUQ - Packed Multiply of Unsigned 52-bit Integers and Add the Low 52-bit Products to Qword Accumulators VPMADD52HUQ - Packed Multiply of Unsigned 52-bit Unsigned Integers and Add High 52-bit Products to 64-bit Accumulators Differential Revision: http://reviews.llvm.org/D16407 llvm-svn: 258680	2016-01-25 11:14:24 +00:00
Asaf Badouh	d4a0d9a78c	[X86][AVX512]fix dag & add intrinsics for fixupimm cover all width and types (pd/ps/sd/ss) of fixupimm instruction and inrtinsics Differential Revision: http://reviews.llvm.org/D16313 llvm-svn: 258124	2016-01-19 14:21:39 +00:00
Michael Zuckerman	298a680c80	[AVX512] adding PRORQ , PRORD , PRORLVQ and PRORLVD Intrinsics Differential Revision: http://reviews.llvm.org/D16052 llvm-svn: 257594	2016-01-13 12:39:33 +00:00
Michael Zuckerman	2ddcbcf464	[AVX512] adding PROLQ and PROLD Intrinsics Differential Revision: http://reviews.llvm.org/D16048 llvm-svn: 257523	2016-01-12 21:19:17 +00:00
Igor Breger	756c289dd8	AVX512: Change VPMOVB2M DAG lowering , use CVT2MASK node instead TRUNCATE. Fix TRUNCATE lowering vector to vector i1, use LSB and not MSB. Implement VPMOVB/W/D/Q2M intrinsic. Differential Revision: http://reviews.llvm.org/D15675 llvm-svn: 256470	2015-12-27 13:56:16 +00:00
Matt Arsenault	fbd9bbfda3	Start replacing vector_extract/vector_insert with extractelt/insertelt These are redundant pairs of nodes defined for INSERT_VECTOR_ELEMENT/EXTRACT_VECTOR_ELEMENT. insertelement/extractelement are slightly closer to the corresponding C++ node name, and has stricter type checking so prefer it. Update targets to only use these nodes where it is trivial to do so. AArch64, ARM, and Mips all have various type errors on simple replacement, so they will need work to fix. Example from AArch64: def : Pat<(sext_inreg (vector_extract (v16i8 V128:$Rn), VectorIndexB:$idx), i8), (i32 (SMOVvi8to32 V128:$Rn, VectorIndexB:$idx))>; Which is trying to do sext_inreg i8, i8. llvm-svn: 255359	2015-12-11 19:20:16 +00:00
Asaf Badouh	2489f350c0	[X86][AVX512] add comi with Sae add builtin_ia32_vcomisd and builtin_ia32_vcomisd Differential Revision: http://reviews.llvm.org/D14331 llvm-svn: 254493	2015-12-02 08:17:51 +00:00
Craig Topper	aad5f11e5f	[AVX512] The vpermi2 instructions require an integer vector for the index vector. This is reflected correctly in the intrinsics, but was not refelected in the isel patterns. For the floating point types, this requires adding a bitcast to the index vector when its passed through to the output. llvm-svn: 254277	2015-11-30 00:13:24 +00:00
Craig Topper	ecae476e4c	[X86] int_x86_avx2_permps and X86ISD::VPERMV should take an integer vector for its shuffle indices. llvm-svn: 254269	2015-11-29 22:53:22 +00:00
Craig Topper	05858f52fe	[X86] Merge X86VPermt2Fp and X86VPermt2Int back together by weakening them just enough. The SDTCisSameSizeAs introduced in r254138 helps here. llvm-svn: 254176	2015-11-26 20:02:01 +00:00
Craig Topper	0009656335	[X86] Split ISD node for Vfpclass and Vfpclasss so that we can write strong type constraints for each that don't cause ambiguous isel. llvm-svn: 254172	2015-11-26 19:41:34 +00:00
Craig Topper	ff2f14731a	[X86] Revert part of r254167 to recover bots. llvm-svn: 254169	2015-11-26 19:13:05 +00:00
Craig Topper	9d1deb4b72	[X86] Strengthen more type constraints to reduce isel table size. llvm-svn: 254167	2015-11-26 18:31:19 +00:00
Craig Topper	a3ac738725	[X86] Strengthen more type constraints to reduce isel table size. llvm-svn: 254142	2015-11-26 07:58:20 +00:00
Craig Topper	4c175cdc8e	[X86] Strengthen the type constraints on X86psadbw and X86dbpsadbw to reduce some of the type checks in the isel matching tables. llvm-svn: 254139	2015-11-26 07:02:21 +00:00
Elena Demikhovsky	f07df9fcac	AVX-512: Fixed a bug in VPERMT2* intrinsic. It was wrong order of operands (from intrinsic to DAG node). I added more strict type specification for instruction selection. Differential Revision: http://reviews.llvm.org/D14942 llvm-svn: 254059	2015-11-25 08:17:56 +00:00
Cong Hou	db6220f84d	[X86] Fix several issues related to X86's psadbw instruction. This patch fixes the following issues: 1. Fix the return type of X86psadbw: it should not be the same type of inputs. For vNi8 inputs the output should be vMi64, where M = N/8. 2. Fix the return type of int_x86_avx512_psad_bw_512 accordingly. 3. Fix the definiton of PSADBW, VPSADBW, and VPSADBWY accordingly. 4. Adjust the return type when building a DAG node of X86ISD::PSADBW type. 5. Update related tests. Differential revision: http://reviews.llvm.org/D14897 llvm-svn: 254010	2015-11-24 19:51:26 +00:00
Asaf Badouh	0d957b8b09	[X86][AVX512CD] add mask broadcast intrinsics Differential Revision: http://reviews.llvm.org/D14573 llvm-svn: 253450	2015-11-18 09:42:45 +00:00
Asaf Badouh	f99c054ebc	revert rev. 252153 due to build failure on ubuntu [X86][AVX512] add comi with Sae llvm-svn: 252154	2015-11-05 08:55:54 +00:00
Asaf Badouh	7fdabf0a35	[X86][AVX512] add comi with Sae add builtin_ia32_vcomisd and builtin_ia32_vcomisd Differential Revision: http://reviews.llvm.org/D14331 llvm-svn: 252153	2015-11-05 08:45:06 +00:00
Igor Breger	fa798a9dbb	AVX512: Implemented encoding and intrinsics for VBROADCASTI32x2 and VBROADCASTF32x2 instructions. Differential Revision: http://reviews.llvm.org/D14216 llvm-svn: 251781	2015-11-02 07:39:36 +00:00
Asaf Badouh	c7cb880669	[X86][AVX512] [X86][AVX512] add convert float to half convert float to half with mask/maskz for the reg to reg version and mask for the reg to mem version (there is no maskz version for reg to mem). Differential Revision: http://reviews.llvm.org/D14113 llvm-svn: 251409	2015-10-27 15:37:17 +00:00
Asaf Badouh	7c52245660	[X86][AVX512] extend vcvtph2ps to support xmm/ymm and sae versions Differential Revision: http://reviews.llvm.org/D13945 llvm-svn: 251018	2015-10-22 14:01:16 +00:00
Asaf Badouh	696e8e0bb7	[X86][AVX512DQ] add scalar fpclass Differential Revision: http://reviews.llvm.org/D13769 llvm-svn: 250650	2015-10-18 11:04:38 +00:00
Simon Pilgrim	86c5e85e84	[X86][XOP] Add VPROT instruction opcodes Added X86ISD opcodes for VPROT vector rotate by variable and by immediate. llvm-svn: 250620	2015-10-17 19:04:24 +00:00
Igor Breger	b4bb190eed	AVX512: Implemented encoding and intrinsics for vpternlogd/q. Differential Revision: http://reviews.llvm.org/D13768 llvm-svn: 250396	2015-10-15 12:33:24 +00:00
Sanjay Patel	85030aa1bd	function names should start with a lower case letter; NFC llvm-svn: 250174	2015-10-13 16:23:00 +00:00
Simon Pilgrim	52d47e5704	[X86][XOP] Added support for the lowering of 128-bit vector integer comparisons to XOP PCOM/PCOMU instructions. The XOP vector integer comparisons can deal with all signed/unsigned comparison cases directly and can be easily commuted as well (D7646). llvm-svn: 249976	2015-10-11 14:15:17 +00:00
Simon Pilgrim	3d11c994f7	[X86][XOP] Added support for the lowering of 128-bit vector shifts to XOP shift instructions The XOP shifts just have logical/arithmetic versions and the left/right shifts are controlled by whether the value is positive/negative. Because of this I've added new X86ISD nodes instead of trying to force them to use the existing shift nodes. Additionally Excavator cores (bdver4) support XOP and AVX2 - meaning that it should use the AVX2 shifts when it can and fall back to XOP in other cases. Differential Revision: http://reviews.llvm.org/D8690 llvm-svn: 248878	2015-09-30 08:17:50 +00:00
Asaf Badouh	eaf2da14bf	[X86][AVX512] add masked version for RSQRT14 & RCP14 Scalar FP Differential Revision: http://reviews.llvm.org/D12524 llvm-svn: 248147	2015-09-21 10:23:53 +00:00
Igor Breger	b7e1f9d680	AVX512: Implemented encoding and intrinsics for vcmpss/sd. Added tests for intrinsics and encoding. Differential Revision: http://reviews.llvm.org/D12593 llvm-svn: 248121	2015-09-20 15:15:10 +00:00
Asaf Badouh	2744d21fb8	[X86][AVX512] extend support in Scalar conversion add scalar FP to Int conversion with truncation intrinsics add scalar conversion FP32 from/to FP64 intrinsics add rounding mode and SAE mode encoding for these intrinsics Differential Revision: http://reviews.llvm.org/D12665 llvm-svn: 248117	2015-09-20 14:31:19 +00:00
Igor Breger	4c4cd789c9	AVX512: vsqrtss/sd encoding and intrinsics implementation. Added tests for intrinsics and encoding. Differential Revision: http://reviews.llvm.org/D12102 llvm-svn: 248116	2015-09-20 09:13:41 +00:00
Asaf Badouh	572bbceecc	[X86][AVX512DQ] Add fpclass instruction Differential Revision: http://reviews.llvm.org/D12931 llvm-svn: 248115	2015-09-20 08:46:07 +00:00
Ahmed Bougacha	37bffd83f0	[CodeGen] Make x86 nontemporal store patfrags generic. NFC. To be used by other targets. llvm-svn: 247225	2015-09-10 00:53:15 +00:00
Igor Breger	0dcd8bcf24	AVX512: Implemented encoding and intrinsics for vplzcntq, vplzcntd, vpconflictq, vpconflictd Added tests for intrinsics and encoding. Differential Revision: http://reviews.llvm.org/D11931 llvm-svn: 246750	2015-09-03 09:05:31 +00:00
Ahmed Bougacha	b03ea02479	[X86] Require 32-byte alignment for 32-byte VMOVNTs. We used to accept (and even test, and generate) 16-byte alignment for 32-byte nontemporal stores, but they require 32-byte alignment, per SDM. Found by inspection. Instead of hardcoding 16 in the patfrag, check for natural alignment. Also fix the autoupgrade and the various tests. Also, use explicit -mattr instead of -mcpu: I stared at the output several minutes wondering why I get 2x movntps for the unaligned case (which is the ideal output, but needs some work: see FIXME), until I remembered corei7-avx implies +slow-unaligned-mem-32. llvm-svn: 246733	2015-09-02 23:25:39 +00:00
Ahmed Bougacha	bc72ad7b27	[X86] Cleanup nontemporal fragments. NFCI. We can chain other fragments to avoid repeating conditions. This also fixes a potential bug (that realistically can't happen), where we would match indexed nontemporal stores for i32/i64. llvm-svn: 246719	2015-09-02 22:27:38 +00:00
Igor Breger	1e58e8adf6	AVX512: Implemented encoding and intrinsics for VGETMANTPD/S , VGETMANTSD/S instructions Added tests for intrinsics and encoding. Differential Revision: http://reviews.llvm.org/D11593 llvm-svn: 246642	2015-09-02 11:18:55 +00:00
Igor Breger	5ea0a68115	AVX512: ktest implemantation Added tests for encoding. Differential Revision: http://reviews.llvm.org/D11979 llvm-svn: 246439	2015-08-31 13:30:19 +00:00
Igor Breger	f3ded811b2	AVX512: Implemented encoding and intrinsics for vdbpsadbw Added tests for intrinsics and encoding. Differential Revision: http://reviews.llvm.org/D12491 llvm-svn: 246436	2015-08-31 13:09:30 +00:00
Igor Breger	8352a0ddf2	AVX512: Implemented encoding and intrinsics for VGETEXPSS/D instructions Added tests for intrinsics and encoding. Differential Revision: http://reviews.llvm.org/D11528 llvm-svn: 243390	2015-07-28 06:53:28 +00:00
Igor Breger	074a64e72c	AVX-512: Implemented encoding , DAG lowering and intrinsics for Integer Truncate with/without saturation Added tests for DAG lowering ,encoding and intrinsic Differential Revision: http://reviews.llvm.org/D11218 llvm-svn: 243122	2015-07-24 17:24:15 +00:00
Chandler Carruth	fe414353db	Revert r242990: "AVX-512: Implemented encoding , DAG lowering and ..." This commit broke the build. Numerous build bots broken, and it was blocking my progress so reverting. It should be trivial to reproduce -- enable the BPF backend and it should fail when running llvm-tblgen. llvm-svn: 242992	2015-07-23 08:03:44 +00:00
Igor Breger	da1b2ea955	AVX-512: Implemented encoding , DAG lowering and intrinsics for Integer Truncate with/without saturation Added tests for DAG lowering ,encoding and intrinsic Differential Revision: http://reviews.llvm.org/D11218 llvm-svn: 242990	2015-07-23 07:39:21 +00:00
Asaf Badouh	a5b2e5e2a7	[X86][AVX512] add reduce/range/scalef/rndScale include encoding and intrinsics Differential Revision: http://reviews.llvm.org/D11222 llvm-svn: 242896	2015-07-22 12:00:43 +00:00
Igor Breger	f7fd547e27	AVX512 : Implemented VPMADDUBSW and VPMADDWD instruction , Added tests for intrinsics and encoding. Differential Revision: http://reviews.llvm.org/D11351 llvm-svn: 242761	2015-07-21 07:11:28 +00:00
Elena Demikhovsky	0f370936a0	AVX-512: Added all AVX-512 forms of Vector Convert for Float/Double/Int/Long types. In this patch I have only encoding. Intrinsics and DAG lowering will be in the next patch. I temporary removed the old intrinsics test (just to split this patch). Half types are not covered here. Differential Revision: http://reviews.llvm.org/D11134 llvm-svn: 242023	2015-07-13 13:26:20 +00:00
Simon Pilgrim	d85cae3d52	[X86][SSE4A] Shuffle lowering using SSE4A EXTRQ/INSERTQ instructions This patch adds support for v8i16 and v16i8 shuffle lowering using the immediate versions of the SSE4A EXTRQ and INSERTQ instructions. Although rather limited (they can only act on the lower 64-bits of the source vectors, leave the upper 64-bits of the result vector undefined and don't have VEX encoded variants), the instructions are still useful for the zero extension of any lane (EXTRQ) or inserting a lane into another vector (INSERTQ). Testing demonstrated that it wasn't typically worth it to use these instructions for v2i64 or v4i32 vector shuffles although they are capable of it. As well as adding specific pattern matching for the shuffles, the patch uses EXTRQ for zero extension cases where SSE41 isn't available and its more efficient than the SSE2 'unpack' default approach. It also adds shuffle decode support for the EXTRQ / INSERTQ cases when the instructions are handling full byte-sized extractions / insertions. From this foundation, future patches will be able to make use of the instructions for situations that use their ability to extract/insert at the bit level. Differential Revision: http://reviews.llvm.org/D10146 llvm-svn: 241508	2015-07-06 20:46:41 +00:00
Simon Pilgrim	8b756596fc	[X86][SSE] Use the general SMAX/SMIN/UMAX/UMIN opcodes and remove the X86 implementation With the completion of D9746 there is now a common implementation of integer signed/unsigned min/max nodes, removing the need for the equivalent X86 specific implementations. This patch removes the old X86ISD nodes, legalizes the relevant SSE2/SSE41/AVX2/AVX512 instructions for the ISD versions and converts the small amount of existing X86 code. Differential Revision: http://reviews.llvm.org/D10947 llvm-svn: 241506	2015-07-06 20:30:47 +00:00
Asaf Badouh	c6f3c82ffc	[X86][AVX512] Multiply Packed Unsigned Integers with Round and Scale pmulhrsw review: http://reviews.llvm.org/D10948 llvm-svn: 241443	2015-07-06 14:03:40 +00:00
Elena Demikhovsky	30bc4ca313	AVX-512: all forms of SCATTER instruction on SKX, encoding, intrinsics and tests. llvm-svn: 240936	2015-06-29 12:14:24 +00:00
Asaf Badouh	7ec4b7a8bb	[x86][AVX512] Add vscalef support include encoding and intrinsics review: http://reviews.llvm.org/D10730 llvm-svn: 240906	2015-06-28 14:30:39 +00:00
Elena Demikhovsky	6a1a357f1f	AVX-512: Added all SKX forms of GATHER instructions. Added intrinsics. Added encoding and tests. llvm-svn: 240905	2015-06-28 10:53:29 +00:00
Elena Demikhovsky	5e2f8c4231	AVX-512: Added all forms of VPABS instruction Added all intrinsics, tests for encoding, tests for intrinsics. llvm-svn: 240386	2015-06-23 08:19:46 +00:00
Elena Demikhovsky	ba5ab328e5	AVX-512: All forms of VCOPMRESS VEXPAND instructions, encoding tests. llvm-svn: 240272	2015-06-22 11:16:30 +00:00
Asaf Badouh	81f03c30a5	[AVX512] add instructions: VPAVGB and VPAVGW review http://reviews.llvm.org/D10504 llvm-svn: 240012	2015-06-18 12:30:53 +00:00
Simon Pilgrim	cae7b94cbd	[X86][SSE] Vectorize v2i32 to v2f64 conversions This patch enables support for the conversion of v2i32 to v2f64 to use the CVTDQ2PD xmm instruction and stay on the SSE unit instead of scalarizing, sign extending to i64 and using CVTSI2SDQ scalar conversions. Differential Revision: http://reviews.llvm.org/D10433 llvm-svn: 239855	2015-06-16 21:40:28 +00:00
Igor Breger	abe4a79b75	AVX-512: Implemented cvtsi2ss/d cvtusi2ss/d instructions with round control for KNL. Added intrinsics for cvtsi2ss/d instructions. Added tests for intrinsics and encoding. Differential Revision: http://reviews.llvm.org/D10430 llvm-svn: 239694	2015-06-14 12:44:55 +00:00
Asaf Badouh	402ebb34af	re-apply 238809 AVX-512: Implemented GETEXP instruction for KNL and SKX Added rounding mode modifier for SQRTPS/PD Added tests for encoding and intrinsics. CR: http://reviews.llvm.org/D9991 llvm-svn: 238923	2015-06-03 13:41:48 +00:00
Elena Demikhovsky	9e38086534	AVX-512: Implemented SHUFF32x4/SHUFF64x2/SHUFI32x4/SHUFI64x2 instructions for SKX and KNL. Added tests for encoding. By Igor Breger (igor.breger@intel.com) llvm-svn: 238917	2015-06-03 10:56:40 +00:00
Simon Pilgrim	452252e6c8	[X86] Removed (unused) FSRL x86 operation This patch removes the old X86ISD::FSRL op - which allowed float vectors to use the byte right shift operations (causing a domain switch....). Since the refactoring of the shuffle lowering code this no longer has any use. Differential Revision: http://reviews.llvm.org/D10169 llvm-svn: 238906	2015-06-03 08:32:36 +00:00
Asaf Badouh	8d897dd05f	revert 238809 llvm-svn: 238810	2015-06-02 07:45:19 +00:00
Asaf Badouh	17de10f37e	AVX-512: Implemented GETEXP instruction for KNL and SKX Added rounding mode modifier for SQRTPS/PD Added tests for encoding and intrinsics. llvm-svn: 238809	2015-06-02 07:18:14 +00:00
Elena Demikhovsky	3582eb3b39	AVX-512: Implemented VRANGEPD and VRANGEPD instructions for SKX. Implemented DAG lowering for all these forms. Added tests for encoding. By Igor Breger (igor.breger@intel.com) llvm-svn: 238738	2015-06-01 11:05:34 +00:00
Elena Demikhovsky	42c96d9c0a	AVX-512: Implemented VFIXUPIMMPD and VFIXUPIMMPS instructions for KNL and SKX Implemented DAG lowering for all these forms. Added tests for encoding. by Igor Breger (igor.breger@intel.com) llvm-svn: 238728	2015-06-01 06:50:49 +00:00
Chandler Carruth	6ba9730a4e	[x86] Implement a faster vector population count based on the PSHUFB in-register LUT technique. Summary: A description of this technique can be found here: http://wm.ite.pl/articles/sse-popcount.html The core of the idea is to use an in-register lookup table and the PSHUFB instruction to compute the population count for the low and high nibbles of each byte, and then to use horizontal sums to aggregate these into vector population counts with wider element types. On x86 there is an instruction that will directly compute the horizontal sum for the low 8 and high 8 bytes, giving vNi64 popcount very easily. Various tricks are used to get vNi32 and vNi16 from the vNi8 that the LUT computes. The base implemantion of this, and most of the work, was done by Bruno in a follow up to D6531. See Bruno's detailed post there for lots of timing information about these changes. I have extended Bruno's patch in the following ways: 0) I committed the new tests with baseline sequences so this shows a diff, and regenerated the tests using the update scripts. 1) Bruno had noticed and mentioned in IRC a redundant mask that I removed. 2) I introduced a particular optimization for the i32 vector cases where we use PSHL + PSADBW to compute the the low i32 popcounts, and PSHUFD + PSADBW to compute doubled high i32 popcounts. This takes advantage of the fact that to line up the high i32 popcounts we have to shift them anyways, and we can shift them by one fewer bit to effectively divide the count by two. While the PSHUFD based horizontal add is no faster, it doesn't require registers or load traffic the way a mask would, and provides more ILP as it happens on different ports with high throughput. 3) I did some code cleanups throughout to simplify the implementation logic. 4) I refactored it to continue to use the parallel bitmath lowering when SSSE3 is not available to preserve the performance of that version on SSE2 targets where it is still much better than scalarizing as we'll still do a bitmath implementation of popcount even in scalar code there. With #1 and #2 above, I analyzed the result in IACA for sandybridge, ivybridge, and haswell. In every case I measured, the throughput is the same or better using the LUT lowering, even v2i64 and v4i64, and even compared with using the native popcnt instruction! The latency of the LUT lowering is often higher than the latency of the scalarized popcnt instruction sequence, but I think those latency measurements are deeply misleading. Keeping the operation fully in the vector unit and having many chances for increased throughput seems much more likely to win. With this, we can lower every integer vector popcount implementation using the LUT strategy if we have SSSE3 or better (and thus have PSHUFB). I've updated the operation lowering to reflect this. This also fixes an issue where we were scalarizing horribly some AVX lowerings. Finally, there are some remaining cleanups. There is duplication between the two techniques in how they perform the horizontal sum once the byte population count is computed. I'm going to factor and merge those two in a separate follow-up commit. Differential Revision: http://reviews.llvm.org/D10084 llvm-svn: 238636	2015-05-30 03:20:59 +00:00
Elena Demikhovsky	ad9c396838	AVX-512: Added VBROADCASTF64X4, VBROADCASTF64X2, VBROADCASTI32X8, and other instructions from this set Added encoding tests. llvm-svn: 237557	2015-05-18 06:42:57 +00:00
Elena Demikhovsky	0d7e9364d1	AVX-512: Added SKX instructions and intrinsics: {add/sub/mul/div/} x {ps/pd} x {128/256} 2. max/min with sae By Asaf Badouh (asaf.badouh@intel.com) llvm-svn: 236971	2015-05-11 06:05:05 +00:00
Elena Demikhovsky	29792e9a80	AVX-512: Added all forms of FP compare instructions for KNL and SKX. Added intrinsics for the instructions. CC parameter of the intrinsics was changed from i8 to i32 according to the spec. By Igor Breger (igor.breger@intel.com) llvm-svn: 236714	2015-05-07 11:24:42 +00:00
Elena Demikhovsky	60eb9db7bb	AVX-512: added calling convention for i1 vectors in 32-bit mode. Fixed some bugs in extend/truncate for AVX-512 target. Removed VBROADCASTM (masked broadcast) node, since it is not used any more. llvm-svn: 236420	2015-05-04 12:40:50 +00:00
Elena Demikhovsky	52266388f8	AVX-512: added integer "add" and "sub" instructions with saturation for SKX with intrinsics and tests by Asaf Badouh (asaf.badouh@intel.com) llvm-svn: 236418	2015-05-04 12:35:55 +00:00
Elena Demikhovsky	e1eda8a9e6	Masked gather and scatter - added DAGCombine visitors and AVX-512 instruction selection patterns. All other patches, including tests will follow. http://reviews.llvm.org/D7665 llvm-svn: 236211	2015-04-30 08:38:48 +00:00
Sergey Dmitrouk	842a51bad8	Reapply r235977 "[DebugInfo] Add debug locations to constant SD nodes" [DebugInfo] Add debug locations to constant SD nodes This adds debug location to constant nodes of Selection DAG and updates all places that create constants to pass debug locations (see PR13269). Can't guarantee that all locations are correct, but in a lot of cases choice is obvious, so most of them should be. At least all tests pass. Tests for these changes do not cover everything, instead just check it for SDNodes, ARM and AArch64 where it's easy to get incorrect locations on constants. This is not complete fix as FastISel contains workaround for wrong debug locations, which drops locations from instructions on processing constants, but there isn't currently a way to use debug locations from constants there as llvm::Constant doesn't cache it (yet). Although this is a bit different issue, not directly related to these changes. Differential Revision: http://reviews.llvm.org/D9084 llvm-svn: 235989	2015-04-28 14:05:47 +00:00
Daniel Jasper	48e93f7181	Revert "[DebugInfo] Add debug locations to constant SD nodes" This breaks a test: http://bb.pgr.jp/builders/cmake-llvm-x86_64-linux/builds/23870 llvm-svn: 235987	2015-04-28 13:38:35 +00:00
Sergey Dmitrouk	adb4c69d5c	[DebugInfo] Add debug locations to constant SD nodes This adds debug location to constant nodes of Selection DAG and updates all places that create constants to pass debug locations (see PR13269). Can't guarantee that all locations are correct, but in a lot of cases choice is obvious, so most of them should be. At least all tests pass. Tests for these changes do not cover everything, instead just check it for SDNodes, ARM and AArch64 where it's easy to get incorrect locations on constants. This is not complete fix as FastISel contains workaround for wrong debug locations, which drops locations from instructions on processing constants, but there isn't currently a way to use debug locations from constants there as llvm::Constant doesn't cache it (yet). Although this is a bit different issue, not directly related to these changes. Differential Revision: http://reviews.llvm.org/D9084 llvm-svn: 235977	2015-04-28 11:56:37 +00:00
Elena Demikhovsky	431b81e41f	AVX-512: Added VPTESTM and VPTESTNM instructions for SKX llvm-svn: 235383	2015-04-21 13:13:46 +00:00
Benjamin Kramer	619c4e57ba	Reduce dyn_cast<> to isa<> or cast<> where possible. No functional change intended. llvm-svn: 234586	2015-04-10 11:24:51 +00:00
Elena Demikhovsky	d207f17fa1	AVX-512: Moved patterns for masked load/store under avx_store, avx_load classes. No functional changes. llvm-svn: 231069	2015-03-03 15:03:35 +00:00
Elena Demikhovsky	0995479e67	Reverted 230471 - gather scatter handling in table gen. llvm-svn: 230892	2015-03-01 08:23:41 +00:00

1 2 3 4 5 ...

310 Commits