llvm-project

Commit Graph

Author	SHA1	Message	Date
Elena Demikhovsky	0995479e67	Reverted 230471 - gather scatter handling in table gen. llvm-svn: 230892	2015-03-01 08:23:41 +00:00
Elena Demikhovsky	02ffd26023	AVX-512: Added mask and rounding mode for scalar arithmetics Added more tests for scalar instructions to distinguish between AVX and AVX-512 forms. llvm-svn: 230891	2015-03-01 07:44:04 +00:00
Elena Demikhovsky	56eadcf5ce	AVX-512: Gather and Scatter patterns Gather and scatter instructions additionally write to one of the source operands - mask register. In this case Gather has 2 destination values - the loaded value and the mask. Till now we did not support code gen pattern for gather - the instruction was generated from intrinsic only and machine node was hardcoded. When we introduce the masked_gather node, we need to select instruction automatically, in the standard way. I added a flag "hasTwoExplicitDefs" that allows to handle 2 destination operands. (Some code in the X86InstrFragmentsSIMD.td is commented out, just to split one big patch in many small patches) llvm-svn: 230471	2015-02-25 09:46:31 +00:00
Bruno Cardoso Lopes	9e1c4c17d9	[X86][MMX] Support folding loads in psll, psrl and psra intrinsics llvm-svn: 230225	2015-02-23 15:23:14 +00:00
Elena Demikhovsky	52e81bc499	AVX-512: recommitted 229837 + bugfix + test llvm-svn: 230223	2015-02-23 15:12:31 +00:00
Eric Christopher	0d94fa98e5	Revert "AVX-512: Full implementation for VRNDSCALESS/SD instructions and intrinsics." The instructions were being generated on architectures that don't support avx512. This reverts commit r229837. llvm-svn: 229942	2015-02-20 00:45:28 +00:00
Elena Demikhovsky	69e8b45b13	AVX-512: Full implementation for VRNDSCALESS/SD instructions and intrinsics. llvm-svn: 229837	2015-02-19 10:48:04 +00:00
Elena Demikhovsky	714f23bcdb	AVX-512: Added support for FP instructions with embedded rounding mode. By Asaf Badouh <asaf.badouh@intel.com> llvm-svn: 229645	2015-02-18 07:59:20 +00:00
Sanjay Patel	b811c1d6a5	prevent folding a scalar FP load into a packed logical FP instruction (PR22371) Change the memory operands in sse12_fp_packed_scalar_logical_alias from scalars to vectors. That's what the hardware packed logical FP instructions define: 128-bit memory operands. There are no scalar versions of these instructions...because this is x86. Generating the wrong code (folding a scalar load into a 128-bit load) is still possible using the peephole optimization pass and the load folding tables. We won't completely solve this bug until we either fix the lowering in fabs/fneg/fcopysign and any other places where scalar FP logic is created or fix the load folding in foldMemoryOperandImpl() to make sure it isn't changing the size of the load. Differential Revision: http://reviews.llvm.org/D7474 llvm-svn: 229531	2015-02-17 20:08:21 +00:00
Craig Topper	141e65e69c	[X86] Remove 256-bit and 512-bit memop pattern fragments. They are no longer used. llvm-svn: 228563	2015-02-09 04:04:53 +00:00
Bruno Cardoso Lopes	ab9ae87623	[X86][MMX] Handle i32->mmx conversion using movd Implement a BITCAST dag combine to transform i32->mmx conversion patterns into a X86 specific node (MMX_MOVW2D) and guarantee that moves between i32 and x86mmx are better handled, i.e., don't use store-load to do the conversion. llvm-svn: 228293	2015-02-05 13:23:07 +00:00
Bruno Cardoso Lopes	e446aefcfe	[X86][MMX] Move MMX DAG node to proper file llvm-svn: 228291	2015-02-05 13:22:50 +00:00
Sanjay Patel	ffd039bde1	Fix program crashes due to alignment exceptions generated for SSE memop instructions (PR22371). r224330 introduced a bug by misinterpreting the "FeatureVectorUAMem" bit. The commit log says that change did not affect anything, but that's not correct. That change allowed SSE instructions to have unaligned mem operands folded into math ops, and that's not allowed in the default specification for any SSE variant. The bug is exposed when compiling for an AVX-capable CPU that had this feature flag but without enabling AVX codegen. Another mistake in r224330 was not adding the feature flag to all AVX CPUs; the AMD chips were excluded. This is part of the fix for PR22371 ( http://llvm.org/bugs/show_bug.cgi?id=22371 ). This feature bit is SSE-specific, so I've renamed it to "FeatureSSEUnalignedMem". Changed the existing test case for the feature bit to reflect the new name and renamed the test file itself to better reflect the feature. Added runs to fold-vex.ll to check for the failing codegen. Note that the feature bit is not set by default on any CPU because it may require a configuration register setting to enable the enhanced unaligned behavior. llvm-svn: 227983	2015-02-03 17:13:04 +00:00
Elena Demikhovsky	7b0dd39db6	AVX-512: Added FMA intrinsics with rounding mode By Asaf Badouh and Elena Demikhovsky Added special nodes for rounding: FMADD_RND, FMSUB_RND.. It will prevent merge between nodes with rounding and other standard nodes. llvm-svn: 227303	2015-01-28 10:21:27 +00:00
Elena Demikhovsky	a79fc16bb0	X86: Added FeatureVectorUAMem for all AVX architectures. According to AVX specification: "Most arithmetic and data processing instructions encoded using the VEX prefix and performing memory accesses have more flexible memory alignment requirements than instructions that are encoded without the VEX prefix. Specifically, With the exception of explicitly aligned 16 or 32 byte SIMD load/store instructions, most VEX-encoded, arithmetic and data processing instructions operate in a flexible environment regarding memory address alignment, i.e. VEX-encoded instruction with 32-byte or 16-byte load semantics will support unaligned load operation by default. Memory arguments for most instructions with VEX prefix operate normally without causing #GP(0) on any byte-granularity alignment (unlike Legacy SSE instructions)." The same for AVX-512. This change does not affect anything right now, because only the "memop pattern fragment" depends on FeatureVectorUAMem and it is not used in AVX patterns. All AVX patterns are based on the "unaligned load" anyway. llvm-svn: 224330	2014-12-16 09:10:08 +00:00
Elena Demikhovsky	72860c341e	AVX-512: Added EXPAND instructions and intrinsics. llvm-svn: 224241	2014-12-15 10:03:52 +00:00
Elena Demikhovsky	908dbf48c8	AVX-512: Added all forms of COMPRESS instruction + intrinsics + tests llvm-svn: 224019	2014-12-11 15:02:24 +00:00
Elena Demikhovsky	905a5a606f	AVX-512: Scalar ERI intrinsics including SAE mode and memory operand. Added AVX512_maskable_scalar template, that should cover all scalar instructions in the future. The main difference between AVX512_maskable_scalar<> and AVX512_maskable<> is using X86select instead of vselect. I need it, because I can't create vselect node for MVT::i1 mask for scalar instruction. http://reviews.llvm.org/D6378 llvm-svn: 222820	2014-11-26 10:46:49 +00:00
Elena Demikhovsky	be8808dc3f	AVX-512: Intrinsics for ERI 3 instructions: vrcp28, vrsqrt28, vexp2, only vector forms. Intrinsics include SAE (Suppress All Exceptions) parameter. http://reviews.llvm.org/D6214 llvm-svn: 221774	2014-11-12 07:31:03 +00:00
Chandler Carruth	6d5916a2d7	[x86] Teach the AVX1 path of the new vector shuffle lowering one more trick that I missed. VPERMILPS has a non-immediate memory operand mode that allows it to do asymmetric shuffles in the two 128-bit lanes. Use this rather than two shuffles and a blend. However, it turns out the variable shuffle path to VPERMILPS (and VPERMILPD, although that one offers no functional difference from the immediate operand other than variability) wasn't even plumbed through codegen. Do such plumbing so that we can reasonably emit a variable-masked VPERMILP instruction. Also plumb basic comment parsing and printing through so that the tests are reasonable. There are still a few tests which don't show the shuffle pattern. These are tests with undef lanes. I'll teach the shuffle decoding and printing to handle undef mask entries in a follow-up. I've looked at the masks and they seem reasonable. llvm-svn: 218300	2014-09-23 10:08:29 +00:00
Chandler Carruth	ed5dfff865	[x86] Rename X86ISD::VPERMILP to X86ISD::VPERMILPI (and the same for the td pattern). Currently we only model the immediate operand variation of VPERMILPS and VPERMILPD, we should make that clear in the pseudos used. Will be adding support for the variable mask variant in my next commit. llvm-svn: 218282	2014-09-22 22:29:42 +00:00
Chandler Carruth	204ad4c613	[x86] Start fixing our emission of ADDSUBPS and ADDSUBPD instructions by introducing a synthetic X86 ISD node representing this generic operation. The relevant patterns for mapping these nodes into the concrete instructions are also added, and a gnarly bit of C++ code in the target-specific DAG combiner is replaced with simple code emitting this primitive. The next step is to generically combine blends of adds and subs into this node so that we can drop the reliance on an SSE4.1 ISD node (BLENDI) when matching an SSE3 feature (ADDSUB). llvm-svn: 217819	2014-09-15 20:09:47 +00:00
Chandler Carruth	373b2b1728	[x86] Fix a pretty horrible bug and inconsistency in the x86 asm parsing (and latent bug in the instruction definitions). This is effectively a revert of r136287 which tried to address a specific and narrow case of immediate operands failing to be accepted by x86 instructions with a pretty heavy hammer: it introduced a new kind of operand that behaved differently. All of that is removed with this commit, but the test cases are both preserved and enhanced. The core problem that r136287 and this commit are trying to handle is that gas accepts both of the following instructions: insertps $192, %xmm0, %xmm1 insertps $-64, %xmm0, %xmm1 These will encode to the same byte sequence, with the immediate occupying an 8-bit entry. The first form was fixed by r136287 but that broke the prior handling of the second form! =[ Ironically, we would still emit the second form in some cases and then be unable to re-assemble the output. The reason why the first instruction failed to be handled is because prior to r136287 the operands were marked 'i32i8imm' which forces them to be sign-extendable. Clearly, that won't work for 192 in a single byte. However, making them zero-extended or "unsigned" doesn't really address the core issue either because it breaks negative immediates. The correct fix is to make these operands 'i8imm' reflecting that they can be either signed or unsigned but must be 8-bit immediates. This patch backs out r136287 and then changes those places as well as some others to use 'i8imm' rather than one of the extended variants. Naturally, this broke something else. The custom DAG nodes had to be updated to have a much more accurate type constraint of an i8 node, and a bunch of Pat immediates needed to be specified as i8 values. The fallout didn't end there though. We also then ceased to be able to match the instruction-specific intrinsics to the instructions so modified. Digging, this is because they too used i32 rather than i8 in their signature. So I've also switched those intrinsics to i8 arguments in line with the instructions. In order to make the intrinsic adjustments of course, I also had to add auto upgrading for the intrinsics. I suspect that the intrinsic argument types may have led everything down this rabbit hole. Pretty happy with the result. llvm-svn: 217310	2014-09-06 10:00:01 +00:00
Adam Nemet	50b83f0bb8	[AVX512] Add enum for the static rounding types No functional change. This will be used by the new FMA intrinsic lowering code. We can probably add NO_EXC here as well, I am just not too familiar with this part of AVX512 yet. We can add that later. llvm-svn: 215662	2014-08-14 17:13:26 +00:00
Adam Nemet	2f10cc699d	[X86] Separate DAG node for valign and palignr They have different semantics (valign is interlane while palignr is intralane) and palignr is still needed even in the AVX512 context. According to the latest spec AVX512BW provides these. llvm-svn: 214887	2014-08-05 17:22:55 +00:00
Robert Khasanov	7ca7df0bf9	[SKX] Enabling load/store instructions: encoding Instructions: VMOVAPD, VMOVAPS, VMOVDQA8, VMOVDQA16, VMOVDQA32, VMOVDQA64, VMOVDQU8, VMOVDQU16, VMOVDQU32, VMOVDQU64, VMOVUPD, VMOVUPS. Reviewed by Elena Demikhovsky <elena.demikhovsky@intel.com> llvm-svn: 214719	2014-08-04 14:35:15 +00:00
Chandler Carruth	8366cebeb5	[x86] Make the x86 PACKSSWB, PACKSSDW, PACKUSWB, and PACKUSDW instructions available as synthetic SDNodes PACKSS and PACKUS that will select to the correct instruction variants based on the return type. This allows us to use these rather important instructions when lowering vector shuffles. Also moves the relevant instruction definitions to be split out from the fully generic multiclasses to allow them to match these new SDNodes in the same way that the UNPCK instructions do. No functionality should actually be changed here. llvm-svn: 211332	2014-06-20 01:05:28 +00:00
Benjamin Kramer	6d2dff61f9	X86: Lower SMUL_LOHI of v4i32 to pmuldq when SSE4.1 is available. llvm-svn: 207318	2014-04-26 14:12:19 +00:00
Filipe Cabecinhas	20352216fb	Rename X86insrtps to the proper instruction name. Summary: The INSERTPS pattern fragment was called insrtps (missing 'e'), which would make it harder to grep for the patterns related to this instruction. Renaming it to use the proper instruction name. Reviewers: nadav CC: llvm-commits Differential Revision: http://reviews.llvm.org/D3443 llvm-svn: 206779	2014-04-21 20:07:29 +00:00
Elena Demikhovsky	9f423d6f25	AVX-512: Fixed extract_vector_elt for v16i1 and v8i1 vectors. llvm-svn: 201066	2014-02-10 07:02:39 +00:00
Tim Northover	546b57b011	X86: deduplicate V[SZ]EXT_MOVL and V[SZ]EXT nodes I believe VZEXT_MOVL means "zero all vector elements except the first" (and should have identical input & output types) whereas VZEXT means "zero extend each element of a vector (discarding higher elements if necessary)". For example: (v4i32 (vzext (v16i8 ...))) should zero extend the low 4 bytes of the incoming vector to 32-bits, discarding higher bytes. However, somewhere in the past, these two concepts had become confused, even leading to a nonsensical VSEXT_MOVL. This re-merges the nodes where appropriate (all VSEXT_MOVL -> VSEXT, VZEXT_MOVL -> VZEXT when it's an actual extension). rdar://problem/15981990 llvm-svn: 200918	2014-02-06 09:54:51 +00:00
Elena Demikhovsky	a30e437659	AVX-512: Added intrinsic for cvtph2ps. Added VPTESTNM instruction. Added a pattern to vselect (lit tests will follow). llvm-svn: 200823	2014-02-05 07:05:03 +00:00
Craig Topper	aefaab640c	Improve some x86 type constraints. llvm-svn: 200120	2014-01-26 04:59:39 +00:00
Elena Demikhovsky	a5d38a39a0	AVX-512: added VPERM2D VPERM2Q VPERM2PS VPERM2PD instructions, they give better sequences than VPERMI llvm-svn: 199893	2014-01-23 14:27:26 +00:00
Elena Demikhovsky	3629b4aa0e	AVX-512: added intrinsic vcvtpd2ps (with rounding mode and without) llvm-svn: 198593	2014-01-06 08:45:54 +00:00
Elena Demikhovsky	de3f751baf	AVX-512: Added intrinsics for vcvt, vcvtt, vrndscale, vcmp Printing rounding control. Encoding for EVEX_RC (rounding control). llvm-svn: 198277	2014-01-01 15:12:34 +00:00
Elena Demikhovsky	c5f6726a24	AVX-512: Added implementation of CONCAT_VECTORS for v8i1 vectors (by Alexey Bader). Added implementation of "truncate" from integer type (i64/i32/i16/i8) to i1. llvm-svn: 197482	2013-12-17 08:33:15 +00:00
Elena Demikhovsky	47fc44e52e	AVX-512: Added legal type MVT::i1 and VK1 register for it. Added scalar compare VCMPSS, VCMPSD. Implemented LowerSELECT for scalar FP operations. I replaced FSETCCss, FSETCCsd with one node type FSETCCs. Node extract_vector_elt(v16i1/v8i1, idx) returns an element of type i1. llvm-svn: 197384	2013-12-16 13:52:35 +00:00
Elena Demikhovsky	1f3ed4169c	AVX-512: aligned / unaligned load and store for 512-bit integer vectors. llvm-svn: 193156	2013-10-22 09:19:28 +00:00
Elena Demikhovsky	8952974e29	AVX-512: implemented extractelement with variable index. Added parsing of mask register and "zeroing" semantic, like {%k1} {z}. llvm-svn: 190595	2013-09-12 08:55:00 +00:00
Elena Demikhovsky	980c6b08b1	AVX-512: added extend and truncate instructions. llvm-svn: 189580	2013-08-29 11:56:53 +00:00
Craig Topper	6269f49505	Make sure x86 instructions using ssmem/sdmem operand types are only able to parse memory operands of the proper size in Intel syntax. Primarily affects some of sse cvt instructions. llvm-svn: 189206	2013-08-26 00:39:04 +00:00
Elena Demikhovsky	33d447a2d6	AVX-512: Added SHIFT instructions. llvm-svn: 188899	2013-08-21 09:36:02 +00:00
Elena Demikhovsky	1490c5eb5b	AVX-512: added arithmetic and logical operations. ADD, SUB, MUL integer and FP types. OR, AND, XOR. Added embedded broadcast form for these instructions. llvm-svn: 188673	2013-08-19 13:26:14 +00:00
Elena Demikhovsky	3ce8dbbac2	AVX-512: Added VMOVD, VMOVQ, VMOVSS, VMOVSD instructions. llvm-svn: 188637	2013-08-18 13:08:57 +00:00
Craig Topper	8c929627d9	Don't use v16i32 for load pattern matching. All 512-bit loads are cast to v8i64. llvm-svn: 188534	2013-08-16 06:07:34 +00:00
Elena Demikhovsky	60b1f289f2	AVX-512: Added CMP and BLEND instructions. Lowering for SETCC. llvm-svn: 188265	2013-08-13 13:24:07 +00:00
Elena Demikhovsky	cf5b1458e6	AVX-512: Added VPERM* instructions and MOV* zmm-to-zmm instructions. Added a test for shuffles using VPERM. llvm-svn: 188147	2013-08-11 07:55:09 +00:00
Elena Demikhovsky	45c54ad8dc	AVX-512 set: Added BROADCAST instructions with lowering logic and a test. llvm-svn: 187884	2013-08-07 12:34:55 +00:00
Elena Demikhovsky	40864b690b	AVX-512 set: added mask operations, lowering BUILD_VECTOR for i1 vector types. Added intrinsics and tests. llvm-svn: 187717	2013-08-05 08:52:21 +00:00

161 Commits