llvm-project

Commit Graph

Author	SHA1	Message	Date
Robert Khasanov	1cf354c92f	[AVX512] Fix VSQRT packed instructions internal names. No functional change llvm-svn: 220808	2014-10-28 18:22:41 +00:00
Robert Khasanov	eb12639375	[AVX512] Extended avx512_sqrt_packed (sqrt instructions) to VL subset. Refactored through AVX512_maskable llvm-svn: 220806	2014-10-28 18:15:20 +00:00
Robert Khasanov	3e534c93b9	[AVX-512] Expanded rsqrt/rcp instructions to VL subset. Refactored multiclass through AVX512_maskable llvm-svn: 220783	2014-10-28 16:37:13 +00:00
Robert Khasanov	dd09a8f320	[AVX512] Bring back vector-shuffle lowering support through broadcasts Ffter commit at rev219046 512-bit broadcasts lowering become non-optimal. Most of tests on broadcasting and embedded broadcasting were changed and they doesn’t produce efficient code. Example below is from commit changes (it’s the first test from test/CodeGen/X86/avx512-vbroadcast.ll): define <16 x i32> @_inreg16xi32(i32 %a) { ; CHECK-LABEL: _inreg16xi32: ; CHECK: ## BB#0: -; CHECK-NEXT: vpbroadcastd %edi, %zmm0 +; CHECK-NEXT: vmovd %edi, %xmm0 +; CHECK-NEXT: vpbroadcastd %xmm0, %ymm0 +; CHECK-NEXT: vinserti64x4 $1, %ymm0, %zmm0, %zmm0 ; CHECK-NEXT: retq %b = insertelement <16 x i32> undef, i32 %a, i32 0 %c = shufflevector <16 x i32> %b, <16 x i32> undef, <16 x i32> zeroinitializer ret <16 x i32> %c } Here, 256-bit broadcast was generated instead of 512-bit one. In this patch 1) I added vector-shuffle lowering through broadcasts 2) Removed asserts and branches likes because this is incorrect - assert(Subtarget->hasDQI() && "We can only lower v8i64 with AVX-512-DQI"); 3) Fixed lowering tests llvm-svn: 220774	2014-10-28 12:28:51 +00:00
Adam Nemet	cf7a4a2660	[AVX512] Add vpermil variable version This is implemented via a multiclass that derives from the vperm imm multiclass. Fixes <rdar://problem/18426089> llvm-svn: 220737	2014-10-27 23:08:40 +00:00
Adam Nemet	8d85b0cd3e	[AVX512] Clean up avx512_perm_imm to use X86VectorVTInfo No functionality change. No change in X86.td.expanded except that we only set the CD8 attributes for the memory variants. (This shouldn't be used unless we have a memory operand.) llvm-svn: 220736	2014-10-27 23:08:37 +00:00
Adam Nemet	9aad13164e	[AVX512] Derive vpermil* from avx512_perm_imm This used to derive from avx512_pshuf_imm which is confusing. NFC. Compared X86.td.expanded. llvm-svn: 220735	2014-10-27 23:08:34 +00:00
Adam Nemet	c51cee85b7	[AVX512] Fix copy-and-paste bugs in vpermil 1) i512mem -> f512mem (this is the packed FP input being permuted) 2) element size is 64 bits in EVEX_CD8 for PD. (A good illustration why X86VectorVTInfo is useful) llvm-svn: 220734	2014-10-27 23:08:31 +00:00
Elena Demikhovsky	4b01b7306c	AVX-512: Fixed encoding of VPBROADCASTM and added SKX forms of this instruction llvm-svn: 220638	2014-10-26 09:52:24 +00:00
Adam Nemet	832ec5e911	[AVX512] FMA support for the 231 variants This is asm/diasm-only support, similar to AVX. For ISeling the register variant, they are no different from 213 other than whether the multiplication or the addition operand is destructed. For ISeling the memory variant, i.e. to fold a load, they are no different than the 132 variant. The addition operand (op3) in both cases can come from memory. Again the ony difference is which operand is destructed. There could be a post-RA pass that would convert a 213 or 132 into a 231. Part of <rdar://problem/17082571> llvm-svn: 220540	2014-10-24 00:03:00 +00:00
Adam Nemet	26371ce131	[AVX512] Introduce fma3p_forms from AVX This multiclass generates the different forms: 213, 231, 132 in AVX. 132 in AVX512 is a separate class but I am planning to use this same multiclass to generate 231 relying on the nice the null_frag trick from AVX to disable codegen pattern for 231. No functionality change, no change in X86.td.expanded except for the different instruction definition names. llvm-svn: 220539	2014-10-24 00:02:55 +00:00
Adam Nemet	4285c1f8cc	[AVX512] Add DQ subvector inserts In AVX512f we support 64x2 and 32x8 inserts via matching them to 32x4 and 64x4 respectively. These are matched by "Alt" Pat<>'s (Alt stands for alternative VTs). Since DQ has native support for these intructions, I peeled off the non-"Alt" part of the baseclass into vinsert_for_size_no_alt. The DQ instructions are derived from this multiclass. The "Alt" Pat<>'s are disabled with DQ. Fixes <rdar://problem/18426089> llvm-svn: 219874	2014-10-15 23:42:17 +00:00
Adam Nemet	449b3f0931	[AVX512] Two new attributes in X86VectorVTInfo for subvector insert The new attributes are NumElts and the CD8TupleForm. This prepares the code to enable x8 and x2 inserts. NFC, no change in X86.td.expanded except for the new attributes. llvm-svn: 219871	2014-10-15 23:42:09 +00:00
Adam Nemet	b1c3ef4b60	[AVX512] Rename arg from Opcode32/64 to Opcode128/256 in vinsert_for_size It's the W bit that selects between 32 or 64 elt type and not the opcode. The opcode selects between the width of the insert (128 or 256). llvm-svn: 219870	2014-10-15 23:42:04 +00:00
Robert Khasanov	1a77f6664e	[AVX512] Extended avx512_binop_rm to DQ/VL subsets. Added encoding tests. llvm-svn: 219686	2014-10-14 15:13:56 +00:00
Robert Khasanov	545d1b7726	[AVX512] Extended avx512_binop_rm to BW/VL subsets. Added encoding tests. llvm-svn: 219685	2014-10-14 14:36:19 +00:00
Robert Khasanov	d5b14f7994	[AVX512] Extended avx512_binop_rm for AVX512VL subsets. Added avx512_binop_rm_vl multiclass for VL subset Added encoding tests llvm-svn: 219390	2014-10-09 08:38:48 +00:00
Adam Nemet	3480142ef0	[AVX512] Rename AVX512_masking* to AVX512_maskable* No functional change. This is the current AVX512_maskable multiclass hierarchy: maskable_custom / \ / \ maskable_common maskable_in_asm / \ / \ maskable maskable_3src llvm-svn: 219363	2014-10-08 23:25:39 +00:00
Adam Nemet	47b2d5f1e0	[AVX512] Intrinsics for vextract*x4 This adds the Pat<>'s for the intrinsics. These are necessary because we don't lower these intrinsics to SDNodes but match them directly. See the rational in the previous commit. llvm-svn: 219362	2014-10-08 23:25:37 +00:00
Adam Nemet	2b5cdbb3de	[AVX512] Add asm-only support for vextractx4 masking variants These derive from the new asm-only masking definitions. Unfortunately I wasn't able to find a ISel pattern that we could legally generate for the masking variants. The problem is that since the destination is v4 we would need VK4 register classes and v4i1 value types to express the masking. These are however not legal types/classes in AVX512f but only in VL, so things get complicated pretty quickly. We can revisit this question later if we have a more pressing need to express something like this. So the ISel patterns are empty for the masking instructions and the next patch will add Pat<>s instead to match the intrinsics calls with instructions. llvm-svn: 219361	2014-10-08 23:25:33 +00:00
Adam Nemet	0937723b49	[AVX512] Move DAG for all-zero node to X86VectorVTInfo No functional change. No change in X86.td.expanded except for the appearance of the new attributes. The new attributes will be used in the subsequent patch. llvm-svn: 219360	2014-10-08 23:25:31 +00:00
Adam Nemet	52bb6cfad6	[AVX512] Peel off an asm-only class from AVX512_masking_common. No functional change. This enables the generation of masking instructions that don't provide a ISel pattern. llvm-svn: 219358	2014-10-08 23:25:23 +00:00
Robert Khasanov	44241440e1	[AVX512] Refactoring of avx512_binop_rm multiclass through AVX512_masking. Added new argrument for AVX512_masking: InstrItinClass and bit isCommutable. No functional change. llvm-svn: 219310	2014-10-08 14:37:45 +00:00
Elena Demikhovsky	44bf0637d5	AVX-512-SKX: Added instruction VPMOVM2B/W/D/Q. This instruction allows to broadacst mask vector to data vector. llvm-svn: 219083	2014-10-05 14:11:08 +00:00
Adam Nemet	4dca3ce4b0	[AVX512] Pull pattern for subvector insert into the instruction definition No functional change intended. Very similar to the change I made for subvector extract in r218480. test/CodeGen/X86/avx512-insert-extract.ll covers this. llvm-svn: 218928	2014-10-02 23:18:30 +00:00
Adam Nemet	4e2ef472d2	[AVX512] Refactor subvector inserts No functional change. Very similar to the extract refactoring I did in r218478. Compared X86.td.expanded before and after. llvm-svn: 218927	2014-10-02 23:18:28 +00:00
Adam Nemet	dc87aea176	[AVX512] Fix i256mem->f256mem typo in VINSERTF64x4rm Just like in the case of extracts, the refactoring is uncovering some typos in the code. llvm-svn: 218926	2014-10-02 23:18:26 +00:00
Adam Nemet	05d8c8e682	[AVX512] Remove space before \t in AsmStrings. llvm-svn: 218725	2014-10-01 00:41:32 +00:00
Robert Khasanov	5aa4445bde	[AVX512] Added intrinsics for 128- and 256-bit versions of VCMPEQ{BWDQ} Fixed lowering of this intrinsics in case when mask is v2i1 and v4i1. Now cmp intrinsics lower in the following way: (i8 (int_x86_avx512_mask_pcmpeq_q_128 (v2i64 %a), (v2i64 %b), (i8 %mask))) -> (i8 (bitcast (v8i1 (insert_subvector undef, (v2i1 (and (PCMPEQM %a, %b), (extract_subvector (v8i1 (bitcast %mask)), 0))), 0)))) llvm-svn: 218669	2014-09-30 11:41:54 +00:00
Adam Nemet	6bddb8c3a5	[AVX512] Use X86VectorVTInfo in the masking helper classes and the FMAs No functionality change. Makes the code more compact (see the FMA part). This needs a new type attribute MemOpFrag in X86VectorVTInfo. For now I only defined this in the simple cases. See the commment before the attribute. Diff of X86.td.expanded before and after is empty except for the appearance of the new attribute. llvm-svn: 218637	2014-09-29 22:54:41 +00:00
Adam Nemet	ce465421d7	[AVX512] Simplify use of !con() No change in X86.td.expanded. llvm-svn: 218485	2014-09-26 00:53:12 +00:00
Adam Nemet	f7988d7364	[AVX512] Pull pattern for subvector extract into the instruction definition No functional change. I initially thought that pulling the Pat<> into the instruction pattern was not possible because it was doing a transform on the index in order to convert it from a per-element (extract_subvector) index into a per-chunk (vextract*x4) index. Turns out this also works inside the pattern because the vextract_extract PatFrag has an OperandTransform EXTRACT_get_vextract{128,256}_imm, so the index in $idx goes through the same conversion. The existing test CodeGen/X86/avx512-insert-extract.ll extended in the previous commit provides coverage for this change. llvm-svn: 218480	2014-09-25 23:48:49 +00:00
Adam Nemet	55536c6a8f	[AVX512] Refactor subvector extracts No functional change. These are now implemented as two levels of multiclasses heavily relying on the new X86VectorVTInfo class. The multiclass at the first level that is called with float or int provides the 128 or 256 bit subvector extracts. The second level provides the register and memory variants and some more Pat<>s. I've compared the td.expanded files before and after. One change is that ExeDomain for 64x4 is SSEPackedDouble now. I think this is correct, i.e. a bugfix. (BTW, this is the change that was blocked on the recent tablegen fix. The class-instance values X86VectorVTInfo inside vextract_for_type weren't properly evaluated.) Part of <rdar://problem/17688758> llvm-svn: 218478	2014-09-25 23:48:45 +00:00
Adam Nemet	6ea09eb148	[AVX512] Fix typo F->I in VEXTRACTF32x4rr. llvm-svn: 218477	2014-09-25 23:48:42 +00:00
Chandler Carruth	ed5dfff865	[x86] Rename X86ISD::VPERMILP to X86ISD::VPERMILPI (and the same for the td pattern). Currently we only model the immediate operand variation of VPERMILPS and VPERMILPD, we should make that clear in the pseudos used. Will be adding support for the variable mask variant in my next commit. llvm-svn: 218282	2014-09-22 22:29:42 +00:00
Robert Khasanov	f70f798474	[SKX] Deriving rmb multiclasses from general one (avx512_icmp_packed_rmb and avx512_icmp_cc_rmb). Thanks Adam Nemet for notice about this. llvm-svn: 218051	2014-09-18 14:06:55 +00:00
Chandler Carruth	373b2b1728	[x86] Fix a pretty horrible bug and inconsistency in the x86 asm parsing (and latent bug in the instruction definitions). This is effectively a revert of r136287 which tried to address a specific and narrow case of immediate operands failing to be accepted by x86 instructions with a pretty heavy hammer: it introduced a new kind of operand that behaved differently. All of that is removed with this commit, but the test cases are both preserved and enhanced. The core problem that r136287 and this commit are trying to handle is that gas accepts both of the following instructions: insertps $192, %xmm0, %xmm1 insertps $-64, %xmm0, %xmm1 These will encode to the same byte sequence, with the immediate occupying an 8-bit entry. The first form was fixed by r136287 but that broke the prior handling of the second form! =[ Ironically, we would still emit the second form in some cases and then be unable to re-assemble the output. The reason why the first instruction failed to be handled is because prior to r136287 the operands ere marked 'i32i8imm' which forces them to be sign-extenable. Clearly, that won't work for 192 in a single byte. However, making thim zero-extended or "unsigned" doesn't really address the core issue either because it breaks negative immediates. The correct fix is to make these operands 'i8imm' reflecting that they can be either signed or unsigned but must be 8-bit immediates. This patch backs out r136287 and then changes those places as well as some others to use 'i8imm' rather than one of the extended variants. Naturally, this broke something else. The custom DAG nodes had to be updated to have a much more accurate type constraint of an i8 node, and a bunch of Pat immediates needed to be specified as i8 values. The fallout didn't end there though. We also then ceased to be able to match the instruction-specific intrinsics to the instructions so modified. Digging, this is because they too used i32 rather than i8 in their signature. So I've also switched those intrinsics to i8 arguments in line with the instructions. In order to make the intrinsic adjustments of course, I also had to add auto upgrading for the intrinsics. I suspect that the intrinsic argument types may have led everything down this rabbit hole. Pretty happy with the result. llvm-svn: 217310	2014-09-06 10:00:01 +00:00
Robert Khasanov	29e3b96734	[SKX] Added new versions of cmp instructions in avx512_icmp_cc multiclass, added VL multiclass. Added encoding tests llvm-svn: 216532	2014-08-27 09:34:37 +00:00
Elena Demikhovsky	ff620edd3c	AVX-512: Added intrinsic for VMOVSS store form with mask. llvm-svn: 216530	2014-08-27 07:38:43 +00:00
Robert Khasanov	2ea081d4d1	[SKX] avx512_icmp_packed multiclass extension Extended avx512_icmp_packed multiclass by masking versions. Added avx512_icmp_packed_rmb multiclass for embedded broadcast versions. Added corresponding _vl multiclasses. Added encoding tests for CPCMP{EQ\|GT}* instructions. Add more fields for X86VectorVTInfo. Added AVX512VLVectorVTInfo that include X86VectorVTInfo for 512/256/128-bit versions Differential Revision: http://reviews.llvm.org/D5024 llvm-svn: 216383	2014-08-25 14:49:34 +00:00
Adam Nemet	5ed17dad95	[AVX512] Add class to group common template arguments related to vector type We discussed the issue of generality vs. readability of the AVX512 classes recently. I proposed this approach to try to hide and centralize the mappings we commonly perform based on the vector type. A new class X86VectorVTInfo captures these. The idea is to pass an instance of this class to classes/multiclasses instead of the corresponding ValueType. Then the class/multiclass can use its field for things that derive from the type rather than passing all those as separate arguments. I modified avx512_valign to demonstrate this new approach. As you can see instead of 7 related template parameters we now have one. The downside is that we have to refer to fields for the derived values. I named the argument '_' in order to make this as invisible as possible. Please let me know if you absolutely hate this. (Also once we allow local initializations in multiclasses we can recover the original version by assigning the fields to local variables.) Another possible use-case for this class is to directly map things, e.g.: RegisterClass KRC = X86VectorVTInfo<32, i16>.KRC llvm-svn: 216209	2014-08-21 19:50:07 +00:00
Elena Demikhovsky	34d2d76d25	AVX-512: Fixed a bug in emitting compare for MVT:i1 type. Added a test. llvm-svn: 215889	2014-08-18 11:59:06 +00:00
Adam Nemet	2e91ee58fe	[AVX512] Add masking variant for the FMA instructions This change further evolves the base class AVX512_masking in order to make it suitable for the masking variants of the FMA instructions. Besides AVX512_masking there is now a new base class that instructions including FMAs can use: AVX512_masking_3src. With three-source (destructive) instructions one of the sources is already tied to the destination. This difference from AVX512_masking is captured by this new class. The common bits between _masking and _masking_3src are broken out into a new super class called AVX512_masking_common. As with valign, there is some corresponding restructuring of the underlying format classes. The idea is the same we want to derive from two classes essentially: one providing the format bits and another format-independent multiclass supplying the various masking and non-masking instruction variants. Existing fma tests in avx512-fma.ll provide coverage here for the non-masking variants. For masking, the next patches in the series will add intrinsics and intrinsic tests. For AVX512_masking_3src to work, the (ins ...) dag has to be passed without* the leading source operand that is tied to dst ($src1). This is necessary to properly construct the (ins ...) for the different variants. For the record, I did check that if $src is mistakenly included, you do get a fairly intuitive error message from the tablegen backend. Part of <rdar://problem/17688758> llvm-svn: 215660	2014-08-14 17:13:19 +00:00
Robert Khasanov	ed8829703f	[SKX] Extended non-temporal load/store instructions for AVX512VL subsets. Added avx512_movnt_vl multiclass for handling 256/128-bit forms of instruction. Added encoding and lowering tests. Reviewed by Elena Demikhovsky <elena.demikhovsky@intel.com> llvm-svn: 215536	2014-08-13 10:46:00 +00:00
Adam Nemet	cee9d0a460	[AVX512] Handle valign masking intrinsic via C++ lowering I think that this will scale better in most cases than adding a Pat<> for each mapping from the intrinsic DAG to the intruction (i.e. rri, rrik, rrikz). We can just lower to the SDNode and have the resulting DAG be matches by the DAG patterns. Alternatively (long term), we could keep the Pat<>s but generate them via the new AVX512_masking multiclass. The difficulty is that in order to formulate that we would have to concatenate DAGs. Currently this is only supported if the operators of the input DAGs are identical. llvm-svn: 215473	2014-08-12 21:13:12 +00:00
Elena Demikhovsky	40a77144a4	AVX-512: added a missing bitcast from v16f32 to v16i32 llvm-svn: 215351	2014-08-11 09:59:08 +00:00
Adam Nemet	7d498629f1	[AVX512] Add zero-masking variant to AVX512_masking multiclass This completes one item from the todo-list of r215125 "Generate masking instruction variants with tablegen". The AddedComplexity is needed just like for the k variant. Added a codegen test based on valignq. llvm-svn: 215173	2014-08-07 23:53:38 +00:00
Adam Nemet	fa1f7201fc	[AVX512] Add codegen test for the masking variant of valign The AddedComplexity is needed just like in avx512_perm_3src. There may be a bug in the complexity computation... llvm-svn: 215168	2014-08-07 23:18:18 +00:00
Adam Nemet	2e2537f665	[AVX512] Generate masking instruction variants with tablegen After adding the masking variants to several instructions, I have decided to experiment with generating these from the non-masking/unconditional variant. This will hopefully reduce the amount repetition that we currently have in order to define an instruction with all its variants (for a reg/mem instruction this would be 6 instruction defs and 2 Pat<> for the intrinsic). The patch is the first cut that is currently only applied to valignd/q to make the patch small. A few notes on the approach: * In order to stitch together the dag for both the conditional and the unconditional patterns I pass the RHS of the set rather than the full pattern (set dest, RHS). * Rather than subclassing each instruction base class (e.g. AVX512AIi8), with a masking variant which wouldn't scale, I derived the masking instructions from a new base class AVX512 (this is just I<> with Requires<HasAVX512>). The instructions derive from this now, plus a new set of classes that add the format bits and everything else that instruction base class provided (i.e. AVX512AIi8 vs. AVX512AIi8Base). I hope we can go incrementally from here. I expect that: * We will need different variants of the masking class. One example is instructions requiring three vector sources. In this case we tie one of the source operands to dest rather than a new implicit source operand ($src0) * Add the zero-masking variant * Add more AVX512*Base classes as new uses are added I've looked at X86.td.expanded before and after to make sure that nothing got lost for valignd/q. llvm-svn: 215125	2014-08-07 17:53:55 +00:00
Adam Nemet	5ec912881f	[X86] Fixes commit r214890 to match the posted patch This was another fallout from my local rebase where something went wrong :( llvm-svn: 214951	2014-08-06 07:13:12 +00:00

1 2 3 4

167 Commits