llvm-project

Commit Graph

Author	SHA1	Message	Date
Simon Pilgrim	0720c8d90e	[X86][AVX512] VPLZCNT instructions match SchedWriteVecIMul scheduling class not SchedWriteVecALU. llvm-svn: 331473	2018-05-03 18:22:49 +00:00
Simon Pilgrim	f2d2cedab4	[X86] Split WriteVecShift/WriteVarVecShift into MMX, XMM and YMM/ZMM scheduler classes This took a bit of extra work as on Intel targets the old (V)PSLLDrr/(V)PSLLDrm style instructions act differently - I ended up creating WriteVecShiftImm classes for XMM/YMM/ZMM vector shift by immediate and retaining WriteVecShift as the default (used only by MMX) plus WriteVecShiftX/WriteVecShiftY. X86SchedWriteWidths hides most of this thank goodness. llvm-svn: 331472	2018-05-03 17:56:43 +00:00
Simon Pilgrim	39196a1dd3	[X86][AVX512] VPAVG instructions should be tagged as SchedWriteVecALU llvm-svn: 331446	2018-05-03 10:53:17 +00:00
Simon Pilgrim	93c878c76b	[X86] Split WriteVecIMul/WriteVecPMULLD/WriteMPSAD/WritePSADBW into XMM and YMM/ZMM scheduler classes Also retagged VDBPSADBW instructions as SchedWritePSADBW instead of SchedWriteVecIMul which matches the behaviour on SkylakeServer (the only thing that supports it...) llvm-svn: 331445	2018-05-03 10:31:20 +00:00
Simon Pilgrim	a1f1a3bf94	[X86] Convert most remaining AVX512 uses of X86SchedWritePair scheduler classes to X86SchedWriteWidths. We've dealt with the majority already. llvm-svn: 331353	2018-05-02 13:32:56 +00:00
Simon Pilgrim	c708868cb1	[X86] Split WriteFRcp/WriteFRsqrt/WriteFSqrt into XMM and YMM/ZMM scheduler classes llvm-svn: 331290	2018-05-01 18:06:07 +00:00
Simon Pilgrim	c546f9424f	[X86] Split WriteFCmp into XMM and YMM/ZMM scheduler classes Removes more WriteFCmp InstRW overrides llvm-svn: 331283	2018-05-01 16:50:16 +00:00
Simon Pilgrim	1b7a80d80a	[X86] Convert all uses of WriteFAdd to X86SchedWriteWidths. In preparation of splitting WriteFAdd by vector width. llvm-svn: 331273	2018-05-01 15:57:17 +00:00
Simon Pilgrim	f6b81dae9e	[X86] Convert all uses of WriteFShuffle to X86SchedWriteWidths. In preparation of splitting WriteFShuffle by vector width. llvm-svn: 331262	2018-05-01 14:14:42 +00:00
Simon Pilgrim	6f710a6440	[X86] Convert all uses of WriteFLogic/WriteVecLogic to X86SchedWriteWidths. In preparation of splitting WriteVecLogic by vector width. llvm-svn: 331256	2018-05-01 12:15:29 +00:00
Simon Pilgrim	fc0c26f1a6	[X86] Tag PSLLDQ/PSRLDQ as WriteShuffle scheduler classes instead of shifts. Although they are encoded similar to bit shifts, the byte shifts behave like shuffles from a scheduling point of view. llvm-svn: 331253	2018-05-01 11:05:42 +00:00
Simon Pilgrim	3c35408e48	[X86] Introduce X86SchedWriteWidths schedule wrapper for different vector widths. We need to split most of the scheduler classes by vector width to remove more of the InstRW overrides, this patch should make this easier/tidier by allowing us to pass the X86SchedWriteWidths wrapper to multi-width multiclasses and then split as required. I've included fields for Scl (scalar float/double), MMX (MMX integer), XMM, YMM and ZMM widths. These fields mostly share the same classes but it should give us the flexibility that we may need in the future. This patch has replaced a set of example SSE/AVX512 instruction cases but isn't exhaustive as it gets very noisy before we really need the functionality. Differential Revision: https://reviews.llvm.org/D46266 llvm-svn: 331208	2018-04-30 18:18:38 +00:00
Craig Topper	06624e1a93	[X86] Restrict many of the InstAliases to only att or only intel syntax. NFCI Many of these aliases exist to give one syntax or the other a slightly different mnemonic, and the other variant gets a duplicate of its normal mnemonic. This patch restricts a lot of these to only one variant so we don't get the duplication. This removes a lot of duplicate entries from the matcher table. It also reduces the number of warnings printed when you enable the ambiguous match warning in tablegen. llvm-svn: 331117	2018-04-28 18:46:11 +00:00
Simon Pilgrim	8a937e00d8	[X86] Split WriteFBlend/WriteFVarBlend/WriteFVarShuffle into XMM and YMM/ZMM scheduler classes This removes all the WriteFBlend/WriteFVarBlend InstRW overrides - some WriteFVarShuffle remain to be fixed. llvm-svn: 331065	2018-04-27 18:19:48 +00:00
Simon Pilgrim	b2aa89c909	[X86][AVX] Split WriteFLogic into XMM and YMM/ZMM scheduler classes This removes all the AND/ANDN/OR/XOR PS/PD InstRW overrides. llvm-svn: 331051	2018-04-27 15:50:33 +00:00
Simon Pilgrim	dbd1ae7ddd	[X86] Split WriteFMA into XMM, Scalar and YMM/ZMM scheduler classes This removes all the FMA InstRW overrides. If we ever get PR36924, then we can remove many of these declarations from models. llvm-svn: 330820	2018-04-25 13:07:58 +00:00
Simon Pilgrim	cf0199a289	[AVX512] VPERMQ/VPERMPD/VPERMIL single op shuffles are not variable shuffles These variants all take an immediate shuffle mask value and should be scheduled as such. llvm-svn: 330747	2018-04-24 17:59:54 +00:00
Simon Pilgrim	f0945aa0e0	[X86][F16C] Add WriteCvtF2FSt scheduling class Fixes the classification of VCVTPS2PHmr/VCVTPS2PHYmr which were tagged as WriteCvtF2FLd_WriteRMW (PR36887) llvm-svn: 330737	2018-04-24 16:43:07 +00:00
Simon Pilgrim	f7d2a93d5f	[X86] Add vector element insertion/extraction scheduler classes Split off pinsr/pextr and extractps instructions. (Mostly) fixes PR36887. Note: It might be worth adding a WriteFInsertLd class as well in the future. Differential Revision: https://reviews.llvm.org/D45929 llvm-svn: 330714	2018-04-24 13:21:41 +00:00
Simon Pilgrim	d14d2e7b18	[X86] Add WriteFSign/WriteFLogic scheduler classes Split the fp and integer vector logical instruction scheduler classes - older CPUs especially often handled these on different pipes. This unearthed a couple of things that are also handled in this patch: (1) We were tagging avx512 fp logic ops as WriteFAdd, probably because of the lack of WriteFLogic (2) SandyBridge had integer logic ops only using Port5, when afaict they can use Ports015. (3) Cleaned up x86 FCHS/FABS scheduling as they are typically treated as fp logic ops. Differential Revision: https://reviews.llvm.org/D45629 llvm-svn: 330480	2018-04-20 21:16:05 +00:00
Craig Topper	e56a2fc5e7	[X86] Add separate scheduling class for PSADBW instruction. llvm-svn: 330204	2018-04-17 19:35:19 +00:00
Simon Pilgrim	86e3c26924	[X86] Add FP comparison scheduler classes Split VCMP/VMAX/VMIN instructions off to WriteFCmp and VCOMIS instructions off to WriteFCom instead of assuming they match WriteFAdd Differential Revision: https://reviews.llvm.org/D45656 llvm-svn: 330179	2018-04-17 07:22:44 +00:00
Simon Pilgrim	fe3d59e98b	[X86][AVX512] UNPCKL/H PS and PD should be scheduled with WriteFShuffle not WriteFAdd llvm-svn: 330023	2018-04-13 14:41:05 +00:00
Simon Pilgrim	21e89795cc	[X86] Remove remaining OpndItins/SizeItins from all instruction defs (PR37093) llvm-svn: 330022	2018-04-13 14:36:59 +00:00
Simon Pilgrim	ae0c2711b6	[X86] Remove OpndItins/SizeItins from all sse instruction defs (PR37093) llvm-svn: 330013	2018-04-13 12:50:31 +00:00
Simon Pilgrim	1f070c334c	[X86] Remove unused MoveLoadStoreItins/ShiftOpndItins schedule class wrappers. Was being used to move around empty/unused itineraries... llvm-svn: 329970	2018-04-12 22:57:34 +00:00
Simon Pilgrim	6551d405dc	[X86] Remove x86 InstrItinClass entries (PR37093) This removes the last of the x86 schedule itineraries, I'm intending to cleanup the remaining uses of NoItinerary/OpndItins/etc. before resolving PR37093. llvm-svn: 329967	2018-04-12 22:44:47 +00:00
Simon Pilgrim	0e45634f4e	[X86] Remove InstrItinClass entries from all x86 instruction defs (PR37093) llvm-svn: 329953	2018-04-12 20:47:34 +00:00
Simon Pilgrim	e9376b9fdc	[X86] Remove InstrItinClass entries from SSE/AVX instructions defs (PR37093) llvm-svn: 329945	2018-04-12 19:59:35 +00:00
Simon Pilgrim	577ae24feb	[X86] Remove explicit SSE/AVX schedule itineraries from defs (PR37093) llvm-svn: 329940	2018-04-12 19:25:07 +00:00
Simon Pilgrim	8904a86f65	[X86] Remove AES/CLMUL/CRC32/LDDQU/MOVNT/POPCNT/SHA schedule itineraries (PR37093) llvm-svn: 329912	2018-04-12 14:31:42 +00:00
Simon Pilgrim	294556d40e	[X86] Remove remaining system/special schedule itineraries (PR37093) llvm-svn: 329906	2018-04-12 12:43:49 +00:00
Simon Pilgrim	89c8a10f7c	[X86] Add variable shuffle schedule classes Split variable index shuffles from immediate index shuffles WriteFVarShuffle - variable 'in-lane' shuffles (VPERMILPS/VPERMIL2PS etc.) WriteVarShuffle - variable 'in-lane' shuffles (PSHUFB/VPPERM etc.) WriteFVarShuffle256 - variable 'cross-lane' shuffles (VPERMPS etc.) WriteVarShuffle256 - variable 'cross-lane' shuffles (VPERMD etc.) Differential Revision: https://reviews.llvm.org/D45404 llvm-svn: 329806	2018-04-11 13:49:19 +00:00
Craig Topper	9eec2025c5	[X86] Synchronize the SchedRW on some EVEX instructions with their VEX equivalents. Mostly vector load, store, and move instructions. llvm-svn: 329330	2018-04-05 18:38:45 +00:00
Craig Topper	15303dda0d	[X86] Revert r329251-329254 It's failing on the bots and I'm not sure why. This reverts: [X86] Synchronize the SchedRW on some EVEX instructions with their VEX equivalents. [X86] Use WriteFShuffle256 for VEXTRACTF128 to be consistent with VEXTRACTI128 which uses WriteShuffle256. [X86] Remove some InstRWs for plain store instructions on Sandy Bridge. [X86] Auto-generate complete checks. NFC llvm-svn: 329256	2018-04-05 05:19:36 +00:00
Craig Topper	25c7110a37	[X86] Synchronize the SchedRW on some EVEX instructions with their VEX equivalents. Mostly vector load, store, and move instructions. llvm-svn: 329254	2018-04-05 04:42:03 +00:00
Craig Topper	a30db995b3	[X86] Use the same predicate for the load for PMOVSXBQ and PMOVZXBQ. These both use a 16-bit load, but one used loadi16_anyext and the other used extloadi32i16. The only difference between them is that loadi16_anyext checked that the load was at least 2 byte aligned and non-volatile. But the alignment doesn't matter here. Just use extloadi32i16 for both. llvm-svn: 329154	2018-04-04 07:00:24 +00:00
Craig Topper	dc74094398	[X86] Fix the SchedRW for AVX512 shift instructions. It was being inadvertently defaulted to an FADD scheduler class. llvm-svn: 328959	2018-04-02 03:15:02 +00:00
Craig Topper	5fb1dc2d22	[X86] Give the AVX512 VEXTRACT instructions the same SchedRWs as the SSE/AVX versions. llvm-svn: 328958	2018-04-02 02:44:55 +00:00
Craig Topper	13a0f83a05	[X86] Add SchedRW for PMULLD Summary: It seems many CPUs don't implement this instruction as well as the other vector multiplies. Often using a multi uop flow. Silvermont in particular has a 7 uop flow with 11 cycle throughput. Sandy Bridge implements it as a single uop with 5 cycle latency and 1 cycle throughput. But Haswell and later use 2 uops with 10 cycle latency and 2 cycle throughput. This patch adds a new X86SchedWritePair we can use to tag this instruction separately. I've provided correct information for Silvermont, Btver2, and Sandy Bridge. I've removed the InstRWs for SandyBridge. I've left Haswell/Broadwell/Skylake InstRWs in place because I wasn't sure how to account for the different load latency between 128 and 256 bits. I also left Znver1 InstRWs in place because the existing values don't match Agner's spreadsheet. I also left a FIXME in the SandyBridge model because it being used for the "generic" model is too optimistic for the 256/512-bit versions since those are multiple uops on all known CPUs. Reviewers: RKSimon, GGanesh, courbet Reviewed By: RKSimon Subscribers: gchatelet, gbedwell, andreadb, llvm-commits Differential Revision: https://reviews.llvm.org/D44972 llvm-svn: 328914	2018-03-31 04:54:32 +00:00
Craig Topper	cc060e921b	[X86] Rewrite LowerAVXCONCAT_VECTORS similar to how we handle vXi1 concats. This is better able to detect undef and zero pieces in the concat, or cases when only one subvector is non-zero. This allows us to avoid silly things like double inserts into progressively larger undefs. This still builds 512 bit concats of 128 bits by building up through 256 bits first. But I don't know if that's best. We probably want to merge this with the vXi1 concat code since they are very similar. llvm-svn: 327454	2018-03-13 22:05:25 +00:00
Craig Topper	a406796f5f	[X86] Change X86::PMULDQ/PMULUDQ opcodes to take vXi64 type as input instead of vXi32. This instruction can be thought of as reading either the even elements of a vXi32 input or the lower half of each element of a vXi64 input. We currently use the vXi32 interpretation, but vXi64 matches better with its broadcast behavior in EVEX. I'm looking at moving MULDQ/MULUDQ creation to a DAG combine so we can do it when AVX512DQ is enabled without having to go through Custom lowering. But in some of the test cases we failed to use a broadcast load due to the size difference. This should help with that. I'm also wondering if we can model these instructions in native IR and remove the intrinsics and I think using a vXi64 type will work better with that. llvm-svn: 326991	2018-03-08 08:02:52 +00:00
Craig Topper	f2aae62228	[X86] Add a DAG combine to turn stores of vXi1 constants into scalar stores. llvm-svn: 326679	2018-03-04 19:33:15 +00:00
Craig Topper	be31585be8	[X86] Lower v1i1/v2i1/v4i1/v8i1 load/stores to i8 load/store during op legalization if AVX512DQ is not supported. We were previously doing this with isel patterns. Moving it to op legalization gives us a chance to see the required bitcast earlier. And it lets us remove some isel patterns. llvm-svn: 326669	2018-03-04 01:48:00 +00:00
Craig Topper	e31b9d1e5f	[X86] Lower extract_element from k-registers by bitcasting from v16i1 to i16 and extending/truncating. This is equivalent to what isel was doing anyway but by canonicalizing earlier we can remove some patterns. llvm-svn: 326375	2018-02-28 22:23:55 +00:00
Craig Topper	ac799b05d4	[X86] Change the masked FPCLASS implementation to use AND instead of OR to combine the mask results. While the description for the instruction does mention OR, it's talking about how the individual classification test results are ORed together. The incoming mask is used as a zeroing write mask. If the bit is 1 the classification is written to the output. If the bit is 0 the output is 0. This is equivalent to an AND. Here is pseudocode from the intrinsics guide: FOR j := 0 to 1 i := j*64 IF k1[j] k[j] := CheckFPClass_FP64(a[i+63:i], imm8[7:0]) ELSE k[j] := 0 FI ENDFOR k[MAX:2] := 0 llvm-svn: 326306	2018-02-28 06:19:55 +00:00
Craig Topper	6694df14e6	[X86] Use SDNode instead of SDPatternOperator. NFC llvm-svn: 326048	2018-02-25 06:21:04 +00:00
Craig Topper	7bcac492d4	[X86] Remove checks for '(scalar_to_vector (i8 (trunc GR32:)))' from scalar masked move patterns. This portion can be matched by other patterns. We don't need it to make the larger pattern valid. It's sufficient to have a v1i1 mask input without caring where it came from. llvm-svn: 325999	2018-02-24 00:15:05 +00:00
Craig Topper	16b20245ba	[X86] Add assembler/disassembler support for blendm with zero masking and broadcast. Fixes PR31617 llvm-svn: 325957	2018-02-23 20:48:44 +00:00
Craig Topper	61d6ddbf0a	[X86] Add DAG combine to remove (and X, 1) from in front of a v1i1 scalar to vector. These can be created by type legalization promoting the inputs to select to match scalar boolean contents. We were trying to pattern match them away during isel, but it's better to just remove them from the DAG. I've cleaned up some patterns to not check for this 'and' anymore. But I suspect this has also opened up opportunities for pattern removal. llvm-svn: 325949	2018-02-23 20:13:42 +00:00
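
The reasoning in ac799b05d4 (masked FPCLASS) can be sketched in Python. This is an illustrative model only, not LLVM code: `check_fp_class`, `masked_fpclass`, and the category bits are hypothetical names, and the bit values are not claimed to match the real imm8 encoding. It follows the pseudocode quoted in that commit message and checks that a zeroing write mask gives the same result as ANDing the mask with the unmasked classification.

```python
import math

# Illustrative category bits (assumption); not the real imm8 encoding.
IS_NAN, IS_ZERO, IS_INF, IS_NEGATIVE = 0x1, 0x2, 0x4, 0x8


def check_fp_class(value, imm8):
    """Hypothetical stand-in for CheckFPClass_FP64.

    The individual classification tests selected by imm8 are ORed together.
    """
    tests = [
        (IS_NAN, math.isnan(value)),
        (IS_ZERO, value == 0.0),
        (IS_INF, math.isinf(value)),
        (IS_NEGATIVE, not math.isnan(value) and math.copysign(1.0, value) < 0),
    ]
    return int(any(hit for bit, hit in tests if imm8 & bit))


def masked_fpclass(k1, lanes, imm8):
    """Zeroing write mask: result bit j is the classification if k1[j] is set, else 0."""
    result = 0
    for j, value in enumerate(lanes):
        if (k1 >> j) & 1:
            result |= check_fp_class(value, imm8) << j
    return result


def unmasked_fpclass(lanes, imm8):
    """Classification of every lane, i.e. the masked form with an all-ones mask."""
    return masked_fpclass((1 << len(lanes)) - 1, lanes, imm8)


if __name__ == "__main__":
    lanes = [float("inf"), 1.0]
    imm8 = IS_INF | IS_ZERO
    k1 = 0b10  # lane 0 is masked off
    # The masked result equals mask AND unmasked result, not mask OR unmasked result.
    print(masked_fpclass(k1, lanes, imm8) == (k1 & unmasked_fpclass(lanes, imm8)))
```

With lane 0 masked off, its positive classification is dropped, which an OR of the mask and the classification would not model.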

918 Commits