llvm-project

Commit Graph

Author	SHA1	Message	Date
Nikita Popov	f6058ff140	[X86] Use SADDSAT/SSUBSAT instead of ADDS/SUBS Migrate the X86 backend from X86ISD opcodes ADDS and SUBS to generic ISD opcodes SADDSAT and SSUBSAT. This also improves codegen for @llvm.sadd.sat() and @llvm.ssub.sat() intrinsics. This is a followup to D55787 and part of PR40056. Differential Revision: https://reviews.llvm.org/D55833 llvm-svn: 349520	2018-12-18 18:28:22 +00:00
Nikita Popov	665ab08178	[X86] Use UADDSAT/USUBSAT instead of ADDUS/SUBUS Replace the X86ISD opcodes ADDUS and SUBUS with generic ISD opcodes UADDSAT and USUBSAT. As a side-effect, this also makes codegen for the @llvm.uadd.sat and @llvm.usub.sat intrinsics reasonable. This only replaces use in the X86 backend, and does not move any of the ADDUS/SUBUS X86 specific combines into generic codegen. Differential Revision: https://reviews.llvm.org/D55787 llvm-svn: 349481	2018-12-18 13:23:03 +00:00
Cameron McInally	872ed41a1e	[AVX512] Update typo in comment Should be "Sae" for "Suppress All Exceptions". NFC llvm-svn: 348763	2018-12-10 15:21:35 +00:00
Craig Topper	aca8390216	[SelectionDAG][X86] Relax restriction on the width of an input to _EXTEND_VECTOR_INREG. Use them and regular _EXTEND to replace the X86 specific VSEXT/VZEXT opcodes Previously, the extend_vector_inreg opcode required their input register to be the same total width as their output. But this doesn't match up with how the X86 instructions are defined. For X86 the input just needs to be a legal type with at least enough elements to cover the output. This patch weakens the check on these nodes and allows them to be used as long as they have more input elements than output elements. I haven't changed type legalization behavior so it will still create them with matching input and output sizes. X86 will custom legalize these nodes by shrinking the input to be a 128 bit vector and once we've done that we treat them as legal operations. We still have one case during type legalization where we must custom handle v64i8 on avx512f targets without avx512bw where v64i8 isn't a legal type. In this case we will custom type legalize to a *extend_vector_inreg with a v16i8 input. After that the input is a legal type so type legalization should ignore the node and doesn't need to know about the relaxed restriction. We are no longer allowed to use the default expansion for these nodes during vector op legalization since the default expansion uses a shuffle which required the widths to match. Custom legalization for all types will prevent us from reaching the default expansion code. I believe DAG combine works correctly with the released restriction because it doesn't check the number of input elements. The rest of the patch is changing X86 to use either the vector_inreg nodes or the regular zero_extend/sign_extend nodes. I had to add additional isel patterns to handle any_extend during isel since simplifydemandedbits can create them at any time so we can't legalize to zero_extend before isel. We don't yet create any_extend_vector_inreg in simplifydemandedbits. Differential Revision: https://reviews.llvm.org/D54346 llvm-svn: 346784	2018-11-13 19:45:21 +00:00
Craig Topper	17d64c71c5	[X86] Move the promotion of v16i16->v16i8 for avx512f but not avx512bw from lowering to isel. Change to use vpmovzx instead of vpmovsx. With avx512f but not avx512bw we need to extend to v16i32 then truncate that to v16i8. Previously we emitted both nodes during lowering, but I'm trying to switch to using target independent nodes and with that switched the extend+truncate wou This patch changes the implementation to what will be necessary with that patch which helps minimize test diffs. llvm-svn: 346552	2018-11-09 20:09:53 +00:00
Craig Topper	def82a81af	[X86] Don't turn any_extend from a mask register into a sign_extend during lowering. Add patterns to match any_extend during isel instead. SimplifyDemandedBits can turn a sign_extend back into an any_extend and trigger an infinite loop. So instead legalize it the same way as a sign_extend, but preserve the opcode. Then just pattern match it the same as sign_extend during isel. I don't have a reduced test case for such an infinite loop yet. llvm-svn: 346170	2018-11-05 22:08:17 +00:00
Craig Topper	8315d9990c	[X86] Stop promoting vector and/or/xor/andn to vXi64. These promotions add additional bitcasts to the SelectionDAG that can pessimize computeKnownBits/computeNumSignBits. It also seems to interfere with broadcast formation. This patch removes the promotion and adds isel patterns instead. The increased table size is more than I would like, but hopefully we can find some canonicalizations or other tricks to start pruning out patterns going forward. Differential Revision: https://reviews.llvm.org/D53268 llvm-svn: 345408	2018-10-26 17:21:26 +00:00
Craig Topper	da54bbf52a	[X86] Correct a bad isel predicate. Though I don't think it can be exposed. The B/W VPTEST instructions are only available with AVX512BW. But lowering should prevent any byte or word elements from getting to isel so this can't be exposed. llvm-svn: 345112	2018-10-24 06:13:36 +00:00
Craig Topper	c8e183f9ee	Recommit r344877 "[X86] Stop promoting integer loads to vXi64" I've included a fix to DAGCombiner::ForwardStoreValueToDirectLoad that I believe will prevent the previous miscompile. Original commit message: Theoretically this was done to simplify the amount of isel patterns that were needed. But it also meant a substantial number of our isel patterns have to match an explicit bitcast. By making the vXi32/vXi16/vXi8 types legal for loads, DAG combiner should be able to change the load type to rem I had to add some additional plain load instruction patterns and a few other special cases, but overall the isel table has reduced in size by ~12000 bytes. So it looks like this promotion was hurting us more than helping. I still have one crash in vector-trunc.ll that I'm hoping @RKSimon can help with. It seems to relate to using getTargetConstantFromNode on a load that was shrunk due to an extract_subvector combine after the constant pool entry was created. So we end up decoding more mask elements than the lo I'm hoping this patch will simplify the number of patterns needed to remove the and/or/xor promotion. Reviewers: RKSimon, spatel Reviewed By: RKSimon Subscribers: llvm-commits, RKSimon Differential Revision: https://reviews.llvm.org/D53306 llvm-svn: 344965	2018-10-22 22:14:05 +00:00
Craig Topper	8d8dcfe690	Revert r344877 "[X86] Stop promoting integer loads to vXi64" Sam McCall reported miscompiles in some tensorflow code. Reverting while I try to figure out. llvm-svn: 344921	2018-10-22 16:59:24 +00:00
Craig Topper	290c081d91	[X86] Add patterns for vector and/or/xor/andn with other types than vXi64. This makes fast isel treat all legal vector types the same way. Previously only vXi64 was in the fast-isel tables. This unfortunately prevents matching of andn by fast-isel for these types since that requires SelectionDAG. But we already had this issue for vXi64. So at least we're consistent now. Interestingly it looks like fast-isel can't handle instructions with constant vector arguments so the not part of the andn patterns is selected with SelectionDAG. This explains why VPTERNLOG shows up in some of the tests. This is a subset of D53268. As I make progress on that, I will try to reduce the number of lines in the tablegen files. llvm-svn: 344884	2018-10-22 06:30:22 +00:00
Craig Topper	321df5b0d4	[X86] Stop promoting integer loads to vXi64 Summary: Theoretically this was done to simplify the amount of isel patterns that were needed. But it also meant a substantial number of our isel patterns have to match an explicit bitcast. By making the vXi32/vXi16/vXi8 types legal for loads, DAG combiner should be able to change the load type to remove the bitcast. I had to add some additional plain load instruction patterns and a few other special cases, but overall the isel table has reduced in size by ~12000 bytes. So it looks like this promotion was hurting us more than helping. I still have one crash in vector-trunc.ll that I'm hoping @RKSimon can help with. It seems to relate to using getTargetConstantFromNode on a load that was shrunk due to an extract_subvector combine after the constant pool entry was created. So we end up decoding more mask elements than the load size. I'm hoping this patch will simplify the number of patterns needed to remove the and/or/xor promotion. Reviewers: RKSimon, spatel Reviewed By: RKSimon Subscribers: llvm-commits, RKSimon Differential Revision: https://reviews.llvm.org/D53306 llvm-svn: 344877	2018-10-21 21:30:26 +00:00
Craig Topper	e70c560b6d	[X86] Remove some isel patterns that shouldn't be possible. These included a bitcast of a load from v4f32 to v2f64, but DAG combine should have already changed the type of the load to remove the cast. llvm-svn: 344573	2018-10-15 23:34:58 +00:00
Craig Topper	2909a3d9d0	[X86] Fix a bad bitcast in the load form of vXi16 uniform shift patterns for EVEX encoded instructions. llvm-svn: 344563	2018-10-15 21:51:32 +00:00
Simon Pilgrim	f09fc3bc12	[X86] Move ReadAfterLd functionality into X86FoldableSchedWrite (PR36957) Currently we hardcode instructions with ReadAfterLd if the register operands don't need to be available until the folded load has completed. This doesn't take into account the different load latencies of different memory operands (PR36957). This patch adds a ReadAfterFold def into X86FoldableSchedWrite to replace ReadAfterLd, allowing us to specify the load latency at a scheduler class level. I've added ReadAfterVec*Ld classes that match the XMM/Scl, XMM and YMM/ZMM WriteVecLoad classes that we currently use, we can tweak these values in future patches once this infrastructure is in place. Differential Revision: https://reviews.llvm.org/D52886 llvm-svn: 343868	2018-10-05 17:57:29 +00:00
Craig Topper	c296436a30	[X86] Add isel pattern for (v8i16 (sext (v8i1))) with DQI and no BWI. Our lowering that tries to avoid this sign extend can be defeated by the DAG combine folding it with a truncate. The pattern needs to extend to a v8i32 then truncate back down to v8i16. llvm-svn: 342830	2018-09-23 06:49:48 +00:00
Craig Topper	a11a3b3818	[SelectionDAG][X86] Reorder the operands of the MaskedStoreSDNode to put the value first. Summary: Previously the value being stored is the last operand in SDNode. This causes the type legalizer to visit the mask operand before the value operand. The type legalizer was more complicated because of this since we want the type of the value to drive the decisions. This patch moves the value to be the first operand so we visit it first during type legalization. It also simplifies the type legalization code accordingly. X86 is currently the only in tree target that uses this SDNode. Not sure if there are any users out of tree. Reviewers: RKSimon, delena, hfinkel, eli.friedman Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D50402 llvm-svn: 340689	2018-08-25 17:48:17 +00:00
Craig Topper	633fe98e27	[X86] Change legacy SSE scalar fp to integer intrinsics to use specific ISD opcodes instead of keeping as intrinsics. Unify SSE and AVX512 isel patterns. AVX512 added new versions of these intrinsics that take a rounding mode. If the rounding mode is 4 the new intrinsics are equivalent to the old intrinsics. The AVX512 intrinsics were being lowered to ISD opcodes, but the legacy SSE intrinsics were left as intrinsics. This resulted in the AVX512 instructions needing separate patterns for the ISD opcodes and the legacy SSE intrinsics. Now we convert SSE intrinsics and AVX512 intrinsics with rounding mode 4 to the same ISD opcode so we can share the isel patterns. llvm-svn: 339749	2018-08-15 01:23:00 +00:00
Craig Topper	28ac623f6f	[X86] Remove isel patterns for MOVSS/MOVSD ISD opcodes with integer types. Ideally our ISD node types going into the isel table would have types consistent with their instruction domain. This prevents us having to duplicate patterns with different types for the same instruction. Unfortunately, it seems our shuffle combining is currently relying on this a little to remove some bitcasts. This seems to enable some switching between shufps and shufd. Hopefully there's some way we can address this in the combining. Differential Revision: https://reviews.llvm.org/D49280 llvm-svn: 337590	2018-07-20 17:57:53 +00:00
Craig Topper	92ea7a7b48	[X86] Enable commuting of VUNPCKHPD to VMOVLHPS to enable load folding by using VMOVLPS with a modified address. This required an annoying amount of tablegen multiclass changes to make only VUNPCKHPDZ128rr commutable. llvm-svn: 337357	2018-07-18 07:31:32 +00:00
Craig Topper	95063a45b8	[X86] Remove patterns that mix X86ISD::MOVLHPS/MOVHLPS with v2i64/v2f64 types. The X86ISD::MOVLHPS/MOVHLPS should now only be emitted in SSE1 only. This means that the v2i64/v2f64 types would be illegal thus we don't need these patterns. llvm-svn: 337349	2018-07-18 05:10:53 +00:00
Craig Topper	9ef92865ec	[X86] Add patterns for folding full vector load into MOVHPS and MOVLPS with SSE1 only. llvm-svn: 337320	2018-07-17 20:16:18 +00:00
Craig Topper	9187bca71b	[X86] Remove some standalone patterns in favor of the patterns in the MOVLPD instruction definitions. Previously we passed 'null_frag' into the instruction definition. The multiclass is shared with MOVHPD which doesn't use null_frag. It turns out by passing X86Movsd it produces patterns equivalent to some standalone patterns. llvm-svn: 337299	2018-07-17 16:24:33 +00:00
Craig Topper	c376a1916b	[X86] Add full set of patterns for turning ceil/floor/trunc/rint/nearbyint into rndscale with loads, broadcast, and masking. This amounts to pretty ridiculous number of patterns. Ideally we'd canonicalize the X86ISD::VRNDSCALE earlier to reuse those patterns. I briefly looked into doing that, but some strict FP operations could still get converted to rint and nearbyint during isel. It's probably still worthwhile to look into. This patch is meant as a starting point to work from. llvm-svn: 337234	2018-07-17 05:48:48 +00:00
Craig Topper	07a1787501	[X86] Merge the FR128 and VR128 regclass since they have identical spill and alignment characteristics. This unfortunately requires a bunch of bitcasts to be added to SUBREG_TO_REG, COPY_TO_REGCLASS, and instructions in output patterns. Otherwise tablegen seems to default to picking f128 and then we fail when something tries to get the register class for f128 which isn't always valid. The test changes are because we were previously mixing fr128 and vr128 due to constrainRegClass finding FR128 first and passes like live range shrinking weren't handling that well. llvm-svn: 337147	2018-07-16 06:56:09 +00:00
Chandler Carruth	cdf0addc65	[x86/SLH] Teach speculative load hardening to correctly harden the indices used by AVX2 and AVX-512 gather instructions. The index vector is hardened by broadcasting the predicate state into a vector register and then or-ing. We don't even have to worry about EFLAGS here. I've added a test for all of the gather intrinsics to make sure that we don't miss one. A particularly interesting creation is the gather prefetch, which needs to be marked as potentially "loading" to get the correct behavior. It's a memory access in many ways, and is actually relevant for SLH. Based on discussion with Craig in review, I've moved it to be `mayLoad` and `mayStore` rather than generic side effects. This matches how we model other prefetch instructions. Many thanks to Craig for the review here. Differential Revision: https://reviews.llvm.org/D49336 llvm-svn: 337144	2018-07-16 04:17:51 +00:00
Craig Topper	ec0038398a	[X86] Use 128-bit blends instead of vmovss/vmovsd for 512-bit vzmovl patterns to match AVX. llvm-svn: 337135	2018-07-15 18:51:08 +00:00
Craig Topper	f0b164415c	[X86] Prefer blendi over movss/sd when avx512 is enabled unless optimizing for size. AVX512 doesn't have an immediate controlled blend instruction. But blend throughput is still better than movss/sd on SKX. This commit changes AVX512 to use the AVX blend instructions instead of MOVSS/MOVSD. This constrains the register allocation since it won't be able to use XMM16-31, but hopefully the increased throughput and reduced port 5 pressure makes up for that. llvm-svn: 337083	2018-07-14 02:05:08 +00:00
Craig Topper	2ab325ba23	[X86] Remove isel patterns that turn packed add/sub/mul/div+movss/sd into scalar intrinsic instructions. This is not an optimization we should be doing in isel. This is more suitable for a DAG combine. My main concern is a future time when we support more FPENV. Changing a packed op to a scalar op could cause us to miss some exceptions that should have occurred if we had done a packed op. A DAG combine would be better able to manage this. llvm-svn: 336971	2018-07-13 04:50:39 +00:00
Craig Topper	3a13477214	[X86] Add AVX512 equivalents of some isel patterns so we get EVEX instructions. These are the patterns for matching fceil, ffloor, and sqrt to intrinsic instructions if they have a MOVSS/SD. llvm-svn: 336954	2018-07-12 22:14:10 +00:00
Craig Topper	b0053b79d6	Revert r336950 and r336951 "[X86] Add AVX512 equivalents of some isel patterns so we get EVEX instructions." and "foo" One of them had a bad title and they should have been squashed. llvm-svn: 336953	2018-07-12 21:58:03 +00:00
Craig Topper	3b837b6b63	[X86] Add AVX512 equivalents of some isel patterns so we get EVEX instructions. These are the patterns for matching fceil, ffloor, and sqrt to intrinsic instructions if they have a MOVSS/SD. llvm-svn: 336951	2018-07-12 21:53:23 +00:00
Craig Topper	b01a355354	foo llvm-svn: 336950	2018-07-12 21:53:07 +00:00
Craig Topper	73347ec081	[X86] Remove patterns and ISD nodes for the old scalar FMA intrinsic lowering. We now use llvm.fma.f32/f64 or llvm.x86.fmadd.f32/f64 intrinsics that use scalar types rather than vector types. So we don't need these special ISD nodes that operate on the lowest element of a vector. llvm-svn: 336883	2018-07-12 03:42:41 +00:00
Craig Topper	be996bd2d9	[X86] Add patterns to use VMOVSS/SD zero masking for scalar f32/f64 select with zero. These showed up in some of the upgraded FMA code. We really need to improve these test cases more, but this helps for now. llvm-svn: 336875	2018-07-12 00:54:40 +00:00
Craig Topper	034adf2683	[X86] Remove and autoupgrade the scalar fma intrinsics with masking. This converts them to what clang is now using for codegen. Unfortunately, there seem to be a few kinks to work out still. I'll try to address with follow up patches. llvm-svn: 336871	2018-07-12 00:29:56 +00:00
Andrea Di Biagio	483db141e3	[X86] Fix MayLoad/HasSideEffect flag for (V)MOVLPSrm instructions. Before revision 336728, the "mayLoad" flag for instruction (V)MOVLPSrm was inferred directly from the "default" pattern associated with the instruction definition. r336728 removed special node X86Movlps, and all the patterns associated to it. Now instruction (V)MOVLPSrm doesn't have a pattern associated to it, and the 'mayLoad/hasSideEffects' flags are left unset. When the instruction info is emitted by tablegen, method CodeGenDAGPatterns::InferInstructionFlags() sees that (V)MOVLPSrm doesn't have a pattern, and flags are undefined. So, it conservatively sets the "hasSideEffects" flag for it. As a consequence, we were losing the 'mayLoad' flag, and we were gaining a 'hasSideEffect' flag in its place. This patch fixes the issue (originally reported by Michael Holmen). The mca tests show the differences in the instruction info flags. Instructions that were affected by this problem were: MOVLPSrm/VMOVLPSrm/VMOVLPSZ128rm. Differential Revision: https://reviews.llvm.org/D49182 llvm-svn: 336818	2018-07-11 15:27:50 +00:00
Craig Topper	1d6a80cd95	[X86] Remove some composite MOVSS/MOVSD isel patterns. These patterns looked for a MOVSS/SD followed by a scalar_to_vector. Or a scalar_to_vector followed by a load. In both cases we emitted a MOVSS/SD for the MOVSS/SD part, a REG_CLASS for the scalar_to_vector, and a MOVSS/SD for the load. But we have patterns that do each of those 3 things individually so there's no reason to build large patterns. Most of the test changes are just reorderings. The one test that had a meaningful change is pr30430.ll and it appears to be a regression. But it's doing -O0 so I think it missed a lot of opportunities and was just getting lucky before. llvm-svn: 336762	2018-07-11 04:51:40 +00:00
Craig Topper	27c77fe4ce	[X86] Remove AddedComplexity from all patterns that use X86vzmovl as their root. Some added 20 and some added 15. It's unclear when to use which value and whether they are required at all. This patch removes them all. If we start finding real world issues we may need to add them back with proper tests. llvm-svn: 336735	2018-07-10 22:23:54 +00:00
Craig Topper	dea0b88b04	[X86] Remove X86ISD::MOVLPS and X86ISD::MOVLPD. NFCI These ISD nodes try to select the MOVLPS and MOVLPD instructions which are special load only instructions. They load data and merge it into the lower 64-bits of an XMM register. They are logically equivalent to our MOVSD node plus a load. There was only one place in X86ISelLowering that used MOVLPD and no places that selected MOVLPS. The one place that selected MOVLPD had to choose between it and MOVSD based on whether there was a load. But lowering is too early to tell if the load can really be folded. So in isel we have patterns that use MOVSD for MOVLPD if we can't find a load. We also had patterns that select the MOVLPD instruction for a MOVSD if we can find a load, but didn't choose the MOVLPD ISD opcode for some reason. So it seems better to just standardize on MOVSD ISD opcode and manage MOVSD vs MOVLPD instruction with isel patterns. llvm-svn: 336728	2018-07-10 21:00:22 +00:00
Craig Topper	866a377e91	[X86] Correct vfixupimm load patterns to look for an integer load, not a floating point load bitcasted to integer. DAG combine wouldn't let a floating point load bitcasted to integer exist. It would just be an integer load. llvm-svn: 336626	2018-07-10 00:49:49 +00:00
Craig Topper	e4f46e4f31	[X86] Remove FloatVT from X86VectorVTInfo in X86InstrAVX512.td The only places it was used were places where VT was the same as FloatVT. So switch those uses to VT and drop it. llvm-svn: 336624	2018-07-10 00:49:45 +00:00
Craig Topper	e9cff7d47b	[X86] Remove some patterns that include a bitcast of a floating point load to an integer type. DAG combine should have converted the type of the load. llvm-svn: 336557	2018-07-09 16:03:02 +00:00
Craig Topper	16ee4b4957	[X86] Remove some patterns that seem to be unreachable. These patterns mapped (v2f64 (X86vzmovl (v2f64 (scalar_to_vector FR64:$src)))) to a MOVSD and a zeroing XOR. But the complexity of a pattern for (v2f64 (X86vzmovl (v2f64))) that selects MOVQ is artificially high and hides this MOVSD pattern. Weirder still, the SSE version of the pattern was explicitly blocked on SSE41, but yet we had copied it to AVX and AVX512. llvm-svn: 336556	2018-07-09 16:03:01 +00:00
Craig Topper	fdf3f1ff82	[X86] Add new scalar fma intrinsics with rounding mode that use f32/f64 types. This allows us to handle masking in a very similar way to the default rounding version that uses llvm.fma. I had to add new rounding mode CodeGenOnly instructions to support isel when we can't find a movss to grab the upper bits from to use the b_Int instruction. Fast-isel tests have been updated to match new clang codegen. We are currently having trouble folding fneg into the new intrinsic. I'm going to correct that in a follow up patch to keep the size of this one down. A future patch will also remove the old intrinsics. llvm-svn: 336506	2018-07-08 01:10:43 +00:00
Craig Topper	77edbffabd	[X86] Add more FMA3 memory folding patterns. Remove patterns that are no longer needed. We've removed the legacy FMA3 intrinsics and are now using llvm.fma and extractelement/insertelement. So we don't need patterns for the nodes that could only be created by the old intrinsics. Those ISD opcodes still exist because we haven't dropped the AVX512 intrinsics yet, but those should go to EVEX instructions. llvm-svn: 336457	2018-07-06 18:47:55 +00:00
Craig Topper	2db909cfae	[X86] Remove some isel patterns for X86ISD::SELECTS that specifically looked for the v1i1 mask to have come from a scalar_to_vector from GR8. We have patterns for SELECTS that top at v1i1 and we have a pattern for (v1i1 (scalar_to_vector GR8)). The patterns being removed here do the same thing as the two other patterns combined so there is no need for them. llvm-svn: 336305	2018-07-05 03:01:29 +00:00
Craig Topper	ecf7c5b75f	[X86] Reduce the number of patterns needed for masked scalar ceil/floor isel. The scalar to vector on the mask register should not be part of the patterns. llvm-svn: 335435	2018-06-25 00:05:09 +00:00
Craig Topper	19772c89c7	[X86] Rename VFPCLASSSS and VFPCLASSSD internal instruction names to include a Z to match other EVEX instructions. llvm-svn: 335428	2018-06-24 06:29:50 +00:00
Craig Topper	c2696d577b	[X86] Use setcc ISD opcode for AVX512 integer comparisons all the way to isel I don't believe there is any real reason to have separate X86 specific opcodes for vector compares. Setcc has the same behavior just uses a different encoding for the condition code. I had to change the CondCodeAction for SETLT and SETLE to prevent some transforms from changing SETGT lowering. Differential Revision: https://reviews.llvm.org/D43608 llvm-svn: 335173	2018-06-20 21:05:02 +00:00
Mikhail Dvoretckii	b1ce7765be	[X86] VRNDSCALE* folding from masked and scalar ffloor and fceil patterns This patch handles back-end folding of generic patterns created by lowering the X86 rounding intrinsics to native IR in cases where the instruction isn't a straightforward packed values rounding operation, but a masked operation or a scalar operation. Differential Revision: https://reviews.llvm.org/D45203 llvm-svn: 335037	2018-06-19 10:37:52 +00:00
Craig Topper	c2965214ef	[X86] Add the ability to force an EVEX2VEX mapping table entry from the .td files. Remove remaining manual table entries from the tablegen emitter. This adds an EVEX2VEXOverride string to the X86 instruction class in X86InstrFormats.td. If this field is set it will add a manual entry in the EVEX->VEX tables that doesn't check the encoding information. Then use this mechanism to map VMOVDU/A8/16, 128-bit VALIGN, and VPSHUFF/I instructions to VEX instructions. Finally, remove the manual table from the emitter. This has the bonus of fully sorting the autogenerated EVEX->VEX tables by their EVEX instruction enum value. We may be able to use this to do a binary search for the conversion and get rid of the need to create a DenseMap. llvm-svn: 335018	2018-06-19 04:24:44 +00:00
Craig Topper	0a5e90cc2a	[X86] Add a new VEX_WPrefix encoding to tag EVEX instructions that have VEX.W==1, but can be converted to their VEX equivalent that uses VEX.W==0. EVEX makes heavy use of the VEX.W bit to indicate 64-bit element vs 32-bit elements. Many of the VEX instructions were split into 2 versions with different masking granularity. The EVEX->VEX table generator can collapse the two versions if the VEX version is tagged as VEX_WIG. But if the VEX version is instead marked VEX.W==0 we can't combine them because we don't know if there is also a VEX version with VEX.W==1. This patch adds a new VEX_W1X tag that indicates the EVEX instruction encodes with VEX.W==1, but is safe to convert to a VEX instruction with VEX.W==0. This allows us to remove a bunch of manual EVEX->VEX table entries. We may want to look into splitting up the VEX_WPrefix field which would simplify the disassembler. llvm-svn: 335017	2018-06-19 04:24:42 +00:00
Craig Topper	a7b7f2f4d8	[X86] Remove ReadAfterLd from avx512_shift_rmbi multiclass. The instructions that use this class don't have another source register. So I think this was just marking one of the address operands as ReadAfterLd? llvm-svn: 334994	2018-06-18 23:20:57 +00:00
Craig Topper	17bd84c12c	[X86] Encode the EVEX2VEX exception list information in .td files instead of the emitter source. Rather than having an exclusion list in tablegen sources, add a flag to the X86 instruction records that can be used to suppress checking for convertibility. llvm-svn: 334971	2018-06-18 18:47:07 +00:00
Craig Topper	16fdde5e63	[X86] Add '.s' aliases to the assembler for the various redundant move encodings to match gas and our EVEX instructions. We already have these aliases for EVEX encoded instructions, but not for the GPR, MMX, SSE, and VEX versions. Also remove the vpextrw.s EVEX alias. That's not something gas implements. llvm-svn: 334922	2018-06-18 05:00:50 +00:00
Craig Topper	916d0cf649	[X86] Move the 'vmovq.s' and similar assembly strings for EVEX vector moves with reversed operands to InstAliases. The .s assembly strings allow the reversed forms to be targeted from assembly which matches gas behavior. But when printing the instructions we should print them without the .s to match other tooling like objdump. By using InstAliases we can use the normal string in the instruction and just hide it from the assembly parser. Ideally we'd add the .s versions to the legacy SSE and VEX versions as well for full compatibility with gas. Not sure how we got to a state where only EVEX was supported. llvm-svn: 334920	2018-06-18 01:28:05 +00:00
Craig Topper	29f22d7baa	[X86] More additions to the load folding tables based on the autogenerated tables. Including more additions for NotMemoryFoldable to remove some entries from the autogenerated table. llvm-svn: 334898	2018-06-16 23:25:50 +00:00
Craig Topper	74412c7d59	[X86] Fix an inconsistency between AVX512 and AVX/SSE version on a couple instructions. VMOVPQIto64Zmr is not a 64-bit mode only instruction. But I don't know how to test this because VMOVPQIto64mr should always have priority over it in 32-bit mode since its only advantage is XMM16-XMM31 which aren't usable in 32-bit mode. VMOVPQIto64Zrr is a 64-bit mode only instruction, but we don't need to explicitly mark it as such because it uses a GR64 register which won't parse in 32-bit mode. llvm-svn: 334896	2018-06-16 23:25:47 +00:00
Tomasz Krupa	bcaab53d47	[X86] Lowering sqrt intrinsics to native IR Summary: Complementary patch to lowering sqrt intrinsics in Clang. Reviewers: craig.topper, spatel, RKSimon, DavidKreitzer, uriel.k Reviewed By: craig.topper Subscribers: tkrupa, mike.dvoretsky, llvm-commits Differential Revision: https://reviews.llvm.org/D41599 llvm-svn: 334849	2018-06-15 18:05:24 +00:00
Craig Topper	f43807dd89	[X86] Add 'Z' to the internal names of various EVEX instructions for overall consistency. llvm-svn: 334785	2018-06-15 04:42:54 +00:00
Craig Topper	82fa048371	[X86] Remove '128' from the internal name of some scalar FP instructions to be consistent with other scalar instructions. llvm-svn: 334727	2018-06-14 15:40:30 +00:00
Craig Topper	9f829f76e8	[X86] Remove NotMemoryFoldable from some AVX/AVX512 scalar instructions. Some of these instructions are already in the manual folding table so we should have them in the auto table too. llvm-svn: 334725	2018-06-14 15:40:27 +00:00
Craig Topper	b2552e1e08	[x86] fix mappings of cvttp2si/cvttp2ui x86 intrinsics to x86-specific nodes and isel patterns (PR37751) Summary: The tests in: https://bugs.llvm.org/show_bug.cgi?id=37751 ...show miscompiles because we wrongly mapped and folded x86-specific intrinsics into generic DAG nodes. This patch corrects the mappings in X86IntrinsicsInfo.h and adds isel matching corresponding to the new patterns. The complete tests for the failure cases should be in avx-cvttp2si.ll and sse-cvttp2si.ll and avx512-cvttp2i.ll Reviewers: RKSimon, gbedwell, spatel Reviewed By: spatel Subscribers: mcrosier, llvm-commits Differential Revision: https://reviews.llvm.org/D47993 llvm-svn: 334685	2018-06-14 03:16:58 +00:00
Craig Topper	55488731be	[X86] Mark all instructions that have masked store semantics with NotMemoryFoldable. Remove dependency on SchedRW from memory table autogenerator. Previously we were whitelisting instructions based on their SchedRW value. With the masked store instructions explicitly removed via NotMemoryFoldable, we don't seem to need this check anymore. llvm-svn: 334563	2018-06-13 00:04:08 +00:00
Craig Topper	4f9cac667b	[X86] Remove VPCOMPRESSB/W from the autogenerated load folding table. llvm-svn: 334562	2018-06-13 00:04:04 +00:00
Craig Topper	3a34c3596d	[X86] Remove mayLoad flag from AVX512 truncating store instructions. llvm-svn: 334529	2018-06-12 19:59:08 +00:00
Craig Topper	88c230265b	[X86] Add NotMemoryFoldable to the VPCOMPRESS instructions. llvm-svn: 334481	2018-06-12 07:32:19 +00:00
Craig Topper	957b738432	[X86] Add isel patterns for folding loads when creating ROUND instructions from ffloor/fnearbyint/fceil/frint/ftrunc. We were missing packed isel folding patterns for all of sse41, avx, and avx512. For some reason avx512 had scalar load folding patterns under optsize(due to partial/undef reg update), but we didn't have the equivalent sse41 and avx patterns. Sometimes we would get load folding due to peephole pass anyway, but we're also missing avx512 instructions from the load folding table. I'll try to fix that in another patch. Some of this was spotted in the review for D47993. This patch adds all the folds to isel, adds a few spot tests, and disables the peephole pass on a few tests to ensure we're testing some of these patterns. llvm-svn: 334460	2018-06-12 00:48:57 +00:00
Simon Pilgrim	14ee66ef37	[X86][AVX512] Tag AVX5124FMAPS/AVX5124VNNIW with missing scheduler classes Necessary for D46276 as even though btver2 doesn't use these instructions, it's now flagged as complete so complains if ANY instruction isn't tagged. UnsupportedFeatures wouldn't help here as these instructions don't appear to have a feature predicate (like a lot of AVX512). llvm-svn: 334423	2018-06-11 17:28:00 +00:00
Clement Courbet	7db69cc08a	[X86] Fix skylake server scheduling info. Summary: This fixes most of the scheduling info for SKX vector operations. I had to split a lot of the YMM/ZMM classes into separate classes for YMM and ZMM. The before/after llvm-exegesis analysis are in the phabricator diff. Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D47721 llvm-svn: 334407	2018-06-11 14:37:53 +00:00
Craig Topper	d04cc8e640	[X86] Rename vy512mem->vy512xmem and vz256xmem->vz256mem. The index size is represented by the letter after the 'v'. The number represents the memory size. If an 'x' appears after the number it means the index register can be from VR128X/VR256X instead of VR128/VR256. As vy512mem uses a VR256X index it should have an x. And vz256mem uses a VR512 index so it shouldn't have an x. I admit these names kind of suck and are confusing. llvm-svn: 334120	2018-06-06 19:15:12 +00:00
Nicolai Haehnle	01d261f18d	TableGen: Streamline the semantics of NAME Summary: The new rules are straightforward. The main rules to keep in mind are: 1. NAME is an implicit template argument of class and multiclass, and will be substituted by the name of the instantiating def/defm. 2. The name of a def/defm in a multiclass must contain a reference to NAME. If such a reference is not present, it is automatically prepended. And for some additional subtleties, consider these: 3. defm with no name generates a unique name but has no special behavior otherwise. 4. def with no name generates an anonymous record, whose name is unique but undefined. In particular, the name won't contain a reference to NAME. Keeping rules 1&2 in mind should allow a predictable behavior of name resolution that is simple to follow. The old "rules" were rather surprising: sometimes (but not always), NAME would correspond to the name of the toplevel defm. They were also plain bonkers when you pushed them to their limits, as the old version of the TableGen test case shows. Having NAME correspond to the name of the toplevel defm introduces "spooky action at a distance" and breaks composability: refactoring the upper layers of a hierarchy of nested multiclass instantiations can cause unexpected breakage by changing the value of NAME at a lower level of the hierarchy. The new rules don't suffer from this problem. Some existing .td files have to be adjusted because they ended up depending on the details of the old implementation. Change-Id: I694095231565b30f563e6fd0417b41ee01a12589 Reviewers: tra, simon_tatham, craig.topper, MartinO, arsenm, javed.absar Subscribers: wdng, llvm-commits Differential Revision: https://reviews.llvm.org/D47430 llvm-svn: 333900	2018-06-04 14:26:05 +00:00
Craig Topper	93d8fbd8f2	[X86] Add tied source operand to AVX5124FMAPS and AVX5124VNNIW instructions. This doesn't affect the assembly or disassembly, but is more accurate. llvm-svn: 333822	2018-06-02 16:30:39 +00:00
Craig Topper	1534929623	[X86] Add encoding information for the AVX5124FMAPS and AVX5124VNNIW instructions so they can be assembled and disassembled. These instructions are unusual in that they operate on 4 consecutive registers so supporting them in codegen will be more difficult than normal. Includes an assembler check to warn if the source register is not the first register of a 4 register group. llvm-svn: 333812	2018-06-02 02:15:10 +00:00
Craig Topper	aa747412b1	[X86] Add isel patterns to use vexpand with zero masking when the passthru value is a zero vector. llvm-svn: 333800	2018-06-01 22:28:28 +00:00
Craig Topper	5989db0fb4	[X86] Remove some of the extractelts from the new MOVSS+FMA patterns. We only need the extractelt that corresponds to the register we're trying to insert back into. We can't guarantee the others haven't been optimized out depending on how those operands were produced. So instead just look for an FR32/FR64 input and emit a COPY_TO_REGCLASS to VR128 in the output pattern. This matches what we do for ADD/SUB/MUL/DIV. llvm-svn: 333473	2018-05-29 22:52:09 +00:00
Craig Topper	dbd371e931	[X86] Use VR128X instead of VR128 in EVEX instruction patterns. llvm-svn: 333464	2018-05-29 20:46:27 +00:00
Craig Topper	aba57bfebd	[X86] Rename the operands in the recently introduced MOVSS+FMA patterns so that the operand names in the output pattern are always in 1, 2, 3 order since those are the operand names in the instruction. The order should be controlled in the input pattern. llvm-svn: 333463	2018-05-29 20:46:26 +00:00
Alexander Ivchenko	96062eaa8e	[X86] Scalar mask and scalar move optimizations 1. Introduction of mask scalar TableGen patterns. 2. Introduction of new scalar move TableGen patterns and refactoring of existing ones. 3. Folding of pattern created by introducing scalar masking in Clang header files. Patch by tkrupa Differential Revision: https://reviews.llvm.org/D47012 llvm-svn: 333419	2018-05-29 14:27:11 +00:00
Craig Topper	dcfcfdb0d1	[X86] Converge X86ISD::VPERMV3 and X86ISD::VPERMIV3 to a single opcode. These do the same thing with the first and second sources swapped. They previously came from separate intrinsics that specified different masking behavior. But we can cover that with isel patterns and a single node. This is a step towards reducing the number of intrinsics needed. A bunch of tests change because we are now biased to choosing VPERMT over VPERMI when there is nothing to signal that commuting is beneficial. llvm-svn: 333383	2018-05-28 19:33:11 +00:00
Craig Topper	26bc84860a	[X86] Stop forcing X86VPermi2X node index operand to match destination type to make masking pattern matching easier. Add extra patterns with bitcasts instead. This basically reverts r280696 in favor of using extra patterns as mentioned as an alternative in that commit message. For now I've only added the cases we have test cases for, but it should be easy to add more in the future. This will help to convert VPERMI2PS/VPERMT2PS intrinsics to use a single ISD node opcode. And hopefully allow some intrinsics to be removed. llvm-svn: 333365	2018-05-28 05:37:25 +00:00
Petar Jovanovic	c051000b83	[X86][MIPS][ARM] New machine instruction property 'isMoveReg' This property is needed in order to follow values movement between registers. This property is used in TII to implement method that returns true if simple copy like instruction is recognized, along with source and destination machine operands. Patch by Nikola Prica. Differential Revision: https://reviews.llvm.org/D45204 llvm-svn: 333093	2018-05-23 15:28:28 +00:00
Simon Pilgrim	1273f4ad93	[X86] Add GPR<->XMM Schedule Tags BtVer2 - fix NumMicroOp and account for the Lat+6cy GPR->XMM and Lat+1cy XMM->GPR delays (see rL332737) The high number of MOVD/MOVQ equivalent instructions meant that there were a number of missed patterns in SNB/Znver1: SNB - add missing GPR<->MMX costs (taken from Agner / Intel AOM) Znver1 - add missing GPR<->XMM MOVQ costs (taken from Agner) llvm-svn: 332745	2018-05-18 17:58:36 +00:00
Simon Pilgrim	c4b8d367a8	[X86][SSE] Ensure vector partial load/stores use the WriteVecLoad/WriteVecStore scheduler classes Retag some instructions that were missed when we split off vector load/store/moves - MOVQ/MOVD etc. Fixes BtVer2/SLM which have different behaviours for GPR stores. llvm-svn: 332718	2018-05-18 14:08:01 +00:00
Simon Pilgrim	d749b321b2	[X86][SSE] Ensure float load/stores use the WriteFLoad/WriteFStore scheduler classes Retag some instructions that were missed when we split off vector load/store/moves - MOVSS/MOVSD/MOVHPD/MOVHPS/MOVLPD/MOVLPS etc. Fixes BtVer2/SLM which have different behaviours for GPR stores. llvm-svn: 332714	2018-05-18 13:13:59 +00:00
Craig Topper	a2c5264718	[X86] Add OptForSize to a couple load folding patterns. Remove some bad FIXME comments. The FIXME comments were about preventing load folding to avoid a partial xmm update. But these instructions use GPR as input when the load isn't folded. This won't help prevent a partial xmm update. llvm-svn: 332573	2018-05-17 05:41:11 +00:00
Simon Pilgrim	5647e89f5a	[X86] Split WriteCvtI2F/WriteCvtF2I into I<->F32 and I<->F64 scheduler classes A lot of the models still have too many InstRW overrides for these new classes - this needs cleaning up but I wanted to get the classes in first llvm-svn: 332451	2018-05-16 10:53:45 +00:00
Simon Pilgrim	be9a206883	[X86] Split WriteCvtF2F into F32->F64 and F64->F32 scheduler classes BtVer2 - Fixes schedules for (V)CVTPS2PD instructions A lot of the Intel models still have too many InstRW overrides for these new classes - this needs cleaning up but I wanted to get the classes in first llvm-svn: 332376	2018-05-15 17:36:49 +00:00
Simon Pilgrim	891ebcdbaa	[X86] Split off F16C WriteCvtPH2PS/WriteCvtPS2PH scheduler classes Btver2 - VCVTPH2PSYrm needs to double pump the AGU Broadwell - missing VCVTPS2PH*mr stores extra latency Allows us to remove the WriteCvtF2FSt conversion store class llvm-svn: 332357	2018-05-15 14:12:32 +00:00
Simon Pilgrim	215ce4a1ca	[X86] Add NT load/store scheduler classes llvm-svn: 332274	2018-05-14 18:37:19 +00:00
Craig Topper	53ceb4805f	[X86] Remove and autoupgrade avx512.vbroadcast.ss/avx512.vbroadcast.sd intrinsics. llvm-svn: 332271	2018-05-14 18:21:22 +00:00
Craig Topper	0e71c6d5ca	[X86] Remove and autoupgrade the cvtusi2sd intrinsic. Use uitofp+insertelement instead. llvm-svn: 332206	2018-05-14 00:06:49 +00:00
Craig Topper	97e74b05ef	[X86] Add patterns for combining movss+uint_to_fp into the intrinsic instructions under AVX512. This matches what we do for sint_to_fp. llvm-svn: 332205	2018-05-13 23:24:21 +00:00
Craig Topper	38b713d4a7	[X86] Add some load folding patterns for cvtsi2ss/sd into intrinsic instructions. llvm-svn: 332189	2018-05-13 01:54:33 +00:00
Craig Topper	38ad7ddabc	[X86] Remove and autoupgrade cvtsi2ss/cvtsi2sd intrinsics to match what clang has used for a very long time. llvm-svn: 332186	2018-05-12 23:14:39 +00:00
Simon Pilgrim	ead11e4d4b	[X86] Added scheduler helper classes to split move/load/store by size Nothing uses this yet but this will allow us to specialize MMX/XMM/YMM/ZMM vector moves. llvm-svn: 332090	2018-05-11 12:46:54 +00:00
Craig Topper	9968af4a2a	[X86] Remove and autoupgrade the avx512.mask.store.ss intrinsic. llvm-svn: 332079	2018-05-11 04:33:18 +00:00
Craig Topper	1ee19ae126	[X86] Add new patterns for masked scalar load/store to match clang's codegen from r331958. Clang's codegen now uses 128-bit masked load/store intrinsics in IR. The backend will widen to 512-bits on AVX512F targets. So this patch adds patterns to detect codegen's widening and patterns for AVX512VL that don't get widened. We may be able to drop some of the old patterns, but I leave that for a future patch. llvm-svn: 332049	2018-05-10 21:49:16 +00:00
Simon Pilgrim	38ac0e9c6b	[X86] Split WriteVecALU/WriteVecLogic/WriteShuffle/WriteVarShuffle/WritePSADBW/WritePHAdd scheduler classes Split off XMM classes from the default (MMX) classes. llvm-svn: 331999	2018-05-10 17:06:09 +00:00
Simon Pilgrim	1233e1234a	[X86] Split WriteFAdd/WriteFCmp/WriteFMul schedule classes Split to support single/double for scalar, XMM and YMM/ZMM instructions - removing InstrRW overrides for these instructions. Fixes Atom ADDSUBPD instruction and reclassifies VFPCLASS as WriteFCmp which is closer in behaviour. llvm-svn: 331672	2018-05-07 20:52:53 +00:00
Simon Pilgrim	ac5d0a31ef	[X86] Split WriteFDiv schedule classes to support single/double scalar, XMM and YMM/ZMM instructions. This removes all InstrRW overrides for these instructions - some x87 overrides remain but most use default (and realistic) values. llvm-svn: 331643	2018-05-07 16:15:46 +00:00
Simon Pilgrim	f3ae50fca2	[X86] Split WriteFRcp/WriteFRsqrt/WriteFSqrt schedule classes WriteFRcp/WriteFRsqrt are split to support scalar, XMM and YMM/ZMM instructions. WriteFSqrt is split into single/double/long-double sizes and scalar, XMM, YMM and ZMM instructions. This removes all InstrRW overrides for these instructions. NOTE: There were a couple of typos in the Znver1 model - notably a 1cy throughput for SQRT that is highly unlikely and doesn't tally with Agner. NOTE: I had to add Agner's numbers for several targets for WriteFSqrt80. llvm-svn: 331629	2018-05-07 11:50:44 +00:00
Simon Pilgrim	67cc246dca	[X86] Cleanup SchedWriteFMA classes and use X86SchedWriteWidths directly. Rename scalar and XMM versions, this is to match/simplify an upcoming change to split MUL/DIV/SQRT scalar/xmm/ymm/zmm classes. llvm-svn: 331531	2018-05-04 15:20:18 +00:00
Simon Pilgrim	be51b20127	[X86] Add SchedWriteFRnd fp rounding scheduler classes Split off from SchedWriteFAdd for fp rounding/bit-manipulation instructions. Fixes an issue on btver2 which only had the ymm version using the JSTC pipe instead of JFPA. llvm-svn: 331515	2018-05-04 12:59:24 +00:00
Simon Pilgrim	0720c8d90e	[X86][AVX512] VPLZCNT instructions match SchedWriteVecIMul scheduling class not SchedWriteVecALU. llvm-svn: 331473	2018-05-03 18:22:49 +00:00
Simon Pilgrim	f2d2cedab4	[X86] Split WriteVecShift/WriteVarVecShift into MMX, XMM and YMM/ZMM scheduler classes This took a bit of extra work as on Intel targets the old (V)PSLLDrr/(V)PSLLDrm style instructions act differently - I ended up creating WriteVecShiftImm classes for XMM/YMM/ZMM vector shift by immediate and retaining WriteVecShift as the default (used only by MMX) plus WriteVecShiftX/WriteVecShiftY. X86SchedWriteWidths hides most of this thank goodness. llvm-svn: 331472	2018-05-03 17:56:43 +00:00
Simon Pilgrim	39196a1dd3	[X86][AVX512] VPAVG instructions should be tagged as SchedWriteVecALU llvm-svn: 331446	2018-05-03 10:53:17 +00:00
Simon Pilgrim	93c878c76b	[X86] Split WriteVecIMul/WriteVecPMULLD/WriteMPSAD/WritePSADBW into XMM and YMM/ZMM scheduler classes Also retagged VDBPSADBW instructions as SchedWritePSADBW instead of SchedWriteVecIMul which matches the behaviour on SkylakeServer (the only thing that supports it...) llvm-svn: 331445	2018-05-03 10:31:20 +00:00
Simon Pilgrim	a1f1a3bf94	[X86] Convert most remaining AVX512 uses of X86SchedWritePair scheduler classes to X86SchedWriteWidths. We've dealt with the majority already. llvm-svn: 331353	2018-05-02 13:32:56 +00:00
Simon Pilgrim	c708868cb1	[X86] Split WriteFRcp/WriteFRsqrt/WriteFSqrt into XMM and YMM/ZMM scheduler classes llvm-svn: 331290	2018-05-01 18:06:07 +00:00
Simon Pilgrim	c546f9424f	[X86] Split WriteFCmp into XMM and YMM/ZMM scheduler classes Removes more WriteFCmp InstRW overrides llvm-svn: 331283	2018-05-01 16:50:16 +00:00
Simon Pilgrim	1b7a80d80a	[X86] Convert all uses of WriteFAdd to X86SchedWriteWidths. In preparation of splitting WriteFAdd by vector width. llvm-svn: 331273	2018-05-01 15:57:17 +00:00
Simon Pilgrim	f6b81dae9e	[X86] Convert all uses of WriteFShuffle to X86SchedWriteWidths. In preparation of splitting WriteFShuffle by vector width. llvm-svn: 331262	2018-05-01 14:14:42 +00:00
Simon Pilgrim	6f710a6440	[X86] Convert all uses of WriteFLogic/WriteVecLogic to X86SchedWriteWidths. In preparation of splitting WriteVecLogic by vector width. llvm-svn: 331256	2018-05-01 12:15:29 +00:00
Simon Pilgrim	fc0c26f1a6	[X86] Tag PSLLDQ/PSRLDQ as WriteShuffle scheduler classes instead of shifts. Although they are encoded similar to bit shifts, the byte shifts behave like shuffles from a scheduling point of view. llvm-svn: 331253	2018-05-01 11:05:42 +00:00
Simon Pilgrim	3c35408e48	[X86] Introduce X86SchedWriteWidths schedule wrapper for different vector widths. We need to split most of the scheduler classes by vector width to remove more of the InstRW overrides, this patch should make this easier/tidier by allowing us to pass the X86SchedWriteWidths wrapper to multi-width multiclasses and then split as required. I've included fields for Scl (scalar float/double), MMX (MMX integer), XMM, YMM and ZMM widths. These fields mostly share the same classes but it should give us the flexibility that we may need in the future. This patch has replaced a set of example SSE/AVX512 instruction cases but isn't exhaustive as it gets very noisy before we really need the functionality. Differential Revision: https://reviews.llvm.org/D46266 llvm-svn: 331208	2018-04-30 18:18:38 +00:00
Craig Topper	06624e1a93	[X86] Restrict many of the InstAliases to only att or intel syntax. NFCI Many of these aliases exist to give one syntax or the other a slightly different mnemonic and the other variant gets a duplicate of its normal mnemonic. This patch restricts a lot of these to only one variant so we don't get the duplication. This removes a lot of duplicate entries from the matcher table. It also reduces the number of warnings printed when you enable the ambiguous match warning in tablegen. llvm-svn: 331117	2018-04-28 18:46:11 +00:00
Simon Pilgrim	8a937e00d8	[X86] Split WriteFBlend/WriteFVarBlend/WriteFVarShuffle into XMM and YMM/ZMM scheduler classes This removes all the WriteFBlend/WriteFVarBlend InstRW overrides - some WriteFVarShuffle remain to be fixed. llvm-svn: 331065	2018-04-27 18:19:48 +00:00
Simon Pilgrim	b2aa89c909	[X86][AVX] Split WriteFLogic into XMM and YMM/ZMM scheduler classes This removes all the AND/ANDN/OR/XOR PS/PD InstRW overrides. llvm-svn: 331051	2018-04-27 15:50:33 +00:00
Simon Pilgrim	dbd1ae7ddd	[X86] Split WriteFMA into XMM, Scalar and YMM/ZMM scheduler classes This removes all the FMA InstRW overrides. If we ever get PR36924, then we can remove many of these declarations from models. llvm-svn: 330820	2018-04-25 13:07:58 +00:00
Simon Pilgrim	cf0199a289	[AVX512] VPERMQ/VPERMPD/VPERMIL single op shuffles are not variable shuffles These variants all take an immediate shuffle mask value and should be scheduled as such. llvm-svn: 330747	2018-04-24 17:59:54 +00:00
Simon Pilgrim	f0945aa0e0	[X86][F16C] Add WriteCvtF2FSt scheduling class Fixes the classification of VCVTPS2PHmr/VCVTPS2PHYmr which were tagged as WriteCvtF2FLd_WriteRMW (PR36887) llvm-svn: 330737	2018-04-24 16:43:07 +00:00
Simon Pilgrim	f7d2a93d5f	[X86] Add vector element insertion/extraction scheduler classes Split off pinsr/pextr and extractps instructions. (Mostly) fixes PR36887. Note: It might be worth adding a WriteFInsertLd class as well in the future. Differential Revision: https://reviews.llvm.org/D45929 llvm-svn: 330714	2018-04-24 13:21:41 +00:00
Simon Pilgrim	d14d2e7b18	[X86] Add WriteFSign/WriteFLogic scheduler classes Split the fp and integer vector logical instruction scheduler classes - older CPUs especially often handled these on different pipes. This unearthed a couple of things that are also handled in this patch: (1) We were tagging avx512 fp logic ops as WriteFAdd, probably because of the lack of WriteFLogic (2) SandyBridge had integer logic ops only using Port5, when afaict they can use Ports015. (3) Cleaned up x86 FCHS/FABS scheduling as they are typically treated as fp logic ops. Differential Revision: https://reviews.llvm.org/D45629 llvm-svn: 330480	2018-04-20 21:16:05 +00:00
Craig Topper	e56a2fc5e7	[X86] Add separate scheduling class for PSADBW instruction. llvm-svn: 330204	2018-04-17 19:35:19 +00:00
Simon Pilgrim	86e3c26924	[X86] Add FP comparison scheduler classes Split VCMP/VMAX/VMIN instructions off to WriteFCmp and VCOMIS instructions off to WriteFCom instead of assuming they match WriteFAdd Differential Revision: https://reviews.llvm.org/D45656 llvm-svn: 330179	2018-04-17 07:22:44 +00:00
Simon Pilgrim	fe3d59e98b	[X86][AVX512] UNPCKL/H PS and PD should be scheduled with WriteFShuffle not WriteFAdd llvm-svn: 330023	2018-04-13 14:41:05 +00:00
Simon Pilgrim	21e89795cc	[X86] Remove remaining OpndItins/SizeItins from all instruction defs (PR37093) llvm-svn: 330022	2018-04-13 14:36:59 +00:00
Simon Pilgrim	ae0c2711b6	[X86] Remove OpndItins/SizeItins from all sse instruction defs (PR37093) llvm-svn: 330013	2018-04-13 12:50:31 +00:00
Simon Pilgrim	1f070c334c	[X86] Remove unused MoveLoadStoreItins/ShiftOpndItins schedule class wrappers. Was being used to move around empty/unused itineraries... llvm-svn: 329970	2018-04-12 22:57:34 +00:00
Simon Pilgrim	6551d405dc	[X86] Remove x86 InstrItinClass entries (PR37093) This removes the last of the x86 schedule itineraries, I'm intending to cleanup the remaining uses of NoItinerary/OpndItins/etc. before resolving PR37093. llvm-svn: 329967	2018-04-12 22:44:47 +00:00
Simon Pilgrim	0e45634f4e	[X86] Remove InstrItinClass entries from all x86 instruction defs (PR37093) llvm-svn: 329953	2018-04-12 20:47:34 +00:00
Simon Pilgrim	e9376b9fdc	[X86] Remove InstrItinClass entries from SSE/AVX instructions defs (PR37093) llvm-svn: 329945	2018-04-12 19:59:35 +00:00
Simon Pilgrim	577ae24feb	[X86] Remove explicit SSE/AVX schedule itineraries from defs (PR37093) llvm-svn: 329940	2018-04-12 19:25:07 +00:00
Simon Pilgrim	8904a86f65	[X86] Remove AES/CLMUL/CRC32/LDDQU/MOVNT/POPCNT/SHA schedule itineraries (PR37093) llvm-svn: 329912	2018-04-12 14:31:42 +00:00
Simon Pilgrim	294556d40e	[X86] Remove remaining system/special schedule itineraries (PR37093) llvm-svn: 329906	2018-04-12 12:43:49 +00:00
Simon Pilgrim	89c8a10f7c	[X86] Add variable shuffle schedule classes Split variable index shuffles from immediate index shuffles WriteFVarShuffle - variable 'in-lane' shuffles (VPERMILPS/VPERMIL2PS etc.) WriteVarShuffle - variable 'in-lane' shuffles (PSHUFB/VPPERM etc.) WriteFVarShuffle256 - variable 'cross-lane' shuffles (VPERMPS etc.) WriteVarShuffle256 - variable 'cross-lane' shuffles (VPERMD etc.) Differential Revision: https://reviews.llvm.org/D45404 llvm-svn: 329806	2018-04-11 13:49:19 +00:00
Craig Topper	9eec2025c5	[X86] Synchronize the SchedRW on some EVEX instructions with their VEX equivalents. Mostly vector load, store, and move instructions. llvm-svn: 329330	2018-04-05 18:38:45 +00:00
Craig Topper	15303dda0d	[X86] Revert r329251-329254 It's failing on the bots and I'm not sure why. This reverts: [X86] Synchronize the SchedRW on some EVEX instructions with their VEX equivalents. [X86] Use WriteFShuffle256 for VEXTRACTF128 to be consistent with VEXTRACTI128 which uses WriteShuffle256. [X86] Remove some InstRWs for plain store instructions on Sandy Bridge. [X86] Auto-generate complete checks. NFC llvm-svn: 329256	2018-04-05 05:19:36 +00:00
Craig Topper	25c7110a37	[X86] Synchronize the SchedRW on some EVEX instructions with their VEX equivalents. Mostly vector load, store, and move instructions. llvm-svn: 329254	2018-04-05 04:42:03 +00:00
Craig Topper	a30db995b3	[X86] Use the same predicate for the load for PMOVSXBQ and PMOVZXBQ. These both use a 16-bit load, but one used loadi16_anyext and the other used extloadi32i16. The only difference between them is that loadi16_anyext checked that the load was at least 2 byte aligned and non-volatile. But the alignment doesn't matter here. Just use extloadi32i16 for both. llvm-svn: 329154	2018-04-04 07:00:24 +00:00
Craig Topper	dc74094398	[X86] Fix the SchedRW for AVX512 shift instructions. It was being inadvertently defaulted to an FADD scheduler class. llvm-svn: 328959	2018-04-02 03:15:02 +00:00
Craig Topper	5fb1dc2d22	[X86] Give the AVX512 VEXTRACT instructions the same SchedRWs as the SSE/AVX versions. llvm-svn: 328958	2018-04-02 02:44:55 +00:00
Craig Topper	13a0f83a05	[X86] Add SchedRW for PMULLD Summary: It seems many CPUs don't implement this instruction as well as the other vector multiplies. Often using a multi uop flow. Silvermont in particular has a 7 uop flow with 11 cycle throughput. Sandy Bridge implements it as a single uop with 5 cycle latency and 1 cycle throughput. But Haswell and later use 2 uops with 10 cycle latency and 2 cycle throughput. This patch adds a new X86SchedWritePair we can use to tag this instruction separately. I've provided correct information for Silvermont, Btver2, and Sandy Bridge. I've removed the InstRWs for SandyBridge. I've left Haswell/Broadwell/Skylake InstRWs in place because I wasn't sure how to account for the different load latency between 128 and 256 bits. I also left Znver1 InstRWs in place because the existing values don't match Agner's spreadsheet. I also left a FIXME in the SandyBridge model because it being used for the "generic" model is too optimistic for the 256/512-bit versions since those are multiple uops on all known CPUs. Reviewers: RKSimon, GGanesh, courbet Reviewed By: RKSimon Subscribers: gchatelet, gbedwell, andreadb, llvm-commits Differential Revision: https://reviews.llvm.org/D44972 llvm-svn: 328914	2018-03-31 04:54:32 +00:00
Craig Topper	cc060e921b	[X86] Rewrite LowerAVXCONCAT_VECTORS similar to how we handle vXi1 concats. This is better able to detect undef and zero pieces in the concat, or cases when only one subvector is non-zero. This allows us to avoid silly things like double inserts into progressively larger undefs. This still builds 512 bit concats of 128 bits by building up through 256 bits first. But I don't know if that's best. We probably want to merge this with the vXi1 concat code since they are very similar. llvm-svn: 327454	2018-03-13 22:05:25 +00:00
Craig Topper	a406796f5f	[X86] Change X86::PMULDQ/PMULUDQ opcodes to take vXi64 type as input instead of vXi32. This instruction can be thought of as reading either the even elements of a vXi32 input or the lower half of each element of a vXi64 input. We currently use the vXi32 interpretation, but vXi64 matches better with its broadcast behavior in EVEX. I'm looking at moving MULDQ/MULUDQ creation to a DAG combine so we can do it when AVX512DQ is enabled without having to go through Custom lowering. But in some of the test cases we failed to use a broadcast load due to the size difference. This should help with that. I'm also wondering if we can model these instructions in native IR and remove the intrinsics and I think using a vXi64 type will work better with that. llvm-svn: 326991	2018-03-08 08:02:52 +00:00
Craig Topper	f2aae62228	[X86] Add a DAG combine to turn stores of vXi1 constants into scalar stores. llvm-svn: 326679	2018-03-04 19:33:15 +00:00
Craig Topper	be31585be8	[X86] Lower v1i1/v2i1/v4i1/v8i1 load/stores to i8 load/store during op legalization if AVX512DQ is not supported. We were previously doing this with isel patterns. Moving it to op legalization gives us a chance to see the required bitcast earlier. And it lets us remove some isel patterns. llvm-svn: 326669	2018-03-04 01:48:00 +00:00
Craig Topper	e31b9d1e5f	[X86] Lower extract_element from k-registers by bitcasting from v16i1 to i16 and extending/truncating. This is equivalent to what isel was doing anyway but by canonicalizing earlier we can remove some patterns. llvm-svn: 326375	2018-02-28 22:23:55 +00:00
Craig Topper	ac799b05d4	[X86] Change the masked FPCLASS implementation to use AND instead of OR to combine the mask results. While the description for the instruction does mention OR, it's talking about how the individual classification test results are ORed together. The incoming mask is used as a zeroing write mask. If the bit is 1 the classification is written to the output. If the bit is 0 the output is 0. This is equivalent to an AND. Here is pseudocode from the intrinsics guide FOR j := 0 to 1 i := j*64 IF k1[j] k[j] := CheckFPClass_FP64(a[i+63:i], imm8[7:0]) ELSE k[j] := 0 FI ENDFOR k[MAX:2] := 0 llvm-svn: 326306	2018-02-28 06:19:55 +00:00
Craig Topper	6694df14e6	[X86] Use SDNode instead of SDPatternOperator. NFC llvm-svn: 326048	2018-02-25 06:21:04 +00:00
Craig Topper	7bcac492d4	[X86] Remove checks for '(scalar_to_vector (i8 (trunc GR32:)))' from scalar masked move patterns. This portion can be matched by other patterns. We don't need it to make the larger pattern valid. It's sufficient to have a v1i1 mask input without caring where it came from. llvm-svn: 325999	2018-02-24 00:15:05 +00:00
Craig Topper	16b20245ba	[X86] Add assembler/disassembler support for blendm with zero masking and broadcast. Fixes PR31617 llvm-svn: 325957	2018-02-23 20:48:44 +00:00
Craig Topper	61d6ddbf0a	[X86] Add DAG combine to remove (and X, 1) from in front of a v1i1 scalar to vector. These can be created by type legalization promoting the inputs to select to match scalar boolean contents. We were trying to pattern match them away during isel, but it's better to just remove them from the DAG. I've cleaned up some patterns to not check for this 'and' anymore. But I suspect this has also opened up opportunities for pattern removal. llvm-svn: 325949	2018-02-23 20:13:42 +00:00
Craig Topper	9b64bf54b9	[X86] Make a helper function for commuting AVX512 VPCMP immediates since we do it in two places. llvm-svn: 325546	2018-02-20 03:58:11 +00:00
Craig Topper	9471a7c898	[X86] Reduce the number of isel pattern variations needed for VPTESTM/VPTESTNM matching. Canonicalize EQ/NE PCMPM to have build vector all zeros on the RHS so we don't have to pattern match it in both locations. This significantly reduces the number of isel patterns needed since we also had to multiply it out with loads being in either operand of the 'and' input node and in the 'and' masking node. This removes over 24000 bytes from the isel table. llvm-svn: 325526	2018-02-19 19:23:31 +00:00
Craig Topper	1040f236a3	[X86] Make masked pcmpeq commutable during isel so we can fold loads in other operand to the shorter encoding. Previously we used the immediate encoding if the load was in operand 0 and the short encoding if the load was in operand 1. This added an insane number of bytes to the size of the isel table. I'm wondering if we should always use the immediate form during isel and change to the short form during emission. This would remove the need to pattern match every combination for both the immediate form and the short form during isel. We could do the same with vpcmpgt llvm-svn: 325456	2018-02-18 02:37:33 +00:00
Simon Pilgrim	07e1337c2a	[X86][AVX512] Add missing scheduling class tag for KMOVB/KMOVW/KMOVD/KMOVQ moves/loads/stores. We only tagged it with the itinerary class, so completeness checks were erroneously passed (PR35639). llvm-svn: 324905	2018-02-12 16:59:04 +00:00
Simon Pilgrim	369e59d4d1	[X86][AVX512] Add missing scheduling class tag for VMOVQ/VMOVHLPS/VMOVLHPS/VMOVHPD/VMOVHPS/VMOVLPD/VMOVLPS Tag AVX512 variants to match SSE/AVX originals. We only tagged it with the itinerary class, so completeness checks were erroneously passed (PR35639). llvm-svn: 324901	2018-02-12 16:18:36 +00:00
Craig Topper	3ce035acf3	[X86] Add KADD X86ISD opcode instead of reusing ISD::ADD. ISD::ADD implies individual vector element addition with no carries between elements. But for a vXi1 type that would be the same as XOR. And we already turn ISD::ADD into ISD::XOR for all vXi1 types during lowering. So the ISD::ADD pattern would never be able to match anyway. KADD is different, it adds the elements but also propagates a carry between them. This is just a way of doing an add in k-register without bitcasting to the scalar domain. There's still no way to match the pattern, but at least it's not obviously wrong. llvm-svn: 324861	2018-02-12 01:33:38 +00:00
Craig Topper	5a2bd99a9e	[X86] Add isel patterns for selecting masked SUBV_BROADCAST with bitcasts. Remove combineBitcastForMaskedOp. Add test cases for the merge masked versions to make sure we have all those covered. llvm-svn: 324210	2018-02-05 08:37:37 +00:00
Craig Topper	25ceba7f30	[X86] Remove X86ISD::SHUF128 from combineBitcastForMaskedOp. Use isel patterns instead. We always created X86ISD::SHUF128 with a 64-bit element type so we can use isel patterns to detect a bitconvert to 32-bit to handle masking. The test changes are because we also match the bitconvert even if there is no masking. This leads to unnecessary isel pattern, but it requires more multiclass hackery in tablegen to get rid of it. llvm-svn: 324205	2018-02-05 06:00:23 +00:00
Craig Topper	571231a7fe	[X86] Use VMOVDQA64 for aligned vXi32 stores. I meant to do this with the unaligned stores in r322820, but looks like I missed it. llvm-svn: 323708	2018-01-29 23:27:23 +00:00
Craig Topper	15d69739e2	[X86] Remove VPTESTM/VPTESTNM ISD opcodes. Use isel patterns matching cmpm eq/ne with immallzeros. llvm-svn: 323612	2018-01-28 00:56:30 +00:00
Craig Topper	5e4b45361f	[X86] Add patterns for using masked vptestnmd for 256-bit vectors without VLX. We can widen the mask and extract it back down. llvm-svn: 323610	2018-01-27 23:49:14 +00:00
Craig Topper	513d3fa674	[X86] Remove X86ISD::PCMPGTM/PCMPEQM and instead just use X86ISD::PCMPM and pattern match the immediate value during isel. Legalization is still biased to turn LT compares in to GT by swapping operands to avoid needing extra isel patterns to commute. I'm hoping to remove TESTM/TESTNM next and this should simplify that by making EQ/NE more similar. llvm-svn: 323604	2018-01-27 20:19:02 +00:00
Craig Topper	8a444ee67c	[X86] Use vpternlog to implement vector not under AVX512. Previously we had to materialize all 1s in a register using vpternlog or pcmpeq and then xor with that. By using vpternlog directly we can do it in one operation. This is implemented using isel patterns, but we should maybe consider creating a generalized vpternlog combiner. llvm-svn: 323572	2018-01-26 22:17:40 +00:00
Craig Topper	05af43fbad	[X86] Fix some inconsistencies in the itineraries and Sched for (V)PEXTRW/(V)PINSRW The weirdest being that PEXTRWrr was tagged as a memory operation. llvm-svn: 323353	2018-01-24 17:58:57 +00:00
Craig Topper	0321ebc054	[X86] Use ISD::SIGN_EXTEND instead of X86ISD::VSEXT for mask to xmm/ymm/zmm conversion There are a couple tricky things with this patch. I had to add an override of isVectorLoadExtDesirable to stop DAG combine from combining sign_extend with loads after legalization since we legalize sextload using a load+sign_extend. Overriding this hook actually prevents a lot of sextloads from being created in the first place. I also had to add isel patterns because DAG combine blindly combines sign_extend+truncate to a smaller sign_extend which defeats what legalization was trying to do. Differential Revision: https://reviews.llvm.org/D42407 llvm-svn: 323301	2018-01-24 04:51:17 +00:00
Craig Topper	002657731b	[X86] Move 'Int_' to the end of the name of the VCOMISS/VUCOMISS and instructions to get them picked up by the scheduler model regexs. All other intrinsic instructions put the _Int on the end. This makes these instructions consistent and gets the prefix instregexs in the scheduler models to pick them up. llvm-svn: 323261	2018-01-23 21:37:51 +00:00
Craig Topper	26a701f24f	[X86] Various vXi1 insertion improvements. Add missing patterns for inserting v1i1 into a zero vector. Use insert_subvector to zero upper bits before inserting an element into a vXi1 vector. Replace kshift based isel pattern with insert_subvector based pattern now that code that caused the pattern has been fixed to emit insert_subvector. llvm-svn: 323173	2018-01-23 05:36:53 +00:00
Marina Yatsina	6fc2aaae8d	Separate ExecutionDepsFix into 4 parts: 1. ReachingDefsAnalysis - Allows to identify for each instruction what is the “closest” reaching def of a certain register. Used by BreakFalseDeps (for clearance calculation) and ExecutionDomainFix (for arbitrating conflicting domains). 2. ExecutionDomainFix - Changes the variant of the instructions in order to minimize domain crossings. 3. BreakFalseDeps - Breaks false dependencies. 4. LoopTraversal - Creates a traversal order of the basic blocks that is optimal for loops (introduced in revision L293571). Both ExecutionDomainFix and ReachingDefsAnalysis use this to determine the order they will traverse the basic blocks. This also included the following changes to ExecutionDepsFix original logic: 1. BreakFalseDeps and ReachingDefsAnalysis logic no longer restricted by a register class. 2. ReachingDefsAnalysis tracks liveness of reg units instead of reg indices into a given reg class. Additional changes in affected files: 1. X86 and ARM targets now inherit from ExecutionDomainFix instead of ExecutionDepsFix. BreakFalseDeps also was added to the passes they activate. 2. Comments and references to ExecutionDepsFix replaced with ExecutionDomainFix and BreakFalseDeps, as appropriate. Additional refactoring changes will follow. This commit is (almost) NFC. The only functional change is that now BreakFalseDeps will break dependency for all register classes. Since no additional instructions were added to the list of instructions that have false dependencies, there is no actual change yet. In a future commit several instructions (and tests) will be added. This is the first of multiple patches that fix bugzilla https://bugs.llvm.org/show_bug.cgi?id=33869 Most of the patches are intended at refactoring the existent code. 
Additional relevant reviews: https://reviews.llvm.org/D40331 https://reviews.llvm.org/D40332 https://reviews.llvm.org/D40333 https://reviews.llvm.org/D40334 Differential Revision: https://reviews.llvm.org/D40330 Change-Id: Icaeb75e014eff96a8f721377783f9a3e6c679275 llvm-svn: 323087	2018-01-22 10:05:23 +00:00
Craig Topper	83b0a98902	[X86] Use vmovdqu64/vmovdqa64 for unmasked integer vector stores for consistency with loads. Previously we used 64 for vXi64 stores and 32 for everything else. This change uses 64 for everything just like we do for loads. llvm-svn: 322820	2018-01-18 07:44:09 +00:00
Craig Topper	21c8a8fa49	[X86] Remove isel patterns for using unmasked vmovdqa32/vmovdqu32 for integer vector loads. These patterns were just looking for a vXi64 bitcasted to vXi32, but there is no advantage to using vmovdqa32 over vmovdqa64. llvm-svn: 322819	2018-01-18 07:44:06 +00:00
Clement Courbet	da1fad3ec6	[X86] Add missing predicates for VRNDSCALES{D,S}{m,r} Summary: This is similar to https://reviews.llvm.org/D41983. Reviewers: gchatelet Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D42069 llvm-svn: 322486	2018-01-15 14:24:07 +00:00
Clement Courbet	41a13740c5	[X86] Fix missing HasAVX512 predicates in avx512_sqrt_scalar. Summary: For example, VSQRTSDZr and VSQRTSSZr were missing the predicate. Also fix braces indentation and braces for consistency. Reviewers: craig.topper, RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D41983 llvm-svn: 322478	2018-01-15 12:05:33 +00:00
Craig Topper	b2868233b7	[X86] Use ISD::TRUNCATE instead of X86ISD::VTRUNC when input and output types have the same number of elements. llvm-svn: 322455	2018-01-14 08:11:36 +00:00
Craig Topper	e9fc0cd920	[X86] Improve legalization of vXi16/vXi8 selects. Extend vXi1 conditions of vXi8/vXi16 selects even before type legalization gets a chance to split wide vectors. Previously we would only extend 128 and 256 bit vectors. But if we start with a 512 bit vector or wider that needs to be split we wouldn't extend until after the split had taken place. By extending early we improve the results of type legalization. Don't widen condition of 128/256 bit vXi16/vXi8 selects when we have BWI but not VLX. We can still use a mask register by widening the select to 512-bits instead. This is similar to what we do for compares already. llvm-svn: 322450	2018-01-14 02:05:51 +00:00
Craig Topper	cb09bd1227	[X86] Remove unused isel pattern for zero extend from v16i1/v8i1 to v16i32/v8i64. We have custom lowering on vzext that produces a vselect and a build vector. So zext never gets to isel. llvm-svn: 322381	2018-01-12 17:34:09 +00:00
Craig Topper	0b59034b15	[X86] Optimize v2i32/v2f32 scatters. If the index is v2i64 we can use the scatter instruction that has v4i32/v4f32 data register, v2i64 index, and v2i1 mask. Similar was already done for gather. Implement custom widening for v2i32 data to remove the code that reverses type legalization during lowering. llvm-svn: 322254	2018-01-11 06:31:28 +00:00
Craig Topper	7c2abdd249	[X86] Remove unnecessary isel pattern that is a combination of two other patterns. The pattern was this def : Pat<(i32 (zext (i8 (bitconvert (v8i1 VK8:$src))))), (MOVZX32rr8 (EXTRACT_SUBREG (i32 (COPY_TO_REGCLASS VK8:$src, GR32)), sub_8bit))>, Requires<[NoDQI]>; but if you just let (i32 (zext X)) match by itself you'll get MOVZX32rr8. And if you let (i8 (bitconvert (v8i1 VK8:$src))) match by itself you'll get (EXTRACT_SUBREG (i32 (COPY_TO_REGCLASS VK8:$src, GR32)), sub_8bit). So we can just let isel do the two patterns naturally. llvm-svn: 322049	2018-01-09 00:50:42 +00:00
Craig Topper	f090e8a89a	[X86] Replace CVT2MASK ISD opcode with PCMPGTM compared to zero. CVT2MASK is just checking the sign bit which can be represented with a comparison with zero. llvm-svn: 321985	2018-01-08 06:53:54 +00:00
Craig Topper	a2018e799a	[X86] Add patterns to allow 512-bit BWI compare instructions to be used for 128/256-bit compares when VLX is not available. llvm-svn: 321984	2018-01-08 06:53:52 +00:00
Craig Topper	d58c165545	[X86] Make v2i1 and v4i1 legal types without VLX Summary: There are a few oddities that occur due to v1i1, v8i1, v16i1 being legal without v2i1 and v4i1 being legal when we don't have VLX. Particularly during legalization of v2i32/v4i32/v2i64/v4i64 masked gather/scatter/load/store. We end up promoting the mask argument to these during type legalization and then have to widen the promoted type to v8iX/v16iX and truncate it to get the element size back down to v8i1/v16i1 to use a 512-bit operation. Since we need to fill the upper bits of the mask we have to fill with 0s at the promoted type. It would be better if we could just have the v2i1/v4i1 types as legal so they don't undergo any promotion. Then we can just widen with 0s directly in a k register. There are no real v4i1/v2i1 instructions anyway. Everything is done on a larger register anyway. This also fixes an issue that we couldn't implement a masked vextractf32x4 from zmm to xmm properly. We now have to support widening more compares to 512-bit to get a mask result out so new tablegen patterns got added. I had to hack the legalizer for widening the operand of a setcc a bit so it didn't try to create a setcc returning v4i32, extract from it, then try to promote it using a sign extend to v2i1. Now we create the setcc with v4i1 if the original setcc's result type is v2i1. Then extract that and don't sign extend it at all. There's definitely room for improvement with some follow up patches. Reviewers: RKSimon, zvi, guyblank Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D41560 llvm-svn: 321967	2018-01-07 18:20:37 +00:00
Craig Topper	61d8a60e23	[X86] Remove memory forms of EVEX encoded vcvttss2si/vcvttsd2si from asm matcher table. This is also needed to fix PR35837. llvm-svn: 321946	2018-01-06 21:27:25 +00:00
Craig Topper	0f4ccb7806	[X86] Add load folding pattern to EVEX vcvttss2si/vcvtsd2si. llvm-svn: 321945	2018-01-06 21:02:26 +00:00
Craig Topper	90353a9f42	[X86] Remove an unnecessary VCVTTSD2SIrrb/VCVTSS2SIrrb instruction with no isel pattern that only existed for the assembler. Use VCVTTSD2SIrrb_Int instead. For consistency use the _Int version of VCVTTSD2SIrr_Int and VCVTTSD2SIrm_Int for the assembler as well. llvm-svn: 321944	2018-01-06 21:02:22 +00:00
Craig Topper	a49c354a08	[X86] Remove memory forms of EVEX encoded vcvtsd2si/vcvtss2si from the assembler matcher table We should always prefer the VEX encoded version of these instructions. There is no advantage to the EVEX version. Fixes PR35837. llvm-svn: 321939	2018-01-06 19:20:33 +00:00
Craig Topper	b18d6221ba	[X86] Rename the EVEX encoded GFNI instructions to start with a 'V'. NFC This makes the names consistent with the mnemonics like every other instruction. llvm-svn: 321931	2018-01-06 07:18:08 +00:00
Craig Topper	e2659d8383	[X86] Add vcvtsd2sil/vcvtsd2siq etc. InstAliases to the EVEX-encoded instructions. This matches their VEX equivalents. llvm-svn: 321912	2018-01-05 23:13:54 +00:00
Craig Topper	29476ab0bd	[X86] Add InstAliases for 'vmovd' with GR64 registers to select EVEX encoded instructions as well. Without this we allow "vmovd %rax, %xmm0", but not "vmovd %rax, %xmm16". This exists due to a silly bug where really old versions of the GNU assembler required movd instead of movq on these instructions. This compatibility hack then crept forward to the avx version too, but we didn't propagate it to avx512. llvm-svn: 321903	2018-01-05 21:57:23 +00:00
Craig Topper	694c73adc2	[X86] Add missing NoVLX predicate around some patterns that use zmm registers to implement 128/256-bit operations without VLX. llvm-svn: 321613	2018-01-01 01:11:32 +00:00
Craig Topper	fc3ce4993c	[X86] Add patterns for using zmm registers for v8i32/v8f32 vselect with the false input being zero. We can use zmm move with zero masking for this. We already had patterns for using a masked move, but we didn't check for the zero masking case separately. llvm-svn: 321612	2018-01-01 01:11:29 +00:00
Craig Topper	876ec0b558	[X86] Prevent combining (v8i1 (bitconvert (i8 load)))->(v8i1 load) if we don't have DQI. We end up using an i8 load via an isel pattern from v8i1 anyway. This just makes it more explicit. This seems to improve codegen in some cases and I'd like to kill off some of the load patterns. llvm-svn: 321598	2017-12-31 07:38:41 +00:00
Craig Topper	6159f5ebd8	[X86] Remove patterns for load/store of vXi with bitcasts to/from integer. This is better handled by a DAG combine if it's not already being done. No lit tests fail from the removal of these patterns. llvm-svn: 321597	2017-12-31 07:38:36 +00:00
Craig Topper	a362dee774	[X86] Remove AND32ri8 from pattern for v1i1 load. I don't think anything would actually expect the other bits to be zero. llvm-svn: 321596	2017-12-31 07:38:33 +00:00
Craig Topper	97cc7b0377	[X86] Remove isel patterns for kshifts with types that don't support kshift natively. We should only be creating natively supported kshifts now. llvm-svn: 321577	2017-12-30 06:45:46 +00:00
Craig Topper	c5fd31a802	[X86] Custom legalize vXi1 extract_subvector with KSHIFTR. This allows us to remove some isel patterns. This is mostly NFC, but we now use KSHIFTB instead of KSHIFTW with DQI. llvm-svn: 321576	2017-12-30 06:45:43 +00:00
Craig Topper	88e26a99f8	[X86] Remove unnecessary patterns for sign extending vXi1 without VLX. The custom lowering already widens the result type to 512-bits if VLX isn't supported. llvm-svn: 321533	2017-12-28 19:45:55 +00:00


1173 Commits