llvm-project

Commit Graph

Author	SHA1	Message	Date
Craig Topper	a6054328e8	[X86] Teach the execution domain fixing tables to use movlhps in place of unpcklpd for the packed single domain. MOVLHPS has a smaller encoding than UNPCKLPD in the legacy encodings. With VEX and EVEX encodings it doesn't matter. llvm-svn: 313509	2017-09-18 04:40:58 +00:00
Zvi Rackover	255488a1e0	X86 Tests: More AVX512 conversions tests. NFC Adding more tests for AVX512 fp<->int conversions that were missing. llvm-svn: 312921	2017-09-11 15:54:38 +00:00
Zvi Rackover	25799d93f0	X86: Improve AVX512 fptoui lowering Summary: Add patterns for fptoui <16 x float> to <16 x i8> fptoui <16 x float> to <16 x i16> Reviewers: igorb, delena, craig.topper Reviewed By: craig.topper Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D37505 llvm-svn: 312704	2017-09-07 07:40:34 +00:00
Zvi Rackover	5ebe94a84d	X86 Tests: Tidy up AVX512 conversion tests. NFC. Rename functions to a consistent format to make it easier to track coverage. llvm-svn: 312619	2017-09-06 05:33:04 +00:00
Zvi Rackover	2096893f34	X86 Tests: Adding missing AVX512 fptoui coverage tests. NFC. Some of the cases show missing patterns I intend to fix shortly. llvm-svn: 312560	2017-09-05 18:24:39 +00:00
Craig Topper	48a7917079	[AVX512] Use 256-bit extract instructions for extracting bits [255:128] from a 512-bit register This enables the use of a smaller encoding by using a VEX instruction when possible. Differential Revision: https://reviews.llvm.org/D37092 llvm-svn: 312100	2017-08-30 07:26:12 +00:00
Gadi Haber	d76f7b824e	[X86][Haswell] Updating HSW instruction scheduling information This patch completely replaces the instruction scheduling information for the Haswell architecture target by modifying the file X86SchedHaswell.td located under the X86 Target. We used the scheduling information retrieved from the Haswell architects in order to replace and modify the existing scheduling. The patch continues the scheduling replacement effort started with the SNB target in r307529 and r310792. Information includes latency, number of micro-Ops and used ports by each HSW instruction. Please expect some performance fluctuations due to code alignment effects. Reviewers: RKSimon, zvi, aymanmus, craig.topper, m_zuckerman, igorb, dim, chandlerc, aaboud Differential Revision: https://reviews.llvm.org/D36663 llvm-svn: 311879	2017-08-28 10:04:16 +00:00
Craig Topper	3a622a14f9	[AVX512] Don't switch unmasked subvector insert/extract instructions when AVX512DQI is enabled. There's no reason to switch instructions with and without DQI. It just creates extra isel patterns and test divergences. There is however value in enabling the masked version of the instructions with DQI. This required introducing some new multiclasses to enable this splitting. Differential Revision: https://reviews.llvm.org/D36661 llvm-svn: 311091	2017-08-17 15:40:25 +00:00
Dinar Temirbulatov	a0beedef1c	[X86] SET0 to use XMM registers where possible PR26018 PR32862 Differential Revision: https://reviews.llvm.org/D35965 llvm-svn: 309926	2017-08-03 08:50:18 +00:00
Craig Topper	410d252f5b	[AVX-512] Add unmasked subvector inserts and extract to the execution domain tables. llvm-svn: 309632	2017-07-31 22:07:29 +00:00
Dinar Temirbulatov	aead31a36f	[X86] SET0 to use XMM registers where possible PR26018 PR32862 Differential Revision: https://reviews.llvm.org/D35839 llvm-svn: 309298	2017-07-27 17:47:01 +00:00
Simon Pilgrim	f9ea0959d9	[X86][AVX] Regenerate tests with constant broadcast comments llvm-svn: 308110	2017-07-15 21:17:35 +00:00
Michael Zuckerman	f66840020c	Reverting commit 306414 on behalf of @gadi.haber llvm-svn: 306532	2017-06-28 11:23:31 +00:00
Gadi Haber	13759a7ed6	Updated and extended the information about each instruction in HSW and SNB to include the following data: •static latency •number of uOps from which the instructions consists •all ports used by the instruction Reviewers:  RKSimon zvi aymanmus m_zuckerman Differential Revision: https://reviews.llvm.org/D33897 llvm-svn: 306414	2017-06-27 15:05:13 +00:00
Sanjay Patel	6e8e7cc70e	[x86] avoid flipping sign bits for vector icmp by using known bits If we know that both operands of an unsigned integer vector comparison are non-negative, then it's safe to directly use a signed-compare-greater-than instruction (the only non-equality integer vector compare predicate provided by SSE/AVX). We're intentionally not changing the condition code to signed in order to preserve the existing transforms that use min/max/psubus below here. This should solve PR33276: https://bugs.llvm.org/show_bug.cgi?id=33276 Differential Revision: https://reviews.llvm.org/D33862 llvm-svn: 304909	2017-06-07 13:46:34 +00:00
Sanjay Patel	56641ac497	[x86] fix over-specific triple; NFC There's nothing darwin-specific in these tests, and using that setting causes extra phantom diffs when the auto-generated check lines are regenerated today. llvm-svn: 304614	2017-06-02 23:40:46 +00:00
Guy Blank	548e22a1a7	[X86][AVX512] Make i1 illegal in the CodeGen This patch defines the i1 type as illegal in the X86 backend for AVX512. For DAG operations on <N x i1> types (build vector, extract vector element, ...) i8 is used, and should be truncated/extended. This should produce better scalar code for i1 types since GPRs will be used instead of mask registers. Differential Revision: https://reviews.llvm.org/D32273 llvm-svn: 303421	2017-05-19 12:35:15 +00:00
Craig Topper	058f2f6d72	[AVX-512] Fix accidental uses of AH/BH/CH/DH after copies to/from mask registers We've had several bugs (PR32256, PR32241) recently that resulted from usages of AH/BH/CH/DH either before or after a copy to/from a mask register. This ultimately occurs because we create COPY_TO_REGCLASS with VK1 and GR8. Then in CopyToFromAsymmetricReg in X86InstrInfo we find a 32-bit super register for the GR8 to emit the KMOV with. But as these tests are demonstrating, it's possible for the GR8 register to be a high register and we end up doing an accidental extract or insert from bits 15:8. I think the best way forward is to stop making copies directly between mask registers and GR8/GR16. Instead I think we should restrict to only copies between mask registers and GR32/GR64 and use EXTRACT_SUBREG/INSERT_SUBREG to handle the conversion from GR32 to GR16/8 or vice versa. Unfortunately, this complicates fastisel a bit more now to create the subreg extracts where we used to create GR8 copies. We can probably make a helper function to bring down the repetition. This does result in KMOVD being used for copies when BWI is available because we don't know the original mask register size. This caused a lot of deltas on tests because we have to split the checks for KMOVD vs KMOVW based on BWI. Differential Revision: https://reviews.llvm.org/D30968 llvm-svn: 298928	2017-03-28 16:35:29 +00:00
Michael Zuckerman	85436ece89	[X86][TD][vpmovm2 ] New TD pattern for the vpmovm2 instruction Up until now, vpmovm2 instruction described its destination operand size by the source operand size. This patch adds new pattern for the vpmovm2 instruction. The node describes new expansion of the destination (from {128|256} to 512). Differential Revision: https://reviews.llvm.org/D30654 llvm-svn: 298586	2017-03-23 09:57:01 +00:00
Amjad Aboud	4f97751798	[X86] Generate VZEROUPPER for Skylake-avx512. VZEROUPPER should not be issued on Knights Landing (KNL), but on Skylake-avx512 it should be. Differential Revision: https://reviews.llvm.org/D29874 llvm-svn: 296859	2017-03-03 09:03:24 +00:00
Simon Pilgrim	511d788a95	[DAGCombine] Recognise any_extend_vector_inreg and truncation style shuffle masks During legalization we are often creating shuffles (via a build_vector scalarization stage) that are "any_extend_vector_inreg" style masks, and also other masks that are the equivalent of "truncate_vector_inreg" (if we had such a thing). This patch is an attempt to match these cases to help undo the effects of just leaving shuffle lowering to handle it - which typically means we lose track of the undefined elements of the shuffles resulting in an unnecessary extension+truncation stage for widened illegal types. The 2011-10-21-widen-cmp.ll regression will be fixed by making SIGN_EXTEND_VECTOR_IN_REG legal in SSE instead of lowering them to X86ISD::VSEXT (PR31712). Differential Revision: https://reviews.llvm.org/D29454 llvm-svn: 295451	2017-02-17 15:14:48 +00:00
Craig Topper	24c3a2395f	[AVX-512] Improve lowering of zero_extend of v4i1 to v4i32 and v2i1 to v2i64 with VLX, but no DQ or BW support. llvm-svn: 291747	2017-01-12 06:49:12 +00:00
Craig Topper	69ab67b279	[AVX-512] Improve lowering of sign_extend of v4i1 to v4i32 and v2i1 to v2i64 when avx512vl is available, but not avx512dq. llvm-svn: 291746	2017-01-12 06:49:08 +00:00
Craig Topper	56f9610b98	[AVX-512] Add more varied avx512 feature command lines to the avx512-cvt.ll test to show some poor codegen examples. We're definitely doing bad things when avx512vl is enabled without avx512dq. It looks like avx512vl/dq without avx512bw may also have some issues. llvm-svn: 291744	2017-01-12 06:49:03 +00:00
Elad Cohen	0c2601073e	[X86] Fix PR30926 - Add patterns for (v)cvtsi2s{s,d} and (v)cvtsd2s{s,d} The code emitted by Clang's intrinsics for (v)cvtsi2ss, (v)cvtsi2sd, (v)cvtsd2ss and (v)cvtss2sd is lowered to a code sequence that includes redundant (v)movss/(v)movsd instructions. This patch adds patterns for optimizing these sequences. Differential revision: https://reviews.llvm.org/D28455 llvm-svn: 291660	2017-01-11 09:11:48 +00:00
Craig Topper	6393afce97	[AVX-512] Add patterns to use a zero masked VPTERNLOG instruction for vselects of all ones and all zeros. Previously we emitted a VPTERNLOG and a separate masked move. llvm-svn: 291415	2017-01-09 02:44:34 +00:00
Gadi Haber	19c4fc5e62	This is a large patch for X86 AVX-512 of an optimization for reducing code size by encoding EVEX AVX-512 instructions using the shorter VEX encoding when possible. There are cases of AVX-512 instructions that have two possible encodings. This is the case with instructions that use vector registers with low indexes of 0 - 15 and do not use the zmm registers or the mask k registers. The EVEX encoding prefix requires 4 bytes whereas the VEX prefix can take only up to 3 bytes. Consequently, using the VEX encoding for these instructions results in a code size reduction of ~2 bytes even though it is compiled with the AVX-512 features enabled. Reviewers: Craig Topper, Zvi Rackover, Elena Demikhovsky Differential Revision: https://reviews.llvm.org/D27901 llvm-svn: 290663	2016-12-28 10:12:48 +00:00
Simon Pilgrim	e940daf532	[X86][SSE] Add support for combining target shuffles to SHUFPS. As discussed on D27692, the next step will be to allow cross-domain shuffles once the combined shuffle depth passes a certain point. llvm-svn: 290064	2016-12-18 14:26:02 +00:00
Simon Pilgrim	2228f70a85	[X86][SSE] Add initial support for combining (V)PMOVZX with shuffles. llvm-svn: 288049	2016-11-28 17:58:19 +00:00
Simon Pilgrim	ab323ec411	[X86][AVX512DQVL] Add support for v2i64 -> v2f32 SINT_TO_FP/UINT_TO_FP lowering llvm-svn: 287877	2016-11-24 13:38:59 +00:00
Simon Pilgrim	9e355bc5bb	[X86][AVX512] Added some mask/maskz tests for sitofp/uitofp i32 to f64 llvm-svn: 287106	2016-11-16 14:24:04 +00:00
Simon Pilgrim	ceffb43b1b	[X86][SSE] Improve SINT_TO_FP of boolean vector results (signum) This patch helps avoid poor legalization of boolean vector results (e.g. 8f32 -> 8i1 -> 8i16) that feed into SINT_TO_FP by inserting an early SIGN_EXTEND and so helps improve the truncation logic. This is not necessary for AVX512 targets where boolean vectors are legal - AVX512 manages to lower ( sint_to_fp vXi1 ) into some form of ( select mask, 1.0f , 0.0f ) in most cases. Fix for PR13248 Differential Revision: https://reviews.llvm.org/D26583 llvm-svn: 286979	2016-11-15 16:24:40 +00:00
Craig Topper	b8596e4d1d	[X86] Cleanup 'x' and 'y' mnemonic suffixes for vcvtpd2dq/vcvttpd2dq/vcvtpd2ps and similar instructions. -Don't print the 'x' suffix for the 128-bit reg/mem VEX encoded instructions in Intel syntax. This is consistent with the EVEX versions. -Don't print the 'y' suffix for the 256-bit reg/reg VEX encoded instructions in Intel or AT&T syntax. This is consistent with the EVEX versions. -Allow the 'x' and 'y' suffixes to be used for the reg/mem forms when we're assembling using Intel syntax. -Allow the 'x' and 'y' suffixes on the reg/reg EVEX encoded instructions in Intel or AT&T syntax. This is consistent with what VEX was already allowing. This should fix at least some of PR28850. llvm-svn: 286787	2016-11-14 01:53:29 +00:00
Craig Topper	d8b2bd492c	[AVX-512] Add the scalar unsigned integer to fp conversion instructions to hasUndefRegUpdate. llvm-svn: 282356	2016-09-25 16:33:57 +00:00
Simon Pilgrim	6c21e6a54e	[X86][SSE] Improve recognition of uitofp conversions that can be performed as sitofp With D24253 we can now use SelectionDAG::SignBitIsZero with vector operations. This patch uses SelectionDAG::SignBitIsZero to recognise that a zero sign bit means that we can use a sitofp instead of a uitofp (which is not directly supported on pre-AVX512 hardware). While AVX512 does provide support for uitofp, the conversion to sitofp should not cause any regressions. Differential Revision: https://reviews.llvm.org/D24343 llvm-svn: 281852	2016-09-18 12:45:23 +00:00
Marina Yatsina	88f0c31f13	Avoid false dependencies of undef machine operands This patch helps avoid false dependencies on undef registers by updating the machine instructions' undef operand to use a register that the instruction is truly dependent on, or use a register with clearance higher than Pref. Pseudo example: loop: xmm0 = ... xmm1 = vcvtsi2sdl eax, xmm0<undef> ... = inst xmm0 jmp loop In this example, selecting xmm0 as the undef register creates false dependency between loop iterations. This false dependency cannot be solved by inserting an xor before vcvtsi2sdl because xmm0 is alive at the point of the vcvtsi2sdl instruction. Selecting a different register instead of xmm0, especially a register that is not used in the loop, will eliminate this problem. Differential Revision: https://reviews.llvm.org/D22466 llvm-svn: 278321	2016-08-11 07:32:08 +00:00
Sanjay Patel	5ccc85fe83	[x86, AVX] allow FP vector select folding to bitwise logic ops (PR28895) This handles the case in: https://llvm.org/bugs/show_bug.cgi?id=28895 ...but we are not getting all of the possibilities yet. Eg, we use 'X86::FANDN' for scalar FP select combines. That enhancement is filed as: https://llvm.org/bugs/show_bug.cgi?id=28925 Differential Revision: https://reviews.llvm.org/D23337 llvm-svn: 278270	2016-08-10 19:00:11 +00:00
Craig Topper	19505bc354	[AVX-512] Add AVX-512 scalar CVT instructions to hasUndefRegUpdate. llvm-svn: 277933	2016-08-06 19:31:50 +00:00
Craig Topper	c48c029610	[AVX-512] Fix duplicate column in AVX512 execution dependency table that was preventing VMOVDQU32/VMOVDQA32 from being recognized. Fix a bug in the code that stops execution dependency fix from turning operations on 32-bit integer element types into operations on 64-bit integer element types. llvm-svn: 277327	2016-08-01 07:55:33 +00:00
Elena Demikhovsky	64e5f929d0	AVX-512: Fixed [US]INT_TO_FP selection for i1 vectors. It failed with assertion before this patch. Differential Revision: https://reviews.llvm.org/D22735 llvm-svn: 276648	2016-07-25 16:51:00 +00:00
Craig Topper	516e14cd8e	[AVX512] Use vpternlog with an immediate of 0xff to create 512-bit all one vectors. llvm-svn: 275045	2016-07-11 05:36:48 +00:00
Matthias Braun	152e7c8b12	VirtRegMap: Replace some identity copies with KILL instructions. An identity COPY like this: %AL = COPY %AL, %EAX<imp-def> has no semantic effect, but encodes liveness information: Further users of %EAX only depend on this instruction even though it does not define the full register. Replace the COPY with a KILL instruction in those cases to maintain this liveness information. (This reverts a small part of r238588 but this time adds a comment explaining why a KILL instruction is useful). llvm-svn: 274952	2016-07-09 00:19:07 +00:00
Elena Demikhovsky	95629caaa9	AVX-512: fixed a bug in fp_to_uint pattern on KNL Fixed fp_to_uint instruction selection on KNL. One pattern was missing for <4 x double> to <4 x i32> Differential Revision: http://reviews.llvm.org/D18512 llvm-svn: 264701	2016-03-29 06:33:41 +00:00
Igor Breger	7f69a99c54	AVX512: Implemented encoding and intrinsics for vextracti64x4 ,vextracti64x2, vextracti32x8, vextracti32x4, vextractf64x4, vextractf64x2, vextractf32x8, vextractf32x4 Added tests for intrinsics and encoding. Differential Revision: http://reviews.llvm.org/D11802 llvm-svn: 247276	2015-09-10 12:54:54 +00:00
Renato Golin	db7ea86bf4	Revert "AVX512: Implemented encoding and intrinsics for vextracti64x4 ,vextracti64x2, vextracti32x8, vextracti32x4, vextractf64x4, vextractf64x2, vextractf32x8, vextractf32x4 Added tests for intrinsics and encoding." This reverts commit r247149, as it was breaking numerous buildbots of varied architectures. llvm-svn: 247177	2015-09-09 19:44:40 +00:00
Igor Breger	ac29a82921	AVX512: Implemented encoding and intrinsics for vextracti64x4 ,vextracti64x2, vextracti32x8, vextracti32x4, vextractf64x4, vextractf64x2, vextractf32x8, vextractf32x4 Added tests for intrinsics and encoding. Differential Revision: http://reviews.llvm.org/D11802 llvm-svn: 247149	2015-09-09 14:35:09 +00:00
Elena Demikhovsky	17b906058e	AVX-512: Floating point conversions for SKX - DAG Lowering. SKX supports conversion for all FP types. Integer types include doublewords and quadwords. I added "Legal" status for these nodes and a bunch of tests. I added "NoVLX" for AVX DAG selection to force VLX instructions selection when VLX is supported. Differential Revision: http://reviews.llvm.org/D11255 llvm-svn: 242637	2015-07-19 10:17:33 +00:00
Elena Demikhovsky	f40342d6a2	AVX-512: fixed UINT_TO_FP operation for 512-bit types. llvm-svn: 236955	2015-05-10 14:23:52 +00:00
David Blaikie	a79ac14fa6	[opaque pointer type] Add textual IR support for explicit type parameter to load instruction Essentially the same as the GEP change in r230786. A similar migration script can be used to update test cases, though a few more test case improvements/changes were required this time around: (r229269-r229278) import fileinput import sys import re pat = re.compile(r"((?:=|:|^)\s*load (?:atomic )?(?:volatile )?(.*?))(| addrspace\(\d+\) *)\*($| *(?:%|@|null|undef|blockaddress|getelementptr|addrspacecast|bitcast|inttoptr|\[\[[a-zA-Z]|\{\{).*$)") for line in sys.stdin: sys.stdout.write(re.sub(pat, r"\1, \2\3*\4", line)) Reviewers: rafael, dexonsmith, grosser Differential Revision: http://reviews.llvm.org/D7649 llvm-svn: 230794	2015-02-27 21:17:42 +00:00
Elena Demikhovsky	d5e95b57e0	AVX-512: SINT_TO_FP cost model and some bugfixes Checked some corner cases, for example translation of <8 x i1> to <8 x double> llvm-svn: 221883	2014-11-13 11:46:16 +00:00

57 Commits