llvm-project

Commit Graph

Author	SHA1	Message	Date
Sanjay Patel	b974be5ef4	[x86] don't require a zext when forming ADC/SBB The larger goal is to move the ADC/SBB transforms currently in combineX86SetCC() to combineAddOrSubToADCOrSBB() because we're creating ADC/SBB in lots of places where we shouldn't. This was intended to be an NFC change, but avx-512 has something strange going on. It doesn't seem like any of the affected tests should really be using SET+TEST or ADC; a simple ADD could replace several instructions. But that's another bug... llvm-svn: 296978	2017-03-04 20:35:19 +00:00
Sanjay Patel	066f3208bf	[DAGCombiner] allow transforming (select Cond, C +/- 1, C) to (add(ext Cond), C) select Cond, C +/- 1, C --> add(ext Cond), C -- with a target hook. This is part of the ongoing process to obsolete D24480. The motivation is to canonicalize to select IR in InstCombine whenever possible, so we need to have a way to undo that easily in codegen. PowerPC is an obvious winner for this kind of transform because it has fast and complete bit-twiddling abilities but generally lousy conditional execution perf (although this might have changed in recent implementations). x86 also sees some wins, but the effect is limited because these transforms already mostly exist in its target-specific combineSelectOfTwoConstants(). The fact that we see any x86 changes just shows that that code is a mess of special-case holes. We may be able to remove some of that logic now. My guess is that other targets will want to enable this hook for most cases. The likely follow-ups would be to add value type and/or the constants themselves as parameters for the hook. As the tests in select_const.ll show, we can transform any select-of-constants to math/logic, but the general transform for any 2 constants needs one more instruction (multiply or 'and'). ARM is one target that I think may not want this for most cases. I see infinite loops there because it wants to use selects to enable conditionally executed instructions. Differential Revision: https://reviews.llvm.org/D30537 llvm-svn: 296977	2017-03-04 19:18:09 +00:00
Amjad Aboud	4f97751798	[X86] Generate VZEROUPPER for Skylake-avx512. VZEROUPPER should not be issued on Knights Landing (KNL), but on Skylake-avx512 it should be. Differential Revision: https://reviews.llvm.org/D29874 llvm-svn: 296859	2017-03-03 09:03:24 +00:00
Sanjay Patel	92938657a0	[DAGCombiner] fold binops with constant into select-of-constants This is part of the ongoing attempt to improve select codegen for all targets and select canonicalization in IR (see D24480 for more background). The transform is a subset of what is done in InstCombine's FoldOpIntoSelect(). I first noticed a regression in the x86 avx512-insert-extract.ll tests with a patch that hopes to convert more selects to basic math ops. This appears to be a general missing DAG transform though, so I added tests for all standard binops in rL296621 (PowerPC was chosen semi-randomly; it has scripted FileCheck support, but so do ARM and x86). The poor output for "sel_constants_shl_constant" is tracked with: https://bugs.llvm.org/show_bug.cgi?id=32105 Differential Revision: https://reviews.llvm.org/D30502 llvm-svn: 296699	2017-03-01 22:51:31 +00:00
Sanjay Patel	74ca880749	[x86] add alternate IR tests for select of constants; NFC llvm-svn: 296496	2017-02-28 18:02:38 +00:00
Igor Breger	812f319794	[AVX512] Fix EXTRACT_VECTOR_ELT for v2i1/v4i1/v32i1/v64i1 with variable index. Differential Revision: https://reviews.llvm.org/D30189 llvm-svn: 295718	2017-02-21 14:01:25 +00:00
Igor Breger	fda32d266a	[X86] Fix EXTRACT_VECTOR_ELT with variable index from v32i16 and v64i8 vector. Its more profitable to go through memory (1 cycles throughput) than using VMOVD + VPERMV/PSHUFB sequence ( 2/3 cycles throughput) to implement EXTRACT_VECTOR_ELT with variable index. IACA tool was used to get performace estimation (https://software.intel.com/en-us/articles/intel-architecture-code-analyzer) For example for var_shuffle_v16i8_v16i8_xxxxxxxxxxxxxxxx_i8 test from vector-shuffle-variable-128.ll I get 26 cycles vs 79 cycles. Removing the VINSERT node, we don't need it any more. Differential Revision: https://reviews.llvm.org/D29690 llvm-svn: 295660	2017-02-20 14:16:29 +00:00
Craig Topper	aa46204ed9	[DAGCombiner] Remove the half vector width check for the combine of EXTRACT_SUBVECTOR from an INSERT_SUBVECTOR. This gives more parallelism opportunities for AVX-512 when dealing with 128-bit extracts from 512-bit vectors. llvm-svn: 294930	2017-02-12 23:49:49 +00:00
Craig Topper	04840ab752	[X86] Update test case I missed in r294876. llvm-svn: 294878	2017-02-11 23:23:11 +00:00
Igor Breger	ed43f15637	Add new tests for EXTRACT_VECTOR_ELT (vector of packed i8/16/i32/i64/ps/pd data) llvm-svn: 294565	2017-02-09 07:39:19 +00:00
Craig Topper	6393afce97	[AVX-512] Add patterns to use a zero masked VPTERNLOG instruction for vselects of all ones and all zeros. Previously we emitted a VPTERNLOG and a separate masked move. llvm-svn: 291415	2017-01-09 02:44:34 +00:00
Craig Topper	fa875a1d3d	[AVX-512] Teach EVEX to VEX conversion pass to handle VINSERT and VEXTRACT instructions. llvm-svn: 290869	2017-01-03 05:46:18 +00:00
Matt Arsenault	92fede361f	DAG: Fold out out of bounds insert_vector_elt getNode already prevents formation of out of bounds constant extract_vector_elts. Do the same for insert_vector_elt. llvm-svn: 288603	2016-12-03 23:03:26 +00:00
Matthias Braun	39c3c89cdc	MCStreamer: Use "cfi" for CFI related temp labels. Choosing a "cfi" name makes the intend a bit clearer in an assembly dump and more importantly the assembly dumps are slightly more stable as the numbers don't move around anymore when unrelated code calls createTempSymbol() more or less often. As they are temp labels the name doesn't influence the generated object code. Differential Revision: https://reviews.llvm.org/D27244 llvm-svn: 288290	2016-11-30 23:48:26 +00:00
Craig Topper	cada9f2275	[AVX-512] Add support for commuting VPERMT2(B/W/D/Q/PS/PD) to/from VPERMI2(B/W/D/Q/PS/PD). Summary: The index and one of the table operands can be swapped by changing the opcode to the other version. Neither of these operands are the one that can load from memory so this can't be used to increase memory folding opportunities. We need to handle the unmasked forms and the kz forms. Since the load operand isn't being commuted we can commute the load and broadcast instructions too. Reviewers: igorb, delena, Ayal, Farhana, RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D25652 llvm-svn: 287621	2016-11-22 04:57:34 +00:00
Igor Breger	7e2a0dfa0c	revert r279960. https://llvm.org/bugs/show_bug.cgi?id=30249 llvm-svn: 280625	2016-09-04 14:03:52 +00:00
Igor Breger	1a388871b9	[AVX512] In some cases KORTEST instruction may be used instead of ZEXT + TEST sequence. Differential Revision: http://reviews.llvm.org/D23490 llvm-svn: 279960	2016-08-29 08:52:52 +00:00
Michael Kuperstein	2ee911e985	Revert r274613 because it breaks the test suite with AVX512 This reverts most of r274613 (AKA r274626) and its follow-ups (r276347, r277289), due to miscompiles in the test suite. The FastISel change was left in, because it apparently fixes an unrelated issue. (Recommit of r279782 which was broken due to a bad merge.) This fixes 4 out of the 5 test failures in PR29112. llvm-svn: 279788	2016-08-25 22:48:11 +00:00
Michael Kuperstein	6e271f4ce8	Revert r279782 due to debug buildbot breakage. llvm-svn: 279785	2016-08-25 22:14:45 +00:00
Michael Kuperstein	a6ccc8d365	Revert r274613 because it breaks the test suite with AVX512 This reverts most of r274613 and its follow-ups (r276347, r277289), due to miscompiles in the test suite. The FastISel change was left in, because it apparently fixes an unrelated issue. This fixes 4 out of the 5 test failures in PR29112. llvm-svn: 279782	2016-08-25 21:55:41 +00:00
Igor Breger	8672408db0	[AVX512] Fix insertelement i1 lowering. 1. Use shuffle to insert element i1 into vector. The previous implementation was incorrect ( dest_bit OR src_bit , it doesn't clear the bit if src_bit=0 ) 2. Improve shuffle i1 vector, use CVT2MASK if supported instead TRUNCATE. Differential Revision: http://reviews.llvm.org/D23347 llvm-svn: 278623	2016-08-14 05:25:07 +00:00
Igor Breger	a77b14d02c	[AVX512] Fix extractelement i1 lowering. The previous implementation (not custom) doesn't enforce zeroing off upper bits. The assumption is that i1 PRODUCER (truncate and extractelement) must zero all upper bits, so i1 CONSUMER instructions ( test, zext, save, etc) can be done without additional zeroing. Make extractelement i1 lowering custom for all vector i1. Differential Revision: http://reviews.llvm.org/D23246 llvm-svn: 278328	2016-08-11 12:13:46 +00:00
Elena Demikhovsky	dca03bebd3	AVX-512: Changed lowering of BITCAST between i1 vectors and i8/i16/i32 integer values Optimized lowering of BITCAST node. The BITCAST node can be replaced with COPY_TO_REG instead of KMOV. It allows to suppress two opposite BITCAST operations and avoid redundant "movs". Differential Revision: https://reviews.llvm.org/D23247 llvm-svn: 277958	2016-08-07 13:05:58 +00:00
Craig Topper	f4151bea72	[AVX512] Add initial support for the Execution Domain fixing pass to change some EVEX instructions. llvm-svn: 276393	2016-07-22 05:00:52 +00:00
Matthias Braun	152e7c8b12	VirtRegMap: Replace some identity copies with KILL instructions. An identity COPY like this: %AL = COPY %AL, %EAX<imp-def> has no semantic effect, but encodes liveness information: Further users of %EAX only depend on this instruction even though it does not define the full register. Replace the COPY with a KILL instruction in those cases to maintain this liveness information. (This reverts a small part of r238588 but this time adds a comment explaining why a KILL instruction is useful). llvm-svn: 274952	2016-07-09 00:19:07 +00:00
Elena Demikhovsky	ad0a56f3da	Re-commit of 274613. The prev commit failed on compilation. A minor change in one pattern in lib/Target/X86/X86InstrAVX512.td fixes the failure. llvm-svn: 274626	2016-07-06 14:15:43 +00:00
Elena Demikhovsky	02ced295aa	Reverted 274613 due to compilation failue. llvm-svn: 274615	2016-07-06 09:11:49 +00:00
Elena Demikhovsky	5a4f2476fd	AVX-512: Optimization for patterns with i1 scalar type The patch removes redundant kmov instructions (not all, we still have a lot of work here) and redundant "and" instructions after "setcc". I use "AssertZero" marker between X86ISD::SETCC node and "truncate" to eliminate extra "and $1" instruction. I also changed zext, aext and trunc patterns in the .td file. It allows to remove extra "kmov" instruictions. This patch fixes https://llvm.org/bugs/show_bug.cgi?id=28173. Fast ISEL mode is not supported correctly for AVX-512. ICMP/FCMP scalar instruction should return result in k-reg. It will be fixed in one of the next patches. I redirected handling of "cmp" to the DAG builder mode. (The code looks worse in one specific test case, but without this fix the new patch fails). Differential revision: http://reviews.llvm.org/D21956 llvm-svn: 274613	2016-07-06 09:01:20 +00:00
Craig Topper	99e30e6a66	[AVX512] Use MOVZX32 instead of MOVZ16 for loading single v8/v4/v2/v1 masks when KMOVB is not available. This has better behavior with respect to partial register stalls since it won't need to preserve the upper 16-bits of the GPR. llvm-svn: 272626	2016-06-14 03:13:00 +00:00
Craig Topper	02626c076b	[AVX512] Add patterns for VEXTRACT v16i16->v8i16 and v32i8->v16i8. Disable AVX2 versions of vector extract when AVX512VL is enabled. llvm-svn: 270318	2016-05-21 07:08:56 +00:00
Elena Demikhovsky	5e426f7356	AVX-512: Load and Extended Load for i1 vectors Implemented load+{sign\|zero}_extend for i1 vectors Fixed failures in i1 vector load. Covered loading of v2i1, v4i1, v8i1, v16i1, v32i1, v64i1 vectors for KNL and SKX. Differential Revision: http://reviews.llvm.org/D18737 llvm-svn: 265259	2016-04-03 08:41:12 +00:00
Elena Demikhovsky	e5bbca6ae2	Optimized loading (zextload) of i1 value from memory. This patch is a partial revert of https://llvm.org/svn/llvm-project/llvm/trunk@237793. Extra "and" causes performance degradation. We assume that i1 is stored in zero-extended form. And store operation is responsible for zeroing upper bits. Differential Revision: http://reviews.llvm.org/D17541 llvm-svn: 261828	2016-02-25 07:05:12 +00:00
Igor Breger	defab3c1ef	AVX512: vpextrb/w/d/q and vpinsrb/w/d/q implementation. This instructions doesn't have intrincis. Added tests for lowering and encoding. Differential Revision: http://reviews.llvm.org/D12317 llvm-svn: 249688	2015-10-08 12:55:01 +00:00
Igor Breger	0ede3cbb5c	AVX512: Implement instructions encoding, lowering and intrinsics vinserti64x4, vinserti64x2, vinserti32x8, vinserti32x4, vinsertf64x4, vinsertf64x2, vinsertf32x8, vinsertf32x4 Added tests for encoding, lowering and intrinsics. Differential Revision: http://reviews.llvm.org/D11893 llvm-svn: 248111	2015-09-20 06:52:42 +00:00
Igor Breger	7f69a99c54	AVX512: Implemented encoding and intrinsics for vextracti64x4 ,vextracti64x2, vextracti32x8, vextracti32x4, vextractf64x4, vextractf64x2, vextractf32x8, vextractf32x4 Added tests for intrinsics and encoding. Differential Revision: http://reviews.llvm.org/D11802 llvm-svn: 247276	2015-09-10 12:54:54 +00:00
Renato Golin	db7ea86bf4	Revert "AVX512: Implemented encoding and intrinsics for vextracti64x4 ,vextracti64x2, vextracti32x8, vextracti32x4, vextractf64x4, vextractf64x2, vextractf32x8, vextractf32x4 Added tests for intrinsics and encoding." This reverts commit r247149, as it was breaking numerous buildbots of varied architectures. llvm-svn: 247177	2015-09-09 19:44:40 +00:00
Igor Breger	ac29a82921	AVX512: Implemented encoding and intrinsics for vextracti64x4 ,vextracti64x2, vextracti32x8, vextracti32x4, vextractf64x4, vextractf64x2, vextractf32x8, vextractf32x4 Added tests for intrinsics and encoding. Differential Revision: http://reviews.llvm.org/D11802 llvm-svn: 247149	2015-09-09 14:35:09 +00:00
Elena Demikhovsky	f61727d880	AVX-512: fixed algorithm of building vectors of i1 elements fixed extract-insert i1 element, load i1, zextload i1 should be with "and $1, %reg" to prevent loading garbage. added a bunch of new tests. llvm-svn: 237793	2015-05-20 14:32:03 +00:00
David Blaikie	a79ac14fa6	[opaque pointer type] Add textual IR support for explicit type parameter to load instruction Essentially the same as the GEP change in r230786. A similar migration script can be used to update test cases, though a few more test case improvements/changes were required this time around: (r229269-r229278) import fileinput import sys import re pat = re.compile(r"((?:=\|:\|^)\sload (?:atomic )?(?:volatile )?(.?))(\| addrspace$\d+$ )\($\| (?:%\|@\|null\|undef\|blockaddress\|getelementptr\|addrspacecast\|bitcast\|inttoptr\|\[\[[a-zA-Z]\|\{\{).$)") for line in sys.stdin: sys.stdout.write(re.sub(pat, r"\1, \2\3*\4", line)) Reviewers: rafael, dexonsmith, grosser Differential Revision: http://reviews.llvm.org/D7649 llvm-svn: 230794	2015-02-27 21:17:42 +00:00
Elena Demikhovsky	d2cb3c8876	AVX-512: Fixed the "test" operation for i1 type Using KORTESTW for comparison i1 value with zero was wrong since the instruction tests 16 bits. KORTESTW may be used with KSHIFTL+KSHIFTR that clean the 15 upper bits. I removed (X86cmp i1, 0) pattern and zero-extend i1 to i8 and then use TESTB. There are some cases where i1 is in the mask register and the upper bits are already zeroed. Then KORTESTW is the better solution, but it is subject for optimization. Meanwhile, I'm fixing the correctness issue. llvm-svn: 228916	2015-02-12 08:40:34 +00:00
Elena Demikhovsky	1a603b3f13	AVX-512: Changes in operations on masks registers for KNL and SKX - Added KSHIFTB/D/Q for skx - Added KORTESTB/D/Q for skx - Fixed store operation for v8i1 type for KNL - Store size of v8i1, v4i1 and v2i1 are changed to 8 bits llvm-svn: 227043	2015-01-25 12:47:15 +00:00
Adam Nemet	4285c1f8cc	[AVX512] Add DQ subvector inserts In AVX512f we support 64x2 and 32x8 inserts via matching them to 32x4 and 64x4 respectively. These are matched by "Alt" Pat<>'s (Alt stands for alternative VTs). Since DQ has native support for these intructions, I peeled off the non-"Alt" part of the baseclass into vinsert_for_size_no_alt. The DQ instructions are derived from this multiclass. The "Alt" Pat<>'s are disabled with DQ. Fixes <rdar://problem/18426089> llvm-svn: 219874	2014-10-15 23:42:17 +00:00
Adam Nemet	2b71ca5d88	[AVX512] Add SKX testing to avx512-insert-extract.ll This is in preparation to adding DQ subvector inserts to this testcase. llvm-svn: 219873	2014-10-15 23:42:14 +00:00
Adam Nemet	f3aba14dad	[AVX512] Fix test to produce a defined value We're inserting into a 8 wide vector, so the index should be < 8. llvm-svn: 219872	2014-10-15 23:42:11 +00:00
Adam Nemet	8d5354eaa2	[AVX512] Make vextractx4/vinsertx4 tests check for the index as well Extend test so that it provides coverage for the next commit. llvm-svn: 218479	2014-09-25 23:48:47 +00:00
Elena Demikhovsky	cf0b9bafc3	AVX-512: insert element to mask vector; store i1 data Implemented INSERT_VECTOR_ELT operation for v16i1 and v8i1 vectors; Implemented "store" for i1 type llvm-svn: 205850	2014-04-09 12:37:50 +00:00
Elena Demikhovsky	9737e3886b	AVX-512: Fixed extract_vector_elt for v8i1 vector llvm-svn: 202624	2014-03-02 09:19:44 +00:00
Elena Demikhovsky	1fad075974	AVX-512: simpyfied BUILD_VECTOR for masks; fixed cmp/test sequence llvm-svn: 201487	2014-02-16 11:34:23 +00:00
Elena Demikhovsky	9f423d6f25	AVX-512: Fixed extract_vector_elt for v16i1 and v8i1 vectors. llvm-svn: 201066	2014-02-10 07:02:39 +00:00
Elena Demikhovsky	47fc44e52e	AVX-512: Added legal type MVT::i1 and VK1 register for it. Added scalar compare VCMPSS, VCMPSD. Implemented LowerSELECT for scalar FP operations. I replaced FSETCCss, FSETCCsd with one node type FSETCCs. Node extract_vector_elt(v16i1/v8i1, idx) returns an element of type i1. llvm-svn: 197384	2013-12-16 13:52:35 +00:00

1 2

56 Commits