llvm-project

Commit Graph

Author	SHA1	Message	Date
Elena Demikhovsky	59f6f8c05d	[X86 Codegen Test] Divided masked_memop into several files. NFC. The masked_memop.ll became huge. I extracted AVX-512 specific tests into separate files. llvm-svn: 281892	2016-09-19 08:58:43 +00:00
Craig Topper	d9ca3d97ef	[AVX-512] Simplify X86InstrInfo::copyPhysReg for 128/256-bit vectors with AVX512, but not VLX. We should use the VEX opcodes and trust the register allocator to not use the extended XMM/YMM register space. Previously we were extending to copying the whole ZMM register. The register allocator shouldn't use XMM16-31 or YMM16-31 in this configuration as the instructions to spill them aren't available. llvm-svn: 280648	2016-09-05 06:43:06 +00:00
Igor Breger	7e2a0dfa0c	revert r279960. https://llvm.org/bugs/show_bug.cgi?id=30249 llvm-svn: 280625	2016-09-04 14:03:52 +00:00
Igor Breger	1a388871b9	[AVX512] In some cases KORTEST instruction may be used instead of ZEXT + TEST sequence. Differential Revision: http://reviews.llvm.org/D23490 llvm-svn: 279960	2016-08-29 08:52:52 +00:00
Michael Kuperstein	2ee911e985	Revert r274613 because it breaks the test suite with AVX512 This reverts most of r274613 (AKA r274626) and its follow-ups (r276347, r277289), due to miscompiles in the test suite. The FastISel change was left in, because it apparently fixes an unrelated issue. (Recommit of r279782 which was broken due to a bad merge.) This fixes 4 out of the 5 test failures in PR29112. llvm-svn: 279788	2016-08-25 22:48:11 +00:00
Michael Kuperstein	6e271f4ce8	Revert r279782 due to debug buildbot breakage. llvm-svn: 279785	2016-08-25 22:14:45 +00:00
Michael Kuperstein	a6ccc8d365	Revert r274613 because it breaks the test suite with AVX512 This reverts most of r274613 and its follow-ups (r276347, r277289), due to miscompiles in the test suite. The FastISel change was left in, because it apparently fixes an unrelated issue. This fixes 4 out of the 5 test failures in PR29112. llvm-svn: 279782	2016-08-25 21:55:41 +00:00
Igor Breger	a77b14d02c	[AVX512] Fix extractelement i1 lowering. The previous implementation (not custom) doesn't enforce zeroing off upper bits. The assumption is that i1 PRODUCER (truncate and extractelement) must zero all upper bits, so i1 CONSUMER instructions ( test, zext, save, etc) can be done without additional zeroing. Make extractelement i1 lowering custom for all vector i1. Differential Revision: http://reviews.llvm.org/D23246 llvm-svn: 278328	2016-08-11 12:13:46 +00:00
Craig Topper	4c53e60360	[AVX512] Add VLX packed move instructions to the execution dependency fix pass and update tests. llvm-svn: 277304	2016-07-31 20:20:01 +00:00
Craig Topper	52e2e8381b	[AVX512] Add ExeDomain to vector extend and truncate instructions. llvm-svn: 276394	2016-07-22 05:46:44 +00:00
Craig Topper	f4151bea72	[AVX512] Add initial support for the Execution Domain fixing pass to change some EVEX instructions. llvm-svn: 276393	2016-07-22 05:00:52 +00:00
Simon Pilgrim	88e0940d3b	[X86][SSE] Allow folding of store/zext with PEXTRW of 0'th element Under normal circumstances we prefer the higher performance MOVD to extract the 0'th element of a v8i16 vector instead of PEXTRW. But as detailed on PR27265, this prevents the SSE41 implementation of PEXTRW from folding the store of the 0'th element. Additionally it prevents us from making use of the fact that the (SSE2) reg-reg version of PEXTRW implicitly zero-extends the i16 element to the i32/i64 destination register. This patch only preferentially lowers to MOVD if we will not be zero-extending the extracted i16, nor prevent a store from being folded (on SSSE41). Fix for PR27265. Differential Revision: https://reviews.llvm.org/D22509 llvm-svn: 276289	2016-07-21 14:54:17 +00:00
Craig Topper	650a15e2b3	[X86] Add more opcodes to isFrameLoadOpcode/isFrameStoreOpcode. Mainly AVX-512 related. llvm-svn: 275764	2016-07-18 06:14:39 +00:00
Craig Topper	5c913e84df	[AVX512] Use VMOVAPSZ128rr/VMOVAPS256rr for VR128X/VR256X physreg moves when VLX is supported. Ideally we would use VEX encoded moves instead of EVEX if the high 16 registers aren't referenced, but this a good first step. llvm-svn: 275763	2016-07-18 06:14:34 +00:00
Craig Topper	516e14cd8e	[AVX512] Use vpternlog with an immediate of 0xff to create 512-bit all one vectors. llvm-svn: 275045	2016-07-11 05:36:48 +00:00
Matthias Braun	152e7c8b12	VirtRegMap: Replace some identity copies with KILL instructions. An identity COPY like this: %AL = COPY %AL, %EAX<imp-def> has no semantic effect, but encodes liveness information: Further users of %EAX only depend on this instruction even though it does not define the full register. Replace the COPY with a KILL instruction in those cases to maintain this liveness information. (This reverts a small part of r238588 but this time adds a comment explaining why a KILL instruction is useful). llvm-svn: 274952	2016-07-09 00:19:07 +00:00
Artur Pilipenko	7ad95ec22d	Support arbitrary addrspace pointers in masked load/store intrinsics This is a resubmittion of 263158 change after fixing the existing problem with intrinsics mangling (see LTO and intrinsics mangling llvm-dev thread for details). This patch fixes the problem which occurs when loop-vectorize tries to use @llvm.masked.load/store intrinsic for a non-default addrspace pointer. It fails with "Calling a function with a bad signature!" assertion in CallInst constructor because it tries to pass a non-default addrspace pointer to the pointer argument which has default addrspace. The fix is to add pointer type as another overloaded type to @llvm.masked.load/store intrinsics. Reviewed By: reames Differential Revision: http://reviews.llvm.org/D17270 llvm-svn: 274043	2016-06-28 18:27:25 +00:00
Artur Pilipenko	72f76b8805	Revert -r273892 "Support arbitrary addrspace pointers in masked load/store intrinsics" since some of the clang tests don't expect to see the updated signatures. llvm-svn: 273895	2016-06-27 16:54:33 +00:00
Artur Pilipenko	a36aa41519	Support arbitrary addrspace pointers in masked load/store intrinsics This is a resubmittion of 263158 change after fixing the existing problem with intrinsics mangling (see LTO and intrinsics mangling llvm-dev thread for details). This patch fixes the problem which occurs when loop-vectorize tries to use @llvm.masked.load/store intrinsic for a non-default addrspace pointer. It fails with "Calling a function with a bad signature!" assertion in CallInst constructor because it tries to pass a non-default addrspace pointer to the pointer argument which has default addrspace. The fix is to add pointer type as another overloaded type to @llvm.masked.load/store intrinsics. Reviewed By: reames Differential Revision: http://reviews.llvm.org/D17270 llvm-svn: 273892	2016-06-27 16:29:26 +00:00
Craig Topper	db960eddfa	[AVX512] Add patterns for extracting subvectors and storing to memory. llvm-svn: 270334	2016-05-21 22:50:14 +00:00
Craig Topper	02626c076b	[AVX512] Add patterns for VEXTRACT v16i16->v8i16 and v32i8->v16i8. Disable AVX2 versions of vector extract when AVX512VL is enabled. llvm-svn: 270318	2016-05-21 07:08:56 +00:00
Craig Topper	e5ce84a33c	[AVX512] Add VLX 128/256-bit SET0 operations that encode to 128/256-bit EVEX encoded VPXORD so all 32 registers can be used. llvm-svn: 268884	2016-05-08 21:33:53 +00:00
Adam Nemet	7aab648831	Revert "Support arbitrary addrspace pointers in masked load/store intrinsics" This reverts commit r266086. It breaks the LTO build of gcc in SPEC2000. llvm-svn: 266282	2016-04-14 08:47:17 +00:00
Artur Pilipenko	dbe0bc8df4	Support arbitrary addrspace pointers in masked load/store intrinsics This is a resubmittion of 263158 change. This patch fixes the problem which occurs when loop-vectorize tries to use @llvm.masked.load/store intrinsic for a non-default addrspace pointer. It fails with "Calling a function with a bad signature!" assertion in CallInst constructor because it tries to pass a non-default addrspace pointer to the pointer argument which has default addrspace. The fix is to add pointer type as another overloaded type to @llvm.masked.load/store intrinsics. Reviewed By: reames Differential Revision: http://reviews.llvm.org/D17270 llvm-svn: 266086	2016-04-12 15:58:04 +00:00
Matthias Braun	68bb2931cc	Revert "Support arbitrary addrspace pointers in masked load/store intrinsics" This commit broke LTO builds. Reverting it to unbreak the bots while the issue is investigated. See also: http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20160321/341002.html This reverts r263158 llvm-svn: 264088	2016-03-22 20:24:34 +00:00
Sanjay Patel	62d707c8d9	[x86, AVX] replace masked load with full vector load when possible Converting masked vector loads to regular vector loads for x86 AVX should always be a win. I raised the legality issue of reading the extra memory bytes on llvm-dev. I did not see any objections. 1. x86 already does this kind of optimization for multiple scalar loads -> vector load. 2. If other targets have the same flexibility, we could move this transform up to CGP or DAGCombiner. Differential Revision: http://reviews.llvm.org/D18094 llvm-svn: 263446	2016-03-14 16:54:43 +00:00
Artur Pilipenko	3c8fc57e16	Support arbitrary addrspace pointers in masked load/store intrinsics This patch fixes the problem which occurs when loop-vectorize tries to use @llvm.masked.load/store intrinsic for a non-default addrspace pointer. It fails with "Calling a function with a bad signature!" assertion in CallInst constructor because it tries to pass a non-default addrspace pointer to the pointer argument which has default addrspace. The fix is to add pointer type as another overloaded type to @llvm.masked.load/store intrinsics. Reviewed By: reames Differential Revision: http://reviews.llvm.org/D17270 llvm-svn: 263158	2016-03-10 20:39:22 +00:00
Sanjay Patel	a333dcfc42	give regression test a meaningful name llvm-svn: 263135	2016-03-10 17:52:19 +00:00
Sanjay Patel	4a8dd89128	[x86, AVX] optimize masked loads with constant masks Instead of a variable-blend instruction, form a blend with immediate because those are always cheaper. Differential Revision: http://reviews.llvm.org/D17899 llvm-svn: 263067	2016-03-09 22:12:08 +00:00
Igor Breger	4d94d4d5f7	AVX512BW: Support llvm intrinsic masked vector load/store for i8/i16 element types on SKX Differential Revision: http://reviews.llvm.org/D17913 llvm-svn: 262803	2016-03-06 12:38:58 +00:00
Igor Breger	f1bd761e00	AVX512: Remove VSHRI kmask patterns from TD file. It is incorrect to use kshiftw to implement VSHRI v4i1 , bits 15-4 is undef so the upper bits of v4i1 may not be zeroed. v4i1 should be zero_extend to v16i1 ( or any natively supported vector). Differential Revision: http://reviews.llvm.org/D17763 llvm-svn: 262797	2016-03-06 07:46:03 +00:00
Sanjay Patel	216b275994	[x86] add tests for masked loads with constant masks llvm-svn: 262758	2016-03-04 23:28:07 +00:00
Sanjay Patel	c7dde5f502	[x86] convert masked load of exactly one element to scalar load This is the load counterpart to the store optimization that was added in: http://reviews.llvm.org/rL260145 llvm-svn: 260325	2016-02-09 23:44:35 +00:00
Sanjay Patel	264d7e5b68	[x86] convert masked store of one element to scalar store Another opportunity to reduce masked stores: in D16691, we decided not to attempt the 'one mask element is set' transform in InstCombine, but this should be a win for any AVX machine. Code comments note that this transform could be extended for other targets / cases. Differential Revision: http://reviews.llvm.org/D16828 llvm-svn: 260145	2016-02-08 21:05:08 +00:00
Igor Breger	6d421419db	AVX1 : Enable vector masked_load/store to AVX1. Use AVX1 FP instructions (vmaskmovps/pd) in place of the AVX2 int instructions (vpmaskmovd/q). Differential Revision: http://reviews.llvm.org/D16528 llvm-svn: 258675	2016-01-25 10:17:11 +00:00
Sanjay Patel	a41ecae46e	regenerate checks and note some near-term improvements For the moment, this file takes way too long to run (see inline comments), but that should be a temporary problem. The fact that the compile time is so slow for a target that doesn't support maskmov may be a bug worth investigating too. llvm-svn: 258629	2016-01-23 17:52:56 +00:00
Sanjay Patel	908ea7312a	fixed to test features, not CPU models llvm-svn: 258568	2016-01-22 22:20:56 +00:00
Igor Breger	756c289dd8	AVX512: Change VPMOVB2M DAG lowering , use CVT2MASK node instead TRUNCATE. Fix TRUNCATE lowering vector to vector i1, use LSB and not MSB. Implement VPMOVB/W/D/Q2M intrinsic. Differential Revision: http://reviews.llvm.org/D15675 llvm-svn: 256470	2015-12-27 13:56:16 +00:00
Elena Demikhovsky	6015f5c823	Type legalizer for masked gather and scatter intrinsics. Full type legalizer that works with all vectors length - from 2 to 16, (i32, i64, float, double). This intrinsic, for example void @llvm.masked.scatter.v2f32(<2 x float>%data , <2 x float*>%ptrs , i32 align , <2 x i1>%mask ) requires type widening for data and type promotion for mask. Differential Revision: http://reviews.llvm.org/D13633 llvm-svn: 255629	2015-12-15 08:40:41 +00:00
Elena Demikhovsky	33e61eceb4	AVX-512: Fixed masked load / store instruction selection for KNL. Patterns were missing for KNL target for <8 x i32>, <8 x float> masked load/store. This intrinsic comes with all legal types: <8 x float> @llvm.masked.load.v8f32(<8 x float>* %addr, i32 align, <8 x i1> %mask, <8 x float> %passThru), but still requires lowering, because VMASKMOVPS, VMASKMOVDQU32 work with 512-bit vectors only. All data operands should be widened to 512-bit vector. The mask operand should be widened to v16i1 with zeroes. Differential Revision: http://reviews.llvm.org/D15265 llvm-svn: 254909	2015-12-07 13:39:24 +00:00
Elena Demikhovsky	1ca72e1846	Pointers in Masked Load, Store, Gather, Scatter intrinsics The masked intrinsics support all integer and floating point data types. I added the pointer type to this list. Added tests for CodeGen and for Loop Vectorizer. Updated the Language Reference. Differential Revision: http://reviews.llvm.org/D14150 llvm-svn: 253544	2015-11-19 07:17:16 +00:00
Elena Demikhovsky	3ad76a1acd	Masked Load/Store optimization for scalar code When we have to convert the masked.load, masked.store to scalar code, we generate a chain of conditional basic blocks. I added optimization for constant mask vector. Differential Revision: http://reviews.llvm.org/D13855 llvm-svn: 250893	2015-10-21 11:50:54 +00:00
Simon Pilgrim	503a2594c3	[DAGCombiner] Convert constant AND masks to shuffle clear masks down to the byte level The XformToShuffleWithZero method currently checks AND masks at the per-lane level for all-one and all-zero constants and attempts to convert them to legal shuffle clear masks. This patch generalises XformToShuffleWithZero, splitting and checking the sub-lanes of the constants down to the byte level to see if any legal shuffle clear masks are possible. This allows a lot of masks (often from legalization or truncation) to be folded into existing shuffle patterns and removes a lot of constant mask loading. There are a few examples of poor shuffle lowering that are exposed by this patch that will be cleaned up in future patches (e.g. merging shuffles that are separated by bitcasts, x86 legalized v8i8 zero extension uses PMOVZX+AND+AND instead of AND+PMOVZX, etc.) Differential Revision: http://reviews.llvm.org/D11518 llvm-svn: 243831	2015-08-01 10:01:46 +00:00
Igor Breger	074a64e72c	AVX-512: Implemented encoding , DAG lowering and intrinsics for Integer Truncate with/without saturation Added tests for DAG lowering ,encoding and intrinsic Differential Revision: http://reviews.llvm.org/D11218 llvm-svn: 243122	2015-07-24 17:24:15 +00:00
Chandler Carruth	fe414353db	Revert r242990: "AVX-512: Implemented encoding , DAG lowering and ..." This commit broke the build. Numerous build bots broken, and it was blocking my progress so reverting. It should be trivial to reproduce -- enable the BPF backend and it should fail when running llvm-tblgen. llvm-svn: 242992	2015-07-23 08:03:44 +00:00
Igor Breger	da1b2ea955	AVX-512: Implemented encoding , DAG lowering and intrinsics for Integer Truncate with/without saturation Added tests for DAG lowering ,encoding and intrinsic Differential Revision: http://reviews.llvm.org/D11218 llvm-svn: 242990	2015-07-23 07:39:21 +00:00
Elena Demikhovsky	1b60ed7069	Masked gather and scatter intrinsics - enabled codegen for KNL. llvm-svn: 236394	2015-05-03 07:12:25 +00:00
Elena Demikhovsky	de05f10de2	AVX-512, SKX: Enabled masked_load/store operations for this target. Added lowering for ISD::CONCAT_VECTORS and ISD::INSERT_SUBVECTOR for i1 vectors, it is needed to pass all masked_memop.ll tests for SKX. llvm-svn: 231371	2015-03-05 15:11:35 +00:00
Chandler Carruth	87e580a659	[x86] Teach the 128-bit vector shuffle lowering routines to take advantage of the existence of a reasonable blend instruction. The 256-bit vector shuffle lowering has leveraged the general technique of decomposed shuffles and blends for quite some time, but this never made it back into the 128-bit code, and there are a large number of patterns where this is substantially better. For example, this removes almost all domain crossing in vector shuffles that involve some blend and some permutation with SSE4.1 and later. See the massive reduction in 'shufps' for integer test cases in this commit. This isn't perfect yet for a few reasons: 1) The v8i16 shuffle lowering continues to plague me. We don't always form an unpack-based blend when that would be better. But the wins pretty drastically outstrip the losses here. 2) The v16i8 shuffle lowering is just a disaster here. I never went and implemented blend support here for some terrible reason. I'll do that next probably. I've not updated it for now. More variations on this technique are coming as well -- we don't shuffle-into-unpack or shuffle-into-palignr, both of which would also be profitable. Note that some test cases grow significantly in the number of instructions, but I expect to actually be faster. We use pshufd+pshufd+blendw instead of a single shufps, but the pshufd's are very likely to pipeline well (two ports on most modern intel chips) and the blend is a very fast instruction. The domain switch penalty will essentially always be more than a blend instruction, which is the only increase in tree height. llvm-svn: 229350	2015-02-16 01:52:02 +00:00
Elena Demikhovsky	150d9f3187	Fixed a bug in type legalizer for masked load/store intrinsics. The problem occurs when after vectorization we have type <2 x i32>. This type is promoted to <2 x i64> and then requires additional efforts for expanding loads and truncating stores. I added EXPAND / TRUNCATE attributes to the masked load/store SDNodes. The code now contains additional shuffles. I've prepared changes in the cost estimation for masked memory operations, it will be submitted separately. llvm-svn: 226808	2015-01-22 12:07:59 +00:00

1 2

58 Commits