Summary:
The index and one of the table operands can be swapped by changing the opcode to the other version. Neither of these operands is the one that can load from memory, so this can't be used to increase memory folding opportunities.
We need to handle the unmasked forms and the kz forms. Since the load operand isn't being commuted, we can commute the load and broadcast forms of these instructions too.
Reviewers: igorb, delena, Ayal, Farhana, RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D25652
llvm-svn: 287621
Summary: VALIGND and VALIGNQ are similar to PALIGNR, but instead of working within each 128-bit lane they work on the entire vector register. This change leverages the shuffle rotate detection code used for PALIGNR to detect these cases.
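For reference, a minimal IR sketch (a hypothetical test case, not one from this patch) of the kind of cross-lane rotate this detects; with AVX-512 a shuffle like this is a candidate for VALIGND, whereas PALIGNR can only rotate within each 128-bit lane:
define <8 x i32> @rotate_v8i32(<8 x i32> %a, <8 x i32> %b) {
  ; Select a window of 8 consecutive elements from concat(%a, %b), offset by
  ; one 32-bit element - a rotation that crosses the 128-bit lane boundary.
  %r = shufflevector <8 x i32> %a, <8 x i32> %b, <8 x i32> <i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8>
  ret <8 x i32> %r
}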
Reviewers: delena, RKSimon
Subscribers: Farhana, llvm-commits
Differential Revision: https://reviews.llvm.org/D26297
llvm-svn: 286709
As discussed on PR28136, lowerShuffleAsRepeatedMaskAndLanePermute was attempting to match repeated masks at the 128-bit level and then permute the resultant lanes at the 128-bit (AVX1) or 64-bit (AVX2) sub-lane level.
This change allows us to create the repeated masks at the sub-lane level (and then concat them together to create a 128-bit repeated mask) and then select which sub-lane to permute. This has no effect on the AVX1 codegen.
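As an illustration (a hypothetical example, not taken from the PR), a v8i32 shuffle whose mask repeats at the 64-bit sub-lane level but not at the 128-bit lane level:
define <8 x i32> @repeat_64bit_sublane(<8 x i32> %a) {
  ; Every 64-bit sub-lane uses the relative mask <0,1>, but the sub-lanes
  ; themselves are permuted across the 128-bit lane boundary, so the mask
  ; does not repeat at the 128-bit level.
  %r = shufflevector <8 x i32> %a, <8 x i32> undef, <8 x i32> <i32 0, i32 1, i32 4, i32 5, i32 0, i32 1, i32 4, i32 5>
  ret <8 x i32> %r
}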
Fixes PR28136.
llvm-svn: 275543
This improves the situation discussed in D19228 where we were forcing VPERMPD/VPERMQ where VPERM2F128/VPERM2I128 would have been better.
This was incorrectly reverted in rL275421 during triage of PR28552.
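For example (an illustrative case, not necessarily a test from the patch), swapping the 128-bit halves of a v4f64 vector is a whole-lane permute that either VPERM2F128 or VPERMPD can implement; this heuristic decides which one is used:
define <4 x double> @swap_halves(<4 x double> %a) {
  ; Exchange the low and high 128-bit halves of the 256-bit vector.
  %r = shufflevector <4 x double> %a, <4 x double> undef, <4 x i32> <i32 2, i32 3, i32 0, i32 1>
  ret <4 x double> %r
}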
llvm-svn: 275497
This improves the situation discussed in D19228 where we were forcing VPERMPD/VPERMQ where VPERM2F128/VPERM2I128 would have been better.
llvm-svn: 275411
This patch allows target shuffles to be combined to single input immediate permute instructions - (V)PSHUFD/VPERMILPD/VPERMILPS - allowing more general pattern matching than we currently do and improving the likelihood of memory folding compared to existing patterns, which tend to reuse the input in multiple arguments.
Further permute instructions (V)PSHUFLW/(V)PSHUFHW/(V)PERMQ/(V)PERMPD may be added in the future, but it's proven tricky to create test cases for them so far. (V)PSHUFLW/(V)PSHUFHW are already handled quite well in combineTargetShuffle, so removing some of that code may allow us to perform more of the combining in one place without duplication.
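A small illustrative sketch (hypothetical, not a test from the patch): a chain of unary shuffles whose combined mask can be matched as a single immediate-controlled permute such as (V)PSHUFD/VPERMILPS:
define <4 x float> @combine_to_single_permute(<4 x float> %a) {
  ; Reverse the elements, then swap the two middle elements; the combined
  ; effect is the single unary permute <3,1,2,0>.
  %t = shufflevector <4 x float> %a, <4 x float> undef, <4 x i32> <i32 3, i32 2, i32 1, i32 0>
  %r = shufflevector <4 x float> %t, <4 x float> undef, <4 x i32> <i32 0, i32 2, i32 1, i32 3>
  ret <4 x float> %r
}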
Differential Revision: http://reviews.llvm.org/D21148
llvm-svn: 273999
Using VPERMQ/VPERMPD allows memory folding of the (repeated) input where VINSERTI128/VINSERTF128 cannot.
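For illustration (a hypothetical example): splatting the low 128-bit half of a loaded 256-bit value. VPERMQ/VPERMPD with an immediate can fold the load directly, whereas the VINSERTI128/VINSERTF128 pattern needs the repeated input in a register first:
define <4 x i64> @splat_low_half(<4 x i64>* %p) {
  %v = load <4 x i64>, <4 x i64>* %p
  ; Repeat the low 128-bit half into both halves of the result.
  %r = shufflevector <4 x i64> %v, <4 x i64> undef, <4 x i32> <i32 0, i32 1, i32 0, i32 1>
  ret <4 x i64> %r
}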
Differential Revision: http://reviews.llvm.org/D19228
llvm-svn: 266728
This patch attempts to represent a shuffle as a repeating shuffle (recognisable by is128BitLaneRepeatedShuffleMask) with the source input(s) in their original lanes, followed by a single permutation of the 128-bit lanes to their final destinations.
On AVX2 we can additionally attempt to match using 64-bit sub-lane permutation. AVX2 can also now match a similar 'broadcasted' repeating shuffle.
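As an illustration (a hypothetical case), a v8f32 shuffle whose in-lane mask repeats as <1,0,3,2> while the 128-bit lanes themselves are swapped:
define <8 x float> @repeated_mask_lane_permute(<8 x float> %a) {
  ; Each 128-bit lane of the result uses the repeated relative mask <1,0,3,2>,
  ; and the two lanes are then swapped by a single 128-bit lane permutation.
  %r = shufflevector <8 x float> %a, <8 x float> undef, <8 x i32> <i32 5, i32 4, i32 7, i32 6, i32 1, i32 0, i32 3, i32 2>
  ret <8 x float> %r
}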
This patch has several benefits:
* Avoids prematurely matching with lowerVectorShuffleByMerging128BitLanes which can require both inputs to have their input lanes permuted before shuffling.
* Can replace PERMPS/PERMD instructions - although these are useful for cross-lane unary shuffling, they require their shuffle mask to be pre-loaded (and increase register pressure).
* Matching the repeating shuffle makes use of a lot of existing shuffle lowering.
There is an outstanding minor AVX1 regression (combine_unneeded_subvector1 in vector-shuffle-combining.ll) where a previously 128-bit shuffle + subvector splat is converted to a subvector splat + (2 instruction) 256-bit shuffle; I intend to fix this in a follow-up patch for review.
Differential Revision: http://reviews.llvm.org/D16537
llvm-svn: 260834
AVX2 can only broadcast from the zeroth element of a vector, but if the broadcastable element is the zeroth element of a 128-bit subvector it's advantageous to extract the subvector, broadcast from that, and avoid loading the shuffle mask data that would be needed for VPERMPS/VPERMD. The only exception is when the source type is v4f64 or v4i64, which can use the immediate shuffles VPERMPD/VPERMQ directly.
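For example (illustrative only), broadcasting element 4 of a v8f32 - the zeroth element of the upper 128-bit subvector - can be done by extracting that subvector and broadcasting from it, avoiding a VPERMPS shuffle mask load:
define <8 x float> @splat_elt4(<8 x float> %a) {
  ; Splat element 4, which is element 0 of the upper 128-bit subvector.
  %r = shufflevector <8 x float> %a, <8 x float> undef, <8 x i32> <i32 4, i32 4, i32 4, i32 4, i32 4, i32 4, i32 4, i32 4>
  ret <8 x float> %r
}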
Differential Revision: http://reviews.llvm.org/D16050
llvm-svn: 258081
Added support for the extraction of the upper 128-bit subvectors for lower/upper half undef shuffles if it would reduce the number of extractions/insertions or avoid loads of AVX2 permps/permd shuffle masks.
Minor follow up to D15477.
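For example (a hypothetical case), a shuffle that only uses the upper half of its source and leaves the upper half of the result undef can be lowered as a single 128-bit subvector extraction:
define <8 x i32> @upper_half_only(<8 x i32> %a) {
  ; Only the upper 128-bit half of %a is referenced and the upper half of the
  ; result is undef, so a single subvector extraction suffices - no permd
  ; shuffle mask load is needed.
  %r = shufflevector <8 x i32> %a, <8 x i32> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>
  ret <8 x i32> %r
}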
llvm-svn: 258000
First step towards making better use of AVX's implicit zeroing of the upper half of a 256-bit vector by instructions that only act on the lower 128-bit vector - discussed on D14151.
Besides the fact that 128-bit shuffle instructions are generally more capable, this can be a performance win for older CPUs with 128-bit ALUs (e.g. Jaguar, Sandy Bridge) that must treat 256-bit vectors as multiple micro-ops.
Moved the similar subvector extraction shuffle combines from PerformShuffleCombine256 to lowerVectorShuffle as well.
Note: I've avoided combining shuffles that reference elements from the upper halves of the input vectors - this may be reviewed in future work as well (AVX1 would probably always gain, but AVX2 does have some cross-lane shuffle instructions).
Differential Revision: http://reviews.llvm.org/D15477
llvm-svn: 256332
autogenerated.
Also update existing test cases which appear to be generated by it and
weren't modified (other than addition of the header) by rerunning it.
llvm-svn: 253917
Check to see if this is a CONCAT_VECTORS of a bunch of EXTRACT_SUBVECTOR operations. If so, and if the EXTRACT_SUBVECTOR vector inputs come from at most two distinct vectors the same size as the result, attempt to turn this into a legal shuffle.
Differential Revision: http://reviews.llvm.org/D12125
llvm-svn: 245490
For code like this:
define <8 x i32> @load_v8i32() {
ret <8 x i32> <i32 7, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0>
}
We produce this AVX code:
_load_v8i32: ## @load_v8i32
movl $7, %eax
vmovd %eax, %xmm0
vxorps %ymm1, %ymm1, %ymm1
vblendps $1, %ymm0, %ymm1, %ymm0 ## ymm0 = ymm0[0],ymm1[1,2,3,4,5,6,7]
retq
There are at least 2 bugs in play here:
1. We're generating a blend when a move scalar does the same job using 2 fewer instruction bytes (see FIXMEs).
2. We're not matching an existing pattern that would eliminate the xor and blend entirely. The zero bytes are free with vmovd.
The 2nd fix involves an adjustment of "AddedComplexity" [1] and mostly masks the 1st problem.
[1] AddedComplexity has close to no documentation in the source.
The best we have is this comment: "roughly corresponds to the number of nodes that are covered".
It appears that x86 has bastardized this definition by inflating its values for some other
undocumented reason. For example, we have a pattern with "AddedComplexity = 400" (!).
I searched my way to this page:
https://groups.google.com/forum/#!topic/llvm-dev/5UX-Og9M0xQ
Differential Revision: http://reviews.llvm.org/D8794
llvm-svn: 233931
I suggested this change in D7898 (http://llvm.org/viewvc/llvm-project?view=revision&revision=231354)
It improves the v4i64 case although not optimally. This AVX codegen:
vmovq {{.*#+}} xmm0 = mem[0],zero
vxorpd %ymm1, %ymm1, %ymm1
vblendpd {{.*#+}} ymm0 = ymm0[0],ymm1[1,2,3]
Becomes:
vmovsd {{.*#+}} xmm0 = mem[0],zero
Unfortunately, this doesn't completely solve PR22685. There are still at least 2 problems under here:
1. We're not handling v32i8 / v16i16.
2. We're not getting the FP / int domains right for instruction selection.
But since this patch alone appears to do no harm, reduces code duplication, and helps v4i64,
I'm submitting this patch ahead of fixing the above.
Differential Revision: http://reviews.llvm.org/D8341
llvm-svn: 233704
Essentially the same as the GEP change in r230786.
A similar migration script can be used to update test cases, though a few more
test case improvements/changes were required this time around: (r229269-r229278)
import fileinput
import sys
import re
pat = re.compile(r"((?:=|:|^)\s*load (?:atomic )?(?:volatile )?(.*?))(| addrspace\(\d+\) *)\*($| *(?:%|@|null|undef|blockaddress|getelementptr|addrspacecast|bitcast|inttoptr|\[\[[a-zA-Z]|\{\{).*$)")
for line in sys.stdin:
  sys.stdout.write(re.sub(pat, r"\1, \2\3*\4", line))
Reviewers: rafael, dexonsmith, grosser
Differential Revision: http://reviews.llvm.org/D7649
llvm-svn: 230794
is going well, remove the flag and the code for the old legality tests.
This is the first step toward removing the entire old vector shuffle
lowering. *Much* more code to delete coming up next.
llvm-svn: 229963
This patch refactors the existing lowerVectorShuffleAsByteShift function to add support for 256-bit vectors on AVX2 targets.
It also fixes a tablegen issue that prevented the lowering of vpslldq/vpsrldq vec256 instructions.
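For illustration (a hypothetical case, not a test from the patch), a v32i8 shuffle that shifts each 128-bit lane up by four bytes with zeros filling the low byte positions - the pattern a 256-bit VPSLLDQ $4 implements:
define <32 x i8> @lane_byte_shift(<32 x i8> %a) {
  ; Each 128-bit lane is shifted by 4 bytes, with zeros filling the low 4
  ; byte positions of each lane.
  %r = shufflevector <32 x i8> %a, <32 x i8> zeroinitializer, <32 x i32> <i32 32, i32 32, i32 32, i32 32, i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 32, i32 32, i32 32, i32 32, i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23, i32 24, i32 25, i32 26, i32 27>
  ret <32 x i8> %r
}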
Differential Revision: http://reviews.llvm.org/D7596
llvm-svn: 229311
when that will allow it to lower with a single permute instead of
multiple permutes.
It tries to detect when it will only have to do a single permute in
either case to maximize folding of loads and such.
This cuts a *lot* of the avx2 shuffle permute counts in half. =]
llvm-svn: 229309
directly into blends of the splats.
These patterns show up even very late in the vector shuffle lowering
where we don't have any chance for DAG combining to kick in, and
blending is a tremendously simpler operation to model. By coercing the
shuffle into a blend we can much more easily match and lower shuffles of
splats.
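A sketch of the pattern (hypothetical, not a test from this commit): when both shuffle inputs are splats, any per-element selection between them is equivalent to a blend of the two splats:
define <4 x i32> @blend_of_splats(<4 x i32> %a, <4 x i32> %b) {
  %sa = shufflevector <4 x i32> %a, <4 x i32> undef, <4 x i32> zeroinitializer
  %sb = shufflevector <4 x i32> %b, <4 x i32> undef, <4 x i32> zeroinitializer
  ; Because both inputs are splats, this shuffle is just a blend of them.
  %r = shufflevector <4 x i32> %sa, <4 x i32> %sb, <4 x i32> <i32 0, i32 5, i32 2, i32 7>
  ret <4 x i32> %r
}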
Immediately with this change there are significantly more blends being
matched in the x86 vector shuffle lowering.
llvm-svn: 229308
subvectors from buildvectors. That doesn't really make any sense and it
breaks all of the down-stream matching of buildvectors to cleverly lower
shuffles.
With this, we now get the shift-based lowering of 256-bit vector
shuffles with AVX1 when we split them into 128-bit vectors. We also do
much better on the zero-extension patterns, although there remains quite
a bit of room for improvement here.
llvm-svn: 229299
lowerings -- one which decomposes into an initial blend followed by
a permute.
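A hypothetical example of such a mask: the elements needed from each input sit at disjoint positions, so the shuffle can be done as a blend at the original positions followed by one unary permute:
define <4 x float> @blend_then_permute(<4 x float> %a, <4 x float> %b) {
  ; Needs a1, a2 and b0, b3: blend to get <b0, a1, a2, b3>, then apply the
  ; single permute <2, 3, 1, 0> to produce <a2, b3, a1, b0>.
  %r = shufflevector <4 x float> %a, <4 x float> %b, <4 x i32> <i32 2, i32 7, i32 1, i32 4>
  ret <4 x float> %r
}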
Particularly on newer chips, blends are handled independently of
shuffles and so this is much less bottlenecked on the single port that
floating point shuffles are executed with on Intel.
I'll be adding this lowering to a bunch of other code paths in
subsequent commits to handle still more places where we can effectively
leverage blends when they're available in the ISA.
llvm-svn: 229292
legality test (essentially, everything is legal).
I'm planning to make this the default shortly, but I'd like to fix
a collection of the bugs it exposes first, and this will let me easily
test them. It also showcases both the improvements and a few of the
regressions triggered by the change. The biggest improvements by far are
the significantly reduced shuffling and domain crossing in the combining
test case. The biggest regressions are missing some clever blending
patterns.
llvm-svn: 229284
version of the script.
Changes include:
- Using the VEX prefix
- Skipping more detail when we have useful shuffle comments to match
- Matching more shuffle comments that have been added to the printer
(yay!)
- Matching the destination registers of some AVX instructions
- Stripping trailing whitespace that crept in
- Fixing indentation issues
Nothing interesting going on here. I'm just trying really hard to ensure
these changes don't show up in the diffs with actual changes to the
backend.
llvm-svn: 228132
Patch to match cases where shuffle masks can be reduced to bit shifts. Similar to byte shift shuffle matching from D5699.
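An illustrative sketch (hypothetical, not from the patch): a v8i16 shuffle with zeroable elements that is equivalent to shifting each 32-bit element left by 16 bits:
define <8 x i16> @shuffle_as_bit_shift(<8 x i16> %a) {
  ; Viewed as <4 x i32>, each element's low 16 bits become zero and its high
  ; 16 bits take the source's low 16 bits, i.e. a left shift by 16.
  %r = shufflevector <8 x i16> %a, <8 x i16> zeroinitializer, <8 x i32> <i32 8, i32 0, i32 8, i32 2, i32 8, i32 4, i32 8, i32 6>
  ret <8 x i16> %r
}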
Differential Revision: http://reviews.llvm.org/D6649
llvm-svn: 228047
This patch adds shuffle mask decodes for integer zero extends (pmovzx** and movq xmm,xmm) and scalar float/double loads/moves (movss/movsd).
Also adds shuffle mask decodes for integer loads (movd/movq).
Differential Revision: http://reviews.llvm.org/D7228
llvm-svn: 227688
This patch adds shuffle matching for the SSE3 MOVDDUP, MOVSLDUP and MOVSHDUP instructions. The big benefit of these is that they save many single-source shuffles from needing (pre-AVX) dual-source instructions such as SHUFPD/SHUFPS, which cause extra moves and prevent load folds.
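For reference, minimal IR sketches (hypothetical tests, not those from the patch) of the single-source shuffle patterns these instructions implement:
define <4 x float> @movsldup_pattern(<4 x float> %a) {
  ; Duplicate the even (low) element of each 64-bit pair: the MOVSLDUP pattern.
  %r = shufflevector <4 x float> %a, <4 x float> undef, <4 x i32> <i32 0, i32 0, i32 2, i32 2>
  ret <4 x float> %r
}
define <2 x double> @movddup_pattern(<2 x double> %a) {
  ; Duplicate the low double: the MOVDDUP pattern.
  %r = shufflevector <2 x double> %a, <2 x double> undef, <2 x i32> zeroinitializer
  ret <2 x double> %r
}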
Adding these instructions uncovered an issue in XFormVExtractWithShuffleIntoLoad, which crashed on single-operand shuffle instructions (now fixed). It also involved fixing getTargetShuffleMask to correctly identify these instructions as unary shuffles.
Also adds a missing tablegen pattern for MOVDDUP.
Differential Revision: http://reviews.llvm.org/D7042
llvm-svn: 226716