llvm-project

Commit Graph

Author	SHA1	Message	Date
Simon Pilgrim	550cb7e82c	[X86][SSE] Dropped -mcpu from 256-bit vector shuffle tests Use triple and attribute only for consistency llvm-svn: 305916	2017-06-21 14:51:23 +00:00
Simon Pilgrim	5e39cbaee5	Fix shufpd test name. llvm-svn: 298381	2017-03-21 15:12:53 +00:00
Simon Pilgrim	8bda035121	[X86][AVX] Tests showing missing SHUFPD + ZERO lowering This lowers to SHUFPD if the input is zeroinitializer but not with a demanded elts optimized build vector. llvm-svn: 298370	2017-03-21 13:30:40 +00:00
Ayman Musa	ac5a2c43af	[X86][AVX512] Add missing entries to EVEX2VEX tables evex2vex pass defines 2 tables which map EVEX instructions to their identical VEX counterparts when possible. Adding all missing entries. Differential Revision: https://reviews.llvm.org/D30501 llvm-svn: 297126	2017-03-07 08:05:53 +00:00
Ahmed Bougacha	2e275e272f	[X86] Bitcast subvector before broadcasting it. Since r274013, we've been looking through bitcasts on broadcast inputs. In the scalar-folding case (from a load, build_vector, or sc2vec), the input type didn't matter, as we'd simply bitcast the resulting scalar back. However, when broadcasting a 128-bit-lane-aligned element, we create an EXTRACT_SUBVECTOR. Use proper types, by creating an extract_subvector of the original input type. llvm-svn: 294774	2017-02-10 19:51:47 +00:00
Craig Topper	09b7e0f01d	[AVX-512] Replace V_SET0 in AVX-512 patterns with AVX512_128_SET0. Enhance AVX512_128_SET0 expansion to make this possible. We'll now expand AVX512_128_SET0 to an EVEX VXORD if VLX is available. Or if it's not, but register allocation has selected a non-extended register we will use VEX VXORPS. And if it's an extended register without VLX we'll use a 512-bit XOR. Do the same for AVX512_FsFLD0SS/SD. This makes it possible for the register allocator to have all 32 registers available to work with. llvm-svn: 292004	2017-01-14 07:29:24 +00:00
Simon Pilgrim	79fb07066c	[X86][AVX] Bad v4f64/v4i64 '1z3z' shuffle test case This lowers to SHUFPD if the input is zeroinitializer but not with a demanded elts optimized build vector. llvm-svn: 291924	2017-01-13 18:23:47 +00:00
Craig Topper	fa875a1d3d	[AVX-512] Teach EVEX to VEX conversion pass to handle VINSERT and VEXTRACT instructions. llvm-svn: 290869	2017-01-03 05:46:18 +00:00
Simon Pilgrim	d7518896ff	[X86][SSE] Fix domains for VZEXT_LOAD type instructions Add the missing domain equivalences for movss, movsd, movd and movq zero extending loading instructions. Differential Revision: https://reviews.llvm.org/D27684 llvm-svn: 289825	2016-12-15 16:05:29 +00:00
Craig Topper	5cb13062d2	[AVX-512] Add support for lowering shuffles to VALIGND/VALIGNQ Summary: VALIGND and VALIGNQ are similar to PALIGNR but instead of working on a 128-bit lane they work on the entire vector register. This change leverages the shuffle rotate detection code used for PALIGNR to detect these cases. Reviewers: delena, RKSimon Subscribers: Farhana, llvm-commits Differential Revision: https://reviews.llvm.org/D26297 llvm-svn: 286709	2016-11-12 05:05:27 +00:00
Craig Topper	924c5ec472	[AVX-512] Add test cases to show missed opportunities for using VALIGND/Q to handle shuffles. llvm-svn: 286425	2016-11-10 03:39:19 +00:00
Simon Pilgrim	d3829c89bc	[X86][AVX512VL] Added support for combining target 256-bit shuffles to AVX512VL VPERMV3 llvm-svn: 284922	2016-10-22 20:15:39 +00:00
Craig Topper	dde865afb5	[AVX-512] Add shuffle comments for vbroadcast instructions. llvm-svn: 284305	2016-10-15 16:26:07 +00:00
Craig Topper	e7f2611160	[X86] Add EVEX encoded VBROADCASTSS/SD and VPBROADCASTD/Q to execution domain fixing table. llvm-svn: 282687	2016-09-29 05:54:39 +00:00
Craig Topper	816a1d7783	[X86] Add VBROADCASTF128/VBROADCASTI128 to execution domain fixing tables. llvm-svn: 282684	2016-09-29 05:54:28 +00:00
Simon Pilgrim	f16cd361d4	[X86][SSE] Add support for combining target shuffles to PALIGNR byte rotations llvm-svn: 278787	2016-08-16 10:03:23 +00:00
Craig Topper	05948fb36c	[AVX-512] Correct ExeDomain for many AVX-512 instructions. llvm-svn: 277416	2016-08-02 05:11:15 +00:00
Simon Pilgrim	ea0d4f9962	[X86][AVX] Added support for lowering to VBROADCASTF128/VBROADCASTI128 (reapplied) As reported on PR26235, we don't currently make use of the VBROADCASTF128/VBROADCASTI128 instructions (or the AVX512 equivalents) to load+splat a 128-bit vector to both lanes of a 256-bit vector. This patch enables lowering from subvector insertion/concatenation patterns and auto-upgrades the llvm.x86.avx.vbroadcastf128.pd.256 / llvm.x86.avx.vbroadcastf128.ps.256 intrinsics to match. We could possibly investigate using VBROADCASTF128/VBROADCASTI128 to load repeated constants as well (similar to how we already do for scalar broadcasts). Reapplied with fix for PR28657 - removed intrinsic definitions (clang companion patch to be submitted shortly). Differential Revision: https://reviews.llvm.org/D22460 llvm-svn: 276416	2016-07-22 13:58:44 +00:00
Benjamin Kramer	5ba0e20315	Revert "[X86][AVX] Added support for lowering to VBROADCASTF128/VBROADCASTI128" It caused PR28657. This reverts commit r276281. llvm-svn: 276405	2016-07-22 11:03:10 +00:00
Simon Pilgrim	c8e20b1150	[X86][AVX] Added support for lowering to VBROADCASTF128/VBROADCASTI128 As reported on PR26235, we don't currently make use of the VBROADCASTF128/VBROADCASTI128 instructions (or the AVX512 equivalents) to load+splat a 128-bit vector to both lanes of a 256-bit vector. This patch enables lowering from subvector insertion/concatenation patterns and auto-upgrades the llvm.x86.avx.vbroadcastf128.pd.256 / llvm.x86.avx.vbroadcastf128.ps.256 intrinsics to match. We could possibly investigate using VBROADCASTF128/VBROADCASTI128 to load repeated constants as well (similar to how we already do for scalar broadcasts). Differential Revision: https://reviews.llvm.org/D22460 llvm-svn: 276281	2016-07-21 14:10:54 +00:00
Matthias Braun	152e7c8b12	VirtRegMap: Replace some identity copies with KILL instructions. An identity COPY like this: %AL = COPY %AL, %EAX<imp-def> has no semantic effect, but encodes liveness information: Further users of %EAX only depend on this instruction even though it does not define the full register. Replace the COPY with a KILL instruction in those cases to maintain this liveness information. (This reverts a small part of r238588 but this time adds a comment explaining why a KILL instruction is useful). llvm-svn: 274952	2016-07-09 00:19:07 +00:00
Simon Pilgrim	5f71c909f0	[X86][AVX] Peek through bitcasts to find the source of broadcasts (reapplied) AVX1 can only broadcast vectors as floats/doubles, so for 256-bit vectors we insert bitcasts if we are shuffling v8i32/v4i64 types. Unfortunately the presence of these bitcasts prevents the current broadcast lowering code from peeking through cases where we have concatenated / extracted vectors to create the 256-bit vectors. This patch allows us to peek through bitcasts as long as the number of elements doesn't change (i.e. element bitwidth is the same) so the broadcast index is not affected. Note this bitcast peek is different from the stage later on which doesn't care about the type and is just trying to find a load node. As we're being more aggressive with bitcasts, we also need to ensure that the broadcast type is correctly bitcasted Differential Revision: http://reviews.llvm.org/D21660 llvm-svn: 274013	2016-06-28 13:24:05 +00:00
Simon Pilgrim	c15d217831	[X86][SSE] Added support for combining target shuffles to (V)PSHUFD/VPERMILPD/VPERMILPS immediate permutes This patch allows target shuffles to be combined to single input immediate permute instructions - (V)PSHUFD/VPERMILPD/VPERMILPS - allowing more general pattern matching than what we currently do and improves the likelihood of memory folding compared to existing patterns which tend to reuse the input in multiple arguments. Further permute instructions (V)PSHUFLW/(V)PSHUFHW/(V)PERMQ/(V)PERMPD may be added in the future but it's proven tricky to create test cases for them so far. (V)PSHUFLW/(V)PSHUFHW is already handled quite well in combineTargetShuffle so it may be that removing some of that code may allow us to perform more of the combining in one place without duplication. Differential Revision: http://reviews.llvm.org/D21148 llvm-svn: 273999	2016-06-28 08:08:15 +00:00
Simon Pilgrim	476e8ceed3	[X86][SSE] Added extra broadcast tests to cover PR28327 llvm-svn: 273891	2016-06-27 16:15:37 +00:00
Nico Weber	1e058160dd	Revert 273848, it caused PR28329 llvm-svn: 273879	2016-06-27 14:36:46 +00:00
Simon Pilgrim	a45da385f8	[X86][AVX] Peek through bitcasts to find the source of broadcasts AVX1 can only broadcast vectors as floats/doubles, so for 256-bit vectors we insert bitcasts if we are shuffling v8i32/v4i64 types. Unfortunately the presence of these bitcasts prevents the current broadcast lowering code from peeking through cases where we have concatenated / extracted vectors to create the 256-bit vectors. This patch allows us to peek through bitcasts as long as the number of elements doesn't change (i.e. element bitwidth is the same) so the broadcast index is not affected. Note this bitcast peek is different from the stage later on which doesn't care about the type and is just trying to find a load node. Differential Revision: http://reviews.llvm.org/D21660 llvm-svn: 273848	2016-06-27 07:44:32 +00:00
Simon Pilgrim	32b1c9fe7f	[X86][AVX2] Prefer VPERMQ/VPERMPD over VINSERTI128/VINSERTF128 for unary shuffles Using VPERMQ/VPERMPD allows memory folding of the (repeated) input where VINSERTI128/VINSERTF128 can not. Differential Revision: http://reviews.llvm.org/D19228 llvm-svn: 266728	2016-04-19 12:26:40 +00:00
Simon Pilgrim	08ba012973	[X86][AVX] Lower shuffles as repeated lane shuffles then lane-crossing shuffles This patch attempts to represent a shuffle as a repeating shuffle (recognisable by is128BitLaneRepeatedShuffleMask) with the source input(s) in their original lanes, followed by a single permutation of the 128-bit lanes to their final destinations. On AVX2 we can additionally attempt to match using 64-bit sub-lane permutation. AVX2 can also now match a similar 'broadcasted' repeating shuffle. This patch has several benefits: * Avoids prematurely matching with lowerVectorShuffleByMerging128BitLanes which can require both inputs to have their input lanes permuted before shuffling. * Can replace PERMPS/PERMD instructions - although these are useful for cross-lane unary shuffling, they require their shuffle mask to be pre-loaded (and increase register pressure). * Matching the repeating shuffle makes use of a lot of existing shuffle lowering. There is an outstanding minor AVX1 regression (combine_unneeded_subvector1 in vector-shuffle-combining.ll) of a previously 128-bit shuffle + subvector splat being converted to a subvector splat + (2 instruction) 256-bit shuffle, I intend to fix this in a followup patch for review. Differential Revision: http://reviews.llvm.org/D16537 llvm-svn: 260834	2016-02-13 21:54:04 +00:00
Simon Pilgrim	96fe4ef5f7	[X86][AVX512] Add support for AVX512 VMOVQ (load) shuffle decoding llvm-svn: 259496	2016-02-02 13:32:56 +00:00
Simon Pilgrim	20f31fa31a	[X86][AVX] Enable extraction of upper 128-bit subvectors for 'half undef' shuffle lowering Added support for the extraction of the upper 128-bit subvectors for lower/upper half undef shuffles if it would reduce the number of extractions/insertions or avoid loads of AVX2 permps/permd shuffle masks. Minor follow up to D15477. llvm-svn: 258000	2016-01-16 22:30:20 +00:00
Simon Pilgrim	17377bdd45	[X86][AVX] Only shuffle the lower half of vectors if the upper half is undefined First step towards making better use of AVX's implicit zeroing of the upper half of a 256-bit vector by instructions that only act on the lower 128-bit vector - discussed on D14151. As well as the fact that 128-bit shuffle instructions are generally more capable, this can be performant for older CPUs with 128-bit ALUs (e.g. Jaguar, Sandy Bridge) that must treat 256-bit vectors as multiple micro-ops. Moved the similar subvector extraction shuffle combines from PerformShuffleCombine256 to lowerVectorShuffle as well. Note: I've avoided combining shuffles that reference elements from the upper halves of the input vectors - this may be reviewed in future work as well (AVX1 would probably always gain, but AVX2 does have some cross-lane shuffle instructions). Differential Revision: http://reviews.llvm.org/D15477 llvm-svn: 256332	2015-12-23 13:10:07 +00:00
Simon Pilgrim	323e00d9c7	[X86][AVX] Fold loads + splats into broadcast instructions On AVX and AVX2, BROADCAST instructions can load a scalar into all elements of a target vector. This patch improves the lowering of 'splat' shuffles of a loaded vector into a broadcast - currently the lowering only works for cases where we are splatting the zero'th element, which is now generalised to any element. Fix for PR23022 Differential Revision: http://reviews.llvm.org/D15310 llvm-svn: 255061	2015-12-08 22:17:11 +00:00
James Y Knight	7c905063c5	Make utils/update_llc_test_checks.py note that the assertions are autogenerated. Also update existing test cases which appear to be generated by it and weren't modified (other than addition of the header) by rerunning it. llvm-svn: 253917	2015-11-23 21:33:58 +00:00
Igor Breger	1f78296869	AVX512: Implemented encoding, intrinsics and DAG lowering for VMOVDDUP instructions. Differential Revision: http://reviews.llvm.org/D14702 llvm-svn: 253548	2015-11-19 08:26:56 +00:00
Simon Pilgrim	e896f9f8c3	[X86][AVX] Added 256-bit shuffle splat tests. llvm-svn: 253449	2015-11-18 09:39:38 +00:00
Simon Pilgrim	2da4178737	[X86][AVX512] Added AVX512 SHUFP/VPERMILP shuffle decode comments. llvm-svn: 253396	2015-11-17 23:29:49 +00:00
Simon Pilgrim	8483df6e24	[X86][AVX512] Added support for AVX512 UNPCK shuffle decode comments. llvm-svn: 253391	2015-11-17 22:35:45 +00:00
Simon Pilgrim	6095410e09	[X86][SSE] Share AVX1/AVX2 shuffle tests with AVX512 where possible llvm-svn: 253379	2015-11-17 21:19:45 +00:00
Igor Breger	a8c9ec85ce	AVX512 : regenerate the test file against trunk. Differential Revision: http://reviews.llvm.org/D14742 llvm-svn: 253321	2015-11-17 08:03:43 +00:00
Igor Breger	78741a1b1e	AVX512: Implemented encoding and intrinsics for VPERMILPS/PD instructions. Added tests for intrinsics and encoding. Differential Revision: http://reviews.llvm.org/D12690 llvm-svn: 249261	2015-10-04 07:20:41 +00:00
Ahmed Bougacha	69a17acb74	[X86] Add some broadcast-from-memory tests. llvm-svn: 245612	2015-08-20 20:59:41 +00:00
Simon Pilgrim	989cbbd2f5	[DAGCombiner] Fold CONCAT_VECTORS of EXTRACT_SUBVECTOR (or undef) to VECTOR_SHUFFLE. Check to see if this is a CONCAT_VECTORS of a bunch of EXTRACT_SUBVECTOR operations. If so, and if the EXTRACT_SUBVECTOR vector inputs come from at most two distinct vectors the same size as the result, attempt to turn this into a legal shuffle. Differential Revision: http://reviews.llvm.org/D12125 llvm-svn: 245490	2015-08-19 20:09:50 +00:00
Simon Pilgrim	ce30ae62e2	[X86][AVX] Added shuffle concatenation tests llvm-svn: 245351	2015-08-18 20:51:15 +00:00
Ahmed Bougacha	dd5da3e7ed	[X86] Don't generate vbroadcasti128 for v4i64 splats from memory. We used to erroneously match: (v4i64 shuffle (v2i64 load), <0,0,0,0>) Whereas vbroadcasti128 is more like: (v4i64 shuffle (v2i64 load), <0,1,0,1>) This problem doesn't exist for vbroadcastf128, which kept matching the intrinsic after r231182. We should perhaps re-introduce the intrinsic here as well, but that's a separate issue still being discussed. While there, add some proper vbroadcastf128 tests. We don't currently match those, like for loading vbroadcastsd/ss on AVX (the reg-reg broadcasts were added in AVX2). Fixes PR23886. llvm-svn: 240488	2015-06-24 00:07:16 +00:00
Ahmed Bougacha	89ae9a1e28	[X86] update_llc_test_checks vector-shuffle-*. NFC. Some of them had gone stale. llvm-svn: 240485	2015-06-24 00:03:48 +00:00
Matthias Braun	165d467125	MachineCopyPropagation: Remove the copies instead of using KILL instructions. For some history here see the commit messages of r199797 and r169060. The original intent was to fix cases like: %EAX<def> = COPY %ECX<kill>, %RAX<imp-def> %RCX<def> = COPY %RAX<kill> where simply removing the copies would have RCX undefined as in terms of machine operands only the ECX part of it is defined. The machine verifier would complain about this so 169060 changed such COPY instructions into KILL instructions so some super-register imp-defs would be preserved. In r199797 it was finally decided to always do this regardless of super-register defs. But this is wrong, consider: R1 = COPY R0 ... R0 = COPY R1 getting changed to: R1 = KILL R0 ... R0 = KILL R1 It now looks like R0 dies at the first KILL and won't be alive until the second KILL, while in reality R0 is alive and must not change in this part of the program. As this only happens after register allocation there is not much code still performing liveness queries so the issue was not noticed. In fact I didn't manage to create a testcase for this, without unrelated changes I am working on at the moment. The fix is simple: As of r223896 the MachineVerifier allows reads from partially defined registers, so the whole transforming COPY->KILL thing is not necessary anymore. This patch also changes a similar (but more benign case as the def and src are the same register) case in the VirtRegRewriter. Differential Revision: http://reviews.llvm.org/D10117 llvm-svn: 238588	2015-05-29 18:19:25 +00:00
Sanjay Patel	2bb5d695f9	[X86, AVX] adjust tablegen patterns to generate better code for scalar insertion into zero vector (PR23073) For code like this: define <8 x i32> @load_v8i32() { ret <8 x i32> <i32 7, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0> } We produce this AVX code: _load_v8i32: ## @load_v8i32 movl $7, %eax vmovd %eax, %xmm0 vxorps %ymm1, %ymm1, %ymm1 vblendps $1, %ymm0, %ymm1, %ymm0 ## ymm0 = ymm0[0],ymm1[1,2,3,4,5,6,7] retq There are at least 2 bugs in play here: We're generating a blend when a move scalar does the same job using 2 less instruction bytes (see FIXMEs). We're not matching an existing pattern that would eliminate the xor and blend entirely. The zero bytes are free with vmovd. The 2nd fix involves an adjustment of "AddedComplexity" [1] and mostly masks the 1st problem. [1] AddedComplexity has close to no documentation in the source. The best we have is this comment: "roughly corresponds to the number of nodes that are covered". It appears that x86 has bastardized this definition by inflating its values for some other undocumented reason. For example, we have a pattern with "AddedComplexity = 400" (!). I searched my way to this page: https://groups.google.com/forum/#!topic/llvm-dev/5UX-Og9M0xQ Differential Revision: http://reviews.llvm.org/D8794 llvm-svn: 233931	2015-04-02 17:56:17 +00:00
Sanjay Patel	30d589536a	[X86, AVX] fix zero-extending integer operand load patterns to use integer instructions This is a follow-on to r233704 and another partial fix for PR22685: https://llvm.org/bugs/show_bug.cgi?id=22685 llvm-svn: 233724	2015-03-31 18:43:43 +00:00
Sanjay Patel	2ae9943881	[X86, AVX] try to lowerVectorShuffleAsElementInsertion() for all 256-bit vector sub-types I suggested this change in D7898 (http://llvm.org/viewvc/llvm-project?view=revision&revision=231354) It improves the v4i64 case although not optimally. This AVX codegen: vmovq {{.*#+}} xmm0 = mem[0],zero vxorpd %ymm1, %ymm1, %ymm1 vblendpd {{.*#+}} ymm0 = ymm0[0],ymm1[1,2,3] Becomes: vmovsd {{.*#+}} xmm0 = mem[0],zero Unfortunately, this doesn't completely solve PR22685. There are still at least 2 problems under here: We're not handling v32i8 / v16i16. We're not getting the FP / int domains right for instruction selection. But since this patch alone appears to do no harm, reduces code duplication, and helps v4i64, I'm submitting this patch ahead of fixing the above. Differential Revision: http://reviews.llvm.org/D8341 llvm-svn: 233704	2015-03-31 16:32:11 +00:00
Simon Pilgrim	7189084bef	[DagCombiner] Allow shuffles to merge through bitcasts Currently shuffles may only be combined if they are of the same type, despite the fact that bitcasts are often introduced in between shuffle nodes (e.g. x86 shuffle type widening). This patch allows a single input shuffle to peek through bitcasts and if the input is another shuffle will merge them, shuffling using the smallest sized type, and re-applying the bitcasts at the inputs and output instead. Dropped old ShuffleToZext test - this patch removes the use of the zext and vector-zext.ll covers these anyhow. Differential Revision: http://reviews.llvm.org/D7939 llvm-svn: 231380	2015-03-05 17:14:04 +00:00
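
The dd5da3e7ed message above contrasts two shuffle masks applied to a loaded v2i64: `<0,0,0,0>`, a scalar splat, and `<0,1,0,1>`, which repeats the whole 128-bit subvector as vbroadcasti128 does. The following is a minimal Python sketch of that difference, not LLVM code. The `shuffle` helper and the element names are hypothetical, and it models only a single-input shuffle over plain lists.

```python
def shuffle(vec, mask):
    """Pick vec[i] for each index i in mask (single-input shuffle, illustration only)."""
    return [vec[i] for i in mask]


# Stand-in for a two-element vector loaded from memory (hypothetical element names).
loaded = ["a0", "a1"]

# Mask <0,0,0,0>: element 0 is repeated in every position (a scalar splat).
splat = shuffle(loaded, [0, 0, 0, 0])

# Mask <0,1,0,1>: the whole two-element subvector is repeated in both halves.
subvector_broadcast = shuffle(loaded, [0, 1, 0, 1])

print(splat)
print(subvector_broadcast)
```

The two results differ in every odd position, which is why matching the first mask to a subvector broadcast would produce wrong values.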


103 Commits