llvm-project

Author	SHA1	Message	Date
Dinar Temirbulatov	a0beedef1c	[X86] SET0 to use XMM registers where possible PR26018 PR32862 Differential Revision: https://reviews.llvm.org/D35965 llvm-svn: 309926	2017-08-03 08:50:18 +00:00
Craig Topper	410d252f5b	[AVX-512] Add unmasked subvector inserts and extract to the execution domain tables. llvm-svn: 309632	2017-07-31 22:07:29 +00:00
Zvi Rackover	da3943d600	[X86] Adding shuffle tests demonstrating missed vcompress opportunities. NFC llvm-svn: 306646	2017-06-29 06:22:01 +00:00
Sanjay Patel	33f4a97287	[DAGCombiner] use narrow load to avoid vector extract If we have (extract_subvector(load wide vector)) with no other users, that can just be (load narrow vector). This is intentionally conservative. Follow-ups may loosen the one-use constraint to account for the extract cost or just remove the one-use check. The memop chain updating is based on code that already exists multiple times in x86 lowering, so that should be pulled into a helper function as a follow-up. Background: this is a potential improvement noticed via regressions caused by making x86's peekThroughBitcasts() not loop on consecutive bitcasts (see comments in D33137). Differential Revision: https://reviews.llvm.org/D33578 llvm-svn: 304072	2017-05-27 14:07:03 +00:00
Ahmed Bougacha	ec8b1fb539	[X86] Relax assert in broadcast-of-subvector lowering. Before r294774, there was a problem when lowering broadcasts to use 128-bit subvectors. When we looked through a bitcast to find the broadcast input, we'd keep using the original type, so you'd end up with things like: (v8f32 (broadcast (v4f32 (extract_subvector (v8i32 V), ...)) )) r294774 fixed it to always emit subvectors with the scalar type of the original source. It also introduced some asserts, to check that we use scalars with the same size, and vectors with the same number of elements. The scalar size equality is checked earlier when looking through bitcasts, and is a useful assert. However, the number of elements doesn't have to be identical: we're always going to extract a 128-bit subvector, and we can have different size inputs if we looked through a concat_vector to find a 256-bit source. Relax the overzealous assert. Replace it with a check of the original source vector being 256 or 512 bits. If it's 128 bits, we can't extract_subvector from it. Fixes PR32371. llvm-svn: 299490	2017-04-05 00:14:39 +00:00
Craig Topper	058f2f6d72	[AVX-512] Fix accidental uses of AH/BH/CH/DH after copies to/from mask registers We've had several bugs (PR32256, PR32241) recently that resulted from usages of AH/BH/CH/DH either before or after a copy to/from a mask register. This ultimately occurs because we create COPY_TO_REGCLASS with VK1 and GR8. Then in CopyToFromAsymmetricReg in X86InstrInfo we find a 32-bit super register for the GR8 to emit the KMOV with. But as these tests are demonstrating, it's possible for the GR8 register to be a high register and we end up doing an accidental extract or insert from bits 15:8. I think the best way forward is to stop making copies directly between mask registers and GR8/GR16. Instead I think we should restrict to only copies between mask registers and GR32/GR64 and use EXTRACT_SUBREG/INSERT_SUBREG to handle the conversion from GR32 to GR16/8 or vice versa. Unfortunately, this complicates fastisel a bit more now to create the subreg extracts where we used to create GR8 copies. We can probably make a helper function to bring down the repetition. This does result in KMOVD being used for copies when BWI is available because we don't know the original mask register size. This caused a lot of deltas on tests because we have to split the checks for KMOVD vs KMOVW based on BWI. Differential Revision: https://reviews.llvm.org/D30968 llvm-svn: 298928	2017-03-28 16:35:29 +00:00
Ahmed Bougacha	2e275e272f	[X86] Bitcast subvector before broadcasting it. Since r274013, we've been looking through bitcasts on broadcast inputs. In the scalar-folding case (from a load, build_vector, or sc2vec), the input type didn't matter, as we'd simply bitcast the resulting scalar back. However, when broadcasting a 128-bit-lane-aligned element, we create an EXTRACT_SUBVECTOR. Use proper types, by creating an extract_subvector of the original input type. llvm-svn: 294774	2017-02-10 19:51:47 +00:00
Craig Topper	b8e92f775d	[AVX-512] Add test cases that show where we are using two subvector inserts to broadcast a 128-bit subvector into a 512-bit vector. We'd be better off using something like SHUFF32X4. If the subvector comes from a load, we convert to SUBV_BROADCAST and use a broadcast instruction. But if there is no load we keep the inserts. I think we should create the SUBV_BROADCAST even without the load and let isel use the fallback patterns that are used if the load can't be folded. This will use the SHUFF32X4 or similar instruction for the 128-bit into 512-bit case and a single insert for 128 into 256 or 256 into 512. This should be fixed so subvector broadcast intrinsics can be replaced with native IR since some of those currently lower directly to SHUFF32X4. llvm-svn: 292475	2017-01-19 07:37:45 +00:00
Michael Zuckerman	6baa3838e9	Fix blend mask by swapping the operands, since the Blend node uses the opposite mask to the Select node. llvm-svn: 292066	2017-01-15 16:43:14 +00:00
Craig Topper	63e2cd6caa	[AVX-512] Teach two address instruction pass to replace masked move instructions with blendm instructions when it's beneficial. Isel now selects masked move instructions for vselect instead of blendm. But sometimes it is beneficial for register allocation to remove the tied register constraint by using blendm instructions. This also picks up cases where the masked move was created due to a masked load intrinsic. Differential Revision: https://reviews.llvm.org/D28454 llvm-svn: 292005	2017-01-14 07:50:52 +00:00
Michael Zuckerman	558a4d8419	[X86][AVX512] Adding missing shuffle lowering to blend mask instructions Some shuffles can be lowered to blend mask instructions (VPBLENDMB/VPBLENDMW/VPBLENDMD/VPBLENDMQ). In this patch, I added a new pattern match for this case. Reviewers: 1. craig.topper 2. guyblank 3. RKSimon 4. igorb Differential Revision: https://reviews.llvm.org/D28483 llvm-svn: 291888	2017-01-13 09:06:00 +00:00
Craig Topper	d0aa53b9ae	[AVX-512] Add support for detecting 512-bit shuffles that contain a 128-bit subvector insertion from the lowest subvector of one of the sources. These are best handled with a vinsert32x4 or vinsert64x2 instruction. llvm-svn: 290946	2017-01-04 07:32:03 +00:00
Craig Topper	a3b9a4edd5	[AVX-512] Add more test cases for shuffles that should be handled with subvector insert instructions. llvm-svn: 290945	2017-01-04 07:31:59 +00:00
Craig Topper	9e065c5b5c	[AVX-512] Fix a typo in a couple case names to match their behavior. llvm-svn: 290944	2017-01-04 07:31:57 +00:00
Craig Topper	42e8e33ccd	[AVX-512] Add avx512dq to the vector-shuffle-512-v16.ll test command lines in preparation for a future change that needs these features. llvm-svn: 290943	2017-01-04 07:31:54 +00:00
Simon Pilgrim	9519bd9232	[X86][AVX512] use a single shufps for 512-bit vectors when it can save instructions This is the 512-bit counterpart to the 128-bit transform checked in here: https://reviews.llvm.org/rL289837 This patch is based on the draft by @sroland (Roland Scheidegger) that is attached to PR27885: https://llvm.org/bugs/show_bug.cgi?id=27885 llvm-svn: 289946	2016-12-16 14:30:04 +00:00
Simon Pilgrim	224416a9e4	[X86][AVX512] Add tests showing missed opportunity to efficiently lower v16i32 to VSHUFPS (PR27885) llvm-svn: 289945	2016-12-16 14:21:57 +00:00
Simon Pilgrim	d7518896ff	[X86][SSE] Fix domains for VZEXT_LOAD type instructions Add the missing domain equivalences for movss, movsd, movd and movq zero extending loading instructions. Differential Revision: https://reviews.llvm.org/D27684 llvm-svn: 289825	2016-12-15 16:05:29 +00:00
Craig Topper	88071b37ab	[AVX-512] Add support for changing VSHUFF64x2 to VSHUFF32x4 when it's feeding a vselect with 32-bit element size. Summary: Shuffle lowering may have widened the element size of an i32 shuffle to i64 before selecting X86ISD::SHUF128. If this shuffle was used by a vselect this can prevent us from selecting masked operations. This patch detects this and changes the element size to match the vselect. I don't handle changing integer to floating point or vice versa as it's not clear if it's better to push such a bitcast to the inputs of the shuffle or to the user of the vselect. So I'm ignoring that case for now. Reviewers: delena, zvi, RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D27087 llvm-svn: 287939	2016-11-25 16:48:05 +00:00
Craig Topper	00758090ca	[AVX-512] Add tests demonstrating failure to generate masked instructions for VSHUFF32x4 and VSHUFI32x4 due to shuffle lowering widening elements. llvm-svn: 287897	2016-11-24 18:24:46 +00:00
Craig Topper	993c7416d3	[AVX-512] Move a 16 x float shuffle test to the v16 test file and add an integer variant. llvm-svn: 287853	2016-11-24 05:36:47 +00:00
Craig Topper	da22267055	[AVX-512] Add support for changing the element size of PALIGNR/VALIGND/VALIGNQ shuffles if they feed a vselect with a different type Summary: Shuffle lowering widens the element size of a shuffle if elements are contiguous. This is sometimes helpful because wider element types have more shuffle options. If the shuffle is one of the arguments to a vselect this shuffle widening can introduce a bitcast between the vselect and the shuffle. This will prevent isel from selecting a masked operation. If the shuffle can be written equally efficiently with a different element size to match the vselect type we should change the shuffle type to allow masking. This patch does this conversion for all VALIGND/VALIGNQ sizes. It also supports turning 128-bit PALIGNR into VALIGND/VALIGNQ. This fixes the case shown in PR31018. I plan to add support for more operations in future patches. Reviewers: RKSimon, zvi, delena Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D26902 llvm-svn: 287612	2016-11-22 03:51:53 +00:00
Craig Topper	85a1f5c20c	[AVX-512] Add tests for masked palignr/valignd/valignq shuffles, many of which show failures to fold the masking into the operation. Many of these problems are because shuffle lowering widens element size and reduces element count when possible. This causes the shuffle to become separated from the select by a bitcast. Future patches will work to improve these cases by rewriting the shuffle back to a narrow element type if we think it can result in folding the mask. llvm-svn: 287503	2016-11-20 19:50:32 +00:00
Craig Topper	0637099f24	[AVX-512] Add an example test case for PR31018. llvm-svn: 286934	2016-11-15 05:21:55 +00:00
Craig Topper	5cb13062d2	[AVX-512] Add support for lowering shuffles to VALIGND/VALIGNQ Summary: VALIGND and VALIGNQ are similar to PALIGNR but instead of working on a 128-bit lane they work on the entire vector register. This change leverages the shuffle rotate detection code used for PALIGNR to detect these cases. Reviewers: delena, RKSimon Subscribers: Farhana, llvm-commits Differential Revision: https://reviews.llvm.org/D26297 llvm-svn: 286709	2016-11-12 05:05:27 +00:00
Craig Topper	924c5ec472	[AVX-512] Add test cases to show missed opportunities for using VALIGND/Q to handle shuffles. llvm-svn: 286425	2016-11-10 03:39:19 +00:00
Craig Topper	4729fe8bb6	[AVX-512] Correct execution domain for VPERMT2PS and VPERMI2PS. llvm-svn: 284328	2016-10-16 04:54:31 +00:00
Craig Topper	3d41f91f61	[AVX-512] Fix v16i32 zero extending shuffle test case so it's really zero extend. llvm-svn: 284106	2016-10-13 05:41:01 +00:00
Craig Topper	05242739c2	[AVX-512] Add tests for basic 512-bit zero extending shuffle patterns. Code will be improved in a future commit. llvm-svn: 284104	2016-10-13 05:29:37 +00:00
Craig Topper	e7f2611160	[X86] Add EVEX encoded VBROADCASTSS/SD and VPBROADCASTD/Q to execution domain fixing table. llvm-svn: 282687	2016-09-29 05:54:39 +00:00
Craig Topper	600685d510	[AVX-512] Add patterns to support VZEXT_MOVL from 512-bit vectors with 64-bit and 32-bit elements. Fixes PR28961. llvm-svn: 278592	2016-08-13 05:33:12 +00:00
Craig Topper	f44423120f	[AVX-512] Improve lowering of inserting a single element into lowest element of a 512-bit vector of zeroes by using vmovq/vmovd/vmovss/vmovsd. llvm-svn: 277965	2016-08-07 21:52:59 +00:00
Craig Topper	05948fb36c	[AVX-512] Correct ExeDomain for many AVX-512 instructions. llvm-svn: 277416	2016-08-02 05:11:15 +00:00
Craig Topper	f4151bea72	[AVX512] Add initial support for the Execution Domain fixing pass to change some EVEX instructions. llvm-svn: 276393	2016-07-22 05:00:52 +00:00
Craig Topper	0b0954570a	[AVX512] Add support for lowering to 512-bit SHUFPS. llvm-svn: 275011	2016-07-10 05:55:53 +00:00
Simon Pilgrim	129b720c18	[X86][AVX512] Add support for lowering shuffles to VPERMILPS llvm-svn: 274458	2016-07-03 12:47:21 +00:00
Simon Pilgrim	f040d8c061	[X86][AVX512] Add support for lowering shuffles to MOVDDUP/MOVSLDUP/MOVSHDUP llvm-svn: 274436	2016-07-02 12:45:03 +00:00
Simon Pilgrim	5e95390957	[X86][AVX512] Add test cases that should lower to MOVSLDUP/MOVSHDUP llvm-svn: 274435	2016-07-02 12:20:35 +00:00
Craig Topper	504fba5c8a	[AVX512] Lower v8i64 and v16i32 to pshufd when possible. llvm-svn: 272473	2016-06-11 13:43:21 +00:00
Simon Pilgrim	47c76e201a	[X86][AVX512] Fixed issue with v16i32 shuffles lowering to VPALIGNR llvm-svn: 272307	2016-06-09 20:53:12 +00:00
Simon Pilgrim	3e5fb61978	[X86][AVX2] Broadcast subvectors AVX2 can only broadcast from the zero'th element of a vector, but if the broadcastable element is the zero'th element of a 128-bit subvector it's advantageous to extract the subvector, broadcast from that and avoid the loading of shuffle mask data that would be needed for VPERMPS/VPERMD. The only exception being when the source type is 4f64 or 4i64 which can use the immediate shuffle VPERMPD/VPERMQ directly. Differential Revision: http://reviews.llvm.org/D16050 llvm-svn: 258081	2016-01-18 20:59:04 +00:00
James Y Knight	7c905063c5	Make utils/update_llc_test_checks.py note that the assertions are autogenerated. Also update existing test cases which appear to be generated by it and weren't modified (other than addition of the header) by rerunning it. llvm-svn: 253917	2015-11-23 21:33:58 +00:00
Simon Pilgrim	df993479c9	[X86][AVX512] Fixed shuffle test name to match shuffle llvm-svn: 251984	2015-11-03 21:39:30 +00:00
Simon Pilgrim	94c4943562	[X86][AVX512] Test UNPCK with non-sequential scalars Missing tests for r251297 llvm-svn: 251453	2015-10-27 21:18:45 +00:00
Igor Breger	684af8156c	AVX-512: Use correct extract vector length. Bug https://llvm.org/bugs/show_bug.cgi?id=25318 Differential Revision: http://reviews.llvm.org/D14062 llvm-svn: 251285	2015-10-26 12:26:34 +00:00
Elena Demikhovsky	e88038f235	AVX-512: Lowering for 512-bit vector shuffles. Vector types: <8 x 64>, <16 x 32>, <32 x 16> float and integer. Differential Revision: http://reviews.llvm.org/D10683 llvm-svn: 246981	2015-09-08 06:38:21 +00:00
Chandler Carruth	eb206aa1ea	[x86] Now that the new vector shuffle legality is enabled and everything is going well, remove the flag and the code for the old legality tests. This is the first step toward removing the entire old vector shuffle lowering. Much more code to delete coming up next. llvm-svn: 229963	2015-02-20 03:59:35 +00:00
Chandler Carruth	fe69608839	[x86] Switch a collection of tests explicitly to the new vector shuffle legality test (essentially, everything is legal). I'm planning to make this the default shortly, but I'd like to fix a collection of the bugs it exposes first, and this will let me easily test them. It also showcases both the improvements and a few of the regressions triggered by the change. The biggest improvements by far are the significantly reduced shuffling and domain crossing in the combining test case. The biggest regressions are missing some clever blending patterns. llvm-svn: 229284	2015-02-15 06:37:21 +00:00
Chandler Carruth	89a60770e0	[x86] Remove the now-default-on flag for the new vector shuffle lowering strategy from a bunch of tests. llvm-svn: 229283	2015-02-15 06:20:51 +00:00
Adam Nemet	d23c88db15	[AVX512] Add 16x32 unpck tests as well Forgot this from r225838. llvm-svn: 225850	2015-01-13 23:27:55 +00:00

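Commit 5cb13062d2 lowers shuffles to VALIGND/VALIGNQ by reusing the "shuffle rotate detection" already used for PALIGNR, applied across the whole register rather than per 128-bit lane. Below is a simplified Python sketch of that idea; it is not LLVM's code. The names `match_element_rotate` and `valign` are hypothetical. The sketch assumes a two-input mask in which indices 0..n-1 select from the first source, n..2n-1 from the second, and -1 marks an undef lane. The rotate is modeled abstractly as "take n lanes of concat(lo, hi) starting at lane `rotation`". How lo/hi map onto the real instruction operands is deliberately not modeled.

```python
from typing import List, Optional, Tuple

UNDEF = -1  # assumed encoding of an undefined lane in a shuffle mask


def match_element_rotate(mask: List[int]) -> Optional[Tuple[int, int, int]]:
    """Return (rotation, lo_src, hi_src) if, for every defined lane i,
    mask[i] selects concat(lo, hi)[i + rotation]; otherwise None.
    Sources are numbered 0 (indices 0..n-1) and 1 (indices n..2n-1)."""
    n = len(mask)
    rotation: Optional[int] = None
    lo: Optional[int] = None
    hi: Optional[int] = None
    for i, m in enumerate(mask):
        if m == UNDEF:
            continue  # an undef lane is compatible with any rotation
        src, offset = divmod(m, n)
        start = i - offset
        if start == 0:
            return None  # lane stays in place, so this is not a rotate
        # start < 0: lane comes from the low half of the concatenation;
        # start > 0: lane has wrapped around into the high half.
        candidate = -start if start < 0 else n - start
        if rotation is None:
            rotation = candidate
        elif rotation != candidate:
            return None  # lanes disagree on the rotation amount
        if start < 0:
            if lo is None:
                lo = src
            elif lo != src:
                return None  # low half must come from a single source
        else:
            if hi is None:
                hi = src
            elif hi != src:
                return None  # high half must come from a single source
    if rotation is None:
        return None  # every lane is undef
    # Rotating a single source uses that source for both halves.
    if lo is None:
        lo = hi
    if hi is None:
        hi = lo
    return rotation, lo, hi


def valign(lo: List[int], hi: List[int], rotation: int) -> List[int]:
    """Model of a full-register element rotate: n lanes of
    concat(lo, hi) starting at lane `rotation`."""
    concat = lo + hi
    return concat[rotation:rotation + len(lo)]
```

For a 16-lane mask such as `[3, 4, ..., 15, 16, 17, 18]`, the matcher reports a rotation of 3 with source 0 as the low half and source 1 as the high half. A mask like `[(i + 5) % 16 for i in range(16)]` is a single-source rotate by 5.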
50 Commits
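Commits 558a4d8419 and 6baa3838e9 concern lowering shuffles to the VPBLENDM* blend-mask instructions, and the polarity of the blend mask relative to a Select node. The Python sketch below illustrates the pattern. It is a hypothetical helper, not LLVM's implementation, and uses the same mask convention: indices 0..n-1 from the first source, n..2n-1 from the second, -1 for undef. A shuffle is a per-lane blend exactly when every defined lane i reads index i or i + n. In this model a set bit picks the second source. A select-style condition, where true picks the first operand, is then the bitwise complement, which is the kind of polarity mismatch the 6baa3838e9 fix addresses. The names `match_blend`, `blend`, and `select_condition` are made up for illustration.

```python
from typing import List, Optional

UNDEF = -1  # assumed encoding of an undefined lane in a shuffle mask


def match_blend(mask: List[int]) -> Optional[int]:
    """Return a lane bitmask (bit i set => lane i from the second source)
    if `mask` keeps every lane in place; otherwise None."""
    n = len(mask)
    bits = 0
    for i, m in enumerate(mask):
        if m == UNDEF or m == i:
            continue  # undef, or lane i taken from the first source
        if m == i + n:
            bits |= 1 << i  # lane i taken from the second source
        else:
            return None  # lane moves position, so not a pure blend
    return bits


def blend(bits: int, v1: List[int], v2: List[int]) -> List[int]:
    """Model of a mask blend: a set bit takes the lane from v2."""
    return [v2[i] if (bits >> i) & 1 else v1[i] for i in range(len(v1))]


def select_condition(bits: int, n: int) -> int:
    """Condition for a select where a true lane picks the FIRST operand:
    the complement of the blend mask, restricted to n lanes."""
    return ~bits & ((1 << n) - 1)
```

For example, the 16-lane mask that alternates between the two sources (`0, 17, 2, 19, ...`) matches with blend bits `0xAAAA`, and the equivalent select condition is `0x5555`.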