llvm-project


Author	SHA1	Message	Date
Simon Pilgrim	82e54871d0	[DAGCombiner] Fold xor/and/or (bitcast(A), bitcast(B)) -> bitcast(op (A,B)) anytime before LegalizeVectorOprs xor/and/or (bitcast(A), bitcast(B)) -> bitcast(op (A,B)) was only being combined at the AfterLegalizeTypes stage; this patch permits the combine to occur anytime before then as well. The main aim of this is to improve the ability to recognise bitmasks that can be converted to shuffles. I had to modify a number of AVX512 mask tests as the basic bitcast to/from scalar pattern was being stripped out, preventing testing of the mmask bitops. By replacing the bitcasts with loads we can get almost the same result. Differential Revision: http://reviews.llvm.org/D18944 llvm-svn: 265998	2016-04-11 21:10:33 +00:00
Simon Pilgrim	fba9352f31	[X86][SSE] Added bitmask pattern shuffle tests Based on OR(AND(MASK,V0),AND(~MASK,V1)) style patterns llvm-svn: 265697	2016-04-07 17:23:55 +00:00
Simon Pilgrim	a3d674470c	[X86][SSE] Added support for MOVHPD/MOVLPD + MOVHPS/MOVLPS shuffle decoding. llvm-svn: 260034	2016-02-07 15:39:22 +00:00
Simon Pilgrim	672808a853	[X86][SSE] Add tests for MOVHLPS/MOVLHPS shuffle lowering. As raised in PR26491, we don't make use of these instructions at the moment. llvm-svn: 260008	2016-02-06 20:11:52 +00:00
Simon Pilgrim	e74653b67a	[X86][SSE] Add INSERTPS target shuffle combines. As vector shuffles can only reference two inputs, many (V)INSERTPS patterns end up being split over two target shuffles. This patch adds combines to attempt to combine (V)INSERTPS nodes with input/output nodes that are just zeroing out these additional vector elements. Differential Revision: http://reviews.llvm.org/D16072 llvm-svn: 258205	2016-01-19 22:24:12 +00:00
Simon Pilgrim	d17a1df783	[X86][SSE] Added v4f32 shuffle with zero tests This is mainly test cases for improvements to insertps matching, but pre-SSE41 shuffles could be improved as well llvm-svn: 256705	2016-01-03 17:02:56 +00:00
James Y Knight	7c905063c5	Make utils/update_llc_test_checks.py note that the assertions are autogenerated. Also update existing test cases which appear to be generated by it and weren't modified (other than addition of the header) by rerunning it. llvm-svn: 253917	2015-11-23 21:33:58 +00:00
Simon Pilgrim	ca56a72af9	[X86][SSE] Shuffle blends with zero This patch generalizes the zeroing of vector elements with the BLEND instructions. Currently a zero vector will only blend if the shuffled elements are correctly inline, this patch recognises when a vector input is zero (or zeroable) and modifies a local copy of the shuffle mask to support a blend. As a zeroable vector input may not be all zeroes, the zeroable vector is regenerated if necessary. Differential Revision: http://reviews.llvm.org/D14050 llvm-svn: 251659	2015-10-29 22:11:28 +00:00
Simon Pilgrim	be187a0a1a	[X86][SSE] Added tests for shuffling through bitcasts. llvm-svn: 251236	2015-10-25 15:32:04 +00:00
Ahmed Bougacha	69a17acb74	[X86] Add some broadcast-from-memory tests. llvm-svn: 245612	2015-08-20 20:59:41 +00:00
Simon Pilgrim	7189084bef	[DagCombiner] Allow shuffles to merge through bitcasts Currently shuffles may only be combined if they are of the same type, despite the fact that bitcasts are often introduced in between shuffle nodes (e.g. x86 shuffle type widening). This patch allows a single input shuffle to peek through bitcasts and if the input is another shuffle will merge them, shuffling using the smallest sized type, and re-applying the bitcasts at the inputs and output instead. Dropped old ShuffleToZext test - this patch removes the use of the zext and vector-zext.ll covers these anyhow. Differential Revision: http://reviews.llvm.org/D7939 llvm-svn: 231380	2015-03-05 17:14:04 +00:00
David Blaikie	a79ac14fa6	[opaque pointer type] Add textual IR support for explicit type parameter to load instruction Essentially the same as the GEP change in r230786. A similar migration script can be used to update test cases, though a few more test case improvements/changes were required this time around: (r229269-r229278) import fileinput import sys import re pat = re.compile(r"((?:=|:|^)\s*load (?:atomic )?(?:volatile )?(.*?))(| addrspace\(\d+\) *)\*($| *(?:%|@|null|undef|blockaddress|getelementptr|addrspacecast|bitcast|inttoptr|\[\[[a-zA-Z]|\{\{).*$)") for line in sys.stdin: sys.stdout.write(re.sub(pat, r"\1, \2\3*\4", line)) Reviewers: rafael, dexonsmith, grosser Differential Revision: http://reviews.llvm.org/D7649 llvm-svn: 230794	2015-02-27 21:17:42 +00:00
Chandler Carruth	eb206aa1ea	[x86] Now that the new vector shuffle legality is enabled and everything is going well, remove the flag and the code for the old legality tests. This is the first step toward removing the entire old vector shuffle lowering. Much more code to delete coming up next. llvm-svn: 229963	2015-02-20 03:59:35 +00:00
Chandler Carruth	0b39536390	[x86] Teach the unpack lowering how to lower with an initial unpack in addition to lowering to trees rooted in an unpack. This saves shuffles and/or registers in various ways, lets us handle another class of v4i32 shuffles pre SSE4.1 without domain crosses, etc. llvm-svn: 229856	2015-02-19 15:06:13 +00:00
Chandler Carruth	c8e6877065	[x86] Merge checks for a recently added test case that is the same on all SSE variants and AVX variants. llvm-svn: 229770	2015-02-18 23:20:49 +00:00
Simon Pilgrim	1d89a02abb	[X86][SSE] Generalised unpckl/unpckh shuffle matching Added commuted unpckl/unpckh shuffle matching patterns as many cases containing undefined lanes fail to commute by themselves. Differential Revision: http://reviews.llvm.org/D7564 llvm-svn: 229571	2015-02-17 22:24:32 +00:00
Chandler Carruth	87e580a659	[x86] Teach the 128-bit vector shuffle lowering routines to take advantage of the existence of a reasonable blend instruction. The 256-bit vector shuffle lowering has leveraged the general technique of decomposed shuffles and blends for quite some time, but this never made it back into the 128-bit code, and there are a large number of patterns where this is substantially better. For example, this removes almost all domain crossing in vector shuffles that involve some blend and some permutation with SSE4.1 and later. See the massive reduction in 'shufps' for integer test cases in this commit. This isn't perfect yet for a few reasons: 1) The v8i16 shuffle lowering continues to plague me. We don't always form an unpack-based blend when that would be better. But the wins pretty drastically outstrip the losses here. 2) The v16i8 shuffle lowering is just a disaster here. I never went and implemented blend support here for some terrible reason. I'll do that next probably. I've not updated it for now. More variations on this technique are coming as well -- we don't shuffle-into-unpack or shuffle-into-palignr, both of which would also be profitable. Note that some test cases grow significantly in the number of instructions, but I expect them to actually be faster. We use pshufd+pshufd+blendw instead of a single shufps, but the pshufd's are very likely to pipeline well (two ports on most modern Intel chips) and the blend is a very fast instruction. The domain switch penalty will essentially always be more than a blend instruction, which is the only increase in tree height. llvm-svn: 229350	2015-02-16 01:52:02 +00:00
Simon Pilgrim	5a6375c3ba	Added some test cases of missed opportunities to use unpckl/unpckh shuffles llvm-svn: 229313	2015-02-15 15:07:45 +00:00
Chandler Carruth	1b5285dd57	[SDAG] Teach the SelectionDAG to canonicalize vector shuffles of splats directly into blends of the splats. These patterns show up even very late in the vector shuffle lowering where we don't have any chance for DAG combining to kick in, and blending is a tremendously simpler operation to model. By coercing the shuffle into a blend we can much more easily match and lower shuffles of splats. Immediately with this change there are significantly more blends being matched in the x86 vector shuffle lowering. llvm-svn: 229308	2015-02-15 12:18:12 +00:00
Chandler Carruth	fe69608839	[x86] Switch a collection of tests explicitly to the new vector shuffle legality test (essentially, everything is legal). I'm planning to make this the default shortly, but I'd like to fix a collection of the bugs it exposes first, and this will let me easily test them. It also showcases both the improvements and a few of the regressions triggered by the change. The biggest improvements by far are the significantly reduced shuffling and domain crossing in the combining test case. The biggest regressions are missing some clever blending patterns. llvm-svn: 229284	2015-02-15 06:37:21 +00:00
Chandler Carruth	89a60770e0	[x86] Remove the now-default-on flag for the new vector shuffle lowering strategy from a bunch of tests. llvm-svn: 229283	2015-02-15 06:20:51 +00:00
Chandler Carruth	4d31f58c88	[x86] Give movss and movsd execution domains in the x86 backend. This associates movss and movsd with the packed single and packed double execution domains (resp.). While this is largely cosmetic, as we now don't have weird ping-pong-ing between single and double precision, it is also useful because it prevents the domain fixing algorithm from seeing domain breaks that don't actually exist. It will also be much more important if we have an execution domain default other than packed single, as that would cause us to mix movss and movsd with integer vector code on a regular basis, a very bad mixture. llvm-svn: 228135	2015-02-04 10:58:53 +00:00
Chandler Carruth	024cf8efd7	[x86] Start to introduce bit-masking based blend lowering. This is the simplest form of bit-math based blending which only fires when we are blending with zero and is relatively profitable. I've only enabled this path on very specific lowering strategies. I'm planning to widen its applicability in subsequent patches, but so far you'll notice that even though we get fewer shufps instructions, we still do the bit math in the FP execution port. I'm looking into why this is still happening. llvm-svn: 228124	2015-02-04 09:06:05 +00:00
Chandler Carruth	872d80e7a4	[x86] Add tests for blends-with-zero on 4-element vectors. llvm-svn: 228122	2015-02-04 09:05:58 +00:00
Chandler Carruth	abd09a1f35	[x86] Refresh the checks of a number of tests using update_llc_test_checks.py. The exact format of the checks has changed over time. This includes different indenting rules, new shuffle comments that have been added, and more operand hiding behind regular expressions. No functional change to the tests is expected here, but this will make subsequent patches have a clean diff as they change shuffle lowering. llvm-svn: 228097	2015-02-04 00:58:42 +00:00
Simon Pilgrim	46cd4f7400	[X86][SSE] psrl(w/d/q) and psll(w/d/q) bit shifts for SSE2 Patch to match cases where shuffle masks can be reduced to bit shifts. Similar to byte shift shuffle matching from D5699. Differential Revision: http://reviews.llvm.org/D6649 llvm-svn: 228047	2015-02-03 21:58:29 +00:00
Simon Pilgrim	d9885856e6	[X86][SSE] Added general integer shuffle matching for MOVQ instruction This patch adds general shuffle pattern matching for the MOVQ zero-extend instruction (copy lower 64bits, zero upper) for all 128-bit integer vectors, it is added as a fallback test in lowerVectorShuffleAsZeroOrAnyExtend. llvm-svn: 228022	2015-02-03 20:09:18 +00:00
Simon Pilgrim	9c76b47469	[X86][SSE] Shuffle mask decode support for zero extend, scalar float/double moves and integer load instructions This patch adds shuffle mask decodes for integer zero extends (pmovzx** and movq xmm,xmm) and scalar float/double loads/moves (movss/movsd). Also adds shuffle mask decodes for integer loads (movd/movq). Differential Revision: http://reviews.llvm.org/D7228 llvm-svn: 227688	2015-01-31 14:09:36 +00:00
Chandler Carruth	1d7d7aa1f5	[x86] Clean up the shift lowering vector shuffle tests a bit using my script. Notably this folds all the SSE cases together into a single FileCheck block. It also adds a vex prefix. llvm-svn: 223610	2014-12-07 17:15:53 +00:00
Simon Pilgrim	6b988ad8f2	[X86][SSE] Keep 4i32 vector insertions in integer domain on SSE4.1 targets 4i32 shuffles for single insertions into zero vectors lower to X86vzmovl, which was using (v)blendps, causing domain switch stalls. This patch fixes this by using (v)pblendw instead. The updated tests on test/CodeGen/X86/sse41.ll still contain a domain stall due to the use of insertps - I'm looking at fixing this in a future patch. Differential Revision: http://reviews.llvm.org/D6458 llvm-svn: 223165	2014-12-02 22:31:23 +00:00
Simon Pilgrim	371417db34	[X86][SSE] Improvements to byte shift shuffle matching Since (v)pslldq / (v)psrldq instructions resolve to a single input argument it is useful to match it much earlier than we currently do - this prevents more complicated shuffles (notably insertion into a zero vector) matching before it. Differential Revision: http://reviews.llvm.org/D6409 llvm-svn: 222796	2014-11-25 22:34:59 +00:00
Craig Topper	12f0d9ef2c	Improve logic that decides if it's profitable to commute when some of the virtual registers involved have uses/defs chains connecting them to physical registers. Fix up the tests that this change improves. llvm-svn: 221336	2014-11-05 06:43:02 +00:00
Chandler Carruth	0adda1e4d4	[x86] Adjust the patterns for lowering X86vzmovl nodes which don't perform a load to use blendps rather than movss when it is available. For non-loads, blendps is much faster. It can execute on two ports in Sandy Bridge and Ivy Bridge, and three ports on Haswell. This fixes one of the "regressions" from aggressively taking the "insertion" path in the new vector shuffle lowering. This does highlight one problem with blendps -- it isn't commuted as heavily as it should be. That's future work though. llvm-svn: 219022	2014-10-03 21:38:49 +00:00
Chandler Carruth	971a560cb8	[x86] Significantly improve the ability of the new vector shuffle lowering to match VZEXT_MOVL patterns. I hadn't realized that these had sufficient pattern smarts in the backend to lower zext-ing from the low element of a vector without it being a scalar_to_vector node. They do, and this is how to match a bunch of patterns for movq, movss, etc. There is a weird propensity to end up using pshufd to place the element afterward even though it means domain crossing (or rather, to use xorps+movss to zext the element rather than movq) but that's an orthogonal problem with VZEXT_MOVL that someone should probably look at. llvm-svn: 218977	2014-10-03 11:25:58 +00:00
Chandler Carruth	080cab91e1	[x86] Add some important, missing test coverage for blending from one vector to a zero vector for the v2 cases and fix the v4 integer cases to actually blend from a vector. There are already separate tests for the case of inserting from a scalar. These cases cover a lot of the regressions I've seen in the regression test suite for the new vector shuffle lowering and specifically cover the reported lack of using various zext-ing instruction patterns. My next patch should fix a big chunk of this, but wanted to get a nice baseline for these patterns in the test cases first. llvm-svn: 218976	2014-10-03 11:16:45 +00:00
Chandler Carruth	75e182b414	[x86] Teach the new vector shuffle lowering to widen floating point elements as well as integer elements in order to form simpler shuffle patterns. This is the primary reason why we were failing to match some of the 2-and-2 floating point shuffles such as PR21140. Even after fixing this we need to support some extra patterns in the backend in order to match the resulting X86ISD::UNPCKL nodes into the correct instructions. This commit should fix PR21140 and includes more comprehensive testing of insertion patterns in v4 shuffles. Not all of the added tests are beautiful. For example, we don't have clever instructions to insert-via-load in the integer domain. There are also some places where we aren't sufficiently cunning with our use of movq and movd, but that's future work. llvm-svn: 218911	2014-10-02 21:37:14 +00:00
Chandler Carruth	846baf2ca1	[x86] Hoist the zext-lowering up in the v4i32 lowering routine -- it is the same speed as pshufd but we can fold loads into the pmovzx instructions. This fixes some regressions that came up in the regression test suite for the new vector shuffle lowering. llvm-svn: 218733	2014-10-01 02:25:54 +00:00
Chandler Carruth	bebedbaf36	[x86] Add AVX1 and AVX2 testing to all of the 128-bit shuffle test cases. While clearly we don't need the AVX vector width, these ISA extensions often cause us to select different instructions and we should cover them even with the narrow vector width. Also, while here, nuke the stress_test2 contents. There is no reason to try to FileCheck this entire body when it is mostly a test for successfully surviving the code generator. llvm-svn: 218710	2014-09-30 22:16:23 +00:00
Chandler Carruth	6a62cd3538	[x86] Rework all of the 128-bit vector shuffle tests with my handy test updating script so that they are more thorough and consistent. Specific fixes here include: - Actually test VEX-encoded AVX mnemonics. - Actually use an SSE 4.1 run to test SSE 4.1 features! - Correctly check instructions sequences from the start of the function. - Elide the shuffle operands and comment designator in a consistent way. - Test all of the architectures instead of just the ones I was motivated to manually author. I've gone back through and fixed up any egregious issues I spotted. Let me know if I missed something you really dislike. One downside to this is that we're now not as diligently using FileCheck variables for registers. I would be much more concerned with this if we had larger register usage, but there just aren't that interesting of register choices here and most of the registers are constrained by the ABI. Ultimately, I don't think this is likely to be the maintenance burden for these tests and updating them again should be straightforward. llvm-svn: 218707	2014-09-30 21:44:34 +00:00
Chandler Carruth	6f80abac4e	[x86] Teach the new vector shuffle lowering to use the AVX VPERMILPS instruction for single-vector floating point shuffles. This in turn allows the shuffles to fold a load into the instruction which is one of the common regressions hit with the new shuffle lowering. llvm-svn: 218190	2014-09-20 20:52:07 +00:00
Chandler Carruth	78a761ce8c	[x86] Start moving to a fancier check syntax to reduce the need for duplication of check lines. The idea is to have broad sets of compilation modes that will frequently diverge without having to always and immediately explode to the precise ISA feature set. While this already helps due to VEX encoded differences, it will help much more as I teach the new shuffle lowering about more of the new VEX encoded instructions which can still be used to implement 128-bit shuffles. llvm-svn: 218188	2014-09-20 18:36:39 +00:00
Chandler Carruth	8c4cccd4aa	[x86] Teach the v4f32 path of the new shuffle lowering to handle the tricky case of single-element insertion into the zero lane of a zero vector. We can't just use the same pattern here as we do in every other vector type because the general insertion logic can handle insertion into the non-zero lane of the vector. However, in SSE4.1 with v4f32 vectors we have INSERTPS that is a much better choice than the generic one for such lowerings. But INSERTPS can do lots of other lowerings as well so factoring its logic into the general insertion logic doesn't work very well. We also can't just extract the core common part of the general insertion logic that is faster (forming VZEXT_MOVL synthetic nodes that lower to MOVSS when they can) because VZEXT_MOVL is often faster than a blend while INSERTPS is slower! So instead we do a restrictive condition on attempting to use the generic insertion logic to narrow it to those cases where VZEXT_MOVL won't need a shuffle afterward and thus will do better than INSERTPS. Then we try blending. Then we go back to INSERTPS. This still doesn't generate perfect code for some silly reasons that can be fixed by tweaking the td files for lowering VZEXT_MOVL to use XORPS+BLENDPS when available rather than XORPS+MOVSS when the input ends up in a register rather than a load from memory -- BLENDPSrr has twice the reciprocal throughput of MOVSSrr. Don't you love this ISA? llvm-svn: 218177	2014-09-20 04:15:22 +00:00
Chandler Carruth	0fc0c22fa9	[x86] Fully generalize the zext lowering in the new vector shuffle lowering to support both anyext and zext and to custom lower for many different microarchitectures. Using this allows us to get exactly the right code for zext and anyext shuffles in all the vector sizes. For v16i8, the improvement is huge. The new SSE2 test case added I refused to add before this because it was sooooo many instructions. llvm-svn: 218143	2014-09-19 20:00:32 +00:00
Chandler Carruth	2e275142cd	[x86] Teach the new vector shuffle lowering to also use pmovzx for v4i32 shuffles that are zext-ing. Not a lot to see here; the undef lane variant is better handled with pshufd, but this improves the actual zext pattern. llvm-svn: 218112	2014-09-19 08:37:44 +00:00
Chandler Carruth	9057fcaf82	[x86] Use PALIGNR for v4i32 and v2i64 blends when appropriate. There is no purpose in using it for single-input shuffles as pshufd is just as fast and doesn't tie the two operands. This removes a substantial amount of wrong-domain blend operations in SSSE3 mode. It also completes the usage of PALIGNR for integer shuffles and addresses one of the test cases Quentin hit with the new vector shuffle lowering. There is still the question of whether and when to use this for floating point shuffles. It is faster than shufps or shufpd but in the integer domain. I don't yet really have a good heuristic here for when to use this instruction for floating point vectors. llvm-svn: 218038	2014-09-18 09:00:25 +00:00
Chandler Carruth	e0d77ef053	[x86] Add an SSSE3 run to the v4 shuffle test. llvm-svn: 218028	2014-09-18 04:38:32 +00:00
Chandler Carruth	00b1e0fc9d	[x86] Add an explicit SSE3 run to this test and flesh out a bunch of missing specific checks. While there is a lot of redundancy here where all-but-one mode use the same code generation, I'd rather have each variant spelled out and checked so that readers aren't misled by an omission in the test suite. llvm-svn: 217765	2014-09-15 11:40:20 +00:00
Chandler Carruth	12d4a70cbd	[x86] Teach the x86 DAG combiner to form UNPCKLPS and UNPCKHPS instructions from the relevant shuffle patterns. This is the last tweak I'm aware of to generate essentially perfect v4f32 and v2f64 shuffles with the new vector shuffle lowering up through SSE4.1. I'm sure I've missed some and it'd be nice to check since v4f32 is amenable to exhaustive exploration, but this is all of the tricks I'm aware of. With AVX there is a new trick to use the VPERMILPS instruction, that's coming up in a subsequent patch. llvm-svn: 217761	2014-09-15 11:26:25 +00:00
Chandler Carruth	41a25dd7ef	[x86] Teach the x86 DAG combiner to form MOVSLDUP and MOVSHDUP instructions when it finds an appropriate pattern. These are lovely instructions, and it's a shame to not use them. =] They are fast, and can have loads folded into their operands, etc. I've also plumbed the comment shuffle decoding through the various layers so that the test cases are printed nicely. llvm-svn: 217758	2014-09-15 11:15:23 +00:00
Chandler Carruth	0a98790b32	[x86] Teach the new vector shuffle lowering to use BLENDPS and BLENDPD. These are super simple. They even take precedence over crazy instructions like INSERTPS because they have very high throughput on modern x86 chips. I still have to teach the integer shuffle variants about this to avoid so many domain crossings. However, due to the particular instructions available, that's a touch more complex and so a separate patch. Also, the backend doesn't seem to realize it can commute blend instructions by negating the mask. That would help remove a number of copies here. Suggestions on how to do this welcome, it's an area I'm less familiar with. llvm-svn: 217744	2014-09-14 23:43:33 +00:00
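Several entries above (82e54871d0, fba9352f31, 024cf8efd7) concern recognising bitmask patterns of the form OR(AND(MASK,V0),AND(~MASK,V1)) that can be turned into shuffles. The underlying observation is that when every lane of MASK is either all-ones or all-zeros, the bit-select is just a per-lane choice between V0 and V1, which is a blend. The sketch below is a minimal Python illustration of that idea over plain lists of integers. It is not the DAGCombiner code, and the function names are hypothetical. It assumes the usual two-input shuffle mask convention, where index i picks lane i of V0 and index n + i picks lane i of V1.

```python
from typing import List, Optional

def bitmask_to_blend_mask(mask: List[int], elt_bits: int) -> Optional[List[int]]:
    """Return a two-input shuffle mask equivalent to
    OR(AND(mask, v0), AND(~mask, v1)), or None if some lane of `mask`
    is neither all-ones nor all-zeros (then it is a true bit-select)."""
    all_ones = (1 << elt_bits) - 1
    n = len(mask)
    shuffle: List[int] = []
    for i, m in enumerate(mask):
        if m == all_ones:
            shuffle.append(i)      # keep lane i of v0
        elif m == 0:
            shuffle.append(n + i)  # take lane i of v1
        else:
            return None            # partial-bit mask: not expressible as a blend
    return shuffle

def apply_bitmask(mask: List[int], v0: List[int], v1: List[int], elt_bits: int) -> List[int]:
    """Reference semantics of OR(AND(mask, v0), AND(~mask, v1)) per lane."""
    all_ones = (1 << elt_bits) - 1
    return [(m & a) | (~m & all_ones & b) for m, a, b in zip(mask, v0, v1)]

def apply_shuffle(shuffle: List[int], v0: List[int], v1: List[int]) -> List[int]:
    """Reference semantics of a two-input shuffle over concat(v0, v1)."""
    both = v0 + v1
    return [both[j] for j in shuffle]

# Lanes 0 and 3 come from v0, lanes 1 and 2 from v1.
mask = [0xFFFFFFFF, 0, 0, 0xFFFFFFFF]
print(bitmask_to_blend_mask(mask, 32))  # -> [0, 5, 6, 3]
```

Because the shuffle mask here never moves a lane, only chooses its source, the result is exactly what a blend instruction such as BLENDPS or PBLENDW can implement. A mask lane like 0xFF00 mixes bits from both inputs, so the sketch deliberately gives up on it.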


60 Commits
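Commit 7189084bef describes merging a single-input shuffle with another shuffle across an intervening bitcast by "shuffling using the smallest sized type". The sketch below is a minimal Python illustration of that idea, not the DAGCombiner implementation. It assumes single-input masks and uses -1 as the sentinel for an undefined lane. The helper names `narrow_mask`, `compose` and `apply` are hypothetical.

```python
from typing import List, Optional

UNDEF = -1  # assumed sentinel for an undefined lane in this sketch

def narrow_mask(mask: List[int], scale: int) -> List[int]:
    """Re-express a single-input shuffle mask over wide elements as a mask
    over elements `scale` times narrower (e.g. v2i64 -> v4i32, scale=2).
    Wide lane m covers narrow lanes m*scale .. m*scale + scale - 1."""
    out: List[int] = []
    for m in mask:
        for j in range(scale):
            out.append(UNDEF if m < 0 else m * scale + j)
    return out

def compose(outer: List[int], inner: List[int]) -> List[int]:
    """Fold outer(inner(x)) into one single-input mask; both masks must
    already be expressed in the same (narrow) element type."""
    return [UNDEF if o < 0 else inner[o] for o in outer]

def apply(mask: List[int], v: List[str]) -> List[Optional[str]]:
    """Reference semantics: lane i of the result is v[mask[i]] (None if undef)."""
    return [None if m < 0 else v[m] for m in mask]

# A v2i64 shuffle <1, 0> (swap halves), bitcast to v4i32, then a v4i32 shuffle <1, 0, 3, 2>.
inner = narrow_mask([1, 0], 2)          # [2, 3, 0, 1]
merged = compose([1, 0, 3, 2], inner)   # [3, 2, 1, 0]
x = ["a", "b", "c", "d"]                # x viewed as four 32-bit lanes
assert apply(merged, x) == apply([1, 0, 3, 2], apply(inner, x))
```

Narrowing is used as the common type because it always succeeds: every wide lane maps to a contiguous run of narrow lanes. Widening a mask only works when it moves aligned runs of lanes together, so it cannot serve as a general meeting point for the two shuffles.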