llvm-project

Commit Graph

Author	SHA1	Message	Date
Craig Topper	b70ca5060f	[X86] Teach LowerBUILD_VECTOR to recognize pair-wise splats of 32-bit elements and use a 64-bit broadcast If we are splatting pairs of 32-bit elements, we can use a 64-bit broadcast to get the job done. We could probably do this with other sizes too, for example four 16-bit elements. Or we could broadcast pairs of 16-bit elements using a 32-bit element broadcast. But I've left that as a future improvement. I've also restricted this to AVX2 only because we can only broadcast loads under AVX. Differential Revision: https://reviews.llvm.org/D42086 llvm-svn: 322730	2018-01-17 18:58:22 +00:00
Francis Visoiu Mistrih	25528d6de7	[CodeGen] Unify MBB reference format in both MIR and debug output As part of the unification of the debug format and the MIR format, print MBB references as '%bb.5'. The MIR printer prints the IR name of a MBB only for block definitions. * find . $ -name ".mir" -o -name ".cpp" -o -name ".h" -o -name ".ll" $ -type f -print0 \| xargs -0 sed -i '' -E 's/BB#" << ([a-zA-Z0-9_]+)->getNumber/" << printMBBReference(\1)/g' find . $ -name ".mir" -o -name ".cpp" -o -name ".h" -o -name ".ll" $ -type f -print0 \| xargs -0 sed -i '' -E 's/BB#" << ([a-zA-Z0-9_]+)\.getNumber/" << printMBBReference(\1)/g' * find . $ -name ".txt" -o -name ".s" -o -name ".mir" -o -name ".cpp" -o -name ".h" -o -name ".ll" $ -type f -print0 \| xargs -0 sed -i '' -E 's/BB#([0-9]+)/%bb.\1/g' * grep -nr 'BB#' and fix Differential Revision: https://reviews.llvm.org/D40422 llvm-svn: 319665	2017-12-04 17:18:51 +00:00
Craig Topper	a5af4a64d0	[AVX512] Don't mark EXTLOAD as legal with AVX512. Continue using custom lowering. Summary: This was impeding our ability to combine the extending shuffles with other shuffles as you can see from the test changes. There's one special case that needed to be added to use VZEXT directly for v8i8->v8i64 since the custom lowering requires v64i8. Reviewers: RKSimon, zvi, delena Reviewed By: delena Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D38714 llvm-svn: 315860	2017-10-15 16:41:17 +00:00
Craig Topper	bb0e316dc7	[X86] Add broadcast patterns that allow a scalar_to_vector between the broadcast and the load. We already have these patterns for AVX512VL, but not AVX1 or 2. llvm-svn: 315382	2017-10-10 22:40:31 +00:00
Reid Kleckner	ab23dace56	[MC] Suppress .Lcfi labels when emitting textual assembly Summary: This suppresses the generation of .Lcfi labels in our textual assembler. It was annoying that this generated cascading .Lcfi labels: llc foo.ll -o - \| llvm-mc \| llvm-mc After three trips through MCAsmStreamer, we'd have three labels in the output when none are necessary. We should only bother creating the labels and frame data when making a real object file. This supercedes D38605, which moved the entire .seh_ implementation into MCObjectStreamer. This has the advantage that we do more checking when emitting textual assembly, as a minor efficiency cost. Outputting textual assembly is not performance critical, so this shouldn't matter. Reviewers: majnemer, MatzeB Subscribers: qcolombet, nemanjai, javed.absar, eraman, hiraditya, JDevlieghere, llvm-commits Differential Revision: https://reviews.llvm.org/D38638 llvm-svn: 315259	2017-10-10 00:57:36 +00:00
Simon Pilgrim	cf99d069c3	[X86][SSE] Add support for decoding PACKSS/PACKUS shuffles masks with UNDEF llvm-svn: 314792	2017-10-03 12:41:39 +00:00
Simon Pilgrim	f5f291d129	[X86][SSE] Add support for lowering shuffles to PACKSS/PACKUS If the upper bits of a truncation shuffle patterns have at least the minimum number of sign/zero bits on their inputs then we can safely use PACKSS/PACKUS as shuffles. Partial fix for https://bugs.llvm.org/show_bug.cgi?id=34773 Differential Revision: https://reviews.llvm.org/D38472 llvm-svn: 314788	2017-10-03 12:01:31 +00:00
Simon Pilgrim	037f6d10d5	Regenerate test. NFCI. llvm-svn: 314679	2017-10-02 15:16:30 +00:00
Nikolai Bozhenov	84af99b3b1	[X86FixupBWInsts] More precise register liveness if no <imp-use> on MOVs. Summary: Subregister liveness tracking is not implemented for X86 backend, so sometimes the whole super register is said to be live, when only a subregister is really live. That might happen if the def and the use are located in different MBBs, see added fixup-bw-isnt.mir test. However, using knowledge of the specific instructions handled by the bw-fixup-pass we can get more precise liveness information which this change does. Reviewers: MatzeB, DavidKreitzer, ab, andrew.w.kaylor, craig.topper Reviewed By: craig.topper Subscribers: n.bozhenov, myatsina, llvm-commits, hiraditya Patch by Andrei Elovikov <andrei.elovikov@intel.com> Differential Revision: https://reviews.llvm.org/D37559 llvm-svn: 313524	2017-09-18 10:17:59 +00:00
Dinar Temirbulatov	a0beedef1c	[X86] SET0 to use XMM registers where possible PR26018 PR32862 Differential Revision: https://reviews.llvm.org/D35965 llvm-svn: 309926	2017-08-03 08:50:18 +00:00
Dinar Temirbulatov	aead31a36f	[X86] SET0 to use XMM registers where possible PR26018 PR32862 Differential Revision: https://reviews.llvm.org/D35839 llvm-svn: 309298	2017-07-27 17:47:01 +00:00
Simon Pilgrim	c402839c72	[X86][AVX2] Regenerated and cleaned up broadcast tests. llvm-svn: 309099	2017-07-26 10:47:51 +00:00
Craig Topper	ad140cfb68	[X86] Add comment string for broadcast loads from the constant pool. Summary: When broadcasting from the constant pool its useful to print out the final vector similar to what we do for normal moves from the constant pool. I changed only a couple tests that were broadcast focused. One of them had been previously hand tweaked after running the script so that it could check the constant pool declaration. But I think this patch makes that unnecessary now since we can check the comment instead. Reviewers: spatel, RKSimon, zvi Reviewed By: spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D34923 llvm-svn: 307062	2017-07-04 05:46:11 +00:00
Sanjay Patel	44e3d4c812	[x86] adjust test constants to maintain coverage; NFC Increment (add 1) could be transformed to sub -1, and we'd lose coverage for these patterns. llvm-svn: 305646	2017-06-18 14:45:23 +00:00
Michael Kuperstein	6129887d21	[X86] Revert r299387 due to AVX legalization infinite loop. llvm-svn: 299720	2017-04-06 22:33:25 +00:00
Simon Pilgrim	af33757b5d	[X86][SSE] Lower BUILD_VECTOR with repeated elts as BUILD_VECTOR + VECTOR_SHUFFLE It can be costly to transfer from the gprs to the xmm registers and can prevent loads merging. This patch splits vXi16/vXi32/vXi64 BUILD_VECTORS that use the same operand in multiple elements into a BUILD_VECTOR with only a single insertion of each of those elements and then performs an unary shuffle to duplicate the values. There are a couple of minor regressions this patch unearths due to some missing MOVDDUP/BROADCAST folds that I will address in a future patch. Note: Now that vector shuffle lowering and combining is pretty good we should be reusing that instead of duplicating so much in LowerBUILD_VECTOR - this is the first of several patches to address this. Differential Revision: https://reviews.llvm.org/D31373 llvm-svn: 299387	2017-04-03 21:06:51 +00:00
Simon Pilgrim	1d8235a022	Regenerate tests to remove duplicated checks llvm-svn: 298801	2017-03-26 10:28:39 +00:00
Amjad Aboud	4f97751798	[X86] Generate VZEROUPPER for Skylake-avx512. VZEROUPPER should not be issued on Knights Landing (KNL), but on Skylake-avx512 it should be. Differential Revision: https://reviews.llvm.org/D29874 llvm-svn: 296859	2017-03-03 09:03:24 +00:00
Craig Topper	fe25988c68	[AVX-512] Fix the execution domain for AVX-512 integer broadcasts. llvm-svn: 296290	2017-02-26 06:45:51 +00:00
Craig Topper	fa875a1d3d	[AVX-512] Teach EVEX to VEX conversion pass to handle VINSERT and VEXTRACT instructions. llvm-svn: 290869	2017-01-03 05:46:18 +00:00
Craig Topper	15d116ab41	[AVX-512] Re-generate tests that were updated for r290663 without using update_llc_test_checks.py so duplicate check lines weren't merged. llvm-svn: 290868	2017-01-03 05:46:10 +00:00
Gadi Haber	19c4fc5e62	This is a large patch for X86 AVX-512 of an optimization for reducing code size by encoding EVEX AVX-512 instructions using the shorter VEX encoding when possible. There are cases of AVX-512 instructions that have two possible encodings. This is the case with instructions that use vector registers with low indexes of 0 - 15 and do not use the zmm registers or the mask k registers. The EVEX encoding prefix requires 4 bytes whereas the VEX prefix can take only up to 3 bytes. Consequently, using the VEX encoding for these instructions results in a code size reduction of ~2 bytes even though it is compiled with the AVX-512 features enabled. Reviewers: Craig Topper, Zvi Rackoover, Elena Demikhovsky Differential Revision: https://reviews.llvm.org/D27901 llvm-svn: 290663	2016-12-28 10:12:48 +00:00
Simon Pilgrim	d7518896ff	[X86][SSE] Fix domains for VZEXT_LOAD type instructions Add the missing domain equivalences for movss, movsd, movd and movq zero extending loading instructions. Differential Revision: https://reviews.llvm.org/D27684 llvm-svn: 289825	2016-12-15 16:05:29 +00:00
Simon Pilgrim	8893bd95f0	[X86][SSE] Consistently set MOVD/MOVQ load/store/move instructions to integer domain We are being inconsistent with these instructions (and all their variants.....) with a random mix of them using the default float domain. Differential Revision: https://reviews.llvm.org/D27419 llvm-svn: 288902	2016-12-07 12:10:49 +00:00
Matthias Braun	39c3c89cdc	MCStreamer: Use "cfi" for CFI related temp labels. Choosing a "cfi" name makes the intend a bit clearer in an assembly dump and more importantly the assembly dumps are slightly more stable as the numbers don't move around anymore when unrelated code calls createTempSymbol() more or less often. As they are temp labels the name doesn't influence the generated object code. Differential Revision: https://reviews.llvm.org/D27244 llvm-svn: 288290	2016-11-30 23:48:26 +00:00
Simon Pilgrim	2228f70a85	[X86][SSE] Add initial support for combining (V)PMOVZX with shuffles. llvm-svn: 288049	2016-11-28 17:58:19 +00:00
Craig Topper	5eb5ade894	[X86] Cleanup patterns for using VMOVDDUP for broadcasts. -Remove OptForSize. Not all of the backend follows the same rules for creating broadcasts and there is no conflicting pattern. -Don't stop selecting VEX VMOVDDUP when AVX512 is supported. We need VLX for EVEX VMOVDDUP. -Only use VMOVDDUP for v2i64 broadcasts if AVX2 is not supported. llvm-svn: 283020	2016-10-01 07:11:24 +00:00
Craig Topper	f91830e6ee	[X86] Remove extra FileCheck lines that got left behind in r282688. llvm-svn: 282689	2016-09-29 06:07:07 +00:00
Craig Topper	7eb0e7ce1f	[AVX-512] Replicate pattern from AVX to select VMOVDDUP for (v2f64 (X86VBroadcast f64:)). Add AVX512VL to command line of existing AVX2 test that hits this condition. llvm-svn: 282688	2016-09-29 05:54:43 +00:00
Simon Pilgrim	7a50c8c2ba	[X86][AVX2] Ensure on 32-bit targets that we broadcast f64 types not i64 (PR29101) llvm-svn: 279622	2016-08-24 12:42:31 +00:00
Simon Pilgrim	5f71c909f0	[X86][AVX] Peek through bitcasts to find the source of broadcasts (reapplied) AVX1 can only broadcast vectors as floats/doubles, so for 256-bit vectors we insert bitcasts if we are shuffling v8i32/v4i64 types. Unfortunately the presence of these bitcasts prevents the current broadcast lowering code from peeking through cases where we have concatenated / extracted vectors to create the 256-bit vectors. This patch allows us to peek through bitcasts as long as the number of elements doesn't change (i.e. element bitwidth is the same) so the broadcast index is not affected. Note this bitcast peek is different from the stage later on which doesn't care about the type and is just trying to find a load node. As we're being more aggressive with bitcasts, we also need to ensure that the broadcast type is correctly bitcasted Differential Revision: http://reviews.llvm.org/D21660 llvm-svn: 274013	2016-06-28 13:24:05 +00:00
Simon Pilgrim	c02b72627a	[X86][SSE] Lower 128-bit MOVDDUP with existing VBROADCAST mechanisms We have a number of useful lowering strategies for VBROADCAST instructions (both from memory and register element 0) which the 128-bit form of the MOVDDUP instruction can make use of. This patch tweaks lowerVectorShuffleAsBroadcast to enable it to broadcast 2f64 args using MOVDDUP as well. It does require a slight tweak to the lowerVectorShuffleAsBroadcast mechanism as the existing MOVDDUP lowering uses isShuffleEquivalent which can match binary shuffles that can lower to (unary) broadcasts. Differential Revision: http://reviews.llvm.org/D17680 llvm-svn: 262478	2016-03-02 11:43:05 +00:00
Dan Gohman	61d15ae4f5	[MC] Use .p2align instead of .align For historic reasons, the behavior of .align differs between targets. Fortunately, there are alternatives, .p2align and .balign, which make the interpretation of the parameter explicit, and which behave consistently across targets. This patch teaches MC to use .p2align instead of .align, so that people reading code for multiple architectures don't have to remember which way each platform does its .align directive. Differential Revision: http://reviews.llvm.org/D16549 llvm-svn: 258750	2016-01-26 00:03:25 +00:00
Simon Pilgrim	2e7a1849c9	[X86][AVX] Add support for i64 broadcast loads on 32-bit targets Added 32-bit AVX1/AVX2 broadcast tests. llvm-svn: 257264	2016-01-09 19:59:27 +00:00
Simon Pilgrim	323e00d9c7	[X86][AVX] Fold loads + splats into broadcast instructions On AVX and AVX2, BROADCAST instructions can load a scalar into all elements of a target vector. This patch improves the lowering of 'splat' shuffles of a loaded vector into a broadcast - currently the lowering only works for cases where we are splatting the zero'th element, which is now generalised to any element. Fix for PR23022 Differential Revision: http://reviews.llvm.org/D15310 llvm-svn: 255061	2015-12-08 22:17:11 +00:00
Simon Pilgrim	69aa463780	Fix line endings llvm-svn: 254939	2015-12-07 20:36:00 +00:00
Simon Pilgrim	12301b0814	[X86][AVX] Added tests to load+broadcast non-zero'th vector elements Baseline for an upcoming patch for PR23022 llvm-svn: 254898	2015-12-07 09:09:54 +00:00
Simon Pilgrim	29412ee45f	[X86][AVX2] Tidied up PBROADCAST tests Tidied up triple and regenerate tests using update_llc_test_checks.py llvm-svn: 254231	2015-11-28 14:15:40 +00:00
David Blaikie	a79ac14fa6	[opaque pointer type] Add textual IR support for explicit type parameter to load instruction Essentially the same as the GEP change in r230786. A similar migration script can be used to update test cases, though a few more test case improvements/changes were required this time around: (r229269-r229278) import fileinput import sys import re pat = re.compile(r"((?:=\|:\|^)\sload (?:atomic )?(?:volatile )?(.?))(\| addrspace$\d+$ )\($\| (?:%\|@\|null\|undef\|blockaddress\|getelementptr\|addrspacecast\|bitcast\|inttoptr\|\[\[[a-zA-Z]\|\{\{).$)") for line in sys.stdin: sys.stdout.write(re.sub(pat, r"\1, \2\3*\4", line)) Reviewers: rafael, dexonsmith, grosser Differential Revision: http://reviews.llvm.org/D7649 llvm-svn: 230794	2015-02-27 21:17:42 +00:00
Simon Pilgrim	106abe47d6	Line endings fix. NFC. llvm-svn: 227138	2015-01-26 21:28:32 +00:00
Simon Pilgrim	b16b09b154	[X86][SSE] Added support for SSE3 lane duplication shuffle instructions This patch adds shuffle matching for the SSE3 MOVDDUP, MOVSLDUP and MOVSHDUP instructions. The big use of these being that they avoid many single source shuffles from needing to use (pre-AVX) dual source instructions such as SHUFPD/SHUFPS: causing extra moves and preventing load folds. Adding these instructions uncovered an issue in XFormVExtractWithShuffleIntoLoad which crashed on single operand shuffle instructions (now fixed). It also involved fixing getTargetShuffleMask to correctly identify theses instructions as unary shuffles. Also adds a missing tablegen pattern for MOVDDUP. Differential Revision: http://reviews.llvm.org/D7042 llvm-svn: 226716	2015-01-21 22:44:35 +00:00
Chandler Carruth	99627bfbff	[x86] Enable the new vector shuffle lowering by default. Update the entire regression test suite for the new shuffles. Remove most of the old testing which was devoted to the old shuffle lowering path and is no longer relevant really. Also remove a few other random tests that only really exercised shuffles and only incidently or without any interesting aspects to them. Benchmarking that I have done shows a few small regressions with this on LNT, zero measurable regressions on real, large applications, and for several benchmarks where the loop vectorizer fires in the hot path it shows 5% to 40% improvements for SSE2 and SSE3 code running on Sandy Bridge machines. Running on AMD machines shows even more dramatic improvements. When using newer ISA vector extensions the gains are much more modest, but the code is still better on the whole. There are a few regressions being tracked (PR21137, PR21138, PR21139) but by and large this is expected to be a win for x86 generated code performance. It is also more correct than the code it replaces. I have fuzz tested this extensively with ISA extensions up through AVX2 and found no crashes or miscompiles (yet...). The old lowering had a few miscompiles and crashers after a somewhat smaller amount of fuzz testing. There is one significant area where the new code path lags behind and that is in AVX-512 support. However, there was extremely little support for that already and so this isn't a significant step backwards and the new framework will probably make it easier to implement lowering that uses the full power of AVX-512's table-based shuffle+blend (IMO). Many thanks to Quentin, Andrea, Robert, and others for benchmarking assistance. Thanks to Adam and others for help with AVX-512. Thanks to Hal, Eric, and many others for answering my incessant questions about how the backend actually works. =] I will leave the old code path in the tree until the 3 PRs above are at least resolved to folks' satisfaction. Then I will rip it (and 1000s of lines of code) out. =] I don't expect this flag to stay around for very long. It may not survive next week. llvm-svn: 219046	2014-10-04 03:52:55 +00:00
Quentin Colombet	6f12ae0d5c	[X86] Add broadcast instructions to the table used by ExeDepsFix pass. Adds the different broadcast instructions to the ReplaceableInstrsAVX2 table. That way the ExeDepsFix pass can take better decisions when AVX2 broadcasts are across domain (int <-> float). In particular, prior to this patch we were generating: vpbroadcastd LCPI1_0(%rip), %ymm2 vpand %ymm2, %ymm0, %ymm0 vmaxps %ymm1, %ymm0, %ymm0 ## <- domain change penalty Now, we generate the following nice sequence where everything is in the float domain: vbroadcastss LCPI1_0(%rip), %ymm2 vandps %ymm2, %ymm0, %ymm0 vmaxps %ymm1, %ymm0, %ymm0 <rdar://problem/16354675> llvm-svn: 204770	2014-03-26 00:10:22 +00:00
Quentin Colombet	2d5c156b96	[X86][ISelDAG] Add missing fallback patterns for avx2 broadcast instructions. Those patterns are used when the load cannot be folded into the related broadcast during the select phase. This happens when the load gets additional uses that were not anticipated during the previous lowering phases (constant vector to constant load, then constant load reused) or when selection DAG is not able to prove that folding the load will not create a cycle in the DAG. <rdar://problem/16074331> llvm-svn: 204631	2014-03-24 17:54:19 +00:00
Robert Lougher	7d9084ffa1	Teach the DAGCombiner how to fold concat_vector nodes when the input is two BUILD_VECTOR nodes, e.g.: (concat_vectors (BUILD_VECTOR a1, a2, a3, a4), (BUILD_VECTOR b1, b2, b3, b4)) -> (BUILD_VECTOR a1, a2, a3, a4, b1, b2, b3, b4) This fixes an issue with AVX, where a sequence was not recognized as a 256-bit vbroadcast due to the concat_vectors. llvm-svn: 201158	2014-02-11 15:42:46 +00:00
Stephen Lin	6f36b45076	Update to more CodeGen tests to use CHECK-LABEL for labels corresponding to function definitions for more informative error messages. No functionality change. All changes were made by the following bash script: find test/CodeGen -name ".ll" \| \ while read NAME; do echo "$NAME" grep -q "^; RUN: llc.debug" $NAME && continue grep -q "^; RUN:.llvm-objdump" $NAME && continue grep -q "^; RUN: opt." $NAME && continue TEMP=`mktemp -t temp` cp $NAME $TEMP sed -n "s/^define [^@]@$[A-Za-z0-9_]$(.$/\1/p" < $NAME \| \ while read FUNC; do sed -i '' "s/;$[A-Za-z0-9_-]$$[A-Za-z0-9_-]$:$ $$FUNC[:] \$/;\1\2-LABEL:\3$FUNC:/g" $TEMP done sed -i '' "s/;$.$-LABEL-LABEL:/;\1-LABEL:/" $TEMP sed -i '' "s/;$.$-NEXT-LABEL:/;\1-NEXT:/" $TEMP sed -i '' "s/;$.$-NOT-LABEL:/;\1-NOT:/" $TEMP sed -i '' "s/;$.*$-DAG-LABEL:/;\1-DAG:/" $TEMP mv $TEMP $NAME done This script catches a superset of the cases caught by the script associated with commit r186280. It initially found some false positives due to unusual constructs in a minority of tests; all such cases were disambiguated first in commit r186621. llvm-svn: 186624	2013-07-18 22:47:09 +00:00
Elena Demikhovsky	9af899fa88	Optimization of shuffle node that can fit to the register form of VBROADCAST instruction on AVX2. llvm-svn: 159504	2012-07-01 06:12:26 +00:00
Nadav Rotem	900c7cb7ce	Add support for additional in-reg vbroadcast patterns llvm-svn: 157127	2012-05-19 19:57:37 +00:00
Nadav Rotem	aa3ff8da00	AVX: We lower VECTOR_SHUFFLE and BUILD_VECTOR nodes into vbroadcast instructions using the pattern (vbroadcast (i32load src)). In some cases, after we generate this pattern new users are added to the load node, which prevent the selection of the blend pattern. This commit provides fallback patterns which perform in-vector broadcast (using in-vector vbroadcast in AVX2 and pshufd on AVX1). llvm-svn: 155437	2012-04-24 11:07:03 +00:00
Nadav Rotem	b801ca3976	Fix a bug in the lowering of broadcasts: ConstantPools need to use the target pointer type. Move NormalizeVectorShuffle and LowerVectorBroadcast into X86TargetLowering. llvm-svn: 154310	2012-04-09 07:45:58 +00:00

53 Commits