llvm-project

Commit Graph

Author	SHA1	Message	Date
Craig Topper	42fc7852f5	[X86] Print k-mask in FMA3 comments.	2020-04-12 13:16:53 -07:00
Simon Pilgrim	7e9747b50b	[X86][F16C] Remove cvtph2ps intrinsics and use generic half2float conversion (PR37554) This removes everything but int_x86_avx512_mask_vcvtph2ps_512 which provides the SAE variant, but even this can use the fpext generic if the rounding control is the default. Differential Revision: https://reviews.llvm.org/D75162	2020-02-29 18:57:35 +00:00
Craig Topper	57eb56b839	[X86] Swap the 0 and the fudge factor in the constant pool for the 32-bit mode i64->f32/f64/f80 uint_to_fp algorithm. This allows us to generate better code for selecting the fixup to load. Previously when the sign was set we had to load offset 0. And when it was clear we had to load offset 4. This required a testl, setns, zero extend, and finally a mul by 4. By switching the offsets we can just shift the sign bit into the lsb and multiply it by 4.	2020-01-14 17:05:23 -08:00
Simon Pilgrim	31ed36d044	[X86] SimplifyDemandedVectorElts - attempt to recombine target shuffle using DemandedElts mask (REAPPLIED) If we don't demand all elements, then attempt to combine to a simpler shuffle. At the moment we can only do this if Depth == 0 as combineX86ShufflesRecursively uses Depth to track whether the shuffle has really changed or not - we'll need to change this before we can properly start merging combineX86ShufflesRecursively into SimplifyDemandedVectorElts (see D66004). This reapplies rL368307 (reverted at rL369167) after the fix for the infinite loop reported at PR43024 was applied at rG3f087e38a2e7b87a5adaaac1c1b61e51220e7ff3	2019-11-04 11:37:57 +00:00
Simon Pilgrim	6ada70d1b5	[X86][SSE] LowerUINT_TO_FP_i64 - only use HADDPD for size/fast-hops We were always generating a single source HADDPD, but really we should only do this if shouldUseHorizontalOp says its a good idea. Differential Revision: https://reviews.llvm.org/D69175 llvm-svn: 375341	2019-10-19 11:53:48 +00:00
Jordan Rupprecht	d0797ece46	Revert [X86] SimplifyDemandedVectorElts - attempt to recombine target shuffle using DemandedElts mask (reapplied) This reverts r368662 (git commit `1a8d790cf5`) The compile-time regression repro is in https://bugs.llvm.org/show_bug.cgi?id=43024 llvm-svn: 369167	2019-08-16 23:08:56 +00:00
Simon Pilgrim	1a8d790cf5	[X86] SimplifyDemandedVectorElts - attempt to recombine target shuffle using DemandedElts mask (reapplied) If we don't demand all elements, then attempt to combine to a simpler shuffle. At the moment we can only do this if Depth == 0 as combineX86ShufflesRecursively uses Depth to track whether the shuffle has really changed or not - we'll need to change this before we can properly start merging combineX86ShufflesRecursively into SimplifyDemandedVectorElts. The insertps-combine.ll regression is because XFormVExtractWithShuffleIntoLoad can't see through shuffles of different widths - this will be fixed in a follow-up commit. Reapplying this as rL368307 had to be reverted as part of rL368660 to revert rL368276 llvm-svn: 368662	2019-08-13 10:51:39 +00:00
Hans Wennborg	5390d25f2b	Revert r368276 "[TargetLowering] SimplifyDemandedBits - call SimplifyMultipleUseDemandedBits for ISD::EXTRACT_VECTOR_ELT" This introduced a false positive MemorySanitizer warning about use of uninitialized memory in a vectorized crc function in Chromium. That suggests maybe something is not right with this transformation. See https://crbug.com/992853#c7 for a reproducer. This also reverts the follow-up commits r368307 and r368308 which depended on this. > This patch attempts to peek through vectors based on the demanded bits/elt of a particular ISD::EXTRACT_VECTOR_ELT node, allowing us to avoid dependencies on ops that have no impact on the extract. > > In particular this helps remove some unnecessary scalar->vector->scalar patterns. > > The wasm shift patterns are annoying - @tlively has indicated that the wasm vector shift codegen are to be refactored in the near-term and isn't considered a major issue. > > Differential Revision: https://reviews.llvm.org/D65887 llvm-svn: 368660	2019-08-13 09:33:25 +00:00
Simon Pilgrim	67c246bbe6	[X86] SimplifyDemandedVectorElts - attempt to recombine target shuffle using DemandedElts mask If we don't demand all elements, then attempt to combine to a simpler shuffle. At the moment we can only do this if Depth == 0 as combineX86ShufflesRecursively uses Depth to track whether the shuffle has really changed or not - we'll need to change this before we can properly start merging combineX86ShufflesRecursively into SimplifyDemandedVectorElts. The insertps-combine.ll regression is because XFormVExtractWithShuffleIntoLoad can't see through shuffles of different widths - this will be fixed in a follow-up commit. llvm-svn: 368307	2019-08-08 15:54:20 +00:00
Craig Topper	033774e144	[X86] Cleanups and safety checks around the isFNEG This patch does a few things to start cleaning up the isFNEG function. -Remove the Op0/Op1 peekThroughBitcast calls that seem unnecessary. getTargetConstantBitsFromNode has its own peekThroughBitcast inside. And we have a separate peekThroughBitcast on the return value. -Add a check of the scalar size after the first peekThroughBitcast to ensure we haven't changed the element size and just did something like f32->i32 or f64->i64. -Remove an unnecessary check that Op1's type is floating point after the peekThroughBitcast. We're just going to look for a bit pattern from a constant. We don't care about its type. -Add VT checks on several places that consume the return value of isFNEG. Due to the peekThroughBitcasts inside, the type of the return value isn't guaranteed. So its not safe to use it to build other nodes without ensuring the type matches the type being used to build the node. We might be able to replace these checks with bitcasts instead, but I don't have a test case so a bail out check seemed better for now. Differential Revision: https://reviews.llvm.org/D63683 llvm-svn: 364206	2019-06-24 17:28:26 +00:00
Cameron McInally	8608afa964	Revert "[NFC][CodeGen] Add unary FNeg tests to X86/avx512-intrinsics-fast-isel.ll" This reverts commit `41e0b9f280`. llvm-svn: 363301	2019-06-13 19:24:24 +00:00
Cameron McInally	675be5db46	Revert "[NFC][CodeGen] Add unary FNeg tests to X86/avx512-intrinsics-fast-isel.ll" This reverts commit `aeb89f8b33`. llvm-svn: 363300	2019-06-13 19:24:21 +00:00
Cameron McInally	aeb89f8b33	[NFC][CodeGen] Add unary FNeg tests to X86/avx512-intrinsics-fast-isel.ll Patch 2 of n. llvm-svn: 363275	2019-06-13 15:54:20 +00:00
Cameron McInally	41e0b9f280	[NFC][CodeGen] Add unary FNeg tests to X86/avx512-intrinsics-fast-isel.ll Patch 1 of n. llvm-svn: 363215	2019-06-12 22:50:44 +00:00
Craig Topper	d10a200ceb	[X86] Remove the suffix on vcvt[u]si2ss/sd register variants in assembly printing. We require d/q suffixes on the memory form of these instructions to disambiguate the memory size. We don't require it on the register forms, but need to support parsing both with and without it. Previously we always printed the d/q suffix on the register forms, but it's redundant and inconsistent with gcc and objdump. After this patch we should support the d/q for parsing, but not print it when its unneeded. llvm-svn: 360085	2019-05-06 21:39:51 +00:00
Craig Topper	df02beb416	[X86] Add the rounding control operand to the printing for some scalar FMA instructions. llvm-svn: 358844	2019-04-21 07:12:56 +00:00
Sanjay Patel	26e06e859e	[x86] add x86-specific opcodes to extractelement scalarization list llvm-svn: 355792	2019-03-10 18:56:21 +00:00
Sanjay Patel	7fc6ef7dd7	[x86] scalarize extract element 0 of FP math This is another step towards ensuring that we produce the optimal code for reductions, but there are other potential benefits as seen in the tests diffs: 1. Memory loads may get scalarized resulting in more efficient code. 2. Memory stores may get scalarized resulting in more efficient code. 3. Complex ops like fdiv/sqrt get scalarized which may be faster instructions depending on uarch. 4. Even simple ops like addss/subss/mulss/roundss may result in faster operation/less frequency throttling when scalarized depending on uarch. The TODO comment suggests 1 or more follow-ups for opcodes that can currently result in regressions. Differential Revision: https://reviews.llvm.org/D58282 llvm-svn: 355130	2019-02-28 19:47:04 +00:00
Craig Topper	5d20eb240f	[X86] Disable DomainReassignment pass when AVX512BW is disabled to avoid injecting VK32/VK64 references into the MachineIR Summary: This pass replaces GR8/GR16/GR32/GR64 with their equivalent sized mask register classes. But VK32/VK64 aren't legal without AVX512BW. Apparently this mostly appears to work if the register coalescer is able to remove the VK32/VK64 register class reference. Or if we don't ever spill it. But there's no guarantee of that. Another Intel employee managed to trigger a crash due to this with ISPC. Unfortunately, I've lost the test case he sent me at the time. I'm trying to get him to reproduce it for me. I'd like to get this in before 8.0 branches since its a little scary. The regressions here are unfortunate, but I think we can make some improvements to DAG combine, load folding, etc. to fix them. Just not sure if we can get that done for 8.0. Fixes PR39741 Reviewers: RKSimon, spatel Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D56460 llvm-svn: 350800	2019-01-10 07:43:54 +00:00
Craig Topper	0229da8f07	[X86] Use GetDemandedBits to simplify the operands of PMULDQ/PMULUDQ. This is an alternative to what I attempted in D56057. GetDemandedBits is a special version of SimplifyDemandedBits that allows simplifications even when the operand has other uses. GetDemandedBits will only do simplifications that allow a node to be bypassed. It won't create new nodes or alter any of the other users. I had to add support for bypassing SIGN_EXTEND_INREG to GetDemandedBits. Based on a patch that Simon Pilgrim sent me in email. Fixes PR40142. llvm-svn: 350059	2018-12-24 19:40:20 +00:00
Simon Pilgrim	2a25360ae3	[X86] Auto upgrade XOP/AVX512 rotation intrinsics to generic funnel shift intrinsics (llvm) This emits FSHL/FSHR generic intrinsics for the XOP VPROT and AVX512 VPROL/VPROR rotation intrinsics. Clang counterpart: https://reviews.llvm.org/D55937 Differential Revision: https://reviews.llvm.org/D55938 llvm-svn: 349795	2018-12-20 19:01:07 +00:00
Craig Topper	aa5eb2fbaa	[X86] Force floating point values in constant pool decoding to print in scientific notation so they can't be confused with integers. When the floating point constants are whole numbers they have no decimal point so look like integers, but mean something very different in something like an 'and' instruction. Ideally we would just print a decimal point and a 0, but I couldn't see how to make APFloat::toString do that. llvm-svn: 345488	2018-10-29 04:52:04 +00:00
Craig Topper	8315d9990c	[X86] Stop promoting vector and/or/xor/andn to vXi64. These promotions add additional bitcasts to the SelectionDAG that can pessimize computeKnownBits/computeNumSignBits. It also seems to interfere with broadcast formation. This patch removes the promotion and adds isel patterns instead. The increased table size is more than I would like, but hopefully we can find some canonicalizations or other tricks to start pruning out patterns going forward. Differential Revision: https://reviews.llvm.org/D53268 llvm-svn: 345408	2018-10-26 17:21:26 +00:00
Sanjay Patel	4cf1da0e02	[SelectionDAG] allow FP binops in SimplifyDemandedVectorElts This is intended to make the backend on par with functionality that was added to the IR version of SimplifyDemandedVectorElts in: rL343727 ...and the original motivation is that we need to improve demanded-vector-elements in several ways to avoid problems that would be exposed in D51553. Differential Revision: https://reviews.llvm.org/D52912 llvm-svn: 344541	2018-10-15 18:05:34 +00:00
Sanjay Patel	e28c8ecd72	[x86] add and use fast horizontal vector math subtarget feature This is the planned follow-up to D52997. Here we are reducing horizontal vector math codegen by default. AMD Jaguar (btver2) should have no difference with this patch because it has fast-hops. (If we want to set that bit for other CPUs, let me know.) The code changes are small, but there are many test diffs. For files that are specifically testing for hops, I added RUNs to distinguish fast/slow, so we can see the consequences side-by-side. For files that are primarily concerned with codegen other than hops, I just updated the CHECK lines to reflect the new default codegen. To recap the recent horizontal op story: 1. Before rL343727, we were producing hops for all subtargets for a variety of patterns. Hops were likely not optimal for all targets though. 2. The IR improvement in r343727 exposed a hole in the backend hop pattern matching, so we reduced hop codegen for all subtargets. That was bad for Jaguar (PR39195). 3. We restored the hop codegen for all targets with rL344141. Good for Jaguar, but probably bad for other CPUs. 4. This patch allows us to distinguish when we want to produce hops, so everyone can be happy. I'm not sure if we have the best predicate here, but the intent is to undo the extra hop-iness that was enabled by r344141. Differential Revision: https://reviews.llvm.org/D53095 llvm-svn: 344361	2018-10-12 16:41:02 +00:00
Sanjay Patel	6cca8af227	[x86] allow single source horizontal op matching (PR39195) This is intended to restore horizontal codegen to what it looked like before IR demanded elements improved in: rL343727 As noted in PR39195: https://bugs.llvm.org/show_bug.cgi?id=39195 ...horizontal ops can be worse for performance than a shuffle+regular binop, so I've added a TODO. Ideally, we'd solve that in a machine instruction pass, but a quicker solution will be adding a 'HasFastHorizontalOp' feature bit to deal with it here in the DAG. Differential Revision: https://reviews.llvm.org/D52997 llvm-svn: 344141	2018-10-10 13:39:59 +00:00
Simon Pilgrim	3b04a4e322	[SelectionDAG] Respect multiple uses in SimplifyDemandedBits to SimplifyDemandedVectorElts simplification rL343913 was using SimplifyDemandedBits's original demanded mask instead of the adjusted 'NewMask' that accounts for multiple uses of the op (those variable names really need improving....). Annoyingly many of the test changes (back to pre-rL343913 state) are actually safe - but only because their multiple uses are all by PMULDQ/PMULUDQ. Thanks to Jan Vesely (@jvesely) for bisecting the bug. llvm-svn: 343935	2018-10-07 11:45:46 +00:00
Simon Pilgrim	9c9c97bcf4	[SelectionDAG] Add SimplifyDemandedBits to SimplifyDemandedVectorElts simplification This patch enables SimplifyDemandedBits to call SimplifyDemandedVectorElts in cases where the demanded bits mask covers entire elements of a bitcasted source vector. There are a couple of cases here where simplification at a deeper level (such as through bitcasts) prevents further simplification - CommitTargetLoweringOpt only adds immediate uses/users back to the worklist when we might want to combine the original caller again to see what else it can simplify. As well as that I had to disable handling of bool vector until SimplifyDemandedVectorElts better supports some of their opcodes (SETCC, shifts etc.). Fixes PR39178 Differential Revision: https://reviews.llvm.org/D52935 llvm-svn: 343913	2018-10-06 10:20:04 +00:00
Simon Pilgrim	01ae462fef	[X86][SSE] Combine (some) target shuffles with multiple uses As discussed on D41794, we have many cases where we fail to combine shuffles as the input operands have other uses. This patch permits these shuffles to be combined as long as they don't introduce additional variable shuffle masks, which should reduce instruction dependencies and allow the total number of shuffles to still drop without increasing the constant pool. However, this may mean that some memory folds may no longer occur, and on pre-AVX require the occasional extra register move. This also exposes some poor PMULDQ/PMULUDQ codegen which was doing unnecessary upper/lower calculations which will in fact fold to zero/undef - the fix will be added in a followup commit. Differential Revision: https://reviews.llvm.org/D50328 llvm-svn: 339335	2018-08-09 12:30:02 +00:00
Craig Topper	a7a12399a1	[X86] Remove all the vector NOP bitcast patterns. Use a few lines of code in the Select method in X86ISelDAGToDAG.cpp instead. There are a lot of permutations of types here generating a lot of patterns in the isel table. It's more efficient to just ReplaceUses and RemoveDeadNode from the Select function. The test changes are because we have a some shuffle patterns that have a bitcast as their root node. But the behavior is identical to another instruction whose pattern doesn't start with a bitcast. So this isn't a functional change. llvm-svn: 338824	2018-08-03 07:01:10 +00:00
Craig Topper	f0b164415c	[X86] Prefer blendi over movss/sd when avx512 is enabled unless optimizing for size. AVX512 doesn't have an immediate controlled blend instruction. But blend throughput is still better than movss/sd on SKX. This commit changes AVX512 to use the AVX blend instructions instead of MOVSS/MOVSD. This constrains the register allocation since it won't be able to use XMM16-31, but hopefully the increased throughput and reduced port 5 pressure makes up for that. llvm-svn: 337083	2018-07-14 02:05:08 +00:00
Craig Topper	9e17073c21	[X86] Enhance combineFMA to look for FNEG behind an EXTRACT_VECTOR_ELT. llvm-svn: 336514	2018-07-08 18:04:00 +00:00
Craig Topper	fdf3f1ff82	[X86] Add new scalar fma intrinsics with rounding mode that use f32/f64 types. This allows us to handle masking in a very similar way to the default rounding version that uses llvm.fma. I had to add new rounding mode CodeGenOnly instructions to support isel when we can't find a movss to grab the upper bits from to use the b_Int instruction. Fast-isel tests have been updated to match new clang codegen. We are currently having trouble folding fneg into the new intrinsic. I'm going to correct that in a follow up patch to keep the size of this one down. A future patch will also remove the old intrinsics. llvm-svn: 336506	2018-07-08 01:10:43 +00:00
Craig Topper	d679d01a1f	[X86] Use a rounding mode other than 4 in the scalar fma intrinsic fast-isel tests to match clang test cases. llvm-svn: 336505	2018-07-08 00:32:56 +00:00
Craig Topper	df99cdb95b	[X86] Fix a few test names in avx512-intrinsics-fast-isel.ll to match their clang intrinsic names. I thought I fixed these yesterday, but I guess I missed a few. llvm-svn: 336071	2018-07-01 23:49:06 +00:00
Craig Topper	50a10ba6e0	[X86] Update some avx512 fast-isel tests to match their real clang IRgen. Especially of note was the test_mm_mask_set1_epi64 and other set1 tests that were truncating the element to be broadcasted to i8 and broadcasting that instead of a whole 64 bit value. Some of the others were just correcting mask sizes on parameters due to bugs in the clang test case they were generated from that have now been fixed. Some were converting i8 to <4 x i1>/<2 x i1> by truncating to i4/i2 and then bitcasting. But the clang codegen is bitcast to <8 x i1>, then extract to <4 x i1>/<2 x i1>. This is likely to incur less trouble from the integer type legalizer in the backend. llvm-svn: 336045	2018-06-30 07:25:29 +00:00
Craig Topper	59f2f38fe0	[X86] Remove masking from avx512 rotate intrinsics. Use select in IR instead. llvm-svn: 336035	2018-06-30 01:32:04 +00:00
Craig Topper	875e9f8fa4	[X86] Remove masking from the avx512 packed sqrt intrinsics. Use select in IR instead. While there improve the coverage of the intrinsic testing and add fast-isel tests. llvm-svn: 335944	2018-06-29 05:43:26 +00:00
Craig Topper	8014053cbd	[X86] Update fast-isel tests for clang r335253. The new IR fixes a mismatch in the final extractelement for the i32 intrinsics. Previously we extracted a 64-bit element even though we only wanted 32 bits. SimplifyDemandedElts isn't able to make FP elements undef now and the shuffle mask I used prevents the use of horizontal add we had before. Not sure we should have been using horizontal add anyway. It's implemented on Intel with two port 5 shuffles and an add. So we have on less shuffle now, but an additional instruction to decode. Differential Revision: https://reviews.llvm.org/D48347 llvm-svn: 335256	2018-06-21 16:54:18 +00:00
Craig Topper	296526bf46	[X86] Remove masking from 512-bit floating max/min intrinsics. Use select instruction instead. llvm-svn: 335199	2018-06-21 05:00:56 +00:00
Craig Topper	31a64ee76c	[X86] Remove a fptosi from the test_mm512_mask_reduce_max_pd fast-isel test. The clang test inadvertently turned a floating point value into a double by having the wrong return type on the test function relative to the intrinsic it was testing. This resulted in an extra fptosi instruction that propagated into this test when I copied the clang output. llvm-svn: 335094	2018-06-20 04:32:06 +00:00
Craig Topper	31961f051f	[X86] Update fast-isel tests for clang's avx512f reduction intrinsics to match the codegen from r335070. llvm-svn: 335071	2018-06-19 19:14:50 +00:00
Craig Topper	858afbd165	[X86] Add fast-isel tests for clang's AVX512F vector reduction intrinsics. llvm-svn: 335068	2018-06-19 18:52:15 +00:00
Craig Topper	3c4cc01226	[X86] Add more instructions to the memory folding tables using the autogenerated table as a guide. I think this covers most of the unmasked vector instructions. We're still missing a lot of the masked instructions. There are some test changes here because of the new folding support. I don't think these particular cases should be folded because it creates an undef register dependency. I think the changes introduced in r334175 are not handling stack folding. They're only blocking the peephole pass. llvm-svn: 334800	2018-06-15 05:49:19 +00:00
Craig Topper	3b060daba5	[X86] Fix some checks to use X86 instead of X32. These tests were recently updated so it looks like gone wrong. llvm-svn: 334786	2018-06-15 04:42:55 +00:00
Tomasz Krupa	d8d66a6b28	[X86] Lowering Mask Scalar intrinsics to native IR (LLVM part) Summary: Complementary patch to lowering add, sub, mul and div mask scalar intrinsics in Clang. Reviewers: craig.topper, sroland, spatel, RKSimon Reviewed by: craig.topper Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D47978 llvm-svn: 334740	2018-06-14 17:32:58 +00:00
Craig Topper	304bd747af	[X86] Add expandload and compresstore fast-isel tests for avx512f and avx512vl. Update existing tests for avx512vbmi2 to use target independent intrinsics. llvm-svn: 334368	2018-06-10 18:55:37 +00:00
Simon Pilgrim	1f60e2b41b	[X86][AVX512] Cleanup intrinsics tests Ensure we test on 32-bit and 64-bit targets, and strip -mcpu usage. Part of ongoing work to ensure we test all intrinsic style tests on 32 and 64 bit targets where possible. llvm-svn: 333843	2018-06-03 14:56:04 +00:00
Gabor Buella	890e363e11	[X86] Lowering FMA intrinsics to native IR (LLVM part) Support for Clang lowering of fused intrinsics. This patch: 1. Removes bindings to clang fma intrinsics. 2. Introduces new LLVM unmasked intrinsics with rounding mode: int_x86_avx512_vfmadd_pd_512 int_x86_avx512_vfmadd_ps_512 int_x86_avx512_vfmaddsub_pd_512 int_x86_avx512_vfmaddsub_ps_512 supported with a new intrinsic type (INTR_TYPE_3OP_RM). 3. Introduces new x86 fmaddsub/fmsubadd folding. 4. Introduces new tests for code emitted by sequentions introduced in Clang part. Patch by tkrupa Reviewers: craig.topper, sroland, spatel, RKSimon Reviewed By: craig.topper, RKSimon Differential Revision: https://reviews.llvm.org/D47443 llvm-svn: 333554	2018-05-30 15:25:16 +00:00
Craig Topper	2adc7d956c	[X86] Add unmasked vermi2var intrinsics so we can use explicit select instructions for masking in clang. This will allow us to remove the 3 different flavors of masked intrinsics. I'm leaving the actual intrinsic removal for another patch. llvm-svn: 333386	2018-05-29 03:26:30 +00:00

1 2

91 Commits