llvm-project

Commit Graph

Author	SHA1	Message	Date
Simon Pilgrim	783dbe402f	[X86][AVX] combineX86ShufflesRecursively - peek through extract_subvector If we have more than 2 shuffle ops to combine, try to use combineX86ShuffleChainWithExtract to see if some are from the same super vector. llvm-svn: 365050	2019-07-03 15:46:08 +00:00
Simon Pilgrim	868d0b7fd9	[X86][AVX] Combine vpermi(bitcast(x)) -> bitcast(vpermi(x)) iff the number of elements doesn't change. This gets around an issue with combineX86ShuffleChain not being able to hint which domain is preferred for shuffles that can be done with either. Fixes regression introduced in rL365041 llvm-svn: 365044	2019-07-03 14:34:16 +00:00
Simon Pilgrim	0c230209fe	[X86][AVX] combineX86ShuffleChainWithExtract - add number of non-zero extract_subvectors to the combine depth This better accounts for the cost/benefit of removing extract_subvectors from the shuffle and will be more useful in future patches. The vpermq predicate regression will be fixed shortly. llvm-svn: 365041	2019-07-03 14:17:21 +00:00
Roman Lebedev	c4b83a6054	[Codegen][X86][AArch64][ARM][PowerPC] Inc-of-add vs sub-of-not (PR42457) Summary: This is the backend part of [[ https://bugs.llvm.org/show_bug.cgi?id=42457 | PR42457 ]]. In middle-end, we'd want to prefer the form with two adds - D63992, but as this diff shows, not every target will prefer that pattern. Out of 4 targets for which i added tests all seem to be ok with inc-of-add for scalars, but only X86 prefers that same pattern for vectors. Here i'm adding a new TLI hook, always defaulting to the inc-of-add, but adding AArch64,ARM,PowerPC overrides to prefer inc-of-add only for scalars. Reviewers: spatel, RKSimon, efriedma, t.p.northover, hfinkel Reviewed By: efriedma Subscribers: nemanjai, javed.absar, kristof.beyls, kbarton, jsji, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64090 llvm-svn: 365010	2019-07-03 09:41:35 +00:00
Alex Lorenz	3dbdbbec84	[triple] Use 'macabi' environment name for the Mac Catalyst triples The 'macabi' environment name is preferred instead of 'maccatalyst'. llvm-svn: 364988	2019-07-03 01:02:43 +00:00
Alex Lorenz	da1dfecd32	Add support for the 'macCatalyst' MachO platform Mac Catalyst is a new MachO platform in macOS Catalina. It always uses the build_version MachO load command. Differential Revision: https://reviews.llvm.org/D64107 llvm-svn: 364981	2019-07-02 23:47:11 +00:00
Craig Topper	fa4e825a3b	[X86] Copy test cases from vector-zext.ll to vector-zext-widen.ll. Same for vector-sext.ll. NFC llvm-svn: 364957	2019-07-02 18:39:59 +00:00
Simon Pilgrim	5613874947	[X86] getTargetConstantBitsFromNode - remove unnecessary getZExtValue() (PR42486) Don't use APInt::getZExtValue() if you can avoid it - eventually someone will call it with i128 or something that doesn't fit into 64-bits. In this case it was completely superfluous as we'd moved the rest of the code to always use APInt. Fixes the <1 x i128> addition bug in PR42486 llvm-svn: 364953	2019-07-02 18:20:38 +00:00
Craig Topper	cffbaa93b7	[X86] Add patterns to select (scalar_to_vector (loadf32)) as (V)MOVSSrm instead of COPY_TO_REGCLASS + (V)MOVSSrm_alt. Similar for (V)MOVSD. Ultimately, I'd like to see about folding scalar_to_vector+load to vzload. Which would select as (V)MOVSSrm so this is closer to that. llvm-svn: 364948	2019-07-02 17:51:02 +00:00
Roman Lebedev	059f495831	[NFC][Codegen][X86][AArch64][ARM][PowerPC] Recommit: Add test coverage for "add-of-inc" vs "sub-of-not" I initially committed it with --check-prefix instead of --check-prefixes (again, shame on me, and utils/update_*.py not complaining!) and did not have a moment to understand the failure, so i reverted it initially in rL364939. llvm-svn: 364945	2019-07-02 16:48:49 +00:00
Roman Lebedev	893bbc9001	Revert "[NFC][Codegen][X86][AArch64][ARM][PowerPC] Add test coverage for "add-of-inc" vs "sub-of-not"" Some test failures i don't have a moment to investigate. This reverts commit r364930. llvm-svn: 364939	2019-07-02 15:54:24 +00:00
Roman Lebedev	39639261cc	[NFC][Codegen][X86][AArch64][ARM][PowerPC] Add test coverage for "add-of-inc" vs "sub-of-not" As it is pointed out in https://reviews.llvm.org/D63992, before we get to pick canonical variant in middle-end we should ensure best codegen in backend. llvm-svn: 364930	2019-07-02 14:48:52 +00:00
Craig Topper	5e7815b695	[X86] Correct v4f32->v2i64 cvt(t)ps2(u)qq memory isel patterns These instructions only read 64-bits of memory so we shouldn't allow a full vector width load to be pattern matched in case it is marked volatile. Instead allow vzload or scalar_to_vector+load. Also add a DAG combine to turn full vector loads into vzload when used by one of these instructions if the load isn't volatile. This fixes another case for PR42079 llvm-svn: 364838	2019-07-01 19:01:37 +00:00
Robert Lougher	e20030f612	[X86] Avoid SFB - Fix inconsistent codegen with/without debug info(2) The function findPotentialBlockers may consider debug info instructions as potential blockers and may stop searching for a store-load pair prematurely. This patch corrects this and tests the cases where the store is separated from the load by more than InspectionLimit debug instructions. Patch by Chris Dawson. Differential Revision: https://reviews.llvm.org/D62408 llvm-svn: 364829	2019-07-01 18:28:21 +00:00
Craig Topper	fcda45a9eb	[X86] Add more load folding tests for vcvt(t)ps2(u)qq showing missed foldings. NFC llvm-svn: 364730	2019-07-01 07:59:42 +00:00
Craig Topper	29fff0797b	[X86] Improve the type checking fast-isel handling of vector bitcasts. We had a bunch of vector size legality checks for the source type based on feature flags, but we didn't check the destination type at all beyond ensuring that it was a "simple" type. But this allowed the destination to be i128 which isn't legal. This commit changes the code to use TLI's isTypeLegal logic in place of the all the subtarget checks. Then additionally checks that the source and dest are vectors. Fixes 42452 llvm-svn: 364729	2019-07-01 07:09:34 +00:00
Craig Topper	4ca81a9b99	[X86] Add a DAG combine to replace vector loads feeding a v4i32->v2f64 CVTSI2FP/CVTUI2FP node with a vzload. But only when the load isn't volatile. This improves load folding during isel where we only have vzload and scalar_to_vector+load patterns. We can't have full vector load isel patterns for the same volatile load issue. Also add some missing masked cvtsi2fp/cvtui2fp with vzload patterns. llvm-svn: 364728	2019-07-01 07:09:31 +00:00
Craig Topper	fc233c9108	[X86] Add some additional load folding tests to vec_int_to_fp.ll/vec_int_to_fp-widen.ll and disable the peephole pass. Also copy some missing test cases from vec_int_to_fp.ll to vec_int_to_fp-widen.ll llvm-svn: 364727	2019-07-01 07:09:26 +00:00
Craig Topper	d1728f8987	[X86] Add MOVHPDrm/MOVLPDrm patterns that use VZEXT_LOAD. We already had patterns that used scalar_to_vector+load. But we can also have a vzload. Found while investigating combining scalar_to_vector+load to vzload. llvm-svn: 364726	2019-07-01 07:09:23 +00:00
Simon Pilgrim	978a08c885	[X86] CombineShuffleWithExtract - recurse through EXTRACT_SUBVECTOR chain llvm-svn: 364667	2019-06-28 17:57:32 +00:00
Roman Lebedev	0b8b419537	[NFC][Codegen] Revisit test coverage for X % C == 0 fold once more (add tests with '1' divisor) llvm-svn: 364661	2019-06-28 17:26:28 +00:00
Simon Pilgrim	a54e1a0f01	[X86] CombineShuffleWithExtract - only require 1 source to be EXTRACT_SUBVECTOR We were requiring that both shuffle operands were EXTRACT_SUBVECTORs, but we can relax this to only require one of them to be. Also, we shouldn't bother attempting this if both operands are from the lowest subvector (or not EXTRACT_SUBVECTOR at all). llvm-svn: 364644	2019-06-28 12:24:49 +00:00
Roman Lebedev	9af4474253	[NFC][Codegen] Revisit test coverage for X % C == 0 fold llvm-svn: 364642	2019-06-28 11:36:34 +00:00
Roman Lebedev	29d05c005f	[CodeGen] [SelectionDAG] More efficient code for X % C == 0 (UREM case) (try 3) Summary: I'm submitting a new revision since i don't understand how to reclaim/reopen/take over the existing one, D50222. There is no such action in "Add Action" menu... This implements an optimization described in Hacker's Delight 10-17: when `C` is constant, the result of `X % C == 0` can be computed more cheaply without actually calculating the remainder. The motivation is discussed here: https://bugs.llvm.org/show_bug.cgi?id=35479. This is a recommit, the original commit rL364563 was reverted in rL364568 because test-suite detected miscompile - the new comparison constant 'Q' was being computed incorrectly (we divided by `D0` instead of `D`). Original patch D50222 by @hermord (Dmytro Shynkevych) Notes: - In principle, it's possible to also handle the `X % C1 == C2` case, as discussed on bugzilla. This seems to require an extra branch on overflow, so I refrained from implementing this for now. - An explicit check for when the `REM` can be reduced to just its LHS is included: the `X % C` == 0 optimization breaks `test1` in `test/CodeGen/X86/jump_sign.ll` otherwise. I hadn't managed to find a better way to not generate worse output in this case. - The `test/CodeGen/X86/jump_sign.ll` regresses, and is being fixed by a followup patch D63390. Reviewers: RKSimon, craig.topper, spatel, hermord, xbolva00 Reviewed By: RKSimon, xbolva00 Subscribers: dexonsmith, kristina, xbolva00, javed.absar, llvm-commits, hermord Tags: #llvm Differential Revision: https://reviews.llvm.org/D63391 llvm-svn: 364600	2019-06-27 21:52:10 +00:00
Sanjay Patel	7ecf1ec49a	[x86] remove whitespace; NFC llvm-svn: 364588	2019-06-27 20:37:12 +00:00
Sanjay Patel	a95ca2b5ff	[x86] prevent crashing from select narrowing with AVX512 llvm-svn: 364585	2019-06-27 20:16:58 +00:00
Roman Lebedev	bd34e50cf0	[NFC][CodeGen] Add negative test for X u% C == 0 fold (D63391) The fold (D63391) uses multiplicativeInverse(), but it is not guaranteed to always succeed, and '100' appears to be one of the problematic values. llvm-svn: 364578	2019-06-27 19:09:51 +00:00
Roman Lebedev	0a2b7b79fa	Revert "[CodeGen] [SelectionDAG] More efficient code for X % C == 0 (UREM case) (try 2)" Appears to break test-suite on http://lab.llvm.org:8011/builders/clang-cmake-x86_64-sde-avx512-linux/builds/23790 FAIL: burg.execution_time FAIL: spiff.execution_time FAIL: employ.execution_time FAIL: llu.execution_time FAIL: gramschmidt.execution_time FAIL: fdtd-apml.execution_time This reverts commit r364563. llvm-svn: 364568	2019-06-27 17:22:31 +00:00
Roman Lebedev	0627b09863	[CodeGen] [SelectionDAG] More efficient code for X % C == 0 (UREM case) (try 2) Summary: I'm submitting a new revision since i don't understand how to reclaim/reopen/take over the existing one, D50222. There is no such action in "Add Action" menu... Original patch D50222 by @hermord (Dmytro Shynkevych) This implements an optimization described in Hacker's Delight 10-17: when `C` is constant, the result of `X % C == 0` can be computed more cheaply without actually calculating the remainder. The motivation is discussed here: https://bugs.llvm.org/show_bug.cgi?id=35479. Original patch author: @hermord (Dmytro Shynkevych)! Notes: - In principle, it's possible to also handle the `X % C1 == C2` case, as discussed on bugzilla. This seems to require an extra branch on overflow, so I refrained from implementing this for now. - An explicit check for when the `REM` can be reduced to just its LHS is included: the `X % C` == 0 optimization breaks `test1` in `test/CodeGen/X86/jump_sign.ll` otherwise. I hadn't managed to find a better way to not generate worse output in this case. - The `test/CodeGen/X86/jump_sign.ll` regresses, and is being fixed by a followup patch D63390. Reviewers: RKSimon, craig.topper, spatel, hermord, xbolva00 Reviewed By: RKSimon, xbolva00 Subscribers: xbolva00, javed.absar, llvm-commits, hermord Tags: #llvm Differential Revision: https://reviews.llvm.org/D63391 llvm-svn: 364563	2019-06-27 16:45:42 +00:00
Paul Robinson	1339f74b8a	[debug-info] Make a couple of tests more robust. llvm-svn: 364556	2019-06-27 15:53:07 +00:00
Simon Pilgrim	83e1a1e79b	[TargetLowering] SimplifyDemandedVectorElts - add shift/rotate support. llvm-svn: 364548	2019-06-27 14:25:54 +00:00
Simon Pilgrim	c692a8dc51	[TargetLowering] SimplifyDemandedBits - use DemandedElts to better identify partial splat shift amounts llvm-svn: 364541	2019-06-27 13:48:43 +00:00
Simon Pilgrim	d45b4f861e	[X86][SSE] Regenerate v48 shuffle test on a variety of targets llvm-svn: 364520	2019-06-27 11:22:23 +00:00
Simon Pilgrim	90e121fbe6	[X86][AVX] SimplifyDemandedVectorElts - combine PERMPD(x) -> EXTRACTF128(X) If we only use the bottom lane, see if we can simplify this to extract_subvector - which is always at least as quick as PERMPD/PERMQ. llvm-svn: 364518	2019-06-27 11:16:03 +00:00
Djordje Todorovic	7eeeb5947e	[ISEL][X86] Tracking of registers that forward call arguments While lowering calls, collect info about registers that forward arguments into following function frame. We store such info into the MachineFunction of the call. This is used very late when dumping DWARF info about call site parameters. ([9/13] Introduce the debug entry values.) Co-authored-by: Ananth Sowda <asowda@cisco.com> Co-authored-by: Nikola Prica <nikola.prica@rt-rk.com> Co-authored-by: Ivan Baev <ibaev@cisco.com> Differential Revision: https://reviews.llvm.org/D60715 llvm-svn: 364516	2019-06-27 10:51:15 +00:00
Craig Topper	9ea5a32251	[X86] Teach selectScalarSSELoad to not narrow volatile loads. llvm-svn: 364498	2019-06-27 05:51:56 +00:00
Craig Topper	3d12971e1c	[X86] Rework the logic in LowerBuildVectorv16i8 to make better use of any_extend and break false dependencies. Other improvements This patch rewrites the loop iteration to only visit every other element starting with element 0. And we work on the "even" element and "next" element at the same time. The "First" logic has been moved to the bottom of the loop and doesn't run on every element. I believe it could create dangling nodes previously since we didn't check if we were going to use SCALAR_TO_VECTOR for the first insertion. I got rid of the "First" variable and just do a null check on V which should be equivalent. We also no longer use undef as the starting V for vectors with no zeroes to avoid false dependencies. This matches v8i16. I've changed all the extends and OR operations to use MVT::i32 since that's what they'll be promoted to anyway. I've tried to use zero_extend only when necessary and use any_extend otherwise. This resulted in some improvements in tests where we are now able to promote aligned (i32 (extload i8)) to a 32-bit load. Differential Revision: https://reviews.llvm.org/D63702 llvm-svn: 364469	2019-06-26 20:16:19 +00:00
Simon Pilgrim	dfe079ffbf	[X86][SSE] getFauxShuffleMask - handle OR(x,y) where x and y have no overlapping bits Create a per-byte shuffle mask based on the computeKnownBits from each operand - if for each byte we have a known zero (or both) then it can be safely blended. Fixes PR41545 llvm-svn: 364458	2019-06-26 18:21:26 +00:00
Simon Pilgrim	16ba077a2d	[X86][AVX] Add reduced test case for PR41545 llvm-svn: 364454	2019-06-26 17:56:53 +00:00
Simon Pilgrim	6b687bf681	[X86][SSE] X86TargetLowering::isCommutativeBinOp - add PCMPEQ Allows narrowInsertExtractVectorBinOp to reduce vector size llvm-svn: 364432	2019-06-26 14:40:49 +00:00
Simon Pilgrim	b13c6f1a9d	[X86][SSE] X86TargetLowering::isBinOp - add PCMPGT Allows narrowInsertExtractVectorBinOp to reduce vector size llvm-svn: 364431	2019-06-26 14:34:41 +00:00
Roman Lebedev	fbb2e40d5c	[X86] X86DAGToDAGISel::matchBitExtract(): pattern c: truncation awareness Summary: The one thing of note here is that the 'bitwidth' constant (32/64) was previously pessimistic. Given `x & (-1 >> (C - z))`, we were taking `C` to be `bitwidth(x)`, but in reality we want `(-1 >> (C - z))` pattern to mean "low z bits must be all-ones". And for that, `C` should be `bitwidth(-1 >> (C - z))`, i.e. of the shift operation itself. Last pattern D does not seem to exhibit any of these truncation issues. Although it has the opposite problem - if we extract low bits (no shift) from i64, and then truncate to i32, then we fail to shrink this 64-bit extraction into 32-bit extraction. Reviewers: RKSimon, craig.topper, spatel Reviewed By: RKSimon Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62806 llvm-svn: 364419	2019-06-26 12:19:47 +00:00
Roman Lebedev	b0ecc1cc6b	[X86] X86DAGToDAGISel::matchBitExtract(): pattern b: truncation awareness Summary: (Not so) boringly identical to pattern a (D62786) Not yet sure how to deal with the last pattern c. Reviewers: RKSimon, craig.topper, spatel Reviewed By: RKSimon Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62793 llvm-svn: 364418	2019-06-26 12:19:39 +00:00
Roman Lebedev	8b9a03973a	[X86] X86DAGToDAGISel::matchBitExtract(): pattern a: truncation awareness Summary: Finally tying up loose ends here. The problem is quite simple: If we have pattern `(x >> start) & (1 << nbits) - 1`, and then truncate the result, that truncation will be propagated upwards, into the `and`. And that isn't currently handled. I'm only fixing pattern `a` here, the same fix will be needed for patterns `b`/`c` too. I think this isn't missing any extra legality checks, since we only look past truncations. Similarly, i don't think we can get any other truncation there other than i64->i32. Reviewers: craig.topper, RKSimon, spatel Reviewed By: craig.topper Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62786 llvm-svn: 364417	2019-06-26 12:19:11 +00:00
Clement Courbet	2851248fa1	Revert "r364412 [ExpandMemCmp][MergeICmps] Move passes out of CodeGen into opt pipeline." Breaks sanitizers: libFuzzer :: cxxstring.test libFuzzer :: memcmp.test libFuzzer :: recommended-dictionary.test libFuzzer :: strcmp.test libFuzzer :: value-profile-mem.test libFuzzer :: value-profile-strcmp.test llvm-svn: 364416	2019-06-26 12:13:13 +00:00
Clement Courbet	7b3a5f0e6d	[ExpandMemCmp][MergeICmps] Move passes out of CodeGen into opt pipeline. This allows later passes (in particular InstCombine) to optimize more cases. One that's important to us is `memcmp(p, q, constant) < 0` and `memcmp(p, q, constant) > 0`. llvm-svn: 364412	2019-06-26 11:50:18 +00:00
Simon Pilgrim	c0711af7f9	[X86][AVX] combineExtractSubvector - 'little to big' extract_subvector(bitcast()) support Ideally this needs to be a generic combine in DAGCombiner::visitEXTRACT_SUBVECTOR but there's some nasty regressions in aarch64 due to neon shuffles not handling bitcasts at all..... llvm-svn: 364407	2019-06-26 11:21:09 +00:00
Simon Pilgrim	3845a4f849	[X86][AVX] truncateVectorWithPACK - avoid bitcasted shuffles truncateVectorWithPACK is often used in conjunction with ComputeNumSignBits which struggles when peeking through bitcasts. This fix tries to avoid bitcast(shuffle(bitcast())) patterns in the 256-bit 64-bit sublane shuffles so we can still see through at least until lowering when the shuffles will need to be bitcasted to widen the shuffle type. llvm-svn: 364401	2019-06-26 09:50:11 +00:00
Clement Courbet	be98e0ab78	[ExpandMemCmp] Honor prefer-vector-width. Reviewers: gchatelet, echristo, spatel, atdt Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63769 llvm-svn: 364384	2019-06-26 07:06:49 +00:00
QingShan Zhang	e0e7d4c366	Teach the DAGCombine to fold this pattern (c1 and c2 are constants). // fold (sext (select cond, c1, c2)) -> (select cond, sext c1, sext c2) // fold (zext (select cond, c1, c2)) -> (select cond, zext c1, zext c2) // fold (aext (select cond, c1, c2)) -> (select cond, sext c1, sext c2) Sign extend the operands if it is any_extend, to keep the signedness of the operands so that the other combine rule would apply. The any_extend is handled as zero extend for constants. i.e. t1: i8 = select t0, Constant:i8<-1>, Constant:i8<0> t2: i64 = any_extend t1 --> t3: i64 = select t0, Constant:i64<-1>, Constant:i64<0> --> t4: i64 = sign_extend_inreg t3 Differential Revision: https://reviews.llvm.org/D63318 llvm-svn: 364382	2019-06-26 05:12:53 +00:00
Philip Reames	be0dedb2e1	[Peephole] Allow folding loads into instructions w/multiple uses (such as test64rr) Peephole opt has a one use limitation which appears to be accidental. The function being used was incorrectly documented as returning whether the def had one user, but instead returned true only when there was one use. Add a corresponding hasOneNonDbgUser helper, and adjust peephole-opt to use the appropriate one. All of the actual folding code handles multiple uses within a single instruction. That codepath is well exercised through instruction selection. Differential Revision: https://reviews.llvm.org/D63656 llvm-svn: 364336	2019-06-25 17:29:18 +00:00
Craig Topper	14ea14ae85	[X86] Add a DAG combine to turn vzmovl+load into vzload if the load isn't volatile. Remove isel patterns for vzmovl+load We currently have some isel patterns for treating vzmovl+load the same as vzload, but that shrinks the load which we shouldn't do if the load is volatile. Rather than adding isel checks for volatile, this patch removes the patterns and teaches DAG combine to merge them into vzload when it's legal to do so. Differential Revision: https://reviews.llvm.org/D63665 llvm-svn: 364333	2019-06-25 17:08:26 +00:00
Sanjay Patel	685c5cbc65	[SDAG] expand ctpop != 1 Change the generic ctpop expansion to more efficiently handle a check for not-a-power-of-two value: (ctpop x) != 1 --> (x == 0) || ((x & x-1) != 0) This is the inverted predicate sibling pattern that was added with: D63004 This should have been done before I changed IR canonicalization to favor this form with: rL364246 ...so if this requires revert/changing, the earlier commit may also need to be modified. llvm-svn: 364319	2019-06-25 14:46:52 +00:00
Sanjay Patel	0baacea2c7	[AArch64][x86] add tests for ctpop != 1; NFC This is the inverted predicate pattern for D63004. llvm-svn: 364314	2019-06-25 13:37:16 +00:00
Simon Pilgrim	1a18bb6f25	[TargetLowering] SimplifyDemandedBits - add ANY_EXTEND_VECTOR_INREG support Add 'lowest' demanded elt -> bitcast fold to all *_EXTEND_VECTOR_INREG cases. Reapplies rL363856. llvm-svn: 364311	2019-06-25 13:25:57 +00:00
Simon Pilgrim	36953ce769	[TargetLowering] SimplifyDemandedBits ZERO_EXTEND_VECTOR_INREG -> ANY_EXTEND_VECTOR_INREG Simplify ZERO_EXTEND_VECTOR_INREG if the extended bits are not required. Matches what we already do for ZERO_EXTEND. Reapplies rL363850 but now with legality checks added at rL364290 llvm-svn: 364303	2019-06-25 12:57:43 +00:00
Simon Pilgrim	69fc111184	[TargetLowering] SimplifyDemandedBits SIGN_EXTEND_VECTOR_INREG -> ANY/ZERO_EXTEND_VECTOR_INREG Simplify SIGN_EXTEND_VECTOR_INREG if the extended bits are not required/known zero. Matches what we already do for SIGN_EXTEND. Reapplies rL363802 but now with legality checks added at rL364290 llvm-svn: 364299	2019-06-25 12:19:12 +00:00
Simon Pilgrim	49b3778e32	[TargetLowering] SimplifyDemandedBits - legal checks for SIGN/ZERO_EXTEND -> ZERO/ANY_EXTEND As part of the fix for rL364264 + rL364272 - limit the *_EXTEND conversion to !TLO.LegalOperations || isOperationLegal cases. We'll improve X86 legality in future commits. llvm-svn: 364290	2019-06-25 10:51:15 +00:00
Roman Lebedev	cdd43eac4f	[Codegen] TargetLowering::SimplifySetCC(): omit urem when possible Summary: This addresses the regression that is being exposed by D50222 in `test/CodeGen/X86/jump_sign.ll` The missing fold, at least partially, looks trivial: https://rise4fun.com/Alive/Zsln i.e. if we are comparing with zero, and comparing the `urem`-by-non-power-of-two, and the `urem` is of something that may at most have a single bit set (or no bits set at all), the `urem` is not needed. Reviewers: RKSimon, craig.topper, xbolva00, spatel Reviewed By: xbolva00, spatel Subscribers: xbolva00, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63390 llvm-svn: 364286	2019-06-25 10:01:42 +00:00
Craig Topper	33e547bdde	[X86] Add test case that led to the revert of r363802, r363850, and r363856 in r364264 I've been trying to fix this, but hit some roadblocks. So I'm committing the test case for now so we'll at least avoid recreating that failure. llvm-svn: 364272	2019-06-25 06:40:28 +00:00
Craig Topper	079924b0b7	Revert r363802, r363850, and r363856 "[TargetLowering] SimplifyDemandedBits..." This reverts the following patches. "[TargetLowering] SimplifyDemandedBits SIGN_EXTEND_VECTOR_INREG -> ANY/ZERO_EXTEND_VECTOR_INREG" "[TargetLowering] SimplifyDemandedBits ZERO_EXTEND_VECTOR_INREG -> ANY_EXTEND_VECTOR_INREG" "[TargetLowering] SimplifyDemandedBits - add ANY_EXTEND_VECTOR_INREG support" We can end up with an any_extend_vector_inreg with a 256 bit result type and a 128 bit input type. This is allowed by the ISD opcode, but the generic operation legalizer is only able to expand cases where the total vector width is the same. The X86 backend creates these mismatched cases for zext_vec_inreg/sext_vec_inreg. The SimplifyDemandedBits changes are allowing those nodes to become aext_vec_inreg. For the zext/sext cases, the X86 backend has Custom handling and never lets them get to the generic legalizer. We need to do the same for aext_vec_inreg. llvm-svn: 364264	2019-06-25 01:32:42 +00:00
Roland Froese	ea08248b2b	[CodeGen] Add missing vector type legalization for ctlz_zero_undef Widen vector result type for ctlz_zero_undef and cttz_zero_undef the same as ctlz and cttz. Differential Revision: https://reviews.llvm.org/D63463 llvm-svn: 364221	2019-06-24 19:27:07 +00:00
Craig Topper	033774e144	[X86] Cleanups and safety checks around the isFNEG This patch does a few things to start cleaning up the isFNEG function. -Remove the Op0/Op1 peekThroughBitcast calls that seem unnecessary. getTargetConstantBitsFromNode has its own peekThroughBitcast inside. And we have a separate peekThroughBitcast on the return value. -Add a check of the scalar size after the first peekThroughBitcast to ensure we haven't changed the element size and just did something like f32->i32 or f64->i64. -Remove an unnecessary check that Op1's type is floating point after the peekThroughBitcast. We're just going to look for a bit pattern from a constant. We don't care about its type. -Add VT checks on several places that consume the return value of isFNEG. Due to the peekThroughBitcasts inside, the type of the return value isn't guaranteed. So it's not safe to use it to build other nodes without ensuring the type matches the type being used to build the node. We might be able to replace these checks with bitcasts instead, but I don't have a test case so a bail out check seemed better for now. Differential Revision: https://reviews.llvm.org/D63683 llvm-svn: 364206	2019-06-24 17:28:26 +00:00
Simon Pilgrim	cf6917c6bd	[X86] Regenerate fast fadd reduction tests. NFCI Fix whitespace. llvm-svn: 364200	2019-06-24 16:25:30 +00:00
Simon Pilgrim	69144a925e	[DAGCombine] visitMUL - allow shift by zero in MulByConstant. This can occur under certain circumstances when undefs are created later on in the constant multipliers (e.g. in this case due to SimplifyDemandedVectorElts). It's better to let the shift by zero occur and perform any cleanup afterward. Fixes OSS Fuzz #15429 llvm-svn: 364179	2019-06-24 12:47:17 +00:00
Craig Topper	e8da65c698	[X86] Turn v16i16->v16i8 truncate+store into an any_extend+truncstore if we have avx512f, but not avx512bw. Ideally we'd be able to represent this truncate as an any_extend to v16i32 and a truncate, but SelectionDAG doesn't know how to not fold those together. We have isel patterns to use a vpmovzxwd+vpmovdb for the truncate, but we aren't able to simultaneously fold the load and the store from the isel pattern. By pulling the truncate into the store we can successfully hide it from the DAG combiner. Then we can isel pattern match the truncstore and load+any_extend separately. llvm-svn: 364163	2019-06-23 23:51:21 +00:00
Simon Pilgrim	a962c1bc0f	[X86][SSE] Fold extract_subvector(vselect(x,y,z),0) -> vselect(extract_subvector(x,0),extract_subvector(y,0),extract_subvector(z,0)) llvm-svn: 364136	2019-06-22 17:57:01 +00:00
Craig Topper	f5a5785632	[X86] Add test cases for incorrect shrinking of volatile vector loads from 128-bits to 32 or 64 bits. NFC This is caused by isel patterns that look for vzmovl+load and treat it the same as vzload. llvm-svn: 364101	2019-06-21 20:16:26 +00:00
Craig Topper	4649a051bf	[X86] Add DAG combine to turn (vzmovl (insert_subvector undef, X, 0)) into (insert_subvector allzeros, (vzmovl X), 0) 128/256 bit scalar_to_vectors are canonicalized to (insert_subvector undef, (scalar_to_vector), 0). We have isel patterns that try to match this pattern being used by a vzmovl to use a 128-bit instruction and a subreg_to_reg. This patch detects the insert_subvector undef portion of this and pulls it through the vzmovl, creating a narrower vzmovl and an insert_subvector allzeroes. We can then match the insertsubvector into a subreg_to_reg operation by itself. Then we can fall back on existing (vzmovl (scalar_to_vector)) patterns. Note, while the scalar_to_vector case is the motivating case I didn't restrict to just that case. I'm also wondering about shrinking any 256/512 vzmovl to an extract_subvector+vzmovl+insert_subvector(allzeros) but I fear that would have bad implications to shuffle combining. I also think there is more canonicalization we can do with vzmovl with loads or scalar_to_vector with loads to create vzload. Differential Revision: https://reviews.llvm.org/D63512 llvm-svn: 364095	2019-06-21 19:10:21 +00:00
Craig Topper	4569cdbcf5	[X86] Don't mark v64i8/v32i16 ISD::SELECT as custom unless they are legal types. We don't have any Custom handling during type legalization. Only operation legalization. Fixes PR42355 llvm-svn: 364093	2019-06-21 18:50:00 +00:00
Craig Topper	91ea99295c	[X86] Add avx512bw command lines to avx512-select.ll Prep for fixing PR42355 and ensuring we have coverage of ISD::SELECT for v64i8/v32i16 on KNL and SKX configs. llvm-svn: 364092	2019-06-21 18:49:42 +00:00
Simon Pilgrim	5dba4ed208	[X86][AVX] Combine INSERT_SUBVECTOR(SRC0, EXTRACT_SUBVECTOR(SRC1)) as shuffle Subvector shuffling often ends up as insert/extract subvector. llvm-svn: 364090	2019-06-21 18:35:04 +00:00
Craig Topper	6af1be9664	[X86] Use vmovq for v4i64/v4f64/v8i64/v8f64 vzmovl. We already use vmovq for v2i64/v2f64 vzmovl. But we were using a blendpd+xorpd for v4i64/v4f64/v8i64/v8f64 under opt speed. Or movsd+xorpd under optsize. I think the blend with 0 or movss/d is only needed for vXi32 where we don't have an instruction that can move 32 bits from one xmm to another while zeroing upper bits. movq is no worse than blendpd on any known CPUs. llvm-svn: 364079	2019-06-21 17:24:21 +00:00
Amara Emerson	8f25a021dd	[AArch64][GlobalISel] Make s8 and s16 G_CONSTANTs legal. We sometimes get poor code size because constants of types < 32b are legalized as 32 bit G_CONSTANTs with a truncate to fit. This works but means that the localizer can no longer sink them (although it's possible to extend it to do so). On AArch64 however s8 and s16 constants can be selected in the same way as s32 constants, with a mov pseudo into a W register. If we make s8 and s16 constants legal then we can avoid unnecessary truncates, they can be CSE'd, and the localizer can sink them as normal. There is a caveat: if the user of a smaller constant has to widen the sources, we end up with an anyext of the smaller typed G_CONSTANT. This can cause regressions because of the additional extend and missed pattern matching. To remedy this, there's a new artifact combiner to generate the wider G_CONSTANT if it's legal for the target. Differential Revision: https://reviews.llvm.org/D63587 llvm-svn: 364075	2019-06-21 16:43:50 +00:00
Simon Pilgrim	36a999ffb8	[X86] X86ISD::ANDNP is a (non-commutative) binop The sat add/sub tests still have unnecessary extract_subvector((vandnps ymm, ymm), 0) uses that should be split to (vandnps (extract_subvector(ymm, 0), extract_subvector(ymm, 0)), but it's getting better. llvm-svn: 364038	2019-06-21 12:42:39 +00:00
Simon Pilgrim	771c33e375	[X86][AVX] isNOT - handle concat_vectors(xor X, -1, xor Y, -1) pattern llvm-svn: 364022	2019-06-21 10:44:15 +00:00
David Bolvansky	642ed40e57	[NFC] Add more tests for D46262 llvm-svn: 363970	2019-06-20 19:39:15 +00:00
Craig Topper	9e1665f2d6	[X86] Add BLSI to isUseDefConvertible. Summary: BLSI sets the C flag if the input is not zero. So if it's followed by a TEST of the input where only the Z flag is consumed, we can replace it with the opposite check of the C flag. We should be able to do the same for BLSMSK and BLSR, but the naive test case for those is being optimized to a subo by CodeGenPrepare. Reviewers: spatel, RKSimon Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63589 llvm-svn: 363957	2019-06-20 17:52:53 +00:00
Simon Pilgrim	1d8093249f	[DAGCombiner] Support (shl (zext (srl x, C)), C) -> (zext (shl (srl x, C), C)) non-uniform folds. Use matchBinaryPredicate instead of isConstOrConstSplat to let us handle non-uniform shift cases. llvm-svn: 363929	2019-06-20 14:42:27 +00:00
Craig Topper	3ba20e943e	[X86] Add test cases showing missed opportunities to use the C flag from the BLSI instruction to avoid a TEST instruction llvm-svn: 363909	2019-06-20 06:45:01 +00:00
Sanjay Patel	b5640b6fe8	[x86] avoid vector load narrowing with extracted store uses (PR42305) This is an exception to the rule that we should prefer xmm ops to ymm ops. As shown in PR42305: https://bugs.llvm.org/show_bug.cgi?id=42305 ...the store folding opportunity with vextractf128 may result in better perf by reducing the instruction count. Differential Revision: https://reviews.llvm.org/D63517 llvm-svn: 363853	2019-06-19 18:13:47 +00:00
Sanjay Patel	33ef687d94	[x86] add test for unaligned 32-byte load/store splitting; NFC llvm-svn: 363852	2019-06-19 18:06:59 +00:00
Simon Pilgrim	6016fb726c	[TargetLowering] SimplifyDemandedBits ZERO_EXTEND_VECTOR_INREG -> ANY_EXTEND_VECTOR_INREG Simplify ZERO_EXTEND_VECTOR_INREG if the extended bits are not required. Matches what we already do for ZERO_EXTEND. llvm-svn: 363850	2019-06-19 18:00:24 +00:00
Simon Pilgrim	34279db355	[X86][SSE] Combine shuffles to ANY_EXTEND/ANY_EXTEND_VECTOR_INREG. We already do this for ZERO_EXTEND/ZERO_EXTEND_VECTOR_INREG - this just extends the pattern matcher to recognize cases where we don't need the zeros in the extension. llvm-svn: 363841	2019-06-19 17:21:15 +00:00
Simon Pilgrim	c3994f77cb	[TargetLowering] SimplifyDemandedBits SIGN_EXTEND_VECTOR_INREG -> ANY/ZERO_EXTEND_VECTOR_INREG Simplify SIGN_EXTEND_VECTOR_INREG if the extended bits are not required/known zero. Matches what we already do for SIGN_EXTEND. llvm-svn: 363802	2019-06-19 13:58:02 +00:00
Simon Pilgrim	9eed5d2f78	[DAGCombiner] Support (shl (ext (shl x, c1)), c2) -> (shl (ext x), (add c1, c2)) non-uniform folds. Use matchBinaryPredicate instead of isConstOrConstSplat to let us handle non-uniform shift cases. llvm-svn: 363793	2019-06-19 12:41:37 +00:00
Simon Pilgrim	8c49366c9b	[DAGCombiner] Support (shl (ext (shl x, c1)), c2) -> 0 non-uniform folds. Use matchBinaryPredicate instead of isConstOrConstSplat to let us handle non-uniform shift cases. This requires us to tweak matchBinaryPredicate to allow it to (optionally) handle constants with different type widths. llvm-svn: 363792	2019-06-19 12:25:29 +00:00
Simon Pilgrim	85f70baa23	[X86] Add non-uniform (shl (ext (shl x, c1)), c2) -> (shl (ext x), (add c1, c2)) test llvm-svn: 363791	2019-06-19 11:36:01 +00:00
Simon Pilgrim	d954a53633	[DAGCombine] Fix (shl (ext (shl x, c1)), c2) -> (shl (ext x), (add c1, c2)) comment. NFCI. We pre-extend, not post. llvm-svn: 363787	2019-06-19 11:17:48 +00:00
Matt Arsenault	9cac4e6d14	Rename ExpandISelPseudo->FinalizeISel, delay register reservation This allows targets to make more decisions about reserved registers after isel. For example, it should now be certain whether there are calls or stack objects in the frame, which could have been introduced by legalization. Patch by Matthias Braun llvm-svn: 363757	2019-06-19 00:25:39 +00:00
Evandro Menezes	1933cbe866	[test] Change comment wording (NFC) llvm-svn: 363751	2019-06-18 23:31:10 +00:00
Sanjay Patel	413ed69b4b	[x86] add another test for load splitting with extracted stores (PR42305); NFC llvm-svn: 363732	2019-06-18 20:13:35 +00:00
Sanjay Patel	223176f5d7	[x86] add test for load splitting with extracted store (PR42305); NFC llvm-svn: 363704	2019-06-18 17:16:17 +00:00
Simon Pilgrim	9aa25be149	[TargetLowering] SimplifyDemandedVectorElts - support MUL and ANY_EXTEND_VECTOR_INREG Also fold ANY_EXTEND_VECTOR_INREG -> BITCAST if we only need the bottom element. Fixes temporary regression introduced in rL363693. llvm-svn: 363694	2019-06-18 15:49:35 +00:00
Simon Pilgrim	9c8593934a	[X86][AVX] extract_subvector(any_extend(x)) -> any_extend_vector_inreg(x) Part of fixing the X86 regression noted in D63281 - I've split this into X86 and generic parts - the generic commit will be coming shortly and will fix the vector-reduce-mul-widen.ll regression introduced here. llvm-svn: 363693	2019-06-18 15:30:50 +00:00
Simon Pilgrim	83bacd8d72	[SelectionDAG] Legalize vaargs that require vector splitting This adds vector splitting for vaarg instructions during type legalization Committed on behalf of @luke (Luke Lau) Differential Revision: https://reviews.llvm.org/D60762 llvm-svn: 363671	2019-06-18 12:24:02 +00:00
Simon Pilgrim	6658bfb171	[X86] Regenerate promote.ll. NFC. llvm-svn: 363657	2019-06-18 10:10:53 +00:00
Craig Topper	02a445c245	[X86] Add i128 ctpop and i32/i64/i128 optsize test cases to popcnt.ll Test cases for PR41151 and D59909. llvm-svn: 363647	2019-06-18 04:52:49 +00:00
Craig Topper	587427716c	[X86] Remove MOVDI2SSrm/MOV64toSDrm/MOVSS2DImr/MOVSDto64mr CodeGenOnly instructions. The isel patterns for these use a bitcast and load/store, but DAG combine should have canonicalized those away. For the purposes of the memory folding table these opcodes can be replaced by the MOVSSrm_alt/MOVSDrm_alt and MOVSSmr/MOVSDmr opcodes. llvm-svn: 363644	2019-06-18 03:23:15 +00:00
Craig Topper	8582ecd8d9	[X86] Introduce new MOVSSrm/MOVSDrm opcodes that use VR128 register class. Rename the old versions that use FR32/FR64 to MOVSSrm_alt/MOVSDrm_alt. Use the new versions in patterns that previously used a COPY_TO_REGCLASS to VR128. These patterns expect the upper bits to be zero. The current setup appears to work, but I'm not sure we should be enforcing upper bits being zero through a COPY_TO_REGCLASS. I wanted to flip the arrangement and use a COPY_TO_REGCLASS to FR32/FR64 for the patterns that need an f32/f64 result, but that complicated fastisel and globalisel. I've been doing some experiments with reducing some isel patterns and ended up in a situation where I had a (SUBREG_TO_REG (COPY_TO_REGCLASS (VMOVSSrm), VR128)) and our post-isel peephole was unable to avoid using an instruction for the SUBREG_TO_REG due to the COPY_TO_REGCLASS. Having a VR128 instruction removes the COPY_TO_REGCLASS that was breaking this. llvm-svn: 363643	2019-06-18 03:23:11 +00:00

14296 Commits