Commit Graph

Craig Topper 220cf53540 [X86] Consider isCodeGenOnly in the EVEX2VEX pass to make VMAXPD/PS map to the non-commutable VEX instruction. Use EVEX2VEX override to fix the scalar instructions.
Previously the match was ambiguous and VMAXPS/PD and VMAXCPS/PD
were mapped to the same VEX instruction. But we should preserve
the commutability when changing the opcode.
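
For background on why the distinction matters: x86 MAX is not
commutable under IEEE semantics, because the result depends on
operand order when a NaN is involved. A scalar sketch of the
MAXPS/MAXPD behavior (my illustration, not code from the pass):

    #include <cmath>

    // x86 MAXPS/MAXPD return the *second* source operand when either
    // input is NaN, so swapping the sources changes the result. That
    // is why VMAXPS/PD must map to the non-commutable VEX opcode,
    // while VMAXCPS/PD (the fast-math variant) stays commutable.
    float x86_max(float a, float b) {
      return a > b ? a : b; // NaN compares false, so b wins on NaN
    }
    // x86_max(NAN, 1.0f) == 1.0f, but x86_max(1.0f, NAN) is NaN.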

llvm-svn: 373303
2019-10-01 07:10:09 +00:00
Craig Topper 5dc49a8374 [X86] Add test case to show missed opportunity to shrink a constant index to a gather in order to avoid splitting.
Also add a test case for an index that could be shrunk, but
would create a narrow type. We can go ahead and do it; we just
need to do so before type legalization.

Similar test cases for scatter as well.

llvm-svn: 373290
2019-10-01 01:27:52 +00:00
Amaury Sechet d60c297d1d Add partial bswap test to the X86 backend. NFC
llvm-svn: 373271
2019-09-30 22:52:28 +00:00
Craig Topper 3405237f77 [X86] Mask off upper bits of splat element in LowerBUILD_VECTORvXi1 when forming a SELECT.
The i1 scalar would have been type legalized to i8, but that
doesn't guarantee anything about the upper bits. If we're going
to use it as a condition we need to make sure the upper bits are 0.

I've special cased ISD::SETCC conditions since that should
guarantee zero upper bits. We could go further and use
computeKnownBits, but we have no tests that would need that.
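
A scalar analogy of the fix, assuming the i1 travels in a byte whose
upper bits are unspecified (illustrative sketch only):

    #include <cstdint>

    // An i1 legalized to i8 only promises that bit 0 is meaningful;
    // the upper 7 bits may hold garbage. Mask down to bit 0 before
    // using the byte as a condition. A byte produced by SETCC is
    // already zero in the upper bits, so the AND can be skipped.
    bool as_condition(uint8_t legalized_i1) {
      return (legalized_i1 & 1) != 0;
    }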

Fixes PR43507.

llvm-svn: 373246
2019-09-30 18:43:44 +00:00
Craig Topper 299ebacfe9 [X86] Add ANY_EXTEND to switch in ReplaceNodeResults, but just fall back to default handling.
ANY_EXTEND of v8i8 is marked Custom on AVX512 for handling extends
from v8i8. But the type legalization infrastructure will call
ReplaceNodeResults for v8i8 results. We should just defer it to the
default handling instead of asserting in the default case of the switch.

Fixes PR43509.

llvm-svn: 373234
2019-09-30 17:14:22 +00:00
Amaury Sechet 09025ca6fc Add tests for rotate with demanded bits. NFC
llvm-svn: 373223
2019-09-30 16:26:09 +00:00
Paul Robinson ed1f3f36ae [SSP] [3/3] cmpxchg and addrspacecast instructions can now
trigger stack protectors. Fixes PR42238.

Add test coverage for llvm.memset, as proxy for all llvm.mem*
intrinsics. There are two issues here: (1) they could be lowered to a
libc call, which could be intercepted, and do Bad Stuff; (2) with a
non-constant size, they could overwrite the current stack frame.
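
A rough illustration of issue (2) in C++ terms; the function name and
shape are hypothetical, not taken from the actual test:

    #include <cstddef>
    #include <cstring>

    // With a non-constant size, the memset below can write past `buf`
    // and overwrite the rest of the stack frame, including the return
    // address: exactly what a stack protector is meant to catch.
    void zero_buffer(std::size_t n) {
      char buf[16];
      std::memset(buf, 0, n); // n may exceed sizeof(buf)
    }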

The test was mostly written by Matt Arsenault in r363169, which was
later reverted; I tweaked what he had and added the llvm.memset part.

Differential Revision: https://reviews.llvm.org/D67845

llvm-svn: 373220
2019-09-30 15:11:23 +00:00
Paul Robinson 14945186c2 [SSP] [1/3] Revert "StackProtector: Use PointerMayBeCaptured"
"Captured" and "relevant to Stack Protector" are not the same thing.

This reverts commit f29366b1f5.
aka r363169.

Differential Revision: https://reviews.llvm.org/D67842

llvm-svn: 373216
2019-09-30 15:01:35 +00:00
Hans Wennborg 8569c0f1ab Pre-commit a test case for PR43129.
llvm-svn: 373190
2019-09-30 08:47:46 +00:00
Roger Ferrer Ibanez 5a2a14db0b [TargetLowering] Simplify expansion of S{ADD,SUB}O
ISD::SADDO uses the suggested sequence described in §2.4 of
the RISCV Spec v2.2. ISD::SSUBO uses the dual approach, but checks
for a (non-zero) positive right-hand side.
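
For reference, the expansion amounts to the following scalar checks
(a sketch of the semantics, not the actual DAG expansion code):

    #include <cstdint>

    // Signed add overflow: the sum wrapped iff the result moved in
    // the "wrong" direction relative to lhs given the sign of rhs.
    bool saddo(int32_t lhs, int32_t rhs, int32_t &res) {
      res = (int32_t)((uint32_t)lhs + (uint32_t)rhs); // defined wrap
      return (res < lhs) != (rhs < 0);
    }
    // The dual form for subtraction checks for a positive rhs.
    bool ssubo(int32_t lhs, int32_t rhs, int32_t &res) {
      res = (int32_t)((uint32_t)lhs - (uint32_t)rhs);
      return (res < lhs) != (rhs > 0);
    }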

Differential Revision: https://reviews.llvm.org/D47927

llvm-svn: 373187
2019-09-30 07:58:50 +00:00
Craig Topper 1b0ea0a12e [X86] Split v16i32/v8i64 bitreverse on avx512f targets without avx512bw to enable the use of vpshufb on the 256-bit halves.
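
The vpshufb trick is a nibble lookup; a scalar equivalent of the
per-byte step, for illustration only:

    #include <cstdint>

    // vpshufb can reverse bits byte-wise using 16-entry nibble
    // tables. Scalar equivalent: reverse each nibble via a LUT,
    // then swap the two halves of the byte.
    uint8_t bitreverse8(uint8_t b) {
      static const uint8_t rev4[16] = {
          0x0, 0x8, 0x4, 0xC, 0x2, 0xA, 0x6, 0xE,
          0x1, 0x9, 0x5, 0xD, 0x3, 0xB, 0x7, 0xF};
      return rev4[b & 0xF] << 4 | rev4[b >> 4];
    }
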
llvm-svn: 373177
2019-09-30 03:14:38 +00:00
Craig Topper 0e3f659137 [X86] Add custom isel logic to match VPTERNLOG from 2 logic ops.
There's room for improvement here, but this is a decent
starting point.

There are a few minor regressions in the vector-rotate tests,
where we are now forming a vpternlog from an and before we get
a chance to form it for a bitselect that we were matching
previously. This results in an AND and an ANDN feeding the
vpternlog where previously we just had an AND after the
vpternlog. I think we can probably DAG combine the AND with
the bitselect to get back to similar codegen.
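
For intuition: a VPTERNLOG immediate is just the 8-entry truth table
of a three-input boolean function, so any pair of nested logic ops
folds into one instruction. A sketch of deriving the immediate (my
illustration, not the isel code):

    #include <cstdint>

    // Bit i of the immediate is f(a, b, c) evaluated with a, b, c
    // taken from bits 2, 1, 0 of i.
    template <typename F> uint8_t ternlog_imm(F f) {
      uint8_t imm = 0;
      for (int i = 0; i < 8; ++i)
        if (f((i >> 2) & 1, (i >> 1) & 1, i & 1))
          imm |= 1 << i;
      return imm;
    }
    // ternlog_imm([](int a, int b, int c) { return (a & b) ^ c; })
    // yields 0x6A, the immediate for a fused AND+XOR.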

llvm-svn: 373172
2019-09-29 18:43:08 +00:00
Amaury Sechet aabf8cbfca Add test case peeking through vector concat when combining insert into shuffles. NFC
llvm-svn: 373171
2019-09-29 17:54:03 +00:00
Craig Topper 494bfd9fed [X86] Enable isel to fold broadcast loads that have been bitcasted from FP into a vpternlog.
llvm-svn: 373157
2019-09-29 01:24:33 +00:00
Craig Topper b6a2207ba2 [X86] Move bitselect matching to vpternlog into X86ISelDAGToDAG.cpp
This allows us to reduce the use count on the condition node before
the match. This enables load folding for that operand without
relying on the peephole pass. This will be improved on for
broadcast load folding in a subsequent commit.

This still requires a bunch of isel patterns for vXi16/vXi8 types
though.

llvm-svn: 373156
2019-09-29 01:24:29 +00:00
Craig Topper 0ac4aacea8 [X86] Enable canonicalizeBitSelect for AVX512 since we can use VPTERNLOG now.
llvm-svn: 373155
2019-09-29 01:24:22 +00:00
Craig Topper 6195ed8397 [X86] Match (or (and A, B), (andn (A, C))) to VPTERNLOG with AVX512.
This uses a similar isel pattern as we used for vpcmov with XOP.
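
In scalar terms the matched pattern is the classic bitselect, which
corresponds to vpternlog immediate 0xCA (sketch for illustration):

    #include <cstdint>

    // (A & B) | (~A & C) selects bits of B where A is 1 and bits of
    // C where A is 0; x86 ANDN computes ~A & C. Read as a truth
    // table over (a, b, c) this is the immediate 0xCA.
    uint64_t bitselect(uint64_t a, uint64_t b, uint64_t c) {
      return (a & b) | (~a & c);
    }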

llvm-svn: 373154
2019-09-29 01:24:16 +00:00
Amara Emerson 509a4947c9 Add an operand to memory intrinsics to denote the "tail" marker.
We need to propagate this information from the IR in order to be able to safely
do tail call optimizations on the intrinsics during legalization. It isn't safe
to assume tail call opt is allowed without checking for the marker, because the
mem libcall may use allocas from the caller.

This adds an extra immediate operand to the end of the intrinsics and fixes the
legalizer to handle it.
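
A sketch of the hazard in C++ terms (hypothetical example, not taken
from the patch):

    #include <cstring>

    // `local` lives in this function's frame. If the memcpy libcall
    // were tail-called, the frame would be torn down before memcpy
    // reads from it. The "tail" marker is what certifies that no
    // argument points into the caller's stack.
    void copy_out(char *dst) {
      char local[64] = {0};
      std::memcpy(dst, local, sizeof(local)); // must not be a tail call
    }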

Differential Revision: https://reviews.llvm.org/D68151

llvm-svn: 373140
2019-09-28 05:33:21 +00:00
Craig Topper 8b5ad3d16e [X86] Add broadcast load unfolding support for VPTESTMD/Q and VPTESTNMD/Q.
llvm-svn: 373138
2019-09-28 01:56:36 +00:00
Craig Topper 22984ebd0e [X86] Split combineGatherScatter into a version for generic ISD nodes and another version for X86 specific nodes.
The majority of the code doesn't run on the X86 nodes today since
it's gated by isBeforeLegalizeOps and we don't form X86 nodes
until after that, except for a couple of special cases in type
legalization. But I think we would probably break those if
some of the transforms fired on them.

I want to remove the hardcoded operand numbers and the unusual
use of UpdateNodeOperands. Being able to know which ISD opcodes
are present should help with that.

llvm-svn: 373136
2019-09-28 01:06:58 +00:00
Craig Topper 305c811fd4 [X86] Add test case to show missed opportunity to turn (add (zext (vXi1 X)), Y) -> (sub Y, (sext (vXi1 X))) with avx512.
With avx512, the vXi1 type is legal, and we can more easily sign
extend them into vector registers; a zext requires a sign extend and
a shift.

If we can easily turn the zext into a sext we should.
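
The underlying identity, checked in scalar form (illustrative only):

    #include <cstdint>

    // For a boolean b: zext(b) is 0 or 1, while sext(b) is 0 or -1,
    // so y + zext(b) == y - sext(b). When the sign extend is the
    // cheap form, the add can therefore become a sub.
    int32_t add_zext(int32_t y, bool b) { return y + (int32_t)b; }
    int32_t sub_sext(int32_t y, bool b) { return y - (b ? -1 : 0); }
    // add_zext(y, b) == sub_sext(y, b) for all y and b.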

llvm-svn: 373131
2019-09-27 22:30:24 +00:00
Craig Topper 750bdda638 [X86] Call SimplifyDemandedBits in combineGatherScatter any time the mask element is wider than i1, not just when AVX512 is disabled.
The AVX2 intrinsics can still be used when AVX512 is enabled, and
those go through this path. So we should simplify them.

llvm-svn: 373108
2019-09-27 18:23:55 +00:00
Craig Topper 432a88bf04 [X86] Add test case to show failure to perform SimplifyDemandedBits on mask of avx2 gather intrinsics when avx512 is enabled.
llvm-svn: 373107
2019-09-27 18:23:46 +00:00
Jesper Antonsson 39b81f1cbc [CodeGenPrepare] Mend "avoid crashing from replacing a phi twice" fix.
Summary:
An erroneously negated if-statement in an earlier (March 2019) bugfix left phi replacement/simplification under optimizeMemoryInst() in CodeGenPrepare largely inactive. The error surfaced when csmith found that the same assert as in the original bug report could still be triggered in a different way. This patch fixes the bugfix. The original bug was:
 https://bugs.llvm.org/show_bug.cgi?id=41052
... and the previous fix was D59358.

Reviewers: aprantl, skatkov

Reviewed By: skatkov

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67838

llvm-svn: 373084
2019-09-27 13:01:37 +00:00
Craig Topper d3f82b8b97 [X86] Add VMOVSSZrrk/VMOVSDZrrk/VMOVSSZrrkz/VMOVSDZrrkz to getUndefRegClearance.
We have isel patterns that can put an IMPLICIT_DEF on one of
the sources for these instructions. So we should make sure
we break any dependencies there. This should be done by
just using one of the other sources.

llvm-svn: 373025
2019-09-26 22:56:06 +00:00
Craig Topper c898724974 [X86] Add CodeGenOnly instructions for (f32 (X86selects $mask, (loadf32 addr), fp32imm0) to use masked MOVSS from memory.
Similar for f64, and for the case of a non-zero passthru value.

We were previously not trying to fold the load at all. Using
a CodeGenOnly instruction allows us to use FR32X/FR64X as the
register class to avoid a bunch of COPY_TO_REGCLASS.
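
The folded pattern corresponds roughly to this scalar operation in
lane 0 (a semantic sketch, not the instruction definition):

    // A masked scalar load: lane 0 takes the memory value when the
    // mask bit is set, otherwise the passthru (zero in the matched
    // pattern). The CodeGenOnly instruction lets isel fold the load
    // into one masked MOVSS while staying in FR32X/FR64X.
    float masked_load_f32(const float *p, bool mask_bit, float passthru) {
      return mask_bit ? *p : passthru;
    }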

llvm-svn: 373021
2019-09-26 22:23:09 +00:00
Roman Lebedev 3a5ca1c8b5 [DAGCombine][X86][AArch64][NFC] Add tests for shift-by-signext
llvm-svn: 373014
2019-09-26 20:49:49 +00:00
Craig Topper ee78e44126 [X86] Mark the EVEX encoded PSADBW instructions as commutable to enable load folding of the other operand.
The SSE and VEX versions are already correct.
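
Commutability holds because the sum of absolute differences is
symmetric; a scalar sketch of one 8-byte group:

    #include <cstdint>
    #include <cstdlib>

    // PSADBW sums |a[i] - b[i]| over groups of 8 bytes. Since
    // |a - b| == |b - a|, swapping the sources gives the same
    // result, which lets isel commute the operands to fold a load.
    uint16_t sad8(const uint8_t a[8], const uint8_t b[8]) {
      uint16_t sum = 0;
      for (int i = 0; i < 8; ++i)
        sum += (uint16_t)std::abs((int)a[i] - (int)b[i]);
      return sum;
    }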

llvm-svn: 372941
2019-09-26 04:42:58 +00:00
Sanjay Patel 831a7e7068 [DAGCombiner] add one-use restriction to vector transform with cheap extract
We might be able to do better on the example in the test,
but in general, we should not scalarize a splatted vector
binop if there are other uses of the binop. Otherwise, we
can end up with the code we had: a scalar op that is
redundant with a vector op.

llvm-svn: 372886
2019-09-25 15:08:33 +00:00
Sanjay Patel 1aa09e0585 [x86] add test for multi-use scalarization of vector binop; NFC
llvm-svn: 372883
2019-09-25 14:57:45 +00:00
Simon Pilgrim a7f27f357d [X86] Add MMX MOVD/MOVQ stores to folding tables to support stack folding
llvm-svn: 372770
2019-09-24 16:15:32 +00:00
Simon Pilgrim 682d41a506 [X86] Add tests showing failure to stack fold MMX MOVD/MOVQ stores
llvm-svn: 372766
2019-09-24 15:40:09 +00:00
Ilya Biryukov 60e5e0b667 Revert r372333: [DAG][X86] Convert isNegatibleForFree/GetNegatedExpression to a target hook (PR42863)
Reason: this caused severe compile time regressions in JAX.
See the email thread of the original revision on llvm-commits for details:
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20190923/697042.html

llvm-svn: 372756
2019-09-24 13:48:02 +00:00
Craig Topper 8a6916e6db [X86] Reduce the number of unique check prefixes in memset-nonzero.ll. NFC
The avx512 run with prefer-256-bit generates the same code as AVX2, so
just reuse that prefix.

llvm-svn: 372661
2019-09-23 21:29:28 +00:00
Sanjay Patel 7414151929 [BreakFalseDeps] ignore function with minsize attribute
This came up in the x86-specific:
https://bugs.llvm.org/show_bug.cgi?id=43239
...but it is a general problem for the BreakFalseDeps pass.
Dependencies may be broken by adding some other instruction,
so that should be avoided if the overall goal is to minimize size.

Differential Revision: https://reviews.llvm.org/D67363

llvm-svn: 372628
2019-09-23 17:01:01 +00:00
Sanjay Patel 31b9dfe23f [x86] fix assert with horizontal math + broadcast of vector (PR43402)
https://bugs.llvm.org/show_bug.cgi?id=43402

llvm-svn: 372606
2019-09-23 13:30:23 +00:00
Craig Topper 03b5a13ee3 [X86] Canonicalize all zeroes vector to RHS in X86DAGToDAGISel::tryVPTESTM.
llvm-svn: 372544
2019-09-23 05:35:23 +00:00
Craig Topper 5e26064c40 [X86] Remove SETEQ/SETNE canonicalization code from LowerIntVSETCC_AVX512 to prevent an infinite loop.
The attached test case would previously infinite loop after
r365711.

I'm going to move this to X86ISelDAGToDAG.cpp to get the setcc
to match VPTEST in 32-bit mode in a follow up commit.

llvm-svn: 372543
2019-09-23 05:35:20 +00:00
Craig Topper 1f058538e0 [X86] Add 32-bit command line to avx512f-vec-test-testn.ll
llvm-svn: 372542
2019-09-23 05:35:15 +00:00
David Zarzycki a7a515cb77 Prefer AVX512 memcpy when applicable
When AVX512 is available and the preferred vector width is 512-bits or
more, we should prefer AVX512 for memcpy().
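
Conceptually, a small fixed-size memcpy then becomes a single 64-byte
load/store pair; an intrinsics sketch assuming AVX512F is available:

    #include <immintrin.h>

    // With a preferred vector width of 512 bits, a 64-byte copy can
    // be inlined as one zmm load/store instead of two ymm pairs.
    void copy64(void *dst, const void *src) {
      __m512i v = _mm512_loadu_si512(src);
      _mm512_storeu_si512(dst, v);
    }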

https://bugs.llvm.org/show_bug.cgi?id=43240

https://reviews.llvm.org/D67874

llvm-svn: 372540
2019-09-23 05:00:59 +00:00
Craig Topper a533e87792 [X86][SelectionDAGBuilder] Move the hack for handling MMX shift by i32 intrinsics into the X86 backend.
These intrinsics should be shift by immediate, but gcc allows any
i32 scalar and clang needs to match that. So we try to detect the
non-constant case and move the data from an integer register to an
MMX register.

Previously this was done by creating a v2i32 build_vector and
bitcast in SelectionDAGBuilder. This had to be done early since
v2i32 isn't a legal type. The bitcast+build_vector would be DAG
combined to X86ISD::MMX_MOVW2D which isel will turn into a
GPR->MMX MOVD.

This commit just moves the whole thing to lowering and emits
the X86ISD::MMX_MOVW2D directly to avoid the illegal type. The
test changes just seem to be due to nodes being linearized in a
different order.

llvm-svn: 372535
2019-09-23 01:05:33 +00:00
Roman Lebedev 7c3d6f5a1b [X86] X86DAGToDAGISel::matchBEXTRFromAndImm(): if can't use BEXTR, fallback to BZHI is profitable (PR43381)
Summary:
PR43381 notes that while we are good at matching `(X >> C1) & C2` as BEXTR/BEXTRI,
we only do that if we either have BEXTRI (TBM),
or if BEXTR is marked as being fast (`-mattr=+fast-bextr`).
In all other cases we don't match.

But that is mainly only true for AMD CPUs.
However, for all the CPUs for which we have sched models,
the BZHI is always fast (or the sched models are all bad).

So if we decide that it's unprofitable to emit BEXTR/BEXTRI,
we should consider falling back to BZHI if it is available,
and following up with the shift.

While it's really tempting to do something because it's cool,
it is wise to first think about whether it actually makes sense to do.
We shouldn't just use BZHI because we can, but only if it is beneficial.
In particular, it isn't really worth it if the input is a register,
the mask is small, or we can fold a load.
But it is worth it if the mask does not fit into 32 bits.
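
For reference, BZHI semantics and the resulting fallback shape, as a
scalar sketch (not the isel code):

    #include <cstdint>

    // BZHI(x, n) clears all bits of x from position n upward.
    uint64_t bzhi(uint64_t x, unsigned n) {
      return n >= 64 ? x : x & ((uint64_t(1) << n) - 1);
    }
    // So (x >> shift) & mask, with mask = 2^width - 1, can be emitted
    // as a shift followed by BZHI; this pays off when the mask does
    // not fit a 32-bit immediate.
    uint64_t extract(uint64_t x, unsigned shift, unsigned width) {
      return bzhi(x >> shift, width);
    }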

(careful, I don't know much about Intel CPUs, so my choice of `-mcpu` may be bad here)
Thus we manage to fold a load:
https://godbolt.org/z/Er0OQz
Or if we'd end up using BZHI anyways because the mask is large:
https://godbolt.org/z/dBJ_5h
But this isn't actually profitable in the general case,
e.g. here we'd increase the micro-op count
(the register renaming is free; it seems mca does not model that there)
https://godbolt.org/z/k6wFoz
Likewise, not worth it if we just get load folding:
https://godbolt.org/z/1M1deG

https://bugs.llvm.org/show_bug.cgi?id=43381

Reviewers: RKSimon, craig.topper, davezarzycki, spatel

Reviewed By: craig.topper, davezarzycki

Subscribers: andreadb, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67875

llvm-svn: 372532
2019-09-22 22:04:29 +00:00
Roman Lebedev 24159592ca [NFC][X86] Add BEXTR test with load and 33-bit mask (PR43381 / D67875)
llvm-svn: 372524
2019-09-22 19:36:38 +00:00
Craig Topper a1d86857ff [X86] Update commutable EVEX vcmp patterns to use timm instead of imm.
We need to match TargetConstant, not Constant. This was broken
in r372338, but we lacked test coverage.

llvm-svn: 372523
2019-09-22 19:06:13 +00:00
Craig Topper ac84771261 [X86] Add more tests for commuting evex vcmp instructions during isel to fold a load.
Some of the isel patterns were not updated to check for
TargetConstant instead of Constant in r372338.

llvm-svn: 372522
2019-09-22 19:06:08 +00:00
Craig Topper 38014c553f [X86] Add test memset and memcpy testcases for D67874. NFC
llvm-svn: 372494
2019-09-22 06:52:25 +00:00
Roman Lebedev 854b0f0f00 [NFC][X86] Adjust check prefixes in bmi.ll (PR43381)
llvm-svn: 372468
2019-09-21 11:12:55 +00:00
Craig Topper 04682939eb [X86] Use sse_load_f32/f64 and timm in patterns for memory form of vgetmantss/sd.
Previously we only matched scalar_to_vector and scalar load, but
we should be able to narrow a vector load or match vzload.

Also need to match TargetConstant instead of Constant. The register
patterns were previously updated, but not the memory patterns.

llvm-svn: 372458
2019-09-21 06:44:29 +00:00
Craig Topper 4fa12ac92c [X86] Add test case to show failure to fold load with getmantss due to isel pattern looking for Constant instead of TargetConstant
The intrinsic has an immarg so it gets created with a TargetConstant
instead of a Constant after r372338. The isel pattern was only
updated for the register form, but not the memory form.

llvm-svn: 372457
2019-09-21 06:44:24 +00:00
Sterling Augustine 4a58936716 Fix missed case of switching getConstant to getTargetConstant. Try 2.
Summary: This fixes a crasher introduced by r372338.

Reviewers: echristo, arsenm

Subscribers: wdng, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67850

llvm-svn: 372434
2019-09-20 22:26:55 +00:00