llvm-project

Commit Graph

Author	SHA1	Message	Date
Francesco Petrogalli	4dde9e9b02	[llvm][CodeGen] IR intrinsics for SVE2 contiguous conflict detection instructions. Summary: The IR intrinsics are mapped to the following SVE2 instructions: * WHILERW <Pd>.<T>, <Xn>, <Xm> * WHILEWR <Pd>.<T>, <Xn>, <Xm> The intrinsics introduced in this patch are the IR counterpart of the SVE ACLE functions `svwhilerw` and `svwhilewr` (all data type variants). Patch by Maciej Gąbka <maciej.gabka@arm.com>. Reviewers: kmclaughlin, rengolin Reviewed By: kmclaughlin Subscribers: tschuett, kristof.beyls, hiraditya, danielkiss, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D75862	2020-03-11 18:28:02 +00:00
Stanislav Mekhanoshin	9801e5469b	[AMDGPU] Disable nested endcf collapse The assumption is that conditional regions are perfectly nested and a mask restored at the exit from the inner block will be completely covered by a mask restored in the outer. It turns out with our current structurizer this is not always the case. Disable the optimization for now, but I want to keep it around for a while to either try after further structurizer changes or to move it into control flow lowering where we have more info and reuse the test. Differential Revision: https://reviews.llvm.org/D75958	2020-03-11 11:24:20 -07:00
Jay Foad	a46dba24fa	[AMDGPU] Extend macro fusion for ADDC and SUBB to SUBBREV Summary: There's a lot of test case churn but the overall effect is to increase the number of back-to-back v_sub,v_subbrev pairs, which can execute with no delay even on gfx10. Reviewers: arsenm, rampitec, nhaehnle Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D75999	2020-03-11 17:59:21 +00:00
Andrzej Warzynski	a9f1583228	[AArch64][SVE] Add the @llvm.aarch64.sve.sel intrinsic Reviewers: sdesmalen, efriedma Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D75928	2020-03-11 17:05:21 +00:00
Philip Reames	e671641844	[GC] Remove buggy untested optimization from statepoint lowering A downstream test case (see included reduced test) revealed that we have a bug in how we handle duplicate relocations. If we have the same SDValue relocated twice, and that value happens to be a constant (such as null), we only export one of the two llvm::Values. Exporting on a per llvm::Value basis is required to allow lowering of gc.relocates in following basic blocks (e.g. invokes). Without it, we end up with a use of an undefined vreg and bad things happen. Rather than fixing the optimization - which appears to be hard - I propose we simply remove it. There are no tests in tree that change with this code removed. If we find out later that this did matter for something, we can reimplement a variation of this in CodeGenPrepare to catch the easy cases without complicating the lowering code. Thanks to Denis and Serguei who did all the hard work of figuring out what went wrong here. The patch is by far the easy part. :) Differential Revision: https://reviews.llvm.org/D75964	2020-03-11 10:03:24 -07:00
David Green	72bf26feb3	[ARM] Extra VFMA tests. NFC	2020-03-11 15:14:07 +00:00
Matt Arsenault	a2202f6a3f	AMDGPU/GlobalISel: Manually RegBankSelect copies This was failng on any pre-assigned copy to the VCC bank. This is something of a workaround for the default implementation in getInstrMappingImpl, and how it treats copy-like operations in general. Copy-like operations are considered to only have one result register bank, rather than separate banks for each source like a normal instruction. To avoid potentially mishandling reg_sequence with impossible operand combinations, the generic implementation errors on impossible costs. If the bank was already assigned, is treated it as-if it were an unsatisfiable REG_SEQUENCE mapping. We really don't get any value from any of what getInstrMappingImpl tries to do for copies, so just directly emit the simple mapping we really want.	2020-03-11 11:12:12 -04:00
Sam Parker	51cad66e97	[NFC][ARM] Add test Precommit test for LowOverheadLoops.	2020-03-11 11:51:52 +00:00
Simon Pilgrim	b3b4727a3e	[X86] Replace (most) X86ISD::SHLD/SHRD usage with ISD::FSHL/FSHR generic opcodes (PR39467) For i32 and i64 cases, X86ISD::SHLD/SHRD are close enough to ISD::FSHL/FSHR that we can use them directly, we just need to account for the operand commutation for SHRD. The i16 SHLD/SHRD case is annoying as the shift amount is modulo-32 (vs funnel shift modulo-16), so I've added X86ISD::FSHL/FSHR equivalents, which matches the generic implementation in all other terms. Something I'm slightly concerned with is that ISD::FSHL/FSHR legality is controlled by the Subtarget.isSHLDSlow() feature flag - we don't normally use non-ISA features for this but it allows the DAG combines to continue to operate after legalization in a lot more cases. The X86 *bits.ll changes are all affected by the same issue - we now have a "FSHR(-1,-1,amt) -> ROTR(-1,amt) -> (-1)" simplification that reduces the dependencies enough for the branch fall through code to mess up. Differential Revision: https://reviews.llvm.org/D75748	2020-03-11 11:17:49 +00:00
Victor Campos	8a12553223	[ARM] Improve codegen of volatile load/store of i64 Summary: Instead of generating two i32 instructions for each load or store of a volatile i64 value (two LDRs or STRs), now emit LDRD/STRD. These improvements cover architectures implementing ARMv5TE or Thumb-2. The code generation explicitly deviates from using the register-offset variant of LDRD/STRD. In this variant, the register allocated to the register-offset cannot be reused in any of the remaining operands. Such restriction seems to be non-trivial to implement in LLVM, thus it is left as a to-do. Reviewers: dmgreen, efriedma, john.brawn, nickdesaulniers Reviewed By: efriedma, nickdesaulniers Subscribers: danielkiss, alanphipps, hans, nathanchance, nickdesaulniers, vvereschaka, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70072	2020-03-11 10:19:27 +00:00
QingShan Zhang	9304decdee	[NFC][Test] Add a PowerPC test to verify the behavior of ab +/- cd	2020-03-11 09:35:40 +00:00
Sebastian Neubauer	2f857eadf5	[AMDGPU] Use script to generate atomic optimizations test This is a preparation for introducing a llvm.amdgcn.ballot intrinsic in D65088.	2020-03-11 09:59:36 +01:00
QingShan Zhang	5edf900da0	[NFC][Test] Format the test PowerPC/recipest.ll with update_llc_test_checks.py	2020-03-11 08:49:53 +00:00
Matt Arsenault	c0ad75e758	GlobalISel: Don't try to narrow extending loads/trunc store If the loaded memory size was smaller than the result size, this would produce out of bounds memory accesses. I'm wondering if we need a distinct narrow memory legalize action type, since a case I care about is decomposing a 4-byte unaligned access into 4 extending loads, which would leave the original result register type. I'm currently awkwardly using narrowScalar to handle unaligned accesses that need to be split.	2020-03-10 23:34:10 -04:00
Matt Arsenault	206d46a192	AMDGPU/GlobalISel: Add some tests that used to infinite loop	2020-03-10 22:12:56 -04:00
Carl Ritson	d07f9e7309	[AMDGPU] Allow struct.buffer.*.format intrinsics to accept i32 Summary: In the same manner as struct.buffer.load / struct.buffer.store, allow struct.buffer.load.format / struct.buffer.store.format to return / accept any type. This simplifies front-end code gen. Reviewers: tpr, arsenm, nhaehnle Reviewed By: arsenm Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, t-tye, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D75789	2020-03-11 08:20:32 +09:00
Matt Arsenault	edd0dfca0d	AMDGPU/GlobalISel: Refine G_TRUNC legality rules Scalarize most truncates. Avoid touching cases that could end up in unresolvable infinite loops.	2020-03-10 15:32:22 -07:00
Matt Arsenault	ce8a1f7294	GlobalISel: Implement fewerElementsVector for G_TRUNC Extend fewerElementsVectorBasic to handle operands with different element types.	2020-03-10 15:17:20 -07:00
Matt Arsenault	200b20639a	AMDGPU: Use V_MAC_F32 for fmad.ftz This avoids regressions in a future patch. I'm confused by the use of the gfx9 usage legacy_mad. Was this a pointless instruction rename, or uses fmul_legacy handling? Why is regular mac avilable in that case?	2020-03-10 14:41:06 -07:00
Matt Arsenault	c4de8935a5	ARM: Fixup some tests using denormal-fp-math attribute Don't use the deprecated, single mode form in tests. Also make sure to parse the attribute, in case of the deprecated form.	2020-03-10 14:02:06 -04:00
Kazushi (Jam) Marukawa	3dabad1af3	[VE] Target-specific bit size for sjljehprepare Summary: This patch extends the TargetMachine to let targets specify the integer size used by the sjljehprepare pass. This is 64bit for the VE target and otherwise defaults to 32bit for all targets, which was hard-wired before. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D71337	2020-03-10 17:51:16 +01:00
Simon Pilgrim	c8ede5e485	[X86][SSE] getFauxShuffleMask - add support for INSERT_VECTOR_ELT(EXTRACT_VECTOR_ELT) shuffle pattern We already do this for PINSRB/PINSRW and SCALAR_TO_VECTOR.	2020-03-10 15:42:37 +00:00
Simon Pilgrim	e6a7e3b5e3	[X86][SSE] matchShuffleWithSHUFPD - add support for unary shuffles. This causes one minor test change but is mainly necessary for an upcoming patch.	2020-03-10 15:42:36 +00:00
Simon Pilgrim	417fe39be5	[X86][SSE] Add some extract+insert shuffle tests Shows failure to avoid xmm<->gpr transfers by using insertps/blendps	2020-03-10 15:42:36 +00:00
Matt Arsenault	67cfbec746	AMDGPU/GlobalISel: Insert readfirstlane on SGPR returns In case the source value ends up in a VGPR, insert a readfirstlane to avoid producing an illegal copy later. If it turns out to be unnecessary, it can be folded out.	2020-03-10 11:18:48 -04:00
Jonas Paulsson	62ff9960d3	[SystemZ] Improve foldMemoryOperandImpl(). Swap the compare operands if LHS is spilled while updating the CCMask:s of the CC users. This is relatively straight forward since the live-in lists for the CC register can be assumed to be correct during register allocation (thanks to `659efa2`). Also fold a spilled operand of an LOCR/SELR into an LOC(G). Review: Ulrich Weigand Differential Revision: https://reviews.llvm.org/D67437	2020-03-10 15:54:47 +01:00
Simon Pilgrim	e71fb46a8f	[TargetLowering] SimplifyDemandedVectorElts - add DemandedElts mask to ISD::BITCAST SimplifyDemandedBits call. This fixes most of the regressions introduced in the rG4bc6f6332028 bugfix. The vector-trunc.ll issue should be fixed by D66004.	2020-03-10 13:39:10 +00:00
Simon Pilgrim	b9b96adcf5	[X86][SSE] Add SSE41 coverage for fmaxnum/fminnum tests	2020-03-10 11:18:27 +00:00
alex-t	39e1a90784	[AMDGPU] SI_INDIRECT_DST_V* pseudos expansion should place EXEC restore to separate basic block Summary: When SI_INDIRECT_DST_V* pseudos has indexes in VGPR, they get expanded into the self-looped basic block that modifies EXEC in a loop. To keep EXEC consistent it is stored before and then re-stored after the pseudo expansion result. %95:vreg_512 = SI_INDIRECT_DST_V16 %93:vreg_512(tied-def 0), %94:sreg_32, 0, killed %1500:vgpr_32 results to s_mov_b64 s[6:7], exec BB0_16: v_readfirstlane_b32 s8, v28 v_cmp_eq_u32_e32 vcc, s8, v28 s_and_saveexec_b64 vcc, vcc s_set_gpr_idx_on s8, gpr_idx(DST) v_mov_b32_e32 v6, v25 s_set_gpr_idx_off s_xor_b64 exec, exec, vcc s_cbranch_execnz BB0_16 ; %bb.17: s_mov_b64 exec, s[6:7] The bug appeared in case this expansion occurs in the ELSE block of the CF. Originally %110:vreg_512 = SI_INDIRECT_DST_V16 %103:vreg_512(tied-def 0), %85:vgpr_32, 0, %107:vgpr_32, %112:sreg_64 = SI_ELSE %108:sreg_64, %bb.19, 0, implicit-def dead $exec, implicit-def dead $scc, implicit $exec expanded to ****************** <== here exec has "THEN" context s_mov_b64 s[6:7], exec BB0_16: v_readfirstlane_b32 s8, v28 v_cmp_eq_u32_e32 vcc, s8, v28 s_and_saveexec_b64 vcc, vcc s_set_gpr_idx_on s8, gpr_idx(DST) v_mov_b32_e32 v6, v25 s_set_gpr_idx_off s_xor_b64 exec, exec, vcc s_cbranch_execnz BB0_16 ; %bb.17: s_or_saveexec_b64 s[4:5], s[4:5] <-- exec mask is restored for "ELSE" but immediately overwritten. s_mov_b64 exec, s[6:7] The rest of the "ELSE" block is executed not by the workitems which constitute the "else mask" but by those which constitute "then mask" SILowerControlFlow::emitElse always considers the basic block begin() as an insertion point for s_or_saveexec. Proposed fix: The SI_INDIRECT_DST_V* procedure should split the reminder block to create landing pad for the EXEC restoration. Reviewers: rampitec, vpykhtin, nhaehnle Reviewed By: vpykhtin Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D75472	2020-03-10 14:04:22 +03:00
Kerry McLaughlin	0bba37a320	[AArch64][SVE] Add SVE intrinsics for address calculations Summary: Adds the @llvm.aarch64.sve.adr[b\|h\|w\|d] intrinsics Reviewers: sdesmalen, andwar, efriedma, dancgr, cameron.mcinally, rengolin Reviewed By: sdesmalen Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, danielkiss, cfe-commits, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D75858	2020-03-10 10:53:37 +00:00
James Greenhalgh	f0de8d0940	[Arm] Do not lower vmax/vmin to Neon instructions On some Arm cores there is a performance penalty when forwarding from an S register to a D register. Calculating VMAX in a D register creates false forwarding hazards, so don't do that unless we're on a core which specifically asks for it. Patch by James Greenhalgh Differential Revision: https://reviews.llvm.org/D75248	2020-03-10 10:48:48 +00:00
Simon Pilgrim	18c19441d1	[X86][AVX] combineX86ShuffleChain - combine binary shuffles to X86ISD::VPERM2X128 For pre-AVX512 targets, combine binary shuffles to X86ISD::VPERM2X128 if possible. This mainly helps optimize the blend(extract_subvector(x,1),y) pattern. At some point soon we're going to have make a decision about when to combine AVX512 shuffles more aggressively - we bail out if there is any change in element size (to protect predicate mask merging) which means we miss out on a lot of optimizations.	2020-03-10 10:44:28 +00:00
Clement Courbet	30477197b3	[ExpandMemCmp][NFC] Add more tests.	2020-03-10 11:34:19 +01:00
Sam Parker	ff9ac33e1e	[ARM][MVE] Validate tail predication values Iterate through the loop and check that the observable values produced are the same whether tail predication happens or not. We want to find out if the tail-predicated version of this loop will produce the same values as the loop in its original form. For this to be true, the newly inserted implicit predication must not change the the (observable) results. We're doing this because many instructions in the loop will not be predicated and so the conversion from VPT predication to tail predication can result in different values being produced, because of falsely predicated lanes not being updated in the converted form. A masked load, whether through VPT or tail predication, will write zeros to any of the falsely predicated bytes. So, from the loads, we know that the false lanes are zeroed and here we're trying to track that those false lanes remain zero, or where they change, the differences are masked away by their user(s). All MVE loads and stores have to be predicated, so we know that any load operands, or stored results are equivalent already. Other explicitly predicated instructions will perform the same operation in the original loop and the tail-predicated form too. Because of this, we can insert loads, stores and other predicated instructions into our KnownFalseZeros set and build from there. Differential Revision: https://reviews.llvm.org/D75452	2020-03-10 09:59:01 +00:00
Djordje Todorovic	5aa5c943f7	Reland "[DebugInfo] Enable the debug entry values feature by default" Differential Revision: https://reviews.llvm.org/D73534	2020-03-10 09:15:06 +01:00
Puyan Lotfi	4b8af31f63	[llvm][MIRVRegNamer] Avoid collisions across constant pool indices. When hashing on MachineOperand::MO_ConstantPoolIndex, now MIR-Canon and MIRVRegNamer will no longer result in a hash collision. Differential Revision: https://reviews.llvm.org/D74449	2020-03-10 01:13:20 -04:00
Matt Arsenault	627bb31a28	AMDGPU/GlobalISel: Avoid illegal vector exts for add/sub/mul When expanding scalar packed operations, we should not introduce illegal vector casts LegalizerHelper introduces. We're not in a legalizer context, and there's no RegBankSelect apply or legalize worklist.	2020-03-09 23:42:17 -04:00
Matt Arsenault	ed72bcae34	AMDGPU/GlobalISel: Fix mishandling SGPR v2s16 add/sub/mul We weren't considering the packed case correctly, and this was passing through to the selector. The selector only checked the size, so this would incorrectly compile to a single 32-bit scalar add. As usual, the LegalizerHelper is somewhat awkward to use from applyMappingImpl. I think this is the first place we've needed multi-step legalization here though.	2020-03-09 22:51:54 -04:00
Matt Arsenault	eb41627799	AMDGPU/GlobalISel: Improve handling of illegal return types Most importantly, this fixes ret i8. Also make sure to handle signext/zeroext for odd types > i32. Some of the corresponding argument passing fixes also need to be handled.	2020-03-09 13:11:30 -07:00
Matt Arsenault	156a1b59df	AMDGPU: Make signext/zeroext behave more sensibly over > i32 Interpret these as extending to the next multiple of 32-bits. This had no effect with i48 for example, which is really split into {i32, i16}, which should extend the high part.	2020-03-09 12:56:10 -07:00
Matt Arsenault	209094eeb6	AMDGPU/GlobalISel: Start matching s_lshlN_add_u32 instructions Use a hack to only enable this for GlobalISel. Technically this also works with SelectionDAG, but the divergence selection isn't reliable enough and a few cases fail, but I have no desire to spend time writing the manual expansion code for it. The DAG actually does a better job since it catches using v_add_lshl_u32 in the mixed SGPR/VGPR cases.	2020-03-09 12:36:51 -07:00
Cameron McInally	2ab8065df6	[AArch64][SVE] Add missing fp16 DestructiveInstType tests These tests should have been added with `a5b22b768f` in D73711. Differential Revision: https://reviews.llvm.org/D75767	2020-03-09 14:09:23 -05:00
Simon Pilgrim	4b130b883d	[X86][SSE] SimplifyDemandedVectorEltsForTargetNode - reduce vector width of X86ISD::BLENDI If we don't need the upper subvector elements of the BLENDI node then use a smaller vector size. This causes a couple of minor regressions in insertelement-ones.ll which are more examples of PR26018; given how cheap allones generation is I don't consider that a showstopper, just an annoyance (and there's plenty of other poor codegen cases in that file).	2020-03-09 18:29:28 +00:00
Craig Topper	3dcc0db15e	[X86] Teach combineToExtendBoolVectorInReg to create opportunities for using broadcast load instructions. If we're inserting a scalar that is smaller than the element size of the final VT, the value of the extra bits doesn't matter. Previously we any_extended in the scalar domain before inserting. This patch changes this to use a broadcast of the original scalar type and then a bitcast to the final type. This might enable the use of a broadcast load. This recovers regressions from `07d68c24aa` and `9fcd212e2f` without relying on alignment of the load. Differential Revision: https://reviews.llvm.org/D75835	2020-03-09 11:26:12 -07:00
Krzysztof Parzyszek	44205891ed	[Hexagon] Fix match pattern in a testcase	2020-03-09 09:09:58 -05:00
Djordje Todorovic	c15c68abdc	[CallSiteInfo] Enable the call site info only for -g + optimizations Emit call site info only in the case of '-g' + 'O>0' level. Differential Revision: https://reviews.llvm.org/D75175	2020-03-09 12:12:44 +01:00
KAWASHIMA Takahiro	c8cd1a994d	[AArch64] Add support for Fujitsu A64FX A64FX is an Armv8.2-A CPU used in FUJITSU Supercomputer PRIMEHPC FX1000, PRIMEHPC FX700, and supercomputer Fugaku. https://www.fujitsu.com/global/products/computing/servers/supercomputer/specifications/ Differential Revision: https://reviews.llvm.org/D75594	2020-03-09 19:15:09 +09:00
Clement Courbet	6518b72f93	[ExpandMemCmp] Properly constant-fold all compares. Summary: This gets rid of duplicated code and diverging behaviour w.r.t. constants. Fixes PR45086. Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D75519	2020-03-09 10:40:52 +01:00
Clement Courbet	f7e6f5f8e3	[ExpandMemCmp] Properly constant-fold all compares. Summary: This gets rid of duplicated code and diverging behaviour w.r.t. constants. Fixes PR45086. Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D75519	2020-03-09 09:10:34 +01:00
Craig Topper	07d68c24aa	[X86] Remove isel patterns that matched vXi16 X86VBroadcast with i8->i16 aextload input. This was selecting VBROADCASTW which turned the 8-bit load into a 16-bit load if it happened to be 2 byte aligned. I have a plan to fix the regression with a follow up patch which I'll post shortly.	2020-03-08 19:16:24 -07:00

1 2 3 4 5 ...

33058 Commits