llvm-project

Commit Graph

Author	SHA1	Message	Date
Haocong.Lu	865fe131f8	[RISCV] Fix a mistake in PostprocessISelDAG With the condition N->use_empty(), the root node of DAG always misses peephole optimization. So a dummy node is needed. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D119934	2022-02-25 12:38:31 +00:00
Chenbing Zheng	b20e80aa59	[RISCV] DAG Combine vcpop and vfirst with VL=0 to li imm vcpop and vfirst are still useful when VL=0. vcpop equivalents to li 0 and vfirst equivalents to li -1, since no mask elements are active. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D120302	2022-02-25 14:44:25 +08:00
Zakk Chen	4e115b7d88	[RISCV] Update computeTargetABI from llc as well as clang Clang computes the default ABI if -mabi is empty and encode it in LLVM IR module flag since D105555. For correctness, llc need to give the same target-abi (Options.MCOptions.ABIName) with ABI encoded in IR. The getSubtargetImpl already has a check for them only if Options.MCOptions.ABIName is not empty. In order to get more robustness we could have a check for explicit ABI, but now we have two different logic to compute the default ABI. The front-end ABI is defautl to the ilp32/ilp32e/lp64, and ilp32d/lp64d when hardware support for extension D. The backend ABI is default to the ilp32/ilp32e/lp64. Reviewed by: asb, jrtc27 Differential Revision: https://reviews.llvm.org/D118333	2022-02-24 21:55:44 -08:00
Craig Topper	6159f05955	[RISCV] Add tests for (neg (abs X)) where the abs has an additional user.	2022-02-24 12:59:57 -08:00
Craig Topper	2aa1c6fca1	[RISCV] Add Zbb RUN lines to neg-abs.ll.	2022-02-24 10:21:10 -08:00
Craig Topper	f69078b77f	[RISCV] Update some tests to use floating point ABI where it makes sense. Trying to reduce the diffs from D118333 for cases where it makes more sense to use an FP ABI. Reviewed By: asb, kito-cheng Differential Revision: https://reviews.llvm.org/D120447	2022-02-24 09:27:57 -08:00
Craig Topper	a975ca97c3	[RISCV] Fold (sext_inreg (fmv_x_anyexth X), i16) -> (fmv_x_signexth X). Add a new ISD opcode to represent the sign extending behavior of vmv.x.h. Keep the previous anyext opcode to allow the existing (fmv_x_anyexth (fmv_h_x X)) combine to keep working without needing to generate a sign extend. For fmv.x.w we are able to match the sext_inreg in an isel pattern, but a 16-bit sext_inreg is lowered to a shift pair before isel. This seemed like a larger match than we should do in isel. Reviewed By: asb Differential Revision: https://reviews.llvm.org/D118974	2022-02-24 09:19:01 -08:00
Sanjay Patel	4a3708cd6b	[SDAG] remove shift that is redundant with part of funnel shift This is the SDAG translation of D120253 : https://alive2.llvm.org/ce/z/qHpmNn The SDAG nodes can have different operand types than the result value. We can see an example of that with AArch64 - the funnel shift amount is an i64 rather than i32. We may need to make that match even more flexible to handle post-legalization nodes, but I have not stepped into that yet. Differential Revision: https://reviews.llvm.org/D120264	2022-02-24 11:25:46 -05:00
lian wang	05e82be7ea	[RISCV][NFC] Remove useless intrinsic function declare in test of Zbp extension Reviewed By: benshi001 Differential Revision: https://reviews.llvm.org/D120393	2022-02-24 08:43:13 +00:00
Chenbing.Zheng	2aac00e4a6	[RISCV] Add more tests for vcpop and vfirst with VL=0 Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D120300	2022-02-24 05:59:12 +00:00
Sanjay Patel	21d7c3bcc6	[DAG] try to convert multiply to shift via demanded bits This is a fix for a regression discussed in: https://github.com/llvm/llvm-project/issues/53829 We cleared more high multiplier bits with `995d400`, but that can lead to worse codegen because we would fail to recognize the now disguised multiplication by neg-power-of-2 as a shift-left. The problem exists independently of the IR change in the case that the multiply already had cleared high bits. We also convert shl+sub into mul+add in instcombine's negator. This patch fills in the high-bits to see the shift transform opportunity. Alive2 attempt to show correctness: https://alive2.llvm.org/ce/z/GgSKVX The AArch64, RISCV, and MIPS diffs look like clear wins. The x86 code requires an extra move register in the minimal examples, but it's still an improvement to get rid of the multiply on all CPUs that I am aware of (because multiply is never as fast as a shift). There's a potential follow-up noted by the TODO comment. We should already convert that pattern into shl+add in IR, so it's probably not common: https://alive2.llvm.org/ce/z/7QY_Ga Fixes #53829 Differential Revision: https://reviews.llvm.org/D120216	2022-02-23 12:09:32 -05:00
Alex Bradbury	c5bcfb983e	[RISCV] Avoid infinite loop between DAGCombiner::visitMUL and RISCVISelLowering::transformAddImmMulImm See https://github.com/llvm/llvm-project/issues/53831 for a full discussion. The basic issue is that DAGCombiner::visitMUL and RISCVISelLowering;:transformAddImmMullImm get stuck in a loop, as the current checks in transformAddImmMulImm aren't sufficient to avoid all cases where DAGCombiner::isMulAddWithConstProfitable might trigger a transformation. This patch makes transformAddImmMulImm bail out if C0 (the constant used for multiplication) has more than one use. Differential Revision: https://reviews.llvm.org/D120332	2022-02-23 11:05:46 +00:00
Lian Wang	3497124771	[RISCV] Add more test for GORCI aliaes in Zbp extension Supplement tests for some aliaes of gorci. RV32: add orc4.h/orc2.h in rv32zbp.ll add orc.h/orc16/orc8/orc4/orc2/orc in rv32zbp-intrinsic.ll RV64: add orc4.h/orc2.h in rv64zbp.ll add orc.h/orc32/orc16/orc8/orc4/orc2/orc/orc16.w/orc8.w/ orc4.w/orc2.w/orc.w in rv64zbp-intrinsic.ll Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D120388	2022-02-23 08:02:33 +00:00
Lian Wang	7abcb7ba87	[RISCV] Supplement more tests for GREVI aliaes in Zbp extension Supplement tests for some aliaes of grevi. RV32: add rev4.h/rev2.h in rv32zbp.ll add rev/rev2/rev4/rev8/rev16 in rv32zbp-intrinsic.ll RV64: add rev4.h/rev2.h in rv64zbp.ll add rev.h/rev/rev2/rev4/rev8/rev16/rev32/rev.w/rev2.w/ rev4.w/rev8.w/rev16.w in rv64zbp-intrinsic.ll Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D120304	2022-02-23 06:15:52 +00:00
jacquesguan	5acd9c49a8	[RISCV] Add patterns for vector widening integer reduction instructions Add patterns for vector widening integer reduction instructions. Differential Revision: https://reviews.llvm.org/D117643	2022-02-22 14:14:05 +08:00
Lian Wang	294072e10b	[RISCV] Add more tests for SHLFI and UNSHFLI aliaes in Zbp extension RV32/RV64: zip.n/zip2.b/zip.b/zip4.h/zip2.h/zip.h unzip.n/unzip2.b/unzip.b/unzip4.h/unzip2.h/unzip.h Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D120241	2022-02-22 02:20:32 +00:00
Zakk Chen	f7dfc5d1af	[RISCV] Optimize tail agnostic vmv.s.x which don't need to select tail value. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D120250	2022-02-21 14:53:37 -08:00
Craig Topper	90d240553d	[RISCV] Teach shouldSinkOperands to sink splat operands of vp.fma intrinsics. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D120167	2022-02-21 11:52:59 -08:00
Sanjay Patel	bb850d422b	[AArch64][RISCV][x86] add tests for funnel shift with shift logic; NFC	2022-02-21 10:24:45 -05:00
Lian Wang	2e153038b4	[RISCV] Add tests for SHFLI and UNSHFLI aliases in Zbp extension Supplement tests alias of SHFLI and UNSHFLI instructions. RV32: zip8/zip4/zip2/unzip8/unzip4/unzip2 RV64: zip8.w/zip4.w/zip2.w/zip.w/zip8/zip4/zip2/zip/ unzip8.w/unzip4.w/unzip2.w/unzip.w/unzip8/unzip4/unzip2/unzip Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D120015	2022-02-21 10:10:32 +00:00
Craig Topper	440c4b705a	[SelectionDAG][RISCV][ARM][PowerPC][X86][WebAssembly] Change default abs expansion to use sra (X, size(X)-1); sub (xor (X, Y), Y). Previous we used sra (X, size(X)-1); xor (add (X, Y), Y). By placing sub at the end, we allow RISCV to combine sign_extend_inreg with it to form subw. Some X86 tests for Z - abs(X) seem to have improved as well. Other targets look to be a wash. I had to modify ARM's abs matching code to match from sub instead of xor. Maybe instead ISD::ABS should be made legal. I'll try that in parallel to this patch. This is an alternative to D119099 which was focused on RISCV only. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D119171	2022-02-20 21:11:23 -08:00
Sanjay Patel	7f827ebddc	[AArch64][RISCV][x86] add tests for mul-add demanded bits; NFC See #53829	2022-02-20 15:06:55 -05:00
Craig Topper	5489969550	[RISCV] Add IsRV32 to the isel pattern for ZIP_RV32/UNZIP_RV32. NFC I think the i32 in the pattern prevents this from matching on RV64, but using IsRV32 is safer. Add tests for RV64 to make sure we don't print zip or unzip because we incorrectly picked ZIP_RV32/UNZIP_RV32.	2022-02-18 22:38:14 -08:00
Zakk Chen	c6a3225bb0	[RISCV][NFC] Add some tail agnostic tests for nomask operations. Improve test coverage for tail agnostic nomask vslidedown/up, vmv.s.x vfmv.s.f and vcompress. Reviewed By: rogfer01 Differential Revision: https://reviews.llvm.org/D120008	2022-02-17 16:37:51 -08:00
Zakk Chen	6f6ac4af62	[RISCV][NFC] Add tail agnostic tests for nomask Vector Reduction IR intrinsics. Improve test coverage for tail agnostic nomask Vector Reduction IR. Reviewed By: monkchiang Differential Revision: https://reviews.llvm.org/D119920	2022-02-17 16:33:55 -08:00
Zakk Chen	ca78312407	[RISCV] Add the policy operand for nomask vector Multiply-Add IR intrinsics. The goal is support tail and mask policy in RVV builtins. We focus on IR part first. The nomask vector Multiply-Add need a policy operand because merge value could not be undef. Reviewed By: monkchiang Differential Revision: https://reviews.llvm.org/D119727	2022-02-17 09:12:46 -08:00
Craig Topper	bbee9e77f3	[RISCV] Match shufflevector corresponding to slideup. This generalizes isElementRotate to work when there's only a single slide needed. I've removed matchShuffleAsSlideDown which is now redundant. Reviewed By: frasercrmck, khchen Differential Revision: https://reviews.llvm.org/D119759	2022-02-17 08:19:10 -08:00
Craig Topper	954fe404ab	[RISCV] Fix incorrect MemOperand copy converting splat+load to vlse. Due to an incorrect copy/paste from load intrinsic handling we checked if the splat node was a MemSDNode which of course it isn't. Instead get the MemOperand from the LoadSDNode for the source of the splat. This enables LICM to see the load is loop invariant and hoist it out of the loop. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D120014	2022-02-17 08:15:50 -08:00
Zakk Chen	eeb7754f68	[RISCV] Add the passthru operand for vmv.vv/vmv.vx/vfmv.vf IR intrinsics. Add the passthru operand for VMV_V_X_VL, VFMV_V_F_VL and SPLAT_VECTOR_SPLIT_I64_VL also. The goal is support tail and mask policy in RVV builtins. We focus on IR part first. If the passthru operand is undef, we use tail agnostic, otherwise use tail undisturbed. Reviewed By: rogfer01 Differential Revision: https://reviews.llvm.org/D119688	2022-02-17 06:38:14 -08:00
Zakk Chen	093ecccdab	[RISCV] Add the passthru operand for vadc/vsbc/vmerge/vfmerge IR intrinsics. The goal is support tail and mask policy in RVV builtins. We focus on IR part first. If the passthru operand is undef, we use tail agnostic, otherwise use tail undisturbed. Reviewed By: rogfer01 Differential Revision: https://reviews.llvm.org/D119686	2022-02-17 02:21:39 -08:00
Ben Shi	0b93e90971	Revert "[RISCV] LUI used for address computation should not isAsCheapAsAMove" This reverts commit `23a5073600`. Although this patch achieved better codegen in most cases, it is really important to accurately describe the cost of instructions. So I revert it.	2022-02-17 17:27:37 +08:00
Craig Topper	cfbbcc544c	[RISCV] Improve lowering of SHL_PARTS/SRL_PARTS/SRA_PARTS. Part of the shift lowering creates a (sub XLEN-1, ShAmt). When this value is used we know that ShAmt is [0..XLEN-1]. Since XLEN is a power of 2 we can replace the sub with an xor. This allows us to use XORI instead of LI+SUB. Reviewed By: asb Differential Revision: https://reviews.llvm.org/D119411	2022-02-16 09:22:11 -08:00
Craig Topper	1daa66d3fd	[SelectionDAG] Add SPLAT_VECTOR to SelectionDAG::isConstantFPBuildVectorOrConstantFP. Matches what is done for the int version. Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D119793	2022-02-16 09:22:11 -08:00
Zakk Chen	e8973dd389	[RISCV] Add the passthru operand for some RVV nomask unary and nullary intrinsics. The goal is support tail and mask policy in RVV builtins. We focus on IR part first. If the passthru operand is undef, we use tail agnostic, otherwise use tail undisturbed. My plan is to handle more complex operations in follow-up patches. Reviewers: frasercrmck Differential Revision: https://reviews.llvm.org/D118253	2022-02-15 22:34:06 -08:00
Zakk Chen	b784719904	[RISCV] Add the passthru operand for RVV nomask binary intrinsics. The goal is support tail and mask policy in RVV builtins. We focus on IR part first. If the passthru operand is undef, we use tail agnostic, otherwise use tail undisturbed. Add passthru operand for VSLIDE1UP_VL and VSLIDE1DOWN_VL to support i64 scalar in rv32. The masked VSLIDE1 would only emit mask undisturbed policy regardless of giving mask agnostic policy until InsertVSETVLI supports mask agnostic. Reviewed by: craig.topper, rogfer01 Differential Revision: https://reviews.llvm.org/D117989	2022-02-15 18:36:18 -08:00
Craig Topper	ab6e02dded	[RISCV] Match vwmulsu_vx with scalar splat input. This is a more generic version of D119110 that uses MaskedValueIsZero to do the matching and SimplifyDemandedBits to remove any unneeded AND instructions. Tests were taken from D119110. Reviewed By: Chenbing.Zheng Differential Revision: https://reviews.llvm.org/D119622	2022-02-15 08:45:21 -08:00
jacquesguan	bfb4c0c370	[RISCV] Recover the implication between Zve* extensions and the V extension. This revision recover the implication between Zve* extensions and the V extension. Differential Revision: https://reviews.llvm.org/D119210	2022-02-14 15:52:07 +08:00
Craig Topper	478c237e21	[RISCV] Fix incorrect extend type in vwmulsu combine. While matching widening multiply, if we matched an extend from i8->i32, i16->i64 or i8->i64, we need to reintroduce a narrower extend. If we're matching a vwmulsu we need to use a sext for op0 and a zext for op1. This bug exists in LLVM 14 and will need to be backported. Differential Revision: https://reviews.llvm.org/D119618	2022-02-12 12:47:20 -08:00
Dimitry Andric	7af3d4ab3d	Revert "[RISCV] Enable shrink wrap by default" This reverts commit `5ebdb07e7e`. Enabling shrink wrap by default can cause assertions or crashes, and these should first be investigated and fixed. For now, reverting the change so it can be cherry-picked into 14.0.0 is the safest choice.	2022-02-12 19:04:12 +01:00
Haocong.Lu	23a5073600	[RISCV] LUI used for address computation should not isAsCheapAsAMove A LUI instruction with flag RISCVII::MO_HI is usually used in conjunction with ADDI, and jointly complete address computation. To bind the cost evaluation of address computation, the LUI should not be regarded as a cheap move separately, which is consistent with ADDI. In this test case, it improves the unroll-loop code that the rematerialization of array's base address miss MachineCSE with Heuristics #1 at isProfitableToCSE. Reviewed By: asb, frasercrmck Differential Revision: https://reviews.llvm.org/D118216	2022-02-12 07:14:38 +00:00
Haocong.Lu	2e57156fea	[RISCV][test] Precommit a test of CSE within an unroll loop Reviewed By: asb, frasercrmck Differential Revision: https://reviews.llvm.org/D118218	2022-02-12 06:30:03 +00:00
Craig Topper	541c9ba842	[RISCV] Insert VSETVLI at the end of a basic block if we didn't produce BlockInfo.Exit. This is an alternative to D118667 that instead of fixing the store to match phase 1, it tries to detect the mismatch with the expected value at the end of the block. This inserts a vsetvli after the vse to satisfy the requirement of the other basic block. We still have serious design issues in the pass, that is going to require some rethinking. Differential Revision: https://reviews.llvm.org/D119518	2022-02-11 09:34:16 -08:00
Craig Topper	f35ac872b8	Revert "[RISCV] Fix a vsetvli insertion bug involving loads/stores." and "[RISCC] Add missing words to comment. NFC" This reverts commit `f943c58cae`. and commit `7eb7810727`. This introduced a new bug that appears to be easier to hit. Differential Revision: https://reviews.llvm.org/D119517	2022-02-11 09:34:16 -08:00
Craig Topper	ba9a7ae798	[RISCV] Add test case for a vsetvli insertion bug found after D118667. We're missing a vsetvli before a vse after a redsum in this test. This appears to be because the vmv.s.x has a VL of 1, but did not trigger a vsetvli because it is a scalar move op and any non-zero VL would work. So it looked at it the predecessors and decided it was that they all had a non-zero vl. Then the redsum was visited, it also took the VL from the predecessors since the vmv.s.x and the 4 was found compatible. Finally we visit the vse and it looks at the BBLocalInfo and sees that is compatible because it contains a VL of 1 from the vmv.s.x, the first instruction in the block. BBLocalInfo was not updated when the vredsum was visited because BBLocalInfo was valid and no vsetvli was generated. I think fundamentally the vmv.s.x optimization has the same first phase and third phase not matching problem that D118667 was trying to fix for stores. Differential Revision: https://reviews.llvm.org/D119516	2022-02-11 09:34:16 -08:00
Zakk Chen	d224be3b99	[RISCV] Add the policy operand for some masked RVV ternary IR intrinsics. Masked reduction intrinsics are specical cases which don't need to have policy operand. The mask only affects which elements are read. It doesn't effect the destination register. The reduction intrinsics have a dedicated destination operand. If it is undef, we use tail agnostic. If it not undef we use tail undisturbed. Co-Authored-by: Craig Topper <craig.topper@sifive.com> Differential Revision: https://reviews.llvm.org/D117681	2022-02-11 05:02:03 -08:00
Craig Topper	b0e77d5e48	[RISCV] Lower the shufflevector equivalent of vector.splice We can lower a vector splice to a vslidedown and a vslideup. The majority of the matching code here came from X86's code for matching PALIGNR and VPALIGND/Q. The slidedown and slideup lowering don't really require it to be concatenation, but it happened to be an interesting pattern with existing analysis code I could use. This helps with cases where the scalar loop optimizer forwarded a load result from a previous loop iteration. For example, this happens if the loop uses x[i] and x[i+1] on the same iteration. The scalar optimizer will forward x[i+1] load from the previous loop to satisfy x[i] on this loop. When this get vectorized it results in one element of a vector being forwarded from the previous loop to be concatenated with elements loaded on this iteration. Whether that's more efficient than doing a shifted loaded or reloading the single scalar and using vslide1up is an interesting question. But that's not something the backend can help with. Reviewed By: khchen Differential Revision: https://reviews.llvm.org/D119039	2022-02-10 09:39:35 -08:00
Fraser Cormack	fd43d99c93	[RISCV] Pre-process FP SPLAT_VECTOR to RISCVISD::VFMV_V_F_VL This patch builds on top of D119197 to canonicalize floating-point SPLAT_VECTOR as RISCVISD::VFMV_V_F_VL as a pre-process ISel step. This primarily benefits scalable-vector VP code, where our VP patterns only match VFMV_V_F_VL to reduce the burden on our ISel patterns, but where at the same time, scalable-vector code doesn't custom-legalize SPLAT_VECTOR. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D117670	2022-02-10 09:56:00 +00:00
Chenbing.Zheng	c5d3b231e0	[RISCV] Add support for matching vwmaccsu/vwmaccus from fixed vectors Add pattern to match add and widening mul to vwmacc, and two multipliers are sext and zext. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D119314	2022-02-10 01:59:31 +00:00
Craig Topper	c45c1b130b	[RISCV] Teach RISCVDAGToDAGISel::selectShiftMask to replace sub from constant with neg. If the shift amount is (sub C, X) where C is 0 modulo the size of the shift, we can replace it with neg or negw. Similar is is done for AArch64 and X86. Reviewed By: khchen Differential Revision: https://reviews.llvm.org/D119089	2022-02-09 12:33:01 -08:00
Craig Topper	279b3b8179	[RISCV][VP] Lower VP_FMA to RVV instructions. We already had FMA_VL node, but we didn't have masked patterns. I have not added the fneg variations. I'll do those after I add llvm.vp.fneg. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D119196	2022-02-09 11:33:12 -08:00
Craig Topper	63e711549c	[RISCV] Lower VP_FNEG to RVV instructions Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D119269	2022-02-09 10:56:39 -08:00
Sander de Smalen	ec46232517	[DAGCombiner] Fold `ty1 extract_vector(ty2 splat(V)) -> ty1 splat(V)` This seems like an obvious fold, which leads to a few improvements. Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D118920	2022-02-09 14:30:01 +00:00
jacquesguan	5e71bbfb6c	[RISCV] Add patterns for vector widening floating-point fused multiply-add instructions Add patterns for vector widening floating-point fused multiply-add instructions. Differential Revision: https://reviews.llvm.org/D117546	2022-02-09 10:34:39 +08:00
Fraser Cormack	62c4ac764b	[RISCV] Optimize splats of extracted vector elements This patch adds an optimization to splat-like operations where the splatted value is extracted from a identically-sized vector. On RVV we can splat that via vrgather.vx/vrgather.vi without dropping to scalar beforehand. We do have a similar VECTOR_SHUFFLE-specific optimization but that only works on fixed-length vector types and for those with a constant splat lane. This patch extends this optimization to make it work on scalable-vector types and on unknown extract indices. It is performed during fixed-vector BUILD_VECTOR lowering and during a new DAGCombine on SPLAT_VECTOR for scalable vectors. Reviewed By: craig.topper, khchen Differential Revision: https://reviews.llvm.org/D118456	2022-02-08 10:35:25 +00:00
wangpc	c53d99c37d	[RISCV] Split f64 undef into two i32 undefs So that no store instruction will be generated. Reviewed By: asb Differential Revision: https://reviews.llvm.org/D118222	2022-02-08 13:42:15 +08:00
wangpc	cb0fff4397	[RISCV] Pre-commit test for D118222 Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D119212	2022-02-08 12:52:13 +08:00
Craig Topper	c35ccd2ac8	[DAGCombiner][RISCV] Allow rotates by non-constant to be matched for i32 on riscv64 with Zbb. rv64izbb has a RORW/ROLW instructions that operate on the lower 32-bits of a 64-bit value and sign extend bit 31 of the result. DAGCombiner won't match rotate idioms because the i32 type isn't Legal on riscv64. This patch teaches DAGCombiner to allow it if the type is going to be promoted and the target has Custom type legalization for ISD::ROTL or ISD::ROTR. I've restricted this to scalar types. It doesn't appear any in tree targets other than riscv64 have custom type legalization for rotates. If this patch isn't acceptable, I guess I can match SRLW, SLLW, and OR after type legalization, but I'd like to avoid that if possible. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D119062	2022-02-06 10:58:12 -08:00
Craig Topper	f3a725af43	[RISCV] Add signext test for llvm.abs.i32 for rv64 Zbb. This shows that we don't preserve sign bits across the abs expansion, but I think we could if we used negw+max.	2022-02-05 21:26:47 -08:00
Craig Topper	c1cef111a3	Revert "[RISCV] Fold (sext_inreg (fmv_x_anyexth X), i16) -> (fmv_x_signexth X)." This reverts commit `673d68cd92`. This hadn't been reviewed yet.	2022-02-05 12:51:01 -08:00
Craig Topper	d1899da3a2	[RISCV] Add more tests for rotate idioms. Add more RUN lines. NFC We were only testing rotate idioms on rv32i. DAGCombiner won't form ISD::ROTL/ROTR unless those operations are Legal or Custom. They aren't for rv32 so we were only testing shift lowering. This commit adds i64 idioms and the idioms that mask the shift amount to avoid UB for a rotate of 0. I've added riscv64 and Zbb RUN lines to show that we do match rotate for XLen types when available. We currently miss i32 on rv64izbb.	2022-02-05 12:48:22 -08:00
Craig Topper	673d68cd92	[RISCV] Fold (sext_inreg (fmv_x_anyexth X), i16) -> (fmv_x_signexth X). Add a new ISD opcode to represent the sign extending behavior of vmv.x.h. Keep the previous anyext opcode to allow the existing (fmv_x_anyexth (fmv_h_x X)) combine to keep working without needing to generate a sign extend. For fmv.x.w we are able to match the sext_inreg in an isel pattern, but a 16-bit sext_inreg is lowered to a shift pair before isel. This seemed like a larger match than we should do in isel. Differential Revision: https://reviews.llvm.org/D118974	2022-02-05 12:42:12 -08:00
Craig Topper	d752ea9a72	[RISCV] Remove exclusions for zext.h/zext.w from our (and (srl X, C1), C2) selection code. This code tries to replace the pattern with a pair of shifts, but we were excluding if the And could be a zext.h or zext.w. The SLLI/SRL pair is more compressible and doesn't come with much down side. We do regress one test case in rv64i-exhaustive-w-insts.ll but we can probably add a narrower exclusion for that case.	2022-02-04 17:10:48 -08:00
Craig Topper	1d8bbe3d25	[RISCV] Implement a basic version of AArch64RedundantCopyElimination pass. Using AArch64's original implementation for reference, this patch implements a pass to remove unneeded copies of X0. This pass runs after register allocation and looks to see if a register is implied to be 0 by a branch in the predecessor basic block. Reviewed By: asb Differential Revision: https://reviews.llvm.org/D118160	2022-02-04 10:43:46 -08:00
Craig Topper	234e54bdd8	[RISCV] Add more types of shuffles isShuffleMaskLegal. Add the vslidedown and interleave patterns that I recently implemented. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D118952	2022-02-04 09:13:13 -08:00
Craig Topper	c83905a308	[RISCV] Add inline expansion for vector fround. This avoids a crash for scalable vectors and or scalarization for fixed vectors. The algorithm is different enough that I don't think it makes sense to merge with ceil/floor/trunc. Algorithm is adapted from gcc's X86 SSE2 output. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D117247	2022-02-04 09:12:09 -08:00
Craig Topper	237eb37260	[RISCV] Add FMV_X_W and FMV_X_H to RISCVSExtWRemoval. Add -target-abi to sextw-removal.ll RUN lines to show benefit on new test case.	2022-02-03 09:40:47 -08:00
Shao-Ce SUN	005fd8aa70	[RISCV] Add support for Zihintpause extention Add support for the 'pause' hint instruction as an alias for 'fence w, 0'. To do this allow the 'fence' operands pred and succ to be set to 0 (the empty set). This will also allow future hints to be encoded as 'fence 0, <x>' and 'fence <x>, 0'. This patch revised from @mundaym's D93019. Reviewed By: asb Differential Revision: https://reviews.llvm.org/D117789	2022-02-03 20:55:47 +08:00
Craig Topper	b73d151a11	[RISCV] Add DAG combines to transform ADD_VL/SUB_VL into widening add/sub. This adds or reuses ISD opcodes for vadd.wv, vaddu.wv, vadd.vv, vaddu.vv and a similar set for sub. I've included support for narrowing scalar splats that have known sign/zero bits similar to what was done for MUL_VL. The conversion to vwadd.vv proceeds in two phases. First we'll form a vwadd.wv by narrowing one of the operands. Then we'll visit the vwadd.wv to try to narrow the other operand. This turned out to be simpler than catching all the cases in one step. The forming of of vwadd.wv can happen for either operand for add, but only the right hand side for sub since sub isn't commutable. An interesting quirk is that ADD_VL and VZEXT_VL/VSEXT_VL are formed during vector op legalization, but VMV_V_X_VL isn't usually formed until op legalization when BUILD_VECTORS are handled. This leads to VWADD_W_VL forming in one DAG combine round, and then a later DAG combine round sees the VMV_V_X_VL and needs to commute the operands to get the splat in position. This alone necessitated a VWADD_W_VL combine function which made forming vwadd.vv in two stages an easy choice. I've left out trying hard to form vwadd.wx instructions for now. It would only save an extend in the scalar domain which isn't as interesting. Might need to review the test coverage a bit. Most of the vwadd.wv instructions are coming from vXi64 tests on rv64. The tests were copy pasted from the existing multiply tests. Reviewed By: rogfer01 Differential Revision: https://reviews.llvm.org/D117954	2022-02-02 10:03:08 -08:00
Craig Topper	7f6441f96e	[TableGen][RISCV] Relax a restriction in generating patterns for commutable SDNodes. Previously, all children would be checked to see if any were an explicit Register. If anywhere no commutable patterns would be generated. This patch loosens the restriction to only check the children that are being commuted. Digging back through history, this code predates the existence of commutable intrinsics and commutable SDNodes with more than 2 operands. At that time the loop would count the number of children that weren't registers and if that was equal to 2 it would allow commuting. I don't think this loop was re-considered when commutable intrinsics were added or when we allowed SDNodes with more than 2 operands. This important for RISCV were our isel patterns have a V0 mask operand after the commutable operands on some RISCVISD opcodes. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D117955	2022-02-01 21:07:03 -08:00
Craig Topper	7eb7810727	[RISCV] Fix a vsetvli insertion bug involving loads/stores. The first phase of the analysis can avoid a vsetvli if an earlier instruction in the block used an SEW and LMUL that when combined with the EEW of the load/store would produce the desired EMUL. If we avoided a vsetvli this will affect the global analysis we do in the second phase. The third phase where we really insert the vsetvlis needs to agree with the first phase. If it doesn't we can insert vsetvlis that invalidate the global analysis. In the test case there is a VSETVLI in the preheader that sets SEW=64 and LMUL=1. Inside the loop there is a VADD with SEW=64 and LMUL=1. This VADD is followed by a store that wants wants SEW=32 LMUL=1/2. Because it has EEW=32 as part of the opcode the SEW=64 LMUL=1 from the VADD can be become EMUL=1 for the store. So the first phase determines no vsetvli is needed. The third phase manages CurInfo differently than BBInfo.Change from the first phase. CurInfo is only updated when we see a vsetvli or insert a vsetvli. This was done to allow predecessor block information from the global analysis to be applied to multiple instructions. Since the loop body has no vsetvli we won't update CurInfo for either the VADD or the VSE. This prevented us from checking the store vsetvli elision for the VSE resulting in a vsetvli SEW=32 LMUL=1/2 being emitted which invalidated the global analysis. To mitigate this, I've added a BBLocalInfo variable that more closely matches the first phase propagation. This gets updated based on the VADD and prevents emitting a vsetvli for the store like we did in the first phase. I wonder if we should do an earlier phase to handle the load/store case by adding more pseudo opcodes and changing the SEW/LMUL for those instructions before the insertion analysis. That might be more robust than trying to guarantee two phases make the same decision. Fixes the test from D118629. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D118667	2022-02-01 07:29:01 -08:00
Fraser Cormack	e9ceeedf30	[RISCV][3/3] Switch undef -> poison in scalable-vector RVV tests	2022-02-01 11:06:56 +00:00
Fraser Cormack	8d1169cf74	[RISCV][2/3] Switch undef -> poison in fixed-vector RVV tests	2022-02-01 11:06:56 +00:00
Fraser Cormack	414f21ed23	[RISCV][1/3] Switch undef -> poison in VP RVV tests Inspired by a recent Discourse post on undef vs. poison usage, this series of patches should reduce the number of undefs in LLVM tests by around 10%. Only undef vector operands to insertelement/shufflevector have been handled, which are by far the most common we've got. The switchover is split into 3 fairly arbitrary clusters to make it slightly more manageable: vector predication, fixed-length vectors, scalable vectors.	2022-02-01 11:06:55 +00:00
Fraser Cormack	b00bce2a93	[RISCV] Add a test showing an incorrect VSETVLI insertion This test shows a loop, whose preheader uses a SEW=64, LMUL=1 vector operation. The loop body starts off with another SEW=64, LMUL=1 VADD vector operation, before switching to a SEW=32, LMUL=1/2 vector store instruction. We can see that the VSETVLI insertion pass omits a VSETVLI before the VADD (thinking it inherits its configuration from the preheader) but does place a SEW=32, LMUL=1/2 VSETVLI before the store. This results in a miscompilation as when the loop comes back around, the VADD is incorrectly configured with SEW=32, LMUL=1/2. It appears to be a bad load/store optimization, as replacing the vector store with an SEW=32, LMUL=1/2 VADD does correctly insert a VSETVLI. The issue is therefore possibly arising from canSkipVSETVLIForLoadStore. Differential Revision: https://reviews.llvm.org/D118629	2022-02-01 10:21:29 +00:00
David Sherwood	daa80339df	[CodeGen] Support folds of not(cmp(cc, ...)) -> cmp(!cc, ...) for scalable vectors I have updated TargetLowering::isConstTrueVal to also consider SPLAT_VECTOR nodes with constant integer operands. This allows the optimisation to also work for targets that support scalable vectors. Differential Revision: https://reviews.llvm.org/D117210	2022-02-01 09:50:00 +00:00
Craig Topper	aae947e860	[RISCV] Separate the Zfhmin and Zfh extensions. The spec doesn't seem to be written as if Zfh implies Zfhmin. They seem to be separate extensions. This patch moves the instructions from Zfhmin to be enabled with either the Zfh or Zfhmin extensions. Reviewed By: achieveartificialintelligence Differential Revision: https://reviews.llvm.org/D118581	2022-01-31 09:06:43 -08:00
Craig Topper	e1075186a6	[RISCV] Custom lower brev8 intrinsic to RISCVISD::GREV. We can use the RISCVISD::GREV encoding that swaps the bits in each byte. This allows it to use the existing computeKnownBits support for RISCVISD::GREV.	2022-01-30 12:41:09 -08:00
Craig Topper	d8f929a567	[RISCV] Custom legalize BITREVERSE with Zbkb. With Zbkb, a bitreverse can be split into a rev8 and a brev8. Reviewed By: VincentWu Differential Revision: https://reviews.llvm.org/D118430	2022-01-28 23:11:12 -08:00
jacquesguan	1276678982	[RISCV] Improve extract_vector_elt for fixed mask registers. Now the backend promotes mask vector to an i8 vector and extract element from that. We could bitcast to a widen element vector, and extract from it to GPR, then use I instruction to extract the certain bit. Differential Revision: https://reviews.llvm.org/D117389	2022-01-29 11:07:53 +08:00
Craig Topper	ea05ee9059	[RISCV] Preserve VL when truncating i64 gather/scatter indices on RV32. We were creating a truncate with the default for the type, but for VP intrinsics we have a VL that we should use. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D118406	2022-01-28 09:25:30 -08:00
Kito Cheng	a9d5bb926d	[RISCV] Use __extendhfsf2/__truncsfhf2 for fp16 <-> fp32 `__gnu_h2f_ieee` and `__gnu_f2h_ieee` are introduce by ARM and set that as default name for fp16 and fp32 conversion in LLVM. However RISC-V GCC using default naming scheme for that, which is `__extendhfsf2` and `__truncsfhf2` for that, that cause runtime ABI incompatible issue. Although we didn't have formal runtime ABI spec to specify those naming convention yet, but I think it would be great to fix the incompatible issue first. And I've plan to create a runtime ABI spec undere psABI spec this year. Reviewed By: asb Differential Revision: https://reviews.llvm.org/D118207	2022-01-29 00:01:00 +08:00
Fraser Cormack	10879c26a2	[RISCV] Add tests for possible splat optimizations These splats -- whether BUILD_VECTOR or SPLAT_VECTOR -- are formed by first extracting a value from a vector and splatting it to all elements of the destination vector. These could be performed more optimally, avoiding the drop to scalar, using RVV's vrgather, for example.	2022-01-28 12:32:29 +00:00
Fraser Cormack	6297f929f7	[RISCV] Fix FileCheck prefixes in RVV test The LMULMAX check names didn't match the options we were passing to llc (they were swapped around) and we were silently missing coverage for one test which differs between RV32 and RV64.	2022-01-28 12:08:30 +00:00
Craig Topper	3e98ce45b6	[RISCV] Add Zbkb RUN lines to bswap-bitreverse.ll. NFC	2022-01-27 21:21:17 -08:00
Craig Topper	dcd751b26e	[RISCV] Split bswap-bitreverse-ctlz-cttz-ctpop.ll into two files bswap/bitreverse and ctlz/cttz/ctpop. NFC Add Zbkb command lines to the bswap/bitreverse test.	2022-01-27 21:12:03 -08:00
Chenbing.Zheng	6d6c44a3f3	[RISCV] Add support for matching vwmulsu from fixed vectors According to riscv-v-spec-1.0, widening signed(vs2)-unsigned integer multiply vwmulsu.vv vd, vs2, vs1, vm # vector-vector vwmulsu.vx vd, vs2, rs1, vm # vector-scalar It is worth noting that signed op is only for vs2. For vwmulsu.vv, we can swap two ops, and don't care which is sign extension, but for vwmulsu.vx signExt can not be a vector extended from scalar (rs1). I specifically added two functions ending with _swap in the test case. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D118215	2022-01-28 02:33:30 +00:00
Craig Topper	70e1cc6792	[RISCV] Prefer vmslt.vx v0, v8, zero over vmsle.vi v0, v8, -1. At least when starting from a vmslt.vx intrinsic or ISD::SETLT. We don't handle the case where the user used vmsle.vx intrinsic with -1.	2022-01-27 11:48:27 -08:00
Fraser Cormack	fed2f690a9	[RISCV] Fix test case expected output I didn't correctly update this before landing D118058.	2022-01-27 09:28:09 +00:00
Fraser Cormack	84e85e025e	[SelectionDAG][VP] Provide expansion for VP_MERGE This patch adds support for expanding VP_MERGE through a sequence of vector operations producing a full-length mask setting up the elements past EVL/pivot to be false, combining this with the original mask, and culminating in a full-length vector select. This expansion should work for any data type, though the only use for RVV is for boolean vectors, which themselves rely on an expansion for the VSELECT. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D118058	2022-01-27 09:00:41 +00:00
Wu Xinlong	615d71d9a3	[RISCV][CodeGen] Implement IR Intrinsic support for K extension This revision implements IR Intrinsic support for RISCV Scalar Crypto extension according to the specification of version [[ https://github.com/riscv/riscv-crypto/releases/tag/v1.0.0-scalar \| 1.0]] Co-author：@ksyx & @VincentWu & @lihongliang & @achieveartificialintelligence Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D102310	2022-01-27 15:53:35 +08:00
Craig Topper	b3bec6e453	[RISCV] Use vnsrl.wx with x0 instead of vnsrl.vi for truncate. This matches what the spec uses for the vncvt.x.x.w assembly pseudoinstruction. Reviewed By: kito-cheng Differential Revision: https://reviews.llvm.org/D118295	2022-01-26 18:38:13 -08:00
Paweł Bylica	bdb7837481	[test][DAGCombine] Add more tests for carry diamond. NFC	2022-01-27 00:25:26 +01:00
Craig Topper	f487a76430	[RISCV] Add hasStdExtZbp() to hasAndNotCompare.	2022-01-26 13:54:05 -08:00
David Green	57356d6bb7	[DAG] Create fptoui.sat from clamped fptoui This is the unsigned variant of D111976, where we convert a clamped fptoui to a fptoui.sat. Because we are unsigned, the condition this time is only UMIN of UINT_MAX. Similarly to D111976 it handles ISD::UMIN, ISD::SETCC/ISD::SELECT, ISD::VSELECT or ISD::SELECT_CC nodes. This especially helps on ARM/AArch64 where the vcvt instructions naturally saturate the result. Differential Revision: https://reviews.llvm.org/D114964	2022-01-26 08:37:44 +00:00
Zakk Chen	9273378b85	[RISCV] Add the passthru operand for RVV nomask load intrinsics. The goal is support tail and mask policy in RVV builtins. We focus on IR part first. If the passthru operand is undef, we use tail agnostic, otherwise use tail undisturbed. Co-Authored-by: Hsiangkai Wang <Hsiangkai@gmail.com> Reviewers: craig.topper, frasercrmck Differential Revision: https://reviews.llvm.org/D117647	2022-01-25 17:31:36 -08:00
eopXD	b089e4072a	[RISCV] Don't allow i64 vector div by constant to use mulh with Zve64x EEW=64 of mulh and its vairants requires V extension. Authored by: Craig Topper <craig.topper@sifive.com> @craig.topper Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D117947	2022-01-25 09:55:05 -08:00
Fraser Cormack	7cb452bfde	[SelectionDAG][VP] Add widening support for VP_MERGE This patch adds widening support for ISD::VP_MERGE, which widens identically to VP_SELECT and similarly to other select-like nodes. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D118030	2022-01-25 10:59:40 +00:00
Victor Perez	19d3dc6e22	[VP] Update CodeGen/RISCV/rvv/vpgather-sdnode.ll test	2022-01-25 10:49:05 +00:00
Fraser Cormack	5f5c5603ce	[SelectionDAG][VP] Add splitting support for VP_MERGE This patch adds splitting support for ISD::VP_MERGE, which splits identically to VP_SELECT and similarly to other select-like nodes. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D118032	2022-01-25 10:33:23 +00:00
Victor Perez	2233befa5d	[LegalizeTypes][VP] Add splitting support for vp.gather and vp.scatter Split these nodes in a similar way as their masked versions. Reviewed By: frasercrmck, craig.topper Differential Revision: https://reviews.llvm.org/D117760	2022-01-25 10:08:07 +00:00

1 2 3 4 5 ...

1458 Commits