Commit Graph

1458 Commits

Author SHA1 Message Date
Haocong.Lu 865fe131f8 [RISCV] Fix a mistake in PostprocessISelDAG
With the condition N->use_empty(), the root node of DAG always
misses peephole optimization. So a dummy node is needed.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D119934
2022-02-25 12:38:31 +00:00
Chenbing Zheng b20e80aa59 [RISCV] DAG Combine vcpop and vfirst with VL=0 to li imm
vcpop and vfirst are still useful when VL=0.
vcpop equivalents to li 0 and vfirst equivalents to li -1,
since no mask elements are active.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D120302
2022-02-25 14:44:25 +08:00
Zakk Chen 4e115b7d88 [RISCV] Update computeTargetABI from llc as well as clang
Clang computes the default ABI if -mabi is empty
and encode it in LLVM IR module flag since D105555.
For correctness, llc need to give the same target-abi
(Options.MCOptions.ABIName) with ABI encoded in IR.
The getSubtargetImpl already has a check for them only if
Options.MCOptions.ABIName is not empty.

In order to get more robustness we could have a check for
explicit ABI, but now we have two different logic to
compute the default ABI.

The front-end ABI is defautl to the ilp32/ilp32e/lp64, and
ilp32d/lp64d when hardware support for extension D.
The backend ABI is default to the ilp32/ilp32e/lp64.

Reviewed by: asb, jrtc27

Differential Revision: https://reviews.llvm.org/D118333
2022-02-24 21:55:44 -08:00
Craig Topper 6159f05955 [RISCV] Add tests for (neg (abs X)) where the abs has an additional user. 2022-02-24 12:59:57 -08:00
Craig Topper 2aa1c6fca1 [RISCV] Add Zbb RUN lines to neg-abs.ll. 2022-02-24 10:21:10 -08:00
Craig Topper f69078b77f [RISCV] Update some tests to use floating point ABI where it makes sense.
Trying to reduce the diffs from D118333 for cases where it makes
more sense to use an FP ABI.

Reviewed By: asb, kito-cheng

Differential Revision: https://reviews.llvm.org/D120447
2022-02-24 09:27:57 -08:00
Craig Topper a975ca97c3 [RISCV] Fold (sext_inreg (fmv_x_anyexth X), i16) -> (fmv_x_signexth X).
Add a new ISD opcode to represent the sign extending behavior of
vmv.x.h. Keep the previous anyext opcode to allow the existing
(fmv_x_anyexth (fmv_h_x X)) combine to keep working without needing
to generate a sign extend.

For fmv.x.w we are able to match the sext_inreg in an isel pattern,
but a 16-bit sext_inreg is lowered to a shift pair before isel. This
seemed like a larger match than we should do in isel.

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D118974
2022-02-24 09:19:01 -08:00
Sanjay Patel 4a3708cd6b [SDAG] remove shift that is redundant with part of funnel shift
This is the SDAG translation of D120253 :
https://alive2.llvm.org/ce/z/qHpmNn

The SDAG nodes can have different operand types than the result value.
We can see an example of that with AArch64 - the funnel shift amount
is an i64 rather than i32.

We may need to make that match even more flexible to handle
post-legalization nodes, but I have not stepped into that yet.

Differential Revision: https://reviews.llvm.org/D120264
2022-02-24 11:25:46 -05:00
lian wang 05e82be7ea [RISCV][NFC] Remove useless intrinsic function declare in test of Zbp extension
Reviewed By: benshi001

Differential Revision: https://reviews.llvm.org/D120393
2022-02-24 08:43:13 +00:00
Chenbing.Zheng 2aac00e4a6 [RISCV] Add more tests for vcpop and vfirst with VL=0
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D120300
2022-02-24 05:59:12 +00:00
Sanjay Patel 21d7c3bcc6 [DAG] try to convert multiply to shift via demanded bits
This is a fix for a regression discussed in:
https://github.com/llvm/llvm-project/issues/53829

We cleared more high multiplier bits with 995d400,
but that can lead to worse codegen because we would fail
to recognize the now disguised multiplication by neg-power-of-2
as a shift-left. The problem exists independently of the IR
change in the case that the multiply already had cleared high
bits. We also convert shl+sub into mul+add in instcombine's
negator.

This patch fills in the high-bits to see the shift transform
opportunity. Alive2 attempt to show correctness:
https://alive2.llvm.org/ce/z/GgSKVX

The AArch64, RISCV, and MIPS diffs look like clear wins. The
x86 code requires an extra move register in the minimal examples,
but it's still an improvement to get rid of the multiply on all
CPUs that I am aware of (because multiply is never as fast as a
shift).

There's a potential follow-up noted by the TODO comment. We
should already convert that pattern into shl+add in IR, so
it's probably not common:
https://alive2.llvm.org/ce/z/7QY_Ga

Fixes #53829

Differential Revision: https://reviews.llvm.org/D120216
2022-02-23 12:09:32 -05:00
Alex Bradbury c5bcfb983e [RISCV] Avoid infinite loop between DAGCombiner::visitMUL and RISCVISelLowering::transformAddImmMulImm
See https://github.com/llvm/llvm-project/issues/53831 for a full discussion.

The basic issue is that DAGCombiner::visitMUL and
RISCVISelLowering;:transformAddImmMullImm get stuck in a loop, as the
current checks in transformAddImmMulImm aren't sufficient to avoid all
cases where DAGCombiner::isMulAddWithConstProfitable might trigger a
transformation. This patch makes transformAddImmMulImm bail out if C0
(the constant used for multiplication) has more than one use.

Differential Revision: https://reviews.llvm.org/D120332
2022-02-23 11:05:46 +00:00
Lian Wang 3497124771 [RISCV] Add more test for GORCI aliaes in Zbp extension
Supplement tests for some aliaes of gorci.

RV32:
add orc4.h/orc2.h in rv32zbp.ll
add orc.h/orc16/orc8/orc4/orc2/orc in rv32zbp-intrinsic.ll

RV64:
add orc4.h/orc2.h in rv64zbp.ll
add orc.h/orc32/orc16/orc8/orc4/orc2/orc/orc16.w/orc8.w/
    orc4.w/orc2.w/orc.w in rv64zbp-intrinsic.ll

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D120388
2022-02-23 08:02:33 +00:00
Lian Wang 7abcb7ba87 [RISCV] Supplement more tests for GREVI aliaes in Zbp extension
Supplement tests for some aliaes of grevi.

RV32:
add rev4.h/rev2.h in rv32zbp.ll
add rev/rev2/rev4/rev8/rev16 in rv32zbp-intrinsic.ll

RV64:
add rev4.h/rev2.h in rv64zbp.ll
add rev.h/rev/rev2/rev4/rev8/rev16/rev32/rev.w/rev2.w/
    rev4.w/rev8.w/rev16.w in rv64zbp-intrinsic.ll

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D120304
2022-02-23 06:15:52 +00:00
jacquesguan 5acd9c49a8 [RISCV] Add patterns for vector widening integer reduction instructions
Add patterns for vector widening integer reduction instructions.

Differential Revision: https://reviews.llvm.org/D117643
2022-02-22 14:14:05 +08:00
Lian Wang 294072e10b [RISCV] Add more tests for SHLFI and UNSHFLI aliaes in Zbp extension
RV32/RV64:
zip.n/zip2.b/zip.b/zip4.h/zip2.h/zip.h
unzip.n/unzip2.b/unzip.b/unzip4.h/unzip2.h/unzip.h

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D120241
2022-02-22 02:20:32 +00:00
Zakk Chen f7dfc5d1af [RISCV] Optimize tail agnostic vmv.s.x which don't need to select tail value.
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D120250
2022-02-21 14:53:37 -08:00
Craig Topper 90d240553d [RISCV] Teach shouldSinkOperands to sink splat operands of vp.fma intrinsics.
Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D120167
2022-02-21 11:52:59 -08:00
Sanjay Patel bb850d422b [AArch64][RISCV][x86] add tests for funnel shift with shift logic; NFC 2022-02-21 10:24:45 -05:00
Lian Wang 2e153038b4 [RISCV] Add tests for SHFLI and UNSHFLI aliases in Zbp extension
Supplement tests alias of SHFLI and UNSHFLI instructions.

RV32: zip8/zip4/zip2/unzip8/unzip4/unzip2

RV64: zip8.w/zip4.w/zip2.w/zip.w/zip8/zip4/zip2/zip/
      unzip8.w/unzip4.w/unzip2.w/unzip.w/unzip8/unzip4/unzip2/unzip

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D120015
2022-02-21 10:10:32 +00:00
Craig Topper 440c4b705a [SelectionDAG][RISCV][ARM][PowerPC][X86][WebAssembly] Change default abs expansion to use sra (X, size(X)-1); sub (xor (X, Y), Y).
Previous we used sra (X, size(X)-1); xor (add (X, Y), Y).

By placing sub at the end, we allow RISCV to combine sign_extend_inreg
with it to form subw.

Some X86 tests for Z - abs(X) seem to have improved as well.

Other targets look to be a wash.

I had to modify ARM's abs matching code to match from sub instead of
xor. Maybe instead ISD::ABS should be made legal. I'll try that in
parallel to this patch.

This is an alternative to D119099 which was focused on RISCV only.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D119171
2022-02-20 21:11:23 -08:00
Sanjay Patel 7f827ebddc [AArch64][RISCV][x86] add tests for mul-add demanded bits; NFC
See #53829
2022-02-20 15:06:55 -05:00
Craig Topper 5489969550 [RISCV] Add IsRV32 to the isel pattern for ZIP_RV32/UNZIP_RV32. NFC
I think the i32 in the pattern prevents this from matching on RV64,
but using IsRV32 is safer.

Add tests for RV64 to make sure we don't print zip or unzip
because we incorrectly picked ZIP_RV32/UNZIP_RV32.
2022-02-18 22:38:14 -08:00
Zakk Chen c6a3225bb0 [RISCV][NFC] Add some tail agnostic tests for nomask operations.
Improve test coverage for tail agnostic nomask vslidedown/up, vmv.s.x
vfmv.s.f and vcompress.

Reviewed By: rogfer01

Differential Revision: https://reviews.llvm.org/D120008
2022-02-17 16:37:51 -08:00
Zakk Chen 6f6ac4af62 [RISCV][NFC] Add tail agnostic tests for nomask Vector Reduction IR intrinsics.
Improve test coverage for tail agnostic nomask Vector Reduction IR.

Reviewed By: monkchiang

Differential Revision: https://reviews.llvm.org/D119920
2022-02-17 16:33:55 -08:00
Zakk Chen ca78312407 [RISCV] Add the policy operand for nomask vector Multiply-Add IR intrinsics.
The goal is support tail and mask policy in RVV builtins.
We focus on IR part first.

The nomask vector Multiply-Add need a policy operand
because merge value could not be undef.

Reviewed By: monkchiang

Differential Revision: https://reviews.llvm.org/D119727
2022-02-17 09:12:46 -08:00
Craig Topper bbee9e77f3 [RISCV] Match shufflevector corresponding to slideup.
This generalizes isElementRotate to work when there's only a single
slide needed. I've removed matchShuffleAsSlideDown which is now
redundant.

Reviewed By: frasercrmck, khchen

Differential Revision: https://reviews.llvm.org/D119759
2022-02-17 08:19:10 -08:00
Craig Topper 954fe404ab [RISCV] Fix incorrect MemOperand copy converting splat+load to vlse.
Due to an incorrect copy/paste from load intrinsic handling we
checked if the splat node was a MemSDNode which of course it isn't.

Instead get the MemOperand from the LoadSDNode for the source of
the splat.

This enables LICM to see the load is loop invariant and hoist it
out of the loop.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D120014
2022-02-17 08:15:50 -08:00
Zakk Chen eeb7754f68 [RISCV] Add the passthru operand for vmv.vv/vmv.vx/vfmv.vf IR intrinsics.
Add the passthru operand for
VMV_V_X_VL, VFMV_V_F_VL and SPLAT_VECTOR_SPLIT_I64_VL also.

The goal is support tail and mask policy in RVV builtins.
We focus on IR part first.
If the passthru operand is undef, we use tail agnostic, otherwise
use tail undisturbed.

Reviewed By: rogfer01

Differential Revision: https://reviews.llvm.org/D119688
2022-02-17 06:38:14 -08:00
Zakk Chen 093ecccdab [RISCV] Add the passthru operand for vadc/vsbc/vmerge/vfmerge IR intrinsics.
The goal is support tail and mask policy in RVV builtins.
We focus on IR part first.
If the passthru operand is undef, we use tail agnostic, otherwise
use tail undisturbed.

Reviewed By: rogfer01

Differential Revision: https://reviews.llvm.org/D119686
2022-02-17 02:21:39 -08:00
Ben Shi 0b93e90971 Revert "[RISCV] LUI used for address computation should not isAsCheapAsAMove"
This reverts commit 23a5073600.

Although this patch achieved better codegen in most cases, it is really
important to accurately describe the cost of instructions. So I revert it.
2022-02-17 17:27:37 +08:00
Craig Topper cfbbcc544c [RISCV] Improve lowering of SHL_PARTS/SRL_PARTS/SRA_PARTS.
Part of the shift lowering creates a (sub XLEN-1, ShAmt). When this
value is used we know that ShAmt is [0..XLEN-1]. Since XLEN is a power
of 2 we can replace the sub with an xor. This allows us to use XORI
instead of LI+SUB.

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D119411
2022-02-16 09:22:11 -08:00
Craig Topper 1daa66d3fd [SelectionDAG] Add SPLAT_VECTOR to SelectionDAG::isConstantFPBuildVectorOrConstantFP.
Matches what is done for the int version.

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D119793
2022-02-16 09:22:11 -08:00
Zakk Chen e8973dd389 [RISCV] Add the passthru operand for some RVV nomask unary and nullary intrinsics.
The goal is support tail and mask policy in RVV builtins.
We focus on IR part first.
If the passthru operand is undef, we use tail agnostic, otherwise
use tail undisturbed.

My plan is to handle more complex operations in follow-up patches.

Reviewers: frasercrmck

Differential Revision: https://reviews.llvm.org/D118253
2022-02-15 22:34:06 -08:00
Zakk Chen b784719904 [RISCV] Add the passthru operand for RVV nomask binary intrinsics.
The goal is support tail and mask policy in RVV builtins.
We focus on IR part first.
If the passthru operand is undef, we use tail agnostic, otherwise
use tail undisturbed.

Add passthru operand for VSLIDE1UP_VL and VSLIDE1DOWN_VL to support
i64 scalar in rv32.

The masked VSLIDE1 would only emit mask undisturbed policy regardless
of giving mask agnostic policy until InsertVSETVLI supports mask agnostic.

Reviewed by: craig.topper, rogfer01

Differential Revision: https://reviews.llvm.org/D117989
2022-02-15 18:36:18 -08:00
Craig Topper ab6e02dded [RISCV] Match vwmulsu_vx with scalar splat input.
This is a more generic version of D119110 that uses MaskedValueIsZero
to do the matching and SimplifyDemandedBits to remove any unneeded
AND instructions.

Tests were taken from D119110.

Reviewed By: Chenbing.Zheng

Differential Revision: https://reviews.llvm.org/D119622
2022-02-15 08:45:21 -08:00
jacquesguan bfb4c0c370 [RISCV] Recover the implication between Zve* extensions and the V extension.
This revision recover the implication between Zve* extensions and the V extension.

Differential Revision: https://reviews.llvm.org/D119210
2022-02-14 15:52:07 +08:00
Craig Topper 478c237e21 [RISCV] Fix incorrect extend type in vwmulsu combine.
While matching widening multiply, if we matched an extend from i8->i32,
i16->i64 or i8->i64, we need to reintroduce a narrower extend. If we're
matching a vwmulsu we need to use a sext for op0 and a zext for op1.

This bug exists in LLVM 14 and will need to be backported.

Differential Revision: https://reviews.llvm.org/D119618
2022-02-12 12:47:20 -08:00
Dimitry Andric 7af3d4ab3d Revert "[RISCV] Enable shrink wrap by default"
This reverts commit 5ebdb07e7e.

Enabling shrink wrap by default can cause assertions or crashes, and
these should first be investigated and fixed. For now, reverting the
change so it can be cherry-picked into 14.0.0 is the safest choice.
2022-02-12 19:04:12 +01:00
Haocong.Lu 23a5073600 [RISCV] LUI used for address computation should not isAsCheapAsAMove
A LUI instruction with flag RISCVII::MO_HI is usually used in conjunction
with ADDI, and jointly complete address computation. To bind the cost
evaluation of address computation, the LUI should not be regarded as a cheap
 move separately, which is consistent with ADDI.

In this test case, it improves the unroll-loop code that the rematerialization
of array's base address miss MachineCSE with Heuristics #1 at isProfitableToCSE.

Reviewed By: asb, frasercrmck

Differential Revision: https://reviews.llvm.org/D118216
2022-02-12 07:14:38 +00:00
Haocong.Lu 2e57156fea [RISCV][test] Precommit a test of CSE within an unroll loop
Reviewed By: asb, frasercrmck

Differential Revision: https://reviews.llvm.org/D118218
2022-02-12 06:30:03 +00:00
Craig Topper 541c9ba842 [RISCV] Insert VSETVLI at the end of a basic block if we didn't produce BlockInfo.Exit.
This is an alternative to D118667 that instead of fixing the store
to match phase 1, it tries to detect the mismatch with the expected
value at the end of the block. This inserts a vsetvli after the vse
to satisfy the requirement of the other basic block.

We still have serious design issues in the pass, that is going to
require some rethinking.

Differential Revision: https://reviews.llvm.org/D119518
2022-02-11 09:34:16 -08:00
Craig Topper f35ac872b8 Revert "[RISCV] Fix a vsetvli insertion bug involving loads/stores." and "[RISCC] Add missing words to comment. NFC"
This reverts commit f943c58cae.
and commit 7eb7810727.

This introduced a new bug that appears to be easier to hit.

Differential Revision: https://reviews.llvm.org/D119517
2022-02-11 09:34:16 -08:00
Craig Topper ba9a7ae798 [RISCV] Add test case for a vsetvli insertion bug found after D118667.
We're missing a vsetvli before a vse after a redsum in this test.

This appears to be because the vmv.s.x has a VL of 1, but did not
trigger a vsetvli because it is a scalar move op and any non-zero
VL would work. So it looked at it the predecessors and decided it was
that they all had a non-zero vl. Then the redsum was visited, it
also took the VL from the predecessors since the vmv.s.x and the 4
was found compatible.

Finally we visit the vse and it looks at the BBLocalInfo and sees
that is compatible because it contains a VL of 1 from the vmv.s.x,
the first instruction in the block. BBLocalInfo was not updated
when the vredsum was visited because BBLocalInfo was valid and no
vsetvli was generated.

I think fundamentally the vmv.s.x optimization has the same first
phase and third phase not matching problem that D118667 was trying
to fix for stores.

Differential Revision: https://reviews.llvm.org/D119516
2022-02-11 09:34:16 -08:00
Zakk Chen d224be3b99 [RISCV] Add the policy operand for some masked RVV ternary IR intrinsics.
Masked reduction intrinsics are specical cases which don't need to have policy
operand. The mask only affects which elements are read. It doesn't effect the
destination register.
The reduction intrinsics have a dedicated destination operand. If it
is undef, we use tail agnostic. If it not undef we use tail
undisturbed.

Co-Authored-by: Craig Topper <craig.topper@sifive.com>

Differential Revision: https://reviews.llvm.org/D117681
2022-02-11 05:02:03 -08:00
Craig Topper b0e77d5e48 [RISCV] Lower the shufflevector equivalent of vector.splice
We can lower a vector splice to a vslidedown and a vslideup.

The majority of the matching code here came from X86's code for matching
PALIGNR and VPALIGND/Q.

The slidedown and slideup lowering don't really require it to be concatenation,
but it happened to be an interesting pattern with existing analysis code I
could use.

This helps with cases where the scalar loop optimizer forwarded a load
result from a previous loop iteration. For example, this happens if the
loop uses x[i] and x[i+1] on the same iteration. The scalar optimizer
will forward x[i+1] load from the previous loop to satisfy x[i] on this
loop. When this get vectorized it results in one element of a vector
being forwarded from the previous loop to be concatenated with elements
loaded on this iteration.

Whether that's more efficient than doing a shifted loaded or reloading
the single scalar and using vslide1up is an interesting question.
But that's not something the backend can help with.

Reviewed By: khchen

Differential Revision: https://reviews.llvm.org/D119039
2022-02-10 09:39:35 -08:00
Fraser Cormack fd43d99c93 [RISCV] Pre-process FP SPLAT_VECTOR to RISCVISD::VFMV_V_F_VL
This patch builds on top of D119197 to canonicalize floating-point
SPLAT_VECTOR as RISCVISD::VFMV_V_F_VL as a pre-process ISel step.

This primarily benefits scalable-vector VP code, where our VP patterns
only match VFMV_V_F_VL to reduce the burden on our ISel patterns, but
where at the same time, scalable-vector code doesn't custom-legalize
SPLAT_VECTOR.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D117670
2022-02-10 09:56:00 +00:00
Chenbing.Zheng c5d3b231e0 [RISCV] Add support for matching vwmaccsu/vwmaccus from fixed vectors
Add pattern to match add and widening mul to vwmacc, and
two multipliers are sext and zext.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D119314
2022-02-10 01:59:31 +00:00
Craig Topper c45c1b130b [RISCV] Teach RISCVDAGToDAGISel::selectShiftMask to replace sub from constant with neg.
If the shift amount is (sub C, X) where C is 0 modulo the size of
the shift, we can replace it with neg or negw.

Similar is is done for AArch64 and X86.

Reviewed By: khchen

Differential Revision: https://reviews.llvm.org/D119089
2022-02-09 12:33:01 -08:00
Craig Topper 279b3b8179 [RISCV][VP] Lower VP_FMA to RVV instructions.
We already had FMA_VL node, but we didn't have masked patterns.
I have not added the fneg variations. I'll do those after I add
llvm.vp.fneg.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D119196
2022-02-09 11:33:12 -08:00
Craig Topper 63e711549c [RISCV] Lower VP_FNEG to RVV instructions
Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D119269
2022-02-09 10:56:39 -08:00
Sander de Smalen ec46232517 [DAGCombiner] Fold `ty1 extract_vector(ty2 splat(V)) -> ty1 splat(V)`
This seems like an obvious fold, which leads to a few improvements.

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D118920
2022-02-09 14:30:01 +00:00
jacquesguan 5e71bbfb6c [RISCV] Add patterns for vector widening floating-point fused multiply-add instructions
Add patterns for vector widening floating-point fused multiply-add instructions.

Differential Revision: https://reviews.llvm.org/D117546
2022-02-09 10:34:39 +08:00
Fraser Cormack 62c4ac764b [RISCV] Optimize splats of extracted vector elements
This patch adds an optimization to splat-like operations where the
splatted value is extracted from a identically-sized vector. On RVV we
can splat that via vrgather.vx/vrgather.vi without dropping to scalar
beforehand.

We do have a similar VECTOR_SHUFFLE-specific optimization but that only
works on fixed-length vector types and for those with a constant splat
lane. This patch extends this optimization to make it work on
scalable-vector types and on unknown extract indices.

It is performed during fixed-vector BUILD_VECTOR lowering and during a
new DAGCombine on SPLAT_VECTOR for scalable vectors.

Reviewed By: craig.topper, khchen

Differential Revision: https://reviews.llvm.org/D118456
2022-02-08 10:35:25 +00:00
wangpc c53d99c37d [RISCV] Split f64 undef into two i32 undefs
So that no store instruction will be generated.

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D118222
2022-02-08 13:42:15 +08:00
wangpc cb0fff4397 [RISCV] Pre-commit test for D118222
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D119212
2022-02-08 12:52:13 +08:00
Craig Topper c35ccd2ac8 [DAGCombiner][RISCV] Allow rotates by non-constant to be matched for i32 on riscv64 with Zbb.
rv64izbb has a RORW/ROLW instructions that operate on the lower
32-bits of a 64-bit value and sign extend bit 31 of the result.

DAGCombiner won't match rotate idioms because the i32 type isn't Legal
on riscv64.

This patch teaches DAGCombiner to allow it if the type is going to
be promoted and the target has Custom type legalization for ISD::ROTL
or ISD::ROTR. I've restricted this to scalar types. It doesn't appear
any in tree targets other than riscv64 have custom type legalization
for rotates.

If this patch isn't acceptable, I guess I can match SRLW, SLLW, and OR
after type legalization, but I'd like to avoid that if possible.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D119062
2022-02-06 10:58:12 -08:00
Craig Topper f3a725af43 [RISCV] Add signext test for llvm.abs.i32 for rv64 Zbb.
This shows that we don't preserve sign bits across the
abs expansion, but I think we could if we used negw+max.
2022-02-05 21:26:47 -08:00
Craig Topper c1cef111a3 Revert "[RISCV] Fold (sext_inreg (fmv_x_anyexth X), i16) -> (fmv_x_signexth X)."
This reverts commit 673d68cd92.

This hadn't been reviewed yet.
2022-02-05 12:51:01 -08:00
Craig Topper d1899da3a2 [RISCV] Add more tests for rotate idioms. Add more RUN lines. NFC
We were only testing rotate idioms on rv32i. DAGCombiner won't
form ISD::ROTL/ROTR unless those operations are Legal or Custom.
They aren't for rv32 so we were only testing shift lowering.

This commit adds i64 idioms and the idioms that mask the shift
amount to avoid UB for a rotate of 0. I've added riscv64 and Zbb
RUN lines to show that we do match rotate for XLen types when
available. We currently miss i32 on rv64izbb.
2022-02-05 12:48:22 -08:00
Craig Topper 673d68cd92 [RISCV] Fold (sext_inreg (fmv_x_anyexth X), i16) -> (fmv_x_signexth X).
Add a new ISD opcode to represent the sign extending behavior of
vmv.x.h. Keep the previous anyext opcode to allow the existing
(fmv_x_anyexth (fmv_h_x X)) combine to keep working without needing
to generate a sign extend.

For fmv.x.w we are able to match the sext_inreg in an isel pattern,
but a 16-bit sext_inreg is lowered to a shift pair before isel. This
seemed like a larger match than we should do in isel.

Differential Revision: https://reviews.llvm.org/D118974
2022-02-05 12:42:12 -08:00
Craig Topper d752ea9a72 [RISCV] Remove exclusions for zext.h/zext.w from our (and (srl X, C1), C2) selection code.
This code tries to replace the pattern with a pair of shifts, but
we were excluding if the And could be a zext.h or zext.w. The SLLI/SRL
pair is more compressible and doesn't come with much down side.

We do regress one test case in rv64i-exhaustive-w-insts.ll but we
can probably add a narrower exclusion for that case.
2022-02-04 17:10:48 -08:00
Craig Topper 1d8bbe3d25 [RISCV] Implement a basic version of AArch64RedundantCopyElimination pass.
Using AArch64's original implementation for reference, this patch
implements a pass to remove unneeded copies of X0. This pass runs
after register allocation and looks to see if a register is implied
to be 0 by a branch in the predecessor basic block.

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D118160
2022-02-04 10:43:46 -08:00
Craig Topper 234e54bdd8 [RISCV] Add more types of shuffles isShuffleMaskLegal.
Add the vslidedown and interleave patterns that I recently implemented.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D118952
2022-02-04 09:13:13 -08:00
Craig Topper c83905a308 [RISCV] Add inline expansion for vector fround.
This avoids a crash for scalable vectors and or scalarization for
fixed vectors.

The algorithm is different enough that I don't think it makes sense
to merge with ceil/floor/trunc. Algorithm is adapted from gcc's X86
SSE2 output.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D117247
2022-02-04 09:12:09 -08:00
Craig Topper 237eb37260 [RISCV] Add FMV_X_W and FMV_X_H to RISCVSExtWRemoval.
Add -target-abi to sextw-removal.ll RUN lines to show benefit on
new test case.
2022-02-03 09:40:47 -08:00
Shao-Ce SUN 005fd8aa70 [RISCV] Add support for Zihintpause extention
Add support for the 'pause' hint instruction as an alias for
'fence w, 0'. To do this allow the 'fence' operands pred and succ
to be set to 0 (the empty set). This will also allow future hints
to be encoded as 'fence 0, <x>' and 'fence <x>, 0'.

This patch revised from @mundaym's D93019.

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D117789
2022-02-03 20:55:47 +08:00
Craig Topper b73d151a11 [RISCV] Add DAG combines to transform ADD_VL/SUB_VL into widening add/sub.
This adds or reuses ISD opcodes for vadd.wv, vaddu.wv, vadd.vv, vaddu.vv
and a similar set for sub.

I've included support for narrowing scalar splats that have known
sign/zero bits similar to what was done for MUL_VL.

The conversion to vwadd.vv proceeds in two phases. First we'll form
a vwadd.wv by narrowing one of the operands. Then we'll visit the
vwadd.wv to try to narrow the other operand. This turned out to be
simpler than catching all the cases in one step. The forming of of
vwadd.wv can happen for either operand for add, but only the right
hand side for sub since sub isn't commutable.

An interesting quirk is that ADD_VL and VZEXT_VL/VSEXT_VL are formed
during vector op legalization, but VMV_V_X_VL isn't usually formed
until op legalization when BUILD_VECTORS are handled. This leads to
VWADD_W_VL forming in one DAG combine round, and then a later DAG combine
round sees the VMV_V_X_VL and needs to commute the operands to get the
splat in position. This alone necessitated a VWADD_W_VL combine function
which made forming vwadd.vv in two stages an easy choice.

I've left out trying hard to form vwadd.wx instructions for now. It would
only save an extend in the scalar domain which isn't as interesting.

Might need to review the test coverage a bit. Most of the vwadd.wv
instructions are coming from vXi64 tests on rv64. The tests were
copy pasted from the existing multiply tests.

Reviewed By: rogfer01

Differential Revision: https://reviews.llvm.org/D117954
2022-02-02 10:03:08 -08:00
Craig Topper 7f6441f96e [TableGen][RISCV] Relax a restriction in generating patterns for commutable SDNodes.
Previously, all children would be checked to see if any were an
explicit Register. If anywhere no commutable patterns would be
generated. This patch loosens the restriction to only check the
children that are being commuted.

Digging back through history, this code predates the existence of
commutable intrinsics and commutable SDNodes with more than 2
operands. At that time the loop would count the number of children that
weren't registers and if that was equal to 2 it would allow commuting.
I don't think this loop was re-considered when commutable
intrinsics were added or when we allowed SDNodes with more than 2
operands.

This important for RISCV were our isel patterns have a V0 mask
operand after the commutable operands on some RISCVISD opcodes.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D117955
2022-02-01 21:07:03 -08:00
Craig Topper 7eb7810727 [RISCV] Fix a vsetvli insertion bug involving loads/stores.
The first phase of the analysis can avoid a vsetvli if an earlier
instruction in the block used an SEW and LMUL that when combined with
the EEW of the load/store would produce the desired EMUL. If we
avoided a vsetvli this will affect the global analysis we do in the
second phase.

The third phase where we really insert the vsetvlis needs to agree
with the first phase. If it doesn't we can insert vsetvlis that
invalidate the global analysis.

In the test case there is a VSETVLI in the preheader that sets
SEW=64 and LMUL=1. Inside the loop there is a VADD with SEW=64 and LMUL=1.
This VADD is followed by a store that wants wants SEW=32 LMUL=1/2.
Because it has EEW=32 as part of the opcode the SEW=64 LMUL=1 from the
VADD can be become EMUL=1 for the store. So the first phase determines no
vsetvli is needed.

The third phase manages CurInfo differently than BBInfo.Change from the
first phase. CurInfo is only updated when we see a vsetvli or insert
a vsetvli. This was done to allow predecessor block information from
the global analysis to be applied to multiple instructions. Since the
loop body has no vsetvli we won't update CurInfo for either the VADD
or the VSE. This prevented us from checking the store vsetvli elision
for the VSE resulting in a vsetvli SEW=32 LMUL=1/2 being emitted which
invalidated the global analysis.

To mitigate this, I've added a BBLocalInfo variable that more closely
matches the first phase propagation. This gets updated based on the
VADD and prevents emitting a vsetvli for the store like we did in the
first phase.

I wonder if we should do an earlier phase to handle the load/store case
by adding more pseudo opcodes and changing the SEW/LMUL for those
instructions before the insertion analysis. That might be more robust
than trying to guarantee two phases make the same decision.

Fixes the test from D118629.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D118667
2022-02-01 07:29:01 -08:00
Fraser Cormack e9ceeedf30 [RISCV][3/3] Switch undef -> poison in scalable-vector RVV tests 2022-02-01 11:06:56 +00:00
Fraser Cormack 8d1169cf74 [RISCV][2/3] Switch undef -> poison in fixed-vector RVV tests 2022-02-01 11:06:56 +00:00
Fraser Cormack 414f21ed23 [RISCV][1/3] Switch undef -> poison in VP RVV tests
Inspired by a recent Discourse post on undef vs. poison usage, this
series of patches should reduce the number of undefs in LLVM tests by
around 10%.

Only undef vector operands to insertelement/shufflevector have been
handled, which are by far the most common we've got.

The switchover is split into 3 fairly arbitrary clusters to make it
slightly more manageable: vector predication, fixed-length vectors,
scalable vectors.
2022-02-01 11:06:55 +00:00
Fraser Cormack b00bce2a93 [RISCV] Add a test showing an incorrect VSETVLI insertion
This test shows a loop, whose preheader uses a SEW=64, LMUL=1 vector
operation. The loop body starts off with another SEW=64, LMUL=1 VADD
vector operation, before switching to a SEW=32, LMUL=1/2 vector store
instruction.

We can see that the VSETVLI insertion pass omits a VSETVLI before the
VADD (thinking it inherits its configuration from the preheader) but
does place a SEW=32, LMUL=1/2 VSETVLI before the store. This results in
a miscompilation as when the loop comes back around, the VADD is
incorrectly configured with SEW=32, LMUL=1/2.

It appears to be a bad load/store optimization, as replacing the vector
store with an SEW=32, LMUL=1/2 VADD does correctly insert a VSETVLI. The
issue is therefore possibly arising from canSkipVSETVLIForLoadStore.

Differential Revision: https://reviews.llvm.org/D118629
2022-02-01 10:21:29 +00:00
David Sherwood daa80339df [CodeGen] Support folds of not(cmp(cc, ...)) -> cmp(!cc, ...) for scalable vectors
I have updated TargetLowering::isConstTrueVal to also consider
SPLAT_VECTOR nodes with constant integer operands. This allows the
optimisation to also work for targets that support scalable vectors.

Differential Revision: https://reviews.llvm.org/D117210
2022-02-01 09:50:00 +00:00
Craig Topper aae947e860 [RISCV] Separate the Zfhmin and Zfh extensions.
The spec doesn't seem to be written as if Zfh implies Zfhmin. They
seem to be separate extensions.

This patch moves the instructions from Zfhmin to be enabled with
either the Zfh or Zfhmin extensions.

Reviewed By: achieveartificialintelligence

Differential Revision: https://reviews.llvm.org/D118581
2022-01-31 09:06:43 -08:00
Craig Topper e1075186a6 [RISCV] Custom lower brev8 intrinsic to RISCVISD::GREV.
We can use the RISCVISD::GREV encoding that swaps the bits in
each byte.  This allows it to use the existing computeKnownBits
support for RISCVISD::GREV.
2022-01-30 12:41:09 -08:00
Craig Topper d8f929a567 [RISCV] Custom legalize BITREVERSE with Zbkb.
With Zbkb, a bitreverse can be split into a rev8 and a brev8.

Reviewed By: VincentWu

Differential Revision: https://reviews.llvm.org/D118430
2022-01-28 23:11:12 -08:00
jacquesguan 1276678982 [RISCV] Improve extract_vector_elt for fixed mask registers.
Now the backend promotes mask vector to an i8 vector and extract element from that. We could bitcast to a widen element vector, and extract from it to GPR, then use I instruction to extract the certain bit.

Differential Revision: https://reviews.llvm.org/D117389
2022-01-29 11:07:53 +08:00
Craig Topper ea05ee9059 [RISCV] Preserve VL when truncating i64 gather/scatter indices on RV32.
We were creating a truncate with the default for the type, but for
VP intrinsics we have a VL that we should use.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D118406
2022-01-28 09:25:30 -08:00
Kito Cheng a9d5bb926d [RISCV] Use __extendhfsf2/__truncsfhf2 for fp16 <-> fp32
`__gnu_h2f_ieee` and `__gnu_f2h_ieee` are introduce by ARM and set that as
default name for fp16 and fp32 conversion in LLVM.

However RISC-V GCC using default naming scheme for that, which is
`__extendhfsf2` and `__truncsfhf2` for that, that cause runtime ABI
incompatible issue.

Although we didn't have formal runtime ABI spec to specify those naming
convention yet, but I think it would be great to fix the incompatible
issue first.

And I've plan to create a runtime ABI spec undere psABI spec this year.

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D118207
2022-01-29 00:01:00 +08:00
Fraser Cormack 10879c26a2 [RISCV] Add tests for possible splat optimizations
These splats -- whether BUILD_VECTOR or SPLAT_VECTOR -- are formed by
first extracting a value from a vector and splatting it to all elements
of the destination vector. These could be performed more optimally,
avoiding the drop to scalar, using RVV's vrgather, for example.
2022-01-28 12:32:29 +00:00
Fraser Cormack 6297f929f7 [RISCV] Fix FileCheck prefixes in RVV test
The LMULMAX check names didn't match the options we were passing to llc
(they were swapped around) and we were silently missing coverage for one
test which differs between RV32 and RV64.
2022-01-28 12:08:30 +00:00
Craig Topper 3e98ce45b6 [RISCV] Add Zbkb RUN lines to bswap-bitreverse.ll. NFC 2022-01-27 21:21:17 -08:00
Craig Topper dcd751b26e [RISCV] Split bswap-bitreverse-ctlz-cttz-ctpop.ll into two files bswap/bitreverse and ctlz/cttz/ctpop. NFC
Add Zbkb command lines to the bswap/bitreverse test.
2022-01-27 21:12:03 -08:00
Chenbing.Zheng 6d6c44a3f3 [RISCV] Add support for matching vwmulsu from fixed vectors
According to riscv-v-spec-1.0, widening signed(vs2)-unsigned integer multiply
vwmulsu.vv vd, vs2, vs1, vm # vector-vector
vwmulsu.vx vd, vs2, rs1, vm # vector-scalar

It is worth noting that signed op is only for vs2.
For vwmulsu.vv, we can swap two ops, and don't care which is sign extension,
but for vwmulsu.vx signExt can not be a vector extended from scalar (rs1).
I specifically added two functions ending with _swap in the test case.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D118215
2022-01-28 02:33:30 +00:00
Craig Topper 70e1cc6792 [RISCV] Prefer vmslt.vx v0, v8, zero over vmsle.vi v0, v8, -1.
At least when starting from a vmslt.vx intrinsic or ISD::SETLT. We
don't handle the case where the user used vmsle.vx intrinsic with -1.
2022-01-27 11:48:27 -08:00
Fraser Cormack fed2f690a9 [RISCV] Fix test case expected output
I didn't correctly update this before landing D118058.
2022-01-27 09:28:09 +00:00
Fraser Cormack 84e85e025e [SelectionDAG][VP] Provide expansion for VP_MERGE
This patch adds support for expanding VP_MERGE through a sequence of
vector operations producing a full-length mask setting up the elements
past EVL/pivot to be false, combining this with the original mask, and
culminating in a full-length vector select.

This expansion should work for any data type, though the only use for
RVV is for boolean vectors, which themselves rely on an expansion for
the VSELECT.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D118058
2022-01-27 09:00:41 +00:00
Wu Xinlong 615d71d9a3 [RISCV][CodeGen] Implement IR Intrinsic support for K extension
This revision implements IR Intrinsic support for RISCV Scalar Crypto extension according to the specification of version [[ https://github.com/riscv/riscv-crypto/releases/tag/v1.0.0-scalar | 1.0]]
Co-author:@ksyx & @VincentWu & @lihongliang & @achieveartificialintelligence

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D102310
2022-01-27 15:53:35 +08:00
Craig Topper b3bec6e453 [RISCV] Use vnsrl.wx with x0 instead of vnsrl.vi for truncate.
This matches what the spec uses for the vncvt.x.x.w assembly
pseudoinstruction.

Reviewed By: kito-cheng

Differential Revision: https://reviews.llvm.org/D118295
2022-01-26 18:38:13 -08:00
Paweł Bylica bdb7837481
[test][DAGCombine] Add more tests for carry diamond. NFC 2022-01-27 00:25:26 +01:00
Craig Topper f487a76430 [RISCV] Add hasStdExtZbp() to hasAndNotCompare. 2022-01-26 13:54:05 -08:00
David Green 57356d6bb7 [DAG] Create fptoui.sat from clamped fptoui
This is the unsigned variant of D111976, where we convert a clamped
fptoui to a fptoui.sat. Because we are unsigned, the condition this time
is only UMIN of UINT_MAX. Similarly to D111976 it handles ISD::UMIN,
ISD::SETCC/ISD::SELECT, ISD::VSELECT or ISD::SELECT_CC nodes.

This especially helps on ARM/AArch64 where the vcvt instructions
naturally saturate the result.

Differential Revision: https://reviews.llvm.org/D114964
2022-01-26 08:37:44 +00:00
Zakk Chen 9273378b85 [RISCV] Add the passthru operand for RVV nomask load intrinsics.
The goal is support tail and mask policy in RVV builtins.
We focus on IR part first.
If the passthru operand is undef, we use tail agnostic, otherwise
use tail undisturbed.

Co-Authored-by: Hsiangkai Wang <Hsiangkai@gmail.com>

Reviewers: craig.topper, frasercrmck

Differential Revision: https://reviews.llvm.org/D117647
2022-01-25 17:31:36 -08:00
eopXD b089e4072a [RISCV] Don't allow i64 vector div by constant to use mulh with Zve64x
EEW=64 of mulh and its vairants requires V extension.

Authored by: Craig Topper <craig.topper@sifive.com> @craig.topper

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D117947
2022-01-25 09:55:05 -08:00
Fraser Cormack 7cb452bfde [SelectionDAG][VP] Add widening support for VP_MERGE
This patch adds widening support for ISD::VP_MERGE, which widens
identically to VP_SELECT and similarly to other select-like nodes.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D118030
2022-01-25 10:59:40 +00:00
Victor Perez 19d3dc6e22 [VP] Update CodeGen/RISCV/rvv/vpgather-sdnode.ll test 2022-01-25 10:49:05 +00:00
Fraser Cormack 5f5c5603ce [SelectionDAG][VP] Add splitting support for VP_MERGE
This patch adds splitting support for ISD::VP_MERGE, which splits
identically to VP_SELECT and similarly to other select-like nodes.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D118032
2022-01-25 10:33:23 +00:00
Victor Perez 2233befa5d [LegalizeTypes][VP] Add splitting support for vp.gather and vp.scatter
Split these nodes in a similar way as their masked versions.

Reviewed By: frasercrmck, craig.topper

Differential Revision: https://reviews.llvm.org/D117760
2022-01-25 10:08:07 +00:00