Commit Graph

1458 Commits

Author SHA1 Message Date
Craig Topper 63e711549c [RISCV] Lower VP_FNEG to RVV instructions
Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D119269
2022-02-09 10:56:39 -08:00
Sander de Smalen ec46232517 [DAGCombiner] Fold `ty1 extract_vector(ty2 splat(V)) -> ty1 splat(V)`
This seems like an obvious fold, which leads to a few improvements.

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D118920
2022-02-09 14:30:01 +00:00
jacquesguan 5e71bbfb6c [RISCV] Add patterns for vector widening floating-point fused multiply-add instructions
Add patterns for vector widening floating-point fused multiply-add instructions.

Differential Revision: https://reviews.llvm.org/D117546
2022-02-09 10:34:39 +08:00
Fraser Cormack 62c4ac764b [RISCV] Optimize splats of extracted vector elements
This patch adds an optimization to splat-like operations where the
splatted value is extracted from a identically-sized vector. On RVV we
can splat that via vrgather.vx/vrgather.vi without dropping to scalar
beforehand.

We do have a similar VECTOR_SHUFFLE-specific optimization but that only
works on fixed-length vector types and for those with a constant splat
lane. This patch extends this optimization to make it work on
scalable-vector types and on unknown extract indices.

It is performed during fixed-vector BUILD_VECTOR lowering and during a
new DAGCombine on SPLAT_VECTOR for scalable vectors.

Reviewed By: craig.topper, khchen

Differential Revision: https://reviews.llvm.org/D118456
2022-02-08 10:35:25 +00:00
wangpc c53d99c37d [RISCV] Split f64 undef into two i32 undefs
So that no store instruction will be generated.

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D118222
2022-02-08 13:42:15 +08:00
wangpc cb0fff4397 [RISCV] Pre-commit test for D118222
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D119212
2022-02-08 12:52:13 +08:00
Craig Topper c35ccd2ac8 [DAGCombiner][RISCV] Allow rotates by non-constant to be matched for i32 on riscv64 with Zbb.
rv64izbb has a RORW/ROLW instructions that operate on the lower
32-bits of a 64-bit value and sign extend bit 31 of the result.

DAGCombiner won't match rotate idioms because the i32 type isn't Legal
on riscv64.

This patch teaches DAGCombiner to allow it if the type is going to
be promoted and the target has Custom type legalization for ISD::ROTL
or ISD::ROTR. I've restricted this to scalar types. It doesn't appear
any in tree targets other than riscv64 have custom type legalization
for rotates.

If this patch isn't acceptable, I guess I can match SRLW, SLLW, and OR
after type legalization, but I'd like to avoid that if possible.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D119062
2022-02-06 10:58:12 -08:00
Craig Topper f3a725af43 [RISCV] Add signext test for llvm.abs.i32 for rv64 Zbb.
This shows that we don't preserve sign bits across the
abs expansion, but I think we could if we used negw+max.
2022-02-05 21:26:47 -08:00
Craig Topper c1cef111a3 Revert "[RISCV] Fold (sext_inreg (fmv_x_anyexth X), i16) -> (fmv_x_signexth X)."
This reverts commit 673d68cd92.

This hadn't been reviewed yet.
2022-02-05 12:51:01 -08:00
Craig Topper d1899da3a2 [RISCV] Add more tests for rotate idioms. Add more RUN lines. NFC
We were only testing rotate idioms on rv32i. DAGCombiner won't
form ISD::ROTL/ROTR unless those operations are Legal or Custom.
They aren't for rv32 so we were only testing shift lowering.

This commit adds i64 idioms and the idioms that mask the shift
amount to avoid UB for a rotate of 0. I've added riscv64 and Zbb
RUN lines to show that we do match rotate for XLen types when
available. We currently miss i32 on rv64izbb.
2022-02-05 12:48:22 -08:00
Craig Topper 673d68cd92 [RISCV] Fold (sext_inreg (fmv_x_anyexth X), i16) -> (fmv_x_signexth X).
Add a new ISD opcode to represent the sign extending behavior of
vmv.x.h. Keep the previous anyext opcode to allow the existing
(fmv_x_anyexth (fmv_h_x X)) combine to keep working without needing
to generate a sign extend.

For fmv.x.w we are able to match the sext_inreg in an isel pattern,
but a 16-bit sext_inreg is lowered to a shift pair before isel. This
seemed like a larger match than we should do in isel.

Differential Revision: https://reviews.llvm.org/D118974
2022-02-05 12:42:12 -08:00
Craig Topper d752ea9a72 [RISCV] Remove exclusions for zext.h/zext.w from our (and (srl X, C1), C2) selection code.
This code tries to replace the pattern with a pair of shifts, but
we were excluding if the And could be a zext.h or zext.w. The SLLI/SRL
pair is more compressible and doesn't come with much down side.

We do regress one test case in rv64i-exhaustive-w-insts.ll but we
can probably add a narrower exclusion for that case.
2022-02-04 17:10:48 -08:00
Craig Topper 1d8bbe3d25 [RISCV] Implement a basic version of AArch64RedundantCopyElimination pass.
Using AArch64's original implementation for reference, this patch
implements a pass to remove unneeded copies of X0. This pass runs
after register allocation and looks to see if a register is implied
to be 0 by a branch in the predecessor basic block.

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D118160
2022-02-04 10:43:46 -08:00
Craig Topper 234e54bdd8 [RISCV] Add more types of shuffles isShuffleMaskLegal.
Add the vslidedown and interleave patterns that I recently implemented.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D118952
2022-02-04 09:13:13 -08:00
Craig Topper c83905a308 [RISCV] Add inline expansion for vector fround.
This avoids a crash for scalable vectors and or scalarization for
fixed vectors.

The algorithm is different enough that I don't think it makes sense
to merge with ceil/floor/trunc. Algorithm is adapted from gcc's X86
SSE2 output.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D117247
2022-02-04 09:12:09 -08:00
Craig Topper 237eb37260 [RISCV] Add FMV_X_W and FMV_X_H to RISCVSExtWRemoval.
Add -target-abi to sextw-removal.ll RUN lines to show benefit on
new test case.
2022-02-03 09:40:47 -08:00
Shao-Ce SUN 005fd8aa70 [RISCV] Add support for Zihintpause extention
Add support for the 'pause' hint instruction as an alias for
'fence w, 0'. To do this allow the 'fence' operands pred and succ
to be set to 0 (the empty set). This will also allow future hints
to be encoded as 'fence 0, <x>' and 'fence <x>, 0'.

This patch revised from @mundaym's D93019.

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D117789
2022-02-03 20:55:47 +08:00
Craig Topper b73d151a11 [RISCV] Add DAG combines to transform ADD_VL/SUB_VL into widening add/sub.
This adds or reuses ISD opcodes for vadd.wv, vaddu.wv, vadd.vv, vaddu.vv
and a similar set for sub.

I've included support for narrowing scalar splats that have known
sign/zero bits similar to what was done for MUL_VL.

The conversion to vwadd.vv proceeds in two phases. First we'll form
a vwadd.wv by narrowing one of the operands. Then we'll visit the
vwadd.wv to try to narrow the other operand. This turned out to be
simpler than catching all the cases in one step. The forming of of
vwadd.wv can happen for either operand for add, but only the right
hand side for sub since sub isn't commutable.

An interesting quirk is that ADD_VL and VZEXT_VL/VSEXT_VL are formed
during vector op legalization, but VMV_V_X_VL isn't usually formed
until op legalization when BUILD_VECTORS are handled. This leads to
VWADD_W_VL forming in one DAG combine round, and then a later DAG combine
round sees the VMV_V_X_VL and needs to commute the operands to get the
splat in position. This alone necessitated a VWADD_W_VL combine function
which made forming vwadd.vv in two stages an easy choice.

I've left out trying hard to form vwadd.wx instructions for now. It would
only save an extend in the scalar domain which isn't as interesting.

Might need to review the test coverage a bit. Most of the vwadd.wv
instructions are coming from vXi64 tests on rv64. The tests were
copy pasted from the existing multiply tests.

Reviewed By: rogfer01

Differential Revision: https://reviews.llvm.org/D117954
2022-02-02 10:03:08 -08:00
Craig Topper 7f6441f96e [TableGen][RISCV] Relax a restriction in generating patterns for commutable SDNodes.
Previously, all children would be checked to see if any were an
explicit Register. If anywhere no commutable patterns would be
generated. This patch loosens the restriction to only check the
children that are being commuted.

Digging back through history, this code predates the existence of
commutable intrinsics and commutable SDNodes with more than 2
operands. At that time the loop would count the number of children that
weren't registers and if that was equal to 2 it would allow commuting.
I don't think this loop was re-considered when commutable
intrinsics were added or when we allowed SDNodes with more than 2
operands.

This important for RISCV were our isel patterns have a V0 mask
operand after the commutable operands on some RISCVISD opcodes.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D117955
2022-02-01 21:07:03 -08:00
Craig Topper 7eb7810727 [RISCV] Fix a vsetvli insertion bug involving loads/stores.
The first phase of the analysis can avoid a vsetvli if an earlier
instruction in the block used an SEW and LMUL that when combined with
the EEW of the load/store would produce the desired EMUL. If we
avoided a vsetvli this will affect the global analysis we do in the
second phase.

The third phase where we really insert the vsetvlis needs to agree
with the first phase. If it doesn't we can insert vsetvlis that
invalidate the global analysis.

In the test case there is a VSETVLI in the preheader that sets
SEW=64 and LMUL=1. Inside the loop there is a VADD with SEW=64 and LMUL=1.
This VADD is followed by a store that wants wants SEW=32 LMUL=1/2.
Because it has EEW=32 as part of the opcode the SEW=64 LMUL=1 from the
VADD can be become EMUL=1 for the store. So the first phase determines no
vsetvli is needed.

The third phase manages CurInfo differently than BBInfo.Change from the
first phase. CurInfo is only updated when we see a vsetvli or insert
a vsetvli. This was done to allow predecessor block information from
the global analysis to be applied to multiple instructions. Since the
loop body has no vsetvli we won't update CurInfo for either the VADD
or the VSE. This prevented us from checking the store vsetvli elision
for the VSE resulting in a vsetvli SEW=32 LMUL=1/2 being emitted which
invalidated the global analysis.

To mitigate this, I've added a BBLocalInfo variable that more closely
matches the first phase propagation. This gets updated based on the
VADD and prevents emitting a vsetvli for the store like we did in the
first phase.

I wonder if we should do an earlier phase to handle the load/store case
by adding more pseudo opcodes and changing the SEW/LMUL for those
instructions before the insertion analysis. That might be more robust
than trying to guarantee two phases make the same decision.

Fixes the test from D118629.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D118667
2022-02-01 07:29:01 -08:00
Fraser Cormack e9ceeedf30 [RISCV][3/3] Switch undef -> poison in scalable-vector RVV tests 2022-02-01 11:06:56 +00:00
Fraser Cormack 8d1169cf74 [RISCV][2/3] Switch undef -> poison in fixed-vector RVV tests 2022-02-01 11:06:56 +00:00
Fraser Cormack 414f21ed23 [RISCV][1/3] Switch undef -> poison in VP RVV tests
Inspired by a recent Discourse post on undef vs. poison usage, this
series of patches should reduce the number of undefs in LLVM tests by
around 10%.

Only undef vector operands to insertelement/shufflevector have been
handled, which are by far the most common we've got.

The switchover is split into 3 fairly arbitrary clusters to make it
slightly more manageable: vector predication, fixed-length vectors,
scalable vectors.
2022-02-01 11:06:55 +00:00
Fraser Cormack b00bce2a93 [RISCV] Add a test showing an incorrect VSETVLI insertion
This test shows a loop, whose preheader uses a SEW=64, LMUL=1 vector
operation. The loop body starts off with another SEW=64, LMUL=1 VADD
vector operation, before switching to a SEW=32, LMUL=1/2 vector store
instruction.

We can see that the VSETVLI insertion pass omits a VSETVLI before the
VADD (thinking it inherits its configuration from the preheader) but
does place a SEW=32, LMUL=1/2 VSETVLI before the store. This results in
a miscompilation as when the loop comes back around, the VADD is
incorrectly configured with SEW=32, LMUL=1/2.

It appears to be a bad load/store optimization, as replacing the vector
store with an SEW=32, LMUL=1/2 VADD does correctly insert a VSETVLI. The
issue is therefore possibly arising from canSkipVSETVLIForLoadStore.

Differential Revision: https://reviews.llvm.org/D118629
2022-02-01 10:21:29 +00:00
David Sherwood daa80339df [CodeGen] Support folds of not(cmp(cc, ...)) -> cmp(!cc, ...) for scalable vectors
I have updated TargetLowering::isConstTrueVal to also consider
SPLAT_VECTOR nodes with constant integer operands. This allows the
optimisation to also work for targets that support scalable vectors.

Differential Revision: https://reviews.llvm.org/D117210
2022-02-01 09:50:00 +00:00
Craig Topper aae947e860 [RISCV] Separate the Zfhmin and Zfh extensions.
The spec doesn't seem to be written as if Zfh implies Zfhmin. They
seem to be separate extensions.

This patch moves the instructions from Zfhmin to be enabled with
either the Zfh or Zfhmin extensions.

Reviewed By: achieveartificialintelligence

Differential Revision: https://reviews.llvm.org/D118581
2022-01-31 09:06:43 -08:00
Craig Topper e1075186a6 [RISCV] Custom lower brev8 intrinsic to RISCVISD::GREV.
We can use the RISCVISD::GREV encoding that swaps the bits in
each byte.  This allows it to use the existing computeKnownBits
support for RISCVISD::GREV.
2022-01-30 12:41:09 -08:00
Craig Topper d8f929a567 [RISCV] Custom legalize BITREVERSE with Zbkb.
With Zbkb, a bitreverse can be split into a rev8 and a brev8.

Reviewed By: VincentWu

Differential Revision: https://reviews.llvm.org/D118430
2022-01-28 23:11:12 -08:00
jacquesguan 1276678982 [RISCV] Improve extract_vector_elt for fixed mask registers.
Now the backend promotes mask vector to an i8 vector and extract element from that. We could bitcast to a widen element vector, and extract from it to GPR, then use I instruction to extract the certain bit.

Differential Revision: https://reviews.llvm.org/D117389
2022-01-29 11:07:53 +08:00
Craig Topper ea05ee9059 [RISCV] Preserve VL when truncating i64 gather/scatter indices on RV32.
We were creating a truncate with the default for the type, but for
VP intrinsics we have a VL that we should use.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D118406
2022-01-28 09:25:30 -08:00
Kito Cheng a9d5bb926d [RISCV] Use __extendhfsf2/__truncsfhf2 for fp16 <-> fp32
`__gnu_h2f_ieee` and `__gnu_f2h_ieee` are introduce by ARM and set that as
default name for fp16 and fp32 conversion in LLVM.

However RISC-V GCC using default naming scheme for that, which is
`__extendhfsf2` and `__truncsfhf2` for that, that cause runtime ABI
incompatible issue.

Although we didn't have formal runtime ABI spec to specify those naming
convention yet, but I think it would be great to fix the incompatible
issue first.

And I've plan to create a runtime ABI spec undere psABI spec this year.

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D118207
2022-01-29 00:01:00 +08:00
Fraser Cormack 10879c26a2 [RISCV] Add tests for possible splat optimizations
These splats -- whether BUILD_VECTOR or SPLAT_VECTOR -- are formed by
first extracting a value from a vector and splatting it to all elements
of the destination vector. These could be performed more optimally,
avoiding the drop to scalar, using RVV's vrgather, for example.
2022-01-28 12:32:29 +00:00
Fraser Cormack 6297f929f7 [RISCV] Fix FileCheck prefixes in RVV test
The LMULMAX check names didn't match the options we were passing to llc
(they were swapped around) and we were silently missing coverage for one
test which differs between RV32 and RV64.
2022-01-28 12:08:30 +00:00
Craig Topper 3e98ce45b6 [RISCV] Add Zbkb RUN lines to bswap-bitreverse.ll. NFC 2022-01-27 21:21:17 -08:00
Craig Topper dcd751b26e [RISCV] Split bswap-bitreverse-ctlz-cttz-ctpop.ll into two files bswap/bitreverse and ctlz/cttz/ctpop. NFC
Add Zbkb command lines to the bswap/bitreverse test.
2022-01-27 21:12:03 -08:00
Chenbing.Zheng 6d6c44a3f3 [RISCV] Add support for matching vwmulsu from fixed vectors
According to riscv-v-spec-1.0, widening signed(vs2)-unsigned integer multiply
vwmulsu.vv vd, vs2, vs1, vm # vector-vector
vwmulsu.vx vd, vs2, rs1, vm # vector-scalar

It is worth noting that signed op is only for vs2.
For vwmulsu.vv, we can swap two ops, and don't care which is sign extension,
but for vwmulsu.vx signExt can not be a vector extended from scalar (rs1).
I specifically added two functions ending with _swap in the test case.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D118215
2022-01-28 02:33:30 +00:00
Craig Topper 70e1cc6792 [RISCV] Prefer vmslt.vx v0, v8, zero over vmsle.vi v0, v8, -1.
At least when starting from a vmslt.vx intrinsic or ISD::SETLT. We
don't handle the case where the user used vmsle.vx intrinsic with -1.
2022-01-27 11:48:27 -08:00
Fraser Cormack fed2f690a9 [RISCV] Fix test case expected output
I didn't correctly update this before landing D118058.
2022-01-27 09:28:09 +00:00
Fraser Cormack 84e85e025e [SelectionDAG][VP] Provide expansion for VP_MERGE
This patch adds support for expanding VP_MERGE through a sequence of
vector operations producing a full-length mask setting up the elements
past EVL/pivot to be false, combining this with the original mask, and
culminating in a full-length vector select.

This expansion should work for any data type, though the only use for
RVV is for boolean vectors, which themselves rely on an expansion for
the VSELECT.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D118058
2022-01-27 09:00:41 +00:00
Wu Xinlong 615d71d9a3 [RISCV][CodeGen] Implement IR Intrinsic support for K extension
This revision implements IR Intrinsic support for RISCV Scalar Crypto extension according to the specification of version [[ https://github.com/riscv/riscv-crypto/releases/tag/v1.0.0-scalar | 1.0]]
Co-author:@ksyx & @VincentWu & @lihongliang & @achieveartificialintelligence

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D102310
2022-01-27 15:53:35 +08:00
Craig Topper b3bec6e453 [RISCV] Use vnsrl.wx with x0 instead of vnsrl.vi for truncate.
This matches what the spec uses for the vncvt.x.x.w assembly
pseudoinstruction.

Reviewed By: kito-cheng

Differential Revision: https://reviews.llvm.org/D118295
2022-01-26 18:38:13 -08:00
Paweł Bylica bdb7837481
[test][DAGCombine] Add more tests for carry diamond. NFC 2022-01-27 00:25:26 +01:00
Craig Topper f487a76430 [RISCV] Add hasStdExtZbp() to hasAndNotCompare. 2022-01-26 13:54:05 -08:00
David Green 57356d6bb7 [DAG] Create fptoui.sat from clamped fptoui
This is the unsigned variant of D111976, where we convert a clamped
fptoui to a fptoui.sat. Because we are unsigned, the condition this time
is only UMIN of UINT_MAX. Similarly to D111976 it handles ISD::UMIN,
ISD::SETCC/ISD::SELECT, ISD::VSELECT or ISD::SELECT_CC nodes.

This especially helps on ARM/AArch64 where the vcvt instructions
naturally saturate the result.

Differential Revision: https://reviews.llvm.org/D114964
2022-01-26 08:37:44 +00:00
Zakk Chen 9273378b85 [RISCV] Add the passthru operand for RVV nomask load intrinsics.
The goal is support tail and mask policy in RVV builtins.
We focus on IR part first.
If the passthru operand is undef, we use tail agnostic, otherwise
use tail undisturbed.

Co-Authored-by: Hsiangkai Wang <Hsiangkai@gmail.com>

Reviewers: craig.topper, frasercrmck

Differential Revision: https://reviews.llvm.org/D117647
2022-01-25 17:31:36 -08:00
eopXD b089e4072a [RISCV] Don't allow i64 vector div by constant to use mulh with Zve64x
EEW=64 of mulh and its vairants requires V extension.

Authored by: Craig Topper <craig.topper@sifive.com> @craig.topper

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D117947
2022-01-25 09:55:05 -08:00
Fraser Cormack 7cb452bfde [SelectionDAG][VP] Add widening support for VP_MERGE
This patch adds widening support for ISD::VP_MERGE, which widens
identically to VP_SELECT and similarly to other select-like nodes.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D118030
2022-01-25 10:59:40 +00:00
Victor Perez 19d3dc6e22 [VP] Update CodeGen/RISCV/rvv/vpgather-sdnode.ll test 2022-01-25 10:49:05 +00:00
Fraser Cormack 5f5c5603ce [SelectionDAG][VP] Add splitting support for VP_MERGE
This patch adds splitting support for ISD::VP_MERGE, which splits
identically to VP_SELECT and similarly to other select-like nodes.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D118032
2022-01-25 10:33:23 +00:00
Victor Perez 2233befa5d [LegalizeTypes][VP] Add splitting support for vp.gather and vp.scatter
Split these nodes in a similar way as their masked versions.

Reviewed By: frasercrmck, craig.topper

Differential Revision: https://reviews.llvm.org/D117760
2022-01-25 10:08:07 +00:00
Craig Topper a43ed49f5b [DAGCombiner][RISCV] Canonicalize (bswap(bitreverse(x))->bitreverse(bswap(x)).
If the bitreverse gets expanded, it will introduce a new bswap. By
putting a bswap before the bitreverse, we can ensure it gets cancelled
out when this happens.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D118012
2022-01-24 08:31:53 -08:00
Craig Topper b8c7cdcc81 [SelectionDAG][RISCV] Teach getNode to fold bswap(bswap(x))->x.
This can show up during when bitreverse is expanded to bswap and
swap of bits within a byte. If the input is already a bswap, we
should cancel them out before we further transform them in a way
that makes it harder to see the redundancy.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D118007
2022-01-24 08:17:46 -08:00
Craig Topper cd2a9ff397 [RISCV] Select int_riscv_vsll with shift of 1 to vadd.vv.
Add might be faster than shift. We can't do this earlier without
using a Freeze instruction.

This is the intrinsic version of D106689.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D118013
2022-01-24 08:04:53 -08:00
Fraser Cormack d42678b453 [RISCV] Add side-effect-free vsetvli intrinsics
This patch introduces new intrinsics that enable the use of vsetvli in
contexts where only the returned vector length is of interest. The
pre-existing intrinsics are marked with side-effects, which prevents
even trivial optimizations on/across them.

These intrinsics are intended to be used in situations where the vector
length is fed in turn to RVV intrinsics or to vector-predication
intrinsics during loop vectorization, for example. Those codegen paths
ensure that instructions are generated with their own implicit vsetvli,
so the vector length and vtype can be relied upon to be correct.

No corresponding C builtins are planned at this stage, though that is a
possibility for the future if the need arises.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D117910
2022-01-24 13:52:08 +00:00
SForeKeeper 70f83f3084 [RISCV] add support for zbkx subextension in MC layer.
This patch adds support for zbkx extension from K extension(v1.0.0) in MC layer.
Instructions with same functionality and same encoding is defined in the bitmanip extension.
It defines {Xperm8, Xperm4} as instruction aliases for xperm.* in Zbp extension. When Zbkx is enabled while Zbp is not, xperm.h will not be available. When Zbkx and Zbp are both enabled, the instructions will be decoded in Zbp format.

[[ https://reviews.llvm.org/D94999 | D94999 ]] this is the patch that introduces xperm.* instructions.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D117889
2022-01-24 20:38:46 +08:00
Fraser Cormack af773a1818 [RISCV][VP] Lower VP_MERGE to RVV instructions
This patch adds lowering of the llvm.vp.merge.* intrinsic
(ISD::VP_MERGE) to RVV vmerge/vfmerge instructions. It introduces a
special pseudo form of vmerge which allows a tied merge operand,
allowing us to specify the tail elements as being equal to the "on
false" operand, using a tied-def constraint and a "tail undisturbed"
policy.

While this strategy allows us to often lower the intrinsic to just one
instruction, it may be less efficient in fixed-vector types as the
number of tail elements may extend far beyond the length of the fixed
vector. Another strategy could be to use a vmerge/vfmerge instruction
with an AVL equal to the length of the vector type, and manipulate the
condition operand such that mask elements greater than the operation's
EVL are false.

I've also observed inefficient codegen in which our 'VF' patterns don't
match raw floating-point SPLAT_VECTORs, which occur in scalable-vector
code.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D117561
2022-01-24 11:05:05 +00:00
Fraser Cormack e7926e8d97 [RISCV] Match VF variants for masked VFRDIV/VFRSUB
This patch follows up on D117697 to help the simple binary operations
behave similarly in the presence of masks.

It also enables CGP sinking support for vp.fdiv and vp.fsub intrinsics,
now that VFRDIV and VFRSUB are consistently matched with a LHS splat for
masked and unmasked variants.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D117783
2022-01-24 10:59:43 +00:00
Chenbing.Zheng 9aaa74aeef [RISCV] Add patterns of SET[U]LT_VI for STECC forms
This patch optmizes "li a0, 5
                     vmsgt[u].vx v10, v8, a0"
                 -> "vmsgt[u].vi v10, v8, 5"

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D118014
2022-01-24 08:50:49 +00:00
jacquesguan ba16e3c31f [RISCV] Decouple Zve* extensions and the V extension.
According to the spec, there are some difference between V and Zve64d. For example, the vmulh integer multiply variants that return the high word of the product (vmulh.vv, vmulh.vx, vmulhu.vv, vmulhu.vx, vmulhsu.vv, vmulhsu.vx) are not included for EEW=64 in Zve64*, but V extension does support these instructions. So we should decouple Zve* extensions and the V extension.

Differential Revision: https://reviews.llvm.org/D117854
2022-01-24 14:55:21 +08:00
Wu Xinlong e29d8fb169 [RISCV] Initially support the K-extension instructions on the LLVM MC layer
This commit is currently implementing supports for scalar cryptography extension for LLVM according to version v1.0.0 of [K Ext specification](https://github.com/riscv/riscv-crypto/releases)(scala crypto has been ratified already). Currently, we are implementing the MC (Machine Code) layer of his extension and the majority of work is done under `llvm/lib/Target/RISCV` directory. There are also some test files in `llvm/test/MC/RISCV` directory.

Remove the subfeature of Zbk* which conflict with b extensions to reduce the size of the patch.
(Zbk* will be resubmit after this patch has been merged)

**Co-author:**@ksyx & @VincentWu & @lihongliang & @achieveartificialintelligence

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D98136
2022-01-24 14:45:35 +08:00
Craig Topper 3575700b28 [RISCV] Add tests that do a bitreverse before or after a bswap. NFC
We don't optimize this as well as we could. Bitreverse is always
expanded to bswap and a shift/and/or sequence to swap bits within a
byte. The newly created bswap will either becomes a shift/and/or
sequence or rev8 instruction. We don't always realize the bswap is
redundant with another bswap before or after the bitreverse.

Found while thinking about the brev8 instruction from the
Cryptography extension. It's equivalent to bswap(bitreverse(x)) or
bitreverse(bswap(x)).
2022-01-23 13:44:31 -08:00
Craig Topper 88f33cff4b [RISCV] Add bitreverse tests to bswap-ctlz-cttz-ctpop.ll. Add Zbb command lines. NFC
Rename to include bitreverse. Add additional tests and Zbb command lines.

There's some overlapping tests with rv32zbb.ll and rv64zbb.ll. Maybe
I'll clean that up in a future patch.
2022-01-23 13:41:58 -08:00
Craig Topper 0b79979180 [RISCV] Merge some rvv intrinsic test cases that only differ by XLen type.
Instead of having a test for i32 XLen and i64 XLen, use sed to
replace iXLen with i32/i64 before running llc.

This change covers all of the floating point tests.
2022-01-23 09:36:14 -08:00
Craig Topper be6070c290 [RISCV] Use FP ABI for some RVV intrinsic tests. NFC
Removes moves from GPR to FPR and improves f64 tests on RV32.

Differential Revision: https://reviews.llvm.org/D117969
2022-01-22 22:06:33 -08:00
Craig Topper 85e42db1b6 [RISCV] Merge some rvv intrinsic test cases that only differ by XLen type.
Instead of having a test for i32 XLen and i64 XLen, use sed to
replace iXLen with i32/i64 before running llc.

This change updates tests for intrinsics that operate exclusively
on mask values. It removes over 4000 lines worth of test content.
More merging will come in future changes.

Differential Revision: https://reviews.llvm.org/D117968
2022-01-22 21:55:29 -08:00
eopXD 3cf15af2da [RISCV] Remove experimental prefix from rvv-related extensions.
Extensions affected: +v, +zve*, +zvl*

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D117860
2022-01-22 20:18:40 -08:00
Alex Fan e796eaf2af [RISCV][RFC] add MC support for zbkc subextension
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D117874
2022-01-22 10:23:01 +08:00
Fraser Cormack 4d268dc94a [RISCV] Enable CGP to sink splat operands of VP intrinsics
This patch brings better splat-matching to our VP support, by sinking
splat operands of VP intrinsics back into the same block as the VP
operation. The list of VP intrinsics we are interested in matches that
of the regular instructions.

Some optimization is still lacking. For instance, our VL nodes aren't
recognized as commutative, so splats must be on the RHS. Because of
this, we limit our sinking of splats to just the RHS operand for now.
Improvement in this regard can come in another patch.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D117703
2022-01-21 11:30:37 +00:00
eopXD e6de53b4de [RISCV] Bump rvv-related extensions from 0.10 to 1.0
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D112987
2022-01-20 23:22:20 -08:00
wangpc 8def89b5dc [RISCV] Set CostPerUse to 1 iff RVC is enabled
After D86836, we can define multiple cost values for
different cost models. So here we set CostPerUse to
1 iff RVC is enabled to avoid potential impact on RA.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D117741
2022-01-21 14:44:26 +08:00
Craig Topper 7b3d307288 [RISCV] Add isel patterns for grevi, shfli, and unshfli to brev8/zip/unzip instructions.
Zbkb supports some encodings of the general grevi, shfli, and
unshfli instructions legal, so we added separate instructions for
those encodings to improve the diagnostics for assembler and
disassembler. To be consistent we should always use these separate
instructions whenever those specific encodings of grevi/shfli/unshfli
occur. So this patch adds specific isel patterns to override the generic
isel patterns for these cases. Similar was done for rev8 and zext.h
for Zbb previously.
2022-01-20 20:43:52 -08:00
Wu Xinlong 7ee1c162cc [RISCV][RFC] add inst support of zbkb
This commit add instructions supports of `zbkb` which defined in scalar cryptography extension version v1.0.0 (has been ratified already).

Most of the zbkb directives reuse parts of the zbp and zbb directives, so this patch just modified some of the inst aliases and predicates.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D117640
2022-01-21 11:49:36 +08:00
Hsiangkai Wang ad06e65dc4 [RISCV] Fix the bug in the register allocator caused by reserved BP.
Originally, hasRVVFrameObject() will scan all the stack objects to check
whether if there is any scalable vector object on the stack or not.
However, it causes errors in the register allocator. In issue 53016, it
returns false before RA because there is no RVV stack objects. After RA,
it returns true because there are spilling slots for RVV values during RA.
The compiler will not reserve BP during register allocation and generate BP
access in the PEI pass due to the inconsistent behavior of the function.

The function is changed to use hasStdExtV() as the return value. It is
not precise, but it can make the register allocation correct.

Refer to https://github.com/llvm/llvm-project/issues/53016.

Differential Revision: https://reviews.llvm.org/D117663
2022-01-21 01:23:01 +00:00
Craig Topper cfae2c65db [RISCV] Factor Zve32 support into RISCVSubtarget::getMaxELENForFixedLengthVectors.
This is needed to properly limit fractional LMULs for Zve32.

Add new RUN Zve32 RUN lines to the existing tests for the
-riscv-v-fixed-length-vector-elen-max command line option.
2022-01-20 16:31:12 -08:00
Craig Topper fa8bb22466 [RISCV] Optimize vector_shuffles that are interleaving the lowest elements of two vectors.
RISCV only has a unary shuffle that requires places indices in a
register. For interleaving two vectors this means we need at least
two vrgathers and a vmerge to do a shuffle of two vectors.

This patch teaches shuffle lowering to use a widening addu followed
by a widening vmaccu to implement the interleave. First we extract
the low half of both V1 and V2. Then we implement
(zext(V1) + zext(V2)) + (zext(V2) * zext(2^eltbits - 1)) which
simplifies to (zext(V1) + zext(V2) * 2^eltbits). This further
simplifies to (zext(V1) + zext(V2) << eltbits). Then we bitcast the
result back to the original type splitting the wide elements in half.

We can only do this if we have a type with wider elements available.
Because we're using extends we also have to be careful with fractional
lmuls. Floating point types are supported by bitcasting to/from integer.

The tests test a varied combination of LMULs split across VLEN>=128 and
VLEN>=512 tests. There a few tests with shuffle indices commuted as well
as tests for undef indices. There's one test for a vXi64/vXf64 vector which
we can't optimize, but verifies we don't crash.

Reviewed By: rogfer01

Differential Revision: https://reviews.llvm.org/D117743
2022-01-20 14:44:47 -08:00
Craig Topper 7a275dc354 [RISCV] Remove Zvlsseg extension.
This string no longer appears in the Vector Extension specification.
The segment load/store instructions are just part of the vector
instruction set.

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D117724
2022-01-20 12:40:07 -08:00
Craig Topper 94e69fbb4f [RISCV] Add DAG combine to fold (fp_to_int_sat (ffloor X)) -> (select X == nan, 0, (fcvt X, rdn))
Similar for ceil, trunc, round, and roundeven. This allows us to use
static rounding modes to avoid a libcall.

This is similar to D116771, but for the saturating conversions.

This optimization is done for AArch64 as isel patterns.
RISCV doesn't have instructions for ceil/floor/trunc/round/roundeven
so the operations don't stick around until isel to enable a pattern
match. Thus I've implemented a DAG combine.

I'm only handling saturating to i64 or i32. This could be extended
to other sizes in the future.

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D116864
2022-01-20 11:35:37 -08:00
Fraser Cormack 75017db08c [RISCV] Add tests for commuted vector/scalar VP patterns
This patch adds a variety of tests checking that we can match
vector/scalar instructions against masked VP intrinsics when the splat
is on the LHS. At this stage, we can't, despite us having
ostensibly-commutable ISel patterns for them. The use of V0 as the mask
operand interferes with the auto-generated ISel table.
2022-01-20 17:10:09 +00:00
Fraser Cormack ca36cc56ac [RISCV] Match RVV VF variants also through masked operations
This brings floating-point RVV vector/scalar support more in line with
the integer vector patterns, which can already match '.vx' instructions
with masked operations.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D117697
2022-01-20 12:08:02 +00:00
Fraser Cormack 5a12024b95 [RISCV] Optimize lowering of floating-point -0.0
This idea has come up in several reviews -- D115978 and D105902 -- so I
can't take any credit for the idea. Instead of using a constant pool to
lower -0.0, we can emit a sequence of two instructions:

    fmv.[hwd].x freg, zero
    fsgnjn.[hsd] freg, freg, freg

This is only done when the floating-point type is legal.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D117687
2022-01-20 11:46:28 +00:00
Victor Perez c10c748878 [LegalizeTypes][VP] Add widening support for vp.gather and vp.scatter
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D117557
2022-01-20 08:57:57 +00:00
Chenbing.Zheng 0be3da1fab [RISCV] Add intrinsic for Zbt extension
RV32: fsl, fsr, fsri
RV64: fsl, fsr, fsri, fslw, fsrw, fsriw

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D117468
2022-01-20 08:27:05 +00:00
eopXD 8eae99dfe5 [RISCV] Add the zve extension according to the v1.0 spec
`zve` is the new standard vector extension to specify varying degrees of
vector support for embedding processors. The `zve` extension is related
to the `zvl` extension and other updates that are added in v1.0.

According to https://github.com/riscv-non-isa/riscv-c-api-doc/pull/21,
Clang defines macro `__riscv_v_max_elen`,  `__riscv_v_max_elen_fp` for
`zve` and it can be used by applications that uses the vector extension.

Authored by: Zakk Chen <zakk.chen@sifive.com> @khchen
Co-Authored by: Eop Chen <eop.chen@sifive.com> @eopXD

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D112408
2022-01-19 23:48:28 -08:00
Hsiangkai Wang 34570f4faf [RISCV] Add a test to show the bug in the RA caused by reserved BP.
The bug is reported in https://github.com/llvm/llvm-project/issues/53016.

Authored-by: Kito Cheng <kito.cheng@sifive.com>

Differential Revision: https://reviews.llvm.org/D117738
2022-01-20 02:37:29 +00:00
Mohammed Nurul Hoque 21c79be5d7 [RISCV] Add patterns to MIR sign-extension removal pass.
This patch adds a few instruction patterns that generate sign-extended values or propagate them, adding to the pass introduced in https://reviews.llvm.org/D116397

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D117465
2022-01-19 17:33:58 -08:00
Luís Marques a767ae2c5c [RISCV] Fix incomplete asm statement parsing
For instructions without operands, the final `AsmToken::EndOfStatement`
wasn't being consumed. In the context of inline assembly, the resulting
empty statements would cause extraneous empty lines to be emitted. Fix
the issue by consuming the `EndOfStatement` token.

Differential Revision: https://reviews.llvm.org/D117565
2022-01-19 21:56:21 +00:00
Craig Topper 4060b81e76 [RISCV] Obey -riscv-v-fixed-length-vector-elen-max when lowering mask BUILD_VECTORs.
We may not be allowed to use vXiXLen vectors. Consult ELEN to
determine what is allowed. This will become even more important
when Zve32 is added.

Reviewed By: frasercrmck, arcbbb

Differential Revision: https://reviews.llvm.org/D117518
2022-01-19 10:47:37 -08:00
Fraser Cormack 5fa826f4e2 [RISCV] Test expected inst opcode in sink-splat test
The function's name suggests it should be testing 'sub' rather than
'add'. We test 'add' in an earlier test so we're not losing any coverage
here.
2022-01-19 17:18:24 +00:00
eopXD 9f27941c2f [RISCV] Add patterns for vector narrowing integer right shift instructions
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D117454
2022-01-18 22:30:13 -08:00
Victor Perez b7bf96a258 [LegalizeTypes][VP] Add widening support for vp.reduce.*
When widening these intrinsics, we do not have to insert neutral
elements at the end of the vector as when widening vector.reduce.*
intrinsics, thanks to vector predication semantics.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D117467
2022-01-18 10:21:01 +00:00
Victor Perez fd1dce35bd [LegalizeTypes][VP] Add splitting support for vp.reduction.*
Split vp.reduction.* intrinsics by splitting the vector to reduce in
two halves, perform the reduction operation in each one of them and
accumulate the results of both operations.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D117469
2022-01-18 09:29:24 +00:00
jacquesguan 1090000b63 [RISCV] Add patterns for vector widening floating-point multiply
Add patterns for vector widening floating-point multiply

Differential Revision: https://reviews.llvm.org/D117530
2022-01-18 14:52:43 +08:00
Han-Kuan Chen ec9cb3a79c [RISCV] Provide VLOperand in td.
Currently, users expected VL is the last operand. However, since some
intrinsics has tail policy in the last operand, this rule cannot be used
anymore.

Reviewed By: craig.topper, frasercrmck

Differential Revision: https://reviews.llvm.org/D117452
2022-01-17 20:25:47 -08:00
jacquesguan c29d6c410e [RISCV] Add patterns for vector widening floating-point add/subtract instructions
Add patterns for Vector Widening Floating-Point Add/Subtract Instructions

Differential Revision: https://reviews.llvm.org/D117466
2022-01-18 10:33:56 +08:00
Craig Topper 116af698e2 [RISCV] When expanding CONCAT_VECTORS, don't create INSERT_SUBVECTORS for undef subvectors.
For fixed vectors, the undef will get expanded to an all zeros
build_vector. We don't want that so suppress creating the
insert_subvector.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D117379
2022-01-17 14:40:59 -08:00
Craig Topper 9c410838d2 [RISCV] Legalize fixed length (insert_subvector undef, X, 0) to a scalable insert.
We were considering this legal, but later the undef would become an all
zeros vector. This would cause us to need to re-legalize the insert later
into a vslideup with zero vector.

This patch catches the case and directly legalizes it to a scalable
insert.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D117377
2022-01-17 14:31:30 -08:00
Fraser Cormack af219e567f [RISCV] Add tests for scalable-vector vwsub patterns
This patch adds tests for patterns introduced in D117188.

Reviewed By: jacquesguan

Differential Revision: https://reviews.llvm.org/D117392
2022-01-17 10:53:39 +00:00
eopXD 5a457782a2 [RISCV] Add patterns for vector widening integer multiply-add instructions
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D117404
2022-01-16 18:37:13 -08:00
Craig Topper 4c1e1e05cb [RISCV] Add RISCVISD::BFPW to ComputeNumSignBitsForTargetNode. 2022-01-15 15:23:49 -08:00
Fraser Cormack 877d1b3d07 [SelectionDAG][VP] Add splitting/widening for VP_LOAD and VP_STORE
Original patch by @hussainjk.

This patch was split off from D109377 to keep vector legalization
(widening/splitting) separate from vector element legalization
(promoting).

While the original patch added a third overload of
SelectionDAG::getVPStore, this patch takes the liberty of collapsing
those all down to 1, as three overloads seems excessive for a
little-used node.

The original patch also used ModifyToType in places, but that method
still crashes on scalable vector types. Seeing as the other VP
legalization methods only work when all operands need identical
widening, this patch follows in that vein.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D117235
2022-01-15 11:41:29 +00:00
eopXD 122cab9b05 [RISCV] Add test for vector extension
It seems that D115709 have mis-deleted the whole testcase for vector
extension. This commit adds them back.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D117353
2022-01-14 23:52:19 -08:00
eopXD 26bb1b1dab [RISCV] Add the zvl extension according to the v1.0 spec
`zvl` is the new standard vector extension that specifies the minimum vector length of the vector extension.
The `zvl` extension is related to the `zve` extension and other updates that are added in v1.0.

According to https://github.com/riscv-non-isa/riscv-c-api-doc/pull/21,
Clang defines macro `__riscv_v_min_vlen` for `zvl` and it can be used for applications that uses the vector extension.
LLVM checks whether the option `riscv-v-vector-bits-min` (if specified) matches the `zvl*` extension specified.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D108694
2022-01-14 23:01:48 -08:00
jacquesguan b148348ad4 [RISCV] Add patterns for vector widening integer add/subtract
Add patterns for vector widening integer add/subtract instructions

Differential Revision: https://reviews.llvm.org/D117188
2022-01-15 09:41:07 +08:00
Shao-Ce SUN a0a76fee0c [RISCV] update zfh and zfhmin extention to v1.0
`zfh` and `zfhmin` have been ratified, with version 1.0.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D117098
2022-01-15 09:21:24 +08:00
Craig Topper 2baa1dffd1 [RISCV] Add basic support for matching shuffles to vslidedown.vi.
Specifically the unary shuffle case where the elements being
shifted in are undef. This handles the shuffles produce by expanding
llvm.reduce.mul.

I did not reduce the VL which would increase the number of vsetvlis,
but may improve the execution speed. We'd also want to narrow the
multiplies so we could share vsetvlis between the vslidedown.vi and
the next multiply.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D117239
2022-01-14 09:04:54 -08:00
Craig Topper ac6b4896ea [RISCV] Honor the VT when converting float point register names to register class for inline assembly.
It appears the code here was written for the inline asm clobbering
a specific register, but it also gets used for named input and
output registers.

For the input and output case, we should honor the VT so we
don't insert conversion instructions around the inline assembly.

For the clobber, case we need to pick the largest register class.

Reviewed By: asb, jrtc27

Differential Revision: https://reviews.llvm.org/D117279
2022-01-14 09:04:00 -08:00
jacquesguan 88c0e0806b [RISCV] Improve i64 splat vector lowering in RV32.
We could use vmv.v.i/vmv.v.x whose eew is 32 to lower the i64 splat vector if the i64 constant scalar could be splitted into two same i32 scalar.

Differential Revision: https://reviews.llvm.org/D117079
2022-01-14 14:06:01 +08:00
jacquesguan 3e241353e1 [RISCV] Add more i64 splat vector test.
Precommit test for D117079.

Differential Revision: https://reviews.llvm.org/D117081
2022-01-14 14:04:01 +08:00
Craig Topper 7b7210291a [RISCV] Remove unused check prefixes. NFC 2022-01-13 22:02:16 -08:00
Craig Topper d72ebafda0 [RISCV] Add basic Zfh inline assembly tests. NFC
The abi name test shows incorrect fcvt.d.h, fcvt.h.d, fcvt.f.h,
and fcvt.h.f instructions around the inline assembly.
2022-01-13 21:52:46 -08:00
Craig Topper 3174525516 [RISCV] Add inline asm f32 test cases with D extension. NFC
Using named registers as input or output constraints creates fcvt.d.s
and fcvt.s.d instructions around the inline assembly.

This makes the data unusable by the inline assembly and corrupts
the results of the inline assembly.
2022-01-13 21:52:46 -08:00
Craig Topper 7690c2c76c [RISCV] Add tests for fixed vector mul reduction intrinsics. NFC
CodeGen for this can be improved.
2022-01-13 10:24:01 -08:00
Lian Wang 16877c5d2c [RISCV] Add bfp and bfpw intrinsic in zbf extension
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D116994
2022-01-13 02:53:00 +00:00
Craig Topper 15a78f9d09 [RISCV] Remove stale references to experimental-b. NFC
Differential Revision: https://reviews.llvm.org/D117136
2022-01-12 12:13:21 -08:00
Alex Bradbury 33d008b169 [RISCV] Update recently ratified Zb{a,b,c,s} extensions to no longer be experimental
Agreed policy is that RISC-V extensions that have not yet been ratified
should be marked as experimental, and enabling them requires the use of
the -menable-experimental-extensions flag when using clang alongside the
version number. These extensions have now been ratified, so this is no
longer necessary, and the target feature names can be renamed to no
longer be prefixed with "experimental-".

Differential Revision: https://reviews.llvm.org/D117131
2022-01-12 19:33:44 +00:00
Craig Topper 63b17eb9ec [RISCV] Add strictfp support for compares.
This adds support for STRICT_FSETCC(quiet) and STRICT_FSETCCS(signaling).

FEQ matches well to STRICT_FSETCC oeq.
FLT/FLE matches well to STRICT_FSETCCS olt/ole.

Others require commuting operands or multiple instructions.

STRICT_FSETCC olt/ole/ogt/oge/ult/ule/ugt/uge uses FLT/FLE,
but we need to save/restore FFLAGS around them to avoid spurious
exceptions. I've implemented pseudo instructions with a
CustomInserter to insert the save/restore CSR instructions.
Unfortunately, this doesn't honor exceptions for signaling NANs
but I'm not sure if signaling nans are really supported by the
constrained intrinsics.

STRICT_FSETCC one and ueq expand to a pair of FLT instructions
with a save/restore of fflags around each. This could be improved
in the future.

There may be some opportunities to generate better code for strict
comparisons mixed with nonans fast math flags. I've left FIXMEs in
the .td files for that.

Co-Authored-by: ShihPo Hung <shihpo.hung@sifive.com>

Reviewed By: arcbbb

Differential Revision: https://reviews.llvm.org/D116694
2022-01-11 20:01:41 -08:00
Craig Topper be1cc64cc1 [RISCV] Add DAG combine to fold (fp_to_int (ffloor X)) -> (fcvt X, rdn)
Similar for ceil, trunc, round, and roundeven. This allows us to use
static rounding modes to avoid a libcall.

This optimization is done for AArch64 as isel patterns.
RISCV doesn't have instructions for ceil/floor/trunc/round/roundeven
so the operations don't stick around until isel to enable a pattern
match. Thus I've implemented a DAG combine.

We only handle XLen types except i32 on RV64. i32 will be type
legalized to a RISCVISD node. All other types will be type legalized
to XLen and maintain the FP_TO_SINT/UINT ISD opcode.

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D116771
2022-01-11 09:05:57 -08:00
wangpc c6430fade3 [RISCV] Generate 32 bits jumptable entries when code model is small
The code can only address the whole RV32 address space or the lower 2 GiB
of the RV64 address space in small code model, so 32 bits entry is enough.
Cache hit ratio and code size have some improvements.

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D116435
2022-01-11 18:20:37 +08:00
wangpc 98d51c2542 [RISCV] Override TargetLowering::BuildSDIVPow2 to generate SELECT
When `Zbt` is enabled, we can generate SELECT for division by power
of 2, so that there is no data dependency.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D114856
2022-01-11 15:54:35 +08:00
Chenbing.Zheng 9ea772ff81 [RISCV] Block vmsgeu.vi with 0 immediate in Isel
For vmsgeu.vi with 0, we know this is always true. So we can replace
it with vmset.m (unmasked) or vmset.m+vmand.mm (masked).

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D116584
2022-01-11 03:04:44 +00:00
jacquesguan d0554ae4cf [RISCV] Select vl op to X0 when it is equal to ~0.
Now the backend will select ~0 vl to a register and load instruction, we could use X0 to replace it.

Differential Revision: https://reviews.llvm.org/D116798
2022-01-11 10:56:25 +08:00
jacquesguan 99c1acf3f1 [RISCV] Add precommit test for select vl op that equal to ~0.
Precommit test for D116798

Differential Revision: https://reviews.llvm.org/D116799
2022-01-11 10:54:50 +08:00
Haocong.Lu bd653f6406 [RISCV] Use shift for zero extension when Zbb and Zbp are not enabled
Now AND is used for zero extension when both Zbb and Zbp are not enabled.
It may be better to use shift operation if the trailing ones mask exceeds simm12.

This patch optimzes LUI+ADDI+AND to SLLI+SRLI.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D116720
2022-01-11 02:37:03 +00:00
jacquesguan b607cd3928 [RISCV] Use vmv.s.x to build one element splat vector.
When we want to create an splat vector that only the first element is initialized, we could use vmv.s.x or vfmv.s.f to build it.

Differential Revision: https://reviews.llvm.org/D116277
2022-01-11 10:21:18 +08:00
Craig Topper b271184f07 [RISCV] Use FP ABI on some of the FP tests to reduce the number of CHECK lines. NFC
These tests are interested in the FP instructions being used, not
the conversions needed to pass the arguments/returns in GPRs.

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D116869
2022-01-10 09:08:29 -08:00
Craig Topper b645bcd98a [RISCV] Generalize (srl (and X, 0xffff), C) -> (srli (slli X, (XLen-16), (XLen-16) + C) optimization.
This can be generalized to (srl (and X, C2), C) ->
(srli (slli X, (XLen-C3), (XLen-C3) + C). Where C2 is a mask with
C3 trailing ones.

This can avoid constant materialization for C2. This is beneficial
even when C2 can be selected to ANDI because the SLLI can become
C.SLLI, but C.ANDI cannot cover all the immediates of ANDI.

This also enables CSE in some cases of i8 sdiv by constant codegen.
2022-01-09 23:37:10 -08:00
Craig Topper 296e8cae5c [RISCV] Isel (sra (sext_inreg X, i16), C) -> (srai (slli X, (XLen-16), (XLen-16) + C).
Similar for (sra (sext_inreg X, i8), C).

With Zbb, sext_inreg of i8 and i16 are legal for sext.b and sext.h.
This transform makes the Zbb codegen the same as without Zbb. The
shifts are more compressible. This also exposes an opportunity for
CSE with another slli in the i16 sdiv by constant codegen.
2022-01-09 21:23:43 -08:00
jacquesguan 6b8362eb8d [RISCV] Disable EEW=64 for index values when XLEN=32.
Disable EEW=64 for vector index load/store when XLEN=32.

Differential Revision: https://reviews.llvm.org/D106518
2022-01-10 10:51:27 +08:00
Craig Topper 2dd52f840b [RISCV] Fold (srl (and X, 0xffff), C)->(srli (slli X, (XLen-16), (XLen-16) + C) even with Zbb/Zbp.
We can use zext.h with Zbb, but srli/slli may offer more opportunities
for compression.
2022-01-09 18:42:03 -08:00
Craig Topper a500f7f48f [SelectionDAG] Add FP_TO_UINT_SAT/FP_TO_SINT_SAT to computeKnownBits/computeNumSignBits.
These nodes should saturate to their saturating VT. We can use this
information to know the bits past the VT are all zeros or all sign bits.

I think we might only have test coverage for the unsigned case. I'll
verify and add tests.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D116870
2022-01-09 17:48:05 -08:00
Craig Topper 6a10bc7056 [RISCV] Add i8/i16 fptosi/fptoui and fptosi_sat/fptoui_sat tests. NFC
Use signext/zeroext return attributes to show unnecessary ands or
shifts in the saturating tests.
2022-01-08 14:01:31 -08:00
Craig Topper fe230bfc00 [RISCV] Add nounwind to remove some cfi directives from test CHECKs. NFC 2022-01-08 12:48:24 -08:00
Craig Topper 0f9f17869f [RISCV] Add nounwind to remove some cfi directives from test CHECKs. NFC 2022-01-08 12:27:46 -08:00
Baoshan Pang af931a51b9 [RISCV] Materializing constants with 'rori'
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D116574
2022-01-07 15:39:22 -08:00
Lian Wang e8f1dfe923 [RISCV] Supplement PACKH instruction pattern
Optimize (rs1 & 255) | ((rs2 & 255) << 8) -> (PACKH rs1, rs2).

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D116791
2022-01-07 17:59:19 +08:00
Victor Perez 38efa68b08 [LegalizeTypes][VP] Add splitting support for vp.select
Split vp.select in a similar way as vselect, splitting also the length
parameter.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D116651
2022-01-07 08:46:01 +00:00
Craig Topper ec4dd862bf [RISCV] Use simm5_plus1_nonzero in isel patterns for vmsgeu.vi/vmsltu.vi intrinsics.
The 0 immediate can't be selected to vmsgtu.vi/vmsleu.vi by decrementing
the immediate. To prevent his we had special patterns that provided
alternate lowering for the 0 cases. This relied on tablegen prioritizing
the 0 pattern over the sim5_plus1 range.

This patch introduces simm5_plus1_nonzero that excludes 0. It also
excludes the special case for vmsltu.vi since we can just use
vmsltu.vx and let the 0 be selected to X0.

This is an alternative to some of the changes in D116584.

Reviewed By: Chenbing.Zheng, asb

Differential Revision: https://reviews.llvm.org/D116723
2022-01-06 08:27:27 -08:00
Craig Topper 56ca11e31e [RISCV] Add an MIR pass to replace redundant sext.w instructions with copies.
Function calls and compare instructions tend to cause sext.w
instructions to be inserted. If we make good use of W instructions,
these operations can often end up being redundant. We don't always
detect these during SelectionDAG due to things like phis. There also
some cases caused by failure to turn extload into sextload in
SelectionDAG. extload selects to LW allowing later sext.ws to become
redundant.

This patch adds a pass that examines the input of sext.w instructions trying
to determine if it is already sign extended. Either by finding a
W instruction, other instructions that produce a sign extended result,
or looking through instructions that propagate sign bits. It uses
a worklist and visited set to search as far back as necessary.

Reviewed By: asb, kito-cheng

Differential Revision: https://reviews.llvm.org/D116397
2022-01-06 08:23:42 -08:00
Craig Topper 75117fb340 [RISCV] Don't advertise i32->i64 zextload as free for RV64.
The zextload hook is only used to determine whether to insert a
zero_extend or any_extend for narrow types leaving a basic block.
Returning true from this hook tends to cause any load whose output
leaves the basic block to become an LWU instead of an LW.

Since we tend to prefer sexts for i32 compares on RV64, this can
cause extra sext.w instructions to be created in other basic blocks.

If we use LW instead of LWU this gives the MIR pass from D116397
a better chance of removing them.

Another option might be to teach getPreferredExtendForValue in
FunctionLoweringInfo.cpp about our preference for sign_extend of
i32 compares. That would cause SIGN_EXTEND to be chosen for any
value used by a compare instead of using the isZExtFree heuristic.
That will require code to convert from the llvm::Type* to EVT/MVT
as well as querying the type legalization actions to get the
promoted type in order to call TargetLowering::isSExtCheaperThanZExt.
That seemed like many extra steps when no other target wants it.
Though it would avoid us needing to lean on the MIR pass in some cases.

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D116567
2022-01-06 08:13:42 -08:00
Nikita Popov f430c1eb64 [Tests] Add elementtype attribute to indirect inline asm operands (NFC)
This updates LLVM tests for D116531 by adding elementtype attributes
to operands that correspond to indirect asm constraints.
2022-01-06 14:23:51 +01:00
Victor Perez 96e220e688 [LegalizeTypes][VP] Add integer promotion support for vp.select
Promote select, vselect and vp.select in a similar way.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D116400
2022-01-05 11:01:52 +00:00
Victor Perez df5226dfb3 [LegalizeTypes][VP] Add widening support for vp.select
Widen vp.select the same way as select and vselect.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D116407
2022-01-05 09:21:11 +00:00
Craig Topper a04b532505 [LegalizeIntegerTypes][RISCV] Teach PromoteSetCCOperands to check sign bits of unsigned compares.
Unsigned compares work with either zero extended or sign extended
inputs just like equality comparisons. I didn't allow this when
I refactored the code in D116421 due to lack of tests. But I've
since found a simple C test case that demonstrates when this can be
useful.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D116617
2022-01-04 12:38:47 -08:00
Craig Topper df2e728b77 [RISCV] Teach RISCVGatherScatterLowering to handle more complex recurrence start values.
Previously we only recognized strided loads/store when the initial
value for the phi was a strided constant vector.

This patch extends the support to a strided_constant added to a
splatted value. The rewritten loop will add the splat value to the
first element of the strided constant vector to use as the scalar
start value. The stride is unaffected.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D115958
2022-01-04 10:13:34 -08:00
Victor Perez 5527139302 [RISCV][VP] Add RVV codegen for [nX]vXi1 vp.select
Expand [nX]vXi1 vp.select the same way as [nX]vXi1 vselect.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D115546
2022-01-02 23:12:32 -08:00
Craig Topper d00e438cfe [RISCV][LegalizeIntegerTypes] Teach PromoteSetCCOperands not to sext i32 comparisons for RV64 if the promoted values are already zero extended.
This is similar to what is done for targets that prefer zero extend
where we avoid using a zero extend if the promoted values are sign
extended.

We'll also check for zero extended operands for ugt, ult, uge, and ule when the
target prefers sign extend. This is different than preferring zero extend, where
we only check for sign bits on equality comparisons.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D116421
2021-12-31 17:15:20 -08:00
wangpc 41454ab256 [RISCV] Use constant pool for large integers
For large integers (for example, magic numbers generated by
TargetLowering::BuildSDIV when dividing by constant), we may
need about 4~8 instructions to build them.
In the same time, it just takes two instructions to load
constants (with extra cycles to access memory), so it may be
profitable to put these integers into constant pool.

Reviewed By: asb, craig.topper

Differential Revision: https://reviews.llvm.org/D114950
2021-12-31 14:48:48 +08:00
jacquesguan 05f82dc877 [RISCV] Fix incorrect cases of vmv.s.f in the VSETVLI insert pass.
Fix incorrect cases of vmv.s.f and add test cases for it.

Differential Revision: https://reviews.llvm.org/D116432
2021-12-31 14:17:03 +08:00
Craig Topper 15787ccd45 [RISCV] Add support for STRICT_LRINT/LLRINT/LROUND/LLROUND. Tests for other strict intrinsics.
This patch adds isel support for STRICT_LRINT/LLRINT/LROUND/LLROUND.

It also adds test cases for f32 and f64 constrained intrinsics that
correspond to the intrinsics in float-intrinsics.ll and
double-intrinsics.ll. Support for promoting the integer argument of
STRICT_FPOWI was added.

I've skipped adding tests for f16 intrinsics, since we don't have libcalls
for them and we have inconsistent support for promoting them in LegalizeDAG.
This will need to be examined more closely.

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D116323
2021-12-30 11:54:32 -08:00
jacquesguan 128c6ed73b [RISCV] Teach VSETVLInsert to eliminate redundant vsetvli for vmv.s.x and vfmv.s.f.
Differential Revision: https://reviews.llvm.org/D116307
2021-12-30 17:16:18 +08:00