This uses to division by constant optimization to use MULHU/MULHS.
Reviewed By: frasercrmck, arcbbb
Differential Revision: https://reviews.llvm.org/D96934
Due to vXi64 on RV32, I've directly emitted this using _VL ISD
opcodes. If it wasn't for that we could just use fixed vector
BUILD_VECTOR and VSELECT and let those each be legalized.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D96910
These should be NOPs so we can just replace with the input. This
matches what SVE does with isel patterns for all permutations.
Custom isel saves us from having to list all permurations for
all LMULs.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D96921
This patch adds support for INSERT_SUBVECTOR and EXTRACT_SUBVECTOR
(nominally where both operands are scalable vector types) where the
vector, subvector, and index align sufficiently to allow decomposition
to subregister manipulation:
* For extracts, the extracted subvector must correctly align with the
lower elements of a vector register.
* For inserts, the inserted subvector must be at least one full vector
register, and correctly align as above.
This approach should work for fixed-length vector insertion/extraction
too, but that will come later.
Reviewed By: craig.topper, khchen, arcbbb
Differential Revision: https://reviews.llvm.org/D96873
The type legalizer can call this code based on the scalar type so
we need to verify the vector type is a scalable vector.
I think due to how type legalization visits nodes, the vector type
will have already been legalized so we don't have an issue with
using MVT here like we did for EXTRACT_VECTOR_ELT.
I've added a test just in case.
The type legalizer is calling this code based on the scalar type so
we need to verify the input type is a scalable vector.
The vector type has also not been legalized yet when this is called
so we need to use EVT for it.
This patch adds support for fixed-length vector vselect. It does so by
lowering them to a custom unmasked VSELECT_VL node with a vector length
operand.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D96768
This patch proposes how to deal with RISC-V vector frame objects. The
layout of RISC-V vector frame will look like
|---------------------------------|
| scalar callee-saved registers |
|---------------------------------|
| scalar local variables |
|---------------------------------|
| scalar outgoing arguments |
|---------------------------------|
| RVV local variables && |
| RVV outgoing arguments |
|---------------------------------| <- end of frame (sp)
If there is realignment or variable length array in the stack, we will use
frame pointer to access fixed objects and stack pointer to access
non-fixed objects.
|---------------------------------| <- frame pointer (fp)
| scalar callee-saved registers |
|---------------------------------|
| scalar local variables |
|---------------------------------|
| ///// realignment ///// |
|---------------------------------|
| scalar outgoing arguments |
|---------------------------------|
| RVV local variables && |
| RVV outgoing arguments |
|---------------------------------| <- end of frame (sp)
If there are both realignment and variable length array in the stack, we
will use frame pointer to access fixed objects and base pointer to access
non-fixed objects.
|---------------------------------| <- frame pointer (fp)
| scalar callee-saved registers |
|---------------------------------|
| scalar local variables |
|---------------------------------|
| ///// realignment ///// |
|---------------------------------| <- base pointer (bp)
| RVV local variables && |
| RVV outgoing arguments |
|---------------------------------|
| /////////////////////////////// |
| variable length array |
| /////////////////////////////// |
|---------------------------------| <- end of frame (sp)
| scalar outgoing arguments |
|---------------------------------|
In this version, we do not save the addresses of RVV objects in the
stack. We access them directly through the polynomial expression
(a x VLENB + b). We do not reserve frame pointer when there is any RVV
object in the stack. So, we also access the scalar frame objects through the
polynomial expression (a x VLENB + b) if the access across RVV stack
area.
Differential Revision: https://reviews.llvm.org/D94465
Non-splatted non-integer build_vector nodes were mistakenly being
lowered as VID expressions, which should not happen. VID can only be
used to select integer build_vector nodes.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D96718
The patterns mostly follow the scalar counterparts, save for some extra
optimizations to match the vector/scalar forms.
The patch adds a DAGCombine for ISD::FCOPYSIGN to try and reorder
ISD::FNEG around any ISD::FP_EXTEND or ISD::FP_TRUNC of the second
operand. This helps us achieve better codegen to match vfsgnjn.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D96028
This is annoying because the condition code legalization belongs
to LegalizeDAG, but our custom handler runs in Legalize vector ops
which occurs earlier.
This adds some of the mask binary operations so that we can combine
multiple compares that we need for expansion.
I've also fixed up RISCVISelDAGToDAG.cpp to handle copies of masks.
This patch contains a subset of the integer setcc patch as well.
That patch is dependent on the integer binary ops patch. I'll rebase
based on what order the patches go in.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D96567
I believe I've covered all orderings of splat operands here. Better
canonicalization in lowering might help reduce this. I did not handle
the immediate adjustments needed for set(u)gt/set(u)lt.
Testing here is limited to byte types because the scalable vector
type used for masks for the store is calculated assuming 8 byte
elements. But for the setcc its based on the element count of the
container type for the setcc input. So they don't agree. We'll need
to enhanced D96352 to handle this I think.
Differential Revision: https://reviews.llvm.org/D96443
Unlike scalable vectors, I'm only using a ComplexPattern for
the immediate itself. The vmv_v_x is matched explicitly. We igore
the VL argument when matching a binary operator, but we do check
it when matching splat directly.
I left out tests for vXi64 as they fail on rv32 right now.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D96365
This patch extends the initial fixed-length vector support to include
smin, smax, umin, and umax.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D96491
This patch handles cast-like insert_subvector & extract_subvector
in which case:
1. index starts from 0.
2. inserting a fixed-width vector into a scalable vector,
or extracting a fixed-width vector from a scalable vector.
Reviewed By: craig.topper, frasercrmck
Differential Revision: https://reviews.llvm.org/D96352
This refines how we determine which masks types are legal and adds
support for loads, stores, and all ones/zeros splats.
I left a fixme in store handling where I think we need to zero
extra bits if the type isn't a multiple of a byte. If I remember
right from X86 there was some case we could have a store of a
1, 2, or 4 bit mask and have a scalar zextload that then expected the
bits to be 0. Its tricky to zero the bits with RVV. We need to do
something like round VL up, zero a register, lower the VL back down,
then do a tail undisturbed move into the zero register. Another
option might be to generate a mask of 1/2/4 bits set with a VL of 8
and use that to mask off the bits.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D96468
The test cases extract a fixed element from a vector and splat it
into a vector. This gets DAG combined into a splat shuffle.
I've used some very wide vectors in the test to make sure we have
at least a couple tests where the element doesn't fit into the
uimm5 immediate of vrgather.vi so we fall back to vrgather.vx.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D96186
This patch optimizes a build_vector "index sequence" and lowers it to
the existing custom RISCVISD::VID node. This pattern is common in
autovectorized code.
The custom node was updated to allow it to be used by both scalable and
fixed-length vectors, thus avoiding pattern duplication.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D96332
In vector v0.10, there are whole vector register load/store
instructions. I suggest to use the whole register load/store
instructions for generic load/store for scalable vector types. It could
save up vset{i}vl{i} for these load/store.
For fractional LMUL, I keep to use vle{eew}.v/vse{eew}.v instructions to
load/store partial vector registers.
Differential Revision: https://reviews.llvm.org/D95853
Building on the fixed vector support from D95705
I've added ISD nodes for vmv.v.x and vfmv.v.f and switched to
lowering the intrinsics to it. This allows us to share the same
isel patterns for both.
This doesn't handle splats of i64 on RV32 yet. The build_vector
gets converted to a vXi32 build_vector+bitcast during type
legalization. Not sure the best way to handle this at the moment.
Differential Revision: https://reviews.llvm.org/D96108
This is an alternative to D95563.
This is modeled after a similar feature for AArch64's SVE that uses
predicated scalable vector instructions.a
Rather than use predication, this patch uses an explicit VL operand.
I've limited it to always use LMUL=1 for now, but we can improve this
in the future.
This requires a bunch of new ISD opcodes to carry the VL operand.
I think we can probably lower intrinsics to these ISD opcodes to
cut down on the size of the isel table. Which is why I've added
patterns for all integer/float types and not just LMUL=1.
I'm only testing one vector width right now, but the width is
programmable via the command line.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D95705
This adds support for commuting operands and converting between
vfmadd and vfmacc to avoid register copies.
To avoid messing up intrinsic behavior, I've added new pseudo
instructions that have the isCommutable flag set. These pseudos also
force a tail agnostic policy. The intrinsic version still use
the tail undisturbed policy.
For best results it looks like we need to start with fmadd and only
pick fmacc if its beneficial. MachineCSE commutes without contraining
the operands and then commutes back if it didn't help with CSE. So
I've made sure that when the operand choice isn't constrained, we
will keep fmadd for MachineCSE and when it does the second commute,
we get back the original instruction.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D95800
This patch adds support for both the fadd reduction intrinsic, in both
the ordered and unordered modes.
The fmin and fmax intrinsics are not currently supported due to a
discrepancy between the LLVM semantics and the RVV ISA behaviour with
regards to signaling NaNs. This behaviour is likely fixed in version 2.3
of the RISC-V F/D/Q extension, but until then the intrinsics can be left
unsupported.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D95870
This patch adds support for the integer reduction intrinsics supported
by RVV. This excludes "mul" which has no corresponding instruction.
The reduction instructions in RVV have slightly complicated type
constraints given they always produce a single "M1" vector register.
They are lowered to custom nodes including the second "scalar" reduction
operand to simplify the patterns and in the hope that they can be useful
for future DAG combines.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D95620
This patch custom-legalizes all integer EXTRACT_VECTOR_ELT nodes where
SEW < XLEN to VMV_S_X nodes to help the compiler infer sign bits from
the result. This allows us to eliminate redundant sign extensions.
For parity, all integer EXTRACT_VECTOR_ELT nodes are legalized this way
so that we don't need TableGen patterns for some and not others.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D95741
This patch adds support for lowering the sqrt intrinsic to the RVV
vfsqrt instruction.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D96012
The vrgather.vv instruction uses a vector of indices with the same
SEW as operand 0. The vrgather.vx instructions use a scalar index
operand of XLen bits.
By splitting this into 2 intrinsics we are able to use LLVMatchType
in the definition to avoid specifying the type for the index operand
when creating the IR for the intrinsic. For .vv it will match the
operand 0 type. And for .vx it will match the type of the vl operand
we already needed to specify a type for.
I'm considering splitting more intrinsics. This was a somewhat
odd one because the .vx doesn't use the element type, it always
use XLen.
Reviewed By: HsiangKai
Differential Revision: https://reviews.llvm.org/D95979
Due to a clerical error, the sdiv operation was mapping to vdivu and
udiv to vdiv, when the opposite mapping is the correct one.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D95869
A follow up patch will add support for commuting operands or
changing opcode to vfmacc and friends.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D95662
This demonstrates a missed optimization: the `vmv.x.s` instruction is
used to extract the element from the vector, and this instruction
already sign-extends the value to XLEN.
This patch adds support for the full range of vector int-to-float,
float-to-int, and float-to-float conversions on legal types.
Many conversions are supported natively in RVV so are lowered with
patterns. These include conversions between (element) types of the same
size, and those that are half/double the size of the input. When
conversions take place between types that are less than half or more
than double the size we must lower them using sequences of instructions
which go via intermediate types.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D95447
Move the Suffix string into the VTypeInfo class so we don't need a helper class to get to it.
Adjust pseudo naming scheme for FPRs to put F16/F32/F64 in
place of F in the pseudo instruction name rather than as a suffix.
This avoids special cases like VFMERGE from the original patch.
Differential Revision: https://reviews.llvm.org/D95404
When spilling, the spill size will depend on the size of register class.
For .vf vector instructions, it may spill the floating point scalar
argument. In order to use the correct load/store instructions for
spilling, we need to provide the correct floating point register class
for the .vf vector pseudo instructions.
In this commit, we define the .vf pseudo instructions as three
different kinds of pseudo instructions for half/float/double. For
example, PseudoVFADD_M1 will become as PseudoVFADD_F16_M1,
PseudoVFADD_F32_M1, and PseudoVFADD_F64_M1.
Differential Revision: https://reviews.llvm.org/D95234
Original patch by @rogfer01.
This patch adds support for insertelt and extractelt operations on
scalable vectors.
Special care must be taken on RV32 when dealing with i64 vectors as
there are no straightforward ways to insert a 64-bit element without a
register of that size. To that end, both are custom-lowered to different
sequences.
Authored-by: Roger Ferrer Ibanez <rofirrim@gmail.com>
Co-Authored-by: Fraser Cormack <fraser@codeplay.com>
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D94615
This patch adds support for scalable-vector splats in DAGCombiner's
`isConstantOrConstantVector` and `ISD::matchUnaryPredicate` functions,
which enable the SelectionDAG div/rem-by-constant optimizations for
scalable vector types.
It also fixes up one case where the UDIV optimization was generating a
SETCC without first consulting the target for its preferred SETCC result
type.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D94501
1. Op2 type in vrgather.vx should be XLEN instead of SEW
2. Add double type in vrgather-rv32 cases.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D95207
The fault-only-first-load instructions can reduce VL if an element
other than element 0 triggers a memory fault. This can be used to
vectorize loops with data dependent exit conditions like strcmp or
strlen.
This patch adds a VL output to these intrinsics so that the new
VL value can be captured by software. This will be expanded to
'csrr gpr, vl' after the vleff instruction during SelectionDAG.
By doing this with one intrinsic we are able to guarantee that the
csrr reads the VL value produced by the vleff instruction. Having
it as a separate intrinsic would make it impossible to guarantee
ordering without making every other vector intrinsic have side
effects.
The intrinsics are expanded during lowering into two ISD nodes
that are glued together. These ISD nodes will go
through isel separately, but should maintain the glue so that they
get emitted adjacently by InstrEmitter.
I've only ran the chain through the vleff instruction, allowing
the READ_VL to be deleted if it is unused.
Reviewed By: HsiangKai
Differential Revision: https://reviews.llvm.org/D94286
Upgrade RISC-V V extension to v1.0-08a0b46.
Indexed load/store have ordered and unordered form.
New whole vector load/store.
Differential Revision: https://reviews.llvm.org/D93614
For Zvlsseg, we need continuous vector registers for the values. We need
to define new register classes for the different combinations of (number
of fields and LMUL). For example,
when the number of fields(NF) = 3, LMUL = 2, the values will be assigned
to (V0M2, V2M2, V4M2), (V2M2, V4M2, V6M2), (V4M2, V6M2, V8M2), ...
We define the vlseg intrinsics with multiple outputs. There is no way to
describe the codegen patterns with multiple outputs in the tablegen
files. We do the codegen in RISCVISelDAGToDAG and use EXTRACT_SUBREG to
extract the values of output.
The multiple scalable vector values will be put into a struct. This
patch is depended on the support for scalable vector struct.
Differential Revision: https://reviews.llvm.org/D94229
Original patch by @rogfer01.
This patch adds support for sign-, zero-, and any-extension from
scalable mask vector types to integer vector types, as well as
truncation in the opposite direction.
Authored-by: Roger Ferrer Ibanez <rofirrim@gmail.com>
Co-Authored-by: Fraser Cormack <fraser@codeplay.com>
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D94590
This recommits 2c51bef76c.
I've fixed the broken check line from when I renamed the test function.
Original commit message:
This builds on D94142 where scalable vectors are allowed in structs.
I did have to fix one scalable vector issue in the vector type
creation for these intrinsics where we used getVectorNumElements
instead of ElementCount.
This builds on D94142 where scalable vectors are allowed in structs.
I did have to fix one scalable vector issue in the vector type
creation for these intrinsics where we used getVectorNumElements
instead of ElementCount.
Differential Revision: https://reviews.llvm.org/D94149
Original patch by @rogfer01.
This patch supports vector truncates, which on RVV must be done in a
series of instructions truncating by one power-of-two at a time. This is
done through custom-lowering and a custom node to avoid LLVM
re-combining the split TRUNCATE nodes.
Authored-by: Roger Ferrer Ibanez <rofirrim@gmail.com>
Co-Authored-by: Fraser Cormack <fraser@codeplay.com>
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D94796
The vcompress intrinsic is defined such that it requires a tail
undisturbed policy. This patch makes it so we can use the tail
agnostic policy if the user has passed vundefined to the dest
operand.
We need to do something similar for masked policy, but we need
annotation of which instructions use the mask policy first.
Not sure if this is sufficient for scheduling or if we'll need to
select different pseudos that don't have a tied def.
Reviewed By: evandro
Differential Revision: https://reviews.llvm.org/D94566
This patch custom lowers ISD::VSCALE into a csrr vlenb followed
by a shift right by 3 followed by a multiply by the scale amount.
I've added computeKnownBits support to indicate that the csrr vlenb
always produces 3 trailng bits of 0s so the shift right is "exact".
This allows the shift and multiply sequence to be nicely optimized
into a single shift or removed completely when the scale amount is
a power of 2.
The non power of 2 case multiplying by 24 is still producing
suboptimal code. We could remove the right shift and use a
multiply by 3. Hopefully we can improve DAG combine to fix that
since it's not unique to this sequence.
This replaces D94144.
Reviewed By: HsiangKai
Differential Revision: https://reviews.llvm.org/D94249
We can use a 0 immediate to avoid needing to materialize 0 into
an FPR first.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D94459
If SETO/SETUO aren't legal, they'll be expanded and we'll end up
with 3 comparisons.
SETONE is equivalent to (SETOGT || SETOLT)
so if one of those operations is supported use that expansion. We
don't need both since we can commute the operands to make the other.
SETUEQ can be implemented with !(SETOGT || SETOLT) or (SETULE && SETUGE).
I've only implemented the first because it didn't look like most of the
affected targets had legal SETULE/SETUGE.
Reviewed By: frasercrmck, tlively, nemanjai
Differential Revision: https://reviews.llvm.org/D94450
All i8/i16 and several i32 tests were testing immediate shift amounts
which exceeded the bits in the vector elements, creating poison values.
Amend the tests to test well-behaved shift amounts.
Define the `vfclass` IR intrinsics for the respective V instructions.
Authored-by: Roger Ferrer Ibanez <rofirrim@gmail.com>
Co-Authored-by: Evandro Menezes <evandro.menezes@sifive.com>
Differential Revision: https://reviews.llvm.org/D94356
Original patch by @rogfer01.
This patch adds ISel patterns for the above operations to the
corresponding vector/vector and vector/scalar RVV instructions, as well
as extra patterns to match operand-swapped scalar/vector vfrsub and
vfrdiv.
Authored-by: Roger Ferrer Ibanez <rofirrim@gmail.com>
Co-Authored-by: Fraser Cormack <fraser@codeplay.com>
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D94408
Original patch by @rogfer01.
All ordered comparisons except ONE are supported natively, and all
unordered comparisons except UNE are expanded into sequences involving
explicit NaN checks and mask arithmetic.
Additionally, we expand GT,OGT,GE,OGE to their swapped-operand versions, and
pattern-match those back to the "original", swapping operands once more. This
way we catch both operations and both "vf" and "fv" forms with fewer patterns.
Also add support for floating-point splat_vector, with an optimization for
splatting fpimm0.
Authored-by: Roger Ferrer Ibanez <rofirrim@gmail.com>
Co-Authored-by: Fraser Cormack <fraser@codeplay.com>
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D94242
Original patch by @rogfer01.
The RVV integer comparison instructions are defined in such a way that
many LLVM operations are defined by using the "opposite" comparison
instruction and swapping the operands. This is done in this patch in
most cases, except for the mappings where the immediate range must be
adjusted to accomodate:
va < i --> vmsle{u}.vi vd, va, i-1, vm
va >= i --> vmsgt{u}.vi vd, va, i-1, vm
That is left for future optimization; this patch supports all operations
but in the case of the missing mappings the immediate will be moved to
a scalar register first.
Since there are so many condition codes and operand cases to check, it
was decided to reduce the test burden by only testing the "vscale x 8"
vector types.
Authored-by: Roger Ferrer Ibanez <rofirrim@gmail.com>
Co-Authored-by: Fraser Cormack <fraser@codeplay.com>
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D94168
This improves llvm::isConstOrConstSplat by allowing it to analyze
ISD::SPLAT_VECTOR nodes, in order to allow more constant-folding of
operations using scalable vector types.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D94168