This patch addresses a few issues when dealing with scalable-vector
INSERT_SUBVECTOR and EXTRACT_SUBVECTOR nodes.
When legalizing in DAGTypeLegalizer::SplitVecRes_INSERT_SUBVECTOR, we
store the low and high halves to the stack separately. The offset for
the high half was calculated incorrectly.
Additionally, we can optimize this process when we can detect that the
subvector is contained entirely within the low/high split vector type.
While this optimization is valid on scalable vectors, when performing
the 'high' optimization, the subvector must also be a scalable vector.
Note that the 'low' optimization is still conservative: it may be
possible to insert v2i32 into the low half of a split nxv1i32/nxv1i32,
but we can't guarantee it. It is always possible to insert v2i32 into
nxv2i32 or v2i32 into nxv4i32+2 as we know vscale is at least 1.
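The low/high containment checks described above can be modelled in a few lines of Python. This is only a sketch under stated assumptions: `VecTy`, `insert_target` and the string results are hypothetical names for illustration, not the code in DAGTypeLegalizer. The index is taken in units of the subvector's own element count, so it is implicitly scaled by vscale only when the subvector is scalable.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VecTy:
    min_elts: int   # known-minimum element count (N in vN / nxvN)
    scalable: bool  # True for nxvN types, whose real length is N * vscale


def insert_target(vec: VecTy, sub: VecTy, idx: int) -> str:
    """Decide where a subvector insert can go once `vec` is split in half.

    Returns "lo" or "hi" when the subvector provably fits inside that half
    for every vscale >= 1, and "stack" when neither can be guaranteed.
    """
    if sub.scalable and not vec.scalable:
        raise ValueError("cannot insert a scalable vector into a fixed one")
    lo_elts = vec.min_elts // 2
    # Low half: idx + sub <= lo holds for any vscale >= 1, whether the
    # subvector is fixed-length or scalable.
    if idx + sub.min_elts <= lo_elts:
        return "lo"
    # High half: it starts at lo_elts * vscale, so a fixed-length index
    # cannot be compared against it unless both types scale the same way.
    if (vec.scalable == sub.scalable
            and idx >= lo_elts
            and idx + sub.min_elts <= vec.min_elts):
        return "hi"
    return "stack"
```

For example, `insert_target(VecTy(4, True), VecTy(2, False), 2)` yields `"stack"` in this model, because a v2i32 at index 2 cannot be proven to lie in the high half of a split nxv4i32.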
Lastly, in SelectionDAG::isSplatValue, we early-exit on the extracted subvector value
type being a scalable vector, forgetting that we can also extract a
fixed-length vector from a scalable one.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D98495
This patch adds fixed-length vector support to the calling convention
when RVV is used to lower fixed-length vectors. The scheme follows the
regular vector calling convention for the argument/return registers, but
uses scalable vector container types as the LocVTs, and converts to/from
the fixed-length vector value types as required.
Fixed-length vector types may be split when the combination of minimum
VLEN and the maximum allowable LMUL is not large enough to fully contain
the vector. In this case the behaviour differs between fixed-length
vectors passed as parameters and as return values:
1. For return values, vectors must be passed entirely via registers or
via the stack.
2. For parameters, unlike scalar values, split vectors continue to be
passed by value, and are split across multiple registers until there are
no remaining registers. Thus vector parameters may be found partly in
registers and partly on the stack.
As with scalable vectors, the first fixed-length mask vector is passed
via v0. Split mask fixed-length vectors are passed first via v0 and then
via the next available vector register: v8, v9, etc.
The handling of vector return values uses all available argument
registers v8-v23 which does not adhere to the calling convention we're
supposedly implementing, but since this issue affects both fixed-length
and scalable-vector values, it was left as-is.
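The difference between split parameters and split return values can be sketched as follows. This is an illustrative model only: `assign_split_vector` and its arguments are hypothetical, and the register list is simply whatever the caller says is still free.

```python
def assign_split_vector(num_parts, free_regs, is_return):
    """Return (registers_used, parts_on_stack) for one split vector value."""
    if is_return:
        # Return values are all-or-nothing: entirely in registers or
        # entirely on the stack.
        if num_parts <= len(free_regs):
            return list(free_regs[:num_parts]), 0
        return [], num_parts
    # Parameters take registers until none remain; the rest go on the stack.
    in_regs = list(free_regs[:num_parts])
    return in_regs, num_parts - len(in_regs)
```

With four parts and only `v22` and `v23` free, a parameter ends up half in registers and half on the stack, while a return value in the same situation goes entirely via the stack.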
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D97954
Types of fractional LMUL and LMUL=1 all use the VR register class. When
using inline asm, the first type in the register class is used as the
type for the register. It is not necessarily the same as the value type. We
need to use INSERT_SUBVECTOR/EXTRACT_SUBVECTOR/BITCAST to make it legal
to put the value in the corresponding register class.
Differential Revision: https://reviews.llvm.org/D97480
This patch changes the RVV frame layout proposed in D94465. In D94465, the eliminateFrameIndex function
needs to create a temporary virtual register to eliminate an RVV frame index. This virtual register is scavenged by the
RegisterScavenger class. If the machine function has other unused registers, there is no problem. But if there are no unused registers,
we need an emergency spill slot. Because the emergency spill slot belongs to the scalar local variables field, accessing the emergency
spill slot requires another temporary virtual register. This makes the compiler report the "Incomplete scavenging after 2nd pass" error.
So I change the RVV frame layout as follows:
```
|--------------------------------------|
| arguments passed on the stack        |
|--------------------------------------|<--- fp
| callee saved registers               |
|--------------------------------------|
| rvv vector objects (local variables  |
| and outgoing arguments)              |
|--------------------------------------|
| realignment field                    |
|--------------------------------------|
| scalar local variable (also contains |
| emergency spill slot)                |
|--------------------------------------|<--- bp
| variable-sized local variables       |
|--------------------------------------|<--- sp
```
Differential Revision: https://reviews.llvm.org/D97111
This patch optimizes the codegen for INSERT_VECTOR_ELT in various ways.
Primarily, it removes the use of vslidedown during lowering, and the
vector element is inserted entirely using vslideup with a custom VL and
slide index.
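The idea of inserting with a single vslideup can be modelled on plain lists. This is a sketch, not the lowering code: `vslideup` and `insert_vector_elt` are hypothetical helpers, and the model assumes a tail-undisturbed slide with VL set to the insert index plus one.

```python
def vslideup(dest, src, offset, vl):
    """Model of a tail-undisturbed vslideup: dest[i] = src[i - offset]
    for offset <= i < vl; all other elements of dest are left alone."""
    out = list(dest)
    for i in range(offset, min(vl, len(out))):
        out[i] = src[i - offset]
    return out


def insert_vector_elt(vec, value, idx):
    # Place the scalar in element 0 of a temporary, then slide it up by
    # `idx` with VL = idx + 1 so only element `idx` of `vec` is written.
    tmp = [value] + [None] * (len(vec) - 1)
    return vslideup(vec, tmp, offset=idx, vl=idx + 1)
```

Because the slide starts writing at `idx` and VL stops it at `idx + 1`, no vslidedown is needed to preserve the neighbouring elements.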
Additionally, lowering of i64-element vectors on RV32 has been optimized
in several ways. When the 64-bit value to insert is the same as the
sign-extension of the lower 32-bits, the codegen can follow the regular
path. When this is not possible, a new sequence of two i32 vslide1up
instructions is used to get the vector element into a vector. This
sequence was suggested by @craig.topper. From there, the value is slid
into the final position for more consistent lowering across RV32 and
RV64.
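The two RV32 cases can be illustrated with a small model. The helper names are hypothetical, the ordering (high half first, then low half) is an assumption about how two vslide1up operations would build a little-endian i64 in element 0, and `None` stands for undef.

```python
MASK32 = 0xFFFFFFFF


def is_sext_of_low32(value):
    """True if the 64-bit `value` equals its low 32 bits sign-extended."""
    value &= (1 << 64) - 1
    lo = value & MASK32
    sext = lo | (0xFFFFFFFF00000000 if lo & 0x80000000 else 0)
    return sext == value


def vslide1up(vec, scalar):
    """Model of vslide1up.vx: element 0 becomes `scalar`, the rest shift up."""
    return [scalar] + list(vec[:-1])


def build_i64_in_element0(value, num_i32_elts=4):
    """Slide the high half in first, then the low half underneath it."""
    lo, hi = value & MASK32, (value >> 32) & MASK32
    vec = [None] * num_i32_elts  # undef i32 vector
    vec = vslide1up(vec, hi)
    vec = vslide1up(vec, lo)
    return vec
```

After both slides, i32 elements 0 and 1 hold the low and high halves, which together form the i64 value in element 0 when the vector is viewed with 64-bit elements.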
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D98250
We don't support any other shuffles currently.
This changes the bswap/bitreverse tests that check for this in
their expansion code. Previously we expanded a byte swapping
shuffle through memory. Now we're scalarizing and doing bit
operations on scalars to swap bytes.
In the future we can probably use vrgather.vx to do a byte swap
shuffle.
This uses a really simple approach of converting to an i8 vector
and extracting. This is probably not the best approach especially
if you know the index is constant.
Other ideas:
-Store to stack temporary using vse1, load as scalar and shift.
-Sort of bitcast the vector to a vector of i8, slide down the
appropriate 8 bit element, copy to scalar, shift down the
correct bit within the 8 bits we extracted. Not exactly sure
how to describe such a bitcast from i1 vector to i8 vector
within the type system for elements less than 8.
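To make the comparison concrete, here is a small model of the approach taken and of the byte-based alternative. Both functions are hypothetical sketches, and the packed layout assumes mask element i is stored in bit i of the register.

```python
def extract_via_i8_extend(mask_bits, idx):
    """Approach in the patch, modelled: widen every i1 to i8, then extract."""
    as_i8 = [1 if b else 0 for b in mask_bits]
    return as_i8[idx]


def extract_via_packed_bytes(mask_bits, idx):
    """Alternative idea: view the mask as packed bytes, pick the byte that
    holds element `idx`, and shift the wanted bit down."""
    packed = bytearray((len(mask_bits) + 7) // 8)
    for i, b in enumerate(mask_bits):
        if b:
            packed[i // 8] |= 1 << (i % 8)  # assumes element i is bit i
    return (packed[idx // 8] >> (idx % 8)) & 1
```

The two agree on every index; the second touches a single byte instead of widening the whole mask.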
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D98310
RISCV makes all fixed vector MVTs with size less than or equal
to a command line option legal.
This didn't include v1f16 because it was missing but did include v1f32 and v1f64.
One test is affected where we did test this type, but it is a horizontal
reduction so it is nonsensical. Perhaps we should canonicalize that
away somewhere.
I'm not sure if we should be making v1 types legal, but this will at
least make RISCV consistent across all types.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D98365
Currently we crash in type legalization any time an intrinsic
uses a scalar i64 on RV32.
This patch adds support for type legalizing this to prevent
crashing. I don't promise that it uses the best possible codegen
just that it is functional.
This first version handles 3 cases. vmv.v.x intrinsic, vmv.s.x
intrinsic and intrinsics that take a scalar input, splat it and
then do some operation.
For vmv.v.x we'll either rely on hardware sign extension for
constants or we'll convert it to multiple splats and bit
manipulation.
For vmv.s.x we use a very suboptimal sequence inspired by what
we do for an INSERT_VECTOR_ELT.
For the third case we'll either try to use the .vi form for
constants or convert to a complicated splat and bitmanip and use
the .vv form of the operation.
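The "multiple splats and bit manipulation" idea can be modelled per element. This is a sketch under assumptions: the helper names are hypothetical, the exact instruction sequence is not taken from the patch, and the sign-extension shortcut is applied here to any value whose upper half matches, not only to constants.

```python
MASK64 = (1 << 64) - 1


def vmv_v_x(scalar32, num_elts):
    """Model of vmv.v.x with SEW=64 on RV32: the 32-bit scalar is
    sign-extended to 64 bits and copied to every element."""
    scalar32 &= 0xFFFFFFFF
    if scalar32 & 0x80000000:
        scalar32 |= 0xFFFFFFFF00000000
    return [scalar32] * num_elts


def splat_i64(value, num_elts):
    value &= MASK64
    lo, hi = value & 0xFFFFFFFF, value >> 32
    sext_lo = vmv_v_x(lo, num_elts)
    if sext_lo[0] == value:
        # The hardware sign extension already produces the right value.
        return sext_lo
    # Otherwise splat both halves and combine them element-wise.
    hi_vec = [(x << 32) & MASK64 for x in vmv_v_x(hi, num_elts)]
    lo_vec = [((x << 32) & MASK64) >> 32 for x in sext_lo]  # clear sign bits
    return [h | l for h, l in zip(hi_vec, lo_vec)]
```

The shift-left then shift-right on the low splat removes the sign-extended upper bits so that the final OR does not corrupt the high half.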
I've renamed the ExtendOperand field to SplatOperand and now use it
specifically for the third case. The first two cases are handled
by custom lowering specifically for those intrinsics.
I haven't updated all tests yet, but I tried to cover a subset
that includes single-width, widening, and narrowing.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D97895
The type legalizer will visit the result before the operands. To
avoid creating an illegal target specific node or falling back to
scalarization, we need to manually split vector operands.
This still doesn't handle the case of non-power of 2 operands
which need to be widened. I'm not sure the type legalizer is
ready for it. I think we would need to insert an
INSERT_SUBVECTOR with the power of 2 type we want, with an undef
first operand, and the non-power of 2 original operand as the vector
to insert. Then fill the padded elements with the neutral element.
Alternatively we INSERT_SUBVECTOR into a neutral vector.
From there we carry on splitting if needed to get to a legal type
then do the target specific code.
The problem with this is the type legalizer doesn't know how to
widen an insert_subvector yet. We would need to add that including
the handling for a non-undef first vector.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D98292
I've left mask registers to a future patch as we'll need
to convert them to full vectors, shuffle, and then truncate.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D97609
I've included tests that require type legalization to split the
vector. The i64 version of these scalarizes on RV32 due to type
legalization visiting the result before the vector type. So we
have to abort our custom expansion to avoid creating target
specific nodes with an illegal type. Then type legalization ends
up scalarizing. We might be able to fix this by doing custom
splitting for large vectors in our handler to get down to a legal
type.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D98102
Previously we set the value to -1, but the SEW information could
be useful for scheduling.
Reviewed By: frasercrmck, rogfer01
Differential Revision: https://reviews.llvm.org/D98062
The default fixed vector expansion uses sra+xor+add since it can't
see that smax is legal due to our custom handling. So we select
smax(X, sub(0, X)) manually.
Scalable vectors are able to use the smax expansion automatically
for most cases. It crashes in one case because getConstant can't build a
SPLAT_VECTOR for nxvXi64 when i64 scalars aren't legal. So
we manually emit a SPLAT_VECTOR_I64 for that case.
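For reference, the two expansions of abs compute the same result on 32-bit two's complement values. The functions below are an illustrative model with hypothetical names, not the lowering code.

```python
BITS = 32
MASK = (1 << BITS) - 1


def to_signed(x):
    x &= MASK
    return x - (1 << BITS) if x >> (BITS - 1) else x


def abs_sra_xor_add(x):
    """Default expansion: sign = x >> 31 (arithmetic); (x + sign) ^ sign."""
    sign = to_signed(x) >> (BITS - 1)  # 0 or -1
    return ((x + sign) ^ sign) & MASK


def abs_smax(x):
    """Expansion selected here: smax(x, 0 - x) on signed values."""
    neg = to_signed(0 - x)
    return max(to_signed(x), neg) & MASK
```

Both forms wrap in the same way for the most negative value, so the smax form is a drop-in replacement that needs one fewer operation.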
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D97991
This patch addresses a compiler crash resulting from passing a
fixed-length type to one that expects scalable vector types. An
assertion was added to prevent this regressing in the future.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D97868
This patch fixes up one case where the fixed-length-vector VL was
dropped (falling back to VLMAX) when inserting vector elements, as the
code would lower via ISD::INSERT_VECTOR_ELT (at index 0) which loses the
fixed-length vector information.
To this end, a custom node, VMV_S_XF_VL, was introduced to carry the VL
operand through to the final instruction. This node wraps the RVV
vmv.s.x and vmv.s.f instructions, which were being selected by
insert_vector_elt anyway.
There should be no observable difference in scalable-vector codegen.
There is still one outstanding drop from fixed-length VL to VLMAX, when
an i64 element is inserted into a vector on RV32; the splat (which is
custom legalized) has no notion of the original fixed-length vector
type.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D97842
This function isn't exercised in lit tests today according to
the code coverage report, but it will be after the tests in D97543 and
D97559.
Posting this patch to help with a crash that Fraser hit.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D97582
This patch enables support for lowering INSERT_VECTOR_ELT on
fixed-length vector types. The strategy follows that for scalable vector
types.
This patch also includes a quick fix to prevent the compiler infinitely
looping between lowering BUILD_VECTOR as VECTOR_SHUFFLE and back again.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D97698
The default expansion of CONCAT_VECTORS goes through the stack. This
patch avoids that penalty by custom-lowering CONCAT_VECTORS to a series
of INSERT_SUBVECTOR nodes. Further optimizations are possible, but this
is a good start.
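The shape of the lowering can be shown on plain lists. This is a model only: `insert_subvector` and `concat_vectors` are hypothetical stand-ins for the DAG nodes, and `None` stands for undef.

```python
def insert_subvector(vec, sub, idx):
    """Model of INSERT_SUBVECTOR: overwrite len(sub) elements at `idx`."""
    assert idx + len(sub) <= len(vec)
    return vec[:idx] + list(sub) + vec[idx + len(sub):]


def concat_vectors(operands):
    """Build the concatenation from an undef vector, one insert per operand."""
    width = len(operands[0])
    assert all(len(op) == width for op in operands)
    result = [None] * (width * len(operands))  # undef
    for i, op in enumerate(operands):
        result = insert_subvector(result, op, i * width)
    return result
```

Each operand lands at a multiple of its own width, so no round trip through memory is required.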
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D97692
Like with EXTRACT_SUBVECTOR, INSERT_SUBVECTOR poses a problem
for vector masks as RVV isn't able to slide mask types around. We choose
instead to bitcast to equivalently-sized i8 types where we can, else we
zero-extend, perform the operation, and truncate back down.
One test was left disabled due to a crash in the legalizer.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D97559
This patch fixes a bug where the lowering for INSERT_SUBVECTOR and
EXTRACT_SUBVECTOR would insist on first extracting a register-aligned
LMUL1 vector type before performing the slide up/down. This was even if
the vector was a fractional LMUL type, in which case the aligned
EXTRACT_SUBVECTOR was invalid.
This issue only occurred for scalable vector types, but a variety of
tests for both scalable and fixed-length vectors have been added to
ensure this does not regress in the future.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D97556
This patch unifies the two disparate paths for lowering INSERT_SUBVECTOR
operations under one roof. Consequently, with this patch it is possible to
support any fixed-length subvector insertion, not just "cast-like" ones.
As before, support for the insertion of mask vectors will come in a
separate patch.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D97543
This patch adds support for extracting subvectors from vector masks.
This can be either extracting a scalable vector from another, or a fixed-length
vector from a fixed-length or scalable vector.
Since RVV lacks a way to slide vector masks down on an element-wise
basis and we don't know the true length of the vector registers, in many
cases we must resort to using equivalently-sized i8 vectors to perform
the operation. When this is not possible we fall back and extend to a
suitable i8 vector.
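The two strategies can be sketched as follows. The helpers are hypothetical, the packed layout assumes mask element i is bit i, and the divisibility test is a simplified stand-in for the "equivalently-sized i8 vector" condition.

```python
def pack(bits):
    out = bytearray((len(bits) + 7) // 8)
    for i, b in enumerate(bits):
        if b:
            out[i // 8] |= 1 << (i % 8)
    return out


def unpack(data, n):
    return [(data[i // 8] >> (i % 8)) & 1 for i in range(n)]


def extract_mask_subvector(mask_bits, idx, n):
    if idx % 8 == 0 and n % 8 == 0 and len(mask_bits) % 8 == 0:
        # Reinterpret as i8 elements and extract whole bytes.
        as_bytes = pack(mask_bits)
        return unpack(as_bytes[idx // 8:(idx + n) // 8], n)
    # Fallback: extend each i1 to an i8, extract, then truncate back to i1.
    widened = [1 if b else 0 for b in mask_bits]
    return [b & 1 for b in widened[idx:idx + n]]
```

The byte path moves eight mask bits per element, which is why it is preferred whenever the index and length allow it.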
Support was also added for fixed-length truncation to mask types.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D97475
This patch addresses issues arising from the fact that the index type
used for subvector insertion/extraction is inconsistent between the
intrinsics and SDNodes. The intrinsic forms require i64 whereas the
SDNodes use the type returned by SelectionDAG::getVectorIdxTy.
Rather than update the intrinsic definitions to use an overloaded index
type, this patch fixes the issue by transforming the index to the
correct type as required. Any loss of index bits going from i64 to a
smaller type is unexpected, and will be caught by an assertion in
SelectionDAG::getVectorIdxConstant.
The patch also updates the documentation for INSERT_SUBVECTOR and adds
an assertion to its creation to bring it in line with EXTRACT_SUBVECTOR.
This necessitated changes to AArch64 which was using i64 for
EXTRACT_SUBVECTOR but i32 for INSERT_SUBVECTOR. Only one test changed
its codegen after updating the backend accordingly.
Reviewed By: sdesmalen
Differential Revision: https://reviews.llvm.org/D97459
This will allow FrameIndex as the base address instead of
emitting a separate ADDI from isel. eliminateFrameIndex will likely turn
it back into an ADDI, but this makes things consistent with the
SDPatterns and VLPatterns.
I only tested one case for simplicity. I can test more if reviewers
want.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D97221
This patch extends the support for scalable-vector int->fp and fp->int
conversions by additionally handling fixed-length vectors.
The existing scalable-vector lowering re-expresses widening/narrowing by
x4+ conversions as standard nodes. The fixed-length vector support slots
in at "the end" of this process by lowering the now equally-sized and
widening/narrowing by x2 nodes to our custom VL versions.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D97374
This patch extends the support for vector FP_ROUND and FP_EXTEND by
including support for fixed-length vector types. Since fixed-length
vectors use "VL" nodes and scalable vectors can use the standard nodes,
there is slightly more to do in the fixed-length case. A helper function
was introduced to try and reduce the divergent paths. It is expected
that this function will similarly come in useful for lowering the
int-to-fp and fp-to-int operations for fixed-length vectors.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D97301
This patch extends support for our custom-lowering of scalable-vector
truncates to include those of fixed-length vectors. It does this by
co-opting the custom RISCVISD::TRUNCATE_VECTOR node and adding mask and
VL operands. This avoids unnecessary duplication of patterns and
inflation of the ISel table.
Some truncates go through CONCAT_VECTORS which currently isn't
efficiently handled, as it goes through the stack. This can be improved
upon in the future.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D97202
This patch adds support for the custom lowering sign- and zero-extension
of fixed-length vector types. It does so through custom nodes. Since the
source and destination types are (necessarily) of different sizes, it is
possible that the source type is legal whilst the larger destination
type isn't. In this case the legalization makes heavy use of
EXTRACT_SUBVECTOR.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D97194
This patch unifies the two disparate paths for lowering
EXTRACT_SUBVECTOR operations under one roof. Consequently, with this
patch it is possible to support any fixed-length subvector extraction,
not just "cast-like" ones.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D97192
We always create the VL operand using a register, but if we can
determine that it came from an ADDI X0, imm with a sufficiently
small immediate, we can use VSETIVLI.
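As a rough illustration of the selection rule, assuming the AVL immediate of vsetivli is a 5-bit unsigned field (`select_vset` is a hypothetical helper, not the isel code):

```python
def select_vset(avl_imm=None):
    """Return the mnemonic a VL operand could use.

    `avl_imm` is the constant AVL if it is known, otherwise None.
    vsetivli encodes the AVL as a 5-bit unsigned immediate (0..31).
    """
    if avl_imm is not None and 0 <= avl_imm < 32:
        return "vsetivli"
    return "vsetvli"
```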
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D97332
I've changed to use VL=1 for slidedown and shifts to avoid extra
element processing that we don't need.
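A model of extracting an i64 element on RV32 with VL=1 might look like this. It is a sketch: the function name is hypothetical, and reading each half with a scalar move of the low 32 bits is an assumption about the surrounding sequence.

```python
def extract_i64_elt_rv32(vec, idx):
    """Return (lo, hi) 32-bit halves of 64-bit element `idx`."""
    # vslidedown with VL=1: only element 0 of the result is produced.
    slid = [vec[idx]]
    lo = slid[0] & 0xFFFFFFFF  # scalar move reads the low 32 bits
    # Logical shift right by 32, also with VL=1, exposes the high half.
    shifted = [slid[0] >> 32]
    hi = shifted[0] & 0xFFFFFFFF
    return lo, hi
```

Only one element is ever produced by the slide and by the shift, which is the point of using VL=1.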
The i64 fixed vector handling on RV32 isn't great if the vector type
isn't legal due to an ordering issue in type legalization. If the
vector type isn't legal, we fall back to default legalization
which will bitcast the vector to vXi32 and use two independent extracts.
Doing better will require handling several different cases by
manually inserting insert_subvector/extract_subvector to adjust the type
to a legal vector before emitting custom nodes.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D97319
This patch extends the support for RVV INSERT_SUBVECTOR to cover those
which don't align to a vector register boundary. Like the support for
EXTRACT_SUBVECTOR in D96959, it accomplishes this by extracting the
nearest register-sized subvector (a subregister operation), then sliding
the vector down with VSLIDEDOWN, inserting the subvector to the first
position, and sliding the vector back up again afterwards.
Unlike subvector extraction, for vectors that occupy less than a full
vector register we must preserve the untouched elements. We do this by
lowering to an LMUL=1 INSERT_SUBVECTOR using the above method and
lowering that to a VSLIDEUP with a zero offset. This uses a
tail-undisturbed policy and so has the effect of "sliding in" the
subvector elements while preserving the surrounding ones.
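The slide down, insert, slide up sequence can be checked on a list model of a single vector register. The helpers are hypothetical and `None` stands for undef; the final slide uses the original register as its destination so that the elements below the insert index are preserved.

```python
def vslidedown(vec, offset):
    """Model of vslidedown: out[i] = vec[i + offset]; the top is undef."""
    return list(vec[offset:]) + [None] * offset


def vslideup(dest, src, offset, vl):
    """Tail-undisturbed vslideup: writes dest[offset:vl] only."""
    out = list(dest)
    for i in range(offset, min(vl, len(out))):
        out[i] = src[i - offset]
    return out


def insert_subvector_in_vreg(vreg, sub, idx):
    tmp = vslidedown(vreg, idx)
    # Insert at position 0: a vslideup with zero offset and VL = len(sub).
    tmp = vslideup(tmp, sub, offset=0, vl=len(sub))
    # Slide back up into the original register; elements below idx survive.
    return vslideup(vreg, tmp, offset=idx, vl=len(vreg))
```

The zero-offset vslideup is what "slides in" the subvector: elements past its VL are left undisturbed.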
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D96972
This patch extends the support for RVV EXTRACT_SUBVECTOR to cover those
which don't align to a vector register boundary. It accomplishes this by
extracting the nearest register-sized subvector (a subregister
operation), then sliding the vector down with VSLIDEDOWN and extracting
the subvector from the first position (a COPY operation).
Since this procedure involves the use of VSCALE and multiplication, the
handling of such operations is done during lowering to simplify the
implementation and make use of DAG combining. This necessitated moving
some helper functions from RISCVISelDAGToDAG to RISCVTargetLowering.
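The index arithmetic can be sketched with a concrete vscale. This is an illustrative model with hypothetical names; it assumes the subvector does not straddle a vector register boundary.

```python
def decompose_index(idx, min_elts_per_vreg):
    """Split an element index into (register number, index within register),
    both in units of known-minimum elements."""
    return divmod(idx, min_elts_per_vreg)


def extract_subvector(vec, idx, sub_min_elts, min_elts_per_vreg, vscale):
    reg, rem = decompose_index(idx, min_elts_per_vreg)
    per_reg = min_elts_per_vreg * vscale
    # Subregister extract of the register that holds the subvector.
    vreg = vec[reg * per_reg:(reg + 1) * per_reg]
    # Slide down by rem * vscale; the multiply is why VSCALE appears.
    slid = vreg[rem * vscale:]
    # The subvector now sits at position 0.
    return slid[:sub_min_elts * vscale]
```

With `vscale = 2`, four minimum elements per register and index 6, the subvector comes from register 1 after a slide of 4 real elements.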
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D96959
With vector mask registers only allocatable to V0 (VMV0Regs) it is
relatively simple to generate code which uses multiple masks and naively
requires spilling.
This patch aims to improve codegen in such cases by telling LLVM it can
use VRRegs to hold masks. This will prevent spilling in many cases by
having LLVM copy to an available VR register.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D97055
Just like we do for isel patterns, we need to call selectVLOp
to prevent 0 from being selected to X0 by the default isel.
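The reason x0 is unsafe here can be seen from a simplified model of vsetvli. The function is hypothetical, assumes the destination register is not x0, and only models AVL values up to VLMAX.

```python
def vsetvli_vl(rs1, avl, vlmax):
    """vl produced by `vsetvli rd, rs1, ...` when rd is not x0.

    Using x0 as rs1 requests VLMAX, so it cannot stand in for an AVL of 0.
    """
    assert 0 <= avl <= vlmax
    if rs1 == "x0":
        return vlmax
    return avl
```

A VL of 0 held in an ordinary register gives `vl = 0`, whereas folding it to x0 would silently process VLMAX elements.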
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D97021