As noted in the FIXME there's a sort of agreement that the any
extra bits stored will be 0.
The generated code is pretty terrible. I was really hoping we
could use a tail undisturbed trick, but tail undisturbed no
longer applies to masked destinations in the current draft
spec.
Fingers crossed that it isn't common to do this. I doubt IR
from clang or the vectorizer would ever create this kind of store.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D100618
This patch extends the lowering of RVV fixed-length vector shuffles to
avoid the default stack expansion and instead lower to vrgather
instructions.
For "permute"-style shuffles where one vector is swizzled, we can lower
to one vrgather. For shuffles involving two vector operands, we lower to
one unmasked vrgather (or splat, where appropriate) followed by a masked
vrgather which blends in the second half.
On occasion, when it's not possible to create a legal BUILD_VECTOR for
the indices, we use vrgatherei16 instructions with 16-bit index types.
For 8-bit element vectors where we may have indices over 255, we have a
fairly blunt fallback to the stack expansion to avoid custom-splitting
of the vector types.
To enable the selection of masked vrgather instructions, this patch
extends the various RISCVISD::VRGATHER nodes to take a passthru operand.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D100549
Prep work for adding intrinsics in the future.
Left an assert that the input is constant in ReplaceNodeResults,
as the intrinsic shouldn't go through that path.
This patch adds RVV codegen support for OR/XOR/AND reductions for both
scalable- and fixed-length vector types. There are a few possible
codegen strategies for each -- vmfirst.m, vmsbf.m, and vmsif.m could be
used to some extent -- but the vpopc.m instruction was chosen since it
produces the scalar result in one instruction, after which scalar
instructions can finish off the computation.
The reductions are lowered identically for both scalable- and
fixed-length vectors, although some alternate strategies may be more
optimal on fixed-length vectors since it's cheaper to get the length of
those types.
Other reduction types were not deemed to be relevant for mask vectors.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D100030
New custom DAG nodes were added to represent operations on CSR. These
nodes are lowered to corresponding pseudo instruction. Using the pseudo
instructions allows to specify different scheduling information for
operations on different system registers. It also make possible to
specify dependencies of instructions on specific system registers.
Differential Revision: https://reviews.llvm.org/D98936
If the constants have a difference of 1 we can convert one to
the other by adding or subtracting the condition.
We have a DAG combine for this, but it only runs before type
legalization. If the select is introduced later during type
legalization or op legalization we will miss it.
We don't need a specific condition, but some conditions are
harder to materialize than others on RISCV. I know that SETLT
will be a single instruction and it is what is used by the
motivating pattern from signed saturating add/sub.
Differential Revision: https://reviews.llvm.org/D99021
This can't use our normal strategy of splatting the scalar and using
a .vv operation instead of .vx.
Instead this patch bitcasts the vector to the equivalent SEW=32
vector and inserts the scalar parts using two vslide1up/down. We
do that unmasked and apply the mask separately at the end with
a vmerge.
For vslide1up there maybe some other options here like getting
i64 into element 0 and using vslideup.vi with this vector as
vd and the original source as vs1. Masking would still need to
be done afterwards.
That idea doesn't work for vslide1down. We need to slidedown and
then insert a single scalar at vl-1 which we could do with a
vslideup, but that assumes vl > 0 which I don't think we can assume.
The i32 double slide1down implemented here is the best I could come
up with and I just made vslide1up consistent.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D99910
We encountered a hang in our internal code base. I'm having trouble
creating a test case because the test that hit it was testing some
code that is not upstream.
It's a bit silly, but it allows us to write stricter type
constraints for isel. There's still some extra type checks in
the generated table due to some type interference limitations
around HWMode.
This patch supports bitcasts from scalar types to fixed-length vectors
and vice versa. It custom-lowers and custom-legalizes them to
EXTRACT_VECTOR_ELT/INSERT_VECTOR_ELT operations, using a single-element
vectors to hold the scalar where appropriate.
Previously, some of these would fail to select, others would be expanded
through stack loads and stores. Effort was made to ensure the codegen
avoids the stack for both legal and illegal scalar types.
Some of the codegen could be improved, but on first glance it looks like
a general optimization of EXTRACT_VECTOR_ELT when extracting an i64
element on RV32.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D99667
Caught in internal testing, these operations are assumed legal by
default, even for scalable vector types. Expand them back into separate
truncations and stores, or loads and extensions.
Also add explicit fixed-length vector tests for these operations, even
though they should have been correct already.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D99654
The W version of orc.b does not exist in Zbp so we need to use
gorci encoding. If we have Zbp, we can use gorciw which can avoid a
sext.w in some cases.
As long as it's a constant we can directly pattern match it
without any problems. It's only when it isn't a constant that
we need to add an AND.
In theory this should allow more target independent optimizations
to remain active.
Forgot to amend the Author.
Original commit message:
Header files are included in a separate patch in case the name needs to be changed.
RV32 / 64:
orc.b
Differential Revision: https://reviews.llvm.org/D99320
The default legalization strategy is PromoteFloat which keeps
half in single precision format through multiple floating point
operations. Conversion to/from float is done at loads, stores,
bitcasts, and other places that care about the exact size being 16
bits.
This patches switches to the alternative method softPromoteHalf.
This aims to keep the type in 16-bit format between every operation.
So we promote to float and immediately round for any arithmetic
operation. This should be closer to the IR semantics since we
are rounding after each operation and not accumulating extra
precision across multiple operations. X86 is the only other
target that enables this today. See https://reviews.llvm.org/D73749
I had to update getRegisterTypeForCallingConv to force f16 to
use f32 when the F extension is enabled. This way we can still
pass it in the lower bits of an FPR for ilp32f and lp64f ABIs.
The softPromoteHalf would otherwise always give i16 as the
argument type.
Reviewed By: asb, frasercrmck
Differential Revision: https://reviews.llvm.org/D99148
We need to splat the scalar separately and use .vv, but there is
no vmsgt(u).vv. So add isel patterns to select vmslt(u).vv with
swapped operands.
We also need to get VT to use for the splat from an operand rather
than the result since the result VT is nxvXi1.
Reviewed By: HsiangKai
Differential Revision: https://reviews.llvm.org/D99704
There's no target independent ISD opcode for MULHSU, so custom
legalize 2*XLen multiplies ourselves. We have to be a little
careful to prefer MULHU or MULHSU.
I thought about doing this in isel by pattern matching the
(add (mul X, (srai Y, XLen-1)), (mulhu X, Y)) pattern. I decided
against this because the add might become part of a chain of adds.
I don't trust DAG combine not to reassociate with other adds making
it difficult to find both pieces again.
Reviewed By: asb
Differential Revision: https://reviews.llvm.org/D99479
Our CLZW isel pattern is quite easily broken by surrounding code
preventing it from matching sometimes. This usually results in
failing to remove the and X, 0xffffffff inserted by type
legalization. The add with -32 that type legalization also inserts
will often gets combined into other add/sub nodes. That doesn't
usually result in extra code when we don't use clzw.
CTTZ seems to be less fragile, but I wanted to keep it consistent
with CTLZ.
Reviewed By: asb, HsiangKai
Differential Revision: https://reviews.llvm.org/D99317
This adds almost everything required for supporting the new stepvector
intrinsic on RVV. It is lowered to the existing VID_VL SDNode.
The only exception is a limitation that RV32 cannot yet lower the
intrinsic on i64 vectors. This is because the step operand is
(currently) required to be at least as large as the vector element type.
I will look into patching that out and loosening the requirement to only
an integer pointer type.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D99594
Without Zfh the half type isn't legal, but it could still be
used as an argument/return in IR. Clang will not generate this today.
Previously we promoted the half value to float for arguments and
returns if the F extension is enabled but Zfh isn't. Then depending on
which ABI is enabled we would pass it in either an FPR or a GPR in
float format.
If the F extension isn't enabled, it would get passed in the lower
16 bits of a GPR in half format.
With this patch the value will always in half format and will be
in the lower bits of a GPR or FPR. This should be consistent
with where the bits are located when Zfh is enabled.
I've based this implementation off of how this is done on ARM.
I've manually nan-boxed the value to 32 bits using integer ops.
It looks like flw, fsw, fmv.s, fmv.w.x, fmf.x.w won't
canonicalize nans so should leave the value alone. I think those
are the instructions that could get used on this value.
Reviewed By: kito-cheng
Differential Revision: https://reviews.llvm.org/D98670
We look for this pattern frequently in isel patterns so its a
good idea to try to preserve it.
This also let's us remove our special isel handling for srliw
and use a direct pattern match of (srl (and X, 0xffffffff), C)
since no bits will be removed from the and mask.
Differential Revision: https://reviews.llvm.org/D99042
This patch adds a small optimization for vector shuffle lowering,
detecting shuffles which can be re-expressed as vector selects.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D99270
This patch adds further optimization techniques to RVV BUILD_VECTOR
lowering. It teaches the compiler to find splats of larger vector
element types "hidden" in smaller ones. For example, a v4i8 build_vector
(0x1, 0x2, 0x1, 0x2) could be splat as v2i16 0x0201. This is generally
more optimal than the dominant-element BUILD_VECTORs and so takes
priority.
This optimization is currently limited to all-constant-or-undef
BUILD_VECTORs as those were found to be the most common. There's no
reason this couldn't be extended to other BUILD_VECTORs, but the
additional bit-manipulation instructions may require more sophisticated
heuristics.
There are some cases where the materialization of the larger constant
takes more scalar instructions than it does to build the vector with
vector instructions. We could add heuristics to try and catch this.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D99195
This patch builds upon the initial BUILD_VECTOR work introduced in
D98700. It further optimizes the lowering of BUILD_VECTOR by using
VSELECT operations to effectively insert repeated elements into the
vector with relatively few instructions. This allows us to optimize more
BUILD_VECTORs without significantly increasing the size of the generated
code.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D98969
This patch adds an optimization for mask-vector BUILD_VECTOR nodes whose
elements are all constants or undef. It lowers such operations by
building up the vector via a series of integer operations, in which
multiple mask elements are inserted into a vector at a time via
i8/i16/i32/i64 element types. The final result is then bitcast from that
integer vector.
We restrict this optimization in certain circumstances when optimizing
for size. If we are required to use more than one integer insert
operation, then it will likely increase code size compared with using a
load from a constant pool.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D98860
I've split the gather/scatter custom handler to avoid complicating
it with even more differences between gather/scatter.
Tests are the scalable vector tests with the vscale removed and
dropped the tests that used vector.insert. We're probably not
as thorough on the splitting cases since we use 128 for VLEN here
but scalable vector use a known min size of 64.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D98991
Found by adding asserts to LegalizeDAG to catch incorrect result
types being returned.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D98964
I'm not sure how I failed to notice this before, but when optimizing
dominant-element BUILD_VECTORs we would lower via the scalable container type,
which lost us the information about the fixed length of the vector types. By
lowering via the fixed-length type we can preserve that information and
eliminate redundant vsetvli instructions.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D98938
Returning the scalable-vector container type would present problems when
the fixed-length INSERT_VECTOR_ELT was used by later operations.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D98776
We returned the input chain instead of the output chain from the
new load. This bypasses the load in the chain. I haven't found a
good way to test this yet. IR order prevents my initial attempts
at causing reordering.
This patch adds support for masked scatter intrinsics on scalable vector
types. It is mostly an extension of the earlier masked gather support
introduced in D96263, since the addressing mode legalization is the
same.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D96486
This patch supports the masked gather intrinsics in RVV.
The RVV indexed load/store instructions only support the "unsigned unscaled"
addressing mode; indices are implicitly zero-extended or truncated to XLEN and
are treated as byte offsets. This ISA supports the intrinsics directly, but not
the majority of various forms of the MGATHER SDNode that LLVM combines to. Any
signed or scaled indexing is extended to the XLEN value type and scaled
accordingly. This is done during DAG combining as widening the index types to
XLEN may produce illegal vectors that require splitting, e.g.
nxv16i8->nxv16i64.
Support for scalable-vector CONCAT_VECTORS was added to avoid spilling via the
stack when lowering split legalized index operands.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D96263
Without this patch, bitcasts of fixed-length mask vectors would go
through the stack.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D98779
This patch adds an optimization path for BUILD_VECTOR nodes where the
majority of the elements are identical. These can be splatted, with the
remaining elements patched up with INSERT_VECTOR_ELTs. The threshold can
be tweaked as required - it is currently conservative. Undef elements
are disregarded when judging the dominance of a particular element. This
allows them to be covered by the splat value.
In addition, vectors of 2 elements are always optimized to a splat (for
the upper element) and an insert at element zero.
This optimization is disabled when optimizing for size.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D98700
The InstrEmitter can sometimes insert a copy after an IMPLICIT_DEF
before connecting it to the vector instruction. This occurs when
constrainRegClass reduces to a class with less than 4 registers.
I believe LMUL8 on masked instructions triggers this since the
result can only use the v8, v16, or v24 register group as the mask
is using v0.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D98567
The default promotion uses zero extends that become shifts. We
cam use sign extend instead which is better for RISCV.
I've used two different implementations based on whether we
have minu/maxu instructions.
Differential Revision: https://reviews.llvm.org/D98683
This allows me to introduce similar combines for branches as
we have recently added for SELECT_CC. Some of them are less
useful for standalone setccs and only help branch instructions.
By having a BR_CC node its easier to only affect branches.
I'm using CondCodeSDNode to make isel patterns easier to
write so we can refer to the codes by name. SELECT_CC uses a
constant instead.
I've translated the condition code just like SELECT_CC so
we need less patterns for the swapped conditions. This
includes special cases for X < 1 and X > -1 that get translated
to blez and bgez by using a 0 constant.
computeKnownBitsForTargetNode support for SELECT_CC is added
to allow MaskedValueIsZero to work for cases where the true
and false values of the SELECT_CC are setccs and the
result of the SELECT_CC is used by a BR_CC. This was needed
to avoid regressions in some of the overflow tests.
Reviewed By: luismarques
Differential Revision: https://reviews.llvm.org/D98159
The default legalization uses zero extends that require pair of shifts
on RISCV. Instead we can take advantage of the fact that unsigned
compares work equally well on sign extended inputs. This allows
us to use addw/subw and sext.w.
Reviewed By: luismarques
Differential Revision: https://reviews.llvm.org/D98233
This patch adds fixed-length vector support to the calling convention
when RVV is used to lower fixed-length vectors. The scheme follows the
regular vector calling convention for the argument/return registers, but
uses scalable vector container types as the LocVTs, and converts to/from
the fixed-length vector value types as required.
Fixed-length vector types may be split when the combination of minimum
VLEN and the maximum allowable LMUL is not large enough to fully contain
the vector. In this case the behaviour differs between fixed-length
vectors passed as parameters and as return values:
1. For return values, vectors must be passed entirely via registers or
via the stack.
2. For parameters, unlike scalar values, split vectors continue to be
passed by value, and are split across multiple registers until there are
no remaining registers. Thus vector parameters may be found partly in
registers and partly on the stack.
As with scalable vectors, the first fixed-length mask vector is passed
via v0. Split mask fixed-length vectors are passed first via v0 and then
via the next available vector register: v8,v9,etc.
The handling of vector return values uses all available argument
registers v8-v23 which does not adhere to the calling convention we're
supposedly implementing, but since this issue affects both fixed-length
and scalable-vector values, it was left as-is.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D97954
Types of fractional LMUL and LMUL=1 are all using VR register class. When
using inline asm, it will use the first type in the register class as the
type for the register. It is not necessary the same as the value type. We
need to use INSERT_SUBVECTOR/EXTRACT_SUBVECToR/BITCAST to make it legal
to put the value in the corresponding register class.
Differential Revision: https://reviews.llvm.org/D97480
This patch optimizes the codegen for INSERT_VECTOR_ELT in various ways.
Primarily, it removes the use of vslidedown during lowering, and the
vector element is inserted entirely using vslideup with a custom VL and
slide index.
Additionally, lowering of i64-element vectors on RV32 has been optimized
in several ways. When the 64-bit value to insert is the same as the
sign-extension of the lower 32-bits, the codegen can follow the regular
path. When this is not possible, a new sequence of two i32 vslide1up
instructions is used to get the vector element into a vector. This
sequence was suggested by @craig.topper. From there, the value is slid
into the final position for more consistent lowering across RV32 and
RV64.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D98250
We don't support any other shuffles currently.
This changes the bswap/bitreverse tests that check for this in
their expansion code. Previously we expanded a byte swapping
shuffle through memory. Now we're scalarizing and doing bit
operations on scalars to swap bytes.
In the future we can probably use vrgather.vx to do a byte swap
shuffle.
This uses a really simple approach of converting to an i8 vector
and extracting. This is probably not the best approach especially
if you know the index is constant.
Other ideas:
-Store to stack temporary using vse1, load as scalar and shift.
-Sort of bitcast the vector to a vector of i8, slide down the
appropriate 8 bit element, copy to scalar, shift down the
correct bit within the 8 bits we extracted. Not exactly sure
how to describe such a bitcast from i1 vector to i8 vector
within the type system for elements less than 8.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D98310
On riscv32, i64 isn't a legal scalar type but we would like to
support scalable vectors of i64.
This patch introduces a new node that can represent a splat made
of multiple scalar values. I've used this new node to solve the current
crashes we experience when getConstant is used after type legalization.
For RISCV, we are now default expanding SPLAT_VECTOR to SPLAT_VECTOR_PARTS
when needed and then handling the SPLAT_VECTOR_PARTS later during
LegalizeOps. I've remove the special case I previously put in for
ABS for D97991 as the default expansion is now able to succesfully
use getConstant.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D98004
Currently we crash in type legalization any time an intrinsic
uses a scalar i64 on RV32.
This patch adds support for type legalizing this to prevent
crashing. I don't promise that it uses the best possible codegen
just that it is functional.
This first version handles 3 cases. vmv.v.x intrinsic, vmv.s.x
intrinsic and intrinsics that take a scalar input, splat it and
then do some operation.
For vmv.v.x we'll either rely on hardware sign extension for
constants or we'll convert it to multiple splats and bit
manipulation.
For vmv.s.x we use a really unoptimal sequence inspired by what
we do for an INSERT_VECTOR_ELT.
For the third case we'll either try to use the .vi form for
constants or convert to a complicated splat and bitmanip and use
the .vv form of the operation.
I've renamed the ExtendOperand field to SplatOperand now use it
specifically for the third case. The first two cases are handled
by custom lowering specifically for those intrinsics.
I haven't updated all tests yet, but I tried to cover a subset
that includes single-width, widening, and narrowing.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D97895
The type legalizer will visit the result before the operands. To
avoid creating an illegal target specific node or falling back to
scalarization, we need to manually split vector operands.
This still doesn't handle the case of non-power of 2 operands
which need to be widened. I'm not sure the type legalizer is
ready for it. I think we would need to insert an
INSERT_SUBVECTOR with the power of 2 type we want, with an undef
first operand, and the non-power of 2 orignal operand as the vector
to insert. Then fill in the neutral elements into the elements the
padded elements. Alternatively we INSERT_SUBVECTOR into a neutral vector.
From there we carry on splitting if needed to get to a legal type
then do the target specific code.
The problem with this is the type legalizer doesn't know how to
widen an insert_subvector yet. We would need to add that including
the handling for a non-undef first vector.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D98292
I've left mask registers to a future patch as we'll need
to convert them to full vectors, shuffle, and then truncate.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D97609
I've included tests that require type legalization to split the
vector. The i64 version of these scalarizes on RV32 due to type
legalization visiting the result before the vector type. So we
have to abort our custom expansion to avoid creating target
specific nodes with an illegal type. Then type legalization ends
up scalarizing. We might be able to fix this by doing custom
splitting for large vectors in our handler to get down to a legal
type.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D98102
Previously we set the value to -1, but the SEW information could
be useful for scheduling.
Reviewed By: frasercrmck, rogfer01
Differential Revision: https://reviews.llvm.org/D98062
The default fixed vector expansion uses sra+xor+add since it can't
see that smax is legal due to our custom handling. So we select
smax(X, sub(0, X)) manually.
Scalable vectors are able to use the smax expansion automatically
for most cases. It crashes in one case because getConstant can't build a
SPLAT_VECTOR for nxvXi64 when i64 scalars aren't legal. So
we manually emit a SPLAT_VECTOR_I64 for that case.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D97991
While working on adding fixed-length vectors to the calling convention,
it was necessary to be able to query for a fixed-length vector container
type without access to an instance of SelectionDAG.
This patch modifies the "main" getContainerForFixedLengthVector function
to use an instance of TargetLowering rather than SelectionDAG, and
preserves the SelectionDAG overload as a wrapper.
An additional non-static version of the function was also added to
simplify the common case in RISCVTargetLowering.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D97925
A setcc can be created during LegalizeDAG after select_cc has been
created. This combine will enable us to fold these late setccs.
Reviewed By: luismarques
Differential Revision: https://reviews.llvm.org/D98132
This pattern occurs when lowering for overflow operations
introduce an xor after select_cc has already been formed.
I had to rework another combine that looked for select_cc of an xor
with 1. That xor will now get combined away so we just need to
look for the RHS of the select_cc being 1.
Reviewed By: luismarques
Differential Revision: https://reviews.llvm.org/D98130
This patch addresses a compiler crash resulting from passing a
fixed-length type to one that expects scalable vector types. An
assertion was added to prevent this regressing in the future.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D97868
This patch fixes up one case where the fixed-length-vector VL was
dropped (falling back to VLMAX) when inserting vector elements, as the
code would lower via ISD::INSERT_VECTOR_ELT (at index 0) which loses the
fixed-length vector information.
To this end, a custom node, VMV_S_XF_VL, was introduced to carry the VL
operand through to the final instruction. This node wraps the RVV
vmv.s.x and vmv.s.f instructions, which were being selected by
insert_vector_elt anyway.
There should be no observable difference in scalable-vector codegen.
There is still one outstanding drop from fixed-length VL to VLMAX, when
an i64 element is inserted into a vector on RV32; the splat (which is
custom legalized) has no notion of the original fixed-length vector
type.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D97842
This patch enables support for lowering INSERT_VECTOR_ELT on
fixed-length vector types. The strategy follows that for scalable vector
types.
This patch also includes a quick fix to prevent the compiler infinitely
looping between lowering BUILD_VECTOR as VECTOR_SHUFFLE and back again.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D97698
The default expansion of CONCAT_VECTORS goes through the stack. This
patch avoids that penalty by custom-lowering CONCAT_VECTORS to a series
of INSERT_SUBVECTOR nodes. Futher optimizations are possible, but this
is a good start.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D97692
Like with EXTRACT_SUBVECTOR, INSERT_SUBVECTOR poses a problem
for vector masks as RVV isn't able to slide mask types around. We choose
instead to bitcast to equivalently-sized i8 types where we can, else we
zero-extend, perform the operation, and truncate back down.
One test was left disabled due to a crash in the legalizer.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D97559
This patch fixes a bug where the lowering for INSERT_SUBVECTOR and
EXTRACT_SUBVECTOR would insist on first extracting a register-aligned
LMUL1 vector type before perfoming the slide up/down. This was even if
the vector was a fractional LMUL type, in which case the aligned
EXTRACT_SUBVECTOR was invalid.
This issue only occurred for scalable vector types, but a variety of
tests for both scalable and fixed-length vectors have been added to
ensure this does not regress in the future.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D97556
This patch unifies the two disparate paths for lowering INSERT_SUBVECTOR
operations under one roof. Consequently, with this patch it is possible to
support any fixed-length subvector insertion, not just "cast-like" ones.
As before, support for the insertion of mask vectors will come in a
separate patch.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D97543
This patch adds support for extracting subvectors from vector masks.
This can be either extracting a scalable vector from another, or a fixed-length
vector from a fixed-length or scalable vector.
Since RVV lacks a way to slide vector masks down on an element-wise
basis and we don't know the true length of the vector registers, in many
cases we must resort to using equivalently-sized i8 vectors to perform
the operation. When this is not possible we fall back and extend to a
suitable i8 vector.
Support was also added for fixed-length truncation to mask types.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D97475
This patch extends the support for scalable-vector int->fp and fp->int
conversions by additionally handling fixed-length vectors.
The existing scalable-vector lowering re-expresses widening/narrowing by
x4+ conversions as standard nodes. The fixed-length vector support slots
in at "the end" of this process by lowering the now equally-sized and
widening/narrowing by x2 nodes to our custom VL versions.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D97374
This patch extends the support for vector FP_ROUND and FP_EXTEND by
including support for fixed-length vector types. Since fixed-length
vectors use "VL" nodes and scalable vectors can use the standard nodes,
there is slightly more to do in the fixed-length case. A helper function
was introduced to try and reduce the divergent paths. It is expected
that this function will similarly come in useful for lowering the
int-to-fp and fp-to-int operations for fixed-length vectors.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D97301
This patch extends support for our custom-lowering of scalable-vector
truncates to include those of fixed-length vectors. It does this by
co-opting the custom RISCVISD::TRUNCATE_VECTOR node and adding mask and
VL operands. This avoids unnecessary duplication of patterns and
inflation of the ISel table.
Some truncates go through CONCAT_VECTORS which currently isn't
efficiently handled, as it goes through the stack. This can be improved
upon in the future.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D97202
This patch adds support for the custom lowering sign- and zero-extension
of fixed-length vector types. It does so through custom nodes. Since the
source and destination types are (necessarily) of different sizes, it is
possible that the source type is legal whilst the larger destination
type isn't. In this case the legalization makes heavy use of
EXTRACT_SUBVECTOR.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D97194
This patch unifies the two disparate paths for lowering
EXTRACT_SUBVECTOR operations under one roof. Consequently, with this
patch it is possible to support any fixed-length subvector extraction,
not just "cast-like" ones.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D97192
We always create the VL operand using a register, but if we can
determine that it came from an ADDI X0, imm with a sufficiently
small immediate, we can use VSETIVLI.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D97332
I've changed to use VL=1 for slidedown and shifts to avoid extra
element processing that we don't need.
The i64 fixed vector handling on i32 isn't great if the vector type
isn't legal due to an ordering issue in type legalization. If the
vector type isn't legal, we fall back to default legalization
which will bitcast the vector to vXi32 and use two independent extracts.
Doing better will require handling several different cases by
manually inserting insert_subvector/extract_subvector to adjust the type
to a legal vector before emitting custom nodes.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D97319
This patch extends the support for RVV INSERT_SUBVECTOR to cover those
which don't align to a vector register boundary. Like the support for
EXTRACT_SUBVECTOR in D96959, it accomplishes this by extracting the
nearest register-sized subvector (a subregister operation), then sliding
the vector down with VSLIDEDOWN, inserting the subvector to the first
position, and sliding the vector back up again afterwards.
Unlike subvector extraction, for vectors that occupy less than a full
vector register we must preserve the untouched elements. We do this by
lowering to an LMUL=1 INSERT_SUBVECTOR using the above method and
lowering that to a VSLIDEUP with a zero offset. This uses a
tail-undisturbed policy and so has the effect of "sliding in" the
subvector elements while preserving the surrounding ones.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D96972
This should fix the issue reported in D96972.
I don't have a good test case for this without those changes.
Differential Revision: https://reviews.llvm.org/D97082
A previous patch moved the index versions. This moves the rest.
I also removed the custom lowering for VLEFF since we can now
do everything directly in the isel handling.
I had to update getLMUL to handle mask registers to index the
pseudo table correctly for VLE1/VSE1.
This is good for another 15K reduction in llc size.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D97097
This patch extends the support for RVV EXTRACT_SUBVECTOR to cover those
which don't align to a vector register boundary. It accomplishes this by
extracting the nearest register-sized subvector (a subregister
operation), then sliding the vector down with VSLIDEDOWN and extracting
the subvector from the first position (a COPY operation).
Since this procedure involves the use of VSCALE and multiplication, the
handling of such operations is done during lowering to simplify the
implementation and make use of DAG combining. This necessitated moving
some helper functions from RISCVISelDAGToDAG to RISCVTargetLowering.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D96959
We previously used isel patterns for this, but that used quite
a bit of space in the isel table due to OR being associative
and commutative. It also wouldn't handle shifts/ands being in
reversed order.
This generalizes the shift/and matching from GREVI to
take the expected mask table as input so we can reuse it for
SHFLI.
There is no SHFLIW instruction, but we can promote a 32-bit
SHFLI to i64 on RV64. As long as bit 4 of the control bit isn't
set, a 64-bit SHFLI will preserve 33 sign bits if the input had
at least 33 sign bits. ComputeNumSignBits has been updated to
account for that to avoid sext.w in the tests.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D96661
This uses to division by constant optimization to use MULHU/MULHS.
Reviewed By: frasercrmck, arcbbb
Differential Revision: https://reviews.llvm.org/D96934
Due to vXi64 on RV32, I've directly emitted this using _VL ISD
opcodes. If it wasn't for that we could just use fixed vector
BUILD_VECTOR and VSELECT and let those each be legalized.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D96910
The type legalizer can call this code based on the scalar type so
we need to verify the vector type is a scalable vector.
I think due to how type legalization visits nodes, the vector type
will have already been legalized so we don't have an issue with
using MVT here like we did for EXTRACT_VECTOR_ELT.
I've added a test just in case.
The type legalizer is calling this code based on the scalar type so
we need to verify the input type is a scalable vector.
The vector type has also not been legalized yet when this is called
so we need to use EVT for it.
This patch adds support for fixed-length vector vselect. It does so by
lowering them to a custom unmasked VSELECT_VL node with a vector length
operand.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D96768
Non-splatted non-integer build_vector nodes were mistakenly being
lowered as VID expressions, which should not happen. VID can only be
used to select integer build_vector nodes.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D96718