This shrinks the immediate that isel table needs to emit for these
instructions. Hoping this allows me to change OPC_EmitInteger to
use a better variable length encoding for representing negative
numbers. Similar to what was done a few months ago for OPC_CheckInteger.
The alternative encoding uses less bytes for negative numbers, but
increases the number of bytes need to encode 64 which was a very
common number in the RISCV table due to SEW=64. By using Log2 this
becomes 6 and is no longer a problem.
This adds a special operand type that is allowed to be either
an immediate or register. By giving it a unique operand type the
machine verifier will ignore it.
This perturbs a lot of tests but mostly it is just slightly different
instruction orders. Something bad did happen to some min/max reduction
tests. We're spilling vector registers when we weren't before.
Reviewed By: khchen
Differential Revision: https://reviews.llvm.org/D101246
This modifies my previous patch to push the strided load formation
to isel. This gives us opportunity to fold the splat into a .vx
operation first. Using a scalar register and a .vx operation reduces
vector register pressure which can be important for larger LMULs.
If we can't fold the splat into a .vx operation, then it can make
sense to use a strided load to free up the vector arithmetic
ALU to do actual arithmetic rather than tying it up with vmv.v.x.
Reviewed By: khchen
Differential Revision: https://reviews.llvm.org/D101138
We can have RISCVISelDAGToDAG.cpp call the VT only version by
finding the RISCVTargetLowering object via the Subtarget.
Make the static versions just global static functions in
RISCVISelLowering that can be called by static functions in that
file.
These instructions don't really exist, but we have ways we can
emulate them.
.vv will swap operands and use vmsle().vv. .vi will adjust the
immediate and use .vmsgt(u).vi when possible. For .vx we need to
use some of the multiple instruction sequences from the V extension
spec.
For unmasked vmsge(u).vx we use:
vmslt{u}.vx vd, va, x; vmnand.mm vd, vd, vd
For cases where mask and maskedoff are the same value then we have
vmsge{u}.vx v0, va, x, v0.t which is the vd==v0 case that
requires a temporary so we use:
vmslt{u}.vx vt, va, x; vmandnot.mm vd, vd, vt
For other masked cases we use this sequence:
vmslt{u}.vx vd, va, x, v0.t; vmxor.mm vd, vd, v0
We trust that register allocation will prevent vd in vmslt{u}.vx
from being v0 since v0 is still needed by the vmxor.
Differential Revision: https://reviews.llvm.org/D100925
This patch adds more optimized codegen for the above SETCC forms,
by matching the '.vi' vector forms when the immediate is a 5-bit signed
immediate plus 1. The immediate can be decremented and the corresponding
SET[U]LE or SET[U]GT forms can be matched.
This work was left as a TODO from D94168.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D100096
Many of the operands are handled the same or in the same order
for all these intrinsics. Factor out the code for selecting and
pushing them into the Operands vector.
Differential Revision: https://reviews.llvm.org/D99923
I missed a few intrinsics in 3dd4aa7d09
when I did this for masked loads and masked segment loads/stores.
Found while trying to share more code between these custom isel
functions.
This adds a new integer materialization strategy mainly targeted
at 64-bit constants like 0xffffffff where there are 32 or more trailing
ones with leading zeros. We can materialize these by using an addi -1
and srli to restore the leading zeros. This matches what gcc does.
I haven't limited to just these cases though. The implementation
here takes the constant, shifts out all the leading zeros and
shifts ones into the LSBs, creates the new sequence, adds an srli,
and checks if this is shorter than our original strategy.
I've separated the recursive portion into a standalone function
so I could append the new strategy outside of the recursion. Since
external users are no longer using the recursive function, I've
cleaned up the external interface to return the sequence instead of
taking a vector by reference.
Reviewed By: asb
Differential Revision: https://reviews.llvm.org/D98821
This matches what we do in our isel patterns. In our internal
testing we've found this is needed to make the fast register
allocator happy at -O0. Otherwise it may assign V0 to an earlier
operand and find itself with no registers left when it reaches
the mask operand. By using V0 explicitly, the fast register allocator
will see it when it checks for phys register usages before it
starts allocating vregs. I'll try to update this with a test case.
Unfortunately, this does appear to prevent some instruction reordering
by the pre-RA scheduler which leads to the increased spills seen in
some tests. I suspect that problem could already occur for other
instructions that already used V0 directly.
There's a lot of repeated code here that could do with some
wrapper functions. Not sure if that should be at the level of the
new code that deals with V0. That would require multiple output
parameters to pass the glue, chain and register back. Maybe it
should be at a higher level over the entire set of push_backs.
Reviewed By: frasercrmck, HsiangKai
Differential Revision: https://reviews.llvm.org/D99367
We look for this pattern frequently in isel patterns so its a
good idea to try to preserve it.
This also let's us remove our special isel handling for srliw
and use a direct pattern match of (srl (and X, 0xffffffff), C)
since no bits will be removed from the and mask.
Differential Revision: https://reviews.llvm.org/D99042
Previously we used selectImm for RV64 and isel patterns for
RV32. This should be NFC, but will allow RV32 and RV64 to share
improvements in the future. For example, it might be useful to
use BSETI from Zbs to make single bit constants.
Reviewed By: luismarques
Differential Revision: https://reviews.llvm.org/D98877
This patch addresses a compiler crash resulting from passing a
fixed-length type to one that expects scalable vector types. An
assertion was added to prevent this regressing in the future.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D97868
This patch unifies the two disparate paths for lowering INSERT_SUBVECTOR
operations under one roof. Consequently, with this patch it is possible to
support any fixed-length subvector insertion, not just "cast-like" ones.
As before, support for the insertion of mask vectors will come in a
separate patch.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D97543
This will allow FrameIndex as the base address instead of
emitting a separate ADDI from isel. eliminateFrameIndex will likely turn
it back into an ADDI, but this makes things consistent with the
SDPatterns and VLPatterns.
I only tested one case for simplicity. I can test more if reviewers
want.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D97221
This patch unifies the two disparate paths for lowering
EXTRACT_SUBVECTOR operations under one roof. Consequently, with this
patch it is possible to support any fixed-length subvector extraction,
not just "cast-like" ones.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D97192
We just started using a ComplexPattern for sexti32. This updates
zexti32 to match.
Reviewed By: luismarques
Differential Revision: https://reviews.llvm.org/D97231
This patch extends the support for RVV INSERT_SUBVECTOR to cover those
which don't align to a vector register boundary. Like the support for
EXTRACT_SUBVECTOR in D96959, it accomplishes this by extracting the
nearest register-sized subvector (a subregister operation), then sliding
the vector down with VSLIDEDOWN, inserting the subvector to the first
position, and sliding the vector back up again afterwards.
Unlike subvector extraction, for vectors that occupy less than a full
vector register we must preserve the untouched elements. We do this by
lowering to an LMUL=1 INSERT_SUBVECTOR using the above method and
lowering that to a VSLIDEUP with a zero offset. This uses a
tail-undisturbed policy and so has the effect of "sliding in" the
subvector elements while preserving the surrounding ones.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D96972
An i64 AssertZExt from a type smaller than i32 has at least 33
leading zeros which mean it has at least 33 sign bits.
Since we have a couple patterns that use two sexti32, I've
switched to a ComplexPattern so tablegen didn't have to generate
9 different permutations.
As noted in the FIXME, maybe we should just call computeNumSignBits,
but we don't have tests that benefit from that yet.
Reviewed By: luismarques
Differential Revision: https://reviews.llvm.org/D97130
This should fix the issue reported in D96972.
I don't have a good test case for this without those changes.
Differential Revision: https://reviews.llvm.org/D97082
A previous patch moved the index versions. This moves the rest.
I also removed the custom lowering for VLEFF since we can now
do everything directly in the isel handling.
I had to update getLMUL to handle mask registers to index the
pseudo table correctly for VLE1/VSE1.
This is good for another 15K reduction in llc size.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D97097
This patch extends the support for RVV EXTRACT_SUBVECTOR to cover those
which don't align to a vector register boundary. It accomplishes this by
extracting the nearest register-sized subvector (a subregister
operation), then sliding the vector down with VSLIDEDOWN and extracting
the subvector from the first position (a COPY operation).
Since this procedure involves the use of VSCALE and multiplication, the
handling of such operations is done during lowering to simplify the
implementation and make use of DAG combining. This necessitated moving
some helper functions from RISCVISelDAGToDAG to RISCVTargetLowering.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D96959
We don't currently create memory operands for these intrinsics,
but there was a suggestion of using the indexed load/store
intrinsics to implement isel for scalable vector gather/scatter.
That may propagate the memory operand from the gather/scatter
ISD nodes.
We had more combinations of data and index lmuls than we needed.
Also add some asserts to verify that the IndexVT and data VT have
the same element count when we isel these pseudo instructions.
There are many legal combinations of index and data VTs supported
for these intrinsics. This results in a lot of isel patterns in
RISCVGenDAGISel.inc.
By adding a separate table similar to what we use for segment
load/stores, we can more efficiently manually select these
intrinsics. We should also be able to reuse this table scalable
vector gather/scatter.
This reduces the llc binary size by ~56K.
Reviewed By: khchen
Differential Revision: https://reviews.llvm.org/D97033
Just like we do for isel patterns, we need to call selectVLOp
to prevent 0 from being selected to X0 by the default isel.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D97021
Intrinsic ID is a 32-bit value which made each row of the table 4
byte aligned. The remaining fields used 5 bytes. This meant 3 bytes
of padding per row.
This patch breaks the table into 4 separate tables and indexes them
by properties we know about the intrinsic. NF, masked,
strided, ordered, etc. The indexed load/store tables have no
padding in their rows now.
All together this reduces the size of llc binary by ~28K.
I'm considering adding similar tables for isel of non-segment
load/store as well to cut down the size of the isel table and
probably improve our isel performance. Those tables would need to
indexed from intrinsics, IR loads/stores, gathers/scatters, and
RISCVISD opcodes. So having a table that can be indexed without using
intrinsic ID is more flexible.
Reviewed By: HsiangKai
Differential Revision: https://reviews.llvm.org/D96894
These should be NOPs so we can just replace with the input. This
matches what SVE does with isel patterns for all permutations.
Custom isel saves us from having to list all permurations for
all LMULs.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D96921
This patch adds support for INSERT_SUBVECTOR and EXTRACT_SUBVECTOR
(nominally where both operands are scalable vector types) where the
vector, subvector, and index align sufficiently to allow decomposition
to subregister manipulation:
* For extracts, the extracted subvector must correctly align with the
lower elements of a vector register.
* For inserts, the inserted subvector must be at least one full vector
register, and correctly align as above.
This approach should work for fixed-length vector insertion/extraction
too, but that will come later.
Reviewed By: craig.topper, khchen, arcbbb
Differential Revision: https://reviews.llvm.org/D96873
A lot of the code for the masked and unmasked is the same. This
patch adds a boolean to handle the differences so we can share
the code.
Differential Revision: https://reviews.llvm.org/D96841
This is annoying because the condition code legalization belongs
to LegalizeDAG, but our custom handler runs in Legalize vector ops
which occurs earlier.
This adds some of the mask binary operations so that we can combine
multiple compares that we need for expansion.
I've also fixed up RISCVISelDAGToDAG.cpp to handle copies of masks.
This patch contains a subset of the integer setcc patch as well.
That patch is dependent on the integer binary ops patch. I'll rebase
based on what order the patches go in.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D96567
Unlike scalable vectors, I'm only using a ComplexPattern for
the immediate itself. The vmv_v_x is matched explicitly. We igore
the VL argument when matching a binary operator, but we do check
it when matching splat directly.
I left out tests for vXi64 as they fail on rv32 right now.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D96365
This patch handles cast-like insert_subvector & extract_subvector
in which case:
1. index starts from 0.
2. inserting a fixed-width vector into a scalable vector,
or extracting a fixed-width vector from a scalable vector.
Reviewed By: craig.topper, frasercrmck
Differential Revision: https://reviews.llvm.org/D96352
As of the current draft these are no longer being considered
for the bitmanip spec. It wasn't clear what sub extension they
belonged in in the 0.93 spec.
So remove them. They can always be added back if something changes.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D96157