Commit Graph

115 Commits

Craig Topper ba63cdb8f2 [RISCV] Store SEW in RISCV vector pseudo instructions in log2 form.
This shrinks the immediate that isel table needs to emit for these
instructions. Hoping this allows me to change OPC_EmitInteger to
use a better variable length encoding for representing negative
numbers. Similar to what was done a few months ago for OPC_CheckInteger.

The alternative encoding uses fewer bytes for negative numbers, but
increases the number of bytes needed to encode 64, which was a very
common number in the RISCV table due to SEW=64. By using Log2, this
becomes 6 and is no longer a problem.
2021-05-02 12:09:20 -07:00
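
A minimal, self-contained C++ sketch of the log2 encoding described above
(illustrative only, not the upstream pseudo-instruction encoding):

  // Storing SEW as log2 keeps the immediate tiny, so SEW=64 becomes 6
  // instead of 64 in the isel match table.
  #include <cassert>
  #include <cstdio>
  #include <initializer_list>

  static unsigned encodeSEW(unsigned SEW) {
    assert(SEW >= 8 && SEW <= 64 && (SEW & (SEW - 1)) == 0 &&
           "SEW must be a power of two between 8 and 64");
    unsigned Log2 = 0;
    while ((1u << Log2) != SEW)
      ++Log2;
    return Log2; // 8 -> 3, 16 -> 4, 32 -> 5, 64 -> 6
  }

  static unsigned decodeSEW(unsigned Log2SEW) { return 1u << Log2SEW; }

  int main() {
    for (unsigned SEW : {8u, 16u, 32u, 64u})
      printf("SEW=%u encodes as %u\n", SEW, encodeSEW(SEW));
    assert(decodeSEW(encodeSEW(64)) == 64);
    return 0;
  }
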
Craig Topper ce09dd54e6 [RISCV] Select 5 bit immediate for VSETIVLI during isel rather than peepholing in the custom inserter.
This adds a special operand type that is allowed to be either
an immediate or register. By giving it a unique operand type the
machine verifier will ignore it.

This perturbs a lot of tests, but mostly the differences are just
slightly different instruction orders. Something bad did happen to some
min/max reduction tests: we're now spilling vector registers where we
weren't before.

Reviewed By: khchen

Differential Revision: https://reviews.llvm.org/D101246
2021-04-27 14:38:16 -07:00
Craig Topper e2cd92cb9b [RISCV] Match splatted load to scalar load + splat. Form strided load during isel.
This modifies my previous patch to push the strided load formation
to isel. This gives us opportunity to fold the splat into a .vx
operation first. Using a scalar register and a .vx operation reduces
vector register pressure which can be important for larger LMULs.

If we can't fold the splat into a .vx operation, then it can make
sense to use a strided load to free up the vector arithmetic
ALU to do actual arithmetic rather than tying it up with vmv.v.x.

Reviewed By: khchen

Differential Revision: https://reviews.llvm.org/D101138
2021-04-26 13:32:03 -07:00
Craig Topper baa107f018 [RISCV] Only expose one interface for getContainerForFixedLengthVector in the RISCVTargetLowering class
We can have RISCVISelDAGToDAG.cpp call the VT-only version by
finding the RISCVTargetLowering object via the Subtarget.

Make the static versions just global static functions in
RISCVISelLowering that can be called by static functions in that
file.
2021-04-23 15:06:10 -07:00
Craig Topper e01c419ecd [RISCV] Add IR intrinsics for vmsge(u).vv/vx/vi.
These instructions don't really exist, but we have ways we can
emulate them.

The .vv form will swap operands and use vmsle(u).vv. The .vi form will
adjust the immediate and use vmsgt(u).vi when possible. For .vx we need
to use some of the multi-instruction sequences from the V extension
spec.

For unmasked vmsge(u).vx we use:
  vmslt{u}.vx vd, va, x; vmnand.mm vd, vd, vd

For cases where the mask and maskedoff are the same value, we have
vmsge{u}.vx v0, va, x, v0.t, which is the vd==v0 case; that requires a
temporary, so we use:
  vmslt{u}.vx vt, va, x; vmandnot.mm vd, vd, vt

For other masked cases we use this sequence:
  vmslt{u}.vx vd, va, x, v0.t; vmxor.mm vd, vd, v0
We trust that register allocation will prevent vd in vmslt{u}.vx
from being v0 since v0 is still needed by the vmxor.

Differential Revision: https://reviews.llvm.org/D100925
2021-04-22 10:44:38 -07:00
Ben Shi 30e2c7be99 [RISCV] Refactor an optimization of addition with immediate
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D100769
2021-04-20 18:04:25 +08:00
Fraser Cormack d737c47137 [RISCV] Support vector SET[U]LT and SET[U]GE with splatted immediates
This patch adds more optimized codegen for the above SETCC forms,
by matching the '.vi' vector forms when the immediate is a 5-bit signed
immediate plus 1. The immediate can be decremented and the corresponding
SET[U]LE or SET[U]GT forms can be matched.

This work was left as a TODO from D94168.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D100096
2021-04-12 18:36:45 +01:00
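
A small sketch of the immediate rewrite described above (illustrative
only; the helper is hypothetical, and the real lowering also has to
rule out immediates whose decrement would wrap, e.g. an unsigned
compare against 0):

  // x <  imm  ==>  x <= imm - 1   (matches the vmsle.vi form)
  // x >= imm  ==>  x >  imm - 1   (matches the vmsgt.vi form)
  // Legal only when imm - 1 still fits in a 5-bit signed immediate,
  // i.e. imm is "a 5-bit signed immediate plus 1".
  #include <cassert>
  #include <cstdint>

  static bool isSImm5(int64_t V) { return V >= -16 && V <= 15; }

  static bool canUseDecrementedImm(int64_t Imm, int64_t &NewImm) {
    NewImm = Imm - 1;
    return isSImm5(NewImm);
  }

  int main() {
    int64_t New;
    assert(canUseDecrementedImm(16, New) && New == 15); // 16 = simm5 max + 1
    assert(!canUseDecrementedImm(-16, New));            // -17 is out of range
    return 0;
  }
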
Craig Topper ff902080a9 [RISCV] Use SLLI/SRLI instead of SLLIW/SRLIW for (srl (and X, 0xffff), C) custom isel on RV64.
We don't need the sign-extending behavior here, and SLLI/SRLI
are able to compress to C.SLLI/C.SRLI.
2021-04-11 13:59:51 -07:00
Craig Topper bc0e052730 [RISCV] Teach targetShrinkDemandedConstant to preserve (and X, 0xffff) when zext.h is supported.
Similar to what we do for zext.w.

Disable the (srl (and X, 0xffff), C) custom isel when zext.h is
available.
2021-04-11 10:03:35 -07:00
Craig Topper 9895285191 [RISCV] Replace 'return ReplaceNode' with 'ReplaceNode; return;' NFC
ReplaceNode is a void function, as is the function we were calling it
from. While 'return ReplaceNode(...)' is valid code, it was a bit confusing.
2021-04-07 12:18:41 -07:00
Craig Topper 3ae03f67fe [RISCV] Add helper function to share some of the code for isel of vector load/store intrinsics.
Many of the operands are handled the same or in the same order
for all these intrinsics. Factor out the code for selecting and
pushing them into the Operands vector.

Differential Revision: https://reviews.llvm.org/D99923
2021-04-06 09:54:24 -07:00
Craig Topper cb1028a0b9 [RISCV] When custom iseling masked stores, copy the mask into V0 instead of virtual register.
I missed a few intrinsics in 3dd4aa7d09
when I did this for masked loads and masked segment loads/stores.

Found while trying to share more code between these custom isel
functions.
2021-04-05 21:28:32 -07:00
Craig Topper d61b40ed27 [RISCV] Improve 64-bit integer materialization for some cases.
This adds a new integer materialization strategy mainly targeted
at 64-bit constants like 0xffffffff that have 32 or more trailing
ones preceded by leading zeros. We can materialize these by using an addi of -1
and srli to restore the leading zeros. This matches what gcc does.

I haven't limited it to just these cases, though. The implementation
here takes the constant, shifts out all the leading zeros and
shifts ones into the LSBs, creates the new sequence, adds an srli,
and checks if this is shorter than our original strategy.

I've separated the recursive portion into a standalone function
so I could append the new strategy outside of the recursion. Since
external users are no longer using the recursive function, I've
cleaned up the external interface to return the sequence instead of
taking a vector by reference.

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D98821
2021-04-01 09:12:52 -07:00
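
The 0xffffffff case mentioned above can be checked with a tiny sketch
(illustrative only; the real logic lives in the RISCV constant
materialization code and generalizes beyond this one constant):

  // "addi -1" produces all ones; "srli" then clears the leading zeros,
  // giving 0xffffffff in two instructions.
  #include <cassert>
  #include <cstdint>

  int main() {
    uint64_t Target = 0xffffffffULL;       // 32 leading zeros, 32 trailing ones
    uint64_t Reg = (uint64_t)(int64_t)-1;  // addi a0, zero, -1  -> all ones
    Reg >>= 32;                            // srli a0, a0, 32    -> restore zeros
    assert(Reg == Target);
    return 0;
  }
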
Craig Topper 3dd4aa7d09 [RISCV] When custom iseling masked loads/stores, copy the mask into V0 instead of virtual register.
This matches what we do in our isel patterns. In our internal
testing we've found this is needed to make the fast register
allocator happy at -O0. Otherwise it may assign V0 to an earlier
operand and find itself with no registers left when it reaches
the mask operand. By using V0 explicitly, the fast register allocator
will see it when it checks for phys register usages before it
starts allocating vregs. I'll try to update this with a test case.

Unfortunately, this does appear to prevent some instruction reordering
by the pre-RA scheduler which leads to the increased spills seen in
some tests. I suspect that problem could already occur for other
instructions that already used V0 directly.

There's a lot of repeated code here that could do with some
wrapper functions. Not sure if that should be at the level of the
new code that deals with V0. That would require multiple output
parameters to pass the glue, chain and register back. Maybe it
should be at a higher level over the entire set of push_backs.

Reviewed By: frasercrmck, HsiangKai

Differential Revision: https://reviews.llvm.org/D99367
2021-03-29 10:20:43 -07:00
Craig Topper 5a18c576c4 [RISCV] Don't call CheckAndMask from selectZExti32.
Now that targetShrinkDemandedConstant preserves 0xffffffff masks we
shouldn't need to call computeKnownBits here.
2021-03-25 22:07:41 -07:00
Craig Topper c40cea6f08 [RISCV] Teach targetShrinkDemandedConstant to preserve (and X, 0xffffffff).
We look for this pattern frequently in isel patterns, so it's a
good idea to try to preserve it.

This also lets us remove our special isel handling for srliw
and use a direct pattern match of (srl (and X, 0xffffffff), C)
since no bits will be removed from the and mask.

Differential Revision: https://reviews.llvm.org/D99042
2021-03-25 09:03:25 -07:00
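
A sketch of the soundness condition behind preserving the wider mask
(illustrative only; the helper below is hypothetical and much simpler
than the actual targetShrinkDemandedConstant hook):

  // Replacing the mask C in (and X, C) with Preferred (e.g. 0xffffffff) is
  // sound as long as the two constants agree on every demanded bit; bits
  // outside the demanded set can change freely.
  #include <cassert>
  #include <cstdint>

  static bool canKeepMask(uint64_t C, uint64_t Demanded, uint64_t Preferred) {
    return (C & Demanded) == (Preferred & Demanded);
  }

  int main() {
    // Only the low 8 bits are demanded: keeping 0xffffffff instead of
    // shrinking to 0xff preserves the zext.w / srliw-friendly pattern.
    assert(canKeepMask(0xffffffffULL, 0xffULL, 0xffffffffULL));
    // Not sound if a demanded bit would change.
    assert(!canKeepMask(0x0000ffffULL, 0xffffffffULL, 0xffffffffULL));
    return 0;
  }
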
Craig Topper 839a46d88f [RISCV] Use selectImm for RV32. NFC
Previously we used selectImm for RV64 and isel patterns for
RV32. This should be NFC, but will allow RV32 and RV64 to share
improvements in the future. For example, it might be useful to
use BSETI from Zbs to make single bit constants.

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D98877
2021-03-23 08:57:15 -07:00
Craig Topper 92b39c6907 [RISCV] Use getTargetExtractSubreg and getTargetInsertSubreg to simplify some code. NFCI 2021-03-17 12:10:19 -07:00
Fraser Cormack 8e7ceffd0b [RISCV] Fix crash when inserting large fixed-length subvectors
This patch addresses a compiler crash resulting from passing a
fixed-length type to one that expects scalable vector types. An
assertion was added to prevent this regressing in the future.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D97868
2021-03-04 09:27:16 +00:00
Fraser Cormack 4ea734e6ec [RISCV] Unify scalable- and fixed-vector INSERT_SUBVECTOR lowering
This patch unifies the two disparate paths for lowering INSERT_SUBVECTOR
operations under one roof. Consequently, with this patch it is possible to
support any fixed-length subvector insertion, not just "cast-like" ones.

As before, support for the insertion of mask vectors will come in a
separate patch.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D97543
2021-03-01 11:38:47 +00:00
Craig Topper b183cbfacd [RISCV] Call SelectBaseAddr on the base pointer in the custom isel for vector loads and stores.
This will allow FrameIndex as the base address instead of
emitting a separate ADDI from isel. eliminateFrameIndex will likely turn
it back into an ADDI, but this makes things consistent with the
SDPatterns and VLPatterns.

I only tested one case for simplicity. I can test more if reviewers
want.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D97221
2021-02-26 11:38:23 -08:00
Fraser Cormack 821f8bb29a [RISCV] Unify scalable- and fixed-vector EXTRACT_SUBVECTOR lowering
This patch unifies the two disparate paths for lowering
EXTRACT_SUBVECTOR operations under one roof. Consequently, with this
patch it is possible to support any fixed-length subvector extraction,
not just "cast-like" ones.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D97192
2021-02-25 11:46:57 +00:00
Craig Topper 159f78fc2f [RISCV] Reuse existing SDLoc and XLenVT in the switch in RISCVISelDAGToDAG::Select. NFC
A SDLoc and XLenVT were already created above the switch.
2021-02-24 21:39:00 -08:00
Craig Topper 9bde29629d [RISCV] Use a ComplexPattern for zexti32 to match sexti32.
We just started using a ComplexPattern for sexti32. This updates
zexti32 to match.

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D97231
2021-02-24 16:06:29 -08:00
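
A hedged sketch of what a zexti32-style ComplexPattern selector can
look like (assumes an LLVM build tree and its SelectionDAG APIs; the
function name and exact checks are illustrative, not the upstream code):

  #include <cstdint>
  #include "llvm/CodeGen/SelectionDAGNodes.h"
  using namespace llvm;

  // Peel (and X, 0xffffffff) and hand the narrower source back to the
  // pattern, mirroring how the sexti32 ComplexPattern peels sign extensions.
  static bool selectZExti32Sketch(SDValue N, SDValue &Val) {
    if (N.getOpcode() == ISD::AND)
      if (auto *C = dyn_cast<ConstantSDNode>(N.getOperand(1)))
        if (C->getZExtValue() == UINT64_C(0xFFFFFFFF)) {
          Val = N.getOperand(0);
          return true;
        }
    return false;
  }
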
Fraser Cormack dd68f3cf28 [RISCV] Support insertion of misaligned subvectors
This patch extends the support for RVV INSERT_SUBVECTOR to cover those
which don't align to a vector register boundary. Like the support for
EXTRACT_SUBVECTOR in D96959, it accomplishes this by extracting the
nearest register-sized subvector (a subregister operation), then sliding
the vector down with VSLIDEDOWN, inserting the subvector to the first
position, and sliding the vector back up again afterwards.

Unlike subvector extraction, for vectors that occupy less than a full
vector register we must preserve the untouched elements. We do this by
lowering to an LMUL=1 INSERT_SUBVECTOR using the above method and
lowering that to a VSLIDEUP with a zero offset. This uses a
tail-undisturbed policy and so has the effect of "sliding in" the
subvector elements while preserving the surrounding ones.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D96972
2021-02-23 10:31:06 +00:00
Craig Topper 3231607ce9 [RISCV] Have sexti32 also recognize AssertZExt from types smaller than i32.
An i64 AssertZExt from a type smaller than i32 has at least 33
leading zeros, which means it has at least 33 sign bits.

Since we have a couple patterns that use two sexti32, I've
switched to a ComplexPattern so tablegen didn't have to generate
9 different permutations.

As noted in the FIXME, maybe we should just call computeNumSignBits,
but we don't have tests that benefit from that yet.

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D97130
2021-02-22 14:56:22 -08:00
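
The leading-zeros argument above can be spelled out with a small sketch
(illustrative arithmetic only):

  // A 64-bit value zero-extended from a type narrower than i32 has at least
  // 33 known-zero high bits, and known zeros are also sign bits, so treating
  // it as sign-extended from i32 is safe.
  #include <cassert>
  #include <cstdint>

  int main() {
    unsigned SrcBits = 16;                     // e.g. AssertZExt from i16
    unsigned LeadingZeros = 64 - SrcBits;      // 48 known-zero high bits
    assert(LeadingZeros >= 33);                // hence >= 33 sign bits
    uint64_t V = 0xBEEF;                       // any zero-extended i16 value
    assert((int64_t)V == (int64_t)(int32_t)V); // sign-extending bit 31 is a no-op
    return 0;
  }
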
Craig Topper 1cd2a5a7da [RISCV] Add isel support for bitcasts between fixed vector types.
This should fix the issue reported in D96972.

I don't have a good test case for this without those changes.

Differential Revision: https://reviews.llvm.org/D97082
2021-02-22 12:05:46 -08:00
Craig Topper 1aeb927fed [RISCV] Custom isel the rest of the vector load/store intrinsics.
A previous patch moved the index versions. This moves the rest.
I also removed the custom lowering for VLEFF since we can now
do everything directly in the isel handling.

I had to update getLMUL to handle mask registers to index the
pseudo table correctly for VLE1/VSE1.

This is good for another 15K reduction in llc size.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D97097
2021-02-22 09:53:46 -08:00
Fraser Cormack 3e1317fd32 [RISCV] Support extraction of misaligned subvectors
This patch extends the support for RVV EXTRACT_SUBVECTOR to cover those
which don't align to a vector register boundary. It accomplishes this by
extracting the nearest register-sized subvector (a subregister
operation), then sliding the vector down with VSLIDEDOWN and extracting
the subvector from the first position (a COPY operation).

Since this procedure involves the use of VSCALE and multiplication, the
handling of such operations is done during lowering to simplify the
implementation and make use of DAG combining. This necessitated moving
some helper functions from RISCVISelDAGToDAG to RISCVTargetLowering.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D96959
2021-02-20 15:43:54 +00:00
Craig Topper 71b68fe532 [RISCV] Teach our custom vector load/store intrinsic isel code to propagate memory operands if we have them.
We don't currently create memory operands for these intrinsics,
but there was a suggestion of using the indexed load/store
intrinsics to implement isel for scalable vector gather/scatter.
That may propagate the memory operand from the gather/scatter
ISD nodes.
2021-02-19 19:12:20 -08:00
Craig Topper 7f5b3886e4 [RISCV] Remove unneeded indexed segment load/store vector pseudo instruction.
We had more combinations of data and index lmuls than we needed.

Also add some asserts to verify that the IndexVT and data VT have
the same element count when we isel these pseudo instructions.
2021-02-19 10:28:48 -08:00
Craig Topper d056d5decf [RISCV] Use custom isel for vector indexed load/store intrinsics.
There are many legal combinations of index and data VTs supported
for these intrinsics. This results in a lot of isel patterns in
RISCVGenDAGISel.inc.

By adding a separate table similar to what we use for segment
load/stores, we can more efficiently manually select these
intrinsics. We should also be able to reuse this table for scalable
vector gather/scatter.

This reduces the llc binary size by ~56K.

Reviewed By: khchen

Differential Revision: https://reviews.llvm.org/D97033
2021-02-19 10:10:06 -08:00
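
A hedged sketch of the property-indexed table idea (field names, values,
and the lookup helper are illustrative, not the real generated table):

  #include <cstdint>

  struct VLXPseudoSketch {
    uint16_t Masked;    // 0 = unmasked, 1 = masked
    uint16_t Ordered;   // ordered vs. unordered indexed access
    uint16_t IndexLMUL; // LMUL of the index operand
    uint16_t DataLMUL;  // LMUL of the data operand
    uint16_t Pseudo;    // target pseudo-instruction opcode (placeholder values)
  };

  static const VLXPseudoSketch Table[] = {
      {0, 0, 1, 1, 100},
      {0, 1, 1, 2, 101},
      {1, 1, 2, 2, 102},
  };

  // Custom isel can search by properties instead of relying on a generated
  // pattern for every (index VT, data VT) combination.
  static const VLXPseudoSketch *lookup(uint16_t Masked, uint16_t Ordered,
                                       uint16_t IdxLMUL, uint16_t DataLMUL) {
    for (const auto &Row : Table)
      if (Row.Masked == Masked && Row.Ordered == Ordered &&
          Row.IndexLMUL == IdxLMUL && Row.DataLMUL == DataLMUL)
        return &Row;
    return nullptr; // no legal combination
  }

  int main() { return lookup(1, 1, 2, 2) ? 0 : 1; }
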
Craig Topper dbf910f0d9 [RISCV] Prevent selecting a 0 VL to X0 for the segment load/store intrinsics.
Just like we do for isel patterns, we need to call selectVLOp
to prevent 0 from being selected to X0 by the default isel.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D97021
2021-02-19 10:07:12 -08:00
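
A hedged sketch of a selectVLOp-style fix (assumes an LLVM build tree;
the function, its parameters, and the exact node it builds are
illustrative, not the upstream selector): X0 in the VL position means
VLMAX, so a literal 0 has to end up in a scratch register rather than
being folded to X0.

  #include "llvm/CodeGen/SelectionDAG.h"
  #include "llvm/CodeGen/SelectionDAGNodes.h"
  using namespace llvm;

  // ADDIOpc and X0Reg stand in for the target's generated enums
  // (e.g. RISCV::ADDI and RISCV::X0) so the sketch stays self-contained.
  static bool selectVLOpSketch(SelectionDAG &DAG, SDValue VL, EVT XLenVT,
                               unsigned ADDIOpc, unsigned X0Reg, SDValue &Out) {
    if (auto *C = dyn_cast<ConstantSDNode>(VL)) {
      if (C->getZExtValue() == 0) {
        // Emit "addi rd, x0, 0" so VL=0 lives in a real GPR, not in X0.
        SDLoc DL(VL);
        Out = SDValue(DAG.getMachineNode(ADDIOpc, DL, XLenVT,
                                         DAG.getRegister(X0Reg, XLenVT),
                                         DAG.getTargetConstant(0, DL, XLenVT)),
                      0);
        return true;
      }
    }
    Out = VL;
    return true;
  }
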
Fraser Cormack d9531a3097 [RISCV] Address some clang-tidy warnings. NFCI. 2021-02-19 12:10:28 +00:00
Craig Topper 8ed3bbbcc3 [RISCV] Split zvlsseg searchable table into 4 separate tables. Index by properties rather than intrinsic ID.
Intrinsic ID is a 32-bit value which made each row of the table 4
byte aligned. The remaining fields used 5 bytes. This meant 3 bytes
of padding per row.

This patch breaks the table into 4 separate tables and indexes them
by properties we know about the intrinsic: NF, masked, strided,
ordered, etc. The indexed load/store tables have no
padding in their rows now.

Altogether, this reduces the size of the llc binary by ~28K.

I'm considering adding similar tables for isel of non-segment
load/store as well to cut down the size of the isel table and
probably improve our isel performance. Those tables would need to be
indexed from intrinsics, IR loads/stores, gathers/scatters, and
RISCVISD opcodes. So having a table that can be indexed without using
intrinsic ID is more flexible.

Reviewed By: HsiangKai

Differential Revision: https://reviews.llvm.org/D96894
2021-02-18 19:00:49 -08:00
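
The padding arithmetic above is easy to reproduce with a sketch (struct
layouts are illustrative, not the exact upstream tables):

  #include <cstdint>
  #include <cstdio>

  struct RowWithID {
    uint32_t IntrinsicID;                        // 4 bytes; forces 4-byte row alignment
    uint8_t NF, Masked, Strided, Ordered, LMUL;  // the remaining 5 bytes of fields
  }; // sizeof == 12 on common ABIs: 3 bytes of padding per row

  struct RowByProperties {
    // With the 32-bit ID gone (the table itself is chosen by properties
    // such as NF/masked/strided/ordered), a row packs with no padding.
    uint8_t NF, Ordered, LMUL, IndexLMUL, Pseudo;
  }; // sizeof == 5

  int main() {
    printf("with ID: %zu bytes/row, by properties: %zu bytes/row\n",
           sizeof(RowWithID), sizeof(RowByProperties));
    return 0;
  }
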
Craig Topper c7dd92e8a5 [RISCV] Support isel of scalable vector bitcasts
These should be NOPs, so we can just replace them with the input. This
matches what SVE does with isel patterns for all permutations.
Custom isel saves us from having to list all permutations for
all LMULs.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D96921
2021-02-18 09:01:13 -08:00
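
A hedged sketch of the "bitcast is a NOP" handling (assumes an LLVM
build tree; the helper name is illustrative, and the upstream code does
this inside the target's Select()):

  #include "llvm/CodeGen/SelectionDAG.h"
  #include "llvm/CodeGen/SelectionDAGISel.h"
  using namespace llvm;

  // A bitcast between equally sized RVV types has no runtime effect, so
  // forward the input to all users instead of emitting any instruction.
  static void selectVectorBitcastSketch(SelectionDAGISel &ISel, SDNode *N) {
    ISel.ReplaceUses(SDValue(N, 0), N->getOperand(0));
    ISel.CurDAG->RemoveDeadNode(N);
  }
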
Benjamin Kramer ae1e6c3557 [RISCV] Rewrite assert to not give unused variable warnings in Release builds
NFCI
2021-02-18 11:42:36 +01:00
Fraser Cormack d876214990 [RISCV] Begin to support more subvector inserts/extracts
This patch adds support for INSERT_SUBVECTOR and EXTRACT_SUBVECTOR
(nominally where both operands are scalable vector types) where the
vector, subvector, and index align sufficiently to allow decomposition
to subregister manipulation:

* For extracts, the extracted subvector must correctly align with the
lower elements of a vector register.
* For inserts, the inserted subvector must be at least one full vector
register, and correctly align as above.

This approach should work for fixed-length vector insertion/extraction
too, but that will come later.

Reviewed By: craig.topper, khchen, arcbbb

Differential Revision: https://reviews.llvm.org/D96873
2021-02-18 10:18:27 +00:00
Craig Topper 3bdd02735b [RISCV] Localize RISCVZvlssegTable to RISCVISelDAGToDAG.cpp, the only place it is used. 2021-02-17 11:37:28 -08:00
Craig Topper d4353a3101 [RISCV] Merge the handlers for masked and unmasked segment loads/stores.
A lot of the code for the masked and unmasked forms is the same. This
patch adds a boolean to handle the differences so we can share
the code.

Differential Revision: https://reviews.llvm.org/D96841
2021-02-17 10:08:33 -08:00
Craig Topper 6f30d0035a [RISCV] Merge the vsetvli and vsetvlimax intrinsic selection
These have very similar code, just with a different number of
operands and handling for vsetivli.

Differential Revision: https://reviews.llvm.org/D96834
2021-02-17 10:08:33 -08:00
Craig Topper 7ba2e1c601 [RISCV] Add support for fixed vector floating point setcc.
This is annoying because the condition code legalization belongs
to LegalizeDAG, but our custom handler runs during LegalizeVectorOps,
which occurs earlier.

This adds some of the mask binary operations so that we can combine
multiple compares that we need for expansion.

I've also fixed up RISCVISelDAGToDAG.cpp to handle copies of masks.

This patch contains a subset of the integer setcc patch as well.
That patch is dependent on the integer binary ops patch. I'll rebase
based on what order the patches go in.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D96567
2021-02-15 12:52:25 -08:00
Craig Topper 3520371ddb [RISCV] Rename the RVVBaseAddr ComplexPattern to just BaseAddr and use it to merge some scalar load/store patterns too. 2021-02-13 12:01:51 -08:00
Craig Topper d32ed9b27e [RISCV] Use a ComplexPattern to merge the PatFrags for removing unneeded masks on shift amounts.
Rather than having patterns with and without an AND, use a
ComplexPattern to handle both cases.

Reduces the isel table by about 700 bytes.
2021-02-12 14:03:23 -08:00
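
A hedged sketch of merging the masked and unmasked shift-amount forms
with one ComplexPattern-style selector (assumes an LLVM build tree; the
name and the exact-mask check are illustrative and more conservative
than the upstream code):

  #include <cstdint>
  #include "llvm/CodeGen/SelectionDAGNodes.h"
  using namespace llvm;

  static bool selectShiftMaskSketch(SDValue ShAmt, unsigned ShiftWidth,
                                    SDValue &Out) {
    // (and X, ShiftWidth-1) on the shift amount is redundant: the hardware
    // only reads the low log2(ShiftWidth) bits anyway, so drop the AND.
    if (ShAmt.getOpcode() == ISD::AND)
      if (auto *C = dyn_cast<ConstantSDNode>(ShAmt.getOperand(1)))
        if (C->getZExtValue() == ShiftWidth - 1) {
          Out = ShAmt.getOperand(0);
          return true;
        }
    // Otherwise accept the amount unchanged; one pattern now covers both
    // the masked and unmasked forms.
    Out = ShAmt;
    return true;
  }
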
Craig Topper 875c76de2b [RISCV] Add support for matching .vx and .vi forms of binary instructions for fixed vectors.
Unlike scalable vectors, I'm only using a ComplexPattern for
the immediate itself. The vmv_v_x is matched explicitly. We ignore
the VL argument when matching a binary operator, but we do check
it when matching splat directly.

I left out tests for vXi64 as they fail on rv32 right now.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D96365
2021-02-12 09:18:10 -08:00
ShihPo Hung 9e62c9146d [RISCV] Initial support for insert/extract subvector
This patch handles cast-like insert_subvector & extract_subvector,
i.e. cases where:
1. the index starts from 0, and
2. a fixed-width vector is inserted into a scalable vector,
   or a fixed-width vector is extracted from a scalable vector.

Reviewed By: craig.topper, frasercrmck

Differential Revision: https://reviews.llvm.org/D96352
2021-02-11 14:35:49 -08:00
Craig Topper fd5adae02c [RISCV] Remove SRO* and SLO* instructions from bitmanip.
As of the current draft, these are no longer being considered
for the bitmanip spec. It wasn't clear which sub-extension they
belonged to in the 0.93 spec.

So remove them. They can always be added back if something changes.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D96157
2021-02-09 09:35:05 -08:00
Hsiangkai Wang c7189ba785 [RISCV] Add new vector instructions in v0.10.
* Add new vector instructions in v0.10.
 - load/store for mask values: vle1.v, vse1.v
 - vsetivli for immediate vector lengths of 0-31.
* Rename vector instructions in v0.10.
 - vfrsqrte7 -> vfrsqrt7
 - vfrece7 -> vfrec7
* Reserve memory width encodings for EEW>128b.

Differential Revision: https://reviews.llvm.org/D95781
2021-02-03 13:28:58 +08:00
Craig Topper 912306ef21 [RISCV] Use a ComplexPattern to merge isel patterns for vector load/store with GPR and FrameIndex addresses.
This reduces the isel table size by about 3000 bytes.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D95844
2021-02-02 10:20:52 -08:00
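
A hedged sketch of a SelectBaseAddr-style ComplexPattern (assumes an
LLVM build tree; the name and exact handling are illustrative, not the
upstream code):

  #include "llvm/CodeGen/SelectionDAG.h"
  #include "llvm/CodeGen/SelectionDAGNodes.h"
  using namespace llvm;

  static bool selectBaseAddrSketch(SelectionDAG &DAG, SDValue Addr,
                                   SDValue &Base) {
    // A FrameIndex must be rewritten to a TargetFrameIndex for the generated
    // matcher to accept it; any other address (a GPR value) is used as-is,
    // so one pattern covers both kinds of base address.
    if (auto *FIN = dyn_cast<FrameIndexSDNode>(Addr)) {
      Base = DAG.getTargetFrameIndex(FIN->getIndex(), Addr.getValueType());
      return true;
    }
    Base = Addr;
    return true;
  }
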
Craig Topper e7f9a83499 [RISCV] Replace NoX0 SDNodeXForm with a ComplexPattern to do the selection of the VL operand.
I think this is a more standard way of doing this.

Reviewed By: rogfer01

Differential Revision: https://reviews.llvm.org/D95833
2021-02-02 00:08:58 -08:00