This patch handles cast-like insert_subvector & extract_subvector
in which case:
1. index starts from 0.
2. inserting a fixed-width vector into a scalable vector,
or extracting a fixed-width vector from a scalable vector.
Reviewed By: craig.topper, frasercrmck
Differential Revision: https://reviews.llvm.org/D96352
This refines how we determine which masks types are legal and adds
support for loads, stores, and all ones/zeros splats.
I left a fixme in store handling where I think we need to zero
extra bits if the type isn't a multiple of a byte. If I remember
right from X86 there was some case we could have a store of a
1, 2, or 4 bit mask and have a scalar zextload that then expected the
bits to be 0. Its tricky to zero the bits with RVV. We need to do
something like round VL up, zero a register, lower the VL back down,
then do a tail undisturbed move into the zero register. Another
option might be to generate a mask of 1/2/4 bits set with a VL of 8
and use that to mask off the bits.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D96468
The test cases extract a fixed element from a vector and splat it
into a vector. This gets DAG combined into a splat shuffle.
I've used some very wide vectors in the test to make sure we have
at least a couple tests where the element doesn't fit into the
uimm5 immediate of vrgather.vi so we fall back to vrgather.vx.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D96186
This patch optimizes a build_vector "index sequence" and lowers it to
the existing custom RISCVISD::VID node. This pattern is common in
autovectorized code.
The custom node was updated to allow it to be used by both scalable and
fixed-length vectors, thus avoiding pattern duplication.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D96332
As of the current draft these are no longer being considered
for the bitmanip spec. It wasn't clear what sub extension they
belonged in in the 0.93 spec.
So remove them. They can always be added back if something changes.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D96157
Commit a2d19bad07 introduced a
dependency in the RISCV disassembler on two additional libraries
(MC, RISCVDesc) which wasn't added to the CMakeLists.txt. This
causes shared library builds to break. This patch just adds them
to fix failures seen on some bots, such as the PPC64LE Multistage.
In vector v0.10, there are whole vector register load/store
instructions. I suggest to use the whole register load/store
instructions for generic load/store for scalable vector types. It could
save up vset{i}vl{i} for these load/store.
For fractional LMUL, I keep to use vle{eew}.v/vse{eew}.v instructions to
load/store partial vector registers.
Differential Revision: https://reviews.llvm.org/D95853
Define an option -riscv-vector-bits-max to specify the maximum vector
bits for vectorizer. Loop vectorizer will use the value to check if it
is safe to use the whole vector registers to vectorize the loop.
It is not the optimum solution for loop vectorizing for scalable vector.
It assumed the whole vector registers will be used to vectorize the code.
If it is possible, we should configure vl to do vectorize instead of
using whole vector registers.
We only consider LMUL = 1 in this patch.
This patch just an initial work for loop vectorizer for RISC-V Vector.
Differential Revision: https://reviews.llvm.org/D95659
Building on the fixed vector support from D95705
I've added ISD nodes for vmv.v.x and vfmv.v.f and switched to
lowering the intrinsics to it. This allows us to share the same
isel patterns for both.
This doesn't handle splats of i64 on RV32 yet. The build_vector
gets converted to a vXi32 build_vector+bitcast during type
legalization. Not sure the best way to handle this at the moment.
Differential Revision: https://reviews.llvm.org/D96108
This is an alternative to D95563.
This is modeled after a similar feature for AArch64's SVE that uses
predicated scalable vector instructions.a
Rather than use predication, this patch uses an explicit VL operand.
I've limited it to always use LMUL=1 for now, but we can improve this
in the future.
This requires a bunch of new ISD opcodes to carry the VL operand.
I think we can probably lower intrinsics to these ISD opcodes to
cut down on the size of the isel table. Which is why I've added
patterns for all integer/float types and not just LMUL=1.
I'm only testing one vector width right now, but the width is
programmable via the command line.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D95705
This adds support for commuting operands and converting between
vfmadd and vfmacc to avoid register copies.
To avoid messing up intrinsic behavior, I've added new pseudo
instructions that have the isCommutable flag set. These pseudos also
force a tail agnostic policy. The intrinsic version still use
the tail undisturbed policy.
For best results it looks like we need to start with fmadd and only
pick fmacc if its beneficial. MachineCSE commutes without contraining
the operands and then commutes back if it didn't help with CSE. So
I've made sure that when the operand choice isn't constrained, we
will keep fmadd for MachineCSE and when it does the second commute,
we get back the original instruction.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D95800
This ensures that we'll match immediates consistently regardless
of whether we match them as a standalone splat or as part of
another operation.
While I was there I added complexities to the simm5/uimm5 patterns so
we didn't have to assume that the 1 on the non-immediate was lower
than what tablegen inferred.
I had to make a minor tweak to tablegen to fix one place that
didn't expect to see a ComplexPattern that wasn't a "leaf".
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D96199
This patch adds support for both the fadd reduction intrinsic, in both
the ordered and unordered modes.
The fmin and fmax intrinsics are not currently supported due to a
discrepancy between the LLVM semantics and the RVV ISA behaviour with
regards to signaling NaNs. This behaviour is likely fixed in version 2.3
of the RISC-V F/D/Q extension, but until then the intrinsics can be left
unsupported.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D95870
This patch adds support for the integer reduction intrinsics supported
by RVV. This excludes "mul" which has no corresponding instruction.
The reduction instructions in RVV have slightly complicated type
constraints given they always produce a single "M1" vector register.
They are lowered to custom nodes including the second "scalar" reduction
operand to simplify the patterns and in the hope that they can be useful
for future DAG combines.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D95620
This patch custom-legalizes all integer EXTRACT_VECTOR_ELT nodes where
SEW < XLEN to VMV_S_X nodes to help the compiler infer sign bits from
the result. This allows us to eliminate redundant sign extensions.
For parity, all integer EXTRACT_VECTOR_ELT nodes are legalized this way
so that we don't need TableGen patterns for some and not others.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D95741
This patch adds support for lowering the sqrt intrinsic to the RVV
vfsqrt instruction.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D96012
The vrgather.vv instruction uses a vector of indices with the same
SEW as operand 0. The vrgather.vx instructions use a scalar index
operand of XLen bits.
By splitting this into 2 intrinsics we are able to use LLVMatchType
in the definition to avoid specifying the type for the index operand
when creating the IR for the intrinsic. For .vv it will match the
operand 0 type. And for .vx it will match the type of the vl operand
we already needed to specify a type for.
I'm considering splitting more intrinsics. This was a somewhat
odd one because the .vx doesn't use the element type, it always
use XLen.
Reviewed By: HsiangKai
Differential Revision: https://reviews.llvm.org/D95979
Due to a clerical error, the sdiv operation was mapping to vdivu and
udiv to vdiv, when the opposite mapping is the correct one.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D95869
A follow up patch will add support for commuting operands or
changing opcode to vfmacc and friends.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D95662
Rather than materializing the 0xffff immediate for the AND, use
a shift left to remove the upper bits and then shift in zeros
from the right.
This pattern occurs when type legalizing an i16 right shift.
I've implemented this with custom selection code for a number of
reasons. I've limited this to the AND having a single use. We need
to compensate for SimplifyDemandedBits altering the AND mask. I'm
using *W opcodes on RV64. We may want to generlize this in the
future. For all these reason it seemed easiest to do it this way.
Reviewed By: luismarques
Differential Revision: https://reviews.llvm.org/D95774
We need to add a mask to the shift amount for these operations
to use the FSR/FSL instructions. We were previously doing this
in isel patterns, but custom lowering will make the mask
visible to optimizations earlier.
Not all combinations of SEW and LMUL we need to support. For example, we
only need to support [M1, M2, M4, M8] for SEW = 64. There is no need to
define pseudos for PseudoVLSE64MF8, PseudoVLSE64MF4, and PseudoVLSE64MF2.
Differential Revision: https://reviews.llvm.org/D95667
Various *TargetStreamer.h need formatted_raw_ostream but rely on a
forward declaration of formatted_raw_ostream in MCStreamer.h. This
patch adds forward declarations right in *TargetStreamer.h.
While we are at it, this patch removes the one in MCStreamer.h, where
it is unnecessary.
This patch allows targets to define multiple cost
values for each register so that the cost model
can be more flexible and better used during the
register allocation as per the target requirements.
For AMDGPU the VGPR allocation will be more efficient
if the register cost can be associated dynamically
based on the calling convention.
Reviewed By: qcolombet
Differential Revision: https://reviews.llvm.org/D86836
These instructions have been removed from the 0.94 bitmanip spec.
We should focus on optimizing the codegen without using them.
Reviewed By: asb
Differential Revision: https://reviews.llvm.org/D95302
This patch adds support for the full range of vector int-to-float,
float-to-int, and float-to-float conversions on legal types.
Many conversions are supported natively in RVV so are lowered with
patterns. These include conversions between (element) types of the same
size, and those that are half/double the size of the input. When
conversions take place between types that are less than half or more
than double the size we must lower them using sequences of instructions
which go via intermediate types.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D95447
In d2927f786e, I added patterns
to remove (and X, 31) from sllw/srlw/sraw shift amounts.
There is code in SelectionDAGISel.cpp that knows to use
computeKnownBits to fill in bits of the mask that were removed
by SimplifyDemandedBits based on bits being known zero.
The non-W shift patterns use immbottomxlenset which allows the
mask to have more than log2(xlen) trailing ones, but doesn't
have a call to computeKnownBits to fill in bits of the mask that may
have been cleared by SimplifyDemandedBits.
This patch copies code from X86 to handle more than log2(xlen)
bottom bits set and uses computeKnownBits to fill in missing bits
before counting.
Reviewed By: luismarques
Differential Revision: https://reviews.llvm.org/D95422
RISCVBaseInfo.h belongs to the MC layer, but the Pseudo instructions
are only used by the CodeGen layer. So it makes sense to keep this
table in the CodeGen layer.
-Remove the ISD opcode for READ_VL. Just emit the MachineSDNode directly.
-Move segmented fault first only load intrinsic handling completely to
RISCVISelDAGToDAG.cpp and emit the ReadVL MachineSDNode there
instead of lowering to ISD opcodes first.
Remove the RISCVVMVTs namespace because I don't think it provides
a lot of value. If we change the mappings we'd likely have to add
or remove things from the list anyway.
Add a wrapper around addRegisterClass that can determine the
register class from the fixed size of the type.
Reviewed By: frasercrmck, rogfer01
Differential Revision: https://reviews.llvm.org/D95491
This patch fixes some crashes coming from
`RISCVISelLowering::getSetCCResultType`, which would occasionally return
an EVT constructed from an invalid MVT, which has a null Type pointer.
The attached test shows this happening currently for some fixed-length
vectors, which hit this issue when the V extension was enabled, even
though they're not legal types under the V extension. The fix was also
pre-emptively extended to scalable vectors which can't be represented as
an MVT, even though a test case couldn't be found for them.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D95434
Move the Suffix string into the VTypeInfo class so we don't need a helper class to get to it.
Adjust pseudo naming scheme for FPRs to put F16/F32/F64 in
place of F in the pseudo instruction name rather than as a suffix.
This avoids special cases like VFMERGE from the original patch.
Differential Revision: https://reviews.llvm.org/D95404
When spilling, the spill size will depend on the size of register class.
For .vf vector instructions, it may spill the floating point scalar
argument. In order to use the correct load/store instructions for
spilling, we need to provide the correct floating point register class
for the .vf vector pseudo instructions.
In this commit, we define the .vf pseudo instructions as three
different kinds of pseudo instructions for half/float/double. For
example, PseudoVFADD_M1 will become as PseudoVFADD_F16_M1,
PseudoVFADD_F32_M1, and PseudoVFADD_F64_M1.
Differential Revision: https://reviews.llvm.org/D95234
This pattern can occur when an unsigned is used to index an array
on RV64.
Reviewed By: luismarques
Differential Revision: https://reviews.llvm.org/D95290
Original patch by @rogfer01.
This patch adds support for insertelt and extractelt operations on
scalable vectors.
Special care must be taken on RV32 when dealing with i64 vectors as
there are no straightforward ways to insert a 64-bit element without a
register of that size. To that end, both are custom-lowered to different
sequences.
Authored-by: Roger Ferrer Ibanez <rofirrim@gmail.com>
Co-Authored-by: Fraser Cormack <fraser@codeplay.com>
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D94615
This makes our i8/i16 codegen more similar to the i32 codegen.
I've also added computeKnownBits support for DIVUW/REMUW so
that we can remove zero extending ANDs from the output. Without
this we end up turning DIVUW/REMUW back into DIVU/REMU via some
isel patterns.
Reviewed By: frasercrmck, luismarques
Differential Revision: https://reviews.llvm.org/D95322
As far as I know 32 bits arguments and returns on RV64 are always
sign extended to i64. So I think we should be taking this into
account around libcalls.
Reviewed By: luismarques
Differential Revision: https://reviews.llvm.org/D95285
This patch adds support for scalable-vector splats in DAGCombiner's
`isConstantOrConstantVector` and `ISD::matchUnaryPredicate` functions,
which enable the SelectionDAG div/rem-by-constant optimizations for
scalable vector types.
It also fixes up one case where the UDIV optimization was generating a
SETCC without first consulting the target for its preferred SETCC result
type.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D94501
This adds support for ".attribute arch" for all extensions that are
currently supported by the compiler.
Differential Revision: https://reviews.llvm.org/D94931
The patterns that use this really want to know if the operand has at
least 32 sign/zero bits.
This increases opportunities to use W instructions when the original
source used i8/i16. Not sure how much this matters for performance,
but it makes i8/i16 code more consistent with i32.
This avoids being dependent on SimplifyDemandedBits having cleared
those bits.
It could make sense to teach SimplifyDemandedBits to keep all
lower bits 1 in an AND mask when possible. This could be
implemented with slli+srli in the general case rather than
needing to materialize the constant.
We try to do this during DAG combine with SimplifyDemandedBits,
but it fails if there are multiple nodes using the AND. For
example, multiple shifts using the same shift amount.
Similar to our free standing setcc patterns, we can use ADDI to
subtract the immediate from the other operand. Then the cmov
can check if the result is zero or non-zero.
Reviewed By: mundaym
Differential Revision: https://reviews.llvm.org/D95169
This adds an initial set of patterns for these instructions. Its
more complicated that I would like for the sh*add.uw instructions
because there is no guaranteed canonicalization for shl/and with
constants.
Reviewed By: asb
Differential Revision: https://reviews.llvm.org/D95106
These instructions use a portion of the encodings for grevi and
gorci. The full encodings are only supported with Zbp. Note,
rev8 has a different encoding between rv32 and rv64.
Zbb is closer to being finalized that Zbp which has motivated
some decisions in this patch.
I'm treating rev8 and orc.b as separate instructions when
either Zbb or Zbp is enabled. This allows us to print to suggest
that either feature needs to be enabled to support these mnemonics.
I had tried to put HasStdExtZbbAndNotZbp on the Zbb instructions,
but that caused a diagnostic that said Zbp is required if neither
feature is enabled. We should really mention Zbb since its closer
to final.
This does require extra isel patterns for the different cases so
that bswap will always print as rev8 in assembly listing since
we can't use an InstAlias.
llvm-objdump disassembling should always pick the rev8 or orc.b
instructions. llvm-mc parsing and printing text will not convert
the grevi/gorci spellings to rev8/gorc.b. We could probably fix
this with a special case in processInstruction in the assembly
parser if it its important.
Reviewed By: asb, frasercrmck
Differential Revision: https://reviews.llvm.org/D94944
zext.h uses the same encoding as pack rd, rs, x0 in rv32 and
packw rd, rs, x0 in rv64. Encodings without x0 as the second source
are not valid in Zbb.
I've added two new instructions with these specific encodings with
predicates that enable them when either Zbb or Zbp is enabled.
The pack spelling will only be accepted with Zbp. The disassembler
will use the zext.h instruction when either feature is enabled.
Using the pack spelling will print as pack when llvm-mc is
emitting text. We could fix this with some custom code in
processInstruction if this is important, but I'm not sure it is.
Reviewed By: asb, frasercrmck
Differential Revision: https://reviews.llvm.org/D94818
Zext.h will need to come back to Zbb, but that only uses specific
encodings of pack.
Reviewed By: asb, frasercrmck
Differential Revision: https://reviews.llvm.org/D94742
This didn't make it into the published 0.93 spec, but it was the
intention.
But it is in the tex source as of this commit
d172f029c0
This means zext.w now requires Zba. Not sure if we should still use
pack if Zbp is enabled and Zba isn't. I'll leave that for the future
when pack is closer to being final.
Reviewed By: asb, frasercrmck
Differential Revision: https://reviews.llvm.org/D94736
The 0.93 spec has this implementation for add.uw
uint_xlen_t adduw(uint_xlen_t rs1, uint_xlen_t rs2) {
uint_xlen_t rs1u = (uint32_t)rs1;
return rs1u + rs2;
}
The 0.92 spec had the usages of rs1 and rs2 swapped.
Reviewed By: frasercrmck, asb
Differential Revision: https://reviews.llvm.org/D95090
Also renamed Zbe instructions to resolve name conflict even though
that change is in the 0.94 draft.
Reviewed By: asb, frasercrmck
Differential Revision: https://reviews.llvm.org/D94653
It's not really clear in the spec that these are in Zbp now, but
that's what I've gather from previous commits to the spec. I've
file an issue to get it documented properly.
Reviewed By: asb, frasercrmck
Differential Revision: https://reviews.llvm.org/D94652
This is the first of multiple patches to bring our 0.92
implementation up to 0.93.
Reviewed By: asb, frasercrmck
Differential Revision: https://reviews.llvm.org/D94568
The DWARF numbers of vector registers are already defined in
riscv-elf-psabi. The DWARF number for vector is start from 96.
Correct the DWARF numbers of vector registers.
Differential Revision: https://reviews.llvm.org/D94749
These instructions produce 2*SEW result so the input can't have
an LMUL=8 or the result would need a non-existant LMUL=16. So
only create pseudos for LMUL up to 4.
Differential Revision: https://reviews.llvm.org/D95189
The fault-only-first-load instructions can reduce VL if an element
other than element 0 triggers a memory fault. This can be used to
vectorize loops with data dependent exit conditions like strcmp or
strlen.
This patch adds a VL output to these intrinsics so that the new
VL value can be captured by software. This will be expanded to
'csrr gpr, vl' after the vleff instruction during SelectionDAG.
By doing this with one intrinsic we are able to guarantee that the
csrr reads the VL value produced by the vleff instruction. Having
it as a separate intrinsic would make it impossible to guarantee
ordering without making every other vector intrinsic have side
effects.
The intrinsics are expanded during lowering into two ISD nodes
that are glued together. These ISD nodes will go
through isel separately, but should maintain the glue so that they
get emitted adjacently by InstrEmitter.
I've only ran the chain through the vleff instruction, allowing
the READ_VL to be deleted if it is unused.
Reviewed By: HsiangKai
Differential Revision: https://reviews.llvm.org/D94286
Upgrade RISC-V V extension to v1.0-08a0b46.
Indexed load/store have ordered and unordered form.
New whole vector load/store.
Differential Revision: https://reviews.llvm.org/D93614
This recommits 71ed4b6ce5 with
the polarity of some of the pattern corrected.
Original commit message:
The custom expansion of select operations in the RISC-V backend
interferes with the matching of cmov instructions. Legalizing
select when the Zbt extension is available solves that problem.
Reviewed By: luismarques, craig.topper
Differential Revision: https://reviews.llvm.org/D93767
Previously we only matched (and (shl X, C1), 0xffffffff << C1)
which matches the InstCombine canonicalization order. But its
possible to see (shl (and X, 0xffffffff), C1) if the pattern
is introduced in SelectionDAG. For example, through expansion of
a GEP.
There can be muliple patterns that map to the same compressed
instruction. Reversing those leads to multiple ways to uncompress
an instruction, but its not easily controllable which one will
be chosen by the tablegen backend.
This patch adds a flag to mark patterns that should only be used
for compressing. This allows us to leave one canonical pattern
for uncompressing.
The obvious benefit of this is getting c.mv to uncompress to
the addi patern that is aliased to the mv pseudoinstruction. For
the add/and/or/xor/li patterns it just removes some unreachable
code from the generated code.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D94894
For Zvlsseg, we need continuous vector registers for the values. We need
to define new register classes for the different combinations of (number
of fields and LMUL). For example,
when the number of fields(NF) = 3, LMUL = 2, the values will be assigned
to (V0M2, V2M2, V4M2), (V2M2, V4M2, V6M2), (V4M2, V6M2, V8M2), ...
We define the vlseg intrinsics with multiple outputs. There is no way to
describe the codegen patterns with multiple outputs in the tablegen
files. We do the codegen in RISCVISelDAGToDAG and use EXTRACT_SUBREG to
extract the values of output.
The multiple scalable vector values will be put into a struct. This
patch is depended on the support for scalable vector struct.
Differential Revision: https://reviews.llvm.org/D94229
Make it easier to reuse for intrinsic vrgatherei16
which needs to encode both LMUL & EMUL in the instruction name,
like PseudoVRGATHEREI16_VV_M1_M1 and PseudoVRGATHEREI16_VV_M1_M2.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D94951
NotHasStdExtZbb doesn't have an AssemblerPredicate associated with it
so it didn't do anything. We don't need it either because the sorting
rules in tablegen prioritize by number of predicates. So the
dedicated instructions in the B extension that have predicates
will be prioritized automatically.
If we are able to compare with 0 instead of 1, we might be able
to fold the setcc into a beqz/bnez.
Often these setccs start life as an xor that gets converted to
a setcc by DAG combiner's rebuildSetcc. I looked into a detecting
(xor X, 1) and converting to (seteq X, 0) based on boolean contents
being 0/1 in rebuildSetcc instead of using computeKnownBits. It was
very perturbing to AMDGPU tests which I didn't look closely at.
It had a few changes on a couple other targets, but didn't seem
to be much if any improvement.
Reviewed By: lenary
Differential Revision: https://reviews.llvm.org/D94730
Original patch by @rogfer01.
This patch adds support for sign-, zero-, and any-extension from
scalable mask vector types to integer vector types, as well as
truncation in the opposite direction.
Authored-by: Roger Ferrer Ibanez <rofirrim@gmail.com>
Co-Authored-by: Fraser Cormack <fraser@codeplay.com>
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D94590
This patch factors out the "VLMax" operand passed to most
scalable-vector ISel patterns into a property of each VType.
This is seen as a preparatory change to allow RVV in the future to
more easily support fixed-length vector types with constrained vector
lengths, with the AVL operand set to the length of the fixed-length
vector. It has no effect on the scalable code generation path.
Reviewed By: HsiangKai
Differential Revision: https://reviews.llvm.org/D94594
Original patch by @rogfer01.
This patch supports vector truncates, which on RVV must be done in a
series of instructions truncating by one power-of-two at a time. This is
done through custom-lowering and a custom node to avoid LLVM
re-combining the split TRUNCATE nodes.
Authored-by: Roger Ferrer Ibanez <rofirrim@gmail.com>
Co-Authored-by: Fraser Cormack <fraser@codeplay.com>
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D94796
The vcompress intrinsic is defined such that it requires a tail
undisturbed policy. This patch makes it so we can use the tail
agnostic policy if the user has passed vundefined to the dest
operand.
We need to do something similar for masked policy, but we need
annotation of which instructions use the mask policy first.
Not sure if this is sufficient for scheduling or if we'll need to
select different pseudos that don't have a tied def.
Reviewed By: evandro
Differential Revision: https://reviews.llvm.org/D94566
According to "9. Vector Memory Alignment Constraints" in V
specification, the alignment of vector memory access is aligned to the
size of the element. In our current implementation, we support ELEN up
to 64. We could assume the alignment of vector registers is 64 under the
assumption.
Differential Revision: https://reviews.llvm.org/D94751
SimplifyDemandedBits can remove set bits from immediates from instructions
like AND/OR/XOR. This can prevent them from being efficiently
codegened on RISCV.
This adds an initial version that tries to keep or form 12 bit
sign extended immediates for AND operations to enable use of ANDI.
If that doesn't work we'll try to create a 32 bit sign extended immediate
to use LUI+ADDIW.
More optimizations are possible for different size immediates or
different operations. But this is a good starting point that already
has test coverage.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D94628
I noticed in D94450 that there were quite a few places where we generate
the sequence:
```
xN <- comparison ...
xN <- xor xN, 1
bnez xN, symbol
```
Given we know the XOR will be used by BRCOND, which only looks at the lowest
bit, I think we can remove the XOR and just invert the branch condition in
these cases?
The case mostly seems to come up in floating point tests, where there is often
more logic to combine the results of multiple SETCCs, rather than a single
(BRCOND (SETCC ...) ...).
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D94535
Some FP compares expand to a sequence ending with (xor X, 1) to invert the result. If
the consumer is a select_cc we can likely get rid of this xor by fixing
up the select_cc condition.
This patch combines (select_cc (xor X, 1), 0, setne, trueV, falseV) -
(select_cc X, 0, seteq, trueV, falseV) if we can prove X is 0/1.
Reviewed By: lenary
Differential Revision: https://reviews.llvm.org/D94546
MCTargetDesc includes headers from Utils and Utils includes headers
from MCTargetDesc. So from a library layering perspective it makes sense
for them to be in the same library. I guess the other option might be to
move the tablegen includes from RISCVMCTargetDesc.h to RISCVBaseInfo.h
so that RISCVBaseInfo.h didn't need to include RISCVMCTargetDesc.h.
Everything else that depends on Utils also depends on MCTargetDesc so
having one library seemed simpler.
Differential Revision: https://reviews.llvm.org/D93168
This patch custom lowers ISD::VSCALE into a csrr vlenb followed
by a shift right by 3 followed by a multiply by the scale amount.
I've added computeKnownBits support to indicate that the csrr vlenb
always produces 3 trailng bits of 0s so the shift right is "exact".
This allows the shift and multiply sequence to be nicely optimized
into a single shift or removed completely when the scale amount is
a power of 2.
The non power of 2 case multiplying by 24 is still producing
suboptimal code. We could remove the right shift and use a
multiply by 3. Hopefully we can improve DAG combine to fix that
since it's not unique to this sequence.
This replaces D94144.
Reviewed By: HsiangKai
Differential Revision: https://reviews.llvm.org/D94249
The custom expansion of select operations in the RISC-V backend
interferes with the matching of cmov instructions. Legalizing
select when the Zbt extension is available solves that problem.
Reviewed By: lenary, craig.topper
Differential Revision: https://reviews.llvm.org/D93767
We can use a 0 immediate to avoid needing to materialize 0 into
an FPR first.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D94459
Define the `vfclass` IR intrinsics for the respective V instructions.
Authored-by: Roger Ferrer Ibanez <rofirrim@gmail.com>
Co-Authored-by: Evandro Menezes <evandro.menezes@sifive.com>
Differential Revision: https://reviews.llvm.org/D94356
Original patch by @rogfer01.
This patch adds ISel patterns for the above operations to the
corresponding vector/vector and vector/scalar RVV instructions, as well
as extra patterns to match operand-swapped scalar/vector vfrsub and
vfrdiv.
Authored-by: Roger Ferrer Ibanez <rofirrim@gmail.com>
Co-Authored-by: Fraser Cormack <fraser@codeplay.com>
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D94408
Original patch by @rogfer01.
All ordered comparisons except ONE are supported natively, and all
unordered comparisons except UNE are expanded into sequences involving
explicit NaN checks and mask arithmetic.
Additionally, we expand GT,OGT,GE,OGE to their swapped-operand versions, and
pattern-match those back to the "original", swapping operands once more. This
way we catch both operations and both "vf" and "fv" forms with fewer patterns.
Also add support for floating-point splat_vector, with an optimization for
splatting fpimm0.
Authored-by: Roger Ferrer Ibanez <rofirrim@gmail.com>
Co-Authored-by: Fraser Cormack <fraser@codeplay.com>
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D94242
The Pseudo class sets isCodeGenOnly=1 which causes the asm strings
in the pseudos to be ignored. I think this is why the aliases are
needed at all.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D94024
This patch moves all but the BaseInstr to bits in TSFlags.
For the index fields, we can just use a bit to indicate their presence.
The locations of the operands are well defined.
This reduces the llc binary by about 32K on my build. It also
removes the binary search of the table from the custom inserter.
Instead we just check that the SEW op is present.
Reviewed By: rogfer01
Differential Revision: https://reviews.llvm.org/D94375
This makes the mask align with the position of the bits in TSFlags
which is a little more logical.
I might be adding more fields to TSFlags and some might be single
bits where just ANDing with mask to test the bit would make sense.
While there rename TargetFlags in validateInstruction to reflect
that it's just the constraint bits.
We currently have about 7000 opcodes in the RISCVGenInstrInfo.inc
enum. We can use uint16_t to store these values. We would need to
grow by nearly 9x before we run out of space so this should last
for a little while.
This reduces the llc binary by 32K.
Original patch by @rogfer01.
The RVV integer comparison instructions are defined in such a way that
many LLVM operations are defined by using the "opposite" comparison
instruction and swapping the operands. This is done in this patch in
most cases, except for the mappings where the immediate range must be
adjusted to accomodate:
va < i --> vmsle{u}.vi vd, va, i-1, vm
va >= i --> vmsgt{u}.vi vd, va, i-1, vm
That is left for future optimization; this patch supports all operations
but in the case of the missing mappings the immediate will be moved to
a scalar register first.
Since there are so many condition codes and operand cases to check, it
was decided to reduce the test burden by only testing the "vscale x 8"
vector types.
Authored-by: Roger Ferrer Ibanez <rofirrim@gmail.com>
Co-Authored-by: Fraser Cormack <fraser@codeplay.com>
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D94168
The TableGen immAllOnesV and immAllZerosV helpers implicitly wrapped the
ISD::isBuildVectorAll(Ones|Zeros) helper functions. This was inhibiting
their use for targets such as RISC-V which use ISD::SPLAT_VECTOR. In
particular, RISC-V had to define its own 'vnot' fragment.
In order to extend the scope of these nodes to include support for
ISD::SPLAT_VECTOR, two new ISD predicate functions have been introduced:
ISD::isConstantSplatVectorAll(Ones|Zeros). These effectively supersede
the older "isBuildVector" predicates, which are now simple wrappers for
the new functions. They pass a defaulted boolean toggle which preserves
the old behaviour. It is hoped that in time all call-sites can be ported
to the "isConstantSplatVector" functions.
While the use of ISD::isBuildVectorAll(Ones|Zeros) has not changed, the
behaviour of the TableGen immAll(Ones|Zeros)V **has**. To test the new
functionality, the custom RISC-V TableGen fragment has been removed and
replaced with the built-in 'vnot'. To test their use as pattern-roots, two
splat patterns have been updated accordingly.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D94223
This is a first change needed to fix a crash in which the emergency
spill splot ends being out of reach. This happens when we run the
register scavenger after we have eliminated the frame indexes. The fix
for the actual crash will come in a later change.
This change removes an extra stack size increase we do in
RISCVFrameLowering::determineFrameLayout.
We don't have to change the size of the stack here as
PEI::calculateFrameObjectOffsets is already doing this with the right
size accounting the extra alignment.
Differential Revision: https://reviews.llvm.org/D89237
1. Break MUL with specific constant to a SLLI and an ADD/SUB on riscv32
with the M extension.
2. Break MUL with specific constant to two SLLI and an ADD/SUB, if the
constant needs a pair of LUI/ADDI to construct.
Reviewed by: craig.topper
Differential Revision: https://reviews.llvm.org/D93619
Define the `vfsqrt` IR intrinsics for the respective V instructions.
Authored-by: Roger Ferrer Ibanez <rofirrim@gmail.com>
Co-Authored-by: Evandro Menezes <evandro.menezes@sifive.com>
Differential Revision: https://reviews.llvm.org/D93745
The patterns that want to use 'vnot' use a custom PatFrag. This is
because 'vnot' uses immAllOnesV which implicitly uses BUILD_VECTOR
rather than SPLAT_VECTOR.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D94078