Reset the tracked emitted instructions when starting scheduling on a new
region.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D90347
V_DIV_SCALE_F32/F64 are VOP3B encoded so they can't use the ABS src
modifier, but they can still use NEG and the usual output modifiers.
This partially reverts 3b99f12a4e "AMDGPU: Remove modifiers from v_div_scale_*".
Differential Revision: https://reviews.llvm.org/D90296
When moving +0.0 into a float vector, we can use to vi*gpr variants of
INS.
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D90176
If most elements of BUILD_VECTOR are the same, with a few different
elements, it is better to use DUP for the common elements and
INSERT_VECTOR_ELT for the different elements.
Currently this transform is guarded quite restrictively to only trigger
in clearly beneficial cases.
With D90176, the lowering for patterns originating from code like
` float32x4_t y = {a,a,a,0};` (common in 3D apps) are lowered even
better (unnecessary fmov is removed).
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D90233
As proposed in https://github.com/WebAssembly/simd/pull/376. This commit
implements new builtin functions and intrinsics for these instructions, but does
not yet add them to wasm_simd128.h because they have not yet been merged to the
proposal. These are the first instructions with opcodes greater than 0xff, so
this commit updates the MC layer and disassembler to handle that correctly.
Differential Revision: https://reviews.llvm.org/D90253
BRIND and BR_JT are not implmented yet, so expand them atm.
Add regression tests too.
Reviewed By: simoll
Differential Revision: https://reviews.llvm.org/D90283
vnot (xor -1) should be equivalent to the AArch64 specific AArch64ISD::NOT
node, but allow more folding thanks to all the target independent
optimizations. Specifically this allows select(icmp ne, x, y) to
become "cmeq; bsl y, x" as opposed to needing to convert the predicate
with "cmeq; mvn; bsl x, y"
Unfortunately there is a regression in a cmtst test, but the code it
selected from was already non-canonical, with instcombine preferring to
use an eq predicate instead. Plus the more common case of icmp ne is
improved.
Differential Revision: https://reviews.llvm.org/D90126
SIPreAllocateWWMRegs was being inserted after RegisterCoalescer
but this pass does not exist during FastAlloc so pre-allocation
pass was never being run.
Insert pre-allocation after TwoAddressInstructionPass instead.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D90236
When converting a BUILD_VECTOR or VECTOR_SHUFFLE to a splatting load
as of 1461fb6e78, we inaccurately check
for a single user of the load and neglect to update the users of the
output chain of the original load. As a result, we can emit a new
load when the original load is kept and the new load can be reordered
after a dependent store. This patch fixes those two issues.
Fixes https://bugs.llvm.org/show_bug.cgi?id=47891
I do not exactly like the use of a negative predicate to
enable instructions' support. Change HasNoMadMacF32Insts
with HasFmaLegacy32.
Differential Revision: https://reviews.llvm.org/D90250
Turn on TLS support for PCRel by default and update the test cases.
Differential Revision: https://reviews.llvm.org/D88738
Reviewed by: stefanp, kamaub
- Add an internal option `-amdgpu-use-aa-in-codegen` to enable or
disable this feature. By Default, it's enabled.
Differential Revision: https://reviews.llvm.org/D89320
This uses PreprocessISelDAG to replace the constant before
instruction selection instead of matching opcodes after.
Reviewed By: pengfei
Differential Revision: https://reviews.llvm.org/D89178
In small code model, program and its symbols are linked in the lower 2 GB of
the address space. Try encoding global address even when the range is unknown
in such case.
Differential Revision: https://reviews.llvm.org/D89341
In each 128-lane, if there is at least one index is demanded and not all
indices are demanded and this 128-lane is not the first 128-lane of the
legalized-vector, then this 128-lane needs a extracti128;
If in each 128-lane, there is at least one index is demanded, this 128-lane
needs a inserti128.
The following cases will help you build a better understanding:
Assume we insert several elements into a v8i32 vector in avx2,
Case#1: inserting into 1th index needs vpinsrd + inserti128
Case#2: inserting into 5th index needs extracti128 + vpinsrd +
inserti128
Case#3: inserting into 4,5,6,7 index needs 4*vpinsrd + inserti128.
Reviewed By: pengfei, RKSimon
Differential Revision: https://reviews.llvm.org/D89767
Exec mask manipulation inserted by SIWholeQuadMode barriers to
instruction scheduling. Move the entire pass after the machine
instruction scheduler and make changes so pass is correct for
non-SSA operation. These changes should leave the pass still
usable pre-scheduler, although tests have be updated to reflect
post-scheduler results.
Reviewed By: nhaehnle
Differential Revision: https://reviews.llvm.org/D88081
This patch implements the set boolean condition instructions introduced in
POWER10.
The set boolean condition instructions (set[n]bc[r]) are used during
the following situations:
- sign/zero/any extending i1 to an i32 or i64,
- reg+reg, reg+imm or floating point comparisons being sign/zero extended to i32 or i64,
- spilling CR bits (using the setnbc instruction)
Differential Revision: https://reviews.llvm.org/D87705
The support is disabled by default. So far there is instruction
selection, spilling, and frame elimination. It also changes SP
from unswizzled to swizzled as used by flat scratch instructions,
so it cannot be mixed with MUBUF stack access.
At the very least missing:
- GlobalISel;
- Some optimizations in frame elimination in between vector
and scalar ALU;
- It shall finally allow to always materialize frame index
as an SGPR, but that is not implemented and frame elimination
cannot handle it yet;
- Unaligned and/or multidword flat scratch shall work, but it
is legalized now for MUBUF;
- Operand folding cannot optimize FI like with MUBUF yet;
- It will need scaling the value of the SP/FP in the DWARF
expression to recover the unswizzled scratch address;
Differential Revision: https://reviews.llvm.org/D89170
If no pal metadata is given, default to the msgpack format instead of
the legacy metadata. This makes tests better readable.
Differential Revision: https://reviews.llvm.org/D90035
Support atomic load instruction and add a regression test.
VE uses release consitency, so need to insert fence around
atomic instructions. This patch enable AtomicExpandPass
and use emitLeadingFence and emitTrailingFence mechanism
for such purpose.
Reviewed By: simoll
Differential Revision: https://reviews.llvm.org/D90135
This adds a MultiHazardRecognizer and starts to make use of it in the
ARM backend. The idea of the class is to allow multiple independent
hazard recognizers to be added to a single base MultiHazardRecognizer,
allowing them to all work in parallel without requiring them to be
chained into subclasses. They can then be added or not based on cpu or
subtarget features, which will become useful in the ARM backend once
more hazard recognizers are being used for various things.
This also renames ARMHazardRecognizer to ARMHazardRecognizerFPMLx in the
process, to more clearly explain what that recognizer is designed for.
Differential Revision: https://reviews.llvm.org/D72939
Support atomic fence instruction and add a regression test.
Add MEMBARRIER pseudo insturction also to use it as a barrier
against to the compiler optimizations.
Reviewed By: simoll
Differential Revision: https://reviews.llvm.org/D90112
We use an absolute address for stack objects and
it would be necessary to have a constant 0 for soffset field.
Fixes: SWDEV-228562
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D89234
The 0xf3 prefix has been defined as wbnoinvd on Icelake Server. So
the prefix isn't ignored by the CPU. AMD documentation suggests that
wbnoinvd is treated as wbinvd on older processors. Intel documentation
is not clear. Perhaps 0xf2 and 0x66 are treated the same, but its
not documented.
This patch changes TB to PS in the td file so 0xf2 and 0x66 will
be treated as errors. This matches versions of objdump after
wbnoinvd was added.
For now, we lost the encoding information if we using inline assembly.
The encoding for the inline assembly will keep default even if we add
the vex/evex prefix.
Differential Revision: https://reviews.llvm.org/D90009
We have been producing R_X86_64_REX_GOTPCRELX (MOV64rm/TEST64rm/...) and
R_X86_64_GOTPCRELX for CALL64m/JMP64m without the REX prefix since 2016 (to be
consistent with GNU as), but not for MOV32rm/TEST32rm/...
1. Throughput and codesize costs estimations was separated and updated.
2. Updated fdiv cost estimation for different cases.
3. Added scalarization processing for types that are treated as !isSimple() to
improve codesize estimation in getArithmeticInstrCost() and
getArithmeticInstrCost(). The code was borrowed from TCK_RecipThroughput path
of base implementation.
Next step is unify scalarization part in base class that is currently works for
TCK_RecipThroughput path only.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D89973
Replace the X86 specific isSplatZeroExtended helper with a generic BuildVectorSDNode method.
I've just used this to simplify the X86ISD::BROADCASTM lowering so far (and remove isSplatZeroExtended), but we should be able to use this in more places to lower to complex broadcast patterns.
Differential Revision: https://reviews.llvm.org/D87930
This value had the default value of 4 which caused branch relaxation to fail.
Review: Ulrich Weigand
Differential Revision: https://reviews.llvm.org/D90065
This does not change anything at the moment, but needed for
D89170. In that change I am probing a physical SGPR to see if
it is legal. RC is SReg_32, but DRC for scratch instructions
is SReg_32_XEXEC_HI and test fails.
That is sufficient just to check if DRC contains a register
here in case of physreg. Physregs also do not use subregs
so the subreg handling below is irrelevant for these.
Differential Revision: https://reviews.llvm.org/D90064
There are two optimizations here:
1. Consider the following code:
FCMPSrr %0, %1, implicit-def $nzcv
%sel1:gpr32 = CSELWr %_, %_, 12, implicit $nzcv
%sub:gpr32 = SUBSWrr %_, %_, implicit-def $nzcv
FCMPSrr %0, %1, implicit-def $nzcv
%sel2:gpr32 = CSELWr %_, %_, 12, implicit $nzcv
This kind of code where we have 2 FCMPs each feeding a CSEL can happen
when we have a single IR fcmp being used by two selects. During selection,
to ensure that there can be no clobbering of nzcv between the fcmp and the
csel, we have to generate an fcmp immediately before each csel is
selected.
However, often we can essentially CSE these together later in MachineCSE.
This doesn't work though if there are unrelated flag-setting instructions
in between the two FCMPs. In this case, the SUBS defines NZCV
but it doesn't have any users, being overwritten by the second FCMP.
Our solution here is to try to convert flag setting operations between
a interval of identical FCMPs, so that CSE will be able to eliminate one.
2. SelectionDAG imported patterns for arithmetic ops currently select the
flag-setting ops for CSE reasons, and add the implicit-def $nzcv operand
to those instructions. However if those impdef operands are not marked as
dead, the peephole optimizations are not able to optimize them into non-flag
setting variants. The optimization here is to find these dead imp-defs and
mark them as such.
This pass is only enabled when optimizations are enabled.
Differential Revision: https://reviews.llvm.org/D89415
Immediate must be in an integer range [0,255] for umin/umax instruction.
Extend pattern matching helper SelectSVEArithImm() to take in value type
bitwidth when checking immediate value is in range or not.
Reviewed By: sdesmalen
Differential Revision: https://reviews.llvm.org/D89831
In this patch, Predicates fix added for the following:
* disable prefix-instrs will disable pcrelative-memops
* set two predicates PairedVectorMemops and PrefixInstrs for PLXVP/PSTXVP definitions
Differential Revision: https://reviews.llvm.org/D89727
Reviewed by: amyk, steven.zhang
I was wrong in thinking that MRI.use_instructions return unique instructions and mislead Jay in his previous patch D64393.
First loop counted more instructions than it was in reality and the second loop went beyond the basic block with that counter.
I used Jay's previous code that relied on MRI.use_operands to constrain the number of instructions to check among.
modifiesRegister is inlined to reduce the number of passes over instruction operands and added assert on BB end boundary.
Differential Revision: https://reviews.llvm.org/D89386
Implementation of instructions table.get, table.set, table.grow,
table.size, table.fill, table.copy.
Missing instructions are table.init and elem.drop as they deal with
element sections which are not yet implemented.
Added more tests to tables.s
Differential Revision: https://reviews.llvm.org/D89797
This follows on from D89558 which added the new intrinsic and D88955
which added similar combines for llvm.amdgcn.fmul.legacy.
Differential Revision: https://reviews.llvm.org/D90028
This patch adds a specialized implementation of getIntrinsicInstrCost
and add initial cost-modeling for min/max vector intrinsics.
AArch64 NEON support umin/smin/umax/smax for vectors
<8 x i8>, <16 x i8>, <4 x i16>, <8 x i16>, <2 x i32> and <4 x i32>.
Notably, it does not support vectors with i64 elements.
This change by itself should have very little impact on codegen, but in
follow-up patches I plan to teach the vectorizers to consider using
those intrinsics on platforms where it is profitable, e.g. because there
is no general 'select'-like instruction.
The current cost returned should be better for throughput, latency and size.
Reviewed By: dmgreen
Differential Revision: https://reviews.llvm.org/D89953
Move the code which adjusts the immediate/predicate on a G_ICMP to
AArch64PostLegalizerLowering.
This
- Reduces the number of places we need to test for optimized compares in the
selector. We know that the compare should have been simplified by the time it
hits the selector, so we can avoid testing this in selects, brconds, etc.
- Allows us to potentially fold more compares (previously, this optimization
was only done after calling `tryFoldCompare`, this may allow us to hit some more
TST cases)
- Simplifies the selection code in `emitIntegerCompare` significantly; we can
just use an emitSUBS function.
- Allows us to avoid checking that the predicate has been updated after
`emitIntegerCompare`.
Also add a utility header file for things that may be useful in the selector
and various combiners. No need for an implementation file at this point, since
it's just one constexpr function for now. I've run into a couple cases where
having one of these would be handy, so might as well add it here. There are
a couple functions in the selector that can probably be factored out into
here.
Differential Revision: https://reviews.llvm.org/D89823
There are a lot of combines in AArch64PostLegalizerCombiner which exist to
facilitate instruction matching in the selector. (E.g. matching for G_ZIP and
other shuffle vector pseudos)
It still makes sense to select these instructions at -O0.
Matching earlier in a combiner can reduce complexity in the selector
significantly. For example, a good portion of our selection code for compares
would be a lot easier to represent in a combine.
This patch moves matching combines into a "AArch64PostLegalizerLowering"
combiner which runs at all optimization levels.
Also, while we're here, improve the documentation for the
AArch64PostLegalizerCombiner, and fix up the filepath in its file comment.
And also add a 'r' which somehow got dropped from a bunch of function names.
https://reviews.llvm.org/D89820
Add new loop metadata amdgpu.loop.unroll.threshold to allow the initial AMDGPU
specific unroll threshold value to be specified on a loop by loop basis.
The intention is to be able to to allow more nuanced hints, e.g. specifying a
low threshold value to indicate that a loop may be unrolled if cheap enough
rather than using the all or nothing llvm.loop.unroll.disable metadata.
Differential Revision: https://reviews.llvm.org/D84779
This commit marks i16 MULH as expand in AMDGPU backend,
which is necessary after the refactoring in D80485.
Differential Revision: https://reviews.llvm.org/D89965
Add the MVT equivalent handling for EVT changeTypeToInteger/changeVectorElementType/changeVectorElementTypeToInteger.
All the SimpleVT code already exists inside the EVT equivalents, but by splitting this out we can use these directly inside MVT types without converting to/from EVT.
This patch touches two optimizations, TwoAddressInstruction and X86's
FixupLEAs pass, both of which optimize by re-creating instructions. For
LEAs, various bits of arithmetic are better represented as LEAs on X86,
while TwoAddressInstruction sometimes converts instrs into three address
instructions if it's profitable.
For debug instruction referencing, both of these require substitutions to
be created -- the old instruction number must be pointed to the new
instruction number, as illustrated in the added test. If this isn't done,
any variable locations based on the optimized instruction are
conservatively dropped.
Differential Revision: https://reviews.llvm.org/D85756
- Make the SIMemoryLegalizer insertAcquire function be in the same
order for each target to be consistent.
Differential Revision: https://reviews.llvm.org/D89880
Some of our conversion algorithms produce -0.0 when converting unsigned i64 to double when the rounding mode is round toward negative. This switches them to other algorithms that don't have this problem. Since it is undefined behavior to change rounding mode with the non-strict nodes, this patch only changes the behavior for strict nodes.
There are still problems with unsigned i32 conversions too which I'll try to fix in another patch.
Fixes part of PR47393
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D87115
1. Fixed liveness issue with implicit kills.
2. Fixed potential problem with an indirect mov.
Fixes: SWDEV-256848
Differential Revision: https://reviews.llvm.org/D89599
We use the Real vs Pseudo instruction abstraction for other
types of instructions to facilitate changes in opcode
between gpu generations.
This patch introduces that abstraction to SOPC and SOPP.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D89738
Change-Id: I59d53c2c7058b49d05b60350f4062a9b542d3138
Fixes being overly conservative with the register counts in called
functions. This should try to do a conservative range merge, but for
now just clone.
Also fix not being able to functionally run the pass standalone.
For historical reasons, the R6 register is a callee-saved argument
register. This means that if it is used to pass an argument to a function
that does not clobber it, it is live throughout the function.
This patch makes sure that in this special case any kill flags of it are
removed.
Review: Ulrich Weigand, Eli Friedman
Differential Revision: https://reviews.llvm.org/D89451
Some instructions may be removable through processes such as IfConversion,
however DefinesPredicate can not be made aware of when this should be considered.
This parameter allows DefinesPredicate to distinguish these removable instructions
on a per-call basis, allowing for more fine-grained control from processes like
ifConversion.
Renames DefinesPredicate to ClobbersPredicate, to better reflect it's purpose
Differential Revision: https://reviews.llvm.org/D88494
Passes that are run after the post-RA scheduler may insert instructions like
waitcnt which eliminate the need for certain noops. After this patch the
scheduler is still aware of possible latency from hazards but noops will
not be inserted until the dedicated hazard recognizer pass is run.
Depends on D89753.
Reviewed By: foad
Differential Revision: https://reviews.llvm.org/D89754
If a target can encode multiple wait-states into a noop allow emitting such
instructions directly.
Reviewed By: rampitec, dmgreen
Differential Revision: https://reviews.llvm.org/D89753
Change waitcnt insertion to check the memory operand tokens to see if
flat memory operations access VMEM in the same way it does to check if
accessing LDS. This avoids adding waitcnt for counters for address
spaces that are not accessed.
In addition, only generate the pessimistic waitcnt 0 if a flat memory
operation appears to access both VMEM and LDS.
This benefits flat memory operations that explicitly specify the
address space as GLOBAL or LOCAL.
Differential Revision: https://reviews.llvm.org/D89618
Remove getAllVGPR32() interface and update the SGPR spill code to use
a proper method to get the relevant VGPR registers list.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D89806
- In general, a generic point may alias to pointers in all other address
spaces. However, for certain cases enforced by the programming model,
we may found a generic point won't alias to pointers to local objects.
* When a generic pointer is loaded from the constant address space, it
could only be a pointer to the GLOBAL or CONSTANT address space.
Thus, it won't alias to pointers to the PRIVATE or LOCAL address
space.
* When a generic pointer is passed as a kernel argument, it also could
only be a pointer to the GLOBAL or CONSTANT address space. Thus, it
also won't alias to pointers to the PRIVATE or LOCAL address space.
Differential Revision: https://reviews.llvm.org/D89525
Remove immediate operand from SI_ELSE which indicates if EXEC has
been modified. Instead always emit code that handles EXEC and
remove unnecessary instructions during pre-RA optimisation.
This facilitates passes (i.e. SIWholeQuadMode) adding exec mask
manipulation post control flow lowering, and pre control flow
lower passes do not need to be aware of SI_ELSE handling.
Reviewed By: nhaehnle
Differential Revision: https://reviews.llvm.org/D89644
Remove duplicate code and move things around to make it easier to
add additional optimisations to the pass.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D89619
The "Size" value returned by SystemZDisassembler::getInstruction is
used by common code even in the case where the routine returns
failure. If that Size value exceeds the number of bytes remaining
in the section, that could cause disassembler crashes.
Fixed by never returning more than the number of bytes remaining.
This reverts commit 38f625d0d1.
This commit contains some holes in its logic and has been causing
issues since it was commited. The idea sounds OK but some cases were not
handled correctly. Instead of trying to fix that up later it is probably
simpler to revert it and work to reimplement it in a more reliable way.
Create the LLVM / CodeView register mappings for the 32-bit ARM Window targets.
Reviewed By: compnerd
Differential Revision: https://reviews.llvm.org/D89622
extract_vector_elt will turn type vxi1 into i8, which triggers the assertion fail.
Since we don't really handle vxi1 cases in below code, we can just return from here.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D89096
GFX10 enables third addressing mode for flat scratch instructions,
an ST mode. In that mode both register operands are omitted and
only swizzled offset is used in addition to flat_scratch base.
Differential Revision: https://reviews.llvm.org/D89501
Before the change attempt to link libLTO.so against shared
LLVM library failed as:
```
[ 76%] Linking CXX shared library ../../lib/libLTO.so
... /usr/bin/cmake -E cmake_link_script CMakeFiles/LTO.dir/link.txt --verbose=1
c++ -o ...libLTO.so.12git ...ibLLVM-12git.so
ld: CMakeFiles/LTO.dir/lto.cpp.o: in function `llvm::InitializeAllTargetInfos()':
include/llvm/Config/Targets.def:31: undefined reference to `LLVMInitializeVETargetInfo'
```
It happens because on linux llvm build system sets default
symbol visibility to "hidden". The fix is to set visibility
back to "default" for exported APIs with LLVM_EXTERNAL_VISIBILITY.
Bug: https://bugs.llvm.org/show_bug.cgi?id=47847
Reviewed By: simoll
Differential Revision: https://reviews.llvm.org/D89633
Summary:
Initializer merging generates pretty inefficient code for large allocas
that also happens to trigger an exponential algorithm somewhere in
Machine Instruction Scheduler. See https://bugs.llvm.org/show_bug.cgi?id=47867.
This change adds an upper limit for the alloca size. The default limit
is selected such that worst case size of memtag-generated code is
similar to non-memtag (but because of the ISA quirks, this case is
realized at the different value of alloca size, ex. memset inlining
triggers at sizes below 512, but stack tagging instructions are 2x
shorter, so limit is approx. 256).
We could try harder to emit more compact code with initializer merging,
but that would only affect large, sparsely initialized allocas, and
those are doing fine already.
Reviewers: vitalybuka, pcc
Subscribers: llvm-commits
We have pseudo instructions we use for bitcasts between these types.
We have them in the load folding table, but not the store folding
table. This adds them there so they can be used for stack spills.
I added an exact size check so that we don't fold when the stack slot
is larger than the GPR. Otherwise the upper bits in the stack slot
would be garbage. That would be fine for Eli's test case in PR47874,
but I'm not sure its safe in general.
A step towards fixing PR47874. Next steps are to change the ADDSSrr_Int
pseudo instructions to use FR32 as the second source register class
instead of VR128. That will keep the coalescer from promoting the
register class of the bitcast instruction which will make the stack
slot 4 bytes instead of 16 bytes.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D89656
These were introduced in r279902 on the grounds that using separate
MUL_U24/MUL_I24 and MULHI_U24/MULHI_I24 nodes would introduce multiple
uses of the operands, which would prevent SimplifyDemandedBits from
simplifying the operands.
This has since been fixed by D24672 "AMDGPU/SI: Use new SimplifyDemandedBits helper for multi-use operations"
No functional change intended. At least it has no effect on lit tests.
Differential Revision: https://reviews.llvm.org/D89706
MULH is often expanded on targets.
This patch removes the isMulhCheaperThanMulShift hook and uses
isOperationLegalOrCustom instead.
Differential Revision: https://reviews.llvm.org/D80485
S_CMP_LG_U64 was added in gfx8 and is guarded by hasScalarCompareEq64().
Rewrite S_CMP_LG_U64 to S_OR_B32 + S_CMP_LG_U32 for targets that
do not support 64-bit scalar compare.
Differential Revision: https://reviews.llvm.org/D89536
Add setcc for fp128 and clean existing ISel patterns. Also add
a regression test.
Reviewed By: simoll
Differential Revision: https://reviews.llvm.org/D89683
Add missing ISel patterns related to select_cc DAG nodes.
Add regression test of all combination of possible scalar types.
Reviewed By: simoll
Differential Revision: https://reviews.llvm.org/D89672
Add VBRD/VMV vector instructions. In order to do that, also support
VM512 registers and RV instruction format in MC layer. Also add
regression tests for new instructions.
Reviewed By: simoll
Differential Revision: https://reviews.llvm.org/D89641
Add LSV/LVS/LVM/SVM vector instructions and regression tests.
Also update AsmParser to support new format of operands.
Reviewed By: simoll
Differential Revision: https://reviews.llvm.org/D89499