Summary:
This patch implements two arithmetic intrinsics:
* int_aarch64_sve_abs
* int_aarch64_sve_neg
testing the support for scalable vector types in intrinsics added in D65930.
Reviewed By: greened
Differential Revision: https://reviews.llvm.org/D65931
llvm-svn: 371388
Summary:
Adds the following inline asm constraints for SVE:
- w: SVE vector register with full range, Z0 to Z31
- x: Restricted to registers Z0 to Z15 inclusive.
- y: Restricted to registers Z0 to Z7 inclusive.
This change also adds the "z" modifier to interpret a register as an SVE register.
Not all of the bitconvert patterns added by this patch are used, but they have been included here for completeness.
Reviewers: t.p.northover, sdesmalen, rovka, momchil.velikov, rengolin, cameron.mcinally, greened
Reviewed By: sdesmalen
Subscribers: javed.absar, tschuett, rkruppe, psnobl, cfe-commits, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D66302
llvm-svn: 370673
Summary:
* Loads and stores in SVE2 are gather/scatter not contiguous, fixed by
renaming multiclasses to reflect this and also updated comments.
* Remove aliases from load/store multiclasses that reflect the behaviour
of the original form.
* Fix bug in scatter store implementation, vector list should be used as
input, not output.
Reviewed By: sdesmalen
Differential Revision: https://reviews.llvm.org/D65392
llvm-svn: 367398
Summary:
* Clarify comment with SVE2 for predicated shifts and move next to other
shift instructions.
* Clarify comments for various instructions.
* Move FCVTX instruction next to other fp conversions.
* Move FLOGB to next to other fp instructions and fix description.
* Remove "cons" from non-constructive multiclass for bitwise shift-right
and accumulate instructions.
Reviewed By: sdesmalen
Differential Revision: https://reviews.llvm.org/D65390
llvm-svn: 367396
Summary:
This patch fixes a bug in the following instructions that should have been
implemented as destructive. A destructive instruction is an instruction where
one of the source registers also acts as the destination register. Therefore,
the contents of the source register, when the instruction begins execution, are
replaced by the result of the instruction when the instruction completes
execution [1]:
* SRI/SLI
* EORBT/EORTB
* TBX
* Narrowing top instructions
* FP convert precision instructions
These changes are non-functional from the assembler/diassembler point-of-view
but are necessary for correct codegen.
[1] https://static.docs.arm.com/ddi0584/ae/DDI0584A_e_SVE_supp_armv8A.pdf
Reviewed By: sdesmalen
Differential Revision: https://reviews.llvm.org/D65389
llvm-svn: 367394
Summary:
A three sources variant of the TBL instruction is added to the existing
SVE instruction in SVE2. This is implemented with minor changes to the
existing TableGen class. TBX is a new instruction with its own
definition.
The specification can be found here:
https://developer.arm.com/docs/ddi0602/latest
Reviewed By: chill
Differential Revision: https://reviews.llvm.org/D62600
llvm-svn: 362214
Summary:
Patch adds support for the following instructions:
* EOR3, BSL, BCAX, BSL1N, BSL2N, NBSL, XAR
Aliases for types .B/.H/.S for EOR3 and BCAX have been added, the
preferred disassembly is .D.
The specification can be found here:
https://developer.arm.com/docs/ddi0602/latest
Reviewed By: SjoerdMeijer
Differential Revision: https://reviews.llvm.org/D62387
llvm-svn: 361936
Summary:
Patch adds support for the following instructions:
SVE2 floating-point pairwise operations:
* FADDP, FMAXNMP, FMINNMP, FMAXP, FMINP
The specification can be found here:
https://developer.arm.com/docs/ddi0602/latest
Reviewed By: chill
Differential Revision: https://reviews.llvm.org/D62383
llvm-svn: 361933
Summary:
Patch adds support for the following instructions:
SVE2 crypto constructive binary operations:
* SM4EKEY, RAX1
SVE2 crypto destructive binary operations:
* AESE, AESD, SM4E
SVE2 crypto unary operations:
* AESMC, AESIMC
AESE, AESD, AESMC and AESIMC are enabled with +sve2-aes. SM4E and
SM4EKEY are enabled with +sve2-sm4. RAX1 is enabled with +sve2-sha3.
The specification can be found here:
https://developer.arm.com/docs/ddi0602/latest
Reviewed By: SjoerdMeijer
Differential Revision: https://reviews.llvm.org/D62307
llvm-svn: 361797
Summary:
Patch adds support for the following instructions:
SVE2 histogram generation (segment):
* HISTSEG
SVE2 histogram generation (vector):
* HISTCNT
The specification can be found here:
https://developer.arm.com/docs/ddi0602/latest
Reviewed By: chill
Differential Revision: https://reviews.llvm.org/D62306
llvm-svn: 361796
Summary:
Patch adds support for the following instructions:
SVE2 bitwise exclusive-or interleaved:
* EORBT, EORTB
SVE2 bitwise permute:
* BEXT, BDEP, BGRP
SVE2 bitwise shift left long:
* SSHLLB, SSHLLT, USHLLB, USHLLT
SVE2 integer add/subtract interleaved long:
* SADDLBT, SSUBLBT, SSUBLTB
BDEP, BEXT and BGRP are enabled with SVE2 feature +bitperm, all other
instructions in this group are enabled with +sve2.
Reviewed By: chill
Differential Revision: https://reviews.llvm.org/D62304
llvm-svn: 361795
Summary:
This patch adds support for the polynomial multiplication instructions
PMULLB/PMULLT. The 64-bit source and 128-bit destination element
variants are enabled with crypto extensions (+sve2-aes), similar to the
NEON PMULL2 instruction. All other variants are enabled with +sve2.
The specification can be found here:
https://developer.arm.com/docs/ddi0602/latest
Reviewed By: SjoerdMeijer
Differential Revision: https://reviews.llvm.org/D62145
llvm-svn: 361619
Summary:
This patch adds support for the SVE2 saturating/rounding bitwise shift
left (predicated) group of instructions:
* SRSHL, URSHL, SRSHLR, URSHLR, SQSHL, UQSHL, SQRSHL, UQRSHL,
SQSHLR, UQSHLR, SQRSHLR, UQRSHLR
Immediate forms of the SQSHL and UQSHL instructions are also added to
the existing SVE bitwise shift by immediate (predicated) group, as well
as three new instructions SRSHR/URSHR/SQSHLU. The new instructions in
this group are encoded similarly and are implemented using the same
TableGen class with a minimal change (1 bit in encoding).
The specification can be found here:
https://developer.arm.com/docs/ddi0602/latest
Reviewed By: SjoerdMeijer
Differential Revision: https://reviews.llvm.org/D62140
llvm-svn: 361612
Summary:
This patch adds support for the integer pairwise add and accumulate long
instructions SADALP/UADALP. These instructions are predicated.
The specification can be found here:
https://developer.arm.com/docs/ddi0602/latest
Reviewed By: SjoerdMeijer
Differential Revision: https://reviews.llvm.org/D62001
llvm-svn: 361154
Summary:
This patch adds support for the predicated integer halving add/sub
instructions:
* SHADD, UHADD, SRHADD, URHADD
* SHSUB, UHSUB, SHSUBR, UHSUBR
The specification can be found here:
https://developer.arm.com/docs/ddi0602/latest
Reviewed By: rovka
Differential Revision: https://reviews.llvm.org/D62000
llvm-svn: 361136
Summary:
Patch adds support for indexed and unpredicated vectors forms of the
following instructions:
* SQDMLALB, SQDMLALT, SQDMLSLB, SQDMLSLT
The specification can be found here:
https://developer.arm.com/docs/ddi0602/latest
Reviewed By: SjoerdMeijer
Differential Revision: https://reviews.llvm.org/D61997
llvm-svn: 361005
Summary:
Patch adds support for indexed and unpredicated vectors forms of the
following instructions:
* SMLALB, SMLALT, UMLALB, UMLALT, SMLSLB, SMLSLT, UMLSLB, UMLSLT
The specification can be found here:
https://developer.arm.com/docs/ddi0602/latest
Reviewed By: rovka
Differential Revision: https://reviews.llvm.org/D61951
llvm-svn: 361003
Summary:
Patch adds support for indexed and unpredicated vectors forms of the
following instructions:
* SMULLB, SMULLT, UMULLB, UMULLT, SQDMULLB, SQDMULLT
The specification can be found here:
https://developer.arm.com/docs/ddi0602/latest
Reviewed By: rovka
Differential Revision: https://reviews.llvm.org/D61936
llvm-svn: 361002
Summary:
The complex DOT instructions perform a dot-product on quadtuplets from
two source vectors and the resuling wide real or wide imaginary is
accumulated into the destination register. The instructions come in two
forms:
Vector form, e.g.
cdot z0.s, z1.b, z2.b, #90 - complex dot product on four 8-bit quad-tuplets,
accumulating results in 32-bit elements. The
complex numbers in the second source vector are
rotated by 90 degrees.
cdot z0.d, z1.h, z2.h, #180 - complex dot product on four 16-bit quad-tuplets,
accumulating results in 64-bit elements.
The complex numbers in the second source
vector are rotated by 180 degrees.
Indexed form, e.g.
cdot z0.s, z1.b, z2.b[3], #0 - complex dot product on four 8-bit quad-tuplets,
with specified quadtuplet from second source vector,
accumulating results in 32-bit elements.
cdot z0.d, z1.h, z2.h[1], #0 - complex dot product on four 16-bit quad-tuplets,
with specified quadtuplet from second source vector,
accumulating results in 64-bit elements.
The specification can be found here:
https://developer.arm.com/docs/ddi0602/latest
Reviewed By: SjoerdMeijer, rovka
Differential Revision: https://reviews.llvm.org/D61903
llvm-svn: 360870
Summary:
Add support for the following instructions:
* MUL (indexed and unpredicated vectors forms)
* SQDMULH (indexed and unpredicated vectors forms)
* SQRDMULH (indexed and unpredicated vectors forms)
* SMULH (unpredicated, predicated form added in SVE)
* UMULH (unpredicated, predicated form added in SVE)
* PMUL (unpredicated)
The specification can be found here:
https://developer.arm.com/docs/ddi0602/latest
Reviewed By: SjoerdMeijer, rovka
Differential Revision: https://reviews.llvm.org/D61902
llvm-svn: 360867
Summary:
This patch adds support for the following instructions:
MLA mul-add, writing addend (Zda = Zda + Zn * Zm[idx])
MLS mul-sub, writing addend (Zda = Zda + -Zn * Zm[idx])
Predicated forms of these instructions were added in SVE.
The specification can be found here:
https://developer.arm.com/docs/ddi0602/latest
Reviewed By: rovka
Differential Revision: https://reviews.llvm.org/D61514
llvm-svn: 360682
This patch adds aliases for element sizes .B/.H/.S to the
AND/ORR/EOR/BIC bitwise logical instructions. The assembler now accepts
these instructions with all element sizes up to 64-bit (.D). The
preferred disassembly is .D.
llvm-svn: 359457
to reflect the new license.
We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.
Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.
llvm-svn: 351636
This patch enables instructions that are destructive on their
destination- and first source operand, to be prefixed with a
MOVPRFX instruction.
This patch also adds a variety of tests:
- positive tests for all instructions and forms that accept a
movprfx for either or both predicated and unpredicated forms.
- negative tests for all instructions and forms that do not accept
an unpredicated or predicated movprfx.
- negative tests for the diagnostics that get emitted when a MOVPRFX
instruction is used incorrectly.
This is patch [2/2] in a series to add MOVPRFX instructions:
- Patch [1/2]: https://reviews.llvm.org/D49592
- Patch [2/2]: https://reviews.llvm.org/D49593
Reviewers: rengolin, SjoerdMeijer, samparker, fhahn, javed.absar
Reviewed By: SjoerdMeijer
Differential Revision: https://reviews.llvm.org/D49593
llvm-svn: 338261
This patch adds predicated and unpredicated MOVPRFX instructions, which
can be prepended to SVE instructions that are destructive on their first
source operand, to make them a constructive operation, e.g.
add z1.s, p0/m, z1.s, z2.s <=> z1 = z1 + z2
can be made constructive:
movprfx z0, z1
add z0.s, p0/m, z0.s, z2.s <=> z0 = z1 + z2
The predicated MOVPRFX instruction can additionally be used to zero
inactive elements, e.g.
movprfx z0.s, p0/z, z1.s
add z0.s, p0/m, z0.s, z2.s
Not all instructions can be prefixed with the MOVPRFX instruction
which is why this patch also adds a mechanism to validate prefixed
instructions. The exact rules when a MOVPRFX applies is detailed in
the SVE supplement of the Architectural Reference Manual.
This is patch [1/2] in a series to add MOVPRFX instructions:
- Patch [1/2]: https://reviews.llvm.org/D49592
- Patch [2/2]: https://reviews.llvm.org/D49593
Reviewers: rengolin, SjoerdMeijer, samparker, fhahn, javed.absar
Reviewed By: SjoerdMeijer
Differential Revision: https://reviews.llvm.org/D49592
llvm-svn: 338258
The WHILE instructions generate a predicate that is true while the
comparison of the first scalar operand (incremented for each predicate
element) with the second scalar operand is true and false thereafter.
WHILELE While incrementing signed scalar less than or equal to scalar
WHILELO While incrementing unsigned scalar lower than scalar
WHILELS While incrementing unsigned scalar lower than or same as scalar
WHILELT While incrementing signed scalar less than scalar
e.g.
whilele p0.s, x0, x1
generates predicate p0 (for 32bit elements) by incrementing
(signed) x0 and comparing that vector to splat(x1).
llvm-svn: 338211
The instructions added in this patch permit active elements within
a vector to be processed sequentially without unpacking the vector.
PFIRST Set the first active element to true.
PNEXT Find next active element in predicate.
CTERMEQ Compare and terminate loop when equal.
CTERMNE Compare and terminate loop when not equal.
llvm-svn: 338210
This patch adds PFALSE (unconditionally sets all elements of
the predicate to false) and PTEST (set the status flags for the
predicate).
llvm-svn: 338198
This patch adds support for instructions that partition a predicate
based on data-dependent termination conditions in a loop.
BRKA Break after the first true condition
BRKAS Break after the first true condition, setting condition flags
BRKB Break before the first true condition
BRKBS Break before the first true condition, setting condition flags
BRKPA Break after the first true condition, propagating from the
previous partition
BRKPAS Break after the first true condition, propagating from the
previous partition, setting condition flags
BRKPB Break before the first true condition, propagating from the
previous partition
BRKPBS Break before the first true condition, propagating from the
previous partition, setting condition flags
BRKN Propagate break to next partition
BKRNS Propagate break to next partition, setting condition flags
llvm-svn: 338196
This patch adds support for various integer reduction operations:
SADDV signed add reduction to scalar
UADDV unsigned add reduction to scalar
SMAXV signed maximum reduction to scalar
SMINV signed minimum reduction to scalar
UMAXV unsigned maximum reduction to scalar
UMINV unsigned minimum reduction to scalar
ANDV logical AND reduction to scalar
ORV logical OR reduction to scalar
EORV logical EOR reduction to scalar
The reduction is predicated, e.g.
smaxv s0, p0, z1.s
performs a signed maximum reduction on active elements in z1,
and stores the (signed max value) result in s0.
llvm-svn: 338126
This patch adds support for various floating-point
reduction operations:
FADDA strictly-ordered add reduction, accumulating in scalar
FADDV recursive add reduction to scalar
FMAXV recursive max reduction to scalar
FMINV recursive min reduction to scalar
FMAXNMV recursive max number reduction to scalar
FMINNMV recursive min number reduction to scalar
The reduction is predicated, e.g.
fadda d0, p0, d0, z1.d
performs the add-reduction in strict order on active elements
in z1, accumulating into d0.
faddv d0, p0, z1.d
performs the add-reduction (not in strict order)
on active elements in z1, storing the result in d0.
llvm-svn: 338123
This patch adds support for transcendental acceleration
instructions 'FEXPA' (exponential accelerator) and 'FTSSEL'
(trigonometric select coefficient).
llvm-svn: 338121