Commit Graph

302 Commits

Author SHA1 Message Date
Cullen Rhodes 1fe0e6a380 [AArch64][SME] Support ptrue(s) in streaming mode
The ptrue and ptrues instructions are legal in streaming mode, missed in
D106272.

The reference can be found here:
https://developer.arm.com/documentation/ddi0602/2021-06/SVE-Instructions

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D107807
2021-08-11 07:49:36 +00:00
Bradley Smith 81eafb8a37 [AArch64][SVE] Break false dependencies for inactive lanes of unary operations
Differential Revision: https://reviews.llvm.org/D105889
2021-07-26 15:01:21 +00:00
Caroline Concatto 0bfc26e3a4 [SVE][AArch64] Improve code generation for vector_splice for Imm > 0
This patch implements vector_splice in tablegen for all cases when the
Immediate is positive and lower than the known minimum value of
a scalable vector.
Vector_splice can be implemented using the SVE instruction EXT.
For instance:
    @llvm.experimental.vector.splice(Vector_1, Vector_2, Imm)
    @llvm.experimental.vector.splice(<A,B,C,D>, <E,F,G,H>, 1) ==> <B, C, D, E>
        EXT  Vector_1, Vector_2, Imm              // Vector_1 = B, C, D + Vector_2 = E
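
A minimal Python sketch of the splice semantics in the example above, for a
positive immediate smaller than the vector length. The name `vector_splice`
is a hypothetical stand-in for illustration, not an LLVM API, and the model
uses fixed-size lists rather than scalable vectors.

```python
def vector_splice(v1, v2, imm):
    """Model splice for 0 < imm < len(v1): drop the first imm elements
    of v1 and fill the tail from the start of v2."""
    n = len(v1)
    if len(v2) != n or not (0 < imm < n):
        raise ValueError("sketch only covers equal lengths and 0 < imm < n")
    # Concatenate, then take a window of n elements starting at imm.
    return (v1 + v2)[imm:imm + n]


print(vector_splice(list("ABCD"), list("EFGH"), 1))
```

The window over the concatenation is the same shape of operation that EXT
performs on two source registers, which is why the positive-immediate case
maps onto it directly.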

Depends on D105633

Differential Revision: https://reviews.llvm.org/D106273
2021-07-26 11:45:46 +01:00
Eli Friedman 0ca46a1757 [SelectionDAG] Fix the representation of ISD::STEP_VECTOR.
The existing rule about the operand type is strange.  Instead, just say
the operand is a TargetConstant with the right width.  (Legalization
ignores TargetConstants, so it doesn't matter if that width is legal.)

Highlights:

1. I had to substantially rewrite the AArch64 isel patterns to expect a
TargetConstant.  Nothing too exotic, but maybe a little hairy. Maybe
worth considering a target-specific node with some dagcombines instead
of this complicated nest of isel patterns.
2. Our behavior on RV32 for vectors of i64 has changed slightly. In
particular, we correctly preserve the width of the arithmetic through
legalization.  This changes the DAG a bit. Maybe room for
improvement here.
3. I explicitly defined the behavior around overflow. This is necessary
to make the DAGCombine transforms legal, and I don't think it causes any
practical issues.

Differential Revision: https://reviews.llvm.org/D105673
2021-07-21 10:58:40 -07:00
Eli Friedman e41e865b15 [AArch64] Prepare for changes to STEP_VECTOR.
Rewrite patterns to assume that the operand of STEP_VECTOR is a
constant. The old patterns will stop working when the operand is changed
from a Constant to a TargetConstant. (See D105673.)

Add test coverage for certain patterns that weren't exercised by
existing regression tests.

Differential Revision: https://reviews.llvm.org/D105847
2021-07-17 14:13:41 -07:00
Bradley Smith 026bb84bcd [AArch64][SVE] Add ISel patterns for floating point compare with zero instructions
Additionally, lower the floating point compare SVE intrinsics to
SETCC_MERGE_ZERO ISD nodes to avoid duplicating ISel patterns.

Differential Revision: https://reviews.llvm.org/D105486
2021-07-08 10:46:12 +00:00
Paul Walker 287d39dd5a [NFC] Fix a few whitespace issues and typos. 2021-07-04 11:49:58 +01:00
Bradley Smith e42ee2d509 [AArch64][SVE] Add support for using reverse forms of SVE2 shifts
When using an ACLE intrinsic for an SVE2 shift, if the predicate passed
has all relevant lanes active, then use a reversed version of the
instruction if beneficial.
2021-06-04 12:56:53 +01:00
Bradley Smith 12a74137b3 [AArch64][SVE] Combine cntp intrinsics with add/sub to produce incp/decp
Depends on D101062

Differential Revision: https://reviews.llvm.org/D102077
2021-05-14 17:16:06 +01:00
Bradley Smith 90ffcb1245 [AArch64][SVE] Add unpredicated vector BIC ISD node
Addition of this node allows us to better utilize the different forms of
the SVE BIC instructions, including using the alias to an AND (immediate).

Differential Revision: https://reviews.llvm.org/D101831
2021-05-14 16:12:13 +01:00
Bradley Smith 65c89cd1a6 [AArch64][SVE] Better utilisation of unpredicated forms of remaining intrinsics
When using predicated intrinsics, if the predicate used has all lanes active,
use an unpredicated form of the instruction. This also allows for
better use of immediate forms.

This only includes instructions where the unpredicated/predicated forms
matched in such a way that instruction selection would not introduce extra
ptrue instructions. This allows us to convert the intrinsics directly to
architecture independent ISD nodes.

Depends on D101062

Differential Revision: https://reviews.llvm.org/D101828
2021-05-10 13:06:02 +01:00
Bradley Smith f8f953c2a6 [AArch64][SVE] Better utilisation of unpredicated forms of arithmetic intrinsics
When using predicated arithmetic intrinsics, if the predicate used has all
lanes active, use an unpredicated form of the instruction. This also
allows for better use of immediate forms.

This also includes a new complex isel pattern which allows matching an
all active predicate when the types are different but the predicate is a
superset of the type being used. For example, to allow a b8 ptrue for a
b32 predicate operand.
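
A rough Python model of why a b8 ptrue can stand in for a b32 all-active
predicate. It assumes the SVE layout of one predicate bit per vector byte,
with only the lowest bit of each lane being significant, and a 16-byte
vector for illustration. The helper names `ptrue` and `all_active_for` are
hypothetical and are not LLVM functions.

```python
def ptrue(elt_bytes, vl_bytes=16):
    """Predicate bits (one per vector byte) set by an all-active ptrue
    for elements that are elt_bytes wide."""
    return [1 if i % elt_bytes == 0 else 0 for i in range(vl_bytes)]


def all_active_for(pred, elt_bytes):
    """True if every lane of width elt_bytes is active in pred.
    Only the lowest predicate bit of each lane is inspected."""
    return all(pred[i] for i in range(0, len(pred), elt_bytes))


b8 = ptrue(1)   # every bit set
b32 = ptrue(4)  # every fourth bit set

print(all_active_for(b8, 4))   # b8 ptrue covers all b32 lanes
print(all_active_for(b32, 1))  # b32 ptrue does not cover all b8 lanes
```

The relation is one-directional: a predicate for a narrower element type
covers a wider one, but not the other way round.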

This only includes instructions where the unpredicated/predicated forms
are mismatched between variants, meaning that the removal of the
predicate is done during instruction selection in order to prevent
spurious re-introductions of ptrue instructions.

Co-authored-by: Paul Walker <paul.walker@arm.com>

Differential Revision: https://reviews.llvm.org/D101062
2021-05-10 13:05:37 +01:00
Jun Ma b3aeb13892 [AArch64][SVE] Remove index_vector node.
Since index_vector is lowered into step_vector in D100816, we can just remove
index_vector, use step_vector for codegen directly.

Differential Revision: https://reviews.llvm.org/D101593
2021-05-10 11:08:58 +08:00
Jun Ma b310dd1501 [AArch64][SVE] Lower index_vector to step_vector
As discussed in D100107, this patch first converts index_vector to
step_vector, and converts step_vector back to index_vector after LegalizeDAG.

Differential Revision: https://reviews.llvm.org/D100816
2021-04-30 19:04:39 +08:00
Bradley Smith 354604a2a7 [AArch64][SVE] Use SIMD variant of INSR when scalar is the result of a vector extract
At the intrinsic layer the sve.insr operation takes a scalar. When this
scalar is an integer we are forcing a data transition between GPRs and
ZPRs that is potentially costly.

Often the integer scalar is the result of a vector extract, when
performing a reduction for example. In such cases we should keep all
data within the ZPRs.

Co-authored-by: Paul Walker <paul.walker@arm.com>

Differential Revision: https://reviews.llvm.org/D101169
2021-04-29 12:17:42 +01:00
Caroline Concatto ca9b7e2e2f [AArch64][SVE] Fix crash with icmp+select
This patch changes the lowering of SELECT_CC from Legal to Expand for scalable
vector and adds support for scalable vectors in performSelectCombine.

When selecting the nodes to lower, visitSELECT checks whether it is possible to
use SELECT_CC in cases where SETCC is followed by SELECT. visitSELECT checks
if SELECT_CC is legal or custom before replacing SELECT with SELECT_CC.
SELECT_CC used to be legal for scalable vectors, so the node was changed to
SELECT_CC. This used to crash the compiler as there is no support for SELECT_CC
with scalable vectors. So now the compiler lowers to VSELECT instead of
SELECT_CC.

Differential Revision: https://reviews.llvm.org/D100485
2021-04-21 14:16:27 +01:00
Jun Ma 5c6ac3b4a2 [AArch64][SVE] Combine add and index_vector
This patch tries to combine pattern add(index_vector(zero, step), dup(X)) into index_vector(X, step)
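
The identity behind this combine can be checked with a small Python model.
The helpers below are illustrative stand-ins for the DAG nodes, not LLVM
code; they use a fixed lane count of 4 and ignore element-width wraparound.

```python
def index_vector(base, step, n=4):
    """Lane i holds base + i * step."""
    return [base + i * step for i in range(n)]


def dup(x, n=4):
    """Broadcast a scalar to every lane."""
    return [x] * n


def add(a, b):
    """Lane-wise addition."""
    return [x + y for x, y in zip(a, b)]


before = add(index_vector(0, 3), dup(10))  # pattern being matched
after = index_vector(10, 3)                # single node after the combine
print(before, after)
```

Both forms produce the same lanes, so the add and the dup can be folded
into the base operand.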

TestPlan: check-llvm

Differential Revision: https://reviews.llvm.org/D100107
2021-04-20 11:38:37 +08:00
Bradley Smith f2593a0bd1 [AArch64][SVE] Remove redundant PTEST of MATCH/NMATCH results
Co-authored-by: Paul Walker <paul.walker@arm.com>

Differential Revision: https://reviews.llvm.org/D99584
2021-04-12 12:55:00 +01:00
Jun Ma ab3c5fb282 [NFC][SVE] Use SVE_4_Op_Imm_Pat for sve_intx_dot_by_indexed_elem 2021-04-02 20:05:17 +08:00
Jun Ma 1af373c673 [AArch64][SVE] Codegen dup_lane for dup(vector_extract)
Differential Revision: https://reviews.llvm.org/D99324
2021-03-30 10:35:08 +08:00
Bradley Smith 9745dce8c3 [SelectionDAG][AArch64][SVE] Perform SETCC condition legalization in LegalizeVectorOps
This is currently performed in SelectionDAGLegalize, here we make it also
happen in LegalizeVectorOps, allowing a target to lower the SETCC condition
codes first in LegalizeVectorOps and then lower to a custom node afterwards,
without having to duplicate all of the SETCC condition legalization in the
target specific lowering.

As a result of this, fixed length floating point SETCC nodes can now be
properly lowered for SVE.

Differential Revision: https://reviews.llvm.org/D98939
2021-03-29 15:32:25 +01:00
Bradley Smith 8bad8a43c3 [AArch64][SVE] Add patterns to generate FMLA/FMLS/FNMLA/FNMLS/FMAD
Adjust generateFMAsInMachineCombiner to return false if SVE is present
in order to combine fmul+fadd into fma. Also add new pseudo instructions
so as to select the most appropriate of FMLA/FMAD depending on register
allocation.

Depends on D96599

Differential Revision: https://reviews.llvm.org/D96424
2021-02-18 16:55:16 +00:00
David Green 0f435a544a [AArch64] Correct some tablegen operand types. NFC 2021-02-06 14:34:14 +00:00
David Sherwood d1bf26fd94 [AArch64][SVE] Add lowering for llvm abs intrinsic
Add functionality to permit lowering of the abs and neg intrinsics
using the passthru variants.

Differential Revision: https://reviews.llvm.org/D94160
2021-01-08 08:55:25 +00:00
Cameron McInally f4013359b3 [SVE] Add unpacked scalable floating point ZIP/UZP/TRN patterns
Differential Revision: https://reviews.llvm.org/D94193
2021-01-07 09:56:53 -06:00
Bradley Smith c73ae747cb [AArch64][SVE] Add optimization to remove redundant ptest instructions
Co-Authored-by: Graham Hunter <graham.hunter@arm.com>
Co-Authored-by: Paul Walker <paul.walker@arm.com>

Differential Revision: https://reviews.llvm.org/D93292
2021-01-05 15:28:36 +00:00
Paul Walker eba6deab22 [SVE] Lower vector CTLZ, CTPOP and CTTZ operations.
CTLZ and CTPOP are lowered to CLZ and CNT instructions respectively.

CTTZ is not a native SVE operation but is instead lowered to:
  CTTZ(V) => CTLZ(BITREVERSE(V))

In the case of fixed-length support using SVE we also lower CTTZ
operating on NEON sized vectors because of its reliance on
BITREVERSE which is also lowered to SVE instructions at these lengths.
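
The CTTZ(V) => CTLZ(BITREVERSE(V)) identity above can be sanity-checked on
scalars. This is a sketch on 32-bit unsigned values with hypothetical helper
names; it models a single lane, not the SVE instructions themselves.

```python
def bitreverse(v, width=32):
    """Reverse the bit order of an unsigned width-bit value."""
    return int(format(v, f"0{width}b")[::-1], 2)


def ctlz(v, width=32):
    """Count leading zeros of an unsigned width-bit value."""
    return width - v.bit_length()


def cttz(v, width=32):
    """Count trailing zeros, expressed as CTLZ of the bit-reversed value."""
    return ctlz(bitreverse(v, width), width)


print(cttz(8), cttz(1), cttz(0))
```

Reversing the bits turns trailing zeros into leading zeros, so a
leading-zero count on the reversed value gives the trailing-zero count,
including the all-zero input, which yields the full width.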

Differential Revision: https://reviews.llvm.org/D93607
2021-01-05 10:42:35 +00:00
Paul Walker 8eec7294fe [SVE] Lower vector BITREVERSE and BSWAP operations.
These operations are lowered to RBIT and REVB instructions
respectively.  In the case of fixed-length support using SVE we
also lower BITREVERSE operating on NEON sized vectors as this
results in fewer instructions.

Differential Revision: https://reviews.llvm.org/D93606
2020-12-22 16:49:50 +00:00
Paul Walker c0bc169cb1 [NFC][SVE] Clean up bfloat isel patterns that emit non-bfloat instructions.
During isel there's no need to protect illegal types. Patch also
adds a missing unit test for tbl2 intrinsic using bfloat types.

Differential Revision: https://reviews.llvm.org/D93404
2020-12-18 13:20:41 +00:00
Paul Walker 632f4d2747 [NFC] Fix a few SVEInstrInfo related stylistic issues. 2020-12-15 16:10:38 +00:00
Kerry McLaughlin c5ced82c8e [SVE][CodeGen] Lower scalable floating-point vector reductions
Changes in this patch:
-  Minor changes to the LowerVECREDUCE_SEQ_FADD function added by @cameron.mcinally
   to also work for scalable types
- Added TableGen patterns for FP reductions with unpacked types (nxv2f16, nxv4f16 & nxv2f32)
- Asserts added to expandFMINNUM_FMAXNUM & expandVecReduceSeq for scalable types

Reviewed By: cameron.mcinally

Differential Revision: https://reviews.llvm.org/D93050
2020-12-14 11:45:42 +00:00
Huihui Zhang 1e113c078a [AArch64][SVE] Fix umin/umax lowering to handle out of range imm.
The immediate must be in the integer range [0,255] for umin/umax instructions.
Extend the pattern matching helper SelectSVEArithImm() to take in the value type
bitwidth when checking whether the immediate value is in range.

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D89831
2020-10-23 09:42:56 -07:00
Paul C. Anagnostopoulos 2871c6c93f [Aarch64] [TableGen] Clean up !if(!eq(boolean, 1) and related booleans.
Differential Revision: https://reviews.llvm.org/D89551
2020-10-19 10:33:55 -04:00
Muhammad Asif Manzoor aab6f7db47 [AArch64][SVE] Add lowering for llvm fabs
Add the functionality to lower fabs for passthru variant

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D88679
2020-10-01 19:41:25 -04:00
Kerry McLaughlin fcf70e1e3b [SVE][CodeGen] Lower scalable fp_extend & fp_round operations
This patch adds FP_EXTEND_MERGE_PASSTHRU & FP_ROUND_MERGE_PASSTHRU
ISD nodes, used to lower scalable vector fp_extend/fp_round operations.
fp_round has an additional argument, the 'trunc' flag, which is an integer of zero or one.

This also fixes a warning introduced by the new tests added to sve-split-fcvt.ll,
resulting from an implicit TypeSize -> uint64_t cast in SplitVecOp_FP_ROUND.

Reviewed By: sdesmalen, paulwalker-arm

Differential Revision: https://reviews.llvm.org/D88321
2020-10-01 12:17:37 +01:00
Muhammad Asif Manzoor 3a76de4275 [AArch64][SVE] Add lowering for llvm frecpx
Add the functionality to lower frecpx for passthru variant

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D88032
2020-09-23 15:23:54 -04:00
Kerry McLaughlin d0149ba9b4 [SVE][CodeGen] Lower legal integer -> floating point conversions
This patch adds new ISD nodes, SCVTZ_MERGE_PASSTHRU &
UCVTZ_MERGE_PASSTHRU, which are used to lower both legal
scalable vector [S|U]INT_TO_FP operations and the following intrinsics:
 - llvm.aarch64.sve.scvtf
 - llvm.aarch64.sve.ucvtf

Reviewed By: sdesmalen, efriedma

Differential Revision: https://reviews.llvm.org/D87913
2020-09-23 11:53:53 +01:00
David Sherwood 96e52c1364 [SVE][CodeGen] Mark ptrue/pfalse instructions as rematerializable 2020-09-21 16:44:32 +01:00
Paul Walker f3fa954b5b [SVE] Change definition of reduction ISD nodes to have an SVE vector result type.
The current nodes, AArch64::SMAXV_PRED for example, are defined to
return a NEON vector result.  This is incorrect because they modify
the complete SVE register and are thus changed to represent such.

This patch also adds nodes for UADDV_PRED and SADDV_PRED, which
unifies the handling of all SVE reductions.

NOTE: Floating-point reductions are already implemented correctly,
so this patch is essentially making everything consistent with those.

Differential Revision: https://reviews.llvm.org/D87843
2020-09-21 13:16:28 +01:00
Kerry McLaughlin f7185b271f [SVE][CodeGen] Lower floating point -> integer conversions
This patch adds new ISD nodes, FCVTZS_MERGE_PASSTHRU &
FCVTZU_MERGE_PASSTHRU, which are used to lower scalable vector
FP_TO_SINT/FP_TO_UINT operations and the following intrinsics:
 - llvm.aarch64.sve.fcvtzu
 - llvm.aarch64.sve.fcvtzs

Reviewed By: efriedma, paulwalker-arm

Differential Revision: https://reviews.llvm.org/D87232
2020-09-17 14:04:22 +01:00
Muhammad Asif Manzoor fd536eeed9 [AArch64][SVE] Add lowering for llvm fceil
Add the functionality to lower fceil for passthru variant

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D84548
2020-08-26 15:59:44 -04:00
Francesco Petrogalli 61dfa00957 [MC][SVE] Fix data operand for instruction alias of `st1d`.
The version of `st1d` that operates with vector plus immediate
addressing mode uses the alias `st1d { <Zn>.d }, <Pg>, [<Za>.d]` for
rendering `st1d { <Zn>.d }, <Pg>, [<Za>.d, #0]`. The disassembler was
generating `<Zn>.s` instead of `<Zn>.d`.

Differential Revision: https://reviews.llvm.org/D86633
2020-08-26 18:22:17 +00:00
Paul Walker 73ac3c0ede [SVE] Lower scalable vector ISD::FNEG operations.
Also updates isConstOrConstSplatFP to allow the mul(A,-1) -> neg(A)
transformation when -1 is expressed as an ISD::SPLAT_VECTOR.

Differential Revision: https://reviews.llvm.org/D86415
2020-08-25 11:22:28 +01:00
Paul Walker 0015b8db8e [SVE] Add ISEL patterns for predicated shifts by an immediate.
For scalable vector shifts the predicate is typically all active,
which gets selected to an unpredicated shift by immediate.  When
code generating for fixed length vectors the predicate is based
on the vector length and so additional patterns are required to
make use of SVE's predicated shift by immediate instructions.

Differential Revision: https://reviews.llvm.org/D86204
2020-08-20 11:47:20 +01:00
Eli Friedman be944c85f3 [AArch64][SVE] Add patterns for integer mla/mls.
We probably want to introduce pseudo-instructions at some point, like
we have for binary operations, but this seems okay for now.

One thing I'm not sure about is whether we should be doing this as a
DAGCombine instead of directly pattern-matching it. I don't see any big
downside to doing it this way, though.

Differential Revision: https://reviews.llvm.org/D85681
2020-08-18 12:51:16 -07:00
Paul Walker 9f63dc3265 [SVE] Fix shift-by-imm patterns used by asr, lsl & lsr intrinsics.
Right shift patterns will no longer incorrectly accept a shift
amount of zero.  At the same time they will allow larger shift
amounts that are now saturated to their upper bound.
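
A hedged Python sketch of why saturating an oversized right-shift amount is
safe. It assumes the immediate range for a right shift is [1, element
width], and that shifting by the full width already yields the limiting
result (zero for lsr, sign fill for asr). The function names are
hypothetical, not the helpers used in the patch.

```python
def right_shift_imm(amount, width):
    """Immediate to use for a right shift, or None when the amount is
    zero and the immediate form does not apply."""
    if amount <= 0:
        return None
    return min(amount, width)  # saturate to the upper bound


def lsr(value, amount, width):
    """Logical shift right on an unsigned width-bit value."""
    return (value & ((1 << width) - 1)) >> amount


def asr(value, amount, width):
    """Arithmetic shift right on a width-bit two's complement value."""
    mask = (1 << width) - 1
    value &= mask
    if value >> (width - 1):   # reinterpret as negative
        value -= 1 << width
    return (value >> amount) & mask


imm = right_shift_imm(20, 8)
print(imm, lsr(0xFF, imm, 8), asr(0x80, imm, 8))
```

In this model a shift by 20 on 8-bit lanes and a shift by the saturated
amount 8 give identical results, while an amount of zero is rejected rather
than encoded.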

Patterns have been extended to enable immediate forms for shifts
taking an arbitrary predicate.

This patch also unifies the code path for immediate parsing so the
i64 based shifts are no longer treated specially.

Differential Revision: https://reviews.llvm.org/D86084
2020-08-18 11:41:26 +01:00
Paul Walker b6c7b7fa31 [SVE] Add ISD nodes for predicated integer extend inreg operations.
These are useful instructions when lowering fixed length vector
extends, so I've broken this patch out as kind of NFC like work.

Differential Revision: https://reviews.llvm.org/D85546
2020-08-11 11:39:26 +01:00
Paul Walker 0d33a8ef5b [SVE] Lower scalable vector mul operations.
This allows us to remove extra patterns from AArch64SVEInstrInfo.td
because we can reuse those required for fixed length vectors.

Differential Revision: https://reviews.llvm.org/D85328
2020-08-06 11:15:35 +01:00
Paul Walker 3ed59b775d [SVE] Implement lowering for fixed length vector multiplication.
NOTE: Also uses SVE code generation for NEON size vectors, instead
of expanding i64 based vector multiplications.

Differential Revision: https://reviews.llvm.org/D85327
2020-08-06 11:01:39 +01:00
Paul Walker 4be13b15d6 [SVE] Replace remaining _MERGE_OP1 nodes with _PRED variants.
This is the final bit of work to relax the register allocation
requirements when code generating normal LLVM IR, which rarely
cares about the result of inactive lanes. By using _PRED nodes
we can make better use of SVE's reversed instructions.

Also removes a redundant parameter from the min/max tests.

Differential Revision: https://reviews.llvm.org/D85142
2020-08-04 11:19:17 +01:00