// RUN: %clang_cc1 -triple thumbv8.1m.main-none-none-eabi -fallow-half-arguments-and-returns -target-feature +mve.fp -verify -fsyntax-only %s
[ARM,MVE] Add intrinsics for gather/scatter load/stores.
This patch adds two new families of intrinsics, both of which are
memory accesses taking a vector of locations to load from / store to.
The vldrq_gather_base / vstrq_scatter_base intrinsics take a vector of
base addresses, and an immediate offset to be added consistently to
each one. vldrq_gather_offset / vstrq_scatter_offset take a scalar
base address, and a vector of offsets to add to it. The
'shifted_offset' variants also multiply each offset by the element
size, so that the vector effectively contains array indices.
At the IR level, these operations are represented by a single set of
four IR intrinsics: {gather,scatter} × {base,offset}. The other
details (signed/unsigned, shift, and memory element size as opposed to
vector element size) are all specified by IR intrinsic polymorphism
and immediate operands, because that made the selection job easier
than making a huge family of similarly named intrinsics.
I considered using the standard IR representations such as
llvm.masked.gather, but they're not a good fit. In order to use
llvm.masked.gather to represent a gather_offset load with element size
smaller than a pointer, you'd have to expand the <8 x i16> vector of
offsets into an <8 x i16*> vector of pointers, which would be split up
during legalization, so you'd spend most of your time undoing the mess
it had made. Also, ISel support for llvm.masked.gather would be easy
enough in a trivial way (you can expand it into a gather-base load
with a zero immediate offset), but instruction-selecting lots of
fiddly idioms back into all the _other_ MVE load instructions would be
much more work. So I think dedicated IR intrinsics are the more
sensible approach, at least for the moment.
On the clang tablegen side, I've added two new features to the
Tablegen source accepted by MveEmitter: a 'CopyKind' type node for
defining a type that varies with the parameter type (it lets you ask
for an unsigned integer type of the same width as the parameter), and
an 'unsignedflag' value node for passing an immediate IR operand which
is 0 for a signed integer type or 1 for an unsigned one. That lets me
write each kind of intrinsic just once and get all its subtypes and
immediate arguments generated automatically.
I've also tweaked the handling of pointer-typed values in the code
generation part of MveEmitter: they're generated as Address rather
than Value (i.e. carrying an alignment) so that they can be given to
the ordinary IR load and store operations, but the code to convert
them back to Value when they're used as an argument to an IR
intrinsic had been omitted, so this patch adds it.
On the MC side, I've enhanced MVEVectorVTInfo so that it can tell you
not only the full assembly-language suffix for a given vector type
(like 's32' or 'u16') but also the numeric-only one used by store
instructions (just '32' or '16').
Reviewers: dmgreen
Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D69791
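
As a quick illustration of the two families (a minimal sketch, not part of
the test below; the helper names are hypothetical, but the intrinsic
spellings are the standard ACLE ones from <arm_mve.h>):

#include <arm_mve.h>

// gather_base: each lane of 'addrs' is itself a byte address; the
// immediate offset (here 4) is added to every lane before loading.
uint32x4_t load_base(uint32x4_t addrs)
{
  return vldrwq_gather_base_u32(addrs, 4);
}

// gather_offset: one scalar base pointer plus a vector of byte offsets.
// The shifted_offset form scales each offset by the element size, so
// 'idx' behaves like a vector of array indices.
uint32x4_t load_offsets(const uint32_t *base, uint32x4_t idx)
{
  uint32x4_t by_bytes   = vldrwq_gather_offset_u32(base, idx);
  uint32x4_t by_indices = vldrwq_gather_shifted_offset_u32(base, idx);
  return vaddq_u32(by_bytes, by_indices);
}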
// REQUIRES: aarch64-registered-target || arm-registered-target
#include <arm_mve.h>

void test_load_offsets(uint32x4_t addr32, uint64x2_t addr64)
{
  // Offsets that should be a multiple of 8 times 0,1,...,127
  vldrdq_gather_base_s64(addr64, 0);
  vldrdq_gather_base_s64(addr64, 8);
  vldrdq_gather_base_s64(addr64, 2*8);
  vldrdq_gather_base_s64(addr64, 125*8);
  vldrdq_gather_base_s64(addr64, 126*8);
  vldrdq_gather_base_s64(addr64, 127*8);

[ARM,MVE] Support -ve offsets in gather-load intrinsics.
Summary:
The ACLE intrinsics with `gather_base` or `scatter_base` in the name
are wrappers on the MVE load/store instructions that take a vector of
base addresses and an immediate offset. The immediate offset can be up
to 127 times the alignment unit, and it can be positive or negative.
At the MC layer, we got that right. But in the Sema error checking for
the wrapping intrinsics, the offset was erroneously constrained to be
positive.
To fix this I've adjusted the `imm_mem7bit` class in the Tablegen that
defines the intrinsics. But that causes integer literals like
`0xfffffffffffffe04` to appear in the autogenerated calls to
`SemaBuiltinConstantArgRange`, which provokes a compiler warning
because that's out of the non-overflowing range of an `int64_t`. So
I've also tweaked `MveEmitter` to emit that as `-0x1fc` instead.
I've updated the tests of the Sema checks themselves, and also
adjusted a sample of the CodeGen tests to use negative offsets,
proving that they make it all the way through code generation without
crashing.
Reviewers: dmgreen, miyuki, MarkMurrayARM
Reviewed By: dmgreen
Subscribers: kristof.beyls, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D72268
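
  // A concrete reading of the range described above (a sketch, not part
  // of the original test): the alignment unit is the element size, so
  // vldrwq_gather_base_s32 accepts multiples of 4 in [-127*4, 127*4] =
  // [-508, 508], and vldrdq_gather_base_s64 accepts multiples of 8 in
  // [-1016, 1016]. The hex literal 0xfffffffffffffe04 quoted above is
  // just the 64-bit two's-complement spelling of -0x1fc, i.e. -508.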
  vldrdq_gather_base_s64(addr64, -125*8);
  vldrdq_gather_base_s64(addr64, -126*8);
  vldrdq_gather_base_s64(addr64, -127*8);
  vldrdq_gather_base_s64(addr64, 128*8); // expected-error {{argument value 1024 is outside the valid range [-1016, 1016]}}
  vldrdq_gather_base_s64(addr64, -128*8); // expected-error {{argument value -1024 is outside the valid range [-1016, 1016]}}
  vldrdq_gather_base_s64(addr64, 4); // expected-error {{argument should be a multiple of 8}}
  vldrdq_gather_base_s64(addr64, 1); // expected-error {{argument should be a multiple of 8}}

  // Offsets that should be a multiple of 4 times 0,1,...,127
  vldrwq_gather_base_s32(addr32, 0);
  vldrwq_gather_base_s32(addr32, 4);
  vldrwq_gather_base_s32(addr32, 2*4);
  vldrwq_gather_base_s32(addr32, 125*4);
  vldrwq_gather_base_s32(addr32, 126*4);
  vldrwq_gather_base_s32(addr32, 127*4);
  vldrwq_gather_base_s32(addr32, -125*4);
  vldrwq_gather_base_s32(addr32, -126*4);
  vldrwq_gather_base_s32(addr32, -127*4);
  vldrwq_gather_base_s32(addr32, 128*4); // expected-error {{argument value 512 is outside the valid range [-508, 508]}}
  vldrwq_gather_base_s32(addr32, -128*4); // expected-error {{argument value -512 is outside the valid range [-508, 508]}}
  vldrwq_gather_base_s32(addr32, 2); // expected-error {{argument should be a multiple of 4}}
  vldrwq_gather_base_s32(addr32, 1); // expected-error {{argument should be a multiple of 4}}

  // Show that the polymorphic store intrinsics get the right set of
  // error checks after overload resolution. These ones expand to the
  // 8-byte granular versions...
  vstrdq_scatter_base(addr64, 0, addr64);
  vstrdq_scatter_base(addr64, 8, addr64);
  vstrdq_scatter_base(addr64, 2*8, addr64);
  vstrdq_scatter_base(addr64, 125*8, addr64);
  vstrdq_scatter_base(addr64, 126*8, addr64);
  vstrdq_scatter_base(addr64, 127*8, addr64);
  vstrdq_scatter_base(addr64, -125*8, addr64);
  vstrdq_scatter_base(addr64, -126*8, addr64);
  vstrdq_scatter_base(addr64, -127*8, addr64);
  vstrdq_scatter_base(addr64, 128*8, addr64); // expected-error {{argument value 1024 is outside the valid range [-1016, 1016]}}
  vstrdq_scatter_base(addr64, -128*8, addr64); // expected-error {{argument value -1024 is outside the valid range [-1016, 1016]}}
  vstrdq_scatter_base(addr64, 4, addr64); // expected-error {{argument should be a multiple of 8}}
  vstrdq_scatter_base(addr64, 1, addr64); // expected-error {{argument should be a multiple of 8}}

  // ... and these ones to the 4-byte.
  vstrwq_scatter_base(addr32, 0, addr32);
  vstrwq_scatter_base(addr32, 4, addr32);
  vstrwq_scatter_base(addr32, 2*4, addr32);
  vstrwq_scatter_base(addr32, 125*4, addr32);
  vstrwq_scatter_base(addr32, 126*4, addr32);
  vstrwq_scatter_base(addr32, 127*4, addr32);
  vstrwq_scatter_base(addr32, -125*4, addr32);
  vstrwq_scatter_base(addr32, -126*4, addr32);
  vstrwq_scatter_base(addr32, -127*4, addr32);
  vstrwq_scatter_base(addr32, 128*4, addr32); // expected-error {{argument value 512 is outside the valid range [-508, 508]}}
  vstrwq_scatter_base(addr32, -128*4, addr32); // expected-error {{argument value -512 is outside the valid range [-508, 508]}}
  vstrwq_scatter_base(addr32, 2, addr32); // expected-error {{argument should be a multiple of 4}}
  vstrwq_scatter_base(addr32, 1, addr32); // expected-error {{argument should be a multiple of 4}}
}

void test_lane_indices(uint8x16_t v16, uint16x8_t v8,
                       uint32x4_t v4, uint64x2_t v2)
{
  vgetq_lane_u8(v16, -1); // expected-error {{argument value -1 is outside the valid range [0, 15]}}
  vgetq_lane_u8(v16, 0);
  vgetq_lane_u8(v16, 15);
  vgetq_lane_u8(v16, 16); // expected-error {{argument value 16 is outside the valid range [0, 15]}}

  vgetq_lane_u16(v8, -1); // expected-error {{argument value -1 is outside the valid range [0, 7]}}
  vgetq_lane_u16(v8, 0);
  vgetq_lane_u16(v8, 7);
  vgetq_lane_u16(v8, 8); // expected-error {{argument value 8 is outside the valid range [0, 7]}}

  vgetq_lane_u32(v4, -1); // expected-error {{argument value -1 is outside the valid range [0, 3]}}
  vgetq_lane_u32(v4, 0);
  vgetq_lane_u32(v4, 3);
  vgetq_lane_u32(v4, 4); // expected-error {{argument value 4 is outside the valid range [0, 3]}}

  vgetq_lane_u64(v2, -1); // expected-error {{argument value -1 is outside the valid range [0, 1]}}
  vgetq_lane_u64(v2, 0);
  vgetq_lane_u64(v2, 1);
  vgetq_lane_u64(v2, 2); // expected-error {{argument value 2 is outside the valid range [0, 1]}}

  vsetq_lane_u8(23, v16, -1); // expected-error {{argument value -1 is outside the valid range [0, 15]}}
  vsetq_lane_u8(23, v16, 0);
  vsetq_lane_u8(23, v16, 15);
  vsetq_lane_u8(23, v16, 16); // expected-error {{argument value 16 is outside the valid range [0, 15]}}

  vsetq_lane_u16(23, v8, -1); // expected-error {{argument value -1 is outside the valid range [0, 7]}}
  vsetq_lane_u16(23, v8, 0);
  vsetq_lane_u16(23, v8, 7);
  vsetq_lane_u16(23, v8, 8); // expected-error {{argument value 8 is outside the valid range [0, 7]}}

  vsetq_lane_u32(23, v4, -1); // expected-error {{argument value -1 is outside the valid range [0, 3]}}
  vsetq_lane_u32(23, v4, 0);
  vsetq_lane_u32(23, v4, 3);
  vsetq_lane_u32(23, v4, 4); // expected-error {{argument value 4 is outside the valid range [0, 3]}}

  vsetq_lane_u64(23, v2, -1); // expected-error {{argument value -1 is outside the valid range [0, 1]}}
  vsetq_lane_u64(23, v2, 0);
  vsetq_lane_u64(23, v2, 1);
  vsetq_lane_u64(23, v2, 2); // expected-error {{argument value 2 is outside the valid range [0, 1]}}
}

void test_immediate_shifts(uint8x16_t vb, uint16x8_t vh, uint32x4_t vw)
{
  vshlq_n(vb, 0);
  vshlq_n(vb, 7);
  vshlq_n(vh, 0);
  vshlq_n(vh, 15);
  vshlq_n(vw, 0);
  vshlq_n(vw, 31);

  vshlq_n(vb, -1); // expected-error {{argument value -1 is outside the valid range [0, 7]}}
  vshlq_n(vb, 8); // expected-error {{argument value 8 is outside the valid range [0, 7]}}
  vshlq_n(vh, -1); // expected-error {{argument value -1 is outside the valid range [0, 15]}}
  vshlq_n(vh, 16); // expected-error {{argument value 16 is outside the valid range [0, 15]}}
  vshlq_n(vw, -1); // expected-error {{argument value -1 is outside the valid range [0, 31]}}
  vshlq_n(vw, 32); // expected-error {{argument value 32 is outside the valid range [0, 31]}}

  vqshlq_n(vb, 0);
  vqshlq_n(vb, 7);
  vqshlq_n(vh, 0);
  vqshlq_n(vh, 15);
  vqshlq_n(vw, 0);
  vqshlq_n(vw, 31);

  vqshlq_n(vb, -1); // expected-error {{argument value -1 is outside the valid range [0, 7]}}
  vqshlq_n(vb, 8); // expected-error {{argument value 8 is outside the valid range [0, 7]}}
  vqshlq_n(vh, -1); // expected-error {{argument value -1 is outside the valid range [0, 15]}}
  vqshlq_n(vh, 16); // expected-error {{argument value 16 is outside the valid range [0, 15]}}
  vqshlq_n(vw, -1); // expected-error {{argument value -1 is outside the valid range [0, 31]}}
  vqshlq_n(vw, 32); // expected-error {{argument value 32 is outside the valid range [0, 31]}}

  vsliq(vb, vb, 0);
  vsliq(vb, vb, 7);
  vsliq(vh, vh, 0);
  vsliq(vh, vh, 15);
  vsliq(vw, vw, 0);
  vsliq(vw, vw, 31);

  vsliq(vb, vb, -1); // expected-error {{argument value -1 is outside the valid range [0, 7]}}
  vsliq(vb, vb, 8); // expected-error {{argument value 8 is outside the valid range [0, 7]}}
  vsliq(vh, vh, -1); // expected-error {{argument value -1 is outside the valid range [0, 15]}}
  vsliq(vh, vh, 16); // expected-error {{argument value 16 is outside the valid range [0, 15]}}
  vsliq(vw, vw, -1); // expected-error {{argument value -1 is outside the valid range [0, 31]}}
  vsliq(vw, vw, 32); // expected-error {{argument value 32 is outside the valid range [0, 31]}}

  vshllbq(vb, 1);
  vshllbq(vb, 8);
  vshllbq(vh, 1);
  vshllbq(vh, 16);

  vshllbq(vb, 0); // expected-error {{argument value 0 is outside the valid range [1, 8]}}
  vshllbq(vb, 9); // expected-error {{argument value 9 is outside the valid range [1, 8]}}
  vshllbq(vh, 0); // expected-error {{argument value 0 is outside the valid range [1, 16]}}
  vshllbq(vh, 17); // expected-error {{argument value 17 is outside the valid range [1, 16]}}

  vshrq(vb, 1);
  vshrq(vb, 8);
  vshrq(vh, 1);
  vshrq(vh, 16);
  vshrq(vw, 1);
  vshrq(vw, 32);

  vshrq(vb, 0); // expected-error {{argument value 0 is outside the valid range [1, 8]}}
  vshrq(vb, 9); // expected-error {{argument value 9 is outside the valid range [1, 8]}}
  vshrq(vh, 0); // expected-error {{argument value 0 is outside the valid range [1, 16]}}
  vshrq(vh, 17); // expected-error {{argument value 17 is outside the valid range [1, 16]}}
  vshrq(vw, 0); // expected-error {{argument value 0 is outside the valid range [1, 32]}}
  vshrq(vw, 33); // expected-error {{argument value 33 is outside the valid range [1, 32]}}

  vshrntq(vb, vh, 1);
  vshrntq(vb, vh, 8);
  vshrntq(vh, vw, 1);
  vshrntq(vh, vw, 16);

  vshrntq(vb, vh, 0); // expected-error {{argument value 0 is outside the valid range [1, 8]}}
  vshrntq(vb, vh, 9); // expected-error {{argument value 9 is outside the valid range [1, 8]}}
  vshrntq(vh, vw, 0); // expected-error {{argument value 0 is outside the valid range [1, 16]}}
  vshrntq(vh, vw, 17); // expected-error {{argument value 17 is outside the valid range [1, 16]}}

  vsriq(vb, vb, 1);
  vsriq(vb, vb, 8);
  vsriq(vh, vh, 1);
  vsriq(vh, vh, 16);
  vsriq(vw, vw, 1);
  vsriq(vw, vw, 32);

  vsriq(vb, vb, 0); // expected-error {{argument value 0 is outside the valid range [1, 8]}}
  vsriq(vb, vb, 9); // expected-error {{argument value 9 is outside the valid range [1, 8]}}
  vsriq(vh, vh, 0); // expected-error {{argument value 0 is outside the valid range [1, 16]}}
  vsriq(vh, vh, 17); // expected-error {{argument value 17 is outside the valid range [1, 16]}}
  vsriq(vw, vw, 0); // expected-error {{argument value 0 is outside the valid range [1, 32]}}
  vsriq(vw, vw, 33); // expected-error {{argument value 33 is outside the valid range [1, 32]}}
}

[ARM,MVE] Support immediate vbicq,vorrq,vmvnq intrinsics.
Summary:
Immediate vmvnq is code-generated as a simple vector constant in IR,
and left to the backend to recognize that it can be created with an
MVE VMVN instruction. The predicated version is represented as a
select between the input and the same constant, and I've added a
Tablegen isel rule to turn that into a predicated VMVN. (That should
be better than the previous VMVN + VPSEL: it's the same number of
instructions but now it can fold into an adjacent VPT block.)
The unpredicated forms of VBIC and VORR are done by enabling the same
isel lowering as for NEON, recognizing appropriate immediates and
rewriting them as ARMISD::VBICIMM / ARMISD::VORRIMM SDNodes, which I
then instruction-select into the right MVE instructions (now that I've
also reworked those instructions to use the same MC operand encoding).
In order to do that, I had to promote the Tablegen SDNode instance
`NEONvorrImm` to a general `ARMvorrImm` available in MVE as well, and
similarly for `NEONvbicImm`.
The predicated forms of VBIC and VORR are represented as a vector
select between the original input vector and the output of the
unpredicated operation. The main convenience of this is that it still
lets me use the existing isel lowering for VBICIMM/VORRIMM, and not
have to write another copy of the operand encoding translation code.
This intrinsic family is the first to use the `imm_simd` system I put
into the MveEmitter Tablegen backend, so, naturally, it turned up a
bug or two (emitting bogus range checks and the like). I've fixed
those and added a full set of tests for the permissible immediates to
the existing Sema test.
I've also adjusted the isel pattern for `vmovlb.u8`, which stopped
matching because lowering started turning its input into a VBICIMM;
now it recognizes the VBICIMM instead.
Reviewers: dmgreen, MarkMurrayARM, miyuki, ostannard
Reviewed By: dmgreen
Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D72934
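
For reference, a minimal sketch of what the immediate forms compute per
lane (illustrative only, not part of the Sema test; the helper name is
hypothetical): vbicq with an immediate clears the bits set in the
immediate, vorrq sets them, and vmvnq_n_* builds a vector whose lanes are
the bitwise inverse of the immediate.

#include <arm_mve.h>

uint16x8_t imm_logic_sketch(uint16x8_t x)
{
  uint16x8_t cleared  = vbicq(x, 0xFF00);    // each lane: x & ~0xFF00
  uint16x8_t set      = vorrq(x, 0x00FF);    // each lane: x | 0x00FF
  uint16x8_t inverted = vmvnq_n_u16(0xFF00); // every lane: ~0xFF00 == 0x00FF
  return veorq(veorq(cleared, set), inverted);
}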
void test_simd_bic_orr(int16x8_t h, int32x4_t w)
{
  h = vbicq(h, 0x0000);
  h = vbicq(h, 0x0001);
  h = vbicq(h, 0x00FF);
  h = vbicq(h, 0x0100);
  h = vbicq(h, 0x0101); // expected-error-re {{argument should be an 8-bit value shifted by a multiple of 8 bits{{$}}}}
  h = vbicq(h, 0x01FF); // expected-error-re {{argument should be an 8-bit value shifted by a multiple of 8 bits{{$}}}}
  h = vbicq(h, 0xFF00);

  w = vbicq(w, 0x00000000);
  w = vbicq(w, 0x00000001);
  w = vbicq(w, 0x000000FF);
  w = vbicq(w, 0x00000100);
  w = vbicq(w, 0x0000FF00);
  w = vbicq(w, 0x00010000);
  w = vbicq(w, 0x00FF0000);
  w = vbicq(w, 0x01000000);
  w = vbicq(w, 0xFF000000);
  w = vbicq(w, 0x01000001); // expected-error-re {{argument should be an 8-bit value shifted by a multiple of 8 bits{{$}}}}
  w = vbicq(w, 0x01FFFFFF); // expected-error-re {{argument should be an 8-bit value shifted by a multiple of 8 bits{{$}}}}

  h = vorrq(h, 0x0000);
  h = vorrq(h, 0x0001);
  h = vorrq(h, 0x00FF);
  h = vorrq(h, 0x0100);
  h = vorrq(h, 0x0101); // expected-error-re {{argument should be an 8-bit value shifted by a multiple of 8 bits{{$}}}}
  h = vorrq(h, 0x01FF); // expected-error-re {{argument should be an 8-bit value shifted by a multiple of 8 bits{{$}}}}
  h = vorrq(h, 0xFF00);

  w = vorrq(w, 0x00000000);
  w = vorrq(w, 0x00000001);
  w = vorrq(w, 0x000000FF);
  w = vorrq(w, 0x00000100);
  w = vorrq(w, 0x0000FF00);
  w = vorrq(w, 0x00010000);
  w = vorrq(w, 0x00FF0000);
  w = vorrq(w, 0x01000000);
  w = vorrq(w, 0xFF000000);
  w = vorrq(w, 0x01000001); // expected-error-re {{argument should be an 8-bit value shifted by a multiple of 8 bits{{$}}}}
  w = vorrq(w, 0x01FFFFFF); // expected-error-re {{argument should be an 8-bit value shifted by a multiple of 8 bits{{$}}}}
}

void test_simd_vmvn(void)
{
  uint16x8_t h;
  h = vmvnq_n_u16(0x0000);
  h = vmvnq_n_u16(0x0001);
  h = vmvnq_n_u16(0x00FF);
  h = vmvnq_n_u16(0x0100);
  h = vmvnq_n_u16(0x0101); // expected-error {{argument should be an 8-bit value shifted by a multiple of 8 bits, or in the form 0x??FF}}
  h = vmvnq_n_u16(0x01FF);
  h = vmvnq_n_u16(0xFF00);

  uint32x4_t w;
  w = vmvnq_n_u32(0x00000000);
  w = vmvnq_n_u32(0x00000001);
  w = vmvnq_n_u32(0x000000FF);
  w = vmvnq_n_u32(0x00000100);
  w = vmvnq_n_u32(0x0000FF00);
  w = vmvnq_n_u32(0x00010000);
  w = vmvnq_n_u32(0x00FF0000);
  w = vmvnq_n_u32(0x01000000);
  w = vmvnq_n_u32(0xFF000000);
  w = vmvnq_n_u32(0x01000001); // expected-error {{argument should be an 8-bit value shifted by a multiple of 8 bits, or in the form 0x??FF}}
  w = vmvnq_n_u32(0x01FFFFFF); // expected-error {{argument should be an 8-bit value shifted by a multiple of 8 bits, or in the form 0x??FF}}
  w = vmvnq_n_u32(0x0001FFFF); // expected-error {{argument should be an 8-bit value shifted by a multiple of 8 bits, or in the form 0x??FF}}
  w = vmvnq_n_u32(0x000001FF);
}

[ARM,MVE] Add intrinsics for v[id]dupq and v[id]wdupq.
Summary:
These instructions generate a vector of consecutive elements starting
from a given base value and incrementing by 1, 2, 4 or 8. The `wdup`
versions also wrap the values back to zero when they reach a given
limit value. The instruction updates the scalar base register so that
another use of the same instruction will continue the sequence from
where the previous one left off.
At the IR level, I've represented these instructions as a family of
target-specific intrinsics with two return values (the constructed
vector and the updated base). The user-facing ACLE API provides a set
of intrinsics that throw away the written-back base and another set
that receive it as a pointer so they can update it, plus the usual
predicated versions.
Because the intrinsics return two values (as do the underlying
instructions), the isel has to be done in C++.
This is the first family of MVE intrinsics that uses the `imm_1248`
immediate type in the clang Tablegen framework, so, naturally, I found
I'd given it the wrong C integer type. I've also added some tests of
the check that the immediate has a legal value, because this is the
first time those particular checks have been exercised.
Finally, I also had to fix a bug in MveEmitter which failed an
assertion when I nested two `seq` nodes (the inner one used to extract
the two values from the pair returned by the IR intrinsic, and the
outer one put on by the predication multiclass).
Reviewers: dmgreen, MarkMurrayARM, miyuki, ostannard
Reviewed By: dmgreen
Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D73357
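
As a usage sketch of the two user-facing forms described above
(illustrative only, not part of the test; the helper name is
hypothetical): the vidupq_n_* form discards the written-back base, while
vidupq_wb_* takes the base by pointer and updates it, so a second call
carries on where the first one stopped.

#include <arm_mve.h>

uint16x8_t vidup_sketch(void)
{
  // Eight lanes starting at 0 and incrementing by 4: 0,4,8,...,28.
  uint16x8_t a = vidupq_n_u16(0, 4);

  // Same generation, but the scalar base is written back: after the
  // first call 'base' has advanced past the lanes just produced, so
  // the second call continues the sequence (32,36,...,60).
  uint32_t base = 0;
  uint16x8_t b = vidupq_wb_u16(&base, 4);
  uint16x8_t c = vidupq_wb_u16(&base, 4);

  return vaddq_u16(vaddq_u16(a, b), c);
}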
void test_vidup(void)
{
  vidupq_n_u16(0x12345678, 1);
  vidupq_n_u16(0x12345678, 2);
  vidupq_n_u16(0x12345678, 4);
  vidupq_n_u16(0x12345678, 8);

  vidupq_n_u16(0x12345678, 0); // expected-error {{argument value 0 is outside the valid range [1, 8]}}
  vidupq_n_u16(0x12345678, 16); // expected-error {{argument value 16 is outside the valid range [1, 8]}}
  vidupq_n_u16(0x12345678, -1); // expected-error {{argument value -1 is outside the valid range [1, 8]}}
  vidupq_n_u16(0x12345678, -2); // expected-error {{argument value -2 is outside the valid range [1, 8]}}
  vidupq_n_u16(0x12345678, -4); // expected-error {{argument value -4 is outside the valid range [1, 8]}}
  vidupq_n_u16(0x12345678, -8); // expected-error {{argument value -8 is outside the valid range [1, 8]}}
  vidupq_n_u16(0x12345678, 3); // expected-error {{argument should be a power of 2}}
  vidupq_n_u16(0x12345678, 7); // expected-error {{argument should be a power of 2}}
}

void test_vcvtq(void)
{
  uint16x8_t vec_u16;
  float16x8_t vec_f16;
  vcvtq_n_f16_u16(vec_u16, 0); // expected-error {{argument value 0 is outside the valid range [1, 16]}}
  vcvtq_n_f16_u16(vec_u16, 1);
  vcvtq_n_f16_u16(vec_u16, 16);
  vcvtq_n_f16_u16(vec_u16, 17); // expected-error {{argument value 17 is outside the valid range [1, 16]}}

  int32x4_t vec_s32;
  float32x4_t vec_f32;
  vcvtq_n_s32_f32(vec_f32, -1); // expected-error {{argument value -1 is outside the valid range [1, 32]}}
  vcvtq_n_s32_f32(vec_f32, 1);
  vcvtq_n_s32_f32(vec_f32, 32);
  vcvtq_n_s32_f32(vec_f32, 33); // expected-error {{argument value 33 is outside the valid range [1, 32]}}
}