[ARM,MVE] Add intrinsics for gather/scatter load/stores.
This patch adds two new families of intrinsics, both of which are
memory accesses taking a vector of locations to load from / store to.
The vldrq_gather_base / vstrq_scatter_base intrinsics take a vector of
base addresses, and an immediate offset to be added consistently to
each one. vldrq_gather_offset / vstrq_scatter_offset take a scalar
base address, and a vector of offsets to add to it. The
'shifted_offset' variants also multiply each offset by the element
size, so that the vector effectively holds array indices.
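For illustration, here is roughly how the two families look at the C
level, using intrinsics exercised by the tests in this file (the
shifted-offset example assumes the usual ACLE naming for the 32-bit
variant; it is a sketch, not part of this patch's tests):

    #include <arm_mve.h>

    /* Scalar base + vector of byte offsets: lane i loads from base + offset[i]. */
    int16x8_t gather_by_offset(const int8_t *base, uint16x8_t offset)
    {
      return vldrbq_gather_offset_s16(base, offset);
    }

    /* Vector of absolute addresses + one immediate offset added to every lane. */
    int64x2_t gather_by_base(uint64x2_t addr)
    {
      return vldrdq_gather_base_s64(addr, 8);
    }

    /* 'shifted_offset': offsets are scaled by the element size, i.e. array indices. */
    int32x4_t gather_by_index(const int32_t *base, uint32x4_t idx)
    {
      return vldrwq_gather_shifted_offset_s32(base, idx);
    }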
At the IR level, these operations are represented by a single set of
four IR intrinsics: {gather,scatter} × {base,offset}. The other
details (signed/unsigned, shift, and memory element size as opposed to
vector element size) are all specified by IR intrinsic polymorphism
and immediate operands, because that made the selection job easier
than making a huge family of similarly named intrinsics.
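As a concrete example of that encoding, here is the first test from
this file with the IR it is expected to produce; reading the three
trailing immediates as memory element size in bits, offset shift, and
unsigned flag is my inference from the autogenerated CHECK lines, not
a separate spec:

    #include <arm_mve.h>

    /* Expected lowering, per the CHECK lines later in this file:
     *   call <8 x i16> @llvm.arm.mve.vldr.gather.offset.v8i16.p0i8.v8i16(
     *       i8* %base, <8 x i16> %offset, i32 8, i32 0, i32 0)
     * i32 8 - memory element size in bits (bytes are loaded, then widened)
     * i32 0 - shift applied to each offset (0 = plain offsets, not shifted_offset)
     * i32 0 - unsigned flag (the _u16 test differs only in this being 1)
     */
    int16x8_t widening_byte_gather(const int8_t *base, uint16x8_t offset)
    {
      return vldrbq_gather_offset_s16(base, offset);
    }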
I considered using the standard IR representations such as
llvm.masked.gather, but they're not a good fit. In order to use
llvm.masked.gather to represent a gather_offset load with element size
smaller than a pointer, you'd have to expand the <8 x i16> vector of
offsets into an <8 x i16*> vector of pointers, which would be split up
during legalization, so you'd spend most of your time undoing the mess
it had made. Also, ISel support for llvm.masked.gather would be easy
enough in a trivial way (you can expand it into a gather-base load
with a zero immediate offset), but instruction-selecting lots of
fiddly idioms back into all the _other_ MVE load instructions would be
much more work. So I think dedicated IR intrinsics are the more
sensible approach, at least for the moment.
On the clang tablegen side, I've added two new features to the
Tablegen source accepted by MveEmitter: a 'CopyKind' type node for
defining a type that varies with the parameter type (it lets you ask
for an unsigned integer type of the same width as the parameter), and
an 'unsignedflag' value node for passing an immediate IR operand which
is 0 for a signed integer type or 1 for an unsigned one. That lets me
write each kind of intrinsic just once and get all its subtypes and
immediate arguments generated automatically.
Also I've tweaked the handling of pointer-typed values in the code
generation part of MveEmitter: they're generated as Address rather
than Value (i.e. including an alignment) so that they can be given to
the ordinary IR load and store operations, but I'd omitted the code to
convert them back to Value when they're going to be used as an
argument to an IR intrinsic.
On the MC side, I've enhanced MVEVectorVTInfo so that it can tell you
not only the full assembly-language suffix for a given vector type
(like 's32' or 'u16') but also the numeric-only one used by store
instructions (just '32' or '16').
Reviewers: dmgreen
Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D69791
2019-11-01 01:02:07 +08:00

// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py
// RUN: %clang_cc1 -triple thumbv8.1m.main-none-none-eabi -target-feature +mve.fp -mfloat-abi hard -fallow-half-arguments-and-returns -O0 -disable-O0-optnone -S -emit-llvm -o - %s | opt -S -mem2reg | FileCheck %s
// RUN: %clang_cc1 -triple thumbv8.1m.main-none-none-eabi -target-feature +mve.fp -mfloat-abi hard -fallow-half-arguments-and-returns -O0 -disable-O0-optnone -DPOLYMORPHIC -S -emit-llvm -o - %s | opt -S -mem2reg | FileCheck %s
// REQUIRES: aarch64-registered-target || arm-registered-target
#include <arm_mve.h>
// CHECK-LABEL: @test_vldrbq_gather_offset_s16(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = call <8 x i16> @llvm.arm.mve.vldr.gather.offset.v8i16.p0i8.v8i16(i8* [[BASE:%.*]], <8 x i16> [[OFFSET:%.*]], i32 8, i32 0, i32 0)
// CHECK-NEXT: ret <8 x i16> [[TMP0]]
//
int16x8_t test_vldrbq_gather_offset_s16(const int8_t *base, uint16x8_t offset)
{
#ifdef POLYMORPHIC
  return vldrbq_gather_offset(base, offset);
#else /* POLYMORPHIC */
  return vldrbq_gather_offset_s16(base, offset);
#endif /* POLYMORPHIC */
}

// CHECK-LABEL: @test_vldrbq_gather_offset_s32(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = call <4 x i32> @llvm.arm.mve.vldr.gather.offset.v4i32.p0i8.v4i32(i8* [[BASE:%.*]], <4 x i32> [[OFFSET:%.*]], i32 8, i32 0, i32 0)
// CHECK-NEXT: ret <4 x i32> [[TMP0]]
//
int32x4_t test_vldrbq_gather_offset_s32(const int8_t *base, uint32x4_t offset)
{
#ifdef POLYMORPHIC
  return vldrbq_gather_offset(base, offset);
#else /* POLYMORPHIC */
  return vldrbq_gather_offset_s32(base, offset);
#endif /* POLYMORPHIC */
}

// CHECK-LABEL: @test_vldrbq_gather_offset_s8(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = call <16 x i8> @llvm.arm.mve.vldr.gather.offset.v16i8.p0i8.v16i8(i8* [[BASE:%.*]], <16 x i8> [[OFFSET:%.*]], i32 8, i32 0, i32 0)
// CHECK-NEXT: ret <16 x i8> [[TMP0]]
//
int8x16_t test_vldrbq_gather_offset_s8(const int8_t *base, uint8x16_t offset)
{
#ifdef POLYMORPHIC
  return vldrbq_gather_offset(base, offset);
#else /* POLYMORPHIC */
  return vldrbq_gather_offset_s8(base, offset);
#endif /* POLYMORPHIC */
}

// CHECK-LABEL: @test_vldrbq_gather_offset_u16(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = call <8 x i16> @llvm.arm.mve.vldr.gather.offset.v8i16.p0i8.v8i16(i8* [[BASE:%.*]], <8 x i16> [[OFFSET:%.*]], i32 8, i32 0, i32 1)
// CHECK-NEXT: ret <8 x i16> [[TMP0]]
//
uint16x8_t test_vldrbq_gather_offset_u16(const uint8_t *base, uint16x8_t offset)
{
#ifdef POLYMORPHIC
  return vldrbq_gather_offset(base, offset);
#else /* POLYMORPHIC */
  return vldrbq_gather_offset_u16(base, offset);
#endif /* POLYMORPHIC */
}

// CHECK-LABEL: @test_vldrbq_gather_offset_u32(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = call <4 x i32> @llvm.arm.mve.vldr.gather.offset.v4i32.p0i8.v4i32(i8* [[BASE:%.*]], <4 x i32> [[OFFSET:%.*]], i32 8, i32 0, i32 1)
// CHECK-NEXT: ret <4 x i32> [[TMP0]]
//
uint32x4_t test_vldrbq_gather_offset_u32(const uint8_t *base, uint32x4_t offset)
{
#ifdef POLYMORPHIC
  return vldrbq_gather_offset(base, offset);
#else /* POLYMORPHIC */
  return vldrbq_gather_offset_u32(base, offset);
#endif /* POLYMORPHIC */
}

// CHECK-LABEL: @test_vldrbq_gather_offset_u8(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = call <16 x i8> @llvm.arm.mve.vldr.gather.offset.v16i8.p0i8.v16i8(i8* [[BASE:%.*]], <16 x i8> [[OFFSET:%.*]], i32 8, i32 0, i32 1)
// CHECK-NEXT: ret <16 x i8> [[TMP0]]
//
uint8x16_t test_vldrbq_gather_offset_u8(const uint8_t *base, uint8x16_t offset)
{
#ifdef POLYMORPHIC
  return vldrbq_gather_offset(base, offset);
#else /* POLYMORPHIC */
  return vldrbq_gather_offset_u8(base, offset);
#endif /* POLYMORPHIC */
}

// CHECK-LABEL: @test_vldrbq_gather_offset_z_s16(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
// CHECK-NEXT: [[TMP1:%.*]] = call <8 x i1> @llvm.arm.mve.pred.i2v.v8i1(i32 [[TMP0]])
// CHECK-NEXT: [[TMP2:%.*]] = call <8 x i16> @llvm.arm.mve.vldr.gather.offset.predicated.v8i16.p0i8.v8i16.v8i1(i8* [[BASE:%.*]], <8 x i16> [[OFFSET:%.*]], i32 8, i32 0, i32 0, <8 x i1> [[TMP1]])
// CHECK-NEXT: ret <8 x i16> [[TMP2]]
//
int16x8_t test_vldrbq_gather_offset_z_s16(const int8_t *base, uint16x8_t offset, mve_pred16_t p)
{
#ifdef POLYMORPHIC
  return vldrbq_gather_offset_z(base, offset, p);
#else /* POLYMORPHIC */
  return vldrbq_gather_offset_z_s16(base, offset, p);
#endif /* POLYMORPHIC */
}

// CHECK-LABEL: @test_vldrbq_gather_offset_z_s32(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
// CHECK-NEXT: [[TMP1:%.*]] = call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 [[TMP0]])
// CHECK-NEXT: [[TMP2:%.*]] = call <4 x i32> @llvm.arm.mve.vldr.gather.offset.predicated.v4i32.p0i8.v4i32.v4i1(i8* [[BASE:%.*]], <4 x i32> [[OFFSET:%.*]], i32 8, i32 0, i32 0, <4 x i1> [[TMP1]])
// CHECK-NEXT: ret <4 x i32> [[TMP2]]
//
int32x4_t test_vldrbq_gather_offset_z_s32(const int8_t *base, uint32x4_t offset, mve_pred16_t p)
{
#ifdef POLYMORPHIC
  return vldrbq_gather_offset_z(base, offset, p);
#else /* POLYMORPHIC */
  return vldrbq_gather_offset_z_s32(base, offset, p);
#endif /* POLYMORPHIC */
}

// CHECK-LABEL: @test_vldrbq_gather_offset_z_s8(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
// CHECK-NEXT: [[TMP1:%.*]] = call <16 x i1> @llvm.arm.mve.pred.i2v.v16i1(i32 [[TMP0]])
// CHECK-NEXT: [[TMP2:%.*]] = call <16 x i8> @llvm.arm.mve.vldr.gather.offset.predicated.v16i8.p0i8.v16i8.v16i1(i8* [[BASE:%.*]], <16 x i8> [[OFFSET:%.*]], i32 8, i32 0, i32 0, <16 x i1> [[TMP1]])
// CHECK-NEXT: ret <16 x i8> [[TMP2]]
//
int8x16_t test_vldrbq_gather_offset_z_s8(const int8_t *base, uint8x16_t offset, mve_pred16_t p)
{
#ifdef POLYMORPHIC
  return vldrbq_gather_offset_z(base, offset, p);
#else /* POLYMORPHIC */
  return vldrbq_gather_offset_z_s8(base, offset, p);
#endif /* POLYMORPHIC */
}

// CHECK-LABEL: @test_vldrbq_gather_offset_z_u16(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
// CHECK-NEXT: [[TMP1:%.*]] = call <8 x i1> @llvm.arm.mve.pred.i2v.v8i1(i32 [[TMP0]])
// CHECK-NEXT: [[TMP2:%.*]] = call <8 x i16> @llvm.arm.mve.vldr.gather.offset.predicated.v8i16.p0i8.v8i16.v8i1(i8* [[BASE:%.*]], <8 x i16> [[OFFSET:%.*]], i32 8, i32 0, i32 1, <8 x i1> [[TMP1]])
// CHECK-NEXT: ret <8 x i16> [[TMP2]]
//
uint16x8_t test_vldrbq_gather_offset_z_u16(const uint8_t *base, uint16x8_t offset, mve_pred16_t p)
{
#ifdef POLYMORPHIC
  return vldrbq_gather_offset_z(base, offset, p);
#else /* POLYMORPHIC */
  return vldrbq_gather_offset_z_u16(base, offset, p);
#endif /* POLYMORPHIC */
}

// CHECK-LABEL: @test_vldrbq_gather_offset_z_u32(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
// CHECK-NEXT: [[TMP1:%.*]] = call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 [[TMP0]])
// CHECK-NEXT: [[TMP2:%.*]] = call <4 x i32> @llvm.arm.mve.vldr.gather.offset.predicated.v4i32.p0i8.v4i32.v4i1(i8* [[BASE:%.*]], <4 x i32> [[OFFSET:%.*]], i32 8, i32 0, i32 1, <4 x i1> [[TMP1]])
// CHECK-NEXT: ret <4 x i32> [[TMP2]]
//
uint32x4_t test_vldrbq_gather_offset_z_u32(const uint8_t *base, uint32x4_t offset, mve_pred16_t p)
{
#ifdef POLYMORPHIC
  return vldrbq_gather_offset_z(base, offset, p);
#else /* POLYMORPHIC */
  return vldrbq_gather_offset_z_u32(base, offset, p);
#endif /* POLYMORPHIC */
}

// CHECK-LABEL: @test_vldrbq_gather_offset_z_u8(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
// CHECK-NEXT: [[TMP1:%.*]] = call <16 x i1> @llvm.arm.mve.pred.i2v.v16i1(i32 [[TMP0]])
// CHECK-NEXT: [[TMP2:%.*]] = call <16 x i8> @llvm.arm.mve.vldr.gather.offset.predicated.v16i8.p0i8.v16i8.v16i1(i8* [[BASE:%.*]], <16 x i8> [[OFFSET:%.*]], i32 8, i32 0, i32 1, <16 x i1> [[TMP1]])
// CHECK-NEXT: ret <16 x i8> [[TMP2]]
//
uint8x16_t test_vldrbq_gather_offset_z_u8(const uint8_t *base, uint8x16_t offset, mve_pred16_t p)
{
#ifdef POLYMORPHIC
  return vldrbq_gather_offset_z(base, offset, p);
#else /* POLYMORPHIC */
  return vldrbq_gather_offset_z_u8(base, offset, p);
#endif /* POLYMORPHIC */
}

// CHECK-LABEL: @test_vldrdq_gather_base_s64(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = call <2 x i64> @llvm.arm.mve.vldr.gather.base.v2i64.v2i64(<2 x i64> [[ADDR:%.*]], i32 616)
// CHECK-NEXT: ret <2 x i64> [[TMP0]]
//
int64x2_t test_vldrdq_gather_base_s64(uint64x2_t addr)
{
  return vldrdq_gather_base_s64(addr, 0x268);
}

[ARM,MVE] Support -ve offsets in gather-load intrinsics.
Summary:
The ACLE intrinsics with `gather_base` or `scatter_base` in the name
are wrappers on the MVE load/store instructions that take a vector of
base addresses and an immediate offset. The immediate offset can be up
to 127 times the alignment unit, and it can be positive or negative.
At the MC layer, we got that right. But in the Sema error checking for
the wrapping intrinsics, the offset was erroneously constrained to be
positive.
To fix this I've adjusted the `imm_mem7bit` class in the Tablegen that
defines the intrinsics. But that causes integer literals like
`0xfffffffffffffe04` to appear in the autogenerated calls to
`SemaBuiltinConstantArgRange`, which provokes a compiler warning
because that's out of the non-overflowing range of an `int64_t`. So
I've also tweaked `MveEmitter` to emit that as `-0x1fc` instead.
Updated the tests of the Sema checks themselves, and also adjusted a
random sample of the CodeGen tests to actually use negative offsets
and prove they get all the way through code generation without causing
a crash.
Reviewers: dmgreen, miyuki, MarkMurrayARM
Reviewed By: dmgreen
Subscribers: kristof.beyls, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D72268
2020-01-07 00:33:05 +08:00

// CHECK-LABEL: @test_vldrdq_gather_base_u64(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = call <2 x i64> @llvm.arm.mve.vldr.gather.base.v2i64.v2i64(<2 x i64> [[ADDR:%.*]], i32 -336)
// CHECK-NEXT: ret <2 x i64> [[TMP0]]
//
uint64x2_t test_vldrdq_gather_base_u64(uint64x2_t addr)
{
  return vldrdq_gather_base_u64(addr, -0x150);
}

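Not one of the autogenerated tests above, just a sketch of the offset
range implied by the commit message earlier: for these 64-bit gathers
the alignment unit is 8 bytes, so (if I have read the rule correctly)
the immediate must be a multiple of 8 between -1016 and 1016.

    /* -127 * 8 = -1016: the most negative offset the rule above would allow. */
    int64x2_t gather_base_min_offset(uint64x2_t addr)
    {
      return vldrdq_gather_base_s64(addr, -1016);
    }
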
// CHECK-LABEL: @test_vldrdq_gather_base_wb_s64(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = load <2 x i64>, <2 x i64>* [[ADDR:%.*]], align 8
// CHECK-NEXT: [[TMP1:%.*]] = call { <2 x i64>, <2 x i64> } @llvm.arm.mve.vldr.gather.base.wb.v2i64.v2i64(<2 x i64> [[TMP0]], i32 576)
// CHECK-NEXT: [[TMP2:%.*]] = extractvalue { <2 x i64>, <2 x i64> } [[TMP1]], 1
// CHECK-NEXT: store <2 x i64> [[TMP2]], <2 x i64>* [[ADDR]], align 8
// CHECK-NEXT: [[TMP3:%.*]] = extractvalue { <2 x i64>, <2 x i64> } [[TMP1]], 0
// CHECK-NEXT: ret <2 x i64> [[TMP3]]
//
int64x2_t test_vldrdq_gather_base_wb_s64(uint64x2_t *addr)
{
  return vldrdq_gather_base_wb_s64(addr, 0x240);
}

// CHECK-LABEL: @test_vldrdq_gather_base_wb_u64(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = load <2 x i64>, <2 x i64>* [[ADDR:%.*]], align 8
// CHECK-NEXT: [[TMP1:%.*]] = call { <2 x i64>, <2 x i64> } @llvm.arm.mve.vldr.gather.base.wb.v2i64.v2i64(<2 x i64> [[TMP0]], i32 -328)
// CHECK-NEXT: [[TMP2:%.*]] = extractvalue { <2 x i64>, <2 x i64> } [[TMP1]], 1
// CHECK-NEXT: store <2 x i64> [[TMP2]], <2 x i64>* [[ADDR]], align 8
// CHECK-NEXT: [[TMP3:%.*]] = extractvalue { <2 x i64>, <2 x i64> } [[TMP1]], 0
// CHECK-NEXT: ret <2 x i64> [[TMP3]]
//
uint64x2_t test_vldrdq_gather_base_wb_u64(uint64x2_t *addr)
{
  return vldrdq_gather_base_wb_u64(addr, -0x148);
}

// CHECK-LABEL: @test_vldrdq_gather_base_wb_z_s64(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = load <2 x i64>, <2 x i64>* [[ADDR:%.*]], align 8
// CHECK-NEXT: [[TMP1:%.*]] = zext i16 [[P:%.*]] to i32
// CHECK-NEXT: [[TMP2:%.*]] = call <2 x i1> @llvm.arm.mve.pred.i2v.v2i1(i32 [[TMP1]])
// CHECK-NEXT: [[TMP3:%.*]] = call { <2 x i64>, <2 x i64> } @llvm.arm.mve.vldr.gather.base.wb.predicated.v2i64.v2i64.v2i1(<2 x i64> [[TMP0]], i32 664, <2 x i1> [[TMP2]])
// CHECK-NEXT: [[TMP4:%.*]] = extractvalue { <2 x i64>, <2 x i64> } [[TMP3]], 1
// CHECK-NEXT: store <2 x i64> [[TMP4]], <2 x i64>* [[ADDR]], align 8
// CHECK-NEXT: [[TMP5:%.*]] = extractvalue { <2 x i64>, <2 x i64> } [[TMP3]], 0
// CHECK-NEXT: ret <2 x i64> [[TMP5]]
//
int64x2_t test_vldrdq_gather_base_wb_z_s64(uint64x2_t *addr, mve_pred16_t p)
{
  return vldrdq_gather_base_wb_z_s64(addr, 0x298, p);
}

// CHECK-LABEL: @test_vldrdq_gather_base_wb_z_u64(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = load <2 x i64>, <2 x i64>* [[ADDR:%.*]], align 8
// CHECK-NEXT: [[TMP1:%.*]] = zext i16 [[P:%.*]] to i32
// CHECK-NEXT: [[TMP2:%.*]] = call <2 x i1> @llvm.arm.mve.pred.i2v.v2i1(i32 [[TMP1]])
// CHECK-NEXT: [[TMP3:%.*]] = call { <2 x i64>, <2 x i64> } @llvm.arm.mve.vldr.gather.base.wb.predicated.v2i64.v2i64.v2i1(<2 x i64> [[TMP0]], i32 656, <2 x i1> [[TMP2]])
// CHECK-NEXT: [[TMP4:%.*]] = extractvalue { <2 x i64>, <2 x i64> } [[TMP3]], 1
// CHECK-NEXT: store <2 x i64> [[TMP4]], <2 x i64>* [[ADDR]], align 8
// CHECK-NEXT: [[TMP5:%.*]] = extractvalue { <2 x i64>, <2 x i64> } [[TMP3]], 0
// CHECK-NEXT: ret <2 x i64> [[TMP5]]
//
uint64x2_t test_vldrdq_gather_base_wb_z_u64(uint64x2_t *addr, mve_pred16_t p)
{
  return vldrdq_gather_base_wb_z_u64(addr, 0x290, p);
}

// CHECK-LABEL: @test_vldrdq_gather_base_z_s64(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
// CHECK-NEXT: [[TMP1:%.*]] = call <2 x i1> @llvm.arm.mve.pred.i2v.v2i1(i32 [[TMP0]])
// CHECK-NEXT: [[TMP2:%.*]] = call <2 x i64> @llvm.arm.mve.vldr.gather.base.predicated.v2i64.v2i64.v2i1(<2 x i64> [[ADDR:%.*]], i32 888, <2 x i1> [[TMP1]])
// CHECK-NEXT: ret <2 x i64> [[TMP2]]
//
int64x2_t test_vldrdq_gather_base_z_s64(uint64x2_t addr, mve_pred16_t p)
{
  return vldrdq_gather_base_z_s64(addr, 0x378, p);
}

// CHECK-LABEL: @test_vldrdq_gather_base_z_u64(
|
|
|
|
// CHECK-NEXT: entry:
|
|
|
|
// CHECK-NEXT: [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
|
2021-12-03 23:27:58 +08:00
|
|
|
// CHECK-NEXT: [[TMP1:%.*]] = call <2 x i1> @llvm.arm.mve.pred.i2v.v2i1(i32 [[TMP0]])
|
|
|
|
// CHECK-NEXT: [[TMP2:%.*]] = call <2 x i64> @llvm.arm.mve.vldr.gather.base.predicated.v2i64.v2i64.v2i1(<2 x i64> [[ADDR:%.*]], i32 -1000, <2 x i1> [[TMP1]])
|
[ARM,MVE] Add intrinsics for gather/scatter load/stores.
This patch adds two new families of intrinsics, both of which are
memory accesses taking a vector of locations to load from / store to.
The vldrq_gather_base / vstrq_scatter_base intrinsics take a vector of
base addresses, and an immediate offset to be added consistently to
each one. vldrq_gather_offset / vstrq_scatter_offset take a scalar
base address, and a vector of offsets to add to it. The
'shifted_offset' variants also multiply each offset by the element
size type, so that the vector is effectively of array indices.
At the IR level, these operations are represented by a single set of
four IR intrinsics: {gather,scatter} × {base,offset}. The other
details (signed/unsigned, shift, and memory element size as opposed to
vector element size) are all specified by IR intrinsic polymorphism
and immediate operands, because that made the selection job easier
than making a huge family of similarly named intrinsics.
I considered using the standard IR representations such as
llvm.masked.gather, but they're not a good fit. In order to use
llvm.masked.gather to represent a gather_offset load with element size
smaller than a pointer, you'd have to expand the <8 x i16> vector of
offsets into an <8 x i16*> vector of pointers, which would be split up
during legalization, so you'd spend most of your time undoing the mess
it had made. Also, ISel support for llvm.masked.gather would be easy
enough in a trivial way (you can expand it into a gather-base load
with a zero immediate offset), but instruction-selecting lots of
fiddly idioms back into all the _other_ MVE load instructions would be
much more work. So I think dedicated IR intrinsics are the more
sensible approach, at least for the moment.
On the clang tablegen side, I've added two new features to the
Tablegen source accepted by MveEmitter: a 'CopyKind' type node for
defining a type that varies with the parameter type (it lets you ask
for an unsigned integer type of the same width as the parameter), and
an 'unsignedflag' value node for passing an immediate IR operand which
is 0 for a signed integer type or 1 for an unsigned one. That lets me
write each kind of intrinsic just once and get all its subtypes and
immediate arguments generated automatically.
Also I've tweaked the handling of pointer-typed values in the code
generation part of MveEmitter: they're generated as Address rather
than Value (i.e. including an alignment) so that they can be given to
the ordinary IR load and store operations, but I'd omitted the code to
convert them back to Value when they're going to be used as an
argument to an IR intrinsic.
On the MC side, I've enhanced MVEVectorVTInfo so that it can tell you
not only the full assembly-language suffix for a given vector type
(like 's32' or 'u16') but also the numeric-only one used by store
instructions (just '32' or '16').
Reviewers: dmgreen
Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D69791
2019-11-01 01:02:07 +08:00
|
|
|
// CHECK-NEXT: ret <2 x i64> [[TMP2]]
//
uint64x2_t test_vldrdq_gather_base_z_u64(uint64x2_t addr, mve_pred16_t p)
{
  return vldrdq_gather_base_z_u64(addr, -0x3e8, p);
}

// CHECK-LABEL: @test_vldrdq_gather_offset_s64(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = call <2 x i64> @llvm.arm.mve.vldr.gather.offset.v2i64.p0i64.v2i64(i64* [[BASE:%.*]], <2 x i64> [[OFFSET:%.*]], i32 64, i32 0, i32 0)
// CHECK-NEXT: ret <2 x i64> [[TMP0]]
//
int64x2_t test_vldrdq_gather_offset_s64(const int64_t *base, uint64x2_t offset)
{
#ifdef POLYMORPHIC
  return vldrdq_gather_offset(base, offset);
#else /* POLYMORPHIC */
  return vldrdq_gather_offset_s64(base, offset);
#endif /* POLYMORPHIC */
}

// CHECK-LABEL: @test_vldrdq_gather_offset_u64(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = call <2 x i64> @llvm.arm.mve.vldr.gather.offset.v2i64.p0i64.v2i64(i64* [[BASE:%.*]], <2 x i64> [[OFFSET:%.*]], i32 64, i32 0, i32 1)
// CHECK-NEXT: ret <2 x i64> [[TMP0]]
//
uint64x2_t test_vldrdq_gather_offset_u64(const uint64_t *base, uint64x2_t offset)
{
#ifdef POLYMORPHIC
  return vldrdq_gather_offset(base, offset);
#else /* POLYMORPHIC */
  return vldrdq_gather_offset_u64(base, offset);
#endif /* POLYMORPHIC */
}

// CHECK-LABEL: @test_vldrdq_gather_offset_z_s64(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
// CHECK-NEXT: [[TMP1:%.*]] = call <2 x i1> @llvm.arm.mve.pred.i2v.v2i1(i32 [[TMP0]])
// CHECK-NEXT: [[TMP2:%.*]] = call <2 x i64> @llvm.arm.mve.vldr.gather.offset.predicated.v2i64.p0i64.v2i64.v2i1(i64* [[BASE:%.*]], <2 x i64> [[OFFSET:%.*]], i32 64, i32 0, i32 0, <2 x i1> [[TMP1]])
// CHECK-NEXT: ret <2 x i64> [[TMP2]]
//
int64x2_t test_vldrdq_gather_offset_z_s64(const int64_t *base, uint64x2_t offset, mve_pred16_t p)
{
#ifdef POLYMORPHIC
  return vldrdq_gather_offset_z(base, offset, p);
#else /* POLYMORPHIC */
  return vldrdq_gather_offset_z_s64(base, offset, p);
#endif /* POLYMORPHIC */
}

// CHECK-LABEL: @test_vldrdq_gather_offset_z_u64(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
// CHECK-NEXT: [[TMP1:%.*]] = call <2 x i1> @llvm.arm.mve.pred.i2v.v2i1(i32 [[TMP0]])
// CHECK-NEXT: [[TMP2:%.*]] = call <2 x i64> @llvm.arm.mve.vldr.gather.offset.predicated.v2i64.p0i64.v2i64.v2i1(i64* [[BASE:%.*]], <2 x i64> [[OFFSET:%.*]], i32 64, i32 0, i32 1, <2 x i1> [[TMP1]])
// CHECK-NEXT: ret <2 x i64> [[TMP2]]
//
uint64x2_t test_vldrdq_gather_offset_z_u64(const uint64_t *base, uint64x2_t offset, mve_pred16_t p)
{
#ifdef POLYMORPHIC
  return vldrdq_gather_offset_z(base, offset, p);
#else /* POLYMORPHIC */
  return vldrdq_gather_offset_z_u64(base, offset, p);
#endif /* POLYMORPHIC */
}

// CHECK-LABEL: @test_vldrdq_gather_shifted_offset_s64(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = call <2 x i64> @llvm.arm.mve.vldr.gather.offset.v2i64.p0i64.v2i64(i64* [[BASE:%.*]], <2 x i64> [[OFFSET:%.*]], i32 64, i32 3, i32 0)
// CHECK-NEXT: ret <2 x i64> [[TMP0]]
//
int64x2_t test_vldrdq_gather_shifted_offset_s64(const int64_t *base, uint64x2_t offset)
{
#ifdef POLYMORPHIC
  return vldrdq_gather_shifted_offset(base, offset);
#else /* POLYMORPHIC */
  return vldrdq_gather_shifted_offset_s64(base, offset);
#endif /* POLYMORPHIC */
}

// CHECK-LABEL: @test_vldrdq_gather_shifted_offset_u64(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = call <2 x i64> @llvm.arm.mve.vldr.gather.offset.v2i64.p0i64.v2i64(i64* [[BASE:%.*]], <2 x i64> [[OFFSET:%.*]], i32 64, i32 3, i32 1)
// CHECK-NEXT: ret <2 x i64> [[TMP0]]
//
uint64x2_t test_vldrdq_gather_shifted_offset_u64(const uint64_t *base, uint64x2_t offset)
{
#ifdef POLYMORPHIC
  return vldrdq_gather_shifted_offset(base, offset);
#else /* POLYMORPHIC */
  return vldrdq_gather_shifted_offset_u64(base, offset);
#endif /* POLYMORPHIC */
}

// CHECK-LABEL: @test_vldrdq_gather_shifted_offset_z_s64(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
// CHECK-NEXT: [[TMP1:%.*]] = call <2 x i1> @llvm.arm.mve.pred.i2v.v2i1(i32 [[TMP0]])
// CHECK-NEXT: [[TMP2:%.*]] = call <2 x i64> @llvm.arm.mve.vldr.gather.offset.predicated.v2i64.p0i64.v2i64.v2i1(i64* [[BASE:%.*]], <2 x i64> [[OFFSET:%.*]], i32 64, i32 3, i32 0, <2 x i1> [[TMP1]])
// CHECK-NEXT: ret <2 x i64> [[TMP2]]
//
int64x2_t test_vldrdq_gather_shifted_offset_z_s64(const int64_t *base, uint64x2_t offset, mve_pred16_t p)
{
#ifdef POLYMORPHIC
  return vldrdq_gather_shifted_offset_z(base, offset, p);
#else /* POLYMORPHIC */
  return vldrdq_gather_shifted_offset_z_s64(base, offset, p);
#endif /* POLYMORPHIC */
}

// CHECK-LABEL: @test_vldrdq_gather_shifted_offset_z_u64(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
// CHECK-NEXT: [[TMP1:%.*]] = call <2 x i1> @llvm.arm.mve.pred.i2v.v2i1(i32 [[TMP0]])
// CHECK-NEXT: [[TMP2:%.*]] = call <2 x i64> @llvm.arm.mve.vldr.gather.offset.predicated.v2i64.p0i64.v2i64.v2i1(i64* [[BASE:%.*]], <2 x i64> [[OFFSET:%.*]], i32 64, i32 3, i32 1, <2 x i1> [[TMP1]])
// CHECK-NEXT: ret <2 x i64> [[TMP2]]
//
uint64x2_t test_vldrdq_gather_shifted_offset_z_u64(const uint64_t *base, uint64x2_t offset, mve_pred16_t p)
{
#ifdef POLYMORPHIC
  return vldrdq_gather_shifted_offset_z(base, offset, p);
#else /* POLYMORPHIC */
  return vldrdq_gather_shifted_offset_z_u64(base, offset, p);
#endif /* POLYMORPHIC */
}

// CHECK-LABEL: @test_vldrhq_gather_offset_f16(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = call <8 x half> @llvm.arm.mve.vldr.gather.offset.v8f16.p0f16.v8i16(half* [[BASE:%.*]], <8 x i16> [[OFFSET:%.*]], i32 16, i32 0, i32 0)
// CHECK-NEXT: ret <8 x half> [[TMP0]]
//
float16x8_t test_vldrhq_gather_offset_f16(const float16_t *base, uint16x8_t offset)
{
#ifdef POLYMORPHIC
  return vldrhq_gather_offset(base, offset);
#else /* POLYMORPHIC */
  return vldrhq_gather_offset_f16(base, offset);
#endif /* POLYMORPHIC */
}

// CHECK-LABEL: @test_vldrhq_gather_offset_s16(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = call <8 x i16> @llvm.arm.mve.vldr.gather.offset.v8i16.p0i16.v8i16(i16* [[BASE:%.*]], <8 x i16> [[OFFSET:%.*]], i32 16, i32 0, i32 0)
// CHECK-NEXT: ret <8 x i16> [[TMP0]]
//
int16x8_t test_vldrhq_gather_offset_s16(const int16_t *base, uint16x8_t offset)
{
#ifdef POLYMORPHIC
  return vldrhq_gather_offset(base, offset);
#else /* POLYMORPHIC */
  return vldrhq_gather_offset_s16(base, offset);
#endif /* POLYMORPHIC */
}

// CHECK-LABEL: @test_vldrhq_gather_offset_s32(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = call <4 x i32> @llvm.arm.mve.vldr.gather.offset.v4i32.p0i16.v4i32(i16* [[BASE:%.*]], <4 x i32> [[OFFSET:%.*]], i32 16, i32 0, i32 0)
// CHECK-NEXT: ret <4 x i32> [[TMP0]]
//
int32x4_t test_vldrhq_gather_offset_s32(const int16_t *base, uint32x4_t offset)
{
#ifdef POLYMORPHIC
  return vldrhq_gather_offset(base, offset);
#else /* POLYMORPHIC */
  return vldrhq_gather_offset_s32(base, offset);
#endif /* POLYMORPHIC */
}

// CHECK-LABEL: @test_vldrhq_gather_offset_u16(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = call <8 x i16> @llvm.arm.mve.vldr.gather.offset.v8i16.p0i16.v8i16(i16* [[BASE:%.*]], <8 x i16> [[OFFSET:%.*]], i32 16, i32 0, i32 1)
// CHECK-NEXT: ret <8 x i16> [[TMP0]]
//
uint16x8_t test_vldrhq_gather_offset_u16(const uint16_t *base, uint16x8_t offset)
{
#ifdef POLYMORPHIC
  return vldrhq_gather_offset(base, offset);
#else /* POLYMORPHIC */
  return vldrhq_gather_offset_u16(base, offset);
#endif /* POLYMORPHIC */
}

// CHECK-LABEL: @test_vldrhq_gather_offset_u32(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = call <4 x i32> @llvm.arm.mve.vldr.gather.offset.v4i32.p0i16.v4i32(i16* [[BASE:%.*]], <4 x i32> [[OFFSET:%.*]], i32 16, i32 0, i32 1)
// CHECK-NEXT: ret <4 x i32> [[TMP0]]
//
uint32x4_t test_vldrhq_gather_offset_u32(const uint16_t *base, uint32x4_t offset)
{
#ifdef POLYMORPHIC
  return vldrhq_gather_offset(base, offset);
#else /* POLYMORPHIC */
  return vldrhq_gather_offset_u32(base, offset);
#endif /* POLYMORPHIC */
}

// CHECK-LABEL: @test_vldrhq_gather_offset_z_f16(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
// CHECK-NEXT: [[TMP1:%.*]] = call <8 x i1> @llvm.arm.mve.pred.i2v.v8i1(i32 [[TMP0]])
// CHECK-NEXT: [[TMP2:%.*]] = call <8 x half> @llvm.arm.mve.vldr.gather.offset.predicated.v8f16.p0f16.v8i16.v8i1(half* [[BASE:%.*]], <8 x i16> [[OFFSET:%.*]], i32 16, i32 0, i32 0, <8 x i1> [[TMP1]])
// CHECK-NEXT: ret <8 x half> [[TMP2]]
//
float16x8_t test_vldrhq_gather_offset_z_f16(const float16_t *base, uint16x8_t offset, mve_pred16_t p)
{
#ifdef POLYMORPHIC
  return vldrhq_gather_offset_z(base, offset, p);
#else /* POLYMORPHIC */
  return vldrhq_gather_offset_z_f16(base, offset, p);
#endif /* POLYMORPHIC */
}

// CHECK-LABEL: @test_vldrhq_gather_offset_z_s16(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
// CHECK-NEXT: [[TMP1:%.*]] = call <8 x i1> @llvm.arm.mve.pred.i2v.v8i1(i32 [[TMP0]])
// CHECK-NEXT: [[TMP2:%.*]] = call <8 x i16> @llvm.arm.mve.vldr.gather.offset.predicated.v8i16.p0i16.v8i16.v8i1(i16* [[BASE:%.*]], <8 x i16> [[OFFSET:%.*]], i32 16, i32 0, i32 0, <8 x i1> [[TMP1]])
// CHECK-NEXT: ret <8 x i16> [[TMP2]]
//
int16x8_t test_vldrhq_gather_offset_z_s16(const int16_t *base, uint16x8_t offset, mve_pred16_t p)
{
#ifdef POLYMORPHIC
  return vldrhq_gather_offset_z(base, offset, p);
#else /* POLYMORPHIC */
  return vldrhq_gather_offset_z_s16(base, offset, p);
#endif /* POLYMORPHIC */
}

// CHECK-LABEL: @test_vldrhq_gather_offset_z_s32(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
// CHECK-NEXT: [[TMP1:%.*]] = call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 [[TMP0]])
// CHECK-NEXT: [[TMP2:%.*]] = call <4 x i32> @llvm.arm.mve.vldr.gather.offset.predicated.v4i32.p0i16.v4i32.v4i1(i16* [[BASE:%.*]], <4 x i32> [[OFFSET:%.*]], i32 16, i32 0, i32 0, <4 x i1> [[TMP1]])
// CHECK-NEXT: ret <4 x i32> [[TMP2]]
//
int32x4_t test_vldrhq_gather_offset_z_s32(const int16_t *base, uint32x4_t offset, mve_pred16_t p)
{
#ifdef POLYMORPHIC
  return vldrhq_gather_offset_z(base, offset, p);
#else /* POLYMORPHIC */
  return vldrhq_gather_offset_z_s32(base, offset, p);
#endif /* POLYMORPHIC */
}

// CHECK-LABEL: @test_vldrhq_gather_offset_z_u16(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
// CHECK-NEXT: [[TMP1:%.*]] = call <8 x i1> @llvm.arm.mve.pred.i2v.v8i1(i32 [[TMP0]])
// CHECK-NEXT: [[TMP2:%.*]] = call <8 x i16> @llvm.arm.mve.vldr.gather.offset.predicated.v8i16.p0i16.v8i16.v8i1(i16* [[BASE:%.*]], <8 x i16> [[OFFSET:%.*]], i32 16, i32 0, i32 1, <8 x i1> [[TMP1]])
// CHECK-NEXT: ret <8 x i16> [[TMP2]]
//
uint16x8_t test_vldrhq_gather_offset_z_u16(const uint16_t *base, uint16x8_t offset, mve_pred16_t p)
{
#ifdef POLYMORPHIC
  return vldrhq_gather_offset_z(base, offset, p);
#else /* POLYMORPHIC */
  return vldrhq_gather_offset_z_u16(base, offset, p);
#endif /* POLYMORPHIC */
}

// CHECK-LABEL: @test_vldrhq_gather_offset_z_u32(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
// CHECK-NEXT: [[TMP1:%.*]] = call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 [[TMP0]])
// CHECK-NEXT: [[TMP2:%.*]] = call <4 x i32> @llvm.arm.mve.vldr.gather.offset.predicated.v4i32.p0i16.v4i32.v4i1(i16* [[BASE:%.*]], <4 x i32> [[OFFSET:%.*]], i32 16, i32 0, i32 1, <4 x i1> [[TMP1]])
// CHECK-NEXT: ret <4 x i32> [[TMP2]]
//
uint32x4_t test_vldrhq_gather_offset_z_u32(const uint16_t *base, uint32x4_t offset, mve_pred16_t p)
{
#ifdef POLYMORPHIC
  return vldrhq_gather_offset_z(base, offset, p);
#else /* POLYMORPHIC */
  return vldrhq_gather_offset_z_u32(base, offset, p);
#endif /* POLYMORPHIC */
}

// CHECK-LABEL: @test_vldrhq_gather_shifted_offset_f16(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = call <8 x half> @llvm.arm.mve.vldr.gather.offset.v8f16.p0f16.v8i16(half* [[BASE:%.*]], <8 x i16> [[OFFSET:%.*]], i32 16, i32 1, i32 0)
// CHECK-NEXT: ret <8 x half> [[TMP0]]
//
float16x8_t test_vldrhq_gather_shifted_offset_f16(const float16_t *base, uint16x8_t offset)
{
#ifdef POLYMORPHIC
  return vldrhq_gather_shifted_offset(base, offset);
#else /* POLYMORPHIC */
  return vldrhq_gather_shifted_offset_f16(base, offset);
#endif /* POLYMORPHIC */
}

// CHECK-LABEL: @test_vldrhq_gather_shifted_offset_s16(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = call <8 x i16> @llvm.arm.mve.vldr.gather.offset.v8i16.p0i16.v8i16(i16* [[BASE:%.*]], <8 x i16> [[OFFSET:%.*]], i32 16, i32 1, i32 0)
// CHECK-NEXT: ret <8 x i16> [[TMP0]]
//
int16x8_t test_vldrhq_gather_shifted_offset_s16(const int16_t *base, uint16x8_t offset)
{
#ifdef POLYMORPHIC
  return vldrhq_gather_shifted_offset(base, offset);
#else /* POLYMORPHIC */
  return vldrhq_gather_shifted_offset_s16(base, offset);
#endif /* POLYMORPHIC */
}

// CHECK-LABEL: @test_vldrhq_gather_shifted_offset_s32(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = call <4 x i32> @llvm.arm.mve.vldr.gather.offset.v4i32.p0i16.v4i32(i16* [[BASE:%.*]], <4 x i32> [[OFFSET:%.*]], i32 16, i32 1, i32 0)
// CHECK-NEXT: ret <4 x i32> [[TMP0]]
//
int32x4_t test_vldrhq_gather_shifted_offset_s32(const int16_t *base, uint32x4_t offset)
{
#ifdef POLYMORPHIC
  return vldrhq_gather_shifted_offset(base, offset);
#else /* POLYMORPHIC */
  return vldrhq_gather_shifted_offset_s32(base, offset);
#endif /* POLYMORPHIC */
}

// CHECK-LABEL: @test_vldrhq_gather_shifted_offset_u16(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = call <8 x i16> @llvm.arm.mve.vldr.gather.offset.v8i16.p0i16.v8i16(i16* [[BASE:%.*]], <8 x i16> [[OFFSET:%.*]], i32 16, i32 1, i32 1)
// CHECK-NEXT: ret <8 x i16> [[TMP0]]
//
uint16x8_t test_vldrhq_gather_shifted_offset_u16(const uint16_t *base, uint16x8_t offset)
{
#ifdef POLYMORPHIC
  return vldrhq_gather_shifted_offset(base, offset);
#else /* POLYMORPHIC */
  return vldrhq_gather_shifted_offset_u16(base, offset);
#endif /* POLYMORPHIC */
}

// CHECK-LABEL: @test_vldrhq_gather_shifted_offset_u32(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = call <4 x i32> @llvm.arm.mve.vldr.gather.offset.v4i32.p0i16.v4i32(i16* [[BASE:%.*]], <4 x i32> [[OFFSET:%.*]], i32 16, i32 1, i32 1)
// CHECK-NEXT: ret <4 x i32> [[TMP0]]
//
uint32x4_t test_vldrhq_gather_shifted_offset_u32(const uint16_t *base, uint32x4_t offset)
{
#ifdef POLYMORPHIC
  return vldrhq_gather_shifted_offset(base, offset);
#else /* POLYMORPHIC */
  return vldrhq_gather_shifted_offset_u32(base, offset);
#endif /* POLYMORPHIC */
}

// CHECK-LABEL: @test_vldrhq_gather_shifted_offset_z_f16(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
// CHECK-NEXT: [[TMP1:%.*]] = call <8 x i1> @llvm.arm.mve.pred.i2v.v8i1(i32 [[TMP0]])
// CHECK-NEXT: [[TMP2:%.*]] = call <8 x half> @llvm.arm.mve.vldr.gather.offset.predicated.v8f16.p0f16.v8i16.v8i1(half* [[BASE:%.*]], <8 x i16> [[OFFSET:%.*]], i32 16, i32 1, i32 0, <8 x i1> [[TMP1]])
// CHECK-NEXT: ret <8 x half> [[TMP2]]
//
float16x8_t test_vldrhq_gather_shifted_offset_z_f16(const float16_t *base, uint16x8_t offset, mve_pred16_t p)
{
#ifdef POLYMORPHIC
  return vldrhq_gather_shifted_offset_z(base, offset, p);
#else /* POLYMORPHIC */
  return vldrhq_gather_shifted_offset_z_f16(base, offset, p);
#endif /* POLYMORPHIC */
}

// CHECK-LABEL: @test_vldrhq_gather_shifted_offset_z_s16(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
// CHECK-NEXT: [[TMP1:%.*]] = call <8 x i1> @llvm.arm.mve.pred.i2v.v8i1(i32 [[TMP0]])
// CHECK-NEXT: [[TMP2:%.*]] = call <8 x i16> @llvm.arm.mve.vldr.gather.offset.predicated.v8i16.p0i16.v8i16.v8i1(i16* [[BASE:%.*]], <8 x i16> [[OFFSET:%.*]], i32 16, i32 1, i32 0, <8 x i1> [[TMP1]])
// CHECK-NEXT: ret <8 x i16> [[TMP2]]
//
int16x8_t test_vldrhq_gather_shifted_offset_z_s16(const int16_t *base, uint16x8_t offset, mve_pred16_t p)
{
#ifdef POLYMORPHIC
  return vldrhq_gather_shifted_offset_z(base, offset, p);
#else /* POLYMORPHIC */
  return vldrhq_gather_shifted_offset_z_s16(base, offset, p);
#endif /* POLYMORPHIC */
}

// CHECK-LABEL: @test_vldrhq_gather_shifted_offset_z_s32(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
// CHECK-NEXT: [[TMP1:%.*]] = call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 [[TMP0]])
// CHECK-NEXT: [[TMP2:%.*]] = call <4 x i32> @llvm.arm.mve.vldr.gather.offset.predicated.v4i32.p0i16.v4i32.v4i1(i16* [[BASE:%.*]], <4 x i32> [[OFFSET:%.*]], i32 16, i32 1, i32 0, <4 x i1> [[TMP1]])
// CHECK-NEXT: ret <4 x i32> [[TMP2]]
//
int32x4_t test_vldrhq_gather_shifted_offset_z_s32(const int16_t *base, uint32x4_t offset, mve_pred16_t p)
{
#ifdef POLYMORPHIC
  return vldrhq_gather_shifted_offset_z(base, offset, p);
#else /* POLYMORPHIC */
  return vldrhq_gather_shifted_offset_z_s32(base, offset, p);
#endif /* POLYMORPHIC */
}

// CHECK-LABEL: @test_vldrhq_gather_shifted_offset_z_u16(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
// CHECK-NEXT: [[TMP1:%.*]] = call <8 x i1> @llvm.arm.mve.pred.i2v.v8i1(i32 [[TMP0]])
// CHECK-NEXT: [[TMP2:%.*]] = call <8 x i16> @llvm.arm.mve.vldr.gather.offset.predicated.v8i16.p0i16.v8i16.v8i1(i16* [[BASE:%.*]], <8 x i16> [[OFFSET:%.*]], i32 16, i32 1, i32 1, <8 x i1> [[TMP1]])
// CHECK-NEXT: ret <8 x i16> [[TMP2]]
//
uint16x8_t test_vldrhq_gather_shifted_offset_z_u16(const uint16_t *base, uint16x8_t offset, mve_pred16_t p)
{
#ifdef POLYMORPHIC
  return vldrhq_gather_shifted_offset_z(base, offset, p);
#else /* POLYMORPHIC */
  return vldrhq_gather_shifted_offset_z_u16(base, offset, p);
#endif /* POLYMORPHIC */
}

// CHECK-LABEL: @test_vldrhq_gather_shifted_offset_z_u32(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
// CHECK-NEXT: [[TMP1:%.*]] = call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 [[TMP0]])
// CHECK-NEXT: [[TMP2:%.*]] = call <4 x i32> @llvm.arm.mve.vldr.gather.offset.predicated.v4i32.p0i16.v4i32.v4i1(i16* [[BASE:%.*]], <4 x i32> [[OFFSET:%.*]], i32 16, i32 1, i32 1, <4 x i1> [[TMP1]])
// CHECK-NEXT: ret <4 x i32> [[TMP2]]
//
uint32x4_t test_vldrhq_gather_shifted_offset_z_u32(const uint16_t *base, uint32x4_t offset, mve_pred16_t p)
{
#ifdef POLYMORPHIC
  return vldrhq_gather_shifted_offset_z(base, offset, p);
#else /* POLYMORPHIC */
  return vldrhq_gather_shifted_offset_z_u32(base, offset, p);
#endif /* POLYMORPHIC */
}

// CHECK-LABEL: @test_vldrwq_gather_base_f32(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = call <4 x float> @llvm.arm.mve.vldr.gather.base.v4f32.v4i32(<4 x i32> [[ADDR:%.*]], i32 12)
// CHECK-NEXT: ret <4 x float> [[TMP0]]
//
float32x4_t test_vldrwq_gather_base_f32(uint32x4_t addr)
{
  return vldrwq_gather_base_f32(addr, 0xc);
}

// CHECK-LABEL: @test_vldrwq_gather_base_s32(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = call <4 x i32> @llvm.arm.mve.vldr.gather.base.v4i32.v4i32(<4 x i32> [[ADDR:%.*]], i32 400)
// CHECK-NEXT: ret <4 x i32> [[TMP0]]
//
int32x4_t test_vldrwq_gather_base_s32(uint32x4_t addr)
{
  return vldrwq_gather_base_s32(addr, 0x190);
}

// CHECK-LABEL: @test_vldrwq_gather_base_u32(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = call <4 x i32> @llvm.arm.mve.vldr.gather.base.v4i32.v4i32(<4 x i32> [[ADDR:%.*]], i32 284)
// CHECK-NEXT: ret <4 x i32> [[TMP0]]
//
uint32x4_t test_vldrwq_gather_base_u32(uint32x4_t addr)
{
  return vldrwq_gather_base_u32(addr, 0x11c);
}

// CHECK-LABEL: @test_vldrwq_gather_base_wb_f32(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = load <4 x i32>, <4 x i32>* [[ADDR:%.*]], align 8
// CHECK-NEXT: [[TMP1:%.*]] = call { <4 x float>, <4 x i32> } @llvm.arm.mve.vldr.gather.base.wb.v4f32.v4i32(<4 x i32> [[TMP0]], i32 -64)
// CHECK-NEXT: [[TMP2:%.*]] = extractvalue { <4 x float>, <4 x i32> } [[TMP1]], 1
// CHECK-NEXT: store <4 x i32> [[TMP2]], <4 x i32>* [[ADDR]], align 8
// CHECK-NEXT: [[TMP3:%.*]] = extractvalue { <4 x float>, <4 x i32> } [[TMP1]], 0
// CHECK-NEXT: ret <4 x float> [[TMP3]]
//
float32x4_t test_vldrwq_gather_base_wb_f32(uint32x4_t *addr)
{
  return vldrwq_gather_base_wb_f32(addr, -0x40);
}

// CHECK-LABEL: @test_vldrwq_gather_base_wb_s32(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = load <4 x i32>, <4 x i32>* [[ADDR:%.*]], align 8
// CHECK-NEXT: [[TMP1:%.*]] = call { <4 x i32>, <4 x i32> } @llvm.arm.mve.vldr.gather.base.wb.v4i32.v4i32(<4 x i32> [[TMP0]], i32 80)
// CHECK-NEXT: [[TMP2:%.*]] = extractvalue { <4 x i32>, <4 x i32> } [[TMP1]], 1
// CHECK-NEXT: store <4 x i32> [[TMP2]], <4 x i32>* [[ADDR]], align 8
// CHECK-NEXT: [[TMP3:%.*]] = extractvalue { <4 x i32>, <4 x i32> } [[TMP1]], 0
// CHECK-NEXT: ret <4 x i32> [[TMP3]]
//
int32x4_t test_vldrwq_gather_base_wb_s32(uint32x4_t *addr)
{
  return vldrwq_gather_base_wb_s32(addr, 0x50);
}

// CHECK-LABEL: @test_vldrwq_gather_base_wb_u32(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = load <4 x i32>, <4 x i32>* [[ADDR:%.*]], align 8
// CHECK-NEXT: [[TMP1:%.*]] = call { <4 x i32>, <4 x i32> } @llvm.arm.mve.vldr.gather.base.wb.v4i32.v4i32(<4 x i32> [[TMP0]], i32 480)
// CHECK-NEXT: [[TMP2:%.*]] = extractvalue { <4 x i32>, <4 x i32> } [[TMP1]], 1
// CHECK-NEXT: store <4 x i32> [[TMP2]], <4 x i32>* [[ADDR]], align 8
// CHECK-NEXT: [[TMP3:%.*]] = extractvalue { <4 x i32>, <4 x i32> } [[TMP1]], 0
// CHECK-NEXT: ret <4 x i32> [[TMP3]]
//
uint32x4_t test_vldrwq_gather_base_wb_u32(uint32x4_t *addr)
{
  return vldrwq_gather_base_wb_u32(addr, 0x1e0);
}
|
|
|
|
|
|
|
|
// CHECK-LABEL: @test_vldrwq_gather_base_wb_z_f32(
|
|
|
|
// CHECK-NEXT: entry:
|
|
|
|
// CHECK-NEXT: [[TMP0:%.*]] = load <4 x i32>, <4 x i32>* [[ADDR:%.*]], align 8
|
|
|
|
// CHECK-NEXT: [[TMP1:%.*]] = zext i16 [[P:%.*]] to i32
|
|
|
|
// CHECK-NEXT: [[TMP2:%.*]] = call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 [[TMP1]])
|
[ARM,MVE] Support -ve offsets in gather-load intrinsics.
Summary:
The ACLE intrinsics with `gather_base` or `scatter_base` in the name
are wrappers on the MVE load/store instructions that take a vector of
base addresses and an immediate offset. The immediate offset can be up
to 127 times the alignment unit, and it can be positive or negative.
At the MC layer, we got that right. But in the Sema error checking for
the wrapping intrinsics, the offset was erroneously constrained to be
positive.
To fix this I've adjusted the `imm_mem7bit` class in the Tablegen that
defines the intrinsics. But that causes integer literals like
`0xfffffffffffffe04` to appear in the autogenerated calls to
`SemaBuiltinConstantArgRange`, which provokes a compiler warning
because that's out of the non-overflowing range of an `int64_t`. So
I've also tweaked `MveEmitter` to emit that as `-0x1fc` instead.
Updated the tests of the Sema checks themselves, and also adjusted a
random sample of the CodeGen tests to actually use negative offsets
and prove they get all the way through code generation without causing
a crash.
Reviewers: dmgreen, miyuki, MarkMurrayARM
Reviewed By: dmgreen
Subscribers: kristof.beyls, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D72268
2020-01-07 00:33:05 +08:00
|
|
|
// CHECK-NEXT: [[TMP3:%.*]] = call { <4 x float>, <4 x i32> } @llvm.arm.mve.vldr.gather.base.wb.predicated.v4f32.v4i32.v4i1(<4 x i32> [[TMP0]], i32 -352, <4 x i1> [[TMP2]])
|
[ARM,MVE] Add intrinsics for gather/scatter load/stores.
This patch adds two new families of intrinsics, both of which are
memory accesses taking a vector of locations to load from / store to.
The vldrq_gather_base / vstrq_scatter_base intrinsics take a vector of
base addresses, and an immediate offset to be added consistently to
each one. vldrq_gather_offset / vstrq_scatter_offset take a scalar
base address, and a vector of offsets to add to it. The
'shifted_offset' variants also multiply each offset by the element
size type, so that the vector is effectively of array indices.
At the IR level, these operations are represented by a single set of
four IR intrinsics: {gather,scatter} × {base,offset}. The other
details (signed/unsigned, shift, and memory element size as opposed to
vector element size) are all specified by IR intrinsic polymorphism
and immediate operands, because that made the selection job easier
than making a huge family of similarly named intrinsics.
I considered using the standard IR representations such as
llvm.masked.gather, but they're not a good fit. In order to use
llvm.masked.gather to represent a gather_offset load with element size
smaller than a pointer, you'd have to expand the <8 x i16> vector of
offsets into an <8 x i16*> vector of pointers, which would be split up
during legalization, so you'd spend most of your time undoing the mess
it had made. Also, ISel support for llvm.masked.gather would be easy
enough in a trivial way (you can expand it into a gather-base load
with a zero immediate offset), but instruction-selecting lots of
fiddly idioms back into all the _other_ MVE load instructions would be
much more work. So I think dedicated IR intrinsics are the more
sensible approach, at least for the moment.
On the clang tablegen side, I've added two new features to the
Tablegen source accepted by MveEmitter: a 'CopyKind' type node for
defining a type that varies with the parameter type (it lets you ask
for an unsigned integer type of the same width as the parameter), and
an 'unsignedflag' value node for passing an immediate IR operand which
is 0 for a signed integer type or 1 for an unsigned one. That lets me
write each kind of intrinsic just once and get all its subtypes and
immediate arguments generated automatically.
Also I've tweaked the handling of pointer-typed values in the code
generation part of MveEmitter: they're generated as Address rather
than Value (i.e. including an alignment) so that they can be given to
the ordinary IR load and store operations, but I'd omitted the code to
convert them back to Value when they're going to be used as an
argument to an IR intrinsic.
On the MC side, I've enhanced MVEVectorVTInfo so that it can tell you
not only the full assembly-language suffix for a given vector type
(like 's32' or 'u16') but also the numeric-only one used by store
instructions (just '32' or '16').
Reviewers: dmgreen
Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D69791
2019-11-01 01:02:07 +08:00
|
|
|
// CHECK-NEXT: [[TMP4:%.*]] = extractvalue { <4 x float>, <4 x i32> } [[TMP3]], 1
|
|
|
|
// CHECK-NEXT: store <4 x i32> [[TMP4]], <4 x i32>* [[ADDR]], align 8
|
|
|
|
// CHECK-NEXT: [[TMP5:%.*]] = extractvalue { <4 x float>, <4 x i32> } [[TMP3]], 0
|
|
|
|
// CHECK-NEXT: ret <4 x float> [[TMP5]]
|
|
|
|
//
|
|
|
|
float32x4_t test_vldrwq_gather_base_wb_z_f32(uint32x4_t *addr, mve_pred16_t p)
|
|
|
|
{
|
[ARM,MVE] Support -ve offsets in gather-load intrinsics.
Summary:
The ACLE intrinsics with `gather_base` or `scatter_base` in the name
are wrappers on the MVE load/store instructions that take a vector of
base addresses and an immediate offset. The immediate offset can be up
to 127 times the alignment unit, and it can be positive or negative.
At the MC layer, we got that right. But in the Sema error checking for
the wrapping intrinsics, the offset was erroneously constrained to be
positive.
To fix this I've adjusted the `imm_mem7bit` class in the Tablegen that
defines the intrinsics. But that causes integer literals like
`0xfffffffffffffe04` to appear in the autogenerated calls to
`SemaBuiltinConstantArgRange`, which provokes a compiler warning
because that's out of the non-overflowing range of an `int64_t`. So
I've also tweaked `MveEmitter` to emit that as `-0x1fc` instead.
Updated the tests of the Sema checks themselves, and also adjusted a
random sample of the CodeGen tests to actually use negative offsets
and prove they get all the way through code generation without causing
a crash.
Reviewers: dmgreen, miyuki, MarkMurrayARM
Reviewed By: dmgreen
Subscribers: kristof.beyls, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D72268
2020-01-07 00:33:05 +08:00
return vldrwq_gather_base_wb_z_f32(addr, -0x160, p);
}
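
As a point of reference, here is a minimal illustrative sketch (not part of the test file, and the function name is hypothetical) of the kind of call the change described above permits: the gather-base immediate may now be negative, on the assumption that it is still a multiple of the element size and within 127 units of it, i.e. a multiple of 4 in [-0x1fc, 0x1fc] for a 32-bit gather.

#include <arm_mve.h>

/* Illustrative sketch only: exercises both ends of the assumed
 * [-0x1fc, 0x1fc] immediate range for a 32-bit gather-base load. */
int32x4_t sketch_negative_gather_base_offset(uint32x4_t addr)
{
    int32x4_t below = vldrwq_gather_base_s32(addr, -0x1fc); /* negative offset: rejected by the Sema check before this fix */
    int32x4_t above = vldrwq_gather_base_s32(addr, 0x1fc);  /* positive offset: accepted before and after */
    return vaddq_s32(below, above);
}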

// CHECK-LABEL: @test_vldrwq_gather_base_wb_z_s32(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = load <4 x i32>, <4 x i32>* [[ADDR:%.*]], align 8
// CHECK-NEXT: [[TMP1:%.*]] = zext i16 [[P:%.*]] to i32
// CHECK-NEXT: [[TMP2:%.*]] = call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 [[TMP1]])
// CHECK-NEXT: [[TMP3:%.*]] = call { <4 x i32>, <4 x i32> } @llvm.arm.mve.vldr.gather.base.wb.predicated.v4i32.v4i32.v4i1(<4 x i32> [[TMP0]], i32 276, <4 x i1> [[TMP2]])
// CHECK-NEXT: [[TMP4:%.*]] = extractvalue { <4 x i32>, <4 x i32> } [[TMP3]], 1
// CHECK-NEXT: store <4 x i32> [[TMP4]], <4 x i32>* [[ADDR]], align 8
// CHECK-NEXT: [[TMP5:%.*]] = extractvalue { <4 x i32>, <4 x i32> } [[TMP3]], 0
// CHECK-NEXT: ret <4 x i32> [[TMP5]]
//
int32x4_t test_vldrwq_gather_base_wb_z_s32(uint32x4_t *addr, mve_pred16_t p)
{
return vldrwq_gather_base_wb_z_s32(addr, 0x114, p);
}

// CHECK-LABEL: @test_vldrwq_gather_base_wb_z_u32(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = load <4 x i32>, <4 x i32>* [[ADDR:%.*]], align 8
// CHECK-NEXT: [[TMP1:%.*]] = zext i16 [[P:%.*]] to i32
// CHECK-NEXT: [[TMP2:%.*]] = call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 [[TMP1]])
// CHECK-NEXT: [[TMP3:%.*]] = call { <4 x i32>, <4 x i32> } @llvm.arm.mve.vldr.gather.base.wb.predicated.v4i32.v4i32.v4i1(<4 x i32> [[TMP0]], i32 88, <4 x i1> [[TMP2]])
// CHECK-NEXT: [[TMP4:%.*]] = extractvalue { <4 x i32>, <4 x i32> } [[TMP3]], 1
// CHECK-NEXT: store <4 x i32> [[TMP4]], <4 x i32>* [[ADDR]], align 8
// CHECK-NEXT: [[TMP5:%.*]] = extractvalue { <4 x i32>, <4 x i32> } [[TMP3]], 0
// CHECK-NEXT: ret <4 x i32> [[TMP5]]
//
uint32x4_t test_vldrwq_gather_base_wb_z_u32(uint32x4_t *addr, mve_pred16_t p)
{
return vldrwq_gather_base_wb_z_u32(addr, 0x58, p);
}

// CHECK-LABEL: @test_vldrwq_gather_base_z_f32(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
// CHECK-NEXT: [[TMP1:%.*]] = call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 [[TMP0]])
// CHECK-NEXT: [[TMP2:%.*]] = call <4 x float> @llvm.arm.mve.vldr.gather.base.predicated.v4f32.v4i32.v4i1(<4 x i32> [[ADDR:%.*]], i32 300, <4 x i1> [[TMP1]])
// CHECK-NEXT: ret <4 x float> [[TMP2]]
//
float32x4_t test_vldrwq_gather_base_z_f32(uint32x4_t addr, mve_pred16_t p)
{
return vldrwq_gather_base_z_f32(addr, 0x12c, p);
}

// CHECK-LABEL: @test_vldrwq_gather_base_z_s32(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
// CHECK-NEXT: [[TMP1:%.*]] = call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 [[TMP0]])
// CHECK-NEXT: [[TMP2:%.*]] = call <4 x i32> @llvm.arm.mve.vldr.gather.base.predicated.v4i32.v4i32.v4i1(<4 x i32> [[ADDR:%.*]], i32 440, <4 x i1> [[TMP1]])
// CHECK-NEXT: ret <4 x i32> [[TMP2]]
//
int32x4_t test_vldrwq_gather_base_z_s32(uint32x4_t addr, mve_pred16_t p)
{
return vldrwq_gather_base_z_s32(addr, 0x1b8, p);
}

// CHECK-LABEL: @test_vldrwq_gather_base_z_u32(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
// CHECK-NEXT: [[TMP1:%.*]] = call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 [[TMP0]])
// CHECK-NEXT: [[TMP2:%.*]] = call <4 x i32> @llvm.arm.mve.vldr.gather.base.predicated.v4i32.v4i32.v4i1(<4 x i32> [[ADDR:%.*]], i32 -300, <4 x i1> [[TMP1]])
// CHECK-NEXT: ret <4 x i32> [[TMP2]]
//
uint32x4_t test_vldrwq_gather_base_z_u32(uint32x4_t addr, mve_pred16_t p)
{
return vldrwq_gather_base_z_u32(addr, -0x12c, p);
}

// CHECK-LABEL: @test_vldrwq_gather_offset_f32(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = call <4 x float> @llvm.arm.mve.vldr.gather.offset.v4f32.p0f32.v4i32(float* [[BASE:%.*]], <4 x i32> [[OFFSET:%.*]], i32 32, i32 0, i32 0)
// CHECK-NEXT: ret <4 x float> [[TMP0]]
//
float32x4_t test_vldrwq_gather_offset_f32(const float32_t *base, uint32x4_t offset)
{
#ifdef POLYMORPHIC
return vldrwq_gather_offset(base, offset);
#else /* POLYMORPHIC */
return vldrwq_gather_offset_f32(base, offset);
#endif /* POLYMORPHIC */
}

// CHECK-LABEL: @test_vldrwq_gather_offset_s32(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = call <4 x i32> @llvm.arm.mve.vldr.gather.offset.v4i32.p0i32.v4i32(i32* [[BASE:%.*]], <4 x i32> [[OFFSET:%.*]], i32 32, i32 0, i32 0)
// CHECK-NEXT: ret <4 x i32> [[TMP0]]
//
int32x4_t test_vldrwq_gather_offset_s32(const int32_t *base, uint32x4_t offset)
{
#ifdef POLYMORPHIC
return vldrwq_gather_offset(base, offset);
#else /* POLYMORPHIC */
return vldrwq_gather_offset_s32(base, offset);
#endif /* POLYMORPHIC */
}

// CHECK-LABEL: @test_vldrwq_gather_offset_u32(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = call <4 x i32> @llvm.arm.mve.vldr.gather.offset.v4i32.p0i32.v4i32(i32* [[BASE:%.*]], <4 x i32> [[OFFSET:%.*]], i32 32, i32 0, i32 1)
// CHECK-NEXT: ret <4 x i32> [[TMP0]]
//
uint32x4_t test_vldrwq_gather_offset_u32(const uint32_t *base, uint32x4_t offset)
{
#ifdef POLYMORPHIC
return vldrwq_gather_offset(base, offset);
#else /* POLYMORPHIC */
return vldrwq_gather_offset_u32(base, offset);
#endif /* POLYMORPHIC */
}

// CHECK-LABEL: @test_vldrwq_gather_offset_z_f32(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
// CHECK-NEXT: [[TMP1:%.*]] = call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 [[TMP0]])
// CHECK-NEXT: [[TMP2:%.*]] = call <4 x float> @llvm.arm.mve.vldr.gather.offset.predicated.v4f32.p0f32.v4i32.v4i1(float* [[BASE:%.*]], <4 x i32> [[OFFSET:%.*]], i32 32, i32 0, i32 0, <4 x i1> [[TMP1]])
// CHECK-NEXT: ret <4 x float> [[TMP2]]
//
float32x4_t test_vldrwq_gather_offset_z_f32(const float32_t *base, uint32x4_t offset, mve_pred16_t p)
{
#ifdef POLYMORPHIC
return vldrwq_gather_offset_z(base, offset, p);
#else /* POLYMORPHIC */
return vldrwq_gather_offset_z_f32(base, offset, p);
#endif /* POLYMORPHIC */
}

// CHECK-LABEL: @test_vldrwq_gather_offset_z_s32(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
// CHECK-NEXT: [[TMP1:%.*]] = call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 [[TMP0]])
// CHECK-NEXT: [[TMP2:%.*]] = call <4 x i32> @llvm.arm.mve.vldr.gather.offset.predicated.v4i32.p0i32.v4i32.v4i1(i32* [[BASE:%.*]], <4 x i32> [[OFFSET:%.*]], i32 32, i32 0, i32 0, <4 x i1> [[TMP1]])
// CHECK-NEXT: ret <4 x i32> [[TMP2]]
//
int32x4_t test_vldrwq_gather_offset_z_s32(const int32_t *base, uint32x4_t offset, mve_pred16_t p)
{
#ifdef POLYMORPHIC
return vldrwq_gather_offset_z(base, offset, p);
#else /* POLYMORPHIC */
return vldrwq_gather_offset_z_s32(base, offset, p);
#endif /* POLYMORPHIC */
}

// CHECK-LABEL: @test_vldrwq_gather_offset_z_u32(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
// CHECK-NEXT: [[TMP1:%.*]] = call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 [[TMP0]])
// CHECK-NEXT: [[TMP2:%.*]] = call <4 x i32> @llvm.arm.mve.vldr.gather.offset.predicated.v4i32.p0i32.v4i32.v4i1(i32* [[BASE:%.*]], <4 x i32> [[OFFSET:%.*]], i32 32, i32 0, i32 1, <4 x i1> [[TMP1]])
// CHECK-NEXT: ret <4 x i32> [[TMP2]]
//
uint32x4_t test_vldrwq_gather_offset_z_u32(const uint32_t *base, uint32x4_t offset, mve_pred16_t p)
{
#ifdef POLYMORPHIC
return vldrwq_gather_offset_z(base, offset, p);
#else /* POLYMORPHIC */
return vldrwq_gather_offset_z_u32(base, offset, p);
#endif /* POLYMORPHIC */
}

// CHECK-LABEL: @test_vldrwq_gather_shifted_offset_f32(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = call <4 x float> @llvm.arm.mve.vldr.gather.offset.v4f32.p0f32.v4i32(float* [[BASE:%.*]], <4 x i32> [[OFFSET:%.*]], i32 32, i32 2, i32 0)
// CHECK-NEXT: ret <4 x float> [[TMP0]]
//
float32x4_t test_vldrwq_gather_shifted_offset_f32(const float32_t *base, uint32x4_t offset)
{
#ifdef POLYMORPHIC
return vldrwq_gather_shifted_offset(base, offset);
#else /* POLYMORPHIC */
return vldrwq_gather_shifted_offset_f32(base, offset);
#endif /* POLYMORPHIC */
}

// CHECK-LABEL: @test_vldrwq_gather_shifted_offset_s32(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = call <4 x i32> @llvm.arm.mve.vldr.gather.offset.v4i32.p0i32.v4i32(i32* [[BASE:%.*]], <4 x i32> [[OFFSET:%.*]], i32 32, i32 2, i32 0)
// CHECK-NEXT: ret <4 x i32> [[TMP0]]
//
int32x4_t test_vldrwq_gather_shifted_offset_s32(const int32_t *base, uint32x4_t offset)
{
#ifdef POLYMORPHIC
return vldrwq_gather_shifted_offset(base, offset);
#else /* POLYMORPHIC */
return vldrwq_gather_shifted_offset_s32(base, offset);
#endif /* POLYMORPHIC */
}

// CHECK-LABEL: @test_vldrwq_gather_shifted_offset_u32(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = call <4 x i32> @llvm.arm.mve.vldr.gather.offset.v4i32.p0i32.v4i32(i32* [[BASE:%.*]], <4 x i32> [[OFFSET:%.*]], i32 32, i32 2, i32 1)
// CHECK-NEXT: ret <4 x i32> [[TMP0]]
//
uint32x4_t test_vldrwq_gather_shifted_offset_u32(const uint32_t *base, uint32x4_t offset)
{
#ifdef POLYMORPHIC
return vldrwq_gather_shifted_offset(base, offset);
#else /* POLYMORPHIC */
return vldrwq_gather_shifted_offset_u32(base, offset);
#endif /* POLYMORPHIC */
}

// CHECK-LABEL: @test_vldrwq_gather_shifted_offset_z_f32(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
// CHECK-NEXT: [[TMP1:%.*]] = call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 [[TMP0]])
// CHECK-NEXT: [[TMP2:%.*]] = call <4 x float> @llvm.arm.mve.vldr.gather.offset.predicated.v4f32.p0f32.v4i32.v4i1(float* [[BASE:%.*]], <4 x i32> [[OFFSET:%.*]], i32 32, i32 2, i32 0, <4 x i1> [[TMP1]])
// CHECK-NEXT: ret <4 x float> [[TMP2]]
//
float32x4_t test_vldrwq_gather_shifted_offset_z_f32(const float32_t *base, uint32x4_t offset, mve_pred16_t p)
{
#ifdef POLYMORPHIC
return vldrwq_gather_shifted_offset_z(base, offset, p);
#else /* POLYMORPHIC */
return vldrwq_gather_shifted_offset_z_f32(base, offset, p);
#endif /* POLYMORPHIC */
}

// CHECK-LABEL: @test_vldrwq_gather_shifted_offset_z_s32(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
// CHECK-NEXT: [[TMP1:%.*]] = call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 [[TMP0]])
// CHECK-NEXT: [[TMP2:%.*]] = call <4 x i32> @llvm.arm.mve.vldr.gather.offset.predicated.v4i32.p0i32.v4i32.v4i1(i32* [[BASE:%.*]], <4 x i32> [[OFFSET:%.*]], i32 32, i32 2, i32 0, <4 x i1> [[TMP1]])
// CHECK-NEXT: ret <4 x i32> [[TMP2]]
//
int32x4_t test_vldrwq_gather_shifted_offset_z_s32(const int32_t *base, uint32x4_t offset, mve_pred16_t p)
{
#ifdef POLYMORPHIC
return vldrwq_gather_shifted_offset_z(base, offset, p);
#else /* POLYMORPHIC */
return vldrwq_gather_shifted_offset_z_s32(base, offset, p);
#endif /* POLYMORPHIC */
}

// CHECK-LABEL: @test_vldrwq_gather_shifted_offset_z_u32(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
// CHECK-NEXT: [[TMP1:%.*]] = call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 [[TMP0]])
// CHECK-NEXT: [[TMP2:%.*]] = call <4 x i32> @llvm.arm.mve.vldr.gather.offset.predicated.v4i32.p0i32.v4i32.v4i1(i32* [[BASE:%.*]], <4 x i32> [[OFFSET:%.*]], i32 32, i32 2, i32 1, <4 x i1> [[TMP1]])
// CHECK-NEXT: ret <4 x i32> [[TMP2]]
//
uint32x4_t test_vldrwq_gather_shifted_offset_z_u32(const uint32_t *base, uint32x4_t offset, mve_pred16_t p)
{
#ifdef POLYMORPHIC
return vldrwq_gather_shifted_offset_z(base, offset, p);
#else /* POLYMORPHIC */
return vldrwq_gather_shifted_offset_z_u32(base, offset, p);
#endif /* POLYMORPHIC */
}

// CHECK-LABEL: @test_vstrbq_scatter_offset_p_s16(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
// CHECK-NEXT: [[TMP1:%.*]] = call <8 x i1> @llvm.arm.mve.pred.i2v.v8i1(i32 [[TMP0]])
// CHECK-NEXT: call void @llvm.arm.mve.vstr.scatter.offset.predicated.p0i8.v8i16.v8i16.v8i1(i8* [[BASE:%.*]], <8 x i16> [[OFFSET:%.*]], <8 x i16> [[VALUE:%.*]], i32 8, i32 0, <8 x i1> [[TMP1]])
// CHECK-NEXT: ret void
//
void test_vstrbq_scatter_offset_p_s16(int8_t *base, uint16x8_t offset, int16x8_t value, mve_pred16_t p)
{
#ifdef POLYMORPHIC
vstrbq_scatter_offset_p(base, offset, value, p);
#else /* POLYMORPHIC */
vstrbq_scatter_offset_p_s16(base, offset, value, p);
#endif /* POLYMORPHIC */
}

// CHECK-LABEL: @test_vstrbq_scatter_offset_p_s32(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
// CHECK-NEXT: [[TMP1:%.*]] = call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 [[TMP0]])
// CHECK-NEXT: call void @llvm.arm.mve.vstr.scatter.offset.predicated.p0i8.v4i32.v4i32.v4i1(i8* [[BASE:%.*]], <4 x i32> [[OFFSET:%.*]], <4 x i32> [[VALUE:%.*]], i32 8, i32 0, <4 x i1> [[TMP1]])
// CHECK-NEXT: ret void
//
void test_vstrbq_scatter_offset_p_s32(int8_t *base, uint32x4_t offset, int32x4_t value, mve_pred16_t p)
{
#ifdef POLYMORPHIC
vstrbq_scatter_offset_p(base, offset, value, p);
#else /* POLYMORPHIC */
vstrbq_scatter_offset_p_s32(base, offset, value, p);
#endif /* POLYMORPHIC */
}

// CHECK-LABEL: @test_vstrbq_scatter_offset_p_s8(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
// CHECK-NEXT: [[TMP1:%.*]] = call <16 x i1> @llvm.arm.mve.pred.i2v.v16i1(i32 [[TMP0]])
// CHECK-NEXT: call void @llvm.arm.mve.vstr.scatter.offset.predicated.p0i8.v16i8.v16i8.v16i1(i8* [[BASE:%.*]], <16 x i8> [[OFFSET:%.*]], <16 x i8> [[VALUE:%.*]], i32 8, i32 0, <16 x i1> [[TMP1]])
// CHECK-NEXT: ret void
//
void test_vstrbq_scatter_offset_p_s8(int8_t *base, uint8x16_t offset, int8x16_t value, mve_pred16_t p)
{
#ifdef POLYMORPHIC
vstrbq_scatter_offset_p(base, offset, value, p);
#else /* POLYMORPHIC */
vstrbq_scatter_offset_p_s8(base, offset, value, p);
#endif /* POLYMORPHIC */
}

// CHECK-LABEL: @test_vstrbq_scatter_offset_p_u16(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
// CHECK-NEXT: [[TMP1:%.*]] = call <8 x i1> @llvm.arm.mve.pred.i2v.v8i1(i32 [[TMP0]])
// CHECK-NEXT: call void @llvm.arm.mve.vstr.scatter.offset.predicated.p0i8.v8i16.v8i16.v8i1(i8* [[BASE:%.*]], <8 x i16> [[OFFSET:%.*]], <8 x i16> [[VALUE:%.*]], i32 8, i32 0, <8 x i1> [[TMP1]])
// CHECK-NEXT: ret void
//
void test_vstrbq_scatter_offset_p_u16(uint8_t *base, uint16x8_t offset, uint16x8_t value, mve_pred16_t p)
{
#ifdef POLYMORPHIC
vstrbq_scatter_offset_p(base, offset, value, p);
#else /* POLYMORPHIC */
vstrbq_scatter_offset_p_u16(base, offset, value, p);
#endif /* POLYMORPHIC */
}

// CHECK-LABEL: @test_vstrbq_scatter_offset_p_u32(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
// CHECK-NEXT: [[TMP1:%.*]] = call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 [[TMP0]])
// CHECK-NEXT: call void @llvm.arm.mve.vstr.scatter.offset.predicated.p0i8.v4i32.v4i32.v4i1(i8* [[BASE:%.*]], <4 x i32> [[OFFSET:%.*]], <4 x i32> [[VALUE:%.*]], i32 8, i32 0, <4 x i1> [[TMP1]])
// CHECK-NEXT: ret void
//
void test_vstrbq_scatter_offset_p_u32(uint8_t *base, uint32x4_t offset, uint32x4_t value, mve_pred16_t p)
{
#ifdef POLYMORPHIC
vstrbq_scatter_offset_p(base, offset, value, p);
#else /* POLYMORPHIC */
vstrbq_scatter_offset_p_u32(base, offset, value, p);
#endif /* POLYMORPHIC */
}

// CHECK-LABEL: @test_vstrbq_scatter_offset_p_u8(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
// CHECK-NEXT: [[TMP1:%.*]] = call <16 x i1> @llvm.arm.mve.pred.i2v.v16i1(i32 [[TMP0]])
// CHECK-NEXT: call void @llvm.arm.mve.vstr.scatter.offset.predicated.p0i8.v16i8.v16i8.v16i1(i8* [[BASE:%.*]], <16 x i8> [[OFFSET:%.*]], <16 x i8> [[VALUE:%.*]], i32 8, i32 0, <16 x i1> [[TMP1]])
// CHECK-NEXT: ret void
//
void test_vstrbq_scatter_offset_p_u8(uint8_t *base, uint8x16_t offset, uint8x16_t value, mve_pred16_t p)
{
#ifdef POLYMORPHIC
vstrbq_scatter_offset_p(base, offset, value, p);
#else /* POLYMORPHIC */
vstrbq_scatter_offset_p_u8(base, offset, value, p);
#endif /* POLYMORPHIC */
}

// CHECK-LABEL: @test_vstrbq_scatter_offset_s16(
// CHECK-NEXT: entry:
// CHECK-NEXT: call void @llvm.arm.mve.vstr.scatter.offset.p0i8.v8i16.v8i16(i8* [[BASE:%.*]], <8 x i16> [[OFFSET:%.*]], <8 x i16> [[VALUE:%.*]], i32 8, i32 0)
// CHECK-NEXT: ret void
//
void test_vstrbq_scatter_offset_s16(int8_t *base, uint16x8_t offset, int16x8_t value)
{
#ifdef POLYMORPHIC
vstrbq_scatter_offset(base, offset, value);
#else /* POLYMORPHIC */
vstrbq_scatter_offset_s16(base, offset, value);
#endif /* POLYMORPHIC */
}

// CHECK-LABEL: @test_vstrbq_scatter_offset_s32(
// CHECK-NEXT: entry:
// CHECK-NEXT: call void @llvm.arm.mve.vstr.scatter.offset.p0i8.v4i32.v4i32(i8* [[BASE:%.*]], <4 x i32> [[OFFSET:%.*]], <4 x i32> [[VALUE:%.*]], i32 8, i32 0)
// CHECK-NEXT: ret void
//
void test_vstrbq_scatter_offset_s32(int8_t *base, uint32x4_t offset, int32x4_t value)
{
#ifdef POLYMORPHIC
vstrbq_scatter_offset(base, offset, value);
#else /* POLYMORPHIC */
vstrbq_scatter_offset_s32(base, offset, value);
#endif /* POLYMORPHIC */
}

// CHECK-LABEL: @test_vstrbq_scatter_offset_s8(
// CHECK-NEXT: entry:
// CHECK-NEXT: call void @llvm.arm.mve.vstr.scatter.offset.p0i8.v16i8.v16i8(i8* [[BASE:%.*]], <16 x i8> [[OFFSET:%.*]], <16 x i8> [[VALUE:%.*]], i32 8, i32 0)
// CHECK-NEXT: ret void
//
void test_vstrbq_scatter_offset_s8(int8_t *base, uint8x16_t offset, int8x16_t value)
{
#ifdef POLYMORPHIC
vstrbq_scatter_offset(base, offset, value);
#else /* POLYMORPHIC */
vstrbq_scatter_offset_s8(base, offset, value);
#endif /* POLYMORPHIC */
}

// CHECK-LABEL: @test_vstrbq_scatter_offset_u16(
// CHECK-NEXT: entry:
// CHECK-NEXT: call void @llvm.arm.mve.vstr.scatter.offset.p0i8.v8i16.v8i16(i8* [[BASE:%.*]], <8 x i16> [[OFFSET:%.*]], <8 x i16> [[VALUE:%.*]], i32 8, i32 0)
// CHECK-NEXT: ret void
//
void test_vstrbq_scatter_offset_u16(uint8_t *base, uint16x8_t offset, uint16x8_t value)
{
#ifdef POLYMORPHIC
vstrbq_scatter_offset(base, offset, value);
#else /* POLYMORPHIC */
vstrbq_scatter_offset_u16(base, offset, value);
#endif /* POLYMORPHIC */
}

// CHECK-LABEL: @test_vstrbq_scatter_offset_u32(
// CHECK-NEXT: entry:
// CHECK-NEXT: call void @llvm.arm.mve.vstr.scatter.offset.p0i8.v4i32.v4i32(i8* [[BASE:%.*]], <4 x i32> [[OFFSET:%.*]], <4 x i32> [[VALUE:%.*]], i32 8, i32 0)
// CHECK-NEXT: ret void
//
void test_vstrbq_scatter_offset_u32(uint8_t *base, uint32x4_t offset, uint32x4_t value)
{
#ifdef POLYMORPHIC
vstrbq_scatter_offset(base, offset, value);
#else /* POLYMORPHIC */
vstrbq_scatter_offset_u32(base, offset, value);
#endif /* POLYMORPHIC */
}

// CHECK-LABEL: @test_vstrbq_scatter_offset_u8(
// CHECK-NEXT: entry:
// CHECK-NEXT: call void @llvm.arm.mve.vstr.scatter.offset.p0i8.v16i8.v16i8(i8* [[BASE:%.*]], <16 x i8> [[OFFSET:%.*]], <16 x i8> [[VALUE:%.*]], i32 8, i32 0)
// CHECK-NEXT: ret void
//
void test_vstrbq_scatter_offset_u8(uint8_t *base, uint8x16_t offset, uint8x16_t value)
{
#ifdef POLYMORPHIC
vstrbq_scatter_offset(base, offset, value);
#else /* POLYMORPHIC */
vstrbq_scatter_offset_u8(base, offset, value);
#endif /* POLYMORPHIC */
}

// CHECK-LABEL: @test_vstrdq_scatter_base_p_s64(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
// CHECK-NEXT: [[TMP1:%.*]] = call <2 x i1> @llvm.arm.mve.pred.i2v.v2i1(i32 [[TMP0]])
// CHECK-NEXT: call void @llvm.arm.mve.vstr.scatter.base.predicated.v2i64.v2i64.v2i1(<2 x i64> [[ADDR:%.*]], i32 888, <2 x i64> [[VALUE:%.*]], <2 x i1> [[TMP1]])
// CHECK-NEXT: ret void
//
void test_vstrdq_scatter_base_p_s64(uint64x2_t addr, int64x2_t value, mve_pred16_t p)
{
#ifdef POLYMORPHIC
vstrdq_scatter_base_p(addr, 0x378, value, p);
#else /* POLYMORPHIC */
vstrdq_scatter_base_p_s64(addr, 0x378, value, p);
#endif /* POLYMORPHIC */
}

// CHECK-LABEL: @test_vstrdq_scatter_base_p_u64(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
// CHECK-NEXT: [[TMP1:%.*]] = call <2 x i1> @llvm.arm.mve.pred.i2v.v2i1(i32 [[TMP0]])
// CHECK-NEXT: call void @llvm.arm.mve.vstr.scatter.base.predicated.v2i64.v2i64.v2i1(<2 x i64> [[ADDR:%.*]], i32 264, <2 x i64> [[VALUE:%.*]], <2 x i1> [[TMP1]])
// CHECK-NEXT: ret void
//
void test_vstrdq_scatter_base_p_u64(uint64x2_t addr, uint64x2_t value, mve_pred16_t p)
{
#ifdef POLYMORPHIC
vstrdq_scatter_base_p(addr, 0x108, value, p);
#else /* POLYMORPHIC */
vstrdq_scatter_base_p_u64(addr, 0x108, value, p);
#endif /* POLYMORPHIC */
}

// CHECK-LABEL: @test_vstrdq_scatter_base_s64(
// CHECK-NEXT: entry:
// CHECK-NEXT: call void @llvm.arm.mve.vstr.scatter.base.v2i64.v2i64(<2 x i64> [[ADDR:%.*]], i32 408, <2 x i64> [[VALUE:%.*]])
// CHECK-NEXT: ret void
//
void test_vstrdq_scatter_base_s64(uint64x2_t addr, int64x2_t value)
{
#ifdef POLYMORPHIC
vstrdq_scatter_base(addr, 0x198, value);
#else /* POLYMORPHIC */
vstrdq_scatter_base_s64(addr, 0x198, value);
#endif /* POLYMORPHIC */
}

// CHECK-LABEL: @test_vstrdq_scatter_base_u64(
// CHECK-NEXT: entry:
// CHECK-NEXT: call void @llvm.arm.mve.vstr.scatter.base.v2i64.v2i64(<2 x i64> [[ADDR:%.*]], i32 -472, <2 x i64> [[VALUE:%.*]])
// CHECK-NEXT: ret void
//
void test_vstrdq_scatter_base_u64(uint64x2_t addr, uint64x2_t value)
{
#ifdef POLYMORPHIC
vstrdq_scatter_base(addr, -0x1d8, value);
#else /* POLYMORPHIC */
vstrdq_scatter_base_u64(addr, -0x1d8, value);
#endif /* POLYMORPHIC */
}

// CHECK-LABEL: @test_vstrdq_scatter_base_wb_p_s64(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = load <2 x i64>, <2 x i64>* [[ADDR:%.*]], align 8
// CHECK-NEXT: [[TMP1:%.*]] = zext i16 [[P:%.*]] to i32
// CHECK-NEXT: [[TMP2:%.*]] = call <2 x i1> @llvm.arm.mve.pred.i2v.v2i1(i32 [[TMP1]])
// CHECK-NEXT: [[TMP3:%.*]] = call <2 x i64> @llvm.arm.mve.vstr.scatter.base.wb.predicated.v2i64.v2i64.v2i1(<2 x i64> [[TMP0]], i32 248, <2 x i64> [[VALUE:%.*]], <2 x i1> [[TMP2]])
// CHECK-NEXT: store <2 x i64> [[TMP3]], <2 x i64>* [[ADDR]], align 8
// CHECK-NEXT: ret void
//
void test_vstrdq_scatter_base_wb_p_s64(uint64x2_t *addr, int64x2_t value, mve_pred16_t p)
{
#ifdef POLYMORPHIC
vstrdq_scatter_base_wb_p(addr, 0xf8, value, p);
#else /* POLYMORPHIC */
vstrdq_scatter_base_wb_p_s64(addr, 0xf8, value, p);
#endif /* POLYMORPHIC */
}
|
|
|
|
|
|
|
|
// CHECK-LABEL: @test_vstrdq_scatter_base_wb_p_u64(
// CHECK-NEXT:  entry:
// CHECK-NEXT:    [[TMP0:%.*]] = load <2 x i64>, <2 x i64>* [[ADDR:%.*]], align 8
// CHECK-NEXT:    [[TMP1:%.*]] = zext i16 [[P:%.*]] to i32
// CHECK-NEXT:    [[TMP2:%.*]] = call <2 x i1> @llvm.arm.mve.pred.i2v.v2i1(i32 [[TMP1]])
// CHECK-NEXT:    [[TMP3:%.*]] = call <2 x i64> @llvm.arm.mve.vstr.scatter.base.wb.predicated.v2i64.v2i64.v2i1(<2 x i64> [[TMP0]], i32 136, <2 x i64> [[VALUE:%.*]], <2 x i1> [[TMP2]])
// CHECK-NEXT:    store <2 x i64> [[TMP3]], <2 x i64>* [[ADDR]], align 8
// CHECK-NEXT:    ret void
//
void test_vstrdq_scatter_base_wb_p_u64(uint64x2_t *addr, uint64x2_t value, mve_pred16_t p)
{
#ifdef POLYMORPHIC
    vstrdq_scatter_base_wb_p(addr, 0x88, value, p);
#else /* POLYMORPHIC */
    vstrdq_scatter_base_wb_p_u64(addr, 0x88, value, p);
#endif /* POLYMORPHIC */
}

// CHECK-LABEL: @test_vstrdq_scatter_base_wb_s64(
// CHECK-NEXT:  entry:
// CHECK-NEXT:    [[TMP0:%.*]] = load <2 x i64>, <2 x i64>* [[ADDR:%.*]], align 8
// CHECK-NEXT:    [[TMP1:%.*]] = call <2 x i64> @llvm.arm.mve.vstr.scatter.base.wb.v2i64.v2i64(<2 x i64> [[TMP0]], i32 208, <2 x i64> [[VALUE:%.*]])
// CHECK-NEXT:    store <2 x i64> [[TMP1]], <2 x i64>* [[ADDR]], align 8
// CHECK-NEXT:    ret void
//
void test_vstrdq_scatter_base_wb_s64(uint64x2_t *addr, int64x2_t value)
{
#ifdef POLYMORPHIC
    vstrdq_scatter_base_wb(addr, 0xd0, value);
#else /* POLYMORPHIC */
    vstrdq_scatter_base_wb_s64(addr, 0xd0, value);
#endif /* POLYMORPHIC */
}

// CHECK-LABEL: @test_vstrdq_scatter_base_wb_u64(
// CHECK-NEXT:  entry:
// CHECK-NEXT:    [[TMP0:%.*]] = load <2 x i64>, <2 x i64>* [[ADDR:%.*]], align 8
// CHECK-NEXT:    [[TMP1:%.*]] = call <2 x i64> @llvm.arm.mve.vstr.scatter.base.wb.v2i64.v2i64(<2 x i64> [[TMP0]], i32 -168, <2 x i64> [[VALUE:%.*]])
// CHECK-NEXT:    store <2 x i64> [[TMP1]], <2 x i64>* [[ADDR]], align 8
// CHECK-NEXT:    ret void
//
void test_vstrdq_scatter_base_wb_u64(uint64x2_t *addr, uint64x2_t value)
{
#ifdef POLYMORPHIC
    vstrdq_scatter_base_wb(addr, -0xa8, value);
#else /* POLYMORPHIC */
    vstrdq_scatter_base_wb_u64(addr, -0xa8, value);
#endif /* POLYMORPHIC */
}

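// Scatter-offset stores: a scalar base pointer plus a vector of offsets.
// In the IR calls below, the trailing immediates give the memory element
// size in bits (64) and the shift applied to each offset (0 for the plain
// offset forms).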
// CHECK-LABEL: @test_vstrdq_scatter_offset_p_s64(
// CHECK-NEXT:  entry:
// CHECK-NEXT:    [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
// CHECK-NEXT:    [[TMP1:%.*]] = call <2 x i1> @llvm.arm.mve.pred.i2v.v2i1(i32 [[TMP0]])
// CHECK-NEXT:    call void @llvm.arm.mve.vstr.scatter.offset.predicated.p0i64.v2i64.v2i64.v2i1(i64* [[BASE:%.*]], <2 x i64> [[OFFSET:%.*]], <2 x i64> [[VALUE:%.*]], i32 64, i32 0, <2 x i1> [[TMP1]])
// CHECK-NEXT:    ret void
//
void test_vstrdq_scatter_offset_p_s64(int64_t *base, uint64x2_t offset, int64x2_t value, mve_pred16_t p)
{
#ifdef POLYMORPHIC
    vstrdq_scatter_offset_p(base, offset, value, p);
#else /* POLYMORPHIC */
    vstrdq_scatter_offset_p_s64(base, offset, value, p);
#endif /* POLYMORPHIC */
}

// CHECK-LABEL: @test_vstrdq_scatter_offset_p_u64(
// CHECK-NEXT:  entry:
// CHECK-NEXT:    [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
// CHECK-NEXT:    [[TMP1:%.*]] = call <2 x i1> @llvm.arm.mve.pred.i2v.v2i1(i32 [[TMP0]])
// CHECK-NEXT:    call void @llvm.arm.mve.vstr.scatter.offset.predicated.p0i64.v2i64.v2i64.v2i1(i64* [[BASE:%.*]], <2 x i64> [[OFFSET:%.*]], <2 x i64> [[VALUE:%.*]], i32 64, i32 0, <2 x i1> [[TMP1]])
// CHECK-NEXT:    ret void
//
void test_vstrdq_scatter_offset_p_u64(uint64_t *base, uint64x2_t offset, uint64x2_t value, mve_pred16_t p)
{
#ifdef POLYMORPHIC
    vstrdq_scatter_offset_p(base, offset, value, p);
#else /* POLYMORPHIC */
    vstrdq_scatter_offset_p_u64(base, offset, value, p);
#endif /* POLYMORPHIC */
}

// CHECK-LABEL: @test_vstrdq_scatter_offset_s64(
// CHECK-NEXT:  entry:
// CHECK-NEXT:    call void @llvm.arm.mve.vstr.scatter.offset.p0i64.v2i64.v2i64(i64* [[BASE:%.*]], <2 x i64> [[OFFSET:%.*]], <2 x i64> [[VALUE:%.*]], i32 64, i32 0)
// CHECK-NEXT:    ret void
//
void test_vstrdq_scatter_offset_s64(int64_t *base, uint64x2_t offset, int64x2_t value)
{
#ifdef POLYMORPHIC
    vstrdq_scatter_offset(base, offset, value);
#else /* POLYMORPHIC */
    vstrdq_scatter_offset_s64(base, offset, value);
#endif /* POLYMORPHIC */
}

// CHECK-LABEL: @test_vstrdq_scatter_offset_u64(
// CHECK-NEXT:  entry:
// CHECK-NEXT:    call void @llvm.arm.mve.vstr.scatter.offset.p0i64.v2i64.v2i64(i64* [[BASE:%.*]], <2 x i64> [[OFFSET:%.*]], <2 x i64> [[VALUE:%.*]], i32 64, i32 0)
// CHECK-NEXT:    ret void
//
void test_vstrdq_scatter_offset_u64(uint64_t *base, uint64x2_t offset, uint64x2_t value)
{
#ifdef POLYMORPHIC
    vstrdq_scatter_offset(base, offset, value);
#else /* POLYMORPHIC */
    vstrdq_scatter_offset_u64(base, offset, value);
#endif /* POLYMORPHIC */
}

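// The '_shifted_offset' forms use the same IR intrinsic but with a shift
// immediate of 3, i.e. each offset is scaled by the 8-byte element size.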
// CHECK-LABEL: @test_vstrdq_scatter_shifted_offset_p_s64(
// CHECK-NEXT:  entry:
// CHECK-NEXT:    [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
// CHECK-NEXT:    [[TMP1:%.*]] = call <2 x i1> @llvm.arm.mve.pred.i2v.v2i1(i32 [[TMP0]])
// CHECK-NEXT:    call void @llvm.arm.mve.vstr.scatter.offset.predicated.p0i64.v2i64.v2i64.v2i1(i64* [[BASE:%.*]], <2 x i64> [[OFFSET:%.*]], <2 x i64> [[VALUE:%.*]], i32 64, i32 3, <2 x i1> [[TMP1]])
// CHECK-NEXT:    ret void
//
void test_vstrdq_scatter_shifted_offset_p_s64(int64_t *base, uint64x2_t offset, int64x2_t value, mve_pred16_t p)
{
#ifdef POLYMORPHIC
    vstrdq_scatter_shifted_offset_p(base, offset, value, p);
#else /* POLYMORPHIC */
    vstrdq_scatter_shifted_offset_p_s64(base, offset, value, p);
#endif /* POLYMORPHIC */
}

// CHECK-LABEL: @test_vstrdq_scatter_shifted_offset_p_u64(
// CHECK-NEXT:  entry:
// CHECK-NEXT:    [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
// CHECK-NEXT:    [[TMP1:%.*]] = call <2 x i1> @llvm.arm.mve.pred.i2v.v2i1(i32 [[TMP0]])
// CHECK-NEXT:    call void @llvm.arm.mve.vstr.scatter.offset.predicated.p0i64.v2i64.v2i64.v2i1(i64* [[BASE:%.*]], <2 x i64> [[OFFSET:%.*]], <2 x i64> [[VALUE:%.*]], i32 64, i32 3, <2 x i1> [[TMP1]])
// CHECK-NEXT:    ret void
//
void test_vstrdq_scatter_shifted_offset_p_u64(uint64_t *base, uint64x2_t offset, uint64x2_t value, mve_pred16_t p)
{
#ifdef POLYMORPHIC
    vstrdq_scatter_shifted_offset_p(base, offset, value, p);
#else /* POLYMORPHIC */
    vstrdq_scatter_shifted_offset_p_u64(base, offset, value, p);
#endif /* POLYMORPHIC */
}

// CHECK-LABEL: @test_vstrdq_scatter_shifted_offset_s64(
// CHECK-NEXT:  entry:
// CHECK-NEXT:    call void @llvm.arm.mve.vstr.scatter.offset.p0i64.v2i64.v2i64(i64* [[BASE:%.*]], <2 x i64> [[OFFSET:%.*]], <2 x i64> [[VALUE:%.*]], i32 64, i32 3)
// CHECK-NEXT:    ret void
//
void test_vstrdq_scatter_shifted_offset_s64(int64_t *base, uint64x2_t offset, int64x2_t value)
{
#ifdef POLYMORPHIC
    vstrdq_scatter_shifted_offset(base, offset, value);
#else /* POLYMORPHIC */
    vstrdq_scatter_shifted_offset_s64(base, offset, value);
#endif /* POLYMORPHIC */
}

// CHECK-LABEL: @test_vstrdq_scatter_shifted_offset_u64(
// CHECK-NEXT:  entry:
// CHECK-NEXT:    call void @llvm.arm.mve.vstr.scatter.offset.p0i64.v2i64.v2i64(i64* [[BASE:%.*]], <2 x i64> [[OFFSET:%.*]], <2 x i64> [[VALUE:%.*]], i32 64, i32 3)
// CHECK-NEXT:    ret void
//
void test_vstrdq_scatter_shifted_offset_u64(uint64_t *base, uint64x2_t offset, uint64x2_t value)
{
#ifdef POLYMORPHIC
    vstrdq_scatter_shifted_offset(base, offset, value);
#else /* POLYMORPHIC */
    vstrdq_scatter_shifted_offset_u64(base, offset, value);
#endif /* POLYMORPHIC */
}

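// 16-bit scatter-offset stores (memory element size immediate 16). The
// _s32/_u32 variants take 32-bit lanes and store them to 16-bit elements.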
// CHECK-LABEL: @test_vstrhq_scatter_offset_f16(
// CHECK-NEXT:  entry:
// CHECK-NEXT:    call void @llvm.arm.mve.vstr.scatter.offset.p0f16.v8i16.v8f16(half* [[BASE:%.*]], <8 x i16> [[OFFSET:%.*]], <8 x half> [[VALUE:%.*]], i32 16, i32 0)
// CHECK-NEXT:    ret void
//
void test_vstrhq_scatter_offset_f16(float16_t *base, uint16x8_t offset, float16x8_t value)
{
#ifdef POLYMORPHIC
    vstrhq_scatter_offset(base, offset, value);
#else /* POLYMORPHIC */
    vstrhq_scatter_offset_f16(base, offset, value);
#endif /* POLYMORPHIC */
}

// CHECK-LABEL: @test_vstrhq_scatter_offset_p_f16(
// CHECK-NEXT:  entry:
// CHECK-NEXT:    [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
// CHECK-NEXT:    [[TMP1:%.*]] = call <8 x i1> @llvm.arm.mve.pred.i2v.v8i1(i32 [[TMP0]])
// CHECK-NEXT:    call void @llvm.arm.mve.vstr.scatter.offset.predicated.p0f16.v8i16.v8f16.v8i1(half* [[BASE:%.*]], <8 x i16> [[OFFSET:%.*]], <8 x half> [[VALUE:%.*]], i32 16, i32 0, <8 x i1> [[TMP1]])
// CHECK-NEXT:    ret void
//
void test_vstrhq_scatter_offset_p_f16(float16_t *base, uint16x8_t offset, float16x8_t value, mve_pred16_t p)
{
#ifdef POLYMORPHIC
    vstrhq_scatter_offset_p(base, offset, value, p);
#else /* POLYMORPHIC */
    vstrhq_scatter_offset_p_f16(base, offset, value, p);
#endif /* POLYMORPHIC */
}

// CHECK-LABEL: @test_vstrhq_scatter_offset_p_s16(
// CHECK-NEXT:  entry:
// CHECK-NEXT:    [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
// CHECK-NEXT:    [[TMP1:%.*]] = call <8 x i1> @llvm.arm.mve.pred.i2v.v8i1(i32 [[TMP0]])
// CHECK-NEXT:    call void @llvm.arm.mve.vstr.scatter.offset.predicated.p0i16.v8i16.v8i16.v8i1(i16* [[BASE:%.*]], <8 x i16> [[OFFSET:%.*]], <8 x i16> [[VALUE:%.*]], i32 16, i32 0, <8 x i1> [[TMP1]])
// CHECK-NEXT:    ret void
//
void test_vstrhq_scatter_offset_p_s16(int16_t *base, uint16x8_t offset, int16x8_t value, mve_pred16_t p)
{
#ifdef POLYMORPHIC
    vstrhq_scatter_offset_p(base, offset, value, p);
#else /* POLYMORPHIC */
    vstrhq_scatter_offset_p_s16(base, offset, value, p);
#endif /* POLYMORPHIC */
}

// CHECK-LABEL: @test_vstrhq_scatter_offset_p_s32(
// CHECK-NEXT:  entry:
// CHECK-NEXT:    [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
// CHECK-NEXT:    [[TMP1:%.*]] = call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 [[TMP0]])
// CHECK-NEXT:    call void @llvm.arm.mve.vstr.scatter.offset.predicated.p0i16.v4i32.v4i32.v4i1(i16* [[BASE:%.*]], <4 x i32> [[OFFSET:%.*]], <4 x i32> [[VALUE:%.*]], i32 16, i32 0, <4 x i1> [[TMP1]])
// CHECK-NEXT:    ret void
//
void test_vstrhq_scatter_offset_p_s32(int16_t *base, uint32x4_t offset, int32x4_t value, mve_pred16_t p)
{
#ifdef POLYMORPHIC
    vstrhq_scatter_offset_p(base, offset, value, p);
#else /* POLYMORPHIC */
    vstrhq_scatter_offset_p_s32(base, offset, value, p);
#endif /* POLYMORPHIC */
}

// CHECK-LABEL: @test_vstrhq_scatter_offset_p_u16(
// CHECK-NEXT:  entry:
// CHECK-NEXT:    [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
// CHECK-NEXT:    [[TMP1:%.*]] = call <8 x i1> @llvm.arm.mve.pred.i2v.v8i1(i32 [[TMP0]])
// CHECK-NEXT:    call void @llvm.arm.mve.vstr.scatter.offset.predicated.p0i16.v8i16.v8i16.v8i1(i16* [[BASE:%.*]], <8 x i16> [[OFFSET:%.*]], <8 x i16> [[VALUE:%.*]], i32 16, i32 0, <8 x i1> [[TMP1]])
// CHECK-NEXT:    ret void
//
void test_vstrhq_scatter_offset_p_u16(uint16_t *base, uint16x8_t offset, uint16x8_t value, mve_pred16_t p)
{
#ifdef POLYMORPHIC
    vstrhq_scatter_offset_p(base, offset, value, p);
#else /* POLYMORPHIC */
    vstrhq_scatter_offset_p_u16(base, offset, value, p);
#endif /* POLYMORPHIC */
}

// CHECK-LABEL: @test_vstrhq_scatter_offset_p_u32(
// CHECK-NEXT:  entry:
// CHECK-NEXT:    [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
// CHECK-NEXT:    [[TMP1:%.*]] = call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 [[TMP0]])
// CHECK-NEXT:    call void @llvm.arm.mve.vstr.scatter.offset.predicated.p0i16.v4i32.v4i32.v4i1(i16* [[BASE:%.*]], <4 x i32> [[OFFSET:%.*]], <4 x i32> [[VALUE:%.*]], i32 16, i32 0, <4 x i1> [[TMP1]])
// CHECK-NEXT:    ret void
//
void test_vstrhq_scatter_offset_p_u32(uint16_t *base, uint32x4_t offset, uint32x4_t value, mve_pred16_t p)
{
#ifdef POLYMORPHIC
    vstrhq_scatter_offset_p(base, offset, value, p);
#else /* POLYMORPHIC */
    vstrhq_scatter_offset_p_u32(base, offset, value, p);
#endif /* POLYMORPHIC */
}

// CHECK-LABEL: @test_vstrhq_scatter_offset_s16(
// CHECK-NEXT:  entry:
// CHECK-NEXT:    call void @llvm.arm.mve.vstr.scatter.offset.p0i16.v8i16.v8i16(i16* [[BASE:%.*]], <8 x i16> [[OFFSET:%.*]], <8 x i16> [[VALUE:%.*]], i32 16, i32 0)
// CHECK-NEXT:    ret void
//
void test_vstrhq_scatter_offset_s16(int16_t *base, uint16x8_t offset, int16x8_t value)
{
#ifdef POLYMORPHIC
    vstrhq_scatter_offset(base, offset, value);
#else /* POLYMORPHIC */
    vstrhq_scatter_offset_s16(base, offset, value);
#endif /* POLYMORPHIC */
}

// CHECK-LABEL: @test_vstrhq_scatter_offset_s32(
// CHECK-NEXT:  entry:
// CHECK-NEXT:    call void @llvm.arm.mve.vstr.scatter.offset.p0i16.v4i32.v4i32(i16* [[BASE:%.*]], <4 x i32> [[OFFSET:%.*]], <4 x i32> [[VALUE:%.*]], i32 16, i32 0)
// CHECK-NEXT:    ret void
//
void test_vstrhq_scatter_offset_s32(int16_t *base, uint32x4_t offset, int32x4_t value)
{
#ifdef POLYMORPHIC
    vstrhq_scatter_offset(base, offset, value);
#else /* POLYMORPHIC */
    vstrhq_scatter_offset_s32(base, offset, value);
#endif /* POLYMORPHIC */
}

// CHECK-LABEL: @test_vstrhq_scatter_offset_u16(
// CHECK-NEXT:  entry:
// CHECK-NEXT:    call void @llvm.arm.mve.vstr.scatter.offset.p0i16.v8i16.v8i16(i16* [[BASE:%.*]], <8 x i16> [[OFFSET:%.*]], <8 x i16> [[VALUE:%.*]], i32 16, i32 0)
// CHECK-NEXT:    ret void
//
void test_vstrhq_scatter_offset_u16(uint16_t *base, uint16x8_t offset, uint16x8_t value)
{
#ifdef POLYMORPHIC
    vstrhq_scatter_offset(base, offset, value);
#else /* POLYMORPHIC */
    vstrhq_scatter_offset_u16(base, offset, value);
#endif /* POLYMORPHIC */
}

// CHECK-LABEL: @test_vstrhq_scatter_offset_u32(
// CHECK-NEXT:  entry:
// CHECK-NEXT:    call void @llvm.arm.mve.vstr.scatter.offset.p0i16.v4i32.v4i32(i16* [[BASE:%.*]], <4 x i32> [[OFFSET:%.*]], <4 x i32> [[VALUE:%.*]], i32 16, i32 0)
// CHECK-NEXT:    ret void
//
void test_vstrhq_scatter_offset_u32(uint16_t *base, uint32x4_t offset, uint32x4_t value)
{
#ifdef POLYMORPHIC
    vstrhq_scatter_offset(base, offset, value);
#else /* POLYMORPHIC */
    vstrhq_scatter_offset_u32(base, offset, value);
#endif /* POLYMORPHIC */
}

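// Shifted-offset variants of the 16-bit stores: shift immediate 1, so each
// offset is scaled by the 2-byte element size.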
// CHECK-LABEL: @test_vstrhq_scatter_shifted_offset_f16(
// CHECK-NEXT:  entry:
// CHECK-NEXT:    call void @llvm.arm.mve.vstr.scatter.offset.p0f16.v8i16.v8f16(half* [[BASE:%.*]], <8 x i16> [[OFFSET:%.*]], <8 x half> [[VALUE:%.*]], i32 16, i32 1)
// CHECK-NEXT:    ret void
//
void test_vstrhq_scatter_shifted_offset_f16(float16_t *base, uint16x8_t offset, float16x8_t value)
{
#ifdef POLYMORPHIC
    vstrhq_scatter_shifted_offset(base, offset, value);
#else /* POLYMORPHIC */
    vstrhq_scatter_shifted_offset_f16(base, offset, value);
#endif /* POLYMORPHIC */
}

// CHECK-LABEL: @test_vstrhq_scatter_shifted_offset_p_f16(
// CHECK-NEXT:  entry:
// CHECK-NEXT:    [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
// CHECK-NEXT:    [[TMP1:%.*]] = call <8 x i1> @llvm.arm.mve.pred.i2v.v8i1(i32 [[TMP0]])
// CHECK-NEXT:    call void @llvm.arm.mve.vstr.scatter.offset.predicated.p0f16.v8i16.v8f16.v8i1(half* [[BASE:%.*]], <8 x i16> [[OFFSET:%.*]], <8 x half> [[VALUE:%.*]], i32 16, i32 1, <8 x i1> [[TMP1]])
// CHECK-NEXT:    ret void
//
void test_vstrhq_scatter_shifted_offset_p_f16(float16_t *base, uint16x8_t offset, float16x8_t value, mve_pred16_t p)
{
#ifdef POLYMORPHIC
    vstrhq_scatter_shifted_offset_p(base, offset, value, p);
#else /* POLYMORPHIC */
    vstrhq_scatter_shifted_offset_p_f16(base, offset, value, p);
#endif /* POLYMORPHIC */
}

// CHECK-LABEL: @test_vstrhq_scatter_shifted_offset_p_s16(
// CHECK-NEXT:  entry:
// CHECK-NEXT:    [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
// CHECK-NEXT:    [[TMP1:%.*]] = call <8 x i1> @llvm.arm.mve.pred.i2v.v8i1(i32 [[TMP0]])
// CHECK-NEXT:    call void @llvm.arm.mve.vstr.scatter.offset.predicated.p0i16.v8i16.v8i16.v8i1(i16* [[BASE:%.*]], <8 x i16> [[OFFSET:%.*]], <8 x i16> [[VALUE:%.*]], i32 16, i32 1, <8 x i1> [[TMP1]])
// CHECK-NEXT:    ret void
//
void test_vstrhq_scatter_shifted_offset_p_s16(int16_t *base, uint16x8_t offset, int16x8_t value, mve_pred16_t p)
{
#ifdef POLYMORPHIC
    vstrhq_scatter_shifted_offset_p(base, offset, value, p);
#else /* POLYMORPHIC */
    vstrhq_scatter_shifted_offset_p_s16(base, offset, value, p);
#endif /* POLYMORPHIC */
}

// CHECK-LABEL: @test_vstrhq_scatter_shifted_offset_p_s32(
// CHECK-NEXT:  entry:
// CHECK-NEXT:    [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
// CHECK-NEXT:    [[TMP1:%.*]] = call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 [[TMP0]])
// CHECK-NEXT:    call void @llvm.arm.mve.vstr.scatter.offset.predicated.p0i16.v4i32.v4i32.v4i1(i16* [[BASE:%.*]], <4 x i32> [[OFFSET:%.*]], <4 x i32> [[VALUE:%.*]], i32 16, i32 1, <4 x i1> [[TMP1]])
// CHECK-NEXT:    ret void
//
void test_vstrhq_scatter_shifted_offset_p_s32(int16_t *base, uint32x4_t offset, int32x4_t value, mve_pred16_t p)
{
#ifdef POLYMORPHIC
    vstrhq_scatter_shifted_offset_p(base, offset, value, p);
#else /* POLYMORPHIC */
    vstrhq_scatter_shifted_offset_p_s32(base, offset, value, p);
#endif /* POLYMORPHIC */
}

// CHECK-LABEL: @test_vstrhq_scatter_shifted_offset_p_u16(
// CHECK-NEXT:  entry:
// CHECK-NEXT:    [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
// CHECK-NEXT:    [[TMP1:%.*]] = call <8 x i1> @llvm.arm.mve.pred.i2v.v8i1(i32 [[TMP0]])
// CHECK-NEXT:    call void @llvm.arm.mve.vstr.scatter.offset.predicated.p0i16.v8i16.v8i16.v8i1(i16* [[BASE:%.*]], <8 x i16> [[OFFSET:%.*]], <8 x i16> [[VALUE:%.*]], i32 16, i32 1, <8 x i1> [[TMP1]])
// CHECK-NEXT:    ret void
//
void test_vstrhq_scatter_shifted_offset_p_u16(uint16_t *base, uint16x8_t offset, uint16x8_t value, mve_pred16_t p)
{
#ifdef POLYMORPHIC
    vstrhq_scatter_shifted_offset_p(base, offset, value, p);
#else /* POLYMORPHIC */
    vstrhq_scatter_shifted_offset_p_u16(base, offset, value, p);
#endif /* POLYMORPHIC */
}

// CHECK-LABEL: @test_vstrhq_scatter_shifted_offset_p_u32(
// CHECK-NEXT:  entry:
// CHECK-NEXT:    [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
// CHECK-NEXT:    [[TMP1:%.*]] = call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 [[TMP0]])
// CHECK-NEXT:    call void @llvm.arm.mve.vstr.scatter.offset.predicated.p0i16.v4i32.v4i32.v4i1(i16* [[BASE:%.*]], <4 x i32> [[OFFSET:%.*]], <4 x i32> [[VALUE:%.*]], i32 16, i32 1, <4 x i1> [[TMP1]])
// CHECK-NEXT:    ret void
//
void test_vstrhq_scatter_shifted_offset_p_u32(uint16_t *base, uint32x4_t offset, uint32x4_t value, mve_pred16_t p)
{
#ifdef POLYMORPHIC
    vstrhq_scatter_shifted_offset_p(base, offset, value, p);
#else /* POLYMORPHIC */
    vstrhq_scatter_shifted_offset_p_u32(base, offset, value, p);
#endif /* POLYMORPHIC */
}

// CHECK-LABEL: @test_vstrhq_scatter_shifted_offset_s16(
// CHECK-NEXT:  entry:
// CHECK-NEXT:    call void @llvm.arm.mve.vstr.scatter.offset.p0i16.v8i16.v8i16(i16* [[BASE:%.*]], <8 x i16> [[OFFSET:%.*]], <8 x i16> [[VALUE:%.*]], i32 16, i32 1)
// CHECK-NEXT:    ret void
//
void test_vstrhq_scatter_shifted_offset_s16(int16_t *base, uint16x8_t offset, int16x8_t value)
{
#ifdef POLYMORPHIC
    vstrhq_scatter_shifted_offset(base, offset, value);
#else /* POLYMORPHIC */
    vstrhq_scatter_shifted_offset_s16(base, offset, value);
#endif /* POLYMORPHIC */
}

// CHECK-LABEL: @test_vstrhq_scatter_shifted_offset_s32(
// CHECK-NEXT:  entry:
// CHECK-NEXT:    call void @llvm.arm.mve.vstr.scatter.offset.p0i16.v4i32.v4i32(i16* [[BASE:%.*]], <4 x i32> [[OFFSET:%.*]], <4 x i32> [[VALUE:%.*]], i32 16, i32 1)
// CHECK-NEXT:    ret void
//
void test_vstrhq_scatter_shifted_offset_s32(int16_t *base, uint32x4_t offset, int32x4_t value)
{
#ifdef POLYMORPHIC
    vstrhq_scatter_shifted_offset(base, offset, value);
#else /* POLYMORPHIC */
    vstrhq_scatter_shifted_offset_s32(base, offset, value);
#endif /* POLYMORPHIC */
}

// CHECK-LABEL: @test_vstrhq_scatter_shifted_offset_u16(
// CHECK-NEXT:  entry:
// CHECK-NEXT:    call void @llvm.arm.mve.vstr.scatter.offset.p0i16.v8i16.v8i16(i16* [[BASE:%.*]], <8 x i16> [[OFFSET:%.*]], <8 x i16> [[VALUE:%.*]], i32 16, i32 1)
// CHECK-NEXT:    ret void
//
void test_vstrhq_scatter_shifted_offset_u16(uint16_t *base, uint16x8_t offset, uint16x8_t value)
{
#ifdef POLYMORPHIC
    vstrhq_scatter_shifted_offset(base, offset, value);
#else /* POLYMORPHIC */
    vstrhq_scatter_shifted_offset_u16(base, offset, value);
#endif /* POLYMORPHIC */
}

// CHECK-LABEL: @test_vstrhq_scatter_shifted_offset_u32(
// CHECK-NEXT:  entry:
// CHECK-NEXT:    call void @llvm.arm.mve.vstr.scatter.offset.p0i16.v4i32.v4i32(i16* [[BASE:%.*]], <4 x i32> [[OFFSET:%.*]], <4 x i32> [[VALUE:%.*]], i32 16, i32 1)
// CHECK-NEXT:    ret void
//
void test_vstrhq_scatter_shifted_offset_u32(uint16_t *base, uint32x4_t offset, uint32x4_t value)
{
#ifdef POLYMORPHIC
    vstrhq_scatter_shifted_offset(base, offset, value);
#else /* POLYMORPHIC */
    vstrhq_scatter_shifted_offset_u32(base, offset, value);
#endif /* POLYMORPHIC */
}

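// Scatter-base stores: the vector of 32-bit addresses is passed by value
// together with an immediate offset, which is forwarded unchanged to the IR
// intrinsic (0x17c == 380 below; the predicated test uses -0x190 == -400).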
// CHECK-LABEL: @test_vstrwq_scatter_base_f32(
// CHECK-NEXT:  entry:
// CHECK-NEXT:    call void @llvm.arm.mve.vstr.scatter.base.v4i32.v4f32(<4 x i32> [[ADDR:%.*]], i32 380, <4 x float> [[VALUE:%.*]])
// CHECK-NEXT:    ret void
//
void test_vstrwq_scatter_base_f32(uint32x4_t addr, float32x4_t value)
{
#ifdef POLYMORPHIC
    vstrwq_scatter_base(addr, 0x17c, value);
#else /* POLYMORPHIC */
    vstrwq_scatter_base_f32(addr, 0x17c, value);
#endif /* POLYMORPHIC */
}

// CHECK-LABEL: @test_vstrwq_scatter_base_p_f32(
// CHECK-NEXT:  entry:
// CHECK-NEXT:    [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
// CHECK-NEXT:    [[TMP1:%.*]] = call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 [[TMP0]])
// CHECK-NEXT:    call void @llvm.arm.mve.vstr.scatter.base.predicated.v4i32.v4f32.v4i1(<4 x i32> [[ADDR:%.*]], i32 -400, <4 x float> [[VALUE:%.*]], <4 x i1> [[TMP1]])
// CHECK-NEXT:    ret void
//
void test_vstrwq_scatter_base_p_f32(uint32x4_t addr, float32x4_t value, mve_pred16_t p)
{
#ifdef POLYMORPHIC
    vstrwq_scatter_base_p(addr, -0x190, value, p);
size type, so that the vector is effectively of array indices.
At the IR level, these operations are represented by a single set of
four IR intrinsics: {gather,scatter} × {base,offset}. The other
details (signed/unsigned, shift, and memory element size as opposed to
vector element size) are all specified by IR intrinsic polymorphism
and immediate operands, because that made the selection job easier
than making a huge family of similarly named intrinsics.
I considered using the standard IR representations such as
llvm.masked.gather, but they're not a good fit. In order to use
llvm.masked.gather to represent a gather_offset load with element size
smaller than a pointer, you'd have to expand the <8 x i16> vector of
offsets into an <8 x i16*> vector of pointers, which would be split up
during legalization, so you'd spend most of your time undoing the mess
it had made. Also, ISel support for llvm.masked.gather would be easy
enough in a trivial way (you can expand it into a gather-base load
with a zero immediate offset), but instruction-selecting lots of
fiddly idioms back into all the _other_ MVE load instructions would be
much more work. So I think dedicated IR intrinsics are the more
sensible approach, at least for the moment.
On the clang tablegen side, I've added two new features to the
Tablegen source accepted by MveEmitter: a 'CopyKind' type node for
defining a type that varies with the parameter type (it lets you ask
for an unsigned integer type of the same width as the parameter), and
an 'unsignedflag' value node for passing an immediate IR operand which
is 0 for a signed integer type or 1 for an unsigned one. That lets me
write each kind of intrinsic just once and get all its subtypes and
immediate arguments generated automatically.
Also I've tweaked the handling of pointer-typed values in the code
generation part of MveEmitter: they're generated as Address rather
than Value (i.e. including an alignment) so that they can be given to
the ordinary IR load and store operations, but I'd omitted the code to
convert them back to Value when they're going to be used as an
argument to an IR intrinsic.
On the MC side, I've enhanced MVEVectorVTInfo so that it can tell you
not only the full assembly-language suffix for a given vector type
(like 's32' or 'u16') but also the numeric-only one used by store
instructions (just '32' or '16').
Reviewers: dmgreen
Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D69791
2019-11-01 01:02:07 +08:00
|
|
|
#else /* POLYMORPHIC */
|
[ARM,MVE] Support -ve offsets in gather-load intrinsics.
Summary:
The ACLE intrinsics with `gather_base` or `scatter_base` in the name
are wrappers on the MVE load/store instructions that take a vector of
base addresses and an immediate offset. The immediate offset can be up
to 127 times the alignment unit, and it can be positive or negative.
At the MC layer, we got that right. But in the Sema error checking for
the wrapping intrinsics, the offset was erroneously constrained to be
positive.
To fix this I've adjusted the `imm_mem7bit` class in the Tablegen that
defines the intrinsics. But that causes integer literals like
`0xfffffffffffffe04` to appear in the autogenerated calls to
`SemaBuiltinConstantArgRange`, which provokes a compiler warning
because that's out of the non-overflowing range of an `int64_t`. So
I've also tweaked `MveEmitter` to emit that as `-0x1fc` instead.
Updated the tests of the Sema checks themselves, and also adjusted a
random sample of the CodeGen tests to actually use negative offsets
and prove they get all the way through code generation without causing
a crash.
Reviewers: dmgreen, miyuki, MarkMurrayARM
Reviewed By: dmgreen
Subscribers: kristof.beyls, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D72268
2020-01-07 00:33:05 +08:00
|
|
|
vstrwq_scatter_base_p_f32(addr, -0x190, value, p);
|
[ARM,MVE] Add intrinsics for gather/scatter load/stores.
This patch adds two new families of intrinsics, both of which are
memory accesses taking a vector of locations to load from / store to.
The vldrq_gather_base / vstrq_scatter_base intrinsics take a vector of
base addresses, and an immediate offset to be added consistently to
each one. vldrq_gather_offset / vstrq_scatter_offset take a scalar
base address, and a vector of offsets to add to it. The
'shifted_offset' variants also multiply each offset by the element
size type, so that the vector is effectively of array indices.
At the IR level, these operations are represented by a single set of
four IR intrinsics: {gather,scatter} × {base,offset}. The other
details (signed/unsigned, shift, and memory element size as opposed to
vector element size) are all specified by IR intrinsic polymorphism
and immediate operands, because that made the selection job easier
than making a huge family of similarly named intrinsics.
I considered using the standard IR representations such as
llvm.masked.gather, but they're not a good fit. In order to use
llvm.masked.gather to represent a gather_offset load with element size
smaller than a pointer, you'd have to expand the <8 x i16> vector of
offsets into an <8 x i16*> vector of pointers, which would be split up
during legalization, so you'd spend most of your time undoing the mess
it had made. Also, ISel support for llvm.masked.gather would be easy
enough in a trivial way (you can expand it into a gather-base load
with a zero immediate offset), but instruction-selecting lots of
fiddly idioms back into all the _other_ MVE load instructions would be
much more work. So I think dedicated IR intrinsics are the more
sensible approach, at least for the moment.
On the clang tablegen side, I've added two new features to the
Tablegen source accepted by MveEmitter: a 'CopyKind' type node for
defining a type that varies with the parameter type (it lets you ask
for an unsigned integer type of the same width as the parameter), and
an 'unsignedflag' value node for passing an immediate IR operand which
is 0 for a signed integer type or 1 for an unsigned one. That lets me
write each kind of intrinsic just once and get all its subtypes and
immediate arguments generated automatically.
Also I've tweaked the handling of pointer-typed values in the code
generation part of MveEmitter: they're generated as Address rather
than Value (i.e. including an alignment) so that they can be given to
the ordinary IR load and store operations, but I'd omitted the code to
convert them back to Value when they're going to be used as an
argument to an IR intrinsic.
On the MC side, I've enhanced MVEVectorVTInfo so that it can tell you
not only the full assembly-language suffix for a given vector type
(like 's32' or 'u16') but also the numeric-only one used by store
instructions (just '32' or '16').
Reviewers: dmgreen
Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D69791
2019-11-01 01:02:07 +08:00
|
|
|
#endif /* POLYMORPHIC */
|
|
|
|
}
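
// NOTE: The immediate byte offset accepted by these intrinsics may be
// negative: -0x190 above and -0x178 below reach the generated IR calls as
// -400 and -376 respectively.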
// CHECK-LABEL: @test_vstrwq_scatter_base_p_s32(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
// CHECK-NEXT: [[TMP1:%.*]] = call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 [[TMP0]])
// CHECK-NEXT: call void @llvm.arm.mve.vstr.scatter.base.predicated.v4i32.v4i32.v4i1(<4 x i32> [[ADDR:%.*]], i32 48, <4 x i32> [[VALUE:%.*]], <4 x i1> [[TMP1]])
// CHECK-NEXT: ret void
//
void test_vstrwq_scatter_base_p_s32(uint32x4_t addr, int32x4_t value, mve_pred16_t p)
{
#ifdef POLYMORPHIC
  vstrwq_scatter_base_p(addr, 0x30, value, p);
#else /* POLYMORPHIC */
  vstrwq_scatter_base_p_s32(addr, 0x30, value, p);
#endif /* POLYMORPHIC */
}

// CHECK-LABEL: @test_vstrwq_scatter_base_p_u32(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
// CHECK-NEXT: [[TMP1:%.*]] = call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 [[TMP0]])
// CHECK-NEXT: call void @llvm.arm.mve.vstr.scatter.base.predicated.v4i32.v4i32.v4i1(<4 x i32> [[ADDR:%.*]], i32 -376, <4 x i32> [[VALUE:%.*]], <4 x i1> [[TMP1]])
// CHECK-NEXT: ret void
//
void test_vstrwq_scatter_base_p_u32(uint32x4_t addr, uint32x4_t value, mve_pred16_t p)
{
#ifdef POLYMORPHIC
  vstrwq_scatter_base_p(addr, -0x178, value, p);
#else /* POLYMORPHIC */
  vstrwq_scatter_base_p_u32(addr, -0x178, value, p);
#endif /* POLYMORPHIC */
}

// CHECK-LABEL: @test_vstrwq_scatter_base_s32(
// CHECK-NEXT: entry:
// CHECK-NEXT: call void @llvm.arm.mve.vstr.scatter.base.v4i32.v4i32(<4 x i32> [[ADDR:%.*]], i32 156, <4 x i32> [[VALUE:%.*]])
// CHECK-NEXT: ret void
//
void test_vstrwq_scatter_base_s32(uint32x4_t addr, int32x4_t value)
{
#ifdef POLYMORPHIC
  vstrwq_scatter_base(addr, 0x9c, value);
#else /* POLYMORPHIC */
  vstrwq_scatter_base_s32(addr, 0x9c, value);
#endif /* POLYMORPHIC */
}

// CHECK-LABEL: @test_vstrwq_scatter_base_u32(
// CHECK-NEXT: entry:
// CHECK-NEXT: call void @llvm.arm.mve.vstr.scatter.base.v4i32.v4i32(<4 x i32> [[ADDR:%.*]], i32 212, <4 x i32> [[VALUE:%.*]])
// CHECK-NEXT: ret void
//
void test_vstrwq_scatter_base_u32(uint32x4_t addr, uint32x4_t value)
{
#ifdef POLYMORPHIC
  vstrwq_scatter_base(addr, 0xd4, value);
#else /* POLYMORPHIC */
  vstrwq_scatter_base_u32(addr, 0xd4, value);
#endif /* POLYMORPHIC */
}
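
// NOTE: The _wb ("write-back") variants below take a pointer to the vector
// of base addresses: the generated IR loads that vector, calls the .wb form
// of the scatter intrinsic, which returns a new address vector, and stores
// the result back through the pointer, as the CHECK lines show.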
// CHECK-LABEL: @test_vstrwq_scatter_base_wb_f32(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = load <4 x i32>, <4 x i32>* [[ADDR:%.*]], align 8
// CHECK-NEXT: [[TMP1:%.*]] = call <4 x i32> @llvm.arm.mve.vstr.scatter.base.wb.v4i32.v4f32(<4 x i32> [[TMP0]], i32 -412, <4 x float> [[VALUE:%.*]])
// CHECK-NEXT: store <4 x i32> [[TMP1]], <4 x i32>* [[ADDR]], align 8
// CHECK-NEXT: ret void
//
void test_vstrwq_scatter_base_wb_f32(uint32x4_t *addr, float32x4_t value)
{
#ifdef POLYMORPHIC
  vstrwq_scatter_base_wb(addr, -0x19c, value);
#else /* POLYMORPHIC */
  vstrwq_scatter_base_wb_f32(addr, -0x19c, value);
#endif /* POLYMORPHIC */
}

// CHECK-LABEL: @test_vstrwq_scatter_base_wb_p_f32(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = load <4 x i32>, <4 x i32>* [[ADDR:%.*]], align 8
// CHECK-NEXT: [[TMP1:%.*]] = zext i16 [[P:%.*]] to i32
// CHECK-NEXT: [[TMP2:%.*]] = call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 [[TMP1]])
// CHECK-NEXT: [[TMP3:%.*]] = call <4 x i32> @llvm.arm.mve.vstr.scatter.base.wb.predicated.v4i32.v4f32.v4i1(<4 x i32> [[TMP0]], i32 236, <4 x float> [[VALUE:%.*]], <4 x i1> [[TMP2]])
// CHECK-NEXT: store <4 x i32> [[TMP3]], <4 x i32>* [[ADDR]], align 8
// CHECK-NEXT: ret void
//
void test_vstrwq_scatter_base_wb_p_f32(uint32x4_t *addr, float32x4_t value, mve_pred16_t p)
{
#ifdef POLYMORPHIC
  vstrwq_scatter_base_wb_p(addr, 0xec, value, p);
#else /* POLYMORPHIC */
  vstrwq_scatter_base_wb_p_f32(addr, 0xec, value, p);
#endif /* POLYMORPHIC */
}

// CHECK-LABEL: @test_vstrwq_scatter_base_wb_p_s32(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = load <4 x i32>, <4 x i32>* [[ADDR:%.*]], align 8
// CHECK-NEXT: [[TMP1:%.*]] = zext i16 [[P:%.*]] to i32
// CHECK-NEXT: [[TMP2:%.*]] = call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 [[TMP1]])
// CHECK-NEXT: [[TMP3:%.*]] = call <4 x i32> @llvm.arm.mve.vstr.scatter.base.wb.predicated.v4i32.v4i32.v4i1(<4 x i32> [[TMP0]], i32 328, <4 x i32> [[VALUE:%.*]], <4 x i1> [[TMP2]])
// CHECK-NEXT: store <4 x i32> [[TMP3]], <4 x i32>* [[ADDR]], align 8
// CHECK-NEXT: ret void
//
void test_vstrwq_scatter_base_wb_p_s32(uint32x4_t *addr, int32x4_t value, mve_pred16_t p)
{
#ifdef POLYMORPHIC
  vstrwq_scatter_base_wb_p(addr, 0x148, value, p);
#else /* POLYMORPHIC */
  vstrwq_scatter_base_wb_p_s32(addr, 0x148, value, p);
#endif /* POLYMORPHIC */
}
|
|
|
|
|
|
|
|
// CHECK-LABEL: @test_vstrwq_scatter_base_wb_p_u32(
|
|
|
|
// CHECK-NEXT: entry:
|
|
|
|
// CHECK-NEXT: [[TMP0:%.*]] = load <4 x i32>, <4 x i32>* [[ADDR:%.*]], align 8
|
|
|
|
// CHECK-NEXT: [[TMP1:%.*]] = zext i16 [[P:%.*]] to i32
|
|
|
|
// CHECK-NEXT: [[TMP2:%.*]] = call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 [[TMP1]])
|
|
|
|
// CHECK-NEXT: [[TMP3:%.*]] = call <4 x i32> @llvm.arm.mve.vstr.scatter.base.wb.predicated.v4i32.v4i32.v4i1(<4 x i32> [[TMP0]], i32 412, <4 x i32> [[VALUE:%.*]], <4 x i1> [[TMP2]])
|
|
|
|
// CHECK-NEXT: store <4 x i32> [[TMP3]], <4 x i32>* [[ADDR]], align 8
|
|
|
|
// CHECK-NEXT: ret void
|
|
|
|
//
|
|
|
|
void test_vstrwq_scatter_base_wb_p_u32(uint32x4_t *addr, uint32x4_t value, mve_pred16_t p)
|
|
|
|
{
|
|
|
|
#ifdef POLYMORPHIC
|
|
|
|
vstrwq_scatter_base_wb_p(addr, 0x19c, value, p);
|
|
|
|
#else /* POLYMORPHIC */
|
|
|
|
vstrwq_scatter_base_wb_p_u32(addr, 0x19c, value, p);
|
|
|
|
#endif /* POLYMORPHIC */
|
|
|
|
}
|
|
|
|
|
|
|
|
// CHECK-LABEL: @test_vstrwq_scatter_base_wb_s32(
|
|
|
|
// CHECK-NEXT: entry:
|
|
|
|
// CHECK-NEXT: [[TMP0:%.*]] = load <4 x i32>, <4 x i32>* [[ADDR:%.*]], align 8
|
[ARM,MVE] Support -ve offsets in gather-load intrinsics.
Summary:
The ACLE intrinsics with `gather_base` or `scatter_base` in the name
are wrappers on the MVE load/store instructions that take a vector of
base addresses and an immediate offset. The immediate offset can be up
to 127 times the alignment unit, and it can be positive or negative.
At the MC layer, we got that right. But in the Sema error checking for
the wrapping intrinsics, the offset was erroneously constrained to be
positive.
To fix this I've adjusted the `imm_mem7bit` class in the Tablegen that
defines the intrinsics. But that causes integer literals like
`0xfffffffffffffe04` to appear in the autogenerated calls to
`SemaBuiltinConstantArgRange`, which provokes a compiler warning
because that's out of the non-overflowing range of an `int64_t`. So
I've also tweaked `MveEmitter` to emit that as `-0x1fc` instead.
Updated the tests of the Sema checks themselves, and also adjusted a
random sample of the CodeGen tests to actually use negative offsets
and prove they get all the way through code generation without causing
a crash.
Reviewers: dmgreen, miyuki, MarkMurrayARM
Reviewed By: dmgreen
Subscribers: kristof.beyls, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D72268
2020-01-07 00:33:05 +08:00
|
|
|
// CHECK-NEXT: [[TMP1:%.*]] = call <4 x i32> @llvm.arm.mve.vstr.scatter.base.wb.v4i32.v4i32(<4 x i32> [[TMP0]], i32 -152, <4 x i32> [[VALUE:%.*]])
|
[ARM,MVE] Add intrinsics for gather/scatter load/stores.
This patch adds two new families of intrinsics, both of which are
memory accesses taking a vector of locations to load from / store to.
The vldrq_gather_base / vstrq_scatter_base intrinsics take a vector of
base addresses, and an immediate offset to be added consistently to
each one. vldrq_gather_offset / vstrq_scatter_offset take a scalar
base address, and a vector of offsets to add to it. The
'shifted_offset' variants also multiply each offset by the element
size type, so that the vector is effectively of array indices.
At the IR level, these operations are represented by a single set of
four IR intrinsics: {gather,scatter} × {base,offset}. The other
details (signed/unsigned, shift, and memory element size as opposed to
vector element size) are all specified by IR intrinsic polymorphism
and immediate operands, because that made the selection job easier
than making a huge family of similarly named intrinsics.
I considered using the standard IR representations such as
llvm.masked.gather, but they're not a good fit. In order to use
llvm.masked.gather to represent a gather_offset load with element size
smaller than a pointer, you'd have to expand the <8 x i16> vector of
offsets into an <8 x i16*> vector of pointers, which would be split up
during legalization, so you'd spend most of your time undoing the mess
it had made. Also, ISel support for llvm.masked.gather would be easy
enough in a trivial way (you can expand it into a gather-base load
with a zero immediate offset), but instruction-selecting lots of
fiddly idioms back into all the _other_ MVE load instructions would be
much more work. So I think dedicated IR intrinsics are the more
sensible approach, at least for the moment.
On the clang tablegen side, I've added two new features to the
Tablegen source accepted by MveEmitter: a 'CopyKind' type node for
defining a type that varies with the parameter type (it lets you ask
for an unsigned integer type of the same width as the parameter), and
an 'unsignedflag' value node for passing an immediate IR operand which
is 0 for a signed integer type or 1 for an unsigned one. That lets me
write each kind of intrinsic just once and get all its subtypes and
immediate arguments generated automatically.
Also I've tweaked the handling of pointer-typed values in the code
generation part of MveEmitter: they're generated as Address rather
than Value (i.e. including an alignment) so that they can be given to
the ordinary IR load and store operations, but I'd omitted the code to
convert them back to Value when they're going to be used as an
argument to an IR intrinsic.
On the MC side, I've enhanced MVEVectorVTInfo so that it can tell you
not only the full assembly-language suffix for a given vector type
(like 's32' or 'u16') but also the numeric-only one used by store
instructions (just '32' or '16').
Reviewers: dmgreen
Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D69791
2019-11-01 01:02:07 +08:00
|
|
|
// CHECK-NEXT: store <4 x i32> [[TMP1]], <4 x i32>* [[ADDR]], align 8
|
|
|
|
// CHECK-NEXT: ret void
|
|
|
|
//
|
|
|
|
void test_vstrwq_scatter_base_wb_s32(uint32x4_t *addr, int32x4_t value)
|
|
|
|
{
|
|
|
|
#ifdef POLYMORPHIC
|
[ARM,MVE] Support -ve offsets in gather-load intrinsics.
Summary:
The ACLE intrinsics with `gather_base` or `scatter_base` in the name
are wrappers on the MVE load/store instructions that take a vector of
base addresses and an immediate offset. The immediate offset can be up
to 127 times the alignment unit, and it can be positive or negative.
At the MC layer, we got that right. But in the Sema error checking for
the wrapping intrinsics, the offset was erroneously constrained to be
positive.
To fix this I've adjusted the `imm_mem7bit` class in the Tablegen that
defines the intrinsics. But that causes integer literals like
`0xfffffffffffffe04` to appear in the autogenerated calls to
`SemaBuiltinConstantArgRange`, which provokes a compiler warning
because that's out of the non-overflowing range of an `int64_t`. So
I've also tweaked `MveEmitter` to emit that as `-0x1fc` instead.
Updated the tests of the Sema checks themselves, and also adjusted a
random sample of the CodeGen tests to actually use negative offsets
and prove they get all the way through code generation without causing
a crash.
Reviewers: dmgreen, miyuki, MarkMurrayARM
Reviewed By: dmgreen
Subscribers: kristof.beyls, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D72268
2020-01-07 00:33:05 +08:00
|
|
|
vstrwq_scatter_base_wb(addr, -0x98, value);
|
[ARM,MVE] Add intrinsics for gather/scatter load/stores.
This patch adds two new families of intrinsics, both of which are
memory accesses taking a vector of locations to load from / store to.
The vldrq_gather_base / vstrq_scatter_base intrinsics take a vector of
base addresses, and an immediate offset to be added consistently to
each one. vldrq_gather_offset / vstrq_scatter_offset take a scalar
base address, and a vector of offsets to add to it. The
'shifted_offset' variants also multiply each offset by the element
size type, so that the vector is effectively of array indices.
At the IR level, these operations are represented by a single set of
four IR intrinsics: {gather,scatter} × {base,offset}. The other
details (signed/unsigned, shift, and memory element size as opposed to
vector element size) are all specified by IR intrinsic polymorphism
and immediate operands, because that made the selection job easier
than making a huge family of similarly named intrinsics.
I considered using the standard IR representations such as
llvm.masked.gather, but they're not a good fit. In order to use
llvm.masked.gather to represent a gather_offset load with element size
smaller than a pointer, you'd have to expand the <8 x i16> vector of
offsets into an <8 x i16*> vector of pointers, which would be split up
during legalization, so you'd spend most of your time undoing the mess
it had made. Also, ISel support for llvm.masked.gather would be easy
enough in a trivial way (you can expand it into a gather-base load
with a zero immediate offset), but instruction-selecting lots of
fiddly idioms back into all the _other_ MVE load instructions would be
much more work. So I think dedicated IR intrinsics are the more
sensible approach, at least for the moment.
On the clang tablegen side, I've added two new features to the
Tablegen source accepted by MveEmitter: a 'CopyKind' type node for
defining a type that varies with the parameter type (it lets you ask
for an unsigned integer type of the same width as the parameter), and
an 'unsignedflag' value node for passing an immediate IR operand which
is 0 for a signed integer type or 1 for an unsigned one. That lets me
write each kind of intrinsic just once and get all its subtypes and
immediate arguments generated automatically.
Also I've tweaked the handling of pointer-typed values in the code
generation part of MveEmitter: they're generated as Address rather
than Value (i.e. including an alignment) so that they can be given to
the ordinary IR load and store operations, but I'd omitted the code to
convert them back to Value when they're going to be used as an
argument to an IR intrinsic.
On the MC side, I've enhanced MVEVectorVTInfo so that it can tell you
not only the full assembly-language suffix for a given vector type
(like 's32' or 'u16') but also the numeric-only one used by store
instructions (just '32' or '16').
Reviewers: dmgreen
Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D69791
2019-11-01 01:02:07 +08:00
|
|
|
#else /* POLYMORPHIC */
|
[ARM,MVE] Support -ve offsets in gather-load intrinsics.
Summary:
The ACLE intrinsics with `gather_base` or `scatter_base` in the name
are wrappers on the MVE load/store instructions that take a vector of
base addresses and an immediate offset. The immediate offset can be up
to 127 times the alignment unit, and it can be positive or negative.
At the MC layer, we got that right. But in the Sema error checking for
the wrapping intrinsics, the offset was erroneously constrained to be
positive.
To fix this I've adjusted the `imm_mem7bit` class in the Tablegen that
defines the intrinsics. But that causes integer literals like
`0xfffffffffffffe04` to appear in the autogenerated calls to
`SemaBuiltinConstantArgRange`, which provokes a compiler warning
because that's out of the non-overflowing range of an `int64_t`. So
I've also tweaked `MveEmitter` to emit that as `-0x1fc` instead.
Updated the tests of the Sema checks themselves, and also adjusted a
random sample of the CodeGen tests to actually use negative offsets
and prove they get all the way through code generation without causing
a crash.
Reviewers: dmgreen, miyuki, MarkMurrayARM
Reviewed By: dmgreen
Subscribers: kristof.beyls, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D72268
2020-01-07 00:33:05 +08:00
|
|
|
vstrwq_scatter_base_wb_s32(addr, -0x98, value);
#endif /* POLYMORPHIC */
}

// CHECK-LABEL: @test_vstrwq_scatter_base_wb_u32(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = load <4 x i32>, <4 x i32>* [[ADDR:%.*]], align 8
// CHECK-NEXT: [[TMP1:%.*]] = call <4 x i32> @llvm.arm.mve.vstr.scatter.base.wb.v4i32.v4i32(<4 x i32> [[TMP0]], i32 64, <4 x i32> [[VALUE:%.*]])
// CHECK-NEXT: store <4 x i32> [[TMP1]], <4 x i32>* [[ADDR]], align 8
// CHECK-NEXT: ret void
//
void test_vstrwq_scatter_base_wb_u32(uint32x4_t *addr, uint32x4_t value)
{
#ifdef POLYMORPHIC
vstrwq_scatter_base_wb(addr, 0x40, value);
#else /* POLYMORPHIC */
vstrwq_scatter_base_wb_u32(addr, 0x40, value);
#endif /* POLYMORPHIC */
}

// CHECK-LABEL: @test_vstrwq_scatter_offset_f32(
// CHECK-NEXT: entry:
// CHECK-NEXT: call void @llvm.arm.mve.vstr.scatter.offset.p0f32.v4i32.v4f32(float* [[BASE:%.*]], <4 x i32> [[OFFSET:%.*]], <4 x float> [[VALUE:%.*]], i32 32, i32 0)
// CHECK-NEXT: ret void
//
void test_vstrwq_scatter_offset_f32(float32_t *base, uint32x4_t offset, float32x4_t value)
{
#ifdef POLYMORPHIC
vstrwq_scatter_offset(base, offset, value);
#else /* POLYMORPHIC */
vstrwq_scatter_offset_f32(base, offset, value);
#endif /* POLYMORPHIC */
}

// CHECK-LABEL: @test_vstrwq_scatter_offset_p_f32(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
// CHECK-NEXT: [[TMP1:%.*]] = call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 [[TMP0]])
// CHECK-NEXT: call void @llvm.arm.mve.vstr.scatter.offset.predicated.p0f32.v4i32.v4f32.v4i1(float* [[BASE:%.*]], <4 x i32> [[OFFSET:%.*]], <4 x float> [[VALUE:%.*]], i32 32, i32 0, <4 x i1> [[TMP1]])
// CHECK-NEXT: ret void
//
void test_vstrwq_scatter_offset_p_f32(float32_t *base, uint32x4_t offset, float32x4_t value, mve_pred16_t p)
{
#ifdef POLYMORPHIC
vstrwq_scatter_offset_p(base, offset, value, p);
#else /* POLYMORPHIC */
vstrwq_scatter_offset_p_f32(base, offset, value, p);
#endif /* POLYMORPHIC */
}

// CHECK-LABEL: @test_vstrwq_scatter_offset_p_s32(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
// CHECK-NEXT: [[TMP1:%.*]] = call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 [[TMP0]])
// CHECK-NEXT: call void @llvm.arm.mve.vstr.scatter.offset.predicated.p0i32.v4i32.v4i32.v4i1(i32* [[BASE:%.*]], <4 x i32> [[OFFSET:%.*]], <4 x i32> [[VALUE:%.*]], i32 32, i32 0, <4 x i1> [[TMP1]])
// CHECK-NEXT: ret void
//
void test_vstrwq_scatter_offset_p_s32(int32_t *base, uint32x4_t offset, int32x4_t value, mve_pred16_t p)
{
#ifdef POLYMORPHIC
vstrwq_scatter_offset_p(base, offset, value, p);
#else /* POLYMORPHIC */
vstrwq_scatter_offset_p_s32(base, offset, value, p);
#endif /* POLYMORPHIC */
}

// CHECK-LABEL: @test_vstrwq_scatter_offset_p_u32(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
// CHECK-NEXT: [[TMP1:%.*]] = call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 [[TMP0]])
// CHECK-NEXT: call void @llvm.arm.mve.vstr.scatter.offset.predicated.p0i32.v4i32.v4i32.v4i1(i32* [[BASE:%.*]], <4 x i32> [[OFFSET:%.*]], <4 x i32> [[VALUE:%.*]], i32 32, i32 0, <4 x i1> [[TMP1]])
// CHECK-NEXT: ret void
//
void test_vstrwq_scatter_offset_p_u32(uint32_t *base, uint32x4_t offset, uint32x4_t value, mve_pred16_t p)
{
#ifdef POLYMORPHIC
vstrwq_scatter_offset_p(base, offset, value, p);
#else /* POLYMORPHIC */
vstrwq_scatter_offset_p_u32(base, offset, value, p);
#endif /* POLYMORPHIC */
}

// CHECK-LABEL: @test_vstrwq_scatter_offset_s32(
// CHECK-NEXT: entry:
// CHECK-NEXT: call void @llvm.arm.mve.vstr.scatter.offset.p0i32.v4i32.v4i32(i32* [[BASE:%.*]], <4 x i32> [[OFFSET:%.*]], <4 x i32> [[VALUE:%.*]], i32 32, i32 0)
// CHECK-NEXT: ret void
//
void test_vstrwq_scatter_offset_s32(int32_t *base, uint32x4_t offset, int32x4_t value)
{
#ifdef POLYMORPHIC
vstrwq_scatter_offset(base, offset, value);
#else /* POLYMORPHIC */
vstrwq_scatter_offset_s32(base, offset, value);
#endif /* POLYMORPHIC */
}

// CHECK-LABEL: @test_vstrwq_scatter_offset_u32(
// CHECK-NEXT: entry:
// CHECK-NEXT: call void @llvm.arm.mve.vstr.scatter.offset.p0i32.v4i32.v4i32(i32* [[BASE:%.*]], <4 x i32> [[OFFSET:%.*]], <4 x i32> [[VALUE:%.*]], i32 32, i32 0)
// CHECK-NEXT: ret void
//
void test_vstrwq_scatter_offset_u32(uint32_t *base, uint32x4_t offset, uint32x4_t value)
{
#ifdef POLYMORPHIC
vstrwq_scatter_offset(base, offset, value);
#else /* POLYMORPHIC */
vstrwq_scatter_offset_u32(base, offset, value);
#endif /* POLYMORPHIC */
}

// CHECK-LABEL: @test_vstrwq_scatter_shifted_offset_f32(
// CHECK-NEXT: entry:
// CHECK-NEXT: call void @llvm.arm.mve.vstr.scatter.offset.p0f32.v4i32.v4f32(float* [[BASE:%.*]], <4 x i32> [[OFFSET:%.*]], <4 x float> [[VALUE:%.*]], i32 32, i32 2)
// CHECK-NEXT: ret void
//
void test_vstrwq_scatter_shifted_offset_f32(float32_t *base, uint32x4_t offset, float32x4_t value)
{
#ifdef POLYMORPHIC
vstrwq_scatter_shifted_offset(base, offset, value);
#else /* POLYMORPHIC */
vstrwq_scatter_shifted_offset_f32(base, offset, value);
#endif /* POLYMORPHIC */
}

// CHECK-LABEL: @test_vstrwq_scatter_shifted_offset_p_f32(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
// CHECK-NEXT: [[TMP1:%.*]] = call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 [[TMP0]])
// CHECK-NEXT: call void @llvm.arm.mve.vstr.scatter.offset.predicated.p0f32.v4i32.v4f32.v4i1(float* [[BASE:%.*]], <4 x i32> [[OFFSET:%.*]], <4 x float> [[VALUE:%.*]], i32 32, i32 2, <4 x i1> [[TMP1]])
// CHECK-NEXT: ret void
//
void test_vstrwq_scatter_shifted_offset_p_f32(float32_t *base, uint32x4_t offset, float32x4_t value, mve_pred16_t p)
{
#ifdef POLYMORPHIC
vstrwq_scatter_shifted_offset_p(base, offset, value, p);
#else /* POLYMORPHIC */
vstrwq_scatter_shifted_offset_p_f32(base, offset, value, p);
#endif /* POLYMORPHIC */
}

// CHECK-LABEL: @test_vstrwq_scatter_shifted_offset_p_s32(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
// CHECK-NEXT: [[TMP1:%.*]] = call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 [[TMP0]])
// CHECK-NEXT: call void @llvm.arm.mve.vstr.scatter.offset.predicated.p0i32.v4i32.v4i32.v4i1(i32* [[BASE:%.*]], <4 x i32> [[OFFSET:%.*]], <4 x i32> [[VALUE:%.*]], i32 32, i32 2, <4 x i1> [[TMP1]])
// CHECK-NEXT: ret void
//
void test_vstrwq_scatter_shifted_offset_p_s32(int32_t *base, uint32x4_t offset, int32x4_t value, mve_pred16_t p)
{
#ifdef POLYMORPHIC
vstrwq_scatter_shifted_offset_p(base, offset, value, p);
#else /* POLYMORPHIC */
vstrwq_scatter_shifted_offset_p_s32(base, offset, value, p);
#endif /* POLYMORPHIC */
}

// CHECK-LABEL: @test_vstrwq_scatter_shifted_offset_p_u32(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
// CHECK-NEXT: [[TMP1:%.*]] = call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 [[TMP0]])
// CHECK-NEXT: call void @llvm.arm.mve.vstr.scatter.offset.predicated.p0i32.v4i32.v4i32.v4i1(i32* [[BASE:%.*]], <4 x i32> [[OFFSET:%.*]], <4 x i32> [[VALUE:%.*]], i32 32, i32 2, <4 x i1> [[TMP1]])
// CHECK-NEXT: ret void
//
void test_vstrwq_scatter_shifted_offset_p_u32(uint32_t *base, uint32x4_t offset, uint32x4_t value, mve_pred16_t p)
{
#ifdef POLYMORPHIC
vstrwq_scatter_shifted_offset_p(base, offset, value, p);
#else /* POLYMORPHIC */
vstrwq_scatter_shifted_offset_p_u32(base, offset, value, p);
#endif /* POLYMORPHIC */
}

// CHECK-LABEL: @test_vstrwq_scatter_shifted_offset_s32(
// CHECK-NEXT: entry:
// CHECK-NEXT: call void @llvm.arm.mve.vstr.scatter.offset.p0i32.v4i32.v4i32(i32* [[BASE:%.*]], <4 x i32> [[OFFSET:%.*]], <4 x i32> [[VALUE:%.*]], i32 32, i32 2)
// CHECK-NEXT: ret void
//
void test_vstrwq_scatter_shifted_offset_s32(int32_t *base, uint32x4_t offset, int32x4_t value)
{
#ifdef POLYMORPHIC
vstrwq_scatter_shifted_offset(base, offset, value);
#else /* POLYMORPHIC */
vstrwq_scatter_shifted_offset_s32(base, offset, value);
#endif /* POLYMORPHIC */
}

// CHECK-LABEL: @test_vstrwq_scatter_shifted_offset_u32(
// CHECK-NEXT: entry:
// CHECK-NEXT: call void @llvm.arm.mve.vstr.scatter.offset.p0i32.v4i32.v4i32(i32* [[BASE:%.*]], <4 x i32> [[OFFSET:%.*]], <4 x i32> [[VALUE:%.*]], i32 32, i32 2)
// CHECK-NEXT: ret void
//
void test_vstrwq_scatter_shifted_offset_u32(uint32_t *base, uint32x4_t offset, uint32x4_t value)
{
#ifdef POLYMORPHIC
vstrwq_scatter_shifted_offset(base, offset, value);
#else /* POLYMORPHIC */
vstrwq_scatter_shifted_offset_u32(base, offset, value);
#endif /* POLYMORPHIC */
}