llvm-project/clang/test/CodeGen/arm-mve-intrinsics/scatter-gather.c

[ARM,MVE] Add intrinsics for gather/scatter load/stores.

This patch adds two new families of intrinsics, both of which are memory accesses taking a vector of locations to load from / store to.

The vldrq_gather_base / vstrq_scatter_base intrinsics take a vector of base addresses, and an immediate offset to be added consistently to each one. vldrq_gather_offset / vstrq_scatter_offset take a scalar base address, and a vector of offsets to add to it. The 'shifted_offset' variants also multiply each offset by the element size type, so that the vector is effectively of array indices.

At the IR level, these operations are represented by a single set of four IR intrinsics: {gather,scatter} × {base,offset}. The other details (signed/unsigned, shift, and memory element size as opposed to vector element size) are all specified by IR intrinsic polymorphism and immediate operands, because that made the selection job easier than making a huge family of similarly named intrinsics.

I considered using the standard IR representations such as llvm.masked.gather, but they're not a good fit. In order to use llvm.masked.gather to represent a gather_offset load with element size smaller than a pointer, you'd have to expand the <8 x i16> vector of offsets into an <8 x i16*> vector of pointers, which would be split up during legalization, so you'd spend most of your time undoing the mess it had made. Also, ISel support for llvm.masked.gather would be easy enough in a trivial way (you can expand it into a gather-base load with a zero immediate offset), but instruction-selecting lots of fiddly idioms back into all the _other_ MVE load instructions would be much more work. So I think dedicated IR intrinsics are the more sensible approach, at least for the moment.

On the clang tablegen side, I've added two new features to the Tablegen source accepted by MveEmitter: a 'CopyKind' type node for defining a type that varies with the parameter type (it lets you ask for an unsigned integer type of the same width as the parameter), and an 'unsignedflag' value node for passing an immediate IR operand which is 0 for a signed integer type or 1 for an unsigned one. That lets me write each kind of intrinsic just once and get all its subtypes and immediate arguments generated automatically.

Also I've tweaked the handling of pointer-typed values in the code generation part of MveEmitter: they're generated as Address rather than Value (i.e. including an alignment) so that they can be given to the ordinary IR load and store operations, but I'd omitted the code to convert them back to Value when they're going to be used as an argument to an IR intrinsic.

On the MC side, I've enhanced MVEVectorVTInfo so that it can tell you not only the full assembly-language suffix for a given vector type (like 's32' or 'u16') but also the numeric-only one used by store instructions (just '32' or '16').

Reviewers: dmgreen

Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D69791
2019-11-01 01:02:07 +08:00
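To make the 'offset' form concrete before the tests: a minimal scalar sketch of what the byte gather used below computes, assuming a plain-array interface (the helper name and signature are illustrative only, not part of the ACLE API):

#include <stdint.h>

/* Reference semantics for an 8-lane byte gather with 16-bit results:
 * lane i of the result is base[offset[i]], sign-extended to 16 bits.
 * vldrbq_gather_offset_s16 does the same work in a single MVE gather
 * load, with the offsets held in a uint16x8_t vector. */
static void gather_offset_s16_reference(const int8_t *base,
                                        const uint16_t offset[8],
                                        int16_t result[8])
{
    for (int i = 0; i < 8; i++)
        result[i] = (int16_t)base[offset[i]];
}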
// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py
// RUN: %clang_cc1 -triple thumbv8.1m.main-none-none-eabi -target-feature +mve.fp -mfloat-abi hard -fallow-half-arguments-and-returns -O0 -disable-O0-optnone -S -emit-llvm -o - %s | opt -S -mem2reg | FileCheck %s
// RUN: %clang_cc1 -triple thumbv8.1m.main-none-none-eabi -target-feature +mve.fp -mfloat-abi hard -fallow-half-arguments-and-returns -O0 -disable-O0-optnone -DPOLYMORPHIC -S -emit-llvm -o - %s | opt -S -mem2reg | FileCheck %s
// REQUIRES: aarch64-registered-target || arm-registered-target
#include <arm_mve.h>
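/*
 * Reading the CHECK lines below: the trailing immediate operands of the
 * llvm.arm.mve.vldr.gather.offset intrinsics appear to be, in order, the
 * memory element size in bits (8, 16, 32 or 64), the left shift applied to
 * each offset (0 for plain offsets, nonzero for the 'shifted_offset'
 * forms), and the unsignedflag described in the commit message
 * (0 = sign-extend, 1 = zero-extend the loaded element).
 */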
// CHECK-LABEL: @test_vldrbq_gather_offset_s16(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = call <8 x i16> @llvm.arm.mve.vldr.gather.offset.v8i16.p0i8.v8i16(i8* [[BASE:%.*]], <8 x i16> [[OFFSET:%.*]], i32 8, i32 0, i32 0)
// CHECK-NEXT: ret <8 x i16> [[TMP0]]
//
int16x8_t test_vldrbq_gather_offset_s16(const int8_t *base, uint16x8_t offset)
{
#ifdef POLYMORPHIC
return vldrbq_gather_offset(base, offset);
#else /* POLYMORPHIC */
return vldrbq_gather_offset_s16(base, offset);
#endif /* POLYMORPHIC */
}
// CHECK-LABEL: @test_vldrbq_gather_offset_s32(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = call <4 x i32> @llvm.arm.mve.vldr.gather.offset.v4i32.p0i8.v4i32(i8* [[BASE:%.*]], <4 x i32> [[OFFSET:%.*]], i32 8, i32 0, i32 0)
// CHECK-NEXT: ret <4 x i32> [[TMP0]]
//
int32x4_t test_vldrbq_gather_offset_s32(const int8_t *base, uint32x4_t offset)
{
#ifdef POLYMORPHIC
return vldrbq_gather_offset(base, offset);
#else /* POLYMORPHIC */
return vldrbq_gather_offset_s32(base, offset);
#endif /* POLYMORPHIC */
}
// CHECK-LABEL: @test_vldrbq_gather_offset_s8(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = call <16 x i8> @llvm.arm.mve.vldr.gather.offset.v16i8.p0i8.v16i8(i8* [[BASE:%.*]], <16 x i8> [[OFFSET:%.*]], i32 8, i32 0, i32 0)
// CHECK-NEXT: ret <16 x i8> [[TMP0]]
//
int8x16_t test_vldrbq_gather_offset_s8(const int8_t *base, uint8x16_t offset)
{
#ifdef POLYMORPHIC
return vldrbq_gather_offset(base, offset);
#else /* POLYMORPHIC */
return vldrbq_gather_offset_s8(base, offset);
#endif /* POLYMORPHIC */
}
// CHECK-LABEL: @test_vldrbq_gather_offset_u16(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = call <8 x i16> @llvm.arm.mve.vldr.gather.offset.v8i16.p0i8.v8i16(i8* [[BASE:%.*]], <8 x i16> [[OFFSET:%.*]], i32 8, i32 0, i32 1)
// CHECK-NEXT: ret <8 x i16> [[TMP0]]
//
uint16x8_t test_vldrbq_gather_offset_u16(const uint8_t *base, uint16x8_t offset)
{
#ifdef POLYMORPHIC
return vldrbq_gather_offset(base, offset);
#else /* POLYMORPHIC */
return vldrbq_gather_offset_u16(base, offset);
#endif /* POLYMORPHIC */
}
// CHECK-LABEL: @test_vldrbq_gather_offset_u32(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = call <4 x i32> @llvm.arm.mve.vldr.gather.offset.v4i32.p0i8.v4i32(i8* [[BASE:%.*]], <4 x i32> [[OFFSET:%.*]], i32 8, i32 0, i32 1)
// CHECK-NEXT: ret <4 x i32> [[TMP0]]
//
uint32x4_t test_vldrbq_gather_offset_u32(const uint8_t *base, uint32x4_t offset)
{
#ifdef POLYMORPHIC
return vldrbq_gather_offset(base, offset);
#else /* POLYMORPHIC */
return vldrbq_gather_offset_u32(base, offset);
#endif /* POLYMORPHIC */
}
// CHECK-LABEL: @test_vldrbq_gather_offset_u8(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = call <16 x i8> @llvm.arm.mve.vldr.gather.offset.v16i8.p0i8.v16i8(i8* [[BASE:%.*]], <16 x i8> [[OFFSET:%.*]], i32 8, i32 0, i32 1)
// CHECK-NEXT: ret <16 x i8> [[TMP0]]
//
uint8x16_t test_vldrbq_gather_offset_u8(const uint8_t *base, uint8x16_t offset)
{
#ifdef POLYMORPHIC
return vldrbq_gather_offset(base, offset);
#else /* POLYMORPHIC */
return vldrbq_gather_offset_u8(base, offset);
#endif /* POLYMORPHIC */
}
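/*
 * The _z ("zeroing") variants take an mve_pred16_t predicate as a final
 * argument. In the IR, the 16-bit predicate is zero-extended and converted
 * to a lane mask by llvm.arm.mve.pred.i2v, and the gather becomes the
 * .predicated form of the intrinsic; lanes with a false predicate are
 * expected to produce zero in the result.
 */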
// CHECK-LABEL: @test_vldrbq_gather_offset_z_s16(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
// CHECK-NEXT: [[TMP1:%.*]] = call <8 x i1> @llvm.arm.mve.pred.i2v.v8i1(i32 [[TMP0]])
// CHECK-NEXT: [[TMP2:%.*]] = call <8 x i16> @llvm.arm.mve.vldr.gather.offset.predicated.v8i16.p0i8.v8i16.v8i1(i8* [[BASE:%.*]], <8 x i16> [[OFFSET:%.*]], i32 8, i32 0, i32 0, <8 x i1> [[TMP1]])
// CHECK-NEXT: ret <8 x i16> [[TMP2]]
//
int16x8_t test_vldrbq_gather_offset_z_s16(const int8_t *base, uint16x8_t offset, mve_pred16_t p)
{
#ifdef POLYMORPHIC
return vldrbq_gather_offset_z(base, offset, p);
#else /* POLYMORPHIC */
return vldrbq_gather_offset_z_s16(base, offset, p);
#endif /* POLYMORPHIC */
}
// CHECK-LABEL: @test_vldrbq_gather_offset_z_s32(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
// CHECK-NEXT: [[TMP1:%.*]] = call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 [[TMP0]])
// CHECK-NEXT: [[TMP2:%.*]] = call <4 x i32> @llvm.arm.mve.vldr.gather.offset.predicated.v4i32.p0i8.v4i32.v4i1(i8* [[BASE:%.*]], <4 x i32> [[OFFSET:%.*]], i32 8, i32 0, i32 0, <4 x i1> [[TMP1]])
// CHECK-NEXT: ret <4 x i32> [[TMP2]]
//
int32x4_t test_vldrbq_gather_offset_z_s32(const int8_t *base, uint32x4_t offset, mve_pred16_t p)
{
#ifdef POLYMORPHIC
return vldrbq_gather_offset_z(base, offset, p);
#else /* POLYMORPHIC */
return vldrbq_gather_offset_z_s32(base, offset, p);
#endif /* POLYMORPHIC */
}
// CHECK-LABEL: @test_vldrbq_gather_offset_z_s8(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
// CHECK-NEXT: [[TMP1:%.*]] = call <16 x i1> @llvm.arm.mve.pred.i2v.v16i1(i32 [[TMP0]])
// CHECK-NEXT: [[TMP2:%.*]] = call <16 x i8> @llvm.arm.mve.vldr.gather.offset.predicated.v16i8.p0i8.v16i8.v16i1(i8* [[BASE:%.*]], <16 x i8> [[OFFSET:%.*]], i32 8, i32 0, i32 0, <16 x i1> [[TMP1]])
// CHECK-NEXT: ret <16 x i8> [[TMP2]]
//
int8x16_t test_vldrbq_gather_offset_z_s8(const int8_t *base, uint8x16_t offset, mve_pred16_t p)
{
#ifdef POLYMORPHIC
return vldrbq_gather_offset_z(base, offset, p);
#else /* POLYMORPHIC */
return vldrbq_gather_offset_z_s8(base, offset, p);
#endif /* POLYMORPHIC */
}
// CHECK-LABEL: @test_vldrbq_gather_offset_z_u16(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
// CHECK-NEXT: [[TMP1:%.*]] = call <8 x i1> @llvm.arm.mve.pred.i2v.v8i1(i32 [[TMP0]])
// CHECK-NEXT: [[TMP2:%.*]] = call <8 x i16> @llvm.arm.mve.vldr.gather.offset.predicated.v8i16.p0i8.v8i16.v8i1(i8* [[BASE:%.*]], <8 x i16> [[OFFSET:%.*]], i32 8, i32 0, i32 1, <8 x i1> [[TMP1]])
// CHECK-NEXT: ret <8 x i16> [[TMP2]]
//
uint16x8_t test_vldrbq_gather_offset_z_u16(const uint8_t *base, uint16x8_t offset, mve_pred16_t p)
{
#ifdef POLYMORPHIC
return vldrbq_gather_offset_z(base, offset, p);
#else /* POLYMORPHIC */
return vldrbq_gather_offset_z_u16(base, offset, p);
#endif /* POLYMORPHIC */
}
// CHECK-LABEL: @test_vldrbq_gather_offset_z_u32(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
// CHECK-NEXT: [[TMP1:%.*]] = call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 [[TMP0]])
// CHECK-NEXT: [[TMP2:%.*]] = call <4 x i32> @llvm.arm.mve.vldr.gather.offset.predicated.v4i32.p0i8.v4i32.v4i1(i8* [[BASE:%.*]], <4 x i32> [[OFFSET:%.*]], i32 8, i32 0, i32 1, <4 x i1> [[TMP1]])
// CHECK-NEXT: ret <4 x i32> [[TMP2]]
//
uint32x4_t test_vldrbq_gather_offset_z_u32(const uint8_t *base, uint32x4_t offset, mve_pred16_t p)
{
#ifdef POLYMORPHIC
return vldrbq_gather_offset_z(base, offset, p);
#else /* POLYMORPHIC */
return vldrbq_gather_offset_z_u32(base, offset, p);
#endif /* POLYMORPHIC */
}
// CHECK-LABEL: @test_vldrbq_gather_offset_z_u8(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
// CHECK-NEXT: [[TMP1:%.*]] = call <16 x i1> @llvm.arm.mve.pred.i2v.v16i1(i32 [[TMP0]])
// CHECK-NEXT: [[TMP2:%.*]] = call <16 x i8> @llvm.arm.mve.vldr.gather.offset.predicated.v16i8.p0i8.v16i8.v16i1(i8* [[BASE:%.*]], <16 x i8> [[OFFSET:%.*]], i32 8, i32 0, i32 1, <16 x i1> [[TMP1]])
// CHECK-NEXT: ret <16 x i8> [[TMP2]]
//
uint8x16_t test_vldrbq_gather_offset_z_u8(const uint8_t *base, uint8x16_t offset, mve_pred16_t p)
{
#ifdef POLYMORPHIC
return vldrbq_gather_offset_z(base, offset, p);
#else /* POLYMORPHIC */
return vldrbq_gather_offset_z_u8(base, offset, p);
#endif /* POLYMORPHIC */
}
// CHECK-LABEL: @test_vldrdq_gather_base_s64(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = call <2 x i64> @llvm.arm.mve.vldr.gather.base.v2i64.v2i64(<2 x i64> [[ADDR:%.*]], i32 616)
// CHECK-NEXT: ret <2 x i64> [[TMP0]]
//
int64x2_t test_vldrdq_gather_base_s64(uint64x2_t addr)
{
return vldrdq_gather_base_s64(addr, 0x268);
}
// CHECK-LABEL: @test_vldrdq_gather_base_u64(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = call <2 x i64> @llvm.arm.mve.vldr.gather.base.v2i64.v2i64(<2 x i64> [[ADDR:%.*]], i32 -336)
// CHECK-NEXT: ret <2 x i64> [[TMP0]]
//
uint64x2_t test_vldrdq_gather_base_u64(uint64x2_t addr)
{
return vldrdq_gather_base_u64(addr, -0x150);
}
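/*
 * The _wb ("writeback") variants take a pointer to the vector of base
 * addresses. The IR intrinsic returns a { data, updated addresses } pair:
 * element 1 (the address vector advanced by the immediate offset) is
 * stored back through addr, and element 0 is the gathered data returned
 * to the caller.
 */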
// CHECK-LABEL: @test_vldrdq_gather_base_wb_s64(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = load <2 x i64>, <2 x i64>* [[ADDR:%.*]], align 8
// CHECK-NEXT: [[TMP1:%.*]] = call { <2 x i64>, <2 x i64> } @llvm.arm.mve.vldr.gather.base.wb.v2i64.v2i64(<2 x i64> [[TMP0]], i32 576)
// CHECK-NEXT: [[TMP2:%.*]] = extractvalue { <2 x i64>, <2 x i64> } [[TMP1]], 1
// CHECK-NEXT: store <2 x i64> [[TMP2]], <2 x i64>* [[ADDR]], align 8
// CHECK-NEXT: [[TMP3:%.*]] = extractvalue { <2 x i64>, <2 x i64> } [[TMP1]], 0
// CHECK-NEXT: ret <2 x i64> [[TMP3]]
//
int64x2_t test_vldrdq_gather_base_wb_s64(uint64x2_t *addr)
{
return vldrdq_gather_base_wb_s64(addr, 0x240);
}
// CHECK-LABEL: @test_vldrdq_gather_base_wb_u64(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = load <2 x i64>, <2 x i64>* [[ADDR:%.*]], align 8
// CHECK-NEXT: [[TMP1:%.*]] = call { <2 x i64>, <2 x i64> } @llvm.arm.mve.vldr.gather.base.wb.v2i64.v2i64(<2 x i64> [[TMP0]], i32 -328)
// CHECK-NEXT: [[TMP2:%.*]] = extractvalue { <2 x i64>, <2 x i64> } [[TMP1]], 1
// CHECK-NEXT: store <2 x i64> [[TMP2]], <2 x i64>* [[ADDR]], align 8
// CHECK-NEXT: [[TMP3:%.*]] = extractvalue { <2 x i64>, <2 x i64> } [[TMP1]], 0
// CHECK-NEXT: ret <2 x i64> [[TMP3]]
//
uint64x2_t test_vldrdq_gather_base_wb_u64(uint64x2_t *addr)
{
return vldrdq_gather_base_wb_u64(addr, -0x148);
}
// CHECK-LABEL: @test_vldrdq_gather_base_wb_z_s64(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = load <2 x i64>, <2 x i64>* [[ADDR:%.*]], align 8
// CHECK-NEXT: [[TMP1:%.*]] = zext i16 [[P:%.*]] to i32
// CHECK-NEXT: [[TMP2:%.*]] = call <2 x i1> @llvm.arm.mve.pred.i2v.v2i1(i32 [[TMP1]])
// CHECK-NEXT: [[TMP3:%.*]] = call { <2 x i64>, <2 x i64> } @llvm.arm.mve.vldr.gather.base.wb.predicated.v2i64.v2i64.v2i1(<2 x i64> [[TMP0]], i32 664, <2 x i1> [[TMP2]])
// CHECK-NEXT: [[TMP4:%.*]] = extractvalue { <2 x i64>, <2 x i64> } [[TMP3]], 1
// CHECK-NEXT: store <2 x i64> [[TMP4]], <2 x i64>* [[ADDR]], align 8
// CHECK-NEXT: [[TMP5:%.*]] = extractvalue { <2 x i64>, <2 x i64> } [[TMP3]], 0
// CHECK-NEXT: ret <2 x i64> [[TMP5]]
//
int64x2_t test_vldrdq_gather_base_wb_z_s64(uint64x2_t *addr, mve_pred16_t p)
{
return vldrdq_gather_base_wb_z_s64(addr, 0x298, p);
}
// CHECK-LABEL: @test_vldrdq_gather_base_wb_z_u64(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = load <2 x i64>, <2 x i64>* [[ADDR:%.*]], align 8
// CHECK-NEXT: [[TMP1:%.*]] = zext i16 [[P:%.*]] to i32
// CHECK-NEXT: [[TMP2:%.*]] = call <2 x i1> @llvm.arm.mve.pred.i2v.v2i1(i32 [[TMP1]])
// CHECK-NEXT: [[TMP3:%.*]] = call { <2 x i64>, <2 x i64> } @llvm.arm.mve.vldr.gather.base.wb.predicated.v2i64.v2i64.v2i1(<2 x i64> [[TMP0]], i32 656, <2 x i1> [[TMP2]])
// CHECK-NEXT: [[TMP4:%.*]] = extractvalue { <2 x i64>, <2 x i64> } [[TMP3]], 1
// CHECK-NEXT: store <2 x i64> [[TMP4]], <2 x i64>* [[ADDR]], align 8
// CHECK-NEXT: [[TMP5:%.*]] = extractvalue { <2 x i64>, <2 x i64> } [[TMP3]], 0
// CHECK-NEXT: ret <2 x i64> [[TMP5]]
//
uint64x2_t test_vldrdq_gather_base_wb_z_u64(uint64x2_t *addr, mve_pred16_t p)
{
return vldrdq_gather_base_wb_z_u64(addr, 0x290, p);
}
// CHECK-LABEL: @test_vldrdq_gather_base_z_s64(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
// CHECK-NEXT: [[TMP1:%.*]] = call <2 x i1> @llvm.arm.mve.pred.i2v.v2i1(i32 [[TMP0]])
// CHECK-NEXT: [[TMP2:%.*]] = call <2 x i64> @llvm.arm.mve.vldr.gather.base.predicated.v2i64.v2i64.v2i1(<2 x i64> [[ADDR:%.*]], i32 888, <2 x i1> [[TMP1]])
// CHECK-NEXT: ret <2 x i64> [[TMP2]]
//
int64x2_t test_vldrdq_gather_base_z_s64(uint64x2_t addr, mve_pred16_t p)
{
return vldrdq_gather_base_z_s64(addr, 0x378, p);
}
// CHECK-LABEL: @test_vldrdq_gather_base_z_u64(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
// CHECK-NEXT: [[TMP1:%.*]] = call <2 x i1> @llvm.arm.mve.pred.i2v.v2i1(i32 [[TMP0]])
// CHECK-NEXT: [[TMP2:%.*]] = call <2 x i64> @llvm.arm.mve.vldr.gather.base.predicated.v2i64.v2i64.v2i1(<2 x i64> [[ADDR:%.*]], i32 -1000, <2 x i1> [[TMP1]])
// CHECK-NEXT: ret <2 x i64> [[TMP2]]
//
uint64x2_t test_vldrdq_gather_base_z_u64(uint64x2_t addr, mve_pred16_t p)
{
return vldrdq_gather_base_z_u64(addr, -0x3e8, p);
}
// CHECK-LABEL: @test_vldrdq_gather_offset_s64(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = call <2 x i64> @llvm.arm.mve.vldr.gather.offset.v2i64.p0i64.v2i64(i64* [[BASE:%.*]], <2 x i64> [[OFFSET:%.*]], i32 64, i32 0, i32 0)
// CHECK-NEXT: ret <2 x i64> [[TMP0]]
//
int64x2_t test_vldrdq_gather_offset_s64(const int64_t *base, uint64x2_t offset)
{
#ifdef POLYMORPHIC
return vldrdq_gather_offset(base, offset);
#else /* POLYMORPHIC */
return vldrdq_gather_offset_s64(base, offset);
#endif /* POLYMORPHIC */
}
// CHECK-LABEL: @test_vldrdq_gather_offset_u64(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = call <2 x i64> @llvm.arm.mve.vldr.gather.offset.v2i64.p0i64.v2i64(i64* [[BASE:%.*]], <2 x i64> [[OFFSET:%.*]], i32 64, i32 0, i32 1)
// CHECK-NEXT: ret <2 x i64> [[TMP0]]
//
uint64x2_t test_vldrdq_gather_offset_u64(const uint64_t *base, uint64x2_t offset)
{
#ifdef POLYMORPHIC
return vldrdq_gather_offset(base, offset);
#else /* POLYMORPHIC */
return vldrdq_gather_offset_u64(base, offset);
#endif /* POLYMORPHIC */
}
// CHECK-LABEL: @test_vldrdq_gather_offset_z_s64(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
// CHECK-NEXT: [[TMP1:%.*]] = call <2 x i1> @llvm.arm.mve.pred.i2v.v2i1(i32 [[TMP0]])
// CHECK-NEXT: [[TMP2:%.*]] = call <2 x i64> @llvm.arm.mve.vldr.gather.offset.predicated.v2i64.p0i64.v2i64.v2i1(i64* [[BASE:%.*]], <2 x i64> [[OFFSET:%.*]], i32 64, i32 0, i32 0, <2 x i1> [[TMP1]])
// CHECK-NEXT: ret <2 x i64> [[TMP2]]
//
int64x2_t test_vldrdq_gather_offset_z_s64(const int64_t *base, uint64x2_t offset, mve_pred16_t p)
{
#ifdef POLYMORPHIC
return vldrdq_gather_offset_z(base, offset, p);
#else /* POLYMORPHIC */
return vldrdq_gather_offset_z_s64(base, offset, p);
#endif /* POLYMORPHIC */
}
// CHECK-LABEL: @test_vldrdq_gather_offset_z_u64(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
// CHECK-NEXT: [[TMP1:%.*]] = call <2 x i1> @llvm.arm.mve.pred.i2v.v2i1(i32 [[TMP0]])
// CHECK-NEXT: [[TMP2:%.*]] = call <2 x i64> @llvm.arm.mve.vldr.gather.offset.predicated.v2i64.p0i64.v2i64.v2i1(i64* [[BASE:%.*]], <2 x i64> [[OFFSET:%.*]], i32 64, i32 0, i32 1, <2 x i1> [[TMP1]])
// CHECK-NEXT: ret <2 x i64> [[TMP2]]
//
uint64x2_t test_vldrdq_gather_offset_z_u64(const uint64_t *base, uint64x2_t offset, mve_pred16_t p)
{
#ifdef POLYMORPHIC
return vldrdq_gather_offset_z(base, offset, p);
#else /* POLYMORPHIC */
return vldrdq_gather_offset_z_u64(base, offset, p);
#endif /* POLYMORPHIC */
}
// CHECK-LABEL: @test_vldrdq_gather_shifted_offset_s64(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = call <2 x i64> @llvm.arm.mve.vldr.gather.offset.v2i64.p0i64.v2i64(i64* [[BASE:%.*]], <2 x i64> [[OFFSET:%.*]], i32 64, i32 3, i32 0)
// CHECK-NEXT: ret <2 x i64> [[TMP0]]
//
int64x2_t test_vldrdq_gather_shifted_offset_s64(const int64_t *base, uint64x2_t offset)
{
#ifdef POLYMORPHIC
return vldrdq_gather_shifted_offset(base, offset);
#else /* POLYMORPHIC */
return vldrdq_gather_shifted_offset_s64(base, offset);
#endif /* POLYMORPHIC */
}
// CHECK-LABEL: @test_vldrdq_gather_shifted_offset_u64(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = call <2 x i64> @llvm.arm.mve.vldr.gather.offset.v2i64.p0i64.v2i64(i64* [[BASE:%.*]], <2 x i64> [[OFFSET:%.*]], i32 64, i32 3, i32 1)
// CHECK-NEXT: ret <2 x i64> [[TMP0]]
//
uint64x2_t test_vldrdq_gather_shifted_offset_u64(const uint64_t *base, uint64x2_t offset)
{
#ifdef POLYMORPHIC
return vldrdq_gather_shifted_offset(base, offset);
#else /* POLYMORPHIC */
return vldrdq_gather_shifted_offset_u64(base, offset);
#endif /* POLYMORPHIC */
}
// CHECK-LABEL: @test_vldrdq_gather_shifted_offset_z_s64(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
// CHECK-NEXT: [[TMP1:%.*]] = call <2 x i1> @llvm.arm.mve.pred.i2v.v2i1(i32 [[TMP0]])
// CHECK-NEXT: [[TMP2:%.*]] = call <2 x i64> @llvm.arm.mve.vldr.gather.offset.predicated.v2i64.p0i64.v2i64.v2i1(i64* [[BASE:%.*]], <2 x i64> [[OFFSET:%.*]], i32 64, i32 3, i32 0, <2 x i1> [[TMP1]])
// CHECK-NEXT: ret <2 x i64> [[TMP2]]
//
int64x2_t test_vldrdq_gather_shifted_offset_z_s64(const int64_t *base, uint64x2_t offset, mve_pred16_t p)
{
#ifdef POLYMORPHIC
return vldrdq_gather_shifted_offset_z(base, offset, p);
#else /* POLYMORPHIC */
return vldrdq_gather_shifted_offset_z_s64(base, offset, p);
#endif /* POLYMORPHIC */
}
// CHECK-LABEL: @test_vldrdq_gather_shifted_offset_z_u64(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
// CHECK-NEXT: [[TMP1:%.*]] = call <2 x i1> @llvm.arm.mve.pred.i2v.v2i1(i32 [[TMP0]])
// CHECK-NEXT: [[TMP2:%.*]] = call <2 x i64> @llvm.arm.mve.vldr.gather.offset.predicated.v2i64.p0i64.v2i64.v2i1(i64* [[BASE:%.*]], <2 x i64> [[OFFSET:%.*]], i32 64, i32 3, i32 1, <2 x i1> [[TMP1]])
// CHECK-NEXT: ret <2 x i64> [[TMP2]]
//
uint64x2_t test_vldrdq_gather_shifted_offset_z_u64(const uint64_t *base, uint64x2_t offset, mve_pred16_t p)
{
#ifdef POLYMORPHIC
return vldrdq_gather_shifted_offset_z(base, offset, p);
#else /* POLYMORPHIC */
return vldrdq_gather_shifted_offset_z_u64(base, offset, p);
#endif /* POLYMORPHIC */
}
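// The vldrhq gather tests below load 16-bit elements. As the CHECK lines
// show, the _s32/_u32 variants use <4 x i32> offset and result vectors while
// the memory-size immediate stays at 16, so each loaded halfword fills a
// 32-bit lane; the final immediate is 0 for the signed/float variants and 1
// for the unsigned ones.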
// CHECK-LABEL: @test_vldrhq_gather_offset_f16(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = call <8 x half> @llvm.arm.mve.vldr.gather.offset.v8f16.p0f16.v8i16(half* [[BASE:%.*]], <8 x i16> [[OFFSET:%.*]], i32 16, i32 0, i32 0)
// CHECK-NEXT: ret <8 x half> [[TMP0]]
//
float16x8_t test_vldrhq_gather_offset_f16(const float16_t *base, uint16x8_t offset)
{
#ifdef POLYMORPHIC
return vldrhq_gather_offset(base, offset);
#else /* POLYMORPHIC */
return vldrhq_gather_offset_f16(base, offset);
#endif /* POLYMORPHIC */
}
// CHECK-LABEL: @test_vldrhq_gather_offset_s16(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = call <8 x i16> @llvm.arm.mve.vldr.gather.offset.v8i16.p0i16.v8i16(i16* [[BASE:%.*]], <8 x i16> [[OFFSET:%.*]], i32 16, i32 0, i32 0)
// CHECK-NEXT: ret <8 x i16> [[TMP0]]
//
int16x8_t test_vldrhq_gather_offset_s16(const int16_t *base, uint16x8_t offset)
{
#ifdef POLYMORPHIC
return vldrhq_gather_offset(base, offset);
#else /* POLYMORPHIC */
return vldrhq_gather_offset_s16(base, offset);
#endif /* POLYMORPHIC */
}
// CHECK-LABEL: @test_vldrhq_gather_offset_s32(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = call <4 x i32> @llvm.arm.mve.vldr.gather.offset.v4i32.p0i16.v4i32(i16* [[BASE:%.*]], <4 x i32> [[OFFSET:%.*]], i32 16, i32 0, i32 0)
// CHECK-NEXT: ret <4 x i32> [[TMP0]]
//
int32x4_t test_vldrhq_gather_offset_s32(const int16_t *base, uint32x4_t offset)
{
#ifdef POLYMORPHIC
return vldrhq_gather_offset(base, offset);
#else /* POLYMORPHIC */
return vldrhq_gather_offset_s32(base, offset);
#endif /* POLYMORPHIC */
}
// CHECK-LABEL: @test_vldrhq_gather_offset_u16(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = call <8 x i16> @llvm.arm.mve.vldr.gather.offset.v8i16.p0i16.v8i16(i16* [[BASE:%.*]], <8 x i16> [[OFFSET:%.*]], i32 16, i32 0, i32 1)
// CHECK-NEXT: ret <8 x i16> [[TMP0]]
//
uint16x8_t test_vldrhq_gather_offset_u16(const uint16_t *base, uint16x8_t offset)
{
#ifdef POLYMORPHIC
return vldrhq_gather_offset(base, offset);
#else /* POLYMORPHIC */
return vldrhq_gather_offset_u16(base, offset);
#endif /* POLYMORPHIC */
}
// CHECK-LABEL: @test_vldrhq_gather_offset_u32(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = call <4 x i32> @llvm.arm.mve.vldr.gather.offset.v4i32.p0i16.v4i32(i16* [[BASE:%.*]], <4 x i32> [[OFFSET:%.*]], i32 16, i32 0, i32 1)
// CHECK-NEXT: ret <4 x i32> [[TMP0]]
//
uint32x4_t test_vldrhq_gather_offset_u32(const uint16_t *base, uint32x4_t offset)
{
#ifdef POLYMORPHIC
return vldrhq_gather_offset(base, offset);
#else /* POLYMORPHIC */
return vldrhq_gather_offset_u32(base, offset);
#endif /* POLYMORPHIC */
}
// CHECK-LABEL: @test_vldrhq_gather_offset_z_f16(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
// CHECK-NEXT: [[TMP1:%.*]] = call <8 x i1> @llvm.arm.mve.pred.i2v.v8i1(i32 [[TMP0]])
// CHECK-NEXT: [[TMP2:%.*]] = call <8 x half> @llvm.arm.mve.vldr.gather.offset.predicated.v8f16.p0f16.v8i16.v8i1(half* [[BASE:%.*]], <8 x i16> [[OFFSET:%.*]], i32 16, i32 0, i32 0, <8 x i1> [[TMP1]])
// CHECK-NEXT: ret <8 x half> [[TMP2]]
//
float16x8_t test_vldrhq_gather_offset_z_f16(const float16_t *base, uint16x8_t offset, mve_pred16_t p)
{
#ifdef POLYMORPHIC
return vldrhq_gather_offset_z(base, offset, p);
#else /* POLYMORPHIC */
return vldrhq_gather_offset_z_f16(base, offset, p);
#endif /* POLYMORPHIC */
}
// CHECK-LABEL: @test_vldrhq_gather_offset_z_s16(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
// CHECK-NEXT: [[TMP1:%.*]] = call <8 x i1> @llvm.arm.mve.pred.i2v.v8i1(i32 [[TMP0]])
// CHECK-NEXT: [[TMP2:%.*]] = call <8 x i16> @llvm.arm.mve.vldr.gather.offset.predicated.v8i16.p0i16.v8i16.v8i1(i16* [[BASE:%.*]], <8 x i16> [[OFFSET:%.*]], i32 16, i32 0, i32 0, <8 x i1> [[TMP1]])
// CHECK-NEXT: ret <8 x i16> [[TMP2]]
//
int16x8_t test_vldrhq_gather_offset_z_s16(const int16_t *base, uint16x8_t offset, mve_pred16_t p)
{
#ifdef POLYMORPHIC
return vldrhq_gather_offset_z(base, offset, p);
#else /* POLYMORPHIC */
return vldrhq_gather_offset_z_s16(base, offset, p);
#endif /* POLYMORPHIC */
}
// CHECK-LABEL: @test_vldrhq_gather_offset_z_s32(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
// CHECK-NEXT: [[TMP1:%.*]] = call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 [[TMP0]])
// CHECK-NEXT: [[TMP2:%.*]] = call <4 x i32> @llvm.arm.mve.vldr.gather.offset.predicated.v4i32.p0i16.v4i32.v4i1(i16* [[BASE:%.*]], <4 x i32> [[OFFSET:%.*]], i32 16, i32 0, i32 0, <4 x i1> [[TMP1]])
// CHECK-NEXT: ret <4 x i32> [[TMP2]]
//
int32x4_t test_vldrhq_gather_offset_z_s32(const int16_t *base, uint32x4_t offset, mve_pred16_t p)
{
#ifdef POLYMORPHIC
return vldrhq_gather_offset_z(base, offset, p);
#else /* POLYMORPHIC */
return vldrhq_gather_offset_z_s32(base, offset, p);
#endif /* POLYMORPHIC */
}
// CHECK-LABEL: @test_vldrhq_gather_offset_z_u16(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
// CHECK-NEXT: [[TMP1:%.*]] = call <8 x i1> @llvm.arm.mve.pred.i2v.v8i1(i32 [[TMP0]])
// CHECK-NEXT: [[TMP2:%.*]] = call <8 x i16> @llvm.arm.mve.vldr.gather.offset.predicated.v8i16.p0i16.v8i16.v8i1(i16* [[BASE:%.*]], <8 x i16> [[OFFSET:%.*]], i32 16, i32 0, i32 1, <8 x i1> [[TMP1]])
// CHECK-NEXT: ret <8 x i16> [[TMP2]]
//
uint16x8_t test_vldrhq_gather_offset_z_u16(const uint16_t *base, uint16x8_t offset, mve_pred16_t p)
{
#ifdef POLYMORPHIC
return vldrhq_gather_offset_z(base, offset, p);
#else /* POLYMORPHIC */
return vldrhq_gather_offset_z_u16(base, offset, p);
#endif /* POLYMORPHIC */
}
// CHECK-LABEL: @test_vldrhq_gather_offset_z_u32(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
// CHECK-NEXT: [[TMP1:%.*]] = call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 [[TMP0]])
// CHECK-NEXT: [[TMP2:%.*]] = call <4 x i32> @llvm.arm.mve.vldr.gather.offset.predicated.v4i32.p0i16.v4i32.v4i1(i16* [[BASE:%.*]], <4 x i32> [[OFFSET:%.*]], i32 16, i32 0, i32 1, <4 x i1> [[TMP1]])
// CHECK-NEXT: ret <4 x i32> [[TMP2]]
//
uint32x4_t test_vldrhq_gather_offset_z_u32(const uint16_t *base, uint32x4_t offset, mve_pred16_t p)
{
#ifdef POLYMORPHIC
return vldrhq_gather_offset_z(base, offset, p);
#else /* POLYMORPHIC */
return vldrhq_gather_offset_z_u32(base, offset, p);
#endif /* POLYMORPHIC */
}
// CHECK-LABEL: @test_vldrhq_gather_shifted_offset_f16(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = call <8 x half> @llvm.arm.mve.vldr.gather.offset.v8f16.p0f16.v8i16(half* [[BASE:%.*]], <8 x i16> [[OFFSET:%.*]], i32 16, i32 1, i32 0)
// CHECK-NEXT: ret <8 x half> [[TMP0]]
//
float16x8_t test_vldrhq_gather_shifted_offset_f16(const float16_t *base, uint16x8_t offset)
{
#ifdef POLYMORPHIC
return vldrhq_gather_shifted_offset(base, offset);
#else /* POLYMORPHIC */
return vldrhq_gather_shifted_offset_f16(base, offset);
#endif /* POLYMORPHIC */
}
// CHECK-LABEL: @test_vldrhq_gather_shifted_offset_s16(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = call <8 x i16> @llvm.arm.mve.vldr.gather.offset.v8i16.p0i16.v8i16(i16* [[BASE:%.*]], <8 x i16> [[OFFSET:%.*]], i32 16, i32 1, i32 0)
// CHECK-NEXT: ret <8 x i16> [[TMP0]]
//
int16x8_t test_vldrhq_gather_shifted_offset_s16(const int16_t *base, uint16x8_t offset)
{
#ifdef POLYMORPHIC
return vldrhq_gather_shifted_offset(base, offset);
#else /* POLYMORPHIC */
return vldrhq_gather_shifted_offset_s16(base, offset);
#endif /* POLYMORPHIC */
}
// CHECK-LABEL: @test_vldrhq_gather_shifted_offset_s32(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = call <4 x i32> @llvm.arm.mve.vldr.gather.offset.v4i32.p0i16.v4i32(i16* [[BASE:%.*]], <4 x i32> [[OFFSET:%.*]], i32 16, i32 1, i32 0)
// CHECK-NEXT: ret <4 x i32> [[TMP0]]
//
int32x4_t test_vldrhq_gather_shifted_offset_s32(const int16_t *base, uint32x4_t offset)
{
#ifdef POLYMORPHIC
return vldrhq_gather_shifted_offset(base, offset);
#else /* POLYMORPHIC */
return vldrhq_gather_shifted_offset_s32(base, offset);
#endif /* POLYMORPHIC */
}
// CHECK-LABEL: @test_vldrhq_gather_shifted_offset_u16(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = call <8 x i16> @llvm.arm.mve.vldr.gather.offset.v8i16.p0i16.v8i16(i16* [[BASE:%.*]], <8 x i16> [[OFFSET:%.*]], i32 16, i32 1, i32 1)
// CHECK-NEXT: ret <8 x i16> [[TMP0]]
//
uint16x8_t test_vldrhq_gather_shifted_offset_u16(const uint16_t *base, uint16x8_t offset)
{
#ifdef POLYMORPHIC
return vldrhq_gather_shifted_offset(base, offset);
#else /* POLYMORPHIC */
return vldrhq_gather_shifted_offset_u16(base, offset);
#endif /* POLYMORPHIC */
}
// CHECK-LABEL: @test_vldrhq_gather_shifted_offset_u32(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = call <4 x i32> @llvm.arm.mve.vldr.gather.offset.v4i32.p0i16.v4i32(i16* [[BASE:%.*]], <4 x i32> [[OFFSET:%.*]], i32 16, i32 1, i32 1)
// CHECK-NEXT: ret <4 x i32> [[TMP0]]
//
uint32x4_t test_vldrhq_gather_shifted_offset_u32(const uint16_t *base, uint32x4_t offset)
{
#ifdef POLYMORPHIC
return vldrhq_gather_shifted_offset(base, offset);
#else /* POLYMORPHIC */
return vldrhq_gather_shifted_offset_u32(base, offset);
#endif /* POLYMORPHIC */
}
// CHECK-LABEL: @test_vldrhq_gather_shifted_offset_z_f16(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
// CHECK-NEXT: [[TMP1:%.*]] = call <8 x i1> @llvm.arm.mve.pred.i2v.v8i1(i32 [[TMP0]])
// CHECK-NEXT: [[TMP2:%.*]] = call <8 x half> @llvm.arm.mve.vldr.gather.offset.predicated.v8f16.p0f16.v8i16.v8i1(half* [[BASE:%.*]], <8 x i16> [[OFFSET:%.*]], i32 16, i32 1, i32 0, <8 x i1> [[TMP1]])
// CHECK-NEXT: ret <8 x half> [[TMP2]]
//
float16x8_t test_vldrhq_gather_shifted_offset_z_f16(const float16_t *base, uint16x8_t offset, mve_pred16_t p)
{
#ifdef POLYMORPHIC
return vldrhq_gather_shifted_offset_z(base, offset, p);
#else /* POLYMORPHIC */
return vldrhq_gather_shifted_offset_z_f16(base, offset, p);
#endif /* POLYMORPHIC */
}
// CHECK-LABEL: @test_vldrhq_gather_shifted_offset_z_s16(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
// CHECK-NEXT: [[TMP1:%.*]] = call <8 x i1> @llvm.arm.mve.pred.i2v.v8i1(i32 [[TMP0]])
// CHECK-NEXT: [[TMP2:%.*]] = call <8 x i16> @llvm.arm.mve.vldr.gather.offset.predicated.v8i16.p0i16.v8i16.v8i1(i16* [[BASE:%.*]], <8 x i16> [[OFFSET:%.*]], i32 16, i32 1, i32 0, <8 x i1> [[TMP1]])
// CHECK-NEXT: ret <8 x i16> [[TMP2]]
//
int16x8_t test_vldrhq_gather_shifted_offset_z_s16(const int16_t *base, uint16x8_t offset, mve_pred16_t p)
{
#ifdef POLYMORPHIC
return vldrhq_gather_shifted_offset_z(base, offset, p);
#else /* POLYMORPHIC */
return vldrhq_gather_shifted_offset_z_s16(base, offset, p);
#endif /* POLYMORPHIC */
}
// CHECK-LABEL: @test_vldrhq_gather_shifted_offset_z_s32(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
// CHECK-NEXT: [[TMP1:%.*]] = call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 [[TMP0]])
// CHECK-NEXT: [[TMP2:%.*]] = call <4 x i32> @llvm.arm.mve.vldr.gather.offset.predicated.v4i32.p0i16.v4i32.v4i1(i16* [[BASE:%.*]], <4 x i32> [[OFFSET:%.*]], i32 16, i32 1, i32 0, <4 x i1> [[TMP1]])
// CHECK-NEXT: ret <4 x i32> [[TMP2]]
//
int32x4_t test_vldrhq_gather_shifted_offset_z_s32(const int16_t *base, uint32x4_t offset, mve_pred16_t p)
{
#ifdef POLYMORPHIC
return vldrhq_gather_shifted_offset_z(base, offset, p);
#else /* POLYMORPHIC */
return vldrhq_gather_shifted_offset_z_s32(base, offset, p);
#endif /* POLYMORPHIC */
}
// CHECK-LABEL: @test_vldrhq_gather_shifted_offset_z_u16(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
// CHECK-NEXT: [[TMP1:%.*]] = call <8 x i1> @llvm.arm.mve.pred.i2v.v8i1(i32 [[TMP0]])
// CHECK-NEXT: [[TMP2:%.*]] = call <8 x i16> @llvm.arm.mve.vldr.gather.offset.predicated.v8i16.p0i16.v8i16.v8i1(i16* [[BASE:%.*]], <8 x i16> [[OFFSET:%.*]], i32 16, i32 1, i32 1, <8 x i1> [[TMP1]])
// CHECK-NEXT: ret <8 x i16> [[TMP2]]
//
uint16x8_t test_vldrhq_gather_shifted_offset_z_u16(const uint16_t *base, uint16x8_t offset, mve_pred16_t p)
{
#ifdef POLYMORPHIC
return vldrhq_gather_shifted_offset_z(base, offset, p);
#else /* POLYMORPHIC */
return vldrhq_gather_shifted_offset_z_u16(base, offset, p);
#endif /* POLYMORPHIC */
}
// CHECK-LABEL: @test_vldrhq_gather_shifted_offset_z_u32(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
// CHECK-NEXT: [[TMP1:%.*]] = call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 [[TMP0]])
// CHECK-NEXT: [[TMP2:%.*]] = call <4 x i32> @llvm.arm.mve.vldr.gather.offset.predicated.v4i32.p0i16.v4i32.v4i1(i16* [[BASE:%.*]], <4 x i32> [[OFFSET:%.*]], i32 16, i32 1, i32 1, <4 x i1> [[TMP1]])
// CHECK-NEXT: ret <4 x i32> [[TMP2]]
//
uint32x4_t test_vldrhq_gather_shifted_offset_z_u32(const uint16_t *base, uint32x4_t offset, mve_pred16_t p)
{
#ifdef POLYMORPHIC
return vldrhq_gather_shifted_offset_z(base, offset, p);
#else /* POLYMORPHIC */
return vldrhq_gather_shifted_offset_z_u32(base, offset, p);
#endif /* POLYMORPHIC */
}
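// The gather_base tests below take a uint32x4_t vector of addresses plus a
// constant byte offset; the hex literal in each call shows up as the decimal
// immediate in the corresponding CHECK line (e.g. 0xc becomes i32 12).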
// CHECK-LABEL: @test_vldrwq_gather_base_f32(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = call <4 x float> @llvm.arm.mve.vldr.gather.base.v4f32.v4i32(<4 x i32> [[ADDR:%.*]], i32 12)
// CHECK-NEXT: ret <4 x float> [[TMP0]]
//
float32x4_t test_vldrwq_gather_base_f32(uint32x4_t addr)
{
return vldrwq_gather_base_f32(addr, 0xc);
}
// CHECK-LABEL: @test_vldrwq_gather_base_s32(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = call <4 x i32> @llvm.arm.mve.vldr.gather.base.v4i32.v4i32(<4 x i32> [[ADDR:%.*]], i32 400)
// CHECK-NEXT: ret <4 x i32> [[TMP0]]
//
int32x4_t test_vldrwq_gather_base_s32(uint32x4_t addr)
{
return vldrwq_gather_base_s32(addr, 0x190);
}
// CHECK-LABEL: @test_vldrwq_gather_base_u32(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = call <4 x i32> @llvm.arm.mve.vldr.gather.base.v4i32.v4i32(<4 x i32> [[ADDR:%.*]], i32 284)
// CHECK-NEXT: ret <4 x i32> [[TMP0]]
//
uint32x4_t test_vldrwq_gather_base_u32(uint32x4_t addr)
{
return vldrwq_gather_base_u32(addr, 0x11c);
}
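// The _wb (writeback) gather_base tests below operate through a pointer to
// the address vector: the CHECK lines load the addresses, call the ...wb...
// intrinsic, store the updated address vector (field 1 of the returned
// struct) back through the pointer, and return the gathered data (field 0).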
// CHECK-LABEL: @test_vldrwq_gather_base_wb_f32(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = load <4 x i32>, <4 x i32>* [[ADDR:%.*]], align 8
// CHECK-NEXT: [[TMP1:%.*]] = call { <4 x float>, <4 x i32> } @llvm.arm.mve.vldr.gather.base.wb.v4f32.v4i32(<4 x i32> [[TMP0]], i32 -64)
// CHECK-NEXT: [[TMP2:%.*]] = extractvalue { <4 x float>, <4 x i32> } [[TMP1]], 1
// CHECK-NEXT: store <4 x i32> [[TMP2]], <4 x i32>* [[ADDR]], align 8
// CHECK-NEXT: [[TMP3:%.*]] = extractvalue { <4 x float>, <4 x i32> } [[TMP1]], 0
// CHECK-NEXT: ret <4 x float> [[TMP3]]
//
float32x4_t test_vldrwq_gather_base_wb_f32(uint32x4_t *addr)
{
return vldrwq_gather_base_wb_f32(addr, -0x40);
}
// CHECK-LABEL: @test_vldrwq_gather_base_wb_s32(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = load <4 x i32>, <4 x i32>* [[ADDR:%.*]], align 8
// CHECK-NEXT: [[TMP1:%.*]] = call { <4 x i32>, <4 x i32> } @llvm.arm.mve.vldr.gather.base.wb.v4i32.v4i32(<4 x i32> [[TMP0]], i32 80)
// CHECK-NEXT: [[TMP2:%.*]] = extractvalue { <4 x i32>, <4 x i32> } [[TMP1]], 1
// CHECK-NEXT: store <4 x i32> [[TMP2]], <4 x i32>* [[ADDR]], align 8
// CHECK-NEXT: [[TMP3:%.*]] = extractvalue { <4 x i32>, <4 x i32> } [[TMP1]], 0
// CHECK-NEXT: ret <4 x i32> [[TMP3]]
//
int32x4_t test_vldrwq_gather_base_wb_s32(uint32x4_t *addr)
{
return vldrwq_gather_base_wb_s32(addr, 0x50);
}
// CHECK-LABEL: @test_vldrwq_gather_base_wb_u32(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = load <4 x i32>, <4 x i32>* [[ADDR:%.*]], align 8
// CHECK-NEXT: [[TMP1:%.*]] = call { <4 x i32>, <4 x i32> } @llvm.arm.mve.vldr.gather.base.wb.v4i32.v4i32(<4 x i32> [[TMP0]], i32 480)
// CHECK-NEXT: [[TMP2:%.*]] = extractvalue { <4 x i32>, <4 x i32> } [[TMP1]], 1
// CHECK-NEXT: store <4 x i32> [[TMP2]], <4 x i32>* [[ADDR]], align 8
// CHECK-NEXT: [[TMP3:%.*]] = extractvalue { <4 x i32>, <4 x i32> } [[TMP1]], 0
// CHECK-NEXT: ret <4 x i32> [[TMP3]]
//
uint32x4_t test_vldrwq_gather_base_wb_u32(uint32x4_t *addr)
{
return vldrwq_gather_base_wb_u32(addr, 0x1e0);
}
// CHECK-LABEL: @test_vldrwq_gather_base_wb_z_f32(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = load <4 x i32>, <4 x i32>* [[ADDR:%.*]], align 8
// CHECK-NEXT: [[TMP1:%.*]] = zext i16 [[P:%.*]] to i32
// CHECK-NEXT: [[TMP2:%.*]] = call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 [[TMP1]])
// CHECK-NEXT: [[TMP3:%.*]] = call { <4 x float>, <4 x i32> } @llvm.arm.mve.vldr.gather.base.wb.predicated.v4f32.v4i32.v4i1(<4 x i32> [[TMP0]], i32 -352, <4 x i1> [[TMP2]])
// CHECK-NEXT: [[TMP4:%.*]] = extractvalue { <4 x float>, <4 x i32> } [[TMP3]], 1
// CHECK-NEXT: store <4 x i32> [[TMP4]], <4 x i32>* [[ADDR]], align 8
// CHECK-NEXT: [[TMP5:%.*]] = extractvalue { <4 x float>, <4 x i32> } [[TMP3]], 0
// CHECK-NEXT: ret <4 x float> [[TMP5]]
//
float32x4_t test_vldrwq_gather_base_wb_z_f32(uint32x4_t *addr, mve_pred16_t p)
{
return vldrwq_gather_base_wb_z_f32(addr, -0x160, p);
}
// CHECK-LABEL: @test_vldrwq_gather_base_wb_z_s32(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = load <4 x i32>, <4 x i32>* [[ADDR:%.*]], align 8
// CHECK-NEXT: [[TMP1:%.*]] = zext i16 [[P:%.*]] to i32
// CHECK-NEXT: [[TMP2:%.*]] = call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 [[TMP1]])
// CHECK-NEXT: [[TMP3:%.*]] = call { <4 x i32>, <4 x i32> } @llvm.arm.mve.vldr.gather.base.wb.predicated.v4i32.v4i32.v4i1(<4 x i32> [[TMP0]], i32 276, <4 x i1> [[TMP2]])
// CHECK-NEXT: [[TMP4:%.*]] = extractvalue { <4 x i32>, <4 x i32> } [[TMP3]], 1
// CHECK-NEXT: store <4 x i32> [[TMP4]], <4 x i32>* [[ADDR]], align 8
// CHECK-NEXT: [[TMP5:%.*]] = extractvalue { <4 x i32>, <4 x i32> } [[TMP3]], 0
// CHECK-NEXT: ret <4 x i32> [[TMP5]]
//
int32x4_t test_vldrwq_gather_base_wb_z_s32(uint32x4_t *addr, mve_pred16_t p)
{
return vldrwq_gather_base_wb_z_s32(addr, 0x114, p);
}
// CHECK-LABEL: @test_vldrwq_gather_base_wb_z_u32(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = load <4 x i32>, <4 x i32>* [[ADDR:%.*]], align 8
// CHECK-NEXT: [[TMP1:%.*]] = zext i16 [[P:%.*]] to i32
// CHECK-NEXT: [[TMP2:%.*]] = call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 [[TMP1]])
// CHECK-NEXT: [[TMP3:%.*]] = call { <4 x i32>, <4 x i32> } @llvm.arm.mve.vldr.gather.base.wb.predicated.v4i32.v4i32.v4i1(<4 x i32> [[TMP0]], i32 88, <4 x i1> [[TMP2]])
// CHECK-NEXT: [[TMP4:%.*]] = extractvalue { <4 x i32>, <4 x i32> } [[TMP3]], 1
// CHECK-NEXT: store <4 x i32> [[TMP4]], <4 x i32>* [[ADDR]], align 8
// CHECK-NEXT: [[TMP5:%.*]] = extractvalue { <4 x i32>, <4 x i32> } [[TMP3]], 0
// CHECK-NEXT: ret <4 x i32> [[TMP5]]
//
uint32x4_t test_vldrwq_gather_base_wb_z_u32(uint32x4_t *addr, mve_pred16_t p)
{
return vldrwq_gather_base_wb_z_u32(addr, 0x58, p);
}
// CHECK-LABEL: @test_vldrwq_gather_base_z_f32(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
// CHECK-NEXT: [[TMP1:%.*]] = call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 [[TMP0]])
// CHECK-NEXT: [[TMP2:%.*]] = call <4 x float> @llvm.arm.mve.vldr.gather.base.predicated.v4f32.v4i32.v4i1(<4 x i32> [[ADDR:%.*]], i32 300, <4 x i1> [[TMP1]])
// CHECK-NEXT: ret <4 x float> [[TMP2]]
//
float32x4_t test_vldrwq_gather_base_z_f32(uint32x4_t addr, mve_pred16_t p)
{
return vldrwq_gather_base_z_f32(addr, 0x12c, p);
}
// CHECK-LABEL: @test_vldrwq_gather_base_z_s32(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
// CHECK-NEXT: [[TMP1:%.*]] = call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 [[TMP0]])
// CHECK-NEXT: [[TMP2:%.*]] = call <4 x i32> @llvm.arm.mve.vldr.gather.base.predicated.v4i32.v4i32.v4i1(<4 x i32> [[ADDR:%.*]], i32 440, <4 x i1> [[TMP1]])
// CHECK-NEXT: ret <4 x i32> [[TMP2]]
//
int32x4_t test_vldrwq_gather_base_z_s32(uint32x4_t addr, mve_pred16_t p)
{
return vldrwq_gather_base_z_s32(addr, 0x1b8, p);
}
// CHECK-LABEL: @test_vldrwq_gather_base_z_u32(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
// CHECK-NEXT: [[TMP1:%.*]] = call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 [[TMP0]])
// CHECK-NEXT: [[TMP2:%.*]] = call <4 x i32> @llvm.arm.mve.vldr.gather.base.predicated.v4i32.v4i32.v4i1(<4 x i32> [[ADDR:%.*]], i32 -300, <4 x i1> [[TMP1]])
// CHECK-NEXT: ret <4 x i32> [[TMP2]]
//
uint32x4_t test_vldrwq_gather_base_z_u32(uint32x4_t addr, mve_pred16_t p)
{
return vldrwq_gather_base_z_u32(addr, -0x12c, p);
}
// CHECK-LABEL: @test_vldrwq_gather_offset_f32(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = call <4 x float> @llvm.arm.mve.vldr.gather.offset.v4f32.p0f32.v4i32(float* [[BASE:%.*]], <4 x i32> [[OFFSET:%.*]], i32 32, i32 0, i32 0)
// CHECK-NEXT: ret <4 x float> [[TMP0]]
//
float32x4_t test_vldrwq_gather_offset_f32(const float32_t *base, uint32x4_t offset)
{
#ifdef POLYMORPHIC
return vldrwq_gather_offset(base, offset);
#else /* POLYMORPHIC */
return vldrwq_gather_offset_f32(base, offset);
#endif /* POLYMORPHIC */
}
// CHECK-LABEL: @test_vldrwq_gather_offset_s32(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = call <4 x i32> @llvm.arm.mve.vldr.gather.offset.v4i32.p0i32.v4i32(i32* [[BASE:%.*]], <4 x i32> [[OFFSET:%.*]], i32 32, i32 0, i32 0)
// CHECK-NEXT: ret <4 x i32> [[TMP0]]
//
int32x4_t test_vldrwq_gather_offset_s32(const int32_t *base, uint32x4_t offset)
{
#ifdef POLYMORPHIC
return vldrwq_gather_offset(base, offset);
#else /* POLYMORPHIC */
return vldrwq_gather_offset_s32(base, offset);
#endif /* POLYMORPHIC */
}
// CHECK-LABEL: @test_vldrwq_gather_offset_u32(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = call <4 x i32> @llvm.arm.mve.vldr.gather.offset.v4i32.p0i32.v4i32(i32* [[BASE:%.*]], <4 x i32> [[OFFSET:%.*]], i32 32, i32 0, i32 1)
// CHECK-NEXT: ret <4 x i32> [[TMP0]]
//
uint32x4_t test_vldrwq_gather_offset_u32(const uint32_t *base, uint32x4_t offset)
{
#ifdef POLYMORPHIC
return vldrwq_gather_offset(base, offset);
#else /* POLYMORPHIC */
return vldrwq_gather_offset_u32(base, offset);
#endif /* POLYMORPHIC */
}
// CHECK-LABEL: @test_vldrwq_gather_offset_z_f32(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
// CHECK-NEXT: [[TMP1:%.*]] = call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 [[TMP0]])
// CHECK-NEXT: [[TMP2:%.*]] = call <4 x float> @llvm.arm.mve.vldr.gather.offset.predicated.v4f32.p0f32.v4i32.v4i1(float* [[BASE:%.*]], <4 x i32> [[OFFSET:%.*]], i32 32, i32 0, i32 0, <4 x i1> [[TMP1]])
// CHECK-NEXT: ret <4 x float> [[TMP2]]
//
float32x4_t test_vldrwq_gather_offset_z_f32(const float32_t *base, uint32x4_t offset, mve_pred16_t p)
{
#ifdef POLYMORPHIC
return vldrwq_gather_offset_z(base, offset, p);
#else /* POLYMORPHIC */
return vldrwq_gather_offset_z_f32(base, offset, p);
#endif /* POLYMORPHIC */
}
// CHECK-LABEL: @test_vldrwq_gather_offset_z_s32(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
// CHECK-NEXT: [[TMP1:%.*]] = call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 [[TMP0]])
// CHECK-NEXT: [[TMP2:%.*]] = call <4 x i32> @llvm.arm.mve.vldr.gather.offset.predicated.v4i32.p0i32.v4i32.v4i1(i32* [[BASE:%.*]], <4 x i32> [[OFFSET:%.*]], i32 32, i32 0, i32 0, <4 x i1> [[TMP1]])
// CHECK-NEXT: ret <4 x i32> [[TMP2]]
//
int32x4_t test_vldrwq_gather_offset_z_s32(const int32_t *base, uint32x4_t offset, mve_pred16_t p)
{
#ifdef POLYMORPHIC
return vldrwq_gather_offset_z(base, offset, p);
#else /* POLYMORPHIC */
return vldrwq_gather_offset_z_s32(base, offset, p);
#endif /* POLYMORPHIC */
}
// CHECK-LABEL: @test_vldrwq_gather_offset_z_u32(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
// CHECK-NEXT: [[TMP1:%.*]] = call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 [[TMP0]])
// CHECK-NEXT: [[TMP2:%.*]] = call <4 x i32> @llvm.arm.mve.vldr.gather.offset.predicated.v4i32.p0i32.v4i32.v4i1(i32* [[BASE:%.*]], <4 x i32> [[OFFSET:%.*]], i32 32, i32 0, i32 1, <4 x i1> [[TMP1]])
// CHECK-NEXT: ret <4 x i32> [[TMP2]]
//
uint32x4_t test_vldrwq_gather_offset_z_u32(const uint32_t *base, uint32x4_t offset, mve_pred16_t p)
{
#ifdef POLYMORPHIC
return vldrwq_gather_offset_z(base, offset, p);
#else /* POLYMORPHIC */
return vldrwq_gather_offset_z_u32(base, offset, p);
#endif /* POLYMORPHIC */
}
// CHECK-LABEL: @test_vldrwq_gather_shifted_offset_f32(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = call <4 x float> @llvm.arm.mve.vldr.gather.offset.v4f32.p0f32.v4i32(float* [[BASE:%.*]], <4 x i32> [[OFFSET:%.*]], i32 32, i32 2, i32 0)
// CHECK-NEXT: ret <4 x float> [[TMP0]]
//
float32x4_t test_vldrwq_gather_shifted_offset_f32(const float32_t *base, uint32x4_t offset)
{
#ifdef POLYMORPHIC
return vldrwq_gather_shifted_offset(base, offset);
#else /* POLYMORPHIC */
return vldrwq_gather_shifted_offset_f32(base, offset);
#endif /* POLYMORPHIC */
}
// CHECK-LABEL: @test_vldrwq_gather_shifted_offset_s32(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = call <4 x i32> @llvm.arm.mve.vldr.gather.offset.v4i32.p0i32.v4i32(i32* [[BASE:%.*]], <4 x i32> [[OFFSET:%.*]], i32 32, i32 2, i32 0)
// CHECK-NEXT: ret <4 x i32> [[TMP0]]
//
int32x4_t test_vldrwq_gather_shifted_offset_s32(const int32_t *base, uint32x4_t offset)
{
#ifdef POLYMORPHIC
return vldrwq_gather_shifted_offset(base, offset);
#else /* POLYMORPHIC */
return vldrwq_gather_shifted_offset_s32(base, offset);
#endif /* POLYMORPHIC */
}
// CHECK-LABEL: @test_vldrwq_gather_shifted_offset_u32(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = call <4 x i32> @llvm.arm.mve.vldr.gather.offset.v4i32.p0i32.v4i32(i32* [[BASE:%.*]], <4 x i32> [[OFFSET:%.*]], i32 32, i32 2, i32 1)
// CHECK-NEXT: ret <4 x i32> [[TMP0]]
//
uint32x4_t test_vldrwq_gather_shifted_offset_u32(const uint32_t *base, uint32x4_t offset)
{
#ifdef POLYMORPHIC
return vldrwq_gather_shifted_offset(base, offset);
#else /* POLYMORPHIC */
return vldrwq_gather_shifted_offset_u32(base, offset);
#endif /* POLYMORPHIC */
}
// CHECK-LABEL: @test_vldrwq_gather_shifted_offset_z_f32(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
// CHECK-NEXT: [[TMP1:%.*]] = call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 [[TMP0]])
// CHECK-NEXT: [[TMP2:%.*]] = call <4 x float> @llvm.arm.mve.vldr.gather.offset.predicated.v4f32.p0f32.v4i32.v4i1(float* [[BASE:%.*]], <4 x i32> [[OFFSET:%.*]], i32 32, i32 2, i32 0, <4 x i1> [[TMP1]])
// CHECK-NEXT: ret <4 x float> [[TMP2]]
//
float32x4_t test_vldrwq_gather_shifted_offset_z_f32(const float32_t *base, uint32x4_t offset, mve_pred16_t p)
{
#ifdef POLYMORPHIC
return vldrwq_gather_shifted_offset_z(base, offset, p);
#else /* POLYMORPHIC */
return vldrwq_gather_shifted_offset_z_f32(base, offset, p);
#endif /* POLYMORPHIC */
}
// CHECK-LABEL: @test_vldrwq_gather_shifted_offset_z_s32(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
// CHECK-NEXT: [[TMP1:%.*]] = call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 [[TMP0]])
// CHECK-NEXT: [[TMP2:%.*]] = call <4 x i32> @llvm.arm.mve.vldr.gather.offset.predicated.v4i32.p0i32.v4i32.v4i1(i32* [[BASE:%.*]], <4 x i32> [[OFFSET:%.*]], i32 32, i32 2, i32 0, <4 x i1> [[TMP1]])
// CHECK-NEXT: ret <4 x i32> [[TMP2]]
//
int32x4_t test_vldrwq_gather_shifted_offset_z_s32(const int32_t *base, uint32x4_t offset, mve_pred16_t p)
{
#ifdef POLYMORPHIC
return vldrwq_gather_shifted_offset_z(base, offset, p);
#else /* POLYMORPHIC */
return vldrwq_gather_shifted_offset_z_s32(base, offset, p);
#endif /* POLYMORPHIC */
}
// CHECK-LABEL: @test_vldrwq_gather_shifted_offset_z_u32(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
// CHECK-NEXT: [[TMP1:%.*]] = call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 [[TMP0]])
// CHECK-NEXT: [[TMP2:%.*]] = call <4 x i32> @llvm.arm.mve.vldr.gather.offset.predicated.v4i32.p0i32.v4i32.v4i1(i32* [[BASE:%.*]], <4 x i32> [[OFFSET:%.*]], i32 32, i32 2, i32 1, <4 x i1> [[TMP1]])
// CHECK-NEXT: ret <4 x i32> [[TMP2]]
//
uint32x4_t test_vldrwq_gather_shifted_offset_z_u32(const uint32_t *base, uint32x4_t offset, mve_pred16_t p)
{
#ifdef POLYMORPHIC
return vldrwq_gather_shifted_offset_z(base, offset, p);
#else /* POLYMORPHIC */
return vldrwq_gather_shifted_offset_z_u32(base, offset, p);
#endif /* POLYMORPHIC */
}
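// The remaining tests cover the vstrbq scatter stores. They mirror the
// gather forms: the predicated (_p) variants zero-extend the 16-bit
// predicate, convert it with @llvm.arm.mve.pred.i2v, and pass it to the
// predicated store intrinsic, which returns void.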
// CHECK-LABEL: @test_vstrbq_scatter_offset_p_s16(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
// CHECK-NEXT: [[TMP1:%.*]] = call <8 x i1> @llvm.arm.mve.pred.i2v.v8i1(i32 [[TMP0]])
// CHECK-NEXT: call void @llvm.arm.mve.vstr.scatter.offset.predicated.p0i8.v8i16.v8i16.v8i1(i8* [[BASE:%.*]], <8 x i16> [[OFFSET:%.*]], <8 x i16> [[VALUE:%.*]], i32 8, i32 0, <8 x i1> [[TMP1]])
// CHECK-NEXT: ret void
//
void test_vstrbq_scatter_offset_p_s16(int8_t *base, uint16x8_t offset, int16x8_t value, mve_pred16_t p)
{
#ifdef POLYMORPHIC
vstrbq_scatter_offset_p(base, offset, value, p);
#else /* POLYMORPHIC */
vstrbq_scatter_offset_p_s16(base, offset, value, p);
#endif /* POLYMORPHIC */
}
// CHECK-LABEL: @test_vstrbq_scatter_offset_p_s32(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
// CHECK-NEXT: [[TMP1:%.*]] = call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 [[TMP0]])
// CHECK-NEXT: call void @llvm.arm.mve.vstr.scatter.offset.predicated.p0i8.v4i32.v4i32.v4i1(i8* [[BASE:%.*]], <4 x i32> [[OFFSET:%.*]], <4 x i32> [[VALUE:%.*]], i32 8, i32 0, <4 x i1> [[TMP1]])
// CHECK-NEXT: ret void
//
void test_vstrbq_scatter_offset_p_s32(int8_t *base, uint32x4_t offset, int32x4_t value, mve_pred16_t p)
{
#ifdef POLYMORPHIC
vstrbq_scatter_offset_p(base, offset, value, p);
#else /* POLYMORPHIC */
vstrbq_scatter_offset_p_s32(base, offset, value, p);
#endif /* POLYMORPHIC */
}
// CHECK-LABEL: @test_vstrbq_scatter_offset_p_s8(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
// CHECK-NEXT: [[TMP1:%.*]] = call <16 x i1> @llvm.arm.mve.pred.i2v.v16i1(i32 [[TMP0]])
// CHECK-NEXT: call void @llvm.arm.mve.vstr.scatter.offset.predicated.p0i8.v16i8.v16i8.v16i1(i8* [[BASE:%.*]], <16 x i8> [[OFFSET:%.*]], <16 x i8> [[VALUE:%.*]], i32 8, i32 0, <16 x i1> [[TMP1]])
// CHECK-NEXT: ret void
//
void test_vstrbq_scatter_offset_p_s8(int8_t *base, uint8x16_t offset, int8x16_t value, mve_pred16_t p)
{
#ifdef POLYMORPHIC
vstrbq_scatter_offset_p(base, offset, value, p);
#else /* POLYMORPHIC */
vstrbq_scatter_offset_p_s8(base, offset, value, p);
#endif /* POLYMORPHIC */
}
// CHECK-LABEL: @test_vstrbq_scatter_offset_p_u16(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
// CHECK-NEXT: [[TMP1:%.*]] = call <8 x i1> @llvm.arm.mve.pred.i2v.v8i1(i32 [[TMP0]])
// CHECK-NEXT: call void @llvm.arm.mve.vstr.scatter.offset.predicated.p0i8.v8i16.v8i16.v8i1(i8* [[BASE:%.*]], <8 x i16> [[OFFSET:%.*]], <8 x i16> [[VALUE:%.*]], i32 8, i32 0, <8 x i1> [[TMP1]])
// CHECK-NEXT: ret void
//
void test_vstrbq_scatter_offset_p_u16(uint8_t *base, uint16x8_t offset, uint16x8_t value, mve_pred16_t p)
{
#ifdef POLYMORPHIC
vstrbq_scatter_offset_p(base, offset, value, p);
#else /* POLYMORPHIC */
vstrbq_scatter_offset_p_u16(base, offset, value, p);
#endif /* POLYMORPHIC */
}
// CHECK-LABEL: @test_vstrbq_scatter_offset_p_u32(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
// CHECK-NEXT: [[TMP1:%.*]] = call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 [[TMP0]])
// CHECK-NEXT: call void @llvm.arm.mve.vstr.scatter.offset.predicated.p0i8.v4i32.v4i32.v4i1(i8* [[BASE:%.*]], <4 x i32> [[OFFSET:%.*]], <4 x i32> [[VALUE:%.*]], i32 8, i32 0, <4 x i1> [[TMP1]])
// CHECK-NEXT: ret void
//
void test_vstrbq_scatter_offset_p_u32(uint8_t *base, uint32x4_t offset, uint32x4_t value, mve_pred16_t p)
{
#ifdef POLYMORPHIC
vstrbq_scatter_offset_p(base, offset, value, p);
#else /* POLYMORPHIC */
vstrbq_scatter_offset_p_u32(base, offset, value, p);
#endif /* POLYMORPHIC */
}
// CHECK-LABEL: @test_vstrbq_scatter_offset_p_u8(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
// CHECK-NEXT: [[TMP1:%.*]] = call <16 x i1> @llvm.arm.mve.pred.i2v.v16i1(i32 [[TMP0]])
// CHECK-NEXT: call void @llvm.arm.mve.vstr.scatter.offset.predicated.p0i8.v16i8.v16i8.v16i1(i8* [[BASE:%.*]], <16 x i8> [[OFFSET:%.*]], <16 x i8> [[VALUE:%.*]], i32 8, i32 0, <16 x i1> [[TMP1]])
// CHECK-NEXT: ret void
//
void test_vstrbq_scatter_offset_p_u8(uint8_t *base, uint8x16_t offset, uint8x16_t value, mve_pred16_t p)
{
#ifdef POLYMORPHIC
vstrbq_scatter_offset_p(base, offset, value, p);
#else /* POLYMORPHIC */
vstrbq_scatter_offset_p_u8(base, offset, value, p);
#endif /* POLYMORPHIC */
}
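// Illustrative reference model only, not part of the autogenerated checks:
// assuming the offset vector, value vector and predicate are modelled as
// plain scalar arrays (the helper below is hypothetical), a predicated byte
// scatter such as vstrbq_scatter_offset_p_s16 truncates each 16-bit lane to
// its bottom 8 bits and stores it at an unscaled byte offset from 'base',
// skipping lanes whose predicate bit is clear; roughly:
static inline void ref_vstrbq_scatter_offset_p_s16(
    int8_t *base, const uint16_t offset[8], const int16_t value[8],
    const _Bool lane_active[8])
{
    for (int i = 0; i < 8; i++)
        if (lane_active[i])
            base[offset[i]] = (int8_t)value[i]; /* narrowing store, unscaled offset */
}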
// CHECK-LABEL: @test_vstrbq_scatter_offset_s16(
// CHECK-NEXT: entry:
// CHECK-NEXT: call void @llvm.arm.mve.vstr.scatter.offset.p0i8.v8i16.v8i16(i8* [[BASE:%.*]], <8 x i16> [[OFFSET:%.*]], <8 x i16> [[VALUE:%.*]], i32 8, i32 0)
// CHECK-NEXT: ret void
//
void test_vstrbq_scatter_offset_s16(int8_t *base, uint16x8_t offset, int16x8_t value)
{
#ifdef POLYMORPHIC
vstrbq_scatter_offset(base, offset, value);
#else /* POLYMORPHIC */
vstrbq_scatter_offset_s16(base, offset, value);
#endif /* POLYMORPHIC */
}
// CHECK-LABEL: @test_vstrbq_scatter_offset_s32(
// CHECK-NEXT: entry:
// CHECK-NEXT: call void @llvm.arm.mve.vstr.scatter.offset.p0i8.v4i32.v4i32(i8* [[BASE:%.*]], <4 x i32> [[OFFSET:%.*]], <4 x i32> [[VALUE:%.*]], i32 8, i32 0)
// CHECK-NEXT: ret void
//
void test_vstrbq_scatter_offset_s32(int8_t *base, uint32x4_t offset, int32x4_t value)
{
#ifdef POLYMORPHIC
vstrbq_scatter_offset(base, offset, value);
#else /* POLYMORPHIC */
vstrbq_scatter_offset_s32(base, offset, value);
#endif /* POLYMORPHIC */
}
// CHECK-LABEL: @test_vstrbq_scatter_offset_s8(
// CHECK-NEXT: entry:
// CHECK-NEXT: call void @llvm.arm.mve.vstr.scatter.offset.p0i8.v16i8.v16i8(i8* [[BASE:%.*]], <16 x i8> [[OFFSET:%.*]], <16 x i8> [[VALUE:%.*]], i32 8, i32 0)
// CHECK-NEXT: ret void
//
void test_vstrbq_scatter_offset_s8(int8_t *base, uint8x16_t offset, int8x16_t value)
{
#ifdef POLYMORPHIC
vstrbq_scatter_offset(base, offset, value);
#else /* POLYMORPHIC */
vstrbq_scatter_offset_s8(base, offset, value);
#endif /* POLYMORPHIC */
}
// CHECK-LABEL: @test_vstrbq_scatter_offset_u16(
// CHECK-NEXT: entry:
// CHECK-NEXT: call void @llvm.arm.mve.vstr.scatter.offset.p0i8.v8i16.v8i16(i8* [[BASE:%.*]], <8 x i16> [[OFFSET:%.*]], <8 x i16> [[VALUE:%.*]], i32 8, i32 0)
// CHECK-NEXT: ret void
//
void test_vstrbq_scatter_offset_u16(uint8_t *base, uint16x8_t offset, uint16x8_t value)
{
#ifdef POLYMORPHIC
vstrbq_scatter_offset(base, offset, value);
#else /* POLYMORPHIC */
vstrbq_scatter_offset_u16(base, offset, value);
#endif /* POLYMORPHIC */
}
// CHECK-LABEL: @test_vstrbq_scatter_offset_u32(
// CHECK-NEXT: entry:
// CHECK-NEXT: call void @llvm.arm.mve.vstr.scatter.offset.p0i8.v4i32.v4i32(i8* [[BASE:%.*]], <4 x i32> [[OFFSET:%.*]], <4 x i32> [[VALUE:%.*]], i32 8, i32 0)
// CHECK-NEXT: ret void
//
void test_vstrbq_scatter_offset_u32(uint8_t *base, uint32x4_t offset, uint32x4_t value)
{
#ifdef POLYMORPHIC
vstrbq_scatter_offset(base, offset, value);
#else /* POLYMORPHIC */
vstrbq_scatter_offset_u32(base, offset, value);
#endif /* POLYMORPHIC */
}
// CHECK-LABEL: @test_vstrbq_scatter_offset_u8(
// CHECK-NEXT: entry:
// CHECK-NEXT: call void @llvm.arm.mve.vstr.scatter.offset.p0i8.v16i8.v16i8(i8* [[BASE:%.*]], <16 x i8> [[OFFSET:%.*]], <16 x i8> [[VALUE:%.*]], i32 8, i32 0)
// CHECK-NEXT: ret void
//
void test_vstrbq_scatter_offset_u8(uint8_t *base, uint8x16_t offset, uint8x16_t value)
{
#ifdef POLYMORPHIC
vstrbq_scatter_offset(base, offset, value);
#else /* POLYMORPHIC */
vstrbq_scatter_offset_u8(base, offset, value);
#endif /* POLYMORPHIC */
}
// CHECK-LABEL: @test_vstrdq_scatter_base_p_s64(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
// CHECK-NEXT: [[TMP1:%.*]] = call <2 x i1> @llvm.arm.mve.pred.i2v.v2i1(i32 [[TMP0]])
// CHECK-NEXT: call void @llvm.arm.mve.vstr.scatter.base.predicated.v2i64.v2i64.v2i1(<2 x i64> [[ADDR:%.*]], i32 888, <2 x i64> [[VALUE:%.*]], <2 x i1> [[TMP1]])
// CHECK-NEXT: ret void
//
void test_vstrdq_scatter_base_p_s64(uint64x2_t addr, int64x2_t value, mve_pred16_t p)
{
#ifdef POLYMORPHIC
vstrdq_scatter_base_p(addr, 0x378, value, p);
#else /* POLYMORPHIC */
vstrdq_scatter_base_p_s64(addr, 0x378, value, p);
#endif /* POLYMORPHIC */
}
// CHECK-LABEL: @test_vstrdq_scatter_base_p_u64(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
// CHECK-NEXT: [[TMP1:%.*]] = call <2 x i1> @llvm.arm.mve.pred.i2v.v2i1(i32 [[TMP0]])
// CHECK-NEXT: call void @llvm.arm.mve.vstr.scatter.base.predicated.v2i64.v2i64.v2i1(<2 x i64> [[ADDR:%.*]], i32 264, <2 x i64> [[VALUE:%.*]], <2 x i1> [[TMP1]])
// CHECK-NEXT: ret void
//
void test_vstrdq_scatter_base_p_u64(uint64x2_t addr, uint64x2_t value, mve_pred16_t p)
{
#ifdef POLYMORPHIC
vstrdq_scatter_base_p(addr, 0x108, value, p);
#else /* POLYMORPHIC */
vstrdq_scatter_base_p_u64(addr, 0x108, value, p);
#endif /* POLYMORPHIC */
}
// CHECK-LABEL: @test_vstrdq_scatter_base_s64(
// CHECK-NEXT: entry:
// CHECK-NEXT: call void @llvm.arm.mve.vstr.scatter.base.v2i64.v2i64(<2 x i64> [[ADDR:%.*]], i32 408, <2 x i64> [[VALUE:%.*]])
// CHECK-NEXT: ret void
//
void test_vstrdq_scatter_base_s64(uint64x2_t addr, int64x2_t value)
{
#ifdef POLYMORPHIC
vstrdq_scatter_base(addr, 0x198, value);
#else /* POLYMORPHIC */
vstrdq_scatter_base_s64(addr, 0x198, value);
#endif /* POLYMORPHIC */
}
// CHECK-LABEL: @test_vstrdq_scatter_base_u64(
// CHECK-NEXT: entry:
// CHECK-NEXT: call void @llvm.arm.mve.vstr.scatter.base.v2i64.v2i64(<2 x i64> [[ADDR:%.*]], i32 -472, <2 x i64> [[VALUE:%.*]])
// CHECK-NEXT: ret void
//
void test_vstrdq_scatter_base_u64(uint64x2_t addr, uint64x2_t value)
{
#ifdef POLYMORPHIC
vstrdq_scatter_base(addr, -0x1d8, value);
#else /* POLYMORPHIC */
vstrdq_scatter_base_u64(addr, -0x1d8, value);
#endif /* POLYMORPHIC */
}
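// Illustrative reference model only, not part of the autogenerated checks:
// a base-addressed scatter such as vstrdq_scatter_base_s64 takes its
// addresses from the vector operand and adds the same immediate byte offset
// (a multiple of 8 for the 64-bit form) to every lane. Modelling each
// 64-bit lane of 'addr' as a plain pointer (a hypothetical simplification),
// it behaves roughly like:
static inline void ref_vstrdq_scatter_base_s64(
    int64_t *const lane_addr[2], int imm_offset, const int64_t value[2])
{
    for (int i = 0; i < 2; i++)
        *(int64_t *)((char *)lane_addr[i] + imm_offset) = value[i]; /* lane i stored at lane_addr[i] + imm_offset bytes */
}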
// CHECK-LABEL: @test_vstrdq_scatter_base_wb_p_s64(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = load <2 x i64>, <2 x i64>* [[ADDR:%.*]], align 8
// CHECK-NEXT: [[TMP1:%.*]] = zext i16 [[P:%.*]] to i32
// CHECK-NEXT: [[TMP2:%.*]] = call <2 x i1> @llvm.arm.mve.pred.i2v.v2i1(i32 [[TMP1]])
// CHECK-NEXT: [[TMP3:%.*]] = call <2 x i64> @llvm.arm.mve.vstr.scatter.base.wb.predicated.v2i64.v2i64.v2i1(<2 x i64> [[TMP0]], i32 248, <2 x i64> [[VALUE:%.*]], <2 x i1> [[TMP2]])
// CHECK-NEXT: store <2 x i64> [[TMP3]], <2 x i64>* [[ADDR]], align 8
// CHECK-NEXT: ret void
//
void test_vstrdq_scatter_base_wb_p_s64(uint64x2_t *addr, int64x2_t value, mve_pred16_t p)
{
#ifdef POLYMORPHIC
vstrdq_scatter_base_wb_p(addr, 0xf8, value, p);
#else /* POLYMORPHIC */
vstrdq_scatter_base_wb_p_s64(addr, 0xf8, value, p);
#endif /* POLYMORPHIC */
}
// CHECK-LABEL: @test_vstrdq_scatter_base_wb_p_u64(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = load <2 x i64>, <2 x i64>* [[ADDR:%.*]], align 8
// CHECK-NEXT: [[TMP1:%.*]] = zext i16 [[P:%.*]] to i32
// CHECK-NEXT: [[TMP2:%.*]] = call <2 x i1> @llvm.arm.mve.pred.i2v.v2i1(i32 [[TMP1]])
// CHECK-NEXT: [[TMP3:%.*]] = call <2 x i64> @llvm.arm.mve.vstr.scatter.base.wb.predicated.v2i64.v2i64.v2i1(<2 x i64> [[TMP0]], i32 136, <2 x i64> [[VALUE:%.*]], <2 x i1> [[TMP2]])
// CHECK-NEXT: store <2 x i64> [[TMP3]], <2 x i64>* [[ADDR]], align 8
// CHECK-NEXT: ret void
//
void test_vstrdq_scatter_base_wb_p_u64(uint64x2_t *addr, uint64x2_t value, mve_pred16_t p)
{
#ifdef POLYMORPHIC
vstrdq_scatter_base_wb_p(addr, 0x88, value, p);
#else /* POLYMORPHIC */
vstrdq_scatter_base_wb_p_u64(addr, 0x88, value, p);
#endif /* POLYMORPHIC */
}
// CHECK-LABEL: @test_vstrdq_scatter_base_wb_s64(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = load <2 x i64>, <2 x i64>* [[ADDR:%.*]], align 8
// CHECK-NEXT: [[TMP1:%.*]] = call <2 x i64> @llvm.arm.mve.vstr.scatter.base.wb.v2i64.v2i64(<2 x i64> [[TMP0]], i32 208, <2 x i64> [[VALUE:%.*]])
// CHECK-NEXT: store <2 x i64> [[TMP1]], <2 x i64>* [[ADDR]], align 8
// CHECK-NEXT: ret void
//
void test_vstrdq_scatter_base_wb_s64(uint64x2_t *addr, int64x2_t value)
{
#ifdef POLYMORPHIC
vstrdq_scatter_base_wb(addr, 0xd0, value);
#else /* POLYMORPHIC */
vstrdq_scatter_base_wb_s64(addr, 0xd0, value);
#endif /* POLYMORPHIC */
}
// CHECK-LABEL: @test_vstrdq_scatter_base_wb_u64(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = load <2 x i64>, <2 x i64>* [[ADDR:%.*]], align 8
// CHECK-NEXT: [[TMP1:%.*]] = call <2 x i64> @llvm.arm.mve.vstr.scatter.base.wb.v2i64.v2i64(<2 x i64> [[TMP0]], i32 -168, <2 x i64> [[VALUE:%.*]])
// CHECK-NEXT: store <2 x i64> [[TMP1]], <2 x i64>* [[ADDR]], align 8
// CHECK-NEXT: ret void
//
void test_vstrdq_scatter_base_wb_u64(uint64x2_t *addr, uint64x2_t value)
{
#ifdef POLYMORPHIC
vstrdq_scatter_base_wb(addr, -0xa8, value);
#else /* POLYMORPHIC */
vstrdq_scatter_base_wb_u64(addr, -0xa8, value);
#endif /* POLYMORPHIC */
}
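// Illustrative reference model only, not part of the autogenerated checks:
// the _wb forms store through the same per-lane address plus immediate as
// the plain base-addressed scatter, and then write the updated address
// vector back through 'addr' (pre-indexed writeback, matching the value
// returned by the ...wb... IR intrinsic and stored back above). Modelling
// each lane as a plain pointer (a hypothetical simplification), roughly:
static inline void ref_vstrdq_scatter_base_wb_s64(
    int64_t *lane_addr[2], int imm_offset, const int64_t value[2])
{
    for (int i = 0; i < 2; i++) {
        int64_t *ea = (int64_t *)((char *)lane_addr[i] + imm_offset);
        *ea = value[i];    /* store at the offset address */
        lane_addr[i] = ea; /* write the updated lane address back */
    }
}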
// CHECK-LABEL: @test_vstrdq_scatter_offset_p_s64(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
// CHECK-NEXT: [[TMP1:%.*]] = call <2 x i1> @llvm.arm.mve.pred.i2v.v2i1(i32 [[TMP0]])
// CHECK-NEXT: call void @llvm.arm.mve.vstr.scatter.offset.predicated.p0i64.v2i64.v2i64.v2i1(i64* [[BASE:%.*]], <2 x i64> [[OFFSET:%.*]], <2 x i64> [[VALUE:%.*]], i32 64, i32 0, <2 x i1> [[TMP1]])
// CHECK-NEXT: ret void
//
void test_vstrdq_scatter_offset_p_s64(int64_t *base, uint64x2_t offset, int64x2_t value, mve_pred16_t p)
{
#ifdef POLYMORPHIC
vstrdq_scatter_offset_p(base, offset, value, p);
#else /* POLYMORPHIC */
vstrdq_scatter_offset_p_s64(base, offset, value, p);
#endif /* POLYMORPHIC */
}
// CHECK-LABEL: @test_vstrdq_scatter_offset_p_u64(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
// CHECK-NEXT: [[TMP1:%.*]] = call <2 x i1> @llvm.arm.mve.pred.i2v.v2i1(i32 [[TMP0]])
// CHECK-NEXT: call void @llvm.arm.mve.vstr.scatter.offset.predicated.p0i64.v2i64.v2i64.v2i1(i64* [[BASE:%.*]], <2 x i64> [[OFFSET:%.*]], <2 x i64> [[VALUE:%.*]], i32 64, i32 0, <2 x i1> [[TMP1]])
// CHECK-NEXT: ret void
//
void test_vstrdq_scatter_offset_p_u64(uint64_t *base, uint64x2_t offset, uint64x2_t value, mve_pred16_t p)
{
#ifdef POLYMORPHIC
vstrdq_scatter_offset_p(base, offset, value, p);
#else /* POLYMORPHIC */
vstrdq_scatter_offset_p_u64(base, offset, value, p);
#endif /* POLYMORPHIC */
}
// CHECK-LABEL: @test_vstrdq_scatter_offset_s64(
// CHECK-NEXT: entry:
// CHECK-NEXT: call void @llvm.arm.mve.vstr.scatter.offset.p0i64.v2i64.v2i64(i64* [[BASE:%.*]], <2 x i64> [[OFFSET:%.*]], <2 x i64> [[VALUE:%.*]], i32 64, i32 0)
// CHECK-NEXT: ret void
//
void test_vstrdq_scatter_offset_s64(int64_t *base, uint64x2_t offset, int64x2_t value)
{
#ifdef POLYMORPHIC
vstrdq_scatter_offset(base, offset, value);
#else /* POLYMORPHIC */
vstrdq_scatter_offset_s64(base, offset, value);
#endif /* POLYMORPHIC */
}
// CHECK-LABEL: @test_vstrdq_scatter_offset_u64(
// CHECK-NEXT: entry:
// CHECK-NEXT: call void @llvm.arm.mve.vstr.scatter.offset.p0i64.v2i64.v2i64(i64* [[BASE:%.*]], <2 x i64> [[OFFSET:%.*]], <2 x i64> [[VALUE:%.*]], i32 64, i32 0)
// CHECK-NEXT: ret void
//
void test_vstrdq_scatter_offset_u64(uint64_t *base, uint64x2_t offset, uint64x2_t value)
{
#ifdef POLYMORPHIC
vstrdq_scatter_offset(base, offset, value);
#else /* POLYMORPHIC */
vstrdq_scatter_offset_u64(base, offset, value);
#endif /* POLYMORPHIC */
}
// CHECK-LABEL: @test_vstrdq_scatter_shifted_offset_p_s64(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
// CHECK-NEXT: [[TMP1:%.*]] = call <2 x i1> @llvm.arm.mve.pred.i2v.v2i1(i32 [[TMP0]])
// CHECK-NEXT: call void @llvm.arm.mve.vstr.scatter.offset.predicated.p0i64.v2i64.v2i64.v2i1(i64* [[BASE:%.*]], <2 x i64> [[OFFSET:%.*]], <2 x i64> [[VALUE:%.*]], i32 64, i32 3, <2 x i1> [[TMP1]])
// CHECK-NEXT: ret void
//
void test_vstrdq_scatter_shifted_offset_p_s64(int64_t *base, uint64x2_t offset, int64x2_t value, mve_pred16_t p)
{
#ifdef POLYMORPHIC
vstrdq_scatter_shifted_offset_p(base, offset, value, p);
#else /* POLYMORPHIC */
vstrdq_scatter_shifted_offset_p_s64(base, offset, value, p);
#endif /* POLYMORPHIC */
}
// CHECK-LABEL: @test_vstrdq_scatter_shifted_offset_p_u64(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
// CHECK-NEXT: [[TMP1:%.*]] = call <2 x i1> @llvm.arm.mve.pred.i2v.v2i1(i32 [[TMP0]])
// CHECK-NEXT: call void @llvm.arm.mve.vstr.scatter.offset.predicated.p0i64.v2i64.v2i64.v2i1(i64* [[BASE:%.*]], <2 x i64> [[OFFSET:%.*]], <2 x i64> [[VALUE:%.*]], i32 64, i32 3, <2 x i1> [[TMP1]])
// CHECK-NEXT: ret void
//
void test_vstrdq_scatter_shifted_offset_p_u64(uint64_t *base, uint64x2_t offset, uint64x2_t value, mve_pred16_t p)
{
#ifdef POLYMORPHIC
vstrdq_scatter_shifted_offset_p(base, offset, value, p);
#else /* POLYMORPHIC */
vstrdq_scatter_shifted_offset_p_u64(base, offset, value, p);
#endif /* POLYMORPHIC */
}
// CHECK-LABEL: @test_vstrdq_scatter_shifted_offset_s64(
// CHECK-NEXT: entry:
// CHECK-NEXT: call void @llvm.arm.mve.vstr.scatter.offset.p0i64.v2i64.v2i64(i64* [[BASE:%.*]], <2 x i64> [[OFFSET:%.*]], <2 x i64> [[VALUE:%.*]], i32 64, i32 3)
// CHECK-NEXT: ret void
//
void test_vstrdq_scatter_shifted_offset_s64(int64_t *base, uint64x2_t offset, int64x2_t value)
{
#ifdef POLYMORPHIC
vstrdq_scatter_shifted_offset(base, offset, value);
#else /* POLYMORPHIC */
vstrdq_scatter_shifted_offset_s64(base, offset, value);
#endif /* POLYMORPHIC */
}
// CHECK-LABEL: @test_vstrdq_scatter_shifted_offset_u64(
// CHECK-NEXT: entry:
// CHECK-NEXT: call void @llvm.arm.mve.vstr.scatter.offset.p0i64.v2i64.v2i64(i64* [[BASE:%.*]], <2 x i64> [[OFFSET:%.*]], <2 x i64> [[VALUE:%.*]], i32 64, i32 3)
// CHECK-NEXT: ret void
//
void test_vstrdq_scatter_shifted_offset_u64(uint64_t *base, uint64x2_t offset, uint64x2_t value)
{
#ifdef POLYMORPHIC
vstrdq_scatter_shifted_offset(base, offset, value);
#else /* POLYMORPHIC */
vstrdq_scatter_shifted_offset_u64(base, offset, value);
#endif /* POLYMORPHIC */
}
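// In the vstrdq 'shifted_offset' forms above, each offset is an element index,
// not a byte offset: the IR carries a shift amount of 3, so lane i is written
// to (char *)base + (offset[i] << 3), i.e. base[offset[i]]. Rough scalar
// equivalent, illustration only (not part of the checked IR):
//   for (int i = 0; i < 2; i++)
//     base[offset[i]] = value[i];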
// CHECK-LABEL: @test_vstrhq_scatter_offset_f16(
// CHECK-NEXT: entry:
// CHECK-NEXT: call void @llvm.arm.mve.vstr.scatter.offset.p0f16.v8i16.v8f16(half* [[BASE:%.*]], <8 x i16> [[OFFSET:%.*]], <8 x half> [[VALUE:%.*]], i32 16, i32 0)
// CHECK-NEXT: ret void
//
void test_vstrhq_scatter_offset_f16(float16_t *base, uint16x8_t offset, float16x8_t value)
{
#ifdef POLYMORPHIC
vstrhq_scatter_offset(base, offset, value);
#else /* POLYMORPHIC */
vstrhq_scatter_offset_f16(base, offset, value);
#endif /* POLYMORPHIC */
}
// CHECK-LABEL: @test_vstrhq_scatter_offset_p_f16(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
// CHECK-NEXT: [[TMP1:%.*]] = call <8 x i1> @llvm.arm.mve.pred.i2v.v8i1(i32 [[TMP0]])
// CHECK-NEXT: call void @llvm.arm.mve.vstr.scatter.offset.predicated.p0f16.v8i16.v8f16.v8i1(half* [[BASE:%.*]], <8 x i16> [[OFFSET:%.*]], <8 x half> [[VALUE:%.*]], i32 16, i32 0, <8 x i1> [[TMP1]])
// CHECK-NEXT: ret void
//
void test_vstrhq_scatter_offset_p_f16(float16_t *base, uint16x8_t offset, float16x8_t value, mve_pred16_t p)
{
#ifdef POLYMORPHIC
vstrhq_scatter_offset_p(base, offset, value, p);
#else /* POLYMORPHIC */
vstrhq_scatter_offset_p_f16(base, offset, value, p);
#endif /* POLYMORPHIC */
}
// CHECK-LABEL: @test_vstrhq_scatter_offset_p_s16(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
// CHECK-NEXT: [[TMP1:%.*]] = call <8 x i1> @llvm.arm.mve.pred.i2v.v8i1(i32 [[TMP0]])
// CHECK-NEXT: call void @llvm.arm.mve.vstr.scatter.offset.predicated.p0i16.v8i16.v8i16.v8i1(i16* [[BASE:%.*]], <8 x i16> [[OFFSET:%.*]], <8 x i16> [[VALUE:%.*]], i32 16, i32 0, <8 x i1> [[TMP1]])
// CHECK-NEXT: ret void
//
void test_vstrhq_scatter_offset_p_s16(int16_t *base, uint16x8_t offset, int16x8_t value, mve_pred16_t p)
{
#ifdef POLYMORPHIC
vstrhq_scatter_offset_p(base, offset, value, p);
#else /* POLYMORPHIC */
vstrhq_scatter_offset_p_s16(base, offset, value, p);
#endif /* POLYMORPHIC */
}
// CHECK-LABEL: @test_vstrhq_scatter_offset_p_s32(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
// CHECK-NEXT: [[TMP1:%.*]] = call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 [[TMP0]])
// CHECK-NEXT: call void @llvm.arm.mve.vstr.scatter.offset.predicated.p0i16.v4i32.v4i32.v4i1(i16* [[BASE:%.*]], <4 x i32> [[OFFSET:%.*]], <4 x i32> [[VALUE:%.*]], i32 16, i32 0, <4 x i1> [[TMP1]])
// CHECK-NEXT: ret void
//
void test_vstrhq_scatter_offset_p_s32(int16_t *base, uint32x4_t offset, int32x4_t value, mve_pred16_t p)
{
#ifdef POLYMORPHIC
vstrhq_scatter_offset_p(base, offset, value, p);
#else /* POLYMORPHIC */
vstrhq_scatter_offset_p_s32(base, offset, value, p);
#endif /* POLYMORPHIC */
}
// CHECK-LABEL: @test_vstrhq_scatter_offset_p_u16(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
// CHECK-NEXT: [[TMP1:%.*]] = call <8 x i1> @llvm.arm.mve.pred.i2v.v8i1(i32 [[TMP0]])
// CHECK-NEXT: call void @llvm.arm.mve.vstr.scatter.offset.predicated.p0i16.v8i16.v8i16.v8i1(i16* [[BASE:%.*]], <8 x i16> [[OFFSET:%.*]], <8 x i16> [[VALUE:%.*]], i32 16, i32 0, <8 x i1> [[TMP1]])
// CHECK-NEXT: ret void
//
void test_vstrhq_scatter_offset_p_u16(uint16_t *base, uint16x8_t offset, uint16x8_t value, mve_pred16_t p)
{
#ifdef POLYMORPHIC
vstrhq_scatter_offset_p(base, offset, value, p);
#else /* POLYMORPHIC */
vstrhq_scatter_offset_p_u16(base, offset, value, p);
#endif /* POLYMORPHIC */
}
// CHECK-LABEL: @test_vstrhq_scatter_offset_p_u32(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
// CHECK-NEXT: [[TMP1:%.*]] = call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 [[TMP0]])
// CHECK-NEXT: call void @llvm.arm.mve.vstr.scatter.offset.predicated.p0i16.v4i32.v4i32.v4i1(i16* [[BASE:%.*]], <4 x i32> [[OFFSET:%.*]], <4 x i32> [[VALUE:%.*]], i32 16, i32 0, <4 x i1> [[TMP1]])
// CHECK-NEXT: ret void
//
void test_vstrhq_scatter_offset_p_u32(uint16_t *base, uint32x4_t offset, uint32x4_t value, mve_pred16_t p)
{
#ifdef POLYMORPHIC
vstrhq_scatter_offset_p(base, offset, value, p);
#else /* POLYMORPHIC */
vstrhq_scatter_offset_p_u32(base, offset, value, p);
#endif /* POLYMORPHIC */
}
// CHECK-LABEL: @test_vstrhq_scatter_offset_s16(
// CHECK-NEXT: entry:
// CHECK-NEXT: call void @llvm.arm.mve.vstr.scatter.offset.p0i16.v8i16.v8i16(i16* [[BASE:%.*]], <8 x i16> [[OFFSET:%.*]], <8 x i16> [[VALUE:%.*]], i32 16, i32 0)
// CHECK-NEXT: ret void
//
void test_vstrhq_scatter_offset_s16(int16_t *base, uint16x8_t offset, int16x8_t value)
{
#ifdef POLYMORPHIC
vstrhq_scatter_offset(base, offset, value);
#else /* POLYMORPHIC */
vstrhq_scatter_offset_s16(base, offset, value);
#endif /* POLYMORPHIC */
}
// CHECK-LABEL: @test_vstrhq_scatter_offset_s32(
// CHECK-NEXT: entry:
// CHECK-NEXT: call void @llvm.arm.mve.vstr.scatter.offset.p0i16.v4i32.v4i32(i16* [[BASE:%.*]], <4 x i32> [[OFFSET:%.*]], <4 x i32> [[VALUE:%.*]], i32 16, i32 0)
// CHECK-NEXT: ret void
//
void test_vstrhq_scatter_offset_s32(int16_t *base, uint32x4_t offset, int32x4_t value)
{
#ifdef POLYMORPHIC
vstrhq_scatter_offset(base, offset, value);
#else /* POLYMORPHIC */
vstrhq_scatter_offset_s32(base, offset, value);
#endif /* POLYMORPHIC */
}
// CHECK-LABEL: @test_vstrhq_scatter_offset_u16(
// CHECK-NEXT: entry:
// CHECK-NEXT: call void @llvm.arm.mve.vstr.scatter.offset.p0i16.v8i16.v8i16(i16* [[BASE:%.*]], <8 x i16> [[OFFSET:%.*]], <8 x i16> [[VALUE:%.*]], i32 16, i32 0)
// CHECK-NEXT: ret void
//
void test_vstrhq_scatter_offset_u16(uint16_t *base, uint16x8_t offset, uint16x8_t value)
{
#ifdef POLYMORPHIC
vstrhq_scatter_offset(base, offset, value);
#else /* POLYMORPHIC */
vstrhq_scatter_offset_u16(base, offset, value);
#endif /* POLYMORPHIC */
}
// CHECK-LABEL: @test_vstrhq_scatter_offset_u32(
// CHECK-NEXT: entry:
// CHECK-NEXT: call void @llvm.arm.mve.vstr.scatter.offset.p0i16.v4i32.v4i32(i16* [[BASE:%.*]], <4 x i32> [[OFFSET:%.*]], <4 x i32> [[VALUE:%.*]], i32 16, i32 0)
// CHECK-NEXT: ret void
//
void test_vstrhq_scatter_offset_u32(uint16_t *base, uint32x4_t offset, uint32x4_t value)
{
#ifdef POLYMORPHIC
vstrhq_scatter_offset(base, offset, value);
#else /* POLYMORPHIC */
vstrhq_scatter_offset_u32(base, offset, value);
#endif /* POLYMORPHIC */
}
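// The vstrhq 'offset' forms above use raw byte offsets (IR shift amount 0) with
// a 16-bit memory element size; in the _s32/_u32 variants each 32-bit lane is
// truncated and only its low halfword is stored, so lane i writes to
// (char *)base + offset[i] (illustration only).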
// CHECK-LABEL: @test_vstrhq_scatter_shifted_offset_f16(
// CHECK-NEXT: entry:
// CHECK-NEXT: call void @llvm.arm.mve.vstr.scatter.offset.p0f16.v8i16.v8f16(half* [[BASE:%.*]], <8 x i16> [[OFFSET:%.*]], <8 x half> [[VALUE:%.*]], i32 16, i32 1)
// CHECK-NEXT: ret void
//
void test_vstrhq_scatter_shifted_offset_f16(float16_t *base, uint16x8_t offset, float16x8_t value)
{
#ifdef POLYMORPHIC
vstrhq_scatter_shifted_offset(base, offset, value);
#else /* POLYMORPHIC */
vstrhq_scatter_shifted_offset_f16(base, offset, value);
#endif /* POLYMORPHIC */
}
// CHECK-LABEL: @test_vstrhq_scatter_shifted_offset_p_f16(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
// CHECK-NEXT: [[TMP1:%.*]] = call <8 x i1> @llvm.arm.mve.pred.i2v.v8i1(i32 [[TMP0]])
// CHECK-NEXT: call void @llvm.arm.mve.vstr.scatter.offset.predicated.p0f16.v8i16.v8f16.v8i1(half* [[BASE:%.*]], <8 x i16> [[OFFSET:%.*]], <8 x half> [[VALUE:%.*]], i32 16, i32 1, <8 x i1> [[TMP1]])
// CHECK-NEXT: ret void
//
void test_vstrhq_scatter_shifted_offset_p_f16(float16_t *base, uint16x8_t offset, float16x8_t value, mve_pred16_t p)
{
#ifdef POLYMORPHIC
vstrhq_scatter_shifted_offset_p(base, offset, value, p);
#else /* POLYMORPHIC */
vstrhq_scatter_shifted_offset_p_f16(base, offset, value, p);
#endif /* POLYMORPHIC */
}
// CHECK-LABEL: @test_vstrhq_scatter_shifted_offset_p_s16(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
// CHECK-NEXT: [[TMP1:%.*]] = call <8 x i1> @llvm.arm.mve.pred.i2v.v8i1(i32 [[TMP0]])
// CHECK-NEXT: call void @llvm.arm.mve.vstr.scatter.offset.predicated.p0i16.v8i16.v8i16.v8i1(i16* [[BASE:%.*]], <8 x i16> [[OFFSET:%.*]], <8 x i16> [[VALUE:%.*]], i32 16, i32 1, <8 x i1> [[TMP1]])
// CHECK-NEXT: ret void
//
void test_vstrhq_scatter_shifted_offset_p_s16(int16_t *base, uint16x8_t offset, int16x8_t value, mve_pred16_t p)
{
#ifdef POLYMORPHIC
vstrhq_scatter_shifted_offset_p(base, offset, value, p);
#else /* POLYMORPHIC */
vstrhq_scatter_shifted_offset_p_s16(base, offset, value, p);
#endif /* POLYMORPHIC */
}
// CHECK-LABEL: @test_vstrhq_scatter_shifted_offset_p_s32(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
// CHECK-NEXT: [[TMP1:%.*]] = call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 [[TMP0]])
// CHECK-NEXT: call void @llvm.arm.mve.vstr.scatter.offset.predicated.p0i16.v4i32.v4i32.v4i1(i16* [[BASE:%.*]], <4 x i32> [[OFFSET:%.*]], <4 x i32> [[VALUE:%.*]], i32 16, i32 1, <4 x i1> [[TMP1]])
// CHECK-NEXT: ret void
//
void test_vstrhq_scatter_shifted_offset_p_s32(int16_t *base, uint32x4_t offset, int32x4_t value, mve_pred16_t p)
{
#ifdef POLYMORPHIC
vstrhq_scatter_shifted_offset_p(base, offset, value, p);
#else /* POLYMORPHIC */
vstrhq_scatter_shifted_offset_p_s32(base, offset, value, p);
#endif /* POLYMORPHIC */
}
// CHECK-LABEL: @test_vstrhq_scatter_shifted_offset_p_u16(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
// CHECK-NEXT: [[TMP1:%.*]] = call <8 x i1> @llvm.arm.mve.pred.i2v.v8i1(i32 [[TMP0]])
// CHECK-NEXT: call void @llvm.arm.mve.vstr.scatter.offset.predicated.p0i16.v8i16.v8i16.v8i1(i16* [[BASE:%.*]], <8 x i16> [[OFFSET:%.*]], <8 x i16> [[VALUE:%.*]], i32 16, i32 1, <8 x i1> [[TMP1]])
// CHECK-NEXT: ret void
//
void test_vstrhq_scatter_shifted_offset_p_u16(uint16_t *base, uint16x8_t offset, uint16x8_t value, mve_pred16_t p)
{
#ifdef POLYMORPHIC
vstrhq_scatter_shifted_offset_p(base, offset, value, p);
#else /* POLYMORPHIC */
vstrhq_scatter_shifted_offset_p_u16(base, offset, value, p);
#endif /* POLYMORPHIC */
}
// CHECK-LABEL: @test_vstrhq_scatter_shifted_offset_p_u32(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
// CHECK-NEXT: [[TMP1:%.*]] = call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 [[TMP0]])
// CHECK-NEXT: call void @llvm.arm.mve.vstr.scatter.offset.predicated.p0i16.v4i32.v4i32.v4i1(i16* [[BASE:%.*]], <4 x i32> [[OFFSET:%.*]], <4 x i32> [[VALUE:%.*]], i32 16, i32 1, <4 x i1> [[TMP1]])
// CHECK-NEXT: ret void
//
void test_vstrhq_scatter_shifted_offset_p_u32(uint16_t *base, uint32x4_t offset, uint32x4_t value, mve_pred16_t p)
{
#ifdef POLYMORPHIC
vstrhq_scatter_shifted_offset_p(base, offset, value, p);
#else /* POLYMORPHIC */
vstrhq_scatter_shifted_offset_p_u32(base, offset, value, p);
#endif /* POLYMORPHIC */
}
// CHECK-LABEL: @test_vstrhq_scatter_shifted_offset_s16(
// CHECK-NEXT: entry:
// CHECK-NEXT: call void @llvm.arm.mve.vstr.scatter.offset.p0i16.v8i16.v8i16(i16* [[BASE:%.*]], <8 x i16> [[OFFSET:%.*]], <8 x i16> [[VALUE:%.*]], i32 16, i32 1)
// CHECK-NEXT: ret void
//
void test_vstrhq_scatter_shifted_offset_s16(int16_t *base, uint16x8_t offset, int16x8_t value)
{
#ifdef POLYMORPHIC
vstrhq_scatter_shifted_offset(base, offset, value);
#else /* POLYMORPHIC */
vstrhq_scatter_shifted_offset_s16(base, offset, value);
#endif /* POLYMORPHIC */
}
// CHECK-LABEL: @test_vstrhq_scatter_shifted_offset_s32(
// CHECK-NEXT: entry:
// CHECK-NEXT: call void @llvm.arm.mve.vstr.scatter.offset.p0i16.v4i32.v4i32(i16* [[BASE:%.*]], <4 x i32> [[OFFSET:%.*]], <4 x i32> [[VALUE:%.*]], i32 16, i32 1)
// CHECK-NEXT: ret void
//
void test_vstrhq_scatter_shifted_offset_s32(int16_t *base, uint32x4_t offset, int32x4_t value)
{
#ifdef POLYMORPHIC
vstrhq_scatter_shifted_offset(base, offset, value);
#else /* POLYMORPHIC */
vstrhq_scatter_shifted_offset_s32(base, offset, value);
#endif /* POLYMORPHIC */
}
// CHECK-LABEL: @test_vstrhq_scatter_shifted_offset_u16(
// CHECK-NEXT: entry:
// CHECK-NEXT: call void @llvm.arm.mve.vstr.scatter.offset.p0i16.v8i16.v8i16(i16* [[BASE:%.*]], <8 x i16> [[OFFSET:%.*]], <8 x i16> [[VALUE:%.*]], i32 16, i32 1)
// CHECK-NEXT: ret void
//
void test_vstrhq_scatter_shifted_offset_u16(uint16_t *base, uint16x8_t offset, uint16x8_t value)
{
#ifdef POLYMORPHIC
vstrhq_scatter_shifted_offset(base, offset, value);
#else /* POLYMORPHIC */
vstrhq_scatter_shifted_offset_u16(base, offset, value);
#endif /* POLYMORPHIC */
}
// CHECK-LABEL: @test_vstrhq_scatter_shifted_offset_u32(
// CHECK-NEXT: entry:
// CHECK-NEXT: call void @llvm.arm.mve.vstr.scatter.offset.p0i16.v4i32.v4i32(i16* [[BASE:%.*]], <4 x i32> [[OFFSET:%.*]], <4 x i32> [[VALUE:%.*]], i32 16, i32 1)
// CHECK-NEXT: ret void
//
void test_vstrhq_scatter_shifted_offset_u32(uint16_t *base, uint32x4_t offset, uint32x4_t value)
{
#ifdef POLYMORPHIC
vstrhq_scatter_shifted_offset(base, offset, value);
#else /* POLYMORPHIC */
vstrhq_scatter_shifted_offset_u32(base, offset, value);
#endif /* POLYMORPHIC */
}
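// The vstrhq 'shifted_offset' forms scale each offset by the halfword size
// (IR shift amount 1), so the offsets act as element indices: lane i is written
// to (char *)base + (offset[i] << 1). Rough scalar sketch, illustration only:
//   for (int i = 0; i < lanes; i++)
//     base[offset[i]] = (int16_t)value[i];  /* low 16 bits for the 32-bit variants */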
// CHECK-LABEL: @test_vstrwq_scatter_base_f32(
// CHECK-NEXT: entry:
// CHECK-NEXT: call void @llvm.arm.mve.vstr.scatter.base.v4i32.v4f32(<4 x i32> [[ADDR:%.*]], i32 380, <4 x float> [[VALUE:%.*]])
// CHECK-NEXT: ret void
//
void test_vstrwq_scatter_base_f32(uint32x4_t addr, float32x4_t value)
{
#ifdef POLYMORPHIC
vstrwq_scatter_base(addr, 0x17c, value);
#else /* POLYMORPHIC */
vstrwq_scatter_base_f32(addr, 0x17c, value);
#endif /* POLYMORPHIC */
}
// CHECK-LABEL: @test_vstrwq_scatter_base_p_f32(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
// CHECK-NEXT: [[TMP1:%.*]] = call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 [[TMP0]])
// CHECK-NEXT: call void @llvm.arm.mve.vstr.scatter.base.predicated.v4i32.v4f32.v4i1(<4 x i32> [[ADDR:%.*]], i32 -400, <4 x float> [[VALUE:%.*]], <4 x i1> [[TMP1]])
// CHECK-NEXT: ret void
//
void test_vstrwq_scatter_base_p_f32(uint32x4_t addr, float32x4_t value, mve_pred16_t p)
{
#ifdef POLYMORPHIC
vstrwq_scatter_base_p(addr, -0x190, value, p);
#else /* POLYMORPHIC */
vstrwq_scatter_base_p_f32(addr, -0x190, value, p);
#endif /* POLYMORPHIC */
}
// CHECK-LABEL: @test_vstrwq_scatter_base_p_s32(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
// CHECK-NEXT: [[TMP1:%.*]] = call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 [[TMP0]])
// CHECK-NEXT: call void @llvm.arm.mve.vstr.scatter.base.predicated.v4i32.v4i32.v4i1(<4 x i32> [[ADDR:%.*]], i32 48, <4 x i32> [[VALUE:%.*]], <4 x i1> [[TMP1]])
// CHECK-NEXT: ret void
//
void test_vstrwq_scatter_base_p_s32(uint32x4_t addr, int32x4_t value, mve_pred16_t p)
{
#ifdef POLYMORPHIC
vstrwq_scatter_base_p(addr, 0x30, value, p);
#else /* POLYMORPHIC */
vstrwq_scatter_base_p_s32(addr, 0x30, value, p);
#endif /* POLYMORPHIC */
}
// CHECK-LABEL: @test_vstrwq_scatter_base_p_u32(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
// CHECK-NEXT: [[TMP1:%.*]] = call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 [[TMP0]])
// CHECK-NEXT: call void @llvm.arm.mve.vstr.scatter.base.predicated.v4i32.v4i32.v4i1(<4 x i32> [[ADDR:%.*]], i32 -376, <4 x i32> [[VALUE:%.*]], <4 x i1> [[TMP1]])
// CHECK-NEXT: ret void
//
void test_vstrwq_scatter_base_p_u32(uint32x4_t addr, uint32x4_t value, mve_pred16_t p)
{
#ifdef POLYMORPHIC
vstrwq_scatter_base_p(addr, -0x178, value, p);
#else /* POLYMORPHIC */
vstrwq_scatter_base_p_u32(addr, -0x178, value, p);
#endif /* POLYMORPHIC */
}
// CHECK-LABEL: @test_vstrwq_scatter_base_s32(
// CHECK-NEXT: entry:
// CHECK-NEXT: call void @llvm.arm.mve.vstr.scatter.base.v4i32.v4i32(<4 x i32> [[ADDR:%.*]], i32 156, <4 x i32> [[VALUE:%.*]])
// CHECK-NEXT: ret void
//
void test_vstrwq_scatter_base_s32(uint32x4_t addr, int32x4_t value)
{
#ifdef POLYMORPHIC
vstrwq_scatter_base(addr, 0x9c, value);
#else /* POLYMORPHIC */
vstrwq_scatter_base_s32(addr, 0x9c, value);
#endif /* POLYMORPHIC */
}
// CHECK-LABEL: @test_vstrwq_scatter_base_u32(
// CHECK-NEXT: entry:
// CHECK-NEXT: call void @llvm.arm.mve.vstr.scatter.base.v4i32.v4i32(<4 x i32> [[ADDR:%.*]], i32 212, <4 x i32> [[VALUE:%.*]])
// CHECK-NEXT: ret void
//
void test_vstrwq_scatter_base_u32(uint32x4_t addr, uint32x4_t value)
{
#ifdef POLYMORPHIC
vstrwq_scatter_base(addr, 0xd4, value);
#else /* POLYMORPHIC */
vstrwq_scatter_base_u32(addr, 0xd4, value);
#endif /* POLYMORPHIC */
}
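// The vstrwq_scatter_base forms take a full vector of addresses plus an
// immediate byte offset that is added to every lane, so lane i is written to
// the address addr[i] + imm. The immediates used here (e.g. 0x9c, 0xd4) are
// multiples of 4; the exact legal immediate range is an encoding property of
// VSTRW and is assumed rather than exercised by this test.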
// CHECK-LABEL: @test_vstrwq_scatter_base_wb_f32(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = load <4 x i32>, <4 x i32>* [[ADDR:%.*]], align 8
// CHECK-NEXT: [[TMP1:%.*]] = call <4 x i32> @llvm.arm.mve.vstr.scatter.base.wb.v4i32.v4f32(<4 x i32> [[TMP0]], i32 -412, <4 x float> [[VALUE:%.*]])
// CHECK-NEXT: store <4 x i32> [[TMP1]], <4 x i32>* [[ADDR]], align 8
// CHECK-NEXT: ret void
//
void test_vstrwq_scatter_base_wb_f32(uint32x4_t *addr, float32x4_t value)
{
#ifdef POLYMORPHIC
vstrwq_scatter_base_wb(addr, -0x19c, value);
#else /* POLYMORPHIC */
vstrwq_scatter_base_wb_f32(addr, -0x19c, value);
#endif /* POLYMORPHIC */
}
// CHECK-LABEL: @test_vstrwq_scatter_base_wb_p_f32(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = load <4 x i32>, <4 x i32>* [[ADDR:%.*]], align 8
// CHECK-NEXT: [[TMP1:%.*]] = zext i16 [[P:%.*]] to i32
// CHECK-NEXT: [[TMP2:%.*]] = call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 [[TMP1]])
// CHECK-NEXT: [[TMP3:%.*]] = call <4 x i32> @llvm.arm.mve.vstr.scatter.base.wb.predicated.v4i32.v4f32.v4i1(<4 x i32> [[TMP0]], i32 236, <4 x float> [[VALUE:%.*]], <4 x i1> [[TMP2]])
// CHECK-NEXT: store <4 x i32> [[TMP3]], <4 x i32>* [[ADDR]], align 8
// CHECK-NEXT: ret void
//
void test_vstrwq_scatter_base_wb_p_f32(uint32x4_t *addr, float32x4_t value, mve_pred16_t p)
{
#ifdef POLYMORPHIC
vstrwq_scatter_base_wb_p(addr, 0xec, value, p);
#else /* POLYMORPHIC */
vstrwq_scatter_base_wb_p_f32(addr, 0xec, value, p);
#endif /* POLYMORPHIC */
}
// CHECK-LABEL: @test_vstrwq_scatter_base_wb_p_s32(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = load <4 x i32>, <4 x i32>* [[ADDR:%.*]], align 8
// CHECK-NEXT: [[TMP1:%.*]] = zext i16 [[P:%.*]] to i32
// CHECK-NEXT: [[TMP2:%.*]] = call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 [[TMP1]])
// CHECK-NEXT: [[TMP3:%.*]] = call <4 x i32> @llvm.arm.mve.vstr.scatter.base.wb.predicated.v4i32.v4i32.v4i1(<4 x i32> [[TMP0]], i32 328, <4 x i32> [[VALUE:%.*]], <4 x i1> [[TMP2]])
// CHECK-NEXT: store <4 x i32> [[TMP3]], <4 x i32>* [[ADDR]], align 8
// CHECK-NEXT: ret void
//
void test_vstrwq_scatter_base_wb_p_s32(uint32x4_t *addr, int32x4_t value, mve_pred16_t p)
{
#ifdef POLYMORPHIC
vstrwq_scatter_base_wb_p(addr, 0x148, value, p);
#else /* POLYMORPHIC */
vstrwq_scatter_base_wb_p_s32(addr, 0x148, value, p);
#endif /* POLYMORPHIC */
}
// CHECK-LABEL: @test_vstrwq_scatter_base_wb_p_u32(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = load <4 x i32>, <4 x i32>* [[ADDR:%.*]], align 8
// CHECK-NEXT: [[TMP1:%.*]] = zext i16 [[P:%.*]] to i32
// CHECK-NEXT: [[TMP2:%.*]] = call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 [[TMP1]])
// CHECK-NEXT: [[TMP3:%.*]] = call <4 x i32> @llvm.arm.mve.vstr.scatter.base.wb.predicated.v4i32.v4i32.v4i1(<4 x i32> [[TMP0]], i32 412, <4 x i32> [[VALUE:%.*]], <4 x i1> [[TMP2]])
// CHECK-NEXT: store <4 x i32> [[TMP3]], <4 x i32>* [[ADDR]], align 8
// CHECK-NEXT: ret void
//
void test_vstrwq_scatter_base_wb_p_u32(uint32x4_t *addr, uint32x4_t value, mve_pred16_t p)
{
#ifdef POLYMORPHIC
vstrwq_scatter_base_wb_p(addr, 0x19c, value, p);
#else /* POLYMORPHIC */
vstrwq_scatter_base_wb_p_u32(addr, 0x19c, value, p);
#endif /* POLYMORPHIC */
}
// CHECK-LABEL: @test_vstrwq_scatter_base_wb_s32(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = load <4 x i32>, <4 x i32>* [[ADDR:%.*]], align 8
// CHECK-NEXT: [[TMP1:%.*]] = call <4 x i32> @llvm.arm.mve.vstr.scatter.base.wb.v4i32.v4i32(<4 x i32> [[TMP0]], i32 -152, <4 x i32> [[VALUE:%.*]])
// CHECK-NEXT: store <4 x i32> [[TMP1]], <4 x i32>* [[ADDR]], align 8
// CHECK-NEXT: ret void
//
void test_vstrwq_scatter_base_wb_s32(uint32x4_t *addr, int32x4_t value)
{
#ifdef POLYMORPHIC
vstrwq_scatter_base_wb(addr, -0x98, value);
#else /* POLYMORPHIC */
vstrwq_scatter_base_wb_s32(addr, -0x98, value);
#endif /* POLYMORPHIC */
}
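// The write-back ("_wb") forms update the vector of base addresses in place:
// the IR intrinsic returns the updated bases, which are stored back to *addr
// around the call (visible as the load and store of [[ADDR]] in the checks
// below).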
// CHECK-LABEL: @test_vstrwq_scatter_base_wb_u32(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = load <4 x i32>, <4 x i32>* [[ADDR:%.*]], align 8
// CHECK-NEXT: [[TMP1:%.*]] = call <4 x i32> @llvm.arm.mve.vstr.scatter.base.wb.v4i32.v4i32(<4 x i32> [[TMP0]], i32 64, <4 x i32> [[VALUE:%.*]])
// CHECK-NEXT: store <4 x i32> [[TMP1]], <4 x i32>* [[ADDR]], align 8
// CHECK-NEXT: ret void
//
void test_vstrwq_scatter_base_wb_u32(uint32x4_t *addr, uint32x4_t value)
{
#ifdef POLYMORPHIC
vstrwq_scatter_base_wb(addr, 0x40, value);
#else /* POLYMORPHIC */
vstrwq_scatter_base_wb_u32(addr, 0x40, value);
#endif /* POLYMORPHIC */
}
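// vstrwq_scatter_offset tests: each lane of 'value' is stored to 'base' plus
// the corresponding byte offset in 'offset'. The trailing i32 immediates in
// the IR call give the memory element size in bits (32) and the offset shift
// (0, i.e. unscaled offsets). The "_p" variants additionally take an
// mve_pred16_t predicate, converted to <4 x i1>, so only active lanes are
// stored.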
// CHECK-LABEL: @test_vstrwq_scatter_offset_f32(
// CHECK-NEXT: entry:
// CHECK-NEXT: call void @llvm.arm.mve.vstr.scatter.offset.p0f32.v4i32.v4f32(float* [[BASE:%.*]], <4 x i32> [[OFFSET:%.*]], <4 x float> [[VALUE:%.*]], i32 32, i32 0)
// CHECK-NEXT: ret void
//
void test_vstrwq_scatter_offset_f32(float32_t *base, uint32x4_t offset, float32x4_t value)
{
#ifdef POLYMORPHIC
vstrwq_scatter_offset(base, offset, value);
#else /* POLYMORPHIC */
vstrwq_scatter_offset_f32(base, offset, value);
#endif /* POLYMORPHIC */
}
// CHECK-LABEL: @test_vstrwq_scatter_offset_p_f32(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
// CHECK-NEXT: [[TMP1:%.*]] = call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 [[TMP0]])
// CHECK-NEXT: call void @llvm.arm.mve.vstr.scatter.offset.predicated.p0f32.v4i32.v4f32.v4i1(float* [[BASE:%.*]], <4 x i32> [[OFFSET:%.*]], <4 x float> [[VALUE:%.*]], i32 32, i32 0, <4 x i1> [[TMP1]])
// CHECK-NEXT: ret void
//
void test_vstrwq_scatter_offset_p_f32(float32_t *base, uint32x4_t offset, float32x4_t value, mve_pred16_t p)
{
#ifdef POLYMORPHIC
vstrwq_scatter_offset_p(base, offset, value, p);
#else /* POLYMORPHIC */
vstrwq_scatter_offset_p_f32(base, offset, value, p);
#endif /* POLYMORPHIC */
}
// CHECK-LABEL: @test_vstrwq_scatter_offset_p_s32(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
// CHECK-NEXT: [[TMP1:%.*]] = call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 [[TMP0]])
// CHECK-NEXT: call void @llvm.arm.mve.vstr.scatter.offset.predicated.p0i32.v4i32.v4i32.v4i1(i32* [[BASE:%.*]], <4 x i32> [[OFFSET:%.*]], <4 x i32> [[VALUE:%.*]], i32 32, i32 0, <4 x i1> [[TMP1]])
// CHECK-NEXT: ret void
//
void test_vstrwq_scatter_offset_p_s32(int32_t *base, uint32x4_t offset, int32x4_t value, mve_pred16_t p)
{
#ifdef POLYMORPHIC
vstrwq_scatter_offset_p(base, offset, value, p);
#else /* POLYMORPHIC */
vstrwq_scatter_offset_p_s32(base, offset, value, p);
#endif /* POLYMORPHIC */
}
// CHECK-LABEL: @test_vstrwq_scatter_offset_p_u32(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
// CHECK-NEXT: [[TMP1:%.*]] = call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 [[TMP0]])
// CHECK-NEXT: call void @llvm.arm.mve.vstr.scatter.offset.predicated.p0i32.v4i32.v4i32.v4i1(i32* [[BASE:%.*]], <4 x i32> [[OFFSET:%.*]], <4 x i32> [[VALUE:%.*]], i32 32, i32 0, <4 x i1> [[TMP1]])
// CHECK-NEXT: ret void
//
void test_vstrwq_scatter_offset_p_u32(uint32_t *base, uint32x4_t offset, uint32x4_t value, mve_pred16_t p)
{
#ifdef POLYMORPHIC
vstrwq_scatter_offset_p(base, offset, value, p);
#else /* POLYMORPHIC */
vstrwq_scatter_offset_p_u32(base, offset, value, p);
#endif /* POLYMORPHIC */
}
// CHECK-LABEL: @test_vstrwq_scatter_offset_s32(
// CHECK-NEXT: entry:
// CHECK-NEXT: call void @llvm.arm.mve.vstr.scatter.offset.p0i32.v4i32.v4i32(i32* [[BASE:%.*]], <4 x i32> [[OFFSET:%.*]], <4 x i32> [[VALUE:%.*]], i32 32, i32 0)
// CHECK-NEXT: ret void
//
void test_vstrwq_scatter_offset_s32(int32_t *base, uint32x4_t offset, int32x4_t value)
{
#ifdef POLYMORPHIC
vstrwq_scatter_offset(base, offset, value);
#else /* POLYMORPHIC */
vstrwq_scatter_offset_s32(base, offset, value);
#endif /* POLYMORPHIC */
}
// CHECK-LABEL: @test_vstrwq_scatter_offset_u32(
// CHECK-NEXT: entry:
// CHECK-NEXT: call void @llvm.arm.mve.vstr.scatter.offset.p0i32.v4i32.v4i32(i32* [[BASE:%.*]], <4 x i32> [[OFFSET:%.*]], <4 x i32> [[VALUE:%.*]], i32 32, i32 0)
// CHECK-NEXT: ret void
//
void test_vstrwq_scatter_offset_u32(uint32_t *base, uint32x4_t offset, uint32x4_t value)
{
#ifdef POLYMORPHIC
vstrwq_scatter_offset(base, offset, value);
#else /* POLYMORPHIC */
vstrwq_scatter_offset_u32(base, offset, value);
#endif /* POLYMORPHIC */
}
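// vstrwq_scatter_shifted_offset tests: the same as the unshifted forms above,
// except that each offset is scaled by the element size; this shows up in the
// IR as a shift immediate of 2 (offsets are left-shifted by 2, i.e. treated
// as word indices).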
// CHECK-LABEL: @test_vstrwq_scatter_shifted_offset_f32(
// CHECK-NEXT: entry:
// CHECK-NEXT: call void @llvm.arm.mve.vstr.scatter.offset.p0f32.v4i32.v4f32(float* [[BASE:%.*]], <4 x i32> [[OFFSET:%.*]], <4 x float> [[VALUE:%.*]], i32 32, i32 2)
// CHECK-NEXT: ret void
//
void test_vstrwq_scatter_shifted_offset_f32(float32_t *base, uint32x4_t offset, float32x4_t value)
{
#ifdef POLYMORPHIC
vstrwq_scatter_shifted_offset(base, offset, value);
#else /* POLYMORPHIC */
vstrwq_scatter_shifted_offset_f32(base, offset, value);
#endif /* POLYMORPHIC */
}
// CHECK-LABEL: @test_vstrwq_scatter_shifted_offset_p_f32(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
// CHECK-NEXT: [[TMP1:%.*]] = call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 [[TMP0]])
// CHECK-NEXT: call void @llvm.arm.mve.vstr.scatter.offset.predicated.p0f32.v4i32.v4f32.v4i1(float* [[BASE:%.*]], <4 x i32> [[OFFSET:%.*]], <4 x float> [[VALUE:%.*]], i32 32, i32 2, <4 x i1> [[TMP1]])
// CHECK-NEXT: ret void
//
void test_vstrwq_scatter_shifted_offset_p_f32(float32_t *base, uint32x4_t offset, float32x4_t value, mve_pred16_t p)
{
#ifdef POLYMORPHIC
vstrwq_scatter_shifted_offset_p(base, offset, value, p);
#else /* POLYMORPHIC */
vstrwq_scatter_shifted_offset_p_f32(base, offset, value, p);
#endif /* POLYMORPHIC */
}
// CHECK-LABEL: @test_vstrwq_scatter_shifted_offset_p_s32(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
// CHECK-NEXT: [[TMP1:%.*]] = call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 [[TMP0]])
// CHECK-NEXT: call void @llvm.arm.mve.vstr.scatter.offset.predicated.p0i32.v4i32.v4i32.v4i1(i32* [[BASE:%.*]], <4 x i32> [[OFFSET:%.*]], <4 x i32> [[VALUE:%.*]], i32 32, i32 2, <4 x i1> [[TMP1]])
// CHECK-NEXT: ret void
//
void test_vstrwq_scatter_shifted_offset_p_s32(int32_t *base, uint32x4_t offset, int32x4_t value, mve_pred16_t p)
{
#ifdef POLYMORPHIC
vstrwq_scatter_shifted_offset_p(base, offset, value, p);
#else /* POLYMORPHIC */
vstrwq_scatter_shifted_offset_p_s32(base, offset, value, p);
#endif /* POLYMORPHIC */
}
// CHECK-LABEL: @test_vstrwq_scatter_shifted_offset_p_u32(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[TMP0:%.*]] = zext i16 [[P:%.*]] to i32
// CHECK-NEXT: [[TMP1:%.*]] = call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 [[TMP0]])
// CHECK-NEXT: call void @llvm.arm.mve.vstr.scatter.offset.predicated.p0i32.v4i32.v4i32.v4i1(i32* [[BASE:%.*]], <4 x i32> [[OFFSET:%.*]], <4 x i32> [[VALUE:%.*]], i32 32, i32 2, <4 x i1> [[TMP1]])
// CHECK-NEXT: ret void
//
void test_vstrwq_scatter_shifted_offset_p_u32(uint32_t *base, uint32x4_t offset, uint32x4_t value, mve_pred16_t p)
{
#ifdef POLYMORPHIC
vstrwq_scatter_shifted_offset_p(base, offset, value, p);
#else /* POLYMORPHIC */
vstrwq_scatter_shifted_offset_p_u32(base, offset, value, p);
#endif /* POLYMORPHIC */
}
// CHECK-LABEL: @test_vstrwq_scatter_shifted_offset_s32(
// CHECK-NEXT: entry:
// CHECK-NEXT: call void @llvm.arm.mve.vstr.scatter.offset.p0i32.v4i32.v4i32(i32* [[BASE:%.*]], <4 x i32> [[OFFSET:%.*]], <4 x i32> [[VALUE:%.*]], i32 32, i32 2)
// CHECK-NEXT: ret void
//
void test_vstrwq_scatter_shifted_offset_s32(int32_t *base, uint32x4_t offset, int32x4_t value)
{
#ifdef POLYMORPHIC
vstrwq_scatter_shifted_offset(base, offset, value);
#else /* POLYMORPHIC */
vstrwq_scatter_shifted_offset_s32(base, offset, value);
#endif /* POLYMORPHIC */
}
// CHECK-LABEL: @test_vstrwq_scatter_shifted_offset_u32(
// CHECK-NEXT: entry:
// CHECK-NEXT: call void @llvm.arm.mve.vstr.scatter.offset.p0i32.v4i32.v4i32(i32* [[BASE:%.*]], <4 x i32> [[OFFSET:%.*]], <4 x i32> [[VALUE:%.*]], i32 32, i32 2)
// CHECK-NEXT: ret void
//
void test_vstrwq_scatter_shifted_offset_u32(uint32_t *base, uint32x4_t offset, uint32x4_t value)
{
#ifdef POLYMORPHIC
vstrwq_scatter_shifted_offset(base, offset, value);
#else /* POLYMORPHIC */
vstrwq_scatter_shifted_offset_u32(base, offset, value);
#endif /* POLYMORPHIC */
}