llvm-project/clang/test/Sema/arm-mve-immediates.c


// RUN: %clang_cc1 -triple thumbv8.1m.main-arm-none-eabi -fallow-half-arguments-and-returns -target-feature +mve.fp -verify -fsyntax-only %s
[ARM,MVE] Add intrinsics for gather/scatter load/stores. This patch adds two new families of intrinsics, both of which are memory accesses taking a vector of locations to load from / store to. The vldrq_gather_base / vstrq_scatter_base intrinsics take a vector of base addresses, and an immediate offset to be added consistently to each one. vldrq_gather_offset / vstrq_scatter_offset take a scalar base address, and a vector of offsets to add to it. The 'shifted_offset' variants also multiply each offset by the element size type, so that the vector is effectively of array indices. At the IR level, these operations are represented by a single set of four IR intrinsics: {gather,scatter} × {base,offset}. The other details (signed/unsigned, shift, and memory element size as opposed to vector element size) are all specified by IR intrinsic polymorphism and immediate operands, because that made the selection job easier than making a huge family of similarly named intrinsics. I considered using the standard IR representations such as llvm.masked.gather, but they're not a good fit. In order to use llvm.masked.gather to represent a gather_offset load with element size smaller than a pointer, you'd have to expand the <8 x i16> vector of offsets into an <8 x i16*> vector of pointers, which would be split up during legalization, so you'd spend most of your time undoing the mess it had made. Also, ISel support for llvm.masked.gather would be easy enough in a trivial way (you can expand it into a gather-base load with a zero immediate offset), but instruction-selecting lots of fiddly idioms back into all the _other_ MVE load instructions would be much more work. So I think dedicated IR intrinsics are the more sensible approach, at least for the moment. 
On the clang tablegen side, I've added two new features to the Tablegen source accepted by MveEmitter: a 'CopyKind' type node for defining a type that varies with the parameter type (it lets you ask for an unsigned integer type of the same width as the parameter), and an 'unsignedflag' value node for passing an immediate IR operand which is 0 for a signed integer type or 1 for an unsigned one. That lets me write each kind of intrinsic just once and get all its subtypes and immediate arguments generated automatically. Also I've tweaked the handling of pointer-typed values in the code generation part of MveEmitter: they're generated as Address rather than Value (i.e. including an alignment) so that they can be given to the ordinary IR load and store operations, but I'd omitted the code to convert them back to Value when they're going to be used as an argument to an IR intrinsic. On the MC side, I've enhanced MVEVectorVTInfo so that it can tell you not only the full assembly-language suffix for a given vector type (like 's32' or 'u16') but also the numeric-only one used by store instructions (just '32' or '16'). Reviewers: dmgreen Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D69791
2019-11-01 01:02:07 +08:00
#include <arm_mve.h>
void test_load_offsets(uint32x4_t addr32, uint64x2_t addr64)
{
// Offsets must be a multiple of 8, from -127*8 to 127*8 inclusive
vldrdq_gather_base_s64(addr64, 0);
vldrdq_gather_base_s64(addr64, 8);
vldrdq_gather_base_s64(addr64, 2*8);
vldrdq_gather_base_s64(addr64, 125*8);
vldrdq_gather_base_s64(addr64, 126*8);
vldrdq_gather_base_s64(addr64, 127*8);
vldrdq_gather_base_s64(addr64, -125*8);
vldrdq_gather_base_s64(addr64, -126*8);
vldrdq_gather_base_s64(addr64, -127*8);
vldrdq_gather_base_s64(addr64, 128*8); // expected-error {{argument value 1024 is outside the valid range [-1016, 1016]}}
vldrdq_gather_base_s64(addr64, -128*8); // expected-error {{argument value -1024 is outside the valid range [-1016, 1016]}}
vldrdq_gather_base_s64(addr64, 4); // expected-error {{argument should be a multiple of 8}}
vldrdq_gather_base_s64(addr64, 1); // expected-error {{argument should be a multiple of 8}}
// Offsets must be a multiple of 4, from -127*4 to 127*4 inclusive
vldrwq_gather_base_s32(addr32, 0);
vldrwq_gather_base_s32(addr32, 4);
vldrwq_gather_base_s32(addr32, 2*4);
vldrwq_gather_base_s32(addr32, 125*4);
vldrwq_gather_base_s32(addr32, 126*4);
vldrwq_gather_base_s32(addr32, 127*4);
vldrwq_gather_base_s32(addr32, -125*4);
vldrwq_gather_base_s32(addr32, -126*4);
vldrwq_gather_base_s32(addr32, -127*4);
vldrwq_gather_base_s32(addr32, 128*4); // expected-error {{argument value 512 is outside the valid range [-508, 508]}}
vldrwq_gather_base_s32(addr32, -128*4); // expected-error {{argument value -512 is outside the valid range [-508, 508]}}
vldrwq_gather_base_s32(addr32, 2); // expected-error {{argument should be a multiple of 4}}
vldrwq_gather_base_s32(addr32, 1); // expected-error {{argument should be a multiple of 4}}
// Show that the polymorphic store intrinsics get the right set of
// error checks after overload resolution. These ones expand to the
// 8-byte granular versions...
vstrdq_scatter_base(addr64, 0, addr64);
vstrdq_scatter_base(addr64, 8, addr64);
vstrdq_scatter_base(addr64, 2*8, addr64);
vstrdq_scatter_base(addr64, 125*8, addr64);
vstrdq_scatter_base(addr64, 126*8, addr64);
vstrdq_scatter_base(addr64, 127*8, addr64);
vstrdq_scatter_base(addr64, -125*8, addr64);
vstrdq_scatter_base(addr64, -126*8, addr64);
vstrdq_scatter_base(addr64, -127*8, addr64);
vstrdq_scatter_base(addr64, 128*8, addr64); // expected-error {{argument value 1024 is outside the valid range [-1016, 1016]}}
vstrdq_scatter_base(addr64, -128*8, addr64); // expected-error {{argument value -1024 is outside the valid range [-1016, 1016]}}
vstrdq_scatter_base(addr64, 4, addr64); // expected-error {{argument should be a multiple of 8}}
vstrdq_scatter_base(addr64, 1, addr64); // expected-error {{argument should be a multiple of 8}}
// ... and these ones to the 4-byte granular versions.
vstrwq_scatter_base(addr32, 0, addr32);
vstrwq_scatter_base(addr32, 4, addr32);
vstrwq_scatter_base(addr32, 2*4, addr32);
vstrwq_scatter_base(addr32, 125*4, addr32);
vstrwq_scatter_base(addr32, 126*4, addr32);
vstrwq_scatter_base(addr32, 127*4, addr32);
vstrwq_scatter_base(addr32, -125*4, addr32);
vstrwq_scatter_base(addr32, -126*4, addr32);
vstrwq_scatter_base(addr32, -127*4, addr32);
vstrwq_scatter_base(addr32, 128*4, addr32); // expected-error {{argument value 512 is outside the valid range [-508, 508]}}
vstrwq_scatter_base(addr32, -128*4, addr32); // expected-error {{argument value -512 is outside the valid range [-508, 508]}}
vstrwq_scatter_base(addr32, 2, addr32); // expected-error {{argument should be a multiple of 4}}
vstrwq_scatter_base(addr32, 1, addr32); // expected-error {{argument should be a multiple of 4}}
}
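
The two diagnostics exercised in test_load_offsets (range and multiple-of-element-size) can be modelled in portable C. This is only a sketch inferred from the expectations in the test: `mve_base_offset_ok` is a hypothetical helper, not a function in clang or arm_mve.h, and it does not reproduce how Sema implements the check.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper (not part of clang or arm_mve.h).
 * Models the immediate-offset rules the test expects for the
 * gather/scatter "base" intrinsics:
 *   elem_bytes == 8 for vldrdq_gather_base / vstrdq_scatter_base
 *   elem_bytes == 4 for vldrwq_gather_base / vstrwq_scatter_base */
static bool mve_base_offset_ok(int offset, int elem_bytes)
{
    assert(elem_bytes == 4 || elem_bytes == 8);

    /* Range is +/- 127 elements: 1016 for 8 bytes, 508 for 4 bytes. */
    int limit = 127 * elem_bytes;
    if (offset < -limit || offset > limit)
        return false; /* "argument value N is outside the valid range" */

    /* In-range values must still land on an element boundary. */
    return offset % elem_bytes == 0; /* "should be a multiple of N" */
}
```

For instance, `mve_base_offset_ok(126*8, 8)` holds, while `mve_base_offset_ok(128*8, 8)` and `mve_base_offset_ok(4, 8)` do not, matching the accepted and rejected calls above.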
void test_lane_indices(uint8x16_t v16, uint16x8_t v8,
uint32x4_t v4, uint64x2_t v2)
{
vgetq_lane_u8(v16, -1); // expected-error {{argument value -1 is outside the valid range [0, 15]}}
vgetq_lane_u8(v16, 0);
vgetq_lane_u8(v16, 15);
vgetq_lane_u8(v16, 16); // expected-error {{argument value 16 is outside the valid range [0, 15]}}
vgetq_lane_u16(v8, -1); // expected-error {{argument value -1 is outside the valid range [0, 7]}}
vgetq_lane_u16(v8, 0);
vgetq_lane_u16(v8, 7);
vgetq_lane_u16(v8, 8); // expected-error {{argument value 8 is outside the valid range [0, 7]}}
vgetq_lane_u32(v4, -1); // expected-error {{argument value -1 is outside the valid range [0, 3]}}
vgetq_lane_u32(v4, 0);
vgetq_lane_u32(v4, 3);
vgetq_lane_u32(v4, 4); // expected-error {{argument value 4 is outside the valid range [0, 3]}}
vgetq_lane_u64(v2, -1); // expected-error {{argument value -1 is outside the valid range [0, 1]}}
vgetq_lane_u64(v2, 0);
vgetq_lane_u64(v2, 1);
vgetq_lane_u64(v2, 2); // expected-error {{argument value 2 is outside the valid range [0, 1]}}
vsetq_lane_u8(23, v16, -1); // expected-error {{argument value -1 is outside the valid range [0, 15]}}
vsetq_lane_u8(23, v16, 0);
vsetq_lane_u8(23, v16, 15);
vsetq_lane_u8(23, v16, 16); // expected-error {{argument value 16 is outside the valid range [0, 15]}}
vsetq_lane_u16(23, v8, -1); // expected-error {{argument value -1 is outside the valid range [0, 7]}}
vsetq_lane_u16(23, v8, 0);
vsetq_lane_u16(23, v8, 7);
vsetq_lane_u16(23, v8, 8); // expected-error {{argument value 8 is outside the valid range [0, 7]}}
vsetq_lane_u32(23, v4, -1); // expected-error {{argument value -1 is outside the valid range [0, 3]}}
vsetq_lane_u32(23, v4, 0);
vsetq_lane_u32(23, v4, 3);
vsetq_lane_u32(23, v4, 4); // expected-error {{argument value 4 is outside the valid range [0, 3]}}
vsetq_lane_u64(23, v2, -1); // expected-error {{argument value -1 is outside the valid range [0, 1]}}
vsetq_lane_u64(23, v2, 0);
vsetq_lane_u64(23, v2, 1);
vsetq_lane_u64(23, v2, 2); // expected-error {{argument value 2 is outside the valid range [0, 1]}}
}
void test_immediate_shifts(uint8x16_t vb, uint16x8_t vh, uint32x4_t vw)
{
vshlq_n(vb, 0);
vshlq_n(vb, 7);
vshlq_n(vh, 0);
vshlq_n(vh, 15);
vshlq_n(vw, 0);
vshlq_n(vw, 31);
vshlq_n(vb, -1); // expected-error {{argument value -1 is outside the valid range [0, 7]}}
vshlq_n(vb, 8); // expected-error {{argument value 8 is outside the valid range [0, 7]}}
vshlq_n(vh, -1); // expected-error {{argument value -1 is outside the valid range [0, 15]}}
vshlq_n(vh, 16); // expected-error {{argument value 16 is outside the valid range [0, 15]}}
vshlq_n(vw, -1); // expected-error {{argument value -1 is outside the valid range [0, 31]}}
vshlq_n(vw, 32); // expected-error {{argument value 32 is outside the valid range [0, 31]}}
vqshlq_n(vb, 0);
vqshlq_n(vb, 7);
vqshlq_n(vh, 0);
vqshlq_n(vh, 15);
vqshlq_n(vw, 0);
vqshlq_n(vw, 31);
vqshlq_n(vb, -1); // expected-error {{argument value -1 is outside the valid range [0, 7]}}
vqshlq_n(vb, 8); // expected-error {{argument value 8 is outside the valid range [0, 7]}}
vqshlq_n(vh, -1); // expected-error {{argument value -1 is outside the valid range [0, 15]}}
vqshlq_n(vh, 16); // expected-error {{argument value 16 is outside the valid range [0, 15]}}
vqshlq_n(vw, -1); // expected-error {{argument value -1 is outside the valid range [0, 31]}}
vqshlq_n(vw, 32); // expected-error {{argument value 32 is outside the valid range [0, 31]}}
vsliq(vb, vb, 0);
vsliq(vb, vb, 7);
vsliq(vh, vh, 0);
vsliq(vh, vh, 15);
vsliq(vw, vw, 0);
vsliq(vw, vw, 31);
vsliq(vb, vb, -1); // expected-error {{argument value -1 is outside the valid range [0, 7]}}
vsliq(vb, vb, 8); // expected-error {{argument value 8 is outside the valid range [0, 7]}}
vsliq(vh, vh, -1); // expected-error {{argument value -1 is outside the valid range [0, 15]}}
vsliq(vh, vh, 16); // expected-error {{argument value 16 is outside the valid range [0, 15]}}
vsliq(vw, vw, -1); // expected-error {{argument value -1 is outside the valid range [0, 31]}}
vsliq(vw, vw, 32); // expected-error {{argument value 32 is outside the valid range [0, 31]}}
vshllbq(vb, 1);
vshllbq(vb, 8);
vshllbq(vh, 1);
vshllbq(vh, 16);
vshllbq(vb, 0); // expected-error {{argument value 0 is outside the valid range [1, 8]}}
vshllbq(vb, 9); // expected-error {{argument value 9 is outside the valid range [1, 8]}}
vshllbq(vh, 0); // expected-error {{argument value 0 is outside the valid range [1, 16]}}
vshllbq(vh, 17); // expected-error {{argument value 17 is outside the valid range [1, 16]}}
vshrq(vb, 1);
vshrq(vb, 8);
vshrq(vh, 1);
vshrq(vh, 16);
vshrq(vw, 1);
vshrq(vw, 32);
vshrq(vb, 0); // expected-error {{argument value 0 is outside the valid range [1, 8]}}
vshrq(vb, 9); // expected-error {{argument value 9 is outside the valid range [1, 8]}}
vshrq(vh, 0); // expected-error {{argument value 0 is outside the valid range [1, 16]}}
vshrq(vh, 17); // expected-error {{argument value 17 is outside the valid range [1, 16]}}
vshrq(vw, 0); // expected-error {{argument value 0 is outside the valid range [1, 32]}}
vshrq(vw, 33); // expected-error {{argument value 33 is outside the valid range [1, 32]}}
vshrntq(vb, vh, 1);
vshrntq(vb, vh, 8);
vshrntq(vh, vw, 1);
vshrntq(vh, vw, 16);
vshrntq(vb, vh, 0); // expected-error {{argument value 0 is outside the valid range [1, 8]}}
vshrntq(vb, vh, 9); // expected-error {{argument value 9 is outside the valid range [1, 8]}}
vshrntq(vh, vw, 0); // expected-error {{argument value 0 is outside the valid range [1, 16]}}
vshrntq(vh, vw, 17); // expected-error {{argument value 17 is outside the valid range [1, 16]}}
vsriq(vb, vb, 1);
vsriq(vb, vb, 8);
vsriq(vh, vh, 1);
vsriq(vh, vh, 16);
vsriq(vw, vw, 1);
vsriq(vw, vw, 32);
vsriq(vb, vb, 0); // expected-error {{argument value 0 is outside the valid range [1, 8]}}
vsriq(vb, vb, 9); // expected-error {{argument value 9 is outside the valid range [1, 8]}}
vsriq(vh, vh, 0); // expected-error {{argument value 0 is outside the valid range [1, 16]}}
vsriq(vh, vh, 17); // expected-error {{argument value 17 is outside the valid range [1, 16]}}
vsriq(vw, vw, 0); // expected-error {{argument value 0 is outside the valid range [1, 32]}}
vsriq(vw, vw, 33); // expected-error {{argument value 33 is outside the valid range [1, 32]}}
}
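
The shift tests in test_immediate_shifts follow two range patterns, which the sketch below summarises. Both helper names are hypothetical and the rules are read off the expected diagnostics rather than taken from clang's implementation. For the widening and narrowing forms (vshllbq, vshrntq) the tests use the width of the narrower lane.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helpers, not part of clang or arm_mve.h.
 * lane_bits is the element width in bits: 8, 16 or 32. */

/* vshlq_n, vqshlq_n, vsliq: the tests accept 0 .. lane_bits-1. */
static bool mve_shift_0_to_bits_minus_1_ok(int imm, int lane_bits)
{
    assert(lane_bits == 8 || lane_bits == 16 || lane_bits == 32);
    return imm >= 0 && imm <= lane_bits - 1;
}

/* vshrq, vsriq, vshllbq, vshrntq: the tests accept 1 .. lane_bits,
 * where lane_bits is the narrower lane for vshllbq and vshrntq. */
static bool mve_shift_1_to_bits_ok(int imm, int lane_bits)
{
    assert(lane_bits == 8 || lane_bits == 16 || lane_bits == 32);
    return imm >= 1 && imm <= lane_bits;
}
```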
[ARM,MVE] Support immediate vbicq,vorrq,vmvnq intrinsics. Summary: Immediate vmvnq is code-generated as a simple vector constant in IR, and left to the backend to recognize that it can be created with an MVE VMVN instruction. The predicated version is represented as a select between the input and the same constant, and I've added a Tablegen isel rule to turn that into a predicated VMVN. (That should be better than the previous VMVN + VPSEL: it's the same number of instructions but now it can fold into an adjacent VPT block.) The unpredicated forms of VBIC and VORR are done by enabling the same isel lowering as for NEON, recognizing appropriate immediates and rewriting them as ARMISD::VBICIMM / ARMISD::VORRIMM SDNodes, which I then instruction-select into the right MVE instructions (now that I've also reworked those instructions to use the same MC operand encoding). In order to do that, I had to promote the Tablegen SDNode instance `NEONvorrImm` to a general `ARMvorrImm` available in MVE as well, and similarly for `NEONvbicImm`. The predicated forms of VBIC and VORR are represented as a vector select between the original input vector and the output of the unpredicated operation. The main convenience of this is that it still lets me use the existing isel lowering for VBICIMM/VORRIMM, and not have to write another copy of the operand encoding translation code. This intrinsic family is the first to use the `imm_simd` system I put into the MveEmitter tablegen backend. So, naturally, it showed up a bug or two (emitting bogus range checks and the like). Fixed those, and added a full set of tests for the permissible immediates in the existing Sema test. Also adjusted the isel pattern for `vmovlb.u8`, which stopped matching because lowering started turning its input into a VBICIMM. Now it recognizes the VBICIMM instead. 
Reviewers: dmgreen, MarkMurrayARM, miyuki, ostannard Reviewed By: dmgreen Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D72934
2020-01-23 19:53:42 +08:00
void test_simd_bic_orr(int16x8_t h, int32x4_t w)
{
h = vbicq(h, 0x0000);
h = vbicq(h, 0x0001);
h = vbicq(h, 0x00FF);
h = vbicq(h, 0x0100);
h = vbicq(h, 0x0101); // expected-error-re {{argument should be an 8-bit value shifted by a multiple of 8 bits{{$}}}}
h = vbicq(h, 0x01FF); // expected-error-re {{argument should be an 8-bit value shifted by a multiple of 8 bits{{$}}}}
h = vbicq(h, 0xFF00);
w = vbicq(w, 0x00000000);
w = vbicq(w, 0x00000001);
w = vbicq(w, 0x000000FF);
w = vbicq(w, 0x00000100);
w = vbicq(w, 0x0000FF00);
w = vbicq(w, 0x00010000);
w = vbicq(w, 0x00FF0000);
w = vbicq(w, 0x01000000);
w = vbicq(w, 0xFF000000);
w = vbicq(w, 0x01000001); // expected-error-re {{argument should be an 8-bit value shifted by a multiple of 8 bits{{$}}}}
w = vbicq(w, 0x01FFFFFF); // expected-error-re {{argument should be an 8-bit value shifted by a multiple of 8 bits{{$}}}}
h = vorrq(h, 0x0000);
h = vorrq(h, 0x0001);
h = vorrq(h, 0x00FF);
h = vorrq(h, 0x0100);
h = vorrq(h, 0x0101); // expected-error-re {{argument should be an 8-bit value shifted by a multiple of 8 bits{{$}}}}
h = vorrq(h, 0x01FF); // expected-error-re {{argument should be an 8-bit value shifted by a multiple of 8 bits{{$}}}}
h = vorrq(h, 0xFF00);
w = vorrq(w, 0x00000000);
w = vorrq(w, 0x00000001);
w = vorrq(w, 0x000000FF);
w = vorrq(w, 0x00000100);
w = vorrq(w, 0x0000FF00);
w = vorrq(w, 0x00010000);
w = vorrq(w, 0x00FF0000);
w = vorrq(w, 0x01000000);
w = vorrq(w, 0xFF000000);
w = vorrq(w, 0x01000001); // expected-error-re {{argument should be an 8-bit value shifted by a multiple of 8 bits{{$}}}}
w = vorrq(w, 0x01FFFFFF); // expected-error-re {{argument should be an 8-bit value shifted by a multiple of 8 bits{{$}}}}
}
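
The vbicq/vorrq immediates accepted above are all "an 8-bit value shifted by a multiple of 8 bits" within the lane. A minimal sketch of that predicate follows; `is_shifted_byte` is a hypothetical name used for illustration and is not claimed to match the helper clang uses internally.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if value has the form (byte << shift), where
 * shift is a multiple of 8 and the byte stays inside a lane of lane_bits
 * bits (16 for int16x8_t, 32 for int32x4_t). Zero counts as valid. */
static bool is_shifted_byte(uint32_t value, int lane_bits)
{
    assert(lane_bits == 16 || lane_bits == 32);

    for (int shift = 0; shift < lane_bits; shift += 8) {
        uint32_t byte_mask = (uint32_t)0xFF << shift;
        /* Valid if no bit is set outside this one byte position. */
        if ((value & ~byte_mask) == 0)
            return true;
    }
    return false;
}
```

With this model, 0xFF00 and 0x01000000 pass, while 0x0101 and 0x01FFFFFF fail because their set bits span more than one byte position.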
void test_simd_vmvn(void)
{
uint16x8_t h;
h = vmvnq_n_u16(0x0000);
h = vmvnq_n_u16(0x0001);
h = vmvnq_n_u16(0x00FF);
h = vmvnq_n_u16(0x0100);
h = vmvnq_n_u16(0x0101); // expected-error {{argument should be an 8-bit value shifted by a multiple of 8 bits, or in the form 0x??FF}}
h = vmvnq_n_u16(0x01FF);
h = vmvnq_n_u16(0xFF00);
uint32x4_t w;
w = vmvnq_n_u32(0x00000000);
w = vmvnq_n_u32(0x00000001);
w = vmvnq_n_u32(0x000000FF);
w = vmvnq_n_u32(0x00000100);
w = vmvnq_n_u32(0x0000FF00);
w = vmvnq_n_u32(0x00010000);
w = vmvnq_n_u32(0x00FF0000);
w = vmvnq_n_u32(0x01000000);
w = vmvnq_n_u32(0xFF000000);
w = vmvnq_n_u32(0x01000001); // expected-error {{argument should be an 8-bit value shifted by a multiple of 8 bits, or in the form 0x??FF}}
w = vmvnq_n_u32(0x01FFFFFF); // expected-error {{argument should be an 8-bit value shifted by a multiple of 8 bits, or in the form 0x??FF}}
w = vmvnq_n_u32(0x0001FFFF); // expected-error {{argument should be an 8-bit value shifted by a multiple of 8 bits, or in the form 0x??FF}}
w = vmvnq_n_u32(0x000001FF);
}
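
The vmvnq_n tests accept one extra form beyond a shifted byte: values written as 0x??FF. The sketch below is inferred only from the expectations in test_simd_vmvn (0x01FF and 0x000001FF pass, 0x0001FFFF does not); `vmvn_imm_ok` and `one_byte_set` are hypothetical names, and the actual Sema check may be written differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: value is a single byte shifted by a multiple of 8
 * bits within a lane of lane_bits bits. */
static bool one_byte_set(uint32_t value, int lane_bits)
{
    for (int shift = 0; shift < lane_bits; shift += 8) {
        if ((value & ~((uint32_t)0xFF << shift)) == 0)
            return true;
    }
    return false;
}

/* Hypothetical model of the immediate rule the vmvnq_n tests expect. */
static bool vmvn_imm_ok(uint32_t value, int lane_bits)
{
    assert(lane_bits == 16 || lane_bits == 32);

    if (one_byte_set(value, lane_bits))
        return true;

    /* The 0x??FF form: low byte all ones, nothing above bit 15. */
    return value <= 0xFFFF && (value & 0xFF) == 0xFF;
}
```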
[ARM,MVE] Add intrinsics for v[id]dupq and v[id]wdupq. Summary: These instructions generate a vector of consecutive elements starting from a given base value and incrementing by 1, 2, 4 or 8. The `wdup` versions also wrap the values back to zero when they reach a given limit value. The instruction updates the scalar base register so that another use of the same instruction will continue the sequence from where the previous one left off. At the IR level, I've represented these instructions as a family of target-specific intrinsics with two return values (the constructed vector and the updated base). The user-facing ACLE API provides a set of intrinsics that throw away the written-back base and another set that receive it as a pointer so they can update it, plus the usual predicated versions. Because the intrinsics return two values (as do the underlying instructions), the isel has to be done in C++. This is the first family of MVE intrinsics that use the `imm_1248` immediate type in the clang Tablegen framework, so naturally, I found I'd given it the wrong C integer type. Also added some tests of the check that the immediate has a legal value, because this is the first time those particular checks have been exercised. Finally, I also had to fix a bug in MveEmitter which failed an assertion when I nested two `seq` nodes (the inner one used to extract the two values from the pair returned by the IR intrinsic, and the outer one put on by the predication multiclass). Reviewers: dmgreen, MarkMurrayARM, miyuki, ostannard Reviewed By: dmgreen Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D73357
2020-01-31 18:53:31 +08:00
void test_vidup(void)
{
vidupq_n_u16(0x12345678, 1);
vidupq_n_u16(0x12345678, 2);
vidupq_n_u16(0x12345678, 4);
vidupq_n_u16(0x12345678, 8);
vidupq_n_u16(0x12345678, 0); // expected-error {{argument value 0 is outside the valid range [1, 8]}}
vidupq_n_u16(0x12345678, 16); // expected-error {{argument value 16 is outside the valid range [1, 8]}}
vidupq_n_u16(0x12345678, -1); // expected-error {{argument value -1 is outside the valid range [1, 8]}}
vidupq_n_u16(0x12345678, -2); // expected-error {{argument value -2 is outside the valid range [1, 8]}}
vidupq_n_u16(0x12345678, -4); // expected-error {{argument value -4 is outside the valid range [1, 8]}}
vidupq_n_u16(0x12345678, -8); // expected-error {{argument value -8 is outside the valid range [1, 8]}}
vidupq_n_u16(0x12345678, 3); // expected-error {{argument should be a power of 2}}
vidupq_n_u16(0x12345678, 7); // expected-error {{argument should be a power of 2}}
}
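
As the commit message describes, vidup fills a vector with consecutive values starting from a base and stepping by 1, 2, 4 or 8, then leaves the base advanced for the next call. The following scalar model is a sketch of that behaviour for the u16 form, under the assumption that each lane keeps the low 16 bits of the running 32-bit base; `vidup_step_ok` and `model_vidup_u16` are hypothetical names, not ACLE intrinsics.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: the step immediate must be in [1, 8] and a power
 * of two, which leaves exactly 1, 2, 4 and 8. */
static bool vidup_step_ok(int imm)
{
    return imm == 1 || imm == 2 || imm == 4 || imm == 8;
}

/* Hypothetical scalar model of a u16 vidup: writes 8 lanes to out and
 * returns the updated base (the value a write-back form would store). */
static uint32_t model_vidup_u16(uint32_t base, int step, uint16_t out[8])
{
    assert(vidup_step_ok(step));

    for (int lane = 0; lane < 8; ++lane) {
        out[lane] = (uint16_t)base; /* assumed: lane keeps the low 16 bits */
        base += (uint32_t)step;
    }
    return base;
}
```

Under this model, a base of 0x12345678 with step 2 yields lanes 0x5678, 0x567A, and so on, and the returned base is 0x12345688.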
void test_vcvtq(void)
{
uint16x8_t vec_u16;
float16x8_t vec_f16;
vcvtq_n_f16_u16(vec_u16, 0); // expected-error {{argument value 0 is outside the valid range [1, 16]}}
vcvtq_n_f16_u16(vec_u16, 1);
vcvtq_n_f16_u16(vec_u16, 16);
vcvtq_n_f16_u16(vec_u16, 17); // expected-error {{argument value 17 is outside the valid range [1, 16]}}
int32x4_t vec_s32;
float32x4_t vec_f32;
vcvtq_n_f32_s32(vec_s32, -1); // expected-error {{argument value -1 is outside the valid range [1, 32]}}
vcvtq_n_f32_s32(vec_s32, 1);
vcvtq_n_f32_s32(vec_s32, 32);
vcvtq_n_f32_s32(vec_s32, 33); // expected-error {{argument value 33 is outside the valid range [1, 32]}}
}