llvm-project/clang/test/Sema/arm-mve-immediates.c

Ignoring revisions in .git-blame-ignore-revs. Click here to bypass and see the normal blame view.

57 lines
3.2 KiB
C
Raw Normal View History

[ARM,MVE] Add intrinsics for gather/scatter load/stores. This patch adds two new families of intrinsics, both of which are memory accesses taking a vector of locations to load from / store to. The vldrq_gather_base / vstrq_scatter_base intrinsics take a vector of base addresses, and an immediate offset to be added consistently to each one. vldrq_gather_offset / vstrq_scatter_offset take a scalar base address, and a vector of offsets to add to it. The 'shifted_offset' variants also multiply each offset by the element size type, so that the vector is effectively of array indices. At the IR level, these operations are represented by a single set of four IR intrinsics: {gather,scatter} × {base,offset}. The other details (signed/unsigned, shift, and memory element size as opposed to vector element size) are all specified by IR intrinsic polymorphism and immediate operands, because that made the selection job easier than making a huge family of similarly named intrinsics. I considered using the standard IR representations such as llvm.masked.gather, but they're not a good fit. In order to use llvm.masked.gather to represent a gather_offset load with element size smaller than a pointer, you'd have to expand the <8 x i16> vector of offsets into an <8 x i16*> vector of pointers, which would be split up during legalization, so you'd spend most of your time undoing the mess it had made. Also, ISel support for llvm.masked.gather would be easy enough in a trivial way (you can expand it into a gather-base load with a zero immediate offset), but instruction-selecting lots of fiddly idioms back into all the _other_ MVE load instructions would be much more work. So I think dedicated IR intrinsics are the more sensible approach, at least for the moment. On the clang tablegen side, I've added two new features to the Tablegen source accepted by MveEmitter: a 'CopyKind' type node for defining a type that varies with the parameter type (it lets you ask for an unsigned integer type of the same width as the parameter), and an 'unsignedflag' value node for passing an immediate IR operand which is 0 for a signed integer type or 1 for an unsigned one. That lets me write each kind of intrinsic just once and get all its subtypes and immediate arguments generated automatically. Also I've tweaked the handling of pointer-typed values in the code generation part of MveEmitter: they're generated as Address rather than Value (i.e. including an alignment) so that they can be given to the ordinary IR load and store operations, but I'd omitted the code to convert them back to Value when they're going to be used as an argument to an IR intrinsic. On the MC side, I've enhanced MVEVectorVTInfo so that it can tell you not only the full assembly-language suffix for a given vector type (like 's32' or 'u16') but also the numeric-only one used by store instructions (just '32' or '16'). Reviewers: dmgreen Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D69791
2019-11-01 01:02:07 +08:00
// RUN: %clang_cc1 -triple thumbv8.1m.main-arm-none-eabi -target-feature +mve.fp -verify -fsyntax-only %s
#include <arm_mve.h>
void test_load_offsets(uint32x4_t addr32, uint64x2_t addr64)
{
// Offsets that should be a multiple of 8 times 0,1,...,127
vldrdq_gather_base_s64(addr64, 0);
vldrdq_gather_base_s64(addr64, 8);
vldrdq_gather_base_s64(addr64, 2*8);
vldrdq_gather_base_s64(addr64, 125*8);
vldrdq_gather_base_s64(addr64, 126*8);
vldrdq_gather_base_s64(addr64, 127*8);
vldrdq_gather_base_s64(addr64, -8); // expected-error {{argument value -8 is outside the valid range [0, 1016]}}
vldrdq_gather_base_s64(addr64, 128*8); // expected-error {{argument value 1024 is outside the valid range [0, 1016]}}
vldrdq_gather_base_s64(addr64, 4); // expected-error {{argument should be a multiple of 8}}
vldrdq_gather_base_s64(addr64, 1); // expected-error {{argument should be a multiple of 8}}
// Offsets that should be a multiple of 4 times 0,1,...,127
vldrwq_gather_base_s32(addr32, 0);
vldrwq_gather_base_s32(addr32, 4);
vldrwq_gather_base_s32(addr32, 2*4);
vldrwq_gather_base_s32(addr32, 125*4);
vldrwq_gather_base_s32(addr32, 126*4);
vldrwq_gather_base_s32(addr32, 127*4);
vldrwq_gather_base_s32(addr32, -4); // expected-error {{argument value -4 is outside the valid range [0, 508]}}
vldrwq_gather_base_s32(addr32, 128*4); // expected-error {{argument value 512 is outside the valid range [0, 508]}}
vldrwq_gather_base_s32(addr32, 2); // expected-error {{argument should be a multiple of 4}}
vldrwq_gather_base_s32(addr32, 1); // expected-error {{argument should be a multiple of 4}}
// Show that the polymorphic store intrinsics get the right set of
// error checks after overload resolution. These ones expand to the
// 8-byte granular versions...
vstrdq_scatter_base(addr64, 0, addr64);
vstrdq_scatter_base(addr64, 8, addr64);
vstrdq_scatter_base(addr64, 2*8, addr64);
vstrdq_scatter_base(addr64, 125*8, addr64);
vstrdq_scatter_base(addr64, 126*8, addr64);
vstrdq_scatter_base(addr64, 127*8, addr64);
vstrdq_scatter_base(addr64, -8, addr64); // expected-error {{argument value -8 is outside the valid range [0, 1016]}}
vstrdq_scatter_base(addr64, 128*8, addr64); // expected-error {{argument value 1024 is outside the valid range [0, 1016]}}
vstrdq_scatter_base(addr64, 4, addr64); // expected-error {{argument should be a multiple of 8}}
vstrdq_scatter_base(addr64, 1, addr64); // expected-error {{argument should be a multiple of 8}}
/// ... and these ones to the 4-byte.
vstrwq_scatter_base(addr32, 0, addr32);
vstrwq_scatter_base(addr32, 4, addr32);
vstrwq_scatter_base(addr32, 2*4, addr32);
vstrwq_scatter_base(addr32, 125*4, addr32);
vstrwq_scatter_base(addr32, 126*4, addr32);
vstrwq_scatter_base(addr32, 127*4, addr32);
vstrwq_scatter_base(addr32, -4, addr32); // expected-error {{argument value -4 is outside the valid range [0, 508]}}
vstrwq_scatter_base(addr32, 128*4, addr32); // expected-error {{argument value 512 is outside the valid range [0, 508]}}
vstrwq_scatter_base(addr32, 2, addr32); // expected-error {{argument should be a multiple of 4}}
vstrwq_scatter_base(addr32, 1, addr32); // expected-error {{argument should be a multiple of 4}}
}