llvm-project/clang/test/CodeGenCXX/aarch64-mangle-sve-fixed-ve...


Reland "[CodeGen][AArch64] Support arm_sve_vector_bits attribute" This relands D85743 with a fix for test CodeGen/attr-arm-sve-vector-bits-call.c that disables the new pass manager with '-fno-experimental-new-pass-manager'. Test was failing due to IR differences with the new pass manager which broke the Fuchsia builder [1]. Reverted in 2e7041f. [1] http://lab.llvm.org:8011/builders/fuchsia-x86_64-linux/builds/10375 Original summary: This patch implements codegen for the 'arm_sve_vector_bits' type attribute, defined by the Arm C Language Extensions (ACLE) for SVE [1]. The purpose of this attribute is to define vector-length-specific (VLS) versions of existing vector-length-agnostic (VLA) types. VLSTs are represented as VectorType in the AST and fixed-length vectors in the IR everywhere except in function args/return. Implemented in this patch is codegen support for the following: * Implicit casting between VLA <-> VLS types. * Coercion of VLS types in function args/return. * Mangling of VLS types. Casting is handled by the CK_BitCast operation, which has been extended to support the two new vector kinds for fixed-length SVE predicate and data vectors, where the cast is implemented through memory rather than a bitcast which is unsupported. Implementing this as a normal bitcast would require relaxing checks in LLVM to allow bitcasting between scalable and fixed types. Another option was adding target-specific intrinsics, although codegen support would need to be added for these intrinsics. Given this, casting through memory seemed like the best approach as it's supported today and existing optimisations may remove unnecessary loads/stores, although there is room for improvement here. Coercion of VLSTs in function args/return from fixed to scalable is implemented through the AArch64 ABI in TargetInfo. The VLA and VLS types are defined by the ACLE to map to the same machine-level SVE vectors. VLS types are mangled in the same way as: __SVE_VLS<typename, unsigned> where the first argument is the underlying variable-length type and the second argument is the SVE vector length in bits. For example: #if __ARM_FEATURE_SVE_BITS==512 // Mangled as 9__SVE_VLSIu11__SVInt32_tLj512EE typedef svint32_t vec __attribute__((arm_sve_vector_bits(512))); // Mangled as 9__SVE_VLSIu10__SVBool_tLj512EE typedef svbool_t pred __attribute__((arm_sve_vector_bits(512))); #endif The latest ACLE specification (00bet5) does not contain details of this mangling scheme, it will be specified in the next revision. The mangling scheme is otherwise defined in the appendices to the Procedure Call Standard for the Arm Architecture, see [2] for more information. [1] https://developer.arm.com/documentation/100987/latest [2] https://github.com/ARM-software/abi-aa/blob/master/aapcs64/aapcs64.rst#appendix-c-mangling Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D85743
2020-08-11 22:30:02 +08:00
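As a rough sketch of what the attribute enables (assuming a build with
'-triple aarch64-none-linux-gnu -target-feature +sve -msve-vector-bits=512';
the typedef and function names here are illustrative and not part of the test
below):

  #include <arm_sve.h>

  #if __ARM_FEATURE_SVE_BITS == 512
  // VLS counterpart of the sizeless VLA ACLE type: unlike svint32_t,
  // fixed_vec has a known size (64 bytes here) and can be used for
  // globals, struct members, sizeof, etc.
  typedef svint32_t fixed_vec __attribute__((arm_sve_vector_bits(512)));

  fixed_vec add_one(fixed_vec v) {
    svint32_t vla = v;                              // implicit VLS -> VLA cast
    svint32_t sum = svadd_n_s32_x(svptrue_b32(), vla, 1);
    return sum;                                     // implicit VLA -> VLS cast
  }
  #endif

Per the description above, the parameter and return value of add_one should be
coerced to the scalable type in the generated IR, while fixed_vec values
elsewhere remain fixed-length vectors. The lit test below only exercises the
mangling of such types at each supported vector width.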
// RUN: %clang_cc1 -triple aarch64-none-linux-gnu %s -emit-llvm -o - \
// RUN: -target-feature +sve -target-feature +bf16 -msve-vector-bits=128 \
// RUN: | FileCheck %s --check-prefix=CHECK-128
// RUN: %clang_cc1 -triple aarch64-none-linux-gnu %s -emit-llvm -o - \
// RUN: -target-feature +sve -target-feature +bf16 -msve-vector-bits=256 \
// RUN: | FileCheck %s --check-prefix=CHECK-256
// RUN: %clang_cc1 -triple aarch64-none-linux-gnu %s -emit-llvm -o - \
// RUN: -target-feature +sve -target-feature +bf16 -msve-vector-bits=512 \
// RUN: | FileCheck %s --check-prefix=CHECK-512
// RUN: %clang_cc1 -triple aarch64-none-linux-gnu %s -emit-llvm -o - \
// RUN: -target-feature +sve -target-feature +bf16 -msve-vector-bits=1024 \
// RUN: | FileCheck %s --check-prefix=CHECK-1024
// RUN: %clang_cc1 -triple aarch64-none-linux-gnu %s -emit-llvm -o - \
// RUN: -target-feature +sve -target-feature +bf16 -msve-vector-bits=2048 \
// RUN: | FileCheck %s --check-prefix=CHECK-2048
#define N __ARM_FEATURE_SVE_BITS
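// __ARM_FEATURE_SVE_BITS is predefined to the value passed to
// -msve-vector-bits, so N matches the width selected by each RUN line above
// and the Lj<bits>E literal expected in the manglings below.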
Reland "[CodeGen][AArch64] Support arm_sve_vector_bits attribute" This relands D85743 with a fix for test CodeGen/attr-arm-sve-vector-bits-call.c that disables the new pass manager with '-fno-experimental-new-pass-manager'. Test was failing due to IR differences with the new pass manager which broke the Fuchsia builder [1]. Reverted in 2e7041f. [1] http://lab.llvm.org:8011/builders/fuchsia-x86_64-linux/builds/10375 Original summary: This patch implements codegen for the 'arm_sve_vector_bits' type attribute, defined by the Arm C Language Extensions (ACLE) for SVE [1]. The purpose of this attribute is to define vector-length-specific (VLS) versions of existing vector-length-agnostic (VLA) types. VLSTs are represented as VectorType in the AST and fixed-length vectors in the IR everywhere except in function args/return. Implemented in this patch is codegen support for the following: * Implicit casting between VLA <-> VLS types. * Coercion of VLS types in function args/return. * Mangling of VLS types. Casting is handled by the CK_BitCast operation, which has been extended to support the two new vector kinds for fixed-length SVE predicate and data vectors, where the cast is implemented through memory rather than a bitcast which is unsupported. Implementing this as a normal bitcast would require relaxing checks in LLVM to allow bitcasting between scalable and fixed types. Another option was adding target-specific intrinsics, although codegen support would need to be added for these intrinsics. Given this, casting through memory seemed like the best approach as it's supported today and existing optimisations may remove unnecessary loads/stores, although there is room for improvement here. Coercion of VLSTs in function args/return from fixed to scalable is implemented through the AArch64 ABI in TargetInfo. The VLA and VLS types are defined by the ACLE to map to the same machine-level SVE vectors. VLS types are mangled in the same way as: __SVE_VLS<typename, unsigned> where the first argument is the underlying variable-length type and the second argument is the SVE vector length in bits. For example: #if __ARM_FEATURE_SVE_BITS==512 // Mangled as 9__SVE_VLSIu11__SVInt32_tLj512EE typedef svint32_t vec __attribute__((arm_sve_vector_bits(512))); // Mangled as 9__SVE_VLSIu10__SVBool_tLj512EE typedef svbool_t pred __attribute__((arm_sve_vector_bits(512))); #endif The latest ACLE specification (00bet5) does not contain details of this mangling scheme, it will be specified in the next revision. The mangling scheme is otherwise defined in the appendices to the Procedure Call Standard for the Arm Architecture, see [2] for more information. [1] https://developer.arm.com/documentation/100987/latest [2] https://github.com/ARM-software/abi-aa/blob/master/aapcs64/aapcs64.rst#appendix-c-mangling Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D85743
2020-08-11 22:30:02 +08:00
typedef __SVInt8_t fixed_int8_t __attribute__((arm_sve_vector_bits(N)));
typedef __SVInt16_t fixed_int16_t __attribute__((arm_sve_vector_bits(N)));
typedef __SVInt32_t fixed_int32_t __attribute__((arm_sve_vector_bits(N)));
typedef __SVInt64_t fixed_int64_t __attribute__((arm_sve_vector_bits(N)));
typedef __SVUint8_t fixed_uint8_t __attribute__((arm_sve_vector_bits(N)));
typedef __SVUint16_t fixed_uint16_t __attribute__((arm_sve_vector_bits(N)));
typedef __SVUint32_t fixed_uint32_t __attribute__((arm_sve_vector_bits(N)));
typedef __SVUint64_t fixed_uint64_t __attribute__((arm_sve_vector_bits(N)));
typedef __SVFloat16_t fixed_float16_t __attribute__((arm_sve_vector_bits(N)));
typedef __SVFloat32_t fixed_float32_t __attribute__((arm_sve_vector_bits(N)));
typedef __SVFloat64_t fixed_float64_t __attribute__((arm_sve_vector_bits(N)));
typedef __SVBFloat16_t fixed_bfloat16_t __attribute__((arm_sve_vector_bits(N)));
typedef __SVBool_t fixed_bool_t __attribute__((arm_sve_vector_bits(N)));
template <typename T> struct S {};
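// Each VLS parameter type is mangled as the template-id
// __SVE_VLS<builtin VLA type, vector length in bits>; e.g. for N == 128,
// _Z2f11SI9__SVE_VLSIu10__SVInt8_tLj128EEE demangles to
// f1(S<__SVE_VLS<__SVInt8_t, 128u>>).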
// CHECK-128: _Z2f11SI9__SVE_VLSIu10__SVInt8_tLj128EEE
// CHECK-256: _Z2f11SI9__SVE_VLSIu10__SVInt8_tLj256EEE
// CHECK-512: _Z2f11SI9__SVE_VLSIu10__SVInt8_tLj512EEE
// CHECK-1024: _Z2f11SI9__SVE_VLSIu10__SVInt8_tLj1024EEE
// CHECK-2048: _Z2f11SI9__SVE_VLSIu10__SVInt8_tLj2048EEE
void f1(S<fixed_int8_t>) {}
// CHECK-128: _Z2f21SI9__SVE_VLSIu11__SVInt16_tLj128EEE
// CHECK-256: _Z2f21SI9__SVE_VLSIu11__SVInt16_tLj256EEE
// CHECK-512: _Z2f21SI9__SVE_VLSIu11__SVInt16_tLj512EEE
// CHECK-1024: _Z2f21SI9__SVE_VLSIu11__SVInt16_tLj1024EEE
// CHECK-2048: _Z2f21SI9__SVE_VLSIu11__SVInt16_tLj2048EEE
void f2(S<fixed_int16_t>) {}
// CHECK-128: _Z2f31SI9__SVE_VLSIu11__SVInt32_tLj128EEE
// CHECK-256: _Z2f31SI9__SVE_VLSIu11__SVInt32_tLj256EEE
// CHECK-512: _Z2f31SI9__SVE_VLSIu11__SVInt32_tLj512EEE
// CHECK-1024: _Z2f31SI9__SVE_VLSIu11__SVInt32_tLj1024EEE
// CHECK-2048: _Z2f31SI9__SVE_VLSIu11__SVInt32_tLj2048EEE
void f3(S<fixed_int32_t>) {}
// CHECK-128: _Z2f41SI9__SVE_VLSIu11__SVInt64_tLj128EEE
// CHECK-256: _Z2f41SI9__SVE_VLSIu11__SVInt64_tLj256EEE
// CHECK-512: _Z2f41SI9__SVE_VLSIu11__SVInt64_tLj512EEE
// CHECK-1024: _Z2f41SI9__SVE_VLSIu11__SVInt64_tLj1024EEE
// CHECK-2048: _Z2f41SI9__SVE_VLSIu11__SVInt64_tLj2048EEE
void f4(S<fixed_int64_t>) {}
// CHECK-128: _Z2f51SI9__SVE_VLSIu11__SVUint8_tLj128EEE
// CHECK-256: _Z2f51SI9__SVE_VLSIu11__SVUint8_tLj256EEE
// CHECK-512: _Z2f51SI9__SVE_VLSIu11__SVUint8_tLj512EEE
// CHECK-1024: _Z2f51SI9__SVE_VLSIu11__SVUint8_tLj1024EEE
// CHECK-2048: _Z2f51SI9__SVE_VLSIu11__SVUint8_tLj2048EEE
void f5(S<fixed_uint8_t>) {}
// CHECK-128: _Z2f61SI9__SVE_VLSIu12__SVUint16_tLj128EEE
// CHECK-256: _Z2f61SI9__SVE_VLSIu12__SVUint16_tLj256EEE
// CHECK-512: _Z2f61SI9__SVE_VLSIu12__SVUint16_tLj512EEE
// CHECK-1024: _Z2f61SI9__SVE_VLSIu12__SVUint16_tLj1024EEE
// CHECK-2048: _Z2f61SI9__SVE_VLSIu12__SVUint16_tLj2048EEE
void f6(S<fixed_uint16_t>) {}
// CHECK-128: _Z2f71SI9__SVE_VLSIu12__SVUint32_tLj128EEE
// CHECK-256: _Z2f71SI9__SVE_VLSIu12__SVUint32_tLj256EEE
// CHECK-512: _Z2f71SI9__SVE_VLSIu12__SVUint32_tLj512EEE
// CHECK-1024: _Z2f71SI9__SVE_VLSIu12__SVUint32_tLj1024EEE
// CHECK-2048: _Z2f71SI9__SVE_VLSIu12__SVUint32_tLj2048EEE
void f7(S<fixed_uint32_t>) {}
// CHECK-128: _Z2f81SI9__SVE_VLSIu12__SVUint64_tLj128EEE
// CHECK-256: _Z2f81SI9__SVE_VLSIu12__SVUint64_tLj256EEE
// CHECK-512: _Z2f81SI9__SVE_VLSIu12__SVUint64_tLj512EEE
// CHECK-1024: _Z2f81SI9__SVE_VLSIu12__SVUint64_tLj1024EEE
// CHECK-2048: _Z2f81SI9__SVE_VLSIu12__SVUint64_tLj2048EEE
void f8(S<fixed_uint64_t>) {}
// CHECK-128: _Z2f91SI9__SVE_VLSIu13__SVFloat16_tLj128EEE
// CHECK-256: _Z2f91SI9__SVE_VLSIu13__SVFloat16_tLj256EEE
// CHECK-512: _Z2f91SI9__SVE_VLSIu13__SVFloat16_tLj512EEE
// CHECK-1024: _Z2f91SI9__SVE_VLSIu13__SVFloat16_tLj1024EEE
// CHECK-2048: _Z2f91SI9__SVE_VLSIu13__SVFloat16_tLj2048EEE
void f9(S<fixed_float16_t>) {}
// CHECK-128: _Z3f101SI9__SVE_VLSIu13__SVFloat32_tLj128EEE
// CHECK-256: _Z3f101SI9__SVE_VLSIu13__SVFloat32_tLj256EEE
// CHECK-512: _Z3f101SI9__SVE_VLSIu13__SVFloat32_tLj512EEE
// CHECK-1024: _Z3f101SI9__SVE_VLSIu13__SVFloat32_tLj1024EEE
// CHECK-2048: _Z3f101SI9__SVE_VLSIu13__SVFloat32_tLj2048EEE
void f10(S<fixed_float32_t>) {}
// CHECK-128: _Z3f111SI9__SVE_VLSIu13__SVFloat64_tLj128EEE
// CHECK-256: _Z3f111SI9__SVE_VLSIu13__SVFloat64_tLj256EEE
// CHECK-512: _Z3f111SI9__SVE_VLSIu13__SVFloat64_tLj512EEE
// CHECK-1024: _Z3f111SI9__SVE_VLSIu13__SVFloat64_tLj1024EEE
// CHECK-2048: _Z3f111SI9__SVE_VLSIu13__SVFloat64_tLj2048EEE
void f11(S<fixed_float64_t>) {}
// CHECK-128: _Z3f121SI9__SVE_VLSIu14__SVBfloat16_tLj128EEE
// CHECK-256: _Z3f121SI9__SVE_VLSIu14__SVBfloat16_tLj256EEE
// CHECK-512: _Z3f121SI9__SVE_VLSIu14__SVBfloat16_tLj512EEE
// CHECK-1024: _Z3f121SI9__SVE_VLSIu14__SVBfloat16_tLj1024EEE
// CHECK-2048: _Z3f121SI9__SVE_VLSIu14__SVBfloat16_tLj2048EEE
void f12(S<fixed_bfloat16_t>) {}
// CHECK-128: _Z3f131SI9__SVE_VLSIu10__SVBool_tLj128EEE
// CHECK-256: _Z3f131SI9__SVE_VLSIu10__SVBool_tLj256EEE
// CHECK-512: _Z3f131SI9__SVE_VLSIu10__SVBool_tLj512EEE
// CHECK-1024: _Z3f131SI9__SVE_VLSIu10__SVBool_tLj1024EEE
// CHECK-2048: _Z3f131SI9__SVE_VLSIu10__SVBool_tLj2048EEE
void f13(S<fixed_bool_t>) {}