Commit Graph

14 Commits

Author SHA1 Message Date
Bradley Smith e57e1e4e00 [clang][AArch64][SVE] Avoid going through memory for fixed/scalable predicate casts
For fixed SVE types, predicates are represented using vectors of i8,
whereas for scalable types they are represented using vectors of i1. We
can avoid going through memory for casts between these by bitcasting the
i1 scalable vectors to/from a scalable i8 vector of matching size, which
can then use the existing vector insert/extract logic.
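
As a rough sketch (assuming -msve-vector-bits=512, where the fixed
predicate is an <8 x i8> vector and the scalable predicate is
<vscale x 16 x i1>; exact types are illustrative), the fixed-to-scalable
direction becomes:

```
; Sketch, not the exact generated IR: widen the fixed i8 predicate with
; vector.insert, then bitcast the scalable i8 vector to the i1 predicate
; type of the same bit width (vscale x 16 bits).
define <vscale x 16 x i1> @cast_to_scalable(<8 x i8> %p) {
  %ins = call <vscale x 2 x i8> @llvm.experimental.vector.insert.nxv2i8.v8i8(<vscale x 2 x i8> undef, <8 x i8> %p, i64 0)
  %bc = bitcast <vscale x 2 x i8> %ins to <vscale x 16 x i1>
  ret <vscale x 16 x i1> %bc
}
declare <vscale x 2 x i8> @llvm.experimental.vector.insert.nxv2i8.v8i8(<vscale x 2 x i8>, <8 x i8>, i64)
```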

Differential Revision: https://reviews.llvm.org/D106860
2021-08-04 16:10:37 +00:00
Eli Friedman bdd55b2f18 Fix the default alignment of i1 vectors.
Currently, the default alignment is much larger than the actual size of
the vector in memory.  Fix this to use a sane default.
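
For illustration, a <32 x i1> occupies only 4 bytes in memory, so a
matching small alignment is all that is needed (the exact post-patch
default is assumed here):

```
; sketch: <32 x i1> has a 4-byte store size, so a 4-byte alignment
; (rather than the old, much larger default) is sufficient
define void @spill_mask(<32 x i1> %m, <32 x i1>* %p) {
  store <32 x i1> %m, <32 x i1>* %p, align 4
  ret void
}
```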

For SVE, temporarily remove lowering of load/store operations for
predicates with fewer than 16 elements. The layout the backend was
assuming for SVE predicates with fewer than 16 elements doesn't agree
with the frontend's. More work probably needs to be done here.

This change is, strictly speaking, not backwards-compatible at the
bitcode level. But probably nobody is actually depending on that; i1
vectors in memory are rare, and the code that does use them probably
ends up forcing the alignment to something sane anyway.  If we think
this is a concern, I can restrict this to scalable vectors for now
(where it's actually causing issues for me at the moment).

Differential Revision: https://reviews.llvm.org/D88994
2021-07-31 14:09:59 -07:00
Joe Ellis 2ed7db0d20 [InstSimplify] Remove redundant {insert,extract}_vector intrinsic chains
This commit removes some redundant {insert,extract}_vector intrinsic
chains by implementing the following patterns as instsimplifies:

   (insert_vector _, (extract_vector X, 0), 0) -> X
   (extract_vector (insert_vector _, X, 0), 0) -> X
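
As a minimal sketch of the first pattern (types chosen for
illustration), a round-trip of the low fixed-width subvector now folds
back to the original value:

```
define <vscale x 4 x i32> @roundtrip(<vscale x 4 x i32> %x) {
  %lo = call <4 x i32> @llvm.experimental.vector.extract.v4i32.nxv4i32(<vscale x 4 x i32> %x, i64 0)
  %ins = call <vscale x 4 x i32> @llvm.experimental.vector.insert.nxv4i32.v4i32(<vscale x 4 x i32> undef, <4 x i32> %lo, i64 0)
  ; instsimplify now folds %ins to %x
  ret <vscale x 4 x i32> %ins
}
declare <4 x i32> @llvm.experimental.vector.extract.v4i32.nxv4i32(<vscale x 4 x i32>, i64)
declare <vscale x 4 x i32> @llvm.experimental.vector.insert.nxv4i32.v4i32(<vscale x 4 x i32>, <4 x i32>, i64)
```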

Reviewed By: peterwaller-arm

Differential Revision: https://reviews.llvm.org/D101986
2021-05-13 16:09:50 +00:00
Joe Ellis 8ea72b3887 [clang][AArch64][SVE] Avoid going through memory for coerced VLST return values
VLST return values are coerced to VLATs in the function epilog for
consistency with the VLAT ABI. Previously, this coercion was done
through memory. It is preferable to use the
llvm.experimental.vector.insert intrinsic to avoid going through memory
here.
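
A sketch of the coerced return, with types assumed for
-msve-vector-bits=512 (VLST <16 x i32>, VLAT <vscale x 4 x i32>):

```
; epilog: widen the VLST result into the VLAT return type directly,
; instead of storing it to a stack slot and reloading it as a scalable vector
define <vscale x 4 x i32> @ret_vlst(<16 x i32> %result) {
  %coerced = call <vscale x 4 x i32> @llvm.experimental.vector.insert.nxv4i32.v16i32(<vscale x 4 x i32> undef, <16 x i32> %result, i64 0)
  ret <vscale x 4 x i32> %coerced
}
declare <vscale x 4 x i32> @llvm.experimental.vector.insert.nxv4i32.v16i32(<vscale x 4 x i32>, <16 x i32>, i64)
```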

Reviewed By: c-rhodes

Differential Revision: https://reviews.llvm.org/D94290
2021-01-11 12:10:59 +00:00
Joe Ellis 3d5b18a3fd [clang][AArch64][SVE] Avoid going through memory for coerced VLST arguments
VLST arguments are coerced to VLATs at the function boundary for
consistency with the VLAT ABI. They are then bitcast back to VLSTs in
the function prolog. Previously, this conversion was done through memory.
With the introduction of the llvm.vector.{insert,extract} intrinsics, we
can avoid going through memory here.
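
A sketch of the prolog side, with the same assumed types as above (the
VLAT argument is narrowed back to the VLST with vector.extract):

```
define <16 x i32> @take_vlst(<vscale x 4 x i32> %coerced) {
  ; recover the fixed-length value from the coerced scalable argument
  %vlst = call <16 x i32> @llvm.experimental.vector.extract.v16i32.nxv4i32(<vscale x 4 x i32> %coerced, i64 0)
  ret <16 x i32> %vlst
}
declare <16 x i32> @llvm.experimental.vector.extract.v16i32.nxv4i32(<vscale x 4 x i32>, i64)
```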

Depends on D92761

Differential Revision: https://reviews.llvm.org/D92762
2021-01-05 15:18:21 +00:00
Joe Ellis dad07baf12 [clang][AArch64][SVE] Avoid going through memory for VLAT <-> VLST casts
This change makes use of the llvm.vector.extract intrinsic to avoid
going through memory when performing bitcasts between vector-length
agnostic types and vector-length specific types.

Depends on D91362

Reviewed By: c-rhodes

Differential Revision: https://reviews.llvm.org/D92761
2020-12-16 12:24:32 +00:00
Roman Lebedev e00f189d39 [InstCombine] Revert rL226781 "Teach InstCombine to canonicalize loads which are only ever stored to always use a legal integer type if one is available." (PR47592)
(it was introduced in https://lists.llvm.org/pipermail/llvm-dev/2015-January/080956.html)

This canonicalization seems dubious.

Most importantly, while it does not create `inttoptr` casts by itself,
it may cause them to appear later, see e.g. D88788.
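
For reference, a sketch of the canonicalization being reverted
(float/i32 chosen for illustration): a load whose only use is a store
was rewritten to move the bytes as a legal integer type.

```
define void @copy(float* %src, float* %dst) {
  %v = load float, float* %src
  store float %v, float* %dst
  ret void
}
; the reverted fold rewrote the above into an integer load/store:
define void @copy.canonicalized(float* %src, float* %dst) {
  %src.i = bitcast float* %src to i32*
  %dst.i = bitcast float* %dst to i32*
  %v = load i32, i32* %src.i
  store i32 %v, i32* %dst.i
  ret void
}
```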

I think it's pretty obvious that this is an undesirable outcome:
by now we've established that seemingly no-op `inttoptr`/`ptrtoint` casts
are not actually no-ops, and we are no longer eager to look past them.
This means, for example, that given
```
%a = load i32, i32* %ptr
%b = inttoptr i32 %a to i8*
%c = inttoptr i32 %a to i8*
```
we likely won't be able to tell that `%b` and `%c` are the same thing.

As we can see in D88789 / D88788 / D88806 / D75505,
we can't really teach SCEV about this (not without
https://bugs.llvm.org/show_bug.cgi?id=47592 being addressed, at least),
and we can't recover the situation post-inlining in instcombine.

So it really does look like this fold is actively breaking
otherwise-good IR, in a way that is not recoverable.
And that means this fold isn't helpful in exposing these patterns
to passes that are otherwise unaware of them.

Thus, I propose to simply not perform such a canonicalization.
The original motivational RFC does not state what larger problem
that canonicalization was trying to solve, so I'm not sure
how this plays out in the larger picture.

On vanilla llvm test-suite + RawSpeed, this results in an
increase of asm instructions and final object size by ~+0.05%;
it decreases the final count of bitcasts by -4.79% (-28990),
of ptrtoint casts by -15.41% (-3423),
and of inttoptr casts by -25.59% (-6919, *sic*).
Overall, there are -0.04% fewer IR blocks and -0.39% fewer instructions.

See https://bugs.llvm.org/show_bug.cgi?id=47592

Differential Revision: https://reviews.llvm.org/D88789
2020-10-06 00:00:30 +03:00
Roman Lebedev aaae13d0c2 [NFC][clang][codegen] Autogenerate a few ARM SVE tests that are being affected by an upcoming patch 2020-10-04 19:54:09 +03:00
Cullen Rhodes 9218f92838 [clang][aarch64] ACLE: Support implicit casts between GNU and SVE vectors
This patch adds support for implicit casting between GNU vectors and SVE
vectors when `__ARM_FEATURE_SVE_BITS==N`, as defined by the Arm C
Language Extensions (ACLE, version 00bet5, section 3.7.3.3) for SVE [1].

This behavior makes it possible to use GNU vectors with ACLE functions
that operate on VLAT. For example:

  typedef int8_t vec __attribute__((vector_size(32)));
  vec f(vec x) { return svasrd_x(svptrue_b8(), x, 1); }

Tests are also added for implicit casting between GNU and fixed-length
SVE vectors created by the 'arm_sve_vector_bits' attribute. This
behavior makes it possible to use VLST with existing interfaces that
operate on GNUT. For example:

  typedef int8_t vec1 __attribute__((vector_size(32)));
  void f(vec1);
  #if __ARM_FEATURE_SVE_BITS==256 && __ARM_FEATURE_SVE_VECTOR_OPERATORS
  typedef svint8_t vec2 __attribute__((arm_sve_vector_bits(256)));
  void g(vec2 x) { f(x); } // OK
  #endif

The `__ARM_FEATURE_SVE_VECTOR_OPERATORS` feature macro indicates
interoperability with the GNU vector extension. This is the first patch
providing support for this feature, which once complete will be enabled
by the `-msve-vector-bits` flag, as the `__ARM_FEATURE_SVE_BITS` feature
currently is.

[1] https://developer.arm.com/documentation/100987/latest

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D87607
2020-09-17 09:35:30 +00:00
David Green ab2ed8bce9 [SVE] Regenerate sve vector bits tests. NFC 2020-09-11 18:51:57 +01:00
Cullen Rhodes f9091e56d3 [clang][aarch64] Drop experimental from __ARM_FEATURE_SVE_BITS macro
The __ARM_FEATURE_SVE_BITS feature macro is specified in the Arm C
Language Extensions (ACLE) for SVE [1] (version 00bet5). From the spec,
where __ARM_FEATURE_SVE_BITS==N:

    When N is nonzero, indicates that the implementation is generating
    code for an N-bit SVE target and that the arm_sve_vector_bits(N)
    attribute is available.

This was defined in D83550 as __ARM_FEATURE_SVE_BITS_EXPERIMENTAL and
enabled under the -msve-vector-bits flag to simplify initial tests.
This patch drops _EXPERIMENTAL now that the feature is supported.

[1] https://developer.arm.com/documentation/100987/latest

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D86720
2020-09-03 09:39:37 +00:00
Cullen Rhodes 2ddf795e8c Reland "[CodeGen][AArch64] Support arm_sve_vector_bits attribute"
This relands D85743 with a fix for test
CodeGen/attr-arm-sve-vector-bits-call.c that disables the new pass
manager with '-fno-experimental-new-pass-manager'. The test was failing
due to IR differences under the new pass manager, which broke the Fuchsia
builder [1]. Reverted in 2e7041f.

[1] http://lab.llvm.org:8011/builders/fuchsia-x86_64-linux/builds/10375

Original summary:

This patch implements codegen for the 'arm_sve_vector_bits' type
attribute, defined by the Arm C Language Extensions (ACLE) for SVE [1].
The purpose of this attribute is to define vector-length-specific (VLS)
versions of existing vector-length-agnostic (VLA) types.

VLSTs are represented as VectorType in the AST and fixed-length vectors
in the IR everywhere except in function args/return. Implemented in this
patch is codegen support for the following:

  * Implicit casting between VLA <-> VLS types.
  * Coercion of VLS types in function args/return.
  * Mangling of VLS types.

Casting is handled by the CK_BitCast operation, which has been extended
to support the two new vector kinds for fixed-length SVE predicate and
data vectors, where the cast is implemented through memory rather than a
bitcast, which is unsupported. Implementing this as a normal bitcast
would require relaxing checks in LLVM to allow bitcasting between
scalable and fixed types. Another option was adding target-specific
intrinsics, although codegen support would need to be added for these
intrinsics. Given this, casting through memory seemed like the best
approach as it's supported today and existing optimisations may remove
unnecessary loads/stores, although there is room for improvement here.
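
A sketch of such a through-memory cast (types assumed for
-msve-vector-bits=512, where vscale is known to be 4 so the scalable
load covers exactly the stored bytes):

```
; VLS -> VLA: store the fixed vector to a stack slot, then reload it as
; a scalable vector through a bitcast pointer
define <vscale x 4 x i32> @vls_to_vla(<16 x i32> %vls) {
  %tmp = alloca <16 x i32>, align 16
  store <16 x i32> %vls, <16 x i32>* %tmp
  %cast = bitcast <16 x i32>* %tmp to <vscale x 4 x i32>*
  %vla = load <vscale x 4 x i32>, <vscale x 4 x i32>* %cast
  ret <vscale x 4 x i32> %vla
}
```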

Coercion of VLSTs in function args/return from fixed to scalable is
implemented through the AArch64 ABI in TargetInfo.

The VLA and VLS types are defined by the ACLE to map to the same
machine-level SVE vectors. VLS types are mangled in the same way as the template:

  __SVE_VLS<typename, unsigned>

where the first argument is the underlying variable-length type and the
second argument is the SVE vector length in bits. For example:

  #if __ARM_FEATURE_SVE_BITS==512
  // Mangled as 9__SVE_VLSIu11__SVInt32_tLj512EE
  typedef svint32_t vec __attribute__((arm_sve_vector_bits(512)));
  // Mangled as 9__SVE_VLSIu10__SVBool_tLj512EE
  typedef svbool_t pred __attribute__((arm_sve_vector_bits(512)));
  #endif

The latest ACLE specification (00bet5) does not contain details of this
mangling scheme; it will be specified in the next revision. The
mangling scheme is otherwise defined in the appendices to the Procedure
Call Standard for the Arm Architecture, see [2] for more information.

[1] https://developer.arm.com/documentation/100987/latest
[2] https://github.com/ARM-software/abi-aa/blob/master/aapcs64/aapcs64.rst#appendix-c-mangling

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D85743
2020-08-28 15:57:09 +00:00
Cullen Rhodes 2e7041fdc2 Revert "[CodeGen][AArch64] Support arm_sve_vector_bits attribute"
Test CodeGen/attr-arm-sve-vector-bits-call.c is failing on some builders
[1][2]. Reverting whilst I investigate.

[1] http://lab.llvm.org:8011/builders/fuchsia-x86_64-linux/builds/10375
[2] https://luci-milo.appspot.com/p/fuchsia/builders/ci/clang-linux-x64/b8870800848452818112

This reverts commit 42587345a3.
2020-08-27 21:31:05 +00:00
Cullen Rhodes 42587345a3 [CodeGen][AArch64] Support arm_sve_vector_bits attribute
This patch implements codegen for the 'arm_sve_vector_bits' type
attribute, defined by the Arm C Language Extensions (ACLE) for SVE [1].
The purpose of this attribute is to define vector-length-specific (VLS)
versions of existing vector-length-agnostic (VLA) types.

VLSTs are represented as VectorType in the AST and fixed-length vectors
in the IR everywhere except in function args/return. Implemented in this
patch is codegen support for the following:

  * Implicit casting between VLA <-> VLS types.
  * Coercion of VLS types in function args/return.
  * Mangling of VLS types.

Casting is handled by the CK_BitCast operation, which has been extended
to support the two new vector kinds for fixed-length SVE predicate and
data vectors, where the cast is implemented through memory rather than a
bitcast, which is unsupported. Implementing this as a normal bitcast
would require relaxing checks in LLVM to allow bitcasting between
scalable and fixed types. Another option was adding target-specific
intrinsics, although codegen support would need to be added for these
intrinsics. Given this, casting through memory seemed like the best
approach as it's supported today and existing optimisations may remove
unnecessary loads/stores, although there is room for improvement here.

Coercion of VLSTs in function args/return from fixed to scalable is
implemented through the AArch64 ABI in TargetInfo.

The VLA and VLS types are defined by the ACLE to map to the same
machine-level SVE vectors. VLS types are mangled in the same way as the template:

  __SVE_VLS<typename, unsigned>

where the first argument is the underlying variable-length type and the
second argument is the SVE vector length in bits. For example:

  #if __ARM_FEATURE_SVE_BITS==512
  // Mangled as 9__SVE_VLSIu11__SVInt32_tLj512EE
  typedef svint32_t vec __attribute__((arm_sve_vector_bits(512)));
  // Mangled as 9__SVE_VLSIu10__SVBool_tLj512EE
  typedef svbool_t pred __attribute__((arm_sve_vector_bits(512)));
  #endif

The latest ACLE specification (00bet5) does not contain details of this
mangling scheme; it will be specified in the next revision. The
mangling scheme is otherwise defined in the appendices to the Procedure
Call Standard for the Arm Architecture, see [2] for more information.

[1] https://developer.arm.com/documentation/100987/latest
[2] https://github.com/ARM-software/abi-aa/blob/master/aapcs64/aapcs64.rst#appendix-c-mangling

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D85743
2020-08-27 15:11:58 +00:00