llvm-project

Commit Graph

Author	SHA1	Message	Date
Craig Topper	4252f7773a	[SelectionDAG][ARM][AArch64][Hexagon][RISCV][X86] Add SDNPCommutative to fma and fmad nodes in tablegen. Remove explicit commuted patterns from targets. X86 was already specially marking fma as commutable which allowed tablegen to autogenerate commuted patterns. This moves it to the target independent definition and fix up the targets to remove now unneeded patterns. Unfortunately, the tests change because the commuted version of the patterns are generating operands in a different than the explicit patterns. Differential Revision: https://reviews.llvm.org/D91842	2020-11-23 10:09:20 -08:00
Sebastian Neubauer	2a6c871596	[InstCombine] Move target-specific inst combining For a long time, the InstCombine pass handled target specific intrinsics. Having target specific code in general passes was noted as an area for improvement for a long time. D81728 moves most target specific code out of the InstCombine pass. Applying the target specific combinations in an extra pass would probably result in inferior optimizations compared to the current fixed-point iteration, therefore the InstCombine pass resorts to newly introduced functions in the TargetTransformInfo when it encounters unknown intrinsics. The patch should not have any effect on generated code (under the assumption that code never uses intrinsics from a foreign target). This introduces three new functions: TargetTransformInfo::instCombineIntrinsic TargetTransformInfo::simplifyDemandedUseBitsIntrinsic TargetTransformInfo::simplifyDemandedVectorEltsIntrinsic A few target specific parts are left in the InstCombine folder, where it makes sense to share code. The largest left-over part in InstCombineCalls.cpp is the code shared between arm and aarch64. This allows to move about 3000 lines out from InstCombine to the targets. Differential Revision: https://reviews.llvm.org/D81728	2020-07-22 15:59:49 +02:00
David Green	fa15255d8a	[ARM] Convert floating point splats to integer Under MVE a vdup will always take a gpr register, not a floating point value. During DAG combine we convert the types to a bitcast to an integer in an attempt to fold the bitcast into other instructions. This is OK, but only works inside the same basic block. To do the same trick across a basic block boundary we need to convert the type in codegenprepare, before the splat is sunk into the loop. This adds a convertSplatType function to codegenprepare to do that, putting bitcasts around the splat to force the type to an integer. There is then some adjustment to the code in shouldSinkOperands to handle the extra bitcasts. Differential Revision: https://reviews.llvm.org/D78728	2020-05-13 15:24:16 +01:00
David Green	1084b32339	[ARM] Always replace FP16 bitcasts with VMOVhr or VMOVrh This changes the logic with lowering fp16 bitcasts to always produce either a VMOVhr or a VMOVrh, instead of only trying to do it with certain surrounding nodes. To perform the same optimisations demand bits and known bits information has been added for them. Differential Revision: https://reviews.llvm.org/D78587	2020-04-28 16:12:53 +01:00
David Green	eecba95067	[ARM] Replace arm vendor with none. NFC	2020-04-22 18:19:35 +01:00
David Green	a0b1616359	[ARM] Regenerate tests. NFC	2020-04-19 13:45:39 +01:00
Simon Tatham	8f1651ccea	[ARM,MVE] Add missing tests for vqdmlash intrinsics. Summary: These were accidentally left out of D76123. I added tests for the other three instructions in this small cross-product family (vqdmlah, vqrdmlah, vqrdmlash) but missed this one. Reviewers: miyuki Reviewed By: miyuki Subscribers: kristof.beyls, dmgreen, cfe-commits Tags: #clang Differential Revision: https://reviews.llvm.org/D76714	2020-03-25 09:46:16 +00:00
Simon Tatham	1adfa4c991	[ARM,MVE] Add ACLE intrinsics for the vaddv/vaddlv family. Summary: I've implemented them as target-specific IR intrinsics rather than using `@llvm.experimental.vector.reduce.add`, on the grounds that the 'experimental' intrinsic doesn't currently have much code generation benefit, and my replacements encapsulate the sign- or zero-extension so that you don't expose the illegal MVE vector type (`<4 x i64>`) in IR. The machine instructions come in two versions: with and without an input accumulator. My new IR intrinsics, like the 'experimental' one, don't take an accumulator parameter: we represent that by just adding on the input value using an ordinary i32 or i64 add. So if you write the `vaddvaq` C-language intrinsic with an input accumulator of zero, it can be optimised to VADDV, and conversely, if you write something like `x += vaddvq(y)` then that can be combined into VADDVA. Most of this is achieved in isel lowering, by converting these IR intrinsics into the existing `ARMISD::VADDV` family of custom SDNode types. For the difficult case (64-bit accumulators), isel lowering already implements the optimization of folding an addition into a VADDLV to make a VADDLVA; so once we've made a VADDLV, our job is already done, except that I had to introduce a parallel set of ARMISD nodes for the //predicated// forms of VADDLV. For the simpler VADDV, we handle the predicated form by just leaving the IR intrinsic alone and matching it in an ordinary dag pattern. Reviewers: dmgreen, MarkMurrayARM, miyuki, ostannard Reviewed By: dmgreen Subscribers: kristof.beyls, hiraditya, danielkiss, cfe-commits Tags: #clang Differential Revision: https://reviews.llvm.org/D76491	2020-03-20 15:42:33 +00:00
Simon Tatham	45a9945b9e	[ARM,MVE] Add ACLE intrinsics for the vminv/vmaxv family. Summary: I've implemented these as target-specific IR intrinsics, because they're not //quite// enough like @llvm.experimental.vector.reduce.min (which doesn't take the extra scalar parameter). Also this keeps the predicated and unpredicated versions looking similar, and the floating-point minnm/maxnm versions fold into the same schema. We had a couple of min/max reductions already implemented, from the initial pathfinding exercise in D67158. Those were done by having separate IR intrinsic names for the signed and unsigned integer versions; as part of this commit, I've changed them to use a flag parameter indicating signedness, which is how we ended up deciding that the rest of the MVE intrinsics family ought to work. So now hopefully the ewhole lot is consistent. In the new llc test, the output code from the `v8f16` test functions looks quite unpleasant, but most of it is PCS lowering (you can't pass a `half` directly in or out of a function). In other circumstances, where you do something else with your `half` in the same function, it doesn't look nearly as nasty. Reviewers: dmgreen, MarkMurrayARM, miyuki, ostannard Reviewed By: MarkMurrayARM Subscribers: kristof.beyls, hiraditya, cfe-commits Tags: #clang Differential Revision: https://reviews.llvm.org/D76490	2020-03-20 15:42:33 +00:00
David Green	b3499f572d	[ARM] Change VDUP type to i32 for MVE The MVE VDUP instruction take a GPR and splats into every lane of a vector register. Unlike NEON we do not have a VDUPLANE equivalent instruction, doing the same splat from a fp register. Previously a VDUP to a v4f32/v8f16 would be represented as a (v4f32 VDUP f32), which would mean the instruction pattern needs to add a COPY_TO_REGCLASS to the GPR. Instead this now converts that earlier during an ISel DAG combine, converting (VDUP x) to (VDUP (bitcast x)). This can allow instruction selection to tell that the input needs to be an i32, which in one of the testcases allows it to use ldr (or specifically ldm) over (vldr;vmov). Whilst being simple enough for floats, as the types sizes are the same, these is no BITCAST equivalent for getting a half into a i32. This uses a VMOVrh ARMISD node, which doesn't know the same tricks yet. Differential Revision: https://reviews.llvm.org/D76292	2020-03-20 09:48:45 +00:00
Simon Tatham	e13d153c1b	[ARM,MVE] Add intrinsics for the VQDMLAD family. Summary: This is another set of instructions too complicated to be sensibly expressed in IR by anything short of a target-specific intrinsic. Given input vectors a,b, the instruction generates intermediate values 2(a[0]b[0]+a[1]+b[1]), 2(a[2]b[2]+a[3]+b[3]), etc; takes the high half of each double-width values, and overwrites half the lanes in the output vector c, which you therefore have to provide the input value of. Optionally you can swap the elements of b so that the are things like a[0]b[1]+a[1]b[0]; optionally you can round to nearest when taking the high half; and optionally you can take the difference rather than sum of the two products. Finally, saturation is applied when converting back to a single-width vector lane. Reviewers: dmgreen, MarkMurrayARM, miyuki, ostannard Reviewed By: miyuki Subscribers: kristof.beyls, hiraditya, cfe-commits Tags: #clang Differential Revision: https://reviews.llvm.org/D76359	2020-03-18 17:11:22 +00:00
Simon Tatham	928776de92	[ARM,MVE] Add intrinsics for the VQDMLAH family. Summary: These are complicated integer multiply+add instructions with extra saturation, taking the high half of a double-width product, and optional rounding. There's no sensible way to represent that in standard IR, so I've converted the clang builtins directly to target-specific intrinsics. Reviewers: dmgreen, MarkMurrayARM, miyuki, ostannard Reviewed By: miyuki Subscribers: kristof.beyls, hiraditya, cfe-commits Tags: #clang Differential Revision: https://reviews.llvm.org/D76123	2020-03-18 10:55:04 +00:00
Simon Tatham	28c5d97bee	[ARM,MVE] Add intrinsics and isel for MVE integer VMLA. Summary: These instructions compute multiply+add in integers, with one of the operands being a splat of a scalar. (VMLA and VMLAS differ in whether the splat operand is a multiplier or the addend.) I've represented these in IR using existing standard IR operations for the unpredicated forms. The predicated forms are done with target- specific intrinsics, as usual. When operating on n-bit vector lanes, only the bottom n bits of the i32 scalar operand are used. So we have to tell that to isel lowering, to allow it to remove a pointless sign- or zero-extension instruction on that input register. That's done in `PerformIntrinsicCombine`, but first I had to enable `PerformIntrinsicCombine` for MVE targets (previously all the intrinsics it handled were for NEON), and make it a method of `ARMTargetLowering` so that it can get at `SimplifyDemandedBits`. Reviewers: dmgreen, MarkMurrayARM, miyuki, ostannard Reviewed By: dmgreen Subscribers: kristof.beyls, hiraditya, danielkiss, cfe-commits Tags: #clang Differential Revision: https://reviews.llvm.org/D76122	2020-03-18 10:55:04 +00:00
David Green	f67d93dc23	[ARM] Constant long shift combines This changes the way that asrl and lsrl intrinsics are lowered, going via a the ISEL ASRL and LSLL nodes instead of straight to machine nodes. On top of that, it adds some constant folds for long shifts, in case it turns out that the shift amount was either constant or 0. Differential Revision: https://reviews.llvm.org/D75553	2020-03-13 08:54:59 +00:00
David Green	05334de679	[ARM] Long shift tests. NFC	2020-03-12 19:01:49 +00:00
Simon Tatham	3f8e714e2f	[ARM,MVE] Add intrinsics and isel for MVE fused multiply-add. Summary: This adds the ACLE intrinsic family for the VFMA and VFMS instructions, which perform fused multiply-add on vectors of floats. I've represented the unpredicated versions in IR using the cross- platform `@llvm.fma` IR intrinsic. We already had isel rules to convert one of those into a vector VFMA in the simplest possible way; but we didn't have rules to detect a negated argument and turn it into VFMS, or rules to detect a splat argument and turn it into one of the two vector/scalar forms of the instruction. Now we have all of those. The predicated form uses a target-specific intrinsic as usual, but I've stuck to just one, for a predicated FMA. The subtraction and splat versions are code-generated by passing an fneg or a splat as one of its operands, the same way as the unpredicated version. In arm_mve_defs.h, I've had to introduce a tiny extra piece of infrastructure: a record `id` for use in codegen dags which implements the identity function. (Just because you can't declare a Tablegen value of type dag which is //only// a `$varname`: you have to wrap it in something. Now I can write `(id $varname)` to get the same effect.) Reviewers: dmgreen, MarkMurrayARM, miyuki, ostannard Reviewed By: dmgreen Subscribers: kristof.beyls, hiraditya, danielkiss, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D75998	2020-03-12 11:13:50 +00:00
Simon Tatham	068b2f313c	[ARM,MVE] Add the `vshlcq` intrinsics. Summary: The VSHLC instruction performs a left shift of a whole vector register by an immediate shift count up to 32, shifting in new bits at the low end from a GPR and delivering the shifted-out bits from the high end back into the same GPR. Since the instruction produces two outputs (the shifted vector register and the output GPR of shifted-out bits), it has to be instruction-selected in C++ rather than Tablegen. Reviewers: MarkMurrayARM, dmgreen, miyuki, ostannard Reviewed By: miyuki Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D75445	2020-03-04 08:49:27 +00:00
Simon Tatham	810127f6ab	[ARM,MVE] Add the `vsbciq` intrinsics. Summary: These are exactly parallel to the existing `vadciq` intrinsics, which we implemented last year as part of the original MVE intrinsics framework setup. Just like VADC/VADCI, the MVE VSBC/VSBCI instructions deliver two outputs, both of which the intrinsic exposes: a modified vector register and a carry flag. So they have to be instruction-selected in C++ rather than Tablegen. However, in this case, that's trivial: the same C++ isel routine we already have for VADC works unchanged, and all we have to do is to pass it a different instruction id. Reviewers: MarkMurrayARM, dmgreen, miyuki, ostannard Reviewed By: miyuki Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D75444	2020-03-04 08:49:27 +00:00
Simon Tatham	1a8cbfa514	[ARM,MVE] Add ACLE intrinsics for VCVT[ANPM] family. Summary: These instructions convert a vector of floats to a vector of integers of the same size, with assorted non-default rounding modes. Implemented in IR as target-specific intrinsics, because as far as I can see there are no matches for that functionality in the standard IR intrinsics list. Reviewers: MarkMurrayARM, dmgreen, miyuki, ostannard Reviewed By: dmgreen Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D75255	2020-03-02 10:33:30 +00:00
Simon Tatham	b08d2ddd69	[ARM,MVE] Add ACLE intrinsics for VCVT.F32.F16 family. Summary: These instructions make a vector of `<4 x float>` by widening every other lane of a vector of `<8 x half>`. I wondered about representing these using standard IR, along the lines of a shufflevector to extract elements of the input into a `<4 x half>` followed by an `fpext` to turn that into `<4 x float>`. But it looks as if that would take a lot of work in isel lowering to make it match any pattern I could sensibly write in Tablegen, and also I haven't been able to think of any other case where that pattern might be generated in IR, so there wouldn't be any extra code generation win from doing it that way. Therefore, I've just used another target-specific intrinsic. We can always change it to the other way later if anyone thinks of a good reason. (In order to put the intrinsic definition near similar things in `IntrinsicsARM.td`, I've also lifted the definition of the `MVEMXPredicated` multiclass higher up the file, without changing it.) Reviewers: MarkMurrayARM, dmgreen, miyuki, ostannard Reviewed By: miyuki Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D75254	2020-03-02 10:33:30 +00:00
Simon Tatham	a41ecf0eb0	[ARM,MVE] Add ACLE intrinsics for VQMOV[U]N family. Summary: These instructions work like VMOVN (narrowing a vector of wide values to half size, and overwriting every other lane of an output register with the result), except that the narrowing conversion is saturating. They come in three signedness flavours: signed to signed, unsigned to unsigned, and signed to unsigned. All are represented in IR by a target-specific intrinsic that takes two separate 'unsigned' flags. Reviewers: MarkMurrayARM, dmgreen, miyuki, ostannard Reviewed By: dmgreen Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D75252	2020-03-02 10:33:30 +00:00
Simon Tatham	9eb3cc10b2	[ARM,MVE] Add predicated intrinsics for many unary functions. Summary: This commit adds the predicated MVE intrinsics for the same set of unary operations that I added in their unpredicated forms in * D74333 (vrint) * D74334 (vrev) * D74335 (vclz, vcls) * D74336 (vmovl) * D74337 (vmovn) but since the predicated versions are a lot more similar to each other, I've kept them all together in a single big patch. Everything here is done in the standard way we've been doing other predicated operations: an IR intrinsic called `@llvm.arm.mve.foo.predicated` and some isel rules that match that alongside whatever they accept for the unpredicated version of the same instruction. In order to write the isel rules conveniently, I've refactored the existing isel rules for the affected instructions into multiclasses parametrised by a vector-type class, in the usual way. All those refactorings are intended to leave the existing isel rules unchanged: the only difference should be that new ones for the predicated intrinsics are introduced. The only tiny infrastructure change I needed in this commit was to change the implementation of `IntrinsicMX` in `arm_mve_defs.td` so that the records it defines are anonymous rather than named (and use `NameOverride` to set the output intrinsic name), which allows me to call it twice in two multiclasses with the same `NAME` without a tablegen-time error. Reviewers: dmgreen, MarkMurrayARM, miyuki, ostannard Reviewed By: MarkMurrayARM Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D75165	2020-02-26 15:12:07 +00:00
Mikhail Maltsev	f4fd7dbf85	[ARM,MVE] Add vqdmull[b,t]q intrinsic families Summary: This patch adds two families of ACLE intrinsics: vqdmullbq and vqdmulltq (including vector-vector and vector-scalar variants) and the corresponding LLVM IR intrinsics llvm.arm.mve.vqdmull and llvm.arm.mve.vqdmull.predicated. Reviewers: simon_tatham, MarkMurrayARM, dmgreen, ostannard Reviewed By: MarkMurrayARM Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D74845	2020-02-20 10:51:19 +00:00
Mikhail Maltsev	461fd94f00	[ARM,MVE] Fix predicate types of some intrinsics Summary: Some predicated MVE intrinsics return a vector with element size different from the input vector element size. In this case the predicate must type correspond to the output vector type. The following intrinsics use the incorrect predicate type: * llvm.arm.mve.mull.int.predicated * llvm.arm.mve.mull.poly.predicated * llvm.arm.mve.vshll.imm.predicated This patch fixes the issue. Reviewers: simon_tatham, dmgreen, ostannard, MarkMurrayARM Reviewed By: MarkMurrayARM Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D74838	2020-02-19 16:24:54 +00:00
Mikhail Maltsev	63809d365e	[ARM,MVE] Add vbrsrq intrinsics family Summary: This patch adds a new MVE intrinsics family, `vbrsrq`: vector bit reverse and shift right. The intrinsics are compiled into the VBRSR instruction. Two new LLVM IR intrinsics were also added: arm.mve.vbrsr and arm.mve.vbrsr.predicated. Reviewers: simon_tatham, dmgreen, ostannard, MarkMurrayARM Reviewed By: simon_tatham Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D74721	2020-02-18 17:31:21 +00:00
Simon Tatham	c32af4447f	[ARM,MVE] Add the vmovnbq,vmovntq intrinsic family. Summary: These are in some sense the inverse of vmovl[bt]q: they take a vector of n wide elements and truncate each to half its width. So they only write half a vector's worth of output data, and therefore they also take an 'inactive' parameter to provide the other half of the data in the output vector. So vmovnb overwrites the even lanes of 'inactive' with the narrowed values from the main input, and vmovnt overwrites the odd lanes. LLVM had existing codegen which generates these MVE instructions in response to IR that takes two vectors of wide elements, or two vectors of narrow ones. But in this case, we have one vector of each. So my clang codegen strategy is to narrow the input vector of wide elements by simply reinterpreting it as the output type, and then we have two narrow vectors and can represent the operation as a vector shuffle that interleaves lanes from both of them. Even so, not all the cases I needed ended up being selected as a single MVE instruction, so I've added a couple more patterns that spot combinations of the 'MVEvmovn' and 'ARMvrev32' SDNodes which can be generated as a VMOVN instruction with operands swapped. This commit adds the unpredicated forms only. Reviewers: dmgreen, miyuki, MarkMurrayARM, ostannard Reviewed By: dmgreen Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D74337	2020-02-18 09:34:50 +00:00
Simon Tatham	5e97940cd2	[ARM,MVE] Add the vmovlbq,vmovltq intrinsic family. Summary: These intrinsics take a vector of 2n elements, and return a vector of n wider elements obtained by sign- or zero-extending every other element of the input vector. They're represented in IR as a shufflevector that extracts the odd or even elements of the input, followed by a sext or zext. Existing LLVM codegen already matches this pattern and generates the VMOVLB instruction (which widens the even-index input lanes). But no existing isel rule was generating VMOVLT, so I've added some. However, the new rules currently only work in little-endian MVE, because the pattern they expect from isel lowering includes a bitconvert which doesn't have the right semantics in big-endian. The output of one existing codegen test is improved by those new rules. This commit adds the unpredicated forms only. Reviewers: dmgreen, miyuki, MarkMurrayARM, ostannard Reviewed By: dmgreen Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D74336	2020-02-18 09:34:50 +00:00
Simon Tatham	68b49f7ef4	[ARM,MVE] Add intrinsics vclzq and vclsq. Summary: vclzq maps nicely to the existing target-independent @llvm.ctlz IR intrinsic. But vclsq ('count leading sign bits') has no corresponding target-independent intrinsic, so I've made up @llvm.arm.mve.vcls. This commit adds the unpredicated forms only. Reviewers: dmgreen, miyuki, MarkMurrayARM, ostannard Reviewed By: miyuki Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D74335	2020-02-18 09:34:50 +00:00
Simon Tatham	c8b3196e54	[ARM,MVE] Add intrinsics for FP rounding operations. Summary: This adds the unpredicated forms of six different MVE intrinsics which all round a vector of floating-point numbers to integer values, leaving them still in FP format, differing only in rounding mode and exception settings. Five of them map to existing target-independent intrinsics in LLVM IR, such as @llvm.trunc and @llvm.rint. The sixth, mapping to the `vrintn` instruction, is done by inventing a target-specific intrinsic. (`vrintn` behaves the same as `vrintx` in terms of the output value: the side effects on the FPSCR flags are the only difference between the two. But ACLE specifies separate user-callable intrinsics for the two, so the side effects matter enough to make sure we generate the right one of the two instructions in each case.) Reviewers: dmgreen, miyuki, MarkMurrayARM, ostannard Reviewed By: miyuki Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D74333	2020-02-18 09:34:50 +00:00
Mikhail Maltsev	489f62e801	[ARM,MVE] Add vector-scalar intrinsics Summary: This patch adds vector-scalar variants to the following families of MVE intrinsics: * vaddq * vsubq * vmulq * vqaddq * vqsubq * vhaddq * vhsubq * vqdmulhq * vqrdmulhq The vector-scalar variants perform a splat operation on the scalar operand and then perform the same operations as their vector-vector counterparts. Code generation is done accordingly (using LLVM IR 'insert' and 'shuffle' operations which are later converted into an ARMvdup SDNode). Reviewers: simon_tatham, dmgreen, MarkMurrayARM, ostannard Reviewed By: dmgreen Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D74620	2020-02-17 17:47:05 +00:00
Mikhail Maltsev	2694cc3dca	[ARM][MVE] Add fixed point vector conversion intrinsics Summary: This patch implements the following Arm ACLE MVE intrinsics: * vcvtq_n_* * vcvtq_m_n_* * vcvtq_x_n_* and two corresponding LLVM IR intrinsics: * int_arm_mve_vcvt_fix (vcvtq_n_) int_arm_mve_vcvt_fix_predicated (vcvtq_m_n_, vcvtq_x_n_) Reviewers: simon_tatham, ostannard, MarkMurrayARM, dmgreen Reviewed By: MarkMurrayARM Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D74134	2020-02-06 16:49:45 +00:00
Simon Tatham	f8d4afc49a	[ARM,MVE] Add intrinsics for v[id]dupq and v[id]wdupq. Summary: These instructions generate a vector of consecutive elements starting from a given base value and incrementing by 1, 2, 4 or 8. The `wdup` versions also wrap the values back to zero when they reach a given limit value. The instruction updates the scalar base register so that another use of the same instruction will continue the sequence from where the previous one left off. At the IR level, I've represented these instructions as a family of target-specific intrinsics with two return values (the constructed vector and the updated base). The user-facing ACLE API provides a set of intrinsics that throw away the written-back base and another set that receive it as a pointer so they can update it, plus the usual predicated versions. Because the intrinsics return two values (as do the underlying instructions), the isel has to be done in C++. This is the first family of MVE intrinsics that use the `imm_1248` immediate type in the clang Tablegen framework, so naturally, I found I'd given it the wrong C integer type. Also added some tests of the check that the immediate has a legal value, because this is the first time those particular checks have been exercised. Finally, I also had to fix a bug in MveEmitter which failed an assertion when I nested two `seq` nodes (the inner one used to extract the two values from the pair returned by the IR intrinsic, and the outer one put on by the predication multiclass). Reviewers: dmgreen, MarkMurrayARM, miyuki, ostannard Reviewed By: dmgreen Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D73357	2020-02-03 11:20:06 +00:00
Simon Tatham	cf7e98e6f7	[ARM,MVE] Add intrinsics for vdupq. Summary: The unpredicated case of this is trivial: the clang codegen just makes a vector splat of the input, and LLVM isel is already prepared to handle that. For the predicated version, I've generated a `select` between the same vector splat and the `inactive` input parameter, and added new Tablegen isel rules to match that pattern into a predicated `MVE_VDUP` instruction. Reviewers: dmgreen, MarkMurrayARM, miyuki, ostannard Reviewed By: dmgreen Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D73356	2020-02-03 11:20:06 +00:00
David Green	8a6b948eb5	[MVE] Fixup order of gather writeback intrinsic outputs The MVE_VLDRWU32_qi_pre gather loads, like the other _pre/_post mve loads returns the writeback as result 0, the value as result 1. The llvm ir intrinsic seems to have this the other way around though, and so when lowering from one to the other we need to switch the first two outputs. I've also fixed up the types of _pre/_post on normal MVE loads. There we were already getting the values the right way around, just not for the types. I don't believe this was causing anything to go wrong, but it was very confusing to read in the debug output. Differential Revision: https://reviews.llvm.org/D73370	2020-01-27 14:08:06 +00:00
Simon Tatham	4321c6af28	[ARM,MVE] Support immediate vbicq,vorrq,vmvnq intrinsics. Summary: Immediate vmvnq is code-generated as a simple vector constant in IR, and left to the backend to recognize that it can be created with an MVE VMVN instruction. The predicated version is represented as a select between the input and the same constant, and I've added a Tablegen isel rule to turn that into a predicated VMVN. (That should be better than the previous VMVN + VPSEL: it's the same number of instructions but now it can fold into an adjacent VPT block.) The unpredicated forms of VBIC and VORR are done by enabling the same isel lowering as for NEON, recognizing appropriate immediates and rewriting them as ARMISD::VBICIMM / ARMISD::VORRIMM SDNodes, which I then instruction-select into the right MVE instructions (now that I've also reworked those instructions to use the same MC operand encoding). In order to do that, I had to promote the Tablegen SDNode instance `NEONvorrImm` to a general `ARMvorrImm` available in MVE as well, and similarly for `NEONvbicImm`. The predicated forms of VBIC and VORR are represented as a vector select between the original input vector and the output of the unpredicated operation. The main convenience of this is that it still lets me use the existing isel lowering for VBICIMM/VORRIMM, and not have to write another copy of the operand encoding translation code. This intrinsic family is the first to use the `imm_simd` system I put into the MveEmitter tablegen backend. So, naturally, it showed up a bug or two (emitting bogus range checks and the like). Fixed those, and added a full set of tests for the permissible immediates in the existing Sema test. Also adjusted the isel pattern for `vmovlb.u8`, which stopped matching because lowering started turning its input into a VBICIMM. Now it recognizes the VBICIMM instead. Reviewers: dmgreen, MarkMurrayARM, miyuki, ostannard Reviewed By: dmgreen Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D72934	2020-01-23 11:53:52 +00:00
Mark Murray	b10a0eb04a	[ARM][MVE][Intrinsics] Take abs() of VMINNMAQ, VMAXNMAQ intrinsics' first arguments. Summary: Fix VMINNMAQ, VMAXNMAQ intrinsics; BOTH arguments have the absolute values taken. Reviewers: dmgreen, simon_tatham Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D72830	2020-01-20 14:33:26 +00:00
David Green	ff2e67a4f7	[ARM] MVE VLDn postinc This adds Post inc variants of the VLD2/4 and VST2/4 instructions in MVE. It uses the same mechanism/nodes as Neon, transforming the intrinsic+add pair into a ARMISD::VLD2_UPD, which gets selected to a post-inc instruction. The code to do that is mostly taken from the existing Neon code, but simplified as less variants are needed. It also fills in some getTgtMemIntrinsic for the arm.mve.vld2/4 instrinsics, which allow the nodes to have MMO's, calculated as the full length to the memory being loaded/stored. Differential Revision: https://reviews.llvm.org/D71194	2020-01-20 06:57:07 +00:00
David Green	d6075726b9	[ARM] MVE VLDn post inc tests. NFC	2020-01-20 06:57:07 +00:00
Mark Murray	da9d57d2c2	[ARM][MVE][Intrinsics] Add VMINAQ, VMINNMAQ, VMAXAQ, VMAXNMAQ intrinsics. Summary: Add VMINAQ, VMINNMAQ, VMAXAQ, VMAXNMAQ intrinsics and unit tests. Reviewers: simon_tatham, miyuki, dmgreen Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D72761	2020-01-15 17:20:15 +00:00
Simon Tatham	dac7b23cc3	[ARM,MVE] Intrinsics for variable shift instructions. This batch of intrinsics fills in all the shift instructions that take a variable shift distance in a register, instead of an immediate. Some of these instructions take a single shift distance in a scalar register and apply it to all lanes; others take a vector of per-lane distances. These instructions are all basically one family, varying in whether they saturate out-of-range values, and whether they round when bits are shifted off the bottom. I've implemented them at the IR level by a much smaller family of IR intrinsics, which take flag parameters to indicate saturating and/or rounding (along with the usual one to specify signed/unsigned integers). An oddity is that all of them are //left// shift instructions – but if you pass a negative shift count, they'll shift right. So the vector shift distances are always vectors of //signed// integers, regardless of whether you're considering the other input vector to be of signed or unsigned. Also, even the simplest `vshlq` instruction in this family (neither saturating nor rounding) has to be implemented as an IR intrinsic, because the ordinary LLVM IR `shl` operation would consider an out-of-range shift count to be undefined behavior. Reviewers: dmgreen, MarkMurrayARM, miyuki, ostannard Reviewed By: dmgreen Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D72329	2020-01-08 14:42:24 +00:00
Simon Tatham	3100480925	[ARM,MVE] Intrinsics for partial-overwrite imm shifts. This batch of intrinsics covers two sets of immediate shift instructions, which have in common that they only overwrite part of their output register and so they need an extra input giving its previous value. The VSLI and VSRI instructions shift each lane of the input vector left or right just as if they were normal immediate VSHL/VSHR, but then they only overwrite the output bits that correspond to actual shifted bits of the input. So VSLI will leave the low n bits of each output lane unchanged, and VSRI the same with the top n bits. The V[Q][R]SHR[U]N family are all narrowing shifts: they take an input vector of 2n-bit integers, shift each lane right by a constant, and then narrowing the shifted result to only n bits. So they only overwrite half of the n-bit lanes in the output register, and the B/T suffix indicates whether it's the bottom or top half of each 2n-bit lane. I've implemented the whole of the latter family using a single IR intrinsic `vshrn`, which takes a lot of i32 parameters indicating which instruction it expands to (by specifying signedness of the input and output types, whether it saturates and/or rounds, etc). Reviewers: dmgreen, MarkMurrayARM, miyuki, ostannard Reviewed By: dmgreen Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D72328	2020-01-08 14:42:24 +00:00
Simon Tatham	34817e04fe	[ARM,MVE] Fix many signedness errors in MVE intrinsics. Summary: Running an end-to-end test last week I noticed that a lot of the ACLE intrinsics that operate differently on vectors of signed and unsigned integers were ending up generating the signed version of the instruction unconditionally. This is because the IR intrinsics had no way to distinguish signed from unsigned: the LLVM type system just calls them both `v8i16` (or whatever), so you need either separate intrinsics for signed and unsigned, or a flag parameter that tells ISel which one to choose. This patch fixes all the problems of that kind that I've noticed, by adding an i32 flag parameter to many of the IR intrinsics which is set to 1 for unsigned (matching the existing practice in cases where we got it right), and conditioning all the isel patterns on that flag. So the fundamental change is in `IntrinsicsARM.td`, changing the low-level IR intrinsics API; there are knock-on changes in `arm_mve.td` (adjusting code gen for the ACLE intrinsics to use the modified API) and in `ARMInstrMVE.td` (adjusting isel to expect the new unsigned flags). The rest of this patch is boringly updating tests. Reviewers: dmgreen, miyuki, MarkMurrayARM Reviewed By: dmgreen Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D72270	2020-01-06 16:33:16 +00:00
Simon Tatham	4978296cd8	[ARM,MVE] Support -ve offsets in gather-load intrinsics. Summary: The ACLE intrinsics with `gather_base` or `scatter_base` in the name are wrappers on the MVE load/store instructions that take a vector of base addresses and an immediate offset. The immediate offset can be up to 127 times the alignment unit, and it can be positive or negative. At the MC layer, we got that right. But in the Sema error checking for the wrapping intrinsics, the offset was erroneously constrained to be positive. To fix this I've adjusted the `imm_mem7bit` class in the Tablegen that defines the intrinsics. But that causes integer literals like `0xfffffffffffffe04` to appear in the autogenerated calls to `SemaBuiltinConstantArgRange`, which provokes a compiler warning because that's out of the non-overflowing range of an `int64_t`. So I've also tweaked `MveEmitter` to emit that as `-0x1fc` instead. Updated the tests of the Sema checks themselves, and also adjusted a random sample of the CodeGen tests to actually use negative offsets and prove they get all the way through code generation without causing a crash. Reviewers: dmgreen, miyuki, MarkMurrayARM Reviewed By: dmgreen Subscribers: kristof.beyls, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D72268	2020-01-06 16:33:07 +00:00
Simon Tatham	b99ef32d04	[ARM,MVE] Generate the right instruction for vmaxnmq_m_f16. Summary: Due to a copy-paste error in the isel patterns, the predicated version of this intrinsic was expanding to the `VMAXNMT.F32` instruction instead of `VMAXNMT.F16`. Similarly for vminnm. Reviewers: dmgreen, miyuki, MarkMurrayARM Reviewed By: dmgreen Subscribers: kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D72269	2020-01-06 16:28:20 +00:00
Mark Murray	a2cd4600ec	[ARM][MVE][Intrinsics] All vqdmulhq/vqrdmulhq tests should be for signed numbers. Fix broken tests. I can't yet explain how they worked locally pre-commit.	2019-12-13 17:29:59 +00:00
Mikhail Maltsev	99581fd4c8	[ARM][MVE] Add vector reduction intrinsics with two vector operands Summary: This patch adds intrinsics for the following MVE instructions: * VABAV * VMLADAV, VMLSDAV * VMLALDAV, VMLSLDAV * VRMLALDAVH, VRMLSLDAVH Each of the above 4 groups has a corresponding new LLVM IR intrinsic, since the instructions cannot be easily represented using general-purpose IR operations. Reviewers: simon_tatham, ostannard, dmgreen, MarkMurrayARM Reviewed By: MarkMurrayARM Subscribers: merge_guards_bot, kristof.beyls, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D71062	2019-12-13 13:17:29 +00:00
Simon Tatham	25305a9311	[ARM][MVE] Add intrinsics for more immediate shifts. Summary: This fills in the remaining shift operations that take a single vector input and an immediate shift count: the `vqshl`, `vqshlu`, `vrshr` and `vshll[bt]` families. `vshll[bt]` (which shifts each input lane left into a double-width output lane) is the most interesting one. There are separate MC instruction ids for shifting by exactly the input lane width and shifting by less than that, because the instruction encoding is so completely different for the lane-width special case. So I had to write two sets of patterns to match based on the immediate shift count, which involved adding a ComplexPattern matcher to avoid the general-case pattern accidentally matching the special case too. For that family I've made sure to add an llc codegen test for both versions of each instruction. I'm experimenting with a new strategy for parametrising the isel patterns for all these instructions: adding extra fields to the relevant `Instruction` subclass itself, which are ignored by the Tablegen backends that generate the MC data, but can be retrieved from each instance of that instruction subclass when it's passed as a template parameter to the multiclass that generates its isel patterns. A nice effect of that is that I can fill in those informational fields using `let` blocks, rather than having to type them out once per instruction at `defm` time. (As a result, quite a lot of existing instruction `def`s are reindented by this patch, so it's clearer to read with whitespace changes ignored.) Reviewers: dmgreen, MarkMurrayARM, miyuki, ostannard Reviewed By: MarkMurrayARM Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D71458	2019-12-13 13:07:39 +00:00
Mark Murray	228c74076d	[ARM][MVE][Intrinsics] Add _x() variants of my _m() intrinsics. Summary: Better use of multiclass is used, and this helped find some existing bugs in the predicated VMULL* intrinsics, which are now fixed. The refactored VMULL[TB]Q_(INT\|POLY)_M() intrinsics were discovered to have an argument ("inactive") with incorrect type, and this required a fix that is included in this whole patch. The argument "inactive" should have been the same width (per vector element) as the return type of the intrinsic, but was not in the case where the return type was double the element width of the input types. To assist in testing the multiclassing , and to thwart further gremlins, the unit tests are improved in scope. The .ll tests are all generated by a small bit of throw-away scripting from the corresponding .c tests, and as such the diffs are large and nasty. Look at the file rather than the diff. Reviewers: dmgreen, miyuki, ostannard, simon_tatham Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D71421	2019-12-13 11:51:23 +00:00
Simon Tatham	bd0f271c9e	[ARM][MVE] Add intrinsics for immediate shifts. (reland) This adds the family of `vshlq_n` and `vshrq_n` ACLE intrinsics, which shift every lane of a vector left or right by a compile-time immediate. They mostly work by expanding to the IR `shl`, `lshr` and `ashr` operations, with their second operand being a vector splat of the immediate. There's a fiddly special case, though. ACLE specifies that the immediate in `vshrq_n` can take values up to //and including// the bit size of the vector lane. But LLVM IR thinks that shifting right by the full size of the lane is UB, and feels free to replace the `lshr` with an `undef` half way through the optimization pipeline. Hence, to keep this legal in source code, I have to detect it at codegen time. Logical (unsigned) right shifts by the element size are handled by simply emitting the zero vector; arithmetic ones are converted into a shift of one bit less, which will always give the same output. In order to do that check, I also had to enhance the tablegen MveEmitter so that it can cope with converting a builtin function's operand into a bare integer to pass to a code-generating subfunction. Previously the only bare integers it knew how to handle were flags generated from within `arm_mve.td`. Reviewers: dmgreen, miyuki, MarkMurrayARM, ostannard Reviewed By: dmgreen, MarkMurrayARM Subscribers: echristo, hokein, rdhindsa, kristof.beyls, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D71065	2019-12-11 10:10:09 +00:00
Mikhail Maltsev	e6d3261c67	[ARM][MVE] Refactor complex vector intrinsics [NFCI] Summary: This patch refactors instruction selection of the complex vector addition, multiplication and multiply-add intrinsics, so that it is now based on TableGen patterns rather than C++ code. It also changes the first parameter (halving vs non-halving) of the arm_mve_vcaddq IR intrinsic to match the corresponding instruction encoding, hence it requires some changes in the tests. The patch addresses David's comment in https://reviews.llvm.org/D71190 Reviewers: dmgreen, ostannard, simon_tatham, MarkMurrayARM Reviewed By: dmgreen Subscribers: merge_guards_bot, kristof.beyls, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D71245	2019-12-10 16:21:52 +00:00

1 2

69 Commits