Commit Graph

859 Commits

Author SHA1 Message Date
Craig Topper b590e0fd81 [TargetLowering][ARM][X86] Change softenSetCCOperands handling of ONE to avoid spurious exceptions for QNANs with strict FP quiet compares
ONE is currently softened to OGT | OLT, but the OGT and OLT libcalls will trigger an exception for QNAN, at least for X86 with libgcc. UEQ, on the other hand, uses UO | OEQ, and the UO and OEQ libcalls will not trigger an exception for QNAN.

This patch changes ONE to use the inverse of the UEQ lowering, so we now produce O & UNE. Technically the existing behavior was correct for a signalling ONE, but since I don't know how to generate one of those from clang, that seemed like something we can deal with later, as we would need to fix other predicates as well. Also, removing spurious exceptions seemed better than missing an exception.
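
A rough C sketch of the new lowering, assuming libgcc's soft-float helpers (`__unordsf2` returns nonzero iff either argument is NaN; `__nesf2` returns nonzero iff its arguments are unequal or unordered; neither raises an exception for a QNAN); the function name is illustrative:

    int __unordsf2(float, float); /* the O test: ordered when zero */
    int __nesf2(float, float);    /* the UNE test */

    /* ONE == "ordered and not equal", now softened as O & UNE. */
    int soft_one(float a, float b) {
      return __unordsf2(a, b) == 0 && __nesf2(a, b) != 0;
    }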

There are also problems with quiet OGT/OLT/OLE/OGE, but those are harder to fix.

Differential Revision: https://reviews.llvm.org/D72477
2020-01-10 11:00:17 -08:00
Diogo Sampaio b1bb5ce96d Reverting, broke some bots. Need further investigation.
Summary: This reverts commit 8c12769f30.

2020-01-10 13:40:41 +00:00
Diogo Sampaio 8c12769f30 [ARM][Thumb2] Fix ADD/SUB invalid writes to SP
Summary:
This patch fixes PR23772: [ARM] r226200 can emit illegal thumb2 instruction: "sub sp, r12, #80".
The violation was that SUB and ADD (reg, immediate) instructions can only write to SP if the source register is also SP, so the above instruction was unpredictable.
To enforce that the instruction t2(ADD|SUB)ri does not write to SP, we now restrict the destination register to rGPR (which excludes PC and SP).
Unlike the ARM specification, which defines one instruction that can read from SP and one that can't, here we add one instruction that can't write to SP and another that can only write to SP, so as to reuse most of the hard-coded size optimizations.
When performing this change, it was uncovered that the Thumb2 reg-plus-immediate emitter could not previously produce all variants of ADD SP, SP, #imm, so it was refactored to be able to (see test/CodeGen/Thumb2/mve-stacksplot.mir, where we use a subw sp, sp, #imm12 variant).
It also uncovered a disassembly issue with adr.w instructions, which were only ever written as SUBW instructions (see llvm/test/MC/Disassembler/ARM/thumb2.txt).

Reviewers: eli.friedman, dmgreen, carwil, olista01, efriedma

Reviewed By: efriedma

Subscribers: john.brawn, efriedma, ostannard, kristof.beyls, hiraditya, dmgreen, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70680
2020-01-10 11:25:44 +00:00
Momchil Velikov 173b711e83 [ARM][MVE] MVE-I should not be disabled by -mfpu=none
Architecturally, it's allowed to have MVE-I without an FPU; thus
-mfpu=none should not disable MVE-I or moves to/from FP registers.

This patch removes `+/-fpregs` from the features unconditionally added
to the target feature list depending on the FPU, and moves the logic to
the Clang driver, where the negative form (`-fpregs`) is conditionally
added to the target features list for `-mfloat-abi=soft`, or for
`-mfpu=none` without either `+mve` or `+mve.fp`. Only the negative form
is added by the driver; the positive one is derived from other features
in the backend.

Differential Revision: https://reviews.llvm.org/D71843
2020-01-09 14:03:25 +00:00
Sam Parker 1cba261239 Revert "[ARM][LowOverheadLoops] Update liveness info"
This reverts commit e93e0d413f.

There are some ordering problems on some of the buildbots which need
investigating.
2020-01-09 09:22:06 +00:00
Sam Parker e93e0d413f [ARM][LowOverheadLoops] Update liveness info
After expanding the pseudo instructions, update the liveness info.
We do this in a post-order traversal of the loop, including its
exit blocks and preheader(s).

Differential Revision: https://reviews.llvm.org/D72131
2020-01-09 08:33:47 +00:00
Simon Tatham dac7b23cc3 [ARM,MVE] Intrinsics for variable shift instructions.
This batch of intrinsics fills in all the shift instructions that take
a variable shift distance in a register, instead of an immediate. Some
of these instructions take a single shift distance in a scalar
register and apply it to all lanes; others take a vector of per-lane
distances.

These instructions are all basically one family, varying in whether
they saturate out-of-range values, and whether they round when bits
are shifted off the bottom. I've implemented them at the IR level by a
much smaller family of IR intrinsics, which take flag parameters to
indicate saturating and/or rounding (along with the usual one to
specify signed/unsigned integers).

An oddity is that all of them are //left// shift instructions – but if
you pass a negative shift count, they'll shift right. So the vector
shift distances are always vectors of //signed// integers, regardless
of whether you consider the other input vector to be signed or
unsigned. Also, even the simplest `vshlq` instruction in this
family (neither saturating nor rounding) has to be implemented as an
IR intrinsic, because the ordinary LLVM IR `shl` operation would
consider an out-of-range shift count to be undefined behavior.
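
As a usage sketch in C (assuming an MVE-enabled target and the ACLE
header `arm_mve.h`; the wrapper name is illustrative):

    #include <arm_mve.h>

    /* Per-lane variable shift: positive counts shift left, negative
       counts shift right, so the count vector is signed even though
       the data vector here is unsigned. */
    uint16x8_t shift_lanes(uint16x8_t v, int16x8_t counts) {
      return vshlq_u16(v, counts);
    }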

Reviewers: dmgreen, MarkMurrayARM, miyuki, ostannard

Reviewed By: dmgreen

Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D72329
2020-01-08 14:42:24 +00:00
Simon Tatham 3100480925 [ARM,MVE] Intrinsics for partial-overwrite imm shifts.
This batch of intrinsics covers two sets of immediate shift
instructions, which have in common that they only overwrite part of
their output register and so they need an extra input giving its
previous value.

The VSLI and VSRI instructions shift each lane of the input vector
left or right just as if they were normal immediate VSHL/VSHR, but
then they only overwrite the output bits that correspond to actual
shifted bits of the input. So VSLI leaves the low n bits of each
output lane unchanged, and VSRI does the same with the top n bits.
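
In C terms, a small sketch (assuming `arm_mve.h` and an MVE target;
the wrapper name is illustrative):

    #include <arm_mve.h>

    /* Each lane of b is shifted left by 4 and inserted into a; the
       low 4 bits of each lane of a are left unchanged. */
    uint8x16_t shift_left_insert(uint8x16_t a, uint8x16_t b) {
      return vsliq_n_u8(a, b, 4);
    }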

The V[Q][R]SHR[U]N family are all narrowing shifts: they take an input
vector of 2n-bit integers, shift each lane right by a constant, and
then narrow the shifted result to only n bits. So they only
overwrite half of the n-bit lanes in the output register, and the B/T
suffix indicates whether it's the bottom or top half of each 2n-bit
lane.
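
For example, one of the bottom-half narrowing shifts (a C sketch
assuming `arm_mve.h`; the wrapper name is illustrative):

    #include <arm_mve.h>

    /* Each 16-bit lane of b is shifted right by 3 and narrowed to 8
       bits, overwriting only the bottom byte of the corresponding
       16-bit lane of a; vshrntq_n_s16 would overwrite the top byte. */
    int8x16_t narrow_bottom(int8x16_t a, int16x8_t b) {
      return vshrnbq_n_s16(a, b, 3);
    }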

I've implemented the whole of the latter family using a single IR
intrinsic `vshrn`, which takes a lot of i32 parameters indicating
which instruction it expands to (by specifying signedness of the input
and output types, whether it saturates and/or rounds, etc).

Reviewers: dmgreen, MarkMurrayARM, miyuki, ostannard

Reviewed By: dmgreen

Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D72328
2020-01-08 14:42:24 +00:00
Anna Welker 346f6b54bd [ARM][MVE] Enable masked gathers from vector of pointers
Adds a pass to the ARM backend that takes a v4i32
gather and transforms it into a call to MVE's
masked gather intrinsics.

Differential Revision: https://reviews.llvm.org/D71743
2020-01-08 13:43:12 +00:00
Sam Parker 96d2d96b03 [NFC][ARM] Update tests
Run the update_mir_test script on some of the low-overhead loop tests.
2020-01-08 06:08:30 -05:00
Sjoerd Meijer ee811808a9 [ARM][MVE] Renamed VPT Block tests and files to something more informative. NFC 2020-01-07 16:16:54 +00:00
Sjoerd Meijer e34801c8e6 [ARM][MVE] VPT Blocks: findVCMPToFoldIntoVPS
This is a recommit of D71330, but with a few things fixed and changed:

1) ReachingDefAnalysis: this was not running with optnone as it was checking
skipFunction(), which other analysis passes don't do. I guess this is a
copy-paste from a codegen pass.
2) VPTBlockPass: here I've added skipFunction(), because like most/all
optimisations, we don't want to run this with optnone.

This fixes the issues with the initial/previous commit: the VPTBlockPass was
running with optnone, but ReachingDefAnalysis wasn't, and so VPTBlockPass was
crashing querying ReachingDefAnalysis.

I've added test case mve-vpt-block-optnone.mir to check that we don't run
VPTBlock with optnone.

Differential Revision: https://reviews.llvm.org/D71470
2020-01-07 13:54:47 +00:00
David Green f88d52728b [ARM] Use the correct opcodes for Thumb2 segmented stack frame lowering
The segmented stack lowering code appears to be using ARM opcodes under
Thumb2. The MRC opcode will be the same for Thumb and ARM, but t2LDR
seems wrong. Either way, it is better to use the correct Thumb vs ARM
opcodes.

Differential Revision: https://reviews.llvm.org/D72074
2020-01-06 16:38:49 +00:00
Simon Tatham 34817e04fe [ARM,MVE] Fix many signedness errors in MVE intrinsics.
Summary:
Running an end-to-end test last week I noticed that a lot of the ACLE
intrinsics that operate differently on vectors of signed and unsigned
integers were ending up generating the signed version of the
instruction unconditionally. This is because the IR intrinsics had no
way to distinguish signed from unsigned: the LLVM type system just
calls them both `v8i16` (or whatever), so you need either separate
intrinsics for signed and unsigned, or a flag parameter that tells
ISel which one to choose.

This patch fixes all the problems of that kind that I've noticed, by
adding an i32 flag parameter to many of the IR intrinsics which is set
to 1 for unsigned (matching the existing practice in cases where we
got it right), and conditioning all the isel patterns on that flag. So
the fundamental change is in `IntrinsicsARM.td`, changing the
low-level IR intrinsics API; there are knock-on changes in
`arm_mve.td` (adjusting code gen for the ACLE intrinsics to use the
modified API) and in `ARMInstrMVE.td` (adjusting isel to expect the
new unsigned flags). The rest of this patch is boringly updating tests.
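
For instance, a pair of ACLE intrinsics of the kind affected (a C
sketch assuming `arm_mve.h`; whether this exact pair was among the
broken ones is not stated above):

    #include <arm_mve.h>

    /* These must select VHADD.S8 and VHADD.U8 respectively; without a
       signedness flag the IR intrinsic cannot tell them apart, since
       both operand types are just v16i8 at the IR level. */
    int8x16_t  hadd_s(int8x16_t a, int8x16_t b)   { return vhaddq_s8(a, b); }
    uint8x16_t hadd_u(uint8x16_t a, uint8x16_t b) { return vhaddq_u8(a, b); }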

Reviewers: dmgreen, miyuki, MarkMurrayARM

Reviewed By: dmgreen

Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D72270
2020-01-06 16:33:16 +00:00
Simon Tatham 4978296cd8 [ARM,MVE] Support -ve offsets in gather-load intrinsics.
Summary:
The ACLE intrinsics with `gather_base` or `scatter_base` in the name
are wrappers on the MVE load/store instructions that take a vector of
base addresses and an immediate offset. The immediate offset can be up
to 127 times the alignment unit, and it can be positive or negative.

At the MC layer, we got that right. But in the Sema error checking for
the wrapping intrinsics, the offset was erroneously constrained to be
positive.
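
A usage sketch of what the fix permits (C, assuming `arm_mve.h`; for a
32-bit gather the offset is a multiple of 4 up to 127 alignment units,
i.e. up to 508 bytes in either direction):

    #include <arm_mve.h>

    /* Negative immediate offset applied to every base address in the
       vector; Sema previously rejected this. */
    uint32x4_t load_below(uint32x4_t addrs) {
      return vldrwq_gather_base_u32(addrs, -508);
    }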

To fix this I've adjusted the `imm_mem7bit` class in the Tablegen that
defines the intrinsics. But that causes integer literals like
`0xfffffffffffffe04` to appear in the autogenerated calls to
`SemaBuiltinConstantArgRange`, which provokes a compiler warning
because that's out of the non-overflowing range of an `int64_t`. So
I've also tweaked `MveEmitter` to emit that as `-0x1fc` instead.

Updated the tests of the Sema checks themselves, and also adjusted a
random sample of the CodeGen tests to actually use negative offsets
and prove they get all the way through code generation without causing
a crash.

Reviewers: dmgreen, miyuki, MarkMurrayARM

Reviewed By: dmgreen

Subscribers: kristof.beyls, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D72268
2020-01-06 16:33:07 +00:00
Simon Tatham b99ef32d04 [ARM,MVE] Generate the right instruction for vmaxnmq_m_f16.
Summary:
Due to a copy-paste error in the isel patterns, the predicated version
of this intrinsic was expanding to the `VMAXNMT.F32` instruction
instead of `VMAXNMT.F16`. Similarly for vminnm.
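
For reference, the intrinsic in question (a C usage sketch, assuming
an MVE floating-point target and `arm_mve.h`):

    #include <arm_mve.h>

    /* Predicated f16 max: lanes where p is set take max(a, b), the
       rest keep inactive. This must select VMAXNMT.F16, not .F32. */
    float16x8_t pred_max(float16x8_t inactive, float16x8_t a,
                         float16x8_t b, mve_pred16_t p) {
      return vmaxnmq_m_f16(inactive, a, b, p);
    }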

Reviewers: dmgreen, miyuki, MarkMurrayARM

Reviewed By: dmgreen

Subscribers: kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D72269
2020-01-06 16:28:20 +00:00
David Green fb8c9a339a [ARM] Use isFMAFasterThanFMulAndFAdd for scalars as well as MVE vectors
This adds extra scalar handling to isFMAFasterThanFMulAndFAdd, allowing
the target independent code to handle more folds in more situations (for
example if the fast math flags are present, but the global
AllowFPOpFusion option isn't). It also splits apart the HasSlowFPVMLx
into HasSlowFPVFMx, to allow VFMA and VMLA to be controlled separately
if needed.

Differential Revision: https://reviews.llvm.org/D72139
2020-01-05 11:24:04 +00:00
Simon Pilgrim eb0e1978df [TargetLowering] SimplifyDemandedBits - call SimplifyMultipleUseDemandedBits for ISD::EXTRACT_VECTOR_ELT (REAPPLIED)
This patch attempts to peek through vectors based on the demanded bits/elt of a particular ISD::EXTRACT_VECTOR_ELT node, allowing us to avoid dependencies on ops that have no impact on the extract.

In particular this helps remove some unnecessary scalar->vector->scalar patterns.

The wasm shift patterns are annoying: @tlively has indicated that the wasm vector shift codegen is due to be refactored in the near term, so this isn't considered a major issue.

Reapplied after reversion at rL368660 due to PR42982 which was fixed at rGca7fdd41bda0.

Differential Revision: https://reviews.llvm.org/D65887
2020-01-04 13:15:50 +00:00
Roman Lebedev 3d492d7503
[DAGCombine][X86][Thumb2/LowOverheadLoops] `A - (A & C)` -> `A & (~C)` fold (PR44448)
While we do manage to fold integer-typed IR in the middle-end,
we can't do that for the main motivating case of pointers.

There is the @llvm.ptrmask() intrinsic, which may or may not be
helpful, but I'm not sure it is considered fully canonical yet,
and likely not everything is aware of it.

Name: PR44448  ptr - (ptr & C) -> ptr & (~C)
%bias = and i32 %ptr, C
%r = sub i32 %ptr, %bias
  =>
%r = and i32 %ptr, ~C
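
A C sketch of the motivating pointer pattern (illustrative only):

    #include <stdint.h>

    /* Round an address down to a 16-byte boundary: the sub-of-masked
       form folds to a single 'and' with the inverted mask. */
    uintptr_t align_down(uintptr_t p) {
      return p - (p & 15); /* becomes p & ~(uintptr_t)15 */
    }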

See
  https://bugs.llvm.org/show_bug.cgi?id=44448
  https://reviews.llvm.org/D71499
2020-01-03 17:55:45 +03:00
Sam Parker 69cfbb460e [ARM][NFC] Update MIR test 2020-01-03 14:51:15 +00:00
QingShan Zhang 2133d3c558 [DAGCombine] Initialize the default operation action for SIGN_EXTEND_INREG for vector type as 'expand' instead of 'legal'
Previously, we didn't set the default operation action for
SIGN_EXTEND_INREG for vector types, so it was 0 by default, that is,
legal. However, most targets don't have native instructions to support
this opcode. It should be set to expand by default, as we did for
ANY_EXTEND_VECTOR_INREG.

Differential Revision: https://reviews.llvm.org/D70000
2020-01-03 03:26:41 +00:00
David Green f323ab919a [ARM] Add +mve feature to mve tests. NFC 2020-01-01 17:25:20 +00:00
Diogo Sampaio f33fd9648c [ARM][Thumb][FIX] Add unwinding information to t4
Summary:
Add a missing part of patch D71361. Now that the stack frame
can be adjusted using addw/subw instructions, they should
appear in the unwinding list.

Reviewers: dmgreen, efriedma

Reviewed By: dmgreen

Subscribers: kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D72000
2019-12-30 15:59:48 +00:00
David Green b4abe7afbf [ARM] Sink splat to ICmp
This adds ICmp to the list of instructions that we sink a splat to in a
loop, allowing the register forms of instructions to be selected more
often. It does not add FCmp yet, as the results look a little odd: it
tries to keep the value in a float reg and has to move it back to a GPR.

Differential Revision: https://reviews.llvm.org/D70997
2019-12-30 12:58:14 +00:00
David Green a5a141544d [ARM] MVE sink ICmp test. NFC 2019-12-30 12:58:13 +00:00
Diogo Sampaio 8232497c31 [ARM][THUMB2] Allow emitting T3 types of add and sub
Summary:
This patch allows the emitT2RegPlusImmediate function
to emit thumb2 add and sub instructions with
12-bit immediates.
- Split out from D70680

Reviewers: eli.friedman, olista01, efriedma

Reviewed By: efriedma

Subscribers: efriedma, kristof.beyls, hiraditya, dmgreen, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71361
2019-12-30 11:03:58 +00:00
Fangrui Song a36ddf0aa9 Migrate function attribute "no-frame-pointer-elim"="false" to "frame-pointer"="none" as cleanups after D56351 2019-12-24 16:27:51 -08:00
Fangrui Song eb16435b5e Migrate function attribute "no-frame-pointer-elim-non-leaf" to "frame-pointer"="non-leaf" as cleanups after D56351 2019-12-24 16:05:15 -08:00
Fangrui Song 502a77f125 Migrate function attribute "no-frame-pointer-elim" to "frame-pointer"="all" as cleanups after D56351 2019-12-24 15:57:33 -08:00
Sam Parker acbc9aed72 [ARM][MVE] Fixes for tail predication.
1) Fix an issue with the incorrect value being used for the number of
   elements being passed to [d|w]lstp. We were trying to check that
   the value was available at LoopStart, but this doesn't consider
   that the last instruction in the block could also define the
   register. Two helpers have been added to RDA for this.
2) Insert some code that now tries to move the element count def or the
   insertion point so that we can perform more tail predication.
3) Related to (1), the same off-by-one could prevent us from
   generating a low-overhead loop when a mov lr could have been
   the last instruction in the block.
4) Fix up some instruction attributes so that not all the
   low-overhead loop instructions are labelled as branches and
   terminators - as this is not true for dls/dlstp.

Differential Revision: https://reviews.llvm.org/D71609
2019-12-20 09:34:18 +00:00
Sam Parker 4042518335 [ARM][MVE] Tail predicate in the presence of vcmp
Record the discovered VPT blocks while checking for validity and, for
now, only handle blocks that begin with VPST and not VPT. We're now
allowing more than one instruction to define vpr, but each block must
somehow be predicated using the vctp. This leaves us with several
scenarios which need fixing up:
1) A VPT block which is only predicated by the vctp and has no
   internal vpr defs.
2) A VPT block which is only predicated by the vctp but has an
   internal vpr def.
3) A VPT block which is predicated upon the vctp as well as another
   vpr def.
4) A VPT block which is not predicated upon a vctp, but contains it
   and all instructions within the block are predicated upon it.

The changes needed are, for:
1) The easy one, just remove the vpst and unpredicate the
   instructions in the block.
2) Remove the vpst and unpredicate the instructions up to the
   internal vpr def. We need to insert a new vpst to predicate the
   remaining instructions.
3) Do nothing.
4) The vctp will be inside a vpt and the instruction will be removed,
   so adjust the size of the mask on the vpst.

Differential Revision: https://reviews.llvm.org/D71107
2019-12-20 08:42:11 +00:00
Mark Murray a2cd4600ec [ARM][MVE][Intrinsics] All vqdmulhq/vqrdmulhq tests should be for signed numbers.
Fix broken tests. I can't yet explain how they worked locally pre-commit.
2019-12-13 17:29:59 +00:00
Mikhail Maltsev 99581fd4c8 [ARM][MVE] Add vector reduction intrinsics with two vector operands
Summary:
This patch adds intrinsics for the following MVE instructions:
* VABAV
* VMLADAV, VMLSDAV
* VMLALDAV, VMLSLDAV
* VRMLALDAVH, VRMLSLDAVH

Each of the above 4 groups has a corresponding new LLVM IR intrinsic,
since the instructions cannot be easily represented using
general-purpose IR operations.
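
For example, one of the new reductions at the ACLE level (a C sketch
assuming `arm_mve.h`; the wrapper name is illustrative):

    #include <arm_mve.h>

    /* VMLADAV: multiply corresponding lanes and accumulate all the
       products into a single scalar. */
    int32_t dot_product(int8x16_t a, int8x16_t b) {
      return vmladavq_s8(a, b);
    }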

Reviewers: simon_tatham, ostannard, dmgreen, MarkMurrayARM

Reviewed By: MarkMurrayARM

Subscribers: merge_guards_bot, kristof.beyls, hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D71062
2019-12-13 13:17:29 +00:00
Simon Tatham 25305a9311 [ARM][MVE] Add intrinsics for more immediate shifts.
Summary:
This fills in the remaining shift operations that take a single vector
input and an immediate shift count: the `vqshl`, `vqshlu`, `vrshr` and
`vshll[bt]` families.

`vshll[bt]` (which shifts each input lane left into a double-width
output lane) is the most interesting one. There are separate MC
instruction ids for shifting by exactly the input lane width and
shifting by less than that, because the instruction encoding is so
completely different for the lane-width special case. So I had to
write two sets of patterns to match based on the immediate shift
count, which involved adding a ComplexPattern matcher to avoid the
general-case pattern accidentally matching the special case too. For
that family I've made sure to add an llc codegen test for both
versions of each instruction.

I'm experimenting with a new strategy for parametrising the isel
patterns for all these instructions: adding extra fields to the
relevant `Instruction` subclass itself, which are ignored by the
Tablegen backends that generate the MC data, but can be retrieved from
each instance of that instruction subclass when it's passed as a
template parameter to the multiclass that generates its isel patterns.
A nice effect of that is that I can fill in those informational fields
using `let` blocks, rather than having to type them out once per
instruction at `defm` time.

(As a result, quite a lot of existing instruction `def`s are
reindented by this patch, so it's clearer to read with whitespace
changes ignored.)

Reviewers: dmgreen, MarkMurrayARM, miyuki, ostannard

Reviewed By: MarkMurrayARM

Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D71458
2019-12-13 13:07:39 +00:00
Mark Murray 228c74076d [ARM][MVE][Intrinsics] Add *_x() variants of my *_m() intrinsics.
Summary:
Multiclass is now put to better use, and this helped find some
existing bugs in the predicated VMULL* intrinsics, which are now fixed.

The refactored VMULL[TB]Q_(INT|POLY)_M() intrinsics were discovered
to have an argument ("inactive") with incorrect type, and this required
a fix that is included in this whole patch. The argument "inactive"
should have been the same width (per vector element) as the return
type of the intrinsic, but was not in the case where the return type
was double the element width of the input types.
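
Sketched in ACLE terms (C, assuming `arm_mve.h`): the "inactive"
argument matches the double-width return type, not the input element
width.

    #include <arm_mve.h>

    /* Predicated widening multiply of bottom halves: the inputs have
       8-bit lanes, but the result and 'inactive' have 16-bit lanes. */
    int16x8_t widening_mul(int16x8_t inactive, int8x16_t a,
                           int8x16_t b, mve_pred16_t p) {
      return vmullbq_int_m_s8(inactive, a, b, p);
    }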

To assist in testing the multiclassing, and to thwart further gremlins,
the unit tests are improved in scope.

The *.ll tests are all generated by a small bit of throw-away scripting
from the corresponding *.c tests, and as such the diffs are large and
nasty. Look at the file rather than the diff.

Reviewers: dmgreen, miyuki, ostannard, simon_tatham

Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D71421
2019-12-13 11:51:23 +00:00
Sam Parker 1274ac3dc2 [ARM][MVE] Sink vector shift operand
Recommit of e0b966643f. sub instructions were being generated for the
negated value, and for some reason they were the register-only ones.
I think the problem was because I was grabbing the 'zero' from
vmovimm, which is a target constant. Now I'm just generating a new
Constant zero and so rsb instructions are now generated.

Original commit message:

The shift amount operand can be provided in a general purpose
register so sink it. Flip the vdup and negate so the existing
patterns can be used for matching.

Differential Revision: https://reviews.llvm.org/D70841
2019-12-12 14:34:00 +00:00
Sam Parker f8ff3bf55b Revert "[ARM][MVE] Sink vector shift operand"
This reverts commit e0b966643f.

Instruction selection is failing with expensive checks.
2019-12-12 07:52:57 +00:00
Sam Parker e0b966643f [ARM][MVE] Sink vector shift operand
The shift amount operand can be provided in a general purpose
register so sink it. Flip the vdup and negate so the existing
patterns can be used for matching.

Differential Revision: https://reviews.llvm.org/D70841
2019-12-12 07:35:21 +00:00
Diogo Sampaio ee21934588 [ARM][NFC] Change test to use CHECK-NEXT 2019-12-11 14:25:36 +00:00
Sjoerd Meijer d97cf1f889 [ARM][LowOverheadLoops] Remove dead loop update instructions.
After creating a low-overhead loop, the loop update instruction was
still lingering around, hurting performance. This removes dead loop update
instructions, which in our case are mostly SUBS instructions.

To support this, some helper functions were added to MachineLoopUtils and
ReachingDefAnalysis to analyse live-ins of loop exit blocks and find uses
before a particular loop instruction, respectively.

This is a first version that removes a SUBS instruction when there are no other
uses inside and outside the loop block, but there are some more interesting
cases in test/CodeGen/Thumb2/LowOverheadLoops/mve-tail-data-types.ll which
shows that there is room for improvement. For example, we can't handle this
case yet:

    ..
    dlstp.32  lr, r2
  .LBB0_1:
    mov r3, r2
    subs  r2, #4
    vldrh.u32 q2, [r1], #8
    vmov  q1, q0
    vmla.u32  q0, q2, r0
    letp  lr, .LBB0_1
  @ %bb.2:
    vctp.32 r3
    ..

which is a lot more tricky because r2 is not only used by the subs, but also by
the mov to r3, which is used outside the low-overhead loop by the vctp
instruction. That requires a bit of a different approach, and I will follow
up on this.

Differential Revision: https://reviews.llvm.org/D71007
2019-12-11 10:20:19 +00:00
Simon Tatham bd0f271c9e [ARM][MVE] Add intrinsics for immediate shifts. (reland)
This adds the family of `vshlq_n` and `vshrq_n` ACLE intrinsics, which
shift every lane of a vector left or right by a compile-time
immediate. They mostly work by expanding to the IR `shl`, `lshr` and
`ashr` operations, with their second operand being a vector splat of
the immediate.

There's a fiddly special case, though. ACLE specifies that the
immediate in `vshrq_n` can take values up to //and including// the bit
size of the vector lane. But LLVM IR thinks that shifting right by the
full size of the lane is UB, and feels free to replace the `lshr` with
an `undef` halfway through the optimization pipeline. Hence, to keep
this legal in source code, I have to detect it at codegen time.
Logical (unsigned) right shifts by the element size are handled by
simply emitting the zero vector; arithmetic ones are converted into a
shift of one bit less, which will always give the same output.
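
A C sketch of the edge case (assuming `arm_mve.h` and an MVE target;
the wrapper name is illustrative):

    #include <arm_mve.h>

    /* ACLE allows a shift count equal to the lane width, which plain
       IR 'ashr' would treat as undefined; codegen instead emits a
       shift by 15, which gives the same result. */
    int16x8_t shr_all_bits(int16x8_t v) {
      return vshrq_n_s16(v, 16);
    }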

In order to do that check, I also had to enhance the tablegen
MveEmitter so that it can cope with converting a builtin function's
operand into a bare integer to pass to a code-generating subfunction.
Previously the only bare integers it knew how to handle were flags
generated from within `arm_mve.td`.

Reviewers: dmgreen, miyuki, MarkMurrayARM, ostannard

Reviewed By: dmgreen, MarkMurrayARM

Subscribers: echristo, hokein, rdhindsa, kristof.beyls, hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D71065
2019-12-11 10:10:09 +00:00
Mikhail Maltsev e6d3261c67 [ARM][MVE] Refactor complex vector intrinsics [NFCI]
Summary:
This patch refactors instruction selection of the complex vector
addition, multiplication and multiply-add intrinsics, so that it is
now based on TableGen patterns rather than C++ code.

It also changes the first parameter (halving vs non-halving) of the
arm_mve_vcaddq IR intrinsic to match the corresponding instruction
encoding, hence it requires some changes in the tests.

The patch addresses David's comment in https://reviews.llvm.org/D71190

Reviewers: dmgreen, ostannard, simon_tatham, MarkMurrayARM

Reviewed By: dmgreen

Subscribers: merge_guards_bot, kristof.beyls, hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D71245
2019-12-10 16:21:52 +00:00
Eric Christopher 9c6b7f68b8 Revert "[ARM][MVE] Add intrinsics for immediate shifts."
and two follow-on commits: one a warning fix and one a functional change.

As it's breaking at least the lto bot:

http://lab.llvm.org:8011/builders/clang-with-lto-ubuntu/builds/15132/steps/test-stage1-compiler/logs/stdio

This reverts commits:

 8d70f3c933
 ff4dceef92
 d97b3e3e65
2019-12-09 16:47:38 -08:00
Mark Murray fc3417cb5a [ARM][MVE][Intrinsics] Add VQADDQ, VHADDQ, VRHADDQ, VQSUBQ, VHSUBQ, VQDMULHQ, VQRDMULHQ intrinsics.
Summary: Add VQADDQ, VHADDQ, VRHADDQ, VQSUBQ, VHSUBQ, VQDMULHQ, VQRDMULHQ intrinsics and unit tests.

Reviewers: simon_tatham, ostannard, dmgreen, miyuki

Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D71198
2019-12-09 17:41:47 +00:00
Mark Murray 2eb61fa5d6 [ARM][MVE][Intrinsics] Add VMULL[BT]Q_(INT|POLY) intrinsics.
Summary: Add VMULL[BT]Q_(INT|POLY) intrinsics and unit tests.

Reviewers: simon_tatham, ostannard, dmgreen

Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D71066
2019-12-09 17:41:47 +00:00
Simon Tatham d97b3e3e65 [ARM][MVE] Add intrinsics for immediate shifts.
Summary:
This adds the family of `vshlq_n` and `vshrq_n` ACLE intrinsics, which
shift every lane of a vector left or right by a compile-time
immediate. They mostly work by expanding to the IR `shl`, `lshr` and
`ashr` operations, with their second operand being a vector splat of
the immediate.

There's a fiddly special case, though. ACLE specifies that the
immediate in `vshrq_n` can take values up to //and including// the bit
size of the vector lane. But LLVM IR thinks that shifting right by the
full size of the lane is UB, and feels free to replace the `lshr` with
an `undef` halfway through the optimization pipeline. Hence, to keep
this legal in source code, I have to detect it at codegen time.
Logical (unsigned) right shifts by the element size are handled by
simply emitting the zero vector; arithmetic ones are converted into a
shift of one bit less, which will always give the same output.

In order to do that check, I also had to enhance the tablegen
MveEmitter so that it can cope with converting a builtin function's
operand into a bare integer to pass to a code-generating subfunction.
Previously the only bare integers it knew how to handle were flags
generated from within `arm_mve.td`.

Reviewers: dmgreen, miyuki, MarkMurrayARM, ostannard

Reviewed By: MarkMurrayARM

Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D71065
2019-12-09 15:44:09 +00:00
Mikhail Maltsev 0d1490bf6a [ARM][MVE] Add complex vector intrinsics
Summary:
This patch adds intrinsics for the following MVE instructions:
* VCADD, VHCADD
* VCMUL
* VCMLA

Each of the above 3 groups has a corresponding new LLVM IR intrinsic.

Reviewers: simon_tatham, MarkMurrayARM, ostannard, dmgreen

Reviewed By: MarkMurrayARM

Subscribers: merge_guards_bot, kristof.beyls, hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D71190
2019-12-09 12:05:59 +00:00
David Green b1aba0378e [ARM] Enable MVE masked loads and stores
With the extra optimisations we have done, these should now be fine to
enable by default, which is what this patch does.

Differential Revision: https://reviews.llvm.org/D70968
2019-12-09 11:37:34 +00:00
David Green 792fab343b [ARM] Attempt to use whole register vmovs for MVE shuffles.
MVE doesn't have the range of shuffle instructions available in Neon. We
also cannot use the trick of cutting a difficult vector shuffle in half
to simplify things. Instead we need to be more careful about how we
lower shuffles.

This patch adds an extra combine that attempts to find "whole lane"
vmovs when lowering shuffles of smaller types. This helps us make some
shuffles a lot simpler, generating single lane movs for the parts that
can make use of it, falling back to the original shuffle for the rest.

Differential Revision: https://reviews.llvm.org/D69509
2019-12-08 10:53:54 +00:00
David Green 3a6eb5f160 [ARM] Disable VLD4 under MVE
Alas, using half the available vector registers in a single instruction
is just too much for the register allocator to handle. The mve-vldst4.ll
test here fails when these instructions are enabled at present. This
patch disables the generation of VLD4 and VST4 by adding a
mve-max-interleave-factor option, which we currently default to 2.

Differential Revision: https://reviews.llvm.org/D71109
2019-12-08 10:37:29 +00:00