llvm-project

Commit Graph

Author	SHA1	Message	Date
Sam Parker	06e12893ff	[ARM][LowOverheadLoops] Skip debug values. While iterating through the loop, don't inspect any dbg values. Differential Revision: https://reviews.llvm.org/D73688	2020-01-30 11:51:58 +00:00
Sam Parker	6726d67bfd	[ARM][LowOverheadLoops] Check scalar predicates. When trying to remove the loop iteration count, check that the instruction will always execute. Differential Revision: https://reviews.llvm.org/D73682	2020-01-30 09:13:04 +00:00
Sam Parker	dc0d84f09e	[NFC][ARM] Add test	2020-01-29 06:59:21 -05:00
David Green	8a6b948eb5	[MVE] Fixup order of gather writeback intrinsic outputs. The MVE_VLDRWU32_qi_pre gather loads, like the other _pre/_post MVE loads, return the writeback as result 0 and the value as result 1. The LLVM IR intrinsic seems to have this the other way around, though, so when lowering from one to the other we need to switch the first two outputs. I've also fixed up the types of _pre/_post on normal MVE loads. There we were already getting the values the right way around, just not the types. I don't believe this was causing anything to go wrong, but it was very confusing to read in the debug output. Differential Revision: https://reviews.llvm.org/D73370	2020-01-27 14:08:06 +00:00
Sjoerd Meijer	b567ff2fa0	[ARM][MVE] Tail-predication: support constant trip count. We had support for runtime trip count values, but not constants; this adds support for that. It also adds a minor optimisation while I was at it: don't invoke Cleanup when there's nothing to clean up. Differential Revision: https://reviews.llvm.org/D73198	2020-01-27 11:05:26 +00:00
Sam Parker	6c2df5d14f	[ARM][LowOverheadLoops] Dont ignore VCTP. When expanding the LoopStart, we try to remove the iteration count calculation. However, if part of the calculation was also used to calculate the number of elements, we could end up deleting instructions that were required to feed DLSTP/WLSTP. Differential Revision: https://reviews.llvm.org/D73275	2020-01-27 10:59:12 +00:00
David Green	b535aa405a	[ARM] Use reduction intrinsics for larger than legal reductions. The codegen for splitting an llvm.vector.reduction intrinsic into parts will be better than the codegen for the generic reductions. This will only directly affect cases where vectorization factors are specified by the user. Also added tests to make sure the codegen for larger reductions is OK. Differential Revision: https://reviews.llvm.org/D72257	2020-01-24 17:07:24 +00:00
Sam Parker	0ae13766ff	[NFC][ARM] Add test	2020-01-24 11:00:18 +00:00
Sam Parker	05532575e8	[RDA] Skip debug values. Skip debug instructions when iterating through a block to find uses. Differential Revision: https://reviews.llvm.org/D73273	2020-01-23 17:04:54 +00:00
Sam Parker	0c943c6117	[NFC][ARM] Add test	2020-01-23 16:21:52 +00:00
Simon Tatham	4321c6af28	[ARM,MVE] Support immediate vbicq,vorrq,vmvnq intrinsics. Summary: Immediate vmvnq is code-generated as a simple vector constant in IR, and left to the backend to recognize that it can be created with an MVE VMVN instruction. The predicated version is represented as a select between the input and the same constant, and I've added a Tablegen isel rule to turn that into a predicated VMVN. (That should be better than the previous VMVN + VPSEL: it's the same number of instructions but now it can fold into an adjacent VPT block.) The unpredicated forms of VBIC and VORR are done by enabling the same isel lowering as for NEON, recognizing appropriate immediates and rewriting them as ARMISD::VBICIMM / ARMISD::VORRIMM SDNodes, which I then instruction-select into the right MVE instructions (now that I've also reworked those instructions to use the same MC operand encoding). In order to do that, I had to promote the Tablegen SDNode instance `NEONvorrImm` to a general `ARMvorrImm` available in MVE as well, and similarly for `NEONvbicImm`. The predicated forms of VBIC and VORR are represented as a vector select between the original input vector and the output of the unpredicated operation. The main convenience of this is that it still lets me use the existing isel lowering for VBICIMM/VORRIMM, and not have to write another copy of the operand encoding translation code. This intrinsic family is the first to use the `imm_simd` system I put into the MveEmitter tablegen backend. So, naturally, it showed up a bug or two (emitting bogus range checks and the like). Fixed those, and added a full set of tests for the permissible immediates in the existing Sema test. Also adjusted the isel pattern for `vmovlb.u8`, which stopped matching because lowering started turning its input into a VBICIMM. Now it recognizes the VBICIMM instead. Reviewers: dmgreen, MarkMurrayARM, miyuki, ostannard Reviewed By: dmgreen Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D72934	2020-01-23 11:53:52 +00:00
David Green	58991ba773	[ARM] Mark MVE loads/store as not having side effects. The hasSideEffect parameter is usually automatically inferred from instruction patterns. For some of our MVE instructions, though, such as the pre/post inc loads and stores, we do not have patterns. This instead specifies the flag manually on the base MVE_VLDRSTR_base tablegen class, making sure we get this correct. This can help with scheduling multiple loads more optimally. Here I've added a unittest as a more direct form of testing. Differential Revision: https://reviews.llvm.org/D73117	2020-01-22 17:56:55 +00:00
Sam Parker	c04b9ba595	[ARM][MVE] Clear MaskedInsts vector. In MVETailPredication, clear the vector before running on a new loop. Differential Revision: https://reviews.llvm.org/D73048	2020-01-22 04:27:36 -05:00
Anna Welker	ff9877ce34	[ARM][MVE] Enable masked scatter. Extends the gather/scatter pass in MVEGatherScatterLowering.cpp to enable the transformation of masked scatters into calls to MVE's masked scatter intrinsic. Differential Revision: https://reviews.llvm.org/D72856	2020-01-21 09:46:26 +00:00
Mark Murray	b10a0eb04a	[ARM][MVE][Intrinsics] Take abs() of VMINNMAQ, VMAXNMAQ intrinsics' first arguments. Summary: Fix VMINNMAQ, VMAXNMAQ intrinsics; BOTH arguments have the absolute values taken. Reviewers: dmgreen, simon_tatham Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D72830	2020-01-20 14:33:26 +00:00
Sjoerd Meijer	8cba99e2aa	[ARM][MVE] Tail-Predication: rematerialise iteration count in exit blocks. This patch uses the helper function rewriteLoopExitValues, refactored in D72602, to rematerialise the iteration count in exit blocks, so that we can clean up loop update expressions inside the hardware loops later in ARMLowOverheadLoops. This is necessary to get actual performance gains for tail-predicated loops. Differential Revision: https://reviews.llvm.org/D72714	2020-01-20 10:26:36 +00:00
David Green	ff2e67a4f7	[ARM] MVE VLDn postinc. This adds post-inc variants of the VLD2/4 and VST2/4 instructions in MVE. It uses the same mechanism/nodes as Neon, transforming the intrinsic+add pair into an ARMISD::VLD2_UPD, which gets selected to a post-inc instruction. The code to do that is mostly taken from the existing Neon code, but simplified as fewer variants are needed. It also fills in getTgtMemIntrinsic handling for the arm.mve.vld2/4 intrinsics, which allows the nodes to have MMOs, calculated as the full length of the memory being loaded/stored. Differential Revision: https://reviews.llvm.org/D71194	2020-01-20 06:57:07 +00:00
David Green	d6075726b9	[ARM] MVE VLDn post inc tests. NFC	2020-01-20 06:57:07 +00:00
David Green	5e51f75542	[ARM] Favour post inc for MVE loops. We were previously not necessarily favouring postinc for the MVE loads and stores, leading to extra code prior to the loop to set up the preinc. MVE in general can benefit from postinc (as we don't have unrolled loops), and for certain instructions, like the VLD2s, only post-inc versions are available. Differential Revision: https://reviews.llvm.org/D70790	2020-01-20 06:57:07 +00:00
Sam Parker	42350cd893	[ARM][MVE] Tail Predicate IsSafeToRemove. Introduce a method to walk through use-def chains to decide whether it's possible to remove a given instruction and its users. These instructions are then stored in a set until the end of the transform, when they're erased. This is now used to perform checks on the iteration count (LoopDec chain), the element count (VCTP chain) and the possibly redundant iteration count. As well as being able to remove chains of instructions, we now also check that the sub feeding the vctp is producing the expected value. Differential Revision: https://reviews.llvm.org/D71837	2020-01-17 13:19:14 +00:00
Sam Parker	760b175109	[ARM][LowOverheadLoops] Update liveness info. Recommitting `e93e0d413f` after reverting due to test failures, which should hopefully now be fixed. Original commit message: After expanding the pseudo instructions, update the liveness info. We do this in a post-order traversal of the loop, including its exit blocks and preheader(s). Differential Revision: https://reviews.llvm.org/D72131	2020-01-16 15:44:25 +00:00
Anna Welker	c24cf97960	[ARM][MVE] Enable extending gathers. Enables the masked gather pass to create extending masked gathers. Differential Revision: https://reviews.llvm.org/D72451	2020-01-16 15:24:54 +00:00
Mark Murray	da9d57d2c2	[ARM][MVE][Intrinsics] Add VMINAQ, VMINNMAQ, VMAXAQ, VMAXNMAQ intrinsics. Summary: Add VMINAQ, VMINNMAQ, VMAXAQ, VMAXNMAQ intrinsics and unit tests. Reviewers: simon_tatham, miyuki, dmgreen Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D72761	2020-01-15 17:20:15 +00:00
David Green	1b264a8263	[ARM] Reegenerate MVE tests. NFC. The mve-phireg.ll test no longer really tests what it was added for, but the original case was fairly complex. I've left the test in as a general codegen test.	2020-01-15 08:10:38 +00:00
Sjoerd Meijer	a08c0adee0	[ARM][MVE] VTP Block Pass fix. Fix a missing and broken test: two VPT blocks predicated on the same VCMP instruction that can be folded. The problem was that for each VPT block, we record the predicate statements in a list, but the same instruction was added twice. Thus, we were running into an assert trying to remove the same instruction twice. To avoid this, the instructions are now recorded in a set. Differential Revision: https://reviews.llvm.org/D72699	2020-01-14 16:10:55 +00:00
Sam Parker	e27632c302	[ARM][LowOverheadLoops] Allow all MVE instrs. We have a whitelist of instructions that we allow when tail predicating, since these are trivial ones that we've deemed need no special handling. Now change ARMLowOverheadLoops to allow the non-trivial instructions if they're contained within a valid VPT block. A valid block is one that is predicated upon the VCTP, so we know that these non-trivial instructions will still behave as expected once the implicit predication is used instead. This also fixes a previous test failure. Differential Revision: https://reviews.llvm.org/D72509	2020-01-14 12:03:58 +00:00
Sam Parker	bad6032bc1	[ARM][LowOverheadLoops] Change predicate inspection. Use the already provided helper function to get the operand type so that we can detect whether the vpr is being used as a predicate or not. Also use existing helpers to get the predicate indices when converting the vpt blocks. This enables us to support both types of vpr predicate operand. Differential Revision: https://reviews.llvm.org/D72504	2020-01-14 11:47:34 +00:00
Diogo Sampaio	d94d079a6a	[ARM][Thumb2] Fix ADD/SUB invalid writes to SP. Summary: This patch fixes pr23772 [ARM] r226200 can emit illegal thumb2 instruction: "sub sp, r12, #80". The violation was that SUB and ADD (reg, immediate) instructions can only write to SP if the source register is also SP, so the above instruction was unpredictable. To enforce that the instruction t2(ADD|SUB)ri does not write to SP, we now constrain the destination register to be rGPR (which excludes PC and SP). Unlike the ARM specification, which defines one instruction that can read from SP and one that can't, here we inserted one that can't write to SP and another that can only write to SP, so as to reuse most of the hard-coded size optimizations. This change uncovered that emitting Thumb2 Reg plus Immediate could not previously emit all variants of ADD SP, SP #imm instructions, so it was refactored to be able to (see test/CodeGen/Thumb2/mve-stacksplot.mir, where we use a subw sp, sp, Imm12 variant). It also uncovered a disassembly issue with adr.w instructions, which were only written as SUBW instructions (see llvm/test/MC/Disassembler/ARM/thumb2.txt). Reviewers: eli.friedman, dmgreen, carwil, olista01, efriedma, andreadb Reviewed By: efriedma Subscribers: gbedwell, john.brawn, efriedma, ostannard, kristof.beyls, hiraditya, dmgreen, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70680	2020-01-14 11:47:19 +00:00
Sam Parker	e73b20c57d	[ARM][MVE] Disallow VPSEL for tail predication. Due to the current way that we collect predicated instructions, we can't easily handle vpsel in tail predicated loops. There are a couple of issues: 1) It will use the VPR as a predicate operand, but doesn't have to be inside a VPT block, which means we can assert while building up the VPT block because we don't find another VPST to begin a new one. 2) VPSEL still requires a VPR operand even after tail predicating, which means we can't remove it unless there is another instruction, such as vcmp, that can provide the VPR def. The first issue should be a relatively simple fix in the logic of the LowOverheadLoops pass, whereas the second will require us to represent the 'implicit' tail predication with an explicit value. Differential Revision: https://reviews.llvm.org/D72629	2020-01-14 11:41:17 +00:00
Anna Welker	72ca86fd34	[ARM][MVE] Masked gathers from base + vector of offsets. Enables the masked gather pass to create a masked gather loading from a base and vector of offsets. This also enables v8i16 and v16i8 gather loads. Differential Revision: https://reviews.llvm.org/D72330	2020-01-14 10:33:52 +00:00
Craig Topper	b590e0fd81	[TargetLowering][ARM][X86] Change softenSetCCOperands handling of ONE to avoid spurious exceptions for QNANs with strict FP quiet compares. ONE is currently softened to OGT | OLT, but the OGT and OLT libcalls will trigger an exception for QNAN, at least for X86 with libgcc. UEQ, on the other hand, uses UO | OEQ, and the UO and OEQ libcalls will not trigger an exception for QNAN. This patch changes ONE to use the inverse of the UEQ lowering, so we now produce O & UNE. Technically the existing behavior was correct for a signalling ONE, but since I don't know how to generate one of those from clang, that seemed like something we can deal with later, as we would need to fix other predicates as well. Also, removing spurious exceptions seemed better than missing an exception. There are also problems with quiet OGT/OLT/OLE/OGE, but those are harder to fix. Differential Revision: https://reviews.llvm.org/D72477	2020-01-10 11:00:17 -08:00
Diogo Sampaio	b1bb5ce96d	Reverting, broke some bots. Need further investigation. Summary: This reverts commit `8c12769f30`.	2020-01-10 13:40:41 +00:00
Diogo Sampaio	8c12769f30	[ARM][Thumb2] Fix ADD/SUB invalid writes to SP. Summary: This patch fixes pr23772 [ARM] r226200 can emit illegal thumb2 instruction: "sub sp, r12, #80". The violation was that SUB and ADD (reg, immediate) instructions can only write to SP if the source register is also SP, so the above instruction was unpredictable. To enforce that the instruction t2(ADD|SUB)ri does not write to SP, we now constrain the destination register to be rGPR (which excludes PC and SP). Unlike the ARM specification, which defines one instruction that can read from SP and one that can't, here we inserted one that can't write to SP and another that can only write to SP, so as to reuse most of the hard-coded size optimizations. This change uncovered that emitting Thumb2 Reg plus Immediate could not previously emit all variants of ADD SP, SP #imm instructions, so it was refactored to be able to (see test/CodeGen/Thumb2/mve-stacksplot.mir, where we use a subw sp, sp, Imm12 variant). It also uncovered a disassembly issue with adr.w instructions, which were only written as SUBW instructions (see llvm/test/MC/Disassembler/ARM/thumb2.txt). Reviewers: eli.friedman, dmgreen, carwil, olista01, efriedma Reviewed By: efriedma Subscribers: john.brawn, efriedma, ostannard, kristof.beyls, hiraditya, dmgreen, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70680	2020-01-10 11:25:44 +00:00
Momchil Velikov	173b711e83	[ARM][MVE] MVE-I should not be disabled by -mfpu=none. Architecturally, it's allowed to have MVE-I without an FPU, thus -mfpu=none should not disable MVE-I, or moves to/from FP-registers. This patch removes `+/-fpregs` from the features unconditionally added to the target feature list depending on the FPU, and moves the logic to the Clang driver, where the negative form (`-fpregs`) is conditionally added to the target features list for `-mfloat-abi=soft`, or for `-mfpu=none` without either `+mve` or `+mve.fp`. Only the negative form is added by the driver; the positive one is derived from other features in the backend. Differential Revision: https://reviews.llvm.org/D71843	2020-01-09 14:03:25 +00:00
Sam Parker	1cba261239	Revert "[ARM][LowOverheadLoops] Update liveness info" This reverts commit `e93e0d413f`. There are some ordering problems on some of the buildbots which need investigating.	2020-01-09 09:22:06 +00:00
Sam Parker	e93e0d413f	[ARM][LowOverheadLoops] Update liveness info. After expanding the pseudo instructions, update the liveness info. We do this in a post-order traversal of the loop, including its exit blocks and preheader(s). Differential Revision: https://reviews.llvm.org/D72131	2020-01-09 08:33:47 +00:00
Simon Tatham	dac7b23cc3	[ARM,MVE] Intrinsics for variable shift instructions. This batch of intrinsics fills in all the shift instructions that take a variable shift distance in a register, instead of an immediate. Some of these instructions take a single shift distance in a scalar register and apply it to all lanes; others take a vector of per-lane distances. These instructions are all basically one family, varying in whether they saturate out-of-range values, and whether they round when bits are shifted off the bottom. I've implemented them at the IR level by a much smaller family of IR intrinsics, which take flag parameters to indicate saturating and/or rounding (along with the usual one to specify signed/unsigned integers). An oddity is that all of them are left shift instructions, but if you pass a negative shift count, they'll shift right. So the vector shift distances are always vectors of signed integers, regardless of whether you're considering the other input vector to be signed or unsigned. Also, even the simplest `vshlq` instruction in this family (neither saturating nor rounding) has to be implemented as an IR intrinsic, because the ordinary LLVM IR `shl` operation would consider an out-of-range shift count to be undefined behavior. Reviewers: dmgreen, MarkMurrayARM, miyuki, ostannard Reviewed By: dmgreen Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D72329	2020-01-08 14:42:24 +00:00
Simon Tatham	3100480925	[ARM,MVE] Intrinsics for partial-overwrite imm shifts. This batch of intrinsics covers two sets of immediate shift instructions, which have in common that they only overwrite part of their output register and so they need an extra input giving its previous value. The VSLI and VSRI instructions shift each lane of the input vector left or right just as if they were normal immediate VSHL/VSHR, but then they only overwrite the output bits that correspond to actual shifted bits of the input. So VSLI will leave the low n bits of each output lane unchanged, and VSRI the same with the top n bits. The V[Q][R]SHR[U]N family are all narrowing shifts: they take an input vector of 2n-bit integers, shift each lane right by a constant, and then narrow the shifted result to only n bits. So they only overwrite half of the n-bit lanes in the output register, and the B/T suffix indicates whether it's the bottom or top half of each 2n-bit lane. I've implemented the whole of the latter family using a single IR intrinsic `vshrn`, which takes a lot of i32 parameters indicating which instruction it expands to (by specifying signedness of the input and output types, whether it saturates and/or rounds, etc). Reviewers: dmgreen, MarkMurrayARM, miyuki, ostannard Reviewed By: dmgreen Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D72328	2020-01-08 14:42:24 +00:00
Anna Welker	346f6b54bd	[ARM][MVE] Enable masked gathers from vector of pointers. Adds a pass to the ARM backend that takes a v4i32 gather and transforms it into a call to MVE's masked gather intrinsics. Differential Revision: https://reviews.llvm.org/D71743	2020-01-08 13:43:12 +00:00
Sam Parker	96d2d96b03	[NFC][ARM] Update tests. Run the update_mir_test on some of the low-overhead loop tests.	2020-01-08 06:08:30 -05:00
Sjoerd Meijer	ee811808a9	[ARM][MVE] Renamed VPT Block tests and files to something more informative. NFC	2020-01-07 16:16:54 +00:00
Sjoerd Meijer	e34801c8e6	[ARM][MVE] VPT Blocks: findVCMPToFoldIntoVPS. This is a recommit of D71330, but with a few things fixed and changed: 1) ReachingDefAnalysis: this was not running with optnone as it was checking skipFunction(), which other analysis passes don't do. I guess this is a copy-paste from a codegen pass. 2) VPTBlockPass: here I've added skipFunction(), because like most/all optimisations, we don't want to run this with optnone. This fixes the issues with the initial/previous commit: the VPTBlockPass was running with optnone, but ReachingDefAnalysis wasn't, and so VPTBlockPass was crashing querying ReachingDefAnalysis. I've added test case mve-vpt-block-optnone.mir to check that we don't run VPTBlock with optnone. Differential Revision: https://reviews.llvm.org/D71470	2020-01-07 13:54:47 +00:00
David Green	f88d52728b	[ARM] Use the correct opcodes for Thumb2 segmented stack frame lowering. The segmented stack lowering code appears to be using ARM opcodes under Thumb2. The MRC opcode will be the same for Thumb and ARM, but t2LDR seems wrong. Either way, using the correct thumb vs arm opcodes is more correct. Differential Revision: https://reviews.llvm.org/D72074	2020-01-06 16:38:49 +00:00
Simon Tatham	34817e04fe	[ARM,MVE] Fix many signedness errors in MVE intrinsics. Summary: Running an end-to-end test last week I noticed that a lot of the ACLE intrinsics that operate differently on vectors of signed and unsigned integers were ending up generating the signed version of the instruction unconditionally. This is because the IR intrinsics had no way to distinguish signed from unsigned: the LLVM type system just calls them both `v8i16` (or whatever), so you need either separate intrinsics for signed and unsigned, or a flag parameter that tells ISel which one to choose. This patch fixes all the problems of that kind that I've noticed, by adding an i32 flag parameter to many of the IR intrinsics which is set to 1 for unsigned (matching the existing practice in cases where we got it right), and conditioning all the isel patterns on that flag. So the fundamental change is in `IntrinsicsARM.td`, changing the low-level IR intrinsics API; there are knock-on changes in `arm_mve.td` (adjusting code gen for the ACLE intrinsics to use the modified API) and in `ARMInstrMVE.td` (adjusting isel to expect the new unsigned flags). The rest of this patch is boringly updating tests. Reviewers: dmgreen, miyuki, MarkMurrayARM Reviewed By: dmgreen Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D72270	2020-01-06 16:33:16 +00:00
Simon Tatham	4978296cd8	[ARM,MVE] Support -ve offsets in gather-load intrinsics. Summary: The ACLE intrinsics with `gather_base` or `scatter_base` in the name are wrappers on the MVE load/store instructions that take a vector of base addresses and an immediate offset. The immediate offset can be up to 127 times the alignment unit, and it can be positive or negative. At the MC layer, we got that right. But in the Sema error checking for the wrapping intrinsics, the offset was erroneously constrained to be positive. To fix this I've adjusted the `imm_mem7bit` class in the Tablegen that defines the intrinsics. But that causes integer literals like `0xfffffffffffffe04` to appear in the autogenerated calls to `SemaBuiltinConstantArgRange`, which provokes a compiler warning because that's out of the non-overflowing range of an `int64_t`. So I've also tweaked `MveEmitter` to emit that as `-0x1fc` instead. Updated the tests of the Sema checks themselves, and also adjusted a random sample of the CodeGen tests to actually use negative offsets and prove they get all the way through code generation without causing a crash. Reviewers: dmgreen, miyuki, MarkMurrayARM Reviewed By: dmgreen Subscribers: kristof.beyls, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D72268	2020-01-06 16:33:07 +00:00
Simon Tatham	b99ef32d04	[ARM,MVE] Generate the right instruction for vmaxnmq_m_f16. Summary: Due to a copy-paste error in the isel patterns, the predicated version of this intrinsic was expanding to the `VMAXNMT.F32` instruction instead of `VMAXNMT.F16`. Similarly for vminnm. Reviewers: dmgreen, miyuki, MarkMurrayARM Reviewed By: dmgreen Subscribers: kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D72269	2020-01-06 16:28:20 +00:00
David Green	fb8c9a339a	[ARM] Use isFMAFasterThanFMulAndFAdd for scalars as well as MVE vectors. This adds extra scalar handling to isFMAFasterThanFMulAndFAdd, allowing the target independent code to handle more folds in more situations (for example if the fast math flags are present, but the global AllowFPOpFusion option isn't). It also splits HasSlowFPVFMx out of HasSlowFPVMLx, to allow VFMA and VMLA to be controlled separately if needed. Differential Revision: https://reviews.llvm.org/D72139	2020-01-05 11:24:04 +00:00
Simon Pilgrim	eb0e1978df	[TargetLowering] SimplifyDemandedBits - call SimplifyMultipleUseDemandedBits for ISD::EXTRACT_VECTOR_ELT (REAPPLIED). This patch attempts to peek through vectors based on the demanded bits/elt of a particular ISD::EXTRACT_VECTOR_ELT node, allowing us to avoid dependencies on ops that have no impact on the extract. In particular this helps remove some unnecessary scalar->vector->scalar patterns. The wasm shift patterns are annoying; @tlively has indicated that the wasm vector shift codegen is to be refactored in the near term, so this isn't considered a major issue. Reapplied after reversion at rL368660 due to PR42982, which was fixed at rGca7fdd41bda0. Differential Revision: https://reviews.llvm.org/D65887	2020-01-04 13:15:50 +00:00
Roman Lebedev	3d492d7503	[DAGCombine][X86][Thumb2/LowOverheadLoops] `A - (A & C)` -> `A & (~C)` fold (PR44448). While we do manage to fold integer-typed IR in the middle-end, we can't do that for the main motivating case of pointers. There is the @llvm.ptrmask() intrinsic, which may or may not be helpful, but I'm not sure it is considered fully canonical yet; likely not everything is aware of it. Name: PR44448 ptr - (ptr & C) -> ptr & (~C): `%bias = and i32 %ptr, C`, `%r = sub i32 %ptr, %bias` => `%r = and i32 %ptr, ~C`. See https://bugs.llvm.org/show_bug.cgi?id=44448 https://reviews.llvm.org/D71499	2020-01-03 17:55:45 +03:00
Sam Parker	69cfbb460e	[ARM][NFC] Update MIR test	2020-01-03 14:51:15 +00:00
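For `4321c6af28`, a rough model of which lane constants the VBIC/VORR immediate forms accept may help when reading the Sema tests. This Python sketch assumes the NEON-style rule (a single byte of payload shifted left by a whole number of bytes that fits in the lane) and models the predicated form as the per-lane select the commit describes. The function names are hypothetical and this is not LLVM code.

```python
from typing import List


def is_vbic_vorr_immediate(value: int, lane_bits: int) -> bool:
    """Assumed rule: one byte of payload, shifted left by whole bytes within the lane."""
    if lane_bits == 16:
        shifts = (0, 8)
    elif lane_bits == 32:
        shifts = (0, 8, 16, 24)
    else:
        raise ValueError("this sketch only models 16- and 32-bit lanes")
    if not 0 <= value < (1 << lane_bits):
        return False
    # Every set bit must lie inside a single allowed byte position.
    return any((value & ~(0xFF << s)) == 0 for s in shifts)


def vorr_predicated(vec: List[int], imm: int, pred: List[bool]) -> List[int]:
    """Predicated VORR modelled as select(pred, vec | imm, vec), lane by lane."""
    return [(x | imm) if p else x for x, p in zip(vec, pred)]


if __name__ == "__main__":
    print(is_vbic_vorr_immediate(0xAB00, 16), is_vbic_vorr_immediate(0x0AB0, 16))
    print(vorr_predicated([1, 2, 4], 0x10, [True, False, True]))
```

Modelling the predicated form as a select mirrors the commit's point: the unpredicated lowering can be reused unchanged, and the select alone decides which lanes keep the original input.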
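The change in `b590e0fd81` relies on ONE (ordered and not equal) being exactly the inverse of UEQ (UO | OEQ), so replacing OGT | OLT with O & UNE changes which libcalls are emitted, not the result. This Python sketch checks that equivalence on a few special values. Python's float comparisons never raise on NaN, so it says nothing about the exception behaviour that motivated the patch; the helper names are illustrative only.

```python
import math


def unordered(a: float, b: float) -> bool:
    """UO: true if either operand is NaN."""
    return math.isnan(a) or math.isnan(b)


def ogt(a: float, b: float) -> bool:
    return not unordered(a, b) and a > b


def olt(a: float, b: float) -> bool:
    return not unordered(a, b) and a < b


def oeq(a: float, b: float) -> bool:
    return not unordered(a, b) and a == b


def une(a: float, b: float) -> bool:
    """UNE: unordered or not equal."""
    return unordered(a, b) or a != b


def ueq(a: float, b: float) -> bool:
    """UEQ lowered as UO | OEQ."""
    return unordered(a, b) or oeq(a, b)


def one_old(a: float, b: float) -> bool:
    """Previous softening of ONE: OGT | OLT."""
    return ogt(a, b) or olt(a, b)


def one_new(a: float, b: float) -> bool:
    """New softening of ONE: O & UNE, the inverse of the UEQ lowering."""
    return not unordered(a, b) and une(a, b)


VALUES = [0.0, -0.0, 1.0, -1.0, math.inf, -math.inf, math.nan]

if __name__ == "__main__":
    agree = all(
        one_old(x, y) == one_new(x, y) == (not ueq(x, y))
        for x in VALUES
        for y in VALUES
    )
    print("lowerings agree:", agree)
```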
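The driver rule in `173b711e83` can be read as a small decision function. A sketch, with a hypothetical `fpregs_feature` helper and plain strings standing in for the driver's real option handling:

```python
from typing import Iterable, Optional


def fpregs_feature(float_abi: str, fpu: str, features: Iterable[str]) -> Optional[str]:
    """Return "-fpregs" when the driver would add it, else None (hypothetical helper)."""
    if float_abi == "soft":
        return "-fpregs"  # soft-float ABI: FP registers are disabled
    has_mve = any(f in ("+mve", "+mve.fp") for f in features)
    if fpu == "none" and not has_mve:
        return "-fpregs"  # no FPU and no MVE: nothing needs the FP registers
    return None  # the positive form is left for the backend to derive


if __name__ == "__main__":
    print(fpregs_feature("hard", "none", ["+mve"]))
    print(fpregs_feature("hard", "none", []))
```

Only the negative form is ever produced, matching the commit's statement that the positive one is derived in the backend.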
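As a per-lane illustration of `dac7b23cc3`, here is a simplified Python model of a register-controlled shift: the count is always signed, negative counts shift right, and the `saturating`/`rounding` arguments stand in for the IR-level flag parameters. The lane width, helper names, and the handling of very large counts are assumptions of this sketch, not taken from the architecture manual or from LLVM. Unlike IR `shl`, every count yields a defined result here.

```python
def wrap(value: int, bits: int, signed: bool) -> int:
    """Truncate to the lane width, reinterpreting as signed if requested."""
    value &= (1 << bits) - 1
    if signed and value >= 1 << (bits - 1):
        value -= 1 << bits
    return value


def saturate(value: int, bits: int, signed: bool) -> int:
    """Clamp to the representable range of the lane type."""
    if signed:
        lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    else:
        lo, hi = 0, (1 << bits) - 1
    return max(lo, min(hi, value))


def vshl_lane(x: int, shift: int, bits: int = 16, signed: bool = True,
              saturating: bool = False, rounding: bool = False) -> int:
    """One lane of a variable shift; `x` must already be in the lane's range.

    Non-negative `shift` shifts left; negative `shift` shifts right
    (arithmetic for signed lanes; unsigned `x` is never negative, so the
    same Python `>>` acts as a logical shift there).
    """
    if shift >= 0:
        result = x << shift  # exact: Python ints do not overflow
    else:
        n = -shift
        if rounding:
            result = (x + (1 << (n - 1))) >> n  # add half, then shift
        else:
            result = x >> n
    if saturating:
        return saturate(result, bits, signed)
    return wrap(result, bits, signed)


if __name__ == "__main__":
    print(vshl_lane(5, -1), vshl_lane(5, -1, rounding=True))
    print(vshl_lane(0x4000, 2, signed=False), vshl_lane(0x4000, 2, signed=False, saturating=True))
```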
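The partial-overwrite behaviour in `3100480925` can also be modelled per lane. In this Python sketch the bits an instruction does not write come from the previous destination value. It assumes 8-bit lanes for VSLI/VSRI and a plain (non-saturating, non-rounding) narrowing shift from 16-bit to 8-bit lanes, with byte lanes numbered little-endian so that B writes the even bytes and T the odd ones. All helper names are hypothetical.

```python
from typing import List


def vsli_lane(dst: int, src: int, n: int, bits: int = 8) -> int:
    """VSLI: src << n, but the low n bits keep dst's old value."""
    lane_mask = (1 << bits) - 1
    keep = (1 << n) - 1
    return ((src << n) & lane_mask) | (dst & keep)


def vsri_lane(dst: int, src: int, n: int, bits: int = 8) -> int:
    """VSRI: src >> n (logical), but the top n bits keep dst's old value."""
    lane_mask = (1 << bits) - 1
    keep = lane_mask & ~((1 << (bits - n)) - 1)
    return ((src & lane_mask) >> n) | (dst & keep)


def vshrn(dst_bytes: List[int], src_halves: List[int], n: int, top: bool = False) -> List[int]:
    """Narrowing shift: each 16-bit source lane, shifted right by n and truncated
    to 8 bits, replaces the bottom (B) or top (T) byte of the matching 16-bit
    output lane; the other byte keeps dst's old value."""
    out = list(dst_bytes)
    for i, value in enumerate(src_halves):
        out[2 * i + (1 if top else 0)] = (value >> n) & 0xFF
    return out


if __name__ == "__main__":
    print(bin(vsli_lane(0b00000101, 0b00000011, 3)))
    print([hex(b) for b in vshrn([0xAA] * 4, [0x1234, 0xFF00], 4)])
```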
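The reason `34817e04fe` needs an explicit unsigned flag is that a `v8i16` lane's bit pattern orders differently under signed and unsigned interpretation. A max operation is used here purely as an illustration. `vmax_lane` and its parameters are hypothetical; `unsigned=1` mirrors the commit's convention of setting the flag to 1 for unsigned.

```python
def lane_value(pattern: int, unsigned: int, width: int = 16) -> int:
    """Interpret a lane's bit pattern as unsigned (flag 1) or signed (flag 0)."""
    if unsigned:
        return pattern
    sign_bit = 1 << (width - 1)
    return pattern - (1 << width) if pattern & sign_bit else pattern


def vmax_lane(a: int, b: int, unsigned: int, width: int = 16) -> int:
    """Return whichever bit pattern is larger under the chosen interpretation."""
    return a if lane_value(a, unsigned, width) >= lane_value(b, unsigned, width) else b


if __name__ == "__main__":
    # 0xFFFF is 65535 unsigned but -1 signed, so the answer flips with the flag.
    print(hex(vmax_lane(0xFFFF, 0x0001, unsigned=1)), hex(vmax_lane(0xFFFF, 0x0001, unsigned=0)))
```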
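The offset arithmetic in `4978296cd8` can be checked directly: with a 4-byte alignment unit the limit is 127 * 4 = 508 = 0x1fc, and `0xfffffffffffffe04` is the 64-bit two's-complement pattern for -0x1fc. This sketch additionally assumes that valid offsets must be multiples of the alignment unit, as a scaled 7-bit encoding implies; the helper names are hypothetical.

```python
from typing import Tuple


def gather_offset_range(alignment: int) -> Tuple[int, int]:
    """Offsets span up to 127 alignment units in either direction."""
    limit = 127 * alignment
    return -limit, limit


def is_valid_gather_offset(offset: int, alignment: int) -> bool:
    """In range and (assumed) a whole number of alignment units."""
    lo, hi = gather_offset_range(alignment)
    return lo <= offset <= hi and offset % alignment == 0


def as_int64(raw: int) -> int:
    """Reinterpret a 64-bit two's-complement bit pattern as a signed value."""
    raw &= (1 << 64) - 1
    return raw - (1 << 64) if raw >= 1 << 63 else raw


if __name__ == "__main__":
    print(gather_offset_range(4))
    print(hex(as_int64(0xfffffffffffffe04)))
```

Emitting the negative literal rather than the raw 64-bit pattern is what keeps the generated range check inside `int64_t`.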
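The fold in `3d492d7503` holds because `A & C` only contains bits that are already set in `A`, so subtracting it never borrows and simply clears those bits. A minimal Python sketch, assuming 32-bit wrap-around arithmetic to mimic `i32` (the helper names are hypothetical), checks the identity on random inputs and shows the pointer-style "round down to a boundary" case.

```python
import random

MASK32 = 0xFFFFFFFF  # emulate i32 wrap-around with Python's unbounded ints


def sub_of_and(a: int, c: int) -> int:
    """Original form: a - (a & c), truncated to 32 bits."""
    return (a - (a & c)) & MASK32


def and_not(a: int, c: int) -> int:
    """Folded form: a & ~c, truncated to 32 bits."""
    return (a & ~c) & MASK32


def check_fold(trials: int = 10_000, seed: int = 0) -> bool:
    """Compare both forms on random 32-bit inputs."""
    rng = random.Random(seed)
    for _ in range(trials):
        a = rng.getrandbits(32)
        c = rng.getrandbits(32)
        if sub_of_and(a, c) != and_not(a, c):
            return False
    return True


if __name__ == "__main__":
    print(check_fold())
    # Pointer-style use: rounding an address down to a 16-byte boundary.
    print(hex(sub_of_and(0x1003F, 15)))
```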


889 Commits