llvm-project

Commit Graph

Author	SHA1	Message	Date
Sam Tebbs	7aabb6ad77	[ARM][LowOverheadLoops] Remove modifications to the correct element count register After my patch at D86087, code that now uses the mov operand rather than the vctp operand will no longer remove modifications to the vctp operand as they should. This patch fixes that by explicitly removing modifications to the vctp operand rather than the register used as the element count.	2020-09-08 10:30:05 +01:00
Sam Parker	b30adfb529	[ARM][LowOverheadLoops] Liveouts and reductions Remove the code that tried to look for reduction patterns, since the vectorizer and isel can now produce predicated arithmetic instructions within the loop body. This has required some reorganisation and fixes around live-out and predication checks, as well as looking for cases where an input/output is initialised to zero. Differential Revision: https://reviews.llvm.org/D86613	2020-08-28 13:56:16 +01:00
Fangrui Song	c466c5fa7e	[ARM] Fix build after D86087	2020-08-18 09:20:32 -07:00
David Green	3471520b1f	[ARM] Allow tail predication of VLDn VLD2/4 instructions cannot be predicated, so we cannot tail predicate them from autovec. From intrinsics though, they should be valid as they will just end up loading extra values into off vector lanes, not affecting the on lanes. The same is true for loads in general where so long as we are not using the other vector lanes, an unpredicated load can be converted to a predicated one. This marks VLD2 and VLD4 instructions as validForTailPredication and allows any unpredicated load in a tail predication loop, which seems to be valid given the other checks we have. Differential Revision: https://reviews.llvm.org/D86022	2020-08-18 17:15:45 +01:00
Sam Tebbs	31f02ac60a	[ARM] Use mov operand if the mov cannot be moved while tail predicating There are some cases where the instruction that sets up the iteration count for a tail predicated loop cannot be moved before the dlstp, stopping tail predication entirely. This patch checks if the mov operand can be used and if so, uses that instead. Differential Revision: https://reviews.llvm.org/D86087	2020-08-18 17:10:29 +01:00
Sam Parker	3ee580d017	[ARM][LowOverheadLoops] Handle reductions While validating live-out values, record instructions that look like a reduction. This will comprise a vector op (for now only vadd), a vorr (vmov) which stores the previous value of vadd and then a vpsel in the exit block which is predicated upon a vctp. This vctp will combine the last two iterations using the vmov and vadd into a vector which can then be consumed by a vaddv. Once we have determined that it's safe to perform tail-predication, we need to change this sequence of instructions so that the predication doesn't produce incorrect code. This involves changing the register allocation of the vadd so it updates itself and the predication on the final iteration will not update the falsely predicated lanes. This mimics what the vmov, vctp and vpsel do and so we then don't need any of those instructions. Differential Revision: https://reviews.llvm.org/D75533	2020-07-01 08:31:49 +01:00
Pierre-vh	835251f7d9	[Target][ARM] Make Low Overhead Loops coexist with VPT blocks. Previously, the LowOverheadLoops pass couldn't handle VPT blocks with conditions, or with multiple VCTPs. This patch improves the LowOverheadLoops pass so it can handle those cases. It also adds support for VCMPs before the VCTP. Differential Revision: https://reviews.llvm.org/D78206	2020-05-20 12:24:55 +01:00
Pierre-vh	24bf8063d6	[Target][ARM] Replace outdated getARMVPTBlockMask function getARMVPTBlockMask was an outdated function that only handled basic block masks: T, TT, TTT and TTTT. This worked fine before the MVE VPT Block Insertion Pass improvements as it was the only kind of masks that it could generate, but now it can generate more complex masks that use E predicates, so it's dangerous to use that function to calculate VPT/VPST block masks. I replaced it with 2 different functions: - expandPredBlockMask, in ARMBaseInfo. This adds an "E" or "T" at the end of an existing PredBlockMask. - recomputeVPTBlockMask, in Thumb2InstrInfo. This takes an iterator to a VPT/VPST instruction and recomputes its block mask by looking at the predicated instructions that follow it. This should be used to recompute a block mask after removing/adding a predicated instruction to the block. The expandPredBlockMask function is pretty much imported from the MVE VPT Blocks pass. I had to change the ARMLowOverheadLoops and MVEVPTBlocks passes as well so they could use these new functions. Differential Revision: https://reviews.llvm.org/D78201	2020-05-12 12:10:15 +01:00
David Green	892af45c86	[ARM] Distribute MVE post-increments This adds some extra processing into the Pre-RA ARM load/store optimizer to detect and merge MVE loads/stores and adds of the same base. This we don't always turn into a post-inc during ISel, and due to the nature of it being a graph we don't always know an order to use for the nodes, not knowing which nodes to make post-inc and which to use the new post-inc of. After ISel, we have an order that we can use to post-inc the following instructions. So this looks for a load/store with a starting offset of 0, and an add/sub from the same base, plus a number of other loads/stores. We then do some checks and convert the zero offset load/store into a postinc variant. Any loads/stores after it have the offset subtracted from their immediates. For example, LDR #4; LDR #0; LDR #8; LDR #12; ADD #16 becomes LDR #4; LDR_POSTINC #16; LDR #-8; LDR #-4. It only handles MVE loads/stores at the moment. Normal loads/stores will be added in a followup patch, they just have some extra details to ensure that we keep generating LDRD/LDM successfully. Differential Revision: https://reviews.llvm.org/D77813	2020-04-22 14:16:51 +01:00
Pierre-vh	dad848280d	[Target][ARM] Change VPTMaskValues to the correct encoding VPTMaskValue was using the "instruction" encoding to represent the masks (= the same encoding as the one used by the instructions in an object file), but it is only used to build MCOperands, so it should use the MCOperand encoding of the masks, which is slightly different. Differential Revision: https://reviews.llvm.org/D76139	2020-04-01 12:34:20 +01:00
Sam Parker	94b195ff12	[ARM][LowOverheadLoops] Add horizontal reduction support Add a bit more logic into the 'FalseLaneZeros' tracking to enable horizontal reductions and also make the VADDV variants validForTailPredication. Differential Revision: https://reviews.llvm.org/D76708	2020-03-30 09:55:41 +01:00
Sam Parker	d7084fa34a	[ARM][LowOverheadLoops] DoubleWidthResult instructions canGenerateZeros Given that some instructions generate wider result elements than their inputs, flag them as being able to generate non zeros in the false lanes. Differential Revision: https://reviews.llvm.org/D76766	2020-03-27 15:26:13 +00:00
Sam Parker	94cacebcca	[ARM][LowOverheadLoops] Add checks for narrowing Modify ValidateLiveOuts to track 'FalseLaneZeros' more precisely, including checks on specific operations that can generate non-zeros from zero values, e.g. VMVN. We can then check that any instructions that retain some information in their output register (all narrowing instructions) only use and def registers that always have zeros in their falsely predicated bytes, whether or not tail predication happens. Most of the logic remains the same, just the names of the data structures and helpers have been renamed to reflect the change in logic. The key change, apart from the opcode checkers, is that the FalseZeros set now strictly contains only instructions which will always generate zeros, and not instructions that could also have their false bytes masked away later. Differential Revision: https://reviews.llvm.org/D76235	2020-03-24 08:41:48 +00:00
Sam Parker	d941df363d	[NFC][ARM] Reorder some logic Move some logic around in LowOverheadLoop::ValidateLiveOut	2020-03-11 11:40:09 +00:00
Sam Parker	ff9ac33e1e	[ARM][MVE] Validate tail predication values Iterate through the loop and check that the observable values produced are the same whether tail predication happens or not. We want to find out if the tail-predicated version of this loop will produce the same values as the loop in its original form. For this to be true, the newly inserted implicit predication must not change the (observable) results. We're doing this because many instructions in the loop will not be predicated and so the conversion from VPT predication to tail predication can result in different values being produced, because of falsely predicated lanes not being updated in the converted form. A masked load, whether through VPT or tail predication, will write zeros to any of the falsely predicated bytes. So, from the loads, we know that the false lanes are zeroed and here we're trying to track that those false lanes remain zero, or where they change, the differences are masked away by their user(s). All MVE loads and stores have to be predicated, so we know that any load operands, or stored results are equivalent already. Other explicitly predicated instructions will perform the same operation in the original loop and the tail-predicated form too. Because of this, we can insert loads, stores and other predicated instructions into our KnownFalseZeros set and build from there. Differential Revision: https://reviews.llvm.org/D75452	2020-03-10 09:59:01 +00:00
Sam Parker	5618e9be37	[RDA][ARM] collectKilledOperands across multiple blocks Use MIOperand in collectLocalKilledOperands to make the search global, as we already have to search for global uses too. This allows us to delete more dead code when tail predicating. Differential Revision: https://reviews.llvm.org/D75167	2020-03-03 15:23:05 +00:00
Sam Parker	dfe8f5da4c	[ARM][RDA] Allow multiple killed users In RDA, check against the already decided dead instructions when looking at users. This allows an instruction to be removed if it has multiple users, but they're all dead. This means that IT instructions can be considered killed once all the itstate using instructions are dead. Differential Revision: https://reviews.llvm.org/D75245	2020-03-03 15:12:29 +00:00
Sam Parker	bf61421a02	[RDA] Track implicit-defs Ensure that we're recording implicit defs, as well as visiting implicit uses and implicit defs when we're walking through operands. Differential Revision: https://reviews.llvm.org/D75185	2020-02-28 11:14:42 +00:00
Sam Parker	1d06e75df2	[ARM][RDA] add getUniqueReachingMIDef Add getUniqueReachingMIDef to RDA which performs a global search for a machine instruction that produces a unique definition of a given register at a given point. Also add two helper functions (getMIOperand) that wrap around this functionality to get the incoming definition uses of a given instruction. These now replace the uses of getReachingMIDef in ARMLowOverheadLoops. getReachingMIDef has been renamed to getReachingLocalMIDef and has been made private along with getInstFromId. Differential Revision: https://reviews.llvm.org/D74605	2020-02-26 11:15:26 +00:00
Sam Parker	a67eb221e2	[RDA][ARM][LowOverheadLoops] Iteration count IT blocks Change the way that we remove the redundant iteration count code in the presence of IT blocks. collectLocalKilledOperands has been introduced to scan an instruction's operands, collecting the killed instructions and then visiting them too. This is used to delete the code in the preheader which calculates the iteration count. We also track any IT blocks within the preheader and, if we remove all the instructions from the IT block, we also remove the IT instruction. isSafeToRemove is used to remove any redundant uses of the iteration count within the loop body. Differential Revision: https://reviews.llvm.org/D74975	2020-02-24 13:51:03 +00:00
Sam Parker	de3e65e60c	[ARM][LowOverheadLoops] Check loop liveouts Check that no Q-regs are live out of the loop, unless the instruction within the loop is predicated on the vctp. Differential Revision: https://reviews.llvm.org/D72713	2020-02-19 12:59:01 +00:00
Sam Parker	fd01b2f4a6	[NFC][ARM] Convert some pointers to references.	2020-02-14 08:29:01 +00:00
Sam Parker	0a8cae10fe	[ReachingDefs] Make isSafeToMove more strict. Test that we're not moving the instruction through instructions with side-effects. Differential Revision: https://reviews.llvm.org/D74058	2020-02-06 14:06:08 +00:00
Sjoerd Meijer	01022af5d5	[ARM][MVE] LowOverheadLoops: DCE on the iteration count setup expression Once we have created a tail-predicated hardware-loop, and thus know the number of elements that are processed, we want to clean-up the iteration count expression of that loop. In D73682, we bailed the analysis on conditionally executed instructions. This adds support for IT-blocks, so that we can handle these cases again. The restriction is that we only support IT blocks containing 1 statement, but that seems to cover most cases and forms of the iteration count expression. Differential Revision: https://reviews.llvm.org/D73947	2020-02-05 15:15:46 +00:00
Sam Parker	564275289d	[ARM][LowOverheadLoops] Fix loop count chain Checking that the use-def chain that performs the loop count isSafeToRemove is not sufficient because it means that we can remove register copies that we need to restore lr to its correct value. This change now prevents the transform from kicking in for the 'remove-elem-moves' test which needs to be addressed later on. Differential Revision: https://reviews.llvm.org/D74037	2020-02-05 13:21:51 +00:00
Sam Parker	4c7f819204	[ARM][LowOverheadLoops] Ensure memory predication While validating each MVE instruction, check that all instructions that touch memory are somehow predicated upon the VCTP. Differential Revision: https://reviews.llvm.org/D73616	2020-02-05 13:19:08 +00:00
Sam Parker	06e12893ff	[ARM][LowOverheadLoops] Skip debug values While iterating through the loop, don't inspect any dbg values. Differential Revision: https://reviews.llvm.org/D73688	2020-01-30 11:51:58 +00:00
Sam Parker	6726d67bfd	[ARM][LowOverheadLoops] Check scalar predicates When trying to remove the loop iteration count, check that the instruction will always execute. Differential Revision: https://reviews.llvm.org/D73682	2020-01-30 09:13:04 +00:00
Sam Parker	ac30ea2f87	[RDA][ARM] Move functionality into RDA Add several new helpers to RDA: - hasLocalDefBefore - isRegDefinedAfter - isSafeToDefRegAt And move two bits of logic from ARMLowOverheadLoops into RDA: - isSafeToMove - isSafeToRemove Both of these have some wrappers too to make them more convenient to use. Differential Revision: https://reviews.llvm.org/D73460	2020-01-29 03:27:47 -05:00
Sam Parker	6c2df5d14f	[ARM][LowOverheadLoops] Dont ignore VCTP When expanding the LoopStart, we try to remove the iteration count calculation. However, if part of the calculation was also used to calculate the number of elements we could end up deleting instructions that were required to feed DLSTP/WLSTP. Differential Revision: https://reviews.llvm.org/D73275	2020-01-27 10:59:12 +00:00
Sam Parker	ddbc077895	[NFC][ARM] Make some params members instead. Add MachineLoopInfo and ReachingDefAnalysis as members of LowOverheadLoop instead of passing them several times to different methods.	2020-01-24 10:19:17 +00:00
Sam Parker	42350cd893	[ARM][MVE] Tail Predicate IsSafeToRemove Introduce a method to walk through use-def chains to decide whether it's possible to remove a given instruction and its users. These instructions are then stored in a set until the end of the transform when they're erased. This is now used to perform checks on the iteration count (LoopDec chain), element count (VCTP chain) and the possibly redundant iteration count. As well as being able to remove chains of instructions, we now also check that the sub feeding the vctp is producing the expected value. Differential Revision: https://reviews.llvm.org/D71837	2020-01-17 13:19:14 +00:00
Sam Parker	760b175109	[ARM][LowOverheadLoops] Update liveness info Recommitting `e93e0d413f` after reverting due to test failures, which will hopefully now be fixed. Original commit message: After expanding the pseudo instructions, update the liveness info. We do this in a post-order traversal of the loop, including its exit blocks and preheader(s). Differential Revision: https://reviews.llvm.org/D72131	2020-01-16 15:44:25 +00:00
Sam Parker	e27632c302	[ARM][LowOverheadLoops] Allow all MVE instrs. We have a whitelist of instructions that we allow when tail predicating, since these are trivial ones that we've deemed need no special handling. Now change ARMLowOverheadLoops to allow the non-trivial instructions if they're contained within a valid VPT block. A valid block is one that is predicated upon the VCTP, so we know that these non-trivial instructions will still behave as expected once the implicit predication is used instead. This also fixes a previous test failure. Differential Revision: https://reviews.llvm.org/D72509	2020-01-14 12:03:58 +00:00
Sam Parker	bad6032bc1	[ARM][LowOverheadLoops] Change predicate inspection Use the already provided helper function to get the operand type so that we can detect whether the vpr is being used as a predicate or not. Also use existing helpers to get the predicate indices when we converting the vpt blocks. This enables us to support both types of vpr predicate operand. Differential Revision: https://reviews.llvm.org/D72504	2020-01-14 11:47:34 +00:00
Sam Parker	e73b20c57d	[ARM][MVE] Disallow VPSEL for tail predication Due to the current way that we collect predicated instructions, we can't easily handle vpsel in tail predicated loops. There are a couple of issues: 1) It will use the VPR as a predicate operand, but doesn't have to be inside a VPT block, which means we can assert while building up the VPT block because we don't find another VPST to begin a new one. 2) VPSEL still requires a VPR operand even after tail predicating, which means we can't remove it unless there is another instruction, such as vcmp, that can provide the VPR def. The first issue should be a relatively simple fix in the logic of the LowOverheadLoops pass, whereas the second will require us to represent the 'implicit' tail predication with an explicit value. Differential Revision: https://reviews.llvm.org/D72629	2020-01-14 11:41:17 +00:00
Sjoerd Meijer	add04b9653	ARMLowOverheadLoops: return earlier to avoid printing irrelevant dbg msg. NFC	2020-01-13 10:24:10 +00:00
Sjoerd Meijer	4569f63ae1	ARMLowOverheadLoops: a few more dbg msgs to better trace rejected TP loops. NFC.	2020-01-10 14:11:52 +00:00
Sam Parker	9c91d79dad	[NFC][ARM] LowOverheadLoop comments Add a comment describing the dependencies of the pass.	2020-01-09 12:54:01 +00:00
Sam Parker	1cba261239	Revert "[ARM][LowOverheadLoops] Update liveness info" This reverts commit `e93e0d413f`. There are some ordering problems on some of the buildbots which need investigating.	2020-01-09 09:22:06 +00:00
Sam Parker	e93e0d413f	[ARM][LowOverheadLoops] Update liveness info After expanding the pseudo instructions, update the liveness info. We do this in a post-order traversal of the loop, including its exit blocks and preheader(s). Differential Revision: https://reviews.llvm.org/D72131	2020-01-09 08:33:47 +00:00
Sjoerd Meijer	0efc9e5a8c	[ARM][MVE] More MVETailPredication debug messages. NFC. I've added a few more debug messages to MVETailPredication because I wanted to trace better which instructions are added/removed. And while I was at it, I factored out one function which I thought was clearer, and have added some comments to describe better the flow between MVETailPredication and ARMLowOverheadLoops. Differential Revision: https://reviews.llvm.org/D71549	2020-01-06 09:56:02 +00:00
Sam Parker	8f6a67632a	[ARM][NFC] Move tail predication checks Extract the tail predication validation checks out into their own LowOverHeadLoop method.	2020-01-03 03:50:54 -05:00
Sam Parker	acbc9aed72	[ARM][MVE] Fixes for tail predication. 1) Fix an issue with the incorrect value being used for the number of elements being passed to [d|w]lstp. We were trying to check that the value was available at LoopStart, but this doesn't consider that the last instruction in the block could also define the register. Two helpers have been added to RDA for this. 2) Insert some code to now try to move the element count def or the insertion point so that we can perform more tail predication. 3) Related to (1), the same off-by-one could prevent us from generating a low-overhead loop when a mov lr could have been the last instruction in the block. 4) Fix up some instruction attributes so that not all the low-overhead loop instructions are labelled as branches and terminators - as this is not true for dls/dlstp. Differential Revision: https://reviews.llvm.org/D71609	2019-12-20 09:34:18 +00:00
Sam Parker	4042518335	[ARM][MVE] Tail predicate in the presence of vcmp Record the discovered VPT blocks while checking for validity and, for now, only handle blocks that begin with VPST and not VPT. We're now allowing more than one instruction to define vpr, but each block must somehow be predicated using the vctp. This leaves us with several scenarios which need fixing up: 1) A VPT block which is only predicated by the vctp and has no internal vpr defs. 2) A VPT block which is only predicated by the vctp but has an internal vpr def. 3) A VPT block which is predicated upon the vctp as well as another vpr def. 4) A VPT block which is not predicated upon a vctp, but contains it and all instructions within the block are predicated upon it. The changes needed are, for: 1) The easy one, just remove the vpst and unpredicate the instructions in the block. 2) Remove the vpst and unpredicate the instructions up to the internal vpr def. Need to insert a new vpst to predicate the remaining instructions. 3) Do nothing. 4) The vctp will be inside a vpt and the instruction will be removed, so adjust the size of the mask on the vpst. Differential Revision: https://reviews.llvm.org/D71107	2019-12-20 08:42:11 +00:00
Sjoerd Meijer	049f9672d8	[ARM] Move MVE opcode helper functions to ARMBaseInstrInfo. NFC. In ARMLowOverheadLoops.cpp, MVETailPredication.cpp, and MVEVPTBlock.cpp we have quite a few helper functions all looking at the opcodes of MVE instructions. This moves all these utility functions to ARMBaseInstrInfo. Differential Revision: https://reviews.llvm.org/D71426	2019-12-16 09:13:59 +00:00
Sjoerd Meijer	d97cf1f889	[ARM][LowOverheadLoops] Remove dead loop update instructions. After creating a low-overhead loop, the loop update instruction was still lingering around hurting performance. This removes dead loop update instructions, which in our case are mostly SUBS instructions. To support this, some helper functions were added to MachineLoopUtils and ReachingDefAnalysis to analyse live-ins of loop exit blocks and find uses before a particular loop instruction, respectively. This is a first version that removes a SUBS instruction when there are no other uses inside and outside the loop block, but there are some more interesting cases in test/CodeGen/Thumb2/LowOverheadLoops/mve-tail-data-types.ll which show that there is room for improvement. For example, we can't handle this case yet: .. dlstp.32 lr, r2; .LBB0_1: mov r3, r2; subs r2, #4; vldrh.u32 q2, [r1], #8; vmov q1, q0; vmla.u32 q0, q2, r0; letp lr, .LBB0_1; @ %bb.2: vctp.32 r3 .. which is a lot more tricky because r2 is not only used by the subs, but also by the mov to r3, which is used outside the low-overhead loop by the vctp instruction, and that requires a bit of a different approach, and I will follow up on this. Differential Revision: https://reviews.llvm.org/D71007	2019-12-11 10:20:19 +00:00
Sam Parker	28166816b0	[ARM][ReachingDefs] Remove dead code in loloops. Add some more helper functions to ReachingDefs to query the uses of a given MachineInstr and also to query whether two MachineInstrs use the same def of a register. For Arm, while tail-predicating, these helpers are used in the low-overhead loops to remove the dead code that calculates the number of loop iterations. Differential Revision: https://reviews.llvm.org/D70240	2019-11-26 10:27:46 +00:00
Sam Parker	cced971fd3	[ARM][ReachingDefs] RDA in LoLoops Add several new methods to ReachingDefAnalysis: - getReachingMIDef, instead of returning an integer, return the MachineInstr that produces the def. - getInstFromId, return the MachineInstr to which the given integer corresponds. - hasSameReachingDef, return whether two MachineInstr use the same def of a register. - isRegUsedAfter, return whether a register is used after a given MachineInstr. These methods have been used in ARMLowOverhead to replace searching for uses/defs. Differential Revision: https://reviews.llvm.org/D70009	2019-11-26 10:13:46 +00:00
Sam Parker	8978c12b39	[ARM][MVE] Tail predication conversion This patch modifies ARMLowOverheadLoops to convert a predicated vector low-overhead loop into a tail-predicated one. This is currently a very basic conversion, with the following restrictions: - Operates only on single block loops. - The loop can only contain a single vctp instruction. - No other instructions can write to the vpr. - We only allow a subset of the mve instructions in the loop. TODO: Pass the number of elements, not the number of iterations to dlstp/wlstp. Differential Revision: https://reviews.llvm.org/D69945	2019-11-19 08:22:18 +00:00
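
The offset arithmetic described in 892af45c86 ([ARM] Distribute MVE post-increments) can be illustrated with a small model. The following is only a sketch in Python, not the LLVM implementation: the function name `distribute_post_inc`, the `(mnemonic, immediate)` tuple encoding, and the assumption that every access uses the same base register are all hypothetical and chosen for illustration. It folds a single ADD of the base into the zero-offset access and rebases the accesses that sit between the two.

```python
from typing import List, Tuple

# Hypothetical encoding: (mnemonic, immediate). All ops are assumed to use
# one shared base register, which the real pass would have to verify.
Op = Tuple[str, int]


def distribute_post_inc(ops: List[Op]) -> List[Op]:
    """Fold a single ADD of the base into the zero-offset LDR (toy model)."""
    adds = [i for i, (name, _) in enumerate(ops) if name == "ADD"]
    zeros = [i for i, (name, imm) in enumerate(ops)
             if name == "LDR" and imm == 0]
    # Bail out unless there is exactly one increment and exactly one
    # zero-offset access that comes before it.
    if len(adds) != 1 or len(zeros) != 1 or zeros[0] > adds[0]:
        return list(ops)

    add_idx, zero_idx = adds[0], zeros[0]
    inc = ops[add_idx][1]
    out: List[Op] = []
    for i, (name, imm) in enumerate(ops):
        if i == add_idx:
            continue  # the increment is now performed by the post-inc access
        if i == zero_idx:
            out.append(("LDR_POSTINC", inc))
        elif zero_idx < i < add_idx:
            # The base has already advanced by `inc`, so rebase the immediate.
            out.append((name, imm - inc))
        else:
            # Before the post-inc, or after the original ADD: the base these
            # accesses see is the same as before the rewrite.
            out.append((name, imm))
    return out


example = [("LDR", 4), ("LDR", 0), ("LDR", 8), ("LDR", 12), ("ADD", 16)]
result = distribute_post_inc(example)
```

With the sequence from the commit message as input, `result` is `[("LDR", 4), ("LDR_POSTINC", 16), ("LDR", -8), ("LDR", -4)]`. The model leaves out the legality checks that the commit message mentions only as "some checks".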


63 Commits