Commit Graph

82 Commits

Author SHA1 Message Date
Sam Parker dfa2c14b8f [ARM][LowOverheadLoops] Use iterator for InsertPt.
Use a MachineBasicBlock::iterator instead of a MachineInstr* for the
position of our LoopStart instruction. NFCish, as it change debug
info.
2020-10-01 08:32:35 +01:00
Sam Parker 779a8a028f [ARM][LowOverheadLoops] TryRemove helper.
Make a helper function that wraps around RDA::isSafeToRemove and
utilises the existing DCE IT block checks.
2020-09-30 09:37:24 +01:00
Sam Parker 195c22f273 [ARM] Change VPT state assertion
Just because we haven't encountered an instruction setting the VPR,
it doesn't mean we can't create a VPT block - the VPR maybe a
live-in.

Differential Revision: https://reviews.llvm.org/D88224
2020-09-30 08:01:10 +01:00
Sam Parker 4c19b89b25 [NFC][ARM] Comments and lambdas
Add some comments in LowOverheadLoops and make some lambda variables
explicit arguments instead of capturing.
2020-09-29 08:41:53 +01:00
Sam Parker e82a0084d3 [ARM][LowOverheadLoops] Cleanup and re-arrange
Rename and reorganise how we decide where to put the LoopStart
instruction.
2020-09-28 16:06:30 +01:00
Sam Parker 3d1d089155 [NFC][ARM] Factor out some logic for LoLoops.
Create a DCE function that accepts an instruction.
2020-09-28 14:51:52 +01:00
David Green e4b9867cb6 [ARM] Expand cannotInsertWDLSTPBetween to the last instruction
9d9a11c7be added this check for predicatable instructions between the
D/WLSTP and the loop's start, but it was missing the last instruction in
the block. Change it to use some iterators instead.

Differential Revision: https://reviews.llvm.org/D88354
2020-09-28 09:14:40 +01:00
Sam Parker a399d1880b [ARM] Find VPT implicitly predicated by VCTP
On failing to find a VCTP in the list of instructions that explicitly
predicate the entry of a VPT block, inspect whether the block is
controlled via VPT which is implicitly predicated due to it's
predicated operand(s).

Differential Revision: https://reviews.llvm.org/D87819
2020-09-25 08:50:53 +01:00
Sam Parker 00ee52ae04 [NFC][ARM] Remove dead loop.
Remove a loop that just calculated a couple of values that were now
longer needed.
2020-09-24 15:37:26 +01:00
Sjoerd Meijer 2fc690ac90 [ARM] LowoverheadLoops: add an option to disable tail-predication
This might be useful for testing. We already have an option -tail-predication
but that controls the MVETailPredication pass.  This
-arm-loloops-disable-tail-pred is just for disabling it in the LowoverheadLoops
pass.

Differential Revision: https://reviews.llvm.org/D88212
2020-09-24 13:30:48 +01:00
Sam Parker 9d9a11c7be [ARM] Check for LSTP side-effects.
If the LSTP instruction is inserted with an element count low enough
to immediately predicate some lanes as false, this can have some
unintended effects on any proceeding MVE instructions in the
preheader.

Differential Revision: https://reviews.llvm.org/D88209
2020-09-24 13:28:35 +01:00
Stefanos Baziotis 89c1e35f3c [LoopInfo] empty() -> isInnermost(), add isOutermost()
Differential Revision: https://reviews.llvm.org/D82895
2020-09-22 23:28:51 +03:00
Sam Parker 94c799fecf [ARM] Trying to fix asan buildbot 2020-09-22 13:43:23 +01:00
Sam Parker b4fa884a73 [ARM] Improve VPT predicate tracking
The VPTBlock has been modified to track the 'global' state of the
VPR, as well as the state for each block. Each object now just holds
a list of instructions that makeup the block, while static structures
hold the predicate information. This enables global access for
querying how both a VPT block and individual instructions are
predicated. These changes now allow us, again, to handle more
complicated cases where multiple instructions build a predicate
and/or where the same predicate in used in multiple blocks.

It doesn't, however, get us back to before the tracking was 'fixed'
as some extra logic will be required to properly handle VPT
instructions. Currently a VPT could be effectively predicated because
of it's inputs, but the existing logic will not detect that and so
will refuse to perform the transformation. This can be seen in
remat-vctp.ll test where we still don't perform the transform.

Differential Revision: https://reviews.llvm.org/D87681
2020-09-22 10:40:27 +01:00
Sam Parker a0c1dcc318 [ARM] Remove MVEDomain from VLDR/STR of P0
Remove the domain from the instructions and create a shouldInspect
helper for LowOverheadLoops which queries it or a vpr operand.

Differential Revision: https://reviews.llvm.org/D87900
2020-09-22 09:05:50 +01:00
Sam Parker 3ce9ec0cfa [ARM] Reorder some logic
Re-order some checks in ValidateMVEInst.
2020-09-16 13:39:22 +01:00
Sam Parker a63b2a4614 [ARM] Fix tail predication predicate tracking
Clear the CurrentPredicate when we find an instruction which would
completely overwrite the VPR. This fix essentially means we're back
to not really being able to handle VPT instructions when tail
predicating.

Differential Revision: https://reviews.llvm.org/D87610
2020-09-16 11:59:29 +01:00
Sam Tebbs ef0b9f3307 [ARM][LowOverheadLoops] Combine a VCMP and VPST into a VPT
This patch combines a VCMP followed by a VPST into a VPT, which has the
same semantics as the combination of the former two.
2020-09-16 09:27:10 +01:00
Sam Tebbs b81c57d646 [ARM][LowOverheadLoops] Allow tail predication on predicated instructions with unknown lane
values

The effects of unpredicated vector instruction with unknown
lanes cannot be predicted and therefore cannot be tail predicated. This
does not apply to predicated vector instructions and so this patch
allows tail predication on them.

Differential Revision: https://reviews.llvm.org/D87376
2020-09-10 10:34:32 +01:00
Sam Tebbs 7aabb6ad77 [ARM][LowOverheadLoops] Remove modifications to the correct element
count register

After my patch at D86087, code that now uses the mov operand rather than
the vctp operand will no longer remove modifications to the vctp operand
as they should. This patch fixes that by explicitly removing
modifications to the vctp operand rather than the register used as the
element count.
2020-09-08 10:30:05 +01:00
Sam Parker b30adfb529 [ARM][LowOverheadLoops] Liveouts and reductions
Remove the code that tried to look for reduction patterns, since the
vectorizer and isel can now produce predicated arithmetic instructios
within the loop body. This has required some reorganisation and fixes
around live-out and predication checks, as well as looking for cases
where an input/output is initialised to zero.

Differential Revision: https://reviews.llvm.org/D86613
2020-08-28 13:56:16 +01:00
Fangrui Song c466c5fa7e [ARM] Fix build after D86087 2020-08-18 09:20:32 -07:00
David Green 3471520b1f [ARM] Allow tail predication of VLDn
VLD2/4 instructions cannot be predicated, so we cannot tail predicate
them from autovec. From intrinsics though, they should be valid as they
will just end up loading extra values into off vector lanes, not
effecting the on lanes. The same is true for loads in general where so
long as we are not using the other vector lanes, an unpredicated load
can be converted to a predicated one.

This marks VLD2 and VLD4 instructions as validForTailPredication and
allows any unpredicated load in tail predication loop, which seems to be
valid given the other checks we have.

Differential Revision: https://reviews.llvm.org/D86022
2020-08-18 17:15:45 +01:00
Sam Tebbs 31f02ac60a [ARM] Use mov operand if the mov cannot be moved while tail predicating
There are some cases where the instruction that sets up the iteration
count for a tail predicated loop cannot be moved before the dlstp,
stopping tail predication entirely. This patch checks if the mov operand
can be used and if so, uses that instead.

Differential Revision: https://reviews.llvm.org/D86087
2020-08-18 17:10:29 +01:00
Sam Parker 3ee580d017 [ARM][LowOverheadLoops] Handle reductions
While validating live-out values, record instructions that look like
a reduction. This will comprise of a vector op (for now only vadd),
a vorr (vmov) which store the previous value of vadd and then a vpsel
in the exit block which is predicated upon a vctp. This vctp will
combine the last two iterations using the vmov and vadd into a vector
which can then be consumed by a vaddv.

Once we have determined that it's safe to perform tail-predication,
we need to change this sequence of instructions so that the
predication doesn't produce incorrect code. This involves changing
the register allocation of the vadd so it updates itself and the
predication on the final iteration will not update the falsely
predicated lanes. This mimics what the vmov, vctp and vpsel do and
so we then don't need any of those instructions.

Differential Revision: https://reviews.llvm.org/D75533
2020-07-01 08:31:49 +01:00
Pierre-vh 835251f7d9 [Target][ARM] Make Low Overhead Loops coexist with VPT blocks.
Previously, the LowOverheadLoops pass couldn't handle VPT blocks
with conditions, or with multiple VCTPs. This patch improves the
LowOverheadLoops pass so it can handle those cases.

It also adds support for VCMPs before the VCTP.

Differential Revision: https://reviews.llvm.org/D78206
2020-05-20 12:24:55 +01:00
Pierre-vh 24bf8063d6 [Target][ARM] Replace outdated getARMVPTBlockMask function
getARMVPTBlockMask was an outdated function that only handled basic
block masks: T, TT, TTT and TTTT. This worked fine before the MVE
VPT Block Insertion Pass improvements as it was the only kind of
masks that it could generate, but now it can generate more complex
masks that uses E predicates, so it's dangerous to use that function
to calculate VPT/VPST block masks.

I replaced it with 2 different functions:
  - expandPredBlockMask, in ARMBaseInfo. This adds an "E" or "T" at
    the end of an existing PredBlockMask.
  - recomputeVPTBlockMask, in Thumb2InstrInfo. This takes an iterator
    to a VPT/VPST instruction and recomputes its block mask by looking
    at the predicated instructions that follows it. This should be
    used to recompute a block mask after removing/adding a predicated
    instruction to the block.

The expandPredBlockMask function is pretty much imported from the MVE
VPT Blocks pass.

I had to change the ARMLowOverheadLoops and MVEVPTBlocks passes as well
so they could use these new functions.

Differential Revision: https://reviews.llvm.org/D78201
2020-05-12 12:10:15 +01:00
David Green 892af45c86 [ARM] Distribute MVE post-increments
This adds some extra processing into the Pre-RA ARM load/store optimizer
to detect and merge MVE loads/stores and adds of the same base. This we
don't always turn into a post-inc during ISel, and due to the nature of
it being a graph we don't always know an order to use for the nodes, not
knowing which nodes to make post-inc and which to use the new post-inc
of. After ISel, we have an order that we can use to post-inc the
following instructions.

So this looks for a loads/store with a starting offset of 0, and an
add/sub from the same base, plus a number of other loads/stores. We then
do some checks and convert the zero offset load/store into a postinc
variant. Any loads/stores after it have the offset subtracted from their
immediates.  For example:
  LDR #4           LDR #4
  LDR #0           LDR_POSTINC #16
  LDR #8           LDR #-8
  LDR #12          LDR #-4
  ADD #16
It only handles MVE loads/stores at the moment. Normal loads/store will
be added in a followup patch, they just have some extra details to
ensure that we keep generating LDRD/LDM successfully.

Differential Revision: https://reviews.llvm.org/D77813
2020-04-22 14:16:51 +01:00
Pierre-vh dad848280d [Target][ARM] Change VPTMaskValues to the correct encoding
VPTMaskValue was using the "instruction" encoding to represent the masks
(= the same encoding as the one used by the instructions in an object file),
but it is only used to build MCOperands, so it should use the MCOperand
encoding of the masks, which is slightly different.

Differential Revision: https://reviews.llvm.org/D76139
2020-04-01 12:34:20 +01:00
Sam Parker 94b195ff12 [ARM][LowOverheadLoops] Add horizontal reduction support
Add a bit more logic into the 'FalseLaneZeros' tracking to enable
horizontal reductions and also make the VADDV variants
validForTailPredication.

Differential Revision: https://reviews.llvm.org/D76708
2020-03-30 09:55:41 +01:00
Sam Parker d7084fa34a [ARM][LowOverheadLoops] DoubleWidthResult instructions canGenerateZeros
Given that some instructions generate wider result elements than
their inputs, flag them as being able to generate non zeros in the
false lanes.

Differential Revision: https://reviews.llvm.org/D76766
2020-03-27 15:26:13 +00:00
Sam Parker 94cacebcca [ARM][LowOverheadLoops] Add checks for narrowing
Modify ValidateLiveOuts to track 'FalseLaneZeros' more precisely,
including checks on specific operations that can generate non-zeros
from zero values, e.g VMVN. We can then check that any instructions
that retain some information in their output register (all narrowing
instructions) that they only use and def registers that always have
zeros in their falsely predicated bytes, whether or not tail
predication happens.

Most of the logic remains the same, just the names of the data
structures and helpers have been renamed to reflect the change in
logic. The key change, apart from the opcode checkers, is that the
FalseZeros set now strictly contains only instructions which will
always generate zeros, and not instructions that could also have
their false bytes masked away later.

Differential Revision: https://reviews.llvm.org/D76235
2020-03-24 08:41:48 +00:00
Sam Parker d941df363d [NFC][ARM] Reorder some logic
Move some logic around in LowOverheadLoop::ValidateLiveOut
2020-03-11 11:40:09 +00:00
Sam Parker ff9ac33e1e [ARM][MVE] Validate tail predication values
Iterate through the loop and check that the observable values
produced are the same whether tail predication happens or not.

We want to find out if the tail-predicated version of this loop will
produce the same values as the loop in its original form. For this to
be true, the newly inserted implicit predication must not change the
the (observable) results.

We're doing this because many instructions in the loop will not be
predicated and so the conversion from VPT predication to tail
predication can result in different values being produced, because of
falsely predicated lanes not being updated in the converted form.

A masked load, whether through VPT or tail predication, will write
zeros to any of the falsely predicated bytes. So, from the loads, we
know that the false lanes are zeroed and here we're trying to track
that those false lanes remain zero, or where they change, the
differences are masked away by their user(s).

All MVE loads and stores have to be predicated, so we know that any
load operands, or stored results are equivalent already. Other
explicitly predicated instructions will perform the same operation in
the original loop and the tail-predicated form too. Because of this,
we can insert loads, stores and other predicated instructions into
our KnownFalseZeros set and build from there.

Differential Revision: https://reviews.llvm.org/D75452
2020-03-10 09:59:01 +00:00
Sam Parker 5618e9be37 [RDA][ARM] collectKilledOperands across multiple blocks
Use MIOperand in collectLocalKilledOperands to make the search
global, as we already have to search for global uses too. This
allows us to delete more dead code when tail predicating.

Differential Revision: https://reviews.llvm.org/D75167
2020-03-03 15:23:05 +00:00
Sam Parker dfe8f5da4c [ARM][RDA] Allow multiple killed users
In RDA, check against the already decided dead instructions when
looking at users. This allows an instruction to be removed if it
has multiple users, but they're all dead.

This means that IT instructions can be considered killed once all
the itstate using instructions are dead.

Differential Revision: https://reviews.llvm.org/D75245
2020-03-03 15:12:29 +00:00
Sam Parker bf61421a02 [RDA] Track implicit-defs
Ensure that we're recording implicit defs, as well as visiting implicit
uses and implicit defs when we're walking through operands.

Differential Revision: https://reviews.llvm.org/D75185
2020-02-28 11:14:42 +00:00
Sam Parker 1d06e75df2 [ARM][RDA] add getUniqueReachingMIDef
Add getUniqueReachingMIDef to RDA which performs a global search for
a machine instruction that produces a unique definition of a given
register at a given point. Also add two helper functions
(getMIOperand) that wrap around this functionality to get the
incoming definition uses of a given instruction. These now replace
the uses of getReachingMIDef in ARMLowOverheadLoops. getReachingMIDef
has been renamed to getReachingLocalMIDef and has been made private
along with getInstFromId.

Differential Revision: https://reviews.llvm.org/D74605
2020-02-26 11:15:26 +00:00
Sam Parker a67eb221e2 [RDA][ARM][LowOverheadLoops] Iteration count IT blocks
Change the way that we remove the redundant iteration count code in
the presence of IT blocks. collectLocalKilledOperands has been
introduced to scan an instructions operands, collecting the killed
instructions and then visiting them too. This is used to delete the
code in the preheader which calculates the iteration count. We also
track any IT blocks within the preheader and, if we remove all the
instructions from the IT block, we also remove the IT instruction.
isSafeToRemove is used to remove any redundant uses of the iteration
count within the loop body.

Differential Revision: https://reviews.llvm.org/D74975
2020-02-24 13:51:03 +00:00
Sam Parker de3e65e60c [ARM][LowOverheadLoops] Check loop liveouts
Check that no Q-regs are live out of the loop, unless the instruction
within the loop is predicated on the vctp.

Differential Revision: https://reviews.llvm.org/D72713
2020-02-19 12:59:01 +00:00
Sam Parker fd01b2f4a6 [NFC][ARM] Convert some pointers to references. 2020-02-14 08:29:01 +00:00
Sam Parker 0a8cae10fe [ReachingDefs] Make isSafeToMove more strict.
Test that we're not moving the instruction through instructions with
side-effects.

Differential Revision: https://reviews.llvm.org/D74058
2020-02-06 14:06:08 +00:00
Sjoerd Meijer 01022af5d5 [ARM][MVE] LowOverheadLoops: DCE on the iteration count setup expression
Once we have created a tail-predicated hardware-loop, and thus know the number
of elements that are processed, we want to clean-up the iteration count
expression of that loop. In D73682, we bailed the analysis on conditionally
executed instructions. This adds support for IT-blocks, so that we can handle
these cases again. The restriction is that we only support IT blocks containing
1 statement, but that seems to cover most cases and forms of the iteration
count expression.

Differential Revision: https://reviews.llvm.org/D73947
2020-02-05 15:15:46 +00:00
Sam Parker 564275289d [ARM][LowOverheadLoops] Fix loop count chain
Checking that the use-def chain that performs the loop count
isSafeToRemove is not sufficient because it means that we can
remove register copies that we need to restore lr to its correct
value. This change now prevents the transform from kicking in for the
'remove-elem-moves' test which needs to addressed later on.

Differential Revision: https://reviews.llvm.org/D74037
2020-02-05 13:21:51 +00:00
Sam Parker 4c7f819204 [ARM][LowOverheadLoops] Ensure memory predication
While validating each MVE instruction, check that all instructions
that touch memory are somehow predicated upon the VCTP.

Differential Revision: https://reviews.llvm.org/D73616
2020-02-05 13:19:08 +00:00
Sam Parker 06e12893ff [ARM][LowOverheadLoops] Skip debug values
While iterating through the loop, don't inspect any dbg values.

Differential Revision: https://reviews.llvm.org/D73688
2020-01-30 11:51:58 +00:00
Sam Parker 6726d67bfd [ARM][LowOverheadLoops] Check scalar predicates
When trying to remove the loop iteration count, check that the
instruction will always execute.

Differential Revision: https://reviews.llvm.org/D73682
2020-01-30 09:13:04 +00:00
Sam Parker ac30ea2f87 [RDA][ARM] Move functionality into RDA
Add several new helpers to RDA:
- hasLocalDefBefore
- isRegDefinedAfter
- isSafeToDefRegAt

And move two bits of logic from ARMLowOverheadLoops into RDA:
- isSafeToMove
- isSafeToRemove

Both of these have some wrappers too to make them more convienent to
use.

Differential Revision: https://reviews.llvm.org/D73460
2020-01-29 03:27:47 -05:00
Sam Parker 6c2df5d14f [ARM][LowOverheadLoops] Dont ignore VCTP
When expanding the LoopStart, we try to remove the iteration count
calculation. However, if part of the calculation was also used to
calculate the number of elements we could end up deleting
instructions that were required to feed DLSTP/WLSTP.

Differential Revision: https://reviews.llvm.org/D73275
2020-01-27 10:59:12 +00:00
Sam Parker ddbc077895 [NFC][ARM] Make some params members instead.
Add MachineLoopInfo and ReachingDefAnalysis as members of
LowOverheadLoop instead of passing them several times to different
methods.
2020-01-24 10:19:17 +00:00