Commit Graph

769 Commits

Author SHA1 Message Date
David Green 543236232c [ARM] Selection for MVE VMOVN
This adds both VMOVNt and VMOVNb instruction selection from the appropriate
shuffles. We detect shuffle masks of the form:
0, N, 2, N+2, 4, N+4, ...
or
0, N+1, 2, N+3, 4, N+5, ...
ISel will also try the opposite patterns, with inputs reversed. These are
selected to VMOVNt and VMOVNb respectively.
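
As an illustration of the two mask shapes above, here is a minimal standalone
sketch of such a check. The helper name `matchesVMOVNMask`, the use of
`std::vector<int>`, and the convention that negative entries mean "undefined
lane" are assumptions made for this example; this is not the code from the
patch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not the LLVM implementation.
// Returns true if `mask` has the form <0, N+off, 2, N+2+off, 4, N+4+off, ...>
// where N is the number of lanes and `off` is 0 or 1.
// Assumption: negative entries are undefined lanes and match anything.
bool matchesVMOVNMask(const std::vector<int> &mask, int off) {
  const std::size_t n = mask.size();
  if (n == 0 || n % 2 != 0)
    return false;
  for (std::size_t i = 0; i < n; i += 2) {
    // Even positions must keep the lane from the first input.
    if (mask[i] >= 0 && mask[i] != static_cast<int>(i))
      return false;
    // Odd positions must pick lane i (off == 0) or i + 1 (off == 1)
    // of the second input, whose lanes are numbered from N.
    if (mask[i + 1] >= 0 && mask[i + 1] != static_cast<int>(n + i) + off)
      return false;
  }
  return true;
}
```

With eight lanes, `{0, 8, 2, 10, 4, 12, 6, 14}` matches with `off == 0` and
`{0, 9, 2, 11, 4, 13, 6, 15}` matches with `off == 1`.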

Differential Revision: https://reviews.llvm.org/D68283

llvm-svn: 374781
2019-10-14 15:19:33 +00:00
David Green a5ef3daf1d [ARM] Add some VMOVN tests. NFC
llvm-svn: 374777
2019-10-14 14:29:26 +00:00
David Green 8628bb0491 [ARM] VQSUB instruction
Same as VQADD, VQSUB can be selected from llvm.ssub.sat intrinsics.

Differential Revision: https://reviews.llvm.org/D68567

llvm-svn: 374377
2019-10-10 16:34:30 +00:00
David Green 39596ec2fe [ARM] VQADD instructions
This selects MVE VQADD from the vector llvm.sadd.sat or llvm.uadd.sat
intrinsics.
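
For reference, the per-lane behaviour of saturating addition can be sketched
in scalar code as below. The function names and the choice of 8-bit lanes are
assumptions for illustration; the vector intrinsics apply the same rule to
every lane.

```cpp
#include <cassert>
#include <cstdint>

// Signed saturating add on one 8-bit lane: clamp to [-128, 127]
// instead of wrapping on overflow.
std::int8_t saddSat8(std::int8_t a, std::int8_t b) {
  int wide = static_cast<int>(a) + static_cast<int>(b);
  if (wide > INT8_MAX)
    return INT8_MAX;
  if (wide < INT8_MIN)
    return INT8_MIN;
  return static_cast<std::int8_t>(wide);
}

// Unsigned saturating add on one 8-bit lane: clamp to 255.
std::uint8_t uaddSat8(std::uint8_t a, std::uint8_t b) {
  unsigned wide = static_cast<unsigned>(a) + static_cast<unsigned>(b);
  return wide > UINT8_MAX ? static_cast<std::uint8_t>(UINT8_MAX)
                          : static_cast<std::uint8_t>(wide);
}
```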

Differential Revision: https://reviews.llvm.org/D68566

llvm-svn: 374336
2019-10-10 13:05:04 +00:00
David Green e2c72929c8 [ARM] Add saturating arithmetic tests for MVE. NFC
llvm-svn: 374159
2019-10-09 12:29:51 +00:00
Kristof Beyls 78bfe3ab94 [ARM] Generate vcmp instead of vcmpe
Based on the discussion in
http://lists.llvm.org/pipermail/llvm-dev/2019-October/135574.html, the
conclusion was reached that the ARM backend should produce vcmp instead
of vcmpe instructions by default, i.e. not produce an Invalid
Operation exception when either argument of a floating point compare
is a quiet NaN.

In the future, after constrained floating point intrinsics for floating
point compare have been introduced, vcmpe instructions probably should
be produced for those intrinsics - depending on the exact semantics
they'll be defined to have.

This patch logically consists of the following parts:
- Revert http://llvm.org/viewvc/llvm-project?rev=294945&view=rev and
  http://llvm.org/viewvc/llvm-project?rev=294968&view=rev, which
  implemented fine-tuning for when to produce vcmpe (i.e. not do it for
  equality comparisons). The complexity introduced by those patches
  isn't needed anymore if we just always produce vcmp instead. Maybe
  these patches need to be reintroduced again once support is needed to
  map potential LLVM-IR constrained floating point compare intrinsics to
  the ARM instruction set.
- Simply select vcmp, instead of vcmpe, see simple changes in
  lib/Target/ARM/ARMInstrVFP.td
- Adapt lots of tests that tested for vcmpe (instead of vcmp). For all
  of these tests, the intent of what is tested for isn't related to
  whether the vcmp should produce an Invalid Operation exception or not.

Fixes PR43374.

Differential Revision: https://reviews.llvm.org/D68463

llvm-svn: 374025
2019-10-08 08:25:42 +00:00
Tim Northover a7d90af1be ARM-Darwin: keep the frame register reserved even if not updated.
Darwin platforms need the frame register to always point at a valid record even
if it's not updated in a leaf function. Backtraces are more important than one
extra GPR.

llvm-svn: 373738
2019-10-04 12:29:32 +00:00
David Green c9b5ab8b1c [ARM] Identity shuffles are legal
Identity shuffles, of the form (0, 1, 2, 3, ...) are perfectly OK under MVE
(they essentially just become bitcasts). We were not catching that in the
existing set of what we considered legal though. On NEON, they would be covered
by vext's, but that is not generally available in MVE.

This uses ShuffleVectorInst::isIdentityMask which is a little odd to use here
but does what we want and prevents us from just rewriting what is the same
function.
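
A simplified sketch of what an identity-mask check amounts to is shown below.
The name `isIdentityMaskSketch` is hypothetical, and treating negative entries
as undefined lanes is an assumption; the real
ShuffleVectorInst::isIdentityMask handles more cases than this.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified check: every defined lane i selects element i
// of the first input. Negative entries are assumed to be undefined lanes.
bool isIdentityMaskSketch(const std::vector<int> &mask) {
  for (std::size_t i = 0; i < mask.size(); ++i)
    if (mask[i] >= 0 && mask[i] != static_cast<int>(i))
      return false;
  return true;
}
```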

Differential Revision: https://reviews.llvm.org/D68241

llvm-svn: 373446
2019-10-02 11:40:51 +00:00
David Green a3ebcfe5a6 [ARM] Some MVE shuffle plus extend tests. NFC
llvm-svn: 373368
2019-10-01 18:04:02 +00:00
Sam Parker ef7990a88a [NFC][ARM][MVE] More tests
Add some tail predication tests with fast math.

llvm-svn: 373331
2019-10-01 13:02:14 +00:00
Sam Parker e3b4f0ec25 [NFC][ARM][MVE] More tests
Add some loop tests that cover different float operations and types.

llvm-svn: 373192
2019-09-30 08:49:42 +00:00
Sam Parker aac03ae06a [ARM][MVE] Change VCTP operand
The VCTP instruction will calculate the predicate mask based upon
the number of elements that need to be processed. I had inserted the
sub before the vctp intrinsic and supplied it as the operand, but
this is incorrect as the phi should directly feed the vctp. The sub
is calculating the value for the next iteration.
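
The ordering described above can be modelled in scalar code as follows. This
is only a sketch: `vctp32` here is a plain C++ stand-in for the instruction
(four 32-bit lanes, lane i active when i is below the remaining element
count), and `incrementAll` is a hypothetical loop body chosen for the example.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

constexpr int kLanes = 4; // assumed: four 32-bit lanes per vector

// Scalar model of vctp32: lane i is active when i < remaining.
std::array<bool, kLanes> vctp32(int remaining) {
  std::array<bool, kLanes> pred{};
  for (int i = 0; i < kLanes; ++i)
    pred[i] = i < remaining;
  return pred;
}

// Adds 1 to every element using a tail-predicated loop shape.
std::vector<int> incrementAll(std::vector<int> data) {
  int remaining = static_cast<int>(data.size()); // loop-carried value (the phi)
  std::size_t base = 0;
  while (remaining > 0) {
    // The predicate is computed from the phi directly.
    const std::array<bool, kLanes> pred = vctp32(remaining);
    for (int i = 0; i < kLanes; ++i)
      if (pred[i])
        data[base + i] += 1;
    // The sub only produces the value for the next iteration.
    remaining -= kLanes;
    base += kLanes;
  }
  return data;
}
```

Feeding the already-decremented count into `vctp32` would disable lanes one
iteration too early, which is the bug the commit message describes.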

Differential Revision: https://reviews.llvm.org/D67921

llvm-svn: 373188
2019-09-30 08:03:23 +00:00
Sam Parker 110607b284 [NFC][ARM] Add some tail-predication tests
Use different data types for some simple loops.

llvm-svn: 373064
2019-09-27 10:33:53 +00:00
David Green 10d10102a4 [ARM] Ensure we do not attempt to create lsll #0
During legalisation we can end up with some pretty strange nodes, like shifts
of 0. We need to make sure we don't try to make long shifts of these, ending up
with invalid assembly instructions. A long shift with a zero immediate actually
encodes a shift by 32.

Differential Revision: https://reviews.llvm.org/D67664

llvm-svn: 372839
2019-09-25 10:16:48 +00:00
David Green 2fb41fc70c [ARM] Split large widening MVE loads
Similar to rL372717, we can force the splitting of extends of vector loads in
MVE, in order to use the better widening loads as opposed to going through
expensive extends. This adds a combine to early-on detect extends of loads and
split the load in two, from where normal legalisation will kick in and we get a
series of widening loads.

Differential Revision: https://reviews.llvm.org/D67909

llvm-svn: 372721
2019-09-24 10:53:09 +00:00
David Green 2462d421ee [ARM] MVE sext and widen/narrow tests from larger types. NFC
llvm-svn: 372719
2019-09-24 10:39:58 +00:00
David Green 49d851f403 [ARM] Split large truncating MVE stores
MVE does not have a simple sign extend instruction that can move elements
across lanes. We currently often end up moving each lane into and out of a GPR,
in order to get elements into the correct places. When we have a store of a
trunc (or an extend of a load), we can instead just split the store/load in two,
using the narrowing/widening load/store instructions from each half of the
vector.

This does that for stores. It happens very early in a store combine, so as to
easily detect the truncates. (It would be possible to do this later, but that
would involve looking through a buildvector of extract elements. Not impossible
but this way seemed simpler).

By enabling store combines we also get a vmovdrr combine for free, helping some
other tests.

Differential Revision: https://reviews.llvm.org/D67828

llvm-svn: 372717
2019-09-24 10:10:41 +00:00
Sam Parker 9feb429a33 [ARM][MVE] Remove old tail predicates
Remove any predicate that we replace with a vctp intrinsic, and try
to remove their operands too. Also look into the exit block to see if
there are any duplicates of the predicates that we've replaced and
clone the vctp to be used there instead.

Differential Revision: https://reviews.llvm.org/D67709

llvm-svn: 372567
2019-09-23 09:48:25 +00:00
Sam Parker 4ba6d0ded2 [ARM][LowOverheadLoops] Use subs during revert.
Check whether there are any uses or defs between the LoopDec and
LoopEnd. If there are none, then we can use a subs to set the cpsr and
skip generating a cmp.

Differential Revision: https://reviews.llvm.org/D67801

llvm-svn: 372560
2019-09-23 08:57:50 +00:00
Sam Parker 566127e376 [ARM][LowOverheadLoops] Use tBcc when reverting
Check the branch target ranges and use a tBcc instead of t2Bcc when
we can.

Differential Revision: https://reviews.llvm.org/D67796

llvm-svn: 372557
2019-09-23 08:35:31 +00:00
Oliver Cruickshank c84722ff27 [ARM] Fix CTTZ not generating correct instructions MVE
CTTZ intrinsic should have been set to Custom, not Expand

llvm-svn: 372401
2019-09-20 15:03:44 +00:00
David Green 0cfb78e52a [ARM] MVE i1 splat
We needn't BFI each lane individually into a predicate register when each lane
is the same. A simple sign extend and a vmsr will do.

Differential Revision: https://reviews.llvm.org/D67653

llvm-svn: 372313
2019-09-19 12:17:41 +00:00
Sam Parker 56aa691c41 [ARM] Fix for buildbots
I had missed that massive.mir also needed updating.

llvm-svn: 372303
2019-09-19 06:50:19 +00:00
David Green 91724b8530 [ARM] Add a SelectTAddrModeImm7 for MVE narrow loads and stores
We were previously using the SelectT2AddrModeImm7 for both normal and narrowing
MVE loads/stores. As the narrowing instructions do not accept sp as a register,
it makes little sense to optimise a FrameIndex into the load, only to have to
recover that later on. This adds a SelectTAddrModeImm7 which does not do that
folding, and uses it for narrowing load/store patterns.

Differential Revision: https://reviews.llvm.org/D67489

llvm-svn: 372134
2019-09-17 15:32:28 +00:00
David Green 22a2209433 [ARM] Reserve an emergency spill slot for fp16 addressing modes that need it
Similar to D67327, but this time for the FP16 VLDR and VSTR instructions that
use the AddrMode5FP16 addressing mode. We need to reserve an emergency spill
slot for instructions that will be out of range to use sp directly.
AddrMode5FP16 is 8 bits with a scale of 2.

Differential Revision: https://reviews.llvm.org/D67483

llvm-svn: 372132
2019-09-17 15:23:09 +00:00
Sam Parker 1d9ba08543 [ARM] Fix for buildbots
Remove setPreservesCFG from ARMConstantIslandPass and add a couple
of -verify-machine-dom-info instances into the existing codegen
tests.

llvm-svn: 372126
2019-09-17 14:21:36 +00:00
Sam Parker f1d069e54d [ARM] Fix for buildbots
Add -verify-machineinstrs and update the remaining low overhead loop
tests.

llvm-svn: 372121
2019-09-17 13:46:26 +00:00
David Green 1ff9553057 [ARM] Fix for MVE load/store stack accesses
MVE loads and stores have a 7 bit immediate range, scaled by the length of the type. This needs to be taught to the stack estimation code to ensure that an emergency spill slot is reserved in case we run out of registers when materialising stack indices.

Also the narrowing loads/stores can be created with frame indices even though they do not accept SP as a register. We need in those cases to make sure we have an emergency register to use as the frame base, as SP can never be used.
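
To make the range concrete, here is a small sketch of a scaled-immediate
legality check. The helper name `fitsScaledImm` is hypothetical, and the
assumption that the immediate is a magnitude of `bits` bits that can be added
or subtracted (so both signs are allowed) is mine, not stated in the commit.

```cpp
#include <cassert>

// Hypothetical helper: can `offset` (in bytes) be encoded as an immediate of
// `bits` bits scaled by `scale` bytes?
// Assumption: the immediate is a magnitude, so negative offsets are allowed.
bool fitsScaledImm(int offset, int bits, int scale) {
  if (scale <= 0 || bits <= 0 || bits > 16 || offset % scale != 0)
    return false;
  const int maxUnits = (1 << bits) - 1;
  const int units = offset / scale;
  return units >= -maxUnits && units <= maxUnits;
}
```

Under these assumptions a 7 bit immediate scaled by 4 reaches offsets up to
508 bytes, so a stack slot at offset 512 would need its address materialised
in a register.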

Differential Revision: https://reviews.llvm.org/D67327

llvm-svn: 372114
2019-09-17 12:58:51 +00:00
Sam Parker 36c922278e [ARM][LowOverheadLoops] Add LR def safety check
Converting the *LoopStart pseudo instructions into DLS/WLS results in
LR being defined. These instructions were inserted on the assumption
that LR would already contain the loop counter because a mov is
introduced during ISel as the consumers in the loop can only use
LR. That assumption proved wrong!

So perform a safety check, finding an appropriate place to insert the
DLS/WLS instructions or revert if this isn't possible.

Differential Revision: https://reviews.llvm.org/D67539

llvm-svn: 372111
2019-09-17 12:19:32 +00:00
Sam Parker 95b28a4c72 [ARM] LE support in ConstantIslands
The low-overhead branch extension provides a loop-end 'LE' instruction
that performs neither a decrement nor a compare; it just jumps backwards. This
patch modifies the constant islands pass to try to insert LE
instructions in place of a Thumb2 conditional branch, instead of
shrinking it. This only happens if a cmp can be converted to a cbz/cbnz
and used to exit the loop.

Differential Revision: https://reviews.llvm.org/D67404

llvm-svn: 372085
2019-09-17 09:08:05 +00:00
David Green 8d21460dc5 [ARM] A predicate cast of a predicate cast is a predicate cast
This adds some very basic folding of PREDICATE_CASTS, removing cases when they
are chained together. These would already be removed eventually, as these are
lowered to copies. This just allows it to happen earlier, which can help other
simplifications.

Differential Revision: https://reviews.llvm.org/D67591

llvm-svn: 372012
2019-09-16 17:29:07 +00:00
Oliver Cruickshank ee6fbebbaf [ARM] Add patterns for BSWAP intrinsic on MVE
BSWAP can use the VREV instruction on MVE to produce better results than
expanding.

llvm-svn: 372002
2019-09-16 15:20:10 +00:00
Oliver Cruickshank e9510a6cad [ARM] Add patterns for bitreverse intrinsic on MVE
BITREVERSE can use the VBRSR which will reverse and right shift.
Shifting right by 0 will just reverse the bits.

llvm-svn: 372001
2019-09-16 15:20:03 +00:00
Oliver Cruickshank 5f799ef162 [ARM] Lower CTTZ on MVE
Lower CTTZ on MVE using VBRSR and VCLZ, which will reverse the bits and
count the leading zeros, equivalent to a count trailing zeros (CTTZ).
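
The identity this lowering relies on, that counting leading zeros of the
bit-reversed value gives the trailing zero count, can be checked with a small
scalar sketch. The function names are chosen for this example only.

```cpp
#include <cassert>
#include <cstdint>

// Reverse the order of the 32 bits in x.
std::uint32_t bitReverse32(std::uint32_t x) {
  std::uint32_t r = 0;
  for (int i = 0; i < 32; ++i) {
    r = (r << 1) | (x & 1u);
    x >>= 1;
  }
  return r;
}

// Count zero bits above the most significant set bit (32 for x == 0).
int countLeadingZeros32(std::uint32_t x) {
  int n = 0;
  for (std::uint32_t bit = 0x80000000u; bit != 0 && (x & bit) == 0; bit >>= 1)
    ++n;
  return n;
}

// Trailing zeros expressed as reverse followed by leading-zero count.
int countTrailingZeros32(std::uint32_t x) {
  return countLeadingZeros32(bitReverse32(x));
}
```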

llvm-svn: 372000
2019-09-16 15:19:56 +00:00
Oliver Cruickshank cd1a0b9271 [ARM] Add patterns for CTLZ on MVE
CTLZ intrinsic can use the VCLZ instruction on MVE, which produces
better results than expanding.

llvm-svn: 371999
2019-09-16 15:19:49 +00:00
David Green ce7328cb61 [ARM] Fold VCMP into VPT
MVE has VPT instructions, which perform the duties of both a VCMP and a VPST in
a single instruction, performing the compare and starting the VPT block in one.
This teaches the MVEVPTBlockPass to fold them, searching back through the
basicblock for a valid VCMP and creating the VPT from its operands.

There are some changes to the VPT instructions to accommodate this, altering
the order of the operands to match the VCMP better, and changing P0 register
defs to be VPR defs, as is used in other places.

Differential Revision: https://reviews.llvm.org/D66577

llvm-svn: 371982
2019-09-16 13:02:41 +00:00
David Green b325c05732 [ARM] Masked loads and stores
Masked loads and stores fit naturally with MVE, the instructions being easily
predicated. This adds lowering for the simple cases of masked loads and stores.
It does not yet deal with widening/narrowing or pre/post inc, and so is
currently behind an option.

The llvm masked load intrinsic will accept a "passthru" value, dictating the
values used for the zero masked lanes. In MVE the instructions write 0 to the
zero predicated lanes, so we need to match a passthru that isn't 0 (or undef)
with a select instruction to pull in the correct data after the load.
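
The passthru handling can be modelled lane by lane as in the sketch below.
The four-lane types and the names `predicatedLoadZeroing`, `selectLanes` and
`maskedLoad` are assumptions for illustration, not names from the patch.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

using Vec4 = std::array<int, 4>;
using Pred4 = std::array<bool, 4>;

// Model of a predicated load that writes 0 to inactive lanes.
Vec4 predicatedLoadZeroing(const Vec4 &mem, const Pred4 &pred) {
  Vec4 out{};
  for (std::size_t i = 0; i < out.size(); ++i)
    out[i] = pred[i] ? mem[i] : 0;
  return out;
}

// Lane-wise select: take a[i] where pred[i] is set, otherwise b[i].
Vec4 selectLanes(const Pred4 &pred, const Vec4 &a, const Vec4 &b) {
  Vec4 out{};
  for (std::size_t i = 0; i < out.size(); ++i)
    out[i] = pred[i] ? a[i] : b[i];
  return out;
}

// Masked load with an arbitrary passthru: zeroing load, then a select
// that pulls the passthru value into the inactive lanes.
Vec4 maskedLoad(const Vec4 &mem, const Pred4 &mask, const Vec4 &passthru) {
  return selectLanes(mask, predicatedLoadZeroing(mem, mask), passthru);
}
```

When the passthru is all zeros the select is redundant, which is why only a
non-zero (and non-undef) passthru needs the extra instruction.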

Differential Revision: https://reviews.llvm.org/D67186

llvm-svn: 371932
2019-09-15 14:14:47 +00:00
David Green 06b309d527 [ARM] Simplify and update vmla test. NFC
llvm-svn: 371930
2019-09-15 11:53:05 +00:00
Sam Tebbs 1572b68509 [ARM] Add support for MVE vmaxv and vminv
This patch adds vecreduce_smax, vecreduce_umax, vecreduce_smin, vecreduce_umin and selection for vmaxv and vminv.

Differential Revision: https://reviews.llvm.org/D66413

llvm-svn: 371827
2019-09-13 09:11:46 +00:00
Guillaume Chatelet 48904e9452 [Alignment] Use llvm::Align in MachineFunction and TargetLowering - fixes mir parsing
Summary:
This catches malformed mir files which specify alignment as log2 instead of pow2.
See https://reviews.llvm.org/D65945 for reference.

This patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: MatzeB, qcolombet, dschuff, arsenm, sdardis, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, jsji, Petar.Avramovic, asbirlea, s.egerton, pzheng, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67433

llvm-svn: 371608
2019-09-11 11:16:48 +00:00
David Green 2b7089949e [ARM] Fix loads and stores for predicate vectors
These predicate vectors can usually be loaded and stored with a single
instruction, a VSTR_P0. However this instruction will store the entire P0
predicate, 16 bits, zero-extended to 32 bits, with each lane of the
v4i1/v8i1/v16i1 representing 4/2/1 bits.

As far as I understand, when llvm says "store this v4i1", it really does need
to store 4 bits (or 8, that being the size of a byte, with the bottom 4 as the
interesting bits). For example a bitcast from a v8i1 to a i8 is defined as a
store followed by a load, which is how the code is expanded.

So this instead lowers the v4i1/v8i1 load/store through some shuffles to get
the bits into the correct positions. This, as you might imagine, is not as
efficient as a single instruction. But I believe it is needed for correctness.
v16i1 equally should not load/store 32bits, only storing the 16bits of data.
Stack loads/stores are still using the VSTR_P0 (as can be seen by the test not
changing). This is fine as they are self-consistent, it is only "externally
observable loads/stores" (from our point of view) that need to be corrected.
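
As a rough model of the layout difference, the sketch below compresses a
16-bit predicate into one bit per lane. The helper name `compressPredicate`
and the choice to sample the lowest bit of each lane group are assumptions for
illustration; the actual lowering moves the bits with shuffles.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model: P0 holds 16 bits, and each lane of a v4i1/v8i1/v16i1
// occupies 4/2/1 of them. Produce one bit per lane, lane 0 in bit 0.
// Assumption: the lowest bit of each group represents the lane.
std::uint16_t compressPredicate(std::uint16_t p0, int lanes) {
  const int bitsPerLane = 16 / lanes; // lanes is expected to be 4, 8 or 16
  std::uint16_t out = 0;
  for (int i = 0; i < lanes; ++i)
    if ((p0 >> (i * bitsPerLane)) & 1u)
      out = static_cast<std::uint16_t>(out | (1u << i));
  return out;
}
```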

Differential revision: https://reviews.llvm.org/D67085

llvm-svn: 371419
2019-09-09 16:35:49 +00:00
Sam Parker 1ad508e8e2 [ARM][MVE] VCTP instruction selection
Add codegen support for vctp{8,16,32}.

Differential Revision: https://reviews.llvm.org/D67344

llvm-svn: 371395
2019-09-09 12:54:47 +00:00
Oliver Cruickshank a050307c05 [ARM] Add patterns for VSUB with q and r registers
Added patterns for VSUB to support q and r registers, which reduces
pressure on q registers.

llvm-svn: 371231
2019-09-06 17:02:42 +00:00
Oliver Cruickshank 3aed95af4e [ARM] Add patterns for VADD with q and r registers
Added support for VADD to use q and r registers, which reduces pressure
on q registers.

llvm-svn: 371230
2019-09-06 17:02:35 +00:00
Oliver Cruickshank 9bf27928e1 [ARM] Add patterns for VMUL with q and r registers
Added support for VMUL to use an r register, which reduces pressure on
the q registers.

llvm-svn: 371229
2019-09-06 17:02:21 +00:00
Sam Tebbs f1cdd95a2f [ARM] Sink add/mul(shufflevector(insertelement())) for MVE instruction selection
This patch sinks add/mul(shufflevector(insertelement())) into the basic block in which they are used so that they can then be selected together.

This is useful for various MVE instructions, such as vmla and others that take R registers.

Loop tests have been added to the vmla test file to make sure vmlas are generated in loops.

Differential revision: https://reviews.llvm.org/D66295

llvm-svn: 371218
2019-09-06 16:01:32 +00:00
Sam Parker 312409e464 [ARM] MVE Tail Predication
The MVE and LOB extensions of Armv8.1-M can be combined to enable
'tail predication' which removes the need for a scalar remainder
loop after vectorization. Lane predication is performed implicitly
via a system register. The effects of predication are described in
Section B5.6.3 of the Armv8.1-M Architecture Reference Manual, the key
points being:
- For vector operations that perform reduction across the vector and
  produce a scalar result, whether the value is accumulated or not.
- For non-load instructions, the predicate flags determine if the
  destination register byte is updated with the new value or if the
  previous value is preserved.
- For vector store instructions, whether the store occurs or not.
- For vector load instructions, whether the value is loaded or
  whether zeros are written to that element of the destination
  register.

This patch implements a pass that takes a hardware loop, containing
masked vector instructions, and converts it into something that resembles
an MVE tail predicated loop. Currently, if we had code generation,
we'd generate a loop in which the VCTP would generate the predicate
and VPST would then set up the value of VPR.P0. The loads and stores
would be placed in VPT blocks so this is not tail predication, but
normal VPT predication with the predicate based upon an element
counting induction variable. Further work needs to be done to finally
produce a true tail predicated loop.

Because only the loads and stores are predicated, in both the LLVM IR
and MIR level, we will restrict support to only lane-wise operations
(no horizontal reductions). We will perform a final check on MIR
during loop finalisation too.

Another restriction, specific to MVE, is that all the vector
instructions need to operate on the same number of elements. This is
because predication is performed at the byte level and this is set
on entry to the loop, or by the VCTP instead.

Differential Revision: https://reviews.llvm.org/D65884

llvm-svn: 371179
2019-09-06 08:24:41 +00:00
David Green 83a3341246 [ARM] Fixup the creation of VPT blocks
This attempts to just fix the creation of VPT blocks, fixing up the iterating,
which instructions are considered in the bundle, and making sure that we do not
overrun the end of the block.

Differential Revision: https://reviews.llvm.org/D67219

llvm-svn: 371064
2019-09-05 13:37:04 +00:00
David Green 2f3574c168 [ARM] Ignore Implicit CPSR regs when lowering from Machine to MC operands
The code here seems to date back to r134705, when tablegen lowering was first
being added. I don't believe that we need to include CPSR implicit operands on
the MCInst. This now works more like other backends (like AArch64), where all
implicit registers are skipped.

This allows the AliasInst for CSEL's to match correctly, as can be seen in the
test changes.

Differential revision: https://reviews.llvm.org/D66703

llvm-svn: 370745
2019-09-03 11:30:54 +00:00
David Green 61973d978b [ARM] Invert CSEL predicates if the opposite is a simpler constant to materialise
This moves ConstantMaterializationCost into ARMBaseInstrInfo so that it can
also be used in ISel Lowering, adding codesize values to the computed costs, to
be able to compare either approximate instruction counts or codesize costs.

It also adds a HasLowerConstantMaterializationCost, which compares the
ConstantMaterializationCost of two values, returning true if the first is
smaller either in instruction count/codesize, or falling back to the other in
the case that they are equal.
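
The comparison with its tie-break can be sketched as below. The struct
`MaterializationCost`, the function `hasLowerCost`, and the `forCodeSize` flag
are hypothetical names for this example; how the two costs are computed for a
given constant is not modelled here.

```cpp
#include <cassert>

// Hypothetical pair of costs for materialising one constant.
struct MaterializationCost {
  unsigned instructions; // approximate instruction count
  unsigned codeSize;     // approximate size in bytes
};

// Returns true if `a` is cheaper than `b`. The primary metric is chosen by
// `forCodeSize`; when the primary costs are equal, fall back to the other.
bool hasLowerCost(const MaterializationCost &a, const MaterializationCost &b,
                  bool forCodeSize) {
  const unsigned primaryA = forCodeSize ? a.codeSize : a.instructions;
  const unsigned primaryB = forCodeSize ? b.codeSize : b.instructions;
  if (primaryA != primaryB)
    return primaryA < primaryB;
  const unsigned secondaryA = forCodeSize ? a.instructions : a.codeSize;
  const unsigned secondaryB = forCodeSize ? b.instructions : b.codeSize;
  return secondaryA < secondaryB;
}
```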

This is used in constant CSEL lowering to invert the predicate if the opposite
is easier to materialise.

Differential revision: https://reviews.llvm.org/D66701

llvm-svn: 370741
2019-09-03 11:06:24 +00:00