Commit Graph

300 Commits

Author SHA1 Message Date
John Brawn b37d59353f [FPEnv][ARM] Add lowering of STRICT_FSETCC and STRICT_FSETCCS
These can be lowered to code sequences using CMPFP and CMPFPE which then get
selected to VCMP and VCMPE. The implementation isn't fully correct, as the chain
operand isn't handled correctly, but resolving that looks like it would involve
changes around FPSCR-handling instructions and how the FPSCR is modelled.

The fp-intrinsics test was already testing some of this but as the entire test
was being XFAILed it wasn't noticed. Un-XFAIL the test and instead leave the
cases where we aren't generating the right instruction sequences as FIXME.

Differential Revision: https://reviews.llvm.org/D73194
2020-02-03 12:59:12 +00:00
David Green c15a56f61a [ARM] Fill in FP16 FMA patterns
This adds fp16 variants of all the fma patterns in the ARM backend.

Differential Revision: https://reviews.llvm.org/D72138
2020-01-05 11:24:04 +00:00
Momchil Velikov 8e8e3181aa [ARM] Fix in ICE when retrieving the number of micro-ops for vlldm/vlstm
The big switch in `ARMBaseInstrInfo::getNumMicroOps` is missing cases for
`VLLDM` and `VLSTM`, which are currently defined with itineraries having a
dynamic count of micro-ops.

Assuming an optimistic case in which these instruction do not actually perform
loads or stores, and with the idea that Armv8-m cores are supposed to use the
new style scheduling models, this patch just sets the itinerary for those two
instructions to `NoItinerary`.

Differential Revision: https://reviews.llvm.org/D71266
2019-12-13 18:19:40 +00:00
Victor Campos e478385e77 [ARM] Fix instruction selection for ARMISD::CMOV with f16 type
Summary:
In the cases where the CMOV (f16) SDNode is used with condition codes
LT, LE, VC or NE, it is successfully selected into a VSEL instruction.

In the remaining cases, however, instruction selection fails since VSEL
does not support other condition codes.

This patch handles such cases by using the single-precision version of
the VMOV instruction.

Reviewers: ostannard, dmgreen

Reviewed By: dmgreen

Subscribers: kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70667
2019-11-29 10:40:37 +00:00
Kristof Beyls 78bfe3ab94 [ARM] Generate vcmp instead of vcmpe
Based on the discussion in
http://lists.llvm.org/pipermail/llvm-dev/2019-October/135574.html, the
conclusion was reached that the ARM backend should produce vcmp instead
of vcmpe instructions by default, i.e. not be producing an Invalid
Operation exception when either arguments in a floating point compare
are quiet NaNs.

In the future, after constrained floating point intrinsics for floating
point compare have been introduced, vcmpe instructions probably should
be produced for those intrinsics - depending on the exact semantics
they'll be defined to have.

This patch logically consists of the following parts:
- Revert http://llvm.org/viewvc/llvm-project?rev=294945&view=rev and
  http://llvm.org/viewvc/llvm-project?rev=294968&view=rev, which
  implemented fine-tuning for when to produce vcmpe (i.e. not do it for
  equality comparisons). The complexity introduced by those patches
  isn't needed anymore if we just always produce vcmp instead. Maybe
  these patches need to be reintroduced again once support is needed to
  map potential LLVM-IR constrained floating point compare intrinsics to
  the ARM instruction set.
- Simply select vcmp, instead of vcmpe, see simple changes in
  lib/Target/ARM/ARMInstrVFP.td
- Adapt lots of tests that tested for vcmpe (instead of vcmp). For all
  of these test, the intent of what is tested for isn't related to
  whether the vcmp should produce an Invalid Operation exception or not.

Fixes PR43374.

Differential Revision: https://reviews.llvm.org/D68463

llvm-svn: 374025
2019-10-08 08:25:42 +00:00
David Green 120a5e9a74 [ARM] Cortex-M4 schedule additions
This is an attempt to fill in some of the missing instructions from the
Cortex-M4 schedule, and make it easier to do the same for other ARM cpus.

- Some instructions are marked as hasNoSchedulingInfo as they are pseudos or
  otherwise do not require scheduling info
- A lot of features have been marked not supported
- Some WriteRes's have been added for cvt instructions.
- Some extra instruction latencies have been added, notably by relaxing the
  regex for dsp instruction to catch more cases, and some fp instructions.

This goes a long way to get the CompleteModel working for this CPU. It does not
go far enough as to get all scheduling info for all output operands correct.

Differential Revision: https://reviews.llvm.org/D67957

llvm-svn: 373163
2019-09-29 08:38:48 +00:00
Alexandros Lamprineas c006b6f4cb [MC][ARM] vscclrm disassembles as vldmia
Happens only when the mve.fp subtarget feature is enabled:

$ llvm-mc -triple thumbv8.1m.main -mattr=+mve.fp,+8msecext -disassemble <<< "0x9f,0xec,0x08,0x0b"
  .text
  vldmia  pc, {d0, d1, d2, d3}
$ llvm-mc -triple thumbv8.1m.main -mattr=+8msecext -disassemble <<< "0x9f,0xec,0x08,0x0b"
  .text
  vscclrm {d0, d1, d2, d3, vpr}

Assembling returns the correct encoding with or without mve.fp:

$ llvm-mc -triple thumbv8.1m.main -mattr=+mve.fp,+8msecext -show-encoding <<< "vscclrm {d0-d3, vpr}"
  .text
  vscclrm {d0, d1, d2, d3, vpr}   @ encoding: [0x9f,0xec,0x08,0x0b]
$ llvm-mc -triple thumbv8.1m.main -mattr=+8msecext -show-encoding <<< "vscclrm {d0-d3, vpr}"
  .text
  vscclrm {d0, d1, d2, d3, vpr}   @ encoding: [0x9f,0xec,0x08,0x0b]

The problem seems to be in the TableGen description of VSCCLRMD.
The least significant bit should be set to zero.

Differential Revision: https://reviews.llvm.org/D68025

llvm-svn: 373052
2019-09-27 08:22:24 +00:00
Oliver Stannard 3be2df2418 [ARM][MVE] Decoding of VMSR doesn't diagnose some unpredictable encodings
Decoding of VMSR doesn't diagnose some unpredictable encodings, as the unpredictable bits are not correctly set.

Diff-reduce this instruction's internals WRT VMRS so I can see the differences better. Mostly this is s/src/Rt/g.

Fill in the "should-be-(0)" bits.

Designate the Unpredictable{} bits for both VMRS and VMSR.

Patch by Mark Murray!

Differential revision: https://reviews.llvm.org/D66938

llvm-svn: 370729
2019-09-03 09:55:30 +00:00
Kyrylo Tkachov a3e26d1a6c [NFC] Test commit: add full stop at end of comment
llvm-svn: 366195
2019-07-16 09:15:01 +00:00
Mikhail Maltsev ed143c5d59 [ARM] Enable VPUSH/VPOP aliases when either MVE or VFP is present
Summary:
Use the same predicates as VSTMDB/VLDMIA since VPUSH/VPOP alias to
these.

Patch by Momchil Velikov.

Reviewers: ostannard, simon_tatham, SjoerdMeijer, samparker, t.p.northover, dmgreen

Reviewed By: dmgreen

Subscribers: javed.absar, kristof.beyls, hiraditya, dmgreen, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64413

llvm-svn: 365604
2019-07-10 08:59:17 +00:00
Simon Tatham 7b63a9533c [ARM] Stop using scalar FP instructions in integer-only MVE mode.
If you compile with `-mattr=+mve` (enabling integer MVE instructions
but not floating-point ones), then the scalar FP //registers// exist
and it's legal to move things in and out of them, load and store them,
but it's not legal to do arithmetic on them.

In D60708, the calls to `addRegisterClass` in ARMISelLowering that
enable use of the scalar FP registers became conditionalised on
`Subtarget->hasFPRegs()` instead of `Subtarget->hasVFP2Base()`, so
that loads, stores and moves of those registers would work. But I
didn't realise that that would also enable all the operations on those
types by default.

Now, if the target doesn't have basic VFP, we follow up those
`addRegisterClass` calls by turning back off all the nontrivial
operations you can perform on f32 and f64. That causes several
knock-on failures, which are fixed by allowing the `VMOVDcc` and
`VMOVScc` instructions to be selected even if all you have is
`HasFPRegs`, and adjusting several checks for 'is this a double in a
single-precision-only world?' to the more general 'is this any FP type
we can't do arithmetic on?'. Between those, the whole of the
`float-ops.ll` and `fp16-instructions.ll` tests can now run in
MVE-without-FP mode and generate correct-looking code.

One odd side effect is that I had to relax the check lines in that
test so that they permit test functions like `add_f` to be generated
as tailcalls to software FP library functions, instead of ordinary
calls. Doing that is entirely legal, but the mystery is why this is
the first RUN line that's needed the relaxation: on the usual kind of
non-FP target, no tailcalls ever seem to be generated. Going by the
llc messages, I think `SoftenFloatResult` must be perturbing the code
generation in some way, but that's as much as I can guess.

Reviewers: dmgreen, ostannard

Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63938

llvm-svn: 364909
2019-07-02 11:26:00 +00:00
Simon Tatham 4cf18c2849 [ARM] Explicit lowering of half <-> double conversions.
If an FP_EXTEND or FP_ROUND isel dag node converts directly between
f16 and f32 when the target CPU has no instruction to do it in one go,
it has to be done in two steps instead, going via f32.

Previously, this was done implicitly, because all such CPUs had the
storage-only implementation of f16 (i.e. the only thing you can do
with one at all is to convert it to/from f32). So isel would legalize
the f16 into an f32 as soon as it saw it, by inserting an fp16_to_fp
node (or vice versa), and then the fp_extend would already be f32->f64
rather than f16->f64.

But that technique can't support a target CPU which has full f16
support but _not_ f64, such as some variants of Arm v8.1-M. So now we
provide custom lowering for FP_EXTEND and FP_ROUND, which checks
support for f16 and f64 and decides on the best thing to do given the
combination of flags it gets back.

Reviewers: dmgreen, samparker, SjoerdMeijer

Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60692

llvm-svn: 364294
2019-06-25 11:24:50 +00:00
Simon Tatham 8c865cacda [ARM] Add the non-MVE instructions in Arm v8.1-M.
This adds support for the new family of conditional selection /
increment / negation instructions; the low-overhead branch
instructions (e.g. BF, WLS, DLS); the CLRM instruction to zero a whole
list of registers at once; the new VMRS/VMSR and VLDR/VSTR
instructions to get data in and out of 8.1-M system registers,
particularly including the new VPR register used by MVE vector
predication.

To support this, we also add a register name 'zr' (used by the CSEL
family to force one of the inputs to the constant 0), and operand
types for lists of registers that are also allowed to include APSR or
VPR (used by CLRM). The VLDR/VSTR instructions also need a new
addressing mode.

The low-overhead branch instructions exist in their own separate
architecture extension, which we treat as enabled by default, but you
can say -mattr=-lob or equivalent to turn it off.

Reviewers: dmgreen, samparker, SjoerdMeijer, t.p.northover

Reviewed By: samparker

Subscribers: miyuki, javed.absar, kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62667

llvm-svn: 363039
2019-06-11 09:29:18 +00:00
Simon Tatham 67065c5c70 Revert rL362953 and its followup rL362955.
These caused a build failure because I managed not to notice they
depended on a later unpushed commit in my current stack. Sorry about
that.

llvm-svn: 362956
2019-06-10 15:58:19 +00:00
Simon Tatham baeea91933 [ARM] Add the non-MVE instructions in Arm v8.1-M.
This adds support for the new family of conditional selection /
increment / negation instructions; the low-overhead branch
instructions (e.g. BF, WLS, DLS); the CLRM instruction to zero a whole
list of registers at once; the new VMRS/VMSR and VLDR/VSTR
instructions to get data in and out of 8.1-M system registers,
particularly including the new VPR register used by MVE vector
predication.

To support this, we also add a register name 'zr' (used by the CSEL
family to force one of the inputs to the constant 0), and operand
types for lists of registers that are also allowed to include APSR or
VPR (used by CLRM). The VLDR/VSTR instructions also need some new
addressing modes.

The low-overhead branch instructions exist in their own separate
architecture extension, which we treat as enabled by default, but you
can say -mattr=-lob or equivalent to turn it off.

Reviewers: dmgreen, samparker, SjoerdMeijer, t.p.northover

Reviewed By: samparker

Subscribers: miyuki, javed.absar, kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62667

llvm-svn: 362953
2019-06-10 15:36:34 +00:00
Simon Tatham b87669f166 [ARM] Disallow PC, and optionally SP, in VMOVRH and VMOVHR.
Arm v8.1-M supports the VMOV instructions that move a half-precision
value to and from a GPR, but not if the GPR is SP or PC.

To fix this, I've changed those instructions to use the rGPR register
class instead of GPR. rGPR always excludes PC, and it excludes SP
except in the presence of the HasV8Ops target feature (i.e. Arm v8-A).
So the effect is that VMOV.F16 to and from PC is now illegal
everywhere, but VMOV.F16 to and from SP is illegal only on non-v8-A
cores (which I believe is all as it should be).

Reviewers: dmgreen, samparker, SjoerdMeijer, ostannard

Reviewed By: ostannard

Subscribers: ostannard, javed.absar, kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60704

llvm-svn: 362942
2019-06-10 14:43:55 +00:00
Sjoerd Meijer eb072b5a6a [ARM] Change the MC names for VMAXNM/VMINNM
Now the NEON ones have a prefix "NEON_", and the VFP ones have a
prefix "VFP_". This is so that the regex in ARMScheduleA57.td can be
made to match both of _those_ classes of VMAXNM without also matching
the MVE ones that are going to be introduced soon. NFCI.

Patch by Simon Tatham.

Differential Revision: https://reviews.llvm.org/D60700

llvm-svn: 362097
2019-05-30 14:34:29 +00:00
Sjoerd Meijer 7eb95d672d [ARM] Introduce separate features for FP registers
The MVE extension in Arm v8.1-M permits the use of some move, load and
store isntructions which access the FP registers, even if there's no
actual FP support in the processor (in particular, if you have the
integer-only version of MVE).

Therefore, we need separate subtarget features to condition those
instructions on, which are implied by both FP and MVE but are not part
of either.

Patch mostly by Simon Tatham.

Differential Revision: https://reviews.llvm.org/D60694

llvm-svn: 362088
2019-05-30 12:37:05 +00:00
David Green 0dbafe191e [ARM] Select fp16 fma
This adds a pattern for fma, similar to the float and double patterns.

Differential Revision: https://reviews.llvm.org/D62330

llvm-svn: 361719
2019-05-26 11:34:30 +00:00
David Green 21542cd6f4 [ARM] Select a number of fp16 rounding functions
This add patterns for fp16 round and ceil etc. Same as the float and double
patterns.

Differential Revision: https://reviews.llvm.org/D62326

llvm-svn: 361718
2019-05-26 11:13:00 +00:00
David Green 2881325b17 [ARM] Select fp16 fabs
This adds a pattern for the fabs intrinsic, the same as float and double.

Differential Revision: https://reviews.llvm.org/D62324

llvm-svn: 361715
2019-05-26 10:51:58 +00:00
David Green aeade651f3 [ARM] Select fp16 fsqrt
This adds a pattern for the sqrt intrinsic, the same as float and double.

Differential Revision: https://reviews.llvm.org/D62322

llvm-svn: 361714
2019-05-26 10:42:24 +00:00
Diana Picus b6e83b98f9 [ARM GlobalISel] Select G_FCONSTANT for VFP3
Make it possible to TableGen code for FCONSTS and FCONSTD.

We need to make two changes to the TableGen descriptions of vfp_f32imm
and vfp_f64imm respectively:
* add GISelPredicateCode to check that the immediate fits in 8 bits;
* extract the SDNodeXForms into separate definitions and create a
GISDNodeXFormEquiv and a custom renderer function for each of them.

There's a lot of boilerplate to get the actual value of the immediate,
but it basically just boils down to calling ARM_AM::getFP32Imm or
ARM_AM::getFP64Imm.

llvm-svn: 358063
2019-04-10 09:14:32 +00:00
Simon Tatham b70fc0c5fd [ARM] Make fullfp16 instructions not conditionalisable.
More or less all the instructions defined in the v8.2a full-fp16
extension are defined as UNPREDICTABLE if you put them in an IT block
(Thumb) or use with any condition other than AL (ARM). LLVM didn't
know that, and was happy to conditionalise them.

In order to force these instructions to count as not predicable, I had
to make a small Tablegen change. The code generation back end mostly
decides if an instruction was predicable by looking for something it
can identify as a predicate operand; there's an isPredicable bit flag
that overrides that check in the positive direction, but nothing that
overrides it in the negative direction.

(I considered the alternative approach of actually removing the
predicate operand from those instructions, but thought that it would
be more painful overall for instructions differing only in data type
to have different shapes of operand list. This way, the only code that
has to notice the difference is the if-converter.)

So I've added an isUnpredicable bit alongside isPredicable, and set
that bit on the right subset of FP16 instructions, and also on the
VSEL, VMAXNM/VMINNM and VRINT[ANPM] families which should be
unpredicable for all data types.

I've included a couple of representative regression tests, both of
which previously caused an fp16 instruction to be conditionalised in
ARM state and (with -arm-no-restrict-it) to be put in an IT block in
Thumb.

Reviewers: SjoerdMeijer, t.p.northover, efriedma

Reviewed By: efriedma

Subscribers: jdoerfert, javed.absar, kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D57823

llvm-svn: 354768
2019-02-25 10:39:53 +00:00
Chandler Carruth 2946cd7010 Update the file headers across all of the LLVM projects in the monorepo
to reflect the new license.

We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.

Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.

llvm-svn: 351636
2019-01-19 08:50:56 +00:00
Sjoerd Meijer ff3ab33ec8 [ARM][NFCI] Do not fuse VADD and VMUL, continued (1/2)
This is a follow up of rL342874, which stopped fusing muls and adds into VMLAs
for performance reasons on the Cortex-M4 and Cortex-M33.  This is a serie of 2
patches, that is trying to achieve the same for VFMA.  The second column in the
table below shows what we were generating before rL342874, the third column
what changed with rL342874, and the last column what we want to achieve with
these 2 patches:

 --------------------------------------------------------
 | Opt   |  < rL342874   |  >= rL342874   |             |
 |------------------------------------------------------|
 |-O3    |     vmla      |      vmul      |     vmul    |
 |       |               |      vadd      |     vadd    |
 |------------------------------------------------------|
 |-Ofast |     vfma      |      vfma      |     vmul    |
 |       |               |                |     vadd    |
 |------------------------------------------------------|
 |-Oz    |     vmla      |      vmla      |     vmla    |
 --------------------------------------------------------

This patch 1/2, is a cleanup of the spaghetti predicate logic on the different
VMLA and VFMA codegen rules, so that we can make the final functional change in
patch 2/2.  This also fixes a typo in the regression test added in rL342874.

Differential revision: https://reviews.llvm.org/D53314

llvm-svn: 344671
2018-10-17 07:26:35 +00:00
Diogo N. Sampaio 01b916e188 [ARM] Tighten f64<->f16 conversion requirements
Fix missing Requires fields.

Patch by Bernard Ogden (bogden)

Reviewers: SjoerdMeijer, javed.absar, t.p.northover	

Reviewed By: t.p.northover

Differential Revision: https://reviews.llvm.org/D51631

llvm-svn: 342061
2018-09-12 16:24:43 +00:00
Petar Jovanovic c051000b83 [X86][MIPS][ARM] New machine instruction property 'isMoveReg'
This property is needed in order to follow values movement between
registers. This property is used in TII to implement method that
returns true if simple copy like instruction is recognized, along
with source and destination machine operands.

Patch by Nikola Prica.

Differential Revision: https://reviews.llvm.org/D45204

llvm-svn: 333093
2018-05-23 15:28:28 +00:00
Sjoerd Meijer 834f7dc7ab [ARM] FP16 vmaxnm/vminnm scalar instructions
This adds code generation support for the FP16 vmaxnm/vminnm scalar
instructions.

Differential Revision: https://reviews.llvm.org/D44675

llvm-svn: 330034
2018-04-13 15:34:26 +00:00
Ivan A. Kosarev f533a6e5aa [NEON] Support intrinsic for scalar and vector versions of the VRINTN instruction
Differential Revision: https://reviews.llvm.org/D45514

llvm-svn: 330011
2018-04-13 12:45:12 +00:00
Christof Douma a1e77c0e02 [ARM] Support float literals under XO
Follow up patch of r328313 to support the UseVMOVSR constraint. Removed
some unneeded instructions from the test and removed some stray
comments.

Differential Revision: https://reviews.llvm.org/D44941

llvm-svn: 328691
2018-03-28 10:02:26 +00:00
Christof Douma 4a025cc79d [ARM] Support float literals under XO
When targeting execute-only and fp-armv8, float constants in a compare
resulted in instruction selection failures. This is now fixed by using
vmov.f32 where possible, otherwise the floating point constant is
lowered into a integer constant that is moved into a floating point
register.

This patch also restores using fpcmp with immediate 0 under fp-armv8.

Change-Id: Ie87229706f4ed879a0c0cf66631b6047ed6c6443
llvm-svn: 328313
2018-03-23 13:02:03 +00:00
Sjoerd Meijer d391a1a985 [ARM] FP16 codegen support for VSEL
This implements lowering of SELECT_CC for f16s, which enables
codegen of VSEL with f16 types.

Differential Revision: https://reviews.llvm.org/D44518

llvm-svn: 327695
2018-03-16 08:06:25 +00:00
Sjoerd Meijer 9430c8cd1c [ARM] f16 vcmp fixes
This adds f16 VCMP match rules and fixes the test cases.

Differential Revision: https://reviews.llvm.org/D43291

llvm-svn: 325228
2018-02-15 10:33:07 +00:00
Sjoerd Meijer 8c0739347c [ARM] FP16 mov imm pattern
This is a follow up of r324321, adding a match pattern for mov with a FP16
immediate (also fixing operand vfp_f16imm that wasn't even compiling).

Differential Revision: https://reviews.llvm.org/D42973

llvm-svn: 324456
2018-02-07 08:37:17 +00:00
Sjoerd Meijer d2718ba95e [ARM] f16 conversions
This is a follow up of r324321, adding f16 <-> f32 and f16 <-> f64 conversion
match patterns.

Differential Revision: https://reviews.llvm.org/D42954

llvm-svn: 324360
2018-02-06 16:28:43 +00:00
Sjoerd Meijer 89ea2648bb [ARM] Armv8.2-A FP16 code generation (part 3/3)
This adds most of the FP16 codegen support, but these areas need further work:

- FP16 literals and immediates are not properly supported yet (e.g. literal
  pool needs work),
- Instructions that are generated from intrinsics (e.g. vabs) haven't been
  added.

This will be addressed in follow-up patches.

Differential Revision: https://reviews.llvm.org/D42849

llvm-svn: 324321
2018-02-06 08:43:56 +00:00
Sjoerd Meijer 98d5359ea2 [ARM] Armv8.2-A FP16 code generation (part 2/3)
Half-precision arguments and return values are passed as if it were an int or
float for ARM. This results in truncates and bitcasts to/from i16 and f16
values, which are legalized very early to stack stores/loads. When FullFP16 is
enabled, we want to avoid codegen for these bitcasts as it is unnecessary and
inefficient.

Differential Revision: https://reviews.llvm.org/D42580

llvm-svn: 323861
2018-01-31 10:18:29 +00:00
Sjoerd Meijer 3ddb7fb663 [ARM] FP16Pat and FullFP16Pat patterns. NFC.
Create and use FP16Pat FullFP16Pat helper patterns to make the difference
explicit.

Differential Revision: https://reviews.llvm.org/D42634

llvm-svn: 323640
2018-01-29 11:28:06 +00:00
Sjoerd Meijer 011de9c0ca [ARM] Armv8.2-A FP16 code generation (part 1/3)
This is the groundwork for Armv8.2-A FP16 code generation .

Clang passes and returns _Float16 values as floats, together with the required
bitconverts and truncs etc. to implement correct AAPCS behaviour, see D42318.
We will implement half-precision argument passing/returning lowering in the ARM
backend soon, but for now this means that this:

_Float16 sub(_Float16 a, _Float16 b) {
  return a + b;
}

gets lowered to this:

define float @sub(float %a.coerce, float %b.coerce) {
entry:
  %0 = bitcast float %a.coerce to i32
  %tmp.0.extract.trunc = trunc i32 %0 to i16
  %1 = bitcast i16 %tmp.0.extract.trunc to half
  <SNIP>
  %add = fadd half %1, %3
  <SNIP>
}

When FullFP16 is *not* supported, we don't make f16 a legal type, and we get
legalization for "free", i.e. nothing changes and everything works as before.
And also f16 argument passing/returning is handled.

When FullFP16 is supported, we do make f16 a legal type, and have 2 places that
we need to patch up: f16 argument passing and returning, which involves minor
tweaks to avoid unnecessary code generation for some bitcasts.

As a "demonstrator" that this works for the different FP16, FullFP16, softfp
modes, etc., I've added match rules to the VSUB instruction description showing
that we can codegen this instruction from IR, but more importantly, also to
some conversion instructions. These conversions were causing issue before in
the FP16 and FullFP16 cases.

I've also added match rules to the VLDRH and VSTRH desriptions, so that we can
actually compile the entire half-precision sub code example above. This showed
that these loads and stores had the wrong addressing mode specified: AddrMode5
instead of AddrMode5FP16, which turned out not be implemented at all, so that
has also been added.

This is the minimal patch that shows all the different moving parts. In patch
2/3 I will add some efficient lowering of bitcasts, and in 2/3 I will add the
remaining Armv8.2-A FP16 instruction descriptions.


Thanks to Sam Parker and Oliver Stannard for their help and reviews!


Differential Revision: https://reviews.llvm.org/D38315

llvm-svn: 323512
2018-01-26 09:26:40 +00:00
Oliver Stannard 9cb89f6611 [ARM] Remove pre-UAL FLDM/FSTM aliases
These are pre-UAL syntax, and we don't support any other pre-UAL instructions,
with the exception of FLDMX/FSTMX, which don't have a UAL equivalent. Therefore
there's no reason to keep them or their AsmParser hacks around.

With the AsmParser hacks removed, the FLDMX and FSTMX instructions get the same
operand diagnostics as the UAL instructions.

Differential revision: https://reviews.llvm.org/D39196

llvm-svn: 318777
2017-11-21 16:20:25 +00:00
Eli Friedman edee9999c4 Revert r312724 ("[ARM] Remove redundant vcvt patterns.").
It leads to some improvements, but also a regression for the simple
case, so it's not clearly a good idea.

test/CodeGen/ARM/vcvt.ll now has test coverage to show the difference.

Ultimately, the right solution is probably to custom-lower fp-to-int
conversions, to something like ARMISD::VCVT_F32_S32 plus a bitcast.
It's hard to do the right thing when the implicit bitcast isn't visible
to DAG transforms.

llvm-svn: 314169
2017-09-25 22:07:33 +00:00
Andre Vieira 640527f7f1 [ARM] Fix assembly and disassembly for VMRS/VMSR
Reviewed by: t.p.northover
Differential Revision: https://reviews.llvm.org/D36306

llvm-svn: 313979
2017-09-22 12:17:42 +00:00
Simon Pilgrim 2b1c3bb25d [ARM] Add missing selection patterns for vnmla
For the following function:

  double fn1(double d0, double d1, double d2) {
    double a = -d0 - d1 * d2;
    return a;
  }

on ARM, LLVM generates code along the lines of

  vneg.f64  d0, d0
  vmls.f64  d0, d1, d2

i.e., a negate and a multiply-subtract.

The attached patch adds instruction selection patterns to allow it to generate the single instruction

  vnmla.f64  d0, d1, d2

(multiply-add with negation) instead, like GCC does.

Committed on behalf of @gergo- (Gergö Barany)

Differential Revision: https://reviews.llvm.org/D35911

llvm-svn: 313972
2017-09-22 09:50:52 +00:00
Benjamin Kramer 6ef976d5e1 [ARM] Remove redundant vcvt patterns.
These don't add any value as they're just compositions of existing
patterns. However, they can confuse the cost logic in ISel, leading to
duplicated vcvt instructions like in PR33199.

llvm-svn: 312724
2017-09-07 14:52:26 +00:00
Sam Parker 6dc3fcb1c6 [ARM][AArch64] v8.3-A Javascript Conversion
Armv8.3-A adds instructions that convert a double-precision floating
point number to a signed 32-bit integer with round towards zero,
designed for improving Javascript performance.

Differential Revision: https://reviews.llvm.org/D36785

llvm-svn: 311448
2017-08-22 11:08:21 +00:00
Tim Northover f370f2e3c6 Revert "[ARM] Fix assembly and disassembly for VMRS/VMSR"
This reverts r310243. Only MVFR2 is actually restricted to v8 and it'll be a
little while before we can get a proper fix together. Better that we allow
incorrect code than reject correct in the meantime.

llvm-svn: 310384
2017-08-08 17:16:46 +00:00
Andre Vieira 7dffb9bfa6 [ARM] Fix assembly and disassembly for VMRS/VMSR
This patch addresses two issues with assembly and disassembly for VMRS/VMSR:

1.currently VMRS/VMSR instructions accessing fpsid, mvfr{0-2} and fpexc, are
  accepted for non ARMv8-A targets.

2. all VMRS/VMSR instructions accept writing/reading to PC and SP, when only
   ARMv7-A and ARMv8-A should be allowed to write/read to SP and none to PC.

This patch addresses those issues and adds tests for these cases.

Differential Revision: https://reviews.llvm.org/D36306

llvm-svn: 310243
2017-08-07 08:41:05 +00:00
Oliver Stannard 852fbd2fea [ARM] Add scheduling classes for VFNM[AS]
The VFNM[AS] instructions did not have scheduling information attached, which
was causing assertion failures with the Cortex-A57 scheduling model and
-fp-contract=fast, because the Cortex-A57 sched model claims to be complete.

Differential Revision: https://reviews.llvm.org/D34139

llvm-svn: 305288
2017-06-13 13:04:32 +00:00
Oliver Stannard ad0973557c [ARM] Add scheduling info for VFMS
The scalar VFMS instructions did not have scheduling information attached (but
VFMA did), which was causing assertion failures with the Cortex-A57 scheduling
model and -fp-contract=fast.

Differential Revision: https://reviews.llvm.org/D34040

llvm-svn: 305064
2017-06-09 09:19:09 +00:00