These can be lowered to code sequences using CMPFP and CMPFPE which then get
selected to VCMP and VCMPE. The implementation isn't fully correct, as the chain
operand isn't handled correctly, but resolving that looks like it would involve
changes around FPSCR-handling instructions and how the FPSCR is modelled.
The fp-intrinsics test was already testing some of this but as the entire test
was being XFAILed it wasn't noticed. Un-XFAIL the test and instead leave the
cases where we aren't generating the right instruction sequences as FIXME.
Differential Revision: https://reviews.llvm.org/D73194
The big switch in `ARMBaseInstrInfo::getNumMicroOps` is missing cases for
`VLLDM` and `VLSTM`, which are currently defined with itineraries having a
dynamic count of micro-ops.
Assuming an optimistic case in which these instruction do not actually perform
loads or stores, and with the idea that Armv8-m cores are supposed to use the
new style scheduling models, this patch just sets the itinerary for those two
instructions to `NoItinerary`.
Differential Revision: https://reviews.llvm.org/D71266
Summary:
In the cases where the CMOV (f16) SDNode is used with condition codes
LT, LE, VC or NE, it is successfully selected into a VSEL instruction.
In the remaining cases, however, instruction selection fails since VSEL
does not support other condition codes.
This patch handles such cases by using the single-precision version of
the VMOV instruction.
Reviewers: ostannard, dmgreen
Reviewed By: dmgreen
Subscribers: kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70667
Based on the discussion in
http://lists.llvm.org/pipermail/llvm-dev/2019-October/135574.html, the
conclusion was reached that the ARM backend should produce vcmp instead
of vcmpe instructions by default, i.e. not be producing an Invalid
Operation exception when either arguments in a floating point compare
are quiet NaNs.
In the future, after constrained floating point intrinsics for floating
point compare have been introduced, vcmpe instructions probably should
be produced for those intrinsics - depending on the exact semantics
they'll be defined to have.
This patch logically consists of the following parts:
- Revert http://llvm.org/viewvc/llvm-project?rev=294945&view=rev and
http://llvm.org/viewvc/llvm-project?rev=294968&view=rev, which
implemented fine-tuning for when to produce vcmpe (i.e. not do it for
equality comparisons). The complexity introduced by those patches
isn't needed anymore if we just always produce vcmp instead. Maybe
these patches need to be reintroduced again once support is needed to
map potential LLVM-IR constrained floating point compare intrinsics to
the ARM instruction set.
- Simply select vcmp, instead of vcmpe, see simple changes in
lib/Target/ARM/ARMInstrVFP.td
- Adapt lots of tests that tested for vcmpe (instead of vcmp). For all
of these test, the intent of what is tested for isn't related to
whether the vcmp should produce an Invalid Operation exception or not.
Fixes PR43374.
Differential Revision: https://reviews.llvm.org/D68463
llvm-svn: 374025
This is an attempt to fill in some of the missing instructions from the
Cortex-M4 schedule, and make it easier to do the same for other ARM cpus.
- Some instructions are marked as hasNoSchedulingInfo as they are pseudos or
otherwise do not require scheduling info
- A lot of features have been marked not supported
- Some WriteRes's have been added for cvt instructions.
- Some extra instruction latencies have been added, notably by relaxing the
regex for dsp instruction to catch more cases, and some fp instructions.
This goes a long way to get the CompleteModel working for this CPU. It does not
go far enough as to get all scheduling info for all output operands correct.
Differential Revision: https://reviews.llvm.org/D67957
llvm-svn: 373163
Decoding of VMSR doesn't diagnose some unpredictable encodings, as the unpredictable bits are not correctly set.
Diff-reduce this instruction's internals WRT VMRS so I can see the differences better. Mostly this is s/src/Rt/g.
Fill in the "should-be-(0)" bits.
Designate the Unpredictable{} bits for both VMRS and VMSR.
Patch by Mark Murray!
Differential revision: https://reviews.llvm.org/D66938
llvm-svn: 370729
Summary:
Use the same predicates as VSTMDB/VLDMIA since VPUSH/VPOP alias to
these.
Patch by Momchil Velikov.
Reviewers: ostannard, simon_tatham, SjoerdMeijer, samparker, t.p.northover, dmgreen
Reviewed By: dmgreen
Subscribers: javed.absar, kristof.beyls, hiraditya, dmgreen, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D64413
llvm-svn: 365604
If you compile with `-mattr=+mve` (enabling integer MVE instructions
but not floating-point ones), then the scalar FP //registers// exist
and it's legal to move things in and out of them, load and store them,
but it's not legal to do arithmetic on them.
In D60708, the calls to `addRegisterClass` in ARMISelLowering that
enable use of the scalar FP registers became conditionalised on
`Subtarget->hasFPRegs()` instead of `Subtarget->hasVFP2Base()`, so
that loads, stores and moves of those registers would work. But I
didn't realise that that would also enable all the operations on those
types by default.
Now, if the target doesn't have basic VFP, we follow up those
`addRegisterClass` calls by turning back off all the nontrivial
operations you can perform on f32 and f64. That causes several
knock-on failures, which are fixed by allowing the `VMOVDcc` and
`VMOVScc` instructions to be selected even if all you have is
`HasFPRegs`, and adjusting several checks for 'is this a double in a
single-precision-only world?' to the more general 'is this any FP type
we can't do arithmetic on?'. Between those, the whole of the
`float-ops.ll` and `fp16-instructions.ll` tests can now run in
MVE-without-FP mode and generate correct-looking code.
One odd side effect is that I had to relax the check lines in that
test so that they permit test functions like `add_f` to be generated
as tailcalls to software FP library functions, instead of ordinary
calls. Doing that is entirely legal, but the mystery is why this is
the first RUN line that's needed the relaxation: on the usual kind of
non-FP target, no tailcalls ever seem to be generated. Going by the
llc messages, I think `SoftenFloatResult` must be perturbing the code
generation in some way, but that's as much as I can guess.
Reviewers: dmgreen, ostannard
Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D63938
llvm-svn: 364909
If an FP_EXTEND or FP_ROUND isel dag node converts directly between
f16 and f32 when the target CPU has no instruction to do it in one go,
it has to be done in two steps instead, going via f32.
Previously, this was done implicitly, because all such CPUs had the
storage-only implementation of f16 (i.e. the only thing you can do
with one at all is to convert it to/from f32). So isel would legalize
the f16 into an f32 as soon as it saw it, by inserting an fp16_to_fp
node (or vice versa), and then the fp_extend would already be f32->f64
rather than f16->f64.
But that technique can't support a target CPU which has full f16
support but _not_ f64, such as some variants of Arm v8.1-M. So now we
provide custom lowering for FP_EXTEND and FP_ROUND, which checks
support for f16 and f64 and decides on the best thing to do given the
combination of flags it gets back.
Reviewers: dmgreen, samparker, SjoerdMeijer
Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60692
llvm-svn: 364294
This adds support for the new family of conditional selection /
increment / negation instructions; the low-overhead branch
instructions (e.g. BF, WLS, DLS); the CLRM instruction to zero a whole
list of registers at once; the new VMRS/VMSR and VLDR/VSTR
instructions to get data in and out of 8.1-M system registers,
particularly including the new VPR register used by MVE vector
predication.
To support this, we also add a register name 'zr' (used by the CSEL
family to force one of the inputs to the constant 0), and operand
types for lists of registers that are also allowed to include APSR or
VPR (used by CLRM). The VLDR/VSTR instructions also need a new
addressing mode.
The low-overhead branch instructions exist in their own separate
architecture extension, which we treat as enabled by default, but you
can say -mattr=-lob or equivalent to turn it off.
Reviewers: dmgreen, samparker, SjoerdMeijer, t.p.northover
Reviewed By: samparker
Subscribers: miyuki, javed.absar, kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62667
llvm-svn: 363039
These caused a build failure because I managed not to notice they
depended on a later unpushed commit in my current stack. Sorry about
that.
llvm-svn: 362956
This adds support for the new family of conditional selection /
increment / negation instructions; the low-overhead branch
instructions (e.g. BF, WLS, DLS); the CLRM instruction to zero a whole
list of registers at once; the new VMRS/VMSR and VLDR/VSTR
instructions to get data in and out of 8.1-M system registers,
particularly including the new VPR register used by MVE vector
predication.
To support this, we also add a register name 'zr' (used by the CSEL
family to force one of the inputs to the constant 0), and operand
types for lists of registers that are also allowed to include APSR or
VPR (used by CLRM). The VLDR/VSTR instructions also need some new
addressing modes.
The low-overhead branch instructions exist in their own separate
architecture extension, which we treat as enabled by default, but you
can say -mattr=-lob or equivalent to turn it off.
Reviewers: dmgreen, samparker, SjoerdMeijer, t.p.northover
Reviewed By: samparker
Subscribers: miyuki, javed.absar, kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62667
llvm-svn: 362953
Arm v8.1-M supports the VMOV instructions that move a half-precision
value to and from a GPR, but not if the GPR is SP or PC.
To fix this, I've changed those instructions to use the rGPR register
class instead of GPR. rGPR always excludes PC, and it excludes SP
except in the presence of the HasV8Ops target feature (i.e. Arm v8-A).
So the effect is that VMOV.F16 to and from PC is now illegal
everywhere, but VMOV.F16 to and from SP is illegal only on non-v8-A
cores (which I believe is all as it should be).
Reviewers: dmgreen, samparker, SjoerdMeijer, ostannard
Reviewed By: ostannard
Subscribers: ostannard, javed.absar, kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60704
llvm-svn: 362942
Now the NEON ones have a prefix "NEON_", and the VFP ones have a
prefix "VFP_". This is so that the regex in ARMScheduleA57.td can be
made to match both of _those_ classes of VMAXNM without also matching
the MVE ones that are going to be introduced soon. NFCI.
Patch by Simon Tatham.
Differential Revision: https://reviews.llvm.org/D60700
llvm-svn: 362097
The MVE extension in Arm v8.1-M permits the use of some move, load and
store isntructions which access the FP registers, even if there's no
actual FP support in the processor (in particular, if you have the
integer-only version of MVE).
Therefore, we need separate subtarget features to condition those
instructions on, which are implied by both FP and MVE but are not part
of either.
Patch mostly by Simon Tatham.
Differential Revision: https://reviews.llvm.org/D60694
llvm-svn: 362088
This add patterns for fp16 round and ceil etc. Same as the float and double
patterns.
Differential Revision: https://reviews.llvm.org/D62326
llvm-svn: 361718
Make it possible to TableGen code for FCONSTS and FCONSTD.
We need to make two changes to the TableGen descriptions of vfp_f32imm
and vfp_f64imm respectively:
* add GISelPredicateCode to check that the immediate fits in 8 bits;
* extract the SDNodeXForms into separate definitions and create a
GISDNodeXFormEquiv and a custom renderer function for each of them.
There's a lot of boilerplate to get the actual value of the immediate,
but it basically just boils down to calling ARM_AM::getFP32Imm or
ARM_AM::getFP64Imm.
llvm-svn: 358063
More or less all the instructions defined in the v8.2a full-fp16
extension are defined as UNPREDICTABLE if you put them in an IT block
(Thumb) or use with any condition other than AL (ARM). LLVM didn't
know that, and was happy to conditionalise them.
In order to force these instructions to count as not predicable, I had
to make a small Tablegen change. The code generation back end mostly
decides if an instruction was predicable by looking for something it
can identify as a predicate operand; there's an isPredicable bit flag
that overrides that check in the positive direction, but nothing that
overrides it in the negative direction.
(I considered the alternative approach of actually removing the
predicate operand from those instructions, but thought that it would
be more painful overall for instructions differing only in data type
to have different shapes of operand list. This way, the only code that
has to notice the difference is the if-converter.)
So I've added an isUnpredicable bit alongside isPredicable, and set
that bit on the right subset of FP16 instructions, and also on the
VSEL, VMAXNM/VMINNM and VRINT[ANPM] families which should be
unpredicable for all data types.
I've included a couple of representative regression tests, both of
which previously caused an fp16 instruction to be conditionalised in
ARM state and (with -arm-no-restrict-it) to be put in an IT block in
Thumb.
Reviewers: SjoerdMeijer, t.p.northover, efriedma
Reviewed By: efriedma
Subscribers: jdoerfert, javed.absar, kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D57823
llvm-svn: 354768
to reflect the new license.
We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.
Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.
llvm-svn: 351636
This is a follow up of rL342874, which stopped fusing muls and adds into VMLAs
for performance reasons on the Cortex-M4 and Cortex-M33. This is a serie of 2
patches, that is trying to achieve the same for VFMA. The second column in the
table below shows what we were generating before rL342874, the third column
what changed with rL342874, and the last column what we want to achieve with
these 2 patches:
--------------------------------------------------------
| Opt | < rL342874 | >= rL342874 | |
|------------------------------------------------------|
|-O3 | vmla | vmul | vmul |
| | | vadd | vadd |
|------------------------------------------------------|
|-Ofast | vfma | vfma | vmul |
| | | | vadd |
|------------------------------------------------------|
|-Oz | vmla | vmla | vmla |
--------------------------------------------------------
This patch 1/2, is a cleanup of the spaghetti predicate logic on the different
VMLA and VFMA codegen rules, so that we can make the final functional change in
patch 2/2. This also fixes a typo in the regression test added in rL342874.
Differential revision: https://reviews.llvm.org/D53314
llvm-svn: 344671
This property is needed in order to follow values movement between
registers. This property is used in TII to implement method that
returns true if simple copy like instruction is recognized, along
with source and destination machine operands.
Patch by Nikola Prica.
Differential Revision: https://reviews.llvm.org/D45204
llvm-svn: 333093
This adds code generation support for the FP16 vmaxnm/vminnm scalar
instructions.
Differential Revision: https://reviews.llvm.org/D44675
llvm-svn: 330034
Follow up patch of r328313 to support the UseVMOVSR constraint. Removed
some unneeded instructions from the test and removed some stray
comments.
Differential Revision: https://reviews.llvm.org/D44941
llvm-svn: 328691
When targeting execute-only and fp-armv8, float constants in a compare
resulted in instruction selection failures. This is now fixed by using
vmov.f32 where possible, otherwise the floating point constant is
lowered into a integer constant that is moved into a floating point
register.
This patch also restores using fpcmp with immediate 0 under fp-armv8.
Change-Id: Ie87229706f4ed879a0c0cf66631b6047ed6c6443
llvm-svn: 328313
This implements lowering of SELECT_CC for f16s, which enables
codegen of VSEL with f16 types.
Differential Revision: https://reviews.llvm.org/D44518
llvm-svn: 327695
This is a follow up of r324321, adding a match pattern for mov with a FP16
immediate (also fixing operand vfp_f16imm that wasn't even compiling).
Differential Revision: https://reviews.llvm.org/D42973
llvm-svn: 324456
This is a follow up of r324321, adding f16 <-> f32 and f16 <-> f64 conversion
match patterns.
Differential Revision: https://reviews.llvm.org/D42954
llvm-svn: 324360
This adds most of the FP16 codegen support, but these areas need further work:
- FP16 literals and immediates are not properly supported yet (e.g. literal
pool needs work),
- Instructions that are generated from intrinsics (e.g. vabs) haven't been
added.
This will be addressed in follow-up patches.
Differential Revision: https://reviews.llvm.org/D42849
llvm-svn: 324321
Half-precision arguments and return values are passed as if it were an int or
float for ARM. This results in truncates and bitcasts to/from i16 and f16
values, which are legalized very early to stack stores/loads. When FullFP16 is
enabled, we want to avoid codegen for these bitcasts as it is unnecessary and
inefficient.
Differential Revision: https://reviews.llvm.org/D42580
llvm-svn: 323861
Create and use FP16Pat FullFP16Pat helper patterns to make the difference
explicit.
Differential Revision: https://reviews.llvm.org/D42634
llvm-svn: 323640
This is the groundwork for Armv8.2-A FP16 code generation .
Clang passes and returns _Float16 values as floats, together with the required
bitconverts and truncs etc. to implement correct AAPCS behaviour, see D42318.
We will implement half-precision argument passing/returning lowering in the ARM
backend soon, but for now this means that this:
_Float16 sub(_Float16 a, _Float16 b) {
return a + b;
}
gets lowered to this:
define float @sub(float %a.coerce, float %b.coerce) {
entry:
%0 = bitcast float %a.coerce to i32
%tmp.0.extract.trunc = trunc i32 %0 to i16
%1 = bitcast i16 %tmp.0.extract.trunc to half
<SNIP>
%add = fadd half %1, %3
<SNIP>
}
When FullFP16 is *not* supported, we don't make f16 a legal type, and we get
legalization for "free", i.e. nothing changes and everything works as before.
And also f16 argument passing/returning is handled.
When FullFP16 is supported, we do make f16 a legal type, and have 2 places that
we need to patch up: f16 argument passing and returning, which involves minor
tweaks to avoid unnecessary code generation for some bitcasts.
As a "demonstrator" that this works for the different FP16, FullFP16, softfp
modes, etc., I've added match rules to the VSUB instruction description showing
that we can codegen this instruction from IR, but more importantly, also to
some conversion instructions. These conversions were causing issue before in
the FP16 and FullFP16 cases.
I've also added match rules to the VLDRH and VSTRH desriptions, so that we can
actually compile the entire half-precision sub code example above. This showed
that these loads and stores had the wrong addressing mode specified: AddrMode5
instead of AddrMode5FP16, which turned out not be implemented at all, so that
has also been added.
This is the minimal patch that shows all the different moving parts. In patch
2/3 I will add some efficient lowering of bitcasts, and in 2/3 I will add the
remaining Armv8.2-A FP16 instruction descriptions.
Thanks to Sam Parker and Oliver Stannard for their help and reviews!
Differential Revision: https://reviews.llvm.org/D38315
llvm-svn: 323512
These are pre-UAL syntax, and we don't support any other pre-UAL instructions,
with the exception of FLDMX/FSTMX, which don't have a UAL equivalent. Therefore
there's no reason to keep them or their AsmParser hacks around.
With the AsmParser hacks removed, the FLDMX and FSTMX instructions get the same
operand diagnostics as the UAL instructions.
Differential revision: https://reviews.llvm.org/D39196
llvm-svn: 318777
It leads to some improvements, but also a regression for the simple
case, so it's not clearly a good idea.
test/CodeGen/ARM/vcvt.ll now has test coverage to show the difference.
Ultimately, the right solution is probably to custom-lower fp-to-int
conversions, to something like ARMISD::VCVT_F32_S32 plus a bitcast.
It's hard to do the right thing when the implicit bitcast isn't visible
to DAG transforms.
llvm-svn: 314169
For the following function:
double fn1(double d0, double d1, double d2) {
double a = -d0 - d1 * d2;
return a;
}
on ARM, LLVM generates code along the lines of
vneg.f64 d0, d0
vmls.f64 d0, d1, d2
i.e., a negate and a multiply-subtract.
The attached patch adds instruction selection patterns to allow it to generate the single instruction
vnmla.f64 d0, d1, d2
(multiply-add with negation) instead, like GCC does.
Committed on behalf of @gergo- (Gergö Barany)
Differential Revision: https://reviews.llvm.org/D35911
llvm-svn: 313972
These don't add any value as they're just compositions of existing
patterns. However, they can confuse the cost logic in ISel, leading to
duplicated vcvt instructions like in PR33199.
llvm-svn: 312724
Armv8.3-A adds instructions that convert a double-precision floating
point number to a signed 32-bit integer with round towards zero,
designed for improving Javascript performance.
Differential Revision: https://reviews.llvm.org/D36785
llvm-svn: 311448
This reverts r310243. Only MVFR2 is actually restricted to v8 and it'll be a
little while before we can get a proper fix together. Better that we allow
incorrect code than reject correct in the meantime.
llvm-svn: 310384
This patch addresses two issues with assembly and disassembly for VMRS/VMSR:
1.currently VMRS/VMSR instructions accessing fpsid, mvfr{0-2} and fpexc, are
accepted for non ARMv8-A targets.
2. all VMRS/VMSR instructions accept writing/reading to PC and SP, when only
ARMv7-A and ARMv8-A should be allowed to write/read to SP and none to PC.
This patch addresses those issues and adds tests for these cases.
Differential Revision: https://reviews.llvm.org/D36306
llvm-svn: 310243
The VFNM[AS] instructions did not have scheduling information attached, which
was causing assertion failures with the Cortex-A57 scheduling model and
-fp-contract=fast, because the Cortex-A57 sched model claims to be complete.
Differential Revision: https://reviews.llvm.org/D34139
llvm-svn: 305288
The scalar VFMS instructions did not have scheduling information attached (but
VFMA did), which was causing assertion failures with the Cortex-A57 scheduling
model and -fp-contract=fast.
Differential Revision: https://reviews.llvm.org/D34040
llvm-svn: 305064