FMLA/FMLS f16 indexed patterns added.
Fixes https://bugs.llvm.org/show_bug.cgi?id=45467
Removed redundant v2f32 vector_extract indexed pattern since
Instruction Selection is able to match v4f32 instead.
Summary:
This patch introduces command-line support for the Armv8.6-a architecture and assembly support for BFloat16. Details can be found
https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/arm-architecture-developments-armv8-6-a
in addition to the GCC patch for the 8..6-a CLI:
https://gcc.gnu.org/legacy-ml/gcc-patches/2019-11/msg02647.html
In detail this patch
- march options for armv8.6-a
- BFloat16 assembly
This is part of a patch series, starting with command-line and Bfloat16
assembly support. The subsequent patches will upstream intrinsics
support for BFloat16, followed by Matrix Multiplication and the
remaining Virtualization features of the armv8.6-a architecture.
Based on work by:
- labrinea
- MarkMurrayARM
- Luke Cheeseman
- Javed Asbar
- Mikhail Maltsev
- Luke Geeson
Reviewers: SjoerdMeijer, craig.topper, rjmccall, jfb, LukeGeeson
Reviewed By: SjoerdMeijer
Subscribers: stuij, kristof.beyls, hiraditya, dexonsmith, danielkiss, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D76062
Add support for DestructiveBinaryComm DestructiveInstType, as well as the lowering code to expand the new Pseudos into the final movprfx+instruction pairs.
Differential Revision: https://reviews.llvm.org/D73711
Summary:
Implements the @llvm.aarch64.sve.index intrinsic, which
takes a scalar base and step value.
This patch also adds the printSImm function to AArch64InstPrinter
to ensure that immediates of type i8 & i16 are printed correctly.
Reviewers: sdesmalen, andwar, efriedma, dancgr, cameron.mcinally, rengolin
Reviewed By: cameron.mcinally
Subscribers: tatyana-krasnukha, tschuett, kristof.beyls, hiraditya, rkruppe, arphaman, psnobl, cfe-commits, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D74550
This patch added generation of SIMD bitwise insert BIT/BIF instructions.
In the absence of GCC-like functionality for optimal constraints satisfaction
during register allocation the bitwise insert and select patterns are matched
by pseudo bitwise select BSP instruction with not tied def.
It is expanded later after register allocation with def tied
to BSL/BIT/BIF depending on operands registers.
This allows to get rid of redundant moves.
Reviewers: t.p.northover, samparker, dmgreen
Reviewed By: dmgreen
Differential Revision: https://reviews.llvm.org/D74147
This gets selected to the appropriate fcvt instruction. Handling from there on
isn't fully correct yet, as we need to model fcvt reading and writing to fpsr
and fpcr.
Differential Revision: https://reviews.llvm.org/D73201
Rename Destructive enumerator in preparation for a larger set of patches to
support prefixing destructive oeprations with MOVPRFX.
Differential Revision: https://reviews.llvm.org/D73212
Some housekeeping for the DestructiveInstType enum before a larger set of patches to support prefixing destructive oeprations with MOVPRFX.
Differential Revision: https://reviews.llvm.org/D73141
Summary:
Currently, sqdmulh_lane and friends from the ACLE (implemented in arm_neon.h),
are represented in LLVM IR as a (by vector) sqdmulh and a vector of (repeated)
indices, like so:
%shuffle = shufflevector <4 x i16> %v, <4 x i16> undef, <4 x i32> <i32 3, i32 3, i32 3, i32 3>
%vqdmulh2.i = tail call <4 x i16> @llvm.aarch64.neon.sqdmulh.v4i16(<4 x i16> %a, <4 x i16> %shuffle)
When %v's values are known, the shufflevector is optimized away and we are no
longer able to select the lane variant of sqdmulh in the backend.
This defeats a (hand-coded) optimization that packs several constants into a
single vector and uses the lane intrinsics to reduce register pressure and
trade-off materialising several constants for a single vector load from the
constant pool, like so:
int16x8_t v = {2,3,4,5,6,7,8,9};
a = vqdmulh_laneq_s16(a, v, 0);
b = vqdmulh_laneq_s16(b, v, 1);
c = vqdmulh_laneq_s16(c, v, 2);
d = vqdmulh_laneq_s16(d, v, 3);
[...]
In one microbenchmark from libjpeg-turbo this accounts for a 2.5% to 4%
performance difference.
We could teach the compiler to recover the lane variants, but this would likely
require its own pass. (Alternatively, "volatile" could be used on the constants
vector, but this is a bit ugly.)
This patch instead implements the following LLVM IR intrinsics for AArch64 to
maintain the original structure through IR optmization and into instruction
selection:
- sqdmulh_lane
- sqdmulh_laneq
- sqrdmulh_lane
- sqrdmulh_laneq.
These 'lane' variants need an additional register class. The second argument
must be in the lower half of the 64-bit NEON register file, but only when
operating on i16 elements.
Note that the existing patterns for shufflevector and sqdmulh into sqdmulh_lane
(etc.) remain, so code that does not rely on NEON intrinsics to generate these
instructions is not affected.
This patch also changes clang to emit these IR intrinsics for the corresponding
NEON intrinsics (AArch64 only).
Reviewers: SjoerdMeijer, dmgreen, t.p.northover, rovka, rengolin, efriedma
Reviewed By: efriedma
Subscribers: kristof.beyls, hiraditya, jdoerfert, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D71469
This patch also fixes up a number of cases in DAGCombine and
SelectionDAGBuilder where the size of a scalable vector is used in a
fixed-width context (thus triggering an assertion failure).
Reviewers: efriedma, c-rhodes, rovka, cameron.mcinally
Reviewed By: efriedma
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71215
Summary:
Several SVE intrinsics with immediate arguments (including those
added by D70253 & D70437) do not use the ImmArg property.
This patch adds ImmArg<Op> where required and changes
the appropriate patterns which match the immediates.
Reviewers: efriedma, sdesmalen, andwar, rengolin
Reviewed By: efriedma
Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, cfe-commits, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D72612
Add a predicate to MCInstDesc that allows tools to determine whether an
instruction authenticates a pointer. This can be used by diagnostic
tools to hint at pointer authentication failures.
Differential Revision: https://reviews.llvm.org/D70329
rdar://55089604
This adds support for selecting a large chunk of the load/store *roW patterns.
This is pretty much a straight port of AArch64DAGToDAGISel::SelectAddrModeWRO
into GISel. The code is very similar to the XRO code. The main difference is
that in the *roW patterns, we want to try and fold in an extend, and *possibly*
a shift along with it. A good portion of this patch is refactoring the existing
XRO code.
- Add selectAddrModeWRO
- Factor out the code from selectAddrModeShiftedExtendXReg which is used by both
selectAddrModeXRO and selectAddrModeWRO into selectExtendedSHL.
This is similar to the function of the same name in AArch64DAGToDAGISel.
- Add support for extends to the factored out code in selectExtendedSHL.
- Teach getExtendTypeForInst how to handle AND masks that are intended to be
used in loads/stores (necessary for this addressing mode.)
- Make getExtendTypeForInst not static because moving it made an annoying diff
and I wanted to have the WRO/XRO functions close to each other while I was
writing the code.
Differential Revision: https://reviews.llvm.org/D72426
Summary:
Adds intrinsics for the following:
* cmphs, cmphi
* cmpge, cmpgt
* cmpeq, cmpne
* cmplt, cmple
* cmplo, cmpls
Includes a minor change to `TLI.getMemValueType` that fixes a crash due to the
scalable flag being dropped.
Reviewers: sdesmalen, efriedma, rengolin, rovka, dancgr, huntergr
Reviewed By: efriedma
Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70889
Summary:
In rG643ac6c0420b, the syntax `ldraa x1, [x0]!` was added as an alias
for `ldraa x1, [x0, #0]!`. That syntax is less obvious in meaning, and
also will not be accepted by assemblers that haven't been updated yet.
So it would be better not to emit it as the preferred disassembly for
that instruction.
This change lowers the EmitPriority of the new alias so that the more
explicit syntax `[x0, #0]!` is preferred by the disassembler. The new
syntax is still accepted by the assembler.
Reviewers: ab, ostannard
Reviewed By: ostannard
Subscribers: kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70813
RETA always implicitly uses LR, unlike RET which merely has an
alias that defaults it to LR.
Additionally, RETA implicitly uses SP as well, which it uses as
a discriminator to authenticate LR.
This isn't usually noticeable, because RET_ReallyLR is used in most
of the backend. However, the post-RA scheduler, if enabled, will
cause miscompiles if the imp-uses are missing.
While there, fix a typo in the lone affected testcase.
The instruction definition has been retroactively expanded to
allow for an alias for '[xN, 0]!' as '[xN]!'.
That wouldn't make sense on LDR, but does for LDRA.
This adds some extra patterns to select AArch64 Neon SQADD, UQADD, SQSUB
and UQSUB from the existing target independent sadd_sat, uadd_sat,
ssub_sat and usub_sat nodes.
It does not attempt to replace the existing int_aarch64_neon_uqadd
intrinsic nodes as they are apparently used for both scalar and vector,
and need to be legal on scalar types for some of the patterns to work.
The int_aarch64_neon_uqadd on scalar would move the two integers into
floating point registers, perform a Neon uqadd and move the value back.
I don't believe this is good idea for uadd_sat to do the same as the
scalar alternative is simpler (an adds with a csinv). For signed it may
be smaller, but I'm not sure about it being better.
So this just adds some extra patterns for the existing vector
instructions, matching on the _sat nodes.
Differential Revision: https://reviews.llvm.org/D69374
This reverts r372314, reapplying r372285 and the commits which depend
on it (r372286-r372293, and r372296-r372297)
This was missing one switch to getTargetConstant in an untested case.
llvm-svn: 372338
This broke the Chromium build, causing it to fail with e.g.
fatal error: error in backend: Cannot select: t362: v4i32 = X86ISD::VSHLI t392, Constant:i8<15>
See llvm-commits thread of r372285 for details.
This also reverts r372286, r372287, r372288, r372289, r372290, r372291,
r372292, r372293, r372296, and r372297, which seemed to depend on the
main commit.
> Encode them directly as an imm argument to G_INTRINSIC*.
>
> Since now intrinsics can now define what parameters are required to be
> immediates, avoid using registers for them. Intrinsics could
> potentially want a constant that isn't a legal register type. Also,
> since G_CONSTANT is subject to CSE and legalization, transforms could
> potentially obscure the value (and create extra work for the
> selector). The register bank of a G_CONSTANT is also meaningful, so
> this could throw off future folding and legalization logic for AMDGPU.
>
> This will be much more convenient to work with than needing to call
> getConstantVRegVal and checking if it may have failed for every
> constant intrinsic parameter. AMDGPU has quite a lot of intrinsics wth
> immarg operands, many of which need inspection during lowering. Having
> to find the value in a register is going to add a lot of boilerplate
> and waste compile time.
>
> SelectionDAG has always provided TargetConstant for constants which
> should not be legalized or materialized in a register. The distinction
> between Constant and TargetConstant was somewhat fuzzy, and there was
> no automatic way to force usage of TargetConstant for certain
> intrinsic parameters. They were both ultimately ConstantSDNode, and it
> was inconsistently used. It was quite easy to mis-select an
> instruction requiring an immediate. For SelectionDAG, start emitting
> TargetConstant for these arguments, and using timm to match them.
>
> Most of the work here is to cleanup target handling of constants. Some
> targets process intrinsics through intermediate custom nodes, which
> need to preserve TargetConstant usage to match the intrinsic
> expectation. Pattern inputs now need to distinguish whether a constant
> is merely compatible with an operand or whether it is mandatory.
>
> The GlobalISelEmitter needs to treat timm as a special case of a leaf
> node, simlar to MachineBasicBlock operands. This should also enable
> handling of patterns for some G_* instructions with immediates, like
> G_FENCE or G_EXTRACT.
>
> This does include a workaround for a crash in GlobalISelEmitter when
> ARM tries to uses "imm" in an output with a "timm" pattern source.
llvm-svn: 372314
Encode them directly as an imm argument to G_INTRINSIC*.
Since now intrinsics can now define what parameters are required to be
immediates, avoid using registers for them. Intrinsics could
potentially want a constant that isn't a legal register type. Also,
since G_CONSTANT is subject to CSE and legalization, transforms could
potentially obscure the value (and create extra work for the
selector). The register bank of a G_CONSTANT is also meaningful, so
this could throw off future folding and legalization logic for AMDGPU.
This will be much more convenient to work with than needing to call
getConstantVRegVal and checking if it may have failed for every
constant intrinsic parameter. AMDGPU has quite a lot of intrinsics wth
immarg operands, many of which need inspection during lowering. Having
to find the value in a register is going to add a lot of boilerplate
and waste compile time.
SelectionDAG has always provided TargetConstant for constants which
should not be legalized or materialized in a register. The distinction
between Constant and TargetConstant was somewhat fuzzy, and there was
no automatic way to force usage of TargetConstant for certain
intrinsic parameters. They were both ultimately ConstantSDNode, and it
was inconsistently used. It was quite easy to mis-select an
instruction requiring an immediate. For SelectionDAG, start emitting
TargetConstant for these arguments, and using timm to match them.
Most of the work here is to cleanup target handling of constants. Some
targets process intrinsics through intermediate custom nodes, which
need to preserve TargetConstant usage to match the intrinsic
expectation. Pattern inputs now need to distinguish whether a constant
is merely compatible with an operand or whether it is mandatory.
The GlobalISelEmitter needs to treat timm as a special case of a leaf
node, simlar to MachineBasicBlock operands. This should also enable
handling of patterns for some G_* instructions with immediates, like
G_FENCE or G_EXTRACT.
This does include a workaround for a crash in GlobalISelEmitter when
ARM tries to uses "imm" in an output with a "timm" pattern source.
llvm-svn: 372285
This teaches GISel to select patterns which fold an extend plus optional shift
into the addressing mode. In particular, adds and subs.
Factor out the arith extended register ComplexPatterns in AArch64InstrFormats.td
and create GISel equivalents.
Add some equivalent functions to the ones in AArch64ISelDAGToDAG:
- `selectArithExtendedRegister`
- `narrowExtendRegIfNeeded`
- `getExtendTypeForInst`
`getExtendTypeForInst` includes the checks for loads and stores. This will be
used for WRO addressing modes in loads + stores.
Teach selectCopy to properly handle subregister copies on the same bank in
order to support `narrowExtendRegIfNeeded`. The extended register must be a
GPR32, so we need to support same-bank subregister copies.
Fix a bug in getSubRegForClass which would cause registers on things like
GPR32common to end up getting ssub. Just change the check to look for FPR32
rather than GPR32.
For tests:
- Add select-arith-extended-reg.mir
- Update addsub_ext.ll to include GlobalISel checks
Differential Revision: https://reviews.llvm.org/D66835
llvm-svn: 370410
Instead of using custom C++ in `earlySelect` for loads and stores, just import
the patterns.
Remove `earlySelectLoad`, since we can just import the work it's doing.
Some minor changes to how `ComplexRendererFns` are returned for the XRO
addressing modes. If you add immediates in two steps, sometimes they are not
imported properly and you only end up with one immediate. I'm not sure if this
is intentional.
- Update load-addressing-modes.mir to include the instructions we can now
import.
- Add a similar test, store-addressing-modes.mir to show which store opcodes we
currently import, and show that we can pull in shifts etc.
- Update arm64-fastisel-gep-promote-before-add.ll to use FastISel instead of
GISel. This test failed with GISel because GISel folds the gep into the load.
The test checks that FastISel doesn't fold non-pointer-width adds into loads.
GISel on the other hand, produces a G_CONSTANT of -128 for the add, and then
a G_GEP, which must be pointer-width.
Note that we don't get STRBRoX right now. It seems like the importer can't
handle `FPR8Op:{ *:[Untyped] }:$Rt` source operands. So, those are not currently
supported.
Differential Revision: https://reviews.llvm.org/D66679
llvm-svn: 369806
Add a GlobalISel equivalent for the logical_imm32_XFORM and logical_imm64_XFORM
SDNodeXForms in AArch64InstrFormats.td.
- Add select-logical-imm.mir, which contains tests for each imported pattern.
- Update select-pr32733.mir and select-scalar-shift-imm.mir, since they now
select instructions of this form.
Differential Revision: https://reviews.llvm.org/D66162
llvm-svn: 369465
This adds GlobalISel equivalents for the following from AArch64InstrFormats:
- arith_shifted_reg32
- arith_shifted_reg64
And partial support for
- logical_shifted_reg32
- logical_shifted_reg32
The only thing missing for the logical cases is support for rotates. Other than
the missing support, the transformation is identical for the arithmetic shifted
register and the logical shifted register.
Lots of tests here:
- Add select-arith-shifted-reg.mir to show that we correctly select add and
sub instructions which use this pattern.
- Add select-logical-shifted-reg.mir to cover patterns which are not shared
between the arithmetic and logical cases.
- Update addsub-shifted.ll to show that we correctly fold shifts into
adds/subs.
- Update eon.ll to show that we can select the eon instruction by folding xors.
Differential Revision: https://reviews.llvm.org/D66163
llvm-svn: 369460
Add an equivalent ComplexRendererFns function for SelectNegArithImmed. This
allows us to select immediate adds of -1 by turning them into subtracts.
Update select-binop.mir to show that the pattern works.
Differential Revision: https://reviews.llvm.org/D65460
llvm-svn: 367700
There doesn't seem to be a practical reason for these instructions to have
different restrictions on the types of relocations that they may be used
with, notwithstanding the language in the ELF AArch64 spec that implies that
specific relocations are meant to be used with specific instructions.
For example, we currently forbid the first instruction in the following
sequence, despite it currently being used by clang to generate a global
reference under -mcmodel=large:
movz x0, #:abs_g0_nc:foo
movk x0, #:abs_g1_nc:foo
movk x0, #:abs_g2_nc:foo
movk x0, #:abs_g3:foo
Therefore, allow MOVK/MOVN/MOVZ to accept the union of the set of relocations
that they currently accept individually.
Differential Revision: https://reviews.llvm.org/D64466
llvm-svn: 366461
Implement IR intrinsics for stack tagging. Generated code is very
unoptimized for now.
Two special intrinsics, llvm.aarch64.irg.sp and llvm.aarch64.tagp are
used to implement a tagged stack frame pointer in a virtual register.
Differential Revision: https://reviews.llvm.org/D64172
llvm-svn: 366360
This patch provides intrinsics support for Memory Tagging Extension (MTE),
which was introduced with the Armv8.5-a architecture.
The intrinsics are described in detail in the latest
ACLE Q1 2019 documentation: https://developer.arm.com/docs/101028/latest
Reviewed by: David Spickett
Differential Revision: https://reviews.llvm.org/D60486
llvm-svn: 358963
The latest MTE specification adds register Xt to the STG instruction family:
STG [Xn, #offset] -> STG Xt, [Xn, #offset]
The tag written to memory is taken from Xt rather than Xn.
Also, the LDG instruction also was changed to read return address from Xt:
LDG Xt, [Xn, #offset].
This patch includes those changes and tests.
Specification is at: https://developer.arm.com/docs/ddi0596/c
Differential Revision: https://reviews.llvm.org/D60188
llvm-svn: 357583
The STGV/LDGV instructions were replaced with
STGM/LDGM. The encodings remain the same but there
is no longer writeback so there are no unpredictable
encodings to check for.
The specfication can be found here:
https://developer.arm.com/docs/ddi0596/c
Differential Revision: https://reviews.llvm.org/D60064
llvm-svn: 357395
Not sure this is the best fix, but it saves an instruction for certain
constructs involving variable shifts.
Differential Revision: https://reviews.llvm.org/D55572
llvm-svn: 351768
to reflect the new license.
We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.
Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.
llvm-svn: 351636
This patch splits backend features currently
hidden behind architecture versions.
For example, currently the only way to activate
complex numbers extension is targeting an v8.3
architecture, where after the patch this extension
can be added separately.
This refactoring is required by the new command lines proposal:
http://lists.llvm.org/pipermail/llvm-dev/2018-September/126346.html
Reviewers: DavidSpickett, olista01, t.p.northover
Subscribers: kristof.beyls, bryanpkc, javed.absar, pbarrio
Differential revision: https://reviews.llvm.org/D54633
--
It was reverted in rL348249 due a build bot failure in one of the
regression tests:
http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-win/builds/14386
The problem seems to be that FileCheck behaves
different in windows and linux. This new patch
splits the test file in multiple,
and does more exact pattern matching attempting
to circumvent the issue.
llvm-svn: 348493
This patch splits backend features currently
hidden behind architecture versions.
For example, currently the only way to activate
complex numbers extension is targeting an v8.3
architecture, where after the patch this extension
can be added separately.
This refactoring is required by the new command lines proposal:
http://lists.llvm.org/pipermail/llvm-dev/2018-September/126346.html
Reviewers: DavidSpickett, olista01, t.p.northover
Subscribers: kristof.beyls, bryanpkc, javed.absar, pbarrio
Differential revision: https://reviews.llvm.org/D54633
llvm-svn: 348121