Add support for the Neoverse V1 CPU to the ARM and AArch64 backends.
This is based on patches from Mark Murray and Victor Campos.
Reviewed By: dmgreen
Differential Revision: https://reviews.llvm.org/D90765
This adds support for -mcpu=cortex-r82. Some more information about this
core can be found here:
https://www.arm.com/products/silicon-ip-cpu/cortex-r/cortex-r82
One note about the system register: that is a bit of a refactoring because of
small differences between v8.4-A AArch64 and v8-R AArch64.
This is based on patches from Mark Murray and Mikhail Maltsev.
Differential Revision: https://reviews.llvm.org/D88660
This patch implements initial backend support for a -mtune CPU controlled by a "tune-cpu" function attribute. If the attribute is not present X86 will use the resolved CPU from target-cpu attribute or command line.
This patch adds MC layer support a tune CPU. Each CPU now has two sets of features stored in their GenSubtargetInfo.inc tables . These features lists are passed separately to the Processor and ProcessorModel classes in tablegen. The tune list defaults to an empty list to avoid changes to non-X86. This annoyingly increases the size of static tables on all target as we now store 24 more bytes per CPU. I haven't quantified the overall impact, but I can if we're concerned.
One new test is added to X86 to show a few tuning features with mismatched tune-cpu and target-cpu/target-feature attributes to demonstrate independent control. Another new test is added to demonstrate that the scheduler model follows the tune CPU.
I have not added a -mtune to llc/opt or MC layer command line yet. With no attributes we'll just use the -mcpu for both. MC layer tools will always follow the normal CPU for tuning.
Differential Revision: https://reviews.llvm.org/D85165
This patch upstreams support for the Arm-v8 Cortex-A78 and Cortex-X1
processors for AArch64 and ARM.
In detail:
- Adding cortex-a78 and cortex-x1 as cpu options for aarch64 and arm targets in clang
- Adding Cortex-A78 and Cortex-X1 CPU names and ProcessorModels in llvm
details of the CPU can be found here:
https://www.arm.com/products/cortex-xhttps://www.arm.com/products/silicon-ip-cpu/cortex-a/cortex-a78
The following people contributed to this patch:
- Luke Geeson
- Mikhail Maltsev
Reviewers: t.p.northover, dmgreen
Reviewed By: dmgreen
Subscribers: dmgreen, kristof.beyls, hiraditya, danielkiss, cfe-commits,
llvm-commits, miyuki
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D83206
This patch upstreams support for the Arm-v8 Cortex-A77
processor for AArch64 and ARM.
In detail:
- Adding cortex-a77 as a cpu option for aarch64 and arm targets in clang
- Cortex-A77 CPU name and ProcessorModel in llvm
details of the CPU can be found here:
https://www.arm.com/products/silicon-ip-cpu/cortex-a/cortex-a77
and a similar submission to GCC can be found here:
e0664b7a63
The following people contributed to this patch:
- Luke Geeson
- Mikhail Maltsev
Reviewers: t.p.northover, dmgreen, ostannard, SjoerdMeijer
Reviewed By: dmgreen
Subscribers: dmgreen, kristof.beyls, hiraditya, danielkiss, cfe-commits,
llvm-commits, miyuki
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D82887
This patch puts the _ZERO pseudos and corresponding patterns
under the predicate 'UseExperimentalZeroingPseudos', so that they
can be enabled/disabled through compile flags.
This is done because the zeroing pseudos use MOVPRFX to do merging of
the inactive lanes, but it depends on the uarch whether this operation
is actually merged with the destructive operation. If not, it may be
more profitable to use a SELECT and to give the compiler the freedom to
schedule these instructions as normal, rather than keeping them bundled
together. Additionally, this feature is not yet fully implemented and
there are still known bugs (see D80410) that need to be resolved before
the 'experimental' can be dropped from the name.
Reviewers: paulwalker-arm, cameron.mcinally, efriedma
Reviewed By: paulwalker-arm
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D82780
Adds aarch64-sve-vector-bits-{min,max} to allow the size of SVE
data registers (in bits) to be specified. This allows the code
generator to make assumptions it normally couldn't. As a starting
point this information is used to mark fixed length vector types
that can fit within the specified size as legal.
Reviewers: rengolin, efriedma
Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D80384
To make sure that no barrier gets placed on the architectural execution
path, each
BLR x<N>
instruction gets transformed to a
BL __llvm_slsblr_thunk_x<N>
instruction, with __llvm_slsblr_thunk_x<N> a thunk that contains
__llvm_slsblr_thunk_x<N>:
BR x<N>
<speculation barrier>
Therefore, the BLR instruction gets split into 2; one BL and one BR.
This transformation results in not inserting a speculation barrier on
the architectural execution path.
The mitigation is off by default and can be enabled by the
harden-sls-blr subtarget feature.
As a linker is allowed to clobber X16 and X17 on function calls, the
above code transformation would not be correct in case a linker does so
when N=16 or N=17. Therefore, when the mitigation is enabled, generation
of BLR x16 or BLR x17 is avoided.
As BLRA* indirect calls are not produced by LLVM currently, this does
not aim to implement support for those.
Differential Revision: https://reviews.llvm.org/D81402
Some processors may speculatively execute the instructions immediately
following RET (returns) and BR (indirect jumps), even though
control flow should change unconditionally at these instructions.
To avoid a potential miss-speculatively executed gadget after these
instructions leaking secrets through side channels, this pass places a
speculation barrier immediately after every RET and BR instruction.
Since these barriers are never on the correct, architectural execution
path, performance overhead of this is expected to be low.
On targets that implement that Armv8.0-SB Speculation Barrier extension,
a single SB instruction is emitted that acts as a speculation barrier.
On other targets, a DSB SYS followed by a ISB is emitted to act as a
speculation barrier.
These speculation barriers are implemented as pseudo instructions to
avoid later passes to analyze them and potentially remove them.
Even though currently LLVM does not produce BRAA/BRAB/BRAAZ/BRABZ
instructions, these are also mitigated by the pass and tested through a
MIR test.
The mitigation is off by default and can be enabled by the
harden-sls-retbr subtarget feature.
Differential Revision: https://reviews.llvm.org/D81400
This is the first checkin to support Marvell ThunderX3T110.
Initial definition of the micro-ops of the instructions in ThunderX3T110
is included.
Differential Revision: https://reviews.llvm.org/D78129
This patch upstreams support for the Armv8.6-a Matrix Multiplication
Extension. A summary of the features can be found here:
https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/arm-architecture-developments-armv8-6-a
This patch includes:
- Assembly support for AArch64 only (no SVE or Neon)
- Intrinsics Support for AArch64 Armv8.6a Matrix Multiplication Instructions (No bfloat16 matrix multiplication)
No IR types or C Types are needed for this extension.
This is part of a patch series, starting with BFloat16 support and
the other components in the armv8.6a extension (in previous patches
linked in phabricator)
Based on work by:
- Luke Geeson
- Oliver Stannard
- Luke Cheeseman
Reviewers: ostannard, t.p.northover, rengolin, kmclaughlin
Reviewed By: kmclaughlin
Subscribers: kmclaughlin, kristof.beyls, hiraditya, danielkiss,
cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D77871
Summary:
Similar to the CallLowering class used for lowering LLVM IR calls to MIR calls,
we introduce a separate class for lowering LLVM IR inline asm to MIR INLINEASM.
There is no functional change yet, all existing tests should pass.
Reviewers: arsenm, dsanders, aemerson, volkan, t.p.northover, paquette
Reviewed By: aemerson
Subscribers: gargaroff, wdng, mgorny, rovka, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D78316
Summary:
This patch upstreams support for the ARMv8.6A Enhanced Counter Virtualization
(ECV) extension, which adds 6 new system registers.
See ARMv8.6-ECV in the Arm Architecture Reference Manual Armv8 for more
information.
Reviewers: t.p.northover, rengolin, SjoerdMeijer, pcc, ab, chill
Reviewed By: SjoerdMeijer
Subscribers: LukeGeeson, ostannard, kristof.beyls, hiraditya, danielkiss, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77094
Summary:
This patch upstreams support for the ARMv8.6A Fine Grain Traps (FGT)
extension, which adds 5 new system registers.
See ARMv8.6-FGT in the Arm Architecture Reference Manual Armv8 for more
information.
Reviewers: t.p.northover, rengolin, SjoerdMeijer, ab, momchil.velikov
Reviewed By: SjoerdMeijer
Subscribers: LukeGeeson, ostannard, kristof.beyls, hiraditya, danielkiss, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76991
Summary:
This patch upstreams v8.6A activity monitors virtualization
assembler support, which consists of 32 new system
registers (two groups, each with 16 numbered registers).
See ARMv8.6-AMU in the Arm Architecture Reference Manual Armv8 for more
information.
Reviewers: t.p.northover, rengolin, SjoerdMeijer, ab, john.brawn, ostannard
Reviewed By: ostannard
Subscribers: LukeGeeson, dnsampaio, ostannard, kristof.beyls, hiraditya, danielkiss, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76998
This patch adds
- New arguments to getMinPrefetchStride() to let the target decide on a
per-loop basis if software prefetching should be done even with a stride
within the limit of the hw prefetcher.
- New TTI hook enableWritePrefetching() to let a target do write prefetching
by default (defaults to false).
- In LoopDataPrefetch:
- A search through the whole loop to gather information before emitting any
prefetches. This way the target can get information via new arguments to
getMinPrefetchStride() and emit prefetches more selectively. Collected
information includes: Does the loop have a call, how many memory
accesses, how many of them are strided, how many prefetches will cover
them. This is NFC to before as long as the target does not change its
definition of getMinPrefetchStride().
- If a previous access to the same exact address was 'read', and the
current one is 'write', make it a 'write' prefetch.
- If two accesses that are covered by the same prefetch do not dominate
each other, put the prefetch in a block that dominates both of them.
- If a ConstantMaxTripCount is less than ItersAhead, then skip the loop.
- A SystemZ implementation of getMinPrefetchStride().
Review: Ulrich Weigand, Michael Kruse
Differential Revision: https://reviews.llvm.org/D70228
Summary:
This patch introduces command-line support for the Armv8.6-a architecture and assembly support for BFloat16. Details can be found
https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/arm-architecture-developments-armv8-6-a
in addition to the GCC patch for the 8..6-a CLI:
https://gcc.gnu.org/legacy-ml/gcc-patches/2019-11/msg02647.html
In detail this patch
- march options for armv8.6-a
- BFloat16 assembly
This is part of a patch series, starting with command-line and Bfloat16
assembly support. The subsequent patches will upstream intrinsics
support for BFloat16, followed by Matrix Multiplication and the
remaining Virtualization features of the armv8.6-a architecture.
Based on work by:
- labrinea
- MarkMurrayARM
- Luke Cheeseman
- Javed Asbar
- Mikhail Maltsev
- Luke Geeson
Reviewers: SjoerdMeijer, craig.topper, rjmccall, jfb, LukeGeeson
Reviewed By: SjoerdMeijer
Subscribers: stuij, kristof.beyls, hiraditya, dexonsmith, danielkiss, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D76062
Apple's CPUs are called A7-A13 in official communication, occasionally with
weird suffixes which we probably don't need to care about. This adds each one
and describes its features. It also switches the default CPU to the canonical
name for Cyclone, but leaves legacy support in so that existing bitcode still
compiles.
Summary:
The PMMIR_EL1 register is present in Armv8.4 with PMU extension.
This patch adds support for it.
Reviewers: t.p.northover, dnsampaio
Reviewed By: dnsampaio
Subscribers: kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68940
llvm-svn: 375228
Re-apply 9fdfb045ae8b/r365676 with fixes for PPC and Hexagon. This involved
moving defaults from TargetTransformInfoImplBase to MCSubtargetInfo.
Rework the TTI cache and software prefetching APIs to prepare for the
introduction of a general system model. Changes include:
- Marking existing interfaces const and/or override as appropriate
- Adding comments
- Adding BasicTTIImpl interfaces that delegate to a subtarget
implementation
- Moving the default TargetTransformInfoImplBase implementation to a default
MCSubtarget implementation
Only a handful of targets use these interfaces currently: AArch64, Hexagon, PPC
and SystemZ. AArch64 already has a custom subtarget implementation, so its
custom TTI implementation is migrated to use the new facilities in BasicTTIImpl
to invoke its custom subtarget implementation. The custom TTI implementations
continue to exist for the other targets with this change. They are not moved
over to subtarget-based implementations.
The end goal is to have the default subtarget implementation defer to the system
model defined by the target. With this change, the default MCSubtargetInfo
implementation essentially returns the defaults TargetTransformInfoImplBase used
to return. Existing users of TTI defaults will hit the defaults now in
MCSubtargetInfo. Targets that define their own custom TTI implementations won't
use the BasicTTIImpl implementations that route to the subtarget.
Once system models are in place for the targets that use these interfaces, their
custom TTI implementations can be removed.
Differential Revision: https://reviews.llvm.org/D63614
llvm-svn: 374205
This caused severe compile-time regressions, see PR43455.
> Modern processors predict the targets of an indirect branch regardless of
> the size of any jump table used to glean its target address. Moreover,
> branch predictors typically use resources limited by the number of actual
> targets that occur at run time.
>
> This patch changes the semantics of the option `-max-jump-table-size` to limit
> the number of different targets instead of the number of entries in a jump
> table. Thus, it is now renamed to `-max-jump-table-targets`.
>
> Before, when `-max-jump-table-size` was specified, it could happen that
> cluster jump tables could have targets used repeatedly, but each one was
> counted and typically resulted in tables with the same number of entries.
> With this patch, when specifying `-max-jump-table-targets`, tables may have
> different lengths, since the number of unique targets is counted towards the
> limit, but the number of unique targets in tables is the same, but for the
> last one containing the balance of targets.
>
> Differential revision: https://reviews.llvm.org/D60295
llvm-svn: 373060
Modern processors predict the targets of an indirect branch regardless of
the size of any jump table used to glean its target address. Moreover,
branch predictors typically use resources limited by the number of actual
targets that occur at run time.
This patch changes the semantics of the option `-max-jump-table-size` to limit
the number of different targets instead of the number of entries in a jump
table. Thus, it is now renamed to `-max-jump-table-targets`.
Before, when `-max-jump-table-size` was specified, it could happen that
cluster jump tables could have targets used repeatedly, but each one was
counted and typically resulted in tables with the same number of entries.
With this patch, when specifying `-max-jump-table-targets`, tables may have
different lengths, since the number of unique targets is counted towards the
limit, but the number of unique targets in tables is the same, but for the
last one containing the balance of targets.
Differential revision: https://reviews.llvm.org/D60295
llvm-svn: 372893
This is the main CodeGen patch to support the arm64_32 watchOS ABI in LLVM.
FastISel is mostly disabled for now since it would generate incorrect code for
ILP32.
llvm-svn: 371722
Summary:
This patch renames functions that takes or returns alignment as log2, this patch will help with the transition to llvm::Align.
The renaming makes it explicit that we deal with log(alignment) instead of a power of two alignment.
A few renames uncovered dubious assignments:
- `MirParser`/`MirPrinter` was expecting powers of two but `MachineFunction` and `MachineBasicBlock` were using deal with log2(align). This patch fixes it and updates the documentation.
- `MachineBlockPlacement` exposes two flags (`align-all-blocks` and `align-all-nofallthru-blocks`) supposedly interpreted as power of two alignments, internally these values are interpreted as log2(align). This patch updates the documentation,
- `MachineFunctionexposes` exposes `align-all-functions` also interpreted as power of two alignment, internally this value is interpreted as log2(align). This patch updates the documentation,
Reviewers: lattner, thegameg, courbet
Subscribers: dschuff, arsenm, jyknight, dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, javed.absar, hiraditya, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, dexonsmith, PkmX, jocewei, jsji, Jim, s.egerton, llvm-commits, courbet
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65945
llvm-svn: 371045
Currently we can't keep any state in the selector object that we get from
subtarget. As a result we have to plumb through all our variables through
multiple functions. This change makes it non-const and adds a virtual init()
method to allow further state to be captured for each target.
AArch64 makes use of this in this patch to cache a call to hasFnAttribute()
which is expensive to call, and is used on each selection of G_BRCOND.
Differential Revision: https://reviews.llvm.org/D65984
llvm-svn: 368652
This feature instructs the backend to allow locally defined global variable
addresses to contain a pointer tag in bits 56-63 that will be ignored by
the hardware (i.e. TBI), but may be used by an instrumentation pass such
as HWASAN. It works by adding a MOVK instruction to the regular ADRP/ADD
sequence that sets bits 48-63 to the corresponding bits of the global, with
the linker bounds check disabled on the ADRP instruction to prevent the tag
from causing a link failure.
This implementation of the feature omits the MOVK when loading from or storing
to a global, which is sufficient for TBI. If the same approach is extended
to MTE, assuming that 0 is not configured as a catch-all tag, we will most
likely also need the MOVK in this case in order to avoid a tag mismatch.
Differential Revision: https://reviews.llvm.org/D65364
llvm-svn: 367475
This makes the field wider than MachineOperand::SubReg_TargetFlags so that
we don't end up silently truncating any higher bits. We should still catch
any bits truncated from the MachineOperand field as a consequence of the
assertion in MachineOperand::setTargetFlags().
Differential Revision: https://reviews.llvm.org/D65465
llvm-svn: 367474
Embedded Trace Extension and Trace Buffer Extension are optional
future architecture extensions.
(cf. https://developer.arm.com/architectures/cpu-architecture/a-profile/exploration-tools)
Their system registers are documented here:
https://developer.arm.com/docs/ddi0601/a
ETE shares register names with ETM. One exception is the ETE
TRCEXTINSELR0 register, which has the same encoding as the ETM
TRCEXTINSELR register (but different semantics). This patch treats
them as aliases: the assembler will accept both names, emitting
identical encoding, and the disassembler will keep disassembling
to TRCEXRINSELR.
Differential Revision: https://reviews.llvm.org/D63707
llvm-svn: 367093
Rework the TTI cache and software prefetching APIs to prepare for the
introduction of a general system model. Changes include:
- Marking existing interfaces const and/or override as appropriate
- Adding comments
- Adding BasicTTIImpl interfaces that delegate to a subtarget
implementation
- Adding a default "no information" subtarget implementation
Only a handful of targets use these interfaces currently: AArch64,
Hexagon, PPC and SystemZ. AArch64 already has a custom subtarget
implementation, so its custom TTI implementation is migrated to use
the new facilities in BasicTTIImpl to invoke its custom subtarget
implementation. The custom TTI implementations continue to exist for
the other targets with this change. They are not moved over to
subtarget-based implementations.
The end goal is to have the default subtarget implementation defer to
the system model defined by the target. With this change, the default
subtarget implementation essentially returns "no information" for
these interfaces. None of the existing users of TTI will hit that
implementation because they define their own custom TTI
implementations and won't use the BasicTTIImpl implementations.
Once system models are in place for the targets that use these
interfaces, their custom TTI implementations can be removed.
Differential Revision: https://reviews.llvm.org/D63614
llvm-svn: 365676
The Armv8.2-A crypto extensions all defaulted to true, but should default to
false, like all the other extensions.
Differential Revision: https://reviews.llvm.org/D62180
llvm-svn: 361354
Summary:
This patch adds the following features defined by Arm SVE2 architecture
extension:
sve2, sve2-aes, sve2-sm4, sve2-sha3, bitperm
For existing CPUs these features are declared as unsupported to prevent
scheduler errors.
The specification can be found here:
https://developer.arm.com/docs/ddi0602/latest
Reviewers: SjoerdMeijer, sdesmalen, ostannard, rovka
Reviewed By: SjoerdMeijer, rovka
Subscribers: rovka, javed.absar, tschuett, kristof.beyls, kristina, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D61513
llvm-svn: 360573
Added subtarget features for AArch64 to use TPIDR_EL[1|2|3] as the TLS base
register, rather than the default TPIDR_EL0.
Patch by Philip Derrin!
Differential revision: https://reviews.llvm.org/D54685
llvm-svn: 356657
to reflect the new license.
We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.
Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.
llvm-svn: 351636
This feature enables the fusion of some arithmetic and logic instructions
together.
Differential revision: https://reviews.llvm.org/D56572
llvm-svn: 351139
Follow up patch of rL350385, for adding predres
command line option. This patch renames the
feature as to keep it aligned with the option
passed by/to clang
Differential Revision: https://reviews.llvm.org/D56484
llvm-svn: 350702
SB (Speculative Barrier) is only mandatory from 8.5
onwards but is optional from Armv8.0-A. This patch adds a command
line option to enable SB, as it was previously only possible to
enable by selecting -march=armv8.5-a.
This patch also moves to FeatureSB the old FeatureSpecRestrict.
Reviewers: pbarrio, olista01, t.p.northover, LukeCheeseman
Differential Revision: https://reviews.llvm.org/D55921
llvm-svn: 350126
This patch splits backend features currently
hidden behind architecture versions.
For example, currently the only way to activate
complex numbers extension is targeting an v8.3
architecture, where after the patch this extension
can be added separately.
This refactoring is required by the new command lines proposal:
http://lists.llvm.org/pipermail/llvm-dev/2018-September/126346.html
Reviewers: DavidSpickett, olista01, t.p.northover
Subscribers: kristof.beyls, bryanpkc, javed.absar, pbarrio
Differential revision: https://reviews.llvm.org/D54633
--
It was reverted in rL348249 due a build bot failure in one of the
regression tests:
http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-win/builds/14386
The problem seems to be that FileCheck behaves
different in windows and linux. This new patch
splits the test file in multiple,
and does more exact pattern matching attempting
to circumvent the issue.
llvm-svn: 348493
Functions annotated with `__fastcall` or `__attribute__((__fastcall__))`
or `__attribute__((__swiftcall__))` may contain SEH handlers even on
Win64. This matches the behaviour of cl which allows for
`__try`/`__except` inside a `__fastcall` function. This was detected
while trying to self-host clang on Windows ARM64.
llvm-svn: 348337
Summary:
SSBS (Speculative Store Bypass Safe) is only mandatory from 8.5
onwards but is optional from Armv8.0-A. This patch adds a command
line option to enable SSBS, as it was previously only possible to
enable by selecting -march=armv8.5-a.
Similar patch upstream in GNU binutils:
https://sourceware.org/ml/binutils/2018-09/msg00274.html
Reviewers: olista01, samparker, aemerson
Reviewed By: samparker
Subscribers: javed.absar, kristof.beyls, kristina, llvm-commits
Differential Revision: https://reviews.llvm.org/D54629
llvm-svn: 348137