Update lambda function
static auto InitializeRegisterBankOnce = [this](const auto &TRI) {
with
static auto InitializeRegisterBankOnce = [&]() {
Capture reference instead of passing argument, as there are buildbot
compiling errors related when passing argument.
Summary: This patch introduces an API for MemOp in order to simplify and tighten the client code.
Reviewers: courbet
Subscribers: arsenm, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jsji, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73964
Once we have created a tail-predicated hardware-loop, and thus know the number
of elements that are processed, we want to clean-up the iteration count
expression of that loop. In D73682, we bailed the analysis on conditionally
executed instructions. This adds support for IT-blocks, so that we can handle
these cases again. The restriction is that we only support IT blocks containing
1 statement, but that seems to cover most cases and forms of the iteration
count expression.
Differential Revision: https://reviews.llvm.org/D73947
Checking that the use-def chain that performs the loop count
isSafeToRemove is not sufficient because it means that we can
remove register copies that we need to restore lr to its correct
value. This change now prevents the transform from kicking in for the
'remove-elem-moves' test which needs to addressed later on.
Differential Revision: https://reviews.llvm.org/D74037
While validating each MVE instruction, check that all instructions
that touch memory are somehow predicated upon the VCTP.
Differential Revision: https://reviews.llvm.org/D73616
Summary:
After following Simon's suggestion about additional testing posted at
https://reviews.llvm.org/D73906, I found several more places that
need to be updated.
Reviewers: simon_tatham, dmgreen, ostannard, eli.friedman
Reviewed By: simon_tatham
Subscribers: merge_guards_bot, kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73963
Summary:
This patch changes the underlying type of the ARM::ArchExtKind
enumeration to uint64_t and adjusts the related code.
The goal of the patch is to prepare the code base for a new
architecture extension.
Reviewers: simon_tatham, eli.friedman, ostannard, dmgreen
Reviewed By: dmgreen
Subscribers: merge_guards_bot, kristof.beyls, hiraditya, cfe-commits, llvm-commits, pbarrio
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D73906
Under MVE, we do not have any lowering for fminimum, which a
vector_reduce_fmin without NoNan will be expanded into. As with the
other recent patches, force this to expand in the pre-isel pass. Note
that Neon lowering would be OK because the scalar fminimum uses the
vector VMIN instruction, but is probably better to just rely on the
scalar operations, which is what is done here.
Also fixes what appears to be the reversal of INF vs -INF in the
vector_reduce_fmin widening code.
Followup to D73135. If the target doesn't have hard float (default
for ARM), then we assert when trying to soften the result of vector
reduction intrinsics. This patch marks these for expansion as well.
(A bit odd to use vectors on a target without hard float ... but
that's where you end up if you expose target-independent vector types.)
Differential Revision: https://reviews.llvm.org/D73854
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790
Reviewers: courbet
Subscribers: arsenm, dschuff, jyknight, sdardis, nemanjai, jvesely, nhaehnle, sbc100, jgravelle-google, hiraditya, aheejin, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, jsji, Jim, lenary, s.egerton, pzheng, sameer.abuasal, apazos, luismarques, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73885
These can be lowered to code sequences using CMPFP and CMPFPE which then get
selected to VCMP and VCMPE. The implementation isn't fully correct, as the chain
operand isn't handled correctly, but resolving that looks like it would involve
changes around FPSCR-handling instructions and how the FPSCR is modelled.
The fp-intrinsics test was already testing some of this but as the entire test
was being XFAILed it wasn't noticed. Un-XFAIL the test and instead leave the
cases where we aren't generating the right instruction sequences as FIXME.
Differential Revision: https://reviews.llvm.org/D73194
Summary:
In big-endian MVE, the simple vector load/store instructions (i.e.
both contiguous and non-widening) don't all store the bytes of a
register to memory in the same order: it matters whether you did a
VSTRB.8, VSTRH.16 or VSTRW.32. Put another way, the in-register
formats of different vector types relate to each other in a different
way from the in-memory formats.
So, if you want to 'bitcast' or 'reinterpret' one vector type as
another, you have to carefully specify which you mean: did you want to
reinterpret the //register// format of one type as that of the other,
or the //memory// format?
The ACLE `vreinterpretq` intrinsics are specified to reinterpret the
register format. But I had implemented them as LLVM IR bitcast, which
is specified for all types as a reinterpretation of the memory format.
So a `vreinterpretq` intrinsic, applied to values already in registers,
would code-generate incorrectly if compiled big-endian: instead of
emitting no code, it would emit a `vrev`.
To fix this, I've introduced a new IR intrinsic to perform a
register-format reinterpretation: `@llvm.arm.mve.vreinterpretq`. It's
implemented by a trivial isel pattern that expects the input in an
MQPR register, and just returns it unchanged.
In the clang codegen, I only emit this new intrinsic where it's
actually needed: I prefer a bitcast wherever it will have the right
effect, because LLVM understands bitcasts better. So we still generate
bitcasts in little-endian mode, and even in big-endian when you're
casting between two vector types with the same lane size.
For testing, I've moved all the codegen tests of vreinterpretq out
into their own file, so that they can have a different set of RUN
lines to check both big- and little-endian.
Reviewers: dmgreen, MarkMurrayARM, miyuki, ostannard
Reviewed By: dmgreen
Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D73786
Summary:
These instructions generate a vector of consecutive elements starting
from a given base value and incrementing by 1, 2, 4 or 8. The `wdup`
versions also wrap the values back to zero when they reach a given
limit value. The instruction updates the scalar base register so that
another use of the same instruction will continue the sequence from
where the previous one left off.
At the IR level, I've represented these instructions as a family of
target-specific intrinsics with two return values (the constructed
vector and the updated base). The user-facing ACLE API provides a set
of intrinsics that throw away the written-back base and another set
that receive it as a pointer so they can update it, plus the usual
predicated versions.
Because the intrinsics return two values (as do the underlying
instructions), the isel has to be done in C++.
This is the first family of MVE intrinsics that use the `imm_1248`
immediate type in the clang Tablegen framework, so naturally, I found
I'd given it the wrong C integer type. Also added some tests of the
check that the immediate has a legal value, because this is the first
time those particular checks have been exercised.
Finally, I also had to fix a bug in MveEmitter which failed an
assertion when I nested two `seq` nodes (the inner one used to extract
the two values from the pair returned by the IR intrinsic, and the
outer one put on by the predication multiclass).
Reviewers: dmgreen, MarkMurrayARM, miyuki, ostannard
Reviewed By: dmgreen
Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D73357
Summary:
The unpredicated case of this is trivial: the clang codegen just makes
a vector splat of the input, and LLVM isel is already prepared to
handle that. For the predicated version, I've generated a `select`
between the same vector splat and the `inactive` input parameter, and
added new Tablegen isel rules to match that pattern into a predicated
`MVE_VDUP` instruction.
Reviewers: dmgreen, MarkMurrayARM, miyuki, ostannard
Reviewed By: dmgreen
Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D73356
This reverts commit e34801c8e6 and the followup due to multiple
problems.
I've tried to keep the tests and RDA parts where possible, as those
still seem useful.
Summary: This is a first step before changing the types to llvm::Align and introduce functions to ease client code.
Reviewers: courbet
Subscribers: arsenm, sdardis, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jrtc27, atanasyan, jsji, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73785
fadd/fmul reductions without reassoc are lowered to
VECREDUCE_STRICT_FADD/FMUL nodes, which don't have legalization
support. Until that is in place, expand these intrinsics on
ARM and AArch64. Other targets always expand the vector reduction
intrinsics.
Additionally expand fmax/fmin reductions without nonan flag on
AArch64, as the backend asserts that the flag is present when
lowering VECREDUCE_FMIN/FMAX.
This fixes https://bugs.llvm.org/show_bug.cgi?id=44600.
Differential Revision: https://reviews.llvm.org/D73135
Summary:
The initialization of RegisterBank needs to be done only once. The
logic of AlreadyInit has data race, use llvm::call_once instead.
This is continuing work of D73587.
Reviewers: arsenm, rovka, dsanders, t.p.northover, efriedma, apazos
Reviewed By: arsenm
Subscribers: wdng, kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73605
This adds some missing MVE opcodes to evaluateBranch, which results in
llvm-objdump being able to print the PC relative branch target as an
annotation.
Differential Revision: https://reviews.llvm.org/D73553
Add several new helpers to RDA:
- hasLocalDefBefore
- isRegDefinedAfter
- isSafeToDefRegAt
And move two bits of logic from ARMLowOverheadLoops into RDA:
- isSafeToMove
- isSafeToRemove
Both of these have some wrappers too to make them more convienent to
use.
Differential Revision: https://reviews.llvm.org/D73460
This is how it should've been and brings it more in line with
std::string_view. There should be no functional change here.
This is mostly mechanical from a custom clang-tidy check, with a lot of
manual fixups. It uncovers a lot of minor inefficiencies.
This doesn't actually modify StringRef yet, I'll do that in a follow-up.
The MVE_VLDRWU32_qi_pre gather loads, like the other _pre/_post mve
loads returns the writeback as result 0, the value as result 1. The llvm
ir intrinsic seems to have this the other way around though, and so when
lowering from one to the other we need to switch the first two outputs.
I've also fixed up the types of _pre/_post on normal MVE loads. There we
were already getting the values the right way around, just not for the
types. I don't believe this was causing anything to go wrong, but it was
very confusing to read in the debug output.
Differential Revision: https://reviews.llvm.org/D73370
We had support for runtime trip count values, but not constants, and this adds
supports for that.
And added a minor optimisation while I was add it: don't invoke Cleanup when
there's nothing to clean up.
Differential Revision: https://reviews.llvm.org/D73198
When expanding the LoopStart, we try to remove the iteration count
calculation. However, if part of the calculation was also used to
calculate the number of elements we could end up deleting
instructions that were required to feed DLSTP/WLSTP.
Differential Revision: https://reviews.llvm.org/D73275
The codegen for splitting a llvm.vector.reduction intrinsic into parts
will be better than the codegen for the generic reductions. This will
only directly effect when vectorization factors are specified by the
user.
Also added tests to make sure the codegen for larger reductions is OK.
Differential Revision: https://reviews.llvm.org/D72257
Summary:
This is a follow up on https://reviews.llvm.org/D71473#inline-647262.
There's a caveat here that `Align(1)` relies on the compiler understanding of `Log2_64` implementation to produce good code. One could use `Align()` as a replacement but I believe it is less clear that the alignment is one in that case.
Reviewers: xbolva00, courbet, bollu
Subscribers: arsenm, dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jrtc27, atanasyan, jsji, Jim, kerbowa, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D73099
Similar to the function attribute `prefix` (prefix data),
"patchable-function-prefix" inserts data (M NOPs) before the function
entry label.
-fpatchable-function-entry=2,1 (1 NOP before entry, 1 NOP after entry)
will look like:
```
.type foo,@function
.Ltmp0: # @foo
nop
foo:
.Lfunc_begin0:
# optional `bti c` (AArch64 Branch Target Identification) or
# `endbr64` (Intel Indirect Branch Tracking)
nop
.section __patchable_function_entries,"awo",@progbits,get,unique,0
.p2align 3
.quad .Ltmp0
```
-fpatchable-function-entry=N,0 + -mbranch-protection=bti/-fcf-protection=branch has two reasonable
placements (https://gcc.gnu.org/ml/gcc-patches/2020-01/msg01185.html):
```
(a) (b)
func: func:
.Ltmp0: bti c
bti c .Ltmp0:
nop nop
```
(a) needs no additional code. If the consensus is to go for (b), we will
need more code in AArch64BranchTargets.cpp / X86IndirectBranchTracking.cpp .
Differential Revision: https://reviews.llvm.org/D73070
Summary:
Immediate vmvnq is code-generated as a simple vector constant in IR,
and left to the backend to recognize that it can be created with an
MVE VMVN instruction. The predicated version is represented as a
select between the input and the same constant, and I've added a
Tablegen isel rule to turn that into a predicated VMVN. (That should
be better than the previous VMVN + VPSEL: it's the same number of
instructions but now it can fold into an adjacent VPT block.)
The unpredicated forms of VBIC and VORR are done by enabling the same
isel lowering as for NEON, recognizing appropriate immediates and
rewriting them as ARMISD::VBICIMM / ARMISD::VORRIMM SDNodes, which I
then instruction-select into the right MVE instructions (now that I've
also reworked those instructions to use the same MC operand encoding).
In order to do that, I had to promote the Tablegen SDNode instance
`NEONvorrImm` to a general `ARMvorrImm` available in MVE as well, and
similarly for `NEONvbicImm`.
The predicated forms of VBIC and VORR are represented as a vector
select between the original input vector and the output of the
unpredicated operation. The main convenience of this is that it still
lets me use the existing isel lowering for VBICIMM/VORRIMM, and not
have to write another copy of the operand encoding translation code.
This intrinsic family is the first to use the `imm_simd` system I put
into the MveEmitter tablegen backend. So, naturally, it showed up a
bug or two (emitting bogus range checks and the like). Fixed those,
and added a full set of tests for the permissible immediates in the
existing Sema test.
Also adjusted the isel pattern for `vmovlb.u8`, which stopped matching
because lowering started turning its input into a VBICIMM. Now it
recognizes the VBICIMM instead.
Reviewers: dmgreen, MarkMurrayARM, miyuki, ostannard
Reviewed By: dmgreen
Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D72934
Summary:
In NEON, the immediate forms of VBIC and VORR are each represented as
a single MC instruction, which takes its immediate operand already
encoded in a NEON-friendly format: 8 data bits, plus some control bits
indicating how to expand them into a full vector.
In MVE, we represented immediate VBIC and VORR as four separate MC
instructions each, for an 8-bit immediate shifted left by 0, 8, 16 or
24 bits. For each one, the value of the immediate operand is in the
'natural' form, i.e. the numerical value that would actually be BICed
or ORRed into each vector lane (and also the same value shown in
assembly). For example, MVE_VBICIZ16v4i32 takes an operand such as
0xab0000, which NEON would represent as 0xab | (control bits << 8).
The MVE approach is superficially nice (it makes assembly input and
output easy, and it's also nice if you're manually constructing
immediate VBICs). But it turns out that it's better for isel if we
make the NEON and MVE instructions work the same, because the
ARMISD::VBICIMM and VORRIMM node types already encode their immediate
into the NEON format, so it's easier if we can just use it.
Also, this commit reduces the total amount of code rather than
increasing it, which is surely an indication that it really is simpler
to do it this way!
Reviewers: dmgreen, ostannard, miyuki, MarkMurrayARM
Reviewed By: dmgreen
Subscribers: kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73205
The hasSideEffect parameter is usually automatically inferred from
instruction patterns. For some of our MVE instructions, we do not have
patterns though, such as for the pre/post inc loads and stores. This
instead specifies the flag manually on the base MVE_VLDRSTR_base
tablegen class, making sure we get this correct.
This can help with scheduling multiple loads more optimally. Here I've
added a unittest as a more direct form of testing.
Differential Revision: https://reviews.llvm.org/D73117
This is a very basic MVE gather/scatter cost model, based roughly on the
code that we will currently produce. It does not handle truncating
scatters or extending gathers correctly yet, as it is difficult to tell
that they are going to be correctly extended/truncated from the limited
information in the cost function.
This can be improved as we extend support for these in the future.
Based on code originally written by David Sherwood.
Differential Revision: https://reviews.llvm.org/D73021